
Publications by Paula Oliveira Branco

2019

A Study on the Impact of Data Characteristics in Imbalanced Regression Tasks

Authors
Branco, P; Torgo, L;

Publication
2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA 2019)

Abstract
The class imbalance problem has been thoroughly studied over the past two decades. More recently, the research community realized that imbalanced distributions also occur in tasks beyond classification. Regression is among these newly studied tasks, and there, too, imbalanced domains pose important challenges. Imbalanced regression problems occur in a diversity of real-world domains, such as meteorology (predicting extreme weather values), finance (forecasting extreme stock returns) or medicine (anticipating rare values). In imbalanced regression, the end-user preferences are biased towards values of the target variable that are under-represented in the available data. Several pre-processing methods have been proposed to address this problem. These methods change the training set to force the learner to focus on the rare cases. However, as far as we know, the relationship between the intrinsic characteristics of the data and the performance achieved by these methods has not yet been studied for imbalanced regression tasks. In this paper we describe a study of the impact that certain data characteristics may have on the results of applying pre-processing methods to imbalanced regression problems. To achieve this goal, we define potentially interesting data characteristics of regression problems. We then conduct our study using a synthetic data repository built for this purpose. We show that the characteristics studied behave differently, depending on the level at which each characteristic is present and on the learning algorithm used.
The main contributions of our work are: i) to define interesting data characteristics for regression tasks; ii) to create the first repository of imbalanced regression tasks containing 6000 data sets with controlled data characteristics; and iii) to provide insights on the impact of intrinsic data characteristics in the results of pre-processing methods for handling imbalanced regression tasks.
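As an illustration of the kind of pre-processing the abstract refers to (changing the training set so that the learner focuses on rare target values), the following is a minimal Python sketch of random undersampling for regression. It is not the method evaluated in the paper: the binary `relevance` function, the `threshold` and `keep_ratio` parameters and the helper names are hypothetical simplifications. Relevance functions in the imbalanced regression literature are usually continuous, not a hard cut-off.

```python
import random


def relevance(target, threshold):
    # Hypothetical, simplified relevance: 1.0 for rare (extreme) values, 0.0 otherwise.
    return 1.0 if target >= threshold else 0.0


def random_undersample(X, y, threshold, keep_ratio, seed=0):
    """Keep every rare case and a random fraction (keep_ratio) of the common cases."""
    rng = random.Random(seed)
    rare = [i for i, t in enumerate(y) if relevance(t, threshold) == 1.0]
    common = [i for i, t in enumerate(y) if relevance(t, threshold) == 0.0]
    n_keep = int(round(len(common) * keep_ratio))
    # Preserve the original row order in the returned training set.
    kept = sorted(rare + rng.sample(common, n_keep))
    return [X[i] for i in kept], [y[i] for i in kept]


X = [[i] for i in range(10)]
y = [1, 2, 3, 4, 5, 6, 7, 8, 50, 60]
X_new, y_new = random_undersample(X, y, threshold=40, keep_ratio=0.5)
print(len(y_new))  # 6: both rare cases plus 4 of the 8 common ones
```

With `keep_ratio=0.5`, half of the common cases are dropped while every rare case survives, which raises the share of extreme targets the learner sees during training.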

2021

An Analysis of Performance Metrics for Imbalanced Classification

Authors
Gaudreault, JG; Branco, P; Gama, J;

Publication
Discovery Science (DS 2021)

Abstract
Numerous machine learning applications involve dealing with imbalanced domains, where the learning focus is on the least frequent classes. This imbalance introduces new challenges for both the predictive modeling and the performance assessment of these models. While several performance metrics have been established as baselines in balanced domains, some cannot be applied to the imbalanced case, since the use of the majority class in the metric could lead to a misleading evaluation of performance. Other metrics, such as the area under the precision-recall curve, have been shown to be more appropriate for imbalanced domains due to their focus on class-specific performance. There are, however, many proposed implementations of this particular metric, which could lead to different conclusions depending on the one used. In this research, we carry out an experimental study to better understand these issues, examining the impact of using different metrics and different implementations of the same metric under multiple imbalance settings, with the aim of providing a set of recommendations.
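To illustrate why different implementations of the area under the precision-recall curve can disagree, the sketch below computes it in two common ways on the same toy ranking: a step-wise sum (as in average precision) and linear (trapezoidal) interpolation between points. The function names, the toy data and the convention of starting the trapezoid at (recall 0, precision 1) are assumptions for illustration. The code also assumes distinct scores (no ties), and it does not reproduce the specific implementations compared in the study.

```python
def pr_points(y_true, scores):
    """(recall, precision) after each example, ranked by decreasing score.

    Assumes binary labels (1 = minority/positive class), at least one
    positive, and distinct scores (ties are not grouped).
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(y_true)
    tp = fp = 0
    points = []
    for i in order:
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def auc_pr_step(points):
    """Step-wise area: each recall increment is weighted by the precision reached there."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


def auc_pr_trapezoid(points):
    """Trapezoidal area, assuming the curve starts at (recall=0, precision=1)."""
    area, prev_recall, prev_precision = 0.0, 0.0, 1.0
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area


y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
pts = pr_points(y_true, scores)
print(round(auc_pr_step(pts), 4))       # 0.8333
print(round(auc_pr_trapezoid(pts), 4))  # 0.7917
```

The same ranking yields two different areas (about 0.833 versus 0.792). A gap of this kind can change which of two models appears better when they are compared with different implementations of the metric.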

2021

Active Learning for Imbalanced Domains: the ALOD and ALOD-RE Algorithms

Authors
Bhattacharjee, M; Kambhampati, HS; Branco, P; Torgo, L;

Publication
8th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2021, Porto, Portugal, October 6-9, 2021

Abstract

2020

Exploring the Impact of Resampling Methods for Malware Detection

Authors
Branco, P;

Publication
IEEE International Conference on Big Data, Big Data 2020, Atlanta, GA, USA, December 10-13, 2020

Abstract

2021

Using CGAN to Deal with Class Imbalance and Small Sample Size in Cybersecurity Problems

Authors
Nazari, E; Branco, P; Jourdan, GV;

Publication
18th International Conference on Privacy, Security and Trust, PST 2021, Auckland, New Zealand, December 13-15, 2021

Abstract

2021

Graph-based Solutions with Residuals for Intrusion Detection: the Modified E-GraphSAGE and E-ResGAT Algorithms

Authors
Chang, L; Branco, P;

Publication
CoRR

Abstract
