Publications

2020

Artifact Detection in Invasive Blood Pressure Data using Forecasting Methods and Machine Learning

Authors
Wu, M; Branco, P; Chen Ke, JX; MacDonald, DB;

Publication
IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020, Virtual Event, South Korea, December 16-19, 2020

2019

Pre-processing approaches for imbalanced distributions in regression

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
NEUROCOMPUTING

Abstract
Imbalanced domains are an important problem frequently arising in real-world predictive analytics. A significant body of research has addressed imbalanced distributions in classification tasks, where the target variable is nominal. In the context of regression tasks, where the target variable is continuous, imbalanced distributions of the target variable also raise several challenges to learning algorithms. Imbalanced domains are characterized by: (1) a higher relevance being assigned to the performance on a subset of the target variable values; and (2) these most relevant values being underrepresented in the available data set. Recently, some proposals were made to address the problem of imbalanced distributions in regression. Still, this remains a scarcely explored issue with few existing solutions. This paper describes three new approaches for tackling the problem of imbalanced distributions in regression tasks. We propose the adaptation to regression tasks of random over-sampling and introduction of Gaussian noise, and we present a new method called WEighted Relevance-based Combination Strategy (WERCS). An extensive set of experiments provides empirical evidence of the advantage of using the proposed strategies and, in particular, the WERCS method. We analyze the impact of different data characteristics on the performance of the methods. A data repository with 15 imbalanced regression data sets is also provided to the research community.
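
To make the idea of random over-sampling in a regression setting concrete, here is a minimal sketch. It is not the paper's implementation: the function name `oversample_rare`, the user-supplied `is_rare` predicate (standing in for the relevance notion the abstract mentions), and the `n_copies` and `noise_sd` parameters are all assumptions made for illustration. With `noise_sd` set to 0 it replicates randomly drawn rare cases exactly; with a positive value it perturbs each copy with Gaussian noise.

```python
import random


def oversample_rare(rows, is_rare, n_copies=2, noise_sd=0.0, seed=0):
    """Return `rows` plus extra copies of randomly drawn rare rows.

    rows     : list of (features_tuple, target) pairs
    is_rare  : hypothetical predicate on the target value marking the
               under-represented, high-relevance cases
    n_copies : number of extra cases added per rare row (assumption)
    noise_sd : standard deviation of Gaussian noise added to each copy;
               0.0 gives plain replication
    """
    rng = random.Random(seed)
    rare = [r for r in rows if is_rare(r[1])]
    extra = []
    for _ in range(n_copies * len(rare)):
        x, y = rng.choice(rare)  # draw a rare case with replacement
        if noise_sd > 0:
            # Perturb the features and the target of the copy
            x = tuple(v + rng.gauss(0.0, noise_sd) for v in x)
            y = y + rng.gauss(0.0, noise_sd)
        extra.append((x, y))
    return rows + extra


data = [((1.0,), 0.1), ((2.0,), 0.2), ((3.0,), 9.0)]
balanced = oversample_rare(data, is_rare=lambda y: y > 5, n_copies=3)
```

In this usage example the single case with target 9.0 is treated as rare, so three copies of it are appended to the original three rows. A threshold predicate is only one simple way to mark rare values; the abstract itself speaks of relevance assigned to target values.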

2019

The CURE for Class Imbalance

Authors
Bellinger, C; Branco, P; Torgo, L;

Publication
Discovery Science - 22nd International Conference, DS 2019, Split, Croatia, October 28-30, 2019, Proceedings

2019

A Study on the Impact of Data Characteristics in Imbalanced Regression Tasks

Authors
Branco, P; Torgo, L;

Publication
2019 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2019, Washington, DC, USA, October 5-8, 2019

2019

The CURE for Class Imbalance

Authors
Bellinger, C; Branco, P; Torgo, L;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Addressing the class imbalance problem is critical for several real world applications. The application of pre-processing methods is a popular way of dealing with this problem. These solutions increase the rare class examples and/or decrease the normal class cases. However, these procedures typically only take into account the characteristics of each individual class. This segmented view of the data can have a negative impact. We propose a new method that uses an integrated view of the data classes to generate new examples and remove cases. ClUstered REsampling (CURE) is a method based on a holistic view of the data that uses hierarchical clustering and a new distance measure to guide the sampling procedure. Clusters generated in this way take into account the structure of the data. This enables CURE to avoid common mistakes made by other resampling methods. In particular, CURE prevents the generation of synthetic examples in dangerous regions and undersamples safe, non-borderline, regions of the majority class. We show the effectiveness of CURE in an extensive set of experiments with benchmark domains. We also show that CURE is a user-friendly method that does not require extensive fine-tuning of hyper-parameters. © Springer Nature Switzerland AG 2019.