Publications

Publications by Paula Oliveira Branco

2017

Learning Through Utility Optimization in Regression Tasks

Authors
Branco, P; Torgo, L; Ribeiro, RP; Frank, E; Pfahringer, B; Rau, MM;

Publication
2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA)

Abstract
Accounting for misclassification costs is important in many practical applications of machine learning, and cost-sensitive techniques for classification have been studied extensively. Utility-based learning generalizes purely cost-based approaches by considering both costs and benefits, which enables its application to domains with complex cost-benefit settings. However, there is little work on utility- or cost-based learning for regression. In this paper, we formally define the problem of utility-based regression and propose a strategy for maximizing the utility of regression models. We verify our findings in a large set of experiments that show the advantage of our proposal across a diverse set of domains, learning algorithms and cost/benefit settings.
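To make the idea of utility in regression concrete, the sketch below applies a generic expected-utility decision rule: given samples of plausible target values for one case (however they were obtained), it picks the candidate prediction with the highest average utility. The utility function, its `threshold` and `tolerance` parameters, and the benefit/cost values are all hypothetical choices for illustration; this is not necessarily the strategy proposed in the paper.

```python
def utility(y_true, y_pred, threshold=10.0, tolerance=1.0):
    """Hypothetical utility: a benefit for accurate predictions, a cost for errors.
    Target values at or above `threshold` form the relevant (high-stakes) region."""
    relevant = y_true >= threshold
    error = abs(y_true - y_pred)
    if error <= tolerance:
        return 5.0 if relevant else 1.0  # benefit of an accurate prediction
    return -(3.0 if relevant else 1.0) * error  # cost grows with the error


def expected_utility(y_pred, samples):
    """Average utility of predicting y_pred over samples of the true target."""
    return sum(utility(y, y_pred) for y in samples) / len(samples)


def best_prediction(samples, candidates):
    """Expected-utility decision rule: the candidate with the highest mean utility."""
    return max(candidates, key=lambda c: expected_utility(c, samples))


samples = [2.0, 2.5, 3.0, 12.0]
print(best_prediction(samples, [2.5, 12.0]))
```

With these made-up costs the rule prefers 12.0 over 2.5, even though three of the four samples lie near 2.5, because missing a relevant value is far more expensive than a large error on an ordinary one.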


2018

Resampling with neighbourhood bias on imbalanced domains

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
Expert Systems

Abstract
Imbalanced domains are an important problem that arises in predictive tasks, causing a loss of performance on the cases most relevant to the user. This problem has been extensively studied for classification tasks, where the target variable is nominal. Recently, it was recognized that imbalanced domains occur in several other contexts and for multiple tasks, such as regression tasks, where the target variable is continuous. This paper focuses on imbalanced domains in both classification and regression tasks. Resampling strategies are among the most successful approaches to address imbalanced domains. In this work, we propose variants of existing resampling strategies that take into account information about the neighbourhood of the examples. Instead of sampling uniformly, our proposals bias the strategies to reinforce some regions of the data sets. With an extensive set of experiments, we provide evidence of the advantage of introducing a neighbourhood bias in the resampling strategies for both classification and regression tasks with imbalanced data sets.
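As an illustration of what a neighbourhood bias can mean, the sketch below performs random under-sampling of a majority class in which each majority case is drawn with a weight derived from its k nearest neighbours. The function names, the `favour` switch ("frontier" prefers cases near other classes, "safe" prefers the opposite) and the weighting formula are assumptions made for this example, not the exact variants proposed in the paper.

```python
import math
import random


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean, brute force)."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def biased_undersample(X, y, majority, n_keep, k=3, favour="frontier", seed=0):
    """Hypothetical neighbourhood-biased under-sampling of the `majority` class."""
    rng = random.Random(seed)
    maj = [i for i, label in enumerate(y) if label == majority]
    weights = {}
    for i in maj:
        neigh = knn_indices(X, i, k)
        frac_other = sum(y[j] != majority for j in neigh) / k
        # "frontier" favours majority cases surrounded by other classes; "safe" the reverse
        w = frac_other if favour == "frontier" else 1.0 - frac_other
        weights[i] = w + 1e-3  # small floor so every case can still be drawn
    # Weighted sampling without replacement (Efraimidis-Spirakis random keys)
    ranked = sorted(maj, key=lambda i: rng.random() ** (1.0 / weights[i]), reverse=True)
    kept = set(ranked[:n_keep])
    keep = [i for i in range(len(y)) if y[i] != majority or i in kept]
    return [X[i] for i in keep], [y[i] for i in keep]


X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (2, 2)]
y = ["a", "a", "a", "a", "a", "a", "b", "b"]
Xr, yr = biased_undersample(X, y, majority="a", n_keep=3)
print(yr)
```

All minority cases are kept; only the choice of which majority cases survive is biased by their neighbourhood.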


2018

MetaUtil: Meta Learning for Utility Maximization in Regression

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
Discovery Science - 21st International Conference, DS 2018, Limassol, Cyprus, October 29-31, 2018, Proceedings

Abstract
Several important real-world problems of predictive analytics involve handling different costs of the predictions of the learned models. The research community has developed multiple techniques to deal with these tasks. The utility-based learning framework is a generalization of cost-sensitive tasks that takes into account both the costs of errors and the benefits of accurate predictions. This framework has important advantages, such as making it possible to represent more complex settings that reflect domain knowledge more completely and precisely. Most existing work addresses classification tasks, with only a few proposals tackling regression problems. In this paper we propose a new method, MetaUtil, for solving utility-based regression problems. The MetaUtil algorithm is versatile, allowing any out-of-the-box regression algorithm to be converted into a utility-based method. We show the advantage of our proposal in a large set of experiments on a diverse set of domains. © 2018, Springer Nature Switzerland AG.
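The abstract does not describe MetaUtil's internals. The sketch below shows one generic way to wrap an arbitrary regressor for utility maximization, modelled on MetaCost-style relabelling from cost-sensitive classification: a bootstrap ensemble estimates plausible targets for each training case, each target is replaced by the candidate with the highest expected utility, and the base learner is retrained on the relabelled data. The tiny 1-D k-NN base learner and the names `knn_regressor` and `relabel_and_retrain` are hypothetical; this is an assumption, not necessarily how MetaUtil works.

```python
import random
from statistics import mean


def knn_regressor(train, k=3):
    """Tiny 1-D k-NN regressor standing in for any out-of-the-box learner."""
    def predict(x):
        nearest = sorted(train, key=lambda xy: abs(xy[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def relabel_and_retrain(data, utility, n_models=10, seed=0):
    """Hypothetical MetaCost-style wrapper: relabel targets to maximize utility."""
    rng = random.Random(seed)
    # Bootstrap ensemble: its predictions approximate plausible targets for each x.
    models = [knn_regressor([rng.choice(data) for _ in data]) for _ in range(n_models)]
    candidates = sorted({y for _, y in data})
    relabelled = []
    for x, _ in data:
        samples = [m(x) for m in models]
        # utility(y_true, y_pred): samples play the role of possible true values
        best = max(candidates, key=lambda c: mean(utility(s, c) for s in samples))
        relabelled.append((x, best))
    return knn_regressor(relabelled)


data = [(float(i), float(i)) for i in range(10)]
model = relabel_and_retrain(data, utility=lambda yt, yp: -abs(yt - yp))
print(model(4.0))
```

The base learner is only ever called through `knn_regressor`, so any other regressor could be substituted, which is the kind of versatility the abstract describes.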


2019

Pre-processing approaches for imbalanced distributions in regression

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
Neurocomputing

Abstract
Imbalanced domains are an important problem frequently arising in real-world predictive analytics. A significant body of research has addressed imbalanced distributions in classification tasks, where the target variable is nominal. In the context of regression tasks, where the target variable is continuous, imbalanced distributions of the target variable also raise several challenges to learning algorithms. Imbalanced domains are characterized by: (1) a higher relevance being assigned to the performance on a subset of the target variable values; and (2) these most relevant values being underrepresented in the available data set. Recently, some proposals were made to address the problem of imbalanced distributions in regression. Still, this remains a scarcely explored issue with few existing solutions. This paper describes three new approaches for tackling the problem of imbalanced distributions in regression tasks. We propose adapting random over-sampling and the introduction of Gaussian noise to regression tasks, and we present a new method called WEighted Relevance-based Combination Strategy (WERCS). An extensive set of experiments provides empirical evidence of the advantage of using the proposed strategies and, in particular, the WERCS method. We analyze the impact of different data characteristics on the performance of the methods. A data repository with 15 imbalanced regression data sets is also provided to the research community.
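The abstract names WERCS without spelling it out. The sketch below illustrates relevance-weighted resampling under our own assumptions: a hypothetical sigmoid relevance function φ(y) (`relevance`, with made-up `centre` and `scale`), cases duplicated with probability proportional to φ(y), and cases removed with probability proportional to 1 - φ(y). Consult the paper for the actual WERCS procedure.

```python
import math
import random


def relevance(y, centre=10.0, scale=1.0):
    """Hypothetical relevance phi(y) in (0, 1): high target values matter most."""
    z = max(min((y - centre) / scale, 50.0), -50.0)  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def wercs_like(data, n_over, n_under, seed=0):
    """Relevance-weighted over- and under-sampling of (x, y) pairs (illustrative)."""
    rng = random.Random(seed)
    phi = [relevance(y) for _, y in data]
    # Over-sampling: replicate cases drawn with probability proportional to phi.
    extra = rng.choices(data, weights=phi, k=n_over)
    # Under-sampling: remove cases drawn with probability proportional to 1 - phi,
    # using Efraimidis-Spirakis keys for weighted sampling without replacement.
    keys = [rng.random() ** (1.0 / max(1.0 - p, 1e-12)) for p in phi]
    drop = set(sorted(range(len(data)), key=keys.__getitem__, reverse=True)[:n_under])
    kept = [case for i, case in enumerate(data) if i not in drop]
    return kept + extra


data = [(i, float(i)) for i in range(13)]
resampled = wercs_like(data, n_over=5, n_under=4)
print(len(resampled))
```

Because both steps are driven by the same relevance function, rare high-relevance cases tend to be replicated while common low-relevance cases tend to be removed.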


2018

2nd Workshop on Learning with Imbalanced Domains: Preface

Authors
Torgo, L; Matwin, S; Japkowicz, N; Krawczyk, B; Moniz, N; Branco, P;

Publication
Second International Workshop on Learning with Imbalanced Domains: Theory and Applications, LIDTA@ECML/PKDD 2018, Dublin, Ireland, September 10, 2018

2018

Cost-Sensitive Learning: Preface

Authors
Torgo, L; Matwin, S; Weiss, G; Moniz, N; Branco, P;

Publication
International Workshop on Cost-Sensitive Learning, COST@SDM 2018, San Diego, California, USA, May 5, 2018