Publications

Publications by Paula Oliveira Branco

2017

Relevance-Based Evaluation Metrics for Multi-class Imbalanced Domains

Authors
Branco, Paula; Torgo, Luis; Ribeiro, Rita P.;

Publication
Advances in Knowledge Discovery and Data Mining - 21st Pacific-Asia Conference, PAKDD 2017, Jeju, South Korea, May 23-26, 2017, Proceedings, Part I

Abstract
The class imbalance problem is a key issue that has received much attention. This attention has mostly focused on two-class problems, and fewer solutions exist for the multi-class imbalance problem. From an evaluation point of view, the class imbalance problem is challenging because non-uniform importance is assigned to the classes. In this paper, we propose a relevance-based evaluation framework that incorporates user preferences by allowing a differentiated importance value to be assigned to each class. The presented solution overcomes difficulties detected in existing measures and increases discrimination capability. The proposed framework requires assigning a relevance score to each problem class. To deal with cases where the user is not able to specify the relevance of each class, we describe three mechanisms to incorporate existing domain knowledge into the relevance framework. These mechanisms differ in the amount of information available and the assumptions made regarding the domain. They also allow our framework to be used in common multi-class imbalanced settings with different levels of available information. © 2017, Springer International Publishing AG.
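
To make the idea of differentiated class importance concrete, the following Python sketch computes a relevance-weighted average of per-class recall. It is a minimal illustration under stated assumptions, not the metric defined in the paper: the functions `per_class_recall`, `inverse_frequency_relevance` and `relevance_weighted_recall` are hypothetical, and the inverse-frequency fallback is only one possible way to derive relevance when the user cannot specify it, not necessarily one of the three mechanisms the paper describes.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall of each class that appears in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in totals.items()}


def inverse_frequency_relevance(y_true):
    """Hypothetical fallback: relevance proportional to 1 / class frequency,
    so rarer classes receive higher importance."""
    counts = Counter(y_true)
    return {c: 1.0 / n for c, n in counts.items()}


def relevance_weighted_recall(y_true, y_pred, relevance):
    """Average of per-class recall, with each class weighted by its
    (user-supplied or derived) relevance score."""
    recalls = per_class_recall(y_true, y_pred)
    total = sum(relevance[c] for c in recalls)
    return sum(relevance[c] * r for c, r in recalls.items()) / total
```

With 8 examples of class `a`, 2 of class `b`, and a classifier that always predicts `a`, accuracy is 0.8, whereas the inverse-frequency weighting yields 0.2, exposing the complete failure on the minority class. Passing equal relevance to both classes reduces the score to plain macro-averaged recall (0.5 here).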

2017

A Framework for Recommendation of Highly Popular News Lacking Social Feedback

Authors
Moniz, N; Torgo, L; Eirinaki, M; Branco, P;

Publication
New Generation Computing

Abstract
Social media is rapidly becoming the main source of news consumption for users, raising significant challenges for news aggregation and recommendation tasks. One of these challenges concerns the recommendation of very recent news. To tackle this problem, approaches to predicting news popularity have been proposed. In this paper, we study the task of predicting the popularity of news items upon their publication, when social feedback is unavailable or scarce, and of using such predictions to produce news rankings. Unlike previous work, we focus on accurately predicting highly popular news. Such cases are rare, causing known issues for standard prediction models and evaluation metrics. To overcome these issues, we propose the use of resampling strategies to bias learners towards these rare cases of highly popular news, and a utility-based framework for evaluating their performance. An experimental evaluation is performed using real-world data to test our proposal in distinct scenarios. Results show that our proposed approaches improve the ability to predict and recommend highly popular news upon publication, in comparison to previous work.
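
One simple resampling strategy of the kind mentioned above is random oversampling of the rare, highly popular cases. The sketch below is an illustration under assumptions, not the authors' method: it treats any news item whose popularity reaches a user-chosen `threshold` as rare and appends randomly chosen duplicates of rare items until there are about `ratio` times as many of them. The names `oversample_rare`, `threshold` and `ratio` are hypothetical.

```python
import random


def oversample_rare(X, y, threshold, ratio, seed=0):
    """Randomly duplicate 'rare' examples (target >= threshold) so that the
    number of rare examples becomes about `ratio` times the original count.
    The threshold-based notion of rarity is an illustrative assumption."""
    rng = random.Random(seed)  # seeded so results are reproducible
    rare = [i for i, value in enumerate(y) if value >= threshold]
    # int() truncates, hence "about" ratio times; no rare cases -> no copies.
    n_extra = int(len(rare) * (ratio - 1)) if rare else 0
    extra = [rng.choice(rare) for _ in range(n_extra)]
    # Keep every original example and append the sampled duplicates.
    indices = list(range(len(y))) + extra
    return [X[i] for i in indices], [y[i] for i in indices]
```

Because only rare cases are duplicated, a learner trained on the returned data encounters highly popular items more often; common cases are left untouched.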

2017

Exploring Resampling with Neighborhood Bias on Imbalanced Regression Problems

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
Progress in Artificial Intelligence (EPIA 2017)

Abstract
Imbalanced domains are an important problem in predictive tasks, causing a loss of performance on the cases that are most relevant to the user. This problem has been intensively studied for classification problems. Recently, it was recognized that imbalanced domains occur in several other contexts and for a diversity of task types. This paper focuses on imbalanced regression tasks. Resampling strategies are among the most successful approaches to imbalanced domains. In this work, we propose variants of existing resampling strategies that are able to take into account information about the neighborhood of the examples. Instead of sampling uniformly, our proposals bias the strategies to reinforce some regions of the data sets. In an extensive set of experiments, we provide evidence of the advantage of introducing a neighborhood bias in the resampling strategies.
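
The general idea of biasing resampling by neighborhood can be illustrated with random undersampling for regression. In the sketch below, which is a hypothetical illustration and not one of the paper's variants, rare cases (flagged by a user-supplied `is_rare` list) are always kept, while each common case is kept with a probability that grows with the fraction of rare cases among its `k` nearest neighbors, so common cases near rare regions survive more often. The names `neighborhood_biased_undersample`, `rare_fraction`, `k` and `base` are assumptions.

```python
import math
import random


def rare_fraction(X, is_rare, i, k):
    """Fraction of rare examples among the k nearest neighbours of X[i]
    (Euclidean distance, brute force)."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    neighbours = [j for _, j in dists[:k]]
    if not neighbours:
        return 0.0
    return sum(is_rare[j] for j in neighbours) / len(neighbours)


def neighborhood_biased_undersample(X, y, is_rare, k=3, base=0.2, seed=0):
    """Keep all rare examples; keep each common example with probability
    base + (1 - base) * (rare fraction of its neighbourhood)."""
    rng = random.Random(seed)
    keep = []
    for i in range(len(X)):
        if is_rare[i]:
            keep.append(i)  # rare cases are never removed
            continue
        p = base + (1 - base) * rare_fraction(X, is_rare, i, k)
        if rng.random() < p:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]
```

Setting `base=1.0` keeps every example, while smaller values remove more of the common cases that lie far from any rare region; a uniform undersampler would instead use the same keep probability everywhere.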

2017

Learning with Imbalanced Domains: Preface

Authors
Torgo, L; Krawczyk, B; Branco, P; Moniz, N;

Publication
First International Workshop on Learning with Imbalanced Domains: Theory and Applications, LIDTA@PKDD/ECML 2017, 22 September 2017, Skopje, Macedonia

2017

SMOGN: a Pre-processing Approach for Imbalanced Regression

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
First International Workshop on Learning with Imbalanced Domains: Theory and Applications, LIDTA@PKDD/ECML 2017, 22 September 2017, Skopje, Macedonia

2016

A Survey of Predictive Modeling on Imbalanced Domains

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
ACM Computing Surveys

Abstract
Many real-world data-mining applications involve obtaining predictive models using datasets with strongly imbalanced distributions of the target variable. Frequently, the least common values of this target variable are associated with events that are highly relevant for end users (e.g., fraud detection, unusual returns on stock markets, anticipation of catastrophes). Moreover, these events may have different costs and benefits; combined with the rarity of some of them in the available training data, this creates serious problems for predictive modeling techniques. This article presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey, we discuss the main challenges raised by imbalanced domains, propose a definition of the problem, describe the main approaches to these tasks, propose a taxonomy of the methods, summarize the conclusions of existing comparative studies as well as theoretical analyses of some methods, and refer to related problems within predictive modeling.