Publications

Publications by Luís Torgo

2016

Predicting Wildfires: Propositional and Relational Spatio-Temporal Pre-processing Approaches

Authors
Oliveira, M; Torgo, L; Costa, VS;

Publication
DISCOVERY SCIENCE (DS 2016)

Abstract
We present and evaluate two different methods for building spatio-temporal features: a propositional method and a method based on propositionalisation of relational clauses. Our motivating application, a regression problem, requires predicting the fraction of each Portuguese parish burnt yearly by wildfires, a problem with a strong socio-economic and environmental impact in the country. We evaluate and compare how these methods perform individually and when combined. We successfully use under-sampling to deal with the high skew in the data set. We find that combining the approaches significantly improves on the similar results obtained by each method individually.

2017

Resampling strategies for imbalanced time series forecasting

Authors
Moniz, N; Branco, P; Torgo, L;

Publication
International Journal of Data Science and Analytics

Abstract
Time series forecasting is a challenging task, where the non-stationary characteristics of the data pose a hard setting for predictive tasks. A common issue is the imbalanced distribution of the target variable, where some values are very important to the user but severely under-represented. Standard prediction tools focus on the average behaviour of the data. However, in many forecasting tasks involving time series the objective is the opposite: predicting rare values. A common solution to forecasting tasks with imbalanced data is the use of resampling strategies, which operate on the learning data by changing its distribution in favour of a given bias. The objective of this paper is to provide solutions capable of significantly improving the predictive accuracy on rare cases in forecasting tasks using imbalanced time series data. We extend the application of resampling strategies to the time series context and introduce the concept of temporal and relevance bias in the case selection process of such strategies, presenting new proposals. We evaluate the results of standard forecasting tools and the use of resampling strategies, with and without bias, over 24 time series data sets from six different sources. Results show a significant increase in predictive accuracy on rare cases when resampling strategies are used, and biased strategies further increase accuracy over non-biased ones.
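The idea of under-sampling with temporal and relevance bias can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hypothetical `is_rare` predicate, the linear recency weight `(i + 1) / n` and the optional `relevance` multiplier are assumptions chosen for illustration. All rare cases are kept, normal cases are sampled without replacement with recent observations more likely to survive, and the kept indices are returned in temporal order.

```python
import random


def biased_undersample(y, is_rare, frac=0.5, relevance=None, seed=0):
    """Return the sorted indices of the cases kept by a biased under-sampling.

    y: target values in temporal order (index 0 is the oldest observation).
    is_rare: hypothetical predicate marking a target value as rare.
    frac: fraction of the normal (frequent) cases to keep.
    relevance: optional hypothetical function of y; must return positive values.
    """
    rng = random.Random(seed)
    n = len(y)
    rare_idx = [i for i in range(n) if is_rare(y[i])]
    normal_idx = [i for i in range(n) if not is_rare(y[i])]
    n_keep = int(len(normal_idx) * frac)

    def weight(i):
        # Temporal bias: later observations get proportionally larger weights.
        w = (i + 1) / n
        if relevance is not None:
            # Relevance bias: scale the weight by the relevance of the target.
            w *= relevance(y[i])
        return w

    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # draw the key u ** (1 / w) for each case and keep the n_keep largest keys.
    ranked = sorted(
        normal_idx,
        key=lambda i: rng.random() ** (1.0 / weight(i)),
        reverse=True,
    )
    # Rare cases are always kept; sorting restores temporal order.
    return sorted(rare_idx + ranked[:n_keep])


series = [float(i % 5) for i in range(20)]
kept = biased_undersample(series, lambda v: v == 4.0, frac=0.5)
print(len(kept))  # 12: the 4 rare cases plus 8 of the 16 normal cases
```

Returning indices in their original order, rather than a shuffled sample, keeps the reduced series usable for building lagged features.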

2014

Resampling Approaches to Improve News Importance Prediction

Authors
Moniz, N; Torgo, L; Rodrigues, F;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XIII

Abstract
The methods used by recommender systems to produce news rankings are not public, and it is unclear whether they reflect the real importance assigned by readers. We address the task of forecasting the number of times a news item will be tweeted, as a proxy for the importance assigned by its readers. We focus on methods for accurately forecasting which news items will have a high number of tweets, as these are the key to accurate recommendations. Such news items are rare, which creates difficulties for standard prediction methods. Recent research has shown that most models fail on tasks where the goal is accuracy on a small subset of rare values of the target variable. To overcome this, we tested several resampling approaches for handling imbalanced regression tasks in our domain. This paper describes and discusses the results of these experimental comparisons.

2016

Resampling Strategies for Imbalanced Time Series

Authors
Moniz, N; Branco, P; Torgo, L;

Publication
PROCEEDINGS OF 3RD IEEE/ACM INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2016)

Abstract
Time series forecasting is a challenging task, where the non-stationary characteristics of the data portray a hard setting for predictive tasks. A common issue is the imbalanced distribution of the target variable, where some intervals are very important to the user but severely underrepresented. Standard regression tools focus on the average behaviour of the data. However, in many forecasting tasks involving time series the objective is the opposite: predicting rare values. A common solution to forecasting tasks with imbalanced data is the use of resampling strategies, which operate on the learning data by changing its distribution in favor of a given bias. The objective of this paper is to provide solutions capable of significantly improving the predictive accuracy on rare cases in forecasting tasks using imbalanced time series data. We extend the application of resampling strategies to the time series context and introduce the concept of temporal and relevance bias in the case selection process of such strategies, presenting new proposals. We evaluate the results of standard regression tools and the use of resampling strategies, with and without bias, over 24 time series data sets from 6 different sources. Results show a significant increase in predictive accuracy on rare cases associated with the use of resampling strategies, including the biased strategies.

2013

SMOTE for regression

Authors
Torgo, L; Ribeiro, RP; Pfahringer, B; Branco, P;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Several real-world prediction problems involve forecasting rare values of a target variable. When this variable is nominal, we have a problem of class imbalance that has already been studied thoroughly within machine learning. For regression tasks, where the target variable is continuous, few works address this type of problem. Still, important application areas involve forecasting rare extreme values of a continuous target variable. This paper describes a contribution to this type of task. Namely, we propose to address such tasks with sampling approaches. These approaches change the distribution of the given training data set to reduce the imbalance between the rare target cases and the most frequent ones. We present a modification of the well-known SMOTE algorithm that allows its use on these regression tasks. In an extensive set of experiments, we provide empirical evidence for the superiority of our proposals for these particular regression tasks. The proposed SmoteR method can be used with any existing regression algorithm, turning it into a general tool for addressing problems of forecasting rare extreme values of a continuous target variable.
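The SmoteR idea described in this abstract can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: the published method defines rare cases through a relevance function on the target, whereas here a hypothetical `is_rare` predicate and a single pool of rare cases stand in for it, only numeric features are handled, and the parameter names (`k`, `n_synthetic`, `undersample_frac`) are illustrative. Frequent cases are randomly under-sampled; each rare case seeds new cases interpolated towards one of its k nearest rare neighbours, and each synthetic target is a distance-weighted average of the two original targets.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def smoter(X, y, is_rare, k=3, n_synthetic=2, undersample_frac=0.5, seed=0):
    """SmoteR-style resampling sketch for a continuous target.

    X: list of numeric feature vectors; y: list of float targets.
    is_rare: hypothetical predicate marking a target value as rare.
    Returns a new list of (features, target) pairs.
    """
    rng = random.Random(seed)
    rare = [(list(x), t) for x, t in zip(X, y) if is_rare(t)]
    normal = [(list(x), t) for x, t in zip(X, y) if not is_rare(t)]

    # Under-sample the frequent (normal) cases.
    kept = rng.sample(normal, int(len(normal) * undersample_frac))

    synthetic = []
    for i, (x, t) in enumerate(rare):
        # k nearest neighbours of this seed among the other rare cases.
        others = [rare[j] for j in range(len(rare)) if j != i]
        neighbours = sorted(others, key=lambda c: euclidean(x, c[0]))[:k]
        if not neighbours:
            continue
        for _ in range(n_synthetic):
            xn, tn = rng.choice(neighbours)
            # New point on the segment between the seed and the neighbour.
            gap = rng.random()
            new_x = [a + gap * (b - a) for a, b in zip(x, xn)]
            # Target: weighted average, leaning towards the closer original case.
            d1 = euclidean(new_x, x)
            d2 = euclidean(new_x, xn)
            if d1 + d2 == 0:
                new_t = (t + tn) / 2
            else:
                new_t = (d2 * t + d1 * tn) / (d1 + d2)
            synthetic.append((new_x, new_t))

    return kept + rare + synthetic


X = [[float(i), 2.0 * i] for i in range(10)]
y = [float(i) for i in range(10)]
data = smoter(X, y, lambda t: t >= 8.0)
print(len(data))  # 10: 4 kept normal cases, 2 rare cases, 4 synthetic cases
```

Because the resampled data is an ordinary list of training cases, any regression learner can be fitted on it afterwards, which matches the abstract's point that SmoteR is independent of the regression algorithm.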

2015

A Survey of Predictive Modelling under Imbalanced Distributions

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
CoRR

Abstract