Publications

Publications by Luís Torgo

2019

Constructive Aggregation and Its Application to Forecasting with Dynamic Ensembles

Authors
Cerqueira, V; Pinto, F; Torgo, L; Soares, C; Moniz, N;

Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2018, PT I

Abstract
While the predictive advantage of ensemble methods is nowadays widely accepted, the most appropriate way of estimating the weights of each individual model remains an open research question. Meanwhile, several studies report that combining different ensemble approaches leads to improvements in performance, due to a better trade-off between the diversity and the error of the individual models in the ensemble. We contribute to this research line by proposing an aggregation framework for a set of independently created forecasting models, i.e. heterogeneous ensembles. Instead of directly aggregating these models, the general idea is to first rearrange them into different subsets, creating a new set of combined models that is then aggregated into a final decision. We present this idea as constructive aggregation, and apply it to time series forecasting problems. Results from empirical experiments show that applying constructive aggregation to state-of-the-art dynamic aggregation methods provides a consistent advantage. Constructive aggregation is publicly available in a software package. Data related to this paper are available at: https://github.com/vcerqueira/timeseriesdata. Code related to this paper is available at: https://github.com/vcerqueira/tsensembler.
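
The two-stage idea in this abstract (group models into subsets, combine within each subset, then aggregate the combined models) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `constructive_aggregate`, the use of a plain mean inside each subset, and the fixed final weights are all assumptions made for illustration. The abstract does not say how subsets are formed or how the final weights are estimated.

```python
import numpy as np

def constructive_aggregate(preds, subsets, weights=None):
    """Two-stage aggregation sketch (hypothetical helper, not the paper's code).

    preds:   array of shape (n_steps, n_models) with each model's forecasts.
    subsets: list of lists of model indices; each list defines one combined model.
    weights: optional final weights over the combined models (default: uniform).
    """
    preds = np.asarray(preds, dtype=float)
    # Stage 1: each subset becomes one combined model (assumed here: simple mean).
    combined = np.column_stack([preds[:, list(s)].mean(axis=1) for s in subsets])
    # Stage 2: aggregate the combined models into a final forecast.
    if weights is None:
        weights = np.full(combined.shape[1], 1.0 / combined.shape[1])
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return combined @ weights

forecasts = np.array([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0]])
final = constructive_aggregate(forecasts, subsets=[[0, 1], [1, 2]])
```

In a dynamic setting, the final weights would be re-estimated at each time step by whichever aggregation method is wrapped. Here they are held constant to keep the sketch short.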

2019

On Feature Selection and Evaluation of Transportation Mode Prediction Strategies

Authors
Etemad, M; Soares, A; Matwin, S; Torgo, L;

Publication
Proceedings of the Workshops of the EDBT/ICDT 2019 Joint Conference, EDBT/ICDT 2019, Lisbon, Portugal, March 26, 2019.

Abstract
Transportation mode prediction is a fundamental task for decision making in smart cities and traffic management systems. Traffic policies based on trajectory mining can save money and time for authorities and the public. Such policies may reduce fuel consumption and commute time, and provide more pleasant moments for residents and tourists. Since the number of features that may be used to predict a user's transportation mode can be substantial, finding a subset of features that maximizes a performance measure is worth investigating. In this work, we explore a wrapper method and an information retrieval method to find the best subset of trajectory features for a transportation mode dataset. Our results were compared with those of two related papers that applied deep learning methods. The results showed that our work achieved better performance. Furthermore, two types of cross-validation approaches were investigated, and the performance results show that the random cross-validation method may provide overestimated results. © 2019 Copyright held by the owner/author(s).
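
One common reason random cross-validation overestimates performance on trajectory data is that samples from the same user end up in both the training and test folds. The sketch below contrasts a random split with a split that keeps each user's samples together. It is an illustration under assumptions: the abstract does not name the second cross-validation scheme, and the helper names `random_folds`, `grouped_folds`, and `shared_groups` are hypothetical.

```python
import numpy as np

def random_folds(n_samples, n_folds, seed=0):
    """Shuffle sample indices and cut them into folds, ignoring user identity."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)

def grouped_folds(groups, n_folds):
    """Assign whole groups (e.g. users) to folds so no group is split."""
    groups = np.asarray(groups)
    unique = np.unique(groups)
    return [np.flatnonzero(np.isin(groups, g))
            for g in np.array_split(unique, n_folds)]

def shared_groups(groups, test_idx):
    """Return the groups that appear in both the test fold and the training data."""
    groups = np.asarray(groups)
    in_test = np.zeros(len(groups), dtype=bool)
    in_test[test_idx] = True
    return set(groups[in_test].tolist()) & set(groups[~in_test].tolist())

# Four hypothetical users with three trajectory samples each.
users = np.repeat([0, 1, 2, 3], 3)
leaks_random = [shared_groups(users, f) for f in random_folds(len(users), 2)]
leaks_grouped = [shared_groups(users, f) for f in grouped_folds(users, 2)]
```

With the grouped split, every entry of `leaks_grouped` is empty by construction. The random split usually leaves users shared between training and test data, which can inflate the measured score.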

2019

Pre-processing approaches for imbalanced distributions in regression

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
NEUROCOMPUTING

Abstract
Imbalanced domains are an important problem frequently arising in real-world predictive analytics. A significant body of research has addressed imbalanced distributions in classification tasks, where the target variable is nominal. In the context of regression tasks, where the target variable is continuous, imbalanced distributions of the target variable also raise several challenges to learning algorithms. Imbalanced domains are characterized by: (1) a higher relevance being assigned to the performance on a subset of the target variable values; and (2) these most relevant values being underrepresented in the available data set. Recently, some proposals were made to address the problem of imbalanced distributions in regression. Still, this remains a scarcely explored issue with few existing solutions. This paper describes three new approaches for tackling the problem of imbalanced distributions in regression tasks. We propose the adaptation to regression tasks of random over-sampling and the introduction of Gaussian noise, and we present a new method called WEighted Relevance-based Combination Strategy (WERCS). An extensive set of experiments provides empirical evidence of the advantage of using the proposed strategies and, in particular, the WERCS method. We analyze the impact of different data characteristics on the performance of the methods. A data repository with 15 imbalanced regression data sets is also provided to the research community.
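
As a rough illustration of random over-sampling and Gaussian noise applied to a regression target, the sketch below replicates the cases whose target value is rare and optionally perturbs the copies. It makes simplifying assumptions that are not from the paper: rarity is defined by a plain threshold on the target (the function name `oversample_rare` and its parameters are hypothetical), and noise is scaled by each variable's standard deviation. WERCS itself is not shown.

```python
import numpy as np

def oversample_rare(X, y, threshold, n_copies=1, noise_scale=0.0, seed=0):
    """Replicate cases with y >= threshold, optionally adding Gaussian noise.

    Assumption: "relevant" cases are those at or above `threshold`; the paper's
    actual notion of relevance is not reproduced here.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rare = y >= threshold
    # Plain random over-sampling: exact copies of the rare cases.
    X_new = np.repeat(X[rare], n_copies, axis=0)
    y_new = np.repeat(y[rare], n_copies)
    # Gaussian-noise variant: perturb the copies instead of duplicating exactly.
    if noise_scale > 0 and len(y_new) > 0:
        X_new = X_new + rng.normal(0.0, noise_scale * X.std(axis=0), size=X_new.shape)
        y_new = y_new + rng.normal(0.0, noise_scale * y.std(), size=y_new.shape)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])

X = np.array([[0.0], [1.0], [2.0], [10.0]])
y = np.array([0.0, 1.0, 2.0, 10.0])
X_over, y_over = oversample_rare(X, y, threshold=5.0, n_copies=2)
X_noisy, y_noisy = oversample_rare(X, y, threshold=5.0, n_copies=2, noise_scale=0.1)
```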

2018

2nd Workshop on Learning with Imbalanced Domains: Preface

Authors
Torgo, L; Matwin, S; Japkowicz, N; Krawczyk, B; Moniz, N; Branco, P;

Publication
Second International Workshop on Learning with Imbalanced Domains: Theory and Applications, LIDTA@ECML/PKDD 2018, Dublin, Ireland, September 10, 2018

Abstract

2019

Arbitrage of forecasting experts

Authors
Cerqueira, V; Torgo, L; Pinto, F; Soares, C;

Publication
MACHINE LEARNING

Abstract
Forecasting is an important task across several domains. Its generalised interest is related to the uncertainty and complex evolving structure of time series. Forecasting methods are typically designed to cope with temporal dependencies among observations, but it is widely accepted that none is universally applicable. Therefore, a common solution to these tasks is to combine the opinions of a diverse set of forecasters. In this paper we present an approach based on arbitrating, in which several forecasting models are dynamically combined to obtain predictions. Arbitrating is a metalearning approach that combines the output of experts according to predictions of the loss that they will incur. We present an approach for retrieving out-of-bag predictions that significantly improves its data efficiency. Finally, since diversity is a fundamental component in ensemble methods, we propose a method for explicitly handling the inter-dependence between experts when aggregating their predictions. Results from extensive empirical experiments provide evidence of the method's competitiveness relative to state-of-the-art approaches. The proposed method is publicly available in a software package.
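
The core step of arbitrating, weighting experts by the loss they are predicted to incur, can be sketched as follows. This is a minimal illustration rather than the published method: it assumes the predicted losses are already available from some meta-model, and it maps them to weights with a softmax over negated losses, a choice made here for simplicity. The function name `arbitrate` is hypothetical.

```python
import numpy as np

def arbitrate(expert_preds, predicted_losses):
    """Combine expert forecasts for one time step using predicted losses.

    Assumption: weights are a softmax of the negated predicted losses, so an
    expert expected to err less receives a larger weight.
    """
    preds = np.asarray(expert_preds, dtype=float)
    losses = np.asarray(predicted_losses, dtype=float)
    scores = -losses
    scores = scores - scores.max()  # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()
    return float(preds @ weights), weights

# Two hypothetical experts; the second is predicted to incur a larger loss.
forecast, weights = arbitrate([1.0, 3.0], predicted_losses=[0.2, 1.5])
```

When all predicted losses are equal, the weights become uniform and the combination reduces to a simple average of the experts.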

2018

Cost-Sensitive Learning: Preface

Authors
Torgo, L; Matwin, S; Weiss, G; Moniz, N; Branco, P;

Publication
International Workshop on Cost-Sensitive Learning, COST@SDM 2018, San Diego, California, USA, May 5, 2018

Abstract