About

Vítor Cerqueira received his Licentiate degree in Applied Mathematics from the Faculty of Sciences, U. Porto, in 2012, and his MSc in Data Analytics from the Faculty of Economics, also U. Porto, in 2014. He is currently pursuing a Ph.D. in the doctoral program in Informatics Engineering at the University of Porto.

He is a research fellow at LIAAD, a laboratory for Artificial Intelligence and Decision Support Systems. His research focuses on ensemble learning for time series forecasting and on actionable forecasting methods.

Details

  • Name

    Vítor Manuel Cerqueira
  • Cluster

    Computer Science
  • Role

    Research Assistant
  • Since

    23rd June 2014
Publications

2018

How to evaluate sentiment classifiers for Twitter time-ordered data?

Authors
Mozetic, I; Torgo, L; Cerqueira, V; Smailovic, J;

Publication
PLoS ONE

Abstract
Social media are becoming an increasingly important source of information about the public mood regarding issues such as elections, Brexit, the stock market, etc. In this paper we focus on sentiment classification of Twitter data. Construction of sentiment classifiers is a standard text mining task, but here we address the question of how to properly evaluate them as there is no settled way to do so. Sentiment classes are ordered and unbalanced, and Twitter produces a stream of time-ordered data. The problem we address concerns the procedures used to obtain reliable estimates of performance measures, and whether the temporal ordering of the training and test data matters. We collected a large set of 1.5 million tweets in 13 European languages. We created 138 sentiment models and out-of-sample datasets, which are used as a gold standard for evaluations. The corresponding 138 in-sample datasets are used to empirically compare six different estimation procedures: three variants of cross-validation, and three variants of sequential validation (where the test set always follows the training set). We find no significant difference between the best cross-validation and sequential validation. However, we observe that all cross-validation variants tend to overestimate the performance, while the sequential methods tend to underestimate it. Standard cross-validation with random selection of examples is significantly worse than blocked cross-validation, and should not be used to evaluate classifiers in time-ordered data scenarios. © 2018 Mozetic et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
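
To make the distinction between the two families of estimation procedures concrete, the following is a minimal sketch of how index splits could be generated for one blocked cross-validation variant and one sequential validation variant. The abstract does not specify the exact variants used in the paper, so the function names, the equal-sized contiguous blocks, and the growing training window are all assumptions made for illustration only.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def contiguous_blocks(n: int, k: int) -> List[List[int]]:
    """Cut the time-ordered indices 0..n-1 into k contiguous blocks."""
    bounds = [i * n // k for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]


def blocked_cv(n: int, k: int) -> Iterator[Split]:
    """Blocked cross-validation (assumed form): each block is the test set
    once, and all other blocks, earlier or later in time, form the training set."""
    blocks = contiguous_blocks(n, k)
    for i, test in enumerate(blocks):
        train = [idx for j, block in enumerate(blocks) if j != i for idx in block]
        yield train, test


def sequential_validation(n: int, k: int) -> Iterator[Split]:
    """Sequential validation (assumed growing-window form): the test block
    always follows, in time, every example used for training."""
    blocks = contiguous_blocks(n, k)
    for i in range(1, k):
        train = [idx for block in blocks[:i] for idx in block]
        yield train, blocks[i]


if __name__ == "__main__":
    for name, splitter in (("blocked", blocked_cv), ("sequential", sequential_validation)):
        for train, test in splitter(10, 5):
            print(name, "train:", train, "test:", test)
```

In the sequential case no training example is ever later than a test example, whereas blocked cross-validation keeps examples in contiguous blocks but still trains on data from after the test block.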

2017

Arbitrated Ensemble for Solar Radiation Forecasting

Authors
Cerqueira, V; Torgo, L; Soares, C;

Publication
Advances in Computational Intelligence - 14th International Work-Conference on Artificial Neural Networks, IWANN 2017, Cadiz, Spain, June 14-16, 2017, Proceedings, Part I

2017

autoBagging: Learning to Rank Bagging Workflows with Metalearning

Authors
Pinto, F; Cerqueira, V; Soares, C; Moreira, JM;

Publication
CoRR

Abstract
Machine Learning (ML) has been successfully applied to a wide range of domains and applications. One of the techniques behind most of these successful applications is Ensemble Learning (EL), the field of ML that gave birth to methods such as Random Forests or Boosting. The complexity of applying these techniques, together with the market scarcity of ML experts, has created the need for systems that enable a fast and easy drop-in replacement for ML libraries. Automated machine learning (autoML) is the field of ML that attempts to answer these needs. We propose autoBagging, an autoML system that automatically ranks 63 bagging workflows by exploiting past performance and metalearning. Results on 140 classification datasets from the OpenML platform show that autoBagging can yield better performance than the Average Rank method and achieve results that are not statistically different from an ideal model that systematically selects the best workflow for each dataset. For the purpose of reproducibility and generalizability, autoBagging is publicly available as an R package on CRAN.

2017

Arbitrated Ensemble for Time Series Forecasting

Authors
Cerqueira, V; Torgo, L; Pinto, F; Soares, C;

Publication
Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2017, Skopje, Macedonia, September 18-22, 2017, Proceedings, Part II

Abstract
This paper proposes an ensemble method for time series forecasting tasks. Combining different forecasting models is a common approach to tackle these problems. State-of-the-art methods track the loss of the available models and adapt their weights accordingly. Metalearning strategies such as stacking are also used in these tasks. We propose a metalearning approach for adaptively combining forecasting models that specializes them across the time series. Our assumption is that different forecasting models have different areas of expertise and a varying relative performance. Moreover, many time series show recurring structures due to factors such as seasonality. Therefore, the ability of a method to deal with changes in relative performance of models as well as recurrent changes in the data distribution can be very useful in dynamic environments. Our approach is based on an ensemble of heterogeneous forecasters, arbitrated by a metalearning model. This strategy is designed to cope with the different dynamics of time series and quickly adapt the ensemble to regime changes. We validate our proposal using time series from several real world domains. Empirical results show the competitiveness of the method in comparison to state-of-the-art approaches for combining forecasters. © 2017, Springer International Publishing AG.
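
As a rough illustration of the idea of heterogeneous forecasters arbitrated by a meta-level model, the toy sketch below pairs each base forecaster with an arbiter that predicts that forecaster's absolute error for the current window, then weights the forecasts by a softmax over the negated predicted errors. This is not the method from the paper, whose details the abstract does not give: the three base forecasters, the nearest-neighbour arbiter, the softmax weighting, and all names here are assumptions chosen only to keep the example small.

```python
import math
from typing import Callable, List, Sequence

Forecaster = Callable[[Sequence[float]], float]


def naive(window: Sequence[float]) -> float:
    return window[-1]  # repeat the last observation


def window_mean(window: Sequence[float]) -> float:
    return sum(window) / len(window)


def drift(window: Sequence[float]) -> float:
    # extend the average slope of the window by one step (needs len >= 2)
    return window[-1] + (window[-1] - window[0]) / (len(window) - 1)


class NearestNeighbourArbiter:
    """Hypothetical meta-model: predicts a forecaster's absolute error as the
    error it made on the most similar past window."""

    def __init__(self) -> None:
        self.windows: List[List[float]] = []
        self.errors: List[float] = []

    def update(self, window: Sequence[float], abs_error: float) -> None:
        self.windows.append(list(window))
        self.errors.append(abs_error)

    def predict(self, window: Sequence[float]) -> float:
        if not self.windows:
            return 1.0  # arbitrary default before any evidence is seen
        dists = [math.dist(window, past) for past in self.windows]
        return self.errors[dists.index(min(dists))]


def arbitrated_forecasts(series: Sequence[float],
                         forecasters: Sequence[Forecaster],
                         width: int = 3) -> List[float]:
    """One-step-ahead combined forecasts for series[width:], made online."""
    arbiters = [NearestNeighbourArbiter() for _ in forecasters]
    combined = []
    for t in range(width, len(series)):
        window = series[t - width:t]
        preds = [f(window) for f in forecasters]
        expected = [a.predict(window) for a in arbiters]
        raw = [math.exp(-e) for e in expected]  # lower expected error, higher weight
        total = sum(raw)
        combined.append(sum(r / total * p for r, p in zip(raw, preds)))
        # only after forecasting, reveal the true value and train the arbiters
        for arbiter, pred in zip(arbiters, preds):
            arbiter.update(window, abs(series[t] - pred))
    return combined


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0]
    print(arbitrated_forecasts(data, [naive, window_mean, drift]))
```

Because the arbiters condition on the current window rather than on a running loss, a forecaster can regain weight as soon as a previously seen pattern recurs, which is the kind of behaviour the abstract motivates with seasonality and regime changes.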

2017

Dynamic and Heterogeneous Ensembles for Time Series Forecasting

Authors
Cerqueira, V; Torgo, L; Oliveira, M; Pfahringer, B;

Publication
2017 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2017, Tokyo, Japan, October 19-21, 2017
