About

Vítor Cerqueira received his Licentiate degree in Applied Mathematics from the Faculty of Sciences, University of Porto, in 2012, and his MSc in Data Analytics from the Faculty of Economics, also at the University of Porto, in 2014. He is currently pursuing a Ph.D. in the Doctoral Program in Informatics Engineering at the University of Porto.

He is a research fellow at LIAAD, a laboratory for Artificial Intelligence and Decision Support Systems. His main research topics are ensemble learning for time series forecasting and actionable forecasting methods.

Details

  • Name: Vítor Manuel Cerqueira
  • Cluster: Computer Science
  • Role: Research Assistant
  • Since: 23rd June 2014
Publications

2018

How to evaluate sentiment classifiers for Twitter time-ordered data?

Authors
Mozetic, I; Torgo, L; Cerqueira, V; Smailovic, J

Publication
PLoS ONE

Abstract
Social media are becoming an increasingly important source of information about the public mood regarding issues such as elections, Brexit, and the stock market. In this paper, we focus on sentiment classification of Twitter data. Construction of sentiment classifiers is a standard text mining task, but here we address the question of how to properly evaluate them, as there is no settled way to do so. Sentiment classes are ordered and unbalanced, and Twitter produces a stream of time-ordered data. The problem we address concerns the procedures used to obtain reliable estimates of performance measures, and whether the temporal ordering of the training and test data matters. We collected a large set of 1.5 million tweets in 13 European languages. We created 138 sentiment models and out-of-sample datasets, which are used as a gold standard for evaluations. The corresponding 138 in-sample datasets are used to empirically compare six different estimation procedures: three variants of cross-validation, and three variants of sequential validation (where the test set always follows the training set). We find no significant difference between the best cross-validation and sequential validation. However, we observe that all cross-validation variants tend to overestimate the performance, while the sequential methods tend to underestimate it. Standard cross-validation with random selection of examples is significantly worse than blocked cross-validation, and should not be used to evaluate classifiers in time-ordered data scenarios. © 2018 Mozetic et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
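To make the contrast between the estimation procedures concrete, here is a minimal Python sketch of how train/test index splits differ between standard (shuffled) cross-validation, blocked cross-validation, and one sequential validation scheme. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, examples are assumed to be indexed 0..n-1 in time order, and the sequential scheme shown (an expanding training window) is only one possible variant, not necessarily one of the three compared in the paper.

```python
import random
from typing import List, Tuple

# A split is a pair (training indices, test indices); index i is assumed
# to be the i-th example in time order (assumption for this sketch).
Split = Tuple[List[int], List[int]]


def random_cv(n: int, k: int, seed: int = 0) -> List[Split]:
    """Standard k-fold CV: examples are shuffled first, so folds ignore time order."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[i::k]) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Train on every fold except the i-th; later examples can end up in training.
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        splits.append((train, test))
    return splits


def blocked_cv(n: int, k: int) -> List[Split]:
    """Blocked k-fold CV: each fold is a contiguous block of consecutive examples."""
    bounds = [n * i // k for i in range(k + 1)]
    splits = []
    for i in range(k):
        test = list(range(bounds[i], bounds[i + 1]))
        # All other blocks, before and after the test block, are used for training.
        train = list(range(0, bounds[i])) + list(range(bounds[i + 1], n))
        splits.append((train, test))
    return splits


def sequential_validation(n: int, k: int) -> List[Split]:
    """Sequential validation with an expanding window (hypothetical variant):
    each test block strictly follows all of its training data."""
    bounds = [n * i // (k + 1) for i in range(k + 2)]
    return [
        (list(range(0, bounds[i])), list(range(bounds[i], bounds[i + 1])))
        for i in range(1, k + 1)
    ]


if __name__ == "__main__":
    for name, splits in [
        ("random CV", random_cv(10, 5)),
        ("blocked CV", blocked_cv(10, 5)),
        ("sequential", sequential_validation(10, 4)),
    ]:
        print(name)
        for train, test in splits:
            print("  train", train, "test", test)
```

Running the script shows contiguous test blocks for the blocked and sequential schemes, while each shuffled fold mixes early and late examples. Only in the sequential splits does every training index precede every test index, which is the property that separates the two families of procedures in the abstract.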

2018

Constructive Aggregation and Its Application to Forecasting with Dynamic Ensembles

Authors
Cerqueira, V; Pinto, F; Torgo, L; Soares, C; Moniz, N

Publication
Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2018, Dublin, Ireland, September 10-14, 2018, Proceedings, Part I

2018

SMOTEBoost for Regression: Improving the Prediction of Extreme Values

Authors
Moniz, N; Ribeiro, RP; Cerqueira, V; Chawla, N

Publication
5th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2018, Turin, Italy, October 1-3, 2018

2018

On Evaluating Floating Car Data Quality for Knowledge Discovery

Authors
Cerqueira, V; Moreira Matias, L; Khiari, J; van Lint, H

Publication
IEEE Transactions on Intelligent Transportation Systems

Abstract
Floating car data (FCD) denotes the type of data (location, speed, and destination) produced and broadcast periodically by running vehicles. Increasingly, intelligent transportation systems take advantage of such data for prediction purposes, as input to road and transit control, and to discover useful mobility patterns with applications to transport service design and planning, to name just a few. However, considerable quality issues affect the usefulness and efficacy of FCD in these applications. In this paper, we propose a methodology to compute quality indicators automatically for large FCD sets. It leverages a set of statistical indicators (named Yuki-san) covering multiple dimensions of FCD, such as spatio-temporal coverage, accuracy, and reliability. As such, the Yuki-san indicators provide a quick and intuitive means to assess the potential "value" and "veracity" characteristics of the data. Experimental results with two mobility-related data mining and supervised learning tasks, based on two real-world FCD sources, show that the Yuki-san indicators are indeed consistent with how well the applications perform using the data. With a wider variety of FCD (e.g., from navigation systems and CAN buses) becoming available, further research and validation into the dimensions covered and the efficacy of the Yuki-san indicators is needed.

2017

Arbitrated Ensemble for Solar Radiation Forecasting

Authors
Cerqueira, V; Torgo, L; Soares, C

Publication
Advances in Computational Intelligence - 14th International Work-Conference on Artificial Neural Networks, IWANN 2017, Cadiz, Spain, June 14-16, 2017, Proceedings, Part I
