Publications

Publications by Carlos Manuel Soares

2016

Sentiment Aggregate Functions for Political Opinion Polling using Microblog Streams

Authors
Saleiro, P; Gomes, L; Soares, C;

Publication
C3S2E

Abstract
The automatic content analysis of mass media in the social sciences has become necessary and possible with the rise of social media and computational power. One particularly promising avenue of research concerns the use of sentiment analysis in microblog streams. However, one of the main challenges consists in aggregating sentiment polarity in a timely fashion that can be fed to the prediction method. We investigated a large set of sentiment aggregate functions and performed a regression analysis using political opinion polls as gold standard. Our dataset contains nearly 233 000 tweets, classified according to their polarity (positive, negative or neutral), regarding the five main Portuguese political leaders during the Portuguese bailout (2011-2014). Results show that different sentiment aggregate functions exhibit different feature importance over time while the error remains almost unchanged.

2013

Active Selection of Training Instances for a Random forest Meta-Learner

Authors
Sousa, AFM; Prudêncio, RBC; Soares, C; Ludermir, TB;

Publication
2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)

Abstract
Several approaches have been applied to the task of algorithm selection. In this context, Meta-Learning provides an efficient solution by adopting a supervised strategy. Despite its promising results, Meta-Learning requires an adequate number of instances to produce a rich set of meta-examples. Recent approaches to generate synthetic or manipulated datasets have been adopted with success in the context of Meta-Learning. These proposals include the datasetoids approach, a simple data manipulation technique that generates new datasets from existing ones. Although such proposals can produce relevant datasets, they can also produce redundant, or even irrelevant, problem instances. Active Meta-Learning arises in this context to select only the most informative instances for meta-example generation. In this work, we investigate Active Meta-Learning combined with datasetoids, focusing on using the Random Forest algorithm in meta-learning. Our experiments revealed that it is possible to reduce the computational cost of generating meta-examples and obtain a significant gain in Meta-Learning performance.

2015

A Comparative Study of Regression and Classification Algorithms for Modelling Students' Academic Performance

Authors
Strecht, P; Cruz, L; Soares, C; Moreira, JM; Abreu, R;

Publication
EDM

Abstract

2017

FEUP at SemEval-2017 Task 5: Predicting Sentiment Polarity and Intensity with Financial Word Embeddings

Authors
Saleiro, P; Rodrigues, EM; Soares, C; Oliveira, EC;

Publication
SemEval@ACL

Abstract

2013

Space Allocation in the Retail Industry: A Decision Support System Integrating Evolutionary Algorithms and Regression Models

Authors
Pinto, F; Soares, C;

Publication
ECML/PKDD (3)

Abstract
One of the hardest resources to manage in retail is space. Retailers need to assign limited store space to a growing number of product categories such that sales and other performance metrics are maximized. Although this seems to be an ideal task for a data mining approach, there is one important barrier: the representativeness of the available data. In fact, changes to the layout of retail stores are infrequent. This means that very few values of the space variable are represented in the data, which makes it hard to generalize. In this paper, we describe a Decision Support System to assist retailers in this task. The system uses an Evolutionary Algorithm to optimize space allocation based on the estimated impact on sales caused by changes in the space assigned to product categories. We assess the quality of the system on a real case study, using different regression algorithms to generate the estimates. The system obtained very good results when compared with the recommendations made by the business experts. We also investigated the effect of the representativeness of the sample on the accuracy of the regression models. We selected a few product categories based on a heuristic assessment of their representativeness. The results indicate that the best regression models were obtained on products for which the sample was not the best. The reason for these unexpected results remains to be explained. © 2013 Springer-Verlag.

2015

Estimating Fuel Consumption from GPS Data

Authors
Vilaça, A; Aguiar, A; Soares, C;

Publication
PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2015)

Abstract
The road transportation sector is responsible for 87% of the human CO2 emissions. The estimation and prediction of fuel consumption play a key role in the development of systems that foster the reduction of those emissions through trip planning. In this paper, we present a predictive regression model of instantaneous fuel consumption for diesel and gasoline light-duty vehicles, based on their instantaneous speed and acceleration and on road inclination. The parameters are extracted from GPS data, thus the models do not require data from dedicated vehicle sensors. We use data collected by 17 drivers during their daily commutes using the SenseMyCity crowdsensor. We perform an empirical comparison of several regression algorithms for prediction across trips of the same vehicle and for prediction across vehicles. The results show that models trained for a vehicle show similar RMSE when applied to other vehicles with similar characteristics. Relying on these results, we propose fuel-type-specific models that provide an accurate prediction for vehicles with similar characteristics to those on which the models were trained.
