Publications

Publications by LIAAD

2012

Estimating reliability for assessing and correcting individual streaming predictions

Authors
Rodrigues, PPE; Bosnic, Z; Gama, J; Kononenko, I;

Publication
Reliable Knowledge Discovery

Abstract
Several predictive systems are nowadays vital for operations and decision support. The quality of these systems is usually defined by their average accuracy, which carries little or no information about the estimated error of each individual prediction. In these cases, users should be able to associate a measure of reliability with each prediction. However, with the advent of data streams, state-of-the-art batch reliability estimates need to be redefined. In this chapter we adapt and evaluate five empirical measures for online reliability estimation of individual predictions: similarity-based (k-NN) error, local sensitivity (bias and variance) and online bagging predictions (bias and variance). Evaluation is performed with a neural network base model on two different problems, with results showing that online bagging and k-NN estimates are consistently correlated with the error of the base model. Furthermore, we propose an approach for correcting individual predictions based on the CNK reliability estimate. Evaluation is done on a real-world problem (prediction of the electricity load for a selected European geographical region), using two different regression models: a neural network and the k nearest neighbors algorithm. Comparison is performed with corrections based on the Kalman filter. The results show that our method performs better than the Kalman filter, significantly improving the accuracy of the original predictions.
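To make the similarity-based (k-NN) estimate mentioned in the abstract more concrete, here is a minimal sketch in Python. It assumes the usual signed form of CNK, namely the mean true label of the k nearest previously seen examples minus the model's prediction. The function name `cnk_estimate`, the Euclidean distance, and the in-memory `history` list are illustrative assumptions, and the additive correction at the end is only a naive illustration, not the correction procedure proposed in the chapter.

```python
import math


def cnk_estimate(history, x, prediction, k=3):
    """Signed k-NN reliability estimate (assumed form of CNK).

    history: list of (features, true_label) pairs already seen in the stream.
    Returns the mean label of the k nearest neighbours minus the prediction.
    """
    if not history:
        raise ValueError("history must contain at least one labelled example")
    # Rank stored examples by Euclidean distance to the query point.
    neighbours = sorted(history, key=lambda item: math.dist(item[0], x))[:k]
    mean_label = sum(label for _, label in neighbours) / len(neighbours)
    return mean_label - prediction


# Toy stream: one feature per example, label = feature + 1.
history = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((10.0,), 11.0)]
prediction = 2.5
estimate = cnk_estimate(history, (1.0,), prediction, k=3)

# Naive additive correction (illustrative assumption only).
corrected = prediction + estimate
```

A negative estimate suggests the model predicted above what its neighbourhood supports, and a positive one the opposite. In a real stream the `history` would need to be bounded, for example with a sliding window.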


2012

A Predictive Model for the Passenger Demand on a Taxi Network

Authors
Moreira Matias, L; Gama, J; Ferreira, M; Damas, L;

Publication
2012 15TH INTERNATIONAL IEEE CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC)

Abstract
In the last decade, real-time vehicle location systems have attracted wide attention for the new kind of rich spatio-temporal information they provide. The fast processing of this large amount of information is a growing challenge. Taxi companies are already exploiting such information for efficient taxi dispatching and time-saving route finding. In this paper, we propose a novel methodology to produce online short-term predictions of the spatial distribution of passenger demand over 63 taxi stands in the city of Porto, Portugal. We did so by applying time series forecasting techniques to the processed events continuously communicated by 441 taxi vehicles. Our tests, using 4 months of real data, demonstrated that this model is a major contribution to driver mobility intelligence: 76% of the 86411 demanded taxi services were accurately forecasted within a 30-minute horizon.


2012

Text categorization using an ensemble classifier based on a mean co-association matrix

Authors
Moreira Matias, L; Mendes Moreira, J; Gama, J; Brazdil, P;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Text Categorization (TC) has attracted the attention of the research community in the last decade. Algorithms like Support Vector Machines, Naïve Bayes or k Nearest Neighbors have been used with good performance, confirmed by several comparative studies. Recently, several ensemble classifiers were also introduced in TC. However, many of those can only provide a category for a given new sample. Instead, in this paper, we propose a methodology - MECAC - to build an ensemble of classifiers that has two advantages over other ensemble methods: 1) it can be run using parallel computing, saving processing time, and 2) it can extract important statistics from the obtained clusters. It uses the mean co-association matrix to solve binary TC problems. Our experiments revealed that our framework performed, on average, 2.04% better than the best individual classifier on the tested datasets. These results were statistically validated at a significance level of 0.05 using the Friedman Test. © 2012 Springer-Verlag.
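As a rough illustration of the mean co-association matrix named in the abstract, the Python sketch below assumes the common definition: entry (i, j) is the fraction of ensemble members that assign documents i and j to the same group. The function name and the toy labelings are hypothetical, and the sketch does not reproduce how MECAC builds its base classifiers or derives the final category.

```python
def mean_co_association(labelings):
    """Fraction of ensemble members that place each pair of samples together.

    labelings: list of label lists, one per ensemble member, all of equal length.
    Returns an n x n matrix (list of lists) with values in [0, 1].
    """
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    members = len(labelings)
    return [[c / members for c in row] for row in counts]


# Two hypothetical ensemble members labelling three documents.
labelings = [
    [0, 0, 1],
    [0, 1, 1],
]
matrix = mean_co_association(labelings)
```

Because each member contributes its own count independently, the per-member passes can be computed in parallel and summed afterwards, which is consistent with the parallelism advantage the abstract mentions.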


2012

Online predictive model for taxi services

Authors
Moreira Matias, L; Gama, J; Ferreira, M; Mendes Moreira, J; Damas, L;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
In recent years, both companies and researchers have been exploring intelligent data analysis to increase the profitability of the taxi industry. Intelligent systems for online taxi dispatching and time-saving route finding have been built to do so. In this paper, we propose a novel methodology to produce online predictions regarding the spatial distribution of passenger demand throughout taxi stand networks. We have done so by combining two well-known short-term time series forecasting models: time-varying Poisson models and ARIMA models. Our tests were performed using data gathered over a period of 6 months and collected from 63 taxi stands within the city of Porto, Portugal. Our results demonstrate that this model is a major contribution to driver mobility intelligence: 78% of the 253745 demanded taxi services were correctly forecasted within a 30-minute horizon. © Springer-Verlag Berlin Heidelberg 2012.
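The combination of forecasts described in the abstract can be sketched as follows in Python. This is a simplified illustration under stated assumptions: the time-varying Poisson rate is approximated by the mean count per (weekday, period) slot, the second forecast (standing in for ARIMA) is supplied as a plain number, and the weighting by inverse recent error is an assumed scheme rather than the one used in the paper. All function names are hypothetical.

```python
from collections import defaultdict


def slot_rates(observations):
    """Estimate a demand rate per (weekday, period) slot.

    observations: iterable of (weekday, period, count) tuples for one taxi stand.
    Returns {(weekday, period): mean count}, a crude time-varying Poisson rate.
    """
    totals = defaultdict(lambda: [0, 0])  # slot -> [sum of counts, number of samples]
    for weekday, period, count in observations:
        totals[(weekday, period)][0] += count
        totals[(weekday, period)][1] += 1
    return {slot: total / n for slot, (total, n) in totals.items()}


def combine(forecasts, recent_errors):
    """Weighted average of forecasts; lower recent error means higher weight."""
    weights = [1.0 / (err + 1e-9) for err in recent_errors]
    return sum(w * f for w, f in zip(weights, forecasts)) / sum(weights)


# Toy history for one stand: Mondays (0) and Tuesdays (1) at period 18.
observations = [(0, 18, 4), (0, 18, 6), (1, 18, 2)]
rates = slot_rates(observations)

poisson_forecast = rates[(0, 18)]
other_forecast = 3.0  # placeholder for an ARIMA forecast computed elsewhere
ensemble_forecast = combine([poisson_forecast, other_forecast], [1.0, 1.0])
```

With equal recent errors the ensemble reduces to a plain average. As one model's error grows, its influence on the combined forecast shrinks.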


2012

An Online Recommendation System for the Taxi Stand choice Problem

Authors
Moreira Matias, L; Fernandes, R; Gama, J; Ferreira, M; Mendes Moreira, J; Damas, L;

Publication
2012 IEEE VEHICULAR NETWORKING CONFERENCE (VNC)

Abstract
Nowadays, Informed Driving is crucial to the transportation industry. We present an online recommendation model that helps the driver decide which stand is best to head to at each moment, minimizing the waiting time. Our approach uses time series forecasting techniques to predict the spatiotemporal distribution in real time. Then, we combine this information with the live network status to produce our output. Our online test-beds were carried out using data obtained from a fleet of 441 vehicles running in the city of Porto, Portugal. We demonstrate that our approach can be a major contribution to this industry: 395,361 of the 506,873 services dispatched were correctly predicted. Our tests also highlighted that a fleet equipped with such a framework outperformed a fleet without it, experiencing an average waiting time to pick up a passenger 5% lower than its competitor's.


2012

Bus bunching detection by mining sequences of headway deviations

Authors
Moreira Matias, L; Ferreira, C; Gama, J; Mendes Moreira, J; De Sousa, JF;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
In highly populated urban zones, it is common to notice headway deviations (HD) between pairs of buses. When these events occur at a bus stop, they often cause bus bunching (BB) at the following bus stops. Several proposals have been suggested to mitigate this problem. In this paper, we propose to find BBS (Bunching Black Spots) - sequences of bus stops where systematic HD events cause the formation of BB. We run a sequence mining algorithm, named PrefixSpan, to find interesting events available in time series. We show that the usual pattern of BB trips can be accurately modeled as a frequent sequence mining problem. The subsequences proved to be a promising way of identifying the route's schedule points to adjust in order to mitigate such events. © 2012 Springer-Verlag.
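The frequent sequence mining step named in the abstract can be illustrated with a compact Python sketch of the PrefixSpan idea, restricted to sequences of single items. The stop identifiers and the encoding of each trip as the ordered list of stops where a headway deviation was observed are assumptions made for illustration and are not taken from the paper.

```python
def prefixspan(sequences, min_support):
    """Find frequent subsequences by recursive prefix projection.

    sequences: list of item lists (here, stops with a headway deviation per trip).
    Returns a list of (pattern, support) pairs with support >= min_support.
    """
    results = []

    def mine(prefix, projected):
        # Count in how many projected sequences each item occurs.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + [item]
            results.append((pattern, support))
            # Project: keep only what follows the first occurrence of the item.
            suffixes = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(pattern, suffixes)

    mine([], sequences)
    return results


# Hypothetical trips: ordered stops where a headway deviation was detected.
trips = [
    ["A", "B", "C"],
    ["A", "C"],
    ["B", "C"],
]
patterns = prefixspan(trips, min_support=2)
```

In this toy input, deviations at stop A are followed by deviations at stop C in two of the three trips, so the pattern A then C is reported, while A then B occurs only once and is dropped.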
