Publications - INESC TEC

Publications by João Mendes Moreira

2013

Predicting Taxi-Passenger Demand Using Streaming Data

Authors
Moreira Matias, L; Gama, J; Ferreira, M; Mendes Moreira, J; Damas, L

Publication
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Abstract
Informed driving is becoming a key feature for increasing the sustainability of taxi companies. The sensors installed in each vehicle provide new opportunities for automatically discovering knowledge, which, in turn, delivers information for real-time decision making. Intelligent transportation systems for taxi dispatching and for finding time-saving routes are already exploring these sensing data. This paper introduces a novel methodology for predicting the spatial distribution of taxi passengers over a short-term time horizon using streaming data. First, the information was aggregated into a histogram time series. Then, three time-series forecasting techniques were combined to produce a prediction. Experimental tests were conducted using the online data transmitted by 441 vehicles of a fleet running in the city of Porto, Portugal. The results demonstrated that the proposed framework can provide effective insight into the spatiotemporal distribution of taxi-passenger demand for a 30-min horizon.
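The two steps named in the abstract (aggregation into a histogram time series, then combination of three forecasts) can be illustrated with a minimal sketch. The abstract does not name the three forecasting techniques or the combination rule, so the three forecasters below (last value, moving average, seasonal naive) and the inverse-error weighting are stand-in assumptions, and all function names are hypothetical.

```python
from collections import Counter


def histogram_series(event_minutes, bin_minutes=30, n_bins=None):
    """Count events (e.g. pickups) per fixed-width time bin.

    event_minutes holds timestamps as minutes since the start of observation.
    """
    counts = Counter(int(t // bin_minutes) for t in event_minutes)
    if n_bins is None:
        n_bins = max(counts) + 1 if counts else 0
    return [counts.get(i, 0) for i in range(n_bins)]


# Three stand-in forecasters (assumptions, not the paper's techniques).
def naive_last(series):
    return series[-1]


def moving_average(series, window=3):
    tail = series[-window:]
    return sum(tail) / len(tail)


def seasonal_naive(series, period):
    # Repeat the value observed one full period ago.
    return series[-period]


def combine_forecasts(forecasts, recent_errors):
    """Weighted average with weights inversely proportional to recent error."""
    weights = [1.0 / (e + 1e-9) for e in recent_errors]
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, forecasts)) / total
```

Under these assumptions, a forecaster that has been more accurate recently contributes more to the combined prediction for the next 30-minute bin.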

2014

Evaluating changes in the operational planning of public transportation

Authors
Mendes Moreira, J; De Freire Sousa, J

Publication
Advances in Intelligent Systems and Computing

Abstract
Operational planning at public transport companies is a complex process that usually comprises several phases. In the planning phase, schedules are constructed on the assumption that buses arrive and depart as scheduled. In practice, disruptions occur frequently, but their impact on the operating conditions is not easy to estimate. This difficulty arises mostly from the impossibility of testing different solutions under the same conditions. Indeed, the available data are typically a result of the current plan, while newly proposed solutions have not yet produced real data. In this chapter, we discuss how to assess the impact of changes in the operational planning on the real operating conditions before those changes occur. We present a framework for such assessment, which includes two components: the impact on costs and the impact on revenues. We believe that this framework will be useful in future work on operational planning of public transport companies. © Springer International Publishing Switzerland 2014.

2014

Merging Decision Trees: A Case Study in Predicting Student Performance

Authors
Strecht, P; Mendes Moreira, J; Soares, C

Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2014

Abstract
Predicting the failure of students in university courses can provide useful information for course and programme managers and can help explain the dropout phenomenon. While it is important to have models at course level, their number makes it hard to extract knowledge that can be useful at the university level. Therefore, to support decision making at this level, it is important to generalize the knowledge contained in those models. We propose an approach to group and merge interpretable models in order to replace them with more general ones without compromising predictive performance. We evaluate our approach using data from the U. Porto. The results obtained are promising, although they suggest alternative approaches to the problem.

2014

On predicting a call center's workload: A discretization-based approach

Authors
Moreira Matias, L; Nunes, R; Ferreira, M; Mendes Moreira, J; Gama, J

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Agent scheduling in call centers is a major management problem because the optimal ratio between service quality and costs is hard to achieve. In the literature, regression and time series analysis methods have been used to address this problem by predicting the future arrival counts. In this paper, we propose to discretize these target variables into finite intervals. By reducing their domain length, the goal is to accurately mine the demand peaks, as these are the main cause of abandoned calls. This was done by employing multi-class classification. This approach was tested on a real-world dataset acquired through a taxi dispatching call center. The results demonstrate that this framework can reduce the number of abandoned calls while maintaining a reasonable staff-based cost. © 2014 Springer International Publishing.
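The discretization step described above can be sketched as follows. This is only an illustration of turning arrival counts into class labels for a multi-class classifier; the cut points and the function name are hypothetical, and the abstract does not state how the intervals were chosen.

```python
from bisect import bisect_right


def discretize(counts, cut_points):
    """Map each arrival count to the index of the interval that contains it.

    cut_points are sorted interior boundaries. With cut_points [5, 15] the
    intervals are [0, 5), [5, 15) and [15, inf), labelled 0, 1 and 2.
    """
    return [bisect_right(cut_points, c) for c in counts]


# Hypothetical arrival counts per period and hypothetical cut points.
labels = discretize([0, 4, 5, 12, 30], [5, 15])
```

The resulting labels would then serve as the target of a classifier in place of the raw counts.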

2013

On Predicting the Taxi-Passenger Demand: A Real-Time Approach

Authors
Moreira Matias, L; Gama, J; Ferreira, M; Mendes Moreira, J; Damas, L

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2013

Abstract
Informed driving is becoming a key feature to increase the sustainability of taxi companies. Some recent works explore the data broadcast by each vehicle to provide live information for decision making. In this paper, we propose a method to employ a learning model based on historical GPS data in a real-time environment. Our goal is to predict the spatiotemporal distribution of taxi-passenger demand over a short time horizon. We did so by using learning concepts originally proposed for a well-known online algorithm: the perceptron [1]. The results were promising: we achieved satisfactory performance in outputting the next prediction using a small amount of resources.
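For readers unfamiliar with the perceptron, its classic online update rule is sketched below. This is the textbook binary perceptron, shown only as background; it is not the demand-prediction method of the paper, and the function name is hypothetical.

```python
def perceptron_step(weights, bias, x, y, lr=1.0):
    """One online update: predict, then correct the weights on a mistake.

    x is a feature vector, y is the true label (0 or 1).
    Returns the updated weights, the updated bias and the prediction made.
    """
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    pred = 1 if activation >= 0 else 0
    error = y - pred  # 0 when correct, +1 or -1 on a mistake
    new_weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return new_weights, bias + lr * error, pred


# Usage: learn the logical AND function one example at a time.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = [0.0, 0.0], 0.0
for _ in range(20):
    for x, y in data:
        w, b, _pred = perceptron_step(w, b, x, y)
```

Each example is processed once and then discarded, which is what makes this style of learning suitable for streaming settings.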

2014

An Empirical Methodology to Analyze the Behavior of Bagging

Authors
Pinto, F; Soares, C; Mendes Moreira, J

Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2014

Abstract
In this paper we propose and apply a methodology to study the relationship between the performance of bagging and the characteristics of the bootstrap samples. The methodology consists of 1) an extensive set of experiments to estimate the empirical distribution of performance of the population of all possible ensembles that can be created with those bootstraps and 2) a metalearning approach to analyze that distribution based on characteristics of the bootstrap samples and their relationship with the complete training set. Given the large size of the population of all ensembles, we empirically show that it is possible to apply the methodology to a sample. We applied the methodology to 53 classification datasets for ensembles of 20 and 100 models. Our results show that diversity is crucial for an important bootstrap and we show evidence of a metric that can measure diversity without any learning process involved. We also found evidence that the best bootstraps have a predictive power very similar to the one presented by the training set using naive models.
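As background for the bootstrap samples discussed above, the sketch below draws one bootstrap sample (sampling with replacement, as in standard bagging) and computes one simple characteristic of it without training any model. The function names and the chosen characteristic are illustrative assumptions, not the metrics used in the paper.

```python
import random


def bootstrap_indices(n, seed=None):
    """Draw n indices with replacement from range(n)."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]


def unique_fraction(indices, n):
    """Fraction of the original instances that appear in the sample."""
    return len(set(indices)) / n


# Usage: one bootstrap sample of a hypothetical 100-instance training set.
sample = bootstrap_indices(100, seed=0)
coverage = unique_fraction(sample, 100)
```

In bagging, one model is trained on each such sample and the models' predictions are aggregated.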
