Publications

Publications by João Mendes Moreira

2014

An Incremental Probabilistic Model to Predict Bus Bunching in Real-Time

Authors
Moreira Matias, L; Gama, J; Mendes Moreira, J; de Sousa, JF;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XIII

Abstract
In this paper, we present a probabilistic framework to predict Bus Bunching (BB) occurrences in real time. It uses both historical and real-time data to approximate the headway distributions at the subsequent stops of a given route, employing both offline and online supervised learning techniques. These approximations are calculated incrementally by reusing the latest prediction residuals to update the subsequent ones. The update rules extend the Perceptron's delta rule by assuming an adaptive beta value based on the current context. The resulting distributions are then used to compute the likelihood of a bus platoon forming at a subsequent stop, which may trigger a threshold-based BB alarm. This framework was evaluated using one year of real-world data on the trips of 3 bus lines running in the city of Porto, Portugal. The results are promising.
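
The abstract combines two ideas: correcting a headway prediction with the latest residual through a delta-rule-style update, and turning a headway distribution into a bunching likelihood that drives a threshold alarm. The following is only a minimal sketch of those generic ideas, not the authors' implementation. It assumes a Gaussian headway model and a fixed step size `beta`; the paper instead adapts beta to the current context, which is not reproduced here. The names `delta_update`, `bunching_probability` and `bunching_alarm`, and all threshold values, are hypothetical.

```python
import math


def delta_update(prediction: float, observed: float, beta: float) -> float:
    """Delta-rule style correction: move the prediction toward the observation
    by a fraction beta of the residual (hypothetical helper; beta is fixed here,
    whereas the paper uses a context-dependent beta)."""
    residual = observed - prediction
    return prediction + beta * residual


def bunching_probability(mean_headway: float, std_headway: float,
                         min_headway: float) -> float:
    """P(headway < min_headway) under an ASSUMED Normal(mean, std) headway model."""
    z = (min_headway - mean_headway) / (std_headway * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def bunching_alarm(prob: float, alarm_threshold: float = 0.5) -> bool:
    """Threshold-based alarm on the estimated bunching likelihood."""
    return prob >= alarm_threshold


if __name__ == "__main__":
    # Predicted headway (minutes) at a downstream stop, corrected with the
    # residual observed upstream; all numbers are illustrative only.
    predicted = delta_update(8.0, observed=5.0, beta=0.5)
    p = bunching_probability(predicted, std_headway=2.0, min_headway=3.0)
    print(f"headway={predicted:.1f} min, P(bunching)={p:.3f}, alarm={bunching_alarm(p)}")
```

A Gaussian is chosen here only because its cumulative distribution has a closed form via `math.erf`. Any fitted headway distribution with a computable CDF could take its place.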

2014

An Online Learning Framework for Predicting the Taxi Stand's Profitability

Authors
Moreira Matias, L; Mendes Moreira, J; Ferreira, M; Gama, J; Damas, L;

Publication
2014 IEEE 17TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC)

Abstract
Taxi services play a central role in the mobility dynamics of major urban areas. Advanced communication devices such as GPS (Global Positioning System) and GSM (Global System for Mobile Communications) have made it possible to monitor the drivers' activities in real time. This paper presents an online learning approach to predict profitability in taxi stands. This approach consists of classifying each stand according to the type of services being requested (for instance, short and long trips). This classification is achieved by maintaining a time-evolving histogram to approximate local probability density functions (p.d.f.) of service revenues. The future values of this histogram are estimated using time series analysis methods, assuming that a non-homogeneous Poisson process is in place. Finally, the method's outputs were combined using a voting ensemble scheme based on a sliding window of historical data. Experimental tests were conducted using online data transmitted by 441 vehicles of a fleet running in the city of Porto, Portugal. The results demonstrated that the proposed framework can provide effective insight into the characterization of taxi stand profitability.
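
Maintaining a time-evolving histogram of service revenues can be illustrated with a sliding window. As each new revenue arrives, the oldest one is forgotten, and the normalized bin counts approximate the local p.d.f. This is a minimal sketch under stated assumptions, not the paper's method. The class name, bin edges, labels and window size are hypothetical. The Poisson-based forecasting and the voting ensemble described above are not reproduced.

```python
from bisect import bisect_right
from collections import deque


class SlidingRevenueHistogram:
    """Histogram over the last `window` service revenues at one taxi stand.

    Bin edges, labels and window size are hypothetical illustration values.
    """

    def __init__(self, edges, labels, window):
        if len(labels) != len(edges) + 1:
            raise ValueError("need one label per bin (len(edges) + 1)")
        self.edges = list(edges)             # cut points; a value equal to an edge goes to the upper bin
        self.labels = list(labels)
        self.recent = deque(maxlen=window)   # appending to a full deque evicts the oldest revenue
        self.counts = [0] * len(labels)

    def _bin(self, revenue):
        return bisect_right(self.edges, revenue)

    def add(self, revenue):
        if len(self.recent) == self.recent.maxlen:
            # Remove the contribution of the revenue that is about to be evicted.
            self.counts[self._bin(self.recent[0])] -= 1
        self.recent.append(revenue)
        self.counts[self._bin(revenue)] += 1

    def pdf(self):
        """Empirical probability mass per bin (all zeros for an empty window)."""
        total = len(self.recent)
        return [c / total if total else 0.0 for c in self.counts]

    def dominant_type(self):
        """Label of the bin holding the most probability mass."""
        probs = self.pdf()
        return self.labels[probs.index(max(probs))]


if __name__ == "__main__":
    h = SlidingRevenueHistogram(edges=[10.0], labels=["short", "long"], window=4)
    for fare in [6.5, 7.0, 25.0, 5.5, 30.0, 28.0]:
        h.add(fare)
    print(h.pdf(), h.dominant_type())
```

Keeping running counts, rather than rebuilding the histogram from the window, makes each update O(log b) for b bins. That property matters in a streaming setting.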

2016

CHADE: Metalearning with Classifier Chains for Dynamic Combination of Classifiers

Authors
Pinto, F; Soares, C; Moreira, JM;

Publication
Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part I

Abstract
Dynamic selection or combination (DSC) methods select one or more classifiers from an ensemble according to the characteristics of a given test instance x. Most methods proposed for this purpose are based on the nearest neighbours algorithm: it is assumed that if a classifier performed well on a set of instances similar to x, it will also perform well on x. We address the problem of dynamically combining a pool of classifiers by combining two approaches: metalearning and multi-label classification. Since diversity is a fundamental concept in ensemble learning and the interdependencies between the classifiers cannot be ignored, we solve the multi-label classification problem using a widely known technique: Classifier Chains (CC). Additionally, we extend a typical metalearning approach by combining metafeatures characterizing the interdependencies between the classifiers with the base-level features. We ran experiments on 42 classification datasets and compared our method with several state-of-the-art DSC techniques, including another metalearning approach. Results show that our method improves over the other metalearning approach and is very competitive with the other four DSC methods. © Springer International Publishing AG 2016.
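
In a Classifier Chain, one binary model is trained per label, and model j receives the base features plus labels 0..j-1. During training these are the true earlier labels; at prediction time they are the chain's own earlier predictions. In the setting above, each label can be read as "base classifier k is suitable for x". This is only a minimal sketch of generic CC, not CHADE itself: the metafeatures are omitted. The tiny 1-nearest-neighbour base learner `OneNN` is a hypothetical stand-in chosen to keep the example dependency-free.

```python
from typing import List, Sequence


class OneNN:
    """Tiny 1-nearest-neighbour classifier used as a stand-in base learner."""

    def fit(self, X: List[List[float]], y: List[int]) -> "OneNN":
        self.X, self.y = X, y
        return self

    def predict(self, x: Sequence[float]) -> int:
        dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in self.X]
        return self.y[dists.index(min(dists))]


class ClassifierChain:
    """Model j is trained on the features of x plus labels 0..j-1."""

    def fit(self, X: List[List[float]], Y: List[List[int]]) -> "ClassifierChain":
        self.models = []
        for j in range(len(Y[0])):
            # Training uses the TRUE earlier labels as extra features.
            Xj = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
            yj = [y[j] for y in Y]
            self.models.append(OneNN().fit(Xj, yj))
        return self

    def predict(self, x: Sequence[float]) -> List[int]:
        preds: List[int] = []
        for model in self.models:
            # At prediction time the chain feeds its own earlier outputs forward.
            preds.append(model.predict(list(x) + preds))
        return preds


if __name__ == "__main__":
    # Hypothetical meta-level data: label k = 1 if base classifier k suits x.
    X = [[0.0], [1.0], [5.0], [6.0]]
    Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
    chain = ClassifierChain().fit(X, Y)
    print(chain.predict([0.2]), chain.predict([5.8]))
```

Feeding earlier labels forward is what lets the chain model interdependencies between labels, and here that means interdependencies between base classifiers. Independent per-label models (binary relevance) would ignore them.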

2017

autoBagging: Learning to Rank Bagging Workflows with Metalearning

Authors
Pinto, F; Cerqueira, V; Soares, C; Moreira, JM;

Publication
Proceedings of the International Workshop on Automatic Selection, Configuration and Composition of Machine Learning Algorithms co-located with the European Conference on Machine Learning & Principles and Practice of Knowledge Discovery in Databases, AutoML@PKDD/ECML 2017, Skopje, Macedonia, September 22, 2017.

Abstract
Machine Learning (ML) has been successfully applied to a wide range of domains and applications. One of the techniques behind most of these successful applications is Ensemble Learning (EL), the field of ML that gave rise to methods such as Random Forests and Boosting. The complexity of applying these techniques, together with the scarcity of ML experts in the market, has created the need for systems that enable a fast and easy drop-in replacement for ML libraries. Automated machine learning (autoML) is the field of ML that attempts to meet these needs. We propose autoBagging, an autoML system that automatically ranks 63 bagging workflows by exploiting past performance and metalearning. Results on 140 classification datasets from the OpenML platform show that autoBagging can yield better performance than the Average Rank method and achieve results that are not statistically different from an ideal model that systematically selects the best workflow for each dataset. For the purpose of reproducibility and generalizability, autoBagging is publicly available as an R package on CRAN.
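
The Average Rank method that autoBagging is compared against is a simple baseline. Workflows are ranked on each past dataset (1 = best, ties share the mean rank), the ranks are averaged across datasets, and workflows are recommended in order of mean rank. The following is a minimal Python sketch of that baseline only. autoBagging itself is an R package, and its metalearning ranker is not shown. The function names, workflow names and scores are hypothetical, and the sketch assumes every workflow was evaluated on every dataset.

```python
from typing import Dict, List


def ranks_for_dataset(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank workflows on one dataset (1 = highest score); ties share the mean rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are tied
        for k in range(i, j + 1):
            ranks[ordered[k]] = mean_rank
        i = j + 1
    return ranks


def average_rank(results: Dict[str, Dict[str, float]]) -> List[str]:
    """Order workflows by mean rank over all past datasets (best first)."""
    totals: Dict[str, float] = {}
    for scores in results.values():
        for wf, r in ranks_for_dataset(scores).items():
            totals[wf] = totals.get(wf, 0.0) + r
    n = len(results)
    return sorted(totals, key=lambda wf: totals[wf] / n)


if __name__ == "__main__":
    # Hypothetical accuracies of three bagging workflows on three past datasets.
    results = {
        "d1": {"A": 0.90, "B": 0.85, "C": 0.80},
        "d2": {"A": 0.70, "B": 0.75, "C": 0.60},
        "d3": {"A": 0.90, "B": 0.88, "C": 0.70},
    }
    print(average_rank(results))
```

Because this baseline ignores the characteristics of the new dataset, it recommends the same order everywhere. That is the gap a metalearning ranker is meant to close.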

2015

Improving Mass Transit Operations by Using AVL-Based Systems: A Survey

Authors
Moreira Matias, L; Mendes Moreira, J; de Sousa, JF; Gama, J;

Publication
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Abstract
Intelligent transportation systems based on automated data collection frameworks are widely used by the major transit companies around the globe. This paper describes the current state of the art on improving both planning and control in public road transportation companies using automatic vehicle location (AVL) data. By surveying this topic, the expectation is to help develop a better understanding of the nature, approaches, challenges, and opportunities with regard to these problems. This paper starts by presenting a brief review of how the network definition can be improved using historical location-based data. Second, it presents a comprehensive review of AVL-based techniques for evaluating the reliability of the schedule plan (SP), discussing the existing metrics. Then, the different dimensions of improving SP reliability are presented in detail, together with the works addressing this problem. Finally, automatic control strategies are reviewed, along with the research carried out using location-based data. A comprehensive discussion of the techniques employed is provided to encourage those who are starting research on this topic. It is important to highlight that there are still gaps in the AVL-based literature, such as the following: 1) long-term travel time prediction; 2) finding the optimal slack time; or 3) choosing the best control strategy to apply in each situation in the event of schedule instability. Hence, this paper includes introductory model formulations, reference surveys, formal definitions, and an overview of a promising area, which is of interest to any researcher, regardless of their level of expertise.

2014

Simulation of the ensemble generation process: The divergence between data and model similarity

Authors
Pinto, F; Mendes Moreira, J; Soares, C; Rossetti, RJF;

Publication
Modelling and Simulation 2014 - European Simulation and Modelling Conference, ESM 2014

Abstract
In this paper we present a NetLogo simulation model for a Data Mining methodological process: ensemble classifier generation. The model makes it possible to study the trade-off between data characteristics and diversity, a key concept in Ensemble Learning. We studied the research hypothesis that data characteristics should also be taken into account when generating ensemble classifier models. The results of our experiments indicate that diversity is indeed a key concept in Ensemble Learning, but regarding our research hypothesis, the findings are inconclusive.
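
The abstract treats diversity as a central quantity but does not say how it is measured. One common choice in the ensemble literature is the pairwise disagreement measure: the fraction of instances on which two members predict different labels, averaged over all pairs. The sketch below assumes that measure purely for illustration; it is not necessarily the one used in the NetLogo model. The function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of instances on which two classifiers predict different labels."""
    if len(a) != len(b) or not a:
        raise ValueError("prediction vectors must be non-empty and equally long")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def ensemble_diversity(predictions: Sequence[Sequence[int]]) -> float:
    """Mean pairwise disagreement over all pairs of ensemble members."""
    pairs = list(combinations(predictions, 2))
    if not pairs:
        return 0.0  # a single member has no one to disagree with
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical predictions of three ensemble members on four instances.
    preds = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 1, 0]]
    print(round(ensemble_diversity(preds), 3))
```

The measure is 0 for identical members. It grows as members make different predictions on the same instances, regardless of whether those predictions are correct.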
