
Publications by LIAAD

2014

An Incremental Probabilistic Model to Predict Bus Bunching in Real-Time

Authors
Moreira Matias, L; Gama, J; Mendes Moreira, J; de Sousa, JF;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XIII

Abstract
In this paper, we present a probabilistic framework to predict Bus Bunching (BB) occurrences in real time. It uses both historical and real-time data to approximate the headway distributions at the downstream stops of a given route, employing both offline and online supervised learning techniques. These approximations are calculated incrementally by reusing the latest prediction residuals to update subsequent predictions. The update rules extend the Perceptron's delta rule by using an adaptive beta value based on the current context. The resulting distributions are then used to compute the likelihood of a bus platoon forming at a downstream stop, which may trigger a threshold-based BB alarm. The framework was evaluated on one year of real-world trip data from 3 bus lines running in the city of Porto, Portugal. The results are promising.
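The incremental update described in this abstract can be illustrated with a minimal sketch. It assumes a plain delta-rule step (prediction + beta * residual), a hypothetical `adaptive_beta` rule that grows with the size of recent errors (the abstract does not say how beta is derived from the context), and a normal headway distribution for the bunching likelihood. The threshold and alarm level are illustrative values, not those used in the paper.

```python
import math


def delta_update(prediction: float, observed: float, beta: float) -> float:
    """Delta-rule step: shift the prediction by a fraction beta of the residual."""
    residual = observed - prediction
    return prediction + beta * residual


def adaptive_beta(recent_abs_residuals: list[float], scale: float = 60.0) -> float:
    """Hypothetical context rule mapping recent error size (seconds) to beta in [0, 1).

    Larger recent residuals give a larger beta, so the estimate reacts faster
    after recent mistakes. This rule is an assumption for illustration only.
    """
    if not recent_abs_residuals:
        return 0.0
    mean_err = sum(recent_abs_residuals) / len(recent_abs_residuals)
    return mean_err / (mean_err + scale)


def bunching_probability(mean_headway: float, std_headway: float, threshold: float) -> float:
    """P(headway < threshold), assuming a normally distributed headway."""
    z = (threshold - mean_headway) / (std_headway * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Usage: update the headway estimate for a downstream stop, then check for a BB alarm.
predicted = 300.0                                    # hypothetical predicted headway (s)
observed = 240.0                                     # hypothetical headway just observed (s)
beta = adaptive_beta([60.0, 60.0])                   # -> 0.5
predicted = delta_update(predicted, observed, beta)  # -> 270.0
p_bb = bunching_probability(predicted, 60.0, threshold=120.0)
alarm = p_bb > 0.05                                  # hypothetical alarm level
print(f"P(BB) = {p_bb:.4f}, alarm = {alarm}")        # P(BB) = 0.0062, alarm = False
```

Keeping the step size as an explicit parameter makes it easy to swap in whichever context rule is appropriate without touching the update itself.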

2014

An Online Learning Framework for Predicting the Taxi Stand's Profitability

Authors
Moreira Matias, L; Mendes Moreira, J; Ferreira, M; Gama, J; Damas, L;

Publication
2014 IEEE 17TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC)

Abstract
Taxi services play a central role in the mobility dynamics of major urban areas. Advanced communication devices such as GPS (Global Positioning System) and GSM (Global System for Mobile Communications) have made it possible to monitor drivers' activities in real time. This paper presents an online learning approach to predict profitability at taxi stands. The approach classifies each stand according to the type of services being requested (for instance, short and long trips). This classification is achieved by maintaining a time-evolving histogram that approximates the local probability density function (p.d.f.) of service revenues. The future values of this histogram are estimated using time series analysis methods, assuming a non-homogeneous Poisson process. Finally, the outputs of these methods are combined using a voting ensemble scheme based on a sliding window of historical data. Experimental tests were conducted using online data transmitted by 441 vehicles of a fleet running in the city of Porto, Portugal. The results demonstrate that the proposed framework can provide effective insight into the characterization of taxi stand profitability.
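A time-evolving histogram of the kind mentioned in this abstract could be maintained as in the following sketch. The exponential fading factor, the revenue bin edges and the short/medium/long labels are assumptions for illustration. The forecasting step (time series analysis under a non-homogeneous Poisson assumption) and the voting ensemble are not reproduced here.

```python
from bisect import bisect_right


class FadingHistogram:
    """Time-evolving histogram of service revenues with exponential forgetting.

    Bin edges and the fading factor are hypothetical illustration values.
    """

    def __init__(self, edges: list[float], fading: float = 0.9):
        self.edges = edges                      # boundaries separating the bins
        self.fading = fading                    # share of past counts kept per update
        self.counts = [0.0] * (len(edges) + 1)  # one count per bin

    def update(self, revenues: list[float]) -> None:
        """Decay old counts, then add the services observed in the latest period."""
        self.counts = [c * self.fading for c in self.counts]
        for r in revenues:
            self.counts[bisect_right(self.edges, r)] += 1.0

    def pdf(self) -> list[float]:
        """Normalise the counts into an approximate probability mass per bin."""
        total = sum(self.counts)
        return [c / total for c in self.counts] if total else list(self.counts)

    def dominant_bin(self) -> int:
        """Index of the bin holding the most probability mass (the stand's class)."""
        p = self.pdf()
        return max(range(len(p)), key=p.__getitem__)


# Usage: bins 0 = short (< 5 EUR), 1 = medium (5 to 15 EUR), 2 = long (>= 15 EUR).
stand = FadingHistogram(edges=[5.0, 15.0], fading=0.5)
stand.update([3.0, 4.0, 20.0])  # counts -> [2.0, 0.0, 1.0]
stand.update([18.0, 25.0])      # counts -> [1.0, 0.0, 2.5]
print(stand.dominant_bin())     # 2
```

The fading factor controls how quickly older periods stop influencing the estimated distribution; a sliding window would be an equally plausible alternative.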

2014

Collaborative Wind Power Forecast

Authors
Almeida, V; Gama, J;

Publication
ADAPTIVE AND INTELLIGENT SYSTEMS, ICAIS 2014

Abstract
Several new environments are emerging that generate spatially spread and interrelated data. These applications reinforce the importance of developing analytical systems capable of sensing the environment and receiving data from different locations. In this study we explore collaborative methodologies in a real-world problem: wind power prediction. Wind power is considered one of the most rapidly growing sources of electricity generation worldwide. The problem consists of monitoring a network of wind farms that collaborate by sharing information in a very short-term forecasting problem. We use an autoregressive integrated moving average (ARIMA) model, and the Symbolic Aggregate Approximation (SAX) is used to select the set of neighbours. We propose two collaborative methods. The first, based on centralized management, exchanges data points between nodes. In the second, correlated wind farms share their own ARIMA models. In the experimental work we use one year of data from 16 wind farms. The goal is to predict the energy produced at each farm every hour for the next 6 hours. We compare the proposed methods against ARIMA models trained only on each farm's own data and against the persistence model at each farm. We observe a small but consistent reduction in the root mean square error (RMSE) of the predictions.
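SAX itself is a standard technique: z-normalise a series, reduce it with Piecewise Aggregate Approximation (PAA), and map each segment mean to a symbol using breakpoints that split the standard normal distribution into equiprobable regions. The sketch below builds SAX words for toy series and keeps as neighbours the farms whose word differs from the target's in at most a given number of positions. That mismatch-count similarity, the function names and the data are hypothetical simplifications, not necessarily the selection rule used in the paper.

```python
import statistics
from bisect import bisect_right

# Breakpoints that split N(0, 1) into 4 equiprobable regions (alphabet size 4).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax_word(series: list[float], n_segments: int) -> str:
    """Z-normalise, reduce with PAA, then map each segment mean to a symbol."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    z = [(x - mean) / std for x in series] if std > 0 else [0.0] * len(series)
    size = len(z) // n_segments  # assumes len(series) is a multiple of n_segments
    paa = [statistics.fmean(z[i * size:(i + 1) * size]) for i in range(n_segments)]
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, v)] for v in paa)


def mismatches(a: str, b: str) -> int:
    """Hypothetical similarity: number of positions where two SAX words differ."""
    return sum(x != y for x, y in zip(a, b))


def select_neighbours(target: list[float], others: dict[str, list[float]],
                      n_segments: int, max_mismatch: int) -> list[str]:
    """Names of series whose SAX word is within max_mismatch symbols of the target's."""
    word = sax_word(target, n_segments)
    return sorted(name for name, s in others.items()
                  if mismatches(word, sax_word(s, n_segments)) <= max_mismatch)


# Usage with toy hourly power series (hypothetical values).
target = [1, 2, 3, 4, 5, 6, 7, 8]
others = {"farm_b": [2, 1, 4, 3, 6, 5, 8, 7], "farm_c": [8, 7, 6, 5, 4, 3, 2, 1]}
print(sax_word(target, 4))                      # abcd
print(select_neighbours(target, others, 4, 1))  # ['farm_b']
```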

2014

Using probabilistic graphical models to enhance the prognosis of health-related quality of life in adult survivors of critical illness

Authors
Dias, CC; Granja, C; Costa Pereira, A; Gama, J; Rodrigues, PP;

Publication
2014 IEEE 27TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS)

Abstract
Health-related quality of life (HR-QoL) is a subjective concept reflecting the overall mental and physical state of the patient and their own sense of well-being. Estimating current and future QoL has become a major outcome in the evaluation of critically ill patients. The aim of this study is to enhance the inference process for the 6-week and 6-month prognosis of QoL after an intensive care unit (ICU) stay, using the EQ-5D questionnaire. The main outcomes of the study were the five main EQ-5D dimensions: mobility, self-care, usual activities, pain, and anxiety/depression. For each outcome, three Bayesian classifiers were built and validated with 10-fold cross-validation. Sixty and 473 patients (6 weeks and 6 months, respectively) were included. Overall, QoL is higher at 6 months than at 6 weeks, with the probability of absence of problems ranging from 31% (6 weeks, mobility) to 72% (6 months, self-care). Bayesian models achieved prognosis accuracies ranging from 56% (6 months, anxiety/depression) to 80% (6 weeks, mobility). The prognosis inference process for an individual patient was enhanced by visual analysis of the models, which showed that women, elderly people, and people with longer ICU stays have a higher risk of QoL problems at 6 weeks. Likewise, for the 6-month prognosis, a higher APACHE II severity score also leads to a higher risk of problems, except for anxiety/depression, where younger and active patients have an increased risk. Bayesian networks are competitive with less descriptive strategies, improve the inference process by incorporating domain knowledge, and provide more interpretable models. The relationships among factors extracted by the Bayesian models agree with those reported in previous state-of-the-art literature, supporting their usability as inference models.

2014

Event labeling combining ensemble detectors and background knowledge

Authors
T, HF; Gama, J;

Publication
Progress in AI

Abstract
Event labeling is the process of marking events in unlabeled data. Traditionally, this is done by involving one or more human experts in an expensive and time-consuming task. In this article we propose an event labeling system relying on an ensemble of detectors and background knowledge. The target data are the usage logs of a real bike sharing system. We first label events in the data and then evaluate the performance of the ensemble and of the individual detectors on the labeled data set using ROC analysis and static evaluation metrics, in the absence and presence of background knowledge. Our results show that when there is no access to human experts, the proposed approach can be an effective alternative for labeling events. In addition to the main proposal, we conduct a comparative study covering the performance of various predictive models, semi-supervised and unsupervised approaches, training data scale, time series filtering methods, online and offline predictive models, and distance functions for measuring time series similarity. © Springer-Verlag Berlin Heidelberg 2013.
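One simple way to combine an ensemble of detectors and evaluate it with ROC analysis is sketched below. It assumes that each detector emits a boolean flag per time step and that the ensemble labels an event when at least `min_votes` detectors agree; sweeping `min_votes` yields (TPR, FPR) points on an ROC curve. The voting rule, the function names and the toy flags are assumptions, and background knowledge is not modelled.

```python
def ensemble_label(detector_flags: list[list[bool]], min_votes: int) -> list[bool]:
    """Label time step t as an event when at least min_votes detectors flag it."""
    return [sum(votes) >= min_votes for votes in zip(*detector_flags)]


def tpr_fpr(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """True positive rate and false positive rate against reference labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    pos = sum(actual)
    neg = len(actual) - pos
    return (tp / pos if pos else 0.0, fp / neg if neg else 0.0)


# Usage: three hypothetical detectors over five time steps.
flags = [
    [True, False, True, False, False],
    [True, True, False, False, False],
    [True, False, True, True, False],
]
actual = [True, False, True, False, False]
roc_points = [tpr_fpr(ensemble_label(flags, k), actual) for k in (1, 2, 3)]
```

Requiring more votes trades recall for precision, which is why each threshold contributes a different point to the ROC curve.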

2014

Unsupervised density-based behavior change detection in data streams

Authors
Vallim, RMM; Andrade Filho, JA; de Mello, RF; de Carvalho, ACPLF; Gama, J;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
The ability to detect changes in the data distribution is an important issue in data stream mining. Detecting such changes allows a previously learned model to be adapted to the most recent data, thereby improving its predictive capability. This paper proposes a framework for unsupervised automatic change detection in data streams called M-DBScan. The framework consists of a density-based clustering step followed by a novelty detection procedure based on entropy level measures. This work uses two types of entropy measures: one considers the spatial distribution of the data, while the other models temporal relations between observations in the stream. The performance of the method is assessed in a set of experiments comparing M-DBScan with a proximity-based approach. The experimental results provide important insight into how to design change detection mechanisms for streams.
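The entropy-based novelty detection step can be sketched as follows, assuming that the density-based clustering step has already assigned a cluster label to each point in a window. Only one hypothetical measure is shown: the Shannon entropy of the cluster-label distribution per window, flagged when it departs from the history of past windows by more than `k` standard deviations. M-DBScan's actual spatial and temporal entropy measures and thresholds may differ.

```python
import math
import statistics
from collections import Counter


def label_entropy(labels: list[int]) -> float:
    """Shannon entropy (bits) of the cluster-label distribution in one window."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def detect_changes(windows: list[list[int]], k: float = 3.0, warmup: int = 3) -> list[int]:
    """Indices of windows whose entropy deviates from past windows by more than k std."""
    history: list[float] = []
    changes: list[int] = []
    for i, labels in enumerate(windows):
        h = label_entropy(labels)
        if len(history) >= warmup:
            mu = statistics.fmean(history)
            sd = statistics.pstdev(history)
            # A tiny floor avoids a zero tolerance when past entropies are identical.
            if abs(h - mu) > k * max(sd, 1e-9):
                changes.append(i)
        history.append(h)
    return changes


# Usage: four stable windows (two clusters), then one where points spread over four clusters.
windows = [[0, 0, 1, 1]] * 4 + [[0, 1, 2, 3]]
print(detect_changes(windows))  # [4]
```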
