Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
Publications

Publications by LIAAD

2014

Constructing fading histograms from data streams

Authors
Sebastião, R; Gama, J; Mendonça, T;

Publication
Progress in AI

Abstract
The ability to collect data is changing drastically. Nowadays, data are gathered in the form of transient and finite data streams. Memory restrictions preclude keeping all received data in memory. When dealing with massive data streams, it is mandatory to create compact representations of data, also known as synopses structures or summaries. Reducing memory occupancy is of utmost importance when handling a huge amount of data. This paper addresses the problem of constructing histograms from data streams under error constraints. When constructing online histograms from data streams there are two main characteristics to embrace: the updating facility and the error of the histogram. Moreover, in dynamic environments, besides the need of compact summaries to capture the most important properties of data, it is also essential to forget old data. Therefore, this paper presents sliding histograms and fading histograms, an abrupt and a smooth strategies to forget outdated data. © 2014 Springer-Verlag Berlin Heidelberg.

2014

Distributed clustering of ubiquitous data streams

Authors
Rodrigues, PP; Gama, J;

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Nowadays information is generated and gathered from distributed streaming data sources, stressing communications and computing infrastructure, making it hard to transmit, compute, and store. Knowledge discovery from ubiquitous data streams has become a major goal for all sorts of applications, mostly based on unsupervised techniques such as clustering. Two subproblems exist: clustering streaming data observations and clustering streaming data sources. The former searches for dense regions of the data space, identifying hot spots where data sources tend to produce data, while the latter finds groups of sources that behave similarly over time. In order to better assess the current status of this topic, this article presents a thorough review on distributed algorithms addressing either of the subproblems. We characterize clustering algorithms for ubiquitous data streams, discussing advantages and disadvantages of distributed procedures. Overall, distributed stream clustering methods improve communication ratios, processing speed, and resources consumption, while achieving similar clustering validity as the centralized counterparts. (C) 2013 John Wiley & Sons, Ltd.

2014

Enhancing data stream predictions with reliability estimators and explanation

Authors
Bosnic, Z; Demsar, J; Kespret, G; Rodrigues, PP; Gama, J; Kononenko, I;

Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE

Abstract
Incremental learning from data streams is increasingly attracting research focus due to many real streaming problems (such as learning from transactions, sensors or other sequential observations) that require processing and forecasting in the real time. In this paper we deal with two issues related to incremental learning - prediction accuracy and prediction explanation - and demonstrate their applicability on several streaming problems for predicting electricity load in the future. For improving prediction accuracy we propose and evaluate the use of two reliability estimators that allow us to estimate prediction error and correct predictions. For improving interpretability of the incremental model and its predictions we propose an adaptation of the existing prediction explanation methodology, which was originally developed for batch learning from stationary data. The explanation methodology is combined with a state-of-the-art concept drift detector and a visualization technique to enhance the explanation in dynamic streaming settings. The results show that the proposed approaches can improve prediction accuracy and allow transparent insight into the modeled concept.

2014

Event and Anomaly Detection Using Tucker3 Decomposition

Authors
T, HadiFanaee; Oliveira, MarciaD.B.; Gama, Joao; Malinowski, Simon; Morla, Ricardo;

Publication
CoRR

Abstract

2014

On Predicting a Call Center's Workload: A Discretization-Based Approach

Authors
Matias, LM; Nunes, R; Ferreira, M; Moreira, JM; Gama, J;

Publication
Foundations of Intelligent Systems - 21st International Symposium, ISMIS 2014, Roskilde, Denmark, June 25-27, 2014. Proceedings

Abstract
Agent scheduling in call centers is a major management problem as the optimal ratio between service quality and costs is hardly achieved. In the literature, regression and time series analysis methods have been used to address this problem by predicting the future arrival counts. In this paper, we propose to discretize these target variables into finite intervals. By reducing its domain length, the goal is to accurately mine the demand peaks as these are the main cause for abandoned calls. This was done by employing multi-class classification. This approach was tested on a real-world dataset acquired through a taxi dispatching call center. The results demonstrate that this framework can accurately reduce the number of abandoned calls, while maintaining a reasonable staff-based cost. © 2014 Springer International Publishing.

2014

An eigenvector-based hotspot detection

Authors
T, HadiFanaee; Gama, Joao;

Publication
CoRR

Abstract

  • 326
  • 499