Publications

Publications by João Gama

2009

Regression Trees from Data Streams with Drift Detection

Authors
Ikonomovska, E; Gama, J; Sebastiao, R; Gjorgjevik, D;

Publication
DISCOVERY SCIENCE, PROCEEDINGS

Abstract
The problem of extracting meaningful patterns from time-changing data streams is of increasing importance for the machine learning and data mining communities. We present an algorithm which is able to learn regression trees from fast and unbounded data streams in the presence of concept drifts. To the best of our knowledge there is no other algorithm for incrementally learning regression trees equipped with change detection abilities. The FIRT-DD algorithm has mechanisms for drift detection and model adaptation, which make it possible to maintain accurate and updated regression models at any time. The drift detection mechanism is based on sequential statistical tests that track the evolution of the local error at each node of the tree and inform the learning process of the detected changes. As a response to a local drift, the algorithm is able to adapt the model only locally, avoiding the necessity of a global model adaptation. The adaptation strategy consists of building a new tree whenever a change is suspected in a region and replacing the old one when the new tree becomes more accurate. This enables smooth and granular adaptation of the global model. The results from the empirical evaluation performed over several different types of drift show that the algorithm is capable of consistent detection and proper adaptation to concept drifts.
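The abstract does not name the sequential test applied to the per-node error. As an illustration only, the sketch below assumes a Page-Hinkley test, a widely used sequential change detector, applied to a stream of absolute errors. The class name, the `first_alarm` helper, and the parameter values `delta` and `threshold` are hypothetical choices for this example, not values taken from the paper.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an increase in the mean of a stream.

    Assumption: this stands in for the unnamed sequential test in the abstract.
    """

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level (often written lambda)
        self.n = 0                  # number of observations seen
        self.mean = 0.0             # running mean of the observations
        self.cum = 0.0              # cumulative deviation from the mean
        self.min_cum = 0.0          # smallest cumulative deviation so far

    def update(self, x):
        """Feed one error value; return True if a change is signaled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def first_alarm(errors, delta=0.005, threshold=5.0):
    """Return the index of the first alarm, or None if no change is signaled."""
    detector = PageHinkley(delta=delta, threshold=threshold)
    for i, err in enumerate(errors):
        if detector.update(err):
            return i
    return None


# Hypothetical error stream: stable error, then an abrupt increase at index 200.
errors = [0.1] * 200 + [2.0] * 50
alarm_index = first_alarm(errors)
```

In a tree learner of the kind described, one such detector could be kept at each node and fed the error of every example passing through it, so that an alarm identifies the subtree to adapt.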


2009

Tracking Recurring Concepts with Meta-learners

Authors
Gama, J; Kosina, P;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
This work addresses data stream mining from dynamic environments where the distribution underlying the observations may change over time. In these contexts, learning algorithms must be equipped with change detection mechanisms. Several methods have been proposed that are able to detect and react to concept drift. When a drift is signaled, most of the approaches use a forgetting mechanism, releasing the current model and starting to learn a new decision model. Nevertheless, it is not rare for concepts from history to reappear, for example with seasonal changes. In this work we present a method that memorizes learnt decision models whenever a concept drift is signaled. The system uses meta-learning techniques that characterize the domain of applicability of previously learnt models. The meta-learner can detect the re-occurrence of contexts and take pro-active actions by activating previously learnt models. The main benefit of this approach is that the proposed meta-learner is capable of selecting a similar historical concept, if there is one, without knowledge of the true classes of examples.


2009

An overview on mining data streams

Authors
Gama, J; Rodrigues, PP;

Publication
Studies in Computational Intelligence

Abstract
The most challenging applications of knowledge discovery involve dynamic environments where data flow continuously at high speed and exhibit non-stationary properties. In this chapter we discuss the main challenges and issues when learning from data streams. In this work, we discuss the most relevant issues in knowledge discovery from data streams: incremental learning, cost-performance management, change detection, and novelty detection. We present illustrative algorithms for these learning tasks, and a real-world application illustrating the advantages of stream processing. The chapter ends with some open issues that emerge from this new research area. © 2009 Springer-Verlag Berlin Heidelberg.


2007

Semi-fuzzy splitting in Online Divisive-Agglomerative Clustering

Authors
Rodrigues, PP; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
The Online Divisive-Agglomerative Clustering (ODAC) is an incremental approach for clustering streaming time series using a hierarchical procedure over time. It constructs a tree-like hierarchy of clusters of streams, using a top-down strategy based on the correlation between streams. The system also possesses an agglomerative phase to enhance a dynamic behavior capable of structural change detection. However, the split decision used in the algorithm focuses on the crisp boundary between two groups, which implies a high risk since it has to decide based on only a small subset of the entire data. In this work we propose a semi-fuzzy approach to the assignment of variables to newly created clusters, for a better trade-off between validity and performance. Experimental work supports the benefits of our approach.
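To make "based on the correlation between streams" concrete, here is a minimal sketch that assumes Pearson correlation is maintained incrementally from running sums and then mapped to a dissimilarity in [0, 1]. The abstract states neither the estimator nor the mapping, so both are assumptions, and the names `RunningCorrelation` and `dissimilarity` are hypothetical.

```python
import math


class RunningCorrelation:
    """Pearson correlation between two streams, updated one pair at a time."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = 0.0    # running sums of x and y
        self.sxx = self.syy = 0.0  # running sums of squares
        self.sxy = 0.0             # running sum of cross products

    def update(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def correlation(self):
        """Return the current estimate, or 0.0 when it is undefined."""
        if self.n < 2:
            return 0.0
        cov = self.sxy - self.sx * self.sy / self.n
        var_x = max(0.0, self.sxx - self.sx * self.sx / self.n)
        var_y = max(0.0, self.syy - self.sy * self.sy / self.n)
        den = math.sqrt(var_x) * math.sqrt(var_y)
        return cov / den if den > 0 else 0.0


def dissimilarity(corr):
    """Map a correlation in [-1, 1] to a dissimilarity in [0, 1] (assumed form)."""
    return math.sqrt(max(0.0, (1.0 - corr) / 2.0))


# Hypothetical pair of streams that move together.
rc = RunningCorrelation()
for t in range(1, 6):
    rc.update(float(t), 2.0 * t)
d_same = dissimilarity(rc.correlation())
```

Because only six running sums are stored per pair of streams, the estimate can be refreshed in constant time as each observation arrives, which is the property an online clustering procedure needs.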


2007

Stream-based electricity load forecast

Authors
Gama, J; Rodrigues, PP;

Publication
Knowledge Discovery in Databases: PKDD 2007, Proceedings

Abstract
Sensors distributed all around electrical-power distribution networks produce streams of data at high speed. From a data mining perspective, this sensor network problem is characterized by a large number of variables (sensors), producing a continuous flow of data, in a dynamic non-stationary environment. Companies make decisions to buy or sell energy based on load profiles and forecasts. We propose an architecture based on an online clustering algorithm where each cluster (group of sensors with high correlation) contains a neural-network based predictive model. The goal is to maintain in real-time a clustering model and a predictive model able to incorporate new information at the speed data arrives, detecting changes and adapting the decision models to the most recent information. We present results illustrating the advantages of the proposed architecture, on several temporal horizons, and its competitiveness with another predictive strategy.


2007

Clustering techniques in sensor networks

Authors
Rodrigues, PP; Gama, J;

Publication
Learning from Data Streams: Processing Techniques in Sensor Networks

Abstract
The traditional knowledge discovery environment, where data and processing units are centralized in controlled laboratories and servers, is now completely transformed into a web of sensorial devices, some of them with local processing ability. This scenario represents a new knowledge-extraction environment, possibly not completely observable, that is much less controlled by both the human user and a common centralized control process. © 2007 Springer-Verlag Berlin Heidelberg.
