2009
Authors
Gama, J; Sebastiao, R; Rodrigues, PP;
Publication
KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING
Abstract
Learning from data streams is a research area of increasing importance, and several stream learning algorithms have been developed. Most of them learn decision models that continuously evolve over time, run in resource-aware environments, and detect and react to changes in the environment generating the data. One important issue, not yet adequately addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. There are no gold standards for assessing performance in non-stationary environments. This paper proposes a general framework for assessing predictive stream learning algorithms. We defend the use of Predictive Sequential methods for error estimation: the prequential error. The prequential error allows us to monitor the evolution of the performance of models that evolve over time. Nevertheless, it is known to be a pessimistic estimator in comparison to holdout estimates. To obtain more reliable estimators we need some forgetting mechanism. Two viable alternatives are sliding windows and fading factors. We observe that the prequential error converges to a holdout estimator when estimated over a sliding window or using fading factors. We present illustrative examples of the use of prequential error estimators, using fading factors, for the tasks of: i) assessing the performance of a learning algorithm; ii) comparing learning algorithms; iii) hypothesis testing using the McNemar test; and iv) change detection using the Page-Hinkley test. In these tasks, the prequential error estimated using fading factors provides reliable estimates. In comparison to sliding windows, fading factors are faster and memoryless, a requirement for streaming applications. This paper is a contribution to the discussion of good practices in performance assessment when learning dynamic models that evolve over time.
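The two forgetting mechanisms named in this abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so the recursive update used here (a discounted loss sum divided by a discounted count) is an assumption about one common formulation, and the class names, `alpha`, and `size` are hypothetical choices made for illustration only.

```python
from collections import deque


class FadingPrequentialError:
    """Prequential error with a fading factor (assumed formulation).

    Keeps only two numbers: a discounted sum of losses and a
    discounted count of examples, so memory use is constant.
    """

    def __init__(self, alpha=0.995):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.loss_sum = 0.0
        self.count = 0.0

    def update(self, loss):
        # Older losses are multiplied by alpha once per new example.
        self.loss_sum = loss + self.alpha * self.loss_sum
        self.count = 1.0 + self.alpha * self.count
        return self.loss_sum / self.count


class WindowPrequentialError:
    """Prequential error over the most recent `size` examples."""

    def __init__(self, size=1000):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, loss):
        # Subtract the loss that is about to be evicted from the window.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(loss)
        self.total += loss
        return self.total / len(self.window)
```

In a test-then-train loop, each incoming example would first be predicted, its loss (for instance 0 or 1 for classification) passed to `update`, and only then used for training. With `alpha=1.0` the fading estimate reduces to the plain running mean; unlike the window version, it never stores past losses.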
2009
Authors
Ikonomovska, E; Gama, J; Sebastiao, R; Gjorgjevik, D;
Publication
DISCOVERY SCIENCE, PROCEEDINGS
Abstract
The problem of extracting meaningful patterns from time-changing data streams is of increasing importance for the machine learning and data mining communities. We present an algorithm that is able to learn regression trees from fast and unbounded data streams in the presence of concept drift. To the best of our knowledge, there is no other algorithm for incrementally learning regression trees that is equipped with change detection abilities. The FIRT-DD algorithm has mechanisms for drift detection and model adaptation, which make it possible to maintain accurate and up-to-date regression models at any time. The drift detection mechanism is based on sequential statistical tests that track the evolution of the local error at each node of the tree and inform the learning process of the detected changes. In response to a local drift, the algorithm is able to adapt the model only locally, avoiding the need for a global model adaptation. The adaptation strategy consists of building a new tree whenever a change is suspected in a region and replacing the old tree when the new one becomes more accurate. This enables smooth and granular adaptation of the global model. The results of the empirical evaluation, performed over several different types of drift, show that the algorithm is capable of consistent detection and proper adaptation to concept drift.
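The idea of a sequential test that tracks an error signal and raises an alarm when its mean increases can be sketched as follows. This abstract does not say which test FIRT-DD uses, so the Page-Hinkley test shown here is an assumption chosen purely for illustration, and the class name and the `delta` and `threshold` parameters are hypothetical.

```python
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a signal.

    Illustrative sketch only: one detector like this could monitor
    the error observed at a single tree node.
    """

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level (often called lambda)
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # cumulative deviation from the mean
        self.min_cum = 0.0          # smallest cumulative deviation so far

    def update(self, x):
        """Feed one observation; return True if a change is signaled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def first_alarm(stream, detector):
    """Return the index of the first alarm, or None if there is none."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None
```

For example, `first_alarm([0.1] * 200 + [1.0] * 200, PageHinkley(threshold=5.0))` signals a change a few observations after index 200. A larger `threshold` yields fewer false alarms at the cost of slower detection.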
2009
Authors
Gama, J; Kosina, P;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS
Abstract
This work addresses data stream mining from dynamic environments where the distribution underlying the observations may change over time. In these contexts, learning algorithms must be equipped with change detection mechanisms. Several methods have been proposed that are able to detect and react to concept drift. When a drift is signaled, most approaches use a forgetting mechanism: they release the current model and start learning a new decision model. Nevertheless, it is not rare for concepts from history to reappear, for example with seasonal changes. In this work we present a method that memorizes learnt decision models whenever a concept drift is signaled. The system uses meta-learning techniques that characterize the domain of applicability of previously learnt models. The meta-learner can detect the re-occurrence of contexts and take proactive actions by activating previously learnt models. The main benefit of this approach is that the proposed meta-learner is capable of selecting a similar historical concept, if there is one, without knowledge of the true classes of the examples.
2009
Authors
Gama, J; Rodrigues, PP;
Publication
Studies in Computational Intelligence
Abstract
The most challenging applications of knowledge discovery involve dynamic environments where data flow continuously at high speed and exhibit non-stationary properties. In this chapter we discuss the main challenges and the most relevant issues in knowledge discovery from data streams: incremental learning, cost-performance management, change detection, and novelty detection. We present illustrative algorithms for these learning tasks, and a real-world application illustrating the advantages of stream processing. The chapter ends with some open issues that emerge from this new research area. © 2009 Springer-Verlag Berlin Heidelberg.
2009
Authors
Huang, R; Yang, Q; Pei, J; Gama, J; Meng, X; Li, X;
Publication
ADMA
Abstract
2009
Authors
Gama, J; Rodrigues, PP;
Publication
Encyclopedia of Data Warehousing and Mining, Second Edition (4 Volumes)
Abstract