Publicacoes - INESC TEC

Publicações

Publicações por João Gama

2011

Learning model trees from evolving data streams

Autores
Ikonomovska, E; Gama, J; Dzeroski, S;

Publicação
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
The problem of real-time extraction of meaningful patterns from time-changing data streams is of increasing importance for the machine learning and data mining communities. Regression in time-changing data streams is a relatively unexplored topic, despite the apparent applications. This paper proposes an efficient and incremental stream mining algorithm which is able to learn regression and model trees from possibly unbounded, high-speed and time-changing data streams. The algorithm is evaluated extensively in a variety of settings involving artificial and real data. To the best of our knowledge there is no other general purpose algorithm for incremental learning regression/model trees able to perform explicit change detection and informed adaptation. The algorithm performs online and in real-time, observes each example only once at the speed of arrival, and maintains at any-time a ready-to-use model tree. The tree leaves contain linear models induced online from the examples assigned to them, a process with low complexity. The algorithm has mechanisms for drift detection and model adaptation, which enable it to maintain accurate and updated regression models at any time. The drift detection mechanism exploits the structure of the tree in the process of local change detection. As a response to local drift, the algorithm is able to update the tree structure only locally. This approach improves the any-time performance and greatly reduces the costs of adaptation.

FecharLer Abstract

2008

A review on the combination of binary classifiers in multiclass problems

Autores
Lorena, AC; de Carvalho, ACPLF; Gama, JMP;

Publicação
ARTIFICIAL INTELLIGENCE REVIEW

Abstract
Several real problems involve the classification of data into categories or classes. Given a data set containing data whose classes are known, Machine Learning algorithms can be employed for the induction of a classifier able to predict the class of new data from the same domain, performing the desired discrimination. Some learning techniques are originally conceived for the solution of problems with only two classes, also named binary classification problems. However, many problems require the discrimination of examples into more than two categories or classes. This paper presents a survey on the main strategies for the generalization of binary classifiers to problems with more than two classes, known as multiclass classification problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.

FecharLer Abstract

2011

Best papers from the Fifth International Conference on Advanced Data Mining and Applications (ADMA 2009)

Autores
Pei, JA; Gama, J; Yang, QA; Huang, RH; Li, X;

Publicação
KNOWLEDGE AND INFORMATION SYSTEMS

Abstract

2007

An overview on learning from data streams - Preface

Autores
Gama, J; Rodrigues, P; Aguilar Ruiz, J;

Publicação
NEW GENERATION COMPUTING

Abstract

2012

Where are we going? Predicting the evolution of individuals

Autores
Siddiqui, ZF; Oliveira, M; Gama, J; Spiliopoulou, M;

Publicação
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
When searching for patterns on data streams, we come across perennial (dynamic) objects that evolve over time. These objects are encountered repeatedly and each time with different definition and values. Examples are (a) companies registered at stock exchange and reporting their progress at the end of each year, and (b) students whose performance is evaluated at the end of each semester. On such data, domain experts also pose questions on how the individual objects will evolve: would it be beneficial to invest in a given company, given both the company's individual performance thus far and the drift experienced in the model? Or, how will a given student perform next year, given the performance variations observed thus far? While there is much research on how models evolve/change over time [Ntoutsi et al., 2011a], little is done to predict the change of individual objects when the states are not known a priori. In this work, we propose a framework that learns the clusters to which the objects belong at each moment, uses them as ad hoc states in a state-transition graph, and then learns a mixture model of Markov Chains, which predicts the next most likely state/cluster per object. We report on our evaluation on synthetic and real datasets. © Springer-Verlag Berlin Heidelberg 2012.

FecharLer Abstract

2012

Handling time changing data with adaptive very fast decision rules

Autores
Kosina, P; Gama, J;

Publicação
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Data streams are usually characterized by changes in the underlying distribution generating data. Therefore algorithms designed to work with data streams should be able to detect changes and quickly adapt the decision model. Rules are one of the most interpretable and flexible models for data mining prediction tasks. In this paper we present the Adaptive Very Fast Decision Rules (AVFDR), an on-line, any-time and one-pass algorithm for learning decision rules in the context of time changing data. AVFDR can learn ordered and unordered rule sets. It is able to adapt the decision model via incremental induction and specialization of rules. Detecting local drifts takes advantage of the modularity of rule sets. In AVFDR, each individual rule monitors the evolution of performance metrics to detect concept drift. AVFDR prunes rules that detect drift. This explicit change detection mechanism provides useful information about the dynamics of the process generating data, faster adaption to changes and generates compact rule sets. The experimental evaluation shows this method is able to learn fast and compact rule sets from evolving streams in comparison to alternative methods. © 2012 Springer-Verlag.

FecharLer Abstract