2010
Authors
Vatsavai, RR; Omitaomu, OA; Gama, J; Chawla, NV; Gaber, MM; Ganguly, AR;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
2004
Authors
Kubat, M; Gama, J; Utgoff, P;
Publication
Intelligent Data Analysis
Abstract
2009
Authors
Gama, J; Carvalho, A; Rodrigues, PP; Aguilar, J;
Publication
Proceedings of the ACM Symposium on Applied Computing
Abstract
2009
Authors
Qiang, Y; Ronghuai, H; Jian, P; Gama, J; Xiaofeng, M;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
2005
Authors
Gama, J; Moura Pires, J; Cardoso, M; Marques, NC; Cavique, L;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
2006
Authors
Gama, J; Fernandes, R; Rocha, R;
Publication
Intelligent Data Analysis
Abstract
In this paper we study the problem of constructing accurate decision tree models from data streams. Learning from data streams is an incremental task that requires incremental, online, and any-time learning algorithms. One of the most successful algorithms for mining data streams is VFDT. We have extended VFDT in three directions: the ability to deal with continuous data, the use of more powerful classification techniques at tree leaves, and the ability to detect and react to concept drift. The resulting system, VFDTc, can incorporate and classify new information online, with a single scan of the data, in constant time per example. The most relevant property of our system is its ability to obtain performance similar to that of a standard decision tree algorithm even for medium-sized data sets. This is relevant because of the any-time property. We also extend VFDTc with the ability to deal with concept drift by continuously monitoring differences between two class distributions of the examples: the distribution when a node was built and the distribution in a time window of the most recent examples. We study the sensitivity of VFDTc with respect to drift, noise, the order of examples, and the initial parameters in different problems, and demonstrate its utility on large and medium-sized data sets.
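The drift-monitoring idea in the abstract above (comparing the class distribution seen when a node was built with the distribution in a window of recent examples) can be illustrated with a minimal sketch. This is not the VFDTc implementation: the abstract does not say which statistic or decision rule the authors use, so the total variation distance, the fixed threshold, the window size, and the names `NodeDriftMonitor`, `class_distribution`, and `total_variation` are all assumptions made for illustration.

```python
from collections import Counter, deque


def class_distribution(labels, classes):
    """Return relative class frequencies over `classes` (all zeros if empty)."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total if total else 0.0 for c in classes]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


class NodeDriftMonitor:
    """Hypothetical per-node monitor; not the statistic used in the paper."""

    def __init__(self, reference_labels, classes, window_size=100, threshold=0.3):
        self.classes = list(classes)
        # Class distribution observed when the node was built.
        self.reference = class_distribution(list(reference_labels), self.classes)
        # Sliding window holding only the most recent labels.
        self.window = deque(maxlen=window_size)
        # Assumed fixed cut-off; a real system would likely use a statistical test.
        self.threshold = threshold

    def update(self, label):
        """Add one label; return True if the recent window looks drifted."""
        self.window.append(label)
        if len(self.window) < self.window.maxlen:
            return False  # not enough recent examples to compare yet
        recent = class_distribution(list(self.window), self.classes)
        return total_variation(self.reference, recent) > self.threshold


if __name__ == "__main__":
    monitor = NodeDriftMonitor(["a"] * 80 + ["b"] * 20, ["a", "b"], window_size=10)
    stream = ["a"] * 8 + ["b"] * 2 + ["b"] * 10  # class balance shifts toward "b"
    for i, label in enumerate(stream):
        if monitor.update(label):
            print(f"possible drift at example {i}")
            break
```

Each update costs time proportional to the window size in this sketch; keeping running class counts for the window would bring it down to constant time per example, closer to the per-example cost the abstract describes.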