2003
Authors
Gama, J; Rocha, R; Medas, P
Publication
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Abstract
In this paper we study the problem of constructing accurate decision tree models from data streams. Learning from data streams is an incremental task that requires incremental, online, and any-time learning algorithms. One of the most successful algorithms for mining data streams is VFDT. We extend the VFDT system in two directions: the ability to deal with continuous data and the use of more powerful classification techniques at the tree leaves. The proposed system, VFDTc, can incorporate and classify new information online, with a single scan of the data, in constant time per example. The most relevant property of our system is its ability to achieve performance similar to that of a standard decision tree algorithm even on medium-sized datasets. This matters because of the any-time property. We study the behaviour of VFDTc on different problems and demonstrate its utility on large and medium-sized datasets. A bias-variance analysis shows that VFDTc, compared with C4.5, reduces the variance component. Copyright 2003 ACM.
2003
Authors
Castillo, G; Gama, J; Medas, P
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE
Abstract
Most supervised learning algorithms assume that the target concept is stable over time. Nevertheless, in many real user modeling systems, where data are collected over an extended period of time, the learning task can be complicated by changes in the distribution underlying the data. This problem is known in machine learning as concept drift. The main idea behind Statistical Quality Control is to monitor the stability of one or more quality characteristics of a production process, which generally shows some variation over time. In this paper we present a method for handling concept drift based on Shewhart P-charts in an online framework for supervised learning. We explore two alternative P-charts, which differ only in how they estimate the target value used to set the center line. Experiments with simulated concept drift scenarios in the context of a user modeling prediction task compare the proposed method with other adaptive approaches. The results show that both P-charts consistently recognize concept changes and that the learner can adapt quickly to these changes to maintain its performance level.
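The P-chart idea described above can be illustrated with a minimal Python sketch of a generic Shewhart P-chart applied to a learner's error rate. This is not the authors' implementation. The function names, the batch-based monitoring, the fixed center line `p_bar`, and the 2σ "warning" / 3σ "drift" thresholds are all assumptions made for illustration (3σ is the classic Shewhart choice).

```python
import math


def p_chart_limits(p_bar: float, n: int, k: float = 3.0) -> tuple[float, float]:
    """Return (lower, upper) Shewhart control limits for a proportion.

    p_bar is the center line (expected error rate), n is the batch size,
    and k is the number of standard deviations used for the limits.
    """
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)
    return max(0.0, p_bar - k * sigma), min(1.0, p_bar + k * sigma)


def monitor(batch_error_rates, n, p_bar, warn_k=2.0, drift_k=3.0):
    """Label each batch as 'in-control', 'warning' or 'drift'.

    Only upward deviations are flagged: a rising error rate is what
    suggests the learned concept may be out of date. The warn_k / drift_k
    levels are illustrative assumptions, not values from the paper.
    """
    _, warn_ucl = p_chart_limits(p_bar, n, warn_k)
    _, drift_ucl = p_chart_limits(p_bar, n, drift_k)
    states = []
    for p in batch_error_rates:
        if p > drift_ucl:
            states.append("drift")
        elif p > warn_ucl:
            states.append("warning")
        else:
            states.append("in-control")
    return states


# Usage: batches of 100 predictions, expected error rate 10%.
print(monitor([0.08, 0.12, 0.17, 0.25], n=100, p_bar=0.1))
# prints ['in-control', 'in-control', 'warning', 'drift']
```

With `p_bar = 0.1` and `n = 100`, σ = √(0.1 · 0.9 / 100) = 0.03. Batch error rates above 0.16 therefore raise a warning, and rates above 0.19 signal drift. How `p_bar` is estimated is exactly where the two P-charts in the paper differ. This sketch simply takes it as a parameter.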
2003
Authors
Castillo, G; Gama, J; Breda, AM
Publication
USER MODELING 2003, PROCEEDINGS
Abstract
We present Adaptive Bayes, an adaptive incremental version of Naive Bayes, to model a prediction task based on learning styles in the context of an Adaptive Hypermedia Educational System. Since a student's preferences can change over time, this task is related to the problem known in the machine learning community as concept drift. For this class of problems, an adaptive predictive model that can adapt quickly to changes in the user is desirable. Experimental results suggest that Adaptive Bayes is a good and simple choice for this kind of prediction task in user modeling.
2003
Authors
Fonseca, N; Costa, VS; Silva, F; Camacho, R
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE
Abstract
2003
Authors
Fonseca, N; Rocha, R; Camacho, R; Silva, F
Publication
INDUCTIVE LOGIC PROGRAMMING, PROCEEDINGS
Abstract
This work aims at improving the scalability of memory usage in Inductive Logic Programming (ILP) systems. In this context, we propose two efficient data structures: the Trie, used to represent lists and clauses, and the RL-Tree, a novel data structure used to represent clause coverage. We evaluate their performance in the April system using well-known datasets. Initial results show a substantial reduction in memory usage without incurring extra execution-time overhead. Our proposal is applicable to any ILP system.
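The following minimal Python sketch of a generic prefix trie over sequences of literals illustrates why a trie can save memory when storing many clauses. It is an illustration under stated assumptions, not the Trie or RL-Tree used in April. Clauses are modelled as tuples of literal strings, and the class, method, and clause names are hypothetical. Clauses that begin with the same literals share the corresponding nodes.

```python
class TrieNode:
    """One node per literal position; children keyed by the next literal."""

    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}
        self.terminal = False  # True if a stored clause ends at this node


class Trie:
    """Hypothetical prefix trie storing clauses as sequences of literals."""

    def __init__(self):
        self.root = TrieNode()
        self.node_count = 1  # counts the root as well

    def insert(self, seq):
        node = self.root
        for item in seq:
            nxt = node.children.get(item)
            if nxt is None:
                # Only literals not already on a shared prefix allocate a node.
                nxt = TrieNode()
                node.children[item] = nxt
                self.node_count += 1
            node = nxt
        node.terminal = True

    def __contains__(self, seq):
        node = self.root
        for item in seq:
            node = node.children.get(item)
            if node is None:
                return False
        return node.terminal


# Usage with three hypothetical clause bodies.
clauses = [
    ("p(X)", "q(X,Y)"),
    ("p(X)", "q(X,Y)", "r(Y)"),
    ("p(X)", "s(X)"),
]
trie = Trie()
for c in clauses:
    trie.insert(c)
print(trie.node_count)  # 5: the root plus 4 literal nodes
print(("p(X)", "s(X)") in trie, ("p(X)",) in trie)  # True False
```

The three clauses contain 7 literals in total, but the trie allocates only 4 literal nodes because the shared prefixes `p(X)` and `p(X), q(X,Y)` are stored once.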
2003
Authors
Michalski, RS; Brazdil, P
Publication
Machine Learning
Abstract