Publications

Publications by LIAAD

2003

Accurate decision trees for mining high-speed data streams

Authors
Gama, J; Rocha, R; Medas, P;

Publication
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Abstract
In this paper we study the problem of constructing accurate decision tree models from data streams. Data streams are incremental tasks that require incremental, online, and any-time learning algorithms. One of the most successful algorithms for mining data streams is VFDT. In this paper we extend the VFDT system in two directions: the ability to deal with continuous data and the use of more powerful classification techniques at tree leaves. The proposed system, VFDTc, can incorporate and classify new information online, with a single scan of the data, in constant time per example. The most relevant property of our system is the ability to obtain performance similar to that of a standard decision tree algorithm even for medium-sized datasets. This is relevant due to the any-time property. We study the behaviour of VFDTc in different problems and demonstrate its utility on large and medium-sized data sets. Under a bias-variance analysis we observe that VFDTc, in comparison to C4.5, is able to reduce the variance component. Copyright 2003 ACM.

2003

Adaptation to drifting concepts

Authors
Castillo, G; Gama, J; Medas, P;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
Most supervised learning algorithms assume the stability of the target concept over time. Nevertheless, in many real user modeling systems, where the data is collected over an extended period of time, the learning task can be complicated by changes in the distribution underlying the data. This problem is known in machine learning as concept drift. The main idea behind Statistical Quality Control is to monitor the stability of one or more quality characteristics in a production process which generally shows some variation over time. In this paper we present a method for handling concept drift based on Shewhart P-Charts in an on-line framework for supervised learning. We explore the use of two alternative P-Charts, which differ only in the way they estimate the target value to set the center line. Experiments with simulated concept drift scenarios in the context of a user modeling prediction task compare the proposed method with other adaptive approaches. The results show that both P-Charts consistently recognize concept changes, and that the learner can adapt quickly to these changes to maintain its performance level.
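
As a rough illustration of the P-Chart idea mentioned in this abstract, the sketch below monitors a classifier's per-batch error rate against textbook Shewhart control limits, p ± k·sqrt(p(1 − p)/n). It is not the authors' method: the function names, the fixed center line `p_bar`, the batch size `n`, and the choice k = 3 are all assumptions made for the example, and the paper's two variants differ precisely in how the center line is estimated.

```python
import math

def p_chart_limits(p_bar, n, k=3.0):
    """Return (lower, upper) control limits for a proportion.

    p_bar: assumed center line (target error rate).
    n:     number of examples per batch.
    k:     width of the limits in standard deviations (3 is the usual choice).
    """
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)
    lower = max(0.0, p_bar - k * sigma)
    upper = min(1.0, p_bar + k * sigma)
    return lower, upper

def detect_drift(batch_error_rates, n, p_bar, k=3.0):
    """Return the index of the first batch whose error rate exceeds
    the upper control limit, or None if no batch does."""
    _, upper = p_chart_limits(p_bar, n, k)
    for i, rate in enumerate(batch_error_rates):
        if rate > upper:
            return i
    return None
```

With a center line of 0.1 and batches of 100 examples, the limits are roughly 0.01 and 0.19, so a batch error rate of 0.25 would be flagged as a possible concept change.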

2003

Adaptive Bayes for a student modeling prediction task based on learning styles

Authors
Castillo, G; Gama, J; Breda, AM;

Publication
USER MODELING 2003, PROCEEDINGS

Abstract
We present Adaptive Bayes, an adaptive incremental version of Naive Bayes, to model a prediction task based on learning styles in the context of an Adaptive Hypermedia Educational System. Since the student's preferences can change over time, this task is related to a problem known as concept drift in the machine learning community. For this class of problems an adaptive predictive model, able to adapt quickly to the user's changes, is desirable. The results of the experiments show that Adaptive Bayes seems to be a fine and simple choice for this kind of prediction task in user modeling.

2003

Experimental evaluation of a caching technique for ILP

Authors
Fonseca, N; Costa, VS; Silva, F; Camacho, R;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract

2003

Efficient data structures for inductive logic programming

Authors
Fonseca, N; Rocha, R; Camacho, R; Silva, F;

Publication
INDUCTIVE LOGIC PROGRAMMING, PROCEEDINGS

Abstract
This work aims at improving the scalability of memory usage in Inductive Logic Programming systems. In this context, we propose two efficient data structures: the Trie, used to represent lists and clauses; and the RL-Tree, a novel data structure used to represent clause coverage. We evaluate their performance in the April system using well-known datasets. Initial results show a substantial reduction in memory usage without incurring extra execution time overheads. Our proposal is applicable in any ILP system.
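
To make the memory argument in this abstract concrete, here is a minimal generic trie in which sequences sharing a prefix share nodes. It is only a sketch under assumptions: representing a clause as a tuple of literal strings, and the `Trie` and `TrieNode` classes themselves, are illustrative choices and do not reflect the April system's actual implementation. The RL-Tree is not covered.

```python
class TrieNode:
    """One node per distinct prefix; children are keyed by the next symbol."""
    def __init__(self):
        self.children = {}
        self.terminal = False  # True if a stored sequence ends here

class Trie:
    def __init__(self):
        self.root = TrieNode()
        self.node_count = 1  # counts the root

    def insert(self, seq):
        """Store a sequence, reusing nodes for any prefix already present."""
        node = self.root
        for sym in seq:
            nxt = node.children.get(sym)
            if nxt is None:
                nxt = TrieNode()
                node.children[sym] = nxt
                self.node_count += 1
            node = nxt
        node.terminal = True

    def contains(self, seq):
        """Return True only if exactly this sequence was inserted."""
        node = self.root
        for sym in seq:
            node = node.children.get(sym)
            if node is None:
                return False
        return node.terminal

# Two hypothetical clauses that share their first two literals.
trie = Trie()
trie.insert(("p(X)", "q(X,Y)", "r(Y)"))
trie.insert(("p(X)", "q(X,Y)", "s(Y)"))
```

The second clause adds only one new node, because its first two literals reuse the path created by the first clause.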

2003

Introduction

Authors
Michalski, RS; Brazdil, P;

Publication
Machine Learning

Abstract
