
Publications by LIAAD

2005

An adaptive predictive model for student modeling

Authors
Castillo, G; Gama, J; Breda, AM;

Publication
Advances in Web-Based Education: Personalized Learning Environments

Abstract
This chapter presents an adaptive predictive model for a student modeling prediction task in the context of an adaptive educational hypermedia system (AEHS). The task, which consists in determining which kinds of learning resources are most appropriate for a particular learning style, presents two critical issues. The first is the uncertainty of the information about the student's learning style acquired through psychometric instruments. The second is the change over time of the student's preferences (concept drift). To approach this task, we propose a probabilistic adaptive predictive model that includes a method for handling concept drift based on statistical quality control. We claim that our approach adapts quickly to changes in the student's preferences and that it can be applied successfully to similar user modeling prediction tasks where uncertainty and concept drift are present. © 2006, Idea Group Inc.
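The drift-handling idea in this abstract — monitoring a model with statistical quality control — can be sketched as follows. This is a minimal illustration, not the chapter's exact method: the 2-sigma/3-sigma control limits and the 30-example burn-in are assumptions.

```python
class DriftDetector:
    """Drift detection via statistical quality control: track the model's
    running error rate and flag a drift when it leaves the in-control
    region. Illustrative sketch; thresholds and burn-in are assumptions."""

    def __init__(self, burn_in=30):
        self.n = 0                    # examples seen so far
        self.p = 1.0                  # running error-rate estimate
        self.p_min = float("inf")     # lowest p + s observed so far
        self.s_min = float("inf")
        self.burn_in = burn_in

    def update(self, error):
        """Feed one prediction outcome (1 = error, 0 = correct);
        returns 'in-control', 'warning' or 'drift'."""
        self.n += 1
        self.p += (error - self.p) / self.n            # incremental mean
        s = (self.p * (1 - self.p) / self.n) ** 0.5    # std. error of p
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s         # new best point
        if self.n < self.burn_in:
            return "in-control"                        # not enough data yet
        if self.p + s >= self.p_min + 3 * self.s_min:
            return "drift"      # out of control: adapt/rebuild the model
        if self.p + s >= self.p_min + 2 * self.s_min:
            return "warning"    # performance is degrading
        return "in-control"
```

A caller would reset or retrain the predictive model whenever `update` returns `"drift"`, which is what lets the system track changing student preferences.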

2005

Learning decision trees from dynamic data streams

Authors
Gama, J; Medas, P; Rodrigues, P;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
This paper presents a system for induction of a forest of functional trees from data streams that is able to detect concept drift. The Ultra Fast Forest of Trees (UFFT) is an incremental algorithm that works online, processing each example in constant time and performing a single scan over the training examples. It uses analytical techniques to choose the splitting criteria, and the information gain to estimate the merit of each possible splitting test. For multi-class problems the algorithm grows a binary tree for each possible pair of classes, leading to a forest of trees. Decision nodes and leaves contain naive-Bayes classifiers that play different roles during the induction process. Naive-Bayes classifiers in leaves are used to classify test examples; naive-Bayes classifiers in inner nodes can be used as multivariate splitting tests, if chosen by the splitting criteria, and to detect drift in the distribution of the examples that traverse the node. When a drift is detected, the whole sub-tree rooted at that node is pruned. The use of naive-Bayes classifiers at leaves to classify test examples, as the outcome of splitting tests, and at decision nodes to detect drift is obtained directly from the sufficient statistics required to compute the splitting criteria, with no additional computation. This aspect is a main advantage in the context of high-speed data streams. The methodology was tested with artificial and real-world data sets. The experimental results show very good performance in comparison with a batch decision tree learner, and a high capacity to detect and react to drift. Copyright 2005 ACM.
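The pairwise decomposition described above — one binary tree per pair of classes, combined by voting — can be sketched independently of the tree itself. A minimal illustration, assuming any incremental learner with `learn_one`/`predict_one` stands in for UFFT's functional tree:

```python
from itertools import combinations

class PairwiseForest:
    """Sketch of a one-vs-one forest: one binary learner per pair of
    classes, majority voting at prediction time. `make_learner` is a
    placeholder factory for the per-pair tree, not UFFT's actual tree."""

    def __init__(self, classes, make_learner):
        self.models = {pair: make_learner()
                       for pair in combinations(classes, 2)}

    def learn_one(self, x, y):
        # each example feeds only the trees whose class pair includes y
        for (a, b), model in self.models.items():
            if y in (a, b):
                model.learn_one(x, y)

    def predict_one(self, x):
        # every pairwise model casts one vote; the most-voted class wins
        votes = {}
        for model in self.models.values():
            c = model.predict_one(x)
            votes[c] = votes.get(c, 0) + 1
        return max(votes, key=votes.get)
```

For k classes this grows k(k-1)/2 binary models, which is the "forest" the abstract refers to.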

2005

A study on Error Correcting Output Codes

Authors
Pimenta, E; Gama, J;

Publication
2005 Portuguese Conference on Artificial Intelligence, Proceedings

Abstract
Recent work points towards advantages in decomposing multi-class decision problems into multiple binary problems. There are several strategies for this decomposition; the most used and studied are All-vs-All, One-vs-All, and Error-Correcting Output Codes (ECOCs). ECOCs appeared in the scope of telecommunications thanks to their capacity to correct transmission errors, a capacity due to the redundancy introduced when encoding messages. ECOCs are binary words and can be adapted for use in classification problems. They must, however, respect some specific constraints: the binary words must be as far apart as possible, equal or complementary columns cannot exist, and no column can be constant (either 1 or 0). Given two ECOCs satisfying these constraints, which one is more appropriate for classification purposes? In this work we suggest a function for evaluating the quality of ECOCs. This function is used to guide the search in the persecution algorithm, a new method to generate ECOCs for classification purposes. The binary words that form an ECOC can have several dimensions for the same number of classes they are intended to represent, and the number of possible dimensions grows exponentially with the number of classes of the multi-class problem. In this paper we present a method to choose the dimension of the ECOC that assures a good tradeoff between redundancy and error correction capacity. The method is evaluated on a set of benchmark classification problems, and the experimental results are competitive with standard decomposition methods.
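The ECOC mechanics the abstract relies on can be shown concretely. The sketch below uses the standard exhaustive code for 4 classes (a textbook example, not the paper's codes): each class gets a 7-bit codeword, a test example is classified by running one binary classifier per column and decoding to the nearest codeword in Hamming distance, and the minimum pairwise distance determines how many classifier errors the code can absorb.

```python
from itertools import combinations

# Exhaustive 4-class ECOC matrix (illustrative); rows are class codewords,
# columns are binary sub-problems. No column is constant and no two
# columns are equal or complementary, matching the constraints above.
CODES = {
    "c1": (1, 1, 1, 1, 1, 1, 1),
    "c2": (0, 0, 0, 0, 1, 1, 1),
    "c3": (0, 0, 1, 1, 0, 0, 1),
    "c4": (0, 1, 0, 1, 0, 1, 0),
}

def hamming(a, b):
    """Number of positions in which two binary words differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(predicted_bits):
    """Nearest-codeword decoding of the column classifiers' outputs."""
    return min(CODES, key=lambda c: hamming(CODES[c], predicted_bits))

def min_distance(codes):
    """Minimum pairwise Hamming distance; the code corrects up to
    floor((d_min - 1) / 2) single-column classifier errors."""
    return min(hamming(codes[a], codes[b])
               for a, b in combinations(codes, 2))
```

With d_min = 4, one wrong column classifier still decodes to the correct class, which is the redundancy/error-correction tradeoff the paper's dimension-selection method balances.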

2005

Partition incremental discretization

Authors
Pinto, C; Gama, J;

Publication
2005 Portuguese Conference on Artificial Intelligence, Proceedings

Abstract
In this paper we propose a new method to perform incremental discretization. The approach splits the task into two layers. The first layer receives the sequence of input data and stores statistics of these data, using a higher number of intervals than is usually required. The final discretization is generated by the second layer, based on the statistics stored by the first. The proposed architecture processes streaming examples in a single scan, in constant time and space, even for infinite sequences of examples. We demonstrate with examples that incremental discretization achieves better results than batch discretization, while maintaining the performance of learning algorithms. The proposed method is much more appropriate for evaluating incremental algorithms and for problems where data flows continuously, as in most recent data mining applications.
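The two-layer scheme can be sketched as follows. This is an illustration of the idea, not the paper's algorithm: layer 1 keeps fine-grained counts over many fixed intervals in a single pass, and layer 2 derives a coarser equal-frequency discretization from those counts on demand. The fixed value range and parameter names are assumptions.

```python
class TwoLayerDiscretizer:
    """Sketch of two-layer incremental discretization: a fine histogram
    updated in constant time per example (layer 1), from which final
    cut points are computed without revisiting the data (layer 2)."""

    def __init__(self, lo, hi, n_fine=100):
        self.lo, self.hi = lo, hi
        self.counts = [0] * n_fine   # layer-1 statistics
        self.n = 0

    def update(self, x):
        # layer 1: constant-time update of the fine histogram
        i = int((x - self.lo) / (self.hi - self.lo) * len(self.counts))
        self.counts[min(max(i, 0), len(self.counts) - 1)] += 1
        self.n += 1

    def cut_points(self, n_bins):
        # layer 2: equal-frequency cuts from layer-1 counts alone
        target, cuts, acc = self.n / n_bins, [], 0
        width = (self.hi - self.lo) / len(self.counts)
        for i, c in enumerate(self.counts):
            acc += c
            if acc >= target * (len(cuts) + 1) and len(cuts) < n_bins - 1:
                cuts.append(self.lo + (i + 1) * width)
        return cuts
```

Because layer 2 only reads the layer-1 counters, the final discretization can be regenerated at any point in the stream at negligible cost.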

2005

EKDB&W'05: Workshop on extraction of knowledge from databases and warehouses

Authors
Gama, J; Pires, JM; Cardoso, M; Marques, NC; Cavique, L;

Publication
2005 Portuguese Conference on Artificial Intelligence, Proceedings

Abstract

2005

Bias management of Bayesian network classifiers

Authors
Castillo, G; Gama, J;

Publication
Discovery Science, Proceedings

Abstract
The purpose of this paper is to describe an adaptive algorithm for improving the performance of Bayesian Network Classifiers (BNCs) in an on-line learning framework. Instead of choosing a priori a particular model class of BNCs, our adaptive algorithm scales up the model's complexity by gradually increasing the number of allowable dependencies among features. Starting with the simple Naive Bayes structure, it uses simple decision rules, based on qualitative information about the performance dynamics, to decide when it makes sense to take the next step in the spectrum of feature dependencies and start searching for a more complex classifier. Results of experiments conducted with the class of Dependence Bayesian Classifiers on three large datasets show that our algorithm is able to select a model with the appropriate complexity for the current amount of training data, thus balancing the computational cost of updating a model with the benefit of increased accuracy.
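The bias-management decision rule described above — escalate model complexity only when the learning curve flattens — can be sketched as follows. This is an illustrative controller, not the paper's exact rules: the complexity ladder, window-based accuracy reporting, and the minimum-gain threshold are assumptions.

```python
class BiasManager:
    """Sketch of adaptive bias management: start with the simplest BNC
    structure and move up a complexity ladder when the improvement in
    accuracy between successive windows of examples becomes negligible."""

    LADDER = ["naive-bayes", "tan", "k-dependence-2"]  # illustrative classes

    def __init__(self, min_gain=0.01):
        self.level = 0          # index into LADDER
        self.min_gain = min_gain
        self.history = []       # per-window accuracies of current model

    def report_window_accuracy(self, acc):
        """Call once per window of examples; returns the model class
        the learner should use for the next window."""
        self.history.append(acc)
        if len(self.history) >= 2:
            gain = self.history[-1] - self.history[-2]
            # learning curve flattened: allow one step up the ladder
            if gain < self.min_gain and self.level < len(self.LADDER) - 1:
                self.level += 1
                self.history.clear()   # new model class, new curve
        return self.LADDER[self.level]
```

The point of the rule is the cost/benefit balance the abstract mentions: a more expensive structure is only adopted once the cheaper one has stopped paying off on the current amount of training data.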
