Publications

Publications by LIAAD

2005

A weighted rank measure of correlation

Authors
Da Costa, JP; Soares, C;

Publication
AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS

Abstract
Spearman's rank correlation coefficient is not entirely suitable for measuring the correlation between two rankings in some applications because it treats all ranks equally. In 2000, Blest proposed an alternative measure of correlation that gives more importance to higher ranks but has some drawbacks. This paper proposes a weighted rank measure of correlation that weights the distance between two ranks using a linear function of those ranks, giving more importance to higher ranks than lower ones. It analyses its distribution and provides a table of critical values to test whether a given value of the coefficient is significantly different from zero. The paper also summarizes a number of applications for which the new measure is more suitable than Spearman's.
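
The idea of weighting rank differences by a linear function of the ranks can be illustrated with a short sketch. The abstract does not state the formula, so the weight `(n - R_i + 1) + (n - Q_i + 1)` and the normalising constant `n^4 + n^3 - n^2 - n` used below are assumptions for illustration; consult the paper for the exact definition. Rank 1 is taken to be the highest rank, and the function names are hypothetical.

```python
from typing import Sequence


def spearman(r: Sequence[int], q: Sequence[int]) -> float:
    """Spearman's coefficient for two rankings without ties."""
    n = len(r)
    d2 = sum((a - b) ** 2 for a, b in zip(r, q))
    return 1 - 6 * d2 / (n ** 3 - n)


def weighted_rank_correlation(r: Sequence[int], q: Sequence[int]) -> float:
    """Weighted variant: each squared rank difference is multiplied by a
    linear weight that is larger near the top (rank 1) of either ranking.
    The weight and the normalising constant are assumptions of this sketch."""
    n = len(r)
    weighted = sum(
        (a - b) ** 2 * ((n - a + 1) + (n - b + 1)) for a, b in zip(r, q)
    )
    return 1 - 6 * weighted / (n ** 4 + n ** 3 - n ** 2 - n)


reference = [1, 2, 3, 4]
top_swap = [2, 1, 3, 4]     # disagreement at the top of the ranking
bottom_swap = [1, 2, 4, 3]  # disagreement at the bottom of the ranking

# Spearman scores both disagreements identically; the weighted measure
# penalises the disagreement at the top more heavily.
print(spearman(reference, top_swap), spearman(reference, bottom_swap))
print(weighted_rank_correlation(reference, top_swap),
      weighted_rank_correlation(reference, bottom_swap))
```

Under these assumptions the coefficient is 1 for identical rankings and -1 for fully reversed ones, as with Spearman's coefficient.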

2005

An adaptive predictive model for student modeling

Authors
Castillo, G; Gama, J; Breda, AM;

Publication
Advances in Web-Based Education: Personalized Learning Environments

Abstract
This chapter presents an adaptive predictive model for a student modeling prediction task in the context of an adaptive educational hypermedia system (AEHS). The task, which consists of determining which kinds of learning resources are most appropriate to a particular learning style, raises two critical issues. The first is the uncertainty of the information about the student's learning style acquired by psychometric instruments. The second is the change over time in the student's preferences (concept drift). To approach this task, we propose a probabilistic adaptive predictive model that includes a method to handle concept drift based on statistical quality control. We claim that our approach is able to adapt quickly to changes in the student's preferences and that it could be used successfully in similar user modeling prediction tasks where uncertainty and concept drift are present. © 2006, Idea Group Inc.
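
One common way to base drift handling on statistical quality control is to monitor the model's online error rate with a control chart. The sketch below is a generic illustration of that idea, not the chapter's actual method: the class name, the 2-sigma warning and 3-sigma drift limits, and the `min_samples` warm-up are all assumptions.

```python
import math


class ErrorRateMonitor:
    """Control-chart style monitor for a stream of prediction outcomes.

    Hypothetical helper: tracks the running error rate p and its standard
    deviation s, remembers the lowest p + s seen so far, and signals when
    the current level rises too far above that baseline.
    """

    def __init__(self, warn: float = 2.0, drift: float = 3.0,
                 min_samples: int = 30) -> None:
        self.warn = warn
        self.drift = drift
        self.min_samples = min_samples
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error: bool) -> str:
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return "in_control"  # too few observations to judge
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s  # new best (baseline) level
        if p + s > self.p_min + self.drift * self.s_min:
            return "drift"
        if p + s > self.p_min + self.warn * self.s_min:
            return "warning"
        return "in_control"


monitor = ErrorRateMonitor()
# Stable phase: one error in every ten predictions.
stable = [monitor.update(i % 10 == 9) for i in range(200)]
# Changed preferences: every prediction is now wrong.
changed = [monitor.update(True) for _ in range(100)]
print("drift" in stable, "drift" in changed)
```

In an adaptive system, a "drift" signal would typically trigger relearning the model from recent examples; that reaction is outside the scope of this sketch.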

2005

Learning decision trees from dynamic data streams

Authors
Gama, J; Medas, P; Rodrigues, P;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
This paper presents a system for inducing a forest of functional trees from data streams that is able to detect concept drift. The Ultra Fast Forest of Trees (UFFT) is an incremental algorithm that works online, processing each example in constant time and performing a single scan over the training examples. It uses analytical techniques to choose the splitting criteria, and the information gain to estimate the merit of each possible splitting-test. For multi-class problems the algorithm grows a binary tree for each possible pair of classes, leading to a forest of trees. Decision nodes and leaves contain naive-Bayes classifiers playing different roles during the induction process. Naive-Bayes classifiers in leaves are used to classify test examples; naive-Bayes classifiers in inner nodes can be used as multivariate splitting-tests if chosen by the splitting criteria, and are also used to detect drift in the distribution of the examples that traverse the node. When a drift is detected, the entire sub-tree rooted at that node is pruned. The use of naive-Bayes classifiers at leaves to classify test examples, the use of splitting-tests based on the outcome of naive-Bayes, and the use of naive-Bayes classifiers at decision nodes to detect drift are directly obtained from the sufficient statistics required to compute the splitting criteria, without any additional computations. This aspect is a major advantage in the context of high-speed data streams. This methodology was tested with artificial and real-world data sets. The experimental results show a very good performance in comparison to a batch decision tree learner, and a high capacity to detect and react to drift. Copyright 2005 ACM.
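
Two ingredients mentioned in the abstract can be sketched in isolation: one binary problem per pair of classes, and information gain computed from per-class counts (the kind of sufficient statistics a node can maintain incrementally). This is a minimal illustration with hypothetical function names, not the UFFT implementation.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def class_pairs(classes: Sequence[str]) -> List[Tuple[str, str]]:
    """One binary problem per unordered pair of classes: k * (k - 1) / 2."""
    return list(combinations(classes, 2))


def entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a class-count vector."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(left: Sequence[int], right: Sequence[int]) -> float:
    """Gain of a binary split, given the class counts on each side."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    if n == 0:
        return 0.0
    parent = [a + b for a, b in zip(left, right)]
    return (entropy(parent)
            - n_left / n * entropy(left)
            - n_right / n * entropy(right))


print(class_pairs(["a", "b", "c"]))      # three pairwise problems
print(information_gain([5, 0], [0, 5]))  # split separates the two classes
print(information_gain([5, 5], [5, 5]))  # split carries no information
```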

2005

A study on Error Correcting Output Codes

Authors
Pimenta, E; Gama, J;

Publication
2005 Portuguese Conference on Artificial Intelligence, Proceedings

Abstract
Recent work points towards advantages in decomposing multi-class decision problems into multiple binary problems. There are several strategies for this decomposition. The most used and studied are All-vs-All, One-vs-All and Error Correcting Output Codes (Ecocs). Ecocs appeared in the scope of telecommunications thanks to their capacity to correct transmission errors. This capacity is due to the redundancy introduced when encoding messages. Ecocs are binary words and can be adapted for use in classification problems. They must, however, respect some specific constraints. The binary words must be as far apart as possible. Equal or complementary columns cannot exist and no column can be constant (either 1 or 0). Given two Ecocs satisfying these constraints, which one is more appropriate for classification purposes? In this work we suggest a function for evaluating the quality of Ecocs. This function is used to guide the search in the persecution algorithm, a new method to generate Ecocs for classification purposes. The binary words that form the Ecocs can have several dimensions for the same number of classes that they are intended to represent. The growth of these possible dimensions is exponential with the number of classes of the multi-class problem. In this paper we present a method to choose the dimension of the Ecoc that ensures a good tradeoff between redundancy and error correction capacity. The method is evaluated on a set of benchmark classification problems. Experimental results are competitive against standard decomposition methods.
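
The constraints listed in the abstract can be checked mechanically. The sketch below assumes a code matrix with one row (binary word) per class and one column per binary problem; the function names are hypothetical, and this is not the paper's quality function or its persecution algorithm.

```python
from itertools import combinations
from typing import List, Sequence

CodeMatrix = Sequence[Sequence[int]]  # rows: class codewords, entries 0/1


def min_row_distance(code: CodeMatrix) -> int:
    """Smallest Hamming distance between any two class codewords."""
    return min(
        sum(a != b for a, b in zip(r1, r2))
        for r1, r2 in combinations(code, 2)
    )


def column_violations(code: CodeMatrix) -> List[str]:
    """Report constant, equal, or complementary columns."""
    columns = list(zip(*code))
    problems = []
    for j, col in enumerate(columns):
        if len(set(col)) == 1:
            problems.append(f"column {j} is constant")
    for (i, a), (j, b) in combinations(enumerate(columns), 2):
        if a == b:
            problems.append(f"columns {i} and {j} are equal")
        elif all(x != y for x, y in zip(a, b)):
            problems.append(f"columns {i} and {j} are complementary")
    return problems


one_vs_all = [[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]]
d = min_row_distance(one_vs_all)
# A code with minimum distance d can correct up to (d - 1) // 2 bit errors.
print(d, (d - 1) // 2, column_violations(one_vs_all))
```

For the One-vs-All matrix above the minimum distance is 2, so no single wrong binary classifier can be corrected, which is the usual motivation for adding redundant columns.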

2005

Partition incremental discretization

Authors
Pinto, C; Gama, J;

Publication
2005 Portuguese Conference on Artificial Intelligence, Proceedings

Abstract
In this paper we propose a new method to perform incremental discretization. This approach consists of splitting the task into two layers. The first layer receives the sequence of input data and stores statistics of this data, using a higher number of intervals than is usually required. The final discretization is generated by the second layer, based on the statistics stored by the previous layer. The proposed architecture processes streaming examples in a single scan, in constant time and space, even for infinite sequences of examples. We demonstrate with examples that incremental discretization achieves better results than batch discretization, maintaining the performance of learning algorithms. The proposed method is much more appropriate for evaluating incremental algorithms, and for problems where data flows continuously, as in most recent data mining applications.
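
The two-layer architecture can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: layer 1 is taken to be an equal-width histogram over a value range known in advance, layer 2 is taken to produce equal-frequency cut points, and the class and method names are hypothetical.

```python
from typing import List


class TwoLayerDiscretizer:
    """Layer 1 keeps fine-grained counts; layer 2 merges them on demand."""

    def __init__(self, low: float, high: float, n_fine_bins: int = 100) -> None:
        self.low = low
        self.width = (high - low) / n_fine_bins
        self.counts = [0] * n_fine_bins  # fixed memory, independent of stream length

    def update(self, x: float) -> None:
        """Layer 1: constant-time update for one streaming example."""
        idx = int((x - self.low) / self.width)
        idx = min(max(idx, 0), len(self.counts) - 1)  # clamp out-of-range values
        self.counts[idx] += 1

    def equal_frequency_cuts(self, k: int) -> List[float]:
        """Layer 2: derive up to k - 1 cut points from the stored counts."""
        total = sum(self.counts)
        if total == 0:
            return []
        step = total / k
        cuts: List[float] = []
        cumulative = 0
        next_rank = 1
        for i, c in enumerate(self.counts):
            cumulative += c
            if next_rank < k and cumulative >= next_rank * step:
                cuts.append(self.low + (i + 1) * self.width)
                # Skip targets already passed by a heavily populated bin.
                while next_rank < k and cumulative >= next_rank * step:
                    next_rank += 1
        return cuts


disc = TwoLayerDiscretizer(low=0.0, high=100.0, n_fine_bins=100)
for value in range(100):  # stand-in for a data stream
    disc.update(float(value))
print(disc.equal_frequency_cuts(4))
```

Because layer 2 reads only the stored counts, the final number of intervals can be changed at any time without revisiting the stream.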

2005

EKDB&W'05: Workshop on extraction of knowledge from databases and warehouses

Authors
Gama, J; Pires, JM; Cardoso, M; Marques, NC; Cavique, L;

Publication
2005 Portuguese Conference on Artificial Intelligence, Proceedings

Abstract