Publications

Publications by LIAAD

2004

Forest trees for on-line data

Authors
Gama, J; Medas, P; Rocha, R;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
This paper presents a hybrid adaptive system for the induction of a forest of trees from data streams. The Ultra Fast Forest Tree system (UFFT) is an incremental algorithm that processes each example in constant time, works online, and uses the Hoeffding bound to decide when to install a splitting test in a leaf, turning it into a decision node. Our system has been designed for continuous data. It uses analytical techniques to choose the splitting criteria, and the information gain to estimate the merit of each possible splitting test. The number of examples required to evaluate the splitting criteria is sound, based on the Hoeffding bound. For multiclass problems, the algorithm builds a binary tree for each possible pair of classes, leading to a forest of trees. During the training phase the algorithm maintains a short-term memory. Given a data stream, a fixed number of the most recent examples are maintained in a data structure that supports constant-time insertion and deletion. When a test is installed, a leaf is transformed into a decision node with two descendant leaves. The sufficient statistics of these leaves are initialized with the examples in the short-term memory that fall at these leaves. We study the behavior of UFFT on different problems. The experimental results show that UFFT is competitive with a batch decision tree learner on large and medium datasets.
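As a rough illustration of the Hoeffding-bound test and the short-term memory mentioned in this abstract, here is a minimal Python sketch. It is not taken from the paper. The split rule (compare the gap between the two best information gains against the bound) follows the usual Hoeffding-tree convention and is an assumption here, as are the function names, the default `delta`, and the memory size. For a two-class tree the information gain lies in [0, 1], so `value_range` defaults to 1.

```python
import math
from collections import deque


def hoeffding_bound(value_range, delta, n):
    """Hoeffding epsilon: with probability 1 - delta, the true mean of a
    variable with range `value_range` is within epsilon of the mean
    observed over n independent examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n, delta=1e-6, value_range=1.0):
    """Assumed split rule: install the test once the observed advantage of
    the best candidate over the runner-up exceeds the Hoeffding epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


def make_short_term_memory(size):
    """Fixed-size buffer of the most recent examples. A deque with maxlen
    gives constant-time append and drops the oldest item automatically."""
    return deque(maxlen=size)


if __name__ == "__main__":
    memory = make_short_term_memory(3)   # hypothetical, tiny size for the demo
    for example in range(5):
        memory.append(example)
    print(list(memory))                  # only the 3 most recent examples remain
    print(should_split(0.30, 0.10, n=1000))
    print(should_split(0.30, 0.10, n=50))
```

With the same observed gains, the bound tightens as more examples arrive, so a leaf that cannot be split after 50 examples may become splittable after 1000.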

2004

On data and algorithms: Understanding inductive performance

Authors
Kalousis, A; Gama, J; Hilario, M;

Publication
MACHINE LEARNING

Abstract
In this paper we address two symmetrical issues: the discovery of similarities among classification algorithms, and among datasets. Both are addressed on the basis of error measures, which we use to define the error correlation between two algorithms and to determine the relative performance of a list of algorithms. We use the first to discover similarities between learners, and both of them to discover similarities between datasets. The latter sketch maps of the dataset space. Regions within each map exhibit specific patterns of error correlation or relative performance. To understand the factors that determine these regions, we describe them using simple characteristics of the datasets. Descriptions of each region are given in terms of the distributions of dataset characteristics within it.

2004

Learning in Dynamic Environments: Decision Trees for Data Streams

Authors
Gama, J; Medas, P;

Publication
Pattern Recognition in Information Systems, Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems, PRIS 2004, In conjunction with ICEIS 2004, Porto, Portugal, April 2004

2004

Incremental learning and concept drift: Editor's introduction

Authors
Kubat, M; Gama, J; Utgoff, P;

Publication
Intelligent Data Analysis

2004

On avoiding redundancy in inductive logic programming

Authors
Fonseca, N; Costa, VS; Silva, F; Camacho, R;

Publication
INDUCTIVE LOGIC PROGRAMMING, PROCEEDINGS

Abstract
ILP systems induce first-order clausal theories by searching very large hypothesis spaces that contain redundant hypotheses. Generating redundant hypotheses may prevent these systems from finding good models and increases the time needed to induce them. In this paper we propose a classification of hypothesis redundancy and show how expert knowledge can be provided to an ILP system to avoid it. Experimental results show that the number of hypotheses generated and the execution time are reduced when expert knowledge is used to avoid redundancy.

2004

Introduction to the special issue on meta-learning

Authors
Giraud-Carrier, C; Vilalta, R; Brazdil, P;

Publication
MACHINE LEARNING

Abstract