Publications

Publications by LIAAD

2009

UCI plus plus : Improved Support for Algorithm Selection Using Datasetoids

Authors
Soares, C;

Publication
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS

Abstract
As companies employ a larger number of models, the problem of algorithm (and parameter) selection is becoming increasingly important. Two approaches to obtain empirical knowledge that is useful for that purpose are empirical studies and metalearning. However, most empirical (meta)knowledge is obtained from a, relatively small set, of datasets. In this paper, we propose a method to obtain a large number of datasets which is based on a simple transformation of existing datasets, referred to as datasetoids. We test our approach on the problem of using metalearning to predict when to prune decision trees. The results show significant; improvement when using datasetoids. Additionally, we identify a number of potential anomalies in the generated datasetoids and propose methods to solve them.

CloseRead Abstract

2009

Meta-Learning

Authors
Carrier, CGG; Brazdil, P; Soares, C; Vilalta, R;

Publication
Encyclopedia of Data Warehousing and Mining, Second Edition (4 Volumes)

Abstract

2009

Cognitive Technologies: Preface

Authors
Brazdil, P; Giraud Carrier, C; Soares, C; Vilalta, R;

Publication
Cognitive Technologies

Abstract

2009

Evaluating algorithms that learn from data streams

Authors
Gama, J; Rodrigues, PP; Sebastião, R;

Publication
Proceedings of the 2009 ACM Symposium on Applied Computing (SAC), Honolulu, Hawaii, USA, March 9-12, 2009

Abstract
Learning from data streams is a research area of increasing importance. Nowadays, several stream learning algorithms have been developed. Most of them learn decision models that continuously evolve over time, run in resource-aware environments, and detect and react to changes in the environment generating data. One important issue, not yet conveniently addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. In this paper we propose a general framework for assessing the quality of streaming learning algorithms. We defend the use of Predictive Sequential error estimates over a sliding window to assess performance of learning algorithms that learn from open-ended data streams in non-stationary environments. This paper studies properties of convergence and methods to comparatively assess algorithms performance. Copyright 2009 ACM.

CloseRead Abstract

2009

Adaptive Bayesian network classifiers

Authors
Castillo, G; Gama, J;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
This paper is concerned with adaptive learning algorithms for Bayesian network classifiers in a prequential (on-line) learning scenario. In this scenario, new data is available over time. An efficient supervised learning algorithm must be able to improve its predictive accuracy by incorporating the incoming data, while optimizing the cost of updating. However, if the process is not strictly stationary, the target concept could change over time. Hence, the predictive model should be adapted quickly to these changes. The main contribution of this work is a proposal of an unified, adaptive prequential framework for supervised learning called AdPreqFr4SL, which attempts to handle the cost-performance trade-off and deal with concept drift. Starting with the simple Naive Bayes, we scale up the complexity by gradually increasing the maximum number of allowable attribute dependencies, and then by searching for new dependences in the extended search space. Since updating the structure is a costly task, we use new data to primarily adapt the parameters. We adapt the structure only when is actually necessary. The method for handling concept drift is based on the Shewhart P-Chart. We experimentally prove the advantages of using the AdPreqFr4SL in comparison with its non-adaptive versions.

CloseRead Abstract

2009

Knowledge discovery from data streams Introduction

Authors
Gama, J; Ganguly, A; Omitaomu, O; Vatsavai, R; Gaber, M;

Publication
INTELLIGENT DATA ANALYSIS

Abstract