
Publications by LIAAD

2000

Algoritmos para la clasificación piramidal simbólica (Algorithms for symbolic pyramidal classification)

Authors
Rodríguez, O; Brito, MP; Diday, E;

Publication
Revista de Matemática: Teoría y Aplicaciones

Abstract

2000

Zoomed Ranking: Selection of Classification Algorithms Based on Relevant Performance Information

Authors
Soares, C; Brazdil, PB;

Publication
LECTURE NOTES IN COMPUTER SCIENCE

Abstract
Given the wide variety of available classification algorithms and the volume of data today's organizations need to analyze, selecting the right algorithm for a new problem is an important issue. In this paper we present a combination of techniques to address this problem. The first, zooming, analyzes a given dataset and selects relevant (similar) datasets that were processed by the candidate algorithms in the past. This process is based on the concept of distance, calculated from several dataset characteristics. The information about the performance of the candidate algorithms on the selected datasets is then processed by a second technique, a ranking method. Such a method uses performance information to generate advice in the form of a ranking, indicating which algorithms should be applied in which order. Here we propose the adjusted ratio of ratios ranking method, which takes into account not only the accuracy but also the time performance of the candidate algorithms. The generalization power of this ranking method is analyzed using an appropriately defined methodology. The experimental results indicate that, on average, better results are obtained with zooming than without it.
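The zooming step can be sketched roughly as follows. The meta-features and dataset values below are hypothetical, and the unweighted Euclidean distance is a simplification standing in for the paper's distance measure:

```python
import math

# Hypothetical meta-feature vectors describing past datasets (e.g. number
# of examples, number of attributes, class entropy) -- illustrative values.
datasets = {
    "d1": [1000, 20, 0.9],
    "d2": [950, 18, 0.85],
    "d3": [50000, 3, 0.2],
}

def distance(a, b):
    # Unweighted Euclidean distance between meta-feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def zoom(new_dataset, k=2):
    # "Zooming": keep only the k past datasets most similar to the new one;
    # only performance results on these are passed to the ranking method.
    ranked = sorted(datasets, key=lambda d: distance(datasets[d], new_dataset))
    return ranked[:k]

print(zoom([980, 19, 0.88]))  # -> ['d1', 'd2']
```

The ranking method (adjusted ratio of ratios, which trades off accuracy against time) would then be applied only to the algorithms' results on the selected datasets.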

2000

A comparison of ranking methods for classification algorithm selection

Authors
Brazdil, PB; Soares, C;

Publication
MACHINE LEARNING: ECML 2000

Abstract
We investigate the problem of using past performance information to select an algorithm for a given classification problem. We present three ranking methods for that purpose: average ranks, success rate ratios and significant wins. We also analyze the problem of evaluating and comparing these methods. The evaluation technique used is based on a leave-one-out procedure. On each iteration, the method generates a ranking using the results obtained by the algorithms on the training datasets. This ranking is then evaluated by calculating its distance from the ideal ranking built using the performance information on the test dataset. The distance measure adopted here, average correlation, is based on Spearman's rank correlation coefficient. To compare ranking methods, a combination of Friedman's test and Dunn's multiple comparison procedure is adopted. When applied to the methods presented here, these tests indicate that the success rate ratios and average ranks methods perform better than significant wins.
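The average ranks method and the Spearman-based evaluation can be sketched as follows. The rank table, helper names, and example values are illustrative, and the leave-one-out loop is omitted:

```python
def average_ranks(rank_table):
    # rank_table maps dataset -> {algorithm: rank obtained on that dataset}.
    # The recommended ranking orders algorithms by mean rank (lower is better).
    algs = list(next(iter(rank_table.values())))
    mean = {a: sum(row[a] for row in rank_table.values()) / len(rank_table)
            for a in algs}
    return sorted(algs, key=lambda a: mean[a])

def spearman(ranking_a, ranking_b):
    # Spearman's rank correlation between two tie-free orderings of the same
    # items: 1 for identical rankings, -1 for exactly reversed ones.
    n = len(ranking_a)
    pos = {item: i for i, item in enumerate(ranking_b)}
    d2 = sum((i - pos[item]) ** 2 for i, item in enumerate(ranking_a))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

ranks = {"d1": {"c45": 1, "nb": 2, "knn": 3},
         "d2": {"c45": 2, "nb": 3, "knn": 1}}
recommended = average_ranks(ranks)   # ['c45', 'knn', 'nb']
ideal = ["c45", "nb", "knn"]         # ideal ranking on the test dataset
print(spearman(recommended, ideal))  # 0.5
```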

2000

Measures to evaluate rankings of classification algorithms

Authors
Soares, C; Brazdil, P; Costa, J;

Publication
DATA ANALYSIS, CLASSIFICATION, AND RELATED METHODS

Abstract
Due to the wide variety of algorithms for supervised classification originating from several research areas, selecting one of them to apply on a given problem is not a trivial task. Recently several methods have been developed to create rankings of classification algorithms based on their previous performance. Therefore, it is necessary to develop techniques to evaluate and compare those methods. We present three measures to evaluate rankings of classification algorithms, give examples of their use and discuss their characteristics.

2000

Cascade generalization

Authors
Gama, J; Brazdil, P;

Publication
MACHINE LEARNING

Abstract
Using multiple classifiers for increasing learning accuracy is an active research area. In this paper we present two related methods for merging classifiers. The first method, Cascade Generalization, couples classifiers loosely. It belongs to the family of stacking algorithms. The basic idea of Cascade Generalization is to use sequentially the set of classifiers, at each step performing an extension of the original data by the insertion of new attributes. The new attributes are derived from the probability class distribution given by a base classifier. This constructive step extends the representational language for the high level classifiers, relaxing their bias. The second method exploits tight coupling of classifiers, by applying Cascade Generalization locally. At each iteration of a divide and conquer algorithm, a reconstruction of the instance space occurs by the addition of new attributes. Each new attribute represents the probability that an example belongs to a class given by a base classifier. We have implemented three Local Generalization Algorithms. The first merges a linear discriminant with a decision tree, the second merges a naive Bayes with a decision tree, and the third merges a linear discriminant and a naive Bayes with a decision tree. All the algorithms show an increase of performance, when compared with the corresponding single models. Cascade also outperforms other methods for combining classifiers, like Stacked Generalization, and competes well against Boosting at statistically significant confidence levels.
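The constructive step of Cascade Generalization can be illustrated with a minimal sketch. The base classifier below is a hypothetical toy frequency model, not one of the paper's learners (linear discriminant, naive Bayes, decision tree):

```python
# Cascade Generalization's constructive step: each example is extended with
# the class-probability distribution produced by a base classifier, and the
# extended data is passed to the next-level learner.

def toy_base_probabilities(x):
    # Hypothetical base classifier: returns P(class | x) for two classes.
    p_pos = 1.0 if x[0] > 0 else 0.25
    return [p_pos, 1.0 - p_pos]

def extend(dataset):
    # Append the base classifier's probability outputs as new attributes,
    # enlarging the representational language of the next classifier.
    return [x + toy_base_probabilities(x) for x in dataset]

data = [[1.5, 0.2], [-0.4, 1.1]]
print(extend(data))  # -> [[1.5, 0.2, 1.0, 0.0], [-0.4, 1.1, 0.25, 0.75]]
# The next classifier in the cascade is then trained on the 4-attribute data.
```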

2000

Iterative Bayes

Authors
Gama, J;

Publication
Intelligent Data Analysis

Abstract
Naive Bayes is a well-known and well-studied algorithm in both statistics and machine learning. Bayesian learning algorithms represent each concept with a single probabilistic summary. In this paper we present an iterative approach to naive Bayes. Iterative Bayes begins with the distribution tables built by naive Bayes; these tables are then iteratively updated to improve the class probability distribution associated with each training example. Experimental evaluation of Iterative Bayes on 27 benchmark datasets shows consistent gains in accuracy. Moreover, the update schema can take costs into account, making the algorithm cost-sensitive. Unlike stratification, it is applicable to any number of classes and to arbitrary cost matrices. An interesting side effect is that the algorithm proves robust to attribute dependencies.
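The iterative update idea can be sketched as follows. The tables, the `delta` step size, and the exact increment rule are illustrative assumptions, not the paper's precise update schema:

```python
# Start from naive-Bayes-style contingency tables and nudge them so that
# each training example's probability for its true class increases.

counts = {  # counts[class][attribute_value], hypothetical initial tables
    "yes": {"sunny": 2.0, "rainy": 4.0},
    "no":  {"sunny": 5.0, "rainy": 1.0},
}

def prob(cls, value):
    # P(class | attribute value) estimated from the current tables.
    total = sum(counts[c][value] for c in counts)
    return counts[cls][value] / total

def iterative_update(examples, delta=0.5, iterations=5):
    for _ in range(iterations):
        for value, true_cls in examples:
            # Reward the true class and penalise the others, in proportion
            # to how far the current estimate is from 1 (a simplified rule).
            error = 1.0 - prob(true_cls, value)
            counts[true_cls][value] += delta * error
            for c in counts:
                if c != true_cls:
                    counts[c][value] = max(0.1, counts[c][value] - delta * error)

examples = [("sunny", "yes"), ("rainy", "no")]
iterative_update(examples)
# P("yes" | "sunny") has now moved above its initial value of 2/7.
print(round(prob("yes", "sunny"), 2))
```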
