2000
Authors
Brazdil, PB; Soares, C;
Publication
MACHINE LEARNING: ECML 2000
Abstract
We investigate the problem of using past performance information to select an algorithm for a given classification problem. We present three ranking methods for that purpose: average ranks, success rate ratios and significant wins. We also analyze the problem of evaluating and comparing these methods. The evaluation technique used is based on a leave-one-out procedure. On each iteration, the method generates a ranking using the results obtained by the algorithms on the training datasets. This ranking is then evaluated by calculating its distance from the ideal ranking built using the performance information on the test dataset. The distance measure adopted here, average correlation, is based on Spearman's rank correlation coefficient. To compare ranking methods, a combination of Friedman's test and Dunn's multiple comparison procedure is adopted. When applied to the methods presented here, these tests indicate that the success rate ratios and average ranks methods perform better than significant wins.
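A minimal sketch of the evaluation step described in this abstract: comparing a recommended ranking of algorithms with the ideal ranking observed on a held-out dataset via Spearman's rank correlation, rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the difference between the two rank positions of algorithm i. The algorithm names are illustrative, not from the paper.

```python
# Compare a recommended ranking against the ideal ranking on a test
# dataset using Spearman's rank correlation coefficient.

def spearman_rho(recommended, ideal):
    """Both arguments are lists of algorithm names ordered best-to-worst."""
    pos_rec = {alg: i + 1 for i, alg in enumerate(recommended)}
    pos_ide = {alg: i + 1 for i, alg in enumerate(ideal)}
    n = len(recommended)
    d2 = sum((pos_rec[a] - pos_ide[a]) ** 2 for a in recommended)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Identical rankings give rho = 1; a fully reversed ranking gives rho = -1.
print(spearman_rho(["c4.5", "knn", "nb"], ["c4.5", "knn", "nb"]))   # 1.0
print(spearman_rho(["c4.5", "knn", "nb"], ["nb", "knn", "c4.5"]))   # -1.0
```

In the leave-one-out procedure of the paper, this score would be computed once per held-out dataset and averaged into the "average correlation" measure.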
2000
Authors
Soares, C; Brazdil, P; Costa, J;
Publication
DATA ANALYSIS, CLASSIFICATION, AND RELATED METHODS
Abstract
Due to the wide variety of algorithms for supervised classification originating from several research areas, selecting one of them to apply to a given problem is not a trivial task. Recently, several methods have been developed to create rankings of classification algorithms based on their previous performance. It is therefore necessary to develop techniques to evaluate and compare those methods. We present three measures to evaluate rankings of classification algorithms, give examples of their use, and discuss their characteristics.
2000
Authors
Gama, J; Brazdil, P;
Publication
MACHINE LEARNING
Abstract
Using multiple classifiers to increase learning accuracy is an active research area. In this paper we present two related methods for merging classifiers. The first method, Cascade Generalization, couples classifiers loosely. It belongs to the family of stacking algorithms. The basic idea of Cascade Generalization is to apply the set of classifiers sequentially, at each step extending the original data by inserting new attributes. The new attributes are derived from the probability class distribution given by a base classifier. This constructive step extends the representational language of the high-level classifiers, relaxing their bias. The second method exploits tight coupling of classifiers by applying Cascade Generalization locally. At each iteration of a divide-and-conquer algorithm, the instance space is reconstructed by adding new attributes. Each new attribute represents the probability that an example belongs to a class, as given by a base classifier. We have implemented three Local Generalization algorithms. The first merges a linear discriminant with a decision tree, the second merges a naive Bayes with a decision tree, and the third merges both a linear discriminant and a naive Bayes with a decision tree. All the algorithms show an increase in performance when compared with the corresponding single models. Cascade also outperforms other methods for combining classifiers, such as Stacked Generalization, and competes well against Boosting at statistically significant confidence levels.
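A minimal sketch of the constructive step described in this abstract, assuming scikit-learn style estimators: a base classifier's class-probability outputs are appended to the data as new attributes, and a high-level classifier is trained on the extended representation. The choice of GaussianNB as the base and a decision tree as the high-level learner mirrors one of the pairings mentioned above; the dataset is illustrative.

```python
# Cascade Generalization's constructive step: extend the data with the
# class-probability attributes produced by a base classifier, then train
# a higher-level classifier on the extended representation.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Level 1: the base classifier yields a probability class distribution
# for each example.
base = GaussianNB().fit(X, y)
probs = base.predict_proba(X)          # shape (n_examples, n_classes)

# Constructive step: append the class probabilities as new attributes.
X_extended = np.hstack([X, probs])

# Level 2: the high-level classifier learns on the extended space,
# which relaxes its representational bias.
high = DecisionTreeClassifier().fit(X_extended, y)
print(high.score(X_extended, y))
```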
2000
Authors
Gama, J;
Publication
Intelligent Data Analysis
Abstract
Naive Bayes is a well-known and well-studied algorithm in both statistics and machine learning. Bayesian learning algorithms represent each concept with a single probabilistic summary. In this paper we present an iterative approach to naive Bayes. Iterative Bayes begins with the distribution tables built by naive Bayes. These tables are then iteratively updated in order to improve the probability class distribution associated with each training example. Experimental evaluation of Iterative Bayes on 27 benchmark datasets shows consistent gains in accuracy. Moreover, the update scheme can take costs into account, making the algorithm cost-sensitive. Unlike stratification, it is applicable to any number of classes and to arbitrary cost matrices. An interesting side effect of our algorithm is that it proves robust to attribute dependencies.
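A rough sketch of the idea, under stated assumptions: start from the naive Bayes contingency tables and sweep through the training set, nudging the counts so each example's predicted class distribution moves toward its true class. The exact increment rule of the paper may differ; the learning rate `lr` and the clipping floor are illustrative choices, not taken from the paper.

```python
# Iterative-Bayes-style update loop over naive Bayes count tables
# (nominal attributes only, for brevity).
import numpy as np

def iterative_bayes(X, y, n_values, n_classes, sweeps=10, lr=0.1):
    """X holds integer-coded nominal attributes, shape (n, d);
    n_values[j] is the number of distinct values of attribute j."""
    n, d = X.shape
    prior = np.bincount(y, minlength=n_classes).astype(float)
    # tables[j][v, c]: count of attribute j taking value v within class c
    # (initialized at 1 for Laplace smoothing).
    tables = [np.ones((n_values[j], n_classes)) for j in range(d)]
    for xi, yi in zip(X, y):
        for j in range(d):
            tables[j][xi[j], yi] += 1

    def predict_proba(x):
        logp = np.log(prior / prior.sum())
        for j in range(d):
            col = tables[j][x[j]]
            logp += np.log(col / col.sum())
        p = np.exp(logp - logp.max())
        return p / p.sum()

    for _ in range(sweeps):
        for xi, yi in zip(X, y):
            delta = lr * (1.0 - predict_proba(xi)[yi])  # prediction error
            for j in range(d):
                tables[j][xi[j], yi] += delta           # reinforce true class
                for c in range(n_classes):
                    if c != yi:                         # weaken the others
                        tables[j][xi[j], c] = max(
                            tables[j][xi[j], c] - delta, 1e-3)
    return predict_proba
```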
2000
Authors
Gama, J;
Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE
Abstract
Naive Bayes is a well-known and well-studied algorithm in both statistics and machine learning. Despite its limited expressive power, this procedure performs surprisingly well in a wide variety of domains, including many where there are clear dependencies between attributes. In this paper we address its main perceived limitation: its inability to deal with attribute dependencies. We present Linear Bayes, which uses a multivariate normal distribution for the continuous attributes to compute the required probabilities. In this way, the interdependencies between the continuous attributes are taken into account. In the empirical evaluation, we compare Linear Bayes against a naive Bayes that discretizes continuous attributes, a naive Bayes that assumes a univariate Gaussian for continuous attributes, and a standard linear discriminant function. We show that Linear Bayes is a plausible algorithm that competes quite well against other well-established techniques.
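A minimal sketch of the continuous-attribute part of the idea: each class is modeled with a multivariate normal, so interdependencies between continuous attributes are captured. The pooled (shared) covariance matrix is an assumption on our part, consistent with the linear-discriminant behavior the abstract alludes to; the naive-Bayes handling of nominal attributes is omitted.

```python
# Class-conditional multivariate Gaussians with a pooled covariance
# matrix, combined through Bayes' rule.
import numpy as np
from scipy.stats import multivariate_normal

def fit_linear_bayes(X, y, n_classes):
    priors, means = [], []
    pooled = np.zeros((X.shape[1], X.shape[1]))
    for c in range(n_classes):
        Xc = X[y == c]
        priors.append(len(Xc) / len(X))
        means.append(Xc.mean(axis=0))
        # Accumulate the within-class scatter for the pooled estimate.
        pooled += (len(Xc) - 1) * np.cov(Xc, rowvar=False)
    pooled /= (len(X) - n_classes)
    return priors, means, pooled

def predict(x, priors, means, pooled):
    # argmax over classes of prior * class-conditional Gaussian density
    scores = [p * multivariate_normal.pdf(x, mean=m, cov=pooled)
              for p, m in zip(priors, means)]
    return int(np.argmax(scores))
```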
2000
Authors
Gama, J;
Publication
AI Commun.