Publications

Publications by LIAAD

2003

Hierarchical and Pyramidal Clustering for Symbolic Data

Authors
Brito, P;

Publication
Journal of the Japanese Society of Computational Statistics

Abstract

2003

Mining official data

Authors
Brito, P; Malerba, D;

Publication
Intelligent Data Analysis

Abstract

2003

Symbolic clustering of constrained probabilistic data

Authors
Brito, P; de Carvalho, FAT;

Publication
Exploratory Data Analysis in Empirical Research, Proceedings

Abstract
In previous work (Brito and De Carvalho (1999)) we considered the presence of dependence rules between variables in the framework of a symbolic clustering method. In another paper, Brito (1998) addressed the problem of clustering probabilistic data. The aim of this paper is to bring the two issues together, that is, to take dependence rules on probabilistic data into account. This is accomplished by introducing new generality measures together with an appropriate generalization operator. This approach allows a symbolic clustering method to be extended to constrained probabilistic data.

2003

Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results

Authors
Brazdil, PB; Soares, C; Da Costa, JP;

Publication
Machine Learning

Abstract
We present a meta-learning method to support selection of candidate learning algorithms. It uses a k-Nearest Neighbor algorithm to identify the datasets that are most similar to the one at hand. The distance between datasets is assessed using a relatively small set of data characteristics, which was selected to represent properties that affect algorithm performance. The performance of the candidate algorithms on those datasets is used to generate a recommendation to the user in the form of a ranking. The performance is assessed using a multicriteria evaluation measure that takes not only accuracy, but also time into account. As it is not common in Machine Learning to work with rankings, we had to identify and adapt existing statistical techniques to devise an appropriate evaluation methodology. Using that methodology, we show that the meta-learning method presented leads to significantly better rankings than the baseline ranking method. The evaluation methodology is general and can be adapted to other ranking problems. Although here we have concentrated on ranking classification algorithms, the meta-learning framework presented can provide assistance in the selection of combinations of methods or more complex problem solving strategies.
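The core of the method described above (find the k most similar datasets via data characteristics, then aggregate the candidate algorithms' performance on them into a recommended ranking) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `knn_ranking`, the use of Euclidean distance, and mean-rank aggregation are simplifying assumptions, and the multicriteria accuracy/time measure is abstracted into a single precomputed score per algorithm.

```python
import numpy as np

def knn_ranking(meta_features, perf, query, k=3):
    """Recommend a ranking of algorithms for a new dataset by
    averaging their performance ranks on the k nearest datasets.

    meta_features: (n_datasets, n_features) data characteristics
    perf: (n_datasets, n_algorithms) performance scores (higher = better);
          in the paper this would be a multicriteria accuracy/time measure
    query: (n_features,) characteristics of the dataset at hand
    Returns algorithm indices ordered best-first.
    """
    # Euclidean distance in meta-feature space (assumed normalised)
    d = np.linalg.norm(meta_features - query, axis=1)
    nearest = np.argsort(d)[:k]
    # Rank the algorithms on each neighbour (rank 1 = best score)
    ranks = (-perf[nearest]).argsort(axis=1).argsort(axis=1) + 1
    # Aggregate by mean rank; lowest mean rank is recommended first
    return np.argsort(ranks.mean(axis=0))
```

For example, with two algorithms where the first dominates on the query's neighbours, the function returns the first algorithm at the head of the ranking.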

2003

Is the UCI repository useful for data mining?

Authors
Soares, C;

Publication
Progress in Artificial Intelligence

Abstract
We propose a methodology to investigate the relevance for the real world of repositories of benchmark problems like the one commonly known as the UCI repository. It compares the distribution of relative performance of algorithms in data sets from a given repository and from the "real world". If the distributions are different, the knowledge about the relative performance of algorithms obtained from the repository in question is mostly useless. In the case of the UCI repository, this would mean that a significant proportion of published results would be of little practical use. However, this is not what our results indicate. We also propose an adaptation of this method to test whether tool developers are "overfitting" repositories, which also yields negative results in the UCI repository.
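The comparison at the heart of this methodology, deciding whether relative-performance results observed on a repository follow the same distribution as those observed on "real world" problems, can be sketched with a two-sample Kolmogorov-Smirnov test. This is an assumed stand-in for the paper's statistical machinery, not its actual procedure; the function names and the 5% asymptotic critical value are illustrative choices.

```python
import math

def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of the two samples."""
    def ecdf(sample, v):
        return sum(1 for s in sample if s <= v) / len(sample)
    return max(abs(ecdf(xs, v) - ecdf(ys, v)) for v in set(xs) | set(ys))

def distributions_differ(repo_perf, real_perf):
    """Reject 'same distribution' at roughly the 5% level, using the
    asymptotic KS critical value 1.36 * sqrt((n + m) / (n * m)).
    repo_perf / real_perf: relative-performance values (e.g. error
    ratios between two algorithms), one per data set."""
    n, m = len(repo_perf), len(real_perf)
    critical = 1.36 * math.sqrt((n + m) / (n * m))
    return ks_statistic(repo_perf, real_perf) > critical
```

If `distributions_differ` returns True, knowledge about relative algorithm performance gained on the repository would transfer poorly to practice; the paper's negative result corresponds to the False case for the UCI repository.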

2003

Iterative Bayes

Authors
Gama, J;

Publication
Theoretical Computer Science

Abstract
Naive Bayes is a well-known and well-studied algorithm in both statistics and machine learning. Bayesian learning algorithms represent each concept with a single probabilistic summary. In this paper we present an iterative approach to naive Bayes. Iterative Bayes begins with the distribution tables built by naive Bayes; those tables are then iteratively updated in order to improve the class probability distribution associated with each training example. We argue that Iterative Bayes minimizes a quadratic loss function instead of the 0-1 loss function that usually applies to classification problems. Experimental evaluation of Iterative Bayes on 27 benchmark data sets shows consistent gains in accuracy. An interesting side effect of the algorithm is that it proves robust to attribute dependencies.
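The idea sketched in the abstract, building the naive Bayes contingency tables and then iteratively nudging them so each training example's true class gets a higher posterior, might look roughly like the following. This is a hedged sketch, not the paper's algorithm: the update rule (add `delta` to the true class's counts, subtract it from the wrongly predicted class's) and all names are simplifying assumptions.

```python
import numpy as np

def iterative_bayes(X, y, n_classes, n_values, iters=10, delta=0.1):
    """Sketch of Iterative Bayes: start from Laplace-smoothed naive
    Bayes tables, then adjust counts toward each example's true class.

    X: integer attribute values, shape (n_examples, n_attrs)
    y: class labels in range(n_classes)
    n_values: number of distinct values per attribute (assumed equal)
    Returns the adjusted tables and a posterior function.
    """
    n_attrs = X.shape[1]
    # counts[a, v, c]: smoothed count of value v of attribute a in class c
    counts = np.ones((n_attrs, n_values, n_classes))
    prior = np.bincount(y, minlength=n_classes).astype(float) + 1.0
    for a in range(n_attrs):
        np.add.at(counts[a], (X[:, a], y), 1)

    def posterior(x):
        log_p = np.log(prior / prior.sum())
        for a in range(n_attrs):
            log_p += np.log(counts[a, x[a]] / counts[a].sum(axis=0))
        p = np.exp(log_p - log_p.max())
        return p / p.sum()

    for _ in range(iters):
        for x, c in zip(X, y):
            pred = posterior(x).argmax()
            if pred != c:  # nudge the tables toward the true class
                for a in range(n_attrs):
                    counts[a, x[a], c] += delta
                    counts[a, x[a], pred] = max(counts[a, x[a], pred] - delta, 1e-6)
    return counts, posterior
```

On a toy dataset where one attribute fully determines the class, the returned posterior assigns each value to its class, and the iterative loop has nothing left to correct.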
