2011
Autores
Brito, P; Polaillon, G;
Publicação
CEUR Workshop Proceedings
Abstract
This work comes within the field of data analysis using Galois lattices. We consider ordinal, numerical single or interval data as well as data that consist on frequency/probability distributions on a finite set of categories. Data are represented and dealt with on a common framework, by defining a generalization operator that determines intents by intervals. In the case of distribution data, the obtained concepts are more homogeneous and more easily interpretable than those obtained by using the maximum and minimum operators previously proposed. The number of obtained concepts being often rather large, and to limit the influence of atypical elements, we propose to identify stable concepts using interval distances in a cross validation-like approach.
2011
Autores
Noirhomme Fraiture, M; Brito, P;
Publicação
Statistical Analysis and Data Mining
Abstract
This paper introduces symbolic data analysis, explaining how it extends the classical data models to take into account more complete and complex information. Several examples motivate the approach, before the modeling of variables assuming new types of realizations are formally presented. Some methods for the (multivariate) analysis of symbolic data are presented and discussed. This is however far from being exhaustive, given the present dynamic development of this new field of research. Copyright © 2011 Wiley Periodicals, Inc., A Wiley Company.
2011
Autores
Prudencio, RBC; Soares, C; Ludermir, TB;
Publicação
HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, PART I
Abstract
Several meta-learning approaches have been developed for the problem of algorithm selection. In this context, it is of central importance to collect a sufficient number of datasets to be used as meta-examples in order to provide reliable results. Recently, some proposals to generate datasets have addressed this issue with successful results. These proposals include datasetoids, which is a simple manipulation method to obtain new datasets from existing ones. However, the increase in the number of datasets raises another issue: in order to generate meta-examples for training, it is necessary to estimate the performance of the algorithms on the datasets. This typically requires running all candidate algorithms on all datasets, which is computationally very expensive. One approach to address this problem is the use of active learning, termed active meta-learning. In this paper we investigate the combined use of active meta-learning and datasetoids. Our results show that it is possible to significantly reduce the computational cost of generating meta-examples not only without loss of meta-learning accuracy but with potential gains.
2011
Autores
Kanda, J; Carvalho, ACPLFd; Hruschka, ER; Soares, C;
Publicação
Int. J. Hybrid Intell. Syst.
Abstract
2011
Autores
Prudencio, RBC; Soares, C; Ludermir, TB;
Publicação
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2011, PT II
Abstract
Several meta-learning approaches have been developed for the problem of algorithm selection. In this context, it is of central importance to collect a sufficient number of datasets to be used as meta-examples in order to provide reliable results. Recently, some proposals to generate datasets have addressed this issue with successful results. These proposals include datasetoids, which is a simple manipulation method to obtain new datasets from existing ones. However, the increase in the number of datasets raises another issue: in order to generate meta-examples for training, it is necessary to estimate the performance of the algorithms on the datasets. This typically requires running all candidate algorithms on all datasets, which is computationally very expensive. One approach to address this problem is the use of an active learning approach to meta-learning, termed active meta-learning. In this paper we investigate the combined use of an active meta-learning approach based on an uncertainty score and datasetoids. Based on our results, we conclude that the accuracy of our method is very good results with as little as 10% to 20% of the meta-examples labeled.
2011
Autores
Prudencio, RBC; Soares, C; Ludermir, TB;
Publicação
2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)
Abstract
Several meta-learning approaches have been developed for the problem of algorithm selection. In this context, it is of central importance to collect a sufficient number of datasets to be used as meta-examples in order to provide reliable results. Recently, some proposals to generate datasets have addressed this issue with successful results. These proposals include datasetoids, which is a simple manipulation method to obtain new datasets from existing ones. However, the increase in the number of datasets raises another issue: in order to generate meta-examples for training, it is necessary to estimate the performance of the algorithms on the datasets. This typically requires running all candidate algorithms on all datasets, which is computationally very expensive. In a recent paper, active meta-learning has been used to address this problem. An uncertainty sampling method for the k-NN algorithm using a least confidence score based on a distance measure was employed. Here we extend that work, namely by investigating three hypotheses: 1) is there advantage in using a frequency-based least confidence score over the distance-based score? 2) given that the meta-learning problem used has three classes, is it better to use a margin-based score? and 3) given that datasetoids are expected to contain some noise, are better results achieved by starting the search with all datasets already labeled? Some of the results obtained are unexpected and should be further analyzed. However, they confirm that active meta-learning can significantly reduce the computational cost of meta-learning with potential gains in accuracy.
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.