2014
Authors
Pinto, F; Soares, C; Mendes Moreira, J;
Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2014
Abstract
In this paper we propose and apply a methodology to study the relationship between the performance of bagging and the characteristics of the bootstrap samples. The methodology consists of 1) an extensive set of experiments to estimate the empirical distribution of performance of the population of all possible ensembles that can be created with those bootstraps and 2) a metalearning approach to analyze that distribution based on characteristics of the bootstrap samples and their relationship with the complete training set. Given the large size of the population of all ensembles, we empirically show that it is possible to apply the methodology to a sample. We applied the methodology to 53 classification datasets for ensembles of 20 and 100 models. Our results show that diversity is crucial for an important bootstrap and we show evidence of a metric that can measure diversity without any learning process involved. We also found evidence that the best bootstraps have a predictive power very similar to the one presented by the training set using naive models.
2014
Authors
Pinto, F; Mendes Moreira, J; Soares, C; Rossetti, RJF;
Publication
Modelling and Simulation 2014 - European Simulation and Modelling Conference, ESM 2014
Abstract
In this paper we present a NetLogo simulation model for a Data Mining methodological process: ensemble classifier generation. The model allows studying the trade-off between data characteristics and diversity, a key concept in Ensemble Learning. We studied the research hypothesis that data characteristics should also be taken into account while generating ensemble classifier models. The results of our experiments indicate that diversity is in fact a key concept in Ensemble Learning, but regarding our research hypothesis the findings are inconclusive.
2014
Authors
Cunha, T; Soares, C; Rodrigues, EM;
Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2014
Abstract
Online social networks present themselves as valuable information sources about their users and their respective behaviours and interests. Many researchers in data mining have analysed these types of data, aiming to find interesting patterns. This paper addresses the problem of identifying and displaying tweet profiles by analysing multiple types of data: spatial, temporal, social and content. The data mining process that extracts the patterns consists of manipulating the dissimilarity matrices for each type of data, which are fed to a clustering algorithm to obtain the desired patterns. This paper studies appropriate distance functions for the different types of data, the normalization and combination methods available for different dimensions and the existing clustering algorithms. The visualization platform is designed for dynamic and intuitive usage, aimed at revealing the extracted profiles in an understandable and interactive manner. To accomplish this, various visualization patterns were studied and widgets were chosen to better represent the information. The use of the project is illustrated with data from the Portuguese twittosphere.
2014
Authors
Pinto, F; Soares, C; Mendes Moreira, J;
Publication
CEUR Workshop Proceedings
Abstract
This paper proposes a framework to decompose and develop metafeatures for Metalearning (MtL) problems. Several metafeatures (also known as data characteristics) are proposed in the literature for a wide range of problems. Since MtL applicability is very general but problem dependent, researchers focus on generating specific and yet informative metafeatures for each problem. This process is carried out without any sort of conceptual framework. We believe that such a framework would open new horizons in the development of metafeatures and also aid the process of understanding the metafeatures already proposed in the state of the art. We propose a framework with the aim of filling that gap and we show its applicability in a scenario of algorithm recommendation for regression problems.
2014
Authors
Vanschoren, J; Brazdil, P; Soares, C; Kotthoff, L;
Publication
MetaSel@ECAI
Abstract
2014
Authors
Cunha, T; Rossetti, RJF; Soares, C;
Publication
Modelling and Simulation 2014 - European Simulation and Modelling Conference, ESM 2014
Abstract
The huge amount of online information prevents users from keeping up with their interests and preferences. Recommender Systems appeared to solve this problem by employing social behavioural paradigms to recommend potentially interesting items to users. Among the several kinds of Recommender Systems, one of the most mature and most used in real-world applications is Collaborative Filtering. These methods recommend items based on the preferences of similar users, using only a user-item rating matrix. In this paper we explain a methodology to use Multi-Agent based simulation to study the evolution of the data rating matrix and its effect on the performance of several Collaborative Filtering algorithms. Our results show that the best performing methods are user-based and item-based Collaborative Filtering and that the average algorithm performance is surprisingly constant for different rating schemes.