Publications

Publications by LIAAD

2014

An Empirical Methodology to Analyze the Behavior of Bagging

Authors
Pinto, F; Soares, C; Mendes Moreira, J;

Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2014

Abstract
In this paper we propose and apply a methodology to study the relationship between the performance of bagging and the characteristics of the bootstrap samples. The methodology consists of 1) an extensive set of experiments to estimate the empirical distribution of performance of the population of all possible ensembles that can be created with those bootstraps and 2) a metalearning approach to analyze that distribution based on characteristics of the bootstrap samples and their relationship with the complete training set. Given the large size of the population of all ensembles, we empirically show that it is possible to apply the methodology to a sample. We applied the methodology to 53 classification datasets for ensembles of 20 and 100 models. Our results show that diversity is crucial for an important bootstrap and we show evidence of a metric that can measure diversity without any learning process involved. We also found evidence that the best bootstraps have a predictive power very similar to the one presented by the training set using naive models.

2014

Simulation of the ensemble generation process: The divergence between data and model similarity

Authors
Pinto, F; Mendes Moreira, J; Soares, C; Rossetti, RJF;

Publication
Modelling and Simulation 2014 - European Simulation and Modelling Conference, ESM 2014

Abstract
In this paper we present a NetLogo simulation model for a Data Mining methodological process: ensemble classifier generation. The model makes it possible to study the trade-off between data characteristics and diversity, a key concept in Ensemble Learning. We studied the research hypothesis that data characteristics should also be taken into account while generating ensemble classifier models. The results of our experiments indicate that diversity is in fact a key concept in Ensemble Learning, but regarding our research hypothesis, the findings are inconclusive.

2014

TweeProfiles: Detection of Spatio-temporal Patterns on Twitter

Authors
Cunha, T; Soares, C; Rodrigues, EM;

Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2014

Abstract
Online social networks present themselves as valuable information sources about their users and their respective behaviours and interests. Many researchers in data mining have analysed these types of data, aiming to find interesting patterns. This paper addresses the problem of identifying and displaying tweet profiles by analysing multiple types of data: spatial, temporal, social and content. The data mining process that extracts the patterns consists of manipulating the dissimilarity matrices for each type of data, which are then fed to a clustering algorithm to obtain the desired patterns. This paper studies appropriate distance functions for the different types of data, the normalization and combination methods available for different dimensions, and the existing clustering algorithms. The visualization platform is designed for dynamic and intuitive usage, aimed at revealing the extracted profiles in an understandable and interactive manner. To accomplish this, various visualization patterns were studied and widgets were chosen to better represent the information. The use of the project is illustrated with data from the Portuguese twittosphere.

2014

A framework to decompose and develop metafeatures

Authors
Pinto, F; Soares, C; Mendes Moreira, J;

Publication
CEUR Workshop Proceedings

Abstract
This paper proposes a framework to decompose and develop metafeatures for Metalearning (MtL) problems. Several metafeatures (also known as data characteristics) are proposed in the literature for a wide range of problems. Since MtL applicability is very general but problem dependent, researchers focus on generating specific and yet informative metafeatures for each problem. This process is carried out without any sort of conceptual framework. We believe that such a framework would open new horizons for the development of metafeatures and also aid the process of understanding the metafeatures already proposed in the state of the art. We propose a framework with the aim of filling that gap and we show its applicability in a scenario of algorithm recommendation for regression problems.

2014

Proceedings of the International Workshop on Meta-learning and Algorithm Selection co-located with 21st European Conference on Artificial Intelligence, MetaSel@ECAI 2014, Prague, Czech Republic, August 19, 2014

Authors
Vanschoren, J; Brazdil, P; Soares, C; Kotthoff, L;

Publication
MetaSel@ECAI

Abstract

2014

Analysing Collaborative Filtering algorithms in a multi-agent environment

Authors
Cunha, T; Rossetti, RJF; Soares, C;

Publication
Modelling and Simulation 2014 - European Simulation and Modelling Conference, ESM 2014

Abstract
The huge amount of online information makes it hard for users to keep up with their interests and preferences. Recommender Systems appeared to solve this problem by employing social behavioural paradigms in order to recommend potentially interesting items to users. Among the several kinds of Recommender Systems, one of the most mature and most used in real-world applications is Collaborative Filtering. These methods recommend items based on the preferences of similar users, using only a user-item rating matrix. In this paper we explain a methodology to use Multi-Agent based simulation to study the evolution of the data rating matrix and its effect on the performance of several Collaborative Filtering algorithms. Our results show that the best performing methods are user-based and item-based Collaborative Filtering and that the average algorithm performance is surprisingly constant for different rating schemes.
