2016
Authors
Pinto, F; Soares, C; Moreira, JM;
Publication
Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part I
Abstract
Dynamic selection or combination (DSC) methods select one or more classifiers from an ensemble according to the characteristics of a given test instance x. Most methods proposed for this purpose are based on the nearest neighbours algorithm: it is assumed that if a classifier performed well on a set of instances similar to x, it will also perform well on x. We address the problem of dynamically combining a pool of classifiers by combining two approaches: metalearning and multi-label classification. Taking into account that diversity is a fundamental concept in ensemble learning and that the interdependencies between the classifiers cannot be ignored, we solve the multi-label classification problem by using a widely known technique: Classifier Chains (CC). Additionally, we extend a typical metalearning approach by combining metafeatures characterizing the interdependencies between the classifiers with the base-level features. We executed experiments on 42 classification datasets and compared our method with several state-of-the-art DSC techniques, including another metalearning approach. Results show that our method improves on the other metalearning approach and is very competitive with the other four DSC methods. © Springer International Publishing AG 2016.
2016
Authors
Rodrigues, T; Cunha, T; Ienco, D; Poncelet, P; Soares, C;
Publication
NEW ADVANCES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 1
Abstract
Social media is strongly present in people's everyday life, and Twitter is one example that stands out. The data within these types of services can be analyzed in order to discover useful knowledge. One interesting approach is to use data mining techniques to uncover hidden behaviours and patterns. The primary focus of this paper is to identify patterns of retweets and to understand how information spreads over time on Twitter. The work adapts the GetMove tool, which is capable of extracting spatio-temporal pattern trajectories, and TweeProfiles, which identifies tweet profiles regarding several dimensions: spatial, temporal, social and content. We hope that the more flexible clustering strategy from TweeProfiles will enhance the results extracted by GetMove. We study the application of this mechanism to one case study and develop a visualization tool to interpret the results.
2016
Authors
Maia, A; Cunha, T; Soares, C; Abreu, PH;
Publication
NEW ADVANCES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 1
Abstract
With the advent of social networking, a lot of user-specific, voluntarily provided data has been generated. Researchers and companies noticed the value that lay within those enormous amounts of data and developed algorithms and tools to extract patterns in order to act on them. TweeProfiles is an offline clustering tool that analyses tweets over multiple dimensions: spatial, temporal, content and social. This project was extended in TweeProfiles2 by enabling the processing of real-time data. In this work, we developed a visualization tool suitable for data streaming, using multiple widgets to better represent all the information. The usefulness of the developed tool for journalism was evaluated based on a usability test, which, despite its reduced number of participants, yielded good results.
2016
Authors
de Sa, CR; Soares, C; Knobbe, A;
Publication
INFORMATION SCIENCES
Abstract
Label Ranking (LR) problems are becoming increasingly important in Machine Learning. While there has been a significant amount of work on the development of learning algorithms for LR in recent years, there are not many pre-processing methods for LR. Some methods, like Naive Bayes for LR and APRIORI-LR, cannot handle real-valued data directly. Conventional discretization methods used in classification are not suitable for LR problems, due to the different target variable. In this work, we make an extensive analysis of the existing methods using simple approaches. We also propose a new method called EDiRa (Entropy-based Discretization for Ranking) for the discretization of ranking data. We illustrate the advantages of the method using synthetic data and also on several benchmark datasets. The results clearly indicate that the discretization performs as expected and also improves the results and efficiency of the learning algorithms.
2016
Authors
de Sa, CR; Duivesteijn, W; Soares, C; Knobbe, A;
Publication
DISCOVERY SCIENCE, (DS 2016)
Abstract
Exceptional Preferences Mining (EPM) is a crossover between two subfields of data mining: local pattern mining and preference learning. EPM can be seen as a local pattern mining task that finds subsets of observations where the preference relations between subsets of the labels significantly deviate from the norm; a variant of Subgroup Discovery, with rankings as the (complex) target concept. We employ three quality measures that highlight subgroups featuring exceptional preferences, where the focus of what constitutes 'exceptional' varies with the quality measure: the first gauges exceptional overall ranking behavior, the second indicates whether a particular label stands out from the rest, and the third highlights subgroups featuring unusual pairwise label ranking behavior. As proof of concept, we explore five datasets. The results confirm that the new task EPM can deliver interesting knowledge. The results also illustrate how the visualization of the preferences in a Preference Matrix can aid in interpreting exceptional preference subgroups.
2016
Authors
Cunha, T; Soares, C; Carvalho, ACPLFd;
Publication
Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part II
Abstract
Recommender Systems are an important tool in e-business, for both companies and customers. Several algorithms are available to developers; however, there is little guidance concerning which is the best algorithm for a specific recommendation problem. In this study, a metalearning approach is proposed to address this issue. It consists of relating the characteristics of problems (metafeatures) to the performance of recommendation algorithms. We propose a set of metafeatures based on the application of a systematic procedure to develop metafeatures and by extending and generalizing the state-of-the-art metafeatures for recommender systems. The approach is tested on a set of Matrix Factorization algorithms and a collection of real-world Collaborative Filtering datasets. The performance of these algorithms on these datasets is evaluated using several standard metrics. The algorithm selection problem is formulated as a set of classification tasks, where the target attribute is the best Matrix Factorization algorithm according to each metric. The results show that the approach is viable and that the metafeatures used contain information that is useful to predict the best algorithm for a dataset. © Springer International Publishing AG 2016.