Publications - INESC TEC

Publications

Publications by LIAAD

2016

Entropy-based discretization methods for ranking data

Authors
de Sá, CR; Soares, C; Knobbe, A;

Publication
INFORMATION SCIENCES

Abstract
Label Ranking (LR) problems are becoming increasingly important in Machine Learning. While there has been a significant amount of work on the development of learning algorithms for LR in recent years, there are not many pre-processing methods for LR. Some methods, like Naive Bayes for LR and APRIORI-LR, cannot handle real-valued data directly. Conventional discretization methods used in classification are not suitable for LR problems, due to the different target variable. In this work, we make an extensive analysis of the existing methods using simple approaches. We also propose a new method called EDiRa (Entropy-based Discretization for Ranking) for the discretization of ranking data. We illustrate the advantages of the method using synthetic data and several benchmark datasets. The results clearly indicate that the discretization performs as expected and also improves the results and efficiency of the learning algorithms.


2016

Exceptional Preferences Mining

Authors
de Sá, CR; Duivesteijn, W; Soares, C; Knobbe, A;

Publication
DISCOVERY SCIENCE, (DS 2016)

Abstract
Exceptional Preferences Mining (EPM) is a crossover between two subfields of data mining: local pattern mining and preference learning. EPM can be seen as a local pattern mining task that finds subsets of observations where the preference relations between subsets of the labels significantly deviate from the norm; a variant of Subgroup Discovery, with rankings as the (complex) target concept. We employ three quality measures that highlight subgroups featuring exceptional preferences, where the focus of what constitutes 'exceptional' varies with the quality measure: the first gauges exceptional overall ranking behavior, the second indicates whether a particular label stands out from the rest, and the third highlights subgroups featuring unusual pairwise label ranking behavior. As proof of concept, we explore five datasets. The results confirm that the new task EPM can deliver interesting knowledge. The results also illustrate how the visualization of the preferences in a Preference Matrix can aid in interpreting exceptional preference subgroups.
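
As a rough illustration of the Preference Matrix idea mentioned in the abstract, the sketch below averages pairwise label comparisons over a set of rankings and compares a subgroup against the whole dataset. The +1 / -1 / 0 encoding, the distance function, the function names, and the toy rankings are all assumptions made for illustration; the abstract does not specify them.

```python
def preference_matrix(rankings):
    """Average pairwise preferences over a list of rankings.

    Each ranking is a sequence of rank positions, one per label
    (1 = most preferred). Entry [i][j] is the mean of +1 when label i
    is ranked ahead of label j, -1 when it is ranked behind, 0 on ties.
    This encoding is an assumption, not taken from the abstract.
    """
    if not rankings:
        raise ValueError("need at least one ranking")
    k = len(rankings[0])
    matrix = [[0.0] * k for _ in range(k)]
    for ranks in rankings:
        for i in range(k):
            for j in range(k):
                if ranks[i] < ranks[j]:
                    matrix[i][j] += 1.0
                elif ranks[i] > ranks[j]:
                    matrix[i][j] -= 1.0
    n = len(rankings)
    return [[value / n for value in row] for row in matrix]


def matrix_distance(a, b):
    """Euclidean distance between two matrices (an illustrative deviation score)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) ** 0.5


# Hypothetical toy data: four rankings of three labels.
all_rankings = [(1, 2, 3), (1, 3, 2), (3, 1, 2), (2, 3, 1)]
subgroup = all_rankings[2:]  # a candidate subgroup of observations

deviation = matrix_distance(preference_matrix(subgroup),
                            preference_matrix(all_rankings))
print(deviation)
```

A subgroup whose matrix lies far from the matrix of the full dataset would be a candidate for "exceptional" preferences under this simplified score.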


2016

Selecting Collaborative Filtering Algorithms Using Metalearning

Authors
Cunha, T; Soares, C; de Carvalho, ACPLF;

Publication
ECML/PKDD (2)

Abstract
Recommender Systems are an important tool in e-business, for both companies and customers. Several algorithms are available to developers; however, there is little guidance concerning which is the best algorithm for a specific recommendation problem. In this study, a metalearning approach is proposed to address this issue. It consists of relating the characteristics of problems (metafeatures) to the performance of recommendation algorithms. We propose a set of metafeatures based on the application of a systematic procedure to develop metafeatures and by extending and generalizing the state-of-the-art metafeatures for recommender systems. The approach is tested on a set of Matrix Factorization algorithms and a collection of real-world Collaborative Filtering datasets. The performance of these algorithms on these datasets is evaluated using several standard metrics. The algorithm selection problem is formulated as a set of classification tasks, where the target attribute is the best Matrix Factorization algorithm according to each metric. The results show that the approach is viable and that the metafeatures used contain information that is useful to predict the best algorithm for a dataset.
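
The formulation of algorithm selection as classification can be sketched as follows: each known dataset contributes one meta-example (a metafeature vector plus the algorithm that performed best on it), and a classifier predicts the label for a new dataset. The 1-nearest-neighbour classifier used here is a stand-in chosen for brevity, and the metafeatures, values, and algorithm names are hypothetical placeholders, not those of the paper.

```python
import math


def nearest_neighbor_select(meta_dataset, query_metafeatures):
    """Return the best-algorithm label of the closest known dataset (1-NN)."""
    best_label, best_dist = None, math.inf
    for metafeatures, best_algorithm in meta_dataset:
        dist = math.dist(metafeatures, query_metafeatures)
        if dist < best_dist:
            best_label, best_dist = best_algorithm, dist
    return best_label


# Hypothetical meta-dataset for one evaluation metric.
# Each entry: (metafeature vector, best algorithm on that dataset).
# The two metafeatures are made up: (rating-matrix sparsity,
# scaled mean number of ratings per user).
META_DATASET = [
    ((0.99, 0.10), "algorithm_A"),
    ((0.95, 0.40), "algorithm_B"),
    ((0.90, 0.80), "algorithm_B"),
]

# Predict the best algorithm for a new, unseen dataset.
print(nearest_neighbor_select(META_DATASET, (0.98, 0.12)))
```

One such classifier would be trained per evaluation metric, since the best algorithm may differ from metric to metric.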


2016

AToMRS: A Tool to Monitor Recommender Systems

Authors
Costa, A; Cunha, T; Soares, C;

Publication
KDIR: PROCEEDINGS OF THE 8TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT - VOL. 1

Abstract
Recommender systems arose in response to the excess of available online information. These systems suggest, to a given individual, items that may be relevant. The monitoring and evaluation of these systems are fundamental to the proper functioning of many business-related services. The goal of this paper is to create a tool capable of collecting, aggregating and supervising the results obtained from the evaluation of recommender systems. To achieve this goal, a multi-granularity approach is developed and implemented in order to organize the different levels of the problem. This tool also aims to tackle the lack of mechanisms that enable visual assessment of the performance of a recommender system's algorithm. A functional prototype of the application is presented, with the purpose of validating the solution's concept.


2016

Learning from the News: Predicting Entity Popularity on Twitter

Authors
Saleiro, P; Soares, C;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XV

Abstract
In this work, we tackle the problem of predicting entity popularity on Twitter based on the news cycle. We apply a supervised learning approach and extract four types of features: (i) signal, (ii) textual, (iii) sentiment and (iv) semantic, which we use to predict whether the popularity of a given entity will be high or low in the following hours. We ran several experiments on six different entities in a dataset of over 150M tweets and 5M news items and obtained F1 scores over 0.70. Error analysis indicates that news performs better at predicting entity popularity on Twitter when it is the primary information source of the event, as opposed to events such as live TV broadcasts, political debates or football matches.
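
For readers unfamiliar with the reported metric, the sketch below computes the F1 score for a binary high/low popularity prediction. The label strings, the choice of "high" as the positive class, and the toy predictions are assumptions for illustration only; they are not data from the paper.

```python
def f1_score(y_true, y_pred, positive="high"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical popularity labels for one entity over four time windows.
actual = ["high", "high", "low", "low"]
predicted = ["high", "low", "high", "low"]
print(f1_score(actual, predicted))
```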


2016

Active learning and data manipulation techniques for generating training examples in meta-learning

Authors
Sousa, AFM; Prudêncio, RBC; Ludermir, TB; Soares, C;

Publication
NEUROCOMPUTING

Abstract
Algorithm selection is an important task in different domains of knowledge. Meta-learning treats this task by adopting a supervised learning strategy. Training examples in meta-learning (called meta-examples) are generated from experiments performed with a pool of candidate algorithms on a number of problems, usually collected from data repositories or synthetically generated. A meta-learner is then applied to acquire knowledge relating features of the problems to the best algorithms in terms of performance. In this paper, we address an important aspect of meta-learning, which is to produce a significant number of relevant meta-examples. Generating a high-quality set of meta-examples can be difficult due to the low availability of real datasets in some domains and the high computational cost of labelling the meta-examples. In the current work, we focus on the generation of meta-examples for meta-learning by combining: (1) a promising approach to generate new datasets (called datasetoids) by manipulating existing ones; and (2) active learning methods to select the most relevant datasets previously generated. The datasetoids approach is adopted to augment the number of useful problem instances for meta-example construction. However, not all generated problems are equally relevant. Active meta-learning then arises to select only the most informative instances to be labelled. Experiments were performed with different scenarios, meta-learning algorithms and dataset selection strategies. Our experiments revealed that it is possible to reduce the computational cost of generating meta-examples while maintaining a good meta-learning performance.
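
The two ingredients named in the abstract can be sketched under stated assumptions. The abstract only says that datasetoids come from "manipulating existing" datasets; the sketch assumes one common construction, swapping the target with each symbolic attribute. For the active learning step it assumes entropy-based uncertainty sampling over the meta-learner's predicted probabilities. All names and toy values are hypothetical.

```python
import math


def make_datasetoids(columns, target, symbolic_attributes):
    """Describe one datasetoid per symbolic attribute (assumed construction).

    Each datasetoid keeps the same data but treats a different symbolic
    column as the target; the original target becomes a plain feature.
    """
    datasetoids = []
    for attribute in symbolic_attributes:
        if attribute == target:
            continue
        features = [c for c in columns if c != attribute]
        datasetoids.append({"target": attribute, "features": features})
    return datasetoids


def prediction_entropy(probabilities):
    """Shannon entropy (in bits) of a predicted distribution over algorithms."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def select_most_uncertain(candidates, k):
    """Pick the k unlabelled problems whose prediction is most uncertain.

    `candidates` maps a problem name to the meta-learner's predicted
    probability of each candidate algorithm being the best one.
    """
    ranked = sorted(candidates,
                    key=lambda name: prediction_entropy(candidates[name]),
                    reverse=True)
    return ranked[:k]


# Hypothetical dataset schema and meta-learner outputs.
new_problems = make_datasetoids(["color", "shape", "size", "label"],
                                target="label",
                                symbolic_attributes=["color", "shape"])
predicted = {"d1": [0.5, 0.5], "d2": [0.9, 0.1], "d3": [1.0, 0.0]}
to_label = select_most_uncertain(predicted, k=1)
print(new_problems, to_label)
```

Only the selected problems would then be labelled by actually running the candidate algorithms, which is where the computational saving described in the abstract would come from.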
