Publications - INESC TEC

Publications

Publications by LIAAD

2018

CF4CF-META: Hybrid Collaborative Filtering Algorithm Selection Framework

Authors
Cunha, T; Soares, C; de Carvalho, ACPLF;

Publication
Discovery Science - 21st International Conference, DS 2018, Limassol, Cyprus, October 29-31, 2018, Proceedings

Abstract
The algorithm selection problem refers to the ability to predict the best algorithms for a new problem. This task has often been addressed by Metalearning, which looks for a function able to map problem characteristics to the performance of a set of algorithms. In the context of Collaborative Filtering, a few studies have proposed and validated the merits of different types of problem characteristics for this problem (i.e. the dataset-based approach): systematic metafeatures and performance estimates obtained by subsampling landmarkers. More recently, the problem was tackled using Collaborative Filtering models in a novel framework named CF4CF. This framework uses the performance estimates as ratings in order to select the best algorithms without using any data characteristics (i.e. the algorithm-based approach). Given the good results obtained independently by each approach, this paper starts from the hypothesis that integrating both approaches in a unified algorithm selection framework can improve predictive performance. Hence, this work introduces CF4CF-META, a hybrid framework which leverages both data characteristics and algorithm ratings within a modified Label Ranking model. Furthermore, it takes advantage of CF4CF’s internal mechanism to use samples of data at prediction time, which has proven to be effective. This work starts by explaining and formalizing state-of-the-art Collaborative Filtering algorithm selection frameworks (Metalearning, CF4CF and CF4CF-META) and assesses their performance via an empirical study. The results show that CF4CF-META consistently outperforms all other frameworks, with statistically significant differences in terms of meta-accuracy, and requires fewer landmarkers to do so. © 2018, Springer Nature Switzerland AG.
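The Metalearning (dataset-based) approach described in this abstract maps dataset characteristics to algorithm performance. One common way to do this is a k-nearest-neighbour label ranker: find the past datasets whose metafeatures are closest to the new one and average the ranks the algorithms obtained on them. The Python sketch below illustrates only that general idea under simple assumptions. The metafeature values, the algorithm names (SVD, NMF, SlopeOne) and the choice of kNN with Euclidean distance are all hypothetical, and this is not the modified Label Ranking model used in CF4CF-META.

```python
import math


def knn_label_ranking(meta_features, rankings, query, k=2):
    """Predict an algorithm ranking for a new dataset by averaging neighbour ranks.

    meta_features: one (ideally normalised) metafeature vector per known dataset
    rankings:      one dict {algorithm: rank} per known dataset, rank 1 = best
    query:         metafeature vector of the new dataset
    """
    # Indices of the k known datasets closest to the query in metafeature space
    neighbours = sorted(range(len(meta_features)),
                        key=lambda i: math.dist(meta_features[i], query))[:k]
    algorithms = list(rankings[0])
    # Average rank of each algorithm over the neighbours (lower = better)
    mean_rank = {a: sum(rankings[i][a] for i in neighbours) / len(neighbours)
                 for a in algorithms}
    return sorted(algorithms, key=mean_rank.get)


# Hypothetical metadata: three past datasets, two metafeatures each
meta_features = [[0.10, 0.5], [0.20, 0.6], [0.90, 0.1]]
rankings = [{"SVD": 1, "NMF": 2, "SlopeOne": 3},
            {"SVD": 2, "NMF": 3, "SlopeOne": 1},
            {"SVD": 3, "NMF": 1, "SlopeOne": 2}]
print(knn_label_ranking(meta_features, rankings, query=[0.12, 0.5]))
# prints ['SVD', 'SlopeOne', 'NMF']
```

In practice the metafeatures would be scaled to comparable ranges before computing distances; otherwise one large-valued feature dominates the neighbour search.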

2018

CF4CF: Recommending Collaborative Filtering algorithms using Collaborative Filtering

Authors
Cunha, T; Soares, C; de Carvalho, ACPLF;

Publication
12TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS)

Abstract
As Collaborative Filtering becomes increasingly important in both academic and industrial recommendation solutions, it also becomes imperative to study the algorithm selection task in this domain. This problem aims at finding automatic solutions which enable the selection of the best algorithms for a new problem, without performing full-fledged training and validation procedures. Existing work in this area includes several approaches using Metalearning, which relate the characteristics of the problem domain to the performance of the algorithms. This study explores an alternative approach to this problem. Since, in essence, the algorithm selection problem is a recommendation problem, we investigate the use of Collaborative Filtering algorithms to select Collaborative Filtering algorithms. The proposed approach, named CF4CF, integrates subsampling landmarkers, a data characterization approach commonly used in Metalearning, with a Collaborative Filtering methodology. The predictive performance obtained by CF4CF on benchmark recommendation datasets was similar or superior to that obtained with Metalearning.
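The core idea of CF4CF is to treat algorithm selection itself as a recommendation problem: datasets play the role of users, algorithms the role of items, and performance estimates the role of ratings. The sketch below shows that framing with one deliberately simple model, a user-based nearest-neighbour CF with an inverse-Euclidean similarity; this model choice is an assumption for illustration, not necessarily one of the CF algorithms evaluated in the paper. The new dataset starts with a few known scores (standing in for subsampling landmarkers) and the model fills in the rest. All dataset names, algorithm names and scores are hypothetical, and higher scores are assumed to be better.

```python
import math


def predict_missing(known, partial, k=2):
    """User-based kNN collaborative filtering over a dataset x algorithm matrix.

    known:   {dataset: {algorithm: score}} with complete rows for past datasets
    partial: {algorithm: score} for the new dataset (e.g. landmarker estimates)
    Returns {algorithm: predicted score} for algorithms absent from `partial`.
    """
    observed = list(partial)

    def similarity(row):
        # 1 / (1 + Euclidean distance) over the algorithms already scored
        return 1.0 / (1.0 + math.dist([row[a] for a in observed],
                                      [partial[a] for a in observed]))

    neighbours = sorted(known.values(), key=similarity, reverse=True)[:k]
    weights = [similarity(row) for row in neighbours]
    missing = [a for a in next(iter(known.values())) if a not in partial]
    # Similarity-weighted average of the neighbours' scores for each missing algorithm
    return {a: sum(w * row[a] for w, row in zip(weights, neighbours)) / sum(weights)
            for a in missing}


def recommend(known, partial, k=2):
    """Rank all algorithms for the new dataset, best (highest score) first."""
    scores = {**partial, **predict_missing(known, partial, k)}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical performance matrix: rows = past datasets, columns = algorithms
known = {
    "d1": {"SVD": 0.90, "NMF": 0.70, "SlopeOne": 0.50},
    "d2": {"SVD": 0.40, "NMF": 0.80, "SlopeOne": 0.60},
    "d3": {"SVD": 0.85, "NMF": 0.65, "SlopeOne": 0.55},
}
print(recommend(known, {"SVD": 0.88, "NMF": 0.68}))
```

Any other rating-prediction model (for example a matrix factorization) could be swapped in for the neighbour model without changing the dataset-as-user, algorithm-as-item framing.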

2018

Machine Learning for Drugs Prescription

Authors
Silva, P; Rivolli, A; Rocha, P; Correia, F; Soares, C;

Publication
Intelligent Data Engineering and Automated Learning - IDEAL 2018 - 19th International Conference, Madrid, Spain, November 21-23, 2018, Proceedings, Part I

Abstract
In a medical appointment, patient information, including past exams, is analyzed in order to define a diagnosis. This process is prone to errors, since there may be many possible diagnoses, and it depends heavily on the experience of the doctor. Even with the correct diagnosis, prescribing medicines can be a problem, because there are multiple drugs for each disease and some may not be usable due to allergies or high cost. Therefore, it would be helpful if doctors could use a system that, for each diagnosis, provided a list of the most suitable medicines. Our approach aims to support the physician in this process. Rather than trying to predict a single medicine, we aim to predict, given the available information, the set of the most likely drugs. The prescription problem may be solved as a Multi-Label classification problem since, for each diagnosis, multiple drugs may be prescribed at the same time. Due to its complexity, some simplifications were made to make the problem tractable, so several approaches were developed under different assumptions. The supplied data was also complex and had significant quality problems, which led to a strong investment in data preparation, in particular feature engineering. Overall, the results are good in every scenario, with performance almost twice that of the baseline, especially when using Binary Relevance as the transformation approach. © 2018, Springer Nature Switzerland AG.
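Binary Relevance, the transformation approach this abstract highlights, decomposes a multi-label problem into one independent binary classifier per label. For prescriptions, each drug gets its own yes/no model, and the predicted prescription is the set of drugs whose classifier answers yes. The sketch below shows only that decomposition. The toy nearest-centroid base classifier, the two-feature patient vectors and the four drug labels are hypothetical stand-ins, not the features or learners used in the study.

```python
import math


class NearestCentroid:
    """Toy binary base learner: predicts 1 if x is closer to the positive-class centroid."""

    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.pos = self._mean(pos) if pos else None
        self.neg = self._mean(neg) if neg else None
        return self

    @staticmethod
    def _mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def predict(self, x):
        if self.pos is None:   # label never seen as positive: never predict it
            return 0
        if self.neg is None:   # label always positive in training
            return 1
        return int(math.dist(x, self.pos) < math.dist(x, self.neg))


class BinaryRelevance:
    """Train one independent binary classifier per label column."""

    def __init__(self, make_classifier=NearestCentroid):
        self.make_classifier = make_classifier

    def fit(self, X, Y):
        n_labels = len(Y[0])
        self.models = [self.make_classifier().fit(X, [row[j] for row in Y])
                       for j in range(n_labels)]
        return self

    def predict(self, x):
        # The predicted label set is the concatenation of the per-label decisions
        return [m.predict(x) for m in self.models]


# Hypothetical data: 2 features per patient, 4 drugs (drug_d is never prescribed)
drugs = ["drug_a", "drug_b", "drug_c", "drug_d"]
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
Y = [[1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]]
model = BinaryRelevance().fit(X, Y)
print([d for d, p in zip(drugs, model.predict([0.95, 0.05])) if p])
```

Because each label is learned in isolation, Binary Relevance ignores dependencies between drugs, and a label with no positive training examples (like the hypothetical drug_d) is simply never predicted.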

2018

Bandit-Based Automated Machine Learning

Authors
Das Dores, SCN; Soares, C; Ruiz, D;

Publication
2018 7TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS)

Abstract
Machine Learning (ML) has been successfully applied to a wide range of domains and applications. Since the number of ML applications is growing, there is a need for tools that boost data scientists' productivity. Automated Machine Learning (AutoML) is the field of ML that aims to address these needs through the development of solutions which enable data science practitioners, experts and non-experts alike, to efficiently create fine-tuned predictive models with minimal intervention. In this paper, we present the application of the multi-armed bandit optimization algorithm Hyperband to the AutoML problem of generating customized classification workflows, i.e. combinations of preprocessing methods and ML algorithms, including hyperparameter optimization. Experimental results comparing the bandit-based approach against AutoML Bayesian Optimization methods show that this new approach is superior to the state-of-the-art methods in the test evaluation and equivalent to them in a statistical analysis.
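Hyperband splits a fixed budget across several brackets of successive halving. Each bracket samples a number of configurations, evaluates them with a small resource, keeps the best 1/eta of them, and repeats with eta times more resource, trading off many cheap trials against few expensive ones. The following is a minimal, general sketch of Hyperband, assuming a user-supplied configuration sampler and evaluation function; it is not the implementation used in the paper. The workflow space (scaler, algorithm, C), the synthetic loss and the defaults max_resource=27 and eta=3 are all hypothetical.

```python
import math
import random


def hyperband(sample_config, evaluate, max_resource=27, eta=3):
    """Minimal Hyperband; returns (best_loss, best_config) over all evaluations."""
    # s_max = floor(log_eta(max_resource)), computed with integers to avoid float error
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1
    best_loss, best_config = math.inf, None
    for s in range(s_max, -1, -1):                       # one bracket per s
        n = ((s_max + 1) * eta ** s + s) // (s + 1)      # ceil((s_max+1) * eta^s / (s+1))
        r = max_resource / eta ** s                      # initial resource per config
        configs = [sample_config() for _ in range(n)]
        for i in range(s + 1):                           # successive halving rounds
            r_i = r * eta ** i
            losses = [evaluate(c, r_i) for c in configs]
            for loss, c in zip(losses, configs):
                if loss < best_loss:
                    best_loss, best_config = loss, c
            keep = max(len(configs) // eta, 1)           # survivors for the next round
            order = sorted(range(len(configs)), key=lambda j: losses[j])
            configs = [configs[j] for j in order[:keep]]
    return best_loss, best_config


# Hypothetical workflow space: a preprocessing step, an algorithm and one hyperparameter
rng = random.Random(0)
ALGORITHMS = {"svm": 0.0, "tree": 0.2, "knn": 0.3}       # synthetic per-algorithm penalty


def sample_workflow():
    return {"scaler": rng.choice(["none", "standard"]),
            "algorithm": rng.choice(list(ALGORITHMS)),
            "C": 10 ** rng.uniform(-3, 3)}


def toy_loss(cfg, resource):
    # Synthetic stand-in for validation error; more resource -> smaller 1/resource term
    return ALGORITHMS[cfg["algorithm"]] + abs(math.log10(cfg["C"])) / 10 + 1 / resource


loss, cfg = hyperband(sample_workflow, toy_loss)
print(round(loss, 3), cfg["algorithm"])
```

With max_resource=27 and eta=3, the four brackets start with 27, 12, 6 and 4 configurations, so the sketch makes 69 evaluations in total. In a real AutoML setting the resource would typically be training iterations or a data subsample size.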

2018

Analysing the Footprint of Classifiers in Overlapped and Imbalanced Contexts

Authors
Mercier, M; Santos, MS; Abreu, PH; Soares, C; Soares, JP; Santos, J;

Publication
Advances in Intelligent Data Analysis XVII - 17th International Symposium, IDA 2018, 's-Hertogenbosch, The Netherlands, October 24-26, 2018, Proceedings

Abstract
It is recognised that the imbalanced data problem is aggravated by other difficulty factors, such as class overlap. Over the years, several research works have focused on this problem, although they present two major shortcomings: the limited range of test domains and the lack of a formulation of the overlap degree, which make results hard to generalise. This work studies the performance degradation of classifiers with distinct learning biases in overlapped and imbalanced contexts, focusing on the characteristics of the test domains (shape, dimensionality and imbalance ratio) and on the extent to which our proposed overlap measure (degOver) aligns with the observed performance results. Our results show that MLP and CART classifiers are the most robust to high levels of class overlap, even for complex domains, and that KNN and linear SVM are the most aligned with degOver. Furthermore, we found that the dimensionality of the data also plays an important role in explaining performance results. © Springer Nature Switzerland AG 2018.

2018

Label Expansion for Multi-Label Classification

Authors
Rivolli, A; Soares, C; de Carvalho, ACPLF;

Publication
2018 7TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS)

Abstract
In multi-label classification tasks, instances are simultaneously associated with multiple labels, representing different and possibly related concepts from a domain. One characteristic of these tasks is high class-label imbalance. In order to obtain improved predictive models, several algorithms have either explored label dependencies or dealt with the problem of imbalanced labels. This work proposes a label expansion approach which combines both alternatives. To this end, some labels are expanded with data from a related class label, making the labels more balanced and representative. Preliminary experiments show the effectiveness of this approach in improving the Binary Relevance strategy. In particular, it reduced the number of labels that were never predicted in the test instances. Although the results are preliminary, they are potentially attractive, considering the scale and consistency of the improvement obtained, as well as the broad scope of the proposed approach.