Publications

Publications by LIAAD

2018

SMOTEBoost for Regression: Improving the Prediction of Extreme Values

Authors
Moniz, N; Ribeiro, RP; Cerqueira, V; Chawla, N;

Publication
2018 IEEE 5TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA)

Abstract
Supervised learning with imbalanced domains is one of the biggest challenges in machine learning. Such tasks differ from standard learning tasks in that they assume a skewed distribution of the target variable and a user preference for under-represented cases. Most research has focused on imbalanced classification tasks, where a wide range of solutions has been tested. Still, little work has been done on imbalanced regression tasks. In this paper, we propose an adaptation of the SMOTEBoost approach to the problem of imbalanced regression. SMOTEBoost, originally designed for classification tasks, combines boosting methods with the SMOTE resampling strategy. We present four variants of SMOTEBoost and provide an experimental evaluation on 30 datasets, with an extensive analysis of the results, to assess the ability of SMOTEBoost methods to predict extreme target values and their predictive trade-off relative to baseline boosting methods. SMOTEBoost is publicly available in a software package.
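The abstract does not spell out how SMOTE is carried over to a numeric target. The sketch below shows only the resampling step, assuming a SMOTER-style rule: features of a synthetic case are interpolated between a rare seed case and one of its rare neighbours, and its target is a distance-weighted average of the two original targets. The boosting loop that SMOTEBoost wraps around this step is omitted, and the rarity test `is_rare` and all function names are hypothetical.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smoter_synthetic(seed_x, seed_y, neigh_x, neigh_y, rng):
    """One synthetic case between a rare seed and a rare neighbour (SMOTER-style, assumed)."""
    gap = rng.random()
    new_x = [s + gap * (n - s) for s, n in zip(seed_x, neigh_x)]
    d_seed = euclidean(new_x, seed_x)
    d_neigh = euclidean(new_x, neigh_x)
    if d_seed + d_neigh == 0:  # seed and neighbour coincide
        return new_x, (seed_y + neigh_y) / 2
    # The original case closer to the new point gets the larger weight.
    new_y = (d_neigh * seed_y + d_seed * neigh_y) / (d_seed + d_neigh)
    return new_x, new_y


def oversample_rare(X, y, is_rare, n_new, k=3, seed=0):
    """Generate n_new synthetic rare cases; needs at least two rare cases."""
    rng = random.Random(seed)
    rare = [i for i in range(len(y)) if is_rare(y[i])]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(rare)
        # k nearest rare neighbours of the seed, excluding the seed itself
        neigh = sorted((j for j in rare if j != i),
                       key=lambda j: euclidean(X[i], X[j]))[:k]
        j = rng.choice(neigh)
        synthetic.append(smoter_synthetic(X[i], y[i], X[j], y[j], rng))
    return synthetic
```

Because the target weights mirror the feature interpolation, each synthetic target lies between the targets of the two cases it was built from, so new cases stay in the extreme region of the target range.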

2018

REBAGG: REsampled BAGGing for Imbalanced Regression

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
Second International Workshop on Learning with Imbalanced Domains: Theory and Applications, LIDTA@ECML/PKDD 2018, Dublin, Ireland, September 10, 2018

Abstract

2018

Comparing Reverse Complementary Genomic Words Based on Their Distance Distributions and Frequencies

Authors
Tavares, AH; Raymaekers, J; Rousseeuw, PJ; Silva, RM; Bastos, CAC; Pinho, A; Brito, P; Afreixo, V;

Publication
INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES

Abstract
In this work, we study reverse complementary genomic word pairs in human DNA by comparing both the distance distribution and the frequency of a word to those of its reverse complement. Several measures of dissimilarity between distance distributions are considered, and it is found that the peak dissimilarity works best in this setting. We report the existence of reverse complementary word pairs with very dissimilar distance distributions, as well as word pairs with very similar distance distributions even when both distributions are irregular and contain strong peaks. The association between distribution dissimilarity and frequency discrepancy is also explored, and it is speculated that symmetric pairs combining low and high values of each measure may uncover features of interest. Taken together, our results suggest that some asymmetries in the human genome go far beyond Chargaff's rules. This study uses both the complete human genome and its repeat-masked version.
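To make the objects in this abstract concrete, the sketch below computes a word's reverse complement and its distance distribution, taken here as the empirical distribution of gaps between consecutive (possibly overlapping) occurrences. The paper's preferred peak dissimilarity is not reproduced; total variation distance is swapped in purely as a simple stand-in, and all function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Complement each base, then reverse the word."""
    return word.translate(COMPLEMENT)[::-1]


def occurrences(seq, word):
    """Start positions of all (possibly overlapping) occurrences of word in seq."""
    k = len(word)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == word]


def distance_distribution(seq, word):
    """Relative frequencies of gaps between consecutive occurrences."""
    pos = occurrences(seq, word)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    if not gaps:
        return {}
    counts = Counter(gaps)
    return {d: c / len(gaps) for d, c in counts.items()}


def total_variation(p, q):
    """Stand-in dissimilarity between two distance distributions (not peak dissimilarity)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(d, 0.0) - q.get(d, 0.0)) for d in support)
```

Comparing `distance_distribution(seq, w)` with `distance_distribution(seq, reverse_complement(w))`, alongside the two occurrence counts, gives the pair of measures (distribution dissimilarity and frequency discrepancy) that the abstract relates.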

2018

Outlier detection in interval data

Authors
Silva, APD; Filzmoser, P; Brito, P;

Publication
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION

Abstract
A multivariate outlier detection method for interval data is proposed that uses a parametric approach to model the interval data. The trimmed maximum likelihood principle is adapted to robustly estimate the model parameters. A simulation study demonstrates the usefulness of the robust estimates for outlier detection, and new diagnostic plots provide deeper insight into the structure of real-world interval data.
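A minimal sketch of the idea, under several assumptions not stated in the abstract: each interval is represented by its midpoint and log-range, these two values are modelled as bivariate Gaussian (one interval variable only), and trimmed maximum likelihood is approximated by concentration steps that refit the mean and covariance on the h best-fitting points. Points whose squared Mahalanobis distance exceeds the 97.5% chi-square(2) quantile are flagged. No consistency correction is applied to the trimmed covariance, and all names are hypothetical.

```python
import math

# 97.5% quantile of chi-square with 2 d.f.: its CDF is 1 - exp(-x/2).
CUTOFF = -2.0 * math.log(1.0 - 0.975)


def to_features(intervals):
    """Map each interval (lower, upper), upper > lower, to (midpoint, log-range)."""
    return [((lo + up) / 2.0, math.log(up - lo)) for lo, up in intervals]


def mean_cov(points):
    """Mean vector and population covariance entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def mahalanobis2(p, mean, cov):
    """Squared Mahalanobis distance via the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def trimmed_fit(points, h, max_iter=50):
    """Refit the Gaussian on the h closest points until the subset stops changing."""
    mean, cov = mean_cov(points)
    subset = set()
    for _ in range(max_iter):
        ranked = sorted(range(len(points)),
                        key=lambda i: mahalanobis2(points[i], mean, cov))
        new_subset = set(ranked[:h])
        if new_subset == subset:
            break
        subset = new_subset
        mean, cov = mean_cov([points[i] for i in subset])
    return mean, cov


def interval_outliers(intervals, trim=0.25):
    """Indices of intervals flagged as outliers under the trimmed Gaussian fit."""
    points = to_features(intervals)
    h = int(math.ceil((1.0 - trim) * len(points)))
    mean, cov = trimmed_fit(points, h)
    return [i for i, p in enumerate(points) if mahalanobis2(p, mean, cov) > CUTOFF]
```

Trimming matters because a single extreme interval inflates a full-sample covariance and can mask itself; fitting on the best h points keeps it out of the estimate.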

2018

Metalearning and Recommender Systems: A literature review and empirical study on the algorithm selection problem for Collaborative Filtering

Authors
Cunha, T; Soares, C; de Carvalho, ACPLF;

Publication
INFORMATION SCIENCES

Abstract
The problem of information overload motivated the appearance of Recommender Systems. Among the open problems in this area, deciding which recommendation algorithm is best for a specific problem is one of the most important and least studied. The current trend to solve this problem is the experimental evaluation of several recommendation algorithms on a handful of datasets. However, these studies require an extensive amount of computational resources, particularly processing time. To avoid these drawbacks, researchers have investigated the use of Metalearning to select the best recommendation algorithms in different scopes. Such studies make it possible to understand the relationships between data characteristics and the relative performance of recommendation algorithms, which can be used to select the best algorithm(s) for a new problem. The contributions of this study are twofold: 1) to identify and discuss the key concepts of algorithm selection for recommendation algorithms via a systematic literature review and 2) to perform an experimental study on the Metalearning approaches reviewed in order to identify the most promising concepts for automatic selection of recommendation algorithms.
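The metalearning loop described above (characterise a dataset, then reuse past performance on similar datasets) can be sketched as follows. The metafeatures chosen here (log user and item counts, rating density, rating mean and standard deviation) and the 1-nearest-neighbour selector over a meta-database are illustrative assumptions, not the set reviewed in the paper; the algorithm labels in the usage example are hypothetical.

```python
import math
from statistics import mean, pstdev


def metafeatures(ratings):
    """Simple metafeatures from (user, item, rating) triples (illustrative choice)."""
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    values = [r for _, _, r in ratings]
    density = len(ratings) / (len(users) * len(items))
    return [math.log(len(users)), math.log(len(items)), density,
            mean(values), pstdev(values)]


def select_algorithm(meta_db, new_features):
    """meta_db: list of (metafeature vector, best algorithm). Returns the 1-NN's best algorithm."""
    cols = list(zip(*(f for f, _ in meta_db)))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid division by zero

    def z(v):
        return [(x - m) / s for x, m, s in zip(v, mus, sds)]

    target = z(new_features)
    _, best = min(meta_db, key=lambda row: math.dist(z(row[0]), target))
    return best
```

For example, `select_algorithm([([0, 0, 0.5, 3, 1], "ALS"), ([5, 5, 0.01, 4, 0.5], "ItemKNN")], [0.1, 0.1, 0.5, 3, 1])` returns `"ALS"`, since the new dataset's standardised metafeatures are closest to the first entry. Standardising first keeps large-scale features such as user counts from dominating the distance.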

2018

A Label Ranking approach for selecting rankings of Collaborative Filtering algorithms

Authors
Cunha, T; Soares, C; de Carvalho, ACPLF;

Publication
33RD ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING

Abstract
The large number of Recommender System algorithms makes the selection of the most suitable algorithm for a new dataset a difficult task. Metalearning has been successfully used to deal with this problem. It works by mapping dataset characteristics to the predictive performance obtained by a set of algorithms. The models built on this data are capable of predicting the best algorithm for a new dataset. However, typical approaches try only to predict the best algorithm, overlooking the performance of others. This study focuses on the use of Metalearning to select the best ranking of Collaborative Filtering (CF) algorithms for a new recommendation dataset. The contribution lies in the formalization and experimental validation of using Label Ranking to select a ranked list of algorithms. The experimental procedure shows the superior performance of the proposed approach regarding both ranking accuracy and impact on the baselevel performance. Furthermore, it extracts and compares knowledge about metafeature importance for both classification and Label Ranking tasks in order to provide guidelines for the design of algorithms in the Recommender System community.
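To illustrate what predicting a ranked list (rather than a single best algorithm) involves, here is a sketch of one common label ranking baseline: average the algorithm ranks of the k nearest datasets in metafeature space, then score the prediction with Spearman's rank correlation. This is not necessarily the Label Ranking method used in the paper; the algorithm names and function names are hypothetical.

```python
import math


def knn_label_ranking(meta_db, new_features, k=2):
    """meta_db: list of (metafeatures, {algorithm: rank}), rank 1 = best."""
    nearest = sorted(meta_db, key=lambda row: math.dist(row[0], new_features))[:k]
    algorithms = list(nearest[0][1])
    avg = {a: sum(ranks[a] for _, ranks in nearest) / len(nearest) for a in algorithms}
    # Order by average rank; ties are broken alphabetically for determinism.
    ordered = sorted(algorithms, key=lambda a: (avg[a], a))
    return {a: pos + 1 for pos, a in enumerate(ordered)}


def spearman(r1, r2):
    """Spearman's rho for two untied rankings over the same algorithms."""
    n = len(r1)
    d2 = sum((r1[a] - r2[a]) ** 2 for a in r1)
    return 1 - 6 * d2 / (n * (n * n - 1))
```

A correlation of 1 means the predicted ranking matches the true one exactly and -1 means it is fully reversed, so this score rewards getting the whole ordering right, not just the top algorithm.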
