Publications

Publications by LIAAD

2017

Off the beaten track: A new linear model for interval data

Authors
Dias, S; Brito, P;

Publication
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH

Abstract
We propose a new linear regression model for interval-valued variables. The model uses quantile functions to represent the intervals, thereby considering the distributions within them. In this paper we study the special case where the Uniform distribution is assumed in each observed interval, and we analyze the extension to the Symmetric Triangular distribution. The parameters of the model are obtained by solving a constrained quadratic optimization problem that uses the Mallows distance between quantile functions. As in the classical case, a goodness-of-fit measure is deduced. Two applications to current topics are presented: one predicting the duration of unemployment and the other forecasting the area burned by forest fires.
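
As a rough illustration of the distance mentioned in this abstract, the sketch below computes the squared Mallows (L2 Wasserstein) distance between two intervals when a Uniform distribution is assumed within each. It covers only the distance, not the authors' regression model or its constraints; the function names and the closed form in terms of centers and half-ranges are assumptions introduced here for illustration.

```python
def uniform_quantile(lower, upper, t):
    """Quantile function of a Uniform distribution on [lower, upper], t in [0, 1]."""
    return lower + t * (upper - lower)


def mallows_sq_uniform(a, b):
    """Squared Mallows distance between intervals a and b under uniformity.

    With center c and half-range r, the quantile function is c + (2t - 1) r,
    so the integral of the squared difference reduces to
    (c1 - c2)^2 + (r1 - r2)^2 / 3.
    """
    c1, r1 = (a[0] + a[1]) / 2, (a[1] - a[0]) / 2
    c2, r2 = (b[0] + b[1]) / 2, (b[1] - b[0]) / 2
    return (c1 - c2) ** 2 + (r1 - r2) ** 2 / 3


def mallows_sq_numeric(a, b, n=10000):
    """Midpoint-rule approximation of the same integral, as a sanity check."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        d = uniform_quantile(a[0], a[1], t) - uniform_quantile(b[0], b[1], t)
        total += d * d
    return total / n
```

For example, `mallows_sq_uniform((0, 2), (1, 5))` and `mallows_sq_numeric((0, 2), (1, 5))` should agree to within numerical error.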


2017

Exploratory data analysis for interval compositional data

Authors
Hron, K; Brito, P; Filzmoser, P;

Publication
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION

Abstract
Compositional data are considered as data where relative contributions of parts to a whole, conveyed by (log-)ratios between them, are essential for the analysis. In Symbolic Data Analysis (SDA), we are in the framework of interval data when elements are characterized by variables whose values are intervals representing inherent variability. In this paper, we address the special problem of the analysis of interval compositions, i.e., when the interval data are obtained by the aggregation of compositions. It is assumed that the interval information is represented by the respective midpoints and ranges, and both sources of information are considered as compositions. In this context, we introduce the representation of interval data as three-way data. In the framework of the log-ratio approach from compositional data analysis, it is outlined how interval compositions can be treated in an exploratory context. The goal of the analysis is to represent the compositions by coordinates which are interpretable in terms of the original compositional parts. This is achieved by summarizing all relative information (log-ratios) about each part into one coordinate from the coordinate system. Based on an example from the European Union Statistics on Income and Living Conditions (EU-SILC), several possibilities for an exploratory data analysis approach for interval compositions are outlined and investigated.


2017

Dissimilar Symmetric Word Pairs in the Human Genome

Authors
Tavares, AH; Raymaekers, J; Rousseeuw, PJ; Silva, RM; Bastos, CAC; Pinho, AJ; Brito, P; Afreixo, V;

Publication
11th International Conference on Practical Applications of Computational Biology & Bioinformatics, PACBB 2017, Porto, Portugal, 21-23 June, 2017

Abstract
In this work we explore the dissimilarity between symmetric word pairs, by comparing the inter-word distance distribution of a word to that of its reversed complement. We propose a new measure of dissimilarity between such distributions. Since symmetric pairs with different patterns could point to evolutionary features, we search for the pairs with the most dissimilar behaviour. We focus our study on the complete human genome and its repeat-masked version. © Springer International Publishing AG 2017.
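
To make the objects in this abstract concrete, the sketch below builds the reversed complement of a DNA word and the empirical distribution of distances between its occurrences. It assumes that "inter-word distance" means the gap between the start positions of consecutive (possibly overlapping) occurrences; that definition and all function names are illustrative assumptions, and the dissimilarity measure proposed by the authors is not reproduced here.

```python
from collections import Counter

# Translation table mapping each nucleotide to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reversed_complement(word):
    """Complement every nucleotide, then reverse the word."""
    return word.translate(_COMPLEMENT)[::-1]


def inter_word_distances(sequence, word):
    """Gaps between start positions of consecutive occurrences of `word`."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        # Advance by one so that overlapping occurrences are also found.
        start = sequence.find(word, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_distribution(sequence, word):
    """Relative frequency of each inter-word distance (empty if fewer than 2 hits)."""
    distances = inter_word_distances(sequence, word)
    if not distances:
        return {}
    counts = Counter(distances)
    return {d: n / len(distances) for d, n in sorted(counts.items())}
```

A symmetric word pair can then be compared by calling `distance_distribution(seq, w)` and `distance_distribution(seq, reversed_complement(w))` on the same sequence `seq`.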


2017

Arbitrated Ensemble for Solar Radiation Forecasting

Authors
Cerqueira, V; Torgo, L; Soares, C;

Publication
ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2017, PT I

Abstract
Utility companies rely on solar radiation forecasting models to control the supply and demand of energy as well as the operability of the grid. They use these predictive models to schedule power plant operations, negotiate prices in the electricity market and improve the performance of solar technologies in general. This paper proposes a novel method for global horizontal irradiance forecasting. The method is based on an ensemble approach, in which individual competing models are arbitrated by a metalearning layer. The goal of arbitrating individual forecasters is to dynamically combine them according to their aptitude in the input data. We validate our proposed model for solar radiation forecasting using data collected by a real-world provider. The results from empirical experiments show that the proposed method is competitive with other methods, including current state-of-the-art methods used for time series forecasting tasks.


2017

Inductive Transfer

Authors
Vilalta, R; Giraud Carrier, CG; Brazdil, P; Soares, C;

Publication
Encyclopedia of Machine Learning and Data Mining

Abstract
We describe different scenarios where a learning mechanism is capable of acquiring experience on a source task and subsequently exploiting that experience on a target task. The core ideas behind this ability to transfer knowledge from one task to another have been studied in the machine learning literature under different titles and perspectives. Here we describe some of them under the names of inductive transfer, transfer learning, multitask learning, meta-searching, meta-generalization, and domain adaptation. © Springer Science+Business Media New York 2011, 2017


2017

Label Ranking Forests

Authors
de Sa, CR; Soares, C; Knobbe, A; Cortez, P;

Publication
EXPERT SYSTEMS

Abstract
The problem of Label Ranking is receiving increasing attention from several research communities. Algorithms have been developed or adapted to treat rankings of a fixed set of labels as the target object, including several different types of decision trees (DT). One DT-based algorithm that has been very successful in other tasks but has not been adapted for label ranking is the Random Forests (RF) algorithm. RFs are an ensemble learning method that combines different trees obtained using different randomization techniques. In this work, we propose an ensemble of decision trees for Label Ranking, based on Random Forests, which we refer to as Label Ranking Forests (LRF). Two different algorithms that learn DT for label ranking are used to obtain the trees. We then compare and discuss the results of LRF with standalone decision tree approaches. The results indicate that the method is highly competitive.
