Publications

Publications by LIAAD

2006

Rule-based prediction of rare extreme values

Authors
Ribeiro, R; Torgo, L;

Publication
DISCOVERY SCIENCE, PROCEEDINGS

Abstract
This paper describes a rule learning method that obtains models biased towards a particular class of regression tasks. The main distinguishing feature of these tasks is that the goal is to be accurate at predicting rare extreme values of the continuous target variable. Many real-world applications from scientific areas like ecology, meteorology, finance, etc., share this objective. Most existing approaches to regression problems search for the model parameters that optimize a given average error estimator (e.g. mean squared error). This means that they are biased towards achieving a good performance on the most common cases. The motivation for our work is the claim that being accurate at a small set of rare cases requires different error metrics. Moreover, given the nature and relevance of this type of application, an interpretable model is usually of key importance to domain experts, as predicting these rare events is normally associated with costly decisions. Our proposed system (R-PREV) obtains a set of interpretable regression rules derived from a set of bagged regression trees using evaluation metrics that bias the resulting models to accurately predict rare extreme values. We provide an experimental evaluation of our method confirming the advantages of our proposal in terms of accuracy in predicting rare extreme values.


2006

Predicting rare extreme values

Authors
Torgo, L; Ribeiro, R;

Publication
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS

Abstract
Modelling extreme data is very important in several application domains, such as finance, meteorology, ecology, etc. This paper addresses the problem of predicting extreme values of a continuous variable. The main distinguishing feature of our target applications resides in the fact that these values are rare. Any prediction model is obtained by some sort of search process guided by a pre-specified evaluation criterion. In this work we argue against the use of standard criteria for evaluating regression models in the context of our target applications. We propose a new predictive performance metric for this class of problems that our experiments show to perform better in distinguishing models that are more accurate at rare extreme values. This new evaluation metric could be used as the basis for developing better models in terms of rare extreme values prediction.
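The argument against standard criteria can be illustrated with a small sketch. It does not implement the metric proposed in the paper, which the abstract does not define. The toy data, the two hypothetical models, and the threshold of 40 used to mark "extreme" targets are all assumptions chosen for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error over all cases."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy target: 98 common cases and 2 rare extreme cases (assumed data).
y_true = [1.0] * 98 + [50.0, 60.0]

# Hypothetical model A: perfect on common cases, ignores the extremes.
model_a = [1.0] * 100
# Hypothetical model B: noticeably off on common cases, exact on the extremes.
model_b = [9.0] * 98 + [50.0, 60.0]

# Restrict the evaluation to the extreme cases (threshold is an assumption).
EXTREME_THRESHOLD = 40.0
extreme_idx = [i for i, t in enumerate(y_true) if t > EXTREME_THRESHOLD]

def mse_on(indices, y_true, y_pred):
    """Mean squared error restricted to the given case indices."""
    return mse([y_true[i] for i in indices], [y_pred[i] for i in indices])

overall_a, overall_b = mse(y_true, model_a), mse(y_true, model_b)
extreme_a = mse_on(extreme_idx, y_true, model_a)
extreme_b = mse_on(extreme_idx, y_true, model_b)
```

In this toy setting the overall mean squared error prefers model A, even though model A is the worse of the two on the rare extreme cases, which is the kind of bias the abstract argues against.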


2006

Organizational survival in cooperation networks: The case of automobile manufacturing

Authors
Campos, P; Brazdil, P; Brito, P;

Publication
Network-Centric Collaboration and Supporting Frameworks

Abstract
We propose a Multi-Agent framework to analyze the dynamics of organizational survival in cooperation networks. Firms can decide to cooperate horizontally (in the same market) or vertically with other firms that belong to the supply chain. Cooperation decisions are based on economic variables. We have defined a variant of the density dependence model to set up the dynamics of the survival in the simulation. To validate our model, we have used empirical outputs obtained in previous studies from the automobile manufacturing sector. We have observed that firms and networks proliferate in the regions with lower marginal costs, but new networks keep appearing and disappearing in regions with higher marginal costs.


2006

Dynamic clustering for interval data based on L2 distance

Authors
de Carvalho, FDAT; Brito, P; Bock, HH;

Publication
COMPUTATIONAL STATISTICS

Abstract
This paper introduces a partitioning clustering method for objects described by interval data. It follows the dynamic clustering approach and uses an L2 distance. Particular emphasis is put on the standardization problem where we propose and investigate three standardization techniques for interval-type variables. Moreover, various tools for cluster interpretation are presented and illustrated by simulated and real-case data.
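As a rough illustration of a distance of this kind, the sketch below compares two objects described by interval variables by summing the squared differences of the lower bounds and of the upper bounds. This is one common formulation and is an assumption here; the abstract does not state the exact definition used in the paper, and the function name and sample values are hypothetical.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)

def squared_l2_distance(a: Sequence[Interval], b: Sequence[Interval]) -> float:
    """Sum, over variables, of squared lower-bound and upper-bound differences."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of interval variables")
    return sum(
        (a_lo - b_lo) ** 2 + (a_hi - b_hi) ** 2
        for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b)
    )

# Two objects, each described by two interval variables (assumed values).
x = [(1.0, 3.0), (10.0, 12.0)]
y = [(2.0, 5.0), (10.0, 11.0)]
d = squared_l2_distance(x, y)  # (1-2)^2 + (3-5)^2 + (10-10)^2 + (12-11)^2
```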


2006

Linear discriminant analysis for interval data

Authors
Duarte Silva, APD; Brito, P;

Publication
COMPUTATIONAL STATISTICS

Abstract
This paper compares different approaches to the multivariate analysis of interval data, focusing on discriminant analysis. Three fundamental approaches are considered. The first approach assumes a uniform distribution in each observed interval, derives the corresponding measures of dispersion and association, and appropriately defines linear combinations of interval variables that maximize the usual discriminant criterion. The second approach expands the original data set into the set of all interval description vertices, and proceeds with a classical analysis of the expanded set. Finally, a third approach replaces each interval by a midpoint and range representation. Resulting representations, using intervals or single points, are discussed and distance-based allocation rules are proposed. The three approaches are illustrated on a real data set.
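The data transformations behind the second and third approaches can be sketched as follows. This only shows how one interval-valued observation might be expanded into its vertices or rewritten as midpoints and ranges; it does not cover the discriminant analysis itself, and the function names and sample values are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)

def vertices(obs: Sequence[Interval]) -> List[Tuple[float, ...]]:
    """All corner points of the hyper-rectangle described by the intervals."""
    return list(product(*obs))

def midpoint_range(obs: Sequence[Interval]) -> Tuple[List[float], List[float]]:
    """Midpoint and range of each interval, as two parallel lists."""
    mids = [(lo + hi) / 2 for lo, hi in obs]
    ranges = [hi - lo for lo, hi in obs]
    return mids, ranges

# One observation described by two interval variables (assumed values).
obs = [(1.0, 3.0), (10.0, 14.0)]
corner_points = vertices(obs)       # 2**p points for p interval variables
mids, ranges = midpoint_range(obs)
```

An observation with p interval variables yields 2**p vertices, so the expanded data set grows quickly with the number of variables.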


2006

Symbolic and spatial data analysis: Mining complex data structures

Authors
Brito, P; Noirhomme Fraiture, M;

Publication
INTELLIGENT DATA ANALYSIS

Abstract