2006
Authors
Ribeiro, R; Torgo, L;
Publication
DISCOVERY SCIENCE, PROCEEDINGS
Abstract
This paper describes a rule learning method that obtains models biased towards a particular class of regression tasks. The main distinguishing feature of these tasks is that the goal is to accurately predict rare extreme values of the continuous target variable. Many real-world applications in scientific areas such as ecology, meteorology and finance share this objective. Most existing approaches to regression problems search for the model parameters that optimize a given average error estimator (e.g. mean squared error). This means that they are biased towards achieving good performance on the most common cases. The motivation for our work is the claim that being accurate on a small set of rare cases requires different error metrics. Moreover, given the nature and relevance of this type of application, an interpretable model is usually of key importance to domain experts, as predicting these rare events is normally associated with costly decisions. Our proposed system (R-PREV) obtains a set of interpretable regression rules derived from a set of bagged regression trees, using evaluation metrics that bias the resulting models towards accurately predicting rare extreme values. We provide an experimental evaluation of our method confirming the advantages of our proposal in terms of accuracy in predicting rare extreme values.
2006
Authors
Torgo, L; Ribeiro, R;
Publication
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS
Abstract
Modelling extreme data is very important in several application domains, such as finance, meteorology and ecology. This paper addresses the problem of predicting extreme values of a continuous variable. The main distinguishing feature of our target applications is that these values are rare. Any prediction model is obtained by some sort of search process guided by a pre-specified evaluation criterion. In this work we argue against the use of standard criteria for evaluating regression models in the context of our target applications. We propose a new predictive performance metric for this class of problems which, as our experiments show, is better at distinguishing the models that are more accurate at rare extreme values. This new evaluation metric could be used as the basis for developing models that better predict rare extreme values.
2006
Authors
Campos, P; Brazdil, P; Brito, P;
Publication
Network-Centric Collaboration and Supporting Frameworks
Abstract
We propose a multi-agent framework to analyze the dynamics of organizational survival in cooperation networks. Firms can decide to cooperate horizontally (with firms in the same market) or vertically (with other firms that belong to the supply chain). Cooperation decisions are based on economic variables. We have defined a variant of the density dependence model to set up the survival dynamics in the simulation. To validate our model, we used empirical results obtained in previous studies of the automobile manufacturing sector. We observed that firms and networks proliferate in the regions with lower marginal costs, while new networks keep appearing and disappearing in regions with higher marginal costs.
2006
Authors
de Carvalho, FDAT; Brito, P; Bock, HH;
Publication
COMPUTATIONAL STATISTICS
Abstract
This paper introduces a partitioning clustering method for objects described by interval data. It follows the dynamic clustering approach and uses an L2 distance. Particular emphasis is put on the standardization problem, for which we propose and investigate three standardization techniques for interval-type variables. Moreover, various tools for cluster interpretation are presented and illustrated on simulated and real data.
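To make the dynamic clustering idea concrete, here is a minimal sketch of a k-means-style alternation of allocation and representation steps on interval data. It assumes that the L2 distance compares lower bounds with lower bounds and upper bounds with upper bounds (squared differences summed over variables), and that each cluster prototype is the interval of mean lower and mean upper bounds. The function names are hypothetical, and none of the paper's three standardization techniques is implemented here.

```python
import random
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper)
Obj = List[Interval]            # one interval per variable


def l2_dist(x: Obj, g: Obj) -> float:
    """Squared L2 distance between interval vectors (assumed form):
    lower bounds vs lower bounds, upper vs upper, summed over variables."""
    return sum((a - c) ** 2 + (b - d) ** 2 for (a, b), (c, d) in zip(x, g))


def prototype(members: List[Obj]) -> Obj:
    """Interval of mean lower and mean upper bounds per variable; this
    minimizes the summed squared distance above within a cluster."""
    n, p = len(members), len(members[0])
    return [(sum(m[j][0] for m in members) / n,
             sum(m[j][1] for m in members) / n) for j in range(p)]


def dynamic_clustering(data: List[Obj], k: int, max_iter: int = 100, seed: int = 0):
    rng = random.Random(seed)
    protos = [list(x) for x in rng.sample(data, k)]  # initial prototypes
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Allocation step: assign each object to its nearest prototype.
        new_labels = [min(range(k), key=lambda c: l2_dist(x, protos[c])) for x in data]
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
        # Representation step: recompute the prototype of each non-empty cluster.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                protos[c] = prototype(members)
    return labels, protos
```

For example, `dynamic_clustering([[(0, 1)], [(0.2, 1.1)], [(10, 12)], [(10.5, 12.2)]], k=2)` separates the two low intervals from the two high ones.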
2006
Authors
Duarte Silva, APD; Brito, P;
Publication
COMPUTATIONAL STATISTICS
Abstract
This paper compares different approaches to the multivariate analysis of interval data, focusing on discriminant analysis. Three fundamental approaches are considered. The first approach assumes a uniform distribution in each observed interval, derives the corresponding measures of dispersion and association, and appropriately defines linear combinations of interval variables that maximize the usual discriminant criterion. The second approach expands the original data set into the set of all interval description vertices, and proceeds with a classical analysis of the expanded set. Finally, the third approach replaces each interval by a midpoint and range representation. The resulting representations, using intervals or single points, are discussed and distance-based allocation rules are proposed. The three approaches are illustrated on a real data set.
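The three data representations compared above can be sketched as follows. This is only an illustration of how each representation is built from the raw intervals, not the paper's discriminant procedures. The uniform-distribution moments use the standard univariate formulas for intervals with equal weights (mean of midpoints; second moment (a² + ab + b²)/3 per interval), assumed here; the function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper)


def uniform_mean_var(col: List[Interval]) -> Tuple[float, float]:
    """Mean and variance of one interval variable, assuming values are
    uniformly distributed within each interval and objects weigh equally."""
    n = len(col)
    mean = sum((a + b) / 2 for a, b in col) / n
    second_moment = sum(a * a + a * b + b * b for a, b in col) / (3 * n)
    return mean, second_moment - mean ** 2


def vertices(obj: List[Interval]) -> List[Tuple[float, ...]]:
    """Expand one object with p interval variables into its 2**p vertices."""
    return list(product(*obj))


def midpoint_range(obj: List[Interval]) -> List[Tuple[float, float]]:
    """Replace each interval [a, b] by the point (midpoint, range)."""
    return [((a + b) / 2, b - a) for a, b in obj]
```

As a sanity check, the column `[(0, 2), (2, 4)]` gives mean 2 and variance 4/3, which matches a single uniform distribution on [0, 4].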
2006
Authors
Brito, P; Noirhomme Fraiture, M;
Publication
INTELLIGENT DATA ANALYSIS
Abstract