Publications - INESC TEC

Publications

Publications by LIAAD

2008

Changing seasonality in North Atlantic coastal sea level from the analysis of long tide gauge records

Authors
Barbosa, SM; Silva, ME; Fernandes, MJ;

Publication
TELLUS SERIES A-DYNAMIC METEOROLOGY AND OCEANOGRAPHY

Abstract
Sea level is a key variable in the context of global climate change. Climate-induced variability is expected to affect not only the mean sea level but also the amplitude and phase of its seasonal cycle. This study addresses the changes in the amplitude and phase of the annual cycle of coastal sea level in the extra-tropical North Atlantic. The physical causes of these variations are explored by analysing the association between fluctuations in the annual amplitude of sea level and in ancillary parameters [atmospheric pressure, sea-surface temperature and North Atlantic Oscillation (NAO) winter index]. The annual cycle is extracted through autoregressive decomposition, in order to be able to separate variations in seasonality from long-term interannual variations in the mean. The changes detected in the annual sea level cycle are regionally coherent, and related to changes in the analysed forcing parameters. At the northern sites, fluctuations in the annual amplitude of sea level are associated with concurrent changes in temperature, while atmospheric pressure is the dominant influence for most of the sites on the western boundary. The state of the NAO influences the annual variability in the Southern Bight, possibly through NAO-related changes in wind stress and ocean circulation.


2007

Iterative reordering of rules for building ensembles without relearning

Authors
Azevedo, PJ; Jorge, AM;

Publication
DISCOVERY SCIENCE, PROCEEDINGS

Abstract
We study a new method for improving the classification accuracy of a model composed of classification association rules (CAR). The method consists of reordering the original set of rules according to the error rates obtained on a set of training examples. This is done iteratively, starting from the original set of rules. After obtaining N models, these are used as an ensemble for classifying new cases. The net effect of this approach is that the original rule model is clearly improved. This improvement is due to the ensembling of the obtained models, which are, individually, slightly better than the original one. This ensembling approach has the advantage of running a single learning process, since the models in the ensemble are obtained by self-replicating the original one.


2007

A tool for interactive subgroup discovery using distribution rules

Authors
Lucas, JP; Jorge, AM; Pereira, F; Pernas, AM; Machado, AA;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
We describe an approach and a tool for the discovery of subgroups within the framework of distribution rule mining. Distribution rules are a kind of association rule particularly suited for the exploratory study of numerical variables of interest. Being an exploratory technique, the result of a distribution mining process is typically a very large number of patterns. Exploring such results is thus a complex task and limits the use of the technique. To overcome this shortcoming we developed a tool, written in Java, which supports subgroup discovery in a post-processing step. The tool engages the analyst in an interactive process of subgroup discovery by means of a graphical interface with well-defined statistical grounds, where domain knowledge can be used during the identification of such subgroups amid the population. We show a case study that analyzes the results of students in a large-scale university admission examination.


2007

Comparing rule measures for predictive association rules

Authors
Azevedo, PJ; Jorge, AM;

Publication
Machine Learning: ECML 2007, Proceedings

Abstract
We study the predictive ability of some association rule measures typically used to assess descriptive interest. Such measures, namely conviction, lift and χ², are compared with confidence, Laplace, mutual information, cosine, Jaccard and phi-coefficient. As prediction models, we use sets of association rules. Classification is done by selecting the best rule, or by weighted voting. We performed an evaluation on 17 datasets with different characteristics and conclude that conviction is on average the best predictive measure to use in this setting. We also provide some meta-analysis insights for explaining the results.
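
As a rough illustration of several of the measures named in this abstract, the sketch below computes them for a single rule A -> C from absolute counts. It uses the commonly cited textbook definitions, which may differ in detail from the variants evaluated in the paper. The function name `rule_measures`, its count parameters, and the choice of two classes in the Laplace estimate are assumptions made for this example only.

```python
import math


def rule_measures(n, n_a, n_c, n_ac):
    """Interest measures for a rule A -> C (hypothetical helper).

    n: total cases; n_a: cases matching the antecedent A;
    n_c: cases matching the consequent C; n_ac: cases matching both.
    Assumes 0 < n_a < n and 0 < n_c < n, so no denominator is zero.
    """
    p_a, p_c, p_ac = n_a / n, n_c / n, n_ac / n
    confidence = n_ac / n_a
    lift = confidence / p_c
    # Conviction is unbounded when the rule never fails on the data.
    conviction = math.inf if confidence == 1 else (1 - p_c) / (1 - confidence)
    # Laplace-corrected confidence, assuming k = 2 classes.
    laplace = (n_ac + 1) / (n_a + 2)
    cosine = p_ac / math.sqrt(p_a * p_c)
    jaccard = n_ac / (n_a + n_c - n_ac)
    phi = (p_ac - p_a * p_c) / math.sqrt(p_a * p_c * (1 - p_a) * (1 - p_c))
    return {
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
        "laplace": laplace,
        "cosine": cosine,
        "jaccard": jaccard,
        "phi": phi,
    }


# Example: 100 cases, A holds in 40, C holds in 50, both hold in 30.
measures = rule_measures(n=100, n_a=40, n_c=50, n_ac=30)
```

Under a best-rule strategy, a classifier would rank the rules that match a new case by one of these values and predict the consequent of the top-ranked rule. Which measure ranks best is the question the paper investigates.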


2007

Quantitative evaluation of Clusterings for marketing applications: A web portal case study

Authors
Rebelo, C; Brito, PQ; Soares, C; Jorge, A; Brandao, R;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
The potential value of a market segmentation for a company is usually assessed in terms of six criteria: identifiability, substantiality, accessibility, responsiveness, stability and actionability. These are widely accepted as essential criteria, but they are difficult to quantify. Quantification is particularly important in early stages of the segmentation process, especially when automatic clustering methods are employed. With such methods it is easy to produce a large number of segmentations but only the most interesting ones should be selected for further analysis. In this paper, we address the problem of how to quantify the value of a segmentation according to the criteria above. We propose several measures and test them on a case study, consisting of a segmentation of portal users.


2007

Utility-based regression

Authors
Torgo, L; Ribeiro, R;

Publication
Knowledge Discovery in Databases: PKDD 2007, Proceedings

Abstract
Cost-sensitive learning is a key technique for addressing many real-world data mining applications. Most existing research has focused on classification problems. In this paper we propose a framework for evaluating regression models in applications with non-uniform costs and benefits across the domain of the continuous target variable. Namely, we describe two metrics for assessing the costs and benefits of the predictions of any model given a set of test cases. We illustrate the use of our metrics in the context of a specific type of application where non-uniform costs are required: the prediction of rare extreme values of a continuous target variable. Our experiments provide clear evidence of the utility of the proposed framework for evaluating the merits of any model in this class of regression domains.
