Publications - INESC TEC

Publications by LIAAD

2010

Ensembles of jittered association rule classifiers

Authors
Azevedo, PJ; Jorge, AM;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
The ensembling of classifiers tends to improve predictive accuracy. To obtain an ensemble with N classifiers, one typically needs to run N learning processes. In this paper we introduce and explore Model Jittering Ensembling, where one single model is perturbed in order to obtain variants that can be used as an ensemble. We use sets of classification association rules as base classifiers. The two methods of jittering ensembling we propose are Iterative Reordering Ensembling (IRE) and Post Bagging (PB). Both methods start by learning one rule set over a single run, and then produce multiple rule sets without relearning. Empirical results on 36 data sets are positive and show that both strategies tend to reduce error with respect to the single model association rule classifier. A bias-variance analysis reveals that while both IRE and PB are able to reduce the variance component of the error, IRE is particularly effective in reducing the bias component. We show that Model Jittering Ensembling can represent a very good speed-up with respect to multiple model learning ensembling. We also compare Model Jittering with various state-of-the-art classifiers in terms of predictive accuracy and computational efficiency.
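
As a rough illustration of the general idea in this abstract (learn one rule set, then derive several variants without relearning), here is a minimal Python sketch. It is not the authors' IRE or PB procedure, whose details the abstract does not give. The perturbation used here (re-ranking a fixed rule list by confidence measured on bootstrap samples) and all names (matches, confidence, jitter_rule_set, predict_one, ensemble_predict) are assumptions made for illustration only.

```python
import random
from collections import Counter

# Assumption: a rule is (antecedent, label), where the antecedent is a
# frozenset of (attribute, value) pairs and an example is a dict.

def matches(antecedent, example):
    """True if every (attribute, value) pair of the antecedent holds."""
    return all(example.get(attr) == val for attr, val in antecedent)

def confidence(rule, data):
    """Fraction of covered examples whose class equals the rule's label."""
    antecedent, label = rule
    covered = [y for x, y in data if matches(antecedent, x)]
    if not covered:
        return 0.0
    return sum(1 for y in covered if y == label) / len(covered)

def jitter_rule_set(rules, data, rng):
    """Return a reordered copy of `rules`; no rule is learned again.

    Hypothetical perturbation: rank the same rules by their confidence
    on a bootstrap sample of the training data.
    """
    sample = [rng.choice(data) for _ in data]
    scored = [(confidence(rule, sample), rule) for rule in rules]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rule for _, rule in scored]

def predict_one(ordered_rules, example, default):
    """Decision-list prediction: the first matching rule decides."""
    for antecedent, label in ordered_rules:
        if matches(antecedent, example):
            return label
    return default

def ensemble_predict(variants, example, default):
    """Majority vote over the jittered variants."""
    votes = Counter(predict_one(v, example, default) for v in variants)
    return votes.most_common(1)[0][0]
```

The point of the sketch is the cost structure: the rule set is mined once, and each ensemble member costs only a pass over the data to re-score the existing rules.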

2010

Interval Forecast of Water Quality Parameters

Authors
Ohashi, O; Torgo, L; Ribeiro, RP;

Publication
ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE

Abstract
The current quality control methodology adopted by the water distribution service provider in the metropolitan region of Porto, Portugal, is based on simple heuristics and empirical knowledge. Given the domain complexity and data volume, this application is a strong candidate for a data mining process. In this paper, we propose a new methodology to predict the range of normality for the values of different water quality parameters. These intervals of normality are of key importance when deciding on costly inspection activities. Our experimental evaluation confirms that our proposal achieves good results on the task of forecasting the normal distribution of values for the following 30 days. The proposed method can be applied to other domains with similar network monitoring objectives.

2010

Data Mining for Business Applications: Introduction

Authors
Soares, C; Ghani, R;

Publication
Data Mining for Business Applications

Abstract
This chapter introduces the volume on Data Mining (DM) for Business Applications. The chapters in this book provide an overview of some of the major advances in the field, namely in terms of methodology and applications, both traditional and emerging. In this introductory paper, we provide a context for the rest of the book. The framework for discussing the contents of the book is the DM methodology, which is suitable both to organize and relate the diverse contributions of the chapters selected. The chapter closes with an overview of the chapters in the book to guide the reader.

2010

Resource-bounded Outlier Detection using Clustering Methods

Authors
Torgo, L; Soares, C;

Publication
Data Mining for Business Applications

Abstract
This paper describes a methodology for the application of hierarchical clustering methods to the task of outlier detection. The methodology is tested on the problem of cleaning Official Statistics data. The goal is to detect erroneous foreign trade transactions in data collected by the Portuguese Institute of Statistics (INE). These transactions are a minority, but they still have an important impact on the statistics produced by the institute. The detection of these rare errors is a manual, time-consuming task. This type of task is usually constrained by a limited amount of available resources. Our proposal addresses this issue by producing a ranking of outlyingness that allows better management of the available resources by allocating them to the cases which are most different from the others and, thus, have a higher probability of being errors. Our method is based on the output of standard agglomerative hierarchical clustering algorithms, resulting in no significant additional computational costs. Our results show that it enables large savings by selecting a small subset of suspicious transactions for manual inspection, which, nevertheless, includes most of the erroneous transactions. In this study we compare our proposal to a state-of-the-art outlier ranking method (LOF) and show that our method achieves better results on this particular application. The results of our experiments are also competitive with previous results on the same data. Finally, the outcome of our experiments raises important questions concerning the method currently followed at INE for items with a small number of transactions.
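
To make the idea of ranking cases from the output of agglomerative clustering more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the scoring used in the paper: the abstract does not give the formula. The sketch assumes one-dimensional values, single linkage, and a hypothetical score in which a case is more suspicious when its group is absorbed by a much larger group. The function name outlier_ranking is likewise an assumption.

```python
def outlier_ranking(values):
    """Rank indices of `values` from most to least outlying.

    Runs naive single-linkage agglomerative clustering and, at every
    merge, gives the members of the smaller group a score equal to the
    size imbalance of the two merged groups (hypothetical score).
    """
    clusters = [[i] for i in range(len(values))]
    scores = [0.0] * len(values)

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(values[i] - values[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        a, b = clusters[p], clusters[q]
        small, large = (a, b) if len(a) <= len(b) else (b, a)
        # Imbalance lies in [0, 1): 0 for equal sizes, near 1 when a
        # tiny group joins a very large one.
        imbalance = (len(large) - len(small)) / (len(large) + len(small))
        for i in small:
            scores[i] = max(scores[i], imbalance)
        clusters[p] = a + b
        del clusters[q]

    return sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
```

With a ranking like this, a fixed inspection budget can be spent on the top-ranked cases only, which is the resource-bounded setting the abstract describes. The scoring reuses the merge sequence that clustering already produces, so it adds little cost beyond the clustering itself.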

2010

Data Mining for Business Applications

Authors
Soares, C; Ghani, R;

Publication

Abstract

2010

Inductive Transfer

Authors
Utgoff, PE; Cussens, J; Kramer, S; Jain, S; Stephan, F; Raedt, LD; Todorovski, L; Flener, P; Schmid, U; Vilalta, R; Giraud-Carrier, C; Brazdil, P; Soares, C; Keogh, E; Smart, WD; Abbeel, P; Ng, AY;

Publication
Encyclopedia of Machine Learning

Abstract