Publications - INESC TEC

Publications by LIAAD

2010

Data Mining for Business Applications: Introduction

Authors
Soares, C; Ghani, R;

Publication
Data Mining for Business Applications

Abstract
This chapter introduces the volume on Data Mining (DM) for Business Applications. The chapters in this book provide an overview of some of the major advances in the field, in terms of both methodology and applications, traditional as well as emerging. In this introductory chapter, we provide a context for the rest of the book. The contents are discussed within the framework of the DM methodology, which serves to organize and relate the diverse contributions of the selected chapters. The chapter closes with an overview of the chapters in the book to guide the reader.


2010

Resource-bounded Outlier Detection using Clustering Methods

Authors
Torgo, L; Soares, C;

Publication
Data Mining for Business Applications

Abstract
This paper describes a methodology for applying hierarchical clustering methods to the task of outlier detection. The methodology is tested on the problem of cleaning Official Statistics data. The goal is to detect erroneous foreign trade transactions in data collected by the Portuguese Institute of Statistics (INE). These transactions are a minority, but they still have an important impact on the statistics produced by the institute. Detecting these rare errors is a manual, time-consuming task, and this type of task is usually constrained by a limited amount of available resources. Our proposal addresses this issue by producing a ranking of outlyingness that allows better management of the available resources: they are allocated to the cases that differ most from the others and thus have a higher probability of being errors. Our method is based on the output of standard agglomerative hierarchical clustering algorithms, so it incurs no significant additional computational cost. Our results show that it enables large savings by selecting a small subset of suspicious transactions for manual inspection which nevertheless includes most of the erroneous transactions. In this study we compare our proposal to a state-of-the-art outlier ranking method (LOF) and show that our method achieves better results on this particular application. The results of our experiments are also competitive with previous results on the same data. Finally, the outcome of our experiments raises important questions about the method currently followed at INE for items with a small number of transactions.
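The idea of ranking cases by outlyingness from the merges of an agglomerative clustering can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the rule used here is an assumption made purely for illustration: when two groups merge, members of the smaller group receive a score equal to the size imbalance of that merge, and each point keeps its highest score. The function name `outlier_ranking`, the single-linkage criterion, and the toy points are all hypothetical.

```python
import math
from itertools import combinations


def outlier_ranking(points):
    """Rank points by an outlyingness score derived from merge sizes.

    Assumption (not from the paper): the score given to members of the
    smaller group in a merge is (|large| - |small|) / (|large| + |small|).
    """
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> member indices
    scores = [0.0] * len(points)
    next_id = len(points)

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(
            math.dist(points[i], points[j])
            for i in clusters[a]
            for j in clusters[b]
        )

    # Naive agglomerative loop; fine for small illustrative inputs only.
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        small, large = sorted((clusters[a], clusters[b]), key=len)
        imbalance = (len(large) - len(small)) / (len(large) + len(small))
        for i in small:
            scores[i] = max(scores[i], imbalance)
        del clusters[a], clusters[b]
        clusters[next_id] = small + large
        next_id += 1

    ranking = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranking, scores


# Toy data: four points on a unit square plus one isolated point (index 4).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
ranking, scores = outlier_ranking(points)
print(ranking[0])  # index 4, the isolated point, ranks first
```

With a limited inspection budget, only the top of `ranking` would be checked by hand, which is the resource-bounded use the abstract describes.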


2010

Data Mining for Business Applications

Authors
Soares, C; Ghani, R;

Publication

Abstract

2010

Inductive Transfer

Authors
Utgoff, PE; Cussens, J; Kramer, S; Jain, S; Stephan, F; Raedt, LD; Todorovski, L; Flener, P; Schmid, U; Vilalta, R; Giraud-Carrier, C; Brazdil, P; Soares, C; Keogh, E; Smart, WD; Abbeel, P; Ng, AY;

Publication
Encyclopedia of Machine Learning

Abstract

2010

Metalearning

Authors
Fürnkranz, J; Chan, PK; Craw, S; Sammut, C; Uther, W; Ratnaparkhi, A; Jin, X; Han, J; Yang, Y; Morik, K; Dorigo, M; Birattari, M; Stützle, T; Brazdil, P; Vilalta, R; Giraud-Carrier, C; Soares, C; Rissanen, J; Baxter, RA; Bruha, I; Baxter, RA; Webb, GI; Torgo, L; Banerjee, A; Shan, H; Ray, S; Tadepalli, P; Shoham, Y; Powers, R; Shoham, Y; Powers, R; Webb, GI; Ray, S; Scott, S; Blockeel, H; De Raedt, L;

Publication
Encyclopedia of Machine Learning

Abstract

2010

Combining meta-learning and search techniques to SVM parameter selection

Authors
Gomes, TAF; Prudencio, RBC; Soares, C; Rossi, ALD; Carvalho, A;

Publication
Proceedings - 2010 11th Brazilian Symposium on Neural Networks, SBRN 2010

Abstract
Support Vector Machines (SVMs) have achieved very good performance on different learning problems. However, the success of SVMs depends on an adequate choice of a number of parameters, including, for instance, the kernel and the regularization parameters. In the current work, we propose combining Meta-Learning and search techniques to address the problem of SVM parameter selection. Given an input problem, Meta-Learning is used to recommend SVM parameters based on parameters that were successful in previous similar problems. The parameters returned by Meta-Learning are then used as initial search points for a search technique, which further explores the parameter space. We expected the initial solutions provided by Meta-Learning to be located in good regions of the search space (i.e., closer to the optimum solutions), so that the search technique would need to evaluate fewer candidate search points to find an adequate solution. We implemented a prototype in which Particle Swarm Optimization (PSO) was used to select the values of two SVM parameters for regression problems. In the experiments, the proposed solution was compared to PSO with random initialization and obtained better average results on a set of 40 regression problems. © 2010 IEEE.
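The combination described above (recommend parameters from similar past problems, then use them as starting points for PSO) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' prototype: the meta-base, the meta-features, the function names `recommend`, `pso`, and `toy_error`, the PSO coefficients, and the two parameters (treated here as generic values on a log10 scale) are all hypothetical, and `toy_error` is a stand-in for the validation error of a trained SVM.

```python
import math
import random


def recommend(meta_features, meta_base, k=2):
    """Return the parameters of the k past problems closest in meta-feature space."""
    ranked = sorted(meta_base, key=lambda entry: math.dist(meta_features, entry[0]))
    return [params for _, params in ranked[:k]]


def pso(objective, initial_positions, bounds, iterations=30, seed=0):
    """Minimal PSO that starts from the given positions instead of random ones."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [list(p) for p in initial_positions]
    vel = [[0.0] * dim for _ in pos]
    best_pos = [list(p) for p in pos]            # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(len(pos)), key=lambda i: best_val[i])
    g_pos, g_val = list(best_pos[g]), best_val[g]  # global best

    for _ in range(iterations):
        for i in range(len(pos)):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    0.7 * vel[i][d]
                    + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                    + 1.5 * r2 * (g_pos[d] - pos[i][d])
                )
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, list(pos[i])
                if val < g_val:
                    g_val, g_pos = val, list(pos[i])
    return g_pos, g_val


def toy_error(params):
    # Hypothetical stand-in for SVM validation error; optimum at (0.8, -1.8).
    return (params[0] - 0.8) ** 2 + (params[1] + 1.8) ** 2


# Hypothetical meta-base: (meta-features, best parameters found on that problem).
meta_base = [
    ((100, 5), (1.0, -2.0)),
    ((120, 6), (0.5, -1.5)),
    ((5000, 40), (3.0, -4.0)),
]

seeds = recommend((110, 5), meta_base, k=2)
best_params, best_error = pso(toy_error, seeds, bounds=[(-2.0, 4.0), (-6.0, 2.0)])
print(seeds, best_error)
```

Because the global best is only replaced by strictly better candidates, the search never ends with a result worse than the best recommended starting point; whether it needs fewer evaluations than a randomly initialized swarm is the empirical question the paper studies.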
