2012
Authors
Pinto, F; Soares, C;
Publication
CEUR Workshop Proceedings
Abstract
Companies are moving from developing a single model for a problem (e.g., a regression model to predict general sales) to developing several models for sub-problems of the original problem (e.g., regression models to predict sales of each of its product categories). Given the similarity between the sub-problems, the process of model development should not be independent. Information should be shared between processes. Different approaches can be used for that purpose, including metalearning (MtL) and transfer learning. In this work, we use MtL to predict the performance of a model based on the performance of models that were previously developed. Given that the sub-problems are related (e.g., the schemas of the data are the same), domain knowledge is used to develop the metafeatures that characterize them. The approach is applied to the development of models to predict sales of different product categories in a retail company from Portugal.
2010
Authors
Soares, C; Ghani, R;
Publication
Data Mining for Business Applications
Abstract
This chapter introduces the volume on Data Mining (DM) for Business Applications. The chapters in this book provide an overview of some of the major advances in the field, namely in terms of methodology and applications, both traditional and emerging. In this introductory paper, we provide a context for the rest of the book. The framework for discussing the contents of the book is the DM methodology, which is suitable both to organize and relate the diverse contributions of the chapters selected. The chapter closes with an overview of the chapters in the book to guide the reader.
2010
Authors
Torgo, L; Soares, C;
Publication
Data Mining for Business Applications
Abstract
This paper describes a methodology for the application of hierarchical clustering methods to the task of outlier detection. The methodology is tested on the problem of cleaning Official Statistics data. The goal is to detect erroneous foreign trade transactions in data collected by the Portuguese Institute of Statistics (INE). These transactions are a minority, but they still have an important impact on the statistics produced by the institute. The detection of these rare errors is a manual, time-consuming task. This type of task is usually constrained by a limited amount of available resources. Our proposal addresses this issue by producing a ranking of outlyingness that allows better management of the available resources by allocating them to the cases which are most different from the others and, thus, have a higher probability of being errors. Our method is based on the output of standard agglomerative hierarchical clustering algorithms, resulting in no significant additional computational costs. Our results show that it enables large savings by selecting a small subset of suspicious transactions for manual inspection, which, nevertheless, includes most of the erroneous transactions. In this study we compare our proposal to a state-of-the-art outlier ranking method (LOF) and show that our method achieves better results on this particular application. The results of our experiments are also competitive with previous results on the same data. Finally, the outcome of our experiments raises important questions concerning the method currently followed at INE for items with a small number of transactions.
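The idea of deriving an outlier ranking from the merge history of agglomerative clustering can be illustrated with a minimal sketch. This is not the authors' exact scoring rule: the function name `outlier_ranking`, the use of single linkage, and the size-imbalance score are assumptions chosen for illustration. The intuition is that a point absorbed late, as a tiny group joining a much larger one, is likely to be unusual.

```python
from itertools import combinations


def outlier_ranking(points):
    """Score points from the merge history of naive single-linkage clustering.

    Assumed scoring rule (illustrative only): when two clusters merge, every
    member of the smaller one receives (|large| - |small|) / (|large| + |small|),
    and each point keeps the maximum score it ever receives.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    clusters = [[i] for i in range(len(points))]
    scores = [0.0] * len(points)
    while len(clusters) > 1:
        # Closest pair of clusters under single linkage (O(n^3), demo only).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                dist(points[p], points[q])
                for p in clusters[ij[0]]
                for q in clusters[ij[1]]
            ),
        )
        small, large = sorted((clusters[i], clusters[j]), key=len)
        imbalance = (len(large) - len(small)) / (len(large) + len(small))
        for p in small:
            scores[p] = max(scores[p], imbalance)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return scores


# Four tightly grouped points and one distant point (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = outlier_ranking(points)
# Indices ordered from most to least suspicious.
ranking = sorted(range(len(points)), key=lambda i: -scores[i])
```

Because the scores are read off merges the clustering performs anyway, the ranking adds little cost on top of the clustering itself. Inspecting only the top of `ranking` mirrors the resource-allocation argument in the abstract.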
2010
Authors
Soares, C; Ghani, R;
Publication
Abstract
2012
Authors
de Miranda, PBC; Prudencio, RBC; de Carvalho, ACPLF; Soares, C;
Publication
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2012, PT III
Abstract
Support Vector Machines (SVMs) have become a successful algorithm due to the good performance they achieve on different learning problems. However, to perform well, the SVM formulation requires adjustments to its model. Automatic SVM parameter selection avoids the trial-and-error procedure otherwise needed for this. Automatic parameter selection is commonly treated as an optimization problem whose goal is to find a suitable configuration of parameters for a given learning problem. In the current work, we propose a study of the combination of Meta-learning (ML) with Particle Swarm Optimization (PSO) algorithms to optimize the SVM model, seeking combinations of parameters that maximize the success rate of the SVM. ML is used to recommend SVM parameters for a given input problem, based on parameters that were successful in previous similar problems. In this combination, the initial solutions provided by ML are possibly located in good regions of the search space. Hence, the search process could find an adequate solution using a reduced number of candidate search points, making it less expensive. In our work, we implemented five benchmark PSO approaches applied to select two SVM parameters for classification. The experiments consist of comparing the performance of the search algorithms using a traditional random initialization and using ML suggestions as the initial population. This research analysed the influence of meta-learning on the convergence of the optimization algorithms, verifying that the combination of PSO techniques with ML obtained higher-quality solutions on a set of 40 classification problems.
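Seeding a swarm with suggested starting points can be sketched as follows. This is a generic global-best PSO, not one of the five variants used in the paper. The function `pso`, the toy objective `toy_error` (standing in for an SVM validation error over two parameters), the bounds, the coefficients, and the `suggestions` list are all hypothetical.

```python
import random


def pso(objective, bounds, n_particles=10, n_iter=30, seeds=None, rng=None):
    """Minimise `objective` with a basic global-best PSO.

    `seeds` are optional starting positions (e.g. parameter settings
    suggested by a meta-learner); remaining particles start at random.
    """
    rng = rng or random.Random(0)
    dim = len(bounds)
    # Initial population: suggested points first, random points for the rest.
    pos = [list(s) for s in (seeds or [])][:n_particles]
    while len(pos) < n_particles:
        pos.append([rng.uniform(lo, hi) for lo, hi in bounds])
    vel = [[0.0] * dim for _ in pos]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(len(pos)), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration coefficients
    for _ in range(n_iter):
        for i, p in enumerate(pos):
            for d, (lo, hi) in enumerate(bounds):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - p[d])
                             + c2 * rng.random() * (gbest[d] - p[d]))
                # Move the particle and clamp it to the search box.
                p[d] = min(hi, max(lo, p[d] + vel[i][d]))
            val = objective(p)
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = p[:], val
                if val < gbest_val:
                    gbest, gbest_val = p[:], val
    return gbest, gbest_val


def toy_error(params):
    # Hypothetical stand-in for validation error over two SVM parameters.
    x, y = params
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


bounds = [(-5.0, 5.0), (-5.0, 5.0)]
suggestions = [(0.5, -1.5), (1.5, -2.5)]  # made-up "meta-learner" suggestions
best_pos, best_val = pso(toy_error, bounds, seeds=suggestions)
```

Since the global best is initialised from the starting population and only ever replaced by a better point, the seeded run can never end worse than its best suggestion. Whether it converges faster than random initialization on real problems is the empirical question the paper studies.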
2012
Authors
Miranda, PBC; Prudencio, RBC; de Carvalho, ACPLF; Soares, C;
Publication
2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)
Abstract
Support Vector Machines (SVMs) have become a successful technique due to the good performance they achieve on different learning problems. However, this performance depends on adjustments to the model. Automatic SVM parameter selection is a way to deal with this. This approach is treated as an optimization problem whose goal is to find a suitable configuration of parameters for a given learning problem. This work proposes the use of Particle Swarm Optimization (PSO) to treat the SVM parameter selection problem. As the design of learning systems is inherently a multi-objective optimization problem, a multi-objective PSO (MOPSO) was used to maximize the success rate and minimize the number of support vectors of the model. Moreover, we propose the combination of Meta-Learning (ML) with MOPSO for this problem. ML is used to recommend SVM parameters for a given input problem, based on parameters that were successful in previous similar problems. In this combination, the initial solutions provided by ML are possibly located in good regions of the search space. Hence, the search process could find an adequate solution using a reduced number of candidate search points, making it less expensive. We highlight that the combination of search algorithms with ML has only been studied in the single-objective setting, and the use of MOPSO in this context has not been investigated. In our work, we implemented a prototype in which MOPSO was used to select the values of two SVM parameters for classification problems. In the experiments performed, the proposed solution (MOPSO using ML, or Hybrid MOPSO) was compared to a MOPSO with random initialization, obtaining Pareto fronts of higher quality on a set of 40 classification problems.