Publications - INESC TEC

Publications

Publications by LIAAD

2009

Metalearning - Applications to Data Mining

Authors
Brazdil, P; Giraud Carrier, CG; Soares, C; Vilalta, R;

Publication
Cognitive Technologies

Abstract

2009

Meta-learning approach to gene expression data classification

Authors
Souza, Bruno Feres de; Soares, Carlos; Carvalho, Andre C. P. L. F. de;

Publication
Int. J. Intelligent Computing and Cybernetics

Abstract
Purpose - The purpose of this paper is to investigate the applicability of meta-learning to the problem of algorithm recommendation for gene expression data classification. Design/methodology/approach - Meta-learning was used to provide a preference order of machine learning algorithms, based on their expected performance. Two approaches were considered for this purpose: k-nearest neighbors and support vector machine-based ranking methods. They were applied to a set of 49 publicly available microarray datasets. The evaluation of the methods followed standard procedures suggested in the meta-learning literature. Findings - Empirical evidence shows that both ranking methods produce more interesting suggestions for gene expression data classification than the baseline method. Although the rankings are more accurate, a significant difference in the performance of the top classifiers was not observed. Practical implications - As the experiments conducted in this paper suggest, the use of meta-learning approaches can provide an efficient data-driven way to select algorithms for gene expression data classification. Originality/value - This paper reports contributions to the areas of meta-learning and gene expression data analysis. Regarding the former, it supports the claim that meta-learning can be suitably applied to problems of a specific domain, expanding its current practice. To the latter, it introduces a cost-effective approach to better deal with classification tasks. © Emerald Group Publishing Limited.

2009

Selection of Heuristics for the Job-Shop Scheduling Problem Based on the Prediction of Gaps in Machines

Authors
Abreu, P; Soares, C; Valente, JMS;

Publication
LEARNING AND INTELLIGENT OPTIMIZATION

Abstract
We present a general methodology to model the behavior of heuristics for Job-Shop Scheduling (JSS) that address the problem by solving conflicts between different operations on the same machine. Our models estimate the gaps between consecutive operations on a machine given measures that characterize the JSS instance and those operations. These models can be used for a better understanding of the behavior of the heuristics as well as to estimate the performance of the methods. We tested the methodology using two well-known heuristics, Shortest Processing Time and Longest Processing Time, on a large number of random JSS instances. Our results show that it is possible to predict the value of the gaps between consecutive operations from on the job, on random instances. However, the prediction of the relative performance of the two heuristics based on those estimates is not successful. Concerning the main goal of this work, we show that the models provide interesting information about the behavior of the heuristics.

2009

Detecting Errors in Foreign Trade Transactions: Dealing with Insufficient Data

Authors
Torgo, L; Pereira, W; Soares, C;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
This paper describes a data mining approach to the problem of detecting erroneous foreign trade transactions in data collected by the Portuguese Institute of Statistics (INE). Erroneous transactions are a minority, but they still have an important impact on the official statistics produced by INE. Detecting these rare errors is a manual, time-consuming task, which is constrained by a limited amount of available resources (e.g. financial, human). These constraints are common to many other data analysis problems (e.g. fraud detection). Our previous work addresses this issue by producing a ranking of outlyingness that allows a better management of the available resources by allocating them to the most relevant cases. It is based on an adaptation of hierarchical clustering methods for outlier detection. However, the method cannot be applied to articles with a small number of transactions. In this paper, we complement the previous approach with some standard statistical methods for outlier detection for handling articles with few transactions. Our experiments clearly show its advantages in terms of the criteria outlined by INE for considering any method applicable to this business problem. The generality of the approach remains to be tested in other problems which share the same constraints (e.g. fraud detection).

2009

Bioinspired Parameter Tuning of MLP Networks for Gene Expression Analysis: Quality of Fitness Estimates vs. Number of Solutions Analysed

Authors
Rossi, ALD; Soares, C; Carvalho, ACPLF;

Publication
ADVANCES IN NEURO-INFORMATION PROCESSING, PT II

Abstract
The values selected for the free parameters of Artificial Neural Networks usually have a high impact on their performance. As a result, several works investigate the use of optimization techniques, mainly metaheuristics, for the selection of values related to the network architecture, such as the number of hidden neurons, number of hidden layers, and activation function, and to the learning algorithm, such as the learning rate and momentum coefficient. A large number of these works use Genetic Algorithms for parameter optimization. Lately, other bioinspired optimization techniques, such as Ant Colony Optimization and Particle Swarm Optimization, among others, have been successfully used. Although bioinspired optimization techniques have been successfully adopted to tune neural network parameter values, little is known about the relation between the quality of the estimates of the fitness of a solution used during the search process and the quality of the solution obtained by the optimization method. In this paper, we describe an empirical study on this issue. To focus our analysis, we restricted the datasets to the domain of gene expression analysis. Our results indicate that, although the computational power saved by using simpler estimation methods can be used to increase the number of solutions tested in the search process, the use of accurate estimates to guide that search is the most important factor to obtain good solutions.

2009

UCI++: Improved Support for Algorithm Selection Using Datasetoids

Authors
Soares, C;

Publication
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS

Abstract
As companies employ a larger number of models, the problem of algorithm (and parameter) selection is becoming increasingly important. Two approaches to obtain empirical knowledge that is useful for that purpose are empirical studies and metalearning. However, most empirical (meta)knowledge is obtained from a relatively small set of datasets. In this paper, we propose a method to obtain a large number of datasets which is based on a simple transformation of existing datasets, referred to as datasetoids. We test our approach on the problem of using metalearning to predict when to prune decision trees. The results show significant improvement when using datasetoids. Additionally, we identify a number of potential anomalies in the generated datasetoids and propose methods to solve them.
