Publications - INESC TEC

Publications by Carlos Manuel Soares

2012

Multilayer perceptron for label ranking

Authors
Ribeiro, G; Duivesteijn, W; Soares, C; Knobbe, A;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Label Ranking problems are receiving increasing attention in machine learning. The goal is to predict not just a single value from a finite set of labels, but rather the permutation of that set that applies to a new example (e.g., the ranking of a set of financial analysts in terms of the quality of their recommendations). In this paper, we adapt a multilayer perceptron algorithm for label ranking. We focus on the adaptation of the Back-Propagation (BP) mechanism. Six approaches are proposed to estimate the error signal that is propagated by BP. The methods are discussed and empirically evaluated on a set of benchmark problems. © 2012 Springer-Verlag.

2010

Empirical evaluation of ranking prediction methods for gene expression data classification

Authors
De Souza, BF; De Carvalho, ACPLF; Soares, C;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Recently, meta-learning techniques have been applied to the problem of algorithm recommendation for gene expression data classification. Due to their flexibility, the advice provided to the user was in the form of rankings, which are able to express a preference order of Machine Learning algorithms according to their expected relative performance. Thus, choosing how to learn accurate rankings arises as a key research issue. In this work, the authors empirically evaluated two general approaches for ranking prediction and extended them. The results obtained for 49 publicly available microarray datasets indicate that the extensions introduced were very beneficial to the quality of the predicted rankings. © 2010 Springer-Verlag.

2010

Intelligent Document Routing as a First Step towards Workflow Automation: A Case Study Implemented in SQL

Authors
Soares, C; Calejo, M;

Publication
LEVERAGING APPLICATIONS OF FORMAL METHODS, VERIFICATION, AND VALIDATION, PT I

Abstract
In large and complex organizations, the development of workflow automation projects is hard. In some cases, a first important step in that direction is the automation of the routing of incoming documents. In this paper, we describe a project to develop a system for the first routing of incoming letters to the right department within a large, public Portuguese institution. We followed a data mining approach, where data representing previous routings were analyzed to obtain a model that can be used to route future documents. The approach followed was strongly influenced by some of the limitations imposed by the customer: the budget available was small and the solution had to be developed in SQL to facilitate integration with the existing system. The system developed was able to obtain satisfactory results. However, as in any Data Mining project, most of the effort was dedicated to activities other than modelling (e.g., data preparation), which means that there is still plenty of room for improvement.

2010

A Similarity-Based Adaptation of Naive Bayes for Label Ranking: Application to the Metalearning Problem of Algorithm Recommendation

Authors
Aiguzhinov, A; Soares, C; Serra, AP;

Publication
DISCOVERY SCIENCE, DS 2010

Abstract
The problem of learning label rankings is receiving increasing attention from several research communities. A number of common learning algorithms have been adapted for this task, including k-Nearest Neighbours (k-NN) and decision trees. Following this line, we propose an adaptation of the naive Bayes classification algorithm for the label ranking problem. Our main idea lies in the use of similarity between the rankings to replace the concept of probability. We empirically test the proposed method on some metalearning problems that consist of relating characteristics of learning problems to the relative performance of learning algorithms. Our method generally performs better than the baseline, indicating that it is able to identify some of the underlying patterns in the data.
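
As an illustration of what "similarity between the rankings" can mean, the sketch below computes Spearman's rank correlation between two complete rankings without ties. The abstract does not say which similarity measure the authors use, so the choice of Spearman's coefficient and the function name `spearman_similarity` are assumptions made for this example only.

```python
def spearman_similarity(rank_a, rank_b):
    """Spearman's rank correlation between two complete rankings without ties.

    Each ranking is a sequence where position i holds the rank (1 = best)
    assigned to label i. The measure is an assumption for illustration;
    the abstract does not name the similarity the authors use.
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must cover the same labels")
    n = len(rank_a)
    if n < 2:
        raise ValueError("at least two labels are required")
    # Sum of squared rank differences over all labels.
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Identical rankings give 1, fully reversed rankings give -1.
same = spearman_similarity([1, 2, 3], [1, 2, 3])
reversed_order = spearman_similarity([1, 2, 3], [3, 2, 1])
```

The value ranges from -1 (opposite orders) to 1 (identical orders), which makes it a convenient score for comparing a predicted ranking of algorithms with the observed one.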

2009

Selection of Heuristics for the Job-Shop Scheduling Problem Based on the Prediction of Gaps in Machines

Authors
Abreu, P; Soares, C; Valente, JMS;

Publication
LEARNING AND INTELLIGENT OPTIMIZATION

Abstract
We present a general methodology to model the behavior of heuristics for the Job-Shop Scheduling (JSS) problem that address it by solving conflicts between different operations on the same machine. Our models estimate the gaps between consecutive operations on a machine given measures that characterize the JSS instance and those operations. These models can be used for a better understanding of the behavior of the heuristics as well as to estimate the performance of the methods. We tested the methodology using two well-known heuristics, Shortest Processing Time and Longest Processing Time, on a large number of random JSS instances. Our results show that it is possible to predict the value of the gaps between consecutive operations from on the job, on random instances. However, the prediction of the relative performance of the two heuristics based on those estimates is not successful. Concerning the main goal of this work, we show that the models provide interesting information about the behavior of the heuristics.

2009

Detecting Errors in Foreign Trade Transactions: Dealing with Insufficient Data

Authors
Torgo, L; Pereira, W; Soares, C;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
This paper describes a data mining approach to the problem of detecting erroneous foreign trade transactions in data collected by the Portuguese Institute of Statistics (INE). Erroneous transactions are a minority, but they still have an important impact on the official statistics produced by INE. Detecting these rare errors is a manual, time-consuming task, which is constrained by a limited amount of available resources (e.g. financial, human). These constraints are common to many other data analysis problems (e.g. fraud detection). Our previous work addresses this issue by producing a ranking of outlyingness that allows a better management of the available resources by allocating them to the most relevant cases. It is based on an adaptation of hierarchical clustering methods for outlier detection. However, the method cannot be applied to articles with a small number of transactions. In this paper, we complement the previous approach with some standard statistical methods for outlier detection for handling articles with few transactions. Our experiments clearly show its advantages in terms of the criteria outlined by INE for considering any method applicable to this business problem. The generality of the approach remains to be tested in other problems which share the same constraints (e.g. fraud detection).
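
To make "standard statistical methods for outlier detection" concrete, here is a minimal sketch of one such method, the boxplot (interquartile range) rule. The abstract does not list which methods were used, so the rule itself, the function name `iqr_outliers`, the multiplier `k=1.5`, and the sample values are all assumptions for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (boxplot rule).

    This is one common statistical outlier test, shown as an assumption;
    the abstract does not specify the methods the authors applied.
    """
    # Quartiles computed with the "inclusive" method, which treats the
    # data as the whole population of interest.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical unit prices for one article with few transactions.
prices = [10, 11, 12, 11, 10, 95]
suspicious = iqr_outliers(prices)
```

A rule of this kind needs only a handful of observations, which is why simple statistical tests are a natural fit for articles with too few transactions for a clustering-based method.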