Publications

Publications by Rui Leite

2018

An agent-based model for detection in economic networks

Authors
Brito, J; Campos, P; Leite, R;

Publication
Communications in Computer and Information Science

Abstract
The economic impact of fraud is wide-ranging, and fraud can become a critical problem when prevention procedures are not robust. In this paper we create a model to detect fraudulent transactions and then use a classification algorithm to assess whether an agent is fraud-prone. The model (BOND) is based on the analysis of an economic network of agents of three types: individuals, businesses, and financial intermediaries. From the transaction dataset, a sliding window of rows aggregated per agent was used, and machine learning (classification) algorithms were applied. Results show that it is possible to predict the behavior of agents based on their previous transactions. © 2018, Springer International Publishing AG, part of Springer Nature.
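
The pipeline the abstract describes can be pictured with a minimal sketch: per-agent sliding-window aggregation of transactions followed by a classifier. The column names, window size, features, and choice of classifier below are illustrative assumptions, not the paper's actual configuration.

```python
# Illustrative sketch (not the paper's code): aggregate each agent's
# recent transactions with a sliding window, then classify agents as
# fraud-prone. Column names, window size, and classifier are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def window_features(tx: pd.DataFrame, window: int = 10) -> pd.DataFrame:
    """One row of rolling aggregates per agent over its last `window` rows."""
    tx = tx.sort_values("timestamp")
    by_agent = tx.groupby("agent_id")["amount"]
    feats = pd.DataFrame({
        "agent_id": tx["agent_id"].values,
        "mean_amount": by_agent.transform(
            lambda s: s.rolling(window, min_periods=1).mean()).values,
        "max_amount": by_agent.transform(
            lambda s: s.rolling(window, min_periods=1).max()).values,
    })
    # Keep the most recent window summary per agent.
    return feats.groupby("agent_id").last()

# X = window_features(train_tx); y = one fraud-prone label per agent
# clf = RandomForestClassifier().fit(X, y)
# clf.predict(window_features(new_tx))
```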

2021

Exploiting Performance-based Similarity between Datasets in Metalearning

Authors
Leite, R; Brazdil, P;

Publication
AAAI Workshop on Meta-Learning and MetaDL Challenge, MetaDL@AAAI 2021, virtual, February 9, 2021.

Abstract

2012

Selecting classification algorithms with active testing on similar datasets

Authors
Leite, R; Brazdil, P; Vanschoren, J;

Publication
CEUR Workshop Proceedings

Abstract
Given the large number of data mining algorithms, their combinations (e.g. ensembles) and possible parameter settings, finding the most adequate method to analyze a new dataset becomes an ever more challenging task. This is because in many cases testing all potentially useful alternatives quickly becomes prohibitively expensive. In this paper we propose a novel technique, called active testing, that intelligently selects the most useful cross-validation tests. It proceeds in a tournament-style fashion, in each round selecting and testing the algorithm that is most likely to outperform the best algorithm of the previous round on the new dataset. This 'most promising' competitor is chosen based on a history of prior duels between both algorithms on similar datasets. Each new cross-validation test contributes information to a better estimate of dataset similarity, and thus better predicts which algorithms are most promising on the new dataset. We also follow a different path, estimating dataset similarity from data characteristics. We have evaluated this approach using a set of 292 algorithm-parameter combinations on 76 UCI datasets for classification. The results show that active testing quickly yields an algorithm whose performance is very close to the optimum, after relatively few tests. It also provides a better solution than previously proposed methods. The variants of our method that rely on cross-validation tests to estimate dataset similarity provide better solutions than those that rely on data characteristics.
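
As a rough illustration of the tournament loop described above, the sketch below picks each round's challenger from past duels, weighting prior datasets by how well they agree with the cross-validation results observed so far. The data layout, similarity measure, and function names are assumptions for exposition; the paper's actual estimators differ in detail.

```python
# Hedged sketch of active testing. history[d][a] holds the performance of
# algorithm a on prior dataset d; evaluate(a) runs one cross-validation
# test on the new dataset. Layout and similarity measure are assumptions.
def active_testing(algorithms, history, evaluate, budget):
    best = algorithms[0]
    tested = {best: evaluate(best)}       # results on the new dataset so far
    for _ in range(budget - 1):
        # Similarity of prior dataset d: fraction of pairwise wins among
        # already-tested algorithms on which d agrees with the new dataset.
        def similarity(d):
            pairs = [(a, b) for a in tested for b in tested if a != b]
            agree = sum((history[d][a] > history[d][b]) ==
                        (tested[a] > tested[b]) for a, b in pairs)
            return agree / max(len(pairs), 1)
        weights = {d: similarity(d) for d in history}
        total = max(sum(weights.values()), 1e-9)
        # Challenger: untested algorithm most likely to beat the current
        # best, judged by past duels on similar datasets.
        def win_rate(a):
            return sum(w * (history[d][a] > history[d][best])
                       for d, w in weights.items()) / total
        untested = [a for a in algorithms if a not in tested]
        if not untested:
            break
        challenger = max(untested, key=win_rate)
        tested[challenger] = evaluate(challenger)
        if tested[challenger] > tested[best]:
            best = challenger
    return best
```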

2007

An iterative process for building learning curves and predicting relative performance of classifiers

Authors
Leite, R; Brazdil, P;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
This paper concerns the problem of predicting the relative performance of classification algorithms. Our approach requires that experiments are conducted on small samples. The information gathered is used to identify the nearest learning curve for which the sampling procedure was fully carried out. This allows the generation of a prediction regarding the relative performance of the algorithms. The method automatically establishes how many samples are needed and their sizes. This is done iteratively by taking into account the results of all previous experiments obtained so far, both on other datasets and on the new dataset. Experimental evaluation has shown that the method achieves better performance than previous approaches.
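
The nearest-curve step can be pictured with a small sketch. Here the number of small samples k is fixed rather than chosen iteratively as in the paper, and the curve storage format is an assumption.

```python
# Simplified sketch of nearest-curve prediction: accuracies of two
# classifiers on the first k small samples of the new dataset are matched
# against stored full learning curves from prior datasets. The fixed k and
# array layout are assumptions; the paper chooses sample sizes iteratively.
import numpy as np

def predict_winner(partial_a, partial_b, curves_a, curves_b):
    """curves_a, curves_b: (n_prior_datasets, n_sample_sizes) arrays of
    full learning curves; partial_a, partial_b: length-k accuracy arrays
    measured on the new dataset."""
    k = len(partial_a)
    # Distance to each prior dataset over the first k curve points.
    dist = (np.abs(curves_a[:, :k] - np.asarray(partial_a)).sum(axis=1) +
            np.abs(curves_b[:, :k] - np.asarray(partial_b)).sum(axis=1))
    nearest = np.argmin(dist)
    # Predict the winner at full sample size on the nearest dataset.
    return "A" if curves_a[nearest, -1] >= curves_b[nearest, -1] else "B"
```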

2004

Improving progressive sampling via meta-learning on learning curves

Authors
Leite, R; Brazdil, P;

Publication
MACHINE LEARNING: ECML 2004, PROCEEDINGS

Abstract
This paper describes a method that can be seen as an improvement of standard progressive sampling. The standard method uses samples of data of increasing size until the accuracy of the learned concept cannot be further improved. The issue we address here is how to avoid using some of the samples in this progression. The paper presents a method for predicting the stopping point using a meta-learning approach. The method requires just four iterations of progressive sampling. The information gathered is used to identify the nearest learning curves, for which the sampling procedure was carried out fully. This in turn permits generating a prediction regarding the stopping point. Experimental evaluation shows that the method can lead to significant savings of time without significant losses of accuracy.
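
A compact sketch of the stopping-point prediction follows, assuming stored full curves and plateau indices for the prior datasets; the nearest-neighbor averaging is an illustrative choice, not necessarily the paper's rule.

```python
# Hedged sketch: match the accuracies from the initial progressive-sampling
# steps (four in the paper) against stored full curves, then predict the
# stopping point from the nearest curves. Data layout and averaging assumed.
import numpy as np

def predict_stop(partial, full_curves, stop_points, n_neighbors=3):
    """partial: accuracies at the first len(partial) sample sizes;
    full_curves: (n_datasets, n_steps) stored curves for the same algorithm;
    stop_points: step index where accuracy plateaued on each prior dataset."""
    k = len(partial)
    dist = np.abs(full_curves[:, :k] - np.asarray(partial)).sum(axis=1)
    nearest = np.argsort(dist)[:n_neighbors]
    # Predict the (rounded) average stopping point of the nearest curves.
    return int(round(np.mean(np.asarray(stop_points)[nearest])))
```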

2003

Improving progressive sampling via meta-learning

Authors
Leite, R; Brazdil, P;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
We present a method that can be seen as an improvement of the standard progressive sampling method. The method exploits information concerning the performance of a given algorithm on past datasets, which is used to generate predictions of the stopping point. Experimental evaluation shows that the method can lead to significant time savings without significant losses in accuracy.
