Details
Name
Rui Leite
Cluster
Informática
Position
External Research Collaborator
Since
01 January 2010
Nationality
Portugal
Centre
Laboratório de Inteligência Artificial e Apoio à Decisão
Contacts
+351220402963
rui.leite@inesctec.pt
2018
Authors
Brito, J; Campos, P; Leite, R;
Publication
Communications in Computer and Information Science
Abstract
The economic impact of fraud is wide, and fraud can be a critical problem when prevention procedures are not robust. In this paper we create a model to detect fraudulent transactions, and then use a classification algorithm to assess whether the agent is fraud-prone or not. The model (BOND) is based on the analytics of an economic network of agents of three types: individuals, businesses and financial intermediaries. From the dataset of transactions, a sliding window of rows previously aggregated per agent was used, and machine learning (classification) algorithms were applied. Results show that it is possible to predict the behavior of agents based on previous transactions. © 2018, Springer International Publishing AG, part of Springer Nature.
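A minimal sketch of the kind of pipeline the abstract describes (not the authors' BOND implementation): transactions are aggregated per agent over a sliding window, and a standard classifier is trained on the resulting features. The column names, the 30-day window, and the choice of classifier are illustrative assumptions.

```python
# Illustrative only: column names ('agent_id', 'timestamp', 'amount'),
# the 30-day window, and the classifier are assumptions, not the paper's setup.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def window_features(tx: pd.DataFrame, window: str = "30D") -> pd.DataFrame:
    """Aggregate each agent's transactions over a rolling time window."""
    tx = tx.sort_values("timestamp").set_index("timestamp")
    rolled = tx.groupby("agent_id")["amount"].rolling(window)
    feats = pd.DataFrame({
        "mean_amount": rolled.mean(),
        "tx_count": rolled.count(),
        "max_amount": rolled.max(),
    })
    # Keep each agent's most recent window as its feature vector.
    return feats.groupby(level="agent_id").last()

# With fraud labels indexed by agent (assumed available):
# X = window_features(transactions)
# clf = RandomForestClassifier(random_state=0).fit(X, labels.loc[X.index])
```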
2012
Authors
Leite, R; Brazdil, P; Vanschoren, J;
Publication
CEUR Workshop Proceedings
Abstract
Given the large amount of data mining algorithms, their combinations (e.g. ensembles) and possible parameter settings, finding the most adequate method to analyze a new dataset becomes an ever more challenging task. This is because in many cases testing all possibly useful alternatives quickly becomes prohibitively expensive. In this paper we propose a novel technique, called active testing, that intelligently selects the most useful cross-validation tests. It proceeds in a tournament-style fashion, in each round selecting and testing the algorithm that is most likely to outperform the best algorithm of the previous round on the new dataset. This 'most promising' competitor is chosen based on a history of prior duels between both algorithms on similar datasets. Each new cross-validation test will contribute information to a better estimate of dataset similarity, and thus better predict which algorithms are most promising on the new dataset. We also follow a different path to estimate dataset similarity based on data characteristics. We have evaluated this approach using a set of 292 algorithm-parameter combinations on 76 UCI datasets for classification. The results show that active testing will quickly yield an algorithm whose performance is very close to the optimum, after relatively few tests. It also provides a better solution than previously proposed methods. The variants of our method that rely on cross-validation tests to estimate dataset similarity provide better solutions than those that rely on data characteristics.
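A simplified, runnable sketch of the tournament-style loop described above (not the paper's exact estimator): each round tests the as-yet-untested algorithm with the largest average historical win margin over the current best. The `history` structure and the test budget are illustrative assumptions; the paper additionally reweights the history of duels by dataset similarity.

```python
# Simplification of active testing: history[(a, b)] is the mean score margin
# of algorithm a over b on prior datasets (assumed precomputed).
from sklearn.model_selection import cross_val_score

def active_testing(candidates, X, y, history, budget=5, cv=10):
    """candidates: dict name -> estimator. Returns the winning name and score."""
    names = list(candidates)
    best = names[0]                                        # default start
    best_score = cross_val_score(candidates[best], X, y, cv=cv).mean()
    tested = {best}
    for _ in range(budget):
        untested = [n for n in names if n not in tested]
        if not untested:
            break
        # Most promising competitor: largest expected gain over current best.
        rival = max(untested, key=lambda n: history.get((n, best), 0.0))
        score = cross_val_score(candidates[rival], X, y, cv=cv).mean()
        tested.add(rival)
        if score > best_score:                             # rival wins the duel
            best, best_score = rival, score
    return best, best_score
```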
2010
Authors
Leite, R; Brazdil, P;
Publication
ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE
Abstract
Currently many classification algorithms exist, and no single algorithm outperforms all the others on all tasks. Therefore it is of interest to determine which classification algorithm is the best one for a given task. Although direct comparisons can be made for any given problem using a cross-validation evaluation, it is desirable to avoid this, as the computational costs are significant. We describe a method which relies on relatively fast pairwise comparisons involving two algorithms. This method exploits sampling landmarks, that is, information about learning curves, besides classical data characteristics. One key feature of this method is an iterative procedure for extending the series of experiments used to gather new information in the form of sampling landmarks. Metalearning also plays a vital role. The comparisons between various pairs of algorithms are repeated and the result is represented in the form of a partially ordered ranking. Evaluation is done by comparing the predicted partial order of algorithms to the partial order representing the supposedly correct result. The results of our analysis show that the method has good performance and could be of help in practical applications.
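A hedged sketch of the "sampling landmarks" ingredient: accuracies of an algorithm on a few small samples of increasing size, forming a partial learning curve on which pairwise comparisons can be based. The sample sizes and the holdout scheme are illustrative; the method's iterative choice of which landmark to compute next is not reproduced here.

```python
# Illustrative sample sizes and holdout split; not the paper's exact protocol.
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def sampling_landmarks(estimator, X, y, sizes=(64, 128, 256, 512), seed=0):
    """Partial learning curve: accuracy of `estimator` at a few sample sizes."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    curve = []
    for n in sizes:
        n = min(n, len(y_tr))
        model = clone(estimator).fit(X_tr[:n], y_tr[:n])
        curve.append(accuracy_score(y_te, model.predict(X_te)))
    return np.array(curve)

# Comparing two such curves gives one pairwise verdict; repeating over all
# pairs of algorithms yields the partially ordered ranking described above.
```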
2007
Authors
Leite, R; Brazdil, P;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS
Abstract
This paper concerns the problem of predicting the relative performance of classification algorithms. Our approach requires that experiments are conducted on small samples. The information gathered is used to identify the nearest learning curve for which the sampling procedure was fully carried out. This allows the generation of a prediction regarding the relative performance of the algorithms. The method automatically establishes how many samples are needed and their sizes. This is done iteratively by taking into account the results of all previous experiments - both on other datasets and on the new dataset obtained so far. Experimental evaluation has shown that the method achieves better performance than previous approaches.
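One way to picture the nearest-curve step (a simplification, with an assumed data layout): match the partial curves of two algorithms on the new dataset against stored, fully sampled curves, and predict the winner observed on the most similar dataset.

```python
# 'stored' entries are (full_curve_a, full_curve_b, winner) triples recorded
# on datasets where sampling was fully carried out; this layout and the
# Euclidean distance are assumptions for illustration.
import numpy as np

def predict_winner(partial_a, partial_b, stored):
    """Predict which algorithm wins on the full data via the nearest curves."""
    k = len(partial_a)
    def dist(entry):
        full_a, full_b, _ = entry
        return (np.linalg.norm(partial_a - full_a[:k])
                + np.linalg.norm(partial_b - full_b[:k]))
    return min(stored, key=dist)[2]   # winner observed on the nearest dataset
```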
2004
Authors
Leite, R; Brazdil, P;
Publication
MACHINE LEARNING: ECML 2004, PROCEEDINGS
Abstract
This paper describes a method that can be seen as an improvement of standard progressive sampling. The standard method uses samples of data of increasing size until the accuracy of the learned concept cannot be further improved. The issue we have addressed here is how to avoid using some of the samples in this progression. The paper presents a method for predicting the stopping point using a meta-learning approach. The method requires just four iterations of the progressive sampling. The information gathered is used to identify the nearest learning curves, for which the sampling procedure was carried out in full. This in turn makes it possible to generate a prediction regarding the stopping point. Experimental evaluation shows that the method can lead to significant savings of time without significant losses of accuracy.
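A minimal sketch of the stopping-point prediction under the same assumed layout: run only the first few progressive-sampling iterations, find the nearest fully sampled curve, and read off its stopping point.

```python
# 'curve_store' holds (full_curve, stop_index) pairs from datasets where
# progressive sampling ran to convergence; both names are hypothetical.
import numpy as np

def predicted_stop(partial_curve, curve_store):
    """Match ~4 early iterations to the nearest full curve; return its stop."""
    k = len(partial_curve)
    _, stop = min(curve_store,
                  key=lambda e: np.linalg.norm(partial_curve - e[0][:k]))
    return stop   # predicted iteration/sample-size index at which to stop
```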