Publications - INESC TEC

Publications

Publications by Pavel Brazdil

2015

Algorithm selection via meta-learning and sample-based active testing

Authors
Abdulrahman, SM; Brazdil, P; Van Rijn, JN; Vanschoren, J;

Publication
CEUR Workshop Proceedings

Abstract
Identifying the best machine learning algorithm for a given problem continues to be an active area of research. In this paper we present a new method which exploits both meta-level information acquired in past experiments and active testing, an algorithm selection strategy. Active testing attempts to iteratively identify an algorithm whose performance will most likely exceed the performance of previously tried algorithms. The novel method described in this paper uses tests on smaller data samples to rank the most promising candidates, thus optimizing the schedule of experiments to be carried out. The experimental results show that this approach leads to considerably faster algorithm selection.
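As a rough illustration of the idea of ranking candidates on a small sample before scheduling the full experiments, the sketch below may help. It is not the authors' method: it uses no meta-level information from past experiments, and the toy learners, the synthetic data, and the names `rank_on_sample` and `schedule_full_tests` are all hypothetical.

```python
import random

# Three toy learners (hypothetical stand-ins for real algorithms).
# Each takes training pairs (x, y) and returns a prediction function.
def majority_classifier(train):
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def threshold_classifier(train):
    # Predict 1 when x exceeds the mean training input.
    mean_x = sum(x for x, _ in train) / len(train)
    return lambda x: 1 if x > mean_x else 0

def nearest_neighbour_classifier(train):
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(learner, train, test):
    model = learner(train)
    return sum(model(x) == y for x, y in test) / len(test)

def rank_on_sample(learners, train, test, sample_size, rng):
    """Rank learners by accuracy when trained on a small random sample."""
    sample = rng.sample(train, sample_size)
    scores = {name: accuracy(fn, sample, test) for name, fn in learners.items()}
    return sorted(scores, key=scores.get, reverse=True)

def schedule_full_tests(learners, train, test, order):
    """Run the full-data tests in the given order, tracking the best so far."""
    best_name, best_score, history = None, -1.0, []
    for name in order:
        score = accuracy(learners[name], train, test)
        history.append((name, score))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, history

# Synthetic one-dimensional data (an assumption made for this example).
rng = random.Random(0)
data = [(x, 1 if x > 0.5 else 0) for x in (rng.random() for _ in range(300))]
train, test = data[:200], data[200:]

learners = {
    "majority": majority_classifier,
    "threshold": threshold_classifier,
    "nearest_neighbour": nearest_neighbour_classifier,
}

order = rank_on_sample(learners, train, test, sample_size=20, rng=rng)
best_name, history = schedule_full_tests(learners, train, test, order)
print(order, best_name)
```

In this sketch the small-sample ranking only decides the order of the full tests; a stopping rule that ends the schedule early would be needed to turn the better ordering into an actual time saving.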


2015

Combining regression models and metaheuristics to optimize space allocation in the retail industry

Authors
Pinto, F; Soares, C; Brazdil, P;

Publication
Intelligent Data Analysis

Abstract
Data Mining (DM) researchers often focus on the development and testing of models for a single decision (e.g., direct mailing, churn detection, etc.). In practice, however, multiple decisions that are not independent often have to be made simultaneously, and the best global solution is often not the combination of the best individual solutions. This problem can be addressed by searching for the overall best solution using optimization methods based on the predictions made by the DM models. We describe one case study where this approach was used to optimize the layout of a retail store in order to maximize predicted sales. A metaheuristic is used to search different hypotheses of space allocation for multiple product categories, guided by the predictions made by regression models that estimate the sales for each category based on the assigned space. We test three metaheuristics and three regression algorithms on this task. Results show that the Particle Swarm Optimization method, guided by models obtained with Random Forests and Support Vector Machines, obtains good results. We also provide insights about the relationship between the correctness of the regression models and the performance of the metaheuristics.


2015

Fast Algorithm Selection Using Learning Curves

Authors
van Rijn, JN; Abdulrahman, SM; Brazdil, P; Vanschoren, J;

Publication
Advances in Intelligent Data Analysis XIV

Abstract
One of the challenges in Machine Learning is to find a classifier and parameter settings that work well on a given dataset. Evaluating all possible combinations typically takes too much time, hence many solutions have been proposed that attempt to predict which classifiers are most promising to try. As the first recommended classifier is not always the correct choice, multiple recommendations should be made, making this a ranking problem rather than a classification problem. Even though this is a well-studied problem, there is currently no good way of evaluating such rankings. We advocate the use of Loss Time Curves, as used in the optimization literature. These visualize the amount of budget (time) needed to converge to an acceptable solution. We also investigate a method that utilizes the measured performances of classifiers on small samples of data to make such recommendations, and adapt it so that it works well in Loss Time space. Experimental results show that this method converges extremely fast to an acceptable solution.
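One possible reading of a loss-time curve can be sketched as follows. The abstract does not define the loss, so this example assumes it is the gap between the best accuracy among all tried classifiers and the best accuracy found so far, plotted against cumulative runtime. The function name `loss_time_curve` and the numbers are hypothetical.

```python
def loss_time_curve(evaluations):
    """Return (elapsed_time, loss) points for classifiers tried in order.

    evaluations: list of (runtime_seconds, accuracy) pairs, in the order
    suggested by some ranking. Loss is assumed here to be the difference
    between the best accuracy in the list and the best accuracy so far.
    """
    best_possible = max(acc for _, acc in evaluations)
    curve = []
    elapsed = 0.0
    best_so_far = 0.0
    for runtime, acc in evaluations:
        elapsed += runtime
        best_so_far = max(best_so_far, acc)
        curve.append((elapsed, best_possible - best_so_far))
    return curve

# Made-up runtimes and accuracies, for illustration only.
evaluations = [(2.0, 0.70), (5.0, 0.85), (1.0, 0.80), (10.0, 0.90)]
curve = loss_time_curve(evaluations)
for elapsed, loss in curve:
    print(f"{elapsed:6.1f}s  loss={loss:.2f}")
```

Under this definition a ranking is better when its curve drops towards zero sooner, which is why trying fast, reasonably accurate classifiers early can pay off.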


2016

Meta-learning to select the best meta-heuristic for the Traveling Salesman Problem: A comparison of meta-features

Authors
Kanda, J; de Carvalho, A; Hruschka, E; Soares, C; Brazdil, P;

Publication
Neurocomputing

Abstract
The Traveling Salesman Problem (TSP) is one of the most studied optimization problems. Various meta-heuristics (MHs) have been proposed and investigated on many instances of this problem. It is widely accepted that the best MH varies for different instances. Ideally, one should be able to recommend the best MHs for a new TSP instance without having to execute them. However, this is a very difficult task. We address this task by using a meta-learning approach based on label ranking algorithms. These algorithms build a mapping that relates the characteristics of those instances (i.e., the meta-features) with the relative performance (i.e., the ranking) of MHs, based on (meta-)data extracted from TSP instances that have already been solved by those MHs. The success of this approach depends on the quality of the meta-features that describe the instances. In this work, we investigate four different sets of meta-features based on different measurements of the properties of TSP instances: edge and vertex measures, complex network measures, properties from the MHs, and subsampling landmarker properties. The models are investigated in four different TSP scenarios presenting symmetry and connection strength variations. The experimental results indicate that meta-learning models can accurately predict rankings of MHs for different TSP scenarios. Good solutions for the investigated TSP instances can be obtained from the prediction of rankings of MHs, regardless of the learning algorithm used at the meta level. The experimental results also show that the definition of the set of meta-features has an important impact on the quality of the solutions obtained.


2015

Retrieval, visualization and validation of affinities between documents

Authors
Trigo, L; Víta, M; Sarmento, R; Brazdil, P;

Publication
IC3K 2015 - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management

Abstract
We present an Information Retrieval tool that helps users search for particular information of interest to them. Our system processes a given set of documents to produce a graph, where nodes represent documents and links represent the similarities between them. The aim is to offer the user a tool to navigate this space easily. It is possible to collapse and expand nodes. Our case study shows affinity groups based on the similarities of the text production of researchers. This goes beyond the already established communities revealed by co-authorship. The system characterizes the activity of each author by a set of automatically generated keywords and by membership of a particular affinity group. The importance of each author is highlighted visually by the size of the node, which corresponds to the number of publications and different measures of centrality. Regarding the validation of the method, we analyse the impact of using different combinations of titles, abstracts and keywords on capturing the similarity between researchers.
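A minimal sketch of a document similarity graph is given below. The abstract does not say which similarity measure or linking rule the tool uses, so this example assumes cosine similarity over raw word counts and a fixed threshold. The function names, the threshold value, and the sample documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def term_vector(text):
    """Bag-of-words vector: lower-cased word counts."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_similarity_graph(documents, threshold):
    """Return {(doc_a, doc_b): similarity} for pairs at or above threshold.

    Nodes are the document names; an edge links two documents whose
    similarity reaches the (assumed) threshold.
    """
    vectors = {name: term_vector(text) for name, text in documents.items()}
    edges = {}
    for a, b in combinations(sorted(vectors), 2):
        similarity = cosine_similarity(vectors[a], vectors[b])
        if similarity >= threshold:
            edges[(a, b)] = similarity
    return edges

# Hypothetical documents, for illustration only.
documents = {
    "doc_a": "meta learning for algorithm selection",
    "doc_b": "algorithm selection with meta learning curves",
    "doc_c": "retail store layout optimization",
}

edges = build_similarity_graph(documents, threshold=0.3)
for (a, b), similarity in edges.items():
    print(a, b, round(similarity, 3))
```

Here only `doc_a` and `doc_b` share vocabulary, so they are the only linked pair. A real system would likely use weighted terms and stop-word removal before comparing documents.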


2017

Inductive Transfer

Authors
Vilalta, R; Giraud Carrier, CG; Brazdil, P; Soares, C;

Publication
Encyclopedia of Machine Learning and Data Mining

Abstract
We describe different scenarios where a learning mechanism is capable of acquiring experience on a source task and subsequently exploiting such experience on a target task. The core ideas behind this ability to transfer knowledge from one task to another have been studied in the machine learning literature under different titles and perspectives. Here we describe some of them under the names of inductive transfer, transfer learning, multitask learning, meta-searching, meta-generalization, and domain adaptation. © Springer Science+Business Media New York 2011, 2017
