Details

  • Name

    Fábio Hernâni Pinto
  • Cluster

    Computer Science
  • Role

    External Research Collaborator
  • Since

    12th November 2012
Publications

2016

Combining Boosted Trees with Metafeature Engineering for Predictive Maintenance

Authors
Cerqueira, V; Pinto, F; Sa, C; Soares, C;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XV

Abstract
We describe a data mining workflow for predictive maintenance of the Air Pressure System in heavy trucks. Our approach is composed of four steps: (i) a filter that excludes a subset of features and examples based on the number of missing values; (ii) a metafeature engineering procedure used to create a meta-level feature set with the goal of increasing the information available in the original data; (iii) a biased sampling method to deal with the class imbalance problem; and (iv) boosted trees to learn the target concept. Results show that the metafeature engineering and the biased sampling method are critical for improving the performance of the classifier.
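The first three steps of the pipeline above can be sketched in plain Python. The data, thresholds, and helper names below are illustrative assumptions, not the paper's implementation, and the final boosted-trees step is omitted:

```python
import random
import statistics

# Hypothetical toy data: feature vectors (None marks a missing value) plus a label.
rows = [
    ([1.0, None, 3.0], 0),
    ([2.0, None, 1.0], 0),
    ([4.0, 5.0, None], 1),
    ([0.5, 2.0, 2.5], 0),
]

# Step (i): drop features whose fraction of missing values exceeds a threshold.
def filter_features(rows, max_missing=0.4):
    n = len(rows)
    n_feats = len(rows[0][0])
    keep = [
        j for j in range(n_feats)
        if sum(x[j] is None for x, _ in rows) / n <= max_missing
    ]
    return [([x[j] for j in keep], y) for x, y in rows], keep

# Step (ii): metafeature engineering -- append row-level summary statistics.
def add_metafeatures(rows):
    out = []
    for x, y in rows:
        vals = [v for v in x if v is not None]
        out.append((x + [statistics.mean(vals), max(vals) - min(vals)], y))
    return out

# Step (iii): biased sampling -- oversample the minority class until balanced.
def oversample(rows, seed=0):
    rng = random.Random(seed)
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    minority = minority + [rng.choice(minority)
                           for _ in range(len(majority) - len(minority))]
    return majority + minority

filtered, kept = filter_features(rows)
balanced = oversample(add_metafeatures(filtered))
```

In a real workflow, `balanced` would then be fed to a boosted-trees learner; here it simply ends up class-balanced with two extra meta-level columns per row.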

2016

Towards Automatic Generation of Metafeatures

Authors
Pinto, F; Soares, C; Mendes Moreira, J;

Publication
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2016, PT I

Abstract
The selection of metafeatures for metalearning (MtL) is often an ad hoc process. The lack of a principled motivation for choosing one metafeature over others is questionable and may result in the loss of valuable information for a given problem (e.g., using class entropy but not attribute entropy). We present a framework to systematically generate metafeatures in the context of MtL. This framework decomposes a metafeature into three components: meta-function, object and post-processing. The automatic generation of metafeatures is triggered by the selection of a meta-function, which is used to systematically generate metafeatures from all possible combinations of object and post-processing alternatives. We ran experiments addressing the problem of algorithm selection on classification datasets. Results show that the sets of systematic metafeatures generated by our framework are more informative than the non-systematic ones and than the set regarded as state-of-the-art.
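A minimal sketch of the (meta-function, object, post-processing) decomposition described above, using entropy as the meta-function; the function and object names are illustrative assumptions, not the paper's API:

```python
import math
import statistics
from collections import Counter

def entropy(values):
    """Shannon entropy of a list of discrete values, in bits."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def generate_metafeatures(meta_function, objects, post_processors):
    """Apply meta_function to every column of every object, then aggregate
    with each post-processor: one metafeature per (object, post) pair."""
    feats = {}
    for obj_name, columns in objects.items():
        raw = [meta_function(col) for col in columns]
        for pp_name, pp in post_processors.items():
            feats[f"{meta_function.__name__}.{obj_name}.{pp_name}"] = pp(raw)
    return feats

# Toy dataset: columns grouped into the "objects" the meta-function acts on.
objects = {
    "attributes": [[1, 2, 2, 3], [5, 5, 6, 6]],
    "class": [[0, 0, 1, 1]],
}
post = {"mean": statistics.mean, "max": max}

feats = generate_metafeatures(entropy, objects, post)
```

Selecting a single meta-function (`entropy`) automatically yields both attribute-entropy and class-entropy metafeatures, which is the point of the systematic generation: no combination is silently left out.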

2016

CHADE: Metalearning with Classifier Chains for Dynamic Combination of Classifiers

Authors
Pinto, F; Soares, C; Moreira, JM;

Publication
Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part I

Abstract
Dynamic selection or combination (DSC) methods select one or more classifiers from an ensemble according to the characteristics of a given test instance x. Most methods proposed for this purpose are based on the nearest neighbours algorithm: it is assumed that if a classifier performed well on a set of instances similar to x, it will also perform well on x. We address the problem of dynamically combining a pool of classifiers by combining two approaches: metalearning and multi-label classification. Taking into account that diversity is a fundamental concept in ensemble learning and that the interdependencies between the classifiers cannot be ignored, we solve the multi-label classification problem using a widely known technique: Classifier Chains (CC). Additionally, we extend a typical metalearning approach by combining metafeatures that characterize the interdependencies between the classifiers with the base-level features. We ran experiments on 42 classification datasets and compared our method with several state-of-the-art DSC techniques, including another metalearning approach. Results show that our method improves over the other metalearning approach and is very competitive with the other four DSC methods.
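The Classifier Chains mechanism behind this approach can be sketched as follows. This toy version uses a 1-nearest-neighbour base learner and invented names, and is not the paper's implementation: each binary model predicts one label (e.g., whether one classifier in the pool will be correct on the instance) and receives the previous models' predictions as extra inputs, so the interdependencies between labels are not ignored:

```python
def knn_predict(train_X, train_y, x):
    """1-nearest-neighbour base learner on squared Euclidean distance."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

class ClassifierChain:
    def __init__(self, n_labels):
        self.n_labels = n_labels
        self.models = []  # (features, targets) memorised per label

    def fit(self, X, Y):
        for j in range(self.n_labels):
            # Augment the features with the true previous labels.
            Xj = [x + y[:j] for x, y in zip(X, Y)]
            self.models.append((Xj, [y[j] for y in Y]))

    def predict(self, x):
        preds = []
        for Xj, yj in self.models:
            # At test time, feed the chain's own previous predictions.
            preds.append(knn_predict(Xj, yj, x + preds))
        return preds

# Toy metadataset: 2 metafeatures per instance, 2 labels (pooled classifiers).
X = [[0.0, 0.0], [1.0, 1.0]]
Y = [[1, 0], [0, 1]]
chain = ClassifierChain(n_labels=2)
chain.fit(X, Y)
preds_a = chain.predict([0.0, 0.0])
preds_b = chain.predict([1.0, 1.0])
```

The predicted label vector indicates which classifiers to include in the combination for that instance.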

2015

Combining regression models and metaheuristics to optimize space allocation in the retail industry

Authors
Pinto, F; Soares, C; Brazdil, P;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
Data Mining (DM) researchers often focus on the development and testing of models for a single decision (e.g., direct mailing, churn detection, etc.). In practice, however, multiple decisions often have to be made simultaneously; these decisions are not independent, and the best global solution is often not the combination of the best individual solutions. This problem can be addressed by searching for the overall best solution with optimization methods guided by the predictions made by the DM models. We describe a case study where this approach was used to optimize the layout of a retail store in order to maximize predicted sales. A metaheuristic searches different hypotheses for allocating space to multiple product categories, guided by regression models that estimate the sales of each category based on the assigned space. We test three metaheuristics and three regression algorithms on this task. Results show that the Particle Swarm Optimization method, guided by the models obtained with Random Forests and Support Vector Machines, achieves good results. We also provide insights into the relationship between the correctness of the regression models and the performance of the metaheuristics.
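The interplay between regression models and a metaheuristic can be illustrated with a simple stochastic hill-climber (standing in for PSO) and a toy diminishing-returns sales curve (standing in for the learned regressors); every name and number here is an assumption for illustration:

```python
import random

def predicted_sales(allocation):
    """Stand-in for the regression models: diminishing returns per category."""
    return sum(s ** 0.5 for s in allocation)

def optimize(n_categories, total_space, iters=500, seed=0):
    """Hill-climb over space allocations, guided only by model predictions."""
    rng = random.Random(seed)
    # Deliberately bad start: all shelf space assigned to one category.
    best = [total_space] + [0.0] * (n_categories - 1)
    best_score = predicted_sales(best)
    for _ in range(iters):
        cand = best[:]
        # Perturb: move a random amount of space between two categories.
        i, j = rng.sample(range(n_categories), 2)
        delta = rng.uniform(0, cand[i])
        cand[i] -= delta
        cand[j] += delta
        score = predicted_sales(cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

alloc, score = optimize(n_categories=4, total_space=100.0)
```

Because the search is guided entirely by `predicted_sales`, the quality of the final layout depends on the correctness of the regression models, which is exactly the interaction the paper examines.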

2014

An Empirical Methodology to Analyze the Behavior of Bagging

Authors
Pinto, F; Soares, C; Mendes Moreira, J;

Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2014

Abstract
In this paper we propose and apply a methodology to study the relationship between the performance of bagging and the characteristics of the bootstrap samples. The methodology consists of 1) an extensive set of experiments to estimate the empirical distribution of performance of the population of all possible ensembles that can be created from those bootstraps and 2) a metalearning approach to analyze that distribution based on characteristics of the bootstrap samples and their relationship with the complete training set. Given the large size of the population of all ensembles, we empirically show that it is possible to apply the methodology to a sample. We applied the methodology to 53 classification datasets for ensembles of 20 and 100 models. Our results show that diversity is crucial for a bootstrap sample to be important, and we present evidence of a metric that can measure diversity without involving any learning process. We also found evidence that the best bootstraps have predictive power very similar to that of the training set when using naive models.
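Step 1 of the methodology, characterising bootstrap samples relative to the complete training set, can be illustrated with one simple characteristic: the fraction of distinct training instances each bootstrap contains, which is expected to approach 1 - 1/e ≈ 0.632. The code below is a sketch of that idea, not the paper's experimental setup:

```python
import random

def bootstrap_coverage(n, n_samples, seed=0):
    """Average fraction of distinct training instances per bootstrap sample,
    estimated over n_samples bootstraps of a training set of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        sample = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        total += len(set(sample)) / n
    return total / n_samples

avg = bootstrap_coverage(n=1000, n_samples=200)
```

Characteristics like this one require no learning process to compute, which is the kind of metric the results above point to for measuring bootstrap diversity cheaply.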