Publications

Publications by Carlos Manuel Soares

2016

Can Metalearning Be Applied to Transfer on Heterogeneous Datasets?

Authors
Felix, C; Soares, C; Jorge, A;

Publication
Hybrid Artificial Intelligent Systems

Abstract
Machine learning processes consist of collecting data, obtaining a model and applying it to a given task. Given a new task, the standard approach is to restart the learning process and obtain a new model. However, previous learning experience can be exploited to assist the new learning process. The two most studied approaches for this are metalearning and transfer learning. Metalearning can be used to select the predictive model to use on a new dataset. Transfer learning allows the reuse of knowledge from previous tasks. However, when multiple heterogeneous tasks are available as potential sources for transfer, the question is which one to use. One approach to this problem is metalearning. In this paper we investigate the feasibility of this approach. We propose a method to transfer weights from a neural network trained on a source dataset to initialize a network that models a potentially very different target dataset. Our experiments with 14 datasets indicate that this method enables faster convergence without a significant difference in accuracy, provided that the source task is adequately chosen. This means that there is potential for applying metalearning to support transfer between heterogeneous datasets.

2016

Collaborative Data Analysis in Hyperconnected Transportation Systems

Authors
Zarmehri, MN; Soares, C;

Publication
COLLABORATION IN A HYPERCONNECTED WORLD

Abstract
Taxi trip duration affects the efficiency of operation, the satisfaction of drivers and, above all, the satisfaction of customers; it is therefore an important metric for taxi companies. In particular, knowing the predicted trip duration beforehand is very useful for allocating taxis to taxi stands and for finding the best route for different trips. A hyperconnected network can help collect data from connected taxis in the city environment and share it collaboratively between taxis for better prediction. Given the high volume of data, several models can be generated for each individual taxi. Moreover, taking into account the differences between the data collected by different taxis, this data can be organized into different levels of a hierarchy. However, finding the level of granularity that leads to the best model for an individual taxi can be computationally expensive. In this paper, we propose using metalearning to select, for each taxi, the right level of the hierarchy and the algorithm that generates the model with the best performance. The proposed approach is evaluated using the data collected in the Drive-In project. The results show that metalearning helps select the algorithm with the best performance.
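A common way to apply metalearning to this kind of selection is to describe each past task by metafeatures and recommend the choice that worked best on the most similar tasks. The sketch below uses k-nearest neighbours over metafeature vectors. The metafeatures (number of trips, fraction of peak-hour trips), the hierarchy levels ("taxi", "stand", "city") and the algorithm names are hypothetical placeholders; this is a generic illustration of metalearning-based selection, not the method evaluated in the paper.

```python
import math
from collections import Counter


def recommend(meta_db, new_metafeatures, k=3):
    """Recommend a (hierarchy level, algorithm) pair by k-nearest neighbours.

    meta_db: list of (metafeature_vector, best_choice) pairs from past tasks.
    Metafeatures are compared with plain Euclidean distance (no scaling).
    """
    if not meta_db:
        raise ValueError("meta_db must not be empty")
    # Sort past tasks by how close their metafeatures are to the new task's.
    neighbours = sorted(meta_db, key=lambda entry: math.dist(entry[0], new_metafeatures))[:k]
    # Majority vote among the k most similar tasks; ties favour the choice
    # seen first in distance order.
    votes = Counter(choice for _, choice in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical metadata: (number of trips, fraction of peak-hour trips) -> best choice.
meta_db = [
    ((100, 0.20), ("taxi", "rf")),
    ((110, 0.25), ("taxi", "rf")),
    ((105, 0.30), ("city", "lm")),
    ((5000, 0.90), ("stand", "svm")),
    ((5200, 0.80), ("stand", "svm")),
]
print(recommend(meta_db, (102, 0.22)))  # ('taxi', 'rf')
```

In practice the metafeatures would be normalized first, since unscaled Euclidean distance is dominated by the feature with the largest range (here, the number of trips).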

2016

Combining Boosted Trees with Metafeature Engineering for Predictive Maintenance

Authors
Cerqueira, V; Pinto, F; Sa, C; Soares, C;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XV

Abstract
We describe a data mining workflow for predictive maintenance of the Air Pressure System in heavy trucks. Our approach is composed of four steps: (i) a filter that excludes a subset of features and examples based on the number of missing values; (ii) a metafeature engineering procedure used to create a set of meta-level features with the goal of increasing the information on the original data; (iii) a biased sampling method to deal with the class imbalance problem; and (iv) boosted trees to learn the target concept. Results show that metafeature engineering and the biased sampling method are critical for improving the performance of the classifier.
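Step (i) of this workflow can be sketched as a two-pass filter. The abstract does not state the filtering rule, so the fraction-of-missing thresholds, their default values and the order (features first, then examples) are assumptions; missing values are represented as None.

```python
def filter_missing(rows, max_feature_missing=0.5, max_row_missing=0.5):
    """Drop features, then examples, whose fraction of missing values exceeds a threshold.

    rows: list of equal-length lists, with None marking a missing value.
    The default thresholds are hypothetical, not the values used in the paper.
    Returns (indices of kept features, filtered rows).
    """
    if not rows:
        return [], []
    n_rows, n_cols = len(rows), len(rows[0])
    # Pass 1: keep only the features whose missing fraction is within the limit.
    keep_cols = [
        j for j in range(n_cols)
        if sum(row[j] is None for row in rows) / n_rows <= max_feature_missing
    ]
    if not keep_cols:
        return [], []
    # Pass 2: on the remaining features, keep examples within the row limit.
    kept_rows = []
    for row in rows:
        reduced = [row[j] for j in keep_cols]
        if sum(v is None for v in reduced) / len(reduced) <= max_row_missing:
            kept_rows.append(reduced)
    return keep_cols, kept_rows


data = [
    [1.0, None, 3.0],
    [2.0, None, None],
    [None, None, 5.0],
    [4.0, 1.0, 6.0],
]
cols, rows = filter_missing(data, max_feature_missing=0.5, max_row_missing=0.4)
print(cols, rows)  # [0, 2] [[1.0, 3.0], [4.0, 6.0]]
```

Filtering features before examples avoids discarding examples only because of a feature that is about to be dropped anyway.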

2015

Combining regression models and metaheuristics to optimize space allocation in the retail industry

Authors
Pinto, F; Soares, C; Brazdil, P;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
Data Mining (DM) researchers often focus on the development and testing of models for a single decision (e.g., direct mailing, churn detection). In practice, however, multiple decisions often have to be made simultaneously; these decisions are not independent, and the best global solution is often not the combination of the best individual solutions. This problem can be addressed by searching for the overall best solution with optimization methods based on the predictions made by the DM models. We describe a case study where this approach was used to optimize the layout of a retail store in order to maximize predicted sales. A metaheuristic is used to search different hypotheses of space allocation for multiple product categories, guided by the predictions of regression models that estimate the sales of each category based on the assigned space. We test three metaheuristics and three regression algorithms on this task. Results show that the Particle Swarm Optimization method guided by models obtained with Random Forests and Support Vector Machines obtains good results. We also provide insights about the relationship between the correctness of the regression models and the performance of the metaheuristics.
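The idea of searching space allocations with a metaheuristic guided by a regression model can be illustrated with a small Particle Swarm Optimization loop. This is a toy sketch: predicted_sales is a hypothetical stand-in for a trained regression model (a concave function with diminishing returns), the category weights and PSO parameters are arbitrary, and the total-space constraint is handled by clipping and rescaling each particle, which is one possible choice and not necessarily the one used in the paper.

```python
import math
import random

# Hypothetical per-category weights for the stand-in sales model.
CATEGORY_WEIGHTS = [3.0, 2.0, 1.0]


def predicted_sales(category, space):
    """Stand-in for a trained regression model: more space, more sales, diminishing returns."""
    return CATEGORY_WEIGHTS[category] * math.log1p(space)


def total_sales(allocation):
    return sum(predicted_sales(c, s) for c, s in enumerate(allocation))


def repair(position, total):
    """Map a particle to a feasible allocation: non-negative shares summing to `total`."""
    clipped = [max(0.0, x) for x in position]
    s = sum(clipped)
    if s == 0.0:
        return [total / len(position)] * len(position)
    return [x * total / s for x in clipped]


def pso_allocate(total, n_particles=20, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    n = len(CATEGORY_WEIGHTS)
    positions = [repair([rng.random() for _ in range(n)], total) for _ in range(n_particles)]
    positions[0] = [total / n] * n  # include the equal-split allocation as a baseline
    velocities = [[0.0] * n for _ in range(n_particles)]
    best_pos = [p[:] for p in positions]
    best_score = [total_sales(p) for p in positions]
    g = max(range(n_particles), key=lambda i: best_score[i])
    global_pos, global_score = best_pos[g][:], best_score[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards the particle's best + pull towards the swarm's best.
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (best_pos[i][d] - positions[i][d])
                                    + c2 * r2 * (global_pos[d] - positions[i][d]))
            positions[i] = repair([x + v for x, v in zip(positions[i], velocities[i])], total)
            score = total_sales(positions[i])  # fitness comes from the (stand-in) model
            if score > best_score[i]:
                best_pos[i], best_score[i] = positions[i][:], score
                if score > global_score:
                    global_pos, global_score = positions[i][:], score
    return global_pos, global_score


allocation, score = pso_allocate(total=30.0)
print([round(s, 1) for s in allocation], round(score, 3))
```

Because one particle starts at the equal split, the returned allocation is never predicted to sell less than that baseline. Replacing predicted_sales with predictions from a fitted model for each category would leave the search loop unchanged.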

2016

Comparing comparables: an approach to accurate cross-country comparisons of health systems for effective healthcare planning and policy guidance

Authors
Lopes, MA; Soares, C; Almeida, A; Almada Lobo, B;

Publication
HEALTH SYSTEMS

Abstract
With rising healthcare costs, using health personnel and resources efficiently and effectively is critical. International cross-country comparisons and simple worker-to-population ratio comparisons are frequently used to improve the efficiency of health systems, plan health human resources and guide policy changes. These comparisons are typically made between countries of the same continental region. However, if used imprudently, inconsistencies arising from weak comparisons of health systems may outweigh the benefits brought by new policy insights. In this work, we propose a different approach to international health system comparisons. We present a methodology to group similar countries in terms of mortality, morbidity, utilisation levels, and human and physical resources, which are all factors that influence health gains. Instead of constructing an absolute rank or comparing against the average, the method finds countries that share similar ground, upon which more reliable comparisons, including performance analysis, can then be conducted. We apply this methodology using data from the World Health Organization's Health for All database, and we present some empirical relationships between indicators that may provide new insights into how such information can be used to promote better healthcare planning and policy guidance.

2015

Distance-Based Decision Tree Algorithms for Label Ranking

Authors
de Sa, CR; Rebelo, C; Soares, C; Knobbe, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
The problem of Label Ranking is receiving increasing attention from several research communities. The algorithms that have been developed or adapted to treat rankings as the target object follow two different approaches: distribution-based (e.g., using the Mallows model) or correlation-based (e.g., using Spearman's rank correlation coefficient). Decision trees have been adapted for label ranking following both approaches. In this paper we evaluate an existing correlation-based approach and propose a new one, Entropy-based Ranking Trees. We then compare and discuss the results with a distribution-based approach. The results clearly indicate that both approaches are competitive.
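Correlation-based approaches compare rankings with measures such as Spearman's rank correlation coefficient, which for two rankings of n labels without ties is rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the difference between the positions of label i in the two rankings. The sketch below computes it and, as an assumed illustration of how a tree node could be scored, the mean pairwise correlation of the rankings the node contains; it is not the split criterion of either tree algorithm in the paper.

```python
from itertools import combinations


def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation between two rankings given as {label: position}.

    Assumes both rankings cover the same labels and contain no ties.
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("rankings must cover the same labels")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two labels")
    d2 = sum((rank_a[label] - rank_b[label]) ** 2 for label in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


def mean_pairwise_rho(rankings):
    """Illustrative node homogeneity: average Spearman correlation over all pairs of rankings."""
    pairs = list(combinations(rankings, 2))
    if not pairs:
        raise ValueError("need at least two rankings")
    return sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)


r1 = {"a": 1, "b": 2, "c": 3}
r2 = {"a": 2, "b": 1, "c": 3}
print(spearman_rho(r1, r2))  # 0.5
```

A value of 1 means identical rankings and -1 means one ranking is the reverse of the other, so a node whose rankings largely agree has a mean pairwise correlation close to 1.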
