Publications

Publications by Carlos Manuel Soares

2024

Systematic Analysis of the Impact of Label Noise Correction on ML Fairness

Authors
Silva, IOE; Soares, C; Sousa, I; Ghani, R;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II

Abstract
Arbitrary, inconsistent, or faulty decision-making raises serious concerns, and preventing unfair models is an increasingly important challenge in Machine Learning. Data often reflect past discriminatory behavior, and models trained on such data may reflect bias on sensitive attributes, such as gender, race, or age. One approach to developing fair models is to preprocess the training data to remove the underlying biases while preserving the relevant information, for example, by correcting biased labels. While multiple label noise correction methods are available, the information about their behavior in identifying discrimination is very limited. In this work, we develop an empirical methodology to systematically evaluate the effectiveness of label noise correction techniques in ensuring the fairness of models trained on biased datasets. Our methodology involves manipulating the amount of label noise and can be used not only with fairness benchmarks but also with standard ML datasets. We apply the methodology to analyze six label noise correction methods according to several fairness metrics on standard OpenML datasets. Our results suggest that the Hybrid Label Noise Correction [20] method achieves the best trade-off between predictive performance and fairness. Clustering-Based Correction [14] reduces discrimination the most, but at the cost of lower predictive performance.
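
As a rough illustration of the kind of experiment this abstract describes, the sketch below injects a controlled amount of label noise into one group and measures the resulting disparity. The function names, the noise model (flipping positive labels of a protected group to negative), and the choice of demographic parity difference are assumptions made for illustration; the abstract does not specify the paper's noise-injection scheme or its fairness metrics.

```python
import random


def inject_group_label_noise(labels, groups, protected, rate, seed=0):
    """Flip positive labels (1 -> 0) of the protected group with probability `rate`.

    Hypothetical noise model: it simulates biased labelling against one group.
    """
    rng = random.Random(seed)
    noisy = []
    for y, g in zip(labels, groups):
        if g == protected and y == 1 and rng.random() < rate:
            noisy.append(0)
        else:
            noisy.append(y)
    return noisy


def demographic_parity_difference(labels, groups, protected):
    """Positive rate of the other groups minus positive rate of the protected group."""
    prot = [y for y, g in zip(labels, groups) if g == protected]
    rest = [y for y, g in zip(labels, groups) if g != protected]
    return sum(rest) / len(rest) - sum(prot) / len(prot)


# Toy data: two groups of 50 instances, each with 25 positive labels.
labels = [1, 0] * 50
groups = ["a"] * 50 + ["b"] * 50

# Sweep the amount of injected noise and report the disparity in the labels.
for rate in (0.0, 0.5, 1.0):
    noisy = inject_group_label_noise(labels, groups, "b", rate)
    print(rate, demographic_parity_difference(noisy, groups, "b"))
```

In a fuller experiment, a label noise correction method would be applied to the noisy labels before training, and the same metric would be computed on the predictions of the resulting model.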

2023

Exploring the Reduction of Configuration Spaces of Workflows

Authors
Freitas, F; Brazdil, P; Soares, C;

Publication
DS

Abstract
Many current AutoML platforms include a very large space of alternatives (the configuration space), which makes it difficult to identify the best alternative for a given dataset. In this paper we explore a method that can reduce a large configuration space to a significantly smaller one and so help to reduce the search time for the potentially best workflow. We empirically validate the method on a set of workflows that include four ML algorithms (SVM, RF, LogR and LD) with different sets of hyperparameters. Our results show that it is possible to reduce the given space by more than one order of magnitude, from a few thousand to tens of workflows, while the risk that the best workflow is eliminated is nearly zero. The system after reduction is about one order of magnitude faster than the original one, but still maintains the same predictive accuracy and loss.

2022

Machine Learning Data Markets: Trading Data using a Multi-Agent System

Authors
Baghcheband, H; Soares, C; Reis, LP;

Publication
2022 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WI-IAT

Abstract
The amount of data produced by distributed devices, such as smart devices and the IoT, is increasing continuously. The cost of transmitting data and the availability of distributed computing power raise interest in distributed data mining (DDM). However, in a pure DDM scenario, data availability may not be enough to generate reliable models in a distributed environment. So, the ability to exchange data efficiently and effectively will become a crucial component of DDM. In this paper, we propose the concept of the Machine Learning Data Market (MLDM), a framework for the exchange of data among autonomous agents. We consider a set of learning agents in a cooperative distributed ML setting, where agents negotiate data to improve the models they use locally. In the proposed data market, the system's predictive accuracy is investigated, as well as the economic value of data. The question addressed in this paper is: how will data exchange among the agents improve the accuracy of the learning model? Agent budget is defined as a limitation of negotiation. We defined a multi-agent system with negotiation and assessed it against the multi-agent system baseline and the single-agent system. The proposed framework is analyzed based on the different sizes of batch data collected over time to find out how this changes the effect of the negotiation on the accuracy of the model. The results indicate that even simple negotiation among agents increases their learning accuracy.

2011

Combining Meta-learning and Active Selection of Datasetoids for Algorithm Selection

Authors
Prudencio, RBC; Soares, C; Ludermir, TB;

Publication
HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, PART I

Abstract
Several meta-learning approaches have been developed for the problem of algorithm selection. In this context, it is of central importance to collect a sufficient number of datasets to be used as meta-examples in order to provide reliable results. Recently, some proposals to generate datasets have addressed this issue with successful results. These proposals include datasetoids, a simple manipulation method to obtain new datasets from existing ones. However, the increase in the number of datasets raises another issue: in order to generate meta-examples for training, it is necessary to estimate the performance of the algorithms on the datasets. This typically requires running all candidate algorithms on all datasets, which is computationally very expensive. One approach to address this problem is the use of active learning, termed active meta-learning. In this paper we investigate the combined use of active meta-learning and datasetoids. Our results show that it is possible to significantly reduce the computational cost of generating meta-examples, not only without loss of meta-learning accuracy but with potential gains.
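
To make the idea of deriving new datasets from existing ones concrete, here is a minimal sketch. It assumes that a datasetoid is obtained by letting a nominal attribute take over the role of the target, with the original target becoming an ordinary predictor; the abstract itself says only "a simple manipulation method", so this construction, the function name `make_datasetoids`, and the row format are illustrative assumptions.

```python
def make_datasetoids(rows, target, nominal_attributes):
    """Build one derived dataset per nominal attribute (assumed construction).

    rows: list of dicts mapping attribute name -> value.
    target: name of the original target attribute.
    nominal_attributes: names of attributes allowed to become the new target.
    """
    datasetoids = []
    for new_target in nominal_attributes:
        if new_target == target:
            continue  # that would just reproduce the original dataset
        # Every other attribute, including the original target, is a predictor.
        features = [name for name in rows[0] if name != new_target]
        data = [([row[name] for name in features], row[new_target]) for row in rows]
        datasetoids.append({"target": new_target, "features": features, "data": data})
    return datasetoids


rows = [
    {"color": "red", "shape": "round", "size": 1.0, "class": "yes"},
    {"color": "blue", "shape": "square", "size": 2.0, "class": "no"},
]
for d in make_datasetoids(rows, "class", ["color", "shape"]):
    print(d["target"], d["features"])
```

Each derived dataset would still need every candidate algorithm to be run on it to produce a meta-example, which is the cost that active selection aims to reduce.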

2009

Metalearning - Applications to Data Mining

Authors
Brazdil, P; Giraud Carrier, CG; Soares, C; Vilalta, R;

Publication
Cognitive Technologies

Abstract

2003

Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results

Authors
Brazdil, PB; Soares, C; Da Costa, JP;

Publication
MACHINE LEARNING

Abstract
We present a meta-learning method to support selection of candidate learning algorithms. It uses a k-Nearest Neighbor algorithm to identify the datasets that are most similar to the one at hand. The distance between datasets is assessed using a relatively small set of data characteristics, which was selected to represent properties that affect algorithm performance. The performance of the candidate algorithms on those datasets is used to generate a recommendation to the user in the form of a ranking. The performance is assessed using a multicriteria evaluation measure that takes not only accuracy, but also time into account. As it is not common in Machine Learning to work with rankings, we had to identify and adapt existing statistical techniques to devise an appropriate evaluation methodology. Using that methodology, we show that the meta-learning method presented leads to significantly better rankings than the baseline ranking method. The evaluation methodology is general and can be adapted to other ranking problems. Although here we have concentrated on ranking classification algorithms, the meta-learning framework presented can provide assistance in the selection of combinations of methods or more complex problem solving strategies.
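
The procedure in this abstract can be sketched as follows: find the k datasets closest to the new one in the space of data characteristics, rank the candidate algorithms on each neighbour, and average those ranks into a recommendation. This is a simplified sketch under stated assumptions: `combined_score` is a hypothetical stand-in for the multicriteria measure (the abstract does not give its formula), Euclidean distance and mean-rank aggregation are illustrative choices, and ties are not handled.

```python
import math


def combined_score(accuracy, seconds, time_weight=0.1):
    """Hypothetical accuracy/time trade-off: penalise each tenfold increase in run time."""
    return accuracy - time_weight * math.log10(seconds)


def rank_algorithms(scores):
    """Map algorithm -> rank (1 = best) from a dict of algorithm -> score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {alg: pos for pos, alg in enumerate(ordered, start=1)}


def recommend_ranking(query_features, meta_data, k=2):
    """Rank algorithms for a new dataset from its k nearest known datasets.

    meta_data: list of (feature_vector, {algorithm: score}) pairs.
    """
    nearest = sorted(meta_data, key=lambda item: math.dist(query_features, item[0]))[:k]
    algorithms = list(nearest[0][1])
    mean_rank = {
        alg: sum(rank_algorithms(scores)[alg] for _, scores in nearest) / len(nearest)
        for alg in algorithms
    }
    return sorted(mean_rank, key=mean_rank.get)


# Toy meta-data: (data characteristics, combined score of each algorithm).
meta_data = [
    ((0.1, 0.2), {"knn": combined_score(0.90, 10), "tree": combined_score(0.85, 1),
                  "svm": combined_score(0.92, 100)}),
    ((0.2, 0.1), {"knn": combined_score(0.95, 10), "tree": combined_score(0.90, 1),
                  "svm": combined_score(0.90, 100)}),
    ((5.0, 5.0), {"knn": combined_score(0.50, 10), "tree": combined_score(0.60, 1),
                  "svm": combined_score(0.99, 1)}),
]

print(recommend_ranking((0.1, 0.2), meta_data, k=2))
```

Because the recommendation is a ranking rather than a single choice, the user can try the second-ranked algorithm if the first one proves unsuitable.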