Details

  • Name

    Pedro Henriques Abreu
  • Role

    External Research Collaborator
  • Since

    01 December 2023
Publications

2025

Unveiling Group-Specific Distributed Concept Drift: A Fairness Imperative in Federated Learning

Authors
Salazar, T; Gama, J; Araújo, H; Abreu, PH;

Publication
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Abstract
In the evolving field of machine learning, ensuring group fairness has become a critical concern, prompting the development of algorithms designed to mitigate bias in decision-making processes. Group fairness refers to the principle that a model's decisions should be equitable across different groups defined by sensitive attributes such as gender or race, ensuring that individuals from privileged groups and unprivileged groups are treated fairly and receive similar outcomes. However, achieving fairness in the presence of group-specific concept drift remains an unexplored frontier, and our research represents pioneering efforts in this regard. Group-specific concept drift refers to situations where one group experiences concept drift over time, while another does not, leading to a decrease in fairness even if accuracy (ACC) remains fairly stable. Within the framework of federated learning (FL), where clients collaboratively train models, its distributed nature further amplifies these challenges since each client can experience group-specific concept drift independently while still sharing the same underlying concept, creating a complex and dynamic environment for maintaining fairness. The most significant contribution of our research is the formalization and introduction of the problem of group-specific concept drift and its distributed counterpart, shedding light on its critical importance in the field of fairness. In addition, leveraging insights from prior research, we adapt an existing distributed concept drift adaptation algorithm to tackle group-specific distributed concept drift, which uses a multimodel approach, a local group-specific drift detection mechanism, and continuous clustering of models over time. The findings from our experiments highlight the importance of addressing group-specific concept drift and its distributed counterpart to advance fairness in machine learning.
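The effect described in this abstract, where fairness degrades while overall accuracy stays fairly stable, can be illustrated with a toy example. This is only a sketch and not the authors' algorithm: the group sizes, the labels, and the per-group accuracy gap used as a fairness measure are all assumptions chosen for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def group_accuracy_gap(y_true, y_pred, groups):
    """Largest difference in accuracy between any two groups (0 = equal treatment)."""
    per_group = []
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        per_group.append(accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return max(per_group) - min(per_group)


# Hypothetical population: 90 privileged and 10 unprivileged individuals.
groups = ["priv"] * 90 + ["unpriv"] * 10

# A fixed model whose predictions do not change over time.
y_pred = [i % 2 for i in range(100)]

# Before the drift, the model matches the true concept for everyone.
y_before = list(y_pred)

# Group-specific concept drift: the true label flips for half of the
# unprivileged group only; the privileged group is untouched.
y_after = list(y_before)
for i in range(90, 95):
    y_after[i] = 1 - y_after[i]

acc_before = accuracy(y_before, y_pred)
gap_before = group_accuracy_gap(y_before, y_pred, groups)
acc_after = accuracy(y_after, y_pred)
gap_after = group_accuracy_gap(y_after, y_pred, groups)

print(f"before drift: accuracy={acc_before:.2f}, group gap={gap_before:.2f}")
print(f"after drift:  accuracy={acc_after:.2f}, group gap={gap_after:.2f}")
```

Because the drifting group is small, overall accuracy barely moves while the gap between groups grows sharply, which is why a drift detector that only monitors global accuracy could miss the change.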

2025

Guidelines for designing visualization tools for group fairness analysis in binary classification

Authors
Cruz, A; Salazar, T; Carvalho, M; Maças, C; Machado, P; Abreu, PH;

Publication
ARTIFICIAL INTELLIGENCE REVIEW

Abstract
The use of machine learning in decision-making has become increasingly pervasive across various fields, from healthcare to finance, enabling systems to learn from data and improve their performance over time. The transformative impact of these new technologies warrants several considerations that demand the development of modern solutions through responsible artificial intelligence, that is, the incorporation of ethical principles into the creation and deployment of AI systems. Fairness is one such principle, ensuring that machine learning algorithms do not produce biased outcomes or discriminate against any group of the population with respect to sensitive attributes, such as race or gender. In this context, visualization techniques can help identify data imbalances and disparities in model performance across different demographic groups. However, there is a lack of guidance towards clear and effective representations that support entry-level users in fairness analysis, particularly when considering that the approaches to fairness visualization can vary significantly. In this regard, the goal of this work is to present a comprehensive analysis of current tools directed at visualizing and examining group fairness in machine learning, with a focus on both data and binary classification model outcomes. These visualization tools are reviewed and discussed, concluding with the proposition of a focused set of visualization guidelines directed towards improving the comprehensibility of fairness visualizations.

2025

Reparameterization convolutional neural networks for handling imbalanced datasets in solar panel fault classification

Authors
Guo, J; Chong, CF; Abreu, PH; Mao, C; Li, J; Lam, CT; Ng, BK;

Publication
Eng. Appl. Artif. Intell.

Abstract
Solar photovoltaic technology has grown significantly as a renewable energy source, with unmanned aerial vehicles equipped with thermal infrared cameras effectively inspecting solar panels. However, long-distance capture and low-resolution infrared cameras make the targets small, complicating feature extraction. Additionally, the large number of normal photovoltaic modules results in a significant imbalance in the dataset. Furthermore, limited computing resources on unmanned aerial vehicles challenge real-time fault classification. These factors limit the performance of current fault classification systems for solar panels. Multi-scale, multi-branch reparameterization of convolutional neural networks can improve model performance while reducing computational demands at the deployment stage, making such networks suitable for practical applications. This study proposes an efficient framework based on reparameterization for infrared solar panel fault classification. We propose a Proportional Balanced Weight asymmetric loss function to address the class imbalance and employ multi-branch, multi-scale convolutional kernels for extracting tiny features from low-resolution images. The designed models were trained with Exponential Moving Average for better performance and reparameterized for efficient deployment. We evaluated the designed models using the Infrared Solar Module dataset. The proposed framework achieved an accuracy of 83.8% for the 12-Class classification task and 74.0% for the 11-Class task, both without data augmentation to enhance generalization. Accuracy improved by up to 16.4% and F1-Score by up to 18.7%. Additionally, we achieved an inference speed that is 3.4 times faster than the training speed, while maintaining high fault classification performance. © 2025 Elsevier Ltd
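The core idea behind reparameterization, training with several parallel branches and merging them into one kernel for deployment, can be sketched in a few lines. The abstract does not describe the paper's architecture, so this is a generic one-dimensional illustration of structural reparameterization; the function names and the example kernels are hypothetical.

```python
def conv1d_same(x, kernel):
    """Cross-correlate x with an odd-length kernel, zero-padded so the output keeps len(x)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def multi_branch(x, k3, k1):
    """Training-time block: a 3-tap branch and a 1-tap branch, summed element-wise."""
    branch3 = conv1d_same(x, k3)
    branch1 = conv1d_same(x, k1)
    return [a + b for a, b in zip(branch3, branch1)]


def reparameterize(k3, k1):
    """Deployment-time kernel: pad the 1-tap kernel to 3 taps (centred) and add it to k3."""
    return [k3[0], k3[1] + k1[0], k3[2]]


# Hypothetical input signal and branch kernels.
x = [1.0, 2.0, 3.0, 4.0]
k3 = [0.5, -1.0, 0.25]
k1 = [2.0]

train_out = multi_branch(x, k3, k1)                  # two convolutions
deploy_out = conv1d_same(x, reparameterize(k3, k1))  # one convolution

print(train_out)
print(deploy_out)
```

Since convolution is linear, the merged kernel reproduces the multi-branch output with a single convolution, which is where the deployment-time savings come from. Real networks also fold normalization layers into the merged kernel, a step this sketch omits.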

2025

Assessing Adversarial Effects of Noise in Missing Data Imputation

Authors
Mangussi, AD; Pereira, RC; Abreu, PH; Lorena, AC;

Publication
INTELLIGENT SYSTEMS, BRACIS 2024, PT I

Abstract
In real-world scenarios, a wide variety of datasets contain inconsistencies. One example of such inconsistency is missing data (MD), which refers to the absence of information in one or more variables. Missing value imputation (MVI) strategies emerged as a possible solution to this problem, replacing the missing values using the mean, the median, or Machine Learning (ML) techniques. The performance of such strategies depends on multiple factors. One factor that influences MVI methods is the presence of noisy instances, described as anything that obscures the relationship between the features of an instance and its class, having an adversarial effect. However, the interaction between MD and noisy instances has received little attention in the literature. This work fills this gap by investigating the interplay between missing and noisy data. Our experimental setup begins with generating missingness under the Missing Not at Random (MNAR) mechanism in a multivariate scenario and performing imputation using seven state-of-the-art MVI methods. Our methodology involves applying a noise filter before performing the imputation task and evaluating the quality of the imputation directly. Additionally, we measure the classification performance with the new estimates. This approach is applied to both synthetic data and 11 real-world datasets. The effects of noise filtering before imputation are evaluated. The results show that noise preprocessing before the imputation task improves the imputation quality and the classification performance for imputed datasets.
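The ordering studied here, filtering noisy instances first and imputing afterwards, can be sketched on toy data. The abstract does not name the noise filter or the seven MVI methods, so the nearest-neighbour filter and the class-conditional mean imputation below are stand-ins chosen for illustration, and the dataset is invented.

```python
from collections import Counter


def knn_noise_filter(rows, k=3):
    """Drop complete rows whose label disagrees with the majority label of their
    k nearest complete neighbours. Rows with a missing feature are always kept."""
    complete = [i for i, (x, _) in enumerate(rows) if x is not None]
    kept = []
    for i, (x, y) in enumerate(rows):
        if x is None:
            kept.append((x, y))
            continue
        others = [j for j in complete if j != i]
        nearest = sorted(others, key=lambda j: abs(rows[j][0] - x))[:k]
        majority = Counter(rows[j][1] for j in nearest).most_common(1)[0][0]
        if majority == y:
            kept.append((x, y))
    return kept


def class_mean_impute(rows):
    """Replace each missing feature with the mean of the observed features of its class."""
    sums, counts = {}, {}
    for x, y in rows:
        if x is not None:
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
    return [(x if x is not None else sums[y] / counts[y], y) for x, y in rows]


# Toy data as (feature, label): class 0 lies near 1.0 and class 1 near 5.0.
rows = [
    (0.9, 0), (1.0, 0), (1.1, 0),
    (4.9, 1), (5.0, 1), (5.1, 1),
    (5.0, 0),    # noisy instance: class-1 feature value carrying label 0
    (None, 0),   # missing value; assume its true feature value is 1.0
]
true_value = 1.0

imputed_raw = class_mean_impute(rows)[-1][0]
imputed_filtered = class_mean_impute(knn_noise_filter(rows))[-1][0]

print(f"error without filter: {abs(imputed_raw - true_value):.2f}")
print(f"error with filter:    {abs(imputed_filtered - true_value):.2f}")
```

The mislabeled instance pulls the class-0 mean towards class 1, so the imputed value drifts away from the truth; removing it before imputation restores the estimate. The sketch skips the MNAR generation step and uses a single feature, unlike the multivariate setup in the paper.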

2025

mdatagen: A python library for the artificial generation of missing data

Authors
Mangussi, AD; Santos, MS; Lopes, FL; Pereira, RC; Lorena, AC; Abreu, PH;

Publication
NEUROCOMPUTING

Abstract
Missing data is characterized by the presence of absent values in data (i.e., missing values) and it is currently categorized into three different mechanisms: Missing Completely at Random, Missing At Random, and Missing Not At Random. When performing missing data experiments and evaluating techniques to handle absent values, these mechanisms are often artificially generated (a process referred to as data amputation) to assess the robustness and behavior of the methods used. Due to the lack of a standard benchmark for data amputation, different implementations of the mechanisms are used in related research (some are often not disclosed), preventing the reproducibility of results and leading to an unfair or inaccurate comparison between existing and new methods. Moreover, for users outside the field, experimenting with missing data or simulating the appearance of missing values in real-world domains is unfeasible, impairing stress testing in machine learning systems. This work introduces mdatagen, an open source Python library for the generation of missing data mechanisms across 20 distinct scenarios, following different univariate and multivariate implementations of the established missing mechanisms. The package therefore fosters reproducible results across missing data experiments and enables the simulation of artificial missing data under flexible configurations, making it very versatile to mimic several real-world applications involving missing data. The source code and detailed documentation for mdatagen are available at https://github.com/ArthurMangussi/pymdatagen.
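To make the three mechanisms concrete, the sketch below amputes a single feature under each of them. It deliberately does not use mdatagen, whose API is not described in this abstract; the function names, the rank-based rule for choosing which values to remove, and the example data are all assumptions for illustration.

```python
import random


def ampute_mcar(x, rate, seed=0):
    """MCAR: remove a random subset of values, independent of any data."""
    rng = random.Random(seed)
    n_missing = round(rate * len(x))
    missing = set(rng.sample(range(len(x)), n_missing))
    return [None if i in missing else v for i, v in enumerate(x)]


def ampute_by_rank(x, driver, rate):
    """Remove the values of x at the positions where `driver` is largest."""
    n_missing = round(rate * len(x))
    order = sorted(range(len(x)), key=lambda i: driver[i], reverse=True)
    missing = set(order[:n_missing])
    return [None if i in missing else v for i, v in enumerate(x)]


def ampute_mar(x, z, rate):
    """MAR: missingness in x depends on another, fully observed feature z."""
    return ampute_by_rank(x, z, rate)


def ampute_mnar(x, rate):
    """MNAR: missingness in x depends on the (unobserved) values of x itself."""
    return ampute_by_rank(x, x, rate)


# Hypothetical feature to ampute (x) and an observed companion feature (z).
x = [10, 20, 30, 40, 50]
z = [5, 1, 4, 2, 3]

x_mcar = ampute_mcar(x, rate=0.4)
x_mar = ampute_mar(x, z, rate=0.4)
x_mnar = ampute_mnar(x, rate=0.4)

print("MCAR:", x_mcar)
print("MAR: ", x_mar)
print("MNAR:", x_mnar)
```

The deterministic rank rule is only one of many possible implementations of MAR and MNAR, which is precisely the reproducibility problem the abstract raises: two studies can both claim "MNAR" while removing different values.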