Details
Name
Pedro Henriques Abreu
Role
External Research Collaborator
Since
1st December 2023
Nationality
Portugal
Centre
Artificial Intelligence and Decision Support
Contacts
+351220402963
pedro.h.abreu@inesctec.pt
2026
Authors
Salazar, T; Araujo, H; Cano, A; Abreu, PH;
Publication
ARTIFICIAL INTELLIGENCE REVIEW
Abstract
Group fairness in machine learning is an important area of research focused on achieving equitable outcomes across different groups defined by sensitive attributes such as race or gender. Federated learning (FL), a decentralized approach to training machine learning models across multiple clients, amplifies the need for fairness methodologies because its inherently heterogeneous data distributions can exacerbate biases. The intersection of FL and group fairness has attracted significant interest, with 48 research works specifically dedicated to this issue. However, no comprehensive survey has specifically focused on group fairness in FL. In this work, we analyze the key challenges of this topic, propose practices for its identification and benchmarking, and create a novel taxonomy based on criteria such as data partitioning, location, and strategy. Furthermore, we analyze broader concerns, review how different approaches handle the complexities of various sensitive attributes, examine common datasets and applications, and discuss the ethical, legal, and policy implications of group fairness in FL. We conclude by highlighting key areas for future research, emphasizing the need for more methods that address the complexities of achieving group fairness in federated systems.
2026
Authors
Salazar, T; Gama, J; Araújo, H; Abreu, PH;
Publication
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
Abstract
In the evolving field of machine learning, ensuring group fairness has become a critical concern, prompting the development of algorithms designed to mitigate bias in decision-making processes. Group fairness refers to the principle that a model's decisions should be equitable across different groups defined by sensitive attributes such as gender or race, ensuring that individuals from privileged groups and unprivileged groups are treated fairly and receive similar outcomes. However, achieving fairness in the presence of group-specific concept drift remains an unexplored frontier, and our research represents pioneering efforts in this regard. Group-specific concept drift refers to situations where one group experiences concept drift over time, while another does not, leading to a decrease in fairness even if accuracy (ACC) remains fairly stable. Within the framework of federated learning (FL), where clients collaboratively train models, its distributed nature further amplifies these challenges since each client can experience group-specific concept drift independently while still sharing the same underlying concept, creating a complex and dynamic environment for maintaining fairness. The most significant contribution of our research is the formalization and introduction of the problem of group-specific concept drift and its distributed counterpart, shedding light on its critical importance in the field of fairness. In addition, leveraging insights from prior research, we adapt an existing distributed concept drift adaptation algorithm to tackle group-specific distributed concept drift, which uses a multimodel approach, a local group-specific drift detection mechanism, and continuous clustering of models over time. The findings from our experiments highlight the importance of addressing group-specific concept drift and its distributed counterpart to advance fairness in machine learning.
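The abstract defines group fairness as privileged and unprivileged groups receiving similar outcomes, but it does not name the metric the authors use. One common way to quantify this is statistical parity difference, the gap in positive-prediction rates between the two groups. The sketch below assumes binary predictions and a binary sensitive attribute where 1 marks the privileged group. The function name is our own illustration, not part of the paper.

```python
from typing import Sequence


def statistical_parity_difference(
    y_pred: Sequence[int], sensitive: Sequence[int]
) -> float:
    """Return P(y_hat = 1 | unprivileged) - P(y_hat = 1 | privileged).

    Assumptions (illustrative, not from the paper):
      * y_pred holds binary predictions (0 or 1);
      * sensitive holds a binary attribute, 1 = privileged, 0 = unprivileged.
    A value of 0 means both groups receive positive outcomes at the same rate;
    a negative value means the unprivileged group receives them less often.
    """
    if len(y_pred) != len(sensitive):
        raise ValueError("y_pred and sensitive must have the same length")
    privileged = [p for p, s in zip(y_pred, sensitive) if s == 1]
    unprivileged = [p for p, s in zip(y_pred, sensitive) if s == 0]
    if not privileged or not unprivileged:
        raise ValueError("both groups must be non-empty")
    return sum(unprivileged) / len(unprivileged) - sum(privileged) / len(privileged)


# Toy example: the privileged group gets positives 75% of the time and the
# unprivileged group 25% of the time, giving a difference of -0.5.
print(statistical_parity_difference([1, 0, 1, 1, 0, 1, 0, 0], [1, 1, 1, 1, 0, 0, 0, 0]))
```

Tracking such a metric per group over time is one way to see the effect the abstract describes. If one group drifts while the other does not, the fairness gap can widen even while overall accuracy barely changes.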
2026
Authors
Chong, CF; Guo, JL; Yang, X; Ke, W; Abreu, PH; Wang, YP; Im, SK;
Publication
PATTERN RECOGNITION
Abstract
Multi-label image classification datasets are often partially labeled, with many labels missing, which poses a significant challenge to training accurate deep classifiers. Most existing approaches treat the missing labels as negatives and/or exploit image and category relationships to regularize training. Orthogonally, this paper studies blending samples in such incomplete datasets into new samples, enlarging the training data to improve generalization. First, the proposed LogicMix mixes multiple partially labeled samples to produce new samples, whose unknown labels are naturally mixed by OR's logical equivalences without being replaced by constants. Subsequently, a Decouple Partial-Asymmetric Loss is proposed to assign separate label-focusing policies to original and new samples, addressing the learning imbalance that arises from the different positive-negative label imbalances of original and augmented samples. Finally, we propose a complete learning framework called 2WayAug-PL. LogicMix and conventional data augmentation collaborate to extend the diversity of new samples in both the sample-sample relation and human prior knowledge, while pseudo-labeling compensates for the lack of labels to provide more supervision signals. Twenty-seven partially labeled dataset scenarios, derived from three benchmark datasets with various learning difficulties, are used for comprehensive experiments. LogicMix has shown remarkable effectiveness and generality in improving mAP compared with other sample-mixing data augmentation methods. In particular, 2WayAug-PL achieves state-of-the-art average mAP of 84.3%, 50.1%, and 93.8% on MS-COCO, VG-200, and Pascal VOC 2007, respectively. This exceeds the previous best performance, achieved by different frameworks, by 0.6% (CFT), 0.6% (CFT), and 0.1% (SR), respectively. Moreover, 2WayAug-PL significantly outperforms all compared frameworks, as shown by statistical tests. Code is available at: https://github.com/maxium0526/logic_mix.
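The abstract states that LogicMix mixes the labels of several partially labeled samples through OR's logical equivalences, leaving unknown labels unknown instead of replacing them with constants. A minimal sketch of that label rule follows. It assumes that a category is present in the mixed sample if it is present in any source sample, and that unknowns follow Kleene's three-valued OR. This is our reading of the abstract, not the released code. The image blending step and the function names are hypothetical and not shown.

```python
from typing import List, Optional, Sequence

# Label encoding (assumption): 1 = positive, 0 = negative, None = unknown.
Label = Optional[int]


def kleene_or(values: Sequence[Label]) -> Label:
    """Three-valued OR: any positive wins; otherwise any unknown stays unknown."""
    if any(v == 1 for v in values):
        return 1
    if any(v is None for v in values):
        return None
    return 0


def mix_partial_labels(label_vectors: Sequence[Sequence[Label]]) -> List[Label]:
    """Combine the label vectors of several samples, category by category."""
    n = len(label_vectors[0])
    if any(len(v) != n for v in label_vectors):
        raise ValueError("all label vectors must have the same length")
    return [kleene_or([vec[c] for vec in label_vectors]) for c in range(n)]


a = [1, 0, None, 0]
b = [0, None, None, 0]
print(mix_partial_labels([a, b]))  # [1, None, None, 0]
```

A known positive in any source settles the category, because "positive OR anything" is positive. Only when every source is a known negative does the mixed label become negative. Otherwise the result stays unknown, which matches the abstract's point about not substituting constants.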
2026
Authors
Chong, CF; Yang, X; Wang, YP; Abreu, PH;
Publication
NEUROCOMPUTING
Abstract
Multi-label image classification models often have to learn from partially labeled datasets, where a considerable proportion of labels are missing. However, the popular PyTorch deep learning ecosystem is poorly suited to training on partially labeled datasets, as many built-in functions, such as loss functions and metrics, do not work correctly or raise errors when unknown labels are present. To this end, we present an original and easy-to-install Python package called mlcpl, which expands the PyTorch ecosystem to offer a friendly environment for learning with partially labeled datasets. The package provides a series of multi-label loss functions and metrics that are compatible with unknown labels. Seven recently proposed approaches are also implemented for convenient use of cutting-edge techniques. In addition, eleven dataset loading functions, along with three partial label simulation schemes, expedite the development of experiments. Furthermore, these functions are simple to use, have a PyTorch-like interface, and work well with other PyTorch components. Several example experiments with mlcpl are also provided for demonstration. We hope the release of this package will facilitate relevant academic research and real-world applications. The source code is available at https://github.com/maxium0526/mlcpl.
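The abstract notes that standard losses break when unknown labels are present. One common remedy is to compute the loss only over known labels. The sketch below shows that idea for binary cross-entropy on plain Python lists, with unknown labels encoded as `None`. It is not mlcpl's API. The package works with PyTorch tensors, and `partial_bce` is a hypothetical name chosen for illustration.

```python
import math
from typing import Optional, Sequence


def partial_bce(logits: Sequence[float], labels: Sequence[Optional[int]]) -> float:
    """Mean binary cross-entropy over known labels only (illustrative sketch).

    labels: 1 = positive, 0 = negative, None = unknown (excluded from the loss).
    Returns 0.0 when no label is known (a design choice of this sketch).
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    terms = []
    for z, y in zip(logits, labels):
        if y is None:
            continue  # unknown label: skipped, so it adds nothing to the loss
        # Numerically stable BCE-with-logits:
        # -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
        #   = max(z, 0) - z*y + log(1 + exp(-|z|))
        terms.append(max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z))))
    if not terms:
        return 0.0
    return sum(terms) / len(terms)


# The unknown second label is ignored, so only the first term (log 2) counts.
print(partial_bce([0.0, 5.0], [1, None]))
```

Averaging over known labels only, rather than over all labels, keeps the loss scale comparable between samples with many and few missing labels. This is the sketch's own choice, and other normalizations are possible.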
2026
Authors
Guo, JL; Ng, BK; Lam, CT; Abreu, PH;
Publication
INFORMATION FUSION
Abstract
Solar photovoltaic (PV) power generation has become one of the most widely adopted forms of clean energy worldwide. In large-scale PV farm operation and maintenance, unmanned aerial vehicles equipped with thermal infrared (TIR) cameras are increasingly used to enable automated fault detection and classification. However, the long imaging distance and the inherently low resolution of TIR images often make fault patterns appear with low contrast, so subtle discriminative features are difficult to extract and highly accurate fault identification and classification is hard to achieve. To address these challenges, we propose GEPFNet, a network that exploits Group Equivariant Convolutions to explicitly model the geometric structures of faults, incorporates multi-scale processing with unified local-global contextual representations, and adopts a parallel feature fusion strategy to integrate multi-level features and make effective use of context. These feature extraction and fusion mechanisms are designed to give GEPFNet strong robustness and generalization under complex operational conditions. The effectiveness of GEPFNet was validated on two public datasets with distinct resolutions, class distributions, and feature characteristics: PVF-10 and the Infrared Solar Module (ISM) dataset. Extensive experiments and statistical analyses demonstrate that GEPFNet achieves state-of-the-art performance on the PVF-10 dataset, with an accuracy of 96.05% ± 0.42 for the 2-class task and 94.64% ± 0.35 for the 10-class task. On the ISM dataset, GEPFNet achieves an improvement of approximately 5% over the baseline models. Moreover, under highly imbalanced data distributions, GEPFNet achieves average accuracy improvements of 5.83% and 3.82% on PVF-10 and ISM, respectively, further demonstrating its capability to enhance class-wise performance. With only 9.51 GFLOPs, GEPFNet also exhibits notable computational efficiency, making it well suited for PV fault classification in TIR imagery.