Publications

Publications by Pedro Henriques Abreu

2017

Influence of data distribution in missing data imputation

Authors
Santos M.S.; Soares J.P.; Abreu P.H.; Araújo H.; Santos J.;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Dealing with missing data is a crucial step in the preprocessing stage of most data mining projects. Especially in healthcare contexts, addressing this issue is fundamental, since it may result in keeping or losing critical patient information that can help physicians in their daily clinical practice. Over the years, many researchers have addressed this problem, basing their approaches on the implementation of a set of imputation techniques and evaluating their performance in classification tasks. These classic approaches, however, do not consider some intrinsic data information that could be related to the performance of those algorithms, such as the features' distribution. Establishing a correspondence between data distribution and the most appropriate imputation method avoids the need to repeatedly test a large set of methods, since it provides a heuristic for the best choice for each feature in the study. The goal of this work is to understand the relationship between data distribution and the performance of well-known imputation techniques, such as Mean, Decision Trees, k-Nearest Neighbours, Self-Organizing Maps and Support Vector Machines imputation. Several publicly available datasets, all complete, were selected according to several characteristics, such as the number of distributions, features and instances. Missing values were artificially generated at different percentages and the imputation methods were evaluated in terms of Predictive and Distributional Accuracy. Our findings show that there is a relationship between features' distribution and algorithms' performance, although some factors must be taken into account, such as the number of features per distribution and the missing rate at stake.
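For context, a minimal sketch of this kind of experimental setup (not the paper's code): a complete dataset is injected with missing values at a fixed rate, two of the studied imputers (Mean and k-Nearest Neighbours, here via scikit-learn) fill them in, and the error on the injected cells serves as a stand-in for Predictive Accuracy. The dataset and missing rate are illustrative assumptions.

```python
# Hypothetical sketch: inject missing values into a complete dataset and
# compare Mean vs. k-NN imputation per injected cell.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer, KNNImputer

rng = np.random.default_rng(0)
X = load_iris().data                 # any complete numeric dataset
missing_rate = 0.2                   # fraction of cells removed

# Random mask of cells to delete
mask = rng.random(X.shape) < missing_rate
X_missing = X.copy()
X_missing[mask] = np.nan

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "knn": KNNImputer(n_neighbors=5),
}
for name, imputer in imputers.items():
    X_imputed = imputer.fit_transform(X_missing)
    # Predictive-accuracy proxy: RMSE between original and imputed cells
    rmse = np.sqrt(np.mean((X[mask] - X_imputed[mask]) ** 2))
    print(f"{name}: RMSE on imputed cells = {rmse:.3f}")
```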

2024

Siamese Autoencoder Architecture for the Imputation of Data Missing Not at Random

Authors
Pereira, RC; Abreu, PH; Rodrigues, PP;

Publication
JOURNAL OF COMPUTATIONAL SCIENCE

Abstract
Missing data is an issue that can negatively impact any task performed with the available data, and it is often found in real-world domains such as healthcare. One of the most common strategies to address this issue is to perform imputation, where the missing values are replaced by estimates. Several approaches based on statistics and machine learning techniques have been proposed for this purpose, including deep learning architectures such as generative adversarial networks and autoencoders. In this work, we propose a novel siamese neural network suitable for missing data imputation, which we call Siamese Autoencoder-based Approach for Imputation (SAEI). Besides having a deep autoencoder architecture, SAEI also has a custom loss function and triplet mining strategy that are tailored to the missing data issue. The proposed SAEI approach is compared to seven state-of-the-art imputation methods in an experimental setup that comprises 14 heterogeneous datasets from the healthcare domain injected with Missing Not At Random values at rates between 10% and 60%. The results show that SAEI significantly outperforms all the remaining imputation methods in all experimented settings, achieving an average improvement of 35%. This work is an extension of the article Siamese Autoencoder-Based Approach for Missing Data Imputation [1] presented at the International Conference on Computational Science 2023. It includes new experiments focused on runtime, generalization capabilities, and the impact of imputation on classification tasks, where the results show that SAEI is the imputation method that induces the best classification results, improving the F1 scores for 50% of the used datasets.
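To make the general idea concrete, below is a minimal sketch of autoencoder-based imputation in PyTorch. It is not the SAEI implementation: the actual approach additionally uses a siamese architecture, a custom loss and a triplet mining strategy tailored to Missing Not At Random values; this sketch only shows the basic pattern of training an autoencoder on observed cells and using its reconstruction to fill the missing ones. All layer sizes and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch: a plain autoencoder used for imputation.
import torch
import torch.nn as nn

class AEImputer(nn.Module):
    def __init__(self, n_features, hidden=32, latent=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, latent), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(),
            nn.Linear(hidden, n_features),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_and_impute(X, mask, epochs=200, lr=1e-3):
    """X: (n, d) float tensor with missing cells zero-filled; mask: 1.0 where observed."""
    model = AEImputer(X.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        recon = model(X)
        # Reconstruction loss computed only on observed cells
        loss = ((recon - X) ** 2 * mask).sum() / mask.sum()
        loss.backward()
        opt.step()
    with torch.no_grad():
        recon = model(X)
    # Keep observed values, fill missing ones with the reconstruction
    return X * mask + recon * (1 - mask)
```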

2024

Imputation of data Missing Not at Random: Artificial generation and benchmark analysis

Authors
Pereira, RC; Abreu, PH; Rodrigues, PP; Figueiredo, MAT;

Publication
EXPERT SYSTEMS WITH APPLICATIONS

Abstract
The experimental assessment of different missing data imputation methods often computes error rates between the original values and the estimated ones. This experimental setup relies on complete datasets that are injected with missing values. The injection process is straightforward for the Missing Completely At Random and Missing At Random mechanisms; however, the Missing Not At Random mechanism poses a major challenge, since the available artificial generation strategies are limited. Furthermore, the studies focused on this latter mechanism tend to disregard a comprehensive baseline of state-of-the-art imputation methods. In this work, both challenges are addressed: four new Missing Not At Random generation strategies are introduced and a benchmark study is conducted to compare six imputation methods in an experimental setup that covers 10 datasets and five missingness levels (10% to 80%). The overall findings are that, for most missing rates and datasets, the best imputation method to deal with Missing Not At Random values is Multiple Imputation by Chained Equations, whereas for higher missingness rates autoencoders show promising results.
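As an illustration of the kind of pipeline this benchmark involves (not one of the paper's four generation strategies), the sketch below injects Missing Not At Random values into one feature by removing its highest values, so that missingness depends on the unobserved value itself, and then applies a MICE-style imputation using scikit-learn's IterativeImputer. The dataset, feature index and rate are assumptions for the example.

```python
# Hypothetical sketch: simple MNAR injection followed by MICE-style imputation.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = load_wine().data
missing_rate = 0.3
feature = 0

# MNAR: the probability of being missing depends on the value itself
# (here, the top 30% of values of the chosen feature are removed).
threshold = np.quantile(X[:, feature], 1 - missing_rate)
X_mnar = X.copy()
X_mnar[X[:, feature] >= threshold, feature] = np.nan

# MICE-style imputation: each feature is regressed on the others, iteratively
imputer = IterativeImputer(max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(X_mnar)

mask = np.isnan(X_mnar[:, feature])
rmse = np.sqrt(np.mean((X[mask, feature] - X_imputed[mask, feature]) ** 2))
print(f"RMSE on MNAR cells of feature {feature}: {rmse:.3f}")
```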

2025

Unveiling Group-Specific Distributed Concept Drift: A Fairness Imperative in Federated Learning

Authors
Salazar, T; Gama, J; Araújo, H; Abreu, PH;

Publication
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Abstract
In the evolving field of machine learning, ensuring group fairness has become a critical concern, prompting the development of algorithms designed to mitigate bias in decision-making processes. Group fairness refers to the principle that a model's decisions should be equitable across different groups defined by sensitive attributes such as gender or race, ensuring that individuals from privileged groups and unprivileged groups are treated fairly and receive similar outcomes. However, achieving fairness in the presence of group-specific concept drift remains an unexplored frontier, and our research represents pioneering efforts in this regard. Group-specific concept drift refers to situations where one group experiences concept drift over time, while another does not, leading to a decrease in fairness even if accuracy (ACC) remains fairly stable. Within the framework of federated learning (FL), where clients collaboratively train models, its distributed nature further amplifies these challenges since each client can experience group-specific concept drift independently while still sharing the same underlying concept, creating a complex and dynamic environment for maintaining fairness. The most significant contribution of our research is the formalization and introduction of the problem of group-specific concept drift and its distributed counterpart, shedding light on its critical importance in the field of fairness. In addition, leveraging insights from prior research, we adapt an existing distributed concept drift adaptation algorithm to tackle group-specific distributed concept drift, which uses a multimodel approach, a local group-specific drift detection mechanism, and continuous clustering of models over time. The findings from our experiments highlight the importance of addressing group-specific concept drift and its distributed counterpart to advance fairness in machine learning.
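To give a feel for what group-specific drift detection can look like in practice, here is a hypothetical illustration (not the paper's adapted algorithm, which uses a multimodel approach with continuous clustering): each client monitors its model's accuracy separately per sensitive group over a sliding window and flags drift when one group's accuracy falls well below its own baseline while the other group's remains stable. The window size and drop threshold are illustrative assumptions.

```python
# Hypothetical illustration: per-group windowed accuracy monitoring.
from collections import deque

class GroupDriftMonitor:
    def __init__(self, window=200, drop_threshold=0.10):
        self.window = window
        self.drop = drop_threshold
        self.history = {}   # group -> deque of 0/1 correctness flags
        self.baseline = {}  # group -> accuracy on the first full window

    def update(self, group, correct):
        """Record one labelled prediction; return True if this group drifted."""
        buf = self.history.setdefault(group, deque(maxlen=self.window))
        buf.append(int(correct))
        if len(buf) < self.window:
            return False
        acc = sum(buf) / len(buf)
        if group not in self.baseline:
            self.baseline[group] = acc
            return False
        # Group-specific drift: current windowed accuracy well below baseline
        return acc < self.baseline[group] - self.drop
```

In this toy setup, a client would call update(group, correct) for every labelled example it sees; a True return for one group while the other stays at baseline is exactly the group-specific pattern the abstract describes, in which overall accuracy can remain fairly stable while fairness degrades.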

2023

Evaluating Post-hoc Interpretability with Intrinsic Interpretability

Authors
Amorim, JP; Abreu, PH; Santos, JAM; Müller, H;

Publication
CoRR

Abstract

2023

Bone Metastases Detection in Patients with Breast Cancer: Does Bone Scintigraphy Add Information to PET/CT?

Authors
Santos, JC; Abreu, MH; Santos, MS; Duarte, H; Alpoim, T; Próspero, I; Sousa, S; Abreu, PH;

Publication
ONCOLOGIST

Abstract
This article compares the effectiveness of the PET/CT scan and bone scintigraphy for the detection of bone metastases in patients with breast cancer. Background: Positron emission tomography/computed tomography (PET/CT) has in recent years become a tool for breast cancer (BC) staging. However, its accuracy in detecting bone metastases is classically considered inferior to that of bone scintigraphy (BS). The purpose of this work is to compare the effectiveness of PET/CT and BS in detecting bone metastases. Materials and Methods: Prospective study of 410 female patients treated in a Comprehensive Cancer Center between 2014 and 2020 who underwent PET/CT and BS for staging purposes. The image analysis was performed by 2 senior nuclear medicine physicians. The comparison was based on accuracy, sensitivity, and specificity at the patient and anatomical-region level and was assessed using McNemar's test. An average ROC was calculated for the anatomical-region analysis. Results: PET/CT presented higher values of accuracy and sensitivity (98.0% and 93.83%), surpassing BS (95.61% and 81.48%) in detecting bone disease. There was a significant difference in sensitivity in favor of PET/CT (93.83% vs. 81.48%); however, there was no significant difference in eliminating false positives (specificity 99.09% vs. 99.09%). PET/CT presented the highest accuracy and sensitivity values for most of the bone segments, being surpassed by BS only for the cranium. There was a significant difference in favor of PET/CT in the upper limb, spine, thorax (sternum) and lower limb (pelvis and sacrum), and in favor of BS in the cranium. The ROC analysis showed that PET/CT has higher sensitivity and consistency across the bone segments. Conclusion: With the correct imaging protocol, PET/CT does not require BS for the staging of patients with BC.
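For readers unfamiliar with the statistical comparison used here, the sketch below shows how McNemar's test is applied to paired detection outcomes from two modalities on the same patients. The counts are purely illustrative, not the study's data; only the shape of the 2x2 table of concordant and discordant pairs matters.

```python
# Hypothetical example: McNemar's test on paired per-patient detections
# by two imaging modalities (illustrative counts only).
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Rows: modality A detected yes/no; columns: modality B detected yes/no.
# The test uses only the discordant cells (off-diagonal counts).
table = np.array([
    [60, 16],   # A positive: B positive / B negative
    [ 2,  8],   # A negative: B positive / B negative
])

result = mcnemar(table, exact=True)  # exact binomial test on discordant pairs
print(f"statistic={result.statistic}, p-value={result.pvalue:.4f}")
```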
