Publications

Publications by Pedro Henriques Abreu

2022

Interpretability of Machine Intelligence in Medical Image Computing - 5th International Workshop, iMIMIC 2022, Held in Conjunction with MICCAI 2022, Singapore, Singapore, September 22, 2022, Proceedings

Authors
Reyes, M; Abreu, PH; Cardoso, JS;

Publication
iMIMIC@MICCAI

Abstract

2021

Interpretability of Machine Intelligence in Medical Image Computing, and Topological Data Analysis and Its Applications for Medical Data - 4th International Workshop, iMIMIC 2021, and 1st International Workshop, TDA4MedicalData 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Proceedings

Authors
Reyes, M; Abreu, PH; Cardoso, JS; Hajij, M; Zamzmi, G; Paul, R; Thakur, L;

Publication
iMIMIC/TDA4MedicalData@MICCAI

Abstract

2025

Studying the robustness of data imputation methodologies against adversarial attacks

Authors
Mangussi, AD; Pereira, RC; Lorena, AC; Santos, MS; Abreu, PH;

Publication
COMPUTERS & SECURITY

Abstract
Cybersecurity attacks, such as poisoning and evasion, can intentionally introduce false or misleading information in different forms into data, potentially leading to catastrophic consequences for critical infrastructures, like water supply or energy power plants. While numerous studies have investigated the impact of these attacks on model-based prediction approaches, they often overlook the impurities present in the data used to train these models. One such impurity is missing data, the absence of values in one or more features. This issue is typically addressed by imputing missing values with plausible estimates, which directly impacts the performance of the classifier. The goal of this work is to promote a Data-centric AI approach by investigating how different types of cybersecurity attacks impact the imputation process. To this end, we conducted experiments using four popular evasion and poisoning attack strategies across 29 real-world datasets, including the NSL-KDD and Edge-IIoT datasets, which were used as case studies. For the adversarial attack strategies, we employed the Fast Gradient Sign Method, Carlini & Wagner, Projected Gradient Descent, and Poison Attack against Support Vector Machine algorithms. In addition, four state-of-the-art imputation strategies were tested under the Missing Not At Random, Missing Completely At Random, and Missing At Random mechanisms using three missing rates (5%, 20%, 40%). We assessed imputation quality using MAE, while data distribution shifts were analyzed with the Kolmogorov-Smirnov and Chi-square tests. Furthermore, we measured classification performance by training an XGBoost classifier on the imputed datasets, using F1-score, Accuracy, and AUC. To deepen our analysis, we also incorporated six complexity metrics to characterize how adversarial attacks and imputation strategies impact dataset complexity. Our findings demonstrate that adversarial attacks significantly impact the imputation process.
Regarding imputation quality error, the scenario that combines imputation with the Projected Gradient Descent attack proved more robust than those involving the other adversarial methods. Regarding data distribution error, the Kolmogorov-Smirnov test indicates that, for numerical features, all imputation strategies differ from the baseline (without missing data); for categorical features, however, the Chi-square test showed no difference between imputation and the baseline.
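
As a rough illustration of the evaluation described in this abstract, the sketch below removes values completely at random at the three stated missing rates, imputes them, and scores the result with MAE over the imputed entries. It is not the authors' pipeline: mean imputation stands in for the four unnamed imputation strategies, the data are synthetic, no adversarial attack is applied, and the function names (`mcar_mask`, `mean_impute`, `imputation_mae`) are hypothetical.

```python
import random

def mcar_mask(n_rows, n_cols, rate, seed=0):
    """Boolean grid; True marks a value removed completely at random (MCAR)."""
    rng = random.Random(seed)
    return [[rng.random() < rate for _ in range(n_cols)] for _ in range(n_rows)]

def mean_impute(data, mask):
    """Fill masked entries with the mean of the observed values in the same column.
    Assumption: mean imputation is only a stand-in for the paper's strategies."""
    imputed = [row[:] for row in data]
    for j in range(len(data[0])):
        observed = [row[j] for row, m in zip(data, mask) if not m[j]]
        fill = sum(observed) / len(observed) if observed else 0.0
        for i, m in enumerate(mask):
            if m[j]:
                imputed[i][j] = fill
    return imputed

def imputation_mae(truth, imputed, mask):
    """Mean absolute error computed only over the entries that were imputed."""
    errors = [abs(t - p)
              for t_row, p_row, m_row in zip(truth, imputed, mask)
              for t, p, m in zip(t_row, p_row, m_row) if m]
    return sum(errors) / len(errors) if errors else 0.0

# Synthetic two-feature dataset (hypothetical, for illustration only).
rng = random.Random(42)
truth = [[rng.gauss(0, 1), rng.gauss(5, 2)] for _ in range(200)]
for rate in (0.05, 0.20, 0.40):
    mask = mcar_mask(len(truth), 2, rate, seed=1)
    mae = imputation_mae(truth, mean_impute(truth, mask), mask)
    print(f"missing rate {rate:.0%}: MAE = {mae:.3f}")
```

Scoring only the masked entries keeps the error from being diluted by values that were never missing.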

2025

QIDLEARNINGLIB: A Python library for quasi-identifier recognition and evaluation

Authors
Simoes, SA; Vilela, JP; Santos, MS; Abreu, PH;

Publication
NEUROCOMPUTING

Abstract
Quasi-identifiers (QIDs) are attributes in a dataset that do not uniquely identify users or entities on their own but can be used, often in conjunction with other datasets or information, to identify individuals, and thus present a privacy risk in data sharing and analysis. Identifying QIDs is important in developing proper strategies for anonymization and data sanitization. This paper proposes QIDLEARNINGLIB, a Python library that offers a set of metrics and tools to measure the qualities of QIDs and identify them in datasets. It incorporates metrics from different domains (causality, privacy, data utility, and performance) to offer a holistic assessment of the properties of attributes in a given tabular dataset. Furthermore, QIDLEARNINGLIB offers visual analysis tools to present how these metrics shift over a dataset and implements an extensible framework that uses these metrics with multiple optimization algorithms, such as an evolutionary algorithm, simulated annealing, and greedy search, to identify a meaningful set of QIDs.
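
To make the idea of searching for a set of QIDs concrete, here is a minimal sketch of a greedy search driven by a single, simple privacy metric: the fraction of records that are unique on a chosen attribute combination. This is not the QIDLEARNINGLIB API; the functions `uniqueness` and `greedy_qids`, the metric, and the toy records are all assumptions made for illustration.

```python
from collections import Counter

def uniqueness(records, attributes):
    """Fraction of records whose value combination on `attributes` is unique."""
    keys = [tuple(rec[a] for a in attributes) for rec in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)

def greedy_qids(records, candidates, target=1.0):
    """Greedily add the attribute that most increases uniqueness.
    Stops when `target` is reached or no attribute improves the score."""
    chosen, remaining, best = [], list(candidates), 0.0
    while remaining and best < target:
        score, attr = max((uniqueness(records, chosen + [a]), a) for a in remaining)
        if score <= best:
            break
        chosen.append(attr)
        remaining.remove(attr)
        best = score
    return chosen, best

# Toy table (hypothetical values).
records = [
    {"zip": "4000", "age": 34, "sex": "F"},
    {"zip": "4000", "age": 34, "sex": "M"},
    {"zip": "4100", "age": 51, "sex": "F"},
    {"zip": "4100", "age": 29, "sex": "F"},
]
print(greedy_qids(records, ["zip", "age", "sex"]))
```

On this toy table no single attribute isolates every record, but the pair `age` and `sex` does, which is the kind of combination a QID search is meant to surface.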

2019

Going Back to Basics on Volumetric Segmentation of the Lungs in CT: A Fully Image Processing Based Technique

Authors
Oliveira, AC; Domingues, I; Duarte, H; Santos, J; Abreu, PH;

Publication
PATTERN RECOGNITION AND IMAGE ANALYSIS, IBPRIA 2019, PT II

Abstract
Radiotherapy planning is a crucial task in the management of cancer patients. This task is, however, very time consuming and prone to high intra- and inter-subject variance and to human error. The present line of work therefore aims at developing a tool to help specialists in this task. The developed tool will consider the delimitation of anatomical regions of interest, since it is crucial to identify the organs at risk and minimize the exposure of these organs to radiation. This paper, in particular, presents a lung segmentation algorithm, based on image processing techniques such as intensity projection and region growing, for Computed Tomography volumes. Our pipeline consists of first separating the two halves of the volume to isolate each lung. Then, three techniques for seed placement are developed. Finally, a traditional region growing algorithm is modified to automatically derive the value of the threshold parameter. The results obtained for the three seed placement techniques were, respectively, 74%, 74% and 92% Dice with the Iterative Region Growing algorithm. Although the presented results use Hodgkin Lymphoma as the use case, we believe that the developed method is generalizable to other pathologies.
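
The two building blocks named in this abstract, region growing from a seed and Dice as the overlap score, can be sketched as follows. This is a textbook 2D version under stated assumptions: the threshold is fixed by hand (the paper derives it automatically, by a procedure the abstract does not describe), connectivity is 4-neighbour, and the tiny image and reference mask are made up.

```python
from collections import deque

def region_grow(image, seed, threshold):
    """Grow a region from `seed`, adding 4-connected pixels whose intensity
    differs from the seed intensity by at most `threshold`."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= threshold):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

def dice(a, b):
    """Dice coefficient between two pixel sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical 4x4 image and reference segmentation.
image = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
]
segmented = region_grow(image, seed=(1, 1), threshold=1)
reference = {(1, 1), (1, 2), (2, 1), (2, 2)}
print(sorted(segmented), round(dice(segmented, reference), 3))
```

Here the grown region covers three of the four reference pixels, so Dice is 2·3 / (3 + 4) = 6/7.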

2018

Registration of CT with PET: A Comparison of Intensity-Based Approaches

Authors
Pereira, G; Domingues, I; Martins, P; Abreu, PH; Duarte, H; Santos, J;

Publication
COMBINATORIAL IMAGE ANALYSIS, IWCIA 2018

Abstract
The integration of the functional imaging modality provided by Positron Emission Tomography (PET) and the associated anatomical imaging modality provided by Computed Tomography (CT) has become an essential procedure both in the evaluation of different types of malignancy and in radiotherapy planning. The alignment of these two exams is thus of great importance. In this research work, three registration approaches were evaluated and compared: (1) intensity-based registration, (2) rigid translation followed by intensity-based registration, and (3) coarse registration followed by fine-tuning. To characterize the performance of these methods, 161 real volume scans from patients involved in Hodgkin Lymphoma staging were used: CT volumes used for radiotherapy planning were registered with PET volumes before any treatment. Registration results achieved 78%, 60%, and 91% accuracy for methods (1), (2) and (3), respectively. Validation of the registration methods was extended to a distance calculation between corresponding landmark points. Methods (1), (2) and (3) achieved a median improvement registration rate of 66% mm, 51% mm and 70% mm, respectively. The accuracy of the proposed methods was further confirmed by extending our experiments to other multimodal datasets and to a monomodal dataset with different acquisition conditions.
