2022
Authors
Reyes, M; Abreu, PH; Cardoso, JS;
Publication
iMIMIC@MICCAI
Abstract
2021
Authors
Reyes, M; Abreu, PH; Cardoso, JS; Hajij, M; Zamzmi, G; Paul, R; Thakur, L;
Publication
iMIMIC/TDA4MedicalData@MICCAI
Abstract
2025
Authors
Mangussi, AD; Pereira, RC; Lorena, AC; Santos, MS; Abreu, PH;
Publication
COMPUTERS & SECURITY
Abstract
Cybersecurity attacks, such as poisoning and evasion, can intentionally introduce false or misleading information in different forms into data, potentially leading to catastrophic consequences for critical infrastructures, like water supply or energy power plants. While numerous studies have investigated the impact of these attacks on model-based prediction approaches, they often overlook the impurities present in the data used to train these models. One such impurity is missing data, the absence of values in one or more features. This issue is typically addressed by imputing missing values with plausible estimates, which directly impacts the performance of the classifier. The goal of this work is to promote a Data-centric AI approach by investigating how different types of cybersecurity attacks impact the imputation process. To this end, we conducted experiments using four popular evasion and poisoning attack strategies across 29 real-world datasets, including the NSL-KDD and Edge-IIoT datasets, which were used as case studies. For the adversarial attack strategies, we employed the Fast Gradient Sign Method, Carlini & Wagner, Projected Gradient Descent, and the Poison Attack against Support Vector Machine algorithm. Also, four state-of-the-art imputation strategies were tested under Missing Not At Random, Missing Completely At Random, and Missing At Random mechanisms using three missing rates (5%, 20%, 40%). We assessed imputation quality using MAE, while data distribution shifts were analyzed with the Kolmogorov-Smirnov and Chi-square tests. Furthermore, we measured classification performance by training an XGBoost classifier on the imputed datasets, using F1-score, Accuracy, and AUC. To deepen our analysis, we also incorporated six complexity metrics to characterize how adversarial attacks and imputation strategies impact dataset complexity. Our findings demonstrate that adversarial attacks significantly impact the imputation process.
In terms of imputation quality error, the scenario that involves imputation with the Projected Gradient Descent attack proved to be more robust than those involving the other adversarial methods. Regarding data distribution error, results from the Kolmogorov-Smirnov test indicate that, for numerical features, all imputation strategies differ from the baseline (without missing data); for categorical features, however, the Chi-square test showed no difference between imputation and the baseline.
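The evaluation loop described in this abstract (remove values at a given missing rate, impute, then score the result with MAE and a Kolmogorov-Smirnov comparison) can be illustrated with a minimal sketch. This is not the authors' pipeline: it covers only the Missing Completely At Random mechanism, uses synthetic Gaussian data, and substitutes plain mean imputation for the four imputation strategies, which the abstract does not name. All function names are hypothetical.

```python
import random

def mcar_mask(n, rate, rng):
    """Pick round(n * rate) positions uniformly at random (MCAR)."""
    k = round(n * rate)
    return set(rng.sample(range(n), k))

def mean_impute(values, missing):
    """Replace missing positions with the mean of the observed values.
    Assumption: a stand-in for the (unnamed) imputation strategies."""
    observed = [v for i, v in enumerate(values) if i not in missing]
    fill = sum(observed) / len(observed)
    return [fill if i in missing else v for i, v in enumerate(values)]

def mae(truth, imputed, missing):
    """Mean absolute error, computed on the imputed positions only."""
    return sum(abs(truth[i] - imputed[i]) for i in missing) / len(missing)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical CDFs."""
    points = sorted(set(a) | set(b))

    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

rng = random.Random(0)
feature = [rng.gauss(0.0, 1.0) for _ in range(200)]  # synthetic numerical feature

results = {}
for rate in (0.05, 0.20, 0.40):  # the three missing rates from the abstract
    missing = mcar_mask(len(feature), rate, rng)
    imputed = mean_impute(feature, missing)
    results[rate] = (mae(feature, imputed, missing),
                     ks_statistic(feature, imputed))
```

Mean imputation piles the imputed values onto a single point, so the KS statistic against the complete feature tends to grow with the missing rate. That is one simple way a numerical feature's distribution can drift from the baseline after imputation.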
2025
Authors
Simoes, SA; Vilela, JP; Santos, MS; Abreu, PH;
Publication
NEUROCOMPUTING
Abstract
Quasi-identifiers (QIDs) are attributes in a dataset that are not directly unique identifiers of the users/entities themselves but can be used, often in conjunction with other datasets or information, to identify individuals and thus present a privacy risk in data sharing and analysis. Identifying QIDs is important in developing proper strategies for anonymization and data sanitization. This paper proposes QIDLEARNINGLIB, a Python library that offers a set of metrics and tools to measure the qualities of QIDs and identify them in datasets. It incorporates metrics from different domains (causality, privacy, data utility, and performance) to offer a holistic assessment of the properties of attributes in a given tabular dataset. Furthermore, QIDLEARNINGLIB offers visual analysis tools to present how these metrics shift over a dataset and implements an extensible framework that employs multiple optimization algorithms, such as an evolutionary algorithm, simulated annealing, and greedy search, using these metrics to identify a meaningful set of QIDs.
2019
Authors
Oliveira, AC; Domingues, I; Duarte, H; Santos, J; Abreu, PH;
Publication
PATTERN RECOGNITION AND IMAGE ANALYSIS, IBPRIA 2019, PT II
Abstract
Radiotherapy planning is a crucial task in cancer patients’ management. This task is, however, very time consuming and prone to high intra- and inter-subject variance and human error. The present line of work therefore aims at developing a tool to help specialists in this task. The developed tool will consider the delimitation of anatomical regions of interest, since it is crucial to identify the organs at risk and minimize the exposure of these organs to radiation. This paper, in particular, presents a lung segmentation algorithm for Computed Tomography volumes, based on image processing techniques such as intensity projection and region growing. Our pipeline consists of first separating the two halves of the volume to isolate each lung. Then, three techniques for seed placement are developed. Finally, a traditional region growing algorithm is modified to automatically derive the value of the threshold parameter. The three seed placement techniques obtained Dice scores of 74%, 74% and 92%, respectively, with the Iterative Region Growing algorithm. Although the presented results use Hodgkin Lymphoma as a use case, we believe that the developed method is generalizable to any other pathology.
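As an illustration of the seed-based region growing and Dice evaluation mentioned in this abstract, here is a minimal sketch. It makes several simplifying assumptions that are not from the paper: a 2D toy image instead of a CT volume, 4-connectivity, a hand-picked seed, and a fixed intensity tolerance in place of the automatically derived threshold, whose derivation the abstract does not describe. The names `region_grow` and `dice` are hypothetical.

```python
from collections import deque

def region_grow(image, seed, tolerance):
    """Grow a 4-connected region from `seed`, accepting pixels whose
    intensity is within `tolerance` of the seed intensity.
    Assumption: fixed tolerance, not the paper's automatic threshold."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

def dice(a, b):
    """Dice coefficient between two sets of pixel coordinates."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Toy image: a bright 2x2 blob plus one isolated bright pixel.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 0, 0, 0, 9],
]
segmented = region_grow(image, seed=(1, 1), tolerance=2)
reference = {(1, 1), (1, 2), (2, 1), (2, 2)}  # hand-made ground truth
score = dice(segmented, reference)
```

The isolated bright pixel at `(3, 4)` is excluded because it is not connected to the seed. This shows why seed placement matters for this family of methods.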
2018
Authors
Pereira, G; Domingues, I; Martins, P; Abreu, PH; Duarte, H; Santos, J;
Publication
COMBINATORIAL IMAGE ANALYSIS, IWCIA 2018
Abstract
The integration of the functional imaging modality provided by Positron Emission Tomography (PET) and the associated anatomical imaging modality provided by Computed Tomography (CT) has become an essential procedure both in the evaluation of different types of malignancy and in radiotherapy planning. The alignment of these two exams is thus of great importance. In this research work, three registration approaches were evaluated and compared: (1) intensity-based registration, (2) rigid translation followed by intensity-based registration, and (3) coarse registration followed by fine-tuning. To characterize the performance of these methods, 161 real volume scans from patients involved in Hodgkin Lymphoma staging were used: CT volumes used for radiotherapy planning were registered with PET volumes before any treatment. Registration achieved accuracies of 78%, 60%, and 91% for methods (1), (2) and (3), respectively. Validation of the registration methods was extended to a distance calculation between corresponding landmark points. Methods (1), (2) and (3) achieved a median improvement registration rate of 66% mm, 51% mm and 70% mm, respectively. The accuracy of the proposed methods was further confirmed by extending our experiments to other multimodal datasets and to a monomodal dataset with different acquisition conditions.