Publications

Publications by Pedro Henriques Abreu

2025

Assessing Adversarial Effects of Noise in Missing Data Imputation

Authors
Mangussi, AD; Pereira, RC; Abreu, PH; Lorena, AC;

Publication
INTELLIGENT SYSTEMS, BRACIS 2024, PT I

Abstract
In real-world scenarios, a wide variety of datasets contain inconsistencies. One example of such inconsistency is missing data (MD), which refers to the absence of information in one or more variables. Missing imputation strategies emerged as a possible solution for addressing this problem, which can replace the missing values based on mean, median, or Machine Learning (ML) techniques. The performance of such strategies depends on multiple factors. One factor that influences the missing value imputation (MVI) methods is the presence of noisy instances, described as anything that obscures the relationship between the features of an instance and its class, having an adversarial effect. However, the interaction between MD and noisy instances has received little attention in the literature. This work fills this gap by investigating missing and noisy data interplay. Our experimental setup begins with generating missingness under the Missing Not at Random (MNAR) mechanism in a multivariate scenario and performing imputation using seven state-of-the-art MVI methods. Our methodology involves applying a noise filter before performing the imputation task and evaluating the quality of the imputation directly. Additionally, we measure the classification performance with the new estimates. This approach is applied to both synthetic data and 11 real-world datasets. The effects of noise filtering before imputation are evaluated. The results show that noise preprocessing before the imputation task improves the imputation quality and the classification performance for imputed datasets.


2025

mdatagen: A python library for the artificial generation of missing data

Authors
Mangussi, AD; Santos, MS; Lopes, FL; Pereira, RC; Lorena, AC; Abreu, PH;

Publication
NEUROCOMPUTING

Abstract
Missing data is characterized by the presence of absent values in data (i.e., missing values) and it is currently categorized into three different mechanisms: Missing Completely at Random, Missing At Random, and Missing Not At Random. When performing missing data experiments and evaluating techniques to handle absent values, these mechanisms are often artificially generated (a process referred to as data amputation) to assess the robustness and behavior of the used methods. Due to the lack of a standard benchmark for data amputation, different implementations of the mechanisms are used in related research (some are often not disclaimed), preventing the reproducibility of results and leading to an unfair or inaccurate comparison between existing and new methods. Moreover, for users outside the field, experimenting with missing data or simulating the appearance of missing values in real-world domains is unfeasible, impairing stress testing in machine learning systems. This work introduces mdatagen, an open source Python library for the generation of missing data mechanisms across 20 distinct scenarios, following different univariate and multivariate implementations of the established missing mechanisms. The package therefore fosters reproducible results across missing data experiments and enables the simulation of artificial missing data under flexible configurations, making it very versatile to mimic several real-world applications involving missing data. The source code and detailed documentation for mdatagen are available at https://github.com/ArthurMangussi/pymdatagen.


2025

The Role of Deep Learning in Medical Image Inpainting: A Systematic Review

Authors
Santos, JC; Tomás Pereira Alexandre, H; Seoane Santos, M; Henriques Abreu, P;

Publication
ACM Transactions on Computing for Healthcare

Abstract
Image inpainting is a crucial technique in computer vision, particularly for reconstructing corrupted images. In medical imaging, it addresses issues from instrumental errors, artifacts, or human factors. The development of deep learning techniques has revolutionized image inpainting, allowing for the generation of high-level semantic information to ensure structural and textural consistency in restored images. This paper presents a comprehensive review of 53 studies on deep image inpainting in medical imaging, analyzing its evolution, impact, and limitations. The findings highlight the significance of deep image inpainting in artifact removal and enhancing the performance of multi-task approaches by localizing and inpainting regions of interest. Furthermore, the study identifies magnetic resonance imaging and computed tomography as the predominant modalities and highlights generative adversarial networks and U-Net as preferred architectures. Future research directions include the development of blind inpainting techniques, the exploration of techniques suitable for 3D/4D images, multiple artifacts, and multi-task applications, and the improvement of architectures.


2024

A Perspective on the Missing at Random Problem: Synthetic Generation and Benchmark Analysis

Authors
Cabrera-Sánchez, JF; Pereira, RC; Abreu, PH; Silva-Ramírez, EL;

Publication
IEEE Access

Abstract
Progressively more advanced and complex models are proposed to address problems related to computer vision, forecasting, Internet of Things, Big Data and so on. However, these disciplines require preprocessing steps to obtain meaningful results. One of the most common problems addressed in this stage is the presence of missing values. Understanding the reason why missingness occurs helps to select data imputation methods that are more adequate to complete these missing values. Missing at Random synthetic generation presents challenges such as achieving extreme missingness rates and preserving the consistency of the mechanism. To address these shortcomings, three new methods that generate synthetic missingness under the Missing at Random mechanism are proposed in this work and compared to a baseline model. This comparison considers a benchmark covering 33 data sets and five missingness rates (10%, 20%, 40%, 60%, 80%). Seven data imputation methods are compared to evaluate the proposals, ranging from traditional methods to deep learning methods. The results demonstrate that the proposals are aligned with the baseline method in terms of the performance and ranking of data imputation methods. Thus, three new feasible and consistent alternatives for synthetic missingness generation under Missing at Random are presented.


2024

Enhancing mammography: a comprehensive review of computer methods for improving image quality

Authors
Santos, JC; Santos, MS; Abreu, PH;

Publication
PROGRESS IN BIOMEDICAL ENGINEERING

Abstract
Mammography imaging remains the gold standard for breast cancer detection and diagnosis, but challenges in image quality can lead to misdiagnosis, increased radiation exposure, and higher healthcare costs. This comprehensive review evaluates traditional and machine learning-based techniques for improving mammography image quality, aiming to benefit clinicians and enhance diagnostic accuracy. Our literature search, spanning 2015 - 2024, identified 115 articles focusing on contrast enhancement and noise reduction methods, including histogram equalization, filtering, unsharp masking, fuzzy logic, transform-based techniques, and advanced machine learning approaches. Machine learning, particularly architectures integrating denoising autoencoders with convolutional neural networks, emerged as highly effective in enhancing image quality without compromising detail. The discussion highlights the success of these techniques in improving mammography images' visual quality. However, challenges such as high noise ratios, inconsistent evaluation metrics, and limited open-source datasets persist. Addressing these issues offers opportunities for future research to further advance mammography image enhancement methodologies.


2025

A Label Propagation Approach for Missing Data Imputation

Authors
Lopes, FL; Mangussi, AD; Pereira, RC; Santos, MS; Abreu, PH; Lorena, AC;

Publication
IEEE Access

Abstract
Missing data is a common challenge in real-world datasets and can arise for various reasons. This has led to the classification of missing data mechanisms as missing completely at random, missing at random, or missing not at random. Currently, the literature offers various algorithms for imputing missing data, each with advantages tailored to specific mechanisms and levels of missingness. This paper introduces a novel approach to missing data imputation using the well-established label propagation algorithm, named Label Propagation for Missing Data Imputation (LPMD). The method combines, weighs, and propagates known feature values to impute missing data. Experiments on benchmark datasets highlight its effectiveness across various missing data scenarios, demonstrating more stable results compared to baseline methods under different missingness mechanisms and levels. The algorithms were evaluated based on processing time, imputation quality (measured by mean absolute error), and impact on classification performance. A variant of the algorithm (LPMD2) generally achieved the fastest processing time compared to five other imputation algorithms from the literature, with speed-ups ranging from 0.7 to 23 times. The results of LPMD were also stable regarding the mean absolute error of the imputed values compared to their original counterparts, for different missing data mechanisms and rates of missing values. In real applications, missingness can behave according to different and unknown mechanisms, so an imputation algorithm that behaves stably for different mechanisms is advantageous. The results regarding ML models produced using the imputed datasets were also comparable to the baselines.
