Publications

Publications by LIAAD

2025

mdatagen: A python library for the artificial generation of missing data

Authors
Mangussi, AD; Santos, MS; Lopes, FL; Pereira, RC; Lorena, AC; Abreu, PH;

Publication
NEUROCOMPUTING

Abstract
Missing data is characterized by the presence of absent values in data (i.e., missing values) and it is currently categorized into three different mechanisms: Missing Completely At Random, Missing At Random, and Missing Not At Random. When performing missing data experiments and evaluating techniques to handle absent values, these mechanisms are often artificially generated (a process referred to as data amputation) to assess the robustness and behavior of the methods under evaluation. Due to the lack of a standard benchmark for data amputation, different implementations of the mechanisms are used in related research (some of which are often not disclosed), preventing the reproducibility of results and leading to an unfair or inaccurate comparison between existing and new methods. Moreover, for users outside the field, experimenting with missing data or simulating the appearance of missing values in real-world domains is unfeasible, impairing stress testing in machine learning systems. This work introduces mdatagen, an open source Python library for the generation of missing data mechanisms across 20 distinct scenarios, following different univariate and multivariate implementations of the established missing mechanisms. The package therefore fosters reproducible results across missing data experiments and enables the simulation of artificial missing data under flexible configurations, making it very versatile to mimic several real-world applications involving missing data. The source code and detailed documentation for mdatagen are available at https://github.com/ArthurMangussi/pymdatagen.
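
As a rough illustration of the three mechanisms named in the abstract, the sketch below amputes one column of a small table using only the Python standard library. It does not use mdatagen and is not its API: the function names (`ampute_mcar`, `ampute_mar`, `ampute_mnar`), the "remove the lowest values" rule, and the toy data are all assumptions chosen for brevity.

```python
import random


def ampute_mcar(rows, column, rate, seed=0):
    """MCAR sketch: each value in `column` is removed with the same probability."""
    rng = random.Random(seed)
    out = [list(r) for r in rows]  # copy so the input is left untouched
    for r in out:
        if rng.random() < rate:
            r[column] = None
    return out


def ampute_mar(rows, column, driver, rate):
    """MAR sketch: remove `column` in the rows with the lowest values of
    another, fully observed column (`driver`)."""
    n_missing = round(rate * len(rows))
    order = sorted(range(len(rows)), key=lambda i: rows[i][driver])
    out = [list(r) for r in rows]
    for i in order[:n_missing]:
        out[i][column] = None
    return out


def ampute_mnar(rows, column, rate):
    """MNAR sketch: missingness depends on the unobserved value itself,
    here by removing the lowest values of `column`."""
    return ampute_mar(rows, column, driver=column, rate=rate)


# Hypothetical toy data: column 0 increases while column 1 decreases.
rows = [[float(i), float(10 - i)] for i in range(10)]
mar_rows = ampute_mar(rows, column=1, driver=0, rate=0.3)
mnar_rows = ampute_mnar(rows, column=1, rate=0.3)
```

In this toy setup the MAR variant blanks column 1 where column 0 is smallest, while the MNAR variant blanks column 1 where column 1 itself is smallest, so the two masks differ even at the same missing rate.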


2025

The Role of Deep Learning in Medical Image Inpainting: A Systematic Review

Authors
Santos, JC; Alexandre, HTP; Santos, MS; Abreu, PH;

Publication
ACM TRANSACTIONS ON COMPUTING FOR HEALTHCARE

Abstract
Image inpainting is a crucial technique in computer vision, particularly for reconstructing corrupted images. In medical imaging, it addresses issues from instrumental errors, artifacts, or human factors. The development of deep learning techniques has revolutionized image inpainting, allowing for the generation of high-level semantic information to ensure structural and textural consistency in restored images. This article presents a comprehensive review of 53 studies on deep image inpainting in medical imaging, analyzing its evolution, impact, and limitations. The findings highlight the significance of deep image inpainting in artifact removal and enhancing the performance of multi-task approaches by localizing and inpainting regions of interest. Furthermore, the study identifies magnetic resonance imaging and computed tomography as the predominant modalities and highlights generative adversarial networks and U-Net as preferred architectures. Future research directions include the development of blind inpainting techniques, the exploration of techniques suitable for 3D/4D images, multiple artifacts, and multi-task applications, and the improvement of architectures.


2025

Pycol: A Python package for dataset complexity measures

Authors
Apóstolo, D; Santos, MS; Lorena, AC; Abreu, PH;

Publication
NEUROCOMPUTING

Abstract
Class overlap presents a significant challenge to machine learning algorithms, especially when class imbalance is present. These factors contribute substantially to the complexity of classification tasks, particularly in real-world scenarios. As a result, measuring overlap is crucial, yet it remains difficult to quantify due to its intricate nature, since it can manifest and be measured in multiple ways. To help mitigate this, recent research has conceptualized a new taxonomy of class overlap measures, divided into multiple families, which allows researchers to obtain a more complete overview of the complexity of the datasets. In line with recent research, we introduce a new Python package for class overlap measurement named pycol. This package implements 29 overlap measures, divided into four overlap families specifically designed to capture class overlap in imbalanced real-world scenarios. This makes pycol an essential tool for researchers dealing with complex classification problems, providing robust solutions to quantify the joint effect of class overlap and class imbalance effectively.
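
To give a sense of what an overlap measure computes, the sketch below implements one simple instance-level measure from the complexity literature: the leave-one-out error rate of a 1-nearest-neighbour classifier (often called N3). It is written from scratch with the standard library and does not use pycol; the function name `loo_1nn_error` and the toy points are assumptions for illustration only.

```python
import math


def loo_1nn_error(points, labels):
    """Fraction of points whose nearest other point has a different label.

    Values near 0 suggest well-separated classes; values near 1 suggest
    heavy overlap.
    """
    errors = 0
    for i, p in enumerate(points):
        best_j, best_d = None, math.inf
        for j, q in enumerate(points):
            if j == i:
                continue  # leave the query point itself out
            d = math.dist(p, q)
            if d < best_d:
                best_j, best_d = j, d
        if labels[best_j] != labels[i]:
            errors += 1
    return errors / len(points)


# Two well-separated clusters.
separated = loo_1nn_error(
    [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)],
    [0, 0, 0, 1, 1, 1],
)  # -> 0.0

# Classes interleaved along a line.
interleaved = loo_1nn_error(
    [(0, 0), (1, 0), (2, 0), (3, 0)],
    [0, 1, 0, 1],
)  # -> 1.0
```

A single number like this captures only one facet of overlap, which is consistent with the abstract's point that overlap can be measured in multiple ways.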


2025

Category-wise Fine-Tuning: Resisting incorrect pseudo-labels in multi-label image classification with partial labels

Authors
Chong, CF; Fang, XY; Guo, JL; Abreu, PH; Wang, YP; Yang, X; Kea, W; Im, SK;

Publication
NEUROCOMPUTING

Abstract
Large-scale image datasets are often partially labeled, where only a few categories' labels are known for each image. Assigning pseudo-labels to unknown labels to gain additional training signals has become prevalent for training deep classification models. However, some pseudo-labels are inevitably incorrect, leading to a notable decline in the model classification performance. In this paper, we propose a new method called Category-wise Fine-Tuning (CFT), aiming to reduce model inaccuracies caused by wrong pseudo-labels. In particular, CFT employs known labels without pseudo-labels to fine-tune the logistic regressions of trained models individually to calibrate each category's model predictions. A Genetic Algorithm, seldom used for training deep models, is also used in CFT to maximize the classification performance directly. CFT is applied to well-trained models, unlike most existing methods that train models from scratch. Hence, CFT is general and compatible with models trained with different methods and schemes, as demonstrated through extensive experiments. CFT requires only a few seconds for each category for calibration with consumer-grade GPUs. We achieve state-of-the-art results on three benchmarking datasets, including the CheXpert chest X-ray competition dataset (ensemble mAUC 93.33%, single model 91.82%), partially labeled MS-COCO (average mAP 83.69%), and Open Image V3 (mAP 85.31%), outperforming the previous bests by 0.28%, 2.21%, 2.50%, and 0.91%, respectively. The single model on CheXpert has been officially evaluated by the competition server, endorsing the correctness of the result. The outstanding results and generalizability indicate that CFT could be substantial and prevalent for classification model development. Code is available at: https://github.com/maxium0526/category-wise-fine-tuning.


2025

A Systematic Review and Comparison of Calibration Techniques for UWB Localization Anchors

Authors
Simoes, SA; Araújo, H; Abreu, PH;

Publication
2025 9TH INTERNATIONAL YOUNG ENGINEERS FORUM ON ELECTRICAL AND COMPUTER ENGINEERING, YEF-ECE

Abstract
Ultra-wideband (UWB) systems are critical for indoor positioning in robotics, industrial tracking, and asset management due to their accuracy in multipath-prone environments. Like GPS satellites requiring precise orbital data, UWB systems depend on well-calibrated anchors, fixed reference points whose positional accuracy directly impacts location estimates. We systematically evaluate and compare computational calibration methods, such as Genetic Algorithms, Maximum Likelihood, and the Extended Kalman Filter, using synthetic data, assessing both efficiency and error reduction in calibration and location. Nonlinear Least Squares (NLS) outperformed other approaches from this review as well as state-of-the-art methods, reducing anchor calibration errors to 10.7 cm (86.03% improvement from 1-meter initial uncertainty) and tag localization errors to 5.6 cm (88.35% reduction). NLS maintained computational efficiency (mean execution time of 0.011 s), proving ideal for real-world deployments where efficiency and accuracy are critical.
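
For readers unfamiliar with nonlinear least squares in this setting, the sketch below shows the basic idea on the simpler of the two problems mentioned in the abstract: locating a tag in 2D from range measurements to anchors whose positions are already known. It is a plain Gauss-Newton iteration written with the standard library, not the calibration procedure evaluated in the paper; the function name `locate_tag`, the room layout, and the noise-free ranges are assumptions.

```python
import math


def locate_tag(anchors, ranges, guess, iterations=20):
    """Gauss-Newton sketch minimising sum_k (||p - a_k|| - d_k)^2 over p."""
    x, y = guess
    for _ in range(iterations):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for (ax, ay), d in zip(anchors, ranges):
            dx, dy = x - ax, y - ay
            dist = math.hypot(dx, dy) or 1e-12  # avoid division by zero
            jx, jy = dx / dist, dy / dist       # Jacobian row of the residual
            r = dist - d                        # range residual
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            g1 += jx * r
            g2 += jy * r
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # degenerate geometry, stop iterating
        # Solve the 2x2 normal equations (J^T J) s = J^T r, then step p -= s.
        sx = (a22 * g1 - a12 * g2) / det
        sy = (a11 * g2 - a12 * g1) / det
        x -= sx
        y -= sy
    return x, y


# Hypothetical 10 m x 10 m room with one anchor in each corner.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
true_tag = (3.0, 4.0)
ranges = [math.dist(true_tag, a) for a in anchors]  # noise-free ranges
estimate = locate_tag(anchors, ranges, guess=(5.0, 5.0))
```

Anchor calibration can be posed in the same least-squares form with the anchor coordinates as the unknowns, which is why anchor error feeds directly into tag error.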


2025

Integrating artificial intelligence into scenario analysis: a validated framework for strategic planning under economic uncertainty

Authors
Bessa, G; Barbosa, B;

Publication
Global Economics Research

Abstract