Publications

2025

mdatagen: A python library for the artificial generation of missing data

Authors
Mangussi, AD; Santos, MS; Lopes, FL; Pereira, RC; Lorena, AC; Abreu, PH;

Publication
NEUROCOMPUTING

Abstract
Missing data is characterized by the presence of absent values in data (i.e., missing values) and is currently categorized into three different mechanisms: Missing Completely at Random, Missing At Random, and Missing Not At Random. When performing missing data experiments and evaluating techniques to handle absent values, these mechanisms are often artificially generated (a process referred to as data amputation) to assess the robustness and behavior of the methods used. Due to the lack of a standard benchmark for data amputation, different implementations of the mechanisms are used in related research (some of which are often not disclosed), preventing the reproducibility of results and leading to an unfair or inaccurate comparison between existing and new methods. Moreover, for users outside the field, experimenting with missing data or simulating the appearance of missing values in real-world domains is unfeasible, impairing stress testing in machine learning systems. This work introduces mdatagen, an open source Python library for the generation of missing data mechanisms across 20 distinct scenarios, following different univariate and multivariate implementations of the established missing mechanisms. The package therefore fosters reproducible results across missing data experiments and enables the simulation of artificial missing data under flexible configurations, making it versatile enough to mimic several real-world applications involving missing data. The source code and detailed documentation for mdatagen are available at https://github.com/ArthurMangussi/pymdatagen.
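
To make the idea of data amputation concrete, the following is a minimal sketch of the simplest mechanism, Missing Completely at Random, written with plain NumPy. It does not use the mdatagen API; the function name `ampute_mcar` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def ampute_mcar(X, missing_rate, seed=0):
    """Hypothetical helper (not part of mdatagen): return a float copy of X
    in which each entry is set to NaN with probability `missing_rate`,
    independently of any observed or unobserved value (MCAR)."""
    rng = np.random.default_rng(seed)
    X_out = np.asarray(X, dtype=float).copy()
    # Each cell is dropped independently of the data itself.
    mask = rng.random(X_out.shape) < missing_rate
    X_out[mask] = np.nan
    return X_out, mask


# Illustrative toy data: 5 rows, 4 features.
X = np.arange(20, dtype=float).reshape(5, 4)
X_missing, mask = ampute_mcar(X, missing_rate=0.3, seed=42)
```

Because the mask is drawn without looking at the data, the missingness carries no information about the values. Missing At Random and Missing Not At Random would instead make the drop probability depend on observed or unobserved values, respectively.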

2025

Evaluating the Impact of Pulse Oximetry Bias in Machine Learning Under Counterfactual Thinking

Authors
Martins, I; Matos, J; Goncalves, T; Celi, LA; Wong, AKI; Cardoso, JS;

Publication
APPLICATIONS OF MEDICAL ARTIFICIAL INTELLIGENCE, AMAI 2024

Abstract
Algorithmic bias in healthcare mirrors existing data biases. However, the factors driving unfairness are not always known. Medical devices capture significant amounts of data but are prone to errors; for instance, pulse oximeters overestimate the arterial oxygen saturation of darker-skinned individuals, leading to worse outcomes. The impact of this bias in machine learning (ML) models remains unclear. This study addresses the technical challenges of quantifying the impact of medical device bias in downstream ML. Our experiments compare a perfect world, without pulse oximetry bias, using SaO2 (blood-gas), to the actual world, with biased measurements, using SpO2 (pulse oximetry). Under this counterfactual design, two models are trained with identical data, features, and settings, except for the method of measuring oxygen saturation: models using SaO2 are a control and models using SpO2 a treatment. The blood-gas oximetry linked dataset was a suitable testbed, containing 163,396 nearly-simultaneous SpO2 - SaO2 paired measurements, aligned with a wide array of clinical features and outcomes. We studied three classification tasks: in-hospital mortality, respiratory SOFA score in the next 24 h, and SOFA score increase by two points. Models using SaO2 instead of SpO2 generally showed better performance. Patients with overestimation of O2 by pulse oximetry of >= 3% had significant decreases in mortality prediction recall, from 0.63 to 0.59, P < 0.001. This mirrors clinical processes where biased pulse oximetry readings provide clinicians with false reassurance of patients' oxygen levels. A similar degradation happened in ML models, with pulse oximetry biases leading to more false negatives in predicting adverse outcomes.
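
The link between recall and false negatives mentioned above can be shown with a short sketch. The labels and predictions below are made-up toy values, not data from the study, and the helper name `recall` is illustrative.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN) for binary labels, where 1 marks the adverse outcome."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


# Toy example (invented values): four true positives in the ground truth.
y_true = [1, 1, 1, 1, 0, 0]
pred_control = [1, 1, 1, 0, 0, 1]    # misses one adverse outcome
pred_treatment = [1, 1, 0, 0, 0, 1]  # misses two adverse outcomes

recall_control = recall(y_true, pred_control)
recall_treatment = recall(y_true, pred_treatment)
```

Each additional false negative lowers recall, which is why a drop in recall corresponds to more adverse outcomes going unflagged.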

2025

Evaluation of the vision mamba model for detecting diabetic retinopathy

Authors
Ferreira, M; Cardoso, L; Camara, J; Pires, S; Correia, N; Junior, GB; Cunha, A;

Publication
Procedia Computer Science

Abstract
Diabetic retinopathy is an eye disease that affects people with diabetes mellitus, causing lesions that affect the retina, leading to progressive vision loss. In Portugal, it is estimated that 1.5 million people between the ages of 20 and 79 have diabetes, a figure that is expected to rise in the coming years. This increase is also likely to raise the total number of people affected by diabetic retinopathy, who will need to be identified in the early stages of the disease to receive clinical treatment aimed at reducing the likelihood of visual impairment due to the disease. Detection and classification of the stage of severity is carried out by specialists using medical images of the retina and patients' clinical data. Fundus photographs are the standard for detecting and monitoring the progression of the disease, as they make it possible to see biomarkers that characterize the stages of the disease. The manual task of analyzing and tracing images is time-consuming and subjective, which can lead to interpretation errors. Artificial intelligence (AI) models, such as convolutional neural networks, have been proposed to aid specialists in medical image analysis tasks, some of which have already been approved for clinical use. To overcome the limitations of convolutional networks, new AI models have been proposed to develop computer vision applications, achieving promising results in image classification. The vision mamba model was recently introduced, which uses bidirectional state space to obtain an efficient visual representation. In this work, we evaluate the vision mamba model's ability to detect cases in the moderate and advanced stages of diabetic retinopathy in fundus photographs and compare its performance with models based on convolutional networks. As the best result, the model achieved a recall value of 0.95 in the APTOS dataset.

2025

Hydrogen Optical Sensors Based on Magnesium Thin Films for Leak Detection in Industrial Settings

Authors
Santos, AD; de Almeida, JMMM; Mendes, JP; Almeida, MAS; Coelho, LC;

Publication
29TH INTERNATIONAL CONFERENCE ON OPTICAL FIBER SENSORS

Abstract
Hydrogen (H2) infrastructure is the focus of many initiatives for the planned energy transition, but its volatility and flammability require extensive safety measures to prevent leakages and explosions. Magnesium thin films have been investigated not only for H2 storage but also as switchable mirrors, which drastically change their optical properties when hydrogenated. Due to their lower cost compared to other hydride-forming or plasmonic metals commonly used in optical sensing, Mg-based H2 fiber sensors have the potential to be both affordable and effective for scalable deployment in industrial settings. To this end, multilayer thin-film structures with Mg and palladium as adsorption catalyst were deposited on single-mode fiber tips, and H2 loading/unloading processes were tested in a controlled flow gas setup. In parallel, an optical interrogation system prototype was developed, enabling fast data acquisition of fiber-tip reflectivity across multiple sensing probes at a wavelength of 1550 nm. Preliminary testing suggests fast response times of a few seconds for significant drops in reflectivity, facilitating straightforward detection of H2 leaks using thresholding methods. Planned future work includes performance comparison with simpler sensing structures, durability and contaminant testing, and response time optimization.
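
As an illustration of what a thresholding method on a reflectivity trace might look like, here is a minimal sketch. The abstract does not specify the authors' detection rule; the function `detect_leak`, the 20% drop fraction, and the sample values are all assumptions made for this example.

```python
import numpy as np


def detect_leak(reflectivity, baseline, drop_fraction=0.2):
    """Hypothetical rule: return the index of the first sample whose
    reflectivity falls below baseline * (1 - drop_fraction), or None
    if no sample crosses that threshold."""
    r = np.asarray(reflectivity, dtype=float)
    below = r < baseline * (1.0 - drop_fraction)
    return int(np.argmax(below)) if below.any() else None


# Invented normalized reflectivity samples, for illustration only.
trace_with_drop = [1.00, 0.98, 0.70, 0.60]
trace_stable = [1.00, 0.97, 0.95]

first_alarm = detect_leak(trace_with_drop, baseline=1.0)
no_alarm = detect_leak(trace_stable, baseline=1.0)
```

A fixed threshold like this works only when the reflectivity drop is large relative to noise and drift, which is the situation the preliminary results describe.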

2025

Swin Transformer Applied to Breast MRI Super-Resolution in a Cross-Cohort Dataset

Authors
Sousa, P; Sousa, H; Pereira, T; Batista, E; Gouveia, P; Oliveira, HP;

Publication
2025 IEEE 38TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS

Abstract
Advancements in the care of patients with breast cancer have demanded the development of biomechanical breast models for the planning and risk mitigation of such invasive surgical procedures. However, these approaches require large amounts of high-quality magnetic resonance imaging (MRI) training data, which is difficult to acquire and rarely available. Although this can be solved using synthetic data, generating high-resolution images comes at the price of very high computational demands and typically low performance. On the other hand, producing lower-resolution samples yields better results and efficiency but falls short of meeting health professional standards. Therefore, this work aims to validate a joint approach between lower-resolution generative models and the proposed super-resolution architecture, titled Shifted Window Image Restoration (SWinIR), which was used to achieve a 4x increase in image size of breast cancer patient MRI samples. Results are promising and advance the super-resolution state of the art, achieving a maximum peak signal-to-noise ratio of 41.36 and a structural similarity index of 0.962, thus outperforming traditional methods and other machine learning architectures.
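
For readers unfamiliar with the peak signal-to-noise ratio reported above, the standard definition is PSNR = 10 * log10(MAX^2 / MSE), in decibels. The sketch below computes it with NumPy; the function name `psnr`, the `data_range` parameter, and the toy images are illustrative and are not taken from the paper.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range**2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)


# Toy 4x4 images with intensities in [0, 1] (invented values).
reference = np.zeros((4, 4))
estimate = np.full((4, 4), 0.1)  # uniform error of 0.1, so MSE = 0.01

value_db = psnr(reference, estimate, data_range=1.0)
```

Higher values mean the reconstructed image is closer to the reference; the result depends on the assumed intensity range, so values are comparable only when `data_range` is the same.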

2025

Theoretical Model Validation of the Multisensory Role on Subjective Realism, Presence and Involvement in Immersive Virtual Reality

Authors
Gonçalves, G; Peixoto, B; Melo, M; Bessa, M;

Publication
COMPUTER GRAPHICS FORUM

Abstract
With the consistent adoption of immersive virtual reality (iVR) and growing research on the topic, it becomes fundamental to understand how the perception of Realism plays a role in the potential of iVR. This work puts forward a hypothesis-driven theoretical model of how the perception of each multisensory stimulus (Visual, Audio, Haptic and Scent) is related to the perception of Realism of the whole experience (Subjective Realism) and, in turn, how this Subjective Realism is related to Involvement and Presence. The model was validated using a sample of 216 subjects in a multisensory iVR experience. The results indicated a good model fit and provided evidence on how the perception of Realism of Visual, Audio and Scent individually is linked to Subjective Realism. Furthermore, the results provide strong evidence that Subjective Realism is strongly associated with Involvement and Presence. These results put forward a validated questionnaire for the perception of Realism of different aspects of the virtual experience and a robust theoretical model on the interconnections of these constructs. We provide empirical evidence that can be used to optimise iVR systems for Presence, Involvement and Subjective Realism, thereby enhancing the effectiveness of iVR experiences and opening new research avenues.
