Publications

Publications by Miriam Seoane Santos

2025

A Label Propagation Approach for Missing Data Imputation

Authors
Lopes, FL; Mangussi, AD; Pereira, RC; Santos, MS; Abreu, PH; Lorena, AC;

Publication
IEEE ACCESS

Abstract
Missing data is a common challenge in real-world datasets and can arise for various reasons. This has led to the classification of missing data mechanisms as missing completely at random, missing at random, or missing not at random. The literature currently offers various algorithms for imputing missing data, each with advantages tailored to specific mechanisms and levels of missingness. This paper introduces Label Propagation for Missing Data Imputation (LPMD), a novel imputation approach based on the well-established label propagation algorithm. The method combines, weighs, and propagates known feature values to impute missing data. Experiments on benchmark datasets highlight its effectiveness across various missing data scenarios, showing more stable results than baseline methods under different missingness mechanisms and levels. The algorithms were evaluated based on processing time, imputation quality (measured by mean absolute error), and impact on classification performance. A variant of the algorithm (LPMD2) generally achieved the fastest processing time compared with five other imputation algorithms from the literature, with speed-ups ranging from 0.7 to 23 times. The results of LPMD were also stable in terms of the mean absolute error between the imputed values and their original counterparts, across different missing data mechanisms and rates of missing values. In real applications, missingness can follow different and unknown mechanisms, so an imputation algorithm that behaves stably across mechanisms is advantageous. The classification performance of ML models trained on the imputed datasets was also comparable to that of the baselines.
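The abstract only summarizes how LPMD "combines, weighs, and propagates known feature values". As a rough illustration of the general idea, the following is a minimal NumPy sketch of label-propagation-style imputation; it is not the published LPMD or LPMD2 algorithm. The function name `propagation_impute`, the Gaussian kernel with bandwidth `sigma`, the distance computed only over features observed in both rows, and the column-mean initialization are all assumptions made for this example.

```python
import numpy as np


def propagation_impute(X, sigma=1.0, n_iter=100, tol=1e-8):
    """Impute NaNs by propagating observed values over a row-similarity graph.

    Hypothetical sketch of label-propagation-style imputation; NOT the
    published LPMD/LPMD2 algorithm. Assumes a Gaussian kernel with bandwidth
    `sigma` and that every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)                      # True where a value is missing
    obs = ~mask
    X0 = np.where(mask, 0.0, X)             # NaN-free copy for arithmetic

    # Mean squared difference over features observed in BOTH rows;
    # rows sharing no observed feature get an infinite distance.
    both = obs[:, None, :] & obs[None, :, :]
    diff = np.where(both, X0[:, None, :] - X0[None, :, :], 0.0)
    count = both.sum(axis=2)
    d2 = np.where(count > 0, (diff ** 2).sum(axis=2) / np.maximum(count, 1), np.inf)

    # Gaussian-kernel edge weights (assumed kernel), no self-loops,
    # then row-normalize into a transition matrix.
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    row_sums = W.sum(axis=1, keepdims=True)
    P = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

    # Start missing entries at column means; rows with no neighbours keep them.
    filled = np.where(mask, np.nanmean(X, axis=0), X)
    can_update = mask & (row_sums > 0)

    for _ in range(n_iter):
        # Propagate neighbour values, then clamp: observed entries never change.
        updated = np.where(can_update, P @ filled, filled)
        converged = np.max(np.abs(updated - filled)) < tol
        filled = updated
        if converged:
            break
    return filled
```

For example, in `[[1.0, 2.0], [1.1, nan], [5.0, 10.0], [5.1, 10.2]]` the missing value is pulled close to 2.0, because the second row is far more similar to the first row than to the other two. Clamping the observed entries at every iteration is what makes this a propagation scheme rather than plain kernel smoothing: known values act as fixed sources, and only missing cells are updated.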

2024

Reconstruction of Mammography Projections using Image-to-Image Translation Techniques

Authors
Santos, JC; Santos, MS; Abreu, PH;

Publication
ESANN

Abstract
Mammography imaging is the gold standard for breast cancer detection and involves capturing two projections: the mediolateral oblique and craniocaudal projections. An approach that acquires only one projection and reconstructs the other could reduce patient burden, radiation exposure, and costs. Image-to-image translation has shown the ability to generate realistic synthetic images in different medical imaging modalities, which makes these techniques strong candidates for this novel application in mammography. This study compares five image-to-image translation approaches to assess the feasibility of reconstructing a mammography projection from its counterpart. The results indicate that ResViT shows the best overall performance in translating between both projections.

2023

ydata-profiling: Accelerating data-centric AI with high-quality data

Authors
Clemente, F; Ribeiro, GM; Quemy, A; Santos, MS; Pereira, RC; Barros, A;

Publication
NEUROCOMPUTING

Abstract
ydata-profiling is an open-source Python package for advanced exploratory data analysis that enables users to generate data profiling reports in a simple, fast, and efficient manner, fostering a standardized and visual understanding of the data. Beyond traditional descriptive properties and statistics, ydata-profiling follows a Data-Centric AI approach to exploratory analysis, as it focuses on the automatic detection and highlighting of complex data characteristics often associated with potential data quality issues, such as high ratios of missing or imbalanced data, infinite, unique, or constant values, skewness, high correlation, high cardinality, non-stationarity, seasonality, duplicate records, and other inconsistencies. The source code, documentation, and examples are available in the GitHub repository: https://github.com/ydataai/ydata-profiling.

2015

Simulation of Cellular Changes on Optical Coherence Tomography of Human Retina

Authors
Santos, M; Araujo, A; Barbeiro, S; Caramelo, F; Correia, A; Marques, MI; Pinto, L; Serranho, P; Bernardes, R; Morgado, M;

Publication
2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC)

Abstract
We present a methodology to assess cell-level alterations in the human retina that are responsible for functional changes observable in Optical Coherence Tomography (OCT) data, in healthy ageing and in disease conditions, in the absence of structural alterations. The methodology is based on a 3D multilayer Monte Carlo computational model of the human retina. The optical properties of each layer are obtained by solving Maxwell's equations for 3D domains representative of small regions of those layers, using a Discontinuous Galerkin Finite Element Method (DG-FEM). Here we present the DG-FEM Maxwell 3D model and its validation against Mie's theory for spherical scatterers. We also present an application of our methodology to the assessment of cell-level alterations responsible for the OCT data in Diabetic Macular Edema. It was possible to identify which alterations are responsible for the changes observed in the OCT scans of the diseased groups.

2015

Maxwell's Equations based 3D model of Light Scattering in the Retina

Authors
Santos, M; Araujo, A; Barbeiro, S; Caramelo, F; Correia, A; Marques, MI; Morgado, M; Pinto, L; Serranho, P; Bernardes, R;

Publication
2015 IEEE 4TH PORTUGUESE MEETING ON BIOENGINEERING (ENBENG)

Abstract
The goal of this work is to develop a computational model of the human retina and simulate light scattering through its structure, aiming to shed light on data obtained by optical coherence tomography in human retinas. Currently, light propagation in scattering media is often described by Mie's solution to Maxwell's equations, which only describes the scattering patterns of homogeneous spheres, thus limiting its application to scatterers of more complex shapes. In this work, we propose a discontinuous Galerkin method combined with a low-storage Runge-Kutta method as an accurate and efficient way to numerically solve the time-dependent Maxwell's equations. We report on the validation of the proposed methodology by comparison with Mie's solution, a mandatory step before further elaborating the numerical scheme towards the propagation of electromagnetic waves through the human retina.

2025

mdatagen: A python library for the artificial generation of missing data

Authors
Mangussi, AD; Santos, MS; Lopes, FL; Pereira, RC; Lorena, AC; Abreu, PH;

Publication
NEUROCOMPUTING

Abstract
Missing data is characterized by the presence of absent values in data (i.e., missing values) and is currently categorized into three mechanisms: Missing Completely At Random, Missing At Random, and Missing Not At Random. When performing missing data experiments and evaluating techniques to handle absent values, these mechanisms are often artificially generated (a process referred to as data amputation) to assess the robustness and behavior of the methods used. Due to the lack of a standard benchmark for data amputation, related research uses different implementations of the mechanisms (often without disclosing them), which prevents the reproducibility of results and leads to unfair or inaccurate comparisons between existing and new methods. Moreover, for users outside the field, experimenting with missing data or simulating the appearance of missing values in real-world domains is infeasible, which impairs stress testing of machine learning systems. This work introduces mdatagen, an open-source Python library for generating missing data mechanisms across 20 distinct scenarios, following different univariate and multivariate implementations of the established missing data mechanisms. The package therefore fosters reproducible results across missing data experiments and enables the simulation of artificial missing data under flexible configurations, making it versatile enough to mimic several real-world applications involving missing data. The source code and detailed documentation for mdatagen are available at https://github.com/ArthurMangussi/pymdatagen.
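To make the three mechanisms concrete, here is a minimal NumPy sketch of data amputation. It does not use mdatagen's API and does not reproduce its 20 scenarios; the functions `ampute_mcar`, `ampute_mar`, and `ampute_mnar`, and the simple deterministic rules for MAR and MNAR (lowest values of a driver column, largest values of the column itself), are hypothetical choices made only for illustration.

```python
import numpy as np


def ampute_mcar(X, rate, rng):
    """MCAR: cells are removed uniformly at random, independent of any values."""
    X = np.array(X, dtype=float)            # copy so the caller's array is untouched
    n_missing = int(round(rate * X.size))
    idx = rng.choice(X.size, size=n_missing, replace=False)
    X.flat[idx] = np.nan
    return X


def ampute_mar(X, rate, target, driver):
    """MAR (hypothetical rule): remove `target` in the rows where the
    fully observed `driver` column has its lowest values."""
    X = np.array(X, dtype=float)
    n_missing = int(round(rate * X.shape[0]))
    rows = np.argsort(X[:, driver])[:n_missing]
    X[rows, target] = np.nan
    return X


def ampute_mnar(X, rate, target):
    """MNAR (hypothetical self-masking rule): remove `target` where its own
    value is largest, so missingness depends on the unobserved value."""
    X = np.array(X, dtype=float)
    n_missing = int(round(rate * X.shape[0]))
    rows = np.argsort(X[:, target])[::-1][:n_missing]
    X[rows, target] = np.nan
    return X
```

The key difference lies in what the missingness depends on: nothing (MCAR), another observed column (MAR), or the missing value itself (MNAR). Library implementations typically use probabilistic rather than strict top-k selection rules; the deterministic choice here only keeps the sketch short and easy to check.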