Publications by Miriam Seoane Santos

2017

Influence of Data Distribution in Missing Data Imputation

Authors
Santos, MS; Soares, JP; Abreu, PH; Araújo, H; Santos, JAM;

Publication
Artificial Intelligence in Medicine - 16th Conference on Artificial Intelligence in Medicine, AIME 2017, Vienna, Austria, June 21-24, 2017, Proceedings

2018

Exploring the Effects of Data Distribution in Missing Data Imputation

Authors
Soares, JP; Santos, MS; Abreu, PH; Araújo, H; Santos, JAM;

Publication
Advances in Intelligent Data Analysis XVII - 17th International Symposium, IDA 2018, 's-Hertogenbosch, The Netherlands, October 24-26, 2018, Proceedings

2018

Missing Data Imputation via Denoising Autoencoders: The Untold Story

Authors
Costa, AF; Santos, MS; Soares, JP; Abreu, PH;

Publication
Advances in Intelligent Data Analysis XVII - 17th International Symposium, IDA 2018, 's-Hertogenbosch, The Netherlands, October 24-26, 2018, Proceedings

2021

FAWOS: Fairness-Aware Oversampling Algorithm Based on Distributions of Sensitive Attributes

Authors
Salazar, T; Santos, MS; Araújo, H; Abreu, PH;

Publication
IEEE Access

2022

The impact of heterogeneous distance functions on missing data imputation and classification performance

Authors
Santos, MS; Abreu, PH; Fernández, A; Luengo, J; Santos, JAM;

Publication
Engineering Applications of Artificial Intelligence

2025

mdatagen: A python library for the artificial generation of missing data

Authors
Mangussi, AD; Santos, MS; Lopes, FL; Pereira, RC; Lorena, AC; Abreu, PH;

Publication
Neurocomputing

Abstract
Missing data is characterized by the presence of absent values in data (i.e., missing values) and is commonly categorized into three mechanisms: Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR). When performing missing data experiments and evaluating techniques to handle absent values, these mechanisms are often generated artificially (a process referred to as data amputation) to assess the robustness and behavior of the methods used. Due to the lack of a standard benchmark for data amputation, related research uses different implementations of the mechanisms (some of which are often not disclosed), preventing the reproducibility of results and leading to unfair or inaccurate comparisons between existing and new methods. Moreover, for users outside the field, experimenting with missing data or simulating the appearance of missing values in real-world domains is unfeasible, which impairs stress testing of machine learning systems. This work introduces mdatagen, an open-source Python library for generating missing data mechanisms across 20 distinct scenarios, following different univariate and multivariate implementations of the established missing mechanisms. The package therefore fosters reproducible results across missing data experiments and enables the simulation of artificial missing data under flexible configurations, making it versatile enough to mimic several real-world applications involving missing data. The source code and detailed documentation for mdatagen are available at https://github.com/ArthurMangussi/pymdatagen.
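To make the three mechanisms concrete, here is a minimal sketch of univariate data amputation written with plain NumPy. It is not mdatagen's API: the function names `ampute_mcar`, `ampute_mar` and `ampute_mnar` are hypothetical, and the MAR/MNAR variants use one simple deterministic rule (remove the target value in the rows with the largest driver value), which is only one of many possible implementations. The point is what each mechanism conditions on: nothing (MCAR), another observed feature (MAR), or the missing value itself (MNAR).

```python
import numpy as np


def ampute_mcar(X, rate, seed=None):
    """MCAR: every cell is removed with the same probability, independent of any value."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).copy()  # work on a copy; the input stays intact
    mask = rng.random(X.shape) < rate
    X[mask] = np.nan
    return X


def _drop_top_rows(X, target_col, driver_col, rate):
    """Set target_col to NaN in the rows where driver_col is largest (hypothetical rule)."""
    X = np.asarray(X, dtype=float).copy()
    n_rows = X.shape[0]
    n_missing = int(round(rate * n_rows))
    order = np.argsort(X[:, driver_col])  # ascending order of the driver feature
    rows = order[n_rows - n_missing:]     # indices of the n_missing largest values
    X[rows, target_col] = np.nan
    return X


def ampute_mar(X, target_col, driver_col, rate):
    """MAR: missingness in target_col depends on a different, fully observed column."""
    return _drop_top_rows(X, target_col, driver_col, rate)


def ampute_mnar(X, target_col, rate):
    """MNAR: missingness in target_col depends on the values of target_col itself."""
    return _drop_top_rows(X, target_col, target_col, rate)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X_mcar = ampute_mcar(X, rate=0.1, seed=1)
    X_mar = ampute_mar(X, target_col=0, driver_col=1, rate=0.2)
    X_mnar = ampute_mnar(X, target_col=0, rate=0.2)
    print(int(np.isnan(X_mar).sum()))  # 20: one fifth of the 100 rows in column 0
```

A library such as mdatagen exists precisely because choices like the top-k rule above (versus, say, probabilistic selection or multivariate patterns) change the resulting missingness, so a fixed, documented implementation is what makes experiments comparable.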
