2017
Authors
Santos, MS; Soares, JP; Abreu, PH; Araújo, H; Santos, JAM;
Publication
Artificial Intelligence in Medicine - 16th Conference on Artificial Intelligence in Medicine, AIME 2017, Vienna, Austria, June 21-24, 2017, Proceedings
Abstract
2018
Authors
Soares, JP; Santos, MS; Abreu, PH; Araújo, H; Santos, JAM;
Publication
Advances in Intelligent Data Analysis XVII - 17th International Symposium, IDA 2018, 's-Hertogenbosch, The Netherlands, October 24-26, 2018, Proceedings
Abstract
2018
Authors
Costa, AF; Santos, MS; Soares, JP; Abreu, PH;
Publication
Advances in Intelligent Data Analysis XVII - 17th International Symposium, IDA 2018, 's-Hertogenbosch, The Netherlands, October 24-26, 2018, Proceedings
Abstract
2021
Authors
Salazar, T; Santos, MS; Araújo, H; Abreu, PH;
Publication
IEEE Access
Abstract
2022
Authors
Santos, MS; Abreu, PH; Fernández, A; Luengo, J; Santos, JAM;
Publication
Eng. Appl. Artif. Intell.
Abstract
2025
Authors
Mangussi, AD; Santos, MS; Lopes, FL; Pereira, RC; Lorena, AC; Abreu, PH;
Publication
Neurocomputing
Abstract
Missing data is characterized by the presence of absent values in data (i.e., missing values), and it is currently categorized into three mechanisms: Missing Completely At Random, Missing At Random, and Missing Not At Random. In missing data experiments that evaluate techniques for handling absent values, these mechanisms are often artificially generated (a process referred to as data amputation) to assess the robustness and behavior of the methods under study. Because there is no standard benchmark for data amputation, related research uses different implementations of the mechanisms (some of which are not disclosed), which prevents the reproducibility of results and leads to unfair or inaccurate comparisons between existing and new methods. Moreover, for users outside the field, experimenting with missing data or simulating the appearance of missing values in real-world domains is infeasible, which impairs stress testing of machine learning systems. This work introduces mdatagen, an open-source Python library for generating missing data mechanisms across 20 distinct scenarios, following different univariate and multivariate implementations of the established mechanisms. The package therefore fosters reproducible results across missing data experiments and enables the simulation of artificial missing data under flexible configurations, making it versatile enough to mimic several real-world applications involving missing data. The source code and detailed documentation for mdatagen are available at https://github.com/ArthurMangussi/pymdatagen.
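To make the three mechanisms named in the abstract concrete, the following is a minimal NumPy sketch of one simple way each could be simulated on a numeric array. It does not use mdatagen and does not reproduce its API or any of its 20 scenarios. The function names (`amputate_mcar`, `amputate_mar`, `amputate_mnar`), the parameters (`target`, `driver`, `rate`), and the "remove the lowest values" rule are all assumptions made for illustration only.

```python
import numpy as np


def amputate_mcar(X, rate, rng):
    """MCAR sketch: every cell is removed with the same probability `rate`."""
    X = X.astype(float).copy()
    mask = rng.random(X.shape) < rate
    X[mask] = np.nan
    return X


def amputate_mar(X, target, driver, rate):
    """MAR sketch: missingness in column `target` depends on the observed
    values of another column, `driver` (here: rows with the lowest values)."""
    X = X.astype(float).copy()
    n_missing = int(round(rate * X.shape[0]))
    rows = np.argsort(X[:, driver], kind="stable")[:n_missing]
    X[rows, target] = np.nan
    return X


def amputate_mnar(X, target, rate):
    """MNAR sketch: missingness in column `target` depends on the values of
    `target` itself (here: its lowest values are the ones removed)."""
    X = X.astype(float).copy()
    n_missing = int(round(rate * X.shape[0]))
    rows = np.argsort(X[:, target], kind="stable")[:n_missing]
    X[rows, target] = np.nan
    return X


# Illustrative usage on synthetic data (100 rows, 3 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

X_mcar = amputate_mcar(X, rate=0.2, rng=rng)
X_mar = amputate_mar(X, target=0, driver=1, rate=0.2)
X_mnar = amputate_mnar(X, target=0, rate=0.2)
```

In this sketch the MAR and MNAR variants are deterministic and univariate (one amputated column); multivariate variants, as mentioned in the abstract, would apply a comparable rule to several columns at once.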