Details

  • Name

    Miriam Seoane Santos
  • Role

    Senior Researcher
  • Since

    1 January 2024
Publications

2025

mdatagen: A python library for the artificial generation of missing data

Authors
Mangussi, AD; Santos, MS; Lopes, FL; Pereira, RC; Lorena, AC; Abreu, PH;

Publication
NEUROCOMPUTING

Abstract
Missing data is characterized by the presence of absent values in data (i.e., missing values) and is currently categorized into three different mechanisms: Missing Completely At Random, Missing At Random, and Missing Not At Random. When performing missing data experiments and evaluating techniques to handle absent values, these mechanisms are often artificially generated (a process referred to as data amputation) to assess the robustness and behavior of the methods used. Due to the lack of a standard benchmark for data amputation, related research relies on different implementations of the mechanisms (some of which are not disclosed), preventing the reproducibility of results and leading to unfair or inaccurate comparisons between existing and new methods. Moreover, for users outside the field, experimenting with missing data or simulating the appearance of missing values in real-world domains is unfeasible, impairing stress testing in machine learning systems. This work introduces mdatagen, an open source Python library for the generation of missing data mechanisms across 20 distinct scenarios, following different univariate and multivariate implementations of the established missing mechanisms. The package therefore fosters reproducible results across missing data experiments and enables the simulation of artificial missing data under flexible configurations, making it versatile enough to mimic many real-world applications involving missing data. The source code and detailed documentation for mdatagen are available at https://github.com/ArthurMangussi/pymdatagen.
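To make the three mechanisms concrete, here is a minimal univariate amputation sketch in plain NumPy. It is not the mdatagen API: the function name `ampute` and its parameters are hypothetical, and MAR/MNAR are simplified to a deterministic rule (blank the rows with the largest values of the driving column), whereas real implementations usually draw missingness probabilistically.

```python
import numpy as np


def ampute(X, col, rate, mechanism, driver=None, seed=0):
    """Return a copy of X with NaNs inserted into column `col`.

    Hypothetical helper, not the mdatagen API. `rate` is the fraction of
    rows to blank out; `driver` is the fully observed column used by MAR.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).copy()
    n = X.shape[0]
    n_miss = int(round(rate * n))

    if mechanism == "MCAR":
        # Every row has the same chance of losing its value.
        rows = rng.choice(n, size=n_miss, replace=False)
    elif mechanism == "MAR":
        # Missingness depends on another, observed column: blank the rows
        # with the largest values of `driver`.
        if driver is None or driver == col:
            raise ValueError("MAR needs a different, observed driver column")
        rows = np.argsort(X[:, driver], kind="stable")[n - n_miss:]
    elif mechanism == "MNAR":
        # Missingness depends on the value that goes missing: blank the
        # rows with the largest values of `col` itself.
        rows = np.argsort(X[:, col], kind="stable")[n - n_miss:]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")

    X[rows, col] = np.nan
    return X


X = np.arange(20, dtype=float).reshape(10, 2)
X_mnar = ampute(X, col=0, rate=0.3, mechanism="MNAR")  # rows 7-9 of column 0 become NaN
```

Slicing with `[n - n_miss:]` rather than `[-n_miss:]` keeps a zero missing rate from blanking every row.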

2025

A Label Propagation Approach for Missing Data Imputation

Authors
Lopes, FL; Mangussi, AD; Pereira, RC; Santos, MS; Abreu, PH; Lorena, AC;

Publication
IEEE Access

Abstract
Missing data is a common challenge in real-world datasets and can arise for various reasons. This has led to the classification of missing data mechanisms as missing completely at random, missing at random, or missing not at random. Currently, the literature offers various algorithms for imputing missing data, each with advantages tailored to specific mechanisms and levels of missingness. This paper introduces a novel approach to missing data imputation using the well-established label propagation algorithm, named Label Propagation for Missing Data Imputation (LPMD). The method combines, weighs, and propagates known feature values to impute missing data. Experiments on benchmark datasets highlight its effectiveness across various missing data scenarios, demonstrating more stable results than baseline methods under different missingness mechanisms and levels. The algorithms were evaluated based on processing time, imputation quality (measured by mean absolute error), and impact on classification performance. A variant of the algorithm (LPMD2) generally achieved the fastest processing time compared to five other imputation algorithms from the literature, with speed-ups ranging from 0.7 to 23 times. The results of LPMD were also stable regarding the mean absolute error of the imputed values compared to their original counterparts, for different missing data mechanisms and rates of missing values. In real applications, missingness can behave according to different and unknown mechanisms, so an imputation algorithm that behaves stably across mechanisms is advantageous. The results of machine learning (ML) models trained on the imputed datasets were also comparable to those of the baselines.
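The general idea of propagating known feature values over a similarity graph can be sketched as follows. This is a generic graph-propagation imputer written for illustration, not LPMD itself: the Gaussian kernel, the bandwidth `sigma`, the mean-filled distance computation, and the fixed iteration count are all assumptions, and the function names are hypothetical. The `mae` helper mirrors the evaluation criterion named in the abstract.

```python
import numpy as np


def propagate_impute(X, sigma=1.0, n_iter=50):
    """Hypothetical graph-propagation imputer (illustrative, not LPMD)."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    F = np.where(mask, col_means, X)  # initial guess: column means

    # Gaussian similarity between rows, computed once on the mean-filled data.
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # a row does not vote for itself
    # Row-normalize into a transition matrix; the floor avoids dividing by 0.
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)

    for _ in range(n_iter):
        F = P @ F               # each row takes a weighted average of its neighbours
        F[~mask] = X[~mask]     # clamp the observed values back in place
    return F


def mae(X_true, X_imputed, mask):
    """Mean absolute error over the originally missing entries only."""
    return float(np.abs(X_true[mask] - X_imputed[mask]).mean())
```

Clamping after every step is what makes this label propagation rather than plain smoothing: observed values act as fixed sources, and only the missing entries drift toward the values of similar rows.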

2024

An Interpretable Human-in-the-Loop Process to Improve Medical Image Classification

Authors
Santos, JC; Santos, MS; Abreu, PH;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XXII, PT I, IDA 2024

Abstract
Medical imaging classification improves patient prognoses by providing information on disease assessment, staging, and treatment response. The high demand for medical imaging acquisition requires the development of effective classification methodologies, with deep learning technologies occupying the pole position for this task. However, the major drawback of such techniques lies in their black-box nature, which has delayed their use in real-world scenarios. Interpretability methodologies have emerged as a solution to this problem due to their capacity to translate black-box models into clinically understandable information. The most promising interpretability methodologies are concept-based techniques, which explain the predictions of a deep neural network through user-specified concepts. Concept activation regions and concept activation vectors are concept-based implementations that provide global explanations for the predictions of neural networks. The explanations provided allow the identification of the relationships that the network learned and can be used to identify possible errors during training. In this work, concept activation vectors and concept activation regions are used to identify flaws in neural network training and to show how these weaknesses can be mitigated in a human-in-the-loop process that automatically improves the performance and trustworthiness of the classifier. To reach this goal, three phases were defined: training baseline classifiers, applying concept-based interpretability, and implementing a human-in-the-loop approach to improve classifier performance. Four medical imaging datasets of different modalities are included in this study to demonstrate the generality of the proposed method. The results identified concepts in each dataset that revealed flaws in the classifier training; consequently, the human-in-the-loop approach, validated by a team of two clinicians, achieved a statistically significant improvement.
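A concept activation vector (CAV) is commonly obtained by training a linear classifier to separate a layer's activations for concept examples from activations for random examples; the normalized weight vector is the CAV, and a TCAV-style score counts how often the class gradient points along it. The sketch below assumes activations and gradients have already been extracted from the network as NumPy arrays; `learn_cav` and `tcav_score` are hypothetical names, and the logistic regression is fitted with plain gradient descent rather than any particular library. Concept activation regions replace the linear boundary with a kernel-based one and are not shown.

```python
import numpy as np


def learn_cav(concept_acts, random_acts, lr=0.1, n_steps=500):
    """Fit logistic regression (concept=1, random=0); return the unit CAV."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept)
        err = p - y                             # gradient of the log-loss w.r.t. logits
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w / np.linalg.norm(w)  # normal to the decision boundary


def tcav_score(class_grads, cav):
    """Fraction of inputs whose class-logit gradient has a positive
    directional derivative along the CAV."""
    return float((class_grads @ cav > 0).mean())
```

A score near 1 suggests the class prediction is consistently sensitive to the concept, which is the kind of learned relationship a clinician could then inspect for flaws.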

2024

Enhancing mammography: a comprehensive review of computer methods for improving image quality

Authors
Santos, JC; Santos, MS; Abreu, PH;

Publication
PROGRESS IN BIOMEDICAL ENGINEERING

Abstract
Mammography imaging remains the gold standard for breast cancer detection and diagnosis, but challenges in image quality can lead to misdiagnosis, increased radiation exposure, and higher healthcare costs. This comprehensive review evaluates traditional and machine learning-based techniques for improving mammography image quality, aiming to benefit clinicians and enhance diagnostic accuracy. Our literature search, spanning 2015 to 2024, identified 115 articles focusing on contrast enhancement and noise reduction methods, including histogram equalization, filtering, unsharp masking, fuzzy logic, transform-based techniques, and advanced machine learning approaches. Machine learning, particularly architectures integrating denoising autoencoders with convolutional neural networks, emerged as highly effective in enhancing image quality without compromising detail. The discussion highlights the success of these techniques in improving the visual quality of mammography images. However, challenges such as high noise ratios, inconsistent evaluation metrics, and limited open-source datasets persist. Addressing these issues offers opportunities for future research to further advance mammography image enhancement methodologies.
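Histogram equalization, the first of the traditional contrast-enhancement methods listed, remaps intensities through the image's cumulative histogram so that they spread over the full range. Below is a minimal global version for 8-bit grayscale arrays, written in NumPy as an illustration only; the function name is hypothetical, and clinical pipelines typically favour adaptive variants such as CLAHE, which are not shown.

```python
import numpy as np


def equalize_histogram(img):
    """Global histogram equalization for an 8-bit grayscale image (sketch)."""
    img = np.asarray(img)
    if img.dtype != np.uint8:
        raise ValueError("expects an 8-bit grayscale image")
    hist = np.bincount(img.ravel(), minlength=256)  # pixel count per intensity
    cdf = np.cumsum(hist)
    cdf_min = cdf[np.nonzero(cdf)[0][0]]  # CDF at the darkest occupied level
    n = img.size
    if n == cdf_min:
        return img.copy()  # constant image: nothing to stretch
    # Classic mapping: stretch the CDF linearly onto [0, 255].
    lut = np.round((cdf - cdf_min) / (n - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]


out = equalize_histogram(np.array([[50, 50], [100, 200]], dtype=np.uint8))
# out.tolist() == [[0, 0], [128, 255]]
```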

2024

Reconstruction of Mammography Projections using Image-to-Image Translation Techniques

Authors
Santos, JC; Santos, MS; Abreu, PH;

Publication
32nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2024, Bruges, Belgium, October 9-11, 2024

Abstract