Publications

Publications by Miriam Seoane Santos

2018

Analysing the Footprint of Classifiers in Overlapped and Imbalanced Contexts

Authors
Mercier, M; Santos, MS; Abreu, PH; Soares, C; Soares, JP; Santos, J;

Publication
Advances in Intelligent Data Analysis XVII - 17th International Symposium, IDA 2018, 's-Hertogenbosch, The Netherlands, October 24-26, 2018, Proceedings

Abstract
It is recognised that the imbalanced data problem is aggravated by other difficulty factors, such as class overlap. Over the years, several research works have focused on this problem, although with two major shortcomings: the limited range of test domains and the lack of a formulation of the overlap degree, which makes results hard to generalise. This work studies the performance degradation of classifiers with distinct learning biases in overlapped and imbalanced contexts, focusing on the characteristics of the test domains (shape, dimensionality and imbalance ratio) and on the extent to which our proposed overlapping measure (degOver) is aligned with the performance results observed. Our results show that MLP and CART classifiers are the most robust to high levels of class overlap, even for complex domains, and that KNN and linear SVM are the most aligned with degOver. Furthermore, we found that the dimensionality of data also plays an important role in explaining performance results. © Springer Nature Switzerland AG 2018.

2019

MNAR Imputation with Distributed Healthcare Data

Authors
Pereira, RC; Santos, MS; Rodrigues, PP; Abreu, PH;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PT II

Abstract
Missing data is a problem found in real-world datasets that has a considerable impact on the learning process of classifiers. Although extensive work has been done in this field, the MNAR mechanism remains a challenge for existing imputation methods, mainly because it is not related to any observed information. Focusing on healthcare contexts, MNAR is present in multiple scenarios, such as clinical trials where participants may quit the study for reasons related to the outcome being measured. This work proposes an approach that uses different sources of information from the same healthcare context to improve the imputation quality and classification performance for datasets with missing data under MNAR. The experiment was performed with several databases from the medical context, and the results show that the use of multiple sources of data has a positive impact on the imputation error and classification performance. © 2019, Springer Nature Switzerland AG.

2020

Reviewing Autoencoders for Missing Data Imputation: Technical Trends, Applications and Outcomes

Authors
Pereira, RC; Santos, MS; Rodrigues, PP; Abreu, PH;

Publication
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH

Abstract
Missing data is a problem often found in real-world datasets and it can degrade the performance of most machine learning models. Several deep learning techniques have been used to address this issue, and one of them is the Autoencoder and its Denoising and Variational variants. These models are able to learn a representation of the data with missing values and generate plausible new ones to replace them. This study surveys the use of Autoencoders for the imputation of tabular data and considers 26 works published between 2014 and 2020. The analysis is mainly focused on discussing patterns and recommendations for the architecture, hyperparameters and training settings of the network, while providing a detailed discussion of the results obtained by Autoencoders when compared to other state-of-the-art methods, and of the data contexts where they have been applied. The conclusions include a set of recommendations for the technical settings of the network, and show that Denoising Autoencoders outperform their competitors, particularly the often used statistical methods.

2022

On the joint-effect of class imbalance and overlap: a critical review

Authors
Santos, MS; Abreu, PH; Japkowicz, N; Fernandez, A; Soares, C; Wilk, S; Santos, J;

Publication
ARTIFICIAL INTELLIGENCE REVIEW

Abstract
Current research on imbalanced data recognises that class imbalance is aggravated by other data intrinsic characteristics, among which class overlap stands out as one of the most harmful. The combination of these two problems creates a new and difficult scenario for classification tasks and has been discussed in several research works over the past two decades. In this paper, we argue that although some insightful information can be derived from related research, the joint-effect of class overlap and imbalance is still not fully understood, and advocate for the need to move towards a unified view of the class overlap problem in imbalanced domains. To that end, we start by performing a thorough analysis of existing literature on the joint-effect of class imbalance and overlap, elaborating on important details left undiscussed in the original papers, namely the impact of data domains with different characteristics and the behaviour of classifiers with distinct learning biases. This leads to the hypothesis that class overlap comprises multiple representations, which are important to accurately measure and analyse in order to provide a full characterisation of the problem. Accordingly, we devise two novel taxonomies, one for class overlap measures and the other for class overlap-based approaches, both resonating with the distinct representations of class overlap identified. This paper therefore presents a global and unique view on the joint-effect of class imbalance and overlap, from precursor work to recent developments in the field. It meticulously discusses some concepts taken as implicit in previous research, explores new perspectives in light of the limitations found, and presents new ideas that will hopefully inspire researchers to move towards a unified view on the problem and the development of suitable strategies for imbalanced and overlapped domains.

2023

Bone Metastases Detection in Patients with Breast Cancer: Does Bone Scintigraphy Add Information to PET/CT?

Authors
Santos, JC; Abreu, MH; Santos, MS; Duarte, H; Alpoim, T; Próspero, I; Sousa, S; Abreu, PH;

Publication
ONCOLOGIST

Abstract
This article compares the effectiveness of the PET/CT scan and bone scintigraphy for the detection of bone metastases in patients with breast cancer.
Background: Positron emission tomography/computed tomography (PET/CT) has become in recent years a tool for breast cancer (BC) staging. However, its accuracy in detecting bone metastases is classically considered inferior to that of bone scintigraphy (BS). The purpose of this work is to compare the effectiveness of bone metastases detection between PET/CT and BS.
Materials and Methods: Prospective study of 410 female patients treated in a Comprehensive Cancer Center between 2014 and 2020 who underwent PET/CT and BS for staging purposes. The image analysis was performed by 2 senior nuclear medicine physicians. The comparison was based on accuracy, sensitivity, and specificity at the patient and anatomical region level and was assessed using McNemar's test. An average ROC was calculated for the anatomical region analysis.
Results: PET/CT presented higher values of accuracy and sensitivity (98.0% and 93.83%), surpassing BS (95.61% and 81.48%) in detecting bone disease. There was a significant difference in favor of PET/CT in sensitivity (93.83% vs. 81.48%); however, there was no significant difference in eliminating false positives (specificity 99.09% vs. 99.09%). PET/CT presented the highest accuracy and sensitivity values for most of the bone segments, only surpassed by BS for the cranium. There was a significant difference in favor of PET/CT in the upper limb, spine, thorax (sternum) and lower limb (pelvis and sacrum), and in favor of BS in the cranium. The ROC showed that PET/CT has a higher sensitivity and consistency across the bone segments.
Conclusion: With the correct imaging protocol, PET/CT does not require BS for patients with BC staging.
