Publications

Publications by Miriam Seoane Santos

2018

Exploring the Effects of Data Distribution in Missing Data Imputation

Authors
Soares, JP; Santos, MS; Abreu, PH; Araújo, H; Santos, JAM;

Publication
Advances in Intelligent Data Analysis XVII - 17th International Symposium, IDA 2018, 's-Hertogenbosch, The Netherlands, October 24-26, 2018, Proceedings

Abstract

2021

FAWOS: Fairness-Aware Oversampling Algorithm Based on Distributions of Sensitive Attributes

Authors
Salazar, T; Santos, MS; Araújo, H; Abreu, PH;

Publication
IEEE Access

Abstract

2024

An Interpretable Human-in-the-Loop Process to Improve Medical Image Classification

Authors
Santos, JC; Santos, MS; Abreu, PH;

Publication
Advances in Intelligent Data Analysis XXII, Pt I, IDA 2024

Abstract
Medical imaging classification improves patient prognoses by providing information on disease assessment, staging, and treatment response. The high demand for medical imaging acquisition requires the development of effective classification methodologies, with deep learning technologies occupying the pole position for this task. However, the major drawback of such techniques lies in their black-box nature, which has delayed their use in real-world scenarios. Interpretability methodologies have emerged as a solution to this problem due to their capacity to translate black-box models into clinically understandable information. The most promising interpretability methodologies are concept-based techniques, which explain the predictions of a deep neural network through user-specified concepts. Concept activation regions and concept activation vectors are concept-based implementations that provide global explanations for the predictions of neural networks. These explanations allow the identification of the relationships that the network has learned and can be used to identify possible errors during training. In this work, concept activation vectors and concept activation regions are used to identify flaws in neural network training and to show how these weaknesses can be mitigated in a human-in-the-loop process, automatically improving the performance and trustworthiness of the classifier. To reach this goal, three phases have been defined: training baseline classifiers, applying concept-based interpretability, and implementing a human-in-the-loop approach to improve classifier performance. Four medical imaging datasets of different modalities are included in this study to demonstrate the generality of the proposed method. The results identified concepts in each dataset that exposed flaws in the classifier training and, consequently, the human-in-the-loop approach, validated by a team of 2 clinicians, achieved a statistically significant improvement.
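As a rough illustration of the concept activation vector idea mentioned in this abstract, the sketch below derives a concept direction from layer activations and computes a TCAV-style score. It is a simplification under stated assumptions: the direction is taken as the normalised difference between the mean activations of concept and random examples (a stand-in for the linear classifier usually trained for this purpose), and all activations and gradients are invented toy values, not data from the paper. The function names are hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def concept_direction(concept_acts, random_acts):
    """Unit vector pointing from random activations towards concept activations.

    Assumption: a mean-difference direction replaces the trained linear
    classifier whose normal vector is normally used as the CAV.
    """
    mc = mean_vector(concept_acts)
    mr = mean_vector(random_acts)
    diff = [c - r for c, r in zip(mc, mr)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]


def tcav_score(gradients, direction):
    """Fraction of inputs whose class-logit gradient has a positive
    directional derivative along the concept direction."""
    positives = sum(
        1 for g in gradients
        if sum(gi * di for gi, di in zip(g, direction)) > 0
    )
    return positives / len(gradients)


# Toy 2-D layer activations (hypothetical values).
concept_acts = [[2.0, 0.0], [4.0, 0.0]]
random_acts = [[0.0, 0.0], [0.0, 0.0]]
direction = concept_direction(concept_acts, random_acts)

# Toy gradients of a class logit w.r.t. the same layer (hypothetical values).
gradients = [[0.5, 1.0], [0.2, -1.0], [-0.1, 0.3], [0.4, 0.0]]
score = tcav_score(gradients, direction)
print(direction, score)
```

A score near 1 would suggest the class prediction is consistently sensitive to the concept, which is the kind of global signal a human reviewer could inspect for spurious relationships.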

2017

Influence of data distribution in missing data imputation

Authors
Santos, MS; Soares, JP; Abreu, PH; Araújo, H; Santos, J;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Dealing with missing data is a crucial step in the preprocessing stage of most data mining projects. Especially in healthcare contexts, addressing this issue is fundamental, since it may result in keeping or losing critical patient information that can help physicians in their daily clinical practice. Over the years, many researchers have addressed this problem, basing their approach on the implementation of a set of imputation techniques and evaluating their performance in classification tasks. These classic approaches, however, do not consider some intrinsic data information that could be related to the performance of those algorithms, such as features’ distribution. Establishing a correspondence between data distribution and the most suitable imputation method avoids the need to repeatedly test a large set of methods, since it provides a heuristic on the best choice for each feature in the study. The goal of this work is to understand the relationship between data distribution and the performance of well-known imputation techniques, such as Mean, Decision Trees, k-Nearest Neighbours, Self-Organizing Maps and Support Vector Machines imputation. Several publicly available datasets, all complete, were selected according to several characteristics such as number of distributions, features and instances. Missing values were artificially generated at different percentages and the imputation methods were evaluated in terms of Predictive and Distributional Accuracy. Our findings show that there is a relationship between features’ distribution and algorithms’ performance, although some factors must be taken into account, such as the number of features per distribution and the missing rate at stake.
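The experimental loop described in this abstract (start from complete data, remove values artificially, impute, compare against the truth) can be sketched as follows. This is a minimal illustration, not the authors' protocol: it assumes values go missing completely at random, uses only mean imputation, and uses RMSE on the removed entries as a stand-in error measure, since the abstract does not spell out how Predictive and Distributional Accuracy are computed. The function names and the synthetic Gaussian feature are hypothetical.

```python
import math
import random


def inject_mcar(column, rate, seed=0):
    """Copy `column`, setting a `rate` fraction of entries to None
    (missing completely at random; an assumption of this sketch)."""
    rng = random.Random(seed)
    n_missing = round(rate * len(column))
    missing_idx = set(rng.sample(range(len(column)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(column)]


def mean_impute(column):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def rmse_on_missing(original, masked, imputed):
    """Root mean squared error, measured only where values were removed."""
    errors = [(o - i) ** 2
              for o, m, i in zip(original, masked, imputed) if m is None]
    return math.sqrt(sum(errors) / len(errors))


# Synthetic, complete feature drawn from a normal distribution (toy data).
rng = random.Random(42)
feature = [rng.gauss(50.0, 10.0) for _ in range(200)]

masked = inject_mcar(feature, rate=0.2, seed=1)
imputed = mean_impute(masked)
error = rmse_on_missing(feature, masked, imputed)
print(round(error, 3))
```

Repeating the loop over several missing rates and over features drawn from different distributions would give the kind of comparison the abstract describes.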

2023

Bone Metastases Detection in Patients with Breast Cancer: Does Bone Scintigraphy Add Information to PET/CT?

Authors
Santos, JC; Abreu, MH; Santos, MS; Duarte, H; Alpoim, T; Próspero, I; Sousa, S; Abreu, PH;

Publication
Oncologist

Abstract
This article compares the effectiveness of the PET/CT scan and bone scintigraphy for the detection of bone metastases in patients with breast cancer.
Background: Positron emission tomography/computed tomography (PET/CT) has become in recent years a tool for breast cancer (BC) staging. However, its accuracy in detecting bone metastases is classically considered inferior to that of bone scintigraphy (BS). The purpose of this work is to compare the effectiveness of bone metastases detection between PET/CT and BS.
Materials and Methods: Prospective study of 410 female patients treated in a Comprehensive Cancer Center between 2014 and 2020 who underwent PET/CT and BS for staging purposes. The image analysis was performed by 2 senior nuclear medicine physicians. The comparison was based on accuracy, sensitivity, and specificity at the patient and anatomical region level and was assessed using McNemar's Test. An average ROC was calculated for the anatomical region analysis.
Results: PET/CT presented higher values of accuracy and sensitivity (98.0% and 93.83%), surpassing BS (95.61% and 81.48%) in detecting bone disease. There was a significant difference in favor of PET/CT (sensitivity 93.83% vs. 81.48%); however, there was no significant difference in eliminating false positives (specificity 99.09% vs. 99.09%). PET/CT presented the highest accuracy and sensitivity values for most of the bone segments, only surpassed by BS for the cranium. There was a significant difference in favor of PET/CT in the upper limb, spine, thorax (sternum) and lower limb (pelvis and sacrum), and in favor of BS in the cranium. The ROC showed that PET/CT has a higher sensitivity and consistency across the bone segments.
Conclusion: With the correct imaging protocol, PET/CT does not require BS for staging patients with BC.
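The comparison described in the Materials and Methods (accuracy, sensitivity, and specificity per method, with McNemar's test on the paired results) can be sketched as below. This is only an illustration under assumptions: the labels are invented toy data rather than study data, the exact binomial form of McNemar's test is used because the abstract does not say which variant was applied, and the function and variable names are hypothetical.

```python
from math import comb


def confusion_metrics(truth, pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = disease)."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(truth),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def mcnemar_exact(truth, pred_a, pred_b):
    """Exact two-sided McNemar test on paired correctness.

    b = cases where only method A is correct, c = cases where only B is.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    b = sum(1 for t, a, o in zip(truth, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(truth, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        return b, c, 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Invented toy data: 10 patients with disease, 10 without.
truth = [1] * 10 + [0] * 10
method_a = [1] * 9 + [0] * 1 + [0] * 10   # detects 9 of 10 positives
method_b = [1] * 6 + [0] * 4 + [0] * 10   # detects 6 of 10 positives

metrics_a = confusion_metrics(truth, method_a)
metrics_b = confusion_metrics(truth, method_b)
b, c, p_value = mcnemar_exact(truth, method_a, method_b)
print(metrics_a, metrics_b, (b, c, p_value))
```

McNemar's test only looks at the discordant pairs, which is why two methods with identical specificity, as reported above, show no significant difference on that measure.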

2023

A unifying view of class overlap and imbalance: Key concepts, multi-view panorama, and open avenues for research

Authors
Santos, MS; Abreu, PH; Japkowicz, N; Fernández, A; Santos, J;

Publication
Information Fusion

Abstract
The combination of class imbalance and overlap is currently one of the most challenging issues in machine learning. While seminal work focused on establishing class overlap as a complicating factor for classification tasks in imbalanced domains, ongoing research mostly concerns the study of their synergy over real-world applications. However, given the lack of a well-formulated definition and measurement of class overlap in real-world domains, especially in the presence of class imbalance, the research community has not yet reached a consensus on the characterisation of both problems. This naturally complicates the evaluation of existing approaches to address these issues simultaneously and prevents future research from moving towards devising specialised solutions. In this work, we advocate for a unified view of the problem of class overlap in imbalanced domains. Acknowledging class overlap as the overarching problem - since it has proven to be more harmful for classification tasks than class imbalance - we start by discussing the key concepts associated with its definition, identification, and measurement in real-world domains, while advocating for a characterisation of the problem that attends to multiple sources of complexity. We then provide an overview of existing data complexity measures and establish the link to the specific types of class overlap problems these measures cover, proposing a novel taxonomy of class overlap complexity measures. Additionally, we characterise the relationship between measures and the insights they provide, and discuss to what extent they account for class imbalance. Finally, we systematise the current body of knowledge on the topic across several branches of Machine Learning (Data Analysis, Data Preprocessing, Algorithm Design, and Meta-learning), identifying existing limitations and discussing possible lines for future research.
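To make the two notions in this abstract concrete, the sketch below computes an imbalance ratio and one classic feature-level overlap indicator, Fisher's discriminant ratio, on toy data. The abstract does not name specific measures, so the choice of Fisher's ratio is an assumption made here for illustration, as are the function names and the invented values. The example assumes binary labels 0 and 1.

```python
from statistics import mean, pvariance


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return max(counts.values()) / min(counts.values())


def fisher_ratio(feature, labels):
    """Fisher's discriminant ratio for one feature and two classes (0 and 1).

    Higher values mean the class means are far apart relative to the
    within-class spread, i.e. less overlap along this feature.
    """
    a = [x for x, y in zip(feature, labels) if y == 0]
    b = [x for x, y in zip(feature, labels) if y == 1]
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b))


# Toy data: 8 majority-class and 2 minority-class examples.
labels = [0] * 8 + [1] * 2
separated = [0.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 2.0, 10.0, 12.0]
overlapping = [0.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 2.0, 1.0, 3.0]

ir = imbalance_ratio(labels)
f_separated = fisher_ratio(separated, labels)
f_overlapping = fisher_ratio(overlapping, labels)
print(ir, f_separated, f_overlapping)
```

Both features share the same imbalance ratio, yet only the second one overlaps heavily, which illustrates why the two problems need to be characterised separately.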
