Publications

Publications by Pedro Henriques Abreu

2019

Generating Synthetic Missing Data: A Review by Missing Mechanism

Authors
Santos, MS; Pereira, RC; Costa, AF; Soares, JP; Santos, J; Abreu, PH;

Publication
IEEE ACCESS

Abstract
The performance evaluation of imputation algorithms often involves the generation of missing values. Missing values can be inserted in only one feature (univariate configuration) or in several features (multivariate configuration) at different percentages (missing rates) and according to distinct missing mechanisms, namely, missing completely at random, missing at random, and missing not at random. Since the missing data generation process defines the basis for the imputation experiments (configuration, missing rate, and missing mechanism), it is essential that it is appropriately applied; otherwise, conclusions derived from ill-defined setups may be invalid. The goal of this paper is to review the different approaches to synthetic missing data generation found in the literature and discuss their practical details, elaborating on their strengths and weaknesses. Our analysis revealed that creating missing at random and missing not at random scenarios in datasets comprising qualitative features is the most challenging issue in the related work and, therefore, should be the focus of future work in the field.
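
The three mechanisms named above can be illustrated with a minimal univariate sketch in Python. It is an illustration only, not a procedure taken from the review. The function name `ampute_univariate`, the list-of-dicts data layout, and the rank-based rule used for MAR and MNAR (removing the values in the rows with the largest driver or target values) are assumptions chosen for simplicity. The literature describes other ways of choosing which values go missing.

```python
import random


def ampute_univariate(rows, target, driver, rate, mechanism, seed=0):
    """Hypothetical helper: return a copy of `rows` (a list of dicts) in which
    the `target` feature is set to None in round(rate * len(rows)) rows.

    MCAR: rows are chosen uniformly at random, independent of any value.
    MAR:  rows with the largest values of the observed `driver` feature are
          chosen (rank-based rule, an assumption for illustration).
    MNAR: rows with the largest values of `target` itself are chosen, so
          missingness depends on the values that are removed.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    n_missing = round(rate * len(rows))

    if mechanism == "MCAR":
        chosen = rng.sample(indices, n_missing)
    elif mechanism == "MAR":
        chosen = sorted(indices, key=lambda i: rows[i][driver], reverse=True)[:n_missing]
    elif mechanism == "MNAR":
        chosen = sorted(indices, key=lambda i: rows[i][target], reverse=True)[:n_missing]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")

    chosen = set(chosen)
    # Build new dicts so the original data is left untouched.
    return [
        {k: (None if i in chosen and k == target else v) for k, v in row.items()}
        for i, row in enumerate(rows)
    ]


if __name__ == "__main__":
    data = [{"age": a, "income": inc}
            for a, inc in [(25, 90), (40, 55), (35, 42), (60, 30), (50, 61)]]
    # MAR driven by age: the rows with age 60 and 50 lose their income value.
    print(ampute_univariate(data, "income", "age", 0.4, "MAR"))
```

Under MAR the `driver` feature stays fully observed, so the missingness can be explained by observed data. Under MNAR it depends on the very values that were removed, which is why that scenario cannot be distinguished from the observed data alone.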

2018

Evaluation of Oversampling Data Balancing Techniques in the Context of Ordinal Classification

Authors
Domingues, I; Amorim, JP; Abreu, PH; Duarte, H; Santos, JAM;

Publication
IJCNN

Abstract
Data imbalance is characterized by a discrepancy in the number of examples per class of a dataset. This phenomenon is known to deteriorate the performance of classifiers, since they are less able to learn the characteristics of the less represented classes. For most imbalanced datasets, the application of sampling techniques improves the classifier's performance. For small datasets, oversampling has been shown to be the most appropriate strategy since it augments the original set of samples. Although several oversampling strategies have been proposed and tested over the years, the work has mostly focused on binary or multi-class tasks. Motivated by medical applications, where there is often an order associated with the classes (increasing likelihood of malignancy, for instance), the present work tests some existing oversampling techniques in ordinal contexts. Moreover, four new oversampling techniques are proposed. Experiments were conducted on both private and public datasets. The private datasets concern the assessment of treatment response in oncologic diseases. The 15 public datasets were chosen because they are widely used in the literature. Results show that data balancing techniques improve classification results on ordinal imbalanced datasets, even when these techniques are not specifically designed for ordinal problems. With our pipeline, results better than or equal to those published were obtained for 10 of the 15 public datasets, with improvements of up to a 0.43 decrease in MMAE.
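
Two ingredients mentioned above can be sketched in plain Python. `random_oversample` is a generic random-oversampling baseline (duplicating existing samples of the smaller classes), not one of the four techniques proposed in the paper. `mmae` assumes that MMAE stands for maximum mean absolute error, as commonly used in ordinal classification: the MAE is computed separately on the samples of each true class and the worst class is reported. Both function names and the integer-rank label encoding are assumptions for illustration.

```python
import random
from collections import defaultdict


def random_oversample(X, y, seed=0):
    """Generic baseline: duplicate randomly chosen samples of each class until
    every class has as many samples as the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(samples) for samples in by_class.values())

    X_out, y_out = list(X), list(y)
    for label, samples in by_class.items():
        # Append (target - current size) copies drawn with replacement.
        for _ in range(target - len(samples)):
            X_out.append(rng.choice(samples))
            y_out.append(label)
    return X_out, y_out


def mmae(y_true, y_pred):
    """Maximum mean absolute error, assuming labels are integer ranks
    (0, 1, 2, ...): compute the MAE over the samples of each true class,
    then return the largest of these per-class values."""
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors[t].append(abs(t - p))
    return max(sum(e) / len(e) for e in errors.values())
```

With `y_true = [0, 0, 0, 1, 2]` and `y_pred = [0, 0, 1, 1, 0]`, the overall MAE is 0.6, but `mmae` returns 2.0 because the single sample of the minority class 2 is misranked by two positions. This sensitivity to errors on small classes is why a per-class worst-case metric suits imbalanced ordinal problems.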

2019

Computer Vision in Esophageal Cancer: A Literature Review

Authors
Domingues, I; Sampaio, IL; Duarte, H; Santos, JAM; Abreu, PH;

Publication
IEEE ACCESS

Abstract
Esophageal cancer is a highly prevalent disease that can be evaluated with a variety of imaging modalities, including endoscopy, computed tomography, and positron emission tomography. Computer-aided techniques could provide valuable help in the analysis of these images, reducing medical workflow time and human error. The goal of this paper is to review the existing literature on the application of computer vision techniques to esophageal cancer. After an initial phase in which a set of keywords was chosen, the selected terms were used to retrieve papers from four well-known databases: Web of Science, Scopus, PubMed, and Springer. The results were screened by merging duplicate entries and eliminating out-of-scope works, resulting in 47 selected papers. These were organized by image modality. Major results were then summarized and compared, and the main shortcomings were identified. It can be concluded that, even though the scientific community has already paid attention to the esophageal cancer problem, several issues remain open. Two major findings of this review are the absence of works on MRI data and the under-exploration of recent deep learning techniques, showing the need for further investigation.

2023

Evaluating the faithfulness of saliency maps in explaining deep learning models using realistic perturbations

Authors
Amorim, JP; Abreu, PH; Santos, J; Cortes, M; Vila, V;

Publication
INFORMATION PROCESSING & MANAGEMENT

Abstract
Deep learning has reached human-level performance in several medical tasks, including classification of histopathological images. Continuous effort has been made to find effective strategies for interpreting these types of models. Among them, saliency maps, which depict the weight of each pixel in the classification as a heatmap of intensity values, have been by far the most used for image classification. However, there is a lack of tools for the systematic evaluation of saliency maps, and existing works introduce non-natural noise such as random or uniform values. To address this issue, we propose an approach to evaluate the faithfulness of saliency maps by introducing natural perturbations in the image, based on opposite-class substitution, and studying their impact on evaluation metrics adapted from saliency models. We validate the proposed approach on PatchCamelyon, a breast cancer metastasis detection dataset with 327,680 patches of histopathological images of sentinel lymph node sections. Results show that GradCAM, Guided-GradCAM, and gradient-based saliency map methods are sensitive to natural perturbations and correlate with the presence of tumor evidence in the image. Overall, this approach proves to be a solution for validating saliency map methods without introducing confounding variables and shows potential for application to other medical imaging tasks.

2023

Interpreting Deep Machine Learning Models: An Easy Guide for Oncologists

Authors
Amorim, JP; Abreu, PH; Fernandez, A; Reyes, M; Santos, J; Abreu, MH;

Publication
IEEE REVIEWS IN BIOMEDICAL ENGINEERING

Abstract
Healthcare agents, in particular in the oncology field, are currently collecting vast amounts of diverse patient data. In this context, some decision-support systems, mostly based on deep learning techniques, have already been approved for clinical purposes. Despite all the efforts to introduce artificial intelligence methods into clinicians' workflow, their lack of interpretability (that is, of an understanding of how the methods make decisions) still inhibits their dissemination in clinical practice. The aim of this article is to present an easy guide for oncologists explaining how these methods make decisions and illustrating the strategies to explain them. Theoretical concepts were illustrated with oncological examples, and a literature review of research works was performed on PubMed for the period from January 2014 to September 2020, using deep learning techniques, interpretability, and oncology as keywords. Overall, more than 60% of the reviewed works are related to breast, skin, or brain cancers, and the majority focused on explaining the importance of tumor characteristics (e.g., dimension, shape) in the predictions. The most used computational methods are multilayer perceptrons and convolutional neural networks. Nevertheless, although deep learning techniques have been successfully applied in different cancer scenarios, endowing them with interpretability while maintaining their performance continues to be one of the greatest challenges of artificial intelligence.

2019

Multiple-Choice Questions in Programming Courses: Can We Use Them and Are Students Motivated by Them?

Authors
Abreu, PH; Silva, DC; Gomes, A;

Publication
ACM TRANSACTIONS ON COMPUTING EDUCATION

Abstract
Low performance of nontechnical engineering students in programming courses is a problem that remains unsolved. Over the years, many authors have tried to identify the multiple causes of that failure, but there is unanimity that motivation is a key factor in students' acquisition of knowledge. To better understand motivation, a new evaluation strategy was adopted in a second programming course of a nontechnical degree, with 91 students. The goals of the study were to identify whether those students felt more motivated to answer multiple-choice questions than development questions, and which type of question better tests student knowledge acquisition. The motivational qualities of multiple-choice questions in programming courses are discussed in light of the results. In conclusion, student performance clearly varies according to the type of question. Our study indicates that multiple-choice questions can be seen as a motivational factor for engineering students and might also be a good way to test acquired programming concepts. Therefore, this type of question could be explored further in assessments.