2022
Authors
Santos, MS; Abreu, PH; Fernandez, A; Luengo, J; Santos, J;
Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
Abstract
This work performs an in-depth study of the impact of distance functions on K-Nearest Neighbours imputation of heterogeneous datasets. Missing data is generated at several percentages on a large benchmark of 150 datasets (50 continuous, 50 categorical and 50 heterogeneous datasets), and data imputation is performed using different distance functions (HEOM, HEOM-R, HVDM, HVDM-R, HVDM-S, MDE and SIMDIST) and k values (1, 3, 5 and 7). The impact of distance functions on kNN imputation is then evaluated in terms of classification performance, through the analysis of a classifier learned from the imputed data, and in terms of imputation quality, where the quality of the reconstruction of the original values is assessed. By analysing the properties of heterogeneous distance functions over continuous and categorical datasets individually, we then study their behaviour over heterogeneous data. We discuss whether datasets of different natures may benefit from different distance functions and to what extent the component of a distance function that deals with missing values influences that choice. Our experiments show that missing data has a significant impact on distance computation, and the obtained results provide guidelines on how to choose appropriate distance functions depending on data characteristics (continuous, categorical or heterogeneous datasets) and the objective of the study (classification or imputation tasks).
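To make the role of a heterogeneous distance in kNN imputation concrete, here is a minimal Python sketch of HEOM and a simple kNN imputer. It assumes the common textbook form of HEOM: overlap (0 if equal, 1 otherwise) for categorical features, range-normalised absolute difference for continuous features, and a per-feature distance of 1 whenever either value is missing (`None`). The imputer takes the mean (continuous) or mode (categorical) of the k nearest donors that have the target feature observed. The names `heom` and `knn_impute` are hypothetical; this is not the authors' implementation, and the other distances studied (HEOM-R, HVDM, MDE, SIMDIST, etc.) are not shown.

```python
import math
from collections import Counter
from statistics import mean


def heom(x, y, is_categorical, ranges):
    """HEOM distance (textbook form, assumed here).

    Missing values (None) contribute the maximum per-feature distance of 1.
    """
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0  # assumed policy for missing values
        elif is_categorical[a]:
            d = 0.0 if xa == ya else 1.0  # overlap metric
        else:
            d = abs(xa - ya) / ranges[a] if ranges[a] > 0 else 0.0
        total += d * d
    return math.sqrt(total)


def knn_impute(data, is_categorical, k=3):
    """Hypothetical kNN imputer: fills each None from the k nearest donors."""
    n_features = len(is_categorical)
    # Range of each continuous feature, computed over observed values only
    ranges = []
    for a in range(n_features):
        vals = [row[a] for row in data if row[a] is not None]
        if is_categorical[a] or not vals:
            ranges.append(0.0)
        else:
            ranges.append(max(vals) - min(vals))

    imputed = [list(row) for row in data]  # leave the input untouched
    for i, row in enumerate(data):
        for a in range(n_features):
            if row[a] is not None:
                continue
            # Candidate donors: other records where feature a is observed
            donors = [r for j, r in enumerate(data) if j != i and r[a] is not None]
            donors.sort(key=lambda r: heom(row, r, is_categorical, ranges))
            values = [r[a] for r in donors[:k]]
            if not values:
                continue
            if is_categorical[a]:
                imputed[i][a] = Counter(values).most_common(1)[0][0]
            else:
                imputed[i][a] = mean(values)
    return imputed
```

Assigning the maximum distance of 1 to missing values is only one possible policy; the abstract's point is precisely that this missing-value component of the distance function influences which function performs best.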
2017
Authors
Nogueira, MA; Abreu, PH; Martins, P; Machado, P; Duarte, H; Santos, J;
Publication
ARTIFICIAL INTELLIGENCE REVIEW
Abstract
Clinical decisions are sometimes based on a variety of patient information, such as age, weight or information extracted from image exams, among others. Depending on the nature of the disease or anatomy, clinicians can base their decisions on different image exams, such as mammography, positron emission tomography scans or magnetic resonance images. However, the analysis of those exams is far from a trivial task. Over the years, the use of image descriptors (computational algorithms that provide a summarized description of image regions) has become an important tool to assist clinicians in such tasks. This paper presents an overview of the use of image descriptors in healthcare contexts, covering different image exams. For this review, we analyzed over 70 studies on the application of image descriptors of different natures (e.g., intensity, texture, shape) in medical image analysis. Four imaging modalities are featured: mammography, PET, CT and MRI. Pathologies typically covered by these modalities are addressed: breast masses and microcalcifications in mammograms, head and neck cancer and Alzheimer's disease in the case of PET images, lung nodules regarding CTs, and multiple sclerosis and brain tumors in the MRI section.
2019
Authors
Santos, MS; Pereira, RC; Costa, AF; Soares, JP; Santos, J; Abreu, PH;
Publication
IEEE ACCESS
Abstract
The performance evaluation of imputation algorithms often involves the generation of missing values. Missing values can be inserted in only one feature (univariate configuration) or in several features (multivariate configuration) at different percentages (missing rates) and according to distinct missing mechanisms, namely, missing completely at random, missing at random, and missing not at random. Since the missing data generation process defines the basis for the imputation experiments (configuration, missing rate, and missing mechanism), it is essential that it is appropriately applied; otherwise, conclusions derived from ill-defined setups may be invalid. The goal of this paper is to review the different approaches to synthetic missing data generation found in the literature and discuss their practical details, elaborating on their strengths and weaknesses. Our analysis revealed that creating missing at random and missing not at random scenarios in datasets comprising qualitative features is the most challenging issue in the related work and, therefore, should be the focus of future work in the field.
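As an illustration of the simplest setup discussed here, the following minimal sketch generates missing values completely at random (MCAR) at a given missing rate, in either a multivariate configuration (all features) or a univariate one (a single feature). It assumes the missing rate is defined as the fraction of cells within the selected features, and the function name `generate_mcar` is hypothetical. MAR and MNAR generation, which require conditioning on observed or unobserved values, are not shown.

```python
import random


def generate_mcar(data, missing_rate, features=None, seed=None):
    """Return a copy of `data` with values removed completely at random.

    `missing_rate` is the fraction of cells (within `features`) set to None.
    Cells are chosen uniformly, independently of any data value, which is
    what makes the mechanism MCAR.
    """
    rng = random.Random(seed)
    if features is None:  # multivariate configuration: every feature
        features = list(range(len(data[0])))
    cells = [(i, j) for i in range(len(data)) for j in features]
    n_missing = round(missing_rate * len(cells))
    out = [list(row) for row in data]
    for i, j in rng.sample(cells, n_missing):
        out[i][j] = None
    return out
```

Passing `features=[j]` gives the univariate configuration. Defining the rate over all cells rather than per feature is one of the practical choices that, as the abstract notes, must be made explicit for imputation experiments to be comparable.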
2018
Authors
Domingues, I; Amorim, JP; Abreu, PH; Duarte, H; Santos, JAM;
Publication
2018 International Joint Conference on Neural Networks, IJCNN 2018, Rio de Janeiro, Brazil, July 8-13, 2018
Abstract
Data imbalance is characterized by a discrepancy in the number of examples per class of a dataset. This phenomenon is known to deteriorate the performance of classifiers, since they are less able to learn the characteristics of the less represented classes. For most imbalanced datasets, the application of sampling techniques improves the classifier's performance. For small datasets, oversampling has been shown to be the most appropriate strategy, since it augments the original set of samples. Although several oversampling strategies have been proposed and tested over the years, the work has mostly focused on binary or multi-class tasks. Motivated by medical applications, where there is often an order associated with the classes (increasing likelihood of malignancy, for instance), the present work tests some existing oversampling techniques in ordinal contexts. Moreover, four new oversampling techniques are proposed. Experiments were performed on both private and public datasets. The private datasets concern the assessment of response to treatment in oncologic diseases. The 15 public datasets were chosen because they are widely used in the literature. Results show that data balancing techniques improve classification results on ordinal imbalanced datasets, even when these techniques are not specifically designed for ordinal problems. With our pipeline, results better than or equal to the published ones were obtained for 10 of the 15 public datasets, with improvements of up to a 0.43 decrease in MMAE.
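The evaluation metric and the general idea of oversampling can be sketched briefly in Python. MMAE (maximum mean absolute error) is commonly defined as the largest per-class mean absolute error between true and predicted ordinal ranks, so lower is better and the worst-served class dominates the score. The oversampler below is plain random oversampling (duplicating minority samples), the simplest baseline; it is not one of the four techniques proposed in the paper. Both function names are hypothetical, and labels are assumed to be integer-encoded ranks.

```python
import random
from collections import defaultdict


def mmae(y_true, y_pred):
    """Maximum Mean Absolute Error for integer-encoded ordinal labels."""
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors[t].append(abs(t - p))
    # Mean absolute error per true class, then the worst class
    return max(sum(e) / len(e) for e in errors.values())


def random_oversample(X, y, seed=None):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(samples) for samples in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, samples in by_class.items():
        for _ in range(target - len(samples)):
            X_out.append(rng.choice(samples))
            y_out.append(label)
    return X_out, y_out
```

For example, with true ranks `[0, 0, 1, 1, 2]` and predictions `[0, 1, 1, 1, 0]`, the per-class errors are 0.5, 0.0 and 2.0, so MMAE is 2.0 even though most predictions are correct.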
2019
Authors
Domingues, I; Sampaio, IL; Duarte, H; Santos, JAM; Abreu, PH;
Publication
IEEE ACCESS
Abstract
Esophageal cancer is a disease with a high prevalence that can be evaluated by a variety of imaging modalities, including endoscopy, computed tomography, and positron emission tomography. Computer-aided techniques could provide valuable help in the analysis of these images, decreasing medical workflow time and human errors. The goal of this paper is to review the existing literature on the application of computer vision techniques in the domain of esophageal cancer. After an initial phase in which a set of keywords was chosen, the selected terms were used to retrieve papers from four well-known databases: Web of Science, Scopus, PubMed, and Springer. The results were screened by merging identical entries and eliminating out-of-scope works, resulting in 47 selected papers. These were organized according to the image modality. Major results were then summarized and compared, and the main shortcomings were identified. It could be concluded that, even though the scientific community has already paid attention to the esophageal cancer problem, there are still several open issues. Two major findings of this review are the absence of works on MRI data and the under-exploration of recent deep learning techniques, showing the need for further investigation.
2023
Authors
Amorim, JP; Abreu, PH; Santos, J; Cortes, M; Vila, V;
Publication
INFORMATION PROCESSING & MANAGEMENT
Abstract
Deep Learning has reached human-level performance in several medical tasks, including the classification of histopathological images. Continuous effort has been made to find effective strategies for interpreting these types of models; among them, saliency maps, which depict the weights of the pixels on the classification as a heatmap of intensity values, have been by far the most used for image classification. However, there is a lack of tools for the systematic evaluation of saliency maps, and existing works introduce non-natural noise such as random or uniform values. To address this issue, we propose an approach to evaluate the faithfulness of saliency maps by introducing natural perturbations in the image, based on opposite-class substitution, and studying their impact on evaluation metrics adapted from saliency models. We validate the proposed approach on PatchCamelyon, a breast cancer metastases detection dataset with 327,680 patches of histopathological images of sentinel lymph node sections. Results show that GradCAM, Guided-GradCAM and gradient-based saliency map methods are sensitive to natural perturbations and correlate with the presence of tumor evidence in the image. Overall, this approach proves to be a solution for validating saliency map methods without introducing confounding variables and shows potential for application to other medical imaging tasks.
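The general idea of a substitution-based faithfulness check can be sketched with NumPy. This is a simplified, hypothetical illustration, not the authors' procedure or metrics: the most salient pixels of an image are replaced with the pixels at the same positions in a "donor" image of the opposite class, and the model's score is recorded as the substituted fraction grows. A faithful saliency map should make the score drop quickly. The names `substitute_salient_pixels` and `faithfulness_curve`, and the `model` callable returning a scalar class score, are assumptions.

```python
import numpy as np


def substitute_salient_pixels(image, saliency, donor, fraction):
    """Replace the `fraction` most salient pixels of `image` (H x W or
    H x W x C) with the pixels at the same positions in `donor`, an image
    assumed to belong to the opposite class. `saliency` is H x W."""
    h, w = saliency.shape
    n = int(round(fraction * h * w))
    # Flat indices of the n highest saliency values
    order = np.argsort(saliency, axis=None)[::-1][:n]
    rows, cols = np.unravel_index(order, (h, w))
    perturbed = image.copy()
    perturbed[rows, cols] = donor[rows, cols]
    return perturbed


def faithfulness_curve(model, image, saliency, donor, fractions):
    """Model score after substituting increasing fractions of salient pixels."""
    return [float(model(substitute_salient_pixels(image, saliency, donor, f)))
            for f in fractions]
```

Drawing replacement pixels from a real image of the other class, instead of random or constant values, is what keeps the perturbation closer to the natural image distribution.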