2014
Authors
Machado, P; Martins, T; Amaro, H; Abreu, PH;
Publication
Evolutionary and Biologically Inspired Music, Sound, Art and Design - Third European Conference, EvoMUSART 2014, Granada, Spain, April 23-25, 2014, Revised Selected Papers
Abstract
Fitness assignment is one of the biggest challenges in evolutionary art. Interactive evolutionary computation approaches put a significant burden on the user, leading to human fatigue. On the other hand, autonomous evolutionary art systems usually fail to give users the opportunity to express and convey their artistic goals and preferences. Our approach empowers users by allowing them to express their intentions through the design of fitness functions. We present a novel responsive interface for designing fitness functions in the scope of evolutionary ant paintings. Once the evolutionary runs are concluded, further control is given to the users by allowing them to specify the rendering details of selected pieces. The analysis of the experimental results highlights how fitness function design influences the outcomes of the evolutionary runs, conveying the intentions of the user and enabling the evolution of a wide variety of images. © 2014 Springer-Verlag.
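As an illustration of the idea, the sketch below shows how a user-designed fitness function might drive a simple evolutionary loop. It is a minimal sketch under stated assumptions: render(genome) is a hypothetical function that turns an ant-behaviour genome into a grayscale image array, and the weighted coverage/contrast fitness form is purely illustrative, not the paper's interface or feature set.

import numpy as np

def user_fitness(image, weights):
    """Combine simple image metrics with user-chosen weights (illustrative)."""
    coverage = np.mean(image > 0)   # fraction of painted pixels
    contrast = np.std(image)        # tonal variation
    return weights["coverage"] * coverage + weights["contrast"] * contrast

def evolve(render, pop_size=20, generations=50, weights=None):
    """Hypothetical loop: render(genome) -> 2-D grayscale array is assumed."""
    weights = weights or {"coverage": 1.0, "contrast": 0.5}
    population = [np.random.rand(8) for _ in range(pop_size)]  # ant-behaviour genomes
    for _ in range(generations):
        ranked = sorted(population, key=lambda g: user_fitness(render(g), weights), reverse=True)
        parents = ranked[: pop_size // 2]
        offspring = [p + np.random.normal(0.0, 0.1, p.shape) for p in parents]  # mutate
        population = parents + offspring
    return max(population, key=lambda g: user_fitness(render(g), weights))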
2014
Authors
Abreu, PH; Amaro, H; Silva, DC; Machado, P; Abreu, MH; Afonso, N; Dourado, A;
Publication
IFMBE Proceedings
Abstract
Breast cancer is the most common type of cancer in women worldwide. In spite of this fact, there are few studies that, using data mining techniques, are capable of helping medical doctors in their daily practice. This paper presents a comparative study of three ensemble methods (TreeBagger, LPBoost and Subspace) using a clinical dataset with 25% missing values to predict the overall survival of women with breast cancer. To complete the absent values, the k-nearest neighbor (k-NN) algorithm was used with four distinct neighbor values, in an attempt to determine the best one for this particular scenario. Tests were performed for each of the three ensemble methods and each k-NN configuration, and their performance was compared using a Friedman test. Despite the complexity of this challenge, the results are promising and the best algorithm configuration (TreeBagger using 3 neighbors) achieves a prediction accuracy of 73%. © Springer International Publishing Switzerland 2014.
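A minimal sketch of this kind of pipeline, assuming scikit-learn: KNNImputer fills the missing values for several neighbor settings, and RandomForestClassifier stands in for MATLAB's TreeBagger (the paper's actual ensembles, dataset, and evaluation protocol are not reproduced here).

import numpy as np
from sklearn.impute import KNNImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def evaluate_knn_settings(X, y, neighbor_values=(1, 3, 5, 7)):
    """X: clinical features with NaNs for missing entries; y: survival labels."""
    results = {}
    for k in neighbor_values:
        X_imputed = KNNImputer(n_neighbors=k).fit_transform(X)          # complete absent values
        clf = RandomForestClassifier(n_estimators=100, random_state=0)  # bagged decision trees
        results[k] = cross_val_score(clf, X_imputed, y, cv=5, scoring="accuracy").mean()
    return results  # per-configuration accuracies, e.g. for a Friedman test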
2015
Authors
Santos, MS; Abreu, PH; Garcia Laencina, PJ; Simao, A; Carvalho, A;
Publication
JOURNAL OF BIOMEDICAL INFORMATICS
Abstract
Liver cancer is the sixth most frequently diagnosed cancer and, in particular, Hepatocellular Carcinoma (HCC) represents more than 90% of primary liver cancers. Clinicians assess each patient's treatment on the basis of evidence-based medicine, which may not always apply to a specific patient, given the biological variability among individuals. Over the years, and for the particular case of Hepatocellular Carcinoma, some research studies have developed strategies for assisting clinicians in decision making, using computational methods (e.g. machine learning techniques) to extract knowledge from clinical data. However, these studies have some limitations that have not yet been addressed: some do not focus entirely on Hepatocellular Carcinoma patients, others have strict application boundaries, and none considers the heterogeneity between patients or the presence of missing data, a common drawback in healthcare contexts. In this work, a real, complex Hepatocellular Carcinoma database composed of heterogeneous clinical features is studied. We propose a new cluster-based oversampling approach, robust to small and imbalanced datasets, which accounts for the heterogeneity of patients with Hepatocellular Carcinoma. The preprocessing procedures of this work are based on data imputation with a distance metric appropriate for both heterogeneous and missing data (HEOM) and on clustering studies (K-means) to assess the underlying patient groups in the studied dataset. The final approach is applied to diminish the impact of underlying patient profiles with reduced sizes on survival prediction. It is based on K-means clustering and the SMOTE algorithm, which build a representative dataset used as the training set for different machine learning procedures (logistic regression and neural networks). The results are evaluated in terms of survival prediction and compared against baseline approaches that do not consider clustering and/or oversampling, using the Friedman rank test. Our proposed methodology coupled with neural networks outperformed all others, suggesting an improvement over the classical approaches currently used in Hepatocellular Carcinoma prediction models.
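A minimal sketch of cluster-based oversampling in the spirit described, assuming scikit-learn and imbalanced-learn; the number of clusters, the joint cluster/outcome strata, and the classifier settings are illustrative assumptions, and the HEOM-based imputation step is taken as already done.

import numpy as np
from sklearn.cluster import KMeans
from imblearn.over_sampling import SMOTE
from sklearn.neural_network import MLPClassifier

def cluster_oversample(X, y, n_clusters=3, seed=0):
    """X: imputed numeric features; y: binary survival labels (0/1)."""
    clusters = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(X)
    strata = clusters * 2 + y                      # joint cluster/outcome groups
    # SMOTE needs each stratum to have more than k_neighbors samples; lower k_neighbors if needed
    X_bal, strata_bal = SMOTE(k_neighbors=3, random_state=seed).fit_resample(X, strata)
    return X_bal, strata_bal % 2                   # recover survival labels

# Usage sketch:
# X_bal, y_bal = cluster_oversample(X, y)
# clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000).fit(X_bal, y_bal)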
2024
Authors
Pereira, RC; Abreu, PH; Rodrigues, PP;
Publication
JOURNAL OF COMPUTATIONAL SCIENCE
Abstract
Missing data is an issue that can negatively impact any task performed with the available data, and it is often found in real-world domains such as healthcare. One of the most common strategies to address this issue is imputation, where the missing values are replaced by estimates. Several approaches based on statistics and machine learning techniques have been proposed for this purpose, including deep learning architectures such as generative adversarial networks and autoencoders. In this work, we propose a novel siamese neural network suitable for missing data imputation, which we call the Siamese Autoencoder-based Approach for Imputation (SAEI). Besides having a deep autoencoder architecture, SAEI also has a custom loss function and triplet mining strategy that are tailored to the missing data issue. The proposed SAEI approach is compared to seven state-of-the-art imputation methods in an experimental setup that comprises 14 heterogeneous datasets from the healthcare domain, injected with Missing Not At Random values at rates between 10% and 60%. The results show that SAEI significantly outperforms all the remaining imputation methods in all experimental settings, achieving an average improvement of 35%. This work is an extension of the article Siamese Autoencoder-Based Approach for Missing Data Imputation [1], presented at the International Conference on Computational Science 2023. It includes new experiments focused on runtime, generalization capabilities, and the impact of imputation on classification tasks, where the results show that SAEI is the imputation method that induces the best classification results, improving the F1 scores for 50% of the used datasets.
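A minimal sketch of autoencoder-based imputation in this spirit, assuming PyTorch; it deliberately omits SAEI's siamese branch, custom loss, and triplet mining, and the layer sizes and corruption rate are illustrative.

import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train(model, X_complete, epochs=200, lr=1e-3):
    """Learn to reconstruct rows from randomly corrupted copies."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        keep = (torch.rand_like(X_complete) > 0.2).float()   # hide ~20% of entries
        loss = loss_fn(model(X_complete * keep), X_complete)
        opt.zero_grad()
        loss.backward()
        opt.step()

def impute(model, X_filled, mask):
    """X_filled: missing entries pre-filled (e.g. column means); mask: 1 = observed, 0 = missing."""
    with torch.no_grad():
        recon = model(X_filled)
    return mask * X_filled + (1 - mask) * recon   # keep observed values, fill the rest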
2024
Authors
Pereira, RC; Abreu, PH; Rodrigues, PP; Figueiredo, MAT;
Publication
EXPERT SYSTEMS WITH APPLICATIONS
Abstract
Experimental assessments of different missing data imputation methods often compute error rates between the original values and the estimated ones. This experimental setup relies on complete datasets that are injected with missing values. The injection process is straightforward for the Missing Completely At Random and Missing At Random mechanisms; however, the Missing Not At Random mechanism poses a major challenge, since the available artificial generation strategies are limited. Furthermore, studies focused on this latter mechanism tend to disregard a comprehensive baseline of state-of-the-art imputation methods. In this work, both challenges are addressed: four new Missing Not At Random generation strategies are introduced and a benchmark study is conducted to compare six imputation methods in an experimental setup that covers 10 datasets and five missingness levels (10% to 80%). The overall finding is that, for most missing rates and datasets, the best imputation method to deal with Missing Not At Random values is Multiple Imputation by Chained Equations, whereas for higher missingness rates autoencoders show promising results.
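A minimal sketch of one self-masking MNAR injection (the largest values of each feature go missing) and MICE-style imputation via scikit-learn's IterativeImputer; this is an illustrative strategy and estimator choice, not necessarily one of the paper's four generation strategies or its MICE implementation.

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def inject_mnar(X, rate=0.3):
    """Self-masking MNAR: missingness depends on the value itself."""
    X_miss = X.astype(float).copy()
    for j in range(X.shape[1]):
        threshold = np.quantile(X[:, j], 1.0 - rate)
        X_miss[X[:, j] >= threshold, j] = np.nan     # largest values go missing
    return X_miss

def mice_impute(X_miss):
    return IterativeImputer(max_iter=10, random_state=0).fit_transform(X_miss)

def imputation_mae(X_true, X_imputed, X_miss):
    """Mean absolute error on the injected (originally known) entries."""
    missing = np.isnan(X_miss)
    return float(np.abs(X_true[missing] - X_imputed[missing]).mean())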
2024
Authors
Santos, JC; Santos, MS; Abreu, PH;
Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XXII, PT I, IDA 2024
Abstract
Medical imaging classification improves patient prognoses by providing information on disease assessment, staging, and treatment response. The high demand for medical imaging acquisition requires the development of effective classification methodologies, with deep learning technologies taking the pole position for this task. However, the major drawback of such techniques lies in their black-box nature, which has delayed their adoption in real-world scenarios. Interpretability methodologies have emerged as a solution to this problem due to their capacity to translate black-box models into clinically understandable information. The most promising interpretability methodologies are concept-based techniques that explain the predictions of a deep neural network through user-specified concepts. Concept activation regions and concept activation vectors are concept-based implementations that provide global explanations for the predictions of neural networks. These explanations allow the identification of the relationships that the network learned and can be used to detect possible errors during training. In this work, concept activation vectors and concept activation regions are used to identify flaws in neural network training and to show how these weaknesses can be mitigated in a human-in-the-loop process that automatically improves the performance and trustworthiness of the classifier. To reach this goal, three phases were defined: training baseline classifiers, applying concept-based interpretability, and implementing a human-in-the-loop approach to improve classifier performance. Four medical imaging datasets of different modalities are included in this study to demonstrate the generality of the proposed method. The results identified concepts in each dataset that revealed flaws in classifier training and, consequently, the human-in-the-loop approach, validated by a team of two clinicians, achieved a statistically significant improvement.
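A minimal sketch of the concept-activation-vector idea, assuming the layer activations for concept and random example sets have already been extracted as 2-D NumPy arrays; the linear probe and the simplified TCAV-style score below are illustrative and do not reproduce the paper's concept activation regions or human-in-the-loop procedure.

import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(acts_concept, acts_random):
    """Train a linear probe; its (normalized) normal vector is the CAV."""
    X = np.vstack([acts_concept, acts_random])
    y = np.concatenate([np.ones(len(acts_concept)), np.zeros(len(acts_random))])
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    cav = probe.coef_.ravel()
    return cav / np.linalg.norm(cav)

def tcav_score(class_grads, cav):
    """Fraction of examples whose class-logit gradient (w.r.t. the layer) aligns with the CAV."""
    return float(np.mean(class_grads @ cav > 0))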