Publications - INESC TEC

Publications by Pedro Henriques Abreu

2019

Denial of Service Attacks: Detecting the Frailties of Machine Learning Algorithms in the Classification Process

Authors
Frazao, I; Abreu, PH; Cruz, T; Araújo, H; Simoes, P;

Publication
CRITICAL INFORMATION INFRASTRUCTURES SECURITY (CRITIS 2018)

Abstract
Denial of Service attacks, which have become commonplace in the Information and Communications Technologies domain, constitute a class of threats whose main objective is to degrade or disable a service or functionality on a target. The increasing reliance of Cyber-Physical Systems upon these technologies, together with their progressive interconnection with other infrastructure and/or organizational domains, has contributed to increasing their exposure to these attacks, with potentially catastrophic consequences. Despite the potential impact of such attacks, the lack of generality of related work in the attack prevention and detection fields has prevented its application in real-world scenarios. This paper aims at reducing that effect by analyzing the behavior of classification algorithms with different dataset characteristics.

2014

An Interface for Fitness Function Design

Authors
Machado, P; Martins, T; Amaro, H; Abreu, PH;

Publication
EvoMUSART

Abstract
Fitness assignment is one of the biggest challenges in evolutionary art. Interactive evolutionary computation approaches put a significant burden on the user, leading to human fatigue. On the other hand, autonomous evolutionary art systems usually fail to give the users the opportunity to express and convey their artistic goals and preferences. Our approach empowers the users by allowing them to express their intentions through the design of fitness functions. We present a novel responsive interface for designing fitness functions in the scope of evolutionary ant paintings. Once the evolutionary runs are concluded, further control is given to the users by allowing them to specify the rendering details of selected pieces. The analysis of the experimental results highlights how fitness function design influences the outcomes of the evolutionary runs, conveying the intentions of the user and enabling the evolution of a wide variety of images. © 2014 Springer-Verlag.

2014

Overall survival prediction for women breast cancer using ensemble methods and incomplete clinical data

Authors
Abreu, PH; Amaro, H; Silva, DC; Machado, P; Abreu, MH; Afonso, N; Dourado, A;

Publication
IFMBE Proceedings

Abstract
Breast Cancer is the most common type of cancer in women worldwide. In spite of this fact, there are insufficient studies that, using data mining techniques, are capable of helping medical doctors in their daily practice. This paper presents a comparative study of three ensemble methods (TreeBagger, LPBoost and Subspace) using a clinical dataset with 25% missing values to predict the overall survival of women with breast cancer. To complete the absent values, the k-nearest neighbor (k-NN) algorithm was used with four distinct neighbor values, trying to determine the best one for this particular scenario. Tests were performed for each of the three ensemble methods and each k-NN configuration, and their performance compared using a Friedman test. Despite the complexity of this challenge, the produced results are promising and the best algorithm configuration (TreeBagger using 3 neighbors) presents a prediction accuracy of 73%. © Springer International Publishing Switzerland 2014.
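
The k-NN imputation step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `knn_impute`, the toy data, and the choice of a Euclidean distance restricted to features observed in both rows are assumptions made here for illustration.

```python
import math

def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Hypothetical helper: distance handling and averaging are illustrative choices.
    """
    def distance(a, b):
        # Root mean squared difference over features observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(r) for r in rows]  # leave the input untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where feature j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                completed[i][j] = sum(nearest) / len(nearest)
    return completed

# Toy data: the second row is missing its third feature.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.9],
    [5.0, 6.0, 7.0],
    [1.0, 2.0, 3.1],
]
imputed = knn_impute(data, k=3)
```

With k=3, the missing value is filled from the three rows closest to the incomplete one, so the distant row `[5.0, 6.0, 7.0]` does not contribute.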

2015

A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients

Authors
Santos, MS; Abreu, PH; García Laencina, PJ; Simao, A; Carvalho, A;

Publication
JOURNAL OF BIOMEDICAL INFORMATICS

Abstract
Liver cancer is the sixth most frequently diagnosed cancer and, particularly, Hepatocellular Carcinoma (HCC) represents more than 90% of primary liver cancers. Clinicians assess each patient's treatment on the basis of evidence-based medicine, which may not always apply to a specific patient, given the biological variability among individuals. Over the years, and for the particular case of Hepatocellular Carcinoma, some research studies have been developing strategies for assisting clinicians in decision making, using computational methods (e.g. machine learning techniques) to extract knowledge from the clinical data. However, these studies have some limitations that have not yet been addressed: some do not focus entirely on Hepatocellular Carcinoma patients, others have strict application boundaries, and none considers the heterogeneity between patients nor the presence of missing data, a common drawback in healthcare contexts. In this work, a real complex Hepatocellular Carcinoma database composed of heterogeneous clinical features is studied. We propose a new cluster-based oversampling approach robust to small and imbalanced datasets, which accounts for the heterogeneity of patients with Hepatocellular Carcinoma. The preprocessing procedures of this work are based on data imputation considering appropriate distance metrics for both heterogeneous and missing data (HEOM) and clustering studies to assess the underlying patient groups in the studied dataset (K-means). The final approach is applied in order to diminish the impact of underlying patient profiles with reduced sizes on survival prediction. It is based on K-means clustering and the SMOTE algorithm to build a representative dataset and use it as training example for different machine learning procedures (logistic regression and neural networks). 
The results are evaluated in terms of survival prediction and compared across baseline approaches that do not consider clustering and/or oversampling using the Friedman rank test. Our proposed methodology coupled with neural networks outperformed all others, suggesting an improvement over the classical approaches currently used in Hepatocellular Carcinoma prediction models.
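
The idea of combining clustering with SMOTE-style oversampling can be sketched roughly as follows. This is a simplified illustration and not the method of the paper: cluster labels are assumed to be already available (for example from K-means), every cluster is grown to the size of the largest one, and the interpolation partner is a random member of the same cluster instead of one of the k nearest neighbours used by standard SMOTE. The function name `smote_within_clusters` is hypothetical.

```python
import random

def smote_within_clusters(samples, cluster_ids, seed=0):
    """Grow every cluster to the size of the largest one by interpolation.

    Hypothetical sketch: synthetic points lie on the segment between two
    randomly chosen members of the same cluster (a simplification of SMOTE).
    """
    rng = random.Random(seed)
    clusters = {}
    for point, cid in zip(samples, cluster_ids):
        clusters.setdefault(cid, []).append(point)

    target = max(len(members) for members in clusters.values())
    augmented = []
    for cid, members in clusters.items():
        synthetic = []
        while len(members) + len(synthetic) < target:
            a = rng.choice(members)
            b = rng.choice(members)
            gap = rng.random()  # position along the segment from a to b
            synthetic.append([x + gap * (y - x) for x, y in zip(a, b)])
        augmented.extend((p, cid) for p in members + synthetic)
    return augmented

# Toy data: cluster 0 has four points, cluster 1 only two.
samples = [[0, 0], [1, 1], [0, 1], [1, 0], [10, 10], [11, 11]]
cluster_ids = [0, 0, 0, 0, 1, 1]
balanced = smote_within_clusters(samples, cluster_ids)
```

Interpolating only within a cluster keeps synthetic patients inside the profile they were generated from, which is the intuition behind reducing the impact of small patient groups.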

2023

Automatic Delta-Adjustment Method Applied to Missing Not At Random Imputation

Authors
Pereira, RC; Rodrigues, PP; Figueiredo, MAT; Abreu, PH;

Publication
ICCS (1)

Abstract
Missing data can be described by the absence of values in a dataset, which can be a critical issue in domains such as healthcare. A common solution for this problem is imputation, where the missing values are replaced by estimations. Most imputation methods are suitable for the Missing Completely At Random (MCAR) and Missing At Random (MAR) mechanisms but produce biased results for Missing Not At Random (MNAR) values. An effective approach to mitigate this bias effect is to use the delta-adjustment method. This method assumes the imputation is performed for the MAR mechanism and adjusts the imputed values to become valid under MNAR assumptions by applying a correction factor. Such adjustment is usually defined manually by a domain expert, which often makes this method unfeasible. In this work, we propose an automatic procedure to find an approximate delta adjustment value for every feature of the dataset, which we call Automatic Delta-Adjustment Method. The proposed procedure is validated in an experimental setup comprising 10 datasets of the healthcare domain injected with MNAR values. The results from seven state-of-the-art imputation methods are compared with and without the adjustment, and applying the correction provides a significantly lower imputation error for all methods.
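
A minimal sketch of the basic delta-adjustment idea follows, assuming an additive correction and using mean imputation as a stand-in for a MAR-based imputer. The automatic procedure for choosing the delta value is not described in the abstract and is not reproduced here; `delta_adjust` and the manually chosen `delta` are illustrative assumptions.

```python
def delta_adjust(column, delta):
    """Impute None entries with the observed mean, then shift them by delta.

    Hypothetical helper: the mean stands in for any MAR-based imputer, and
    delta is supplied by hand (the role a domain expert usually plays).
    """
    observed = [v for v in column if v is not None]
    mar_estimate = sum(observed) / len(observed)
    # Only the imputed entries receive the correction; observed values are kept.
    return [v if v is not None else mar_estimate + delta for v in column]

# Toy column in which the missing value is believed to be higher than average.
adjusted = delta_adjust([2.0, None, 4.0], delta=1.5)
```

Here the MAR estimate is 3.0 and the adjusted imputation is 4.5, reflecting the assumption that unobserved values are systematically larger than observed ones.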

2023

Siamese Autoencoder-Based Approach for Missing Data Imputation

Authors
Pereira, RC; Abreu, PH; Rodrigues, PP;

Publication
ICCS (1)

Abstract
Missing data is an issue that can negatively impact any task performed with the available data and it is often found in real-world domains such as healthcare. One of the most common strategies to address this issue is to perform imputation, where the missing values are replaced by estimates. Several approaches based on statistics and machine learning techniques have been proposed for this purpose, including deep learning architectures such as generative adversarial networks and autoencoders. In this work, we propose a novel siamese neural network suitable for missing data imputation, which we call Siamese Autoencoder-based Approach for Imputation (SAEI). Besides having a deep autoencoder architecture, SAEI also has a custom loss function and triplet mining strategy that are tailored for the missing data issue. The proposed SAEI approach is compared to seven state-of-the-art imputation methods in an experimental setup that comprises 14 heterogeneous datasets of the healthcare domain injected with Missing Not At Random values at a rate between 10% and 60%. The results show that SAEI significantly outperforms all the remaining imputation methods for all experimented settings, achieving an average improvement of 35%.
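
For readers unfamiliar with triplet-based training, the generic triplet margin loss is sketched below. This is the standard textbook formulation, not the custom loss or triplet mining strategy of SAEI, which the abstract does not specify; the function name and the default margin are assumptions.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on Euclidean distances (generic form)."""
    d_pos = math.dist(anchor, positive)  # anchor should be close to positive
    d_neg = math.dist(anchor, negative)  # and far from negative
    return max(0.0, d_pos - d_neg + margin)

# The loss is zero once the negative is farther than the positive by the margin.
satisfied = triplet_loss([0, 0], [0, 1], [3, 4])
violated = triplet_loss([0, 0], [3, 4], [0, 1])
```

The first call yields no penalty because the negative is already well separated, while the second is penalised because the positive is farther from the anchor than the negative.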
