Publications - INESC TEC

Publications by CRACS

2024

Floralens: a Deep Learning Model for the Portuguese Native Flora

Authors
Filgueiras, A; Marques, ERB; Lopes, LMB; Marques, M; Silva, H;

Publication
CoRR

Abstract

2024

Yet Another Lock-Free Atom Table Design for Scalable Symbol Management in Prolog

Authors
Moreno, P; Areias, M; Rocha, R; Costa, VS;

Publication
INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING

Abstract
Prolog systems rely on an atom table for symbol management, which is usually implemented as a dynamically resizable hash table. This is ideal for single-threaded execution, but can become a bottleneck in a multi-threaded scenario. In this work, we replace the original atom table implementation in the YAP Prolog system with a lock-free hash-based data structure, named Lock-free Hash Tries (LFHT), in order to provide efficient and scalable symbol management. Being lock-free, the new implementation also provides better guarantees, namely immunity to priority inversion, deadlocks, and livelocks. Performance results show that the new LFHT implementation performs better in single-threaded execution and scales much better than the original lock-based, dynamically resizing hash table.


2024

The Impact of Feature Selection on Balancing, Based on Diabetes Data

Authors
Machado, D; Costa, VS; Brandao, P;

Publication
BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, BIOSTEC 2023

Abstract
Diabetes management data is composed of diverse factors and glycaemia indicators. Glycaemia predictive models tend to focus solely on glycaemia values. A comprehensive understanding of diabetes management requires considering several of its aspects beyond glycaemia. However, including every aspect of diabetes management can create an overly high-dimensional data set. Excessive feature spaces increase computational complexity and may introduce over-fitting. Additionally, the inclusion of inconsequential features introduces noise that hinders a model's performance. Feature importance is a process that evaluates a feature's value, and can be used to identify optimal feature sub-sets. Depending on the context, multiple methods can be used. The drop feature method is considered in the literature to be the best approach to evaluate individual feature importance. To reach an optimal set, the best approach is branch and bound, despite its heavy computational cost. This overhead can be addressed through a trade-off between the feature set's optimisation level and the process's computational feasibility. The improvement of the feature space has implications for the effectiveness of data balancing approaches. Whilst the impact observed in this study was not substantial, it suggests that the balancing approach should be reconsidered given a superior feature space.
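The drop feature method mentioned in the abstract can be illustrated with a minimal sketch: score a model on all features, then re-score it with each feature removed, and treat the loss in score as that feature's importance. The nearest-centroid classifier, the function names, and the toy data below are assumptions made for illustration only; they are not the models or the data used in the paper, and scoring on the training rows is a simplification.

```python
def nearest_centroid_accuracy(rows, labels, columns):
    """Fit and score a nearest-centroid classifier on the chosen columns.

    Scored on the training rows for brevity; a real study would use
    held-out data or cross-validation.
    """
    classes = sorted(set(labels))
    if not columns:
        # No features left: fall back to the majority-class rate.
        return max(labels.count(c) for c in classes) / len(labels)
    centroids = {}
    for c in classes:
        members = [r for r, y in zip(rows, labels) if y == c]
        centroids[c] = [sum(r[k] for r in members) / len(members) for k in columns]

    def predict(row):
        # Pick the class whose centroid is closest (squared Euclidean distance).
        return min(
            classes,
            key=lambda c: sum((row[k] - m) ** 2 for k, m in zip(columns, centroids[c])),
        )

    hits = sum(predict(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def drop_feature_importance(rows, labels, score=nearest_centroid_accuracy):
    """Importance of feature k = baseline score - score without feature k."""
    all_columns = list(range(len(rows[0])))
    baseline = score(rows, labels, all_columns)
    importance = {}
    for k in all_columns:
        kept = [c for c in all_columns if c != k]
        importance[k] = baseline - score(rows, labels, kept)
    return importance


# Hypothetical toy data: column 0 separates the classes, column 1 is noise.
rows = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.9], [3.0, 5.0], [3.1, 5.1], [2.9, 4.9]]
labels = [0, 0, 0, 1, 1, 1]
importance = drop_feature_importance(rows, labels)
```

Because every feature requires a full refit, the cost grows linearly with the number of features, which is far cheaper than the exhaustive search that branch and bound tries to prune.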


2024

Unveiling Malicious Network Flows Using Benford's Law

Authors
Fernandes, P; Ciardhuáin, SO; Antunes, M;

Publication
MATHEMATICS

Abstract
The increasing proliferation of cyber-attacks threatening the security of computer networks has driven the development of more effective methods for identifying malicious network flows. The inclusion of statistical laws, such as Benford's Law, and distance functions, applied to the first digits of network flow metadata, such as IP addresses or packet sizes, facilitates the detection of abnormal patterns in the digits. These techniques also allow for quantifying discrepancies between expected and suspicious flows, significantly enhancing the accuracy and speed of threat detection. This paper introduces a novel method for identifying and analyzing anomalies within computer networks. It integrates Benford's Law into the analysis process and incorporates a range of distance functions, namely the Mean Absolute Deviation (MAD), the Kolmogorov-Smirnov test (KS), and the Kullback-Leibler divergence (KL), which serve as dispersion measures for quantifying the extent of anomalies detected in network flows. Benford's Law is recognized for its effectiveness in identifying anomalous patterns, especially in detecting irregularities in the first digit of the data. In addition, Bayes' Theorem was implemented in conjunction with the distance functions to enhance the detection of malicious traffic flows. Bayes' Theorem provides a probabilistic perspective on whether a traffic flow is malicious or benign. This approach is characterized by its flexibility in incorporating new evidence, allowing the model to adapt to emerging malicious behavior patterns as they arise. Meanwhile, the distance functions offer a quantitative assessment, measuring specific differences between traffic flows, such as frequency, packet size, time between packets, and other relevant metadata. Integrating these techniques has increased the model's sensitivity in detecting malicious flows, reducing the number of false positives and negatives, and enhancing the resolution and effectiveness of traffic analysis. 
Furthermore, these techniques expedite decisions regarding the nature of traffic flows based on a solid statistical foundation and provide a better understanding of the characteristics that define these flows, contributing to the comprehension of attack vectors and aiding in preventing future intrusions. The effectiveness and applicability of this joint method have been demonstrated through experiments with the CICIDS2017 public dataset, which was explicitly designed to simulate real scenarios and provide valuable information to security professionals when analyzing computer networks. The proposed methodology opens up new perspectives in investigating and detecting anomalies and intrusions in computer networks, which are often attributed to cyber-attacks. This development culminates in creating a promising model that stands out for its effectiveness and speed, accurately identifying possible intrusions with an F1 of nearly 80%, a recall of 99.42%, and an accuracy of 65.84%.
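As a rough illustration of the ingredients named in the abstract, the sketch below computes the first-digit distribution of some flow metadata, compares it with Benford's Law using MAD, a Kolmogorov-Smirnov style statistic, and KL divergence, and applies a single Bayes' Theorem update. The function names, the integer-valued inputs, and the likelihood values are assumptions for illustration; the abstract does not specify how the paper combines these quantities.

```python
import math
from collections import Counter

# Benford's Law: the expected probability of leading digit d is log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value):
    """Leading non-zero digit of an integer-valued quantity, or None for zero."""
    digits = str(abs(int(value))).lstrip("0")
    return int(digits[0]) if digits else None


def digit_distribution(values):
    """Empirical first-digit frequencies for digits 1..9 (zeros are skipped)."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("no non-zero values to analyse")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def mean_absolute_deviation(observed):
    """Average absolute gap between observed and Benford frequencies."""
    return sum(abs(observed[d] - BENFORD[d]) for d in range(1, 10)) / 9


def ks_statistic(observed):
    """Largest gap between the observed and Benford cumulative distributions."""
    cum_obs = cum_exp = gap = 0.0
    for d in range(1, 10):
        cum_obs += observed[d]
        cum_exp += BENFORD[d]
        gap = max(gap, abs(cum_obs - cum_exp))
    return gap


def kl_divergence(observed):
    """KL(observed || Benford); digits with zero observed mass contribute 0."""
    return sum(
        observed[d] * math.log(observed[d] / BENFORD[d])
        for d in range(1, 10)
        if observed[d] > 0
    )


def posterior_malicious(prior, p_evidence_if_malicious, p_evidence_if_benign):
    """Bayes' Theorem for a two-class (malicious vs benign) hypothesis."""
    numerator = prior * p_evidence_if_malicious
    return numerator / (numerator + (1 - prior) * p_evidence_if_benign)


# Hypothetical packet sizes, not taken from CICIDS2017.
packet_sizes = [120, 1500, 64, 1340, 210, 98, 1024, 175, 36, 1460]
observed = digit_distribution(packet_sizes)
scores = (
    mean_absolute_deviation(observed),
    ks_statistic(observed),
    kl_divergence(observed),
)
```

Larger values of any of the three scores indicate a first-digit distribution further from Benford's, which is the signal the abstract describes as a possible indicator of anomalous traffic.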


2024

Dvorak: A Browser Credential Dumping Malware

Authors
Areia, J; Santos, B; Antunes, M;

Publication
SECRYPT

Abstract
Memorising passwords poses a significant challenge for individuals, leading to the increasing adoption of password managers, particularly browser password managers. Despite their benefits to users’ daily routines, the use of these tools introduces new vulnerabilities to web and network security. This paper aims to investigate these vulnerabilities and analyse the security mechanisms of browser-based password managers integrated into Google Chrome, Microsoft Edge, Opera GX, Mozilla Firefox, and Brave. We developed and deployed a piece of malware, Dvorak, which is capable of extracting essential files from the browser’s password manager for subsequent decryption. To assess Dvorak’s functionality, we conducted a controlled security analysis across all the aforementioned browsers. Our findings reveal that the designed malware successfully retrieves all stored passwords from the tested browsers when no master password is used. However, the results differ when a master password is used. The browsers are compared based on the results obtained with the malware. The paper ends with recommendations for potential strategies to mitigate these security concerns.


2024

Uncovering Manipulated Files Using Mathematical Natural Laws

Authors
Fernandes, P; Ciardhuáin, SO; Antunes, M;

Publication
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2023, PT I

Abstract
The data exchange between different sectors of society has led to the development of electronic documents supported by different reading formats, namely the Portable Document Format (PDF). These documents have characteristics similar to those used in programming languages, allowing the incorporation of potentially malicious code, which makes them a vector for cyberattacks. Thus, detecting anomalies in digital documents, such as PDF files, has become crucial in several domains, such as finance, digital forensic analysis and law enforcement. Currently, detection methods are mostly based on machine learning and are characterised by being complex, slow and largely inefficient in detecting zero-day attacks. This paper proposes a model based on Benford's Law (BL) to uncover manipulated PDF documents by analysing potential anomalies in the first digit extracted from the PDF document's characteristics. The proposed model was evaluated using the CIC Evasive PDFMAL-2022 dataset, consisting of 1191 documents (278 benign and 918 malicious). To classify the PDF documents, based on BL, into malicious or benign documents, three statistical models were used in conjunction with the mean absolute deviation: the parametric Pearson and the non-parametric Spearman and Cramér-von Mises models. The results show a maximum F1 score of 87.63% in detecting malicious documents using Pearson's model, demonstrating the suitability and effectiveness of applying Benford's Law in detecting anomalies in digital documents to maintain the accuracy and integrity of information and to promote trust in systems and institutions.
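One way to picture the Pearson-based comparison described in the abstract is to correlate a document's observed first-digit frequencies with the frequencies Benford's Law predicts. The sketch below is an assumption-laden illustration: the numeric feature values, the function names, and the idea of flagging low correlations are hypothetical, and the abstract does not state which PDF characteristics or decision thresholds the paper uses.

```python
import math

# Benford's expected frequencies for leading digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit_frequencies(values):
    """Observed frequencies of leading digits 1..9 among non-zero integers."""
    counts = [0] * 9
    for value in values:
        digits = str(abs(int(value))).lstrip("0")
        if digits:
            counts[int(digits[0]) - 1] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no non-zero values to analyse")
    return [c / total for c in counts]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def benford_correlation(values):
    """Correlation between observed first-digit frequencies and Benford's Law."""
    return pearson(first_digit_frequencies(values), BENFORD)


# Hypothetical numeric characteristics of two documents (e.g. object counts,
# stream lengths); these values are invented for illustration.
typical_doc = [1, 12, 104, 18, 2, 27, 1350, 3, 110, 45, 16, 210]
unusual_doc = [9, 98, 905, 87, 9, 990, 8, 96, 89, 912]

typical_score = benford_correlation(typical_doc)
unusual_score = benford_correlation(unusual_doc)
```

A correlation close to 1 means the document's first digits follow the Benford pattern, while a low or negative value marks a departure from it that a classifier could treat as suspicious.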
