Publications

Publications by Mário João Antunes

2025

An Automated Repository for the Efficient Management of Complex Documentation

Authors
Frade, J; Antunes, M;

Publication
INFORMATION

Abstract
The accelerating digitalization of the public and private sectors has made information technologies (IT) indispensable in modern life. As services shift to digital platforms and technologies expand across industries, the complexity of legal, regulatory, and technical requirement documentation is growing rapidly. This increase presents significant challenges in managing, gathering, and analyzing documents, as their dispersion across various repositories and formats hinders accessibility and efficient processing. This paper presents the development of an automated repository designed to streamline the collection, classification, and analysis of cybersecurity-related documents. By harnessing the capabilities of natural language processing (NLP) models-specifically Generative Pre-Trained Transformer (GPT) technologies-the system automates text ingestion, extraction, and summarization, providing users with visual tools and organized insights into large volumes of data. The repository facilitates the efficient management of evolving cybersecurity documentation, addressing issues of accessibility, complexity, and time constraints. This paper explores the potential applications of NLP in cybersecurity documentation management and highlights the advantages of integrating automated repositories equipped with visualization and search tools. By focusing on legal documents and technical guidelines from Portugal and the European Union (EU), this applied research seeks to enhance cybersecurity governance, streamline document retrieval, and deliver actionable insights to professionals. Ultimately, the goal is to develop a scalable, adaptable platform capable of extending beyond cybersecurity to serve other industries that rely on the effective management of complex documentation.

CloseRead Abstract

2025

Multi-Class Intrusion Detection in Internet of Vehicles: Optimizing Machine Learning Models on Imbalanced Data

Authors
Palma, A; Antunes, M; Bernardino, J; Alves, A;

Publication
FUTURE INTERNET

Abstract
The Internet of Vehicles (IoV) presents complex cybersecurity challenges, particularly against Denial-of-Service (DoS) and spoofing attacks targeting the Controller Area Network (CAN) bus. This study leverages the CICIoV2024 dataset, comprising six distinct classes of benign traffic and various types of attacks, to evaluate advanced machine learning techniques for instrusion detection systems (IDS). The models XGBoost, Random Forest, AdaBoost, Extra Trees, Logistic Regression, and Deep Neural Network were tested under realistic, imbalanced data conditions, ensuring that the evaluation reflects real-world scenarios where benign traffic dominates. Using hyperparameter optimization with Optuna, we achieved significant improvements in detection accuracy and robustness. Ensemble methods such as XGBoost and Random Forest consistently demonstrated superior performance, achieving perfect accuracy and macro-average F1-scores, even when detecting minority attack classes, in contrast to previous results for the CICIoV2024 dataset. The integration of optimized hyperparameter tuning and a broader methodological scope culminated in an IDS framework capable of addressing diverse attack scenarios with exceptional precision.

CloseRead Abstract

2025

Distance-based feature selection using Benford's law for malware detection

Authors
Fernandes, P; Ciardhuáin, SO; Antunes, M;

Publication
COMPUTERS & SECURITY

Abstract
Detecting malware in computer networks and data streams from Android devices remains a critical challenge for cybersecurity researchers. While machine learning and deep learning techniques have shown promising results, these approaches often require large volumes of labelled data, offer limited interpretability, and struggle to adapt to sophisticated threats such as zero-day attacks. Moreover, their high computational requirements restrict their applicability in resource-constrained environments. This research proposes an innovative approach that advances the state of the art by offering practical solutions for dynamic and data-limited security scenarios. By integrating natural statistical laws, particularly Benford's law, with dissimilarity functions, a lightweight, fast, and scalable model is developed that eliminates the need for extensive training and large labelled datasets while improving resilience to data imbalance and scalability for large-scale cybersecurity applications. Although Benford's law has demonstrated potential in anomaly detection, its effectiveness is limited by the difficulty of selecting relevant features. To overcome this, the study combines Benford's law with several distance functions, including Median Absolute Deviation, Kullback-Leibler divergence, Euclidean distance, and Pearson correlation, enabling statistically grounded feature selection. Additional metrics, such as the Kolmogorov test, Jensen-Shannon divergence, and Z statistics, were used for model validation. This approach quantifies discrepancies between expected and observed distributions, addressing classic feature selection challenges like redundancy and imbalance. Validated on both balanced and unbalanced datasets, the model achieved strong results: 88.30% accuracy and 85.08% F1-score in the balanced set, 92.75% accuracy and 95.29% F1-score in the unbalanced set. The integration of Benford's law with distance functions significantly reduced false positives and negatives. Compared to traditional Machine Learning methods, which typically require extensive training and large datasets to achieve F1 scores between 92% and 99%, the proposed approach delivers competitive performance while enhancing computational efficiency, robustness, and interpretability. This balance makes it a practical and scalable alternative for real-time or resource-constrained cybersecurity environments.

CloseRead Abstract

2025

Enhancing IoMT Security by Using Benford's Law and Distance Functions

Authors
Fernandes, P; Ciardhuáin, SO; Antunes, M;

Publication
Pattern Recognition and Image Analysis - 12th Iberian Conference, IbPRIA 2025, Coimbra, Portugal, June 30 - July 3, 2025, Proceedings, Part I

Abstract
The increasing connectivity of Internet of Medical Things (IoMT) devices has accentuated their susceptibility to cyberattacks. The sensitive data they handle makes them prime targets for information theft and extortion, while outdated and insecure communication protocols further elevate security risks. This paper presents a lightweight and innovative approach that combines Benford’s law with statistical distance functions to detect attacks in IoMT devices. The methodology uses Benford’s law to analyze digit frequency and classify IoMT devices traffic as benign or malicious, regardless of attack type. It employs distance-based statistical functions like Jensen-Shannon divergence, Kullback-Leibler divergence, Pearson correlation, and the Kolmogorov test to detect anomalies. Experimental validation was conducted on the CIC-IoMT-2024 benchmark dataset, comprising 45 features and multiple attack types. The best performance was achieved with the Kolmogorov test (a=0.01), particularly in DoS ICMP attacks, yielding a precision of 99.24%, a recall of 98.73%, an F1 score of 98.97%, and an accuracy of 97.81%. Jensen-Shannon divergence also performed robustly in detecting SYN-based attacks, demonstrating strong detection with minimal computational cost. These findings confirm that Benford’s law, when combined with well-chosen statistical distances, offers a viable and efficient alternative to machine learning models for anomaly detection in constrained environments like IoMT. © 2025 Elsevier B.V., All rights reserved.

CloseRead Abstract

2025

An Optimized Multi-class Classification for Industrial Control Systems

Authors
Palma, A; Antunes, M; Alves, A;

Publication
Pattern Recognition and Image Analysis - 12th Iberian Conference, IbPRIA 2025, Coimbra, Portugal, June 30 - July 3, 2025, Proceedings, Part I

Abstract
Ensuring the security of Industrial Control Systems (ICS) is increasingly critical due to increasing connectivity and cyber threats. Traditional security measures often fail to detect evolving attacks, necessitating more effective solutions. This paper evaluates machine learning (ML) methods for ICS cybersecurity, using the ICS-Flow dataset and Optuna for hyperparameter tuning. The selected models, namely Random Forest (RF), AdaBoost, XGBoost, Deep Neural Networks, Artificial Neural Networks, ExtraTrees (ET), and Logistic Regression, are assessed using macro-averaged F1-score to handle class imbalance. Experimental results demonstrate that ensemble-based methods (RF, XGBoost, and ET) offer the highest overall detection performance, particularly in identifying commonly occurring attack types. However, minority classes, such as IP-Scan, remain difficult to detect accurately, indicating that hyperparameter tuning alone is insufficient to fully deal with imbalanced ICS data. These findings highlight the importance of complementary measures, such as focused feature selection, to enhance classification capabilities and protect industrial networks against a wider array of threats. © 2025 Elsevier B.V., All rights reserved.

CloseRead Abstract