Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
Publications

Publications by CRACS

2025

Program Synthesis Using Inductive Logic Programming for the Abstraction and Reasoning Corpus

Authors
Rocha, FM; Dutra, I; Costa, VS;

Publication
INTELLIGENZA ARTIFICIALE

Abstract
The Abstraction and Reasoning Corpus (ARC-AGI) is an Artificial General Intelligence benchmark that is currently unsolved. It demands strong generalization and reasoning capabilities, which are known to be weaknesses of Neural Network based systems. In this work, we propose a Program synthesis system to solve it, which casts an ARC-AGI task as a sequence of Inductive Logic Programming tasks. We have implemented a simple Domain Specific Language that corresponds to a small set of object-centric abstractions relevant to the benchmark. This allows for adequate representations to be used to create logic programs, which provide reasoning capabilities to our system. When solving each task, the proposed system can generalize from a few training pairs of input-output grids. The obtained logic programs are able to generate objects present in the output grids and can transform the test input grid into the output grid solution. We developed our system based on some ARC-AGI tasks that do not require more than the small number of primitives that we implemented and showed that our system can solve unseen tasks that require different reasoning.

2025

Regular Typed Unification

Authors
Barbosa, J; Florido, M; Costa, VS;

Publication
ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE

Abstract
Here we define a new unification algorithm for terms interpreted in semantic domains denoted by a subclass of regular types here called deterministic regular types. This reflects our intention not to handle the semantic universe as a homogeneous collection of values, but instead, to partition it in a way that is similar to data types in programming languages. We first define the new unification algorithm which is based on constraint generation and constraint solving, and then prove its main properties: termination, soundness, and completeness with respect to the semantics. Finally, we discuss how to apply this algorithm to a dynamically typed version of Prolog.

2025

Distance-based feature selection using Benford's law for malware detection

Authors
Fernandes, P; Ciardhuáin, SO; Antunes, M;

Publication
COMPUTERS & SECURITY

Abstract
Detecting malware in computer networks and data streams from Android devices remains a critical challenge for cybersecurity researchers. While machine learning and deep learning techniques have shown promising results, these approaches often require large volumes of labelled data, offer limited interpretability, and struggle to adapt to sophisticated threats such as zero-day attacks. Moreover, their high computational requirements restrict their applicability in resource-constrained environments. This research proposes an innovative approach that advances the state of the art by offering practical solutions for dynamic and data-limited security scenarios. By integrating natural statistical laws, particularly Benford's law, with dissimilarity functions, a lightweight, fast, and scalable model is developed that eliminates the need for extensive training and large labelled datasets while improving resilience to data imbalance and scalability for large-scale cybersecurity applications. Although Benford's law has demonstrated potential in anomaly detection, its effectiveness is limited by the difficulty of selecting relevant features. To overcome this, the study combines Benford's law with several distance functions, including Median Absolute Deviation, Kullback-Leibler divergence, Euclidean distance, and Pearson correlation, enabling statistically grounded feature selection. Additional metrics, such as the Kolmogorov test, Jensen-Shannon divergence, and Z statistics, were used for model validation. This approach quantifies discrepancies between expected and observed distributions, addressing classic feature selection challenges like redundancy and imbalance. Validated on both balanced and unbalanced datasets, the model achieved strong results: 88.30% accuracy and 85.08% F1-score in the balanced set, 92.75% accuracy and 95.29% F1-score in the unbalanced set. The integration of Benford's law with distance functions significantly reduced false positives and negatives. Compared to traditional Machine Learning methods, which typically require extensive training and large datasets to achieve F1 scores between 92% and 99%, the proposed approach delivers competitive performance while enhancing computational efficiency, robustness, and interpretability. This balance makes it a practical and scalable alternative for real-time or resource-constrained cybersecurity environments.

2025

Multi-Class Intrusion Detection in Internet of Vehicles: Optimizing Machine Learning Models on Imbalanced Data

Authors
Palma, A; Antunes, M; Bernardino, J; Alves, A;

Publication
FUTURE INTERNET

Abstract
The Internet of Vehicles (IoV) presents complex cybersecurity challenges, particularly against Denial-of-Service (DoS) and spoofing attacks targeting the Controller Area Network (CAN) bus. This study leverages the CICIoV2024 dataset, comprising six distinct classes of benign traffic and various types of attacks, to evaluate advanced machine learning techniques for instrusion detection systems (IDS). The models XGBoost, Random Forest, AdaBoost, Extra Trees, Logistic Regression, and Deep Neural Network were tested under realistic, imbalanced data conditions, ensuring that the evaluation reflects real-world scenarios where benign traffic dominates. Using hyperparameter optimization with Optuna, we achieved significant improvements in detection accuracy and robustness. Ensemble methods such as XGBoost and Random Forest consistently demonstrated superior performance, achieving perfect accuracy and macro-average F1-scores, even when detecting minority attack classes, in contrast to previous results for the CICIoV2024 dataset. The integration of optimized hyperparameter tuning and a broader methodological scope culminated in an IDS framework capable of addressing diverse attack scenarios with exceptional precision.

2025

An Automated Repository for the Efficient Management of Complex Documentation

Authors
Frade, J; Antunes, M;

Publication
INFORMATION

Abstract
The accelerating digitalization of the public and private sectors has made information technologies (IT) indispensable in modern life. As services shift to digital platforms and technologies expand across industries, the complexity of legal, regulatory, and technical requirement documentation is growing rapidly. This increase presents significant challenges in managing, gathering, and analyzing documents, as their dispersion across various repositories and formats hinders accessibility and efficient processing. This paper presents the development of an automated repository designed to streamline the collection, classification, and analysis of cybersecurity-related documents. By harnessing the capabilities of natural language processing (NLP) models-specifically Generative Pre-Trained Transformer (GPT) technologies-the system automates text ingestion, extraction, and summarization, providing users with visual tools and organized insights into large volumes of data. The repository facilitates the efficient management of evolving cybersecurity documentation, addressing issues of accessibility, complexity, and time constraints. This paper explores the potential applications of NLP in cybersecurity documentation management and highlights the advantages of integrating automated repositories equipped with visualization and search tools. By focusing on legal documents and technical guidelines from Portugal and the European Union (EU), this applied research seeks to enhance cybersecurity governance, streamline document retrieval, and deliver actionable insights to professionals. Ultimately, the goal is to develop a scalable, adaptable platform capable of extending beyond cybersecurity to serve other industries that rely on the effective management of complex documentation.

2025

A Risk Manager for Intrusion Tolerant Systems: Enhancing HAL 9000 With New Scoring and Data Sources

Authors
Freitas, T; Novo, C; Dutra, I; Soares, J; Correia, ME; Shariati, B; Martins, R;

Publication
SOFTWARE-PRACTICE & EXPERIENCE

Abstract
Background Intrusion Tolerant Systems (ITS) aim to maintain system security despite adversarial presence by limiting the impact of successful attacks. Current ITS risk managers rely heavily on public databases like NVD and Exploit DB, which suffer from long delays in vulnerability evaluation, reducing system responsiveness.Objective This work extends the HAL 9000 Risk Manager to integrate additional real-time threat intelligence sources and employ machine learning techniques to automatically predict and reassess vulnerability risk scores, addressing limitations of existing solutions.Methods A custom-built scraper collects diverse cybersecurity data from multiple Open Source Intelligence (OSINT) platforms, such as NVD, CVE, AlienVault OTX, and OSV. HAL 9000 uses machine learning models for CVE score prediction, vulnerability clustering through scalable algorithms, and reassessment incorporating exploit likelihood and patch availability to dynamically evaluate system configurations.Results Integration of newly scraped data significantly enhances the risk management capabilities, enabling faster detection and mitigation of emerging vulnerabilities with improved resilience and security. Experiments show HAL 9000 provides lower risk and more resilient configurations compared to prior methods while maintaining scalability and automation.Conclusions The proposed enhancements position HAL 9000 as a next-generation autonomous Risk Manager capable of effectively incorporating diverse intelligence sources and machine learning to improve ITS security posture in dynamic threat environments. Future work includes expanding data sources, addressing misinformation risks, and real-world deployments.

  • 4
  • 207