Publications - INESC TEC

2025

Exploring ChatGPT Efficiency in Automatic Test Generation for Python: A Comparative Analysis

Authors
Guerino, LR; Rizzo Vincenzi, AM;

Publication
SBQS

Abstract
Context: Large language models (LLMs) like ChatGPT have gained attention in automated software testing. This study evaluates ChatGPT-3.5-turbo’s ability to generate test sets for Python programs, comparing it with Pynguin and pre-existing test sets. Problem: Automated testing remains challenging for dynamically typed languages like Python, requiring adaptable tools for diverse code structures. Solution: We assessed ChatGPT-3.5-turbo’s test generation using different prompt configurations and temperature settings. Method: Using 40 Python programs, we generated Pytest-compliant tests via the OpenAI API, varying temperature settings (0.0 to 1.0). Tests were validated using Pytest, with coverage and mutation scores measured via Coverage, MutPy, and Cosmic-Ray. Pynguin-generated and pre-existing test sets served as baselines. Summary of Results: ChatGPT-3.5-turbo successfully generated valid tests for simpler programs, but averaged below 28% overall, with a low cost. Higher temperatures (0.5–1.0) improved results, and combining test cases from all temperatures introduced diversity into the LLM-generated test sets, making it possible to outperform both Pynguin and the pre-existing test sets in terms of decision coverage and mutation score.
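
The effect of combining test sets across temperatures can be illustrated with a minimal sketch. It is not the authors' tooling: the branch identifiers, the per-temperature coverage data, and the helper names `decision_coverage` and `combine_test_sets` are all hypothetical. The sketch only shows why the union of several diverse test sets can cover more decisions than any single one.

```python
# Hypothetical illustration: every name and number below is an assumption,
# not data or code from the study.

def decision_coverage(covered, total_branches):
    """Fraction of decision branches exercised by a test set."""
    return len(covered) / total_branches


def combine_test_sets(covered_by_temperature):
    """Union of the branches covered by each temperature's test set."""
    combined = set()
    for branches in covered_by_temperature.values():
        combined |= branches
    return combined


# Made-up branches covered by the tests generated at each temperature.
covered_by_temperature = {
    0.0: {"b1", "b2"},
    0.5: {"b2", "b3"},
    1.0: {"b1", "b4", "b5"},
}
TOTAL_BRANCHES = 8  # assumed number of decision branches in the program

per_temperature = {
    t: decision_coverage(c, TOTAL_BRANCHES)
    for t, c in covered_by_temperature.items()
}
combined_coverage = decision_coverage(
    combine_test_sets(covered_by_temperature), TOTAL_BRANCHES
)

print(per_temperature)
print(combined_coverage)
```

In this toy setting the combined set covers more branches than the best single temperature, because each test set reaches some branches the others miss.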

2025

A Conceptual Framework for AI-based Decision Systems in Critical Infrastructures

Authors
Leyli-Abadi M.; Bessa R.J.; Viebahn J.; Boos D.; Borst C.; Castagna A.; Chavarriaga R.; Hassouna M.; Lemetayer B.; Leto G.; Marot A.; Meddeb M.; Meyer M.; Schiaffonati V.; Schneider M.; Waefler T.; Yagoubi M.;

Publication
Conference Proceedings - IEEE International Conference on Systems, Man, and Cybernetics

Abstract
The interaction between humans and AI in safety-critical systems presents a unique set of challenges that existing frameworks address only partially. These challenges stem from the complex interplay of requirements for transparency, trust, and explainability, coupled with the necessity for robust and safe decision-making. A framework that holistically integrates human and AI capabilities while addressing these concerns is needed to bridge the critical gaps in designing, deploying, and maintaining safe and effective systems. This paper proposes a holistic conceptual framework for critical infrastructures by adopting an interdisciplinary approach. It integrates traditionally distinct fields such as mathematics, decision theory, computer science, philosophy, psychology, and cognitive engineering, and draws on specialized engineering domains, particularly energy, mobility, and aeronautics. Its flexibility is further demonstrated through a case study on power grid management.

2025

Active Attribute Inference Against Well-Generalized Models In Federated Learning

Authors
Gomes, C; Mendes, R; Vilela, JP;

Publication
2025 IEEE 10th European Symposium on Security and Privacy (EuroS&P)

Abstract
Federated Learning (FL), a distributed learning mechanism where data is decentralized across multiple devices and periodic gradient updates are shared, is an alternative to centralized training that aims to address privacy issues arising from raw data sharing. Despite the expected privacy benefits, prior research has shown that overfitting can cause privacy leakage, which passive attacks exploit. However, limited attention has been given to understanding and defending against active threats that increase model leakage by interfering with the training process instead of relying on overfitting. This work addresses this gap by introducing Active Attribute Inference (AAI*), a novel active attack that encodes sensitive attribute information by making any targeted training sample leave a distinguishable footprint on the gradient of maliciously modified neurons [8]. Results on two real-world datasets show that it is possible to encode sensitive information successfully while incurring only a small error in terms of neuron activation. More importantly, in a practical scenario, AAI* can improve upon a state-of-the-art approach by achieving a restricted ROC AUC of over 90%, thereby increasing model leakage. To defend against such active attacks, this work introduces several attack detection strategies tailored to different levels of defender knowledge, including the novel White-box Attack Detection Mechanism (WADM*), which detects abnormal changes in the weight distribution, and two black-box strategies based on monitoring model performance. Results show that the detection rate can be 100% on both datasets. Remarkably, WADM* reduces any attack to random guessing while preserving model utility, offering significant improvements over existing defenses, particularly when clients are non-IID. By proposing active attacks against well-generalized models and effective countermeasures, this research contributes to a better understanding of privacy in FL systems.

2025

Fusion Strategies for Breast Cancer Characterization Using Traditional and Deep Learning Models

Authors
Lima, PV; Cardoso, JS; Oliveira, HP;

Publication
BIBE

Abstract
Breast cancer remains one of the most prevalent and deadly cancers worldwide, making accurate evaluation of molecular markers important for effective disease management. Biomarkers such as ER, PR, and HER2 are typically assessed because they help inform prognosis and guide treatment decisions. Predicting these characteristics from imaging can support earlier clinical intervention, reduce reliance on invasive procedures, and contribute to more personalized care. While radiomics and deep learning approaches have demonstrated potential, comprehensive comparisons across these methods are still limited. This study evaluated handcrafted features, deep features, and end-to-end deep learning models for predicting ER, PR, and HER2 status from DCE-MRI. Each feature type was first assessed individually and then combined using early and late fusion. Handcrafted and deep features were processed through a pipeline that included resampling, dimensionality reduction, and model selection, while end-to-end models were trained using different initialization strategies and loss functions. The best models achieved AUCs of 0.659 for ER, 0.679 for PR, and 0.686 for HER2. Although late fusion generally improved performance, bias toward the majority classes persisted. Overall, the results suggest that combining different modeling strategies may enhance robustness in breast cancer characterization.
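
The difference between early and late fusion can be sketched as follows. This is a generic illustration under stated assumptions, not the pipeline used in the study: the function names, the feature values, the probabilities, and the equal default weights are all hypothetical. Early fusion joins the feature vectors before a single model is trained, whereas late fusion combines the probabilities that separately trained models predict.

```python
# Hypothetical illustration of the two fusion strategies; the names and
# numbers are assumptions, not taken from the paper.

def early_fusion(handcrafted, deep):
    """Concatenate feature vectors so one model sees both feature types."""
    return list(handcrafted) + list(deep)


def late_fusion(probabilities, weights=None):
    """Weighted average of the probabilities predicted by separate models."""
    if weights is None:
        weights = [1.0] * len(probabilities)  # assumed default: equal weights
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total


# Made-up feature vectors for one case.
fused_features = early_fusion([0.2, 1.5], [0.9, -0.3, 0.4])

# Made-up positive-class probabilities from three separately trained models.
fused_probability = late_fusion([0.5, 0.75, 1.0])

print(fused_features)
print(fused_probability)
```

Averaging is only one possible late-fusion rule; voting or a learned meta-classifier are common alternatives, and the abstract does not state which rule was used.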

2025

Arbutus Berry Detection and Classification for Harvesting

Authors
Pereira, J; Baltazar, AR; Pinheiro, I; da Silva, DQ; Frazao, ML; Neves Dos Santos, FN;

Publication
IEEE International Conference on Emerging Technologies and Factory Automation, ETFA

Abstract
Automated fruit harvesting systems rely heavily on accurate visual perception, particularly for crops such as the Arbutus tree (Arbutus unedo), which holds both ecological and economic significance. However, this species poses considerable challenges for computer vision due to its dense foliage and the morphological variability of its berries across different ripening stages. Despite its importance, the Arbutus tree remains under-explored in the context of precision agriculture and robotic harvesting. This study addresses that gap by evaluating a computer vision-based approach to detect and classify Arbutus berries into three ripeness categories: green, yellow-orange, and red. A significant contribution of this work is the release of two fully annotated open-access datasets, Arbutus Berry Detection Dataset and Arbutus Berry Ripeness Level Detection Dataset, developed through a structured manual labeling process. Additionally, we benchmarked four YOLO architectures - YOLOv8n, YOLOv9t, YOLOv10n, and YOLO11n - as well as the RT-DETR models, using these datasets. Among these, RT-DETR-L demonstrated the most consistent performance in terms of precision, recall, and generalization, outperforming the lighter YOLO models in both speed and accuracy. This highlights RT-DETR's strong potential for deployment in real-time automated harvesting systems, where robust detection and efficient inference are critical.

2025

AR/VR Digital Twin for simulation and data collection of robotic environments

Authors
Martins, JG; Nutonen, K; Costa, P; Kuts, V; Otto, T; Sousa, A; Petry, MR;

Publication
2025 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC)

Abstract
Digital twins enable real-time modeling, simulation, and monitoring of complex systems, driving advancements in automation, robotics, and industrial applications. This study presents a large-scale digital twin-testing facility for evaluating mobile robots and pilot robotic systems in a research laboratory environment. The platform integrates high-fidelity physical and environmental models, providing a controlled yet dynamic setting for analyzing robotic behavior. A key feature of the system is its comprehensive data collection framework, capturing critical parameters such as position, orientation, and velocity, which can be leveraged for machine learning, performance optimization, and decision-making. The facility also supports the simulation of discrete operational systems, using predictive modeling to bridge informational gaps when real-time data updates are unavailable. The digital twin was validated through a matrix manufacturing system simulation, with an Augmented Reality (AR) interface on the HoloLens 2 to overlay digital information onto mobile platform controllers, enhancing situational awareness. The main contributions include a digital twin framework for deploying data-driven robotic systems and three key AR/VR integration optimization methods. Demonstrated in a laboratory setting, the system is a versatile tool for research and industrial applications, fostering insights into robotic automation and digital twin scalability while reducing costs and risks associated with real-world testing.
