Publications - INESC TEC

Publications by HumanISE

2025

Do We Need 3D to See? Impact of Dimensionality of the Virtual Environment on Attention

Authors
Matos, T; Mendes, D; Jacob, J; de Sousa, AA; Rodrigues, R;

Publication
2025 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES ABSTRACTS AND WORKSHOPS, VRW

Abstract
Virtual Reality allows users to experience realistic environments in an immersive and controlled manner, particularly beneficial for contexts where the real scenario is not easily or safely accessible. The choice between 360 content and 3D models impacts outcomes such as perceived quality and computational cost, but can also affect user attention. This study explores how attention manifests in VR using a 3D model or a 360 image rendered from said model during visuospatial tasks. User tests revealed no significant difference in workload or cybersickness between these types of content, while sense of presence was reportedly higher in the 3D environment.

2025

Acceptance Test Generation with Large Language Models: An Industrial Case Study

Authors
Ferreira, M; Viegas, L; Faria, JP; Lima, B;

Publication
2025 IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATION OF SOFTWARE TEST, AST

Abstract
Large language model (LLM)-powered assistants are increasingly used for generating program code and unit tests, but their application in acceptance testing remains underexplored. To help address this gap, this paper explores the use of LLMs for generating executable acceptance tests for web applications through a two-step process: (i) generating acceptance test scenarios in natural language (in Gherkin) from user stories, and (ii) converting these scenarios into executable test scripts (in Cypress), given the HTML code of the pages under test. This two-step approach supports acceptance test-driven development, enhances tester control, and improves test quality. The two steps were implemented in the AutoUAT and Test Flow tools, respectively, powered by GPT-4 Turbo, integrated into a partner company's workflow, and evaluated on real-world projects. The users found the acceptance test scenarios generated by AutoUAT helpful 95% of the time, even revealing previously overlooked cases. Regarding Test Flow, 92% of the acceptance test cases it generated were considered helpful: 60% were usable as generated, 8% required minor fixes, and 24% needed to be regenerated with additional inputs; the remaining 8% were discarded due to major issues. These results suggest that LLMs can, in fact, help improve the acceptance test process, given appropriate tooling and supervision.

2025

LLM Prompt Engineering for Automated White-Box Integration Test Generation in REST APIs

Authors
Rincon, AM; Vincenzi, AMR; Faria, JP;

Publication
2025 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE TESTING, VERIFICATION AND VALIDATION WORKSHOPS, ICSTW

Abstract
This study explores prompt engineering for automated white-box integration testing of RESTful APIs using Large Language Models (LLMs). Four versions of prompts were designed and tested across three OpenAI models (GPT-3.5 Turbo, GPT-4 Turbo, and GPT-4o) to assess their impact on code coverage, token consumption, execution time, and financial cost. The results indicate that different prompt versions, especially with more advanced models, achieved up to 90% coverage, although at higher costs. Additionally, combining test sets from different models increased coverage, reaching 96% in some cases. We also compared the results with EvoMaster, a specialized tool for generating tests for REST APIs, where LLM-generated tests achieved comparable or higher coverage in the benchmark projects. Despite higher execution costs, LLMs demonstrated superior adaptability and flexibility in test generation.

2025

Automated Social Media Feedback Analysis for Software Requirements Elicitation: A Case Study in the Streaming Industry

Authors
Silva, M; Faria, JP;

Publication
ENASE

Abstract

2025

Automatic Generation of Loop Invariants in Dafny with Large Language Models

Authors
Faria, JP; Trigo, E; Abreu, R;

Publication
FUNDAMENTALS OF SOFTWARE ENGINEERING, FSEN 2025

Abstract
Recent verification tools aim to make formal verification more accessible for software engineers by automating most of the verification process. However, the manual work and expertise required to write verification helper code, such as loop invariants and auxiliary lemmas and assertions, remains a barrier. This paper explores the use of Large Language Models (LLMs) to automate the generation of loop invariants for programs in Dafny. We tested the approach on a curated dataset of 100 programs in Dafny involving arrays, strings, and numeric types. Using a multimodel approach that combines GPT-4o and Claude 3.5 Sonnet, correct loop invariants (passing the Dafny verifier) were generated on the first attempt for 92% of the programs, and in at most five attempts for 95% of the programs. Additionally, we developed an extension to the Dafny plugin for Visual Studio Code to incorporate automatic loop invariant generation into the IDE. Our work stands out from related approaches by handling a broader class of problems and offering IDE integration.

2025

Streamlining Acceptance Test Generation for Mobile Applications Through Large Language Models: An Industrial Case Study

Authors
Fonseca, PL; Lima, B; Faria, JP;

Publication
2025 40TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE

Abstract
Mobile acceptance testing remains a bottleneck in modern software development, particularly for cross-platform mobile development using frameworks like Flutter. While developers increasingly rely on automated testing tools, creating and maintaining acceptance test artifacts still demands significant manual effort. To help tackle this issue, we introduce AToMIC, an automated framework leveraging specialized Large Language Models to generate Gherkin scenarios, Page Objects, and executable UI test scripts directly from requirements (JIRA tickets) and recent code changes. Applied to BMW's MyBMW app, covering 13 real-world issues in a 170+ screen codebase, AToMIC produced executable test artifacts in under five minutes per feature on standard hardware. The generated artifacts were of high quality: 93.3% of Gherkin scenarios were syntactically correct upon generation, 78.8% of Page Objects ran without manual edits, and 100% of generated UI tests executed successfully. In a survey, all practitioners reported time savings (often a full developer-day per feature) and strong confidence in adopting the approach. These results confirm AToMIC as a scalable, practical solution for streamlining acceptance test creation and maintenance in industrial mobile projects.