Details

  • Name

    Auri Vincenzi
  • Role

    Senior Researcher
  • Since

    1st December 2025
Publications

2026

Designing Blockchain-Based Systems with Clean Architecture

Authors
Ricardo, FSD; Valente, FJ; de Camargo, VV; Vincenzi, AMR;

Publication
Lecture Notes in Networks and Systems - Proceedings of 20th Iberian Conference on Information Systems and Technologies (CISTI 2025)

Abstract

2025

LLM Prompt Engineering for Automated White-Box Integration Test Generation in REST APIs

Authors
Rincon, AM; Vincenzi, AMR; Faria, JP;

Publication
2025 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE TESTING, VERIFICATION AND VALIDATION WORKSHOPS, ICSTW

Abstract
This study explores prompt engineering for automated white-box integration testing of RESTful APIs using Large Language Models (LLMs). Four versions of prompts were designed and tested across three OpenAI models (GPT-3.5 Turbo, GPT-4 Turbo, and GPT-4o) to assess their impact on code coverage, token consumption, execution time, and financial cost. The results indicate that different prompt versions, especially with more advanced models, achieved up to 90% coverage, although at higher costs. Additionally, combining test sets from different models increased coverage, reaching 96% in some cases. We also compared the results with EvoMaster, a specialized tool for generating tests for REST APIs, where LLM-generated tests achieved comparable or higher coverage in the benchmark projects. Despite higher execution costs, LLMs demonstrated superior adaptability and flexibility in test generation.
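
As a rough illustration of why combining test sets from different models can raise coverage, the sketch below treats each test set as the set of line numbers it executes and measures the coverage of their union. The suite names, line numbers, and the `combined_coverage` helper are hypothetical and are not taken from the study.

```python
def coverage(covered_lines, total_lines):
    """Fraction of the program's lines executed by a test set."""
    return len(covered_lines) / total_lines


def combined_coverage(suites, total_lines):
    """Coverage of the union of several test sets.

    `suites` maps a (hypothetical) suite name to the set of line numbers
    that suite executes.
    """
    union = set().union(*suites.values())
    return coverage(union, total_lines)


# Made-up data: a 10-line program and two test sets that overlap only partly.
suites = {
    "model_a": {1, 2, 3, 4, 5, 6},
    "model_b": {4, 5, 6, 7, 8, 9},
}

individual = {name: coverage(lines, 10) for name, lines in suites.items()}
combined = combined_coverage(suites, 10)
```

Each suite alone covers 6 of the 10 lines, while the union covers 9, because the two suites miss different lines.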

2025

METFORD - Mutation tEsTing Framework fOR anDroid

Authors
Vincenzi, AMR; Kuroishi, PH; Bispo, J; da Veiga, ARC; da Mata, DRC; Azevedo, FB; Paiva, ACR;

Publication
JOURNAL OF SYSTEMS AND SOFTWARE

Abstract
Mutation testing may be used to guide test case generation and as a technique to assess the quality of test suites. Despite being used frequently, mutation testing is not so commonly applied in the mobile world. One critical challenge in mutation testing is dealing with its computational cost. Generating mutants, running test cases over each mutant, and analyzing the results may require significant time and resources. This research aims to contribute to reducing Android mutation testing costs. It implements mutation testing operators (traditional and Android-specific) according to mutant schemata (implementing multiple mutants into a single code file). It also describes an Android mutation testing framework developed to execute test cases and determine mutation scores. Additional mutation operators can be implemented in JavaScript and easily integrated into the framework. The overall approach is validated through case studies showing that mutant schemata have advantages over the traditional mutation strategy (one file per mutant). The results show that mutant schemata outperform traditional mutation in all evaluated aspects with no additional cost: the approach takes 8.50% less time for mutant generation, requires 99.78% less disk space, and runs, on average, 6.45% faster than traditional mutation. Moreover, considering sustainability metrics, mutant schemata have an 8.18% smaller carbon footprint than the traditional strategy.

2025

Testing infrastructures to support mobile application testing: A systematic mapping study

Authors
Kuroishi, PH; Paiva, ACR; Maldonado, JC; Vincenzi, AMR;

Publication
INFORMATION AND SOFTWARE TECHNOLOGY

Abstract
Context: Testing activities are essential for the quality assurance of mobile applications under development. Despite its importance, some studies show that testing is not widely applied in mobile applications. Some characteristics of mobile devices and a varied market of mobile devices with different operating system versions lead to a highly fragmented mobile ecosystem. Thus, researchers put some effort into proposing different solutions to optimize mobile application testing. Objective: The main goal of this paper is to provide a categorization and classification of existing testing infrastructures to support mobile application testing. Methods: To this aim, the study provides a Systematic Mapping Study of 27 existing primary studies. Results: We present a new classification and categorization of existing types of testing infrastructure, the types of supported devices and operating systems, whether the testing infrastructure is available for usage or experimentation, and supported testing types and applications. Conclusion: Our findings show a need for mobile testing infrastructures that support multiple phases of the testing process. Moreover, we showed a need for testing infrastructure for context-aware applications and support for both emulators and real devices. Finally, we pinpoint the need to make the research available to the community whenever possible.

2025

Exploring ChatGPT Efficiency in Automatic Test Generation for Python: A Comparative Analysis

Authors
Guerino, LR; Rizzo Vincenzi, AM;

Publication
SBQS

Abstract
Context: Large language models (LLMs) like ChatGPT have gained attention in automated software testing. This study evaluates ChatGPT-3.5-turbo’s ability to generate test sets for Python programs, comparing it with Pynguin and pre-existing test sets. Problem: Automated testing remains challenging for dynamically typed languages like Python, requiring adaptable tools for diverse code structures. Solution: We assessed ChatGPT-3.5-turbo’s test generation using different prompt configurations and temperature settings. Method: Using 40 Python programs, we generated Pytest-compliant tests via the OpenAI API, varying temperature settings (0.0 to 1.0). Tests were validated using Pytest, with coverage and mutation scores measured via Coverage, MutPy, and Cosmic-Ray. Pynguin-generated and pre-existing test sets served as baselines. Summary of Results: ChatGPT-3.5-turbo successfully generated valid tests for simpler programs, but averaged below 28% overall, with a low cost. Higher temperatures (0.5–1.0) improved results, and combining test cases from all temperatures introduces diversity in the LLM-generated test sets, making it possible to outperform both Pynguin and pre-existing test sets in terms of decision coverage and mutation score.