Publications

Publications by HumanISE

2025

Exploring ChatGPT Efficiency in Automatic Test Generation for Python: A Comparative Analysis

Authors
Guerino, LR; Rizzo Vincenzi, AM;

Publication
SBQS

Abstract
Context: Large language models (LLMs) like ChatGPT have gained attention in automated software testing. This study evaluates ChatGPT-3.5-turbo’s ability to generate test sets for Python programs, comparing it with Pynguin and pre-existing test sets. Problem: Automated testing remains challenging for dynamically typed languages like Python, requiring adaptable tools for diverse code structures. Solution: We assessed ChatGPT-3.5-turbo’s test generation using different prompt configurations and temperature settings. Method: Using 40 Python programs, we generated Pytest-compliant tests via the OpenAI API, varying the temperature from 0.0 to 1.0. Tests were validated using Pytest, with coverage and mutation scores measured via Coverage, MutPy, and Cosmic-Ray. Pynguin-generated and pre-existing test sets served as baselines. Summary of Results: ChatGPT-3.5-turbo successfully generated valid tests for simpler programs, but averaged below 28% overall, at low cost. Higher temperatures (0.5–1.0) improved results, and combining test cases from all temperatures introduced diversity into the LLM-generated test sets, making it possible to outperform both Pynguin and pre-existing test sets in terms of decision coverage and mutation score.
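For readers unfamiliar with the term, a "Pytest-compliant" test is roughly a module-level function named `test_*` that uses plain `assert` statements. The sketch below is a generic illustration only; the function `absolute_value` is hypothetical and is not one of the 40 programs or a test generated in the study.

```python
# Hypothetical function under test (illustrative only, not from the study).
def absolute_value(x: int) -> int:
    """Return the absolute value of x."""
    return -x if x < 0 else x


# Pytest collects module-level functions whose names start with "test_"
# and reports a failing plain `assert` as a test failure.
def test_absolute_value_negative():
    assert absolute_value(-5) == 5


def test_absolute_value_non_negative():
    assert absolute_value(0) == 0
    assert absolute_value(7) == 7
```

Running `pytest` on a file with this content collects both tests; together they make the `x < 0` decision evaluate to both true and false, which is what decision coverage measures.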

2025

Automated Generation of End-to-End Web Test Cases via a Generic AI Agent: A Comparative Study of DeepSeek V3 and Claude Sonnet 4

Authors
Monteiro, CEO; Guerino, LR; Fernandes, GF; Pereira, MH; Souza-Zinader, JPd; Braga, RD; Pocivi, VCB; Vincenzi, AMR;

Publication
Proceedings of the 31st Brazilian Symposium on Multimedia and the Web (WebMedia 2025)

Abstract
Web applications are widespread and can, in theory, be accessed from anywhere using a web browser on a computer or smartphone. Mainly because of the diversity of web browsers and of frameworks available for developing web application interfaces, testing such applications is a challenging task. With the advent of large language models (LLMs), several works use them to automate software engineering tasks, including test case generation, although this work has focused mainly on unit testing. More recently, Generic Artificial Intelligence Agents have emerged: tools that use LLMs and can also run additional tools, such as cloning repositories, navigating websites, and compiling programs. In this work, which is part of a research and development project, we evaluate a specific Generic AI Agent Assistant, Suna, regarding its capability to navigate web applications and create fully automated end-to-end test cases using Selenium WebDriver and the JUnit 5 framework. Results show that, across a set of nine websites, in overall end-to-end test case generation, Suna configured with DeepSeek V3 produced 165 successful test cases out of 481 generated tests, a success rate of 34.3%. By contrast, Suna configured with Claude Sonnet 4 produced 336 successful test cases out of 479 generated tests, a success rate of 70.1%, a notable result given the complexity of creating end-to-end tests. In terms of cost, we used one free and one paid LLM. The paid model generates successful test cases at an average price of $0.15 per test case.
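The success rates reported above are simply successful tests divided by generated tests. The short check below recomputes them from the counts given in the abstract; the dictionary and the `success_rate` helper are illustrative names, not part of the study's tooling.

```python
# Counts reported in the abstract (nine websites, Suna agent).
results = {
    "Suna + DeepSeek V3": (165, 481),
    "Suna + Claude Sonnet 4": (336, 479),
}


def success_rate(successful: int, generated: int) -> float:
    """Percentage of generated tests that succeeded, rounded to one decimal."""
    return round(100 * successful / generated, 1)


for config, (ok, total) in results.items():
    print(f"{config}: {ok}/{total} = {success_rate(ok, total)}%")
# Prints:
# Suna + DeepSeek V3: 165/481 = 34.3%
# Suna + Claude Sonnet 4: 336/479 = 70.1%
```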

2025

Exploring Documentation Strategies for NFR in Agile Software Development

Authors
Moreira, I; Adolfo, LB; Melegati, J; Choma, J; Guerra, E; Zaina, L;

Publication
XP

Abstract
Companies adopt agile methodologies for various reasons, primarily because of their adaptability to change and to evolving business demands. In this context, addressing non-functional requirements (NFRs) may not always be a priority and can present challenges for agile teams. Agile methods and tools focus on User Stories and often offer no explicit alternatives for documenting NFRs. In this research, we performed a survey to explore five different strategies for documenting NFRs, to identify which best fits each type of quality attribute, and to understand the strengths and drawbacks of each one. The participants considered certain strategies more or less suitable for specifying different types of quality attributes. For instance, while Story Labeling was rarely recommended for security requirements, Story Sub-sections and Verification Rules were highly recommended for this kind of quality attribute. We also evaluated the strategies against several factors, such as the level of detail and requirement duplication. As a practical implication, the results of this work can guide agile development teams in choosing the most suitable alternative for documenting each NFR.

2025

Exploratory Test-Driven Development Study with ChatGPT in Different Scenarios

Authors
Pancher, JC; Melegati, J; Guerra, EM;

Publication
XP

Abstract
Generative AI has been rapidly adopted by the software development industry in various ways, offering innovative approaches to transforming requirements into working software. Combining Generative AI with Test-Driven Development (TDD) offers a creative way to accelerate this transformation. However, questions remain about ChatGPT’s readiness for this challenge, including the techniques and best practices required for success and the scenarios in which this approach can consistently deliver results. To explore these questions, we designed a study in which a group of master’s students performed programming assignments using TDD, first independently and then with the support of ChatGPT. The three assignments represent distinct scenarios: mathematical calculations (function), text processing (class), and system integration (class with dependencies). We performed a qualitative analysis of the submitted code and reports, identifying key strategies that significantly influence success rates, such as providing contextual information, separating instructions across prompts following an iterative process, and assisting the AI in fixing errors. Among the scenarios, the integration task achieved the highest performance. This study highlights the potential of leveraging Generative AI in TDD for software development and presents a list of effective strategies to maximize its impact. By identifying strategies to apply and pitfalls to avoid, this research marks a step toward establishing best practices for integrating Generative AI with TDD in software engineering.

2025

Leveraging Multi-Task Learning to Improve the Detection of SATD and Vulnerability

Authors
Russo, B; Melegati, J; Mock, M;

Publication
ICPC

Abstract

2025

Applying a Prompt Pattern Sequence for Decision-Making in Microservices Architectures

Authors
Maranhão Jr., JJ; Melegati, J; Guerra, E;

Publication
ESOCC

Abstract