Publications

Publications by HumanISE

2025

Alloy Repair Hint Generation Based on Historical Data

Authors
Barros, A; Neto, H; Cunha, A; Macedo, N; Paiva, ACR;

Publication
FORMAL METHODS, PT II, FM 2024

Abstract
Platforms to support novices learning to program are often accompanied by automated next-step hints that guide them towards correct solutions. Many of those approaches are data-driven, building on historical data to generate higher quality hints. Formal specifications are increasingly relevant in software engineering activities, but very little support exists to help novices while learning. Alloy is a formal specification language often used in courses on formal software development methods, and a platform, Alloy4Fun, has been proposed to support autonomous learning. While non-data-driven specification repair techniques have been proposed for Alloy that could be leveraged to generate next-step hints, no data-driven hint generation approach has been proposed so far. This paper presents the first data-driven hint generation technique for Alloy and its implementation as an extension to Alloy4Fun, being based on the data collected by that platform. This historical data is processed into graphs that capture past students' progress while solving specification challenges. Hint generation can be customized with policies that take into consideration diverse factors, such as the popularity of paths in those graphs successfully traversed by previous students. Our evaluation shows that the performance of this new technique is competitive with non-data-driven repair techniques. To assess the quality of the hints, and help select the most appropriate hint generation policy, we conducted a survey with experienced Alloy instructors.
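The graph-based idea in this abstract can be illustrated with a small sketch. It is not the Alloy4Fun extension itself: the session format, the function names (`build_graph`, `reaches_goal`, `next_step_hint`), and the single "most popular successful edge" policy are all assumptions made for illustration, and submissions are stood in for by plain strings.

```python
from collections import defaultdict

def build_graph(sessions):
    """Count transitions between consecutive submissions across past sessions.

    `sessions` is a hypothetical format: one list of submission strings per student.
    """
    edges = defaultdict(lambda: defaultdict(int))
    for steps in sessions:
        for src, dst in zip(steps, steps[1:]):
            if src != dst:
                edges[src][dst] += 1
    return edges

def reaches_goal(edges, node, goals, seen=None):
    """Depth-first search: can a correct submission be reached from `node`?"""
    seen = set() if seen is None else seen
    if node in goals:
        return True
    if node in seen:
        return False
    seen.add(node)
    return any(reaches_goal(edges, nxt, goals, seen) for nxt in edges.get(node, {}))

def next_step_hint(edges, current, goals):
    """Assumed policy: the most popular next step that still leads to a solution."""
    candidates = [
        (count, nxt)
        for nxt, count in edges.get(current, {}).items()
        if reaches_goal(edges, nxt, goals)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]

# Toy history: "ok" stands for a correct specification.
sessions = [
    ["a", "b", "ok"],
    ["a", "b", "ok"],
    ["a", "c"],
    ["a", "c"],
    ["a", "c"],
    ["a", "d", "ok"],
]
graph = build_graph(sessions)
hint = next_step_hint(graph, "a", {"ok"})
```

In this toy history the step to "c" is the most frequent one, but no student who took it reached a correct submission, so the policy skips it in favour of the most popular step that did lead to a solution.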

2025

Testing infrastructures to support mobile application testing: A systematic mapping study

Authors
Kuroishi, PH; Paiva, ACR; Maldonado, JC; Vincenzi, AMR;

Publication
INFORMATION AND SOFTWARE TECHNOLOGY

Abstract
Context: Testing activities are essential for the quality assurance of mobile applications under development. Despite its importance, some studies show that testing is not widely applied in mobile applications. Some characteristics of mobile devices and a varied market of mobile devices with different operating system versions lead to a highly fragmented mobile ecosystem. Thus, researchers put some effort into proposing different solutions to optimize mobile application testing. Objective: The main goal of this paper is to provide a categorization and classification of existing testing infrastructures to support mobile application testing. Methods: To this aim, the study provides a Systematic Mapping Study of 27 existing primary studies. Results: We present a new classification and categorization of existing types of testing infrastructure, the types of supported devices and operating systems, whether the testing infrastructure is available for usage or experimentation, and supported testing types and applications. Conclusion: Our findings show a need for mobile testing infrastructures that support multiple phases of the testing process. Moreover, we showed a need for testing infrastructure for context-aware applications and support for both emulators and real devices. Finally, we pinpoint the need to make the research available to the community whenever possible.

2025

GAMFLEW: serious game to teach white-box testing

Authors
Silva, M; Paiva, ACR; Mendes, A;

Publication
SOFTWARE QUALITY JOURNAL

Abstract
Software testing plays a fundamental role in software engineering, involving the systematic evaluation of software to identify defects, errors, and vulnerabilities from the early stages of the development process. Education in software testing is essential for students and professionals, as it promotes quality and favours the construction of reliable software solutions. However, motivating students to learn software testing may be a challenge. To overcome this, educators may incorporate some strategies into the teaching and learning process, such as real-world examples, interactive learning, and gamification. Gamification aims to make learning software testing more engaging for students by creating a more enjoyable experience. One approach that has proven effective is to use serious games. This paper presents a novel serious game to teach white-box testing test case design techniques, named GAMFLEW (GAMe For LEarning White-box testing). It describes the design, game mechanics, and its implementation. It also presents a preliminary evaluation experiment with students to assess the usability, learnability, and perceived problems, among other aspects. The results obtained are encouraging.

2025

METFORD - Mutation tEsTing Framework fOR anDroid

Authors
Vincenzi, AMR; Kuroishi, PH; Bispo, J; da Veiga, ARC; da Mata, DRC; Azevedo, FB; Paiva, ACR;

Publication
JOURNAL OF SYSTEMS AND SOFTWARE

Abstract
Mutation testing may be used to guide test case generation and as a technique to assess the quality of test suites. Despite being used frequently, mutation testing is not so commonly applied in the mobile world. One critical challenge in mutation testing is dealing with its computational cost. Generating mutants, running test cases over each mutant, and analyzing the results may require significant time and resources. This research aims to contribute to reducing Android mutation testing costs. It implements mutation testing operators (traditional and Android-specific) according to mutant schemata (implementing multiple mutants into a single code file). It also describes an Android mutation testing framework developed to execute test cases and determine mutation scores. Additional mutation operators can be implemented in JavaScript and easily integrated into the framework. The overall approach is validated through case studies showing that mutant schemata have advantages over the traditional mutation strategy (one file per mutant). The results show mutant schemata overcome traditional mutation in all evaluated aspects with no additional cost: it takes 8.50% less time for mutant generation, requires 99.78% less disk space, and runs, on average, 6.45% faster than traditional mutation. Moreover, considering sustainability metrics, mutant schemata have an 8.18% smaller carbon footprint than the traditional strategy.
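The mutant schemata idea (several mutants embedded in one code file and selected at run time) can be sketched as follows. This is a generic Python illustration, not METFORD's implementation, which targets Android; the `ACTIVE_MUTANT` switch, the two arithmetic mutants, and the single test are assumptions chosen to keep the example small.

```python
import operator

# Hypothetical run-time switch: 0 selects the original program,
# any other value activates exactly one embedded mutant.
ACTIVE_MUTANT = 0

def price_with_tax(price, rate):
    """Original behaviour: price + price * rate."""
    # Mutant 1 replaces '+' with '-'; mutant 2 replaces '*' with '/'.
    add = {1: operator.sub}.get(ACTIVE_MUTANT, operator.add)
    mul = {2: operator.truediv}.get(ACTIVE_MUTANT, operator.mul)
    return add(price, mul(price, rate))

def run_tests():
    """A one-assertion test suite; returns True when all tests pass."""
    return price_with_tax(100, 0.5) == 150

def mutation_score(mutant_ids):
    """Fraction of mutants killed, i.e. for which the test suite fails."""
    global ACTIVE_MUTANT
    killed = 0
    for mutant_id in mutant_ids:
        ACTIVE_MUTANT = mutant_id
        try:
            if not run_tests():
                killed += 1
        finally:
            ACTIVE_MUTANT = 0  # always restore the original program
    return killed / len(mutant_ids)

score = mutation_score([1, 2])
```

Because all mutants live in the same file, the program is generated once and only the switch changes between runs, which is the property the abstract credits for the savings in generation time and disk space.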

2025

Can Llama 3 Accurately Assess Readability? A Comparative Study Using Lead Sections from Wikipedia

Authors
Rodrigues, JF; Cardoso, HL; Lopes, CT;

Publication
Research Challenges in Information Science - 19th International Conference, RCIS 2025, Seville, Spain, May 20-23, 2025, Proceedings, Part II

Abstract
Text readability is vital for effective communication and learning, especially for those with lower information literacy. This research aims to assess Llama 3’s ability to grade readability and compare its alignment with established metrics. For that purpose, we create a new dataset of article lead sections from English and Simple English Wikipedia, covering nine categories. The model is prompted to rate the readability of the texts on a grade-level scale, and an in-depth analysis of the results is conducted. While Llama 3 correlates strongly with most metrics, it may underestimate text grade levels. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

2025

Evaluating Llama 3 for Text Simplification: A Study on Wikipedia Lead Sections

Authors
Rodrigues, JF; Cardoso, HL; Lopes, CT;

Publication
Companion Proceedings of the ACM on Web Conference 2025, WWW 2025, Sydney, NSW, Australia, 28 April 2025 - 2 May 2025

Abstract
Text simplification converts complex text into simpler language, improving readability and comprehension. This study evaluates the effectiveness of open-source large language models for text simplification across various categories. We created a dataset of 66,620 lead section pairs from English and Simple English Wikipedia, spanning nine categories, and tested Llama 3 for text simplification. We assessed its output for readability, simplicity, and meaning preservation. Results show improved readability, with simplification varying by category. Texts on Time were the most shortened, while Leisure-related texts had the greatest reduction of words/characters and syllables per sentence. Meaning preservation was most effective for the Objects and Education categories. © 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
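The per-sentence measures mentioned in this abstract (words, characters, and syllables per sentence) can be approximated with a short sketch. The tokenisation rules and the vowel-group syllable heuristic below are assumptions for illustration and are not taken from the study, which may compute these quantities differently.

```python
import re

def count_syllables(word):
    """Crude heuristic (an assumption): one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def sentence_stats(text):
    """Average words, characters, and syllables per sentence for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z']+", text)
    n = len(sentences)
    return {
        "words_per_sentence": len(words) / n,
        "chars_per_sentence": sum(len(w) for w in words) / n,
        "syllables_per_sentence": sum(count_syllables(w) for w in words) / n,
    }

stats = sentence_stats("The cat sat. A dog ran.")
```

Computing the same statistics for an original lead section and for its simplified version, then taking the difference, gives the kind of per-sentence reduction the abstract reports by category.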
