2025
Authors
Paiva, JC; Leal, JP; Figueira, A;
Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS
Abstract
Clustering of source code is a technique that can help improve feedback in automated program assessment. Grouping code submissions that contain similar mistakes can, for instance, make it easier to identify students' difficulties and provide targeted feedback. Moreover, grouping solutions with similar functionality but possibly different coding styles or progress levels can enable personalized feedback for students who are stuck, based on a more developed solution, or even help detect potential cases of plagiarism. However, existing clustering approaches for source code are mostly inadequate for automated feedback generation or assessment systems in programming education. They either give too much emphasis to syntactical program features, rely on expensive computations over pairs of programs, or require previously collected data. This paper introduces an online approach and an implemented tool, AsanasCluster, to cluster source code submissions to programming assignments. The proposed approach relies on program attributes extracted from semantic graph representations of source code, including control and data flow features. The resulting feature vectors are fed into an incremental k-means model. This model aims to determine the closest cluster of solutions in a timely manner as they enter the system, since clustering is an intermediate step for feedback generation in automated assessment. We have conducted a twofold evaluation of the tool to assess (1) its runtime performance and (2) its precision in separating different algorithmic strategies. To this end, we have applied our clustering approach to a public dataset of real submissions from undergraduate students to programming assignments, measuring the runtimes for the distinct tasks involved: building a model, identifying the closest cluster to a new observation, and recalculating partitions. As for the precision, we partition two groups of programs collected from GitHub. One group contains implementations of two searching algorithms, while the other has implementations of several sorting algorithms. AsanasCluster matches and, in some cases, improves on the state-of-the-art clustering tools in terms of runtime performance and precision in identifying different algorithmic strategies. It does so without requiring the execution of the code. Moreover, it is able to start the clustering process from a dataset with only two submissions and continuously partition the observations as they enter the system.
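As a rough illustration of the incremental k-means step described in the abstract, the sketch below assigns each incoming feature vector to its nearest centroid and then nudges that centroid toward the new observation. It is not the AsanasCluster implementation: the class name `IncrementalKMeans`, the method `learn_one`, the seeding rule (the first k observations become centroids), and the two-dimensional toy vectors are all assumptions made for this example.

```python
import math


class IncrementalKMeans:
    """Toy online k-means (hypothetical sketch, not the tool's actual code)."""

    def __init__(self, k):
        self.k = k
        self.centroids = []  # one list of floats per cluster
        self.counts = []     # number of observations absorbed by each cluster

    def closest(self, x):
        """Return the index of the nearest centroid (Euclidean distance)."""
        return min(range(len(self.centroids)),
                   key=lambda i: math.dist(x, self.centroids[i]))

    def learn_one(self, x):
        """Assign x to a cluster, update that cluster, and return its index."""
        x = [float(v) for v in x]
        # Assumed seeding rule: the first k observations become the centroids.
        if len(self.centroids) < self.k:
            self.centroids.append(x)
            self.counts.append(1)
            return len(self.centroids) - 1
        i = self.closest(x)
        self.counts[i] += 1
        step = 1.0 / self.counts[i]  # running-mean update
        self.centroids[i] = [c + step * (v - c)
                             for c, v in zip(self.centroids[i], x)]
        return i


# Hypothetical feature vectors, e.g. counts of control-flow and data-flow features.
model = IncrementalKMeans(k=2)
stream = [(1, 0), (10, 4), (2, 0), (11, 5)]
labels = [model.learn_one(v) for v in stream]
print(labels, model.centroids)
```

Because each update touches only one centroid, the cost of placing a new submission grows with the number of clusters rather than with the number of submissions already seen, which is the property that makes an online model attractive as an intermediate step before feedback generation.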
2025
Authors
dos Santos, AF; Leal, JP; Alves, RA; Jacques, T;
Publication
DATA IN BRIEF
Abstract
The PAP900 dataset centers on the semantic relationship between affective words in Portuguese. It contains 900 word pairs, each annotated by at least 30 human raters for both semantic similarity and semantic relatedness. In addition to the semantic ratings, the dataset includes the word categorization used to build the word pairs and detailed sociodemographic information about annotators, enabling the analysis of the influence of personal factors on the perception of semantic relationships. Furthermore, this article describes in detail the dataset construction process, from word selection to agreement metrics. Data was collected from Portuguese university psychology students, who completed two rounds of questionnaires. In the first round, annotators were asked to rate word pairs on either semantic similarity or relatedness. The second round switched the relation type for most annotators, with a small percentage being asked to repeat the same relation. The instructions emphasized the differences between semantic relatedness and semantic similarity, and provided examples of expected ratings for both. There are few semantic relations datasets in Portuguese, and none focusing on affective words. PAP900 is distributed in distinct formats so that it is easy to use both for researchers who only need the final averaged values and for researchers who want to take advantage of the individual ratings, the word categorization, and the annotator data. This dataset is a valuable resource for researchers in computational linguistics, natural language processing, psychology, and cognitive science. (c) 2025 The Authors.
2025
Authors
Brito C.; Pina N.; Esteves T.; Vitorino R.; Cunha I.; Paulo J.;
Publication
Transportation Engineering
Abstract
Cities worldwide have agreed on ambitious goals regarding carbon neutrality. To reach them, policymakers seek ways to foster smarter and cleaner transportation solutions. However, citizens lack awareness of their carbon footprint and of greener mobility alternatives such as public transport. Three main challenges therefore emerge: (i) increase users' awareness regarding their carbon footprint, (ii) provide personalized recommendations and incentives for using sustainable transportation alternatives, and (iii) guarantee that any personal data collected from the user is kept private. This paper addresses these challenges by proposing a new methodology. Created under the FranchetAI project, the methodology combines federated Artificial Intelligence (AI) and Greenhouse Gas (GHG) estimation models to calculate the carbon footprint of users when choosing different transportation modes (e.g., foot, car, bus). Through a mobile application that preserves the privacy of users' personal information, the project aims to provide detailed reports that inform citizens about their impact on the environment, together with an incentive program to promote the use of more sustainable mobility alternatives.
2025
Authors
Morgado, L; Beck, D; O'Shea, P;
Publication
VIRTUAL REALITY
Abstract
Since publication of the 2020 survey of surveys, "Finding the gaps about uses of immersive learning environments: a survey of surveys", the field of immersive learning environments has experienced substantial growth and diversification. This updated review systematically maps recent developments by analyzing 64 new literature surveys published after the original corpus date, significantly expanding the corpus from 47 to 111 reviews. Through thematic content analysis, our study identifies and integrates five new educational use themes (Games, Observation, Personification, Storytelling, and Student Authoring) and revises existing categories based on recent research. We observed shifts in the prevalence of themes, most notably an increase in uses related to data collection, interactive exploration and manipulation, contextual/media integration, and physical world simulation. We also discuss these changes in relation to recent technological advancements and the influence of emergency remote teaching during the COVID-19 pandemic. Moreover, our results provide an updated representation of immersive learning uses within the conceptual framework of immersion dimensions (system, narrative, agency), updating current research clusters and persistent gaps. By illustrating areas with limited exploration, such as highly interactive narrative experiences or low-technology interactive uses, this paper informs future research directions and contributes to an understanding of how immersive environments are being employed for learning. This comprehensive mapping thus serves as a resource for researchers and educators aiming to leverage immersive learning environments. This paper builds on a shorter version accepted for inclusion in the proceedings of the iLRN 2025 conference, offering expanded results, additional analyses, and extended discussion that clarifies and deepens the original findings.
2025
Authors
Abdellatif, AA; Shaban, K; Massoud, A;
Publication
Computers and Electrical Engineering
Abstract
2025
Authors
Queirós, R; Pinto, M; Portela, F; Simões, A;
Publication
ICPEC
Abstract