Publications

Publications by CRACS

2024

Topic Extraction: BERTopic’s Insight into the 117th Congress’s Twitterverse

Authors
Mendonça, M; Figueira, Á;

Publication
Informatics

Abstract
As social media (SM) becomes increasingly prevalent, its impact on society is expected to grow accordingly. While SM has brought positive transformations, it has also amplified pre-existing issues such as misinformation, echo chambers, manipulation, and propaganda. A thorough comprehension of this impact, aided by state-of-the-art analytical tools and by an awareness of societal biases and complexities, enables us to anticipate and mitigate the potential negative effects. One such tool is BERTopic, a novel deep-learning algorithm developed for Topic Mining, which has been shown to offer significant advantages over traditional methods like Latent Dirichlet Allocation (LDA), particularly in terms of its high modularity, which allows for extensive personalization at each stage of the topic modeling process. In this study, we hypothesize that BERTopic, when optimized for Twitter data, can provide more coherent and stable topic modeling. We began by conducting a review of the literature on topic-mining approaches for short-text data. Using this knowledge, we explored the potential for optimizing BERTopic and analyzed its effectiveness. Our focus was on Twitter data spanning the two years of the 117th US Congress. We evaluated BERTopic’s performance using coherence, perplexity, diversity, and stability scores, finding significant improvements over traditional methods and the default parameters for this tool. We discovered that improvements are possible in BERTopic’s coherence and stability. We also identified the major topics of this Congress, which include abortion, student debt, and Judge Ketanji Brown Jackson. Additionally, we describe a simple application we developed for a better visualization of Congress topics.
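
To make the "diversity" score mentioned above concrete, here is a minimal Python sketch of one common definition of topic diversity: the fraction of unique words among the top-k words of all topics. The abstract does not say which definition the authors used, so this is an assumption, and the word lists below are hypothetical illustrations rather than results from the paper.

```python
def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic.

    A value of 1.0 means no word is shared between topics; lower values
    mean the topics overlap more. This is one common definition, assumed
    here for illustration only.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("no topic words supplied")
    return len(set(top_words)) / len(top_words)


# Hypothetical top words for three topics (not taken from the paper).
example_topics = [
    ["abortion", "roe", "court", "rights"],
    ["student", "debt", "loan", "relief"],
    ["jackson", "court", "judge", "confirmation"],
]

diversity = topic_diversity(example_topics, top_k=4)
print(f"topic diversity: {diversity:.3f}")
```

In this toy input the word "court" appears in two topics, so the score falls slightly below 1.0.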

2024

Uncovering Manipulated Files Using Mathematical Natural Laws

Authors
Fernandes, P; Ciardhuáin, SO; Antunes, M;

Publication
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2023, PT I

Abstract
The data exchange between different sectors of society has led to the development of electronic documents supported by different reading formats, namely portable PDF format. These documents have characteristics similar to those used in programming languages, allowing the incorporation of potentially malicious code, which makes them a vector for cyberattacks. Thus, detecting anomalies in digital documents, such as PDF files, has become crucial in several domains, such as finance, digital forensic analysis and law enforcement. Currently, detection methods are mostly based on machine learning and are characterised by being complex, slow and mainly inefficient in detecting zero-day attacks. This paper aims to propose a Benford Law (BL) based model to uncover manipulated PDF documents by analysing potential anomalies in the first digit extracted from the PDF document's characteristics. The proposed model was evaluated using the CIC Evasive PDFMAL-2022 dataset, consisting of 1191 documents (278 benign and 918 malicious). To classify the PDF documents, based on BL, into malicious or benign documents, three statistical models were used in conjunction with the mean absolute deviation: the parametric Pearson and the non-parametric Spearman and Cramer-Von Mises models. The results show a maximum F1 score of 87.63% in detecting malicious documents using Pearson's model, demonstrating the suitability and effectiveness of applying Benford's Law in detecting anomalies in digital documents to maintain the accuracy and integrity of information and promoting trust in systems and institutions.
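
The first-digit test described above can be sketched in Python as follows. This is a minimal illustration of comparing an observed leading-digit distribution against Benford's Law using the mean absolute deviation and Pearson's correlation; it is not the authors' pipeline. The input values (powers of two, which are known to follow Benford's Law closely) stand in for the PDF characteristics, which the abstract does not enumerate.

```python
import math
from collections import Counter

# Expected Benford probabilities for leading digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(n):
    """Return the leading decimal digit of a non-zero integer."""
    return int(str(abs(n))[0])


def observed_distribution(values):
    """Relative frequency of leading digits 1..9 among the non-zero values."""
    digits = [first_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("no non-zero values supplied")
    counts = Counter(digits)
    return [counts.get(d, 0) / len(digits) for d in range(1, 10)]


def mean_absolute_deviation(observed, expected=BENFORD):
    """Average absolute gap between observed and expected frequencies."""
    return sum(abs(o - e) for o, e in zip(observed, expected)) / len(expected)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Stand-in data: the first 100 powers of two (an assumption for illustration).
sample = [2 ** k for k in range(100)]
observed = observed_distribution(sample)
mad = mean_absolute_deviation(observed)
r = pearson(observed, BENFORD)
print(f"MAD = {mad:.4f}, Pearson r = {r:.4f}")
```

A small deviation and a correlation near 1 suggest the digits conform to Benford's Law; any decision threshold for flagging a document as manipulated would have to be chosen empirically.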

2024

GERF - Gamified Educational Virtual Escape Room Framework for Innovative Micro-Learning and Adaptive Learning Experiences

Authors
Queirós, R;

Publication
Communications in Computer and Information Science

Abstract
This paper introduces GERF, a Gamified Educational Virtual Escape Room Framework designed to enhance micro-learning and adaptive learning experiences in educational settings. The framework incorporates a user taxonomy based on the user type hexad, addressing the preferences and motivations of different learner profiles. GERF focuses on two key facets: interoperability and analytics. To ensure seamless integration of Escape Room (ER) platforms with Learning Management Systems (LMS), the Learning Tools Interoperability (LTI) specification is used. This enables smooth and efficient communication between ERs and LMS platforms. Additionally, GERF uses the xAPI specification to capture and transmit experiential data in the form of xAPI statements, which are then sent to a Learning Record Store (LRS). By leveraging these learning analytics, educators gain valuable insights into students’ interactions within the ER, facilitating the adaptation of learning content based on individual learning needs. Ultimately, GERF empowers educators to create personalized learning experiences within the ER environment, fostering student engagement and learning outcomes. © 2024, The Author(s), under exclusive license to Springer Nature Switzerland AG.
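
For readers unfamiliar with xAPI statements, the following Python sketch builds a minimal statement with the three required parts (actor, verb, object) and serializes it as JSON, the form in which it would be posted to an LRS. The learner, the activity IRI, and the helper name `build_statement` are hypothetical; the abstract does not describe the statements GERF actually emits.

```python
import json


def build_statement(learner_email, learner_name, verb_id, verb_label, activity_id):
    """Assemble a minimal xAPI statement: who (actor) did what (verb) to what (object)."""
    return {
        "actor": {
            "objectType": "Agent",
            "name": learner_name,
            "mbox": f"mailto:{learner_email}",
        },
        "verb": {
            "id": verb_id,
            "display": {"en-US": verb_label},
        },
        "object": {
            "objectType": "Activity",
            "id": activity_id,
        },
    }


# Hypothetical learner and escape-room activity, for illustration only.
statement = build_statement(
    learner_email="student@example.org",
    learner_name="Example Student",
    verb_id="http://adlnet.gov/expapi/verbs/completed",
    verb_label="completed",
    activity_id="https://example.org/escape-room/puzzle-1",
)

payload = json.dumps(statement, indent=2)
print(payload)
```

Sending the payload to a real LRS would additionally require the store's endpoint and credentials, which are deployment-specific and therefore omitted here.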

2024

Comparing semantic graph representations of source code: The case of automatic feedback on programming assignments

Authors
Paiva, JC; Leal, JP; Figueira, A;

Publication
Comput. Sci. Inf. Syst.

Abstract
Static source code analysis techniques are gaining relevance in automated assessment of programming assignments as they can provide less rigorous evaluation and more comprehensive and formative feedback. These techniques focus on source code aspects rather than requiring effective code execution. To this end, syntactic and semantic information encoded in textual data is typically represented internally as graphs, after parsing and other preprocessing stages. Static automated assessment techniques, therefore, draw inferences from intermediate representations to determine the correctness of a solution and derive feedback. Consequently, achieving the most effective semantic graph representation of source code for the specific task is critical, impacting both techniques’ accuracy, outcome, and execution time. This paper aims to provide a thorough comparison of the most widespread semantic graph representations for the automated assessment of programming assignments, including usage examples, facets, and costs for each of these representations. A benchmark has been conducted to assess their cost using the Abstract Syntax Tree (AST) as a baseline. The results demonstrate that the Code Property Graph (CPG) is the most feature-rich representation, but also the largest and most space-consuming (about 33% more than AST). © 2024, ComSIS Consortium. All rights reserved.
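
As a rough illustration of what an AST baseline measures, the sketch below parses a small function with Python's standard `ast` module and counts the resulting nodes. Node count is used here as a simple proxy for representation size; the paper's actual benchmark, languages, and tooling are not described in the abstract, so this choice and the sample function are assumptions.

```python
import ast
from collections import Counter

# A small sample submission (hypothetical, for illustration only).
SOURCE = '''
def total(values):
    result = 0
    for v in values:
        result += v
    return result
'''


def ast_size(source):
    """Parse source code and return (node count, per-node-type histogram)."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    return len(nodes), Counter(type(node).__name__ for node in nodes)


node_count, kinds = ast_size(SOURCE)
print(f"AST nodes: {node_count}")
for kind, count in kinds.most_common(5):
    print(f"  {kind}: {count}")
```

Richer representations such as control-flow or code property graphs add edges and nodes on top of this tree, which is why the AST is a natural lower bound for comparing sizes.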

2024

Hardware Security for Internet of Things Identity Assurance

Authors
Cirne, A; Sousa, PR; Resende, JS; Antunes, L;

Publication
IEEE Communications Surveys and Tutorials

Abstract
With the proliferation of Internet of Things (IoT) devices, there is an increasing need to prioritize their security, especially in the context of identity and authentication mechanisms. However, IoT devices have unique limitations in terms of computational capabilities and susceptibility to hardware attacks, which pose significant challenges to establishing strong identity and authentication systems. Paradoxically, the very hardware constraints responsible for these challenges can also offer potential solutions. By incorporating hardware-based identity implementations, it is possible to overcome computational and energy limitations, while bolstering resistance against both hardware and software attacks. This research addresses these challenges by investigating the vulnerabilities and obstacles faced by identity and authentication systems in the IoT context, while also exploring potential technologies to address these issues. Each identified technology underwent meticulous investigation, considering known security attacks, implemented countermeasures, and an assessment of their pros and cons. Furthermore, an extensive literature survey was conducted to identify instances where these technologies have effectively supported device identity. The research also includes a demonstration that evaluates the effectiveness of hardware trust anchors in mitigating various attacks on IoT identity. This empirical evaluation provides valuable insights into the challenges developers encounter when implementing hardware-based identity solutions. Moreover, it underscores the substantial value of these solutions in terms of mitigating attacks and developing robust identity frameworks. By thoroughly examining vulnerabilities, exploring technologies, and conducting empirical evaluations, this research contributes to understanding and promoting the adoption of hardware-based identity and authentication systems in secure IoT environments. The findings emphasize the challenges faced by developers and highlight the significance of hardware trust anchors in enhancing security and facilitating effective identity solutions.

2023

PROGpedia: Collection of source-code submitted to introductory programming assignments

Authors
Paiva, JC; Leal, JP; Figueira, A;

Publication
DATA IN BRIEF

Abstract
Learning how to program is a difficult task. To acquire the required skills, novice programmers must solve a broad range of programming activities, always supported with timely, rich, and accurate feedback. Automated assessment tools play a major role in fulfilling these needs, being a common presence in introductory programming courses. As programming exercises are not easy to produce and those loaded into these tools must adhere to specific format requirements, teachers often opt for reusing them for several years. Therefore, most automated assessment tools, particularly Mooshak, store hundreds of submissions to the same programming exercises, as these need to be kept after being automatically processed for possible subsequent manual revision. Our dataset consists of the submissions to 16 programming exercises in Mooshak proposed in multiple years within the 2003-2020 timespan to undergraduate Computer Science students at the Faculty of Sciences from the University of Porto. In particular, we extract their code property graphs and store them as CSV files. The analysis of this data can enable, for instance, the generation of more concise and personalized feedback based on similar accepted submissions in the past, the identification of different strategies to solve a problem, the understanding of a student's thinking process, among many other findings. (c) 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )
