2021
Authors
Devezas, JL; Nunes, S;
Publication
CoRR
Abstract
2023
Authors
Santana, B; Campos, R; Amorim, E; Jorge, A; Silvano, P; Nunes, S;
Publication
ARTIFICIAL INTELLIGENCE REVIEW
Abstract
Narratives are present in many forms of human expression and can be understood as a fundamental way of communication between people. Computational understanding of the underlying story of a narrative, however, may be a rather complex task for both linguists and computational linguists. Such a task can be approached using natural language processing techniques to automatically extract narratives from texts. In this paper, we present an in-depth survey of narrative extraction from text, providing a roadmap to the study of this area as a whole as a means to consolidate a view on this line of research. We aim to fill the current gap by identifying important research efforts at the crossroads between linguists and computer scientists. In particular, we highlight the importance and complexity of the annotation process, as a crucial step for the training stage. Next, we detail methods and approaches regarding the identification and extraction of narrative components, their linkage and understanding of likely inherent relationships, before detailing formal narrative representation structures as an intermediate step for visualization and data exploration purposes. We then move into aspects of the narrative evaluation task, and conclude this survey by highlighting important open issues in the domain of narrative extraction from texts that are yet to be explored.
2023
Authors
Silvano, P; Amorim, E; Leal, A; Cantante, I; Silva, F; Jorge, A; Campos, R; Nunes, S;
Publication
Text2Story@ECIR
Abstract
News articles typically include reporting events to inform on what happened. These reporting events are not part of the story being told but are nonetheless a relevant part of the news and can pose a challenge to the computational processing of news narratives. They compose a reporting narrative, which is the present study's focus. This paper aims to demonstrate through selected use cases how a comprehensive annotation scheme with suitable tags and links can properly represent the reporting events and the way they relate to the events that make the story. In addition, we put forward a proposal for their visual representation that enables a systematic and detailed analysis of the importance of reporting events in the news structure. Finally, we describe some lexico-grammatical features of reporting events, which can contribute to their automatic detection.
2024
Authors
António Ali, FDM; Jesus, Gd; Cardoso, HL; Nunes, S; Silva, RS;
Publication
PROPOR (2)
Abstract
Stopword lists, an essential resource for natural language processing and information retrieval, are often unavailable for low-resource languages. Creating these lists is time-consuming and expensive, making automated stopword detection a viable alternative. This paper introduces a novel stopword detection approach that exploits the topological properties of co-occurrence networks to identify function words. By leveraging the connectivity patterns of function words in these networks, the proposed approach aims to achieve higher precision compared to traditional frequency-based methods. To assess the effectiveness of the network-based approach, we constructed co-occurrence networks for Tetun and Emakhuwa (low-resourced languages), as well as English and Portuguese. We then compared the performance of this approach with traditional frequency-based methods. The results indicate that the network-based approach consistently outperforms traditional methods, with in-degree emerging as the most reliable indicator of function words. This finding suggests promising prospects for automatically generating stopword lists in other low-resource languages, paving the way for developing natural language processing tools for these linguistic contexts.
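As an illustration of the idea in this abstract, the sketch below ranks words by in-degree in a directed co-occurrence network. It is not the authors' implementation: the abstract does not state how the network is built, so the edge direction (preceding word to following word), the window size, the unweighted edges, and the names `in_degrees` and `candidate_stopwords` are all assumptions made for this example.

```python
from collections import defaultdict


def in_degrees(sentences, window=1):
    """Return {word: in-degree} for a directed co-occurrence network.

    Assumption (not from the paper): an edge u -> v exists when u appears
    within `window` tokens before v in the same sentence. The in-degree of
    v is the number of distinct words u that point to it.
    """
    predecessors = defaultdict(set)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Register the word so that it appears even with in-degree 0.
            preds = predecessors[word]
            for j in range(max(0, i - window), i):
                if tokens[j] != word:  # ignore self-loops
                    preds.add(tokens[j])
    return {word: len(preds) for word, preds in predecessors.items()}


def candidate_stopwords(sentences, top_k=10, window=1):
    """Return the top_k words by in-degree (ties broken alphabetically)."""
    degrees = in_degrees(sentences, window=window)
    ranked = sorted(degrees, key=lambda w: (-degrees[w], w))
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy corpus, invented for this example only.
    corpus = [
        "the cat sat on the mat".split(),
        "a dog chased the cat".split(),
        "birds sing in the morning".split(),
        "she read the book".split(),
    ]
    print(candidate_stopwords(corpus, top_k=3))
```

On this toy corpus the function word "the" follows many different words and therefore collects the highest in-degree, which is the intuition the abstract describes. A real evaluation would need a much larger corpus and a reference stopword list to measure precision.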
2026
Authors
Campos, R; Sequeira, R; Nerea, S; Cantante, I; Folques, D; Cunha, LF; Canavilhas, J; Branco, A; Jorge, A; Nunes, S; Guimarães, N; Silvano, P;
Publication
ECIR (4)
Abstract
Fact-checking remains a demanding and time-consuming task, still largely dependent on manual verification and unable to match the rapid spread of misinformation online. This is particularly important because debunking false information typically takes longer to reach consumers than the misinformation itself; accelerating corrections through automation can therefore help counter it more effectively. Although many organizations perform manual fact-checking, this approach is difficult to scale given the growing volume of digital content. These limitations have motivated interest in automating fact-checking, where identifying claims is a crucial first step. However, progress has been uneven across languages, with English dominating due to abundant annotated data. Portuguese, like other languages, still lacks accessible, licensed datasets, limiting research, Natural Language Processing (NLP) developments, and applications. In this paper, we introduce ClaimPT, a dataset of European Portuguese news articles annotated for factual claims, comprising 1,308 articles and 6,875 individual annotations. Unlike most existing resources based on social media or parliamentary transcripts, ClaimPT focuses on journalistic content, collected through a partnership with LUSA, the Portuguese News Agency. To ensure annotation quality, two trained annotators labeled each article, with a curator validating all annotations according to a newly proposed scheme. We also provide baseline models for claim detection, establishing initial benchmarks and enabling future NLP and Information Retrieval (IR) applications. By releasing ClaimPT, we aim to advance research on low-resource fact-checking and enhance understanding of misinformation in news media.