Publications

Publications by Sérgio Nunes

2021

Fatigued Random Walks in Hypergraphs: A Neuronal Analogy to Improve Retrieval Performance

Authors
Devezas, JL; Nunes, S;

Publication
CoRR

2021

Managing Research the Wiki Way: A Systematic Approach to Documenting Research

Authors
Devezas, JL; Nunes, S;

Publication
CoRR

2023

A survey on narrative extraction from textual data

Authors
Santana, B; Campos, R; Amorim, E; Jorge, A; Silvano, P; Nunes, S;

Publication
Artificial Intelligence Review

Abstract
Narratives are present in many forms of human expression and can be understood as a fundamental way of communication between people. Computational understanding of the underlying story of a narrative, however, may be a rather complex task for both linguists and computational linguists. Such a task can be approached using natural language processing techniques to automatically extract narratives from texts. In this paper, we present an in-depth survey of narrative extraction from text, providing a roadmap to the study of this area as a whole as a means to consolidate a view on this line of research. We aim to fill the current gap by identifying important research efforts at the crossroads between linguists and computer scientists. In particular, we highlight the importance and complexity of the annotation process as a crucial step for the training stage. Next, we detail methods and approaches regarding the identification and extraction of narrative components, their linkage, and the understanding of likely inherent relationships, before detailing formal narrative representation structures as an intermediate step for visualization and data exploration purposes. We then move on to narrative evaluation, and conclude this survey by highlighting important open issues in the domain of narrative extraction from texts that are yet to be explored.

2023

Annotation and Visualisation of Reporting Events in Textual Narratives

Authors
Silvano, P; Amorim, E; Leal, A; Cantante, I; Silva, F; Jorge, A; Campos, R; Nunes, S;

Publication
Text2Story@ECIR

Abstract
News articles typically include reporting events to inform on what happened. These reporting events are not part of the story being told but are nonetheless a relevant part of the news and can pose a challenge to the computational processing of news narratives. They compose a reporting narrative, which is the present study's focus. This paper aims to demonstrate through selected use cases how a comprehensive annotation scheme with suitable tags and links can properly represent the reporting events and the way they relate to the events that make the story. In addition, we put forward a proposal for their visual representation that enables a systematic and detailed analysis of the importance of reporting events in the news structure. Finally, we describe some lexico-grammatical features of reporting events, which can contribute to their automatic detection.

2024

Network-based Approach for Stopwords Detection

Authors
António Ali, FDM; Jesus, Gd; Cardoso, HL; Nunes, S; Silva, RS;

Publication
PROPOR (2)

Abstract
Stopword lists, an essential resource for natural language processing and information retrieval, are often unavailable for low-resource languages. Creating these lists is time-consuming and expensive, making automated stopword detection a viable alternative. This paper introduces a novel stopword detection approach that exploits the topological properties of co-occurrence networks to identify function words. By leveraging the connectivity patterns of function words in these networks, the proposed approach aims to achieve higher precision compared to traditional frequency-based methods. To assess the effectiveness of the network-based approach, we constructed co-occurrence networks for Tetun and Emakhuwa (low-resource languages), as well as English and Portuguese. We then compared the performance of this approach with traditional frequency-based methods. The results indicate that the network-based approach consistently outperforms traditional methods, with in-degree emerging as the most reliable indicator of function words. This finding suggests promising prospects for automatically generating stopword lists in other low-resource languages, paving the way for developing natural language processing tools for these linguistic contexts.

2026

ClaimPT: A Portuguese Dataset of Annotated Claims in News Articles

Authors
Campos, R; Sequeira, R; Nerea, S; Cantante, I; Folques, D; Cunha, LF; Canavilhas, J; Branco, A; Jorge, A; Nunes, S; Guimarães, N; Silvano, P;

Publication
ECIR (4)

Abstract
Fact-checking remains a demanding and time-consuming task, still largely dependent on manual verification and unable to match the rapid spread of misinformation online. This is particularly important because debunking false information typically takes longer to reach consumers than the misinformation itself; accelerating corrections through automation can therefore help counter it more effectively. Although many organizations perform manual fact-checking, this approach is difficult to scale given the growing volume of digital content. These limitations have motivated interest in automating fact-checking, where identifying claims is a crucial first step. However, progress has been uneven across languages, with English dominating due to abundant annotated data. Portuguese, like other languages, still lacks accessible, licensed datasets, limiting research, Natural Language Processing (NLP) developments, and applications. In this paper, we introduce ClaimPT, a dataset of European Portuguese news articles annotated for factual claims, comprising 1,308 articles and 6,875 individual annotations. Unlike most existing resources based on social media or parliamentary transcripts, ClaimPT focuses on journalistic content, collected through a partnership with LUSA, the Portuguese News Agency. To ensure annotation quality, two trained annotators labeled each article, with a curator validating all annotations according to a newly proposed scheme. We also provide baseline models for claim detection, establishing initial benchmarks and enabling future NLP and Information Retrieval (IR) applications. By releasing ClaimPT, we aim to advance research on low-resource fact-checking and enhance understanding of misinformation in news media.
