About

Sérgio Nunes is an Associate Professor in the Department of Informatics Engineering at FEUP, University of Porto, and a Senior Researcher at INESC TEC. He holds a PhD in Informatics Engineering (2010) in the area of Information Retrieval, with work focused on using temporal features to estimate the relevance of information. He also holds a Master's degree in Information Management (2004), with work on interoperability between academic information systems.


His main research interests are information retrieval, information interaction and visualization, and web-based information systems. His teaching focuses on databases, web technologies, and information retrieval, and he coordinates several course units across different programs, namely the Doctoral Program in Informatics Engineering, the Bachelor's and Master's in Informatics Engineering, and the Master's in Multimedia.


He was Director of the U.Porto Media Innovation Labs (MIL), the University of Porto's Center of Competence, whose goal is to develop the university's capacity in the Media field across teaching, research, and innovation, by promoting collaboration among existing structures and coordination with external partners.

Details

  • Name

    Sérgio Nunes
  • Role

    Area Manager
  • Since

    20 December 2010
Publications

2026

Cross-Lingual Information Retrieval in Tetun for Ad-Hoc Search

Authors
Araújo, A; de Jesus, G; Nunes, S

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT II

Abstract
Developing information retrieval (IR) systems that enable access across multiple languages is crucial in multilingual contexts. In Timor-Leste, where Tetun, Portuguese, English, and Indonesian are official and working languages, no cross-lingual information retrieval (CLIR) solutions currently exist to support information access across these languages. This study addresses that gap by investigating CLIR approaches tailored to the linguistic landscape of Timor-Leste. Leveraging an existing monolingual Tetun document collection and ad-hoc text retrieval baselines, we explore the feasibility of CLIR for Tetun. Queries were manually translated into Portuguese, English, and Indonesian to create a multilingual query set. These were then automatically translated back into Tetun using Google Translate and several large language models, and used to retrieve documents in Tetun. Results show that Google Translate is the most reliable tool for Tetun CLIR overall, and the Hiemstra LM consistently outperforms BM25 and DFR BM25 in cross-lingual retrieval performance. However, overall effectiveness remains up to 26.95 percentage points lower than that of the monolingual baseline, underscoring the limitations of current translation tools and the challenges of developing an effective CLIR system for Tetun. Despite these challenges, this work establishes the first CLIR baseline for Tetun ad-hoc text retrieval, providing a foundation for future research in this under-resourced setting.

2026

User Behavior in Sports Search: Entity-Centric Query and Click Log Analysis

Authors
Damas, J; Nunes, S

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT II

Abstract
Understanding user behavior in search systems is essential for improving retrieval effectiveness and user satisfaction. While prior research has extensively examined general-purpose web search engines, domain-specific contexts, such as sports information, remain comparatively underexplored. In this study, we analyze over 400,000 interaction log entries from a sports-oriented search engine collected over a two-week period. Our analysis combines classic query-level metrics (e.g., frequency distributions, query lengths) with a detailed examination of click behavior, including entropy-based intent variability and a custom query quality scoring model. Compared to established baselines from general and specialized search environments, we observe a high proportion of new and single-term queries, as well as a notable lack of representativeness among top queries. These findings reveal patterns shaped by the event-driven and entity-centric nature of sports content, offering actionable insights for the design of domain-specific retrieval systems.

2026

ClaimPT: A Portuguese Dataset of Annotated Claims in News Articles

Authors
Campos, R; Sequeira, R; Nerea, S; Cantante, I; Folques, D; Cunha, LF; Canavilhas, J; Branco, A; Jorge, A; Nunes, S; Guimarães, N; Silvano, P

Publication
ECIR (4)

Abstract
Fact-checking remains a demanding and time-consuming task, still largely dependent on manual verification and unable to match the rapid spread of misinformation online. This is particularly important because debunking false information typically takes longer to reach consumers than the misinformation itself; accelerating corrections through automation can therefore help counter it more effectively. Although many organizations perform manual fact-checking, this approach is difficult to scale given the growing volume of digital content. These limitations have motivated interest in automating fact-checking, where identifying claims is a crucial first step. However, progress has been uneven across languages, with English dominating due to abundant annotated data. Portuguese, like other languages, still lacks accessible, licensed datasets, limiting research, Natural Language Processing (NLP) developments, and applications. In this paper, we introduce ClaimPT, a dataset of European Portuguese news articles annotated for factual claims, comprising 1,308 articles and 6,875 individual annotations. Unlike most existing resources based on social media or parliamentary transcripts, ClaimPT focuses on journalistic content, collected through a partnership with LUSA, the Portuguese News Agency. To ensure annotation quality, two trained annotators labeled each article, with a curator validating all annotations according to a newly proposed scheme. We also provide baseline models for claim detection, establishing initial benchmarks and enabling future NLP and Information Retrieval (IR) applications. By releasing ClaimPT, we aim to advance research on low-resource fact-checking and enhance understanding of misinformation in news media.

2026

CitiLink: Enhancing Municipal Transparency and Citizen Engagement Through Searchable Meeting Minutes

Authors
Silva, R; Evans, JP; Isidro, J; Marques, M; Fonseca, A; Morais, R; Canavilhas, J; Pasquali, A; Silvano, P; Jorge, A; Guimarães, N; Nunes, S; Campos, R

Publication
ECIR (4)

Abstract
City council minutes are typically lengthy and formal documents with a bureaucratic writing style. Although publicly available, their structure often makes it difficult for citizens or journalists to efficiently find information. In this demo, we present CitiLink, a platform designed to transform unstructured municipal meeting minutes into structured and searchable data, demonstrating how NLP and IR can enhance the accessibility and transparency of local government. The system employs LLMs to extract metadata, discussed subjects, and voting outcomes, which are then indexed in a database to support full-text search with BM25 ranking and faceted filtering through a user-friendly interface. The developed system was built over a collection of 120 minutes made available by six Portuguese municipalities. To assess its usability, CitiLink was tested through guided sessions with municipal personnel, providing insights into how real users interact with the system. In addition, we evaluated Gemini’s performance in extracting relevant information from the minutes.

2026

VotIE: Information Extraction from Meeting Minutes

Authors
Evans, JP; Cunha, LF; Silvano, P; Jorge, A; Guimarães, N; Nunes, S; Campos, R

Publication
CoRR

Abstract