About

Sérgio Nunes is an Associate Professor at the Department of Informatics Engineering at FEUP, University of Porto, and a Senior Researcher at INESC TEC. He holds a PhD in Information Retrieval (2010) focused on using temporal features for relevance estimation, and an MSc in Information Management (2004).


His main research interests are in information retrieval and web information systems. He teaches databases, web technologies, and information retrieval in several programs, namely the Informatics Engineering Doctoral Program, the Informatics Engineering Bachelor's and Master's programs, and the Multimedia Master's program.


He was the Director of the U.Porto Media Innovation Labs (MIL), an Excellence Center of the University of Porto, with the mission of developing the university's capacity in the field of media across teaching, research, and innovation activities by promoting collaboration between existing university structures and coordination with external partners.

Details

  • Name

    Sérgio Nunes
  • Role

    Area Manager
  • Since

    20th December 2010
Publications

2025

Zero-Shot and Hybrid Strategies for Tetun Ad-Hoc Text Retrieval

Authors
de Jesus, G; Singh, AK; Nunes, S; Yates, A;

Publication
Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR)

Abstract
Dense retrieval models are generally trained using supervised learning approaches for representation learning, which require a labeled dataset (i.e., query-document pairs). However, training such models from scratch is not feasible for most languages, particularly under-resourced ones, due to data scarcity and computational constraints. As an alternative, pretrained dense retrieval models can be fine-tuned for specific downstream tasks or applied directly in zero-shot settings. Given the lack of labeled data for Tetun and the fact that existing dense retrieval models do not currently support the language, this study investigates their application in zero-shot, out-of-distribution scenarios. We adapted these models to Tetun documents, producing zero-shot embeddings, to evaluate their performance across various document representations and retrieval strategies for the ad-hoc text retrieval task. The results show that most pretrained monolingual dense retrieval models outperformed their multilingual counterparts in various configurations. Given the lack of dense retrieval models specialized for Tetun, we combine Hiemstra LM with ColBERTv2 in a hybrid strategy, achieving a relative improvement of +2.01% in P@10, +4.24% in MAP@10, and +2.45% in NDCG@10 over the baseline, based on evaluations using 59 queries and up to 2,000 retrieved documents per query. We propose dual tuning parameters for the score fusion approach commonly used in hybrid retrieval and demonstrate that enriching document titles with summaries generated by a large language model (LLM) from the documents' content significantly enhances the performance of hybrid retrieval strategies in Tetun. To support reproducibility, we publicly release the LLM-generated document summaries and run files.

2025

Insights into LLM-Based Conversational Search: A Study of Tetun-Speaking Users' Search Behavior

Authors
de Jesus, G; Nunes, S;

Publication
Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR)

Abstract
Advancements in large language model (LLM)-based conversational assistants have transformed search experiences into more natural and context-aware dialogues that resemble human conversation. However, limited access to interaction log data hinders a deeper understanding of their real-world usage. To address this gap, we analyzed 16,952 prompt logs from 904 unique users of Labadain Chat, an LLM-based conversational assistant designed for Tetun speakers, to uncover patterns in user search behavior, engagement, and intent. Our findings show that the largest share of users (29.87%) spent between one and five minutes per session, with an average of 43 unique daily users. The majority (93.97%) submitted multiple prompts per session, with an average session duration of 16.9 minutes. Most users (95.22%) were based in Timor-Leste, with education and science (28.75%) and health (28.00%) being the most searched topics. We compared our findings with a study on Google Bard logs in English, revealing similar search characteristics, including engagement duration, command-based instructions, and requests for specific assistance. Furthermore, a comparison with two conventional search engines suggests that LLM-based conversational systems have influenced user search behavior on traditional platforms, reflecting a broader trend toward command-driven queries. These insights contribute to a deeper understanding of how user search behavior evolves, particularly within low-resource language communities. To support future research, we publicly release LabadainLog-17k+, a dataset of over 17,000 real-world user search logs in Tetun, offering a unique resource for investigating conversational search in this language.

2025

Cross-Lingual Information Retrieval in Tetun for Ad-Hoc Search

Authors
Araújo, A; de Jesus, G; Nunes, S;

Publication
Lecture Notes in Computer Science - Progress in Artificial Intelligence

Abstract

2025

User Behavior in Sports Search: Entity-Centric Query and Click Log Analysis

Authors
Damas, J; Nunes, S;

Publication
Lecture Notes in Computer Science - Progress in Artificial Intelligence

Abstract

2025

Evaluating Dense Model-based Approaches for Multimodal Medical Case Retrieval

Authors
Catarina Pires; Sérgio Nunes; Luís Filipe Teixeira;

Publication
Information Retrieval Research

Abstract
Medical case retrieval plays a crucial role in clinical decision-making by enabling healthcare professionals to find relevant cases based on patient records, diagnostic images, and textual descriptions. Given the inherently multimodal nature of medical data, effective retrieval requires models that can bridge the gap between different modalities. Traditional retrieval approaches often rely on unimodal representations, limiting their ability to capture cross-modal relationships. Recent advances in dense model-based techniques have shown promise in overcoming these limitations by encoding multimodal information into a shared latent space, facilitating retrieval based on semantic similarity. This paper investigates the potential of dense models to enhance multimodal search systems. We evaluate various dense model-based approaches to assess which model characteristics have the greatest impact on retrieval effectiveness, using the medical case-based retrieval task from ImageCLEFmed 2013 as a benchmark. Our findings indicate that different dense model approaches substantially impact retrieval effectiveness, and that applying the CombMAX fusion method to combine their output results further improves effectiveness. Extending context length, however, yielded mixed results depending on the input data. Additionally, domain-specific models (those trained on medical data) outperformed general models trained on broad, non-specialized datasets within their respective fields. Furthermore, when text is the dominant information source, text-only models surpassed multimodal models.

Supervised Theses

2023

Visualizing News Stories from Annotated Text

Author
Catarina Justo dos Santos Fernandes

Institution
UP-FEUP

2023

Federation Solutions for Linked Data Applications

Author
Tiago Gonçalves Gomes

Institution
UP-FEUP

2023

Information Retrieval over Linked Data Archives

Author
Cláudia Inês da Costa Martins

Institution
UP-FEUP

2023

Connect-the-Dots: Artificial Intelligence and Automation in Investigative Journalism

Author
Joana Rodrigues da Silva

Institution
UP-FEUP

2023

Evaluation of Text Diversity over time for Automatically Generated Texts in Sports Journalism

Author
José David Souto Rocha

Institution
UP-FEUP