Publications - INESC TEC

Publications

Publications by LIAAD

2026

Overview of the CLEF 2025 JOKER Lab: Humour in Machine

Authors
Ermakova, L; Campos, R; Bosser, AG; Miller, T

Publication
EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION, CLEF 2025

Abstract
Humour poses a unique challenge for artificial intelligence, as it often relies on non-literal language, cultural references, and linguistic creativity. The JOKER Lab, now in its fourth year, aims to advance computational humour research through shared tasks on curated, multilingual datasets, with applications in education, computer-mediated communication and translation, and conversational AI. This paper provides an overview of the JOKER Lab held at CLEF 2025, detailing the setup and results of its three main tasks: (1) humour-aware information retrieval, which involves searching a document collection for humorous texts relevant to user queries in either English or Portuguese; (2) pun translation, focussed on humour-preserving translation of paronomastic jokes from English into French; and (3) onomastic wordplay translation, a task addressing the translation of name-based wordplay from English into French. The 2025 edition builds upon previous iterations by expanding datasets and emphasising nuanced, manual evaluation methods. The Task 1 results show a marked improvement this year, apparently due to participants' judicious combination of retrieval and filtering techniques. Tasks 2 and 3 remain challenging, not only in terms of system performance but also in terms of defining meaningful and reliable evaluation metrics.

2026

CitiLink-Minutes: A Multilayer Annotated Dataset of Municipal Meeting Minutes

Authors
Campos, R; Pacheco, AF; Fernandes, AL; Cantante, I; Rebouças, R; Cunha, LF; Isidro, J; Evans, J; Marques, M; Batista, R; Amorim, E; Jorge, A; Guimaraes, N; Nunes, S; Leal, A; Silvano, P

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2026, PT IV

Abstract
City councils play a crucial role in local governance, directly influencing citizens' daily lives through decisions made during municipal meetings. These deliberations are formally documented in meeting minutes, which serve as official records of discussions, decisions, and voting outcomes. Despite their importance, municipal meeting records have received little attention in Information Retrieval (IR) and Natural Language Processing (NLP), largely due to the lack of annotated datasets, which ultimately limits the development of computational models. To address this gap, we introduce CitiLink-Minutes, a multilayer dataset of 120 European Portuguese municipal meeting minutes from six municipalities. Unlike prior annotated datasets of parliamentary or video records, CitiLink-Minutes provides multilayer annotations and structured linkage of official written minutes. The dataset contains over one million tokens, with all personal identifiers de-identified. Each minute was manually annotated by two trained annotators and curated by an experienced linguist across four complementary dimensions: (1) personal information, (2) metadata, (3) subjects of discussion, and (4) voting outcomes, totaling over 38,000 individual annotations. Released under FAIR principles and accompanied by baseline results on metadata extraction, topic classification, and vote labeling, CitiLink-Minutes demonstrates its potential for downstream NLP and IR tasks, while promoting transparent access to municipal decisions.

2026

pt-image-ir-dataset: An Image Retrieval Dataset in European Portuguese

Authors
Duarte, R; Branco, A; Proença, H; Campos, R

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2026, PT IV

Abstract
With the surge of multimodal models and the demand for effective image Information Retrieval (IR) systems, high-quality text-to-image datasets have become paramount. However, most existing datasets are primarily in English, limiting their applicability to multilingual settings. To address this, we introduce the pt-image-ir-dataset, a manually annotated resource for text-based Image IR in European Portuguese. The dataset comprises 80 diverse queries and a curated pool of 5,201 images, each annotated for relevance by multiple human judges. The proposed dataset is a step forward in supporting the development and evaluation of image IR systems for European Portuguese, addressing a clear gap in multilingual multimodal research. To this end, we have made our dataset publicly available, alongside baseline experimental results, demonstrating its suitability on the Image IR task across different retrieval paradigms, including traditional text-based lexical IR methods, semantic dense retrieval models based on language embeddings, cutting-edge vision-language models and proprietary black-box image retrieval systems. Results demonstrate that vision-language models, particularly OpenCLIP/xlm-roberta-base-ViT-B-32, significantly outperform other approaches (MRR = 0.610).

2026

ImageSeek: A Hybrid Text-to-Image Image Retrieval System for Domain-Specific Collections

Authors
Duarte, R; Silva, R; Branco, A; Proença, H; Campos, R

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2026, PT IV

Abstract
Large image collections are typically organized around basic metadata and keyword tags, making content discovery challenging for users seeking specific visual information. Although images may be accompanied by descriptive text, traditional retrieval systems often struggle to bridge the semantic gap between textual descriptions and visual content. In this demo, we present ImageSeek, a hybrid text-to-image retrieval system designed to enhance search effectiveness by combining text and image-based retrieval methods through an asymmetric score adjustment mechanism. The system leverages multilingual CLIP models to encode both visual and textual information, creating unified representations for cross-modal retrieval. Users can search through natural language queries in any supported language, with results ranked using a hybrid approach that treats image-based retrieval as a reliable baseline while harmonizing text-based scores through position-dependent adjustments. The demonstration system operates on a dataset of 42,333 images from the Portuguese Presidency website, providing an appropriate testbed for multimodal retrieval performance. The web application enables direct comparison between conventional CLIP-based retrieval and our hybrid approach, supporting image searches under the same conditions on external platforms, including Google Images and the Arquivo.pt image search system, enabling comparative analysis of the results. To evaluate its effectiveness, ImageSeek allows users to experience differences between retrieval modes while exploring domain-specific visual content.

2026

A computational evaluation of new and existing dispatching rules for the single machine total weighted tardiness problem

Authors
Martins, ASM; Valente, JMS; Schaller, JE

Publication
INTERNATIONAL TRANSACTIONS IN OPERATIONAL RESEARCH

Abstract
This paper considers the single machine total weighted tardiness problem. A thorough computational evaluation of new and existing dispatching rules is performed. We considered several existing heuristics and proposed new backward rules. These procedures are analyzed together for the first time and coded in the same programming language. We also created a new and much larger dataset, which allows a more detailed comparison and provides a useful benchmark for future work. We first conducted preliminary tests to determine appropriate parameter values and to choose between three versions of the new rules. These tests showed a need to use instance characteristics to make better choices. We then analyzed the heuristics and identified the non-dominated procedures, considering solution quality and computational time. One of the new backward rules is non-dominated, achieving the best solution quality. The non-dominated set allows decision-makers to choose a procedure depending on problem size and available time.

2026

A Comparative Study of Deep Learning Approaches for Leishmania Detection in Microscopic Images

Authors
Monteiro, E; Nogueira, DM; Gomes, EF

Publication
BIOSTEC (1)

Abstract