Publications

Publications by Ricardo Campos

2025

NarratEX Dataset: Explaining the Dominant Narratives in News Texts

Authors
Guimarães, N; Silvano, P; Campos, R; Jorge, AM; Pacheco, AF; Dimitrov, DI; Nikolaidis, N; Yangarber, R; Sartori, E; Stefanovitch, N; Nakov, P; Piskorski, J; San Martino, GD;

Publication
EMNLP (Findings)

Abstract
We present NarratEX, a dataset designed for the task of explaining the choice of the Dominant Narrative in a news article, and intended to support the research community in addressing challenges such as discourse polarization and propaganda detection. Our dataset comprises 1,056 news articles in four languages (Bulgarian, English, Portuguese, and Russian), covering two globally significant topics: the Ukraine-Russia War (URW) and Climate Change (CC). Each article is manually annotated with dominant narrative and sub-narrative labels and an explanation justifying the chosen labels. We describe the dataset, the process of its creation, and its characteristics. We present experiments with two newly proposed tasks: Explaining Dominant Narrative based on Text, which involves writing a concise paragraph to justify the choice of the dominant narrative and sub-narrative of a given text, and Inferring Dominant Narrative from Explanation, which involves predicting the appropriate dominant narrative category based on an explanatory text. The proposed dataset is a valuable resource for advancing research on detecting and mitigating manipulative content, while promoting a deeper understanding of how narratives influence public discourse.


2026

The 9th International Workshop on Narrative Extraction from Text: Text2Story 2026

Authors
Campos, R; Jorge, A; Jatowt, A; Bhatia, S; Litvak, M;

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2026, PT III

Abstract
For eight years, the Text2Story Workshop series has fostered a vibrant research community dedicated to narrative understanding, advancing shared insights into the challenges of modelling narrative structure in text. While earlier approaches laid important foundations, recent progress in Transformers and Large Language Models (LLMs) has fundamentally reshaped the field. Building on the increasing prominence of LLM-based contributions in recent editions, the ninth edition of Text2Story expands the focus toward agentic AI, where systems plan, reason, and interact over time using narratives as internal representations. Recent advances, including long-context architectures, instruction and preference-tuned models, retrieval-augmented generation, and discourse-aware prompting, have broadened the applicability of LLMs to complex narrative tasks. Nevertheless, reliably capturing fine-grained narrative structures remains challenging, particularly for event chains, temporal and causal relations, character development, and perspective consistency. These challenges are amplified in interactive and agentic settings, where narrative coherence, controllability, and reliability are critical. This edition of Text2Story explores both the opportunities and limitations of LLMs and agentic systems for narrative understanding, including the analysis of narratives generated by LLMs themselves with respect to consistency, hallucination, bias, and control. Through a diverse program of research papers, works in progress, demos, resources, and keynote talks, the workshop continues to advance narrative understanding in the era of foundation and agentic models.


2026

EPHG-CR: embedding propagation for heterogeneous graphs with class refinement

Authors
Dos Santos, BN; Marcacini, RM; Jorge, AM; Campos, R; Rezende, SO;

Publication
APPLIED INTELLIGENCE

Abstract
Heterogeneous graphs can represent real-world problems in a way close to reality, supporting diverse types of vertices and edges. However, their inherent heterogeneity poses challenges in interpreting problem semantics. To address this, heterogeneous graph embedding, aiming to map graph elements to low-dimensional vectors, simplifies subsequent machine learning analysis. This approach has gained prominence in machine learning, fueling classification, recommendation, and similarity search applications. Embedding diverse data is essential for efficient data processing. Incorporating language models, like BERT, into heterogeneous graphs enhances semantic context capture, which is particularly useful when one vertex type represents text. Language models stand out in contextual representation, enriching graph vertex embeddings for various tasks. This paper proposes a novel approach to enhancing heterogeneous graph embeddings by combining language models and task class data. Our approach increases vector quality, accounting for graph structure, semantic textual information, and task labels. We compared our proposal with a language model in the aspect-based sentiment analysis task, demonstrating competitive results and, in some cases, a slight superiority. Furthermore, we explore applications of embeddings from auxiliary vertices in another task, highlighting another advantage of the approach over the language model.


2026

pt-image-ir-dataset: An Image Retrieval Dataset in European Portuguese

Authors
Duarte, R; Branco, A; Proença, H; Campos, R;

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2026, PT IV

Abstract
With the surge of multimodal models and the demand for effective image Information Retrieval (IR) systems, high-quality text-to-image datasets have become paramount. However, most existing datasets are primarily in English, limiting their applicability to multilingual settings. To address this, we introduce the pt-image-ir-dataset, a manually annotated resource for text-based Image IR in European Portuguese. The dataset comprises 80 diverse queries and a curated pool of 5,201 images, each annotated for relevance by multiple human judges. The proposed dataset is a step forward in supporting the development and evaluation of image IR systems for European Portuguese, addressing a clear gap in multilingual multimodal research. To this end, we have made our dataset publicly available, alongside baseline experimental results, demonstrating its suitability on the Image IR task across different retrieval paradigms, including traditional text-based lexical IR methods, semantic dense retrieval models based on language embeddings, cutting-edge vision-language models and proprietary black-box image retrieval systems. Results demonstrate that vision-language models, particularly OpenCLIP/xlm-roberta-base-ViT-B-32, significantly outperform other approaches (MRR = 0.610).


2026

ImageSeek: A Hybrid Text-to-Image Image Retrieval System for Domain-Specific Collections

Authors
Duarte, R; Silva, R; Branco, A; Proença, H; Campos, R;

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2026, PT IV

Abstract
Large image collections are typically organized around basic metadata and keyword tags, making content discovery challenging for users seeking specific visual information. Although images may be accompanied by descriptive text, traditional retrieval systems often struggle to bridge the semantic gap between textual descriptions and visual content. In this demo, we present ImageSeek, a hybrid text-to-image retrieval system designed to enhance search effectiveness by combining text and image-based retrieval methods through an asymmetric score adjustment mechanism. The system leverages multilingual CLIP models to encode both visual and textual information, creating unified representations for cross-modal retrieval. Users can search through natural language queries in any supported language, with results ranked using a hybrid approach that treats image-based retrieval as a reliable baseline while harmonizing text-based scores through position-dependent adjustments. The demonstration system operates on a dataset of 42,333 images from the Portuguese Presidency website, providing an appropriate testbed for multimodal retrieval performance. The web application enables direct comparison between conventional CLIP-based retrieval and our hybrid approach, supporting image searches under the same conditions on external platforms, including Google Images and the Arquivo.pt image search system, enabling comparative analysis of the results. To evaluate its effectiveness, ImageSeek allows users to experience differences between retrieval modes while exploring domain-specific visual content.
