2026
Authors
Fares, AA; Mendes-Moreira, J;
Publication
INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING-IDEAL 2025, PT II
Abstract
Counterfactual explanations (CFs) help users understand and act on black-box machine learning decisions by suggesting minimal changes to achieve a desired outcome. However, existing methods often ignore individual feasibility, leading to unrealistic or unactionable recommendations. We propose a personalized CF generation method based on cluster-specific fine-tuning of Generative Adversarial Networks (GANs). By grouping users with similar behavior and constraints, we adapt immutable features and cost weights per cluster, allowing GANs to generate more actionable and user-aligned counterfactuals. Experiments on the German Credit dataset show that our approach achieves a 6x improvement in prediction gain and a 30% reduction in sparsity compared to a baseline CounterGAN, while maintaining plausibility and acceptable latency for online use.
2026
Authors
Pandey, S; Sharma, S; Kumar, R; Moreira, JM; Chandra, J;
Publication
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS
Abstract
Traffic flow prediction remains a complex task due to the intricate spatial and temporal correlations in real-world traffic data. Although existing graph neural network (GNN) approaches have shown promise in capturing these relationships, their high computational requirements limit their suitability for real-time deployment. To overcome these limitations, we propose spatiotemporal adaptive refinement with knowledge distillation (STARK), a novel and efficient framework that integrates graph fusion with adaptive knowledge distillation (AKD) in a spatiotemporal graph convolutional network (STGCN). Our method leverages graph fusion to capture both localized and global traffic dynamics, enhancing adaptability across diverse traffic conditions. It further employs two dedicated teacher models that independently emphasize spatial and temporal features, guiding a lightweight student model through a distillation process that dynamically adjusts based on prediction uncertainty. This adaptive learning mechanism enables the student model to prioritize and better learn from more difficult prediction instances. Evaluations on four benchmark traffic datasets [PEMS03, PEMS04, PEMSD7(M), and PEMS08] demonstrate that STARK achieves competitive predictive performance, measured by mean absolute error (MAE) and root mean square error (RMSE), while significantly reducing computational overhead. Our approach thus offers an effective and scalable solution for real-time traffic forecasting.
2026
Authors
Ermakova, L; Campos, R; Bosser, AG; Miller, T;
Publication
EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION, CLEF 2025
Abstract
Humour poses a unique challenge for artificial intelligence, as it often relies on non-literal language, cultural references, and linguistic creativity. The JOKER Lab, now in its fourth year, aims to advance computational humour research through shared tasks on curated, multilingual datasets, with applications in education, computer-mediated communication and translation, and conversational AI. This paper provides an overview of the JOKER Lab held at CLEF 2025, detailing the setup and results of its three main tasks: (1) humour-aware information retrieval, which involves searching a document collection for humorous texts relevant to user queries in either English or Portuguese; (2) pun translation, focussed on humour-preserving translation of paronomastic jokes from English into French; and (3) onomastic wordplay translation, a task addressing the translation of name-based wordplay from English into French. The 2025 edition builds upon previous iterations by expanding datasets and emphasising nuanced, manual evaluation methods. The Task 1 results show a marked improvement this year, apparently due to participants' judicious combination of retrieval and filtering techniques. Tasks 2 and 3 remain challenging, not only in terms of system performance but also in terms of defining meaningful and reliable evaluation metrics.
2026
Authors
Campos, R; Sequeira, R; Nerea, S; Cantante, I; Folques, D; Cunha, LF; Canavilhas, J; Branco, A; Jorge, A; Nunes, S; Guimarães, N; Silvano, P;
Publication
ECIR (4)
Abstract
Fact-checking remains a demanding and time-consuming task, still largely dependent on manual verification and unable to match the rapid spread of misinformation online. This is particularly important because debunking false information typically takes longer to reach consumers than the misinformation itself; accelerating corrections through automation can therefore help counter it more effectively. Although many organizations perform manual fact-checking, this approach is difficult to scale given the growing volume of digital content. These limitations have motivated interest in automating fact-checking, where identifying claims is a crucial first step. However, progress has been uneven across languages, with English dominating due to abundant annotated data. Portuguese, like other languages, still lacks accessible, licensed datasets, limiting research, Natural Language Processing (NLP) developments, and applications. In this paper, we introduce ClaimPT, a dataset of European Portuguese news articles annotated for factual claims, comprising 1,308 articles and 6,875 individual annotations. Unlike most existing resources based on social media or parliamentary transcripts, ClaimPT focuses on journalistic content, collected through a partnership with LUSA, the Portuguese News Agency. To ensure annotation quality, two trained annotators labeled each article, with a curator validating all annotations according to a newly proposed scheme. We also provide baseline models for claim detection, establishing initial benchmarks and enabling future NLP and Information Retrieval (IR) applications. By releasing ClaimPT, we aim to advance research on low-resource fact-checking and enhance understanding of misinformation in news media.
2026
Authors
Silva, R; Evans, JP; Isidro, J; Marques, M; Fonseca, A; Morais, R; Canavilhas, J; Pasquali, A; Silvano, P; Jorge, A; Guimarães, N; Nunes, S; Campos, R;
Publication
ECIR (4)
Abstract
City council minutes are typically lengthy and formal documents with a bureaucratic writing style. Although publicly available, their structure often makes it difficult for citizens or journalists to efficiently find information. In this demo, we present CitiLink, a platform designed to transform unstructured municipal meeting minutes into structured and searchable data, demonstrating how NLP and IR can enhance the accessibility and transparency of local government. The system employs LLMs to extract metadata, discussed subjects, and voting outcomes, which are then indexed in a database to support full-text search with BM25 ranking and faceted filtering through a user-friendly interface. The system was built over a collection of 120 minutes made available by six Portuguese municipalities. To assess its usability, CitiLink was tested through guided sessions with municipal personnel, providing insights into how real users interact with the system. In addition, we evaluated Gemini’s performance in extracting relevant information from the minutes.
2026
Authors
Evans, JP; Cunha, LF; Silvano, P; Jorge, A; Guimarães, N; Nunes, S; Campos, R;
Publication
CoRR