2025
Authors
Campos, R; Jorge, A; Jatowt, A; Bhatia, S; Litvak, M;
Publication
Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V
Abstract
For seven years, the Text2Story Workshop series has fostered a vibrant community dedicated to understanding narrative structure in text, resulting in significant contributions to the field and in the development of a shared understanding of the challenges in this domain. While traditional methods have yielded valuable insights, the advent of Transformers and LLMs has ignited a new wave of interest in narrative understanding. The previous iteration of the workshop also witnessed a surge in LLM-based approaches, demonstrating the community’s growing recognition of their potential. In this eighth edition, we propose to go deeper into the role of LLMs in narrative understanding. While LLMs have revolutionized the field of NLP and are the go-to tools for any NLP task, the ability to capture, represent, and analyze contextual nuances in longer texts remains an elusive goal, let alone the understanding of consistent, fine-grained narrative structures in text. Consequently, this iteration of the workshop will explore the issues involved in using LLMs to unravel narrative structures, while also examining the characteristics of narratives generated by LLMs. By fostering dialogue on these emerging areas, we aim to continue the workshop’s tradition of driving innovation in narrative understanding research. Text2Story encompasses sessions covering full research papers, work-in-progress, demos, resources, position and dissemination papers, along with one keynote talk. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
2025
Authors
Cunha, LF; Yu, N; Silvano, P; Campos, R; Jorge, A;
Publication
Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V
Abstract
Manual text annotation is a complex and time-consuming task. However, recent advancements demonstrate that such a task can be accelerated with automated pre-annotation. In this paper, we present a methodology to improve the efficiency of manual text annotation by leveraging LLMs for text pre-annotation. For this purpose, we train a BERT model for a token classification task and integrate it into the INCEpTION annotation tool to generate span-level suggestions for human annotators. To assess the usefulness of our approach, we conducted an experiment where an experienced linguist annotated plain text both with and without our model’s pre-annotations. Our results show that the model-assisted approach reduces annotation time by nearly 23%. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
2025
Authors
Fernandes, AL; Silvano, P; Guimarães, N; Silva, RR; Munna, TA; Cunha, LF; Leal, A; Campos, R; Jorge, A;
Publication
Proceedings of Text2Story - Eighth Workshop on Narrative Extraction From Texts held in conjunction with the 47th European Conference on Information Retrieval (ECIR 2025), Lucca, Italy, April 10, 2025.
Abstract
Electronic Health Records (EHRs) contain vast amounts of unstructured narrative text, posing challenges for organization, curation, and automated information extraction in clinical and research settings. Developing effective annotation schemes is crucial for training extraction models, yet it remains complex for both human experts and Large Language Models (LLMs). This study compares human- and LLM-generated annotation schemes and guidelines through an experimental framework. In the first phase, both a human expert and an LLM created annotation schemes based on predefined criteria. In the second phase, experienced annotators applied these schemes following the guidelines. In both cases, the results were qualitatively evaluated using Likert scales. The findings indicate that the human-generated scheme is more comprehensive, coherent, and clear than the one produced by the LLM. These results align with previous research suggesting that while LLMs show promising performance with respect to text annotation, the same does not apply to the development of annotation schemes, and human validation remains essential to ensure accuracy and reliability. © 2025 Copyright for this paper by its authors.
2025
Authors
Muratov, A; Shaikh, HF; Jani, V; Mahmoud, T; Xie, Z; Orel, D; Singh, A; Wang, Y; Joshi, A; Iqbal, H; Hee, MS; Sahnan, D; Nikolaidis, N; Silvano, P; Dimitrov, D; Yangarber, R; Campos, R; Jorge, A; Guimarães, N; Sartori, E; Stefanovitch, N; San Martino, GD; Piskorski, J; Nakov, P;
Publication
CoRR
Abstract
2025
Authors
Sousa, HO; Campos, R; Jorge, A;
Publication
CoRR
Abstract
2025
Authors
Rabaev, I; Litvak, M; Bass, R; Campos, R; Jorge, AM; Jatowt, A;
Publication
Document Analysis and Recognition - ICDAR 2025 - 19th International Conference, Wuhan, China, September 16-21, 2025, Proceedings, Part V
Abstract
This report describes the ICDAR 2025 Competition on Automatic Classification of Literary Epochs (ICDAR 2025 CoLiE), which consisted of two tasks focused on the automatic prediction of the time in which a book was written (date of first publication). Each task comprised two sub-tasks, each addressing a related fine-grained classification. Task 1 consisted of the identification of literary epochs, such as Romanticism or Modernism (sub-task 1.1), and a more precise classification of the period within the epoch (sub-task 1.2). Task 2 addressed the chronological identification of the century (sub-task 2.1) or the decade (sub-task 2.2). The compiled dataset and the reported findings are valuable to the scientific community and contribute to advancing research in the automatic dating of texts and its applications in digital humanities and temporal text analysis. © 2025 Elsevier B.V., All rights reserved.