Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
Publications

Publications by Alípio Jorge

2025

Human Experts vs. Large Language Models: Evaluating Annotation Scheme and Guidelines Development for Clinical Narratives

Authors
Fernandes, AL; Silvano, P; Guimarães, N; Silva, RR; Munna, TA; Cunha, LF; Leal, A; Campos, R; Jorge, A;

Publication
Proceedings of Text2Story - Eighth Workshop on Narrative Extraction From Texts held in conjunction with the 47th European Conference on Information Retrieval (ECIR 2025), Lucca, Italy, April 10, 2025.

Abstract
Electronic Health Records (EHRs) contain vast amounts of unstructured narrative text, posing challenges for organization, curation, and automated information extraction in clinical and research settings. Developing effective annotation schemes is crucial for training extraction models, yet it remains complex for both human experts and Large Language Models (LLMs). This study compares human- and LLM-generated annotation schemes and guidelines through an experimental framework. In the first phase, both a human expert and an LLM created annotation schemes based on predefined criteria. In the second phase, experienced annotators applied these schemes following the guidelines. In both cases, the results were qualitatively evaluated using Likert scales. The findings indicate that the human-generated scheme is more comprehensive, coherent, and clear compared to those produced by the LLM. These results align with previous research suggesting that while LLMs show promising performance with respect to text annotation, the same does not apply to the development of annotation schemes, and human validation remains essential to ensure accuracy and reliability. © 2025 Copyright for this paper by its authors.

2025

Using LLMs to Generate Patient Journeys in Portuguese: an Experiment

Authors
Munna, TA; Fernandes, AL; Silvano, P; Guimarães, N; Jorge, A;

Publication
Proceedings of Text2Story - Eighth Workshop on Narrative Extraction From Texts held in conjunction with the 47th European Conference on Information Retrieval (ECIR 2025), Lucca, Italy, April 10, 2025.

Abstract
The relationship of a patient with a hospital from admission to discharge is often kept in a series of textual documents that describe the patient’s journey. These documents are important to analyze the different steps of the clinical process and to make aggregated studies of the paths of patients in the hospital. In this paper, we explore the potential of Large Language Models (LLMs) to generate realistic and comprehensive patient journeys in European Portuguese, addressing the scarcity of medical data in this specific context. We employed Google’s Gemini 1.5 Flash model and utilized a dataset of 285 European Portuguese published case reports from the SPMI website, published by the Portuguese Society of Internal Medicine, as references for generating synthetic medical reports. Our methodology involves a sequential approach to generating a synthetic patient journey. Initially, we generate an admission report, followed by a discharge report. Subsequently, we generate a comprehensive patient journey that integrates the admission, multiple daily progress reports, and the discharge into a cohesive narrative. This end-to-end process ensures a realistic and detailed representation of the patient’s clinical pathway as a patient’s journey. The generated reports were rigorously evaluated by medical and linguistic professionals, as well as automatic metrics to measure the inclusion of key medical entities, similarity to the case report, and correct Portuguese variant. Both qualitative and quantitative evaluations confirmed that the generated synthetic reports are predominantly written in European Portuguese without the loss of important medical information from the case reports. This work contributes to developing high-quality synthetic medical data for training LLMs and advancing AI-driven healthcare applications in under-resourced language settings. © 2025 Copyright for this paper by its authors.

2025

FRaN-X: FRaming and Narratives-eXplorer

Authors
Muratov, A; Shaikh, HF; Jani, V; Mahmoud, T; Xie, Z; Orel, D; Singh, A; Wang, Y; Joshi, A; Iqbal, H; Hee, MS; Sahnan, D; Nikolaidis, N; Silvano, P; Dimitrov, D; Yangarber, R; Campos, R; Jorge, A; Guimarães, N; Sartori, E; Stefanovitch, N; San Martino, GD; Piskorski, J; Nakov, P;

Publication
CoRR

Abstract

2025

Report on the 8th Workshop on Narrative Extraction from Texts (Text2Story 2025) at ECIR 2025

Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M; Cordeiro, JP; Rocha, C; Sousa, HO; Cunha, LF; Mansouri, B;

Publication
SIGIR Forum

Abstract
The Eighth International Workshop on Narrative Extraction from Texts (Text2Story'25) was held on April 10 th , 2025, in conjunction with the 47 th European Conference on Information Retrieval (ECIR 2025) in Lucca, Italy. During this half-day event, more than 30 attendees engaged in discussions and presentations focused on recent advancements in narrative representation, extraction, and generation. The workshop featured a keynote address and a mix of oral presentations and poster sessions covering nineteen papers. The workshop proceedings are available online 1 . Date: 10 April 2025. Website: https://text2story25.inesctec.pt/.

2025

ICDAR 2025 Competition on Automatic Classification of Literary Epochs

Authors
Rabaev, I; Litvak, M; Bass, R; Campos, R; Jorge, AM; Jatowt, A;

Publication
Document Analysis and Recognition - ICDAR 2025 - 19th International Conference, Wuhan, China, September 16-21, 2025, Proceedings, Part V

Abstract
This report describes the ICDAR 2025 Competition on Automatic Classification of Literary Epochs (ICDAR 2025 CoLiE), which consisted of two tasks focused on automatic prediction of the time in which a book was written (date of first publication). Both tasks comprised two sub-tasks, where a related fine-grained classification was addressed. Task 1 consisted of the identification of literary epochs, such as Romanticism or Modernism (sub-task 1.1), and a more precise classification of the period within the epoch (sub-task 1.2). Task 2 addressed the chronological identification of century (sub-task 2.1) or decade (sub-task 2.2). The compiled dataset and the reported findings are valuable to the scientific community and contribute to advancing research in the automatic dating of texts and its applications in digital humanities and temporal text analysis. © 2025 Elsevier B.V., All rights reserved.

2025

Proceedings of Text2Story - Eighth Workshop on Narrative Extraction From Texts held in conjunction with the 47th European Conference on Information Retrieval (ECIR 2025), Lucca, Italy, April 10, 2025

Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M;

Publication
Text2Story@ECIR

Abstract

  • 43
  • 46