Publications

Publications by Alípio Jorge

2026

LLM-Based Framework for Synthetic Data Generation in Portuguese Clinical NER

Authors
Henriques, L; Guimaraes, N; Jorge, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I

Abstract
The ever-increasing volume of data produced in Healthcare demands solutions capable of automatically extracting the relevant elements of their narratives. However, given privacy regulations, bureaucratic procedures, and annotation efforts, the development of said solutions via Natural Language Processing (NLP) systems becomes hindered due to training data scarcity. Such scarcity increases when we consider languages and language varieties with lower resource availability, such as European and Brazilian Portuguese. To address this problem, we propose a Large Language Model (LLM)-based SDG (Synthetic Data Generation) framework to generate and annotate synthetic clinical texts for medical Named-Entity Recognition (NER). The SDG framework consists of a system/user prompt augmented with real examples, powered by GPT-4o. Our results show that, by feeding the framework few real clinical annotated texts, we can generate synthetic data capable of increasing the performance of NER models with respect to their non-augmented counterparts. In addition, the reduction of the BLEU scores in the generated texts indicates a decrease in the risk of privacy disclosure while ensuring greater lexical diversity. These results highlight the potential of synthetic data as a solution to overcome human annotation bottlenecks and privacy concerns, laying the groundwork for future research in clinical NLP across tasks, domains, and low-resource languages.

2026

Machine Learning and Knowledge Discovery in Databases. Research Track and Applied Data Science Track - European Conference, ECML PKDD 2025, Porto, Portugal, September 15-19, 2025, Proceedings, Part VIII

Authors
Pfahringer, B; Japkowicz, N; Larrañaga, P; Ribeiro, RP; Dutra, I; Pechenizkiy, M; Cortez, P; Pashami, S; Jorge, AM; Soares, C; Abreu, PH; Gama, J;

Publication
ECML/PKDD (8)

Abstract

2026

Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track and Demo Track - European Conference, ECML PKDD 2025, Porto, Portugal, September 15-19, 2025, Proceedings, Part X

Authors
Dutra, I; Pechenizkiy, M; Cortez, P; Pashami, S; Pasquali, A; Moniz, N; Jorge, AM; Soares, C; Abreu, PH; Gama, J;

Publication
ECML/PKDD (10)

Abstract

2025

Report on the 8th Workshop on Narrative Extraction from Texts (Text2Story 2025) at ECIR 2025

Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M; Cordeiro, JP; Rocha, C; Sousa, HO; Cunha, LF; Mansouri, B;

Publication
SIGIR Forum

Abstract
The Eighth International Workshop on Narrative Extraction from Texts (Text2Story'25) was held on April 10th, 2025, in conjunction with the 47th European Conference on Information Retrieval (ECIR 2025) in Lucca, Italy. During this half-day event, more than 30 attendees engaged in discussions and presentations focused on recent advancements in narrative representation, extraction, and generation. The workshop featured a keynote address and a mix of oral presentations and poster sessions covering nineteen papers. The workshop proceedings are available online. Date: 10 April 2025. Website: https://text2story25.inesctec.pt/.

2025

The Temporal Game: A New Perspective on Temporal Relation Extraction

Authors
Sousa, H; Campos, R; Jorge, A;

Publication
PROCEEDINGS OF THE 34TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2025

Abstract
In this paper we demo the Temporal Game, a novel approach to temporal relation extraction that casts the task as an interactive game. Instead of directly annotating interval-level relations, our approach decomposes them into point-wise comparisons between the start and end points of temporal entities. At each step, players classify a single point relation, and the system applies temporal closure to infer additional relations and enforce consistency. This point-based strategy naturally supports both interval and instant entities, enabling more fine-grained and flexible annotation than any previous approach. The Temporal Game also lays the groundwork for training reinforcement learning agents, by treating temporal annotation as a sequential decision-making task. To showcase this potential, the demo presented in this paper includes a Game mode, in which users annotate texts from the TempEval-3 dataset and receive feedback based on a scoring system, and an Annotation mode, which allows custom documents to be annotated and the resulting timeline to be exported. Therefore, this demo serves both as a research tool and an annotation interface. The demo is publicly available at https://temporal-game.inesctec.pt, and the source code is open-sourced to foster further research and community-driven development in temporal reasoning and annotation.

2026

ICDAR 2025 Competition on Automatic Classification of Literary Epochs

Authors
Rabaev, I; Litvak, M; Bass, R; Campos, R; Jorge, AM; Jatowt, A;

Publication
DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2025, PT V

Abstract
This report describes the ICDAR 2025 Competition on Automatic Classification of Literary Epochs (ICDAR 2025 CoLiE), which consisted of two tasks focused on automatic prediction of the time in which a book was written (date of first publication). Both tasks comprised two sub-tasks, where a related fine-grained classification was addressed. Task 1 consisted of the identification of literary epochs, such as Romanticism or Modernism (sub-task 1.1), and a more precise classification of the period within the epoch (sub-task 1.2). Task 2 addressed the chronological identification of century (sub-task 2.1) or decade (sub-task 2.2). The compiled dataset and the reported findings are valuable to the scientific community and contribute to advancing research in the automatic dating of texts and its applications in digital humanities and temporal text analysis.
