Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Publicações

Publicações por Alípio Jorge

2025

Knowledge-Aware Clinical Narrative Extraction Using Ontologies and Knowledge Graphs

Autores
Leite, M; Silva, RR; Guimarães, N; Stork, L; Jorge, A;

Publicação
EPIA (1)

Abstract
Providing healthcare professionals with quick access to structured standardized information enables comprehensive analysis and improves clinical decision-making. However, an important part of the records in health institutions is in the form of free text. This paper proposes a pipeline that automatically extracts medical information from Electronic Medical Records (EMRs), based on large language models (LLMs) and a domain ontology defined and validated in collaboration with a medical expert. The output is a knowledge graph of clinical narratives that can be used to search through repositories of EMRs or discover new facts. To promote the standardization of the extracted medical terms, we link them to existing international coding systems using biomedical repositories (UMLS - Unified Medical Language System and BioPortal - Biomedical Ontology Repository). We showcase our approach on a set of Portuguese clinical texts of cases of Acute Myeloid Leukemia (AML) guided by one medical expert. We evaluate the quality of the extraction and of the knowledge graph.

2025

LLM-Based Framework for Synthetic Data Generation in Portuguese Clinical NER

Autores
Henriques, L; Guimarães, N; Jorge, A;

Publicação
EPIA (1)

Abstract
The ever-increasing volume of data produced in Healthcare demands solutions capable of automatically extracting the relevant elements of their narratives. However, given privacy regulations, bureaucratic procedures, and annotation efforts, the development of said solutions via Natural Language Processing (NLP) systems becomes hindered due to training data scarcity. Such scarcity increases when we consider languages and language varieties with lower resource availability, such as European and Brazilian Portuguese. To address this problem, we propose a Large Language Model (LLM)-based SDG (Synthetic Data Generation) framework to generate and annotate synthetic clinical texts for medical Named-Entity Recognition (NER). The SDG framework consists of a system/user prompt augmented with real examples, powered by GPT-4o. Our results show that, by feeding the framework few real clinical annotated texts, we can generate synthetic data capable of increasing the performance of NER models with respect to their non-augmented counterparts. In addition, the reduction of the BLEU scores in the generated texts indicates a decrease in the risk of privacy disclosure while ensuring greater lexical diversity. These results highlight the potential of synthetic data as a solution to overcome human annotation bottlenecks and privacy concerns, laying the groundwork for future research in clinical NLP across tasks, domains, and low-resource languages.

2026

Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track - European Conference, ECML PKDD 2025, Porto, Portugal, September 15-19, 2025, Proceedings, Part IX

Autores
Dutra, I; Pechenizkiy, M; Cortez, P; Pashami, S; Jorge, AM; Soares, C; Abreu, PH; Gama, J;

Publicação
ECML/PKDD (9)

Abstract

2026

Machine Learning and Knowledge Discovery in Databases. Research Track and Applied Data Science Track - European Conference, ECML PKDD 2025, Porto, Portugal, September 15-19, 2025, Proceedings, Part VIII

Autores
Pfahringer, B; Japkowicz, N; Larrañaga, P; Ribeiro, RP; Dutra, I; Pechenizkiy, M; Cortez, P; Pashami, S; Jorge, AM; Soares, C; Abreu, PH; Gama, J;

Publicação
ECML/PKDD (8)

Abstract

2026

Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track and Demo Track - European Conference, ECML PKDD 2025, Porto, Portugal, September 15-19, 2025, Proceedings, Part X

Autores
Dutra, I; Pechenizkiy, M; Cortez, P; Pashami, S; Pasquali, A; Moniz, N; Jorge, AM; Soares, C; Abreu, PH; Gama, J;

Publicação
ECML/PKDD (10)

Abstract

2025

Report on the 8th Workshop on Narrative Extraction from Texts (Text2Story 2025) at ECIR 2025

Autores
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M; Cordeiro, JP; Rocha, C; Sousa, HO; Cunha, LF; Mansouri, B;

Publicação
SIGIR Forum

Abstract
The Eighth International Workshop on Narrative Extraction from Texts (Text2Story'25) was held on April 10 th , 2025, in conjunction with the 47 th European Conference on Information Retrieval (ECIR 2025) in Lucca, Italy. During this half-day event, more than 30 attendees engaged in discussions and presentations focused on recent advancements in narrative representation, extraction, and generation. The workshop featured a keynote address and a mix of oral presentations and poster sessions covering nineteen papers. The workshop proceedings are available online 1 . Date: 10 April 2025. Website: https://text2story25.inesctec.pt/.

  • 21
  • 46