Details
Name
Nuno Ricardo Guimarães
Position
Assistant Researcher
Since
01 December 2015
Nationality
Portugal
Centre
Laboratório de Inteligência Artificial e Apoio à Decisão (Artificial Intelligence and Decision Support Laboratory)
Contacts
+351220402963
nuno.r.guimaraes@inesctec.pt
2025
Authors
Pacheco, AF; Guimarães, N; Torres, A; Silvano, P; Almeida, I
Publication
Revista da Associação Portuguesa de Linguística
2025
Authors
Leite, M; Silva, RR; Guimarães, N; Stork, L; Jorge, A
Publication
Progress in Artificial Intelligence - 24th EPIA Conference on Artificial Intelligence, EPIA 2025, Faro, Portugal, October 1-3, 2025, Proceedings, Part I
Abstract
Providing healthcare professionals with quick access to structured standardized information enables comprehensive analysis and improves clinical decision-making. However, an important part of the records in health institutions is in the form of free text. This paper proposes a pipeline that automatically extracts medical information from Electronic Medical Records (EMRs), based on large language models (LLMs) and a domain ontology defined and validated in collaboration with a medical expert. The output is a knowledge graph of clinical narratives that can be used to search through repositories of EMRs or discover new facts. To promote the standardization of the extracted medical terms, we link them to existing international coding systems using biomedical repositories (UMLS - Unified Medical Language System and BioPortal - Biomedical Ontology Repository). We showcase our approach on a set of Portuguese clinical texts of cases of Acute Myeloid Leukemia (AML) guided by one medical expert. We evaluate the quality of the extraction and of the knowledge graph.
2025
Authors
Henriques, L; Guimarães, N; Jorge, A
Publication
Progress in Artificial Intelligence - 24th EPIA Conference on Artificial Intelligence, EPIA 2025, Faro, Portugal, October 1-3, 2025, Proceedings, Part I
Abstract
The ever-increasing volume of data produced in Healthcare demands solutions capable of automatically extracting the relevant elements of their narratives. However, given privacy regulations, bureaucratic procedures, and annotation efforts, the development of said solutions via Natural Language Processing (NLP) systems becomes hindered due to training data scarcity. Such scarcity increases when we consider languages and language varieties with lower resource availability, such as European and Brazilian Portuguese. To address this problem, we propose a Large Language Model (LLM)-based SDG (Synthetic Data Generation) framework to generate and annotate synthetic clinical texts for medical Named-Entity Recognition (NER). The SDG framework consists of a system/user prompt augmented with real examples, powered by GPT-4o. Our results show that, by feeding the framework few real clinical annotated texts, we can generate synthetic data capable of increasing the performance of NER models with respect to their non-augmented counterparts. In addition, the reduction of the BLEU scores in the generated texts indicates a decrease in the risk of privacy disclosure while ensuring greater lexical diversity. These results highlight the potential of synthetic data as a solution to overcome human annotation bottlenecks and privacy concerns, laying the groundwork for future research in clinical NLP across tasks, domains, and low-resource languages.
2025
Authors
Muratov, A; Shaikh, HF; Jani, V; Mahmoud, T; Xie, Z; Orel, D; Singh, A; Wang, Y; Joshi, A; Iqbal, H; Hee, MS; Sahnan, D; Nikolaidis, N; Silvano, P; Dimitrov, D; Yangarber, R; Campos, R; Jorge, A; Guimarães, N; Sartori, E; Stefanovitch, N; San Martino, GD; Piskorski, J; Nakov, P
Publication
CoRR
2025
Authors
Nikolaidis, N; Stefanovitch, N; Silvano, P; Dimitrov, D; Yangarber, R; Guimarães, N; Sartori, E; Androutsopoulos, I; Nakov, P; Da San Martino, G; Piskorski, J
Publication
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Vol 1: Long Papers
Abstract
We present PolyNarrative, a new multilingual dataset of news articles, annotated for narratives. Narratives are overt or implicit claims, recurring across articles and languages, promoting a specific interpretation or viewpoint on an ongoing topic, often propagating mis/disinformation. We developed two-level taxonomies with coarse- and fine-grained narrative labels for two domains: (i) climate change and (ii) the military conflict between Ukraine and Russia. We collected news articles in four languages (Bulgarian, English, Portuguese, and Russian) related to the two domains and manually annotated them at the paragraph level. We make the dataset publicly available, along with experimental results of several strong baselines that assign narrative labels to news articles at the paragraph or the document level. We believe that this dataset will foster research in narrative detection and enable new research directions towards more multi-domain and highly granular narrative related tasks.