Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Tópicos
de interesse
Detalhes

Detalhes

  • Nome

    Nuno Ricardo Guimarães
  • Cargo

    Investigador Auxiliar
  • Desde

    01 dezembro 2015
007
Publicações

2025

MedLink: Retrieval and Ranking of Case Reports to Assist Clinical Decision Making

Autores
Cunha, LF; Guimarães, N; Mendes, A; Campos, R; Jorge, A;

Publicação
Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V

Abstract
In healthcare, diagnoses usually rely on physician expertise. However, complex cases may benefit from consulting similar past clinical reports cases. In this paper, we present MedLink (http://medlink.inesctec.pt), a tool that given a free-text medical report, retrieves and ranks relevant clinical case reports published in health conferences and journals, aiming to support clinical decision-making, particularly in challenging or complex diagnoses. To this regard, we trained two BERT models on the sentence similarity task: a bi-encoder for retrieval and a cross-encoder for reranking. To evaluate our approach, we used 10 medical reports and asked a physician to rank the top 10 most relevant published case reports for each one. Our results show that MedLink’s ranking model achieved NDCG@10 of 0.747. Our demo also includes the visualization of clinical entities (using a NER model) and the production of a textual explanation (using a LLM) to ease comparison and contrasting between reports. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

2025

Automating Data Extraction from PDF Sleep Reports Using Data Mining Techniques

Autores
Teixeira, F; Costa, J; Amorim, P; Guimarães, N; Ferreira Santos, D;

Publicação
Studies in health technology and informatics

Abstract
This work introduces a web application for extracting, processing, and visualizing data from sleep studies reports. Using Optical Character Recognition (OCR) and Natural Language Processing (NLP), the pipeline extracts over 75 key data points from four types of sleep reports. The web application offers an intuitive interface to view individual reports' details and aggregate data from multiple reports. The pipeline demonstrated 100% accuracy in extracting targeted information from a test set of 40 reports, even in cases with missing data or formatting inconsistencies. The developed tool streamlines the analysis of OSA reports, reducing the need for technical expertise and enabling healthcare providers and researchers to utilize sleep study data efficiently. Future work aims to expand the dataset for more complex analyses and imputation techniques.

2025

PolyNarrative: A Multilingual, Multilabel, Multi-domain Dataset for Narrative Extraction from News Articles

Autores
Nikolaidis, N; Stefanovitch, N; Silvano, P; Dimitrov, DI; Yangarber, R; Guimarães, N; Sartori, E; Androutsopoulos, I; Nakov, P; San Martino, GD; Piskorski, J;

Publicação
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2025, Vienna, Austria, July 27 - August 1, 2025

Abstract

2025

Human Experts vs. Large Language Models: Evaluating Annotation Scheme and Guidelines Development for Clinical Narratives

Autores
Fernandes, AL; Silvano, P; Guimarães, N; Silva, RR; Munna, TA; Cunha, LF; Leal, A; Campos, R; Jorge, A;

Publicação
Proceedings of Text2Story - Eighth Workshop on Narrative Extraction From Texts held in conjunction with the 47th European Conference on Information Retrieval (ECIR 2025), Lucca, Italy, April 10, 2025.

Abstract
Electronic Health Records (EHRs) contain vast amounts of unstructured narrative text, posing challenges for organization, curation, and automated information extraction in clinical and research settings. Developing effective annotation schemes is crucial for training extraction models, yet it remains complex for both human experts and Large Language Models (LLMs). This study compares human- and LLM-generated annotation schemes and guidelines through an experimental framework. In the first phase, both a human expert and an LLM created annotation schemes based on predefined criteria. In the second phase, experienced annotators applied these schemes following the guidelines. In both cases, the results were qualitatively evaluated using Likert scales. The findings indicate that the human-generated scheme is more comprehensive, coherent, and clear compared to those produced by the LLM. These results align with previous research suggesting that while LLMs show promising performance with respect to text annotation, the same does not apply to the development of annotation schemes, and human validation remains essential to ensure accuracy and reliability. © 2025 Copyright for this paper by its authors.

2025

Using LLMs to Generate Patient Journeys in Portuguese: an Experiment

Autores
Munna, TA; Fernandes, AL; Silvano, P; Guimarães, N; Jorge, A;

Publicação
Proceedings of Text2Story - Eighth Workshop on Narrative Extraction From Texts held in conjunction with the 47th European Conference on Information Retrieval (ECIR 2025), Lucca, Italy, April 10, 2025.

Abstract
The relationship of a patient with a hospital from admission to discharge is often kept in a series of textual documents that describe the patient’s journey. These documents are important to analyze the different steps of the clinical process and to make aggregated studies of the paths of patients in the hospital. In this paper, we explore the potential of Large Language Models (LLMs) to generate realistic and comprehensive patient journeys in European Portuguese, addressing the scarcity of medical data in this specific context. We employed Google’s Gemini 1.5 Flash model and utilized a dataset of 285 European Portuguese published case reports from the SPMI website, published by the Portuguese Society of Internal Medicine, as references for generating synthetic medical reports. Our methodology involves a sequential approach to generating a synthetic patient journey. Initially, we generate an admission report, followed by a discharge report. Subsequently, we generate a comprehensive patient journey that integrates the admission, multiple daily progress reports, and the discharge into a cohesive narrative. This end-to-end process ensures a realistic and detailed representation of the patient’s clinical pathway as a patient’s journey. The generated reports were rigorously evaluated by medical and linguistic professionals, as well as automatic metrics to measure the inclusion of key medical entities, similarity to the case report, and correct Portuguese variant. Both qualitative and quantitative evaluations confirmed that the generated synthetic reports are predominantly written in European Portuguese without the loss of important medical information from the case reports. This work contributes to developing high-quality synthetic medical data for training LLMs and advancing AI-driven healthcare applications in under-resourced language settings. © 2025 Copyright for this paper by its authors.

Teses
supervisionadas

2023

Imbalanced MultiClass Classification with Concept Drift

Autor
José Gabriel Moreira Pinto

Instituição
UP-FCUP