2025
Authors
Munna, TA; Fernandes, AL; Silvano, P; Guimarães, N; Jorge, A;
Publication
Proceedings of Text2Story - Eighth Workshop on Narrative Extraction From Texts held in conjunction with the 47th European Conference on Information Retrieval (ECIR 2025), Lucca, Italy, April 10, 2025.
Abstract
The relationship of a patient with a hospital from admission to discharge is often kept in a series of textual documents that describe the patient’s journey. These documents are important to analyze the different steps of the clinical process and to make aggregated studies of the paths of patients in the hospital. In this paper, we explore the potential of Large Language Models (LLMs) to generate realistic and comprehensive patient journeys in European Portuguese, addressing the scarcity of medical data in this specific context. We employed Google’s Gemini 1.5 Flash model and used a dataset of 285 published case reports in European Portuguese, taken from the website of the Portuguese Society of Internal Medicine (SPMI), as references for generating synthetic medical reports. Our methodology involves a sequential approach to generating a synthetic patient journey. Initially, we generate an admission report, followed by a discharge report. Subsequently, we generate a comprehensive patient journey that integrates the admission, multiple daily progress reports, and the discharge into a cohesive narrative. This end-to-end process ensures a realistic and detailed representation of the patient’s clinical pathway as a patient journey. The generated reports were rigorously evaluated by medical and linguistic professionals, as well as with automatic metrics that measure the inclusion of key medical entities, similarity to the case report, and use of the correct Portuguese variant. Both qualitative and quantitative evaluations confirmed that the generated synthetic reports are predominantly written in European Portuguese without the loss of important medical information from the case reports. This work contributes to developing high-quality synthetic medical data for training LLMs and advancing AI-driven healthcare applications in under-resourced language settings. © 2025 Copyright for this paper by its authors.
2025
Authors
Teixeira, F; Costa, J; Amorim, P; Guimarães, N; Ferreira Santos, D;
Publication
Studies in health technology and informatics
Abstract
This work introduces a web application for extracting, processing, and visualizing data from sleep study reports. Using Optical Character Recognition (OCR) and Natural Language Processing (NLP), the pipeline extracts over 75 key data points from four types of sleep reports. The web application offers an intuitive interface to view the details of individual reports and to aggregate data from multiple reports. The pipeline demonstrated 100% accuracy in extracting targeted information from a test set of 40 reports, even in cases with missing data or formatting inconsistencies. The developed tool streamlines the analysis of obstructive sleep apnea (OSA) reports, reducing the need for technical expertise and enabling healthcare providers and researchers to use sleep study data efficiently. Future work aims to expand the dataset to support more complex analyses and imputation techniques.
2024
Authors
Guimarães, N; Campos, R; Jorge, A;
Publication
Proceedings of the 16th International Conference on Computational Processing of Portuguese, PROPOR 2024, Santiago de Compostela, Galicia/Spain, March 12-15, 2024, Volume 2
Abstract
2025
Authors
Cunha, LF; Guimarães, N; Mendes, A; Campos, R; Jorge, A;
Publication
Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V
Abstract
In healthcare, diagnoses usually rely on physician expertise. However, complex cases may benefit from consulting similar past clinical case reports. In this paper, we present MedLink (http://medlink.inesctec.pt), a tool that, given a free-text medical report, retrieves and ranks relevant clinical case reports published in health conferences and journals, aiming to support clinical decision-making, particularly in challenging or complex diagnoses. To this end, we trained two BERT models on the sentence similarity task: a bi-encoder for retrieval and a cross-encoder for reranking. To evaluate our approach, we used 10 medical reports and asked a physician to rank the top 10 most relevant published case reports for each one. Our results show that MedLink’s ranking model achieved an NDCG@10 of 0.747. Our demo also includes the visualization of clinical entities (using a NER model) and the production of a textual explanation (using an LLM) to ease comparison and contrasting between reports. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
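For readers unfamiliar with the metric reported above, the following is a minimal sketch of how NDCG@10 is commonly computed for a single query. It is not taken from the MedLink paper: the function names, the linear gain, and the choice to build the ideal ranking from the same list of graded relevances are all assumptions made for illustration.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results.

    `relevances` holds graded relevance labels in ranked order
    (hypothetical input; position 0 is the top-ranked result).
    """
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering.

    Assumption: the ideal ordering is built from the same labels,
    so a perfectly ordered list scores 1.0.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Under these assumptions, a per-system score such as the one quoted in the abstract would typically be the mean of `ndcg_at_k` over all evaluated queries.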
2014
Authors
Ribeiro, José Carlos; Ramos, Helena; Ferro-Lebres, Vera; Aires, Luísa; Mota, Jorge; Guimarães, Nuno; Esteves, Raquel; Moreira, Pedro; Marçal, Gustavo;
Publication
19th ANNUAL CONGRESS OF THE EUROPEAN COLLEGE OF SPORT SCIENCE
Abstract
Childhood obesity is a consequence of environments that disrupt the balance of energy intake and energy expenditure. Obesogenic environments consist of social norms and environmental factors that facilitate unhealthy behaviors around diet and physical activity. Nutritional knowledge and physical activity are cornerstones of every obesity treatment. The aims are to understand and compare how nutritional knowledge and physical activity (PA) patterns occur in children and adolescents, and whether there are any differences by gender.
Methods The sample comprised 467 children and adolescents, 237 of them boys. PA was measured using Actigraph accelerometers (GT3X). Participants were instructed to use the accelerometer according to standard procedures, and data were analyzed using the recommended guidelines (Evenson et al, 2008). Nutritional Knowledge (NK) was assessed using the General Nutrition Questionnaire for Portuguese Adolescents, and results are presented as a Final Nutritional Score, in accordance with standard procedures (Ferro-Lebres, V, Ribeiro, J, Moreira, P, 2014). Height, weight, and body mass index were also assessed. Univariate Analysis of Variance-GLM was used to compare genders, adjusted for the students' different school levels, using SPSS.
Results Our results show higher (p<0,05) nutritional scores for girls (67,1) than boys (63,6 score). In contrast, boys (as expected) present significantly higher amounts of moderate to vigorous PA (MVPA) compared to girls (71,6 min/day vs 42,3 min/day; p<0,01). Additionally, 14,7% of girls and 17,4% of boys were overweight/obese.
Discussion Other studies have observed similar results regarding MVPA in boys and girls, but NK about diet and nutrition is also crucial for the treatment and prevention of obesity in children. It is therefore important to understand whether higher NK scores would lead to better nutritional practices: would increasing students' NK about the nutrient content of foods improve their daily practices? Do children and adolescents with better NK behave differently regarding PA practices?
2024
Authors
Piskorski, J; Stefanovitch, N; Alam, F; Campos, R; Dimitrov, D; Jorge, A; Pollak, S; Ribin, N; Fijavz, Z; Hasanain, M; Silvano, P; Sartori, E; Guimarães, N; Vitez, AZ; Pacheco, AF; Koychev, I; Yu, N; Nakov, P; San Martino, GD;
Publication
Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, 9-12 September, 2024.
Abstract
We present an overview of CheckThat! Lab's 2024 Task 3, which focuses on detecting 23 persuasion techniques at the text-span level in online media. The task covers five languages, namely, Arabic, Bulgarian, English, Portuguese, and Slovene, and highly debated topics in the media, e.g., the Israeli-Palestinian conflict, the Russia-Ukraine war, climate change, COVID-19, and abortion. A total of 23 teams registered for the task, and two of them submitted system responses, which were compared against a baseline and a task organizers' system that used a state-of-the-art transformer-based architecture. We provide a description of the dataset and the overall task setup, including the evaluation methodology, and an overview of the participating systems. The datasets, together with the evaluation scripts, are released to the research community, which we believe will foster research on persuasion technique detection and analysis of online media content in various fields and contexts. © 2024 Copyright for this paper by its authors.