Nuno Ricardo Guimarães

Porto, Portugal

+351 222 094 000

info@inesctec.pt

About

I am part of the REMINDS (RElevance MINing and Detection System) project, where my focus is on sentiment analysis on social networks.

I completed my Master's degree in Computer Science at the Faculty of Sciences of the University of Porto.

I graduated in Computer Science (BSc) at the Faculty of Sciences of the University of Porto.

Details

Name: Nuno Ricardo Guimarães
Role: Assistant Researcher
Since: 1 December 2015
Nationality: Portugal
Centre: Artificial Intelligence and Decision Support
Contacts: +351220402963, nuno.r.guimaraes@inesctec.pt

Publications

2026

Knowledge-Aware Clinical Narrative Extraction Using Ontologies and Knowledge Graphs

Authors
Leite, M; Rb Silva, R; Guimaraes, N; Stork, L; Jorge, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I

Abstract
Providing healthcare professionals with quick access to structured standardized information enables comprehensive analysis and improves clinical decision-making. However, an important part of the records in health institutions is in the form of free text. This paper proposes a pipeline that automatically extracts medical information from Electronic Medical Records (EMRs), based on large language models (LLMs) and a domain ontology defined and validated in collaboration with a medical expert. The output is a knowledge graph of clinical narratives that can be used to search through repositories of EMRs or discover new facts. We showcase our approach on a set of Portuguese clinical texts of cases of Acute Myeloid Leukemia (AML) guided by one medical expert. We evaluate the quality of the extraction and of the knowledge graph.

2026

LLM-Based Framework for Synthetic Data Generation in Portuguese Clinical NER

Authors
Henriques, L; Guimaraes, N; Jorge, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I

Abstract
The ever-increasing volume of data produced in Healthcare demands solutions capable of automatically extracting the relevant elements of their narratives. However, given privacy regulations, bureaucratic procedures, and annotation efforts, the development of said solutions via Natural Language Processing (NLP) systems becomes hindered due to training data scarcity. Such scarcity increases when we consider languages and language varieties with lower resource availability, such as European and Brazilian Portuguese. To address this problem, we propose a Large Language Model (LLM)-based SDG (Synthetic Data Generation) framework to generate and annotate synthetic clinical texts for medical Named-Entity Recognition (NER). The SDG framework consists of a system/user prompt augmented with real examples, powered by GPT-4o. Our results show that, by feeding the framework few real clinical annotated texts, we can generate synthetic data capable of increasing the performance of NER models with respect to their non-augmented counterparts. In addition, the reduction of the BLEU scores in the generated texts indicates a decrease in the risk of privacy disclosure while ensuring greater lexical diversity. These results highlight the potential of synthetic data as a solution to overcome human annotation bottlenecks and privacy concerns, laying the groundwork for future research in clinical NLP across tasks, domains, and low-resource languages.
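
As a rough illustration of the overlap measure mentioned in the abstract above, the sketch below computes clipped n-gram precision, the core ingredient of BLEU. It is not the full metric (no brevity penalty, no geometric mean over n-gram orders) and is not the authors' evaluation code; the function names and the example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it appears in the
    reference ("clipping"), as in BLEU's modified precision.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Hypothetical texts, invented for illustration only.
real_text = "patient admitted with fever and fatigue"
synthetic_text = "the patient presented fever and tiredness"

unigram_overlap = clipped_precision(synthetic_text, real_text, 1)
bigram_overlap = clipped_precision(synthetic_text, real_text, 2)
print(unigram_overlap, bigram_overlap)
```

Under this simplified measure, a synthetic text that copies a real record verbatim scores 1.0, while a looser paraphrase scores lower, which is the intuition behind reading lower BLEU as lower disclosure risk.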

2026

ClaimPT: A Portuguese Dataset of Annotated Claims in News Articles

Authors
Campos, R; Sequeira, R; Nerea, S; Cantante, I; Folques, D; Cunha, LF; Canavilhas, J; Branco, A; Jorge, A; Nunes, S; Guimaraes, N; Silvano, P;

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2026, PT IV

Abstract
Fact-checking remains a demanding and time-consuming task, still largely dependent on manual verification and unable to match the rapid spread of misinformation online. This is particularly important because debunking false information typically takes longer to reach consumers than the misinformation itself; accelerating corrections through automation can therefore help counter it more effectively. Although many organizations perform manual fact-checking, this approach is difficult to scale given the growing volume of digital content. These limitations have motivated interest in automating fact-checking, where identifying claims is a crucial first step. However, progress has been uneven across languages, with English dominating due to abundant annotated data. Portuguese, like other languages, still lacks accessible, licensed datasets, limiting research, Natural Language Processing (NLP) developments, and applications. In this paper, we introduce ClaimPT, a dataset of European Portuguese news articles annotated for factual claims, comprising 1,308 articles and 6,875 individual annotations. Unlike most existing resources based on social media or parliamentary transcripts, ClaimPT focuses on journalistic content, collected through a partnership with LUSA, the Portuguese News Agency. To ensure annotation quality, two trained annotators labeled each article, with a curator validating all annotations according to a newly proposed scheme. We also provide baseline models for claim detection, establishing initial benchmarks and enabling future NLP and Information Retrieval (IR) applications. By releasing ClaimPT, we aim to advance research on low-resource fact-checking and enhance understanding of misinformation in news media.

2026

CitiLink: Enhancing Municipal Transparency and Citizen Engagement Through Searchable Meeting Minutes

Authors
Silva, R; Evans, J; Isidro, J; Marques, M; Fonseca, A; Morais, R; Canavilhas, J; Pasquali, A; Silvano, P; Jorge, A; Guimaraes, N; Nunes, S; Campos, R;

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2026, PT IV

Abstract
City council minutes are typically lengthy and formal documents with a bureaucratic writing style. Although publicly available, their structure often makes it difficult for citizens or journalists to efficiently find information. In this demo, we present CitiLink, a platform designed to transform unstructured municipal meeting minutes into structured and searchable data, demonstrating how NLP and IR can enhance the accessibility and transparency of local government. The system employs LLMs to extract metadata, discussed subjects, and voting outcomes, which are then indexed in a database to support full-text search with BM25 ranking and faceted filtering through a user-friendly interface. The developed system was built over a collection of 120 minutes made available by six Portuguese municipalities. To assess its usability, CitiLink was tested through guided sessions with municipal personnel, providing insights into how real users interact with the system. In addition, we evaluated Gemini's performance in extracting relevant information from the minutes, highlighting its performance in data extraction.
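
The combination of BM25 ranking and faceted filtering described in the abstract above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not CitiLink's implementation: the class name, the `municipality` facet field, the parameter values `k1=1.5` and `b=0.75`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalise further.
    return text.lower().split()


class BM25Index:
    """Tiny in-memory BM25 index with a single facet filter (illustrative)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.tokens = [tokenize(d["text"]) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.avgdl = sum(len(t) for t in self.tokens) / len(docs)
        df = Counter()
        for t in self.tokens:
            df.update(set(t))  # document frequency: one count per document
        n = len(docs)
        self.idf = {w: math.log((n - c + 0.5) / (c + 0.5) + 1.0)
                    for w, c in df.items()}

    def score(self, query, i):
        dl = len(self.tokens[i])
        total = 0.0
        for w in tokenize(query):
            f = self.tf[i].get(w, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[w] * f * (self.k1 + 1) / norm
        return total

    def search(self, query, municipality=None):
        hits = []
        for i, d in enumerate(self.docs):
            # Facet filter: skip documents outside the selected municipality.
            if municipality is not None and d["municipality"] != municipality:
                continue
            s = self.score(query, i)
            if s > 0:
                hits.append((s, d["id"]))
        return [doc_id for s, doc_id in sorted(hits, key=lambda h: -h[0])]


# Hypothetical documents, invented for illustration only.
docs = [
    {"id": "a1", "municipality": "A", "text": "budget approved by unanimous vote"},
    {"id": "a2", "municipality": "A", "text": "road works schedule discussed"},
    {"id": "b1", "municipality": "B",
     "text": "budget vote postponed budget revision requested"},
]
index = BM25Index(docs)
print(index.search("budget"))
print(index.search("budget", municipality="A"))
```

The facet is applied as a hard filter before scoring, so it narrows the candidate set without changing the relative BM25 ranking of the documents that remain.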

2026

MiNER: A Two-Stage Pipeline for Metadata Extraction from Municipal Meeting Minutes

Authors
Batista, R; Cunha, LF; Silvano, P; Guimaraes, N; Jorge, A; Amorim, E; Campos, R;

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2026, PT II

Abstract
Municipal meeting minutes are official documents of local governance that exhibit heterogeneous formats and writing styles. Effective information retrieval (IR) requires identifying metadata such as meeting number, date, location, participants, and start/end times, elements that are rarely standardized or easily extracted automatically. Existing named entity recognition (NER) models are ill-suited to this task, as they are not adapted to such domain-specific categories. In this paper, we propose a two-stage pipeline for metadata extraction from municipal minutes. First, a question-answering (QA) model identifies the opening and closing text segments containing metadata. Transformer-based models (BERTimbau and XLM-RoBERTa with and without a CRF layer) are then applied for fine-grained entity extraction, with delexicalization explored as an additional modeling strategy. We benchmark the pipeline against open and closed-weight LLMs (Phi and Gemini), considering performance, inference cost, and carbon footprint. Our results demonstrate strong in-domain performance, outperforming the evaluated LLMs. Differences observed in cross-municipality evaluation highlight the linguistic diversity and structural variation across municipal records, underscoring the challenges of generalization in this domain and motivating future research in metadata extraction from municipal minutes.

HfPT

CardioComplete

StorySense

Easy4ALL

CitiLink

GREENGROCER

ScopeAI