Luís Filipe Cunha



Details

Name
Luís Filipe Cunha
Role
Research Assistant
Since
7 October 2022

Nationality
Portugal
Centre
Artificial Intelligence and Decision Support
Contacts
+351220402963
luis.f.cunha@inesctec.pt


Publications


2025

Human Experts vs. Large Language Models: Evaluating Annotation Scheme and Guidelines Development for Clinical Narratives

Authors
Fernandes, AL; Silvano, P; Guimarães, N; Silva, RR; Munna, TA; Cunha, LF; Leal, A; Campos, R; Jorge, A;

Publication
Text2Story@ECIR

Abstract
Electronic Health Records (EHRs) contain vast amounts of unstructured narrative text, posing challenges for organization, curation, and automated information extraction in clinical and research settings. Developing effective annotation schemes is crucial for training extraction models, yet it remains complex for both human experts and Large Language Models (LLMs). This study compares human- and LLM-generated annotation schemes and guidelines through an experimental framework. In the first phase, both a human expert and an LLM created annotation schemes based on predefined criteria. In the second phase, experienced annotators applied these schemes following the guidelines. In both cases, the results were qualitatively evaluated using Likert scales. The findings indicate that the human-generated scheme is more comprehensive, coherent, and clear than the one produced by the LLM. These results align with previous research suggesting that while LLMs show promising performance in text annotation, the same does not apply to the development of annotation schemes, and human validation remains essential to ensure accuracy and reliability.


2025

Leveraging LLMs to Improve Human Annotation Efficiency with INCEpTION

Authors
Cunha, LF; Yu, N; Silvano, P; Campos, R; Jorge, A;

Publication
ECIR (5)

Abstract
Manual text annotation is a complex and time-consuming task. However, recent advancements demonstrate that such a task can be accelerated with automated pre-annotation. In this paper, we present a methodology to improve the efficiency of manual text annotation by leveraging LLMs for text pre-annotation. For this purpose, we train a BERT model for a token classification task and integrate it into the INCEpTION annotation tool to generate span-level suggestions for human annotators. To assess the usefulness of our approach, we conducted an experiment where an experienced linguist annotated plain text both with and without our model’s pre-annotations. Our results show that the model-assisted approach reduces annotation time by nearly 23%.
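As an illustration of how token-level predictions can become span-level suggestions, here is a minimal sketch. It assumes the token classifier emits BIO tags (`B-X`, `I-X`, `O`); the abstract does not state the tagging scheme, and the function name `bio_to_spans` and the labels in the example are hypothetical. It does not reproduce the INCEpTION integration itself.

```python
from typing import List, Sequence, Tuple


def bio_to_spans(tags: Sequence[str]) -> List[Tuple[int, int, str]]:
    """Group BIO tags into (start, end, label) spans; `end` is exclusive.

    Assumption: tags follow the BIO scheme. A stray I-X with no open span
    of label X is treated as the start of a new span.
    """
    spans: List[Tuple[int, int, str]] = []
    start, label = None, None
    # A trailing "O" sentinel flushes a span that runs to the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues_span = tag.startswith("I-") and label == tag[2:]
        if continues_span:
            continue
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    return spans


# Hypothetical example: labels and sentence are invented for illustration.
tokens = ["Surgery", "by", "Dr", "Silva", "today"]
tags = ["B-EVENT", "O", "B-PER", "I-PER", "O"]
suggestions = [(" ".join(tokens[s:e]), lab) for s, e, lab in bio_to_spans(tags)]
```

Each resulting span (text plus label) is the kind of unit an annotation tool could show to a human annotator as an accept-or-reject suggestion.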


2025

MedLink: Retrieval and Ranking of Case Reports to Assist Clinical Decision Making

Authors
Cunha, LF; Guimarães, N; Mendes, A; Campos, R; Jorge, A;

Publication
ECIR (5)

Abstract
In healthcare, diagnoses usually rely on physician expertise. However, complex cases may benefit from consulting similar past clinical case reports. In this paper, we present MedLink (http://medlink.inesctec.pt), a tool that, given a free-text medical report, retrieves and ranks relevant clinical case reports published in health conferences and journals, aiming to support clinical decision-making, particularly in challenging or complex diagnoses. To this end, we trained two BERT models on the sentence similarity task: a bi-encoder for retrieval and a cross-encoder for reranking. To evaluate our approach, we used 10 medical reports and asked a physician to rank the top 10 most relevant published case reports for each one. Our results show that MedLink’s ranking model achieved an NDCG@10 of 0.747. Our demo also includes the visualization of clinical entities (using a NER model) and the production of a textual explanation (using an LLM) to ease comparing and contrasting reports.
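For readers unfamiliar with the NDCG@10 metric mentioned in this abstract, the sketch below shows one common formulation: linear gain, a log2 rank discount, and an ideal ordering derived from the same judged list. The abstract does not say which NDCG variant was used, so this is an assumption, and the relevance grades in the example are invented.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k items (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the system ordering divided by DCG of the ideal ordering.

    `relevances` lists the graded relevance of each returned item, in the
    order the system ranked them. Returns 0.0 when nothing is relevant.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(relevances, k) / ideal


# Invented grades: the system placed the most relevant report second.
score = ndcg_at_k([2, 3, 1, 0, 0])
```

A score of 1.0 means the system ordering matches the ideal ordering of the judged items; lower values indicate that relevant items were ranked too far down.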


2024

Document Level Event Extraction from Narratives

Authors
Cunha, LF;

Publication
Advances in Information Retrieval, ECIR 2024, Pt. V

Abstract
One of the fundamental tasks in Information Extraction (IE) is Event Extraction (EE), an extensively studied and challenging task [13,15], which aims to identify and classify events in text. This involves identifying the event's central word (trigger) and its participants (arguments) [1]. These elements capture the event semantics and structure, which have applications in various fields, including biomedical texts [42], cybersecurity [24], economics [12], literature [32], and history [33]. Structured knowledge derived from EE can also benefit other downstream tasks such as Question Answering [20,30], Natural Language Understanding [21], Knowledge Base Graphs [3,37], summarization [8,10,41] and recommendation systems [9,18]. Although several English EE systems exist [2,22,25,26], they have limited portability to other languages [4], and most of them are designed for closed domains, which makes them difficult to generalise. Furthermore, most current EE systems restrict their scope to the sentence level, assuming that all arguments are contained within the same sentence as their corresponding trigger. However, real-world scenarios often involve event arguments spanning multiple sentences, highlighting the need for document-level EE.
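The difference between sentence-level and document-level extraction can be made concrete with a small data structure. This is only an illustrative sketch: the class names, the role labels, and the example event are invented here and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Mention:
    text: str
    sentence: int  # index of the sentence the mention appears in


@dataclass
class Event:
    event_type: str
    trigger: Mention
    arguments: Dict[str, Mention] = field(default_factory=dict)

    def cross_sentence_arguments(self) -> Dict[str, Mention]:
        """Arguments located in a different sentence from the trigger.

        A sentence-level system would miss exactly these arguments.
        """
        return {
            role: mention
            for role, mention in self.arguments.items()
            if mention.sentence != self.trigger.sentence
        }


# Invented two-sentence example:
# "A bomb exploded in the market. Three people were injured."
event = Event(
    event_type="Attack",
    trigger=Mention("exploded", sentence=0),
    arguments={
        "Place": Mention("the market", sentence=0),
        "Victim": Mention("Three people", sentence=1),
    },
)
missed = event.cross_sentence_arguments()
```

Here the `Victim` argument sits in the sentence after the trigger, so only an approach that looks across sentence boundaries could attach it to the event.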


2024

ACE-2005-PT: Corpus for Event Extraction in Portuguese

Authors
Cunha, LF; Silvano, P; Campos, R; Jorge, A;

Publication
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024

Abstract
Event extraction is an NLP task that commonly involves identifying the central word (trigger) for an event and its associated arguments in text. ACE-2005 is widely recognised as the standard corpus in this field. While other corpora, like PropBank, primarily focus on annotating predicate-argument structure, ACE-2005 provides comprehensive information about the overall event structure and semantics. However, its limited language coverage restricts its usability. This paper introduces ACE-2005-PT, a corpus created by translating ACE-2005 into Portuguese, with European and Brazilian variants. To speed up the process of obtaining ACE-2005-PT, we rely on automatic translators. This, however, poses some challenges related to automatically identifying the correct alignments between multi-word annotations in the original text and in the corresponding translated sentence. To address this, we developed an alignment pipeline that incorporates several alignment techniques: lemmatization, fuzzy matching, synonym matching, multiple translations and a BERT-based word aligner. To measure the alignment effectiveness, a subset of annotations from the ACE-2005-PT corpus was manually aligned by an expert linguist. This subset was then compared against our pipeline results, which achieved exact and relaxed match scores of 70.55% and 87.55%, respectively. As a result, we successfully generated a Portuguese version of the ACE-2005 corpus, which has been accepted for publication by LDC.
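To give a feel for one of the alignment techniques listed in this abstract, the sketch below shows fuzzy matching of a translated annotation against a translated sentence, followed by exact and relaxed match scoring. It is not the paper's pipeline: the function names, the 0.8 threshold, the use of Python's `difflib`, and the definition of a relaxed match as any span overlap are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # token indices, end exclusive


def fuzzy_align(annotation: str, tokens: Sequence[str],
                threshold: float = 0.8) -> Optional[Span]:
    """Find the token window most similar to `annotation`, or None.

    Windows one token shorter or longer than the annotation are also
    tried, since translation may add or drop a word.
    """
    target = annotation.lower()
    n = max(1, len(annotation.split()))
    best_span, best_score = None, 0.0
    for size in (n - 1, n, n + 1):
        if size < 1:
            continue
        for start in range(len(tokens) - size + 1):
            window = " ".join(tokens[start:start + size]).lower()
            score = SequenceMatcher(None, target, window).ratio()
            if score > best_score:
                best_span, best_score = (start, start + size), score
    return best_span if best_score >= threshold else None


def match_scores(predicted: List[Optional[Span]],
                 gold: List[Span]) -> Tuple[float, float]:
    """Exact: identical spans. Relaxed (assumed): spans overlap at all."""
    exact = sum(p == g for p, g in zip(predicted, gold))
    relaxed = sum(
        p is not None and p[0] < g[1] and g[0] < p[1]
        for p, g in zip(predicted, gold)
    )
    return exact / len(gold), relaxed / len(gold)


# Invented example: the annotation differs slightly from the sentence text.
sentence = "O presidente visitou a cidade do Porto ontem".split()
span = fuzzy_align("cidade de Porto", sentence)
```

In a fuller pipeline, a fuzzy matcher like this would be one fallback among several, tried when simpler string or lemma matches fail.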


StorySense

CitiLink
