About

Carla Teixeira Lopes is an Assistant Professor in the Department of Informatics Engineering, University of Porto, Portugal. She has also been a researcher at INESC TEC since 2014. She received a PhD in Informatics Engineering from the University of Porto in 2013. Her research interests lie at the intersection of information retrieval and human-computer interaction. She is interested in studying information search behaviour and in developing tools that help people search more successfully. Lately, she has focused on exploring how context can help improve the experience of health consumers searching the Web.

Details

  • Name

    Carla Lopes
  • Role

    Senior Researcher
  • Since

    1st May 2014
Publications

2025

Can Llama 3 Accurately Assess Readability? A Comparative Study Using Lead Sections from Wikipedia

Authors
Rodrigues, JF; Cardoso, HL; Lopes, CT;

Publication
Research Challenges in Information Science - 19th International Conference, RCIS 2025, Seville, Spain, May 20-23, 2025, Proceedings, Part II

Abstract
Text readability is vital for effective communication and learning, especially for those with lower information literacy. This research aims to assess Llama 3’s ability to grade readability and compare its alignment with established metrics. For that purpose, we create a new dataset of article lead sections from English and Simple English Wikipedia, covering nine categories. The model is prompted to rate the readability of the texts on a grade-level scale, and an in-depth analysis of the results is conducted. While Llama 3 correlates strongly with most metrics, it may underestimate text grade levels. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

2025

Evaluating Llama 3 for Text Simplification: A Study on Wikipedia Lead Sections

Authors
Rodrigues, JF; Cardoso, HL; Lopes, CT;

Publication
Companion Proceedings of the ACM on Web Conference 2025, WWW 2025, Sydney, NSW, Australia, 28 April 2025 - 2 May 2025

Abstract
Text simplification converts complex text into simpler language, improving readability and comprehension. This study evaluates the effectiveness of open-source large language models for text simplification across various categories. We created a dataset of 66,620 lead section pairs from English and Simple English Wikipedia, spanning nine categories, and tested Llama 3 for text simplification. We assessed its output for readability, simplicity, and meaning preservation. Results show improved readability, with simplification varying by category. Texts on Time were the most shortened, while Leisure-related texts had the greatest reduction of words/characters and syllables per sentence. Meaning preservation was most effective for the Objects and Education categories. © 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.

2025

Cross-Lingual Entity Linking Using GPT Models in Radiology Abstracts

Authors
Dias, M; Lopes, CT;

Publication
Research Challenges in Information Science - 19th International Conference, RCIS 2025, Seville, Spain, May 20-23, 2025, Proceedings, Part II

Abstract
Entity linking is an important task in medical natural language processing (NLP) for converting unstructured text into structured data for clinical analysis and semantic interoperability. However, in lower-resource languages, this task is challenging due to the limited availability of domain-specific resources. This paper explores a translation-based cross-lingual entity linking approach using GPT models, GPT-3.5 and GPT-4o, for zero-shot machine translation and entity linking with in-context learning. We evaluate our approach using a Portuguese-English parallel dataset of radiology abstracts. Our results show that chunk-level machine translation outperforms sentence-level translation. Moreover, our translation-based approach to cross-lingual entity linking of UMLS concepts outperformed the multilingual encoder method baseline. However, the in-context learning entity linking approach did not outperform a translation-based approach with a dictionary-based entity linking method. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

2024

Automatic Quality Assessment of Wikipedia Articles - A Systematic Literature Review

Authors
Moas, PM; Lopes, CT;

Publication
ACM Computing Surveys

Abstract
Wikipedia is the world's largest online encyclopedia, but maintaining article quality through collaboration is challenging. Wikipedia designed a quality scale, but with such a manual assessment process, many articles remain unassessed. We review existing methods for automatically measuring the quality of Wikipedia articles, identifying and comparing machine learning algorithms, article features, quality metrics, and used datasets, examining 149 distinct studies, and exploring commonalities and gaps in them. The literature is extensive, and the approaches follow past technological trends. However, machine learning is still not widely used by Wikipedia, and we hope that our analysis helps future researchers change that reality.

2024

Automated image label extraction from radiology reports - A review

Authors
Pereira, SC; Mendonca, AM; Campilho, A; Sousa, P; Lopes, CT;

Publication
Artificial Intelligence in Medicine

Abstract
Machine Learning models need large amounts of annotated data for training. In the field of medical imaging, labeled data is especially difficult to obtain because the annotations have to be performed by qualified physicians. Natural Language Processing (NLP) tools can be applied to radiology reports to extract labels for medical images automatically. Compared to manual labeling, this approach requires smaller annotation efforts and can therefore facilitate the creation of labeled medical image data sets. In this article, we summarize the literature on this topic spanning from 2013 to 2023, starting with a meta-analysis of the included articles, followed by a qualitative and quantitative systematization of the results. Overall, we found four types of studies on the extraction of labels from radiology reports: those describing systems based on symbolic NLP, statistical NLP, neural NLP, and those describing systems combining or comparing two or more of the latter. Despite the large variety of existing approaches, there is still room for further improvement. This work can contribute to the development of new techniques or the improvement of existing ones.

Supervised theses

2023

ArchMine: Learning from non-machine-readable documents for additional insights

Author
Mariana Ferreira Dias

Institution
UP-FEUP

2023

Integration of models for linked data in cultural heritage and contributions to the FAIR principles

Author
Inês Dias Koch

Institution
UP-FEUP

2023

Images as data and metadata: management practices to promote Findability, Accessibility, Interoperability and Reusability of research data

Author
Joana Patrícia de Sousa Rodrigues

Institution
UP-FEUP

2023

Archive users, their characteristics and motivations

Author
Luana Rodrigues Ponte

Institution
UP-FEUP

2022

Automatic Categorization of Health-related Messages in Online Health Communities

Author
João Paulo Gomes Torres Abelha

Institution
UP-FEUP