
Publications by Luís Filipe Cunha

2025

Human Experts vs. Large Language Models: Evaluating Annotation Scheme and Guidelines Development for Clinical Narratives

Authors
Fernandes, AL; Silvano, P; Guimarães, N; Silva, RR; Munna, TA; Cunha, LF; Leal, A; Campos, R; Jorge, A;

Publication
Proceedings of Text2Story - Eighth Workshop on Narrative Extraction From Texts held in conjunction with the 47th European Conference on Information Retrieval (ECIR 2025), Lucca, Italy, April 10, 2025.

Abstract
Electronic Health Records (EHRs) contain vast amounts of unstructured narrative text, posing challenges for organization, curation, and automated information extraction in clinical and research settings. Developing effective annotation schemes is crucial for training extraction models, yet it remains complex for both human experts and Large Language Models (LLMs). This study compares human- and LLM-generated annotation schemes and guidelines through an experimental framework. In the first phase, both a human expert and an LLM created annotation schemes based on predefined criteria. In the second phase, experienced annotators applied these schemes following the guidelines. In both cases, the results were qualitatively evaluated using Likert scales. The findings indicate that the human-generated scheme is more comprehensive, coherent, and clear than the one produced by the LLM. These results align with previous research suggesting that while LLMs show promising performance on text annotation, the same does not apply to the development of annotation schemes, and human validation remains essential to ensure accuracy and reliability. © 2025 Copyright for this paper by its authors.

2022

NER in Archival Finding Aids: Extended

Authors
Cunha, LFD; Ramalho, JC;

Publication
MACHINE LEARNING AND KNOWLEDGE EXTRACTION

Abstract
The amount of information preserved in Portuguese archives has increased over the years. These documents represent a national heritage of high importance, as they portray the country's history. Currently, most Portuguese archives have made their finding aids available to the public in digital format; however, these data lack any annotation, so it is not always easy to analyze their content. In this work, Named Entity Recognition solutions were created that allow the identification and classification of several named entities from the archival finding aids. These named entities translate into crucial information about their context and, with high-confidence results, they can be used for several purposes, for example, the creation of smart browsing tools using entity linking and record linking techniques. In order to achieve high result scores, we annotated several corpora to train our own Machine Learning algorithms in this domain. We also used different architectures, such as CNNs, LSTMs, and Maximum Entropy models. Finally, all the created datasets and ML models were made available to the public through a developed web platform, NER@DI.

2022

NER in Archival Finding Aids: Next Level

Authors
Cunha, LFD; Ramalho, JC;

Publication
INFORMATION SYSTEMS AND TECHNOLOGIES, WORLDCIST 2022, VOL 2

Abstract
Currently, there is a vast amount of archival finding aids in Portuguese archives; however, these documents lack structure (they are not annotated), making them hard to process and work with. We therefore intend to extract and classify entities of interest, such as geographical locations, people's names, and dates. For this, we will use an architecture that has been revolutionizing several NLP tasks, Transformers, presenting several models in order to achieve high results. We also intend to understand the degree of improvement that this new mechanism presents in comparison with previous architectures. Can Transformer-based models replace LSTMs in NER? We intend to answer this question throughout this paper.

2021

NER in Archival Finding Aids

Authors
Costa Cunha, LF; Ramalho, JC;

Publication
10th Symposium on Languages, Applications and Technologies, SLATE 2021, July 1-2, 2021, Vila do Conde/Póvoa de Varzim, Portugal.

Abstract
At the moment, the vast majority of Portuguese archives with an online presence use a software solution to manage their finding aids, e.g. Digitarq or Archeevo. Most of these finding aids are written in natural language without any annotation that would enable a machine to identify named entities, geographical locations, or even dates. Such annotation would allow the machine to create smart browsing tools on top of those record contents, such as entity linking and record linking. In this work we have created a set of datasets to train Machine Learning algorithms to find those named entities and geographical locations. After training several algorithms, we tested them on several datasets and registered their precision and accuracy. These results enabled us to draw some conclusions about what kind of precision we can achieve with this approach in this context and what to do with the results: do we have enough precision and accuracy to create toponymic and anthroponymic indexes for archival finding aids? Is this approach suitable in this context? These are some of the questions we intend to answer throughout this paper.

2022

Fine-Tuning BERT Models to Extract Named Entities from Archival Finding Aids

Authors
Costa Cunha, LF; Ramalho, JC;

Publication
Proceedings of the 26th International Conference on Theory and Practice of Digital Libraries - Workshops and Doctoral Consortium, Padua, Italy, September 20, 2022.

Abstract
In recent works, several NER models were developed to extract named entities from Portuguese Archival Finding Aids. In this paper, we complement the work already done by presenting a different NER model with a new architecture, Bidirectional Encoder Representations from Transformers (BERT). In order to do so, we used a BERT model that was pre-trained on Portuguese vocabulary and fine-tuned it to our concrete classification problem, NER. In the end, we compared the results obtained with those of previous architectures. In addition to this model, we also developed an annotation tool that uses ML models to speed up the corpora annotation process. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

2022

Reasoning with Portuguese Word Embeddings

Authors
Costa Cunha, LF; Almeida, JJ; Simões, A;

Publication
11th Symposium on Languages, Applications and Technologies, SLATE 2022, July 14-15, 2022, Universidade da Beira Interior, Covilhã, Portugal.

Abstract
Representing words with semantic distributions to create ML models is a widely used technique for performing Natural Language Processing tasks. In this paper, we trained word embedding models with different types of Portuguese corpora, analyzing the influence of the models' parameterization, the corpora size, and domain. We then validated each model with the classical evaluation methods available: four-word analogies and measurement of the similarity of pairs of words. In addition to these methods, we proposed new alternative techniques to validate word embedding models, presenting new resources for this purpose. Finally, we discussed the obtained results and argued about some limitations of the word embedding models' evaluation methods. © Luís Filipe Cunha, J. João Almeida, and Alberto Simões.
