Details

  • Name

    Luís Pimentel Trigo
  • Role

    External Research Collaborator
  • Since

    18th April 2013
Publications

2023

NLP-Crowdsourcing Hybrid Framework for Inter-Researcher Similarity Detection

Authors
Correia, A; Guimaraes, D; Paredes, H; Fonseca, B; Paulino, D; Trigo, L; Brazdil, P; Schneider, D; Grover, A; Jameel, S;

Publication
IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS

Abstract
Visualizing and examining the intellectual landscape and evolution of scientific communities to support collaboration is crucial for multiple research purposes. In some cases, measuring similarities and matching patterns between research publication document sets can help to identify people with similar interests for building research collaboration networks and university-industry linkages. The premise of this work is to assess the feasibility of resolving ambiguous cases in similarity detection to determine authorship with natural language processing (NLP) techniques, so that crowdsourcing is applied only to instances that require human judgment. Using an NLP-crowdsourcing convergence strategy, we can reduce the costs of microtask crowdsourcing while saving time and maintaining disambiguation accuracy over large datasets. This article contributes a next-generation crowd-artificial intelligence framework that uses an ensemble of term frequency-inverse document frequency (TF-IDF) and bidirectional encoder representations from transformers (BERT) to obtain similarity rankings for pairs of scientific documents. A sequence of content-based similarity tasks was created using a crowd-powered interface for solving disambiguation problems. Our experimental results suggest that an adaptive NLP-crowdsourcing hybrid framework has advantages for inter-researcher similarity detection tasks where fully automatic algorithms provide unsatisfactory results, with the goal of helping researchers discover potential collaborators using data-driven approaches.
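The abstract describes scoring pairs of documents with an ensemble of TF-IDF and BERT similarity and sending only the ambiguous cases to the crowd. The following is a minimal sketch of that triage idea, not the authors' implementation. It assumes naive whitespace tokenization, an equal-weight average of the two similarities (`alpha=0.5`), and hypothetical thresholds `low=0.3` and `high=0.7`; all function names are illustrative. Embeddings are passed in precomputed (in practice they might come from a BERT model, which is not included here).

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document.

    Tokenization is naive lowercase whitespace splitting; IDF uses the
    smoothed form log((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (c / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, c in counts.items()
        })
    return vectors


def sparse_cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def dense_cosine(a, b):
    """Cosine similarity between two dense vectors (e.g. sentence embeddings)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ensemble_score(tfidf_sim, embedding_sim, alpha=0.5):
    """Weighted average of lexical (TF-IDF) and semantic (embedding) similarity."""
    return alpha * tfidf_sim + (1 - alpha) * embedding_sim


def route(score, low=0.3, high=0.7):
    """Decide automatically when confident; defer the ambiguous middle band to the crowd."""
    if score >= high:
        return "similar"
    if score <= low:
        return "dissimilar"
    return "crowd"


def triage_pairs(docs, embeddings, pairs, alpha=0.5, low=0.3, high=0.7):
    """Score document pairs and label each one as similar, dissimilar, or crowd."""
    vectors = tfidf_vectors(docs)
    results = []
    for i, j in pairs:
        score = ensemble_score(
            sparse_cosine(vectors[i], vectors[j]),
            dense_cosine(embeddings[i], embeddings[j]),
            alpha,
        )
        results.append((i, j, score, route(score, low, high)))
    # Highest-scoring pairs first, giving a similarity ranking.
    return sorted(results, key=lambda r: r[2], reverse=True)
```

Sorting by the ensemble score yields the similarity ranking, and only pairs that land in the middle band would be turned into crowd microtasks, which is where the cost saving described above would come from.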

2023

CreoPhonPt: a collaborative database saving Portuguese creoles from digital obliteration

Authors
Silva, CRSe; Pimentel Trigo, LM;

Publication
Annual International Conference of the Alliance of Digital Humanities Organizations, DH 2023, Graz, Austria, July 10-14, 2023, Conference Abstracts

Abstract

2022

Comparing Lexical and Usage Frequencies of Palatal Segments in Portuguese

Authors
Trigo, L; Silva, C;

Publication
COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022

Abstract
Palatal consonants in Portuguese are considered complex or marked segments because they are inherently heavy and restricted in terms of their distribution, in relation to other consonants. Moreover, they appear to display differences between themselves, as first language acquisition and creoles' adaptation suggest that /ʎ/ is more complex than /ɲ/. The arguments for complexity are endorsed by some qualitative studies but are still lacking quantitative support. This paper aims at analyzing the phonological restrictiveness of these consonants by comparing their actual frequency in several different corpora, reporting both lexical entries and usage in discourse. In addition to their context-free frequency, we control for their word position and phonetic adjacency. We find that palatals are less frequent than other consonants. However, relative to each other, they do not display proportional lexical and usage frequencies. These results shed new light not only on the representation of /ɲ/ and /ʎ/ but also on the relation between frequency and markedness in language studies.
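The abstract distinguishes lexical frequency (how many distinct entries contain a segment) from usage frequency (how often the segment occurs in running discourse), controlling for word position. A minimal sketch of that distinction follows. It assumes the orthographic digraphs "nh" and "lh" as rough stand-ins for /ɲ/ and /ʎ/ and a simple three-way position split; the function names and the word list in the test are hypothetical toy data, not the paper's corpora or procedure.

```python
from collections import Counter

# Assumption: orthographic digraphs as a proxy for the two Portuguese palatals.
PALATALS = {"nh": "ɲ", "lh": "ʎ"}


def positions(word, digraph):
    """Yield 'initial', 'medial' or 'final' for each occurrence of digraph in word."""
    start = word.find(digraph)
    while start != -1:
        if start == 0:
            yield "initial"
        elif start + len(digraph) == len(word):
            yield "final"
        else:
            yield "medial"
        start = word.find(digraph, start + 1)


def palatal_frequencies(tokens):
    """Count palatals per (phoneme, position) by word type (lexical) and by token (usage)."""
    lexical = Counter()
    usage = Counter()
    token_counts = Counter(word.lower() for word in tokens)
    for word, n in token_counts.items():
        for digraph, phoneme in PALATALS.items():
            for pos in positions(word, digraph):
                lexical[(phoneme, pos)] += 1  # each distinct word counts once
                usage[(phoneme, pos)] += n    # weighted by how often the word occurs
    return lexical, usage
```

Because usage counts are weighted by token frequency, a single very frequent word can make one palatal dominate in usage while the other dominates lexically, which is the kind of disproportion the abstract reports.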

2022

Exploring consonant frequency in Sri Lanka Portuguese

Authors
Silva, C; Trigo, L;

Publication
Proceedings of the Second Workshop on Digital Humanities and Natural Language Processing (2nd DHandNLP 2022) co-located with International Conference on the Computational Processing of Portuguese (PROPOR 2022), Virtual Event, Fortaleza, Brazil, 21st March, 2022.

Abstract
Although phoneme selection is a well-studied subject in contact linguistics, phoneme integration is mostly unexplored. This study aims at assessing phoneme integration by measuring consonant frequency in Sri Lanka Portuguese and Portuguese. To that end, we select two large lexical corpora and take several preparation steps to make the data uniform, consistent and reusable. In terms of integration, we find that the more unconstrained a consonant is concerning its phonotactic patterns, the more frequent it is. We also find that being coronal has a positive impact on integration, whereas being palatal has a negative impact. Moreover, we find that in spite of the apparently random changes in consonant frequency, consonant classes are robustly transmitted from the lexifier to this creole. Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
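The abstract reports that consonant classes are robustly transmitted from Portuguese to Sri Lanka Portuguese. One way to quantify that kind of robustness, offered here as an illustrative assumption rather than the paper's reported procedure, is to compute class-level relative frequencies in each lexicon and correlate them with Spearman's rank correlation. The class map below is a simplified hypothetical inventory, and the lexicons are assumed to be already transcribed as lists of segments.

```python
import math
from collections import Counter

# Hypothetical, heavily simplified consonant inventory with one class label each.
CONSONANT_CLASS = {
    "p": "labial", "b": "labial", "m": "labial", "f": "labial", "v": "labial",
    "t": "coronal", "d": "coronal", "n": "coronal", "s": "coronal",
    "l": "coronal", "r": "coronal",
    "k": "dorsal", "g": "dorsal",
    "ɲ": "palatal", "ʎ": "palatal",
}


def class_frequencies(lexicon):
    """Relative frequency of each consonant class; segments not in the map are skipped."""
    counts = Counter(
        CONSONANT_CLASS[seg] for word in lexicon for seg in word if seg in CONSONANT_CLASS
    )
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / (sx * sy)


def compare_class_ranks(lexifier, creole):
    """Correlate class frequencies across the two lexicons on their shared classes."""
    f_lex, f_cre = class_frequencies(lexifier), class_frequencies(creole)
    classes = sorted(set(f_lex) & set(f_cre))
    return spearman([f_lex[c] for c in classes], [f_cre[c] for c in classes])
```

A correlation near 1 would mean the classes keep their relative ordering even if individual consonant frequencies shift, which matches the sense of "robust transmission" described above.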

2022

Predicting Argument Density from Multiple Annotations

Authors
Rocha, G; Leite, B; Trigo, L; Cardoso, HL; Sousa-Silva, R; Carvalho, P; Martins, B; Won, M;

Publication
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022)

Abstract
Annotating a corpus with argument structures is a complex task, and it is even more challenging when addressing text genres where argumentative discourse markers do not abound. We explore a corpus of opinion articles annotated by multiple annotators, providing diverse perspectives of the argumentative content therein. New annotation aggregation methods are explored, diverging from the traditional ones that try to minimize presumed errors from annotator disagreement. The impact of our methods is assessed for the task of argument density prediction, seen as an initial step in the argument mining pipeline. We evaluate and compare models trained for this regression task in different generated datasets, considering their prediction error and also from a ranking perspective. Results confirm the expectation that addressing argument density from a ranking perspective is more promising than looking at the problem as a mere regression task. We also show that probabilistic aggregation, which weighs tokens by considering all annotators, is a more interesting approach, achieving encouraging results as it accommodates different annotator perspectives. The code and models are publicly available at https://github.com/DARGMINTS/argument density.
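The abstract contrasts probabilistic aggregation, which weighs tokens by considering all annotators, with aggregation that discards disagreement. A minimal sketch of the difference follows, assuming binary token-level labels from each annotator and defining argument density as the mean argumentative label over a text's tokens. These definitions and function names are illustrative assumptions, not the released DARGMINTS code.

```python
def probabilistic_labels(annotations):
    """Per-token probability of being argumentative: share of annotators who marked it.

    annotations: one list of 0/1 token labels per annotator, all of equal length.
    """
    n_annotators = len(annotations)
    return [sum(col) / n_annotators for col in zip(*annotations)]


def majority_labels(annotations):
    """Hard labels by strict majority vote, shown for comparison."""
    n_annotators = len(annotations)
    return [1 if sum(col) * 2 > n_annotators else 0 for col in zip(*annotations)]


def argument_density(token_labels):
    """Argument density as the mean (soft or hard) argumentative label per token."""
    return sum(token_labels) / len(token_labels) if token_labels else 0.0
```

With soft labels, a token marked by only one of three annotators still contributes one third to the density instead of being dropped, which is how this style of aggregation keeps minority annotator perspectives in the regression target.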