Details

  • Name

    Luís Pimentel Trigo
  • Cluster

    Informatics
  • Position

    External Research Collaborator
  • Since

    18 April 2013
Publications

2022

Comparing Lexical and Usage Frequencies of Palatal Segments in Portuguese

Authors
Trigo, L; Silva, C

Publication
COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022

Abstract
Palatal consonants in Portuguese are considered complex or marked segments because they are inherently heavy and restricted in their distribution relative to other consonants. Moreover, they appear to differ from each other, as first language acquisition and creoles' adaptation suggest that /ʎ/ is more complex than /ɲ/. The arguments for complexity are endorsed by some qualitative studies but still lack quantitative support. This paper aims to analyze the phonological restrictiveness of these consonants by comparing their actual frequency in several different corpora, reporting both lexical entries and usage in discourse. In addition to their context-free frequency, we control for their word position and phonetic adjacency. We find that palatals are less frequent than other consonants. However, relative to each other, they do not display proportional lexical and usage frequencies. These results shed new light not only on the representation of /ɲ/ and /ʎ/ but also on the relation between frequency and markedness in language studies.
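
The contrast between lexical and usage frequency can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `segment_frequencies`, the three-word lexicon, its transcriptions, and the occurrence counts are all hypothetical toy values. Lexical frequency counts each dictionary entry once, while usage frequency weights each entry by how often it occurs in discourse. The palatal nasal and palatal lateral are written here in IPA as ɲ and ʎ.

```python
from collections import Counter

def segment_frequencies(lexicon, usage_counts):
    """Return (lexical, usage) relative frequencies per segment.

    lexicon: dict mapping word -> list of segment symbols (toy transcriptions).
    usage_counts: dict mapping word -> occurrences in a usage corpus (toy values).
    """
    lexical = Counter()
    usage = Counter()
    for word, segments in lexicon.items():
        for seg in segments:
            lexical[seg] += 1                        # every lexical entry counts once
            usage[seg] += usage_counts.get(word, 0)  # weighted by occurrences in discourse
    lex_total = sum(lexical.values())
    use_total = sum(usage.values())
    lex_rel = {s: c / lex_total for s, c in lexical.items()}
    use_rel = {s: (usage[s] / use_total if use_total else 0.0) for s in lexical}
    return lex_rel, use_rel

# Hypothetical toy data, not taken from the paper's corpora.
lexicon = {
    "vinho": ["v", "i", "ɲ", "u"],
    "filho": ["f", "i", "ʎ", "u"],
    "banho": ["b", "ɐ", "ɲ", "u"],
}
usage_counts = {"vinho": 2, "filho": 10, "banho": 1}

lex_rel, use_rel = segment_frequencies(lexicon, usage_counts)
print("lexical:", lex_rel["ɲ"], lex_rel["ʎ"])
print("usage:  ", use_rel["ɲ"], use_rel["ʎ"])
```

In this toy data ɲ appears in more lexical entries than ʎ, yet ʎ is more frequent in usage because one word containing it occurs often. This is the kind of non-proportional relation between the two measures that the abstract reports.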

2022

Exploring consonant frequency in Sri Lanka Portuguese

Authors
Silva, C; Trigo, L

Publication
CEUR Workshop Proceedings

Abstract
Although phoneme selection is a well-studied subject in contact linguistics, phoneme integration is mostly unexplored. This study aims to assess phoneme integration by measuring consonant frequency in Sri Lanka Portuguese and Portuguese. To that end, we select two large lexical corpora and take several preparation steps to make the data uniform, consistent and reusable. In terms of integration, we find that the less constrained a consonant is in its phonotactic patterns, the more frequent it is. We also find that being coronal has a positive impact on integration, whereas being palatal has a negative impact. Moreover, we find that in spite of the apparently random changes in consonant frequency, consonant classes are robustly transmitted from the lexifier to this creole. Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2022

Predicting Argument Density from Multiple Annotations

Authors
Rocha, G; Leite, B; Trigo, L; Cardoso, HL; Sousa-Silva, R; Carvalho, P; Martins, B; Won, M

Publication
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022)

Abstract
Annotating a corpus with argument structures is a complex task, and it is even more challenging when addressing text genres where argumentative discourse markers do not abound. We explore a corpus of opinion articles annotated by multiple annotators, providing diverse perspectives of the argumentative content therein. New annotation aggregation methods are explored, diverging from the traditional ones that try to minimize presumed errors from annotator disagreement. The impact of our methods is assessed for the task of argument density prediction, seen as an initial step in the argument mining pipeline. We evaluate and compare models trained for this regression task in different generated datasets, considering their prediction error and also from a ranking perspective. Results confirm the expectation that addressing argument density from a ranking perspective is more promising than looking at the problem as a mere regression task. We also show that probabilistic aggregation, which weighs tokens by considering all annotators, is a more interesting approach, achieving encouraging results as it accommodates different annotator perspectives. The code and models are publicly available at https://github.com/DARGMINTS/argument density.
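
The abstract does not define probabilistic aggregation or argument density precisely, so the following is only a sketch under stated assumptions. It assumes that each annotator gives every token a binary label (1 for argumentative), that a token's weight is the fraction of annotators who marked it, and that density is the mean token weight of a text span. The function names are hypothetical, and a majority-vote baseline is included only for contrast.

```python
def token_weights(annotations):
    """Soft label per token: fraction of annotators who marked it as argumentative.

    annotations: list of per-annotator lists of 0/1 labels, one label per token.
    (Assumed input format, not taken from the paper.)
    """
    n_annotators = len(annotations)
    n_tokens = len(annotations[0])
    if any(len(a) != n_tokens for a in annotations):
        raise ValueError("all annotators must label the same tokens")
    return [sum(a[i] for a in annotations) / n_annotators for i in range(n_tokens)]

def argument_density(weights):
    """Assumed definition: mean token weight over the span."""
    return sum(weights) / len(weights) if weights else 0.0

def majority_density(annotations):
    """Baseline for contrast: threshold each token by majority vote, then average."""
    hard = [1.0 if w > 0.5 else 0.0 for w in token_weights(annotations)]
    return argument_density(hard)

# Three hypothetical annotators labelling a four-token span.
annotations = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
]
soft = argument_density(token_weights(annotations))
hard = majority_density(annotations)
print(soft, hard)
```

The soft score keeps the minority judgments on the second and third tokens, whereas majority voting discards them. That is the sense in which an aggregation over all annotators can accommodate different perspectives.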

2022

Annotating Arguments in a Corpus of Opinion Articles

Authors
Rocha, G; Trigo, L; Cardoso, HL; Sousa-Silva, R; Carvalho, P; Martins, B; Won, M

Publication
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION

Abstract
Interest in argument mining has resulted in an increasing number of argument-annotated corpora. However, most focus on English texts with explicit argumentative discourse markers, such as persuasive essays or legal documents. Conversely, we report on the first extensive and consolidated Portuguese argument annotation project focused on opinion articles. We briefly describe the annotation guidelines based on a multi-layered process and analyze the manual annotations produced, highlighting the main challenges of this textual genre. We then conduct a comprehensive inter-annotator agreement analysis, including argumentative discourse units, their classes and relations, and resulting graphs. This analysis reveals that each of these aspects poses very different kinds of challenges. We observe differences in annotator profiles, motivating our aim of producing a non-aggregated corpus containing the insights of every annotator. We note that the interpretation and identification of token-level arguments is challenging; nevertheless, tasks that focus on higher-level components of the argument structure can obtain considerable agreement. We lay down perspectives on corpus usage, exploiting its multi-faceted nature.

2021

Towards a Human-AI Hybrid Framework for Inter-Researcher Similarity Detection

Authors
Guimaraes, D; Paulino, D; Correia, A; Trigo, L; Brazdil, P; Paredes, H

Publication
PROCEEDINGS OF THE 2021 IEEE INTERNATIONAL CONFERENCE ON HUMAN-MACHINE SYSTEMS (ICHMS)

Abstract
Understanding the intellectual landscape of scientific communities and their collaborations has become an indispensable part of research per se. In this regard, measuring similarities among scientific documents can help researchers to identify groups with similar interests as a basis for strengthening collaboration and university-industry linkages. To this end, we intend to evaluate the performance of hybrid crowd-computing methods in measuring the similarity between document pairs by comparing the results achieved by crowds and artificial intelligence (AI) algorithms. In this paper we designed two types of experiments to illustrate some issues in calculating how similar an automatic solution is to a given ground truth. In the first type of experiment, we created a crowdsourcing campaign consisting of four human intelligence tasks (HITs) in which the participants had to indicate whether or not a set of papers belonged to the same author. The second type involves a set of natural language processing (NLP) processes in which we used the TF-IDF measure and the Bidirectional Encoder Representations from Transformers (BERT) model. The results of the two types of experiments carried out in this study provide preliminary insight into detecting major contributions from human-AI cooperation in similarity calculation in order to achieve better decision support. We believe that in this case decision makers can be better informed about potential collaborators based on content-based insights enhanced by hybrid human-AI mechanisms.
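
As a rough illustration of the TF-IDF side of such a comparison, the sketch below scores document pairs by cosine similarity over TF-IDF vectors using only the Python standard library. The whitespace tokenizer, the smoothed IDF variant, the function names, and the three example texts are assumptions made for illustration; the abstract does not specify the authors' preprocessing or weighting, and the BERT-based comparison is not covered here.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) with a smoothed IDF; an assumed variant."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical document texts standing in for paper abstracts.
docs = [
    "argument mining in opinion articles",
    "argument annotation of opinion articles",
    "consonant frequency in creole languages",
]
vectors = tfidf_vectors(docs)
sim_01 = cosine(vectors[0], vectors[1])
sim_02 = cosine(vectors[0], vectors[2])
print(sim_01, sim_02)
```

The first two texts share several content words and therefore score higher than the first and third, which share only a function word. Scores of this kind are what would be compared against crowd judgments of whether two papers belong together.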