Publications

Publications by LIAAD

2025

The 8th International Workshop on Narrative Extraction from Texts: Text2Story 2025

Authors
Campos, R; Jorge, A; Jatowt, A; Bhatia, S; Litvak, M

Publication
Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V

Abstract
For seven years, the Text2Story Workshop series has fostered a vibrant community dedicated to understanding narrative structure in text, resulting in significant contributions to the field and developing a shared understanding of the challenges in this domain. While traditional methods have yielded valuable insights, the advent of Transformers and LLMs has ignited a new wave of interest in narrative understanding. The previous iteration of the workshop also witnessed a surge in LLM-based approaches, demonstrating the community’s growing recognition of their potential. In this eighth edition, we propose to go deeper into the role of LLMs in narrative understanding. While LLMs have revolutionized the field of NLP and are the go-to tools for any NLP task, the ability to capture, represent and analyze contextual nuances in longer texts is still an elusive goal, let alone the understanding of consistent fine-grained narrative structures in text. Consequently, this iteration of the workshop will explore the issues involved in using LLMs to unravel narrative structures, while also examining the characteristics of narratives generated by LLMs. By fostering dialogue on these emerging areas, we aim to continue the workshop's tradition of driving innovation in narrative understanding research. Text2Story encompasses sessions covering full research papers, work-in-progress, demos, resources, position and dissemination papers, along with one keynote talk. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

2025

Enhancing Portuguese Variety Identification with Cross-Domain Approaches

Authors
Sousa, H; Almeida, R; Silvano, P; Cantante, I; Campos, R; Jorge, A

Publication
THIRTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, AAAI-25, VOL 39 NO 24

Abstract
Recent advances in natural language processing have raised expectations for generative models to produce coherent text across diverse language varieties. In the particular case of the Portuguese language, the predominance of Brazilian Portuguese corpora online introduces linguistic biases in these models, limiting their applicability outside of Brazil. To address this gap and promote the creation of European Portuguese resources, we developed a cross-domain language variety identifier (LVI) to discriminate between European and Brazilian Portuguese. Motivated by the findings of our literature review, we compiled the PtBrVarId corpus, a cross-domain LVI dataset, and studied the effectiveness of transformer-based LVI classifiers in cross-domain scenarios. Although this research focuses on two Portuguese varieties, our contribution can be extended to other varieties and languages. We open-source the code, corpus, and models to foster further research on this task.

2025

Tradutor: Building a Variety Specific Translation Model

Authors
Sousa, H; Almasian, S; Campos, R; Jorge, A

Publication
THIRTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, AAAI-25, VOL 39 NO 24

Abstract
Language models have become foundational to many widely used systems. However, these seemingly advantageous models are double-edged swords. While they excel in tasks related to resource-rich languages like English, they often lose the fine nuances of language forms, dialects, and varieties that are inherent to languages spoken in multiple regions of the world. Languages like European Portuguese are neglected in favor of their more popular counterpart, Brazilian Portuguese, leading to suboptimal performance in various linguistic tasks. To address this gap, we introduce the first open-source translation model specifically tailored for European Portuguese, along with a novel dataset specifically designed for this task. Results from automatic evaluations on two benchmark datasets demonstrate that our best model surpasses existing open-source translation systems for Portuguese and approaches the performance of industry-leading closed-source systems for European Portuguese. By making our dataset, models, and code publicly available, we aim to support and encourage further research, fostering advancements in the representation of underrepresented language varieties.

2025

Leveraging LLMs to Improve Human Annotation Efficiency with INCEpTION

Authors
Cunha, LF; Yu, N; Silvano, P; Campos, R; Jorge, A

Publication
Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V

Abstract
Manual text annotation is a complex and time-consuming task. However, recent advancements demonstrate that such a task can be accelerated with automated pre-annotation. In this paper, we present a methodology to improve the efficiency of manual text annotation by leveraging LLMs for text pre-annotation. For this purpose, we train a BERT model for a token classification task and integrate it into the INCEpTION annotation tool to generate span-level suggestions for human annotators. To assess the usefulness of our approach, we conducted an experiment where an experienced linguist annotated plain text both with and without our model’s pre-annotations. Our results show that the model-assisted approach reduces annotation time by nearly 23%. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

2025

Anomaly Detection in Pet Behavioural Data

Authors
Silva, I; Ribeiro, RP; Gama, J

Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2023, PT II

Abstract
Pet owners are becoming increasingly conscious of their pets' needs and are paying more attention to their overall wellness. The well-being of their pets is intricately linked to their own emotional and physical well-being. Some veterinary system solutions are emerging to provide proactive healthcare options for pets. One such solution offers continuous monitoring of a pet's activity through accelerometer tracking devices. Based on data collected by this application, in this paper we study different time aggregations and three unsupervised machine learning techniques to identify anomalies in pet behaviour data. Specifically, we evaluate three algorithms, Isolation Forest, Local Outlier Factor, and K-Nearest Neighbour, with various thresholds to differentiate between normal and abnormal events. Experiments conducted on ten pets (five cats and five dogs) show that the most effective approach is to use daily data divided into periods. Moreover, the Local Outlier Factor is the best algorithm for detecting anomalies when prioritizing the identification of true positives. However, it also produces a high false positive ratio.
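
As an illustration only, thresholded nearest-neighbour anomaly scoring of the kind named in this abstract can be sketched as follows. This is not the authors' implementation: the feature values, the function names `knn_scores` and `flag_anomalies`, and the median-based threshold rule are all assumptions made for the example.

```python
import math


def knn_scores(points, k=3):
    """Score each point by its mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_anomalies(points, k=3, threshold=2.0):
    """Flag points whose score exceeds `threshold` times the median score.

    The median-relative threshold is an assumption of this sketch, not a rule
    taken from the paper.
    """
    scores = knn_scores(points, k)
    median = sorted(scores)[len(scores) // 2]
    return [s > threshold * median for s in scores]


# Hypothetical per-day features: (active minutes, resting minutes)
days = [(50, 400), (52, 395), (48, 410), (51, 402), (49, 398), (5, 600)]
flags = flag_anomalies(days, k=2, threshold=3.0)
print(flags)
```

Raising `threshold` trades missed anomalies for fewer false alarms, which mirrors the trade-off between true positives and false positives mentioned in the abstract.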

2025

Data Science for Fighting Environmental Crime

Authors
Barbosa, M; Ribeiro, C; Gomes, F; Ribeiro, RP; Gama, J

Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2023, PT II

Abstract
The rise of environmental crimes has become a major global concern, as they cause significant damage to ecosystems and public health and result in economic losses. The availability of vast sensor data provides an opportunity to analyze environmental data proactively, helping to detect irregularities and uncover potential criminal activities. This paper highlights the critical role played by machine learning (ML) and remote sensing technologies in the continuously evolving scenarios of environmental crime. By examining case studies on detecting illegal fishing, illegal oil spills, illegal landfills, and illegal logging, we delve into the practical implementation of data-driven approaches for environmental crime detection. Our goal with this study is to provide an overview of the existing research in this area and foster the use of ML and data science techniques to enhance environmental crime detection.
