Publications

Publications by LIAAD

2023

Combining Neighbor Models to Improve Predictions of Age of Onset of ATTRv Carriers

Authors
Pedroto, M; Jorge, A; Mendes-Moreira, J; Coelho, T;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT II

Abstract
Transthyretin (TTR)-related familial amyloid polyneuropathy (ATTRv) is a life-threatening autosomal dominant disease, and the age of onset represents the moment when the first symptoms are felt. Accurately predicting the age of onset for a given patient is relevant for risk assessment and treatment management. In this work, we evaluate the impact on prediction error of combining prediction models obtained from neighboring time windows. We propose Symmetric (Sym) and Asymmetric (Asym) models, which represent two different averaging approaches. Combined with a weighting mechanism, these yield four ensemble models: Symmetric (Sym), Symmetric-weighted (Sym-w), Asymmetric (Asym), and Asymmetric-weighted (Asym-w). These four ensemble models are then compared to the original approach, which relies on individual regression base learners, namely Baseline (BL), Decision Tree (DT), Elastic Net (EN), Lasso (LA), Linear Regression (LR), Random Forest (RF), Ridge (RI), Support Vector Regressor (SV) and XGBoost (XG). Our results show that aggregating predictions from neighbor models decreases the average mean absolute error obtained by each base learner. Overall, the best results are achieved by regression-based ensemble tree models as base learners.
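The abstract does not spell out how neighboring models are combined, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that each time window has its own model producing one prediction for a patient, that a symmetric combination averages the target window with neighbors on both sides, and that an asymmetric combination uses only preceding windows. The function name `combine_neighbors`, its parameters, and the inverse-distance weighting are all hypothetical choices made for illustration.

```python
from typing import Sequence


def combine_neighbors(
    preds: Sequence[float],
    center: int,
    left: int,
    right: int,
    weighted: bool = False,
) -> float:
    """Average the prediction of window `center` with its neighbors.

    `preds[i]` is the (hypothetical) age-of-onset prediction produced by
    the model trained on time window i. `left` and `right` give how many
    neighboring windows to include on each side (assumed semantics).
    """
    lo = max(0, center - left)                # clip at the first window
    hi = min(len(preds) - 1, center + right)  # clip at the last window
    indices = range(lo, hi + 1)
    if weighted:
        # Assumed weighting: closer windows count more (inverse distance).
        weights = [1.0 / (1 + abs(i - center)) for i in indices]
    else:
        weights = [1.0 for _ in indices]
    total = sum(w * preds[i] for w, i in zip(weights, indices))
    return total / sum(weights)


# Made-up predictions (in years) from five adjacent window models.
preds = [40.0, 44.0, 50.0, 52.0, 60.0]

sym = combine_neighbors(preds, center=2, left=1, right=1)
sym_w = combine_neighbors(preds, center=2, left=1, right=1, weighted=True)
asym = combine_neighbors(preds, center=2, left=1, right=0)
asym_w = combine_neighbors(preds, center=2, left=1, right=0, weighted=True)
```

With these made-up numbers, the asymmetric variants ignore the later window entirely, while the weighted variants pull the result toward the target window's own prediction. Whether this matches the paper's exact definitions of Sym, Sym-w, Asym, and Asym-w would need to be checked against the full text.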

2023

Report on the 6th International Workshop on Narrative Extraction from Texts (Text2Story 2023) at ECIR 2023

Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M; Cordeiro, JP; Rocha, C; Sousa, H; Mansouri, B;

Publication
SIGIR Forum

Abstract
The Sixth International Workshop on Narrative Extraction from Texts (Text2Story'23) was held on April 2nd, 2023, in conjunction with the 45th European Conference on Information Retrieval (ECIR 2023) in Dublin, Ireland. Continuing the tradition of past years, the workshop was held as a hybrid event. Online participation was allowed using the Zoom platform. During the course of the day, more than 50 attendees had the opportunity to follow up and discuss the recent advances in topics related to representation, extraction, and generation of narratives. The workshop program included two invited keynotes and nineteen paper presentations. The proceedings of the workshop are available online. Date: 2 April 2023. Website: https://text2story23.inesctec.pt/.

2023

GPT Struct Me: Probing GPT Models on Narrative Entity Extraction

Authors
Sousa, H; Guimaraes, N; Jorge, A; Campos, R;

Publication
2023 IEEE INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WI-IAT

Abstract
The importance of systems that can extract structured information from textual data becomes increasingly pronounced given the ever-increasing volume of text produced on a daily basis. Having a system that can effectively extract such information in an interoperable manner would be an asset for several domains, be it finance, health, or legal. Recent developments in natural language processing led to the production of powerful language models that can, to some degree, mimic human intelligence. Such effectiveness raises a pertinent question: Can these models be leveraged for the extraction of structured information? In this work, we address this question by evaluating the capabilities of two state-of-the-art language models - GPT-3 and GPT-3.5, commonly known as ChatGPT - in the extraction of narrative entities, namely events, participants, and temporal expressions. This study is conducted on the Text2Story Lusa dataset, a collection of 119 Portuguese news articles whose annotation framework includes a set of entity structures along with several tags and attribute values. We first select the best prompt template through an ablation study over prompt components that provide varying degrees of information on a subset of documents of the dataset. Subsequently, we use the best templates to evaluate the effectiveness of the models on the remaining documents. The results obtained indicate that GPT models are competitive with out-of-the-box baseline systems, presenting an all-in-one alternative for practitioners with limited resources. By studying the strengths and limitations of these models in the context of information extraction, we offer insights that can guide future improvements and avenues to explore in this field.

2023

Proceedings of the 6th Workshop on Online Recommender Systems and User Modeling co-located with the 17th ACM Conference on Recommender Systems (RecSys 2023), Singapore, September 19th, 2023

Authors
Vinagre, J; Ghossein, MA; Peska, L; Jorge, AM; Bifet, A;

Publication
ORSUM@RecSys

Abstract

2023

AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages

Authors
Muhammad, SH; Abdulmumin, I; Ayele, AA; Ousidhoum, N; Adelani, DI; Yimam, SM; Ahmad, IS; Beloucif, M; Mohammad, SM; Ruder, S; Hourrane, O; Jorge, A; Brazdil, P; António Ali, FDM; David, D; Osei, S; Bello, BS; Lawan, FI; Gwadabe, T; Rutunda, S; Belay, TD; Messelle, WB; Balcha, HB; Chala, SA; Gebremichael, HT; Opoku, B; Arthur, S;

Publication
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023

Abstract
Africa is home to over 2,000 languages from more than six language families and has the highest linguistic diversity among all continents. These include 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial to enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, a sentiment analysis benchmark that contains a total of >110,000 tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá) from four language families. The tweets were annotated by native speakers and used in the AfriSenti-SemEval shared task. We describe the data collection methodology, annotation process, and the challenges we dealt with when curating each dataset. We further report baseline experiments conducted on the different datasets and discuss their usefulness. ©2023 Association for Computational Linguistics.

2023

The Competition on Automatic Classification of Literary Epochs

Authors
Rabaev, I; Litvak, M; Younkin, V; Campos, R; Jorge, AM; Jatowt, A;

Publication
Proceedings of the IACT - The 1st International Workshop on Implicit Author Characterization from Texts for Search and Retrieval held in conjunction with the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023), Taipei, Taiwan, July 27, 2023.

Abstract
This paper describes the shared task on Automatic Classification of Literary Epochs (CoLiE) held as a part of the 1st International Workshop on Implicit Author Characterization from Texts for Search and Retrieval (IACT’23) held at SIGIR 2023. The competition aimed to enhance the capabilities of large-scale analysis and cross-comparative studies of literary texts by automating their classification into the respective epochs. We believe that the competition contributed to the field of information retrieval by exposing the first large benchmark dataset and the first study’s results with various methods applied to this dataset. This paper presents the details of the contest, the dataset used, the evaluation procedure, and an overview of participating methods. © 2022 Copyright for this paper by its authors.
