
Publications by LIAAD

2023

TEI2GO: A Multilingual Approach for Fast Temporal Expression Identification

Authors
Sousa, H; Campos, R; Jorge, A;

Publication
PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023

Abstract
Temporal expression identification is crucial for understanding texts written in natural language. Although highly effective systems such as HeidelTime exist, their limited runtime performance hampers adoption in large-scale applications and production environments. In this paper, we introduce the TEI2GO models, which match HeidelTime's effectiveness with significantly improved runtime, support six languages, and achieve state-of-the-art results in four of them. To train the TEI2GO models, we used a combination of manually annotated reference corpora and Professor HeidelTime, a comprehensive weakly labeled corpus of news texts annotated with HeidelTime that we developed. This corpus comprises a total of 138,069 documents (across six languages) with 1,050,921 temporal expressions, making it the largest open-source annotated dataset for temporal expression identification to date. By describing how the models were produced, we aim to encourage the research community to further explore, refine, and extend the set of models to additional languages and domains. Code, annotations, and models are openly available for community exploration and use. The models are conveniently available on HuggingFace for seamless integration and application.
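Temporal expression identification is commonly cast as token classification. Assuming that framing (the abstract above does not spell out the training format), the minimal sketch below shows how character-offset annotations, such as spans derived from HeidelTime's output, could be turned into BIO labels for training a tagger. The function names and the TIMEX label are hypothetical, and whitespace tokenization stands in for a real model's tokenizer.

```python
import re
from typing import List, Tuple


def whitespace_tokens(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace, keeping each token's start and end character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def spans_to_bio(text: str, timexes: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Label each token B-TIMEX, I-TIMEX, or O given (start, end) character spans."""
    labeled = []
    for token, start, end in whitespace_tokens(text):
        label = "O"
        for t_start, t_end in timexes:
            # A token is part of a temporal expression if it lies fully inside the span.
            if t_start <= start and end <= t_end:
                label = "B-TIMEX" if start == t_start else "I-TIMEX"
                break
        labeled.append((token, label))
    return labeled


text = "The meeting was held on 27 July 2023 in Taipei."
start = text.index("27 July 2023")
result = spans_to_bio(text, [(start, start + len("27 July 2023"))])
print(result)
```

Tokens that only partially overlap a span (for example "2023," with trailing punctuation) fall back to O here; a production pipeline would align labels with the model's own subword tokenizer instead.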

2023

Proceedings of the IACT - The 1st International Workshop on Implicit Author Characterization from Texts for Search and Retrieval held in conjunction with the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023), Taipei, Taiwan, July 27, 2023

Authors
Litvak, M; Rabaev, I; Campos, R; Jorge, AM; Jatowt, A;

Publication
IACT@SIGIR

Abstract

2023

ORSUM 2023 - 6th Workshop on Online Recommender Systems and User Modeling

Authors
Vinagre, J; Ghossein, MA; Peska, L; Jorge, AM; Bifet, A;

Publication
Proceedings of the 17th ACM Conference on Recommender Systems, RecSys 2023, Singapore, Singapore, September 18-22, 2023

Abstract
Modern online platforms for user modeling and recommendation require complex data infrastructures to collect and process data. Some of this data must be stored for later use in batch training of personalization models. However, since user activity data can be generated at very fast rates, it is also useful to have algorithms able to process data streams online, in real time. Given the continuous and potentially fast change of content, context, and user preferences or intents, stream-based models and their synchronization with batch models can be extremely challenging. Therefore, it is important to investigate methods able to transparently and continuously adapt to the inherent dynamics of user interactions, preferably over long periods of time. Models able to continuously learn from such data flows are gaining attention in the recommender systems community and are increasingly being deployed in online platforms. However, many challenges associated with learning from streams need further investigation. The objective of this workshop is to foster contributions and bring together a growing community of researchers and practitioners interested in online, adaptive approaches to user modeling, recommendation, and personalization, and their implications regarding multiple dimensions, such as reproducibility, privacy, fairness, diversity, transparency, auditability, and compliance with recently adopted or upcoming legal frameworks worldwide. © 2023 Owner/Author.
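To make "models able to continuously learn from such data flows" concrete, here is a minimal sketch of one well-known member of that family: incremental stochastic gradient descent for matrix factorization on positive-only feedback, where each incoming (user, item) event triggers a single in-place update. It is a simplified illustration, not a method from any specific workshop paper; the class name `IncrementalMF` and the hyperparameters are hypothetical.

```python
import random
from typing import Dict, List


class IncrementalMF:
    """Hypothetical streaming matrix factorization: one SGD step per positive event."""

    def __init__(self, k: int = 10, lr: float = 0.05, reg: float = 0.01, seed: int = 0):
        self.k, self.lr, self.reg = k, lr, reg
        self.rng = random.Random(seed)
        self.users: Dict[str, List[float]] = {}
        self.items: Dict[str, List[float]] = {}

    def _vec(self, table: Dict[str, List[float]], key: str) -> List[float]:
        # New users and items get small random latent vectors on first sight.
        if key not in table:
            table[key] = [self.rng.gauss(0.0, 0.1) for _ in range(self.k)]
        return table[key]

    def predict(self, user: str, item: str) -> float:
        # Unknown users or items have no learned preference yet.
        if user not in self.users or item not in self.items:
            return 0.0
        return sum(a * b for a, b in zip(self.users[user], self.items[item]))

    def update(self, user: str, item: str) -> None:
        """Process one positive interaction (target score 1.0) and update in place."""
        a = self._vec(self.users, user)
        b = self._vec(self.items, item)
        err = 1.0 - sum(x * y for x, y in zip(a, b))
        for f in range(self.k):
            af, bf = a[f], b[f]  # use pre-update values for both gradients
            a[f] += self.lr * (err * bf - self.reg * af)
            b[f] += self.lr * (err * af - self.reg * bf)


model = IncrementalMF()
stream = [("u1", "i1"), ("u2", "i2"), ("u1", "i3")] * 200
for user, item in stream:
    model.update(user, item)  # each event is processed once, in arrival order
print(model.predict("u1", "i1"))
```

Because the model never revisits past events, it can keep adapting as preferences drift, which is the property the workshop description emphasizes; synchronizing such a model with a periodically retrained batch model is a separate design question.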

2023

Combining Symbolic and Deep Learning Approaches for Sentiment Analysis

Authors
Muhammad, SH; Brazdil, P; Jorge, A;

Publication
Compendium of Neurosymbolic Artificial Intelligence

Abstract
Deep learning approaches have become popular in sentiment analysis because of their competitive performance. Their downside is that they do not provide understandable explanations of how sentiment values are calculated. Earlier approaches that used sentiment lexicons for sentiment analysis can provide such explanations, but their performance is lower than that of deep learning approaches. It is therefore natural to ask whether the two approaches can be combined to exploit the advantages of each. In this chapter, we present a neuro-symbolic approach that combines symbolic and deep learning approaches for sentiment analysis tasks. The symbolic approach exploits a sentiment lexicon and shifter patterns, which cover the operations of inversion/reversal, intensification, and attenuation/downtoning. The deep learning approach uses a pre-trained language model (PLM) to construct the sentiment lexicon. Our experimental results show that the proposed approach leads to promising results, substantially better than those of a pure lexicon-based approach. Although the results did not reach the level of the deep learning approach, a great advantage is that sentiment predictions can be accompanied by understandable explanations. For some users, it is very important to see how sentiment is derived, even if performance is a little lower. © 2023 The authors and IOS Press. All rights reserved.
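The symbolic side described above, a lexicon plus shifter patterns for inversion, intensification, and attenuation, can be sketched as follows. This is a minimal illustration under stated assumptions: the toy lexicon, the shifter word lists, their multipliers, and the two-token look-back window are all hypothetical (in the chapter, the lexicon is built with a PLM and the exact shifter rules are not given here). The returned trace shows how each score was derived, which is the kind of explanation the abstract highlights.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon and shifter lists, for illustration only.
LEXICON: Dict[str, float] = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
INVERTERS = {"not", "never"}                         # inversion/reversal
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}       # intensification
DOWNTONERS = {"slightly": 0.5, "somewhat": 0.7}      # attenuation/downtoning
WINDOW = 2  # how many preceding tokens may shift a sentiment word


def score(tokens: List[str]) -> Tuple[float, List[Tuple[str, List[str], float]]]:
    """Sum shifted lexicon values and return a per-word explanation trace."""
    total = 0.0
    trace = []
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        value = LEXICON[tok]
        applied = []
        # Scan the preceding window, nearest token first, for shifter words.
        for prev in reversed(tokens[max(0, i - WINDOW):i]):
            if prev in INVERTERS:
                value = -value
            elif prev in INTENSIFIERS:
                value *= INTENSIFIERS[prev]
            elif prev in DOWNTONERS:
                value *= DOWNTONERS[prev]
            else:
                continue
            applied.append(prev)
        total += value
        trace.append((tok, applied, value))
    return total, trace


total, trace = score("the food was not very good".split())
print(total, trace)
```

Here "good" (+1.0) is first intensified by "very" and then inverted by "not", and the trace records both shifters alongside the final contribution.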

2023

The 1st International Workshop on Implicit Author Characterization from Texts for Search and Retrieval (IACT'23)

Authors
Litvak, M; Rabaev, I; Campos, R; Jorge, AM; Jatowt, A;

Publication
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023

Abstract
The first edition of the Implicit Author Characterization from Texts for Search and Retrieval (IACT'23) aims at bringing to the forefront the challenges involved in identifying and extracting from texts implicit information about authors (e.g., human or AI) and using it in IR tasks. The IACT workshop provides a common forum to consolidate multi-disciplinary efforts and foster discussions to identify the wide-ranging issues related to the task of extracting implicit author-related information from the textual content, including novel tasks and datasets. We will also discuss the ethical implications of implicit information extraction. In addition, we announce a shared task focused on automatically determining the literary epochs of written books.

2023

Clinical model for Hereditary Transthyretin Amyloidosis age of onset prediction

Authors
Pedroto, M; Coelho, T; Jorge, A; Mendes Moreira, J;

Publication
FRONTIERS IN NEUROLOGY

Abstract
Introduction: Hereditary transthyretin amyloidosis (ATTRv amyloidosis) is a rare hereditary neurological disease clinically characterized as severe, progressive, and life-threatening; its age of onset is the point in time when the first symptoms are felt. In this study, we present and discuss our results on the study, development, and evaluation of an approach that enables time-to-event prediction of the age of onset, with a focus on genealogical feature construction.
Materials and methods: This research was triggered by the need to answer a medical question: when will an asymptomatic ATTRv patient show symptoms of the disease? To do so, we defined and studied the impact of 77 features (ranging from demographic and genealogical features to familial disease history); we studied and compared a pool of prediction algorithms, namely linear regression (LR), elastic net (EN), lasso (LA), ridge (RI), support vector machines (SV), decision tree (DT), random forest (RF), and XGBoost (XG), in both a classification and a regression setting; we assembled a baseline (BL) that corresponds to current medical knowledge of the disease; we studied the problem of predicting the age of onset of ATTRv patients; we assessed the viability of predicting the age of onset over short-term horizons, with a classification framing, on localized sets of patients (currently symptomatic patients and asymptomatic carriers, with and without genealogical information); and we compared the results against an out-of-bag evaluation set, assembled in a different time frame from the original data, in order to account for data leakage.
Results: We observe that our approach outperforms the BL model, which follows a set of clinical heuristics and represents current medical practice. Overall, our results show the superiority of SV and XG for both prediction tasks, although performance is affected by data characteristics, namely missing values, complex data, and the small size of the available data.
Discussion: With this study, we defined a predictive modeling approach that medical professionals can readily understand, compared it with current practice, namely the baseline approach (BL), and successfully showed the improvement it achieves over current medical knowledge.
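The abstract mentions predicting onset over short-term horizons with a classification framing. As a minimal sketch, assuming per-patient records of current age, observed onset age (if any), and age at last follow-up (hypothetical fields, not the study's 77 features), a regression target can be turned into a binary "onset within H years" label while leaving censored asymptomatic carriers unlabeled. How the study itself handled censoring is not described in the abstract; the function below is illustrative only.

```python
from typing import Optional


def horizon_label(current_age: float, onset_age: Optional[float],
                  last_follow_up_age: float, horizon: float) -> Optional[int]:
    """Hypothetical labeling: 1 if onset falls within `horizon` years, 0 if not, None if unknown."""
    window_end = current_age + horizon
    if onset_age is not None:
        if onset_age <= current_age:
            return None  # already symptomatic at prediction time, so not at risk
        return 1 if onset_age <= window_end else 0
    # No onset recorded: a negative label requires follow-up past the whole window.
    if last_follow_up_age >= window_end:
        return 0
    return None  # censored before the window closed


# A 40-year-old carrier, evaluated with a 5-year horizon under different histories.
print(horizon_label(40, 43, 50, 5))    # onset at 43, inside the window
print(horizon_label(40, None, 42, 5))  # symptom-free, but only followed to 42
```

Dropping the None cases rather than treating them as negatives avoids labeling carriers as "no onset" merely because they have not been observed long enough.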
