Publications

Publications by Alípio Jorge

2023

Combining Symbolic and Deep Learning Approaches for Sentiment Analysis

Authors
Muhammad, SH; Brazdil, P; Jorge, A;

Publication
Compendium of Neurosymbolic Artificial Intelligence

Abstract
Deep learning approaches have become popular in sentiment analysis because of their competitive performance. The downside of these approaches is that they do not provide understandable explanations of how the sentiment values are calculated. Previous approaches that used sentiment lexicons for sentiment analysis can do that, but their performance is lower than that of deep learning approaches. Therefore, it is natural to wonder if the two approaches can be combined to exploit their advantages. In this chapter, we present a neuro-symbolic approach that combines both symbolic and deep learning approaches for sentiment analysis tasks. The symbolic approach exploits a sentiment lexicon and shifter patterns, which cover the operations of inversion/reversal, intensification, and attenuation/downtoning. The deep learning approach uses a pre-trained language model (PLM) to construct the sentiment lexicon. Our experimental results show that the proposed approach leads to promising results, substantially better than the results of a pure lexicon-based approach. Although the results did not reach the level of the deep learning approach, a great advantage is that sentiment prediction can be accompanied by understandable explanations. For some users, it is very important to see how sentiment is derived, even if performance is a little lower.


2023

The 1st International Workshop on Implicit Author Characterization from Texts for Search and Retrieval (IACT'23)

Authors
Litvak, M; Rabaev, I; Campos, R; Jorge, AM; Jatowt, A;

Publication
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023

Abstract
The first edition of the Implicit Author Characterization from Texts for Search and Retrieval (IACT'23) aims at bringing to the forefront the challenges involved in identifying and extracting from texts implicit information about authors (e.g., human or AI) and using it in IR tasks. The IACT workshop provides a common forum to consolidate multi-disciplinary efforts and foster discussions to identify the wide-ranging issues related to the task of extracting implicit author-related information from the textual content, including novel tasks and datasets. We will also discuss the ethical implications of implicit information extraction. In addition, we announce a shared task focused on automatically determining the literary epochs of written books.


2023

Clinical model for Hereditary Transthyretin Amyloidosis age of onset prediction

Authors
Pedroto, M; Coelho, T; Jorge, A; Mendes Moreira, J;

Publication
FRONTIERS IN NEUROLOGY

Abstract
Introduction: Hereditary transthyretin amyloidosis (ATTRv amyloidosis) is a rare neurological hereditary disease clinically characterized as severe, progressive, and life-threatening, while the age of onset represents the moment in time when the first symptoms are felt. In this study, we present and discuss our results on the study, development, and evaluation of an approach that allows for time-to-event prediction of the age of onset, while focusing on genealogical feature construction. Materials and methods: This research was triggered by the need to answer the medical problem of when an asymptomatic ATTRv patient will show symptoms of the disease. To do so, we defined and studied the impact of 77 features (ranging from demographic and genealogical to familial disease history); we studied and compared a pool of prediction algorithms, namely, linear regression (LR), elastic net (EN), lasso (LA), ridge (RI), support vector machines (SV), decision tree (DT), random forest (RF), and XGBoost (XG), in both a classification and a regression setting; we assembled a baseline (BL) which corresponds to the current medical knowledge of the disease; we studied the problem of predicting the age of onset of ATTRv patients; we assessed the viability of predicting age of onset on short-term horizons, with a classification framing, on localized sets of patients (currently symptomatic and asymptomatic carriers, with and without genealogical information); and we compared the results with an out-of-bag evaluation set assembled in a different time frame than the original data in order to account for data leakage. Results: Currently, we observe that our approach outperforms the BL model, which follows a set of clinical heuristics and represents current medical practice. Overall, our results show the superiority of SV and XG for both prediction tasks, although they are affected by data characteristics, namely, the existence of missing values, complex data, and small-sized available inputs. Discussion: With this study, we defined a predictive model approach capable of being well understood by medical professionals, compared it with the current practice, namely, the baseline approach (BL), and successfully showed the improvement achieved over current medical knowledge.


2009

Analysis and forecast of team formations in the simulated robotic soccer domain using Weka classification methodologies [Análise e previsão das formações das equipas no domínio do futebol robótico simulado utilizando metodologias de classificação no weka]

Authors
Almeida, R; Reis, LP; Jorge, AM;

Publication
Actas da 4a Conferencia Iberica de Sistemas e Tecnologias de Informacao, CISTI 2009

Abstract

1999

Iterative Part-of-Speech Tagging

Authors
Jorge, A; Andrade Lopes, Ad;

Publication
Learning Language in Logic

Abstract
Assigning a category to a given word (tagging) depends on the particular word and on the categories (tags) of neighboring words. A theory that is able to assign tags to a given text can naturally be viewed as a recursive logic program. This article describes how iterative induction, a technique that has been proven powerful in the synthesis of recursive logic programs, has been applied to the task of part-of-speech tagging. The main strategy consists of inducing a succession T1, T2,…, Tn of theories, using in the induction of theory Ti all the previously induced theories. Each theory in the sequence may have lexical rules, context rules and hybrid ones. This iterative strategy is, to a large extent, independent of the inductive algorithm underneath. Here we consider one particular relational learning algorithm, CSC(RC), and we induce first order theories from positive examples and background knowledge that are able to successfully tag a relatively large corpus in Portuguese. © Springer-Verlag Berlin Heidelberg 2000.


2011

Identification of rib boundaries in chest x-ray images using elliptical models

Authors
Brás, L; Jorge, AM; Gomes, EF; Duarte, R;

Publication
Technology and Medical Sciences - TMSi 2010

Abstract
We are developing a new method for the identification of rib boundaries in chest x-ray images. The identification of rib boundaries is important for radiologists' diagnosis of lung diseases such as TB. Radiologists use the ribs as a reference for location, and the ribs can also be used to eliminate false positives in the detection of abnormalities. Our method automatically identifies rib boundaries from raw images through a sequence of steps using a combination of image processing techniques. Radiographs are still very relevant in practice because in Portugal and many other countries they are the first step for TB detection. We have access to a large database of x-ray images provided by the pneumological screening centre (CDP) of Vila Nova de Gaia, in Portugal.
