Publications - INESC TEC

Publications by Alípio Jorge

2022

Preface to the special issue on dynamic recommender systems and user models

Authors
Vinagre, J; Jorge, AM; Al Ghossein, M; Bifet, A; Cremonesi, P

Publication
USER MODELING AND USER-ADAPTED INTERACTION

2022

The robustness of Random Forest and Support Vector Machine Algorithms to a Faulty Heart Sound Segmentation

Authors
Oliveira, J; Nogueira, DM; Ferreira, C; Jorge, AM; Coimbra, MT

Publication
EMBC

Abstract
Cardiac auscultation is the key exam for screening cardiac diseases in both developed and developing countries. A heart sound auscultation procedure can detect the presence of murmurs and point to a diagnosis, making it an important first-line assessment and a cost-effective tool. The design of automatic recommendation systems based on heart sound auscultation can play an important role in boosting the accuracy and pervasiveness of screening tools. One step in such systems consists of detecting the fundamental heart sound states, a process known as segmentation. A faulty segmentation or a wrong estimation of the heart rate may leave heart sound classifiers unable to detect abnormal waves, such as murmurs. To understand the impact of a faulty segmentation, several common heart sound segmentation errors are studied in detail, namely those where the heart rate is badly estimated and those where the S1/S2 and systolic/diastolic states are swapped relative to the ground-truth state sequence. Of the tested algorithms, support vector machines (SVMs) and random forests (RFs) proved more sensitive to a wrong estimation of the heart rate (an expected drop of 6% and 8% in overall performance, respectively) than to a swap in the state sequence of events (an expected drop of 1.9% and 4.6%, respectively).

2022

Can Multi-channel Heart Sounds Analysis improve Murmur Detection?

Authors
Nogueira, M; Oliveira, J; Ferreira, CG; Coimbra, MT; Jorge, AM

Publication
2022 IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS (BHI) JOINTLY ORGANISED WITH THE IEEE-EMBS INTERNATIONAL CONFERENCE ON WEARABLE AND IMPLANTABLE BODY SENSOR NETWORKS (BSN'22)

Abstract
Cardiac auscultation is still the most cost-effective screening procedure for cardiovascular diseases. The development of computer-assisted methods can empower a wide variety of health professionals and thus enable low-cost mass cardiac screening. Correct cardiac auscultation involves listening to the heart sounds at the four main auscultation spots. Until recently, attempts to develop automatic heart sound analysis methods that exploit the multi-channel richness of a real auscultation were very difficult due to the lack of adequate public datasets. In this work, we use the CirCor Dataset, which contains more than one heart sound per patient (each patient has heart sounds collected at different auscultation spots). Using this dataset, we evaluate and quantify the comparative impact of a single-channel versus a multi-channel approach. A single-channel approach uses the sound from a single auscultation spot, whereas a multi-channel approach uses four auscultation spots asynchronously. Among the classifiers tested, models that use four auscultation spots achieved higher overall performance than those that search for abnormalities at a single spot. Our best result is a multi-channel SVM that analyzes four auscultation spots, with an overall performance of 87.4%. This opens the path to future research using a multi-channel approach.

2022

NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis

Authors
Muhammad, SH; Adelani, DI; Ruder, S; Ahmad, IS; Abdulmumin, I; Bello, BS; Choudhury, M; Emezue, CC; Abdullahi, SS; Aremu, A; Jorge, A; Brazdil, P

Publication
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION

Abstract
Sentiment analysis is one of the most widely studied applications in NLP, but most work focuses on languages with large amounts of data. We introduce the first large-scale human-annotated Twitter sentiment dataset for the four most widely spoken languages in Nigeria (Hausa, Igbo, Nigerian Pidgin, and Yoruba), consisting of around 30,000 annotated tweets per language, including a significant fraction of code-mixed tweets. We propose text collection, filtering, processing, and labeling methods that enable us to create datasets for these low-resource languages. We evaluate a range of pre-trained models and transfer strategies on the dataset. We find that language-specific models and language-adaptive fine-tuning generally perform best. We release the datasets, trained models, sentiment lexicons, and code to incentivize research on sentiment analysis in under-represented languages.

2026

Resilience Under Attack: Benchmarking Optimizers Against Poisoning in Federated Learning for Image Classification Using CNN

Authors
Biadgligne, Y; Baghoussi, Y; Li, K; Jorge, A

Publication
ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2025, PT I

Abstract
Federated Learning (FL) enables decentralized model training while preserving data privacy but remains susceptible to poisoning attacks. Malicious clients can manipulate local data or model updates, threatening FL's reliability, especially in privacy-sensitive domains like healthcare and finance. While client-side optimization algorithms play a crucial role in training local models, their resilience to such attacks is underexplored. This study empirically evaluates the robustness of three widely used optimization algorithms (SGD, Adam, and RMSProp) against label-flipping attacks (LFAs) in image classification tasks using Convolutional Neural Networks (CNNs). Through 900 individual runs in both federated and centralized learning (CL) settings, we analyze their performance under Independent and Identically Distributed (IID) and Non-IID data distributions. Results reveal that SGD is the most resilient, achieving the highest accuracy in 87% of cases, while Adam performs best in 13%. Additionally, centralized models outperform FL on CIFAR-10, whereas FL excels on Fashion-MNIST, highlighting the impact of dataset characteristics on adversarial robustness.

2026

Knowledge-Aware Clinical Narrative Extraction Using Ontologies and Knowledge Graphs

Authors
Leite, M; Rb Silva, R; Guimaraes, N; Stork, L; Jorge, A

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I

Abstract
Providing healthcare professionals with quick access to structured, standardized information enables comprehensive analysis and improves clinical decision-making. However, a significant part of the records in health institutions is in the form of free text. This paper proposes a pipeline that automatically extracts medical information from Electronic Medical Records (EMRs), based on large language models (LLMs) and a domain ontology defined and validated in collaboration with a medical expert. The output is a knowledge graph of clinical narratives that can be used to search through repositories of EMRs or discover new facts. We showcase our approach on a set of Portuguese clinical texts describing cases of Acute Myeloid Leukemia (AML), guided by one medical expert. We evaluate the quality of both the extraction and the knowledge graph.
