Publications - INESC TEC

Publications

Publications by Hélder Filipe Oliveira

2019

Computer Aided Detection of Deep Inferior Epigastric Perforators in Computed Tomography Angiography scans

Authors
Araújo, RJ; Garrido, V; Baraças, CA; Vasconcelos, MA; Mavioso, C; Anacleto, JC; Cardoso, MJ; Oliveira, HP;

Publication
CoRR

Abstract

2021

Topological Similarity Index and Loss Function for Blood Vessel Segmentation

Authors
Araújo, RJ; Cardoso, JS; Oliveira, HP;

Publication
CoRR

Abstract

2025

From Pixels to Pathways: AI-Based Approaches for Multimodal Lung Cancer Classification

Authors
Gonçalves, S; Sousa, JV; Gouveia, M; Amaro, M; Oliveira, HP; Pereira, T;

Publication
BIBM

Abstract
Lung cancer remains the leading cause of cancer-related deaths globally, responsible for approximately 1.8 million deaths each year. A key contributor to this high mortality rate is the late-stage diagnosis of the disease, underscoring the urgent need for effective early detection strategies. Low-dose computed tomography (CT) has shown great value in early screening, particularly when paired with clinical information. Clinical data, while valuable, lacks spatial and morphological insights essential for comprehensive evaluation. Combining both modalities offers a more holistic approach for lung cancer classification. This study presents AI-based methods for lung cancer classification using unimodal approaches (structured clinical data and chest CT imaging) alongside a novel multimodal deep learning framework that integrates both data types to classify lung nodules as malignant or benign. For the clinical modality, machine learning models including logistic regression, random forests, LightGBM, XGBoost, and multilayer perceptrons were evaluated with extensive hyperparameter tuning. In the imaging modality, ResNet18 and ResNet34 convolutional neural networks were used, with and without data augmentation. The study explored both intermediate and late fusion strategies to combine modality-specific representations. Results show that multimodal models consistently outperformed their unimodal counterparts, achieving a best-case area under the ROC curve (AUC) of 0.9138, with an accuracy of 0.8424 and an F1-score of 0.8422. These findings highlight the complementary strengths of imaging and clinical data and support the growing potential of multimodal deep learning in improving diagnostic accuracy in lung cancer classification. © 2025 IEEE.

2025

Robust Visual Transformers for Medical Image Classification

Authors
Montrezol, J; Oliveira, HS; Araujo, J; Oliveira, HP;

Publication
2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Abstract
The Vision Transformer (ViT) architecture has emerged as a potential game-changer in computer vision, offering scalability and global attention that have generated considerable interest in recent years. Its adaptability has fueled enthusiasm for its application. This work investigates the boundaries of the architecture, focusing on new techniques that explicitly target complex tasks such as those posed by medical imaging datasets, which often exhibit high variability, class imbalance, and limited sample sizes. We propose a set of mixed regularisation and augmentation techniques to enhance model performance. These include a novel loss function and a smoothly differentiable activation function, leading to more stable training and model performance. The results show that incorporating these techniques improves model performance and training convergence.

2026

Decoding vision transformer variations for image classification: A guide to performance and usability

Authors
Montrezol, J; Oliveira, HS; Oliveira, HP;

Publication
Machine Learning with Applications

Abstract
With the rise of Transformers, Vision Transformers (ViTs) have become a new standard in visual recognition. This has led to the development of numerous architectures with diverse designs and applications. This survey identifies 22 key ViT and hybrid CNN-ViT models, along with 5 top Convolutional Neural Network (CNN) models. These were selected based on their new architecture, relevance to benchmarks, and overall impact. The models are organised using a defined taxonomy formed by CNN-based, pure Transformer-based, and hybrid architectures. We analyse their main components, training methods, and computational features, while assessing performance using reported results on standard benchmarks such as ImageNet and CIFAR, along with our training and fine-tuning evaluations on specific imaging datasets. In addition to accuracy, we look at real-world deployment issues by analysing the trade-offs between accuracy and efficiency in embedded, mobile, and clinical settings. The results indicate that modern CNNs are still very competitive in limited-resource environments, while advanced ViT variants perform well after large-scale pretraining, especially in areas with high variability. Hybrid CNN-ViT architectures, on the other hand, tend to offer the best balance between accuracy, data efficiency, and computational cost. This survey establishes a consolidated benchmark and reference framework for understanding the evolution, capabilities, and practical applicability of contemporary vision architectures.

2026

Pattern Recognition and Image Analysis

Authors
Gonçalves, N; Oliveira, HP; Sánchez, JA;

Publication
Lecture Notes in Computer Science

Abstract