Publications

Publications by CTM

2026

Optimizing Medical Image Captioning with Conditional Prompt Encoding

Authors
Fernandes, RF; Oliveira, HS; Ribeiro, PP; Oliveira, HP;

Publication
PATTERN RECOGNITION AND IMAGE ANALYSIS, IBPRIA 2025, PT II

Abstract
Medical image captioning is an essential tool for producing descriptive text reports of medical images. One of its central problems is poor domain-specific description generation, because large pre-trained language models are trained primarily on non-medical text whose semantics differ from those of medical text. To overcome this limitation, we explore improvements in contrastive learning for X-ray images, complemented with soft-prompt engineering for medical image captioning and conditional text decoding for caption generation. The main objective is to develop a soft-prompt model that improves the accuracy and clinical relevance of the automatically generated captions while guaranteeing their linguistic accuracy without degrading the models' performance. Experiments on the MIMIC-CXR and ROCO datasets showed that the inclusion of tailored soft prompts improved accuracy and efficiency while ensuring a more cohesive medical context for captions, aiding medical diagnosis and encouraging more accurate reporting.

2026

Decoding vision transformer variations for image classification: A guide to performance and usability

Authors
Montrezol, J; Oliveira, HS; Oliveira, HP;

Publication
MACHINE LEARNING WITH APPLICATIONS

Abstract
With the rise of Transformers, Vision Transformers (ViTs) have become a new standard in visual recognition. This has led to the development of numerous architectures with diverse designs and applications. This survey identifies 22 key ViT and hybrid CNN-ViT models, along with 5 top Convolutional Neural Network (CNN) models. These were selected based on their new architecture, relevance to benchmarks, and overall impact. The models are organised using a defined taxonomy formed by CNN-based, pure Transformer-based, and hybrid architectures. We analyse their main components, training methods, and computational features, while assessing performance using reported results on standard benchmarks such as ImageNet and CIFAR, along with our training and fine-tuning evaluations on specific imaging datasets. In addition to accuracy, we look at real-world deployment issues by analysing the trade-offs between accuracy and efficiency in embedded, mobile, and clinical settings. The results indicate that modern CNNs are still very competitive in limited-resource environments, while advanced ViT variants perform well after large-scale pretraining, especially in areas with high variability. Hybrid CNN-ViT architectures, on the other hand, tend to offer the best balance between accuracy, data efficiency, and computational cost. This survey establishes a consolidated benchmark and reference framework for understanding the evolution, capabilities, and practical applicability of contemporary vision architectures.

2026

Pattern Recognition and Image Analysis

Authors
Gonçalves, N; Oliveira, HP; Sánchez, JA;

Publication
Lecture Notes in Computer Science

2026

Pattern Recognition and Image Analysis - 12th Iberian Conference, IbPRIA 2025, Coimbra, Portugal, June 30 - July 3, 2025, Proceedings, Part II

Authors
Gonçalves, N; Oliveira, HP; Sánchez, JA;

Publication
IbPRIA (2)

2026

Pattern Recognition and Image Analysis - 12th Iberian Conference, IbPRIA 2025, Coimbra, Portugal, June 30 - July 3, 2025, Proceedings, Part I

Authors
Gonçalves, N; Oliveira, HP; Sánchez, JA;

Publication
IbPRIA (1)

2026

HUydra: Full-Range Lung CT Synthesis via Multiple HU Interval Generative Modelling

Authors
Cardoso, A; Sousa, P; Pereira, T; Oliveira, HP;

Publication
CoRR

Abstract