Publications

Publications by CTM

2025

SiameseOrdinalCLIP: A Language-Guided Siamese Network for the Aesthetic Evaluation of Breast Cancer Locoregional Treatment

Authors
Teixeira, LF; Montenegro, H; Bonci, E; Cardoso, MJ; Cardoso, JS;

Publication
Artificial Intelligence and Imaging for Diagnostic and Treatment Challenges in Breast Care - Second Deep Breast Workshop, Deep-Breath 2025, Held in Conjunction with MICCAI 2025, Daejeon, South Korea, September 23, 2025, Proceedings

Abstract
Breast cancer locoregional treatment includes a wide variety of procedures with diverse aesthetic outcomes. The aesthetic assessment of such procedures is typically subjective, hindering the fair comparison between their outcomes, and consequently restricting evidence-based improvements. Most objective evaluation tools were developed for conservative surgery, focusing on asymmetries while ignoring other relevant traits. To overcome these limitations, we propose SiameseOrdinalCLIP, an ordinal classification network based on image-text matching and pairwise ranking optimisation for the aesthetic evaluation of breast cancer treatment. Furthermore, we integrate a concept bottleneck module into the network for increased explainability. Experiments on a private dataset show that the proposed model surpasses the state-of-the-art aesthetic evaluation and ordinal classification networks. © 2025 Elsevier B.V., All rights reserved.

CloseRead Abstract

2025

Leveraging Adversarial Learning for Pathological Fidelity in Virtual Staining

Authors
Teixeira, J; Klöckner, P; Montezuma, D; Cesur, ME; Fraga, J; Horlings, HM; Cardoso, JS; de Oliveira, SP;

Publication
Deep Generative Models - 5th MICCAI Workshop, DGM4MICCAI 2025, Held in Conjunction with MICCAI 2025, Daejeon, South Korea, September 23, 2025, Proceedings

Abstract
In addition to evaluating tumor morphology using H&E staining, immunohistochemistry is used to assess the presence of specific proteins within the tissue. However, this is a costly and labor-intensive technique, for which virtual staining, as an image-to-image translation task, offers a promising alternative. Although recent, this is an emerging field of research with 64% of published studies just in 2024. Most studies use publicly available datasets of H&E-IHC pairs from consecutive tissue sections. Recognizing the training challenges, many authors develop complex virtual staining models based on conditional Generative Adversarial Networks but ignore the impact of adversarial loss on the quality of virtual staining. Furthermore, overlooking the issues of model evaluation, they claim improved performance based on metrics such as SSIM and PSNR, which are not sufficiently robust to evaluate the quality of virtually stained images. In this paper, we developed CSSP2P GAN, which we demonstrate to achieve heightened pathological fidelity through a blind pathological expert evaluation. Furthermore, while iteratively developing our model, we study the impact of the adversarial loss and demonstrate its crucial role in the quality of virtually stained images. Finally, while comparing our model with reference works in the field, we underscore the limitations of the currently used evaluation metrics and demonstrate the superior performance of CSSP2P GAN. © 2025 Elsevier B.V., All rights reserved.

CloseRead Abstract

2025

Towards Utilizing Robust Radiance Fields for 3D Reconstruction of Breast Aesthetics

Authors
Pinto, G; Zolfagharnasab, MH; Teixeira, LF; Cruz, H; Cardoso, MJ; Cardoso, JS;

Abstract
3D models are crucial in predicting aesthetic outcomes in breast reconstruction, supporting personalized surgical planning, and improving patient communication. In response to this necessity, this is the first application of Radiance Fields to 3D breast reconstruction. Building on this, the work compares six SoTA 3D reconstruction models. It introduces a novel variant tailored to medical contexts: Depth-Splatfacto, designed to improve denoising and geometric consistency through pseudo-depth supervision. Additionally, we extended model training to grayscale, which enhances robustness under grayscale-only input constraints. Experiments on a breast cancer patient dataset demonstrate that Splatfacto consistently outperforms others, delivering the highest reconstruction quality (PSNR 27.11, SSIM 0.942) and the fastest training times (×1.3 faster at 200k iterations). At the same time, the depth-enhanced variant offers an efficient and stable alternative with minimal fidelity loss. The grayscale train improves speed by ×1.6 with a PSNR drop of 0.70. Depth-Splatfacto further improves robustness, reducing PSNR variance by 10% and making images less blurry across test cases. These results establish a foundation for future clinical applications, supporting personalized surgical planning and improved patient-doctor communication. © 2025 Elsevier B.V., All rights reserved.

CloseRead Abstract

2025

Multimodal PointPillars for Efficient Object Detection in Autonomous Vehicles

Authors
Oliveira, M; Cerqueira, R; Pinto, JR; Fonseca, J; Teixeira, LF;

Publication
IEEE Trans. Intell. Veh.

Abstract
Autonomous Vehicles aim to understand their surrounding environment by detecting relevant objects in the scene, which can be performed using a combination of sensors. The accurate prediction of pedestrians is a particularly challenging task, since the existing algorithms have more difficulty detecting small objects. This work studies and addresses this often overlooked problem by proposing Multimodal PointPillars (M-PP), a fast and effective novel fusion architecture for 3D object detection. Inspired by both MVX-Net and PointPillars, image features from a 2D CNN-based feature map are fused with the 3D point cloud in an early fusion architecture. By changing the heavy 3D convolutions of MVX-Net to a set of convolutional layers in 2D space, along with combining LiDAR and image information at an early stage, M-PP considerably improves inference time over the baseline, running at 28.49 Hz. It achieves inference speeds suitable for real-world applications while keeping the high performance of multimodal approaches. Extensive experiments show that our proposed architecture outperforms both MVX-Net and PointPillars for the pedestrian class in the KITTI 3D object detection dataset, with 62.78% in $AP_{BEV}$ (moderate difficulty), while also outperforming MVX-Net in the nuScenes dataset. Moreover, experiments were conducted to measure the detection performance based on object distance. The performance of M-PP surpassed other methods in pedestrian detection at any distance, particularly for faraway objects (more than 30 meters). Qualitative analysis shows that M-PP visibly outperformed MVX-Net for pedestrians and cyclists, while simultaneously making accurate predictions of cars.

CloseRead Abstract

2025

Markerless multi-view 3D human pose estimation: A survey

Authors
Nogueira, AFR; Oliveira, HP; Teixeira, LF;

Publication
IMAGE AND VISION COMPUTING

Abstract
3D human pose estimation aims to reconstruct the human skeleton of all the individuals in a scene by detecting several body joints. The creation of accurate and efficient methods is required for several real-world applications including animation, human-robot interaction, surveillance systems or sports, among many others. However, several obstacles such as occlusions, random camera perspectives, or the scarcity of 3D labelled data, have been hampering the models' performance and limiting their deployment in real-world scenarios. The higher availability of cameras has led researchers to explore multi-view solutions due to the advantage of being able to exploit different perspectives to reconstruct the pose. Most existing reviews focus mainly on monocular 3D human pose estimation and a comprehensive survey only on multi-view approaches to determine the 3D pose has been missing since 2012. Thus, the goal of this survey is to fill that gap and present an overview of the methodologies related to 3D pose estimation in multi-view settings, understand what were the strategies found to address the various challenges and also, identify their limitations. According to the reviewed articles, it was possible to find that most methods are fully-supervised approaches based on geometric constraints. Nonetheless, most of the methods suffer from 2D pose mismatches, to which the incorporation of temporal consistency and depth information have been suggested to reduce the impact of this limitation, besides working directly with 3D features can completely surpass this problem but at the expense of higher computational complexity. Models with lower supervision levels were identified to overcome some of the issues related to 3D pose, particularly the scarcity of labelled datasets. Therefore, no method is yet capable of solving all the challenges associated with the reconstruction of the 3D pose. Due to the existing trade-off between complexity and performance, the best method depends on the application scenario. Therefore, further research is still required to develop an approach capable of quickly inferring a highly accurate 3D pose with bearable computation cost. To this goal, techniques such as active learning, methods that learn with a low level of supervision, the incorporation of temporal consistency, view selection, estimation of depth information and multi-modal approaches might be interesting strategies to keep in mind when developing a new methodology to solve this task.

CloseRead Abstract

2025

A two-step concept-based approach for enhanced interpretability and trust in skin lesion diagnosis

Authors
Patrício, C; Teixeira, LF; Neves, JC;

Publication
COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL

Abstract
The main challenges hindering the adoption of deep learning-based systems in clinical settings are the scarcity of annotated data and the lack of interpretability and trust in these systems. Concept Bottleneck Models (CBMs) offer inherent interpretability by constraining the final disease prediction on a set of human-understandable concepts. However, this inherent interpretability comes at the cost of greater annotation burden. Additionally, adding new concepts requires retraining the entire system. In this work, we introduce a novel two-step methodology that addresses both of these challenges. By simulating the two stages of a CBM, we utilize a pretrained Vision Language Model (VLM) to automatically predict clinical concepts, and an off-the-shelf Large Language Model (LLM) to generate disease diagnoses grounded on the predicted concepts. Furthermore, our approach supports test-time human intervention, enabling corrections to predicted concepts, which improves final diagnoses and enhances transparency in decision-making. We validate our approach on three skin lesion datasets, demonstrating that it outperforms traditional CBMs and state-of-the-art explainable methods, all without requiring any training and utilizing only a few annotated examples. The code is available at https://github.com/CristianoPatricio/2step-concept-based-skin-diagnosis.

CloseRead Abstract