Publications - INESC TEC

Publications

Publications by CTM

2025

End-to-End Occluded Person Re-Identification With Artificial Occlusion Generation

Authors
Capozzi, L; Cardoso, JS; Rebelo, A;

Publication
IEEE ACCESS

Abstract
In recent years, the task of person re-identification (Re-ID) has improved considerably with the advances in deep learning methodologies. However, occluded person Re-ID remains a challenging task, as parts of the body of the individual are frequently hidden by various objects, obstacles, or other people, making the identification process more difficult. To address these issues, we introduce a novel data augmentation strategy using artificial occlusions, consisting of random shapes and objects from a small image dataset that was created. We also propose an end-to-end methodology for occluded person Re-ID, which consists of three branches: a global branch, a feature dropping branch, and an occlusion detection branch. Experimental results show that the use of random shape occlusions is superior to random erasing using our architecture. Results on six datasets consisting of three tasks (holistic, partial and occluded person Re-ID) demonstrate that our method performs favourably against state-of-the-art methodologies.
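As a rough illustration of the artificial occlusion idea described in the abstract, the sketch below pastes one random ellipse over a grayscale image stored as nested lists. The function name, the ellipse shape, the size range, and the fill value are all assumptions made for illustration; the paper's actual shapes, object dataset, and parameters are not given here. The returned mask marks the occluded pixels and could, hypothetically, serve as a target for an occlusion detection branch.

```python
import random


def add_random_occlusion(image, rng, fill=0, min_frac=0.2, max_frac=0.5):
    """Paste one random ellipse over a 2D grayscale image (list of rows).

    Returns (occluded_image, mask); mask[y][x] is 1 where a pixel was occluded.
    All parameter names and default ranges are hypothetical.
    """
    height, width = len(image), len(image[0])
    # Semi-axes drawn as a random fraction of the image size.
    ry = max(1, int(height * rng.uniform(min_frac, max_frac) / 2))
    rx = max(1, int(width * rng.uniform(min_frac, max_frac) / 2))
    # Random centre inside the image, so at least one pixel is always covered.
    cy = rng.randrange(height)
    cx = rng.randrange(width)

    occluded = [row[:] for row in image]  # copy; the input is left untouched
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            inside = ((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2 <= 1.0
            if inside:
                occluded[y][x] = fill
                mask[y][x] = 1
    return occluded, mask


image = [[255] * 8 for _ in range(8)]
occluded, mask = add_random_occlusion(image, random.Random(0))
```

Passing a seeded random.Random instance keeps the augmentation reproducible, which is convenient when comparing occlusion strategies.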


2025

Disentanglement and Assessment of Shortcuts in Ophthalmological Retinal Imaging Exams

Authors
Fernandes, L; Gonçalves, T; Matos, J; Nakayama, LF; Cardoso, JS;

Publication
Fairness of AI in Medical Imaging - Third International Workshop, FAIMI 2025, Held in Conjunction with MICCAI 2025, Daejeon, South Korea, September 23, 2025, Proceedings

Abstract
Diabetic retinopathy (DR) is a leading cause of vision loss in working-age adults. While screening reduces the risk of blindness, traditional imaging is often costly and inaccessible. Artificial intelligence (AI) algorithms present a scalable diagnostic solution, but concerns regarding fairness and generalization persist. This work evaluates the fairness and performance of image-trained models in DR prediction, as well as the impact of disentanglement as a bias mitigation technique, using the diverse mBRSET fundus dataset. Three models, ConvNeXt V2, DINOv2, and Swin V2, were trained on macula images to predict DR and sensitive attributes (SAs) (e.g., age and gender/sex). Fairness was assessed between subgroups of SAs, and disentanglement was applied to reduce bias. All models achieved high DR prediction performance (up to 94% AUROC) and could reasonably predict age and gender/sex (91% and 77% AUROC, respectively). Fairness assessment suggests disparities, such as a 10% AUROC gap between age groups in DINOv2. Disentangling SAs from DR prediction had varying results, depending on the model selected. Disentanglement improved DINOv2 performance (2% AUROC gain), but led to performance drops in ConvNeXt V2 and Swin V2 (7% and 3%, respectively). These findings highlight the complexity of disentangling fine-grained features in fundus imaging and emphasize the importance of fairness in medical imaging AI to ensure equitable and reliable healthcare solutions. © 2025 Elsevier B.V., All rights reserved.
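To make the notion of an AUROC gap between subgroups concrete, the following sketch computes AUROC separately for each subgroup of a sensitive attribute and reports the difference between the best and worst subgroup. The function names and the toy data are hypothetical, and the pairwise AUROC formula is a standard definition rather than the evaluation code used in the paper.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive sample outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_gap(labels, scores, groups):
    """Return per-subgroup AUROC and the gap between best and worst subgroup."""
    per_group = {}
    for g in sorted(set(groups)):
        idx = [i for i, value in enumerate(groups) if value == g]
        per_group[g] = auroc([labels[i] for i in idx], [scores[i] for i in idx])
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap


# Toy data (invented for illustration): four samples per age group.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.2, 0.6, 0.7, 0.9, 0.3]
groups = ["young"] * 4 + ["old"] * 4
per_group, gap = auroc_gap(labels, scores, groups)
print(per_group, gap)  # {'old': 0.75, 'young': 1.0} 0.25
```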


2025

Deciphering the Silent Signals: Unveiling Frequency Importance for Wi-Fi-Based Human Pose Estimation with Explainability

Authors
Capozzi, L; Ferreira, L; Gonçalves, T; Rebelo, A; Cardoso, JS; Sequeira, AF;

Publication
Pattern Recognition and Image Analysis - 12th Iberian Conference, IbPRIA 2025, Coimbra, Portugal, June 30 - July 3, 2025, Proceedings, Part II

Abstract
The rapid advancement of wireless technologies, particularly Wi-Fi, has spurred significant research into indoor human activity detection across various domains (e.g., healthcare, security, and industry). This work explores the non-invasive and cost-effective Wi-Fi paradigm and the application of deep learning for human activity recognition using Wi-Fi signals. Focusing on the challenges in machine interpretability, motivated by the increase in data availability and computational power, this paper uses explainable artificial intelligence to understand the inner workings of transformer-based deep neural networks designed to estimate human pose (i.e., human skeleton key points) from Wi-Fi channel state information. Using different strategies to assess the most relevant sub-carriers (i.e., rollout attention and masking attention) for the model predictions, we evaluate the performance of the model when it uses a given number of sub-carriers as input, selected randomly or by ascending (high-attention) or descending (low-attention) order. We concluded that the models trained with fewer (but relevant) sub-carriers are competitive with the baseline (trained with all sub-carriers) but better in terms of computational efficiency (i.e., processing more data per second). © 2025 Elsevier B.V., All rights reserved.
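The sub-carrier selection step described above can be sketched as follows, assuming a per-sub-carrier relevance score is already available (for example from attention rollout). The function names, strategy labels, and toy values are hypothetical and do not come from the paper.

```python
import random


def select_subcarriers(attention, k, strategy="high", seed=0):
    """Return the sorted indices of k sub-carriers.

    strategy: "high" keeps the k highest-attention sub-carriers,
              "low" keeps the k lowest, "random" samples k uniformly.
    """
    if not 0 < k <= len(attention):
        raise ValueError("k must be between 1 and the number of sub-carriers")
    indices = list(range(len(attention)))
    if strategy == "high":
        chosen = sorted(indices, key=lambda i: attention[i], reverse=True)[:k]
    elif strategy == "low":
        chosen = sorted(indices, key=lambda i: attention[i])[:k]
    elif strategy == "random":
        chosen = random.Random(seed).sample(indices, k)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(chosen)


def keep_subcarriers(csi_frame, indices):
    """Reduce one channel state information frame to the selected sub-carriers."""
    return [csi_frame[i] for i in indices]


# Toy relevance scores for five sub-carriers (invented for illustration).
attention = [0.05, 0.30, 0.10, 0.40, 0.15]
high = select_subcarriers(attention, k=2, strategy="high")
low = select_subcarriers(attention, k=2, strategy="low")
reduced = keep_subcarriers([10, 11, 12, 13, 14], high)
print(high, low, reduced)  # [1, 3] [0, 2] [11, 13]
```

Feeding fewer sub-carriers shrinks the model input, which is one plausible reading of why the reduced models process more data per second.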


2025

Predicting Aesthetic Outcomes of Breast Cancer Surgery: A Robust and Explainable Image Retrieval Approach

Authors
Ferreira, P; Zolfagharnasab, MH; Gonçalves, T; Bonci, E; Mavioso, C; Cardoso, MJ; Cardoso, JS;

Publication
Artificial Intelligence and Imaging for Diagnostic and Treatment Challenges in Breast Care - Second Deep Breast Workshop, Deep-Breath 2025, Held in Conjunction with MICCAI 2025, Daejeon, South Korea, September 23, 2025, Proceedings

Abstract
Accurate retrieval of post-surgical images plays a critical role in surgical planning for breast cancer patients. However, current content-based image retrieval methods face challenges related to limited interpretability, poor robustness to image noise, and reduced generalization across clinical settings. To address these limitations, we propose a multistage retrieval pipeline integrating saliency-based explainability, noise-reducing image pre-processing, and ensemble learning. Evaluated on a dataset of post-operative breast cancer patient images, our approach achieves contrastive accuracy of 77.67% for Excellent/Good and 84.98% for Fair/Poor outcomes, surpassing prior studies by 8.37% and 11.80%, respectively. Explainability analysis provided essential insight by showing that feature extractors often attend to irrelevant regions, thereby motivating targeted input refinement. Ablations show that expanded bounding box inputs improve performance over original images, with gains of 0.78% and 0.65% contrastive accuracy for Excellent/Good and Fair/Poor, respectively. In contrast, the use of segmented images leads to a performance drop (1.33% and 1.65%) due to the loss of contextual cues. Furthermore, ensemble learning yielded additional gains of 0.89% and 3.60% over the best-performing single-model baselines. These findings underscore the importance of targeted input refinement and ensemble integration for robust and generalizable image retrieval systems. © 2025 Elsevier B.V., All rights reserved.


2025

Towards Robust Breast Segmentation: Leveraging Depth Awareness and Convexity Optimization For Tackling Data Scarcity

Authors
Zolfagharnasab, MH; Gonçalves, T; Ferreira, P; Cardoso, MJ; Cardoso, JS;

Abstract
Breast segmentation plays a critical role in objective pre- and postoperative aesthetic evaluation, but it is challenged by limited data (privacy concerns), class imbalance, and anatomical variability. In response to these obstacles, we introduce an encoder-decoder framework with a Segment Anything Model (SAM) backbone, enhanced with synthetic depth maps and a multi-term loss combining weighted cross-entropy, convexity, and depth alignment constraints. Evaluated on a 120-patient dataset split into 70% training, 10% validation, and 20% testing, our approach achieves a balanced test Dice score of 98.75%, a 4.5% improvement over prior methods, with Dice of 95.5% (breast) and 89.2% (nipple). Ablations show depth injection reduces noise and focuses on anatomical regions, yielding Dice gains of 0.47% (body) and 1.04% (breast). Geometric alignment increases convexity by almost 3% up to 99.86%, enhancing geometric plausibility of the nipple masks. Lastly, cross-dataset evaluation on CINDERELLA samples demonstrates robust generalization, with small performance gain primarily attributable to differences in annotation styles. © 2025 Elsevier B.V., All rights reserved.
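For readers unfamiliar with the Dice metric reported above, here is a minimal sketch of the standard Dice score on binary masks, plus an unweighted per-class mean over a label map. Treating the "balanced" score as an unweighted mean over classes is an assumption made for illustration, as are the function names and toy masks; the paper's exact definition is not given in the abstract.

```python
def dice_score(pred_mask, true_mask):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks (lists of 0/1 rows)."""
    intersection = pred_sum = true_sum = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += p * t
            pred_sum += p
            true_sum += t
    if pred_sum + true_sum == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * intersection / (pred_sum + true_sum)


def mean_dice(pred_labels, true_labels, classes):
    """Unweighted mean of per-class Dice over integer label maps (an assumption)."""
    scores = []
    for c in classes:
        pred_c = [[1 if v == c else 0 for v in row] for row in pred_labels]
        true_c = [[1 if v == c else 0 for v in row] for row in true_labels]
        scores.append(dice_score(pred_c, true_c))
    return sum(scores) / len(scores)


# Toy label maps (invented): 0 = background, 1 = breast, 2 = nipple.
pred = [[0, 1, 1], [0, 2, 0]]
true = [[0, 1, 1], [0, 2, 2]]
breast = dice_score([[1 if v == 1 else 0 for v in r] for r in pred],
                    [[1 if v == 1 else 0 for v in r] for r in true])
balanced = mean_dice(pred, true, classes=[1, 2])
```

In this toy case the breast class matches exactly (Dice 1.0) while the nipple class misses one of two pixels (Dice 2/3), so the class mean is 5/6.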


2025

Anatomically and Clinically Informed Deep Generative Model for Breast Surgery Outcome Prediction

Authors
Santos, J; Montenegro, H; Bonci, E; Cardoso, MJ; Cardoso, JS;

Abstract
Breast cancer patients often face difficulties when choosing among diverse surgeries. To aid patients, this paper proposes ACID-GAN (Anatomically and Clinically Informed Deep Generative Adversarial Network), a conditional generative model for predicting post-operative breast cancer outcomes using deep learning. Built on Pix2Pix, the model incorporates clinical metadata, such as surgery type and cancer laterality, by introducing a dedicated encoder for semantic supervision. Further improvements include colour preservation and anatomically informed losses, as well as clinical supervision via segmentation and classification modules. Experiments on a private dataset demonstrate that the model produces realistic, context-aware predictions. The results demonstrate that the model presents a meaningful trade-off between generating precise, anatomically defined results and maintaining patient-specific appearance, such as skin tone and shape. © 2025 Elsevier B.V., All rights reserved.
