
Publications by Ana Maria Mendonça

2024

CLARE-XR: explainable regression-based classification of chest radiographs with label embeddings

Authors
Rocha, J; Pereira, SC; Sousa, P; Campilho, A; Mendonca, AM;

Publication
SCIENTIFIC REPORTS

Abstract
An automatic system for pathology classification in chest X-ray scans needs more than predictive performance, since providing explanations is deemed essential for fostering end-user trust, improving decision-making, and ensuring regulatory compliance. CLARE-XR is a novel methodology that, when presented with an X-ray image, identifies the associated pathologies and provides explanations based on the presentation of similar cases. The diagnosis is achieved using a regression model that maps an image into a 2D latent space containing the reference coordinates of all findings. The references are generated once through label embedding, before the regression step, by converting the original binary ground-truth annotations to 2D coordinates. The classification is inferred from the distance between the coordinates of an inference image and the reference coordinates. Furthermore, as the regressor is trained on a known set of images, the distance from the coordinates of an inference image to the coordinates of the training set images also allows retrieving similar instances, mimicking the common clinical practice of comparing scans to confirm diagnoses. This inherently interpretable framework discloses specific classification rules and visual explanations through automatic image retrieval methods, outperforming the multi-label ResNet50 classification baseline across multiple evaluation settings on the NIH ChestX-ray14 dataset.
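The distance-based inference described in the abstract can be sketched in a few lines of Python. This is only a minimal illustration under assumptions, not the authors' implementation: the finding names, reference coordinates, distance threshold, and regressor outputs below are placeholders standing in for the learned label embedding and the trained regression model.

import numpy as np

rng = np.random.default_rng(0)

# Reference coordinate of each finding in the 2D label-embedding space
# (placeholder values; the real references come from embedding the binary labels).
finding_refs = {
    "cardiomegaly": np.array([0.8, 0.2]),
    "effusion": np.array([0.1, 0.9]),
    "no finding": np.array([0.5, 0.5]),
}

# 2D coordinates of the training images as produced by the regressor
# (random placeholders standing in for real regressor outputs).
train_coords = rng.uniform(0.0, 1.0, size=(100, 2))

def classify(pred_xy, refs, threshold=0.3):
    # Assign every finding whose reference point lies within `threshold`
    # of the regressed coordinate (an assumed decision rule).
    return [name for name, ref in refs.items()
            if np.linalg.norm(pred_xy - ref) <= threshold]

def retrieve_similar(pred_xy, coords, k=5):
    # Return the k training images closest to the regressed coordinate,
    # providing the example-based explanation described in the abstract.
    dists = np.linalg.norm(coords - pred_xy, axis=1)
    return np.argsort(dists)[:k]

pred = np.array([0.75, 0.25])               # regressed coordinate of a test image
print(classify(pred, finding_refs))          # e.g. ['cardiomegaly']
print(retrieve_similar(pred, train_coords))  # indices of similar training cases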

2025

Grad-CAM: The impact of large receptive fields and other caveats

Authors
Santos, R; Pedrosa, J; Mendonça, AM; Campilho, A;

Publication
COMPUTER VISION AND IMAGE UNDERSTANDING

Abstract
The increasing complexity of deep learning models demands explanations that can be obtained with methods like Grad-CAM. This method computes an importance map for the last convolutional layer relative to a specific class, which is then upsampled to match the size of the input. However, this final step assumes that there is a spatial correspondence between the last feature map and the input, which may not be the case. We hypothesize that, for models with large receptive fields, the spatial organization of the features is not kept during the forward pass, which may render the explanations devoid of meaning. To test this hypothesis, common architectures were applied to a medical scenario on the public VinDr-CXR dataset, to a subset of ImageNet, and to datasets derived from MNIST. The results show a significant dispersion of the spatial information, which goes against the assumption of Grad-CAM, and that explainability maps are affected by this dispersion. Furthermore, we discuss several other caveats regarding Grad-CAM, such as feature map rectification, empty maps, and the impact of global average pooling or flatten layers. Altogether, this work addresses some key limitations of Grad-CAM that may go unnoticed by common users, taking one step further in the pursuit of more reliable explainability methods.
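For reference, a generic Grad-CAM sketch in PyTorch (not the paper's code) is given below; the backbone, hooked layer, and input are placeholders. It makes the questioned step explicit: the class-specific map is formed at the last convolutional layer and then bilinearly upsampled to the input resolution, which only yields a meaningful heatmap if that layer preserves the input's spatial layout.

import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()    # placeholder backbone
activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["feat"] = output                # last conv feature map, B x C x H x W

def bwd_hook(module, grad_input, grad_output):
    gradients["feat"] = grad_output[0]          # gradient of the class score w.r.t. it

model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)                 # placeholder input image
scores = model(x)
scores[0, scores[0].argmax()].backward()        # back-propagate the top class score

# Grad-CAM channel weights: global average pooling of the gradients.
weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["feat"]).sum(dim=1, keepdim=True))

# Upsampling from the 7x7 feature-map grid to 224x224: this is the step that
# assumes spatial correspondence between the last feature map and the input.
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalise to [0, 1]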
