2025
Authors
Oliveira Coelho, BF; Cardoso, JS;
Publication
Neurocomputing
Abstract
In order to facilitate the adoption of deep learning in areas where decisions are of critical importance, understanding the model's internal workings is paramount. Nevertheless, since most models are considered black boxes, this task is usually not trivial, especially when the user does not have access to the network's intermediate outputs. In this paper, we propose IBISA, a model-agnostic attribution method that reaches state-of-the-art performance by optimizing sampling masks using the Information Bottleneck Principle. Our method improves on the previously known RISE and IBA techniques by placing the bottleneck right after the image input, without complex formulations to estimate the mutual information. The method also requires only twenty forward passes and ten backward passes through the network, significantly faster than RISE, which needs at least 4000 forward passes. We evaluated IBISA using a VGG-16 and a ResNet-50 model, showing that our method produces explanations comparable or superior to those of IBA, RISE, and Grad-CAM, but much more efficiently.
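To make the efficiency comparison concrete, here is a minimal sketch of the RISE-style masked-sampling attribution the abstract contrasts against: the class score under many random binary masks weights each mask into a saliency map, which is why RISE needs thousands of forward passes. The `toy_model` and all parameter values are illustrative stand-ins, not the paper's implementation.

```python
import numpy as np

def rise_saliency(model, image, n_masks=500, grid=7, p_keep=0.5, seed=0):
    """RISE-style attribution: average random binary masks weighted by the
    model's score on the correspondingly masked image."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    saliency = np.zeros((h, w))
    for _ in range(n_masks):
        # Sample a coarse binary grid and upsample it to image resolution.
        coarse = (rng.random((grid, grid)) < p_keep).astype(float)
        cell = (h // grid + 1, w // grid + 1)
        mask = np.kron(coarse, np.ones(cell))[:h, :w]
        score = model(image * mask[..., None])  # scalar class score
        saliency += score * mask
    return saliency / (n_masks * p_keep)

# Toy classifier that responds only to the top-left quadrant of the image,
# so the saliency map should concentrate there.
def toy_model(img):
    return float(img[:16, :16].mean())

img = np.ones((32, 32, 3))
sal = rise_saliency(toy_model, img)
```

Each mask costs one forward pass, which is the cost the abstract's twenty-forward-pass budget is measured against.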
2025
Authors
Nunes, JD; Montezuma, D; Oliveira, D; Pereira, T; Zlobec, I; Pinto, IM; Cardoso, JS;
Publication
SENSORS
Abstract
Due to the high variability in Hematoxylin and Eosin (H&E)-stained Whole Slide Images (WSIs), hidden stratification, and batch effects, generalizing beyond the training distribution is one of the main challenges in Deep Learning (DL) for Computational Pathology (CPath). Although DL depends on large volumes of diverse, annotated data, it is common to have a significant number of annotated samples from one or more source distributions, together with a partially annotated or unlabeled dataset representing a target distribution to which we want to generalize: the setting known as Domain Adaptation (DA). In this work, we focus on the task of generalizing from a single source distribution to a target domain. As it is still not clear which domain adaptation strategy is best suited for CPath, we evaluate three different DA strategies, namely FixMatch, CycleGAN, and a self-supervised feature extractor, and show that DA remains a challenge in CPath.
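The core rule behind FixMatch, one of the DA strategies the paper evaluates, can be sketched in a few lines: predictions on weakly augmented unlabeled samples become hard pseudo-labels only when the model is sufficiently confident, and the model is then trained to match them on strongly augmented versions. The function and variable names below are illustrative, and `probs_weak` stands in for real model softmax outputs.

```python
import numpy as np

def fixmatch_targets(probs_weak, threshold=0.95):
    """Return (mask, pseudo_labels): which unlabeled samples to keep for the
    consistency loss, and their hard pseudo-labels, per the
    confidence-threshold rule of FixMatch."""
    confidence = probs_weak.max(axis=1)
    pseudo_labels = probs_weak.argmax(axis=1)
    mask = confidence >= threshold
    return mask, pseudo_labels

# Toy softmax outputs for three unlabeled tiles.
probs = np.array([[0.98, 0.01, 0.01],   # confident -> kept
                  [0.50, 0.30, 0.20],   # uncertain -> discarded
                  [0.02, 0.96, 0.02]])  # confident -> kept
mask, labels = fixmatch_targets(probs)
```

In the full method, the cross-entropy on strongly augmented views is computed only over the samples selected by `mask`.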
2025
Authors
Larbi, A; Abed, M; Cardoso, JS; Ouahabi, A;
Publication
BIOMEDICAL SIGNAL PROCESSING AND CONTROL
Abstract
Neonatal seizures represent a critical medical issue that requires prompt diagnosis and treatment. Typically, at-risk newborns undergo a Magnetic Resonance Imaging (MRI) brain assessment followed by continuous seizure monitoring using multichannel EEG. Visual analysis of multichannel electroencephalogram (EEG) recordings remains the standard modality for seizure detection; however, it is limited by fatigue and delayed seizure identification. Advances in machine and deep learning have led to the development of powerful neonatal seizure detection algorithms that may help address these limitations. Nevertheless, their performance remains relatively low, and they often disregard the non-stationary nature of EEG signals, especially when learned from weakly labeled EEG data. In this context, the present paper proposes a novel deep-learning approach for neonatal seizure detection. The method employs rigorous preprocessing to reduce noise and artifacts, along with a recently developed time-frequency distribution (TFD) derived from a separable compact support kernel to capture the fast spectral changes associated with neonatal seizures. The high-resolution TFD diagrams are then converted into RGB images and used as inputs to a pre-trained ResNet-18 model. This is followed by the training of an attention-based multiple-instance learning (MIL) mechanism. The purpose is to perform a spatial time-frequency analysis that can highlight which channels exhibit seizure activity, thereby reducing the time required for secondary evaluation by a doctor. Additionally, per-instance learning (PIL) is performed to further validate the robustness of our TFD and methodology. Tested on the Helsinki public dataset, the PIL model achieved an area under the curve (AUC) of 96.8%, while the MIL model attained an average AUC of 94.1%, surpassing similar attention-based methods.
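Attention-based MIL pooling, the mechanism the paper trains on top of per-channel embeddings, can be sketched as follows: each instance in a bag (here, one embedding per EEG channel) receives a learned attention weight, and the bag representation is the attention-weighted sum. This is a generic sketch in the style of gated-attention MIL, not the paper's code; `V` and `w` are random stand-ins for learned parameters.

```python
import numpy as np

def attention_mil(instances, V, w):
    """instances: (n_instances, d) bag of embeddings, e.g. one per EEG channel.
    Returns (bag_embedding, attention_weights)."""
    scores = np.tanh(instances @ V) @ w            # (n_instances,) raw scores
    scores = scores - scores.max()                 # stabilize the softmax
    attn = np.exp(scores) / np.exp(scores).sum()   # attention over instances
    return attn @ instances, attn

rng = np.random.default_rng(0)
bag = rng.normal(size=(18, 64))   # e.g. 18 EEG channels, 64-dim features each
V = rng.normal(size=(64, 32))
w = rng.normal(size=32)
z, attn = attention_mil(bag, V, w)
```

The attention weights `attn` are what make the model interpretable at the channel level: channels with high weight are the ones flagged as exhibiting seizure activity.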
2025
Authors
Nogueira, AFR; Oliveira, HP; Teixeira, LF;
Publication
IMAGE AND VISION COMPUTING
Abstract
3D human pose estimation aims to reconstruct the human skeleton of all the individuals in a scene by detecting several body joints. Accurate and efficient methods are required for several real-world applications, including animation, human-robot interaction, surveillance systems, and sports, among many others. However, obstacles such as occlusions, arbitrary camera viewpoints, and the scarcity of 3D labelled data have been hampering the models' performance and limiting their deployment in real-world scenarios. The growing availability of cameras has led researchers to explore multi-view solutions, which can exploit different perspectives to reconstruct the pose. Most existing reviews focus mainly on monocular 3D human pose estimation, and a comprehensive survey devoted to multi-view approaches has been missing since 2012. The goal of this survey is thus to fill that gap: to present an overview of the methodologies for 3D pose estimation in multi-view settings, to understand the strategies found to address the various challenges, and to identify their limitations. From the reviewed articles, we found that most methods are fully supervised approaches based on geometric constraints. Nonetheless, most methods suffer from 2D pose mismatches; incorporating temporal consistency and depth information has been suggested to reduce the impact of this limitation, while working directly with 3D features can avoid the problem entirely, at the expense of higher computational complexity. Methods with lower supervision levels were identified as a way to overcome some of the issues related to 3D pose, particularly the scarcity of labelled datasets. Therefore, no method is yet capable of solving all the challenges associated with the reconstruction of the 3D pose.
Due to the existing trade-off between complexity and performance, the best method depends on the application scenario. Further research is therefore still required to develop an approach capable of quickly inferring a highly accurate 3D pose at an acceptable computational cost. To this end, techniques such as active learning, methods that learn with a low level of supervision, the incorporation of temporal consistency, view selection, estimation of depth information, and multi-modal approaches might be interesting strategies to keep in mind when developing new methodologies for this task.
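The geometric constraint underlying most of the surveyed multi-view methods is triangulation: a 3D joint is recovered from its 2D detections in two or more calibrated views. A minimal linear (DLT) sketch follows; the two cameras are toy examples, not taken from any dataset in the survey.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """P1, P2: 3x4 camera projection matrices; x1, x2: 2D points (u, v).
    Builds the standard DLT system A X = 0 and solves it via SVD."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    X = np.linalg.svd(A)[2][-1]   # right singular vector of smallest value
    return X[:3] / X[3]           # dehomogenize

# Two toy cameras: an identity view and one translated along the x axis.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

Xw = np.array([0.3, -0.2, 2.0])   # ground-truth 3D joint position

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

X_hat = triangulate(P1, P2, project(P1, Xw), project(P2, Xw))
```

The 2D pose mismatches discussed in the survey correspond to noise in `x1` and `x2`: inconsistent detections across views make the system `A X = 0` inconsistent, which is where temporal and depth cues help.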
2025
Authors
Patrício, C; Teixeira, LF; Neves, JC;
Publication
COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL
Abstract
The main challenges hindering the adoption of deep learning-based systems in clinical settings are the scarcity of annotated data and the lack of interpretability and trust in these systems. Concept Bottleneck Models (CBMs) offer inherent interpretability by constraining the final disease prediction on a set of human-understandable concepts. However, this inherent interpretability comes at the cost of greater annotation burden. Additionally, adding new concepts requires retraining the entire system. In this work, we introduce a novel two-step methodology that addresses both of these challenges. By simulating the two stages of a CBM, we utilize a pretrained Vision Language Model (VLM) to automatically predict clinical concepts, and an off-the-shelf Large Language Model (LLM) to generate disease diagnoses grounded on the predicted concepts. Furthermore, our approach supports test-time human intervention, enabling corrections to predicted concepts, which improves final diagnoses and enhances transparency in decision-making. We validate our approach on three skin lesion datasets, demonstrating that it outperforms traditional CBMs and state-of-the-art explainable methods, all without requiring any training and utilizing only a few annotated examples. The code is available at https://github.com/CristianoPatricio/2step-concept-based-skin-diagnosis.
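The two-step structure of the method, and the test-time intervention it enables, can be illustrated with stubs: a concept predictor standing in for the VLM, and a simple rule standing in for the LLM that grounds the diagnosis on the predicted concepts. All concept names, thresholds, and identifiers here are hypothetical, chosen only to show the control flow; the real system queries pretrained models.

```python
def predict_concepts(image_id):
    """Stub for the VLM concept predictor: concept name -> present?
    A real implementation would score concept prompts against the image."""
    return {"asymmetry": True,
            "blue-whitish veil": False,
            "atypical pigment network": True}

def diagnose(concepts):
    """Stub for the LLM diagnosis step: a simple rule grounded only on the
    predicted concepts, so changing a concept changes the diagnosis."""
    score = sum(concepts.values())
    return "melanoma" if score >= 2 else "nevus"

concepts = predict_concepts("ISIC_0001")
d1 = diagnose(concepts)                        # diagnosis from predicted concepts
concepts["atypical pigment network"] = False   # test-time human intervention
d2 = diagnose(concepts)                        # diagnosis updates accordingly
```

Because the diagnosis depends only on the concept dictionary, correcting a concept at test time propagates to the final prediction without any retraining, which is the interaction the abstract describes.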
2025
Authors
Silva, F; Oliveira, HP; Pereira, T;
Publication
ACM COMPUTING SURVEYS
Abstract
The large gap between the generalization level of state-of-the-art machine learning and human learning systems calls for the development of artificial intelligence (AI) models that are truly inspired by human cognition. In tasks related to image analysis, searching for pixel-level regularities has reached a level of information extraction that is still far from what humans capture from image-based observations. This leads to poor generalization when even small shifts occur at the level of the observations. We explore a perspective on this problem directed at learning the generative process with causality-related foundations, using models capable of combining symbolic manipulation, probabilistic reasoning, and pattern recognition abilities. We briefly review and connect research from machine learning, cognitive science, and related fields of human behavior to support our perspective on the direction toward more robust and human-like artificial learning systems.