2025
Authors
Nogueira, AFR; Oliveira, HP; Teixeira, LF;
Publication
IMAGE AND VISION COMPUTING
Abstract
3D human pose estimation aims to reconstruct the skeleton of every individual in a scene by detecting a set of body joints. Accurate and efficient methods are required for many real-world applications, including animation, human-robot interaction, surveillance systems and sports. However, obstacles such as occlusions, arbitrary camera viewpoints and the scarcity of 3D-labelled data have hampered model performance and limited deployment in real-world scenarios. The growing availability of cameras has led researchers to explore multi-view solutions, which can exploit different perspectives to reconstruct the pose. Most existing reviews focus mainly on monocular 3D human pose estimation, and a comprehensive survey dedicated to multi-view approaches has been missing since 2012. The goal of this survey is to fill that gap: to present an overview of 3D pose estimation methodologies in multi-view settings, to understand the strategies adopted to address the various challenges, and to identify their limitations. The reviewed articles show that most methods are fully supervised approaches based on geometric constraints. Nonetheless, most of them suffer from 2D pose mismatches; incorporating temporal consistency and depth information has been suggested to reduce the impact of this limitation, while working directly with 3D features can avoid the problem entirely, albeit at the cost of higher computational complexity. Models with lower levels of supervision were found to overcome some of these issues, particularly the scarcity of labelled datasets. No method is therefore yet capable of solving all the challenges associated with reconstructing the 3D pose, and given the trade-off between complexity and performance, the best method depends on the application scenario. Further research is still required to develop an approach that quickly infers a highly accurate 3D pose at a bearable computational cost. To this end, techniques such as active learning, learning with low levels of supervision, temporal consistency, view selection, depth estimation and multi-modal approaches are promising strategies to keep in mind when developing new methodologies for this task.
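The geometry-based pipeline most of the surveyed methods share reduces, at its core, to triangulating matched 2D joint detections from calibrated views. The snippet below is a minimal illustrative sketch, not taken from any surveyed paper: it assumes known projection matrices and pre-matched 2D detections (all names are hypothetical) and solves the standard direct linear transform (DLT) least-squares system for a single joint.

```python
import numpy as np

def triangulate_joint(proj_mats, points_2d):
    """Triangulate one 3D joint from >= 2 calibrated views via DLT.

    proj_mats : list of (3, 4) camera projection matrices P = K [R | t]
    points_2d : list of (x, y) pixel detections of the same joint
    Returns the 3D joint position in world coordinates.
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous
        # 3D point X: x * P[2] - P[0] = 0 and y * P[2] - P[1] = 0.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The least-squares homogeneous solution is the right singular
    # vector associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenise
```

The 2D pose mismatches discussed in the abstract enter exactly here: a wrong detection in one view corrupts two rows of the system, which is why temporal or depth cues are used to filter detections before triangulation.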
2024
Authors
Patrício, C; Barbano, CA; Fiandrotti, A; Renzulli, R; Grangetto, M; Teixeira, LF; Neves, JC;
Publication
CoRR
Abstract
2022
Authors
Gonçalves, T; Torto, IR; Teixeira, LF; Cardoso, JS;
Publication
CoRR
Abstract
2024
Authors
Rio-Torto, I; Gonçalves, T; Cardoso, JS; Teixeira, LF;
Publication
IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI 2024
Abstract
In fields that rely on high-stakes decisions, such as medicine, interpretability plays a key role in promoting trust and facilitating the adoption of deep learning models by clinical communities. In medical image analysis, gradient-based class activation maps are the most widely used explanation methods, and the field lacks a more in-depth investigation of inherently interpretable models that integrate knowledge ensuring the model learns the correct rules. B-cos networks, a recent approach that increases the interpretability of deep neural networks by inducing weight-input alignment during training, have shown promising results on natural image classification. In this work, we study the suitability of B-cos networks for the medical domain by testing them on different use cases (skin lesions, diabetic retinopathy, cervical cytology and chest X-rays) and conducting a thorough evaluation with several explanation quality assessment metrics. We find that, just as in natural image classification, B-cos explanations yield more localised maps, but it is not clear that they outperform other methods' explanations when a broader set of explanation properties is considered.
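As a reading aid for the weight-input alignment idea: a B-cos unit scales the ordinary linear response by a power of the cosine between input and weight, so only inputs well aligned with the weight pass through at full strength. The following is a minimal sketch of a single unit, assuming the published B-cos transform |cos(x, w)|^(B-1) · (ŵᵀx); it is not the authors' implementation, and the function name is hypothetical.

```python
import torch

def bcos_unit(x, w, b=2.0, eps=1e-6):
    """Single B-cos unit: the linear response is down-weighted by
    misalignment between input and weight via |cos|^(B-1).

    x : (batch, d) inputs; w : (d,) weight vector; b : alignment exponent B.
    For b = 1 this reduces to a linear unit with a unit-norm weight.
    """
    w_hat = w / (w.norm() + eps)            # unit-norm weight
    lin = x @ w_hat                         # (batch,) linear response
    cos = lin / (x.norm(dim=1) + eps)       # cosine between x and w
    return lin * cos.abs().pow(b - 1.0)     # B-cos output
```

Because the output is large only when input and weight point in the same direction, the learned weights themselves become human-inspectable explanations, which is what makes the architecture inherently interpretable.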
2025
Authors
Oliveira, M; Cerqueira, R; Pinto, JR; Fonseca, J; Teixeira, LF;
Publication
IEEE Trans. Intell. Veh.
Abstract
Autonomous vehicles aim to understand their surrounding environment by detecting relevant objects in the scene, which can be performed using a combination of sensors. Accurately detecting pedestrians is particularly challenging, since existing algorithms struggle with small objects. This work studies and addresses this often overlooked problem by proposing Multimodal PointPillars (M-PP), a novel, fast and effective fusion architecture for 3D object detection. Inspired by both MVX-Net and PointPillars, image features from a 2D CNN feature map are fused with the 3D point cloud in an early-fusion architecture. By replacing the heavy 3D convolutions of MVX-Net with a set of convolutional layers in 2D space, and by combining LiDAR and image information at an early stage, M-PP considerably improves inference time over the baseline, running at 28.49 Hz. It achieves inference speeds suitable for real-world applications while keeping the high performance of multimodal approaches. Extensive experiments show that the proposed architecture outperforms both MVX-Net and PointPillars for the pedestrian class on the KITTI 3D object detection dataset, with 62.78% in
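The early-fusion step described above, in the spirit of MVX-Net-style point-wise fusion, amounts to projecting each LiDAR point into the 2D CNN feature map and concatenating the sampled image feature with the point's own attributes before pillarisation. The sketch below is a simplified illustration under assumed calibration conventions; the function and variable names are hypothetical, not from the paper's code.

```python
import numpy as np

def fuse_points_with_image(points, feat_map, P):
    """Append 2D image features to LiDAR points (early fusion).

    points   : (N, 4) LiDAR points as (x, y, z, reflectance)
    feat_map : (C, H, W) feature map from a 2D CNN backbone
    P        : (3, 4) LiDAR-to-image projection matrix
    Returns (M, 4 + C) fused points that project inside the image.
    """
    C, H, W = feat_map.shape
    xyz1 = np.hstack([points[:, :3], np.ones((len(points), 1))])
    uvw = xyz1 @ P.T                        # project into the image plane
    valid = uvw[:, 2] > 0                   # keep points in front of camera
    uv = uvw[valid, :2] / uvw[valid, 2:3]   # perspective divide
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    img_feats = feat_map[:, v, u].T         # (M, C) nearest-neighbour sample
    return np.hstack([points[valid], img_feats])
```

Fusing per point before the pillar encoder is what lets the detector keep cheap 2D convolutions downstream, which is the source of the reported speed-up.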
2023
Authors
Silva, D; Agrotis, G; Tan, RB; Teixeira, LF; Silva, W;
Publication
International Conference on Machine Learning and Applications, ICMLA 2023, Jacksonville, FL, USA, December 15-17, 2023
Abstract
Deep learning models are tremendously valuable in several prediction tasks, and their use in the medical field is spreading rapidly, especially in computer vision tasks that evaluate the content of X-rays, CTs or MRIs. These methods can save doctors a significant amount of time in patient diagnostics and help in treatment planning. However, such models are highly sensitive to confounders in the training data and generally suffer a performance hit on out-of-distribution data, affecting their reliability and scalability across medical institutions. Deep learning research on medical datasets may overlook essential details of the image acquisition procedure and the preprocessing steps. This work proposes a data-centric approach, exploring the potential of attention maps as a regularisation technique to improve robustness and generalisation. We use image metadata and explore self-attention maps and contrastive learning to promote feature-space invariance to image disturbances. Experiments were conducted on publicly available chest X-ray datasets, some of which contain information about the windowing settings applied by the radiologist, acting as a source of variability. The proposed model outperformed the baseline on out-of-distribution data, serving as a proof of concept.
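The invariance objective the abstract refers to can be illustrated with a SimCLR-style NT-Xent contrastive loss between embeddings of two disturbed views of the same image, e.g. two different windowing settings of the same X-ray. This is a generic sketch of that family of losses, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.1):
    """NT-Xent contrastive loss: embeddings of two disturbed views of the
    same image (z1[i], z2[i]) are pulled together, all others pushed apart.

    z1, z2 : (batch, d) projections of two augmentations (e.g. different
    radiologist windowing settings applied to the same chest X-ray).
    """
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2B, d) on unit sphere
    sim = z @ z.T / tau                           # scaled cosine similarities
    n = z1.size(0)
    sim.fill_diagonal_(float('-inf'))             # exclude self-similarity
    # The positive for row i is its counterpart in the other augmented batch.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)
```

Minimising this loss makes the feature space insensitive to the acquisition-dependent disturbance while still separating different patients' images, which is the stated route to better out-of-distribution behaviour.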