Details
Name
Leonardo Gomes Capozzi
Role
Research Assistant
Since
08 October 2020
Nationality
Portugal
Centre
Telecommunications and Multimedia
Contacts
+351 222 094 000
leonardo.g.capozzi@inesctec.pt
2025
Authors
Capozzi, L; Ferreira, L; Gonçalves, T; Rebelo, A; Cardoso, JS; Sequeira, AF;
Publication
Pattern Recognition and Image Analysis - 12th Iberian Conference, IbPRIA 2025, Coimbra, Portugal, June 30 - July 3, 2025, Proceedings, Part II
Abstract
The rapid advancement of wireless technologies, particularly Wi-Fi, has spurred significant research into indoor human activity detection across various domains (e.g., healthcare, security, and industry). This work explores the non-invasive and cost-effective Wi-Fi paradigm and the application of deep learning to human activity recognition using Wi-Fi signals. Motivated by the increase in data availability and computational power, and focusing on the challenges of machine interpretability, this paper uses explainable artificial intelligence to understand the inner workings of transformer-based deep neural networks designed to estimate human pose (i.e., human skeleton key points) from Wi-Fi channel state information. Using different strategies to assess the sub-carriers most relevant to the model predictions (i.e., rollout attention and masking attention), we evaluate the performance of the model when it uses a given number of sub-carriers as input, selected randomly or ranked by attention (highest-attention or lowest-attention sub-carriers first). We conclude that models trained with fewer (but relevant) sub-carriers are competitive with the baseline (trained with all sub-carriers) while being better in terms of computational efficiency (i.e., processing more data per second).
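Rollout attention, one of the two relevance strategies named in the abstract, is a published technique for aggregating attention maps across transformer layers. The following is a minimal sketch only, not code from the paper; the PyTorch framing, tensor shapes, and the k=64 cut-off are illustrative assumptions.

import torch

def attention_rollout(attentions):
    # attentions: one tensor per transformer layer, each [heads, seq, seq],
    # where seq indexes the Wi-Fi sub-carriers (illustrative assumption)
    rollout = torch.eye(attentions[0].size(-1))
    for attn in attentions:
        attn = attn.mean(dim=0)                       # average over attention heads
        attn = attn + torch.eye(attn.size(-1))        # add identity for the residual connection
        attn = attn / attn.sum(dim=-1, keepdim=True)  # re-normalise each row
        rollout = attn @ rollout                      # compose attention across layers
    return rollout

# Rank sub-carriers by the attention mass they receive and keep the top k:
# scores = attention_rollout(attn_list).mean(dim=0)
# selected = torch.topk(scores, k=64).indices

Retraining on only the selected sub-carriers is what makes the smaller models cheaper at inference time, since the input dimensionality shrinks.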
2023
Authors
Castro, E; Ferreira, PM; Rebelo, A; Rio-Torto, I; Capozzi, L; Ferreira, MF; Goncalves, T; Albuquerque, T; Silva, W; Afonso, C; Sousa, RG; Cimarelli, C; Daoudi, N; Moreira, G; Yang, HY; Hrga, I; Ahmad, J; Keswani, M; Beco, S;
Publication
MACHINE VISION AND APPLICATIONS
Abstract
Every year, the VISion Understanding and Machine intelligence (VISUM) summer school runs a competition where participants can learn and share knowledge about Computer Vision and Machine Learning in a vibrant environment. The 2021 edition of VISUM focused on applying those methodologies to fashion. Recently, there has been increasing interest within the scientific community in applying computer vision methodologies to the fashion domain, largely motivated by fashion being one of the world's largest industries and by its rapid development in e-commerce, particularly since the COVID-19 pandemic. Computer Vision for Fashion enables a wide range of innovations, from personalized recommendations to outfit matching. The competition enabled students to apply the knowledge acquired in the summer school to a real-world problem. The ambition was to foster research and development in fashion outfit complementary product retrieval by leveraging vast visual and textual data with domain knowledge. To this end, a new fashion outfit dataset (acquired and curated by FARFETCH) is introduced for research and benchmarking purposes. Additionally, a competitive baseline with an original negative sampling process for triplet mining was implemented and served as a starting point for participants. The top three performing methods are described in this paper, since they constitute the reference state of the art for this particular problem. To our knowledge, this is the first challenge in fashion outfit complementary product retrieval. Moreover, this joint project between academia and industry brings several relevant contributions to disseminating science and technology, promoting economic and social development, and helping to connect early-career researchers to real-world industry challenges.
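The abstract mentions triplet mining with an original negative sampling process; the details of that process are not given here, so the following is only a generic sketch (in PyTorch, all names illustrative) of the triplet loss such a baseline would typically optimise.

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # L2-normalise the embeddings so squared distances are comparable
    anchor, positive, negative = (F.normalize(x, dim=-1) for x in (anchor, positive, negative))
    d_pos = (anchor - positive).pow(2).sum(dim=-1)  # anchor vs. compatible item
    d_neg = (anchor - negative).pow(2).sum(dim=-1)  # anchor vs. sampled negative
    return F.relu(d_pos - d_neg + margin).mean()    # hinge: push negatives beyond the margin

For complementary product retrieval, a plausible (assumed) sampling choice is to draw negatives from the same product category as the positive but from a different outfit, so the model must learn compatibility rather than mere category membership.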
2022
Authors
Pinto, JR; Carvalho, P; Pinto, C; Sousa, A; Capozzi, L; Cardoso, JS;
Publication
PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5
Abstract
With the advent of self-driving cars, and big companies such as Waymo or Bosch pushing forward into fully driverless transportation services, the in-vehicle behaviour of passengers must be monitored to ensure safety and comfort. The use of audio-visual information is attractive for its spatio-temporal richness as well as its non-invasive nature, but faces the likely constraints posed by available hardware and energy consumption. Hence, new strategies are required to improve the usage of these scarce resources. We propose the processing of audio and visual data in a cascade pipeline for in-vehicle action recognition. The data is processed by modality-specific sub-modules, with subsequent ones being used only when a confident classification is not reached. Experiments show an interesting accuracy-acceleration trade-off when compared with a parallel pipeline with late fusion, presenting potential for industrial applications on embedded devices.
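As a minimal sketch of the cascade idea described above (not the paper's code; the modality order, the single-sample framing, and the 0.9 confidence threshold are assumptions), the cheaper sub-module runs first and the heavier one is invoked only on low-confidence samples.

import torch

def cascade_predict(sample, audio_model, visual_model, threshold=0.9):
    # Stage 1: the cheaper modality-specific classifier (batch of one assumed)
    probs = torch.softmax(audio_model(sample["audio"]), dim=-1)
    conf, label = probs.max(dim=-1)
    if conf.item() >= threshold:
        return label.item()  # confident enough: skip the second stage entirely
    # Stage 2: the heavier classifier, used only when stage 1 is uncertain
    probs = torch.softmax(visual_model(sample["video"]), dim=-1)
    return probs.argmax(dim=-1).item()

The accuracy-acceleration trade-off then hinges on the threshold: higher values route more samples to the second stage, recovering accuracy at the cost of compute.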
2022
Authors
Capozzi, L; Barbosa, V; Pinto, C; Pinto, JR; Pereira, A; Carvalho, PM; Cardoso, JS;
Publication
IEEE ACCESS
Abstract
With the advent of self-driving cars and the push by large companies into fully driverless transportation services, monitoring passenger behaviour in vehicles is becoming increasingly important for several reasons, such as ensuring safety and comfort. Although several human action recognition (HAR) methods have been proposed, developing a true HAR system remains a very challenging task. If the dataset used to train a model contains a small number of actors, the model can become biased towards these actors and their unique characteristics. This can cause the model to generalise poorly when confronted with new actors performing the same actions. This limitation is particularly acute when developing models to characterise the activities of vehicle occupants, for which datasets are small and scarce. In this study, we describe and evaluate three different methods that aim to address this actor bias and assess their performance in detecting in-vehicle violence. These methods work by removing actor-specific information from the model's features during training or by using data that is independent of the actor, such as information about body posture. The experimental results show improvements over the baseline model when evaluated with real data. On the Hanau03 Vito dataset, the accuracy improved from 65.33% to 69.41%. On the Sunnyvale dataset, the accuracy improved from 82.81% to 86.62%.
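The abstract states that actor information is removed from the model's features during training, without specifying the mechanism here. One standard way to achieve this (an illustration only, not necessarily the paper's method) is adversarial training with a gradient reversal layer, so that an auxiliary actor classifier drives the encoder to discard actor-specific cues.

import torch
from torch.autograd import Function

class GradReverse(Function):
    # Identity in the forward pass; sign-flipped, scaled gradient in the backward pass
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

# Assumed training wiring (encoder, action_head and actor_head are illustrative names):
# features = encoder(clip)
# action_logits = action_head(features)                        # main recognition task
# actor_logits = actor_head(GradReverse.apply(features, 1.0))  # adversarial actor branch

Minimising the actor classifier's loss through the reversed gradient penalises features that identify the actor, which is precisely the bias the paper targets.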
2021
Authors
Capozzi, L; Pinto, JR; Cardoso, JS; Rebelo, A;
Publication
Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications - 25th Iberoamerican Congress, CIARP 2021, Porto, Portugal, May 10-13, 2021, Revised Selected Papers
Abstract
The task of person re-identification has important applications in security and surveillance systems. It is a challenging problem, since there can be significant differences between images of the same person, such as lighting, camera position, pose variation, and occlusions. The use of Deep Learning has contributed greatly towards more effective and accurate systems. Many works use attention mechanisms to force the models to focus on less distinctive areas, in order to improve performance in situations where important information may be missing. This paper proposes a new, more flexible method for calculating these masks, using a U-Net that receives an image and outputs a mask representing its most distinctive areas. Results show that the method achieves an accuracy comparable or superior to that of state-of-the-art methods.
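As a minimal sketch of how such a U-Net-predicted mask could re-weight features before pooling (illustrative only; the function names, shapes, and the sigmoid/interpolation choices are assumptions, not the paper's implementation):

import torch
import torch.nn.functional as F

def masked_embedding(image, unet, backbone):
    mask = torch.sigmoid(unet(image))                  # [B, 1, H', W'], distinctiveness in [0, 1]
    feats = backbone(image)                            # [B, C, H, W] feature map
    mask = F.interpolate(mask, size=feats.shape[-2:])  # match the feature map's spatial size
    weighted = feats * mask                            # emphasise distinctive regions
    return weighted.flatten(2).mean(dim=-1)            # global average pool -> [B, C] embedding

Because the mask is produced by a trainable network rather than a fixed heuristic, it can adapt to occlusions and pose variation, which is the flexibility the abstract emphasises.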