About

I was born in the district of Porto. I received a degree in Electrical and Computer Engineering in 2001, a Master's degree in Networks and Communication Services in 2004, and a PhD in Electrical and Computer Engineering in 2012, all from the Faculty of Engineering of the University of Porto. I have been a collaborator of INESC TEC since 2001 and am currently a Senior Researcher at the Center of Telecommunications and Multimedia. I am also an Invited Adjunct Professor at the School of Engineering of the Polytechnic Institute of Porto. My current research interests include image and video processing, multimedia systems, and computer vision.


Details

  • Name

    Pedro Miguel Carvalho
  • Role

    Senior Researcher
  • Since

    1st September 2001
Publications

2025

Enhancing Weakly-Supervised Video Anomaly Detection With Temporal Constraints

Authors
Caetano, F; Carvalho, P; Mastralexi, C; Cardoso, JS;

Publication
IEEE ACCESS

Abstract
Anomaly Detection has been a significant field in Machine Learning since it began gaining traction. In the context of Computer Vision, the increased interest is notable, as it enables the development of video processing models for different tasks without the cumbersome effort of annotating possible events, which may be underrepresented. Of the predominant strategies, weakly and semi-supervised, the former has demonstrated potential to achieve a higher score in its analysis, adding to its flexibility. This work shows that using temporal ranking constraints for Multiple Instance Learning can increase the performance of these models, allowing the focus on the most informative instances. Moreover, the results suggest that altering the ranking process to include information about adjacent instances generates better-performing models.

2025

Correction: Guimarães et al. A Review of Recent Advances and Challenges in Grocery Label Detection and Recognition. Appl. Sci. 2023, 13, 2871

Authors
Guimarães, V; Nascimento, J; Viana, P; Carvalho, P;

Publication
Applied Sciences

Abstract
There was an error in the original publication [...]

2025

Exploring Motion Information in Homography Calculation for Football Matches With Moving Cameras

Authors
Gomes, C; Mastralexi, C; Carvalho, P;

Publication
IEEE ACCESS

Abstract
In football, where minor differences can significantly affect outcomes and performance, automatic video analysis has become a critical tool for analyzing and optimizing team strategies. However, many existing solutions require expensive and complex hardware comprising multiple cameras, sensors, or GPS devices, limiting accessibility for many clubs, particularly those with limited resources. Using images and video from a moving camera can help a wider audience benefit from video analysis, but it introduces new challenges related to motion. To address this, we explore an alternative homography estimation in moving camera scenarios. Homography plays a crucial role in video analysis, but presents challenges when keypoints are sparse, especially in dynamic environments. Existing techniques predominantly rely on visible keypoints and apply homography transformations on a frame-by-frame basis, often lacking temporal consistency and facing challenges in areas with sparse keypoints. This paper explores the use of estimated motion information for homography computation. Our experimental results reveal that integrating motion data directly into homography estimations leads to reduced errors in keypoint-sparse frames, surpassing state-of-the-art methods, filling a current gap in moving camera scenarios.

2024

Improving Efficiency in Facial Recognition Tasks Through a Dataset Optimization Approach

Authors
Vilça, L; Viana, P; Carvalho, P; Andrade, MT;

Publication
IEEE ACCESS

Abstract
It is well known that the performance of Machine Learning techniques, notably when applied to Computer Vision (CV), depends heavily on the amount and quality of the training data set. However, large data sets lead to time-consuming training loops and, in many situations, are difficult or even impossible to create. Therefore, there is a need for solutions to reduce their size while ensuring good levels of performance, i.e., solutions that obtain the best tradeoff between the amount/quality of training data and the model's performance. This paper proposes a dataset reduction approach for training data used in Deep Learning methods in Facial Recognition (FR) problems. We focus on maximizing the variability of representations for each subject (person) in the training data, thus favoring quality instead of size. The main research questions are: 1) Which facial features better discriminate different identities? 2) Will it be possible to significantly reduce the training time without compromising performance? 3) Should we favor quality over quantity for very large datasets in FR? This analysis uses a pipeline to discriminate a set of features suitable for capturing the diversity and a cluster-based sampling to select the best images for each training subject, i.e., person. Results were obtained using VGGFace2 and Labeled Faces in the Wild (for benchmarking) and show that, with the proposed approach, a data reduction is possible while ensuring similar levels of accuracy.

2024

A Transition Towards Virtual Representations of Visual Scenes

Authors
Pereira, A; Carvalho, P; Côrte Real, L;

Publication
Advances in Internet of Things & Embedded Systems

Abstract
We propose a unified architecture for visual scene understanding, aimed at overcoming the limitations of traditional, fragmented approaches in computer vision. Our work focuses on creating a system that accurately and coherently interprets visual scenes, with the ultimate goal to provide a 3D virtual representation, which is particularly useful for applications in virtual and augmented reality. By integrating various visual and semantic processing tasks into a single, adaptable framework, our architecture simplifies the design process, ensuring a seamless and consistent scene interpretation. This is particularly important in complex systems that rely on 3D synthesis, as the need for precise and semantically coherent scene descriptions keeps on growing. Our unified approach addresses these challenges, offering a flexible and efficient solution. We demonstrate the practical effectiveness of our architecture through a proof-of-concept system and explore its potential in various application domains, proving its value in advancing the field of computer vision.

Supervised thesis

2023

Image Processing of Grocery Labels for Assisted Analysis

Author
Jéssica Mireie Fernandes do Nascimento

Institution

2023

Synthesing Human Activity for Data Generation

Author
Ana Ysabella Rodrigues Romero

Institution

2022

Image Processing for Football Game Analysis

Author
Francisco Gonçalves Sousa

Institution

2022

Visual Data Processing for Anomaly Detection

Author
Francisco Tiago de Espírito Santo e Caetano

Institution

2022

Identification and extraction of floor planes for 3D representation

Author
Carlos Miguel Guerra Soeiro

Institution