About

I was born in the district of Porto. I received a degree in Electrical and Computer Engineering in 2001, a Master's degree in Networks and Communication Services in 2004, and a PhD in Electrical and Computer Engineering in 2012, all from the Faculty of Engineering of the University of Porto. I've been a collaborator of INESC TEC since 2001 and I'm currently a Senior Researcher at the Center of Telecommunications and Multimedia. I'm also an Invited Adjunct Professor at the School of Engineering of the Polytechnic Institute of Porto. My current research interests include image and video processing, multimedia systems and computer vision.

Details

  • Name

    Pedro Miguel Carvalho
  • Role

    Senior Researcher
  • Since

    1st September 2001
Publications

2025

Automatic Visual Inspection for Industrial Application

Authors
Ribeiro, AG; Vilaça, L; Costa, C; da Costa, TS; Carvalho, PM;

Publication
JOURNAL OF IMAGING

Abstract
Quality control represents a critical function in industrial environments, ensuring that manufactured products meet strict standards and remain free from defects. In highly regulated sectors such as the pharmaceutical industry, traditional manual inspection methods remain widely used. However, these are time-consuming and prone to human error, and they lack the reliability required for large-scale operations, highlighting the urgent need for automated solutions. This is crucial for industrial applications, where environments evolve and new defect types can arise unpredictably. This work proposes an automated visual defect detection system specifically designed for pharmaceutical bottles, with potential applicability in other manufacturing domains. Various methods were integrated to create robust tools capable of real-world deployment. A key strategy is the use of incremental learning, which enables machine learning models to incorporate new, unseen data without full retraining, thus enabling adaptation to new defects as they appear, allowing models to handle rare cases while maintaining stability and performance. The proposed solution incorporates a multi-view inspection setup to capture images from multiple angles, enhancing accuracy and robustness. Evaluations in real-world industrial conditions demonstrated high defect detection rates, confirming the effectiveness of the proposed approach.

2025

Exploring Motion Information in Homography Calculation for Football Matches With Moving Cameras

Authors
Gomes, C; Mastralexi, C; Carvalho, P;

Publication
IEEE ACCESS

Abstract
In football, where minor differences can significantly affect outcomes and performance, automatic video analysis has become a critical tool for analyzing and optimizing team strategies. However, many existing solutions require expensive and complex hardware comprising multiple cameras, sensors, or GPS devices, limiting accessibility for many clubs, particularly those with limited resources. Using images and video from a moving camera can help a wider audience benefit from video analysis, but it introduces new challenges related to motion. To address this, we explore an alternative homography estimation in moving camera scenarios. Homography plays a crucial role in video analysis, but presents challenges when keypoints are sparse, especially in dynamic environments. Existing techniques predominantly rely on visible keypoints and apply homography transformations on a frame-by-frame basis, often lacking temporal consistency and facing challenges in areas with sparse keypoints. This paper explores the use of estimated motion information for homography computation. Our experimental results reveal that integrating motion data directly into homography estimations leads to reduced errors in keypoint-sparse frames, surpassing state-of-the-art methods, filling a current gap in moving camera scenarios.

2025

Correction to: A Review of Recent Advances and Challenges in Grocery Label Detection and Recognition (Applied Sciences, (2023), 13, 5, (2871), 10.3390/app13052871)

Authors
Guimarães, V; Nascimento, J; Viana, P; Carvalho, P;

Publication
Applied Sciences (Switzerland)

Abstract
There was an error in the original publication [1]. The statement in the Acknowledgments section is incorrect and should be removed because the official start of the project WATSON was after the paper’s publication date. The authors state that the scientific conclusions are unaffected. This correction was approved by the Academic Editor. The original publication has also been updated. © 2025 by the authors.

2025

Enhancing Weakly-Supervised Video Anomaly Detection With Temporal Constraints

Authors
Caetano, F; Carvalho, P; Mastralexi, C; Cardoso, JS;

Publication
IEEE ACCESS

Abstract
Anomaly Detection has been a significant field in Machine Learning since it began gaining traction. In the context of Computer Vision, the increased interest is notable, as it enables the development of video processing models for different tasks without the cumbersome effort of annotating possible events that may be underrepresented. Of the predominant strategies, weakly and semi-supervised, the former has demonstrated potential to achieve a higher score in its analysis, adding to its flexibility. This work shows that using temporal ranking constraints for Multiple Instance Learning can increase the performance of these models, allowing them to focus on the most informative instances. Moreover, the results suggest that altering the ranking process to include information about adjacent instances generates the best-performing models.

2024

A Transition Towards Virtual Representations of Visual Scenes

Authors
Pereira, A; Carvalho, P; Côrte Real, L;

Publication
Advances in Internet of Things & Embedded Systems

Abstract
We propose a unified architecture for visual scene understanding, aimed at overcoming the limitations of traditional, fragmented approaches in computer vision. Our work focuses on creating a system that accurately and coherently interprets visual scenes, with the ultimate goal to provide a 3D virtual representation, which is particularly useful for applications in virtual and augmented reality. By integrating various visual and semantic processing tasks into a single, adaptable framework, our architecture simplifies the design process, ensuring a seamless and consistent scene interpretation. This is particularly important in complex systems that rely on 3D synthesis, as the need for precise and semantically coherent scene descriptions keeps on growing. Our unified approach addresses these challenges, offering a flexible and efficient solution. We demonstrate the practical effectiveness of our architecture through a proof-of-concept system and explore its potential in various application domains, proving its value in advancing the field of computer vision.