
Publications by Pedro Miguel Carvalho

2025

Enhancing Weakly-Supervised Video Anomaly Detection With Temporal Constraints

Authors
Caetano, F; Carvalho, P; Mastralexi, C; Cardoso, JS;

Publication
IEEE ACCESS

Abstract
Anomaly Detection has been a significant field in Machine Learning since it began gaining traction. In Computer Vision the interest is especially marked, as it enables the development of video processing models for different tasks without the cumbersome effort of annotating possible events, which may be underrepresented. Of the two predominant strategies, weakly supervised and semi-supervised, the former has demonstrated the potential to achieve higher scores in its analysis, in addition to its flexibility. This work shows that using temporal ranking constraints for Multiple Instance Learning can increase the performance of these models, allowing them to focus on the most informative instances. Moreover, the results suggest that altering the ranking process to include information about adjacent instances yields better-performing models.
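To make the ranking idea concrete, the sketch below shows a minimal Multiple Instance Learning ranking loss with a temporal smoothness term over adjacent segment scores; the loss form, weights, and margin value are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def mil_ranking_loss(pos_scores, neg_scores, margin=1.0,
                     lambda_smooth=8e-5, lambda_sparse=8e-5):
    """Illustrative MIL ranking loss for weakly-supervised anomaly detection.

    pos_scores: per-segment anomaly scores of an anomalous (positive) bag.
    neg_scores: per-segment anomaly scores of a normal (negative) bag.
    The hinge term pushes the top-scoring positive segment above the
    top-scoring negative one; the smoothness term couples adjacent segments,
    which is the kind of temporal constraint the paper studies.
    """
    pos_scores = np.asarray(pos_scores, dtype=float)
    neg_scores = np.asarray(neg_scores, dtype=float)

    # Standard top-instance ranking hinge.
    hinge = max(0.0, margin - pos_scores.max() + neg_scores.max())

    # Temporal smoothness: penalise large jumps between adjacent segments.
    smoothness = np.sum(np.diff(pos_scores) ** 2)

    # Sparsity: anomalies should cover few segments of the positive bag.
    sparsity = np.sum(pos_scores)

    return hinge + lambda_smooth * smoothness + lambda_sparse * sparsity


# Toy usage: 32 segment scores per video, as in common MIL setups.
rng = np.random.default_rng(0)
loss = mil_ranking_loss(rng.random(32), rng.random(32))
print(f"ranking loss: {loss:.4f}")
```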

2024

A Transition Towards Virtual Representations of Visual Scenes

Authors
Pereira, A; Carvalho, P; Côrte Real, L;

Publication
Advances in Internet of Things & Embedded Systems

Abstract
We propose a unified architecture for visual scene understanding, aimed at overcoming the limitations of traditional, fragmented approaches in computer vision. Our work focuses on creating a system that accurately and coherently interprets visual scenes, with the ultimate goal of providing a 3D virtual representation, which is particularly useful for applications in virtual and augmented reality. By integrating various visual and semantic processing tasks into a single, adaptable framework, our architecture simplifies the design process, ensuring a seamless and consistent scene interpretation. This is particularly important in complex systems that rely on 3D synthesis, as the need for precise and semantically coherent scene descriptions continues to grow. Our unified approach addresses these challenges, offering a flexible and efficient solution. We demonstrate the practical effectiveness of our architecture through a proof-of-concept system and explore its potential in various application domains, proving its value in advancing the field of computer vision.
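As a purely illustrative sketch of the kind of unified pipeline described above, the snippet below chains independent vision and semantic modules behind a single interface that incrementally enriches one shared scene description ready for a 3D synthesis back end; all class and stage names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SceneDescription:
    """Shared, incrementally enriched description of one visual scene."""
    objects: List[dict] = field(default_factory=list)        # e.g. class, box, pose
    semantics: Dict[str, str] = field(default_factory=dict)  # e.g. scene type
    geometry: Dict[str, list] = field(default_factory=dict)  # e.g. layouts, meshes


class ScenePipeline:
    """Chains independent vision/semantic modules behind one interface,
    so every module reads and writes the same scene description."""

    def __init__(self) -> None:
        self._stages: List[Callable[[bytes, SceneDescription], SceneDescription]] = []

    def register(self, stage):
        self._stages.append(stage)
        return stage

    def run(self, frame: bytes) -> SceneDescription:
        scene = SceneDescription()
        for stage in self._stages:
            scene = stage(frame, scene)
        return scene  # ready to be handed to a 3D synthesis back end


pipeline = ScenePipeline()

@pipeline.register
def detect_objects(frame, scene):
    scene.objects.append({"label": "chair", "box": [0, 0, 10, 10]})  # placeholder
    return scene

@pipeline.register
def estimate_layout(frame, scene):
    scene.geometry["room"] = [[0, 0], [4, 0], [4, 3], [0, 3]]  # placeholder
    return scene

print(pipeline.run(b"raw-frame-bytes"))
```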

2024

Improving Efficiency in Facial Recognition Tasks Through a Dataset Optimization Approach

Authors
Vilça, L; Viana, P; Carvalho, P; Andrade, MT;

Publication
IEEE ACCESS

Abstract
It is well known that the performance of Machine Learning techniques, notably when applied to Computer Vision (CV), depends heavily on the amount and quality of the training data set. However, large data sets lead to time-consuming training loops and, in many situations, are difficult or even impossible to create. Therefore, there is a need for solutions to reduce their size while ensuring good levels of performance, i.e., solutions that obtain the best tradeoff between the amount/quality of training data and the model's performance. This paper proposes a dataset reduction approach for training data used in Deep Learning methods in Facial Recognition (FR) problems. We focus on maximizing the variability of representations for each subject (person) in the training data, thus favoring quality instead of size. The main research questions are: 1) Which facial features better discriminate different identities? 2) Will it be possible to significantly reduce the training time without compromising performance? 3) Should we favor quality over quantity for very large datasets in FR? This analysis uses a pipeline to discriminate a set of features suitable for capturing the diversity and a cluster-based sampling to select the best images for each training subject, i.e., person. Results were obtained using VGGFace2 and Labeled Faces in the Wild (for benchmarking) and show that, with the proposed approach, a data reduction is possible while ensuring similar levels of accuracy.
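A minimal sketch of the cluster-based sampling idea described in the abstract: for each subject, cluster its face embeddings and keep the image closest to each cluster centre, so the retained samples span that subject's variability. The embedding dimensionality, cluster count, and function names are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans


def reduce_subject_images(embeddings: np.ndarray, n_keep: int, seed: int = 0):
    """Select `n_keep` images for one subject by clustering its face embeddings
    and keeping the image closest to each cluster centre.

    embeddings: (n_images, d) feature vectors for a single subject.
    Returns the indices of the selected images.
    """
    n_images = embeddings.shape[0]
    if n_images <= n_keep:
        return np.arange(n_images)

    kmeans = KMeans(n_clusters=n_keep, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(embeddings)

    selected = []
    for cluster_id in range(n_keep):
        members = np.flatnonzero(labels == cluster_id)
        dists = np.linalg.norm(
            embeddings[members] - kmeans.cluster_centers_[cluster_id], axis=1
        )
        selected.append(members[np.argmin(dists)])
    return np.array(sorted(selected))


# Toy usage: 200 synthetic 128-D "face embeddings", keep 20 diverse ones.
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 128))
print(reduce_subject_images(features, n_keep=20))
```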

2023

Synthesizing Human Activity for Data Generation

Authors
Romero, A; Carvalho, P; Corte-Real, L; Pereira, A;

Publication
JOURNAL OF IMAGING

Abstract
Gathering sufficiently representative data, such as data on human actions, shapes, and facial expressions, is costly and time-consuming, yet it is required to train robust models. This has led to techniques such as transfer learning and data augmentation; however, these are often insufficient. To address this, we propose a semi-automated mechanism that allows the generation and editing of visual scenes with synthetic humans performing various actions, with features such as background modification and manual adjustment of the 3D avatars, allowing users to create data with greater variability. We also propose a two-fold evaluation methodology for assessing the results obtained with our method: (i) running an action classifier on the output data produced by the mechanism and (ii) generating masks of the avatars and of the actors and comparing them through segmentation. The avatars were robust to occlusion, and their actions were recognizable and faithful to their respective input actors. The results also showed that even though the action classifier concentrates on the pose and movement of the synthetic humans, it strongly depends on contextual information to recognize the actions precisely. Generating avatars for complex activities also proved problematic, both for action recognition and for the clean and precise formation of the masks.
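The mask-based part of the evaluation compares avatar and actor segmentations; a simple way to do this is intersection-over-union, sketched below under the assumption of aligned boolean masks (the exact metric used in the paper may differ).

```python
import numpy as np


def mask_iou(avatar_mask: np.ndarray, actor_mask: np.ndarray) -> float:
    """Intersection-over-Union between a synthetic avatar's segmentation mask
    and the corresponding real actor's mask (boolean arrays of equal shape)."""
    avatar_mask = avatar_mask.astype(bool)
    actor_mask = actor_mask.astype(bool)
    union = np.logical_or(avatar_mask, actor_mask).sum()
    if union == 0:
        return 1.0  # both masks empty: trivially identical
    return np.logical_and(avatar_mask, actor_mask).sum() / union


# Toy usage with two overlapping rectangular "silhouettes".
a = np.zeros((120, 80), dtype=bool)
b = np.zeros((120, 80), dtype=bool)
a[20:100, 10:60] = True
b[25:105, 15:65] = True
print(f"avatar/actor mask IoU: {mask_iou(a, b):.3f}")
```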

2007

An MPEG-21 Web Peer for the consumption of Digital Items

Authors
Ciobanu, G; Andrade, MT; Carvalho, P; Carrapatoso, E;

Publication
NOVAS PERSPECTIVAS EM SISTEMAS E TECNOLOGIAS DE INFORMACAO, VOL II

Abstract
MPEG-21 enables content consumers to access and interoperate with a large variety of multimedia resources and their descriptions in a flexible manner. Considering the great heterogeneity that presently exists across the entire multimedia content chain and the growing importance of open standards in facilitating interoperation across environments, applications and formats, an MPEG-21 Peer was developed to process and present complex multimedia content, represented as MPEG-21 Digital Items. The novelty of the work lies essentially in the adoption of a Web Services architecture, based on a single Digital Item processing core available for all types of terminal devices.
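For illustration only, the sketch below shows a web-service front end over a single shared Digital Item processing core, which is the architectural idea the abstract highlights; it uses JSON over plain HTTP as a stand-in for the actual MPEG-21 DIDL and Web Services stack, and every name in it is hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json


class DigitalItemCore:
    """Single processing core shared by every terminal type: it parses a
    (simplified, JSON-encoded) Digital Item and returns what to present."""

    def process(self, digital_item: dict) -> dict:
        resources = digital_item.get("resources", [])
        return {"item_id": digital_item.get("id"), "presentation": resources}


CORE = DigitalItemCore()


class PeerHandler(BaseHTTPRequestHandler):
    """Thin web-service front end; terminals POST a Digital Item and get
    back a processed presentation description."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        item = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(CORE.process(item)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("localhost", 8080), PeerHandler).serve_forever()
```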

2008

A Multimedia Terminal for Adaptation and End-to-End QoS Control

Authors
Shao, BL; Mattavelli, M; Renzi, D; Andrade, MT; Battista, S; Keller, S; Ciobanu, G; Carvalho, P;

Publication
2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-4

Abstract
This paper addresses the design of multimedia end-user systems for content distribution over heterogeneous networks and terminals, with particular focus on end-to-end quality of service (QoS) control. A multimedia terminal has been conceived and implemented, comprising a content-related metadata processor, a usage environment characteristics provider, an end-user QoS monitor and an audio-visual player for the H.264 Scalable Video Coding (SVC) extension, all coordinated by a terminal middleware. This end-user terminal enables end-to-end QoS control for content adaptation, following both semantic and physical approaches, to maximize the end user's perceptual experience while minimizing resource usage. The design illustrates a possible architecture for next-generation multimedia end-user terminals supporting the MPEG-21 standard and the H.264 SVC extension codec.
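As a hedged illustration of the QoS-driven adaptation such a terminal performs, the sketch below picks the highest SVC layer that fits within the bandwidth reported by a QoS monitor; the bitrate ladder, headroom factor, and names are assumptions, not values from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SvcLayer:
    """One scalable layer of an SVC bitstream (spatial/temporal/quality step)."""
    name: str
    bitrate_kbps: int  # cumulative bitrate needed to decode up to this layer


def select_svc_layer(layers: List[SvcLayer], measured_bandwidth_kbps: float,
                     headroom: float = 0.85) -> SvcLayer:
    """Pick the highest SVC layer (ladder sorted by ascending bitrate) whose
    cumulative bitrate fits within the measured bandwidth, with a safety
    headroom, falling back to the base layer."""
    budget = measured_bandwidth_kbps * headroom
    best = layers[0]  # base layer always kept as fallback
    for layer in layers:
        if layer.bitrate_kbps <= budget:
            best = layer
    return best


ladder = [
    SvcLayer("base QCIF", 128),
    SvcLayer("CIF", 384),
    SvcLayer("SD", 1200),
    SvcLayer("HD", 3500),
]

# Toy usage: terminal QoS monitor reports roughly 1 Mbit/s available.
print(select_svc_layer(ladder, measured_bandwidth_kbps=1000).name)
```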
