About

I am a Coordinator Professor at the Polytechnic of Porto and a Researcher at INESC TEC, where I lead the Multimedia Communications Technology Area. I obtained my PhD from the University of Porto in the area of multimedia content management. I have been responsible for the participation of INESC TEC in several national and European projects involving universities and media industries. Author of several publications, I am also an active reviewer for journals and conferences and am engaged in the organization of workshops and program committees in the area of Multimedia. Recently, I co-chaired the Immersive Media Experiences workshop series (2013-2015) at ACM MM. I am also often engaged in the evaluation of European and Portuguese research proposals and projects. My main research activities and interests are in the field of networked audiovisual systems, including digital television and new services, content management, personalization and recommendation, new media formats, and immersive and interactive media.

Details

  • Name

    Paula Viana
  • Role

    Area Manager
  • Since

    1st January 1993
Publications

2025

A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning

Authors
Vilaça, L; Yu, Y; Viana, P;

Publication
ACM Computing Surveys

Abstract
Audio-visual correlation learning aims to capture and understand natural phenomena between audio and visual data. The rapid growth of Deep Learning has propelled the development of proposals that process audio-visual data, as can be observed in the number of proposals in the past years, thus encouraging the development of a comprehensive survey. Besides analyzing the models used in this context, we also discuss some task definitions and paradigms applied in AI multimedia. In addition, we investigate objective functions frequently used and discuss how audio-visual data is exploited in the optimization process, i.e., the different methodologies for representing knowledge in the audio-visual domain. In fact, we focus on how human-understandable mechanisms, i.e., structured knowledge that reflects comprehensible knowledge, can guide the learning process. Most importantly, we provide a summarization of the recent progress of Audio-Visual Correlation Learning (AVCL) and discuss the future research directions.

2025

Correction: Guimarães et al. A Review of Recent Advances and Challenges in Grocery Label Detection and Recognition. Appl. Sci. 2023, 13, 2871

Authors
Guimarães, V; Nascimento, J; Viana, P; Carvalho, P;

Publication
Applied Sciences

Abstract
There was an error in the original publication [...]

2024

A Machine Learning App for Monitoring Physical Therapy at Home

Authors
Pereira, B; Cunha, B; Viana, P; Lopes, M; Melo, ASC; Sousa, ASP;

Publication
Sensors

Abstract
Shoulder rehabilitation is a process that requires physical therapy sessions to recover the mobility of the affected limbs. However, these sessions are often limited by the availability and cost of specialized technicians, as well as the patient's travel to the session locations. This paper presents a novel smartphone-based approach using a pose estimation algorithm to evaluate the quality of the movements and provide feedback, allowing patients to perform autonomous recovery sessions. This paper reviews the state of the art in wearable devices and camera-based systems for human body detection and rehabilitation support and describes the system developed, which uses MediaPipe to extract the coordinates of 33 key points on the patient's body and compares them with reference videos made by professional physiotherapists using cosine similarity and dynamic time warping. This paper also presents a clinical study that uses QTM, an optoelectronic system for motion capture, to validate the methods used by the smartphone application. The results show that there are statistically significant differences between the three methods for different exercises, highlighting the importance of selecting an appropriate method for specific exercises. This paper discusses the implications and limitations of the findings and suggests directions for future research.

2024

Improving Efficiency in Facial Recognition Tasks Through a Dataset Optimization Approach

Authors
Vilaça, L; Viana, P; Carvalho, P; Andrade, MT;

Publication
IEEE Access

Abstract
It is well known that the performance of Machine Learning techniques, notably when applied to Computer Vision (CV), depends heavily on the amount and quality of the training data set. However, large data sets lead to time-consuming training loops and, in many situations, are difficult or even impossible to create. Therefore, there is a need for solutions to reduce their size while ensuring good levels of performance, i.e., solutions that obtain the best tradeoff between the amount/quality of training data and the model's performance. This paper proposes a dataset reduction approach for training data used in Deep Learning methods in Facial Recognition (FR) problems. We focus on maximizing the variability of representations for each subject (person) in the training data, thus favoring quality instead of size. The main research questions are: 1) Which facial features better discriminate different identities? 2) Will it be possible to significantly reduce the training time without compromising performance? 3) Should we favor quality over quantity for very large datasets in FR? This analysis uses a pipeline to discriminate a set of features suitable for capturing the diversity and a cluster-based sampling to select the best images for each training subject, i.e., person. Results were obtained using VGGFace2 and Labeled Faces in the Wild (for benchmarking) and show that, with the proposed approach, a data reduction is possible while ensuring similar levels of accuracy.

2024

Movie trailer genre classification using multimodal pretrained features

Authors
Sulun, S; Viana, P; Davies, MEP;

Publication
Expert Systems with Applications

Abstract
We introduce a novel method for movie genre classification, capitalizing on a diverse set of readily accessible pretrained models. These models extract high-level features related to visual scenery, objects, characters, text, speech, music, and audio effects. To intelligently fuse these pretrained features, we train small classifier models with low time and memory requirements. Employing the transformer model, our approach utilizes all video and audio frames of movie trailers without performing any temporal pooling, efficiently exploiting the correspondence between all elements, as opposed to the fixed and low number of frames typically used by traditional methods. Our approach fuses features originating from different tasks and modalities, with different dimensionalities, different temporal lengths, and complex dependencies as opposed to current approaches. Our method outperforms state-of-the-art movie genre classification models in terms of precision, recall, and mean average precision (mAP). To foster future research, we make the pretrained features for the entire MovieNet dataset, along with our genre classification code and the trained models, publicly available.

Supervised theses

2023

Image Processing of Grocery Labels for Assisted Analysis

Author
Jéssica Mireie Fernandes do Nascimento

Institution
IPP-ISEP

2023

Solução de Mobilidade numa Cidade Inteligente: Um Sistema de Informação ao Público em Tempo-real

Author
Rodrigo Teixeira Guilherme Aguiar Rodrigues

Institution
IPP-ISEP

2023

Deteção de Veículos Industriais e Pedestres em armazéns utilizando YOLOv3

Author
Eduardo da Silva Miranda

Institution
IPP-ISEP

2023

BatEval - Study on different battery technologies for IoT

Author
Afonso Serra Duque

Institution
IPP-ISEP

2023

Enhancing Indoor Localisation: a Bluetooth Low Energy (BLE) Beacon Placement approach

Author
João Pedro da Silva Dias

Institution
IPP-ISEP