Paula Viana

I am a Coordinating Professor at the Polytechnic of Porto and a researcher at INESC TEC, in the Centre for Telecommunications and Multimedia, where I lead the Multimedia Communication Technologies area. I hold a PhD in Electrical and Computer Engineering from the University of Porto, with a focus on audiovisual content management. As a researcher at INESC TEC, I have been responsible for several European and national projects involving partners from industry, media and academia. The author of several publications, I am also an active reviewer of papers submitted to conferences and journals, and a member of scientific and organising committees of conferences. Recently, I organised the "Immersive Media Experiences" workshop series (2013-2015) at ACM Multimedia, the largest conference in the multimedia field. I frequently serve as an expert for the European Commission and for national bodies in the evaluation of research proposals. My research interests centre on multimedia communication systems, including television and new services, content management, personalisation and recommendation, and new formats and immersive and interactive content.

Details

Name: Paula Viana
Position: Area Manager
Since: 1 January 1993
Nationality: Portugal
Centre: Telecommunications and Multimedia
Contacts: +351 222 094 299, paula.viana@inesctec.pt

Publications

2025

Video Soundtrack Generation by Aligning Emotions and Temporal Boundaries

Authors
Sulun, S; Viana, P; Davies, MEP;

Publication
CoRR

Abstract
Providing soundtracks for videos remains a costly and time-consuming challenge for multimedia content creators. We introduce EMSYNC, an automatic video-based symbolic music generator that creates music aligned with a video's emotional content and temporal boundaries. It follows a two-stage framework, where a pretrained video emotion classifier extracts emotional features, and a conditional music generator produces MIDI sequences guided by both emotional and temporal cues. We introduce boundary offsets, a novel temporal conditioning mechanism that enables the model to anticipate upcoming video scene cuts and align generated musical chords with them. We also propose a mapping scheme that bridges the discrete categorical outputs of the video emotion classifier with the continuous valence-arousal inputs required by the emotion-conditioned MIDI generator, enabling seamless integration of emotion information across different representations. Our method outperforms state-of-the-art models in objective and subjective evaluations across different video datasets, demonstrating its effectiveness in generating music aligned to video both emotionally and temporally. Our demo and output samples are available at https://serkansulun.com/emsync.

2025

Converge: towards an efficient multi-modal sensing research infrastructure for next-generation 6G networks

Authors
Teixeira, FB; Ricardo, M; Coelho, A; Oliveira, HP; Viana, P; Paulino, N; Fontes, H; Marques, P; Campos, R; Pessoa, L;

Publication
JOURNAL ON WIRELESS COMMUNICATIONS AND NETWORKING

Abstract
Telecommunications and computer vision solutions have evolved significantly in recent years, allowing a huge advance in the functionalities and applications offered. However, these two fields have been making their way as separate areas, not exploring the potential benefits of merging the innovations brought from each of them. In challenging environments, for example, combining radio sensing and computer vision can strongly contribute to solving problems such as those introduced by obstructions or limited lighting. Machine learning algorithms, able to fuse heterogeneous and multi-modal data, are also a key element for understanding and inferring additional knowledge from raw and low-level data, able to create a new abstracting level that can significantly enhance many applications. This paper introduces the CONVERGE vision-radio concept, a new paradigm that explores the benefits of integrating two fields of knowledge towards the vision of View-to-Communicate, Communicate-to-View. The main concepts behind this vision, including supporting use cases and the proposed architecture, are presented. CONVERGE introduces a set of tools integrating wireless communications and computer vision to create a novel experimental infrastructure that will provide open datasets to the scientific community of both experimental and simulated data, enabling new research addressing various 6G verticals, including telecommunications, automotive, manufacturing, media, and health.

2025

A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning

Authors
Vilaça, L; Yu, Y; Viana, P;

Publication
ACM COMPUTING SURVEYS

Abstract
Audio-visual correlation learning aims at capturing and understanding natural phenomena between audio and visual data. The rapid growth of Deep Learning has propelled the development of proposals that process audio-visual data, as can be observed in the number of proposals in the past years, thus encouraging the development of a comprehensive survey. Besides analyzing the models used in this context, we also discuss some tasks of definition and paradigm applied in AI multimedia. In addition, we investigate objective functions frequently used and discuss how audio-visual data is exploited in the optimization process, i.e., the different methodologies for representing knowledge in the audio-visual domain. In fact, we focus on how human-understandable mechanisms, i.e., structured knowledge that reflects comprehensible knowledge, can guide the learning process. Most importantly, we provide a summarization of the recent progress of Audio-Visual Correlation Learning (AVCL) and discuss the future research directions.

2025

Correction to: A Review of Recent Advances and Challenges in Grocery Label Detection and Recognition (Applied Sciences, (2023), 13, 5, (2871), 10.3390/app13052871)

Authors
Guimarães, V; Nascimento, J; Viana, P; Carvalho, P;

Publication
Applied Sciences (Switzerland)

Abstract
There was an error in the original publication [1]. The statement in the Acknowledgments section is incorrect and should be removed because the official start of the project WATSON was after the paper’s publication date. The authors state that the scientific conclusions are unaffected. This correction was approved by the Academic Editor. The original publication has also been updated. © 2025 by the authors.

2025

Dialogue-AV: A Dialogue-Attended Audiovisual Dataset

Authors
Vilaça, L; Viana, P; Yu, Y;

Publication
CBMI

Abstract
This work introduces Dialogue-AV, a benchmarking dataset for Audio-Video-Language (AVL). We propose using dialogue to describe video content instead of single captions, capturing nuances and shared meanings between audio and visual elements. This approach contributes significantly to improving the diversity of video descriptions and enables comprehensive evaluation of AVL learning across different downstream tasks, such as Cross-Modal Retrieval, Visual Question-Answering, and Video Captioning. Our dataset comprises approximately 258k audiovisual samples accompanied by dialogue-based descriptions for benchmarking. Dialogue-AV builds upon existing State-of-the-Art (SOTA) datasets that feature human-generated descriptions, enhancing them with model-generated ones that describe all modalities. We also present zero-shot baseline results utilising SOTA Visual-Language Models (VLMs), demonstrating that Dialogue-AV is capable of benchmarking a variety of downstream tasks with diverse inputs. Our key contributions include: 1) Dialogue-AV, a benchmark dataset for dialogue-based AVL models; and 2) benchmarks that expose the limitations of current SOTA VLMs. The code and dataset are accessible at: github.com/lvilaca16/dialogue-av. © 2025 IEEE.
