
Details

  • Name

    Nádia Sousa Carvalho
  • Position

    Research Assistant
  • Since

    01 October 2021
Publications

2025

Motiv: A Dataset of Latent Space Representations of Musical Phrase Motions

Authors
Carvalho, N; Sousa, J; Bernardes, G; Portovedo, H;

Publication
PROCEEDINGS OF THE 20TH INTERNATIONAL AUDIO MOSTLY CONFERENCE, AM 2025

Abstract
This paper introduces Motiv, a dataset of expert saxophonist recordings illustrating parallel, similar, oblique, and contrary motions. These motions are variations of three phrases from Jesús Villa-Rojo's Lamento, with controlled similarities. The dataset includes 116 audio samples recorded by four tenor saxophonists, each annotated with descriptions of motions, musical scores, and latent space vectors generated using the VocalSet RAVE model. Motiv enables the analysis of motion types and their geometric relationships in latent spaces. Our preliminary dataset analysis shows that parallel motions align closely with original phrases, while contrary motions exhibit the largest deviations, and oblique motions show mixed patterns. The dataset also highlights the impact of individual performer nuances. Motiv supports a variety of music information retrieval (MIR) tasks, including gesture-based recognition, performance analysis, and motion-driven retrieval. It also provides insights into the relationship between human motion and music, contributing to real-time music interaction and automated performance systems.
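
As a rough illustration of the encoding step described above, the sketch below shows how latent space vectors can be obtained from a recording with a pre-exported RAVE model in Python. It is an assumed workflow, not the dataset's actual pipeline: the model and audio file names are placeholders, and the target sample rate depends on how the VocalSet RAVE model was trained.

# Minimal sketch (assumed workflow, not the authors' exact pipeline):
# encode an audio file into RAVE latent vectors with a pre-exported
# TorchScript model such as a VocalSet-trained RAVE checkpoint.
import torch
import librosa

MODEL_PATH = "vocalset.ts"    # hypothetical path to an exported RAVE model
AUDIO_PATH = "phrase_01.wav"  # hypothetical dataset sample

model = torch.jit.load(MODEL_PATH).eval()

# RAVE models are trained at a fixed sample rate; resample the recording
# to match the model before encoding (48 kHz is only an assumption here).
audio, sr = librosa.load(AUDIO_PATH, sr=48000, mono=True)
x = torch.from_numpy(audio).reshape(1, 1, -1)  # (batch, channels, samples)

with torch.no_grad():
    z = model.encode(x)  # latent trajectory: (batch, latent_dims, time_frames)

print(z.shape)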

2025

Toward Musicologically-Informed Retrieval: Enhancing MEI with Computational Metadata

Authors
Carvalho, Nádia; Bernardes, Gilberto;

Publication

Abstract
We present a metadata enrichment framework for Music Encoding Initiative (MEI) files, featuring mid- to higher-level multimodal features to support content-driven (similarity) retrieval with semantic awareness across large collections. While traditional metadata captures basic bibliographic and structural elements, it often lacks the depth required for advanced retrieval tasks that rely on musical phrases, form, key or mode, idiosyncratic patterns, and textual topics. To address this, we propose a system that supports the computational analysis and editing of MEI encodings at scale. Inserting extended metadata derived from computational analysis and heuristic rules lays the groundwork for more nuanced retrieval tools. A batch environment and a lightweight JavaScript web-based application provide a complementary workflow, offering large-scale annotations and an interactive environment for reviewing, validating, and refining MEI files' metadata. Development is informed by user-centered methodologies, including consultations with music editors and digital musicologists, and has been co-designed in the context of orally transmitted folk music traditions, ensuring that both the batch processes and interactive tools align with scholarly and domain-specific needs.
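
Since MEI files are plain XML, one way to picture the metadata-insertion step is the short Python sketch below. It only illustrates the general idea under assumed element choices (an extMeta block holding a computed feature); the framework described in the paper may place and name enriched metadata differently.

# Sketch of writing a computed feature (e.g. an estimated key) back into an
# MEI file's header. Element placement and names are illustrative only; the
# paper's framework may use a different schema strategy.
from lxml import etree

MEI_NS = "http://www.music-encoding.org/ns/mei"
NSMAP = {"mei": MEI_NS}

tree = etree.parse("tune.mei")  # hypothetical input file
head = tree.find(".//mei:meiHead", NSMAP)

# Attach the derived metadata as a labeled extension block in the header.
ext = etree.SubElement(head, f"{{{MEI_NS}}}extMeta", label="computed-key")
ext.text = "G major (estimated by computational analysis)"

tree.write("tune_enriched.mei", xml_declaration=True,
           encoding="UTF-8", pretty_print=True)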

2025

Computational Phrase Segmentation of Iberian Folk Traditions: An Optimized LBDM Model

Authors
Orouji, Amir Abbas; Carvalho, Nadia; Sá Pinto, António; Bernardes, Gilberto;

Publication

Abstract
Phrase segmentation is a fundamental preprocessing step for computational folk music similarity, specifically in identifying tune families within digital corpora. Furthermore, recent literature increasingly recognizes the need for tradition-specific frameworks that accommodate the structural idiosyncrasies of each tradition. In this context, this study presents a culturally informed adaptation of the established rule-based Local Boundary Detection Model (LBDM) algorithm to underrepresented Iberian folk repertoires. Our methodological enhancement expands the LBDM baseline, which traditionally analyzes rests, pitch intervals, and inter-onset duration functions to identify potential segmentation boundaries, by integrating a sub-structure surface repetition function coupled with an optimized peak-selection algorithm. Furthermore, we implement a genetic algorithm to maximize segmentation accuracy by weighting coefficients for each function while calibrating the meta-parameters of the peak-selection process. Empirical evaluation on the I-Folk digital corpus, comprising 802 symbolically encoded folk melodies from Portuguese and Spanish traditions, demonstrates improvements in segmentation F-measure of six and sixteen percentage points (p.p.) relative to established baseline methodologies for Portuguese and Spanish repertoires, respectively.
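
For readers unfamiliar with the LBDM baseline being extended here, the sketch below shows its core boundary-strength computation in Python, following Cambouropoulos's published formulation. The paper's additions (the surface-repetition function, the optimized peak selection, and the genetic-algorithm weighting) are not reproduced.

# Baseline LBDM boundary strength for one parametric profile (e.g. pitch
# intervals, inter-onset intervals, or rest durations): the relative change
# between consecutive values is accumulated into a strength curve.
def change_degree(a: float, b: float) -> float:
    """Relative degree of change between two consecutive interval values."""
    return abs(a - b) / (a + b) if (a + b) != 0 else 0.0

def lbdm_strengths(profile: list[float]) -> list[float]:
    """Boundary strength assigned to each value of one parametric profile."""
    n = len(profile)
    strengths = []
    for i in range(n):
        r_prev = change_degree(profile[i - 1], profile[i]) if i > 0 else 0.0
        r_next = change_degree(profile[i], profile[i + 1]) if i < n - 1 else 0.0
        strengths.append(profile[i] * (r_prev + r_next))
    return strengths

# Example: absolute pitch intervals (in semitones) of a short phrase.
pitch_intervals = [2, 2, 1, 7, 2, 2]
print(lbdm_strengths(pitch_intervals))
# Peaks in the combined, weighted profiles are then selected as candidate
# phrase boundaries.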

2025

Exploring timbre latent spaces: motion-enhanced sampling for musical co-improvisation

Authors
Carvalho, N; Sousa, J; Portovedo, H; Bernardes, G;

Publication
INTERNATIONAL JOURNAL OF PERFORMANCE ARTS AND DIGITAL MEDIA

Abstract
This article investigates sampling strategies in latent space navigation to enhance co-creative music systems, focusing on timbre latent spaces. Adopting Villa-Rojo's 'Lamento' for tenor saxophone and tape as a case study, we conducted two experiments. The first assessed traditional corpus-based concatenative synthesis (CBCS) sampling within the RAVE model's latent space, finding that sampling strategies gradually deviate from a given target sonority while still relating to the original morphology. The second experiment aimed at defining sampling strategies for creating variations of an input signal, namely parallel, contrary, and oblique motions. The findings expose the need to explore individual model layers and the geometric transformation nature of the contrary and oblique motions, which tend to dilate the original shape. They also highlight the potential of motion-aware sampling for more contextually aware and expressive control of music structures via CBCS.
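
One plausible way to read the parallel, contrary, and oblique motions discussed above is as simple geometric transformations of a latent trajectory, as in the hypothetical Python sketch below; the article's actual sampling strategies are likely more involved.

# Illustrative transformations of a latent trajectory z with shape
# (latent_dims, time_frames). These are assumptions for illustration only.
import numpy as np

def parallel_motion(z: np.ndarray, offset: np.ndarray) -> np.ndarray:
    """Translate the whole trajectory by a constant offset per dimension."""
    return z + offset[:, None]

def contrary_motion(z: np.ndarray) -> np.ndarray:
    """Mirror the trajectory around its starting frame (directions reversed)."""
    return 2 * z[:, :1] - z

def oblique_motion(z: np.ndarray, held_dims: list[int]) -> np.ndarray:
    """Hold selected dimensions at their initial value while the others move."""
    out = z.copy()
    out[held_dims, :] = z[held_dims, :1]
    return out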

2024

Unveiling the Timbre Landscape: A Layered Analysis of Tenor Saxophone in RAVE Models

Authors
Carvalho, N; Sousa, J; Bernardes, G; Portovedo, H;

Publication
Proceedings of the Sound and Music Computing Conferences

Abstract
This paper presents a comprehensive investigation into the explainability and creative affordances derived from navigating a latent space generated by Realtime Audio Variational AutoEncoder (RAVE) models. We delve into the intricate layers of the RAVE model's encoder and decoder outputs by leveraging a novel timbre latent space that captures micro-timbral variations from a wide range of saxophone extended techniques. Our analysis dissects each layer's output independently, shedding light on the distinct transformations and representations occurring at different stages of the encoding and decoding processes and their sensitivity to a spectrum of low-to-high-level musical attributes. Remarkably, our findings reveal consistent patterns across various models, with the first layer consistently capturing changes in dynamics while remaining insensitive to pitch or register alterations. By meticulously examining and comparing layer outputs, we elucidate the underlying mechanisms governing saxophone timbre representation within the RAVE framework. These insights not only deepen our understanding of neural network behavior but also offer valuable contributions to the broader fields of music informatics and audio signal processing, ultimately enhancing the degree of transparency and control in co-creative practices within deep learning music frameworks.
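
A layer-by-layer inspection of this kind is often implemented with PyTorch forward hooks. The sketch below shows one generic way to capture every submodule's output during a single encoder pass; it assumes access to the model as a regular nn.Module (layer names and the encoder's structure are model-specific, and a TorchScript export would require a different approach).

# Capture per-layer outputs of a PyTorch encoder with forward hooks.
import torch

def collect_layer_outputs(encoder: torch.nn.Module, x: torch.Tensor) -> dict:
    """Run one forward pass and keep every submodule's output tensor."""
    outputs, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            outputs[name] = output.detach()
        return hook

    # Register a hook on every named submodule (skip the root module itself).
    for name, module in encoder.named_modules():
        if name:
            handles.append(module.register_forward_hook(make_hook(name)))

    with torch.no_grad():
        encoder(x)

    # Clean up the hooks before returning the captured activations.
    for h in handles:
        h.remove()
    return outputs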