Publications

Publications by CTM

2024

Fourier (Common-Tone) Phase Spaces are in Tune with Variational Autoencoders' Latent Space

Authors
Carvalho, N; Bernardes, G;

Publication
MATHEMATICS AND COMPUTATION IN MUSIC, MCM 2024

Abstract
Expanding upon the potential of generative machine learning to create atemporal latent space representations of musical-theoretical and cognitive interest, we delve into their explainability by formulating and testing hypotheses on their alignment with DFT phase spaces from {0, 1}^12 pitch classes and {0, 1}^128 pitch distributions, which capture common-tone tonal functional harmony and parsimonious voice-leading principles, respectively. We use 371 J.S. Bach chorales as a benchmark to train a Variational Autoencoder on a representative piano roll encoding. The Spearman rank correlation between the latent space and the two aforementioned DFT phase spaces exhibits a robust rank association of approximately 0.65 ± 0.05 for pitch classes and 0.61 ± 0.05 for pitch distributions, denoting an effective preservation of harmonic functional clusters per region and parsimonious voice-leading. Furthermore, our analysis prompts essential inquiries about the stylistic characteristics inferred from the rank deviations to the DFT phase space and the balance between the two DFT phase spaces.
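To make the comparison described in this abstract more concrete, the sketch below computes the phase of one DFT coefficient of a 12-element binary pitch-class vector and a Spearman rank correlation between two lists of values. It is not the authors' implementation: the choice of coefficient, the circular phase distance, and every function name here are assumptions made purely for illustration.

```python
import cmath
import math

def dft_phase(pc_vector, k):
    """Phase (radians) of the k-th DFT coefficient of a pitch-class vector."""
    n = len(pc_vector)
    coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(pc_vector))
    return cmath.phase(coeff)  # returns 0.0 if the coefficient is zero

def phase_distance(p, q):
    """Shortest distance between two angles on the circle (assumed metric)."""
    d = abs(p - q) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for m in range(i, j + 1):
            result[order[m]] = avg
        i = j + 1
    return result

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)

# Toy usage: a C major triad and its transposition up one semitone.
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
db_major = c_major[-1:] + c_major[:-1]  # rotate the vector by one step
shift = phase_distance(dft_phase(c_major, 1), dft_phase(db_major, 1))
```

In an analysis of this kind, one would presumably pass `spearman` two parallel lists, such as pairwise distances between chords in the latent space and in the phase space. That pairing is an assumption about the workflow, not something the abstract specifies.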

2024

Modal Pitch Space: A Computational Model of Melodic Pitch Attraction in Folk Music

Authors
Bernardes, G; Carvalho, N;

Publication
MATHEMATICS AND COMPUTATION IN MUSIC, MCM 2024

Abstract
We introduce a computational model that quantifies melodic pitch attraction in diatonic modal folk music, extending Lerdahl's Tonal Pitch Space. The model incorporates four melodic pitch indicators: vertical embedding distance, horizontal step distance, semitone interval distance, and relative stability. Its scalability is achieved exclusively through prior mode and tonic information, eliminating the need for the additional chordal context that existing models require. Noteworthy contributions encompass the incorporation of empirically driven folk music knowledge and the calculation of indicator weights. Empirical evaluation, spanning Dutch, Irish, and Spanish folk traditions across Ionian, Dorian, Mixolydian, and Aeolian modes, uncovers a robust linear relationship between melodic pitch transitions and the pitch attraction model infused with empirically derived knowledge. Indicator weights demonstrate cross-tradition generalizability, highlighting the significance of vertical embedding distance and relative stability. In contrast, semitone and horizontal step distances assume residual and null functions, respectively.

2024

Unveiling the Timbre Landscape: A Layered Analysis of Tenor Saxophone in RAVE Models

Authors
Carvalho, N; Sousa, J; Bernardes, G; Portovedo, H;

Publication
Proceedings of the Sound and Music Computing Conferences

Abstract
This paper presents a comprehensive investigation into the explainability and creative affordances derived from navigating a latent space generated by Realtime Audio Variational AutoEncoder (RAVE) models. We delve into the intricate layers of the RAVE model's encoder and decoder outputs by leveraging a novel timbre latent space that captures micro-timbral variations from a wide range of saxophone extended techniques. Our analysis dissects each layer's output independently, shedding light on the distinct transformations and representations occurring at different stages of the encoding and decoding processes and their sensitivity to a spectrum of low-to-high-level musical attributes. Remarkably, our findings reveal consistent patterns across various models, with the first layer consistently capturing changes in dynamics while remaining insensitive to pitch or register alterations. By meticulously examining and comparing layer outputs, we elucidate the underlying mechanisms governing saxophone timbre representation within the RAVE framework. These insights not only deepen our understanding of neural network behavior but also offer valuable contributions to the broader fields of music informatics and audio signal processing, ultimately enhancing the degree of transparency and control in co-creative practices within deep learning music frameworks. © 2024. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original.

2024

Assessing Musical Preferences of Children on the Autistic Spectrum: Implications for Therapy

Authors
Santos, N; Bernardes, G; Cotta, R; Coelho, N; Baganha, A;

Publication
Proceedings of the Sound and Music Computing Conferences

Abstract
Music-based therapies have been yielding favorable clinical outcomes in children with Autism Spectrum Disorder (ASD). However, there is a lack of guidelines for content selection in music-based interventions. In this context, we propose a methodology for conducting experimental studies on musical preferences in children diagnosed with ASD. It consists of a generative music system with seven manipulable musical parameters where participants are encouraged to create music content according to their preferences. We conducted a preliminary transversal study with 24 children in the state of Pará, Brazil. The results suggest preferences for fast tempo, higher pitch, consonance, high event density, and timbres with smooth attacks. Intriguingly, the results revealed inconsistency in the identified preferences across therapy sessions. The critical need for personalized regulation in music-based interventions for children with ASD highlights the unique nature of individual responses, emphasizing the imperative of tailoring therapeutic approaches accordingly. © 2024. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original.

2024

Statistical Analysis of Musical Features for Emotional Semantic Differentiation in Human and AI Databases

Authors
Braga, F; Forero, J; Bernardes, G;

Publication
Proceedings of the Sound and Music Computing Conferences

Abstract
Understanding the structural features of perceived musical emotions is crucial for various applications, including content generation and mood-driven playlists. This study performs a comparative statistical analysis to examine the association of a set of musical features with emotions, described using adjectives. The analysis uses two datasets containing rock and pop musical fragments, categorized as human-generated and AI-generated. Focusing on four emotional adjectives (happy, sad, angry, tender-gentle) representing each valence-arousal plane's quadrant, we analyzed semantic differential meanings reported as symmetric pairs for all possible combinations of quadrants through diagonals, vertical, and horizontal axes. The results obtained were discussed based on Livingstone's circular representation of emotional features in music. Our findings demonstrate that the human and AI-generated datasets could be considered equivalent for diagonal symmetries, while horizontal and vertical symmetries show discrepancies. Furthermore, we assessed significant separability for both happy-sad and angry-tender pairs in the human dataset. In contrast, the AI-generated music exhibits a strong differentiation mainly in the angry-gentle pair. © 2024. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original.

2024

Exploring Sampling Strategies in Latent Spaces for Music Generation

Authors
Carvalho, N; Bernardes, G;

Publication
Proceedings of the Sound and Music Computing Conferences

Abstract
This paper investigates sampling strategies within latent spaces for music generation, focusing on (chordified) J.S. Bach Chorales and utilizing MusicVAE as the generative model. We conduct an experiment comparing three sampling and interpolation strategies within the latent space to generate chord progressions - from a discrete vocabulary of Bach's chords - to Bach's original chord sequences. Given a three-chord sequence from an original Bach chorale, we assess sampling strategies for replacing the middle chord. In detail, we adopt the following sampling strategies: (1) traditional linear interpolation, (2) k-nearest neighbors, and (3) k-nearest neighbors combined with angular alignment. The study evaluates their alignment with music theory principles of functional harmony embedding and voice-leading to mirror Bach's original chord sequences. Preliminary findings suggest that k-nearest neighbors and k-nearest neighbors combined with angular alignment closely align with the tonal function of the original chord, with k-nearest neighbors excelling in bass line interpolation and the combined strategy potentially enhancing voice-leading in upper voices. Linear interpolation maintains aspects of voice-leading but confines selections within defined tonal spaces, reflecting the nonlinear characteristics of the original sequences. Our study contributes to the dynamics of latent space sampling for music generation, offering potential avenues for enhancing explainable creative strategies. © 2024. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original.
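As a rough illustration of the first two strategies named in this abstract, the sketch below linearly interpolates between two latent vectors and then looks up the nearest entries of a chord vocabulary by Euclidean distance. It is a toy under stated assumptions, not the paper's method: the two-dimensional latent codes, the chord labels, and the function names `lerp` and `k_nearest` are hypothetical, MusicVAE is not involved, and the angular alignment step is omitted because the abstract does not define it.

```python
import math

def lerp(z_a, z_b, t=0.5):
    """Linear interpolation between two latent vectors; t=0.5 is the midpoint."""
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]

def k_nearest(query, vocabulary, k=3):
    """Labels of the k vocabulary entries closest to `query` (Euclidean distance)."""
    return sorted(vocabulary, key=lambda label: math.dist(query, vocabulary[label]))[:k]

# Hypothetical 2-D latent codes for a tiny chord vocabulary (made-up values).
vocabulary = {
    "I": [0.0, 0.0],
    "IV": [1.0, 0.0],
    "V": [2.0, 0.0],
    "vi": [0.0, 3.0],
}

# Propose a replacement for the middle chord of a three-chord sequence
# whose outer chords are "I" and "V".
midpoint = lerp(vocabulary["I"], vocabulary["V"])
candidates = k_nearest(midpoint, vocabulary, k=1)
```

With these made-up coordinates the midpoint coincides with the code assigned to "IV", so that label is the single nearest candidate. Real latent spaces are higher-dimensional, and the interpolated point rarely lands exactly on a vocabulary entry.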
