2024
Authors
Santos, N; Bernardes, G; Cotta, R; Coelho, N; Baganha, A;
Publication
Proceedings of the Sound and Music Computing Conferences
Abstract
Music-based therapies have been yielding favorable clinical outcomes in children with Autism Spectrum Disorder (ASD). However, there is a lack of guidelines for content selection in music-based interventions. In this context, we propose a methodology for conducting experimental studies on musical preferences in children diagnosed with ASD. It consists of a generative music system with seven manipulable musical parameters, with which participants are encouraged to create music according to their preferences. We conducted a preliminary cross-sectional study with 24 children in the state of Pará, Brazil. The results suggest preferences for fast tempo, higher pitch, consonance, high event density, and timbres with smooth attacks. Intriguingly, the identified preferences were inconsistent across therapy sessions. This variability highlights the unique nature of individual responses and underscores the need to tailor music-based interventions for children with ASD to each child. © 2024. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
2024
Authors
Braga, F; Forero, J; Bernardes, G;
Publication
Proceedings of the Sound and Music Computing Conferences
Abstract
Understanding the structural features of perceived musical emotions is crucial for various applications, including content generation and mood-driven playlists. This study performs a comparative statistical analysis to examine the association of a set of musical features with emotions, described using adjectives. The analysis uses two datasets containing rock and pop musical fragments, categorized as human-generated and AI-generated. Focusing on four emotional adjectives (happy, sad, angry, tender-gentle), one for each quadrant of the valence-arousal plane, we analyzed semantic differential meanings reported as symmetric pairs for all combinations of quadrants across the diagonal, vertical, and horizontal axes. The results are discussed in light of Livingstone's circular representation of emotional features in music. Our findings demonstrate that the human- and AI-generated datasets can be considered equivalent for diagonal symmetries, whereas horizontal and vertical symmetries show discrepancies. Furthermore, we found significant separability for both the happy-sad and angry-tender pairs in the human dataset. In contrast, the AI-generated music exhibits strong differentiation mainly in the angry-tender pair.
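The quadrant symmetries analyzed above can be sketched in a few lines, assuming the conventional placement of the four adjectives on the valence-arousal plane (an illustrative mapping; the paper's exact quadrant assignments are not restated here):

```python
# Hypothetical mapping: Q1 = high valence/high arousal, Q2 = low/high,
# Q3 = low/low, Q4 = high/low.
QUADRANTS = {1: "happy", 2: "angry", 3: "sad", 4: "tender"}

# Quadrant pairs related by each symmetry of the valence-arousal plane.
SYMMETRIES = {
    "diagonal":   [(1, 3), (2, 4)],   # through the origin
    "vertical":   [(1, 2), (4, 3)],   # mirror across the arousal axis
    "horizontal": [(1, 4), (2, 3)],   # mirror across the valence axis
}

for axis, pairs in SYMMETRIES.items():
    for a, b in pairs:
        print(axis, QUADRANTS[a], "vs", QUADRANTS[b])
```

Each of the three symmetry types yields two adjective pairs, giving the six comparisons the study enumerates.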
2024
Authors
Carvalho, N; Bernardes, G;
Publication
Proceedings of the Sound and Music Computing Conferences
Abstract
This paper investigates sampling strategies within latent spaces for music generation, focusing on (chordified) J.S. Bach chorales and utilizing MusicVAE as the generative model. We conduct an experiment comparing three sampling and interpolation strategies for generating chord progressions in the latent space - from a discrete vocabulary of Bach's chords - against Bach's original chord sequences. Given a three-chord sequence from an original Bach chorale, we assess sampling strategies for replacing the middle chord. In detail, we adopt the following sampling strategies: (1) traditional linear interpolation, (2) k-nearest neighbors, and (3) k-nearest neighbors combined with angular alignment. The study evaluates their alignment with the music theory principles of functional harmony and voice-leading, as embodied in Bach's original chord sequences. Preliminary findings suggest that k-nearest neighbors and k-nearest neighbors combined with angular alignment closely match the tonal function of the original chord, with k-nearest neighbors excelling in bass line interpolation and the combined strategy potentially enhancing voice-leading in the upper voices. Linear interpolation maintains aspects of voice-leading but confines selections within defined tonal spaces, reflecting the nonlinear characteristics of the original sequences. Our study contributes to the understanding of latent space sampling dynamics for music generation, offering potential avenues for enhancing explainable creative strategies.
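The three strategies can be sketched on toy vectors, with random points standing in for MusicVAE latent codes (the vocabulary size, dimensionality, and distance choices below are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = rng.normal(size=(50, 8))   # hypothetical latent codes of a chord vocabulary
z_a, z_c = vocab[3], vocab[17]     # latent codes of the two outer chords

# (1) Linear interpolation: take the midpoint between the endpoints.
z_mid = 0.5 * (z_a + z_c)

# (2) k-nearest neighbours: snap the midpoint to the k closest vocabulary chords.
def knn(point, vocab, k=3):
    d = np.linalg.norm(vocab - point, axis=1)
    return np.argsort(d)[:k]

# (3) k-NN + angular alignment: among the neighbours, prefer the chord whose
# direction from z_a best aligns with the a->c direction (cosine similarity).
def knn_angular(point, vocab, z_a, z_c, k=3):
    idx = knn(point, vocab, k)
    direction = z_c - z_a
    cos = (vocab[idx] - z_a) @ direction / (
        np.linalg.norm(vocab[idx] - z_a, axis=1) * np.linalg.norm(direction) + 1e-9)
    return idx[np.argmax(cos)]

best = knn_angular(z_mid, vocab, z_a, z_c)
```

Strategy (1) may land between vocabulary items, whereas (2) and (3) always return a chord from the discrete vocabulary, which is what makes them comparable to Bach's original sequences.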
2024
Authors
Cao, Z; Pinto, S; Bernardes, G;
Publication
Proceedings of the Sound and Music Computing Conferences
Abstract
This paper presents BiSAID, a dataset for exploring bipolar semantic adjectives in non-speech auditory cues, including earcons and auditory icons, i.e., sounds used to signify specific events or relay information in auditory interfaces from synthetic or recorded sources, respectively. In total, the dataset includes 599 non-speech auditory cues with different semantic labels, covering temperature (cold vs. warm), brightness (bright vs. dark), sharpness (sharp vs. dull), shape (curved vs. flat), and accuracy (correct vs. incorrect). Furthermore, we advance a preliminary analysis of brightness and accuracy earcon pairs from the BiSAID dataset to infer idiosyncratic sonic structures of each semantic earcon label from 66 instantaneous low- and mid-level descriptors, covering temporal, spectral, rhythmic, and tonal features. Ultimately, we aim to unveil the relationships among the sonic parameters behind earcon design, thus systematizing their structural foundations and shedding light on the metaphorical semantic nature of their description. This exploration revealed that spectral characteristics (e.g., spectral flux and spectral complexity) serve as the most relevant acoustic correlates for differentiating earcons along the brightness and accuracy dimensions, respectively. The methodology holds great promise for systematizing earcon design and generating hypotheses for in-depth perceptual studies.
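As a rough illustration of one such descriptor, here is a minimal spectral-flux sketch on synthetic frames (a simplified textbook formulation, not the feature-extraction toolbox used in the paper):

```python
import numpy as np

def spectral_flux(frames):
    """Per-frame spectral flux: L2 norm of the positive change in the
    magnitude spectrum between consecutive Hann-windowed frames."""
    mags = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1))
    diff = np.diff(mags, axis=0)
    return np.sqrt((np.clip(diff, 0, None) ** 2).sum(axis=1))

# Toy signal: two silent frames followed by two frames of a 440 Hz sine;
# the flux peaks at the silence-to-tone transition.
sr, n = 16000, 512
t = np.arange(4 * n) / sr
sig = np.where(np.arange(4 * n) < 2 * n, 0.0, np.sin(2 * np.pi * 440 * t))
flux = spectral_flux(sig.reshape(4, n))
```

High flux marks abrupt spectral change (sharp onsets), which is one plausible way a descriptor of this family can separate, say, "correct" from "incorrect" earcons.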
2024
Authors
Navarro-Cáceres, JJ; Carvalho, N; Bernardes, G; Jiménez-Bravo, DM; Navarro-Cáceres, M;
Publication
MATHEMATICS AND COMPUTATION IN MUSIC, MCM 2024
Abstract
Extensive computational research has been dedicated to detecting keys and modes in tonal Western music within the major and minor modes. Little research has addressed other modes and musical expressions, such as folk or non-Western music. This paper tackles this limitation by comparing traditional template-based methods with unsupervised machine-learning methods for diatonic mode detection in folk music. Template-based methods are grounded in music theory and cognition and use predefined profiles against which a musical piece is compared. Unsupervised machine learning autonomously discovers patterns embedded in the data. As a case study, we apply both approaches to The Session, a dataset of Irish folk music, considering four diatonic modes: Ionian, Dorian, Mixolydian, and Aeolian. Our evaluation assesses the performance of the template-based and unsupervised methods, which reach an average accuracy of about 80%. We discuss the applicability of the methods, namely the potential of unsupervised learning to process unknown musical sources beyond modes with predefined templates.
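The template-based side of the comparison can be sketched as follows. The weighted diatonic profiles below are illustrative assumptions (tonic and fifth emphasized to break ties between rotation-equivalent scales); real systems use theory- or corpus-derived profiles:

```python
import numpy as np

# Hypothetical weighted pitch-class templates rooted at pc 0
# (tonic = 4, fifth = 3, other scale degrees = 2, non-scale = 0).
MODES = {
    "ionian":     [4, 0, 2, 0, 2, 2, 0, 3, 0, 2, 0, 2],
    "dorian":     [4, 0, 2, 2, 0, 2, 0, 3, 0, 2, 2, 0],
    "mixolydian": [4, 0, 2, 0, 2, 2, 0, 3, 0, 2, 2, 0],
    "aeolian":    [4, 0, 2, 2, 0, 2, 0, 3, 2, 0, 2, 0],
}

def detect_mode(pc_histogram):
    """Return (tonic, mode) maximizing the correlation between the piece's
    pitch-class histogram and every transposition of each template."""
    h = np.asarray(pc_histogram, float)
    best, best_r = None, -np.inf
    for name, tpl in MODES.items():
        tpl = np.asarray(tpl, float)
        for tonic in range(12):
            r = np.corrcoef(h, np.roll(tpl, tonic))[0, 1]
            if r > best_r:
                best, best_r = (tonic, name), r
    return best

# A histogram matching the Dorian profile transposed to D (tonic pc 2).
hist = np.roll(MODES["dorian"], 2)
print(detect_mode(hist))  # → (2, 'dorian')
```

Note that D Dorian and C Ionian share the same pitch-class set; the weighting on the tonic is what lets the correlation disambiguate them, which binary templates cannot do.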
2024
Authors
Pereira, S; Affatato, G; Bernardes, G; Moss, FC;
Publication
MATHEMATICS AND COMPUTATION IN MUSIC, MCM 2024
Abstract
We introduce a novel perspective on set-class analysis combining DFT magnitudes with the music visualisation technique of wavescapes. With such a combination, we create a visual representation of a piece's multidimensional qualia, where different colours indicate saliency in chromaticity, diadicity, triadicity, octatonicity, diatonicity, and whole-tone quality. At the centre of our methods are: 1) the formal definition of the Fourier Qualia Space (FQS), 2) its particular ordering of DFT coefficients, which delineates regions linked to different musical aesthetics, and 3) the mapping of such regions into a coloured wavescape. Furthermore, we demonstrate the intrinsic capability of the FQS to express qualia ambiguity and map it into a synopsis wavescape. Finally, we showcase the application of our methods by presenting a few analytical remarks on Bach's Three-part Invention BWV 795, Debussy's Reflets dans l'eau, and Webern's Four Pieces for Violin and Piano, Op. 7, No. 1, unveiling increasingly ambiguous wavescapes.
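The coefficient-to-quality reading can be illustrated with a small sketch of pitch-class DFT magnitudes (the labels follow the conventional interpretation of coefficients 1-6; the FQS ordering, colouring, and wavescape machinery are omitted):

```python
import numpy as np

QUALIA = ["chromaticity", "diadicity", "triadicity",
          "octatonicity", "diatonicity", "whole-tone"]

def dft_qualia(pc_vector):
    """Magnitudes of DFT coefficients 1-6 of a 12-d pitch-class vector,
    labelled with the quality each coefficient is conventionally linked to."""
    mags = np.abs(np.fft.fft(np.asarray(pc_vector, float)))[1:7]
    return dict(zip(QUALIA, mags))

# A C major triad {0, 4, 7} is most salient in the 3rd coefficient.
triad = np.zeros(12)
triad[[0, 4, 7]] = 1
q = dft_qualia(triad)
print(max(q, key=q.get))  # → triadicity
```

Computing these six magnitudes over windows of every size and position is what yields the hierarchical picture that a wavescape then renders as colour.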