Publications

Publications by Gilberto Bernardes Almeida

2023

Are words enough? On the semantic conditioning of affective music generation

Authors
Forero, Jorge; Bernardes, Gilberto; Mendes, Mónica;

Publication

Abstract
https://aimc2023.pubpub.org/pub/9z68g7d2
Music has been commonly recognized as a means of expressing emotions. In this sense, an intense debate emerges from the need to verbalize musical emotions. This concern seems highly relevant today, considering the exponential growth of natural language processing using deep learning models where it is possible to prompt semantic propositions to generate music automatically. This scoping review aims to analyze and discuss the possibilities of music generation conditioned by emotions. To address this topic, we propose a historical perspective that encompasses the different disciplines and methods contributing to this topic. In detail, we review two main paradigms adopted in automatic music generation: rules-based and machine-learning models. Of note are the deep learning architectures that aim to generate high-fidelity music from textual descriptions. These models raise fundamental questions about the expressivity of music, including whether emotions can be represented with words or expressed through them. We conclude that, by overcoming the limitation and ambiguity of language to express emotions through music, the use of deep learning with natural language has the potential to impact the creative industries by providing powerful tools to prompt and generate new musical works.

2023

Exploring Latent Spaces of Tonal Music using Variational Autoencoders

Authors
Carvalho, Nádia; Bernardes, Gilberto;

Publication

Abstract
https://aimc2023.pubpub.org/pub/latent-spaces-tonal-music
Variational Autoencoders (VAEs) have proven to be effective models for producing latent representations of cognitive and semantic value. We assess the degree to which VAEs trained on a prototypical tonal music corpus of 371 Bach chorales define latent spaces representative of the circle of fifths and the hierarchical relation of each key component pitch as drawn in music cognition. In detail, we compare the latent space of different VAE corpus encodings (Piano roll, MIDI, ABC, Tonnetz, DFT of pitch, and pitch class distributions) in providing a pitch space for key relations that align with cognitive distances. We evaluate the model performance of these encodings using objective metrics to capture accuracy, mean square error (MSE), KL-divergence, and computational cost. The ABC encoding performs the best in reconstructing the original data, while the Pitch DFT seems to capture more information from the latent space. Furthermore, an objective evaluation of 12 major or minor transpositions per piece is adopted to quantify the alignment of 1) intra- and inter-segment distances per key and 2) the key distances to cognitive pitch spaces. Our results show that Pitch DFT VAE latent spaces align best with cognitive spaces and provide a common-tone space where overlapping objects within a key are fuzzy clusters, which impose a well-defined order of structural significance or stability, i.e., a tonal hierarchy. Tonal hierarchies of different keys can be used to measure key distances and the relationships of their in-key components at multiple hierarchies (e.g., notes and chords). The implementation of our VAE and the encodings framework are made available online.
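As a rough illustration of two of the metrics named in this abstract, the sketch below computes a mean square error and a KL-divergence term in plain Python. It assumes a diagonal Gaussian posterior measured against a standard normal prior, which is the common VAE setup but is not stated in the abstract. The function names are hypothetical and are not taken from the authors' implementation.

```python
import math


def mse(x, x_hat):
    """Mean square error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    Assumption: diagonal Gaussian posterior and standard normal prior.
    """
    return -0.5 * sum(
        1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


# A posterior identical to the prior has zero divergence.
zero_kl = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
# Shifting one mean by 1 (unit variance) costs 0.5 nats.
shifted_kl = kl_to_standard_normal([1.0], [0.0])
reconstruction_error = mse([1.0, 2.0], [1.0, 4.0])
```

In a typical VAE the two terms are summed, often with a weight on the KL term. Whether and how the paper weights them is not given here.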

2016

Relational interactive art: A framework for interaction in a social context

Authors
Cabrita N.; Bernardes G.;

Publication
5th Joint Symposium on Computational Aesthetics, Sketch-Based Interfaces and Modeling, and Non-Photorealistic Animation and Rendering (Expressive 2016): Posters, Artworks, and Bridging Papers Proceedings

Abstract
Interactive art implies an active dialogue between the participant and the surrounding space, mediated by a computational system. Reciprocity and recursiveness are key principles to the bidirectional flux of information in this setting, guaranteeing a continuous interaction loop between the participant and the digital system. Viewing the human body as a natural interface, we focus on non-invasive tracking methods for embodiment sensing, such as infra-red depth cameras. Current limitations in participant engagement of interactive artworks in public spaces are introduced and analyzed from the perspective of group dynamics. In this paper we approach Bourriaud's concept of relational aesthetics, relate it to the inherent social context of interactive artwork exhibition, and propose a framework for the development of relational interactive artworks.

2025

Qualia Motion in Fourier Space: Formalizing Linear, Nondirected and Contrapuntal Ambiguity in Schoenberg's Op. 19, No. 1

Authors
Pereira, S; Bernardes, G; Martins, JO;

Publication
Music Theory Spectrum

Abstract
In this article, we formalize and analyze qualia motion, i.e., the process by which a composition transitions across distinct harmonic qualities through the Fourier qualia space (FQS), a multidimensional and transposition-independent space based on the discrete Fourier transform (DFT) coefficients’ magnitude. In the FQS, the plot of set classes relies on their harmonic qualities (such as diatonicity and octatonicity), enabling us to (1) identify the pitch-class set in a musical phrase that best represents its qualia, a reference sonority; (2) define a harmonic progression using all sequential reference sonorities in a piece; (3) visualize the trajectory in the space; and (4) establish a statistical metric for the ambiguity of harmonic qualia. Finally, we discuss Schoenberg's Op. 19, No. 1, analyzing the sense of its harmonic path. The proposed space leverages a bipartite, symmetrical, and consequential structure and unveils ambiguity as an element of nondirected linearity and counterpoint.
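The transposition independence mentioned in this abstract can be checked with a minimal sketch. It assumes the standard construction in which a pitch-class set is treated as a 12-element characteristic vector and only the magnitudes of its DFT coefficients are kept. The function name `dft_magnitudes` is hypothetical, and this is not the authors' implementation of the FQS.

```python
import cmath
import math


def dft_magnitudes(pitch_classes):
    """Return |F_k| for k = 0..6 of a pitch-class set's characteristic vector.

    Coefficients 7..11 mirror 5..1 for real input, so they are omitted.
    """
    pcs = {pc % 12 for pc in pitch_classes}
    magnitudes = []
    for k in range(7):
        coeff = sum(cmath.exp(-2j * math.pi * k * pc / 12) for pc in pcs)
        magnitudes.append(abs(coeff))
    return magnitudes


c_major = [0, 2, 4, 5, 7, 9, 11]
d_major = [(pc + 2) % 12 for pc in c_major]  # same set class, transposed

c_mags = dft_magnitudes(c_major)
d_mags = dft_magnitudes(d_major)
```

Transposition only rotates the phase of each coefficient, so `c_mags` and `d_mags` agree up to floating-point error. For the diatonic collection the fifth coefficient has the largest magnitude among k = 1..6, which is why that coefficient is commonly read as a measure of diatonicity.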

2025

Motiv: A Dataset of Latent Space Representations of Musical Phrase Motions

Authors
Carvalho, N; Sousa, J; Bernardes, G; Portovedo, H;

Publication
Proceedings of the 20th International Audio Mostly Conference

Abstract
This paper introduces Motiv, a dataset of expert saxophonist recordings illustrating parallel, similar, oblique, and contrary motions. These motions are variations of three phrases from Jesús Villa-Rojo's "Lamento," with controlled similarities. The dataset includes 116 audio samples recorded by four tenor saxophonists, each annotated with descriptions of motions, musical scores, and latent space vectors generated using the VocalSet RAVE model. Motiv enables the analysis of motion types and their geometric relationships in latent spaces. Our preliminary dataset analysis shows that parallel motions align closely with original phrases, while contrary motions exhibit the largest deviations, and oblique motions show mixed patterns. The dataset also highlights the impact of individual performer nuances. Motiv supports a variety of music information retrieval (MIR) tasks, including gesture-based recognition, performance analysis, and motion-driven retrieval. It also provides insights into the relationship between human motion and music, contributing to real-time music interaction and automated performance systems. © 2025 Copyright held by the owner/author(s).

2024

Unveiling the Timbre Landscape: A Layered Analysis of Tenor Saxophone in RAVE Models

Authors
Carvalho, N; Sousa, J; Bernardes, G; Portovedo, H;

Publication
Proceedings of the Sound and Music Computing Conferences

Abstract
This paper presents a comprehensive investigation into the explainability and creative affordances derived from navigating a latent space generated by Realtime Audio Variational AutoEncoder (RAVE) models. We delve into the intricate layers of the RAVE model's encoder and decoder outputs by leveraging a novel timbre latent space that captures micro-timbral variations from a wide range of saxophone extended techniques. Our analysis dissects each layer's output independently, shedding light on the distinct transformations and representations occurring at different stages of the encoding and decoding processes and their sensitivity to a spectrum of low-to-high-level musical attributes. Remarkably, our findings reveal consistent patterns across various models, with the first layer consistently capturing changes in dynamics while remaining insensitive to pitch or register alterations. By meticulously examining and comparing layer outputs, we elucidate the underlying mechanisms governing saxophone timbre representation within the RAVE framework. These insights not only deepen our understanding of neural network behavior but also offer valuable contributions to the broader fields of music informatics and audio signal processing, ultimately enhancing the degree of transparency and control in co-creative practices within deep learning music frameworks. © 2024. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original.
