
Publications by Gilberto Bernardes Almeida

2023

Challenging Beat Tracking: Tackling Polyrhythm, Polymetre, and Polytempo with Human-in-the-Loop Adaptation

Authors
Pinto, AS; Bernardes, G; Davies, MEP;

Publication
Music and Sound Generation in the AI Era - 16th International Symposium, CMMR 2023, Tokyo, Japan, November 13-17, 2023, Revised Selected Papers

Abstract
Deep-learning beat-tracking algorithms have achieved remarkable accuracy in recent years. However, despite these advancements, challenges persist with musical examples featuring complex rhythmic structures, especially given their under-representation in training corpora. Expanding on our prior work, this paper demonstrates how our user-centred beat-tracking methodology effectively handles increasingly demanding musical scenarios. We evaluate its adaptability and robustness through musical pieces that exhibit rhythmic dissonance, while maintaining ease of integration with leading methods through minimal user annotations. The selected musical works (Uruguayan Candombe, Colombian Bambuco, and Steve Reich's Piano Phase) present escalating levels of rhythmic complexity through their respective polyrhythm, polymetre, and polytempo characteristics. These examples not only validate our method's effectiveness but also demonstrate its capability across increasingly challenging scenarios, culminating in the novel application of beat tracking to polytempo contexts. The results show notable improvements in F-measure, ranging from 2 to 5 times the state-of-the-art performance. The beat annotations used in fine-tuning reduce the number of correction edit operations by a factor of 1.4 to 2.8, while reducing the global annotation effort to between 16% and 37% of the baseline approach. Our experiments demonstrate the broad applicability of our human-in-the-loop strategy in the domain of Computational Ethnomusicology, confronting the prevalent Music Information Retrieval (MIR) constraints found in non-Western musical scenarios. Beyond beat tracking and computational rhythm analysis, this user-driven adaptation framework suggests wider implications for various MIR technologies, particularly in scenarios where musical signal ambiguity and human subjectivity challenge conventional algorithms.

2025

A Tripartite Framework for Immersive Music Production: Concepts and Methodologies

Authors
Barboza, José Ricardo; Bernardes, Gilberto; Magalhães, Eduardo;

Publication
2025 Immersive and 3D Audio: from Architecture to Automotive (I3DA)

Abstract

2025

Semantic and Spatial Sound-Object Recognition for Assistive Navigation

Authors
Gea, Daniel; Bernardes, Gilberto;

Publication

Abstract
Building on theories of human sound perception and spatial cognition, this paper introduces a sonification method that facilitates navigation by auditory cues. These cues help users recognize objects and key urban architectural elements, encoding their semantic and spatial properties using non-speech audio signals. The study reviews advances in object detection and sonification methodologies, proposing a novel approach that maps semantic properties (i.e., material, width, interaction level) to timbre, pitch, and gain modulation, and spatial properties (i.e., distance, position, elevation) to gain, panning, and melodic sequences. We adopt a three-phase methodology to validate our method. First, we selected sounds to represent the objects' materials based on the acoustic properties of crowdsourced annotated samples. Second, we conducted an online perceptual experiment to evaluate intuitive mappings between sounds and object semantic attributes. Finally, in-person navigation experiments were conducted in virtual reality to assess semantic and spatial recognition. The results demonstrate a notable perceptual differentiation between materials, with a global accuracy of .69 ± .13 and a mean navigation accuracy of .73 ± .16, highlighting the method's effectiveness. Furthermore, the results suggest a need for improved associations between sounds and objects and reveal demographic factors that influence the perception of sounds.
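As a rough illustration of the kind of mapping the abstract describes, the sketch below encodes an object's semantic properties as timbre, pitch, and gain modulation, and its spatial properties as gain, panning, and a melodic interval. All parameter names, ranges, and scaling choices are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical semantic/spatial-to-audio mapping, loosely following the
# dimensions named in the abstract; values and scalings are assumptions.
from dataclasses import dataclass

@dataclass
class SoundObject:
    material: str        # e.g. "wood", "metal", "glass"
    width_m: float       # physical width in metres
    interaction: int     # 0 = static, 1 = low, 2 = high interaction level
    distance_m: float    # distance from the listener in metres
    azimuth_deg: float   # -90 (left) .. +90 (right)
    elevation_m: float   # height relative to the listener

# Semantic properties -> timbre, pitch, gain modulation
MATERIAL_TIMBRE = {"wood": "marimba", "metal": "bell", "glass": "sine_glass"}

def semantic_cues(obj: SoundObject) -> dict:
    return {
        "timbre": MATERIAL_TIMBRE.get(obj.material, "noise"),
        # wider objects -> lower pitch (assumed inverse mapping)
        "pitch_hz": max(110.0, 880.0 / (1.0 + obj.width_m)),
        # higher interaction level -> faster amplitude modulation
        "gain_mod_hz": 0.5 + 2.0 * obj.interaction,
    }

# Spatial properties -> gain, panning, melodic sequence
def spatial_cues(obj: SoundObject) -> dict:
    return {
        # simple inverse-distance attenuation
        "gain": 1.0 / (1.0 + obj.distance_m),
        # azimuth mapped to stereo pan in [-1, 1]
        "pan": max(-1.0, min(1.0, obj.azimuth_deg / 90.0)),
        # elevation encoded as a short melodic step, in semitones
        "melody_interval_semitones": int(round(obj.elevation_m * 2)),
    }

if __name__ == "__main__":
    bench = SoundObject("wood", 1.2, 1, 3.0, -30.0, 0.5)
    print(semantic_cues(bench))
    print(spatial_cues(bench))
```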

2025

A Scoping Review of Emerging AI Technologies in Mental Health Care: Towards Personalized Music Therapy

Authors
Santos, Natália; Bernardes, Gilberto;

Publication

Abstract
Music therapy has emerged as a promising approach to support various mental health conditions, offering non-pharmacological therapies with evidence of improved well-being. Rapid advancements in artificial intelligence (AI) have recently opened new possibilities for ‘personalized’ musical interventions in mental health care. This article explores the application of AI in the context of mental health, focusing on the use of machine learning (ML), deep learning (DL), and generative music (GM) to personalize musical interventions. The methodology included a scoping review in the Scopus and PubMed databases, using keywords denoting emerging AI technologies, music-related contexts, and application domains within mental health and well-being. Identified research lines encompass the analysis and generation of emotional patterns in music using ML, DL, and GM techniques to create musical experiences adapted to user needs. The results highlight that these technologies effectively promote emotional and cognitive well-being, enabling personalized interventions that expand mental health therapies.

2024

Unveiling the Timbre Landscape: A Layered Analysis of Tenor Saxophone in RAVE Models

Authors
Carvalho, N; Sousa, J; Bernardes, G; Portovedo, H;

Publication
Proceedings of the Sound and Music Computing Conferences

Abstract
This paper presents a comprehensive investigation into the explainability and creative affordances derived from navigating a latent space generated by Realtime Audio Variational AutoEncoder (RAVE) models. We delve into the intricate layers of the RAVE model's encoder and decoder outputs by leveraging a novel timbre latent space that captures micro-timbral variations from a wide range of saxophone extended techniques. Our analysis dissects each layer's output independently, shedding light on the distinct transformations and representations occurring at different stages of the encoding and decoding processes and their sensitivity to a spectrum of low-to-high-level musical attributes. Remarkably, our findings reveal consistent patterns across various models, with the first layer consistently capturing changes in dynamics while remaining insensitive to pitch or register alterations. By meticulously examining and comparing layer outputs, we elucidate the underlying mechanisms governing saxophone timbre representation within the RAVE framework. These insights not only deepen our understanding of neural network behavior but also offer valuable contributions to the broader fields of music informatics and audio signal processing, ultimately enhancing the degree of transparency and control in co-creative practices within deep learning music frameworks.

2025

Explicit Tonal Tension Conditioning via Dual-Level Beam Search for Symbolic Music Generation

Authors
Ebrahimzadeh, Maral; Bernardes, Gilberto; Stober, Sebastian;

Publication

Abstract
State-of-the-art symbolic music generation models have recently achieved remarkable output quality, yet explicit control over compositional features, such as tonal tension, remains challenging. We propose a novel approach that integrates a computational tonal tension model, based on tonal interval vector analysis, into a Transformer framework. Our method employs a two-level beam search strategy during inference. At the token level, generated candidates are re-ranked using model probability and diversity metrics to maintain overall quality. At the bar level, a tension-based re-ranking is applied to ensure that the generated music aligns with a desired tension curve. Objective evaluations indicate that our approach effectively modulates tonal tension, and subjective listening tests confirm that the system produces outputs that align with the target tension. These results demonstrate that explicit tension conditioning through a dual-level beam search provides a powerful and intuitive tool to guide AI-generated music. Furthermore, our experiments demonstrate that our method can generate multiple distinct musical interpretations under the same tension condition.
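To make the dual-level re-ranking concrete, the sketch below illustrates the control flow of token-level re-ranking by model probability plus a diversity bonus, followed by bar-level re-ranking against a target tension value. The model interface, tension estimator, and scoring weights are placeholder assumptions rather than the paper's actual implementation.

```python
# Minimal sketch of dual-level (token / bar) beam-search re-ranking.
# The model API, tension estimator, and weights are illustrative assumptions.
import math
import random

def token_score(logprob, candidate, beam, w_div=0.1):
    """Rank a candidate by model probability plus a simple diversity bonus
    (fraction of its tokens not shared with the other beam items)."""
    others = {tok for seq, _ in beam for tok in seq}
    diversity = sum(1 for t in candidate if t not in others) / max(len(candidate), 1)
    return logprob + w_div * diversity

def bar_score(bar_tokens, target_tension, estimate_tension):
    """Rank a finished bar by closeness of its estimated tension to the target."""
    return -abs(estimate_tension(bar_tokens) - target_tension)

def generate_bar(model, context, target_tension, estimate_tension,
                 beam_width=4, bar_length=16):
    # Token level: keep the top candidates by probability + diversity.
    beams = [(list(context), 0.0)]
    for _ in range(bar_length):
        expanded = []
        for seq, lp in beams:
            for tok, tok_lp in model.top_k(seq, k=beam_width):  # assumed API
                expanded.append((seq + [tok], lp + tok_lp))
        expanded.sort(key=lambda c: token_score(c[1], c[0], beams), reverse=True)
        beams = expanded[:beam_width]
    # Bar level: re-rank completed bars by distance to the target tension.
    beams.sort(key=lambda c: bar_score(c[0][len(context):], target_tension,
                                       estimate_tension), reverse=True)
    return beams[0][0]

class DummyModel:
    """Stand-in generator returning random tokens with uniform log-probability."""
    VOCAB = list(range(12))
    def top_k(self, seq, k):
        toks = random.sample(self.VOCAB, k)
        return [(t, math.log(1.0 / len(self.VOCAB))) for t in toks]

if __name__ == "__main__":
    bar = generate_bar(DummyModel(), context=[0], target_tension=0.6,
                       estimate_tension=lambda toks: sum(toks) / (11 * len(toks)))
    print(bar)
```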
