2024
Authors
Cao, Z; Pinto, S; Bernardes, G;
Publication
Proceedings of the Sound and Music Computing Conferences
Abstract
This paper presents BiSAID, a dataset for exploring bipolar semantic adjectives in non-speech auditory cues, including earcons and auditory icons, i.e., sounds used to signify specific events or relay information in auditory interfaces from synthetic or recorded sources, respectively. In total, our dataset includes 599 non-speech auditory cues with different semantic labels, covering temperature (cold vs. warm), brightness (bright vs. dark), sharpness (sharp vs. dull), shape (curved vs. flat), and accuracy (correct vs. incorrect). Furthermore, we present a preliminary analysis of brightness and accuracy earcon pairs from the BiSAID dataset to infer the idiosyncratic sonic structures of each semantic earcon label from 66 instantaneous low- and mid-level descriptors spanning temporal, spectral, rhythmic, and tonal attributes. Ultimately, we aim to unveil the relationships among the sonic parameters behind earcon design, thus systematizing their structural foundations and shedding light on the metaphorical semantic nature of their description. This exploration revealed that spectral characteristics (e.g., spectral flux and spectral complexity) serve as the most relevant acoustic correlates in differentiating earcons along the dimensions of brightness and accuracy, respectively. The methodology holds great promise for systematizing earcon design and generating hypotheses for in-depth perceptual studies. © 2024. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
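For readers who want to reproduce this kind of descriptor analysis, the minimal Python sketch below computes two frame-wise spectral descriptors (spectral centroid and a hand-rolled spectral flux) for a pair of earcons. It assumes librosa is available and uses hypothetical file names (bright.wav, dark.wav); it is not the authors' 66-descriptor toolchain, only an illustration of the approach.

```python
# Illustrative sketch, not the BiSAID analysis pipeline.
# Assumes librosa is installed and bright.wav / dark.wav are placeholder earcon files.
import numpy as np
import librosa

def spectral_descriptors(path, sr=44100, n_fft=2048, hop=512):
    y, sr = librosa.load(path, sr=sr, mono=True)
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    centroid = librosa.feature.spectral_centroid(S=S, sr=sr)[0]
    # Spectral flux computed by hand as the positive frame-to-frame spectral change.
    flux = np.sqrt(np.sum(np.square(np.maximum(np.diff(S, axis=1), 0.0)), axis=0))
    return {"centroid_mean": float(centroid.mean()), "flux_mean": float(flux.mean())}

for label in ("bright", "dark"):
    print(label, spectral_descriptors(f"{label}.wav"))
```

Comparing the summary statistics across the two labels mimics, at a very small scale, the kind of acoustic-correlate comparison reported in the abstract.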
2023
Authors
Pinto, AS; Bernardes, G; Davies, MEP;
Publication
Music and Sound Generation in the AI Era - 16th International Symposium, CMMR 2023, Tokyo, Japan, November 13-17, 2023, Revised Selected Papers
Abstract
Deep-learning beat-tracking algorithms have achieved remarkable accuracy in recent years. However, despite these advancements, challenges persist with musical examples featuring complex rhythmic structures, especially given their under-representation in training corpora. Expanding on our prior work, this paper demonstrates how our user-centred beat-tracking methodology effectively handles increasingly demanding musical scenarios. We evaluate its adaptability and robustness on musical pieces that exhibit rhythmic dissonance, while maintaining ease of integration with leading methods through minimal user annotations. The selected musical works (Uruguayan Candombe, Colombian Bambuco, and Steve Reich's Piano Phase) present escalating levels of rhythmic complexity through their respective polyrhythmic, polymetric, and polytempo characteristics. These examples not only validate our method's effectiveness but also demonstrate its capability across increasingly challenging scenarios, culminating in the novel application of beat tracking to polytempo contexts. The results show notable improvements in F-measure, ranging from 2 to 5 times state-of-the-art performance. The beat annotations used in fine-tuning reduce the number of correction edit operations by a factor of 1.4 to 2.8, while lowering the global annotation effort to between 16% and 37% of the baseline approach. Our experiments demonstrate the broad applicability of our human-in-the-loop strategy in the domain of Computational Ethnomusicology, confronting the prevalent Music Information Retrieval (MIR) constraints found in non-Western musical scenarios. Beyond beat tracking and computational rhythm analysis, this user-driven adaptation framework suggests wider implications for various MIR technologies, particularly in scenarios where musical signal ambiguity and human subjectivity challenge conventional algorithms. © 2025 Elsevier B.V. All rights reserved.
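As an illustration of the evaluation metric reported above, the sketch below computes the beat-tracking F-measure with mir_eval. The annotation file names are hypothetical placeholders, and the snippet does not reproduce the authors' annotation-driven fine-tuning pipeline, only the scoring step.

```python
# Minimal sketch of F-measure scoring for beat tracking, assuming mir_eval is installed
# and that each text file holds one beat time (in seconds) per line (placeholder names).
import numpy as np
import mir_eval

reference_beats = np.loadtxt("candombe_reference.txt")   # hypothetical ground-truth beats
estimated_beats = np.loadtxt("candombe_estimated.txt")   # hypothetical tracker output

# mir_eval conventionally trims beats in the first 5 seconds before scoring.
f = mir_eval.beat.f_measure(mir_eval.beat.trim_beats(reference_beats),
                            mir_eval.beat.trim_beats(estimated_beats))
print(f"Beat-tracking F-measure: {f:.3f}")
```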
2025
Authors
Barboza, JR; Bernardes, G; Magalhães, E;
Publication
2025 Immersive and 3D Audio: from Architecture to Automotive (I3DA)
Abstract
Music production has long been characterized by well-defined concepts and techniques. However, a notable gap exists in applying these established principles to music production for immersive media. This paper addresses this gap by examining post-production processes applied to three case studies, i.e., three songs with distinct instrumental features and narratives. The primary objective is to facilitate an in-depth analysis of the technical and artistic challenges of music production for immersive media. From a detailed analysis of technical and artistic post-production decisions in the three case studies and a critical examination of theories and techniques from sound design and music production, we propose a framework with a tripartite mixing categorization for immersive media: Traditional Production, Expanded Traditional Production, and Nontraditional Production. These concepts expand music production methodologies in the context of immersive media, offering a framework for understanding the complexities of spatial audio. By exploring these interdisciplinary connections, we aim to enrich the discourse surrounding music production, extending its conceptual scope toward more integrative media practices beyond the core music production paradigm, thus contributing to the development of innovative production methodologies. © 2025 IEEE.
2025
Authors
Gea, Daniel; Bernardes, Gilberto;
Publication
Abstract
Building on theories of human sound perception and spatial cognition, this paper introduces a sonification method that facilitates navigation by auditory cues. These cues help users recognize objects and key urban architectural elements, encoding their semantic and spatial properties using non-speech audio signals. The study reviews advances in object detection and sonification methodologies, proposing a novel approach that maps semantic properties (i.e., material, width, interaction level) to timbre, pitch, and gain modulation and spatial properties (i.e., distance, position, elevation) to gain, panning, and melodic sequences. We adopt a three-phase methodology to validate our method. First, we selected sounds to represent the objects' materials based on the acoustic properties of crowdsourced annotated samples. Second, we conducted an online perceptual experiment to evaluate intuitive mappings between sounds and object semantic attributes. Finally, in-person navigation experiments were conducted in virtual reality to assess semantic and spatial recognition. The results demonstrate a notable perceptual differentiation between materials, with a global accuracy of 0.69 ± 0.13 and a mean navigation accuracy of 0.73 ± 0.16, highlighting the method's effectiveness. Furthermore, the results suggest a need for improved associations between sounds and objects and reveal demographic factors that influence sound perception.
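A minimal sketch of the kind of spatial-to-audio mapping described above (distance to gain, azimuth to stereo panning) is given below. The function name, value ranges, and linear attenuation law are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of a spatial-to-audio mapping (hypothetical parameters and ranges).
def spatial_to_audio(distance_m, azimuth_deg, max_distance_m=10.0):
    """Return (gain, pan): gain attenuated linearly with distance, pan in [-1, 1]."""
    gain = max(0.0, 1.0 - distance_m / max_distance_m)   # closer objects sound louder
    pan = max(-1.0, min(1.0, azimuth_deg / 90.0))         # left (-1) to right (+1)
    return gain, pan

print(spatial_to_audio(distance_m=2.5, azimuth_deg=-45.0))  # -> (0.75, -0.5)
```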
2025
Authors
Santos, Natália; Bernardes, Gilberto;
Publication
Abstract
Music therapy has emerged as a promising approach to support various mental health conditions, offering non-pharmacological therapies with evidence of improved well-being. Rapid advancements in artificial intelligence (AI) have recently opened new possibilities for ‘personalized’ musical interventions in mental health care. This article explores the application of AI in the context of mental health, focusing on the use of machine learning (ML), deep learning (DL), and generative music (GM) to personalize musical interventions. The methodology included a scoping review in the Scopus and PubMed databases, using keywords denoting emerging AI technologies, music-related contexts, and application domains within mental health and well-being. Identified research lines encompass the analysis and generation of emotional patterns in music using ML, DL, and GM techniques to create musical experiences adapted to user needs. The results highlight that these technologies effectively promote emotional and cognitive well-being, enabling personalized interventions that expand mental health therapies.
2025
Authors
Braga, F; Bernardes, G; Dannenberg, RB; Correia, N;
Publication
PROCEEDINGS OF THE THIRTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI 2025)
Abstract
This paper describes an approach to algorithmic music composition that takes narrative structures as input, allowing composers to create music directly from narrative elements. Creating narrative development in music remains a challenging task in algorithmic composition. Our system addresses this by combining leitmotifs to represent characters, generative grammars for harmonic coherence, and evolutionary algorithms to align musical tension with narrative progression. The system operates at different scales, from overall plot structure to individual motifs, enabling both autonomous composition and co-creation with varying degrees of user control. Evaluation with compositions based on tales demonstrated the system's ability to compose music that supports narrative listening and aligns with its source narratives, while being perceived as familiar and enjoyable.
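To make the tension-alignment idea concrete, the sketch below shows a toy fitness function that an evolutionary algorithm could maximize so that a candidate piece's tension curve follows a narrative tension curve. The curves, resampling strategy, and function are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical fitness for narrative-tension alignment (illustrative values only).
import numpy as np

def tension_fitness(musical_tension, narrative_tension):
    """Higher is better: negative mean absolute distance between the two curves."""
    m = np.interp(np.linspace(0, 1, len(narrative_tension)),
                  np.linspace(0, 1, len(musical_tension)), musical_tension)
    return -float(np.mean(np.abs(m - np.asarray(narrative_tension))))

narrative = [0.1, 0.3, 0.6, 0.9, 0.4]          # rising action, climax, resolution
candidate = [0.2, 0.2, 0.5, 0.8, 0.5]          # tension curve of a candidate piece
print(tension_fitness(candidate, narrative))   # values closer to 0 mean better alignment
```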