2023
Authors
Silva, DTE; Cruz, R; Goncalves, T; Carneiro, D;
Publication
FIFTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION, ICMV 2022
Abstract
Semantic segmentation consists of classifying each pixel according to a set of classes. This process is particularly slow for high-resolution images, which are present in many applications, ranging from biomedicine to the automotive industry. In this work, we propose a two-stage algorithm for segmenting high-resolution images. In stage 1, a lower-resolution interpolation of the image is fed to a first neural network, whose low-resolution output is resized to the original resolution. In stage 2, the probabilities resulting from stage 1 are divided into contiguous patches, and the less confident ones are collected and refined by a second neural network. The main novelty of this algorithm is the aggregation of the low-resolution result from stage 1 with the high-resolution patches from stage 2. We use the U-Net architecture for segmentation and evaluate the method on six databases. Our method achieves Dice coefficients similar to the baseline while requiring fewer arithmetic operations.
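A minimal sketch of the two-stage pipeline described in this abstract, assuming PyTorch and two hypothetical pre-trained U-Nets, `unet_lowres` (stage 1) and `unet_refine` (stage 2, taking the image patch concatenated with the stage-1 probabilities); the names, sizes, and threshold are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def segment_two_stage(image, unet_lowres, unet_refine,
                      low_res=(256, 256), patch=128, tau=0.9):
    """image: (1, C, H, W) tensor; returns per-pixel class probabilities."""
    _, _, H, W = image.shape

    # Stage 1: segment a low-resolution interpolation of the image and resize
    # the resulting probabilities back to the original resolution.
    small = F.interpolate(image, size=low_res, mode="bilinear", align_corners=False)
    probs = F.interpolate(unet_lowres(small).softmax(dim=1), size=(H, W),
                          mode="bilinear", align_corners=False)

    # Stage 2: split the stage-1 probabilities into contiguous patches and
    # refine only the least confident ones, aggregating the low-resolution
    # probabilities with the corresponding high-resolution image patch.
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            p = probs[:, :, y:y + patch, x:x + patch]
            confidence = p.max(dim=1).values.mean()   # mean top-class probability
            if confidence < tau:                      # low-confidence patch
                hi = image[:, :, y:y + patch, x:x + patch]
                refined = unet_refine(torch.cat([hi, p], dim=1)).softmax(dim=1)
                probs[:, :, y:y + patch, x:x + patch] = refined
    return probs
```

Only low-confidence patches invoke the second network, which is where the reduction in arithmetic operations relative to full-resolution segmentation would come from.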
2023
Authors
Serrano e Silva, P; Cruz, R; Shihavuddin, ASM; Gonçalves, T;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
2023
Authors
Bernardo, G; Bernardes, G;
Publication
Pers. Ubiquitous Comput.
Abstract
2023
Authors
Forero, J; Bernardes, G; Mendes, M;
Publication
AIMC
Abstract
https://aimc2023.pubpub.org/pub/9z68g7d2 Music has been commonly recognized as a means of expressing emotions. In this sense, an intense debate emerges from the need to verbalize musical emotions. This concern is highly relevant today, considering the exponential growth of natural language processing with deep learning models, where semantic propositions can be prompted to generate music automatically. This scoping review aims to analyze and discuss the possibilities of music generation conditioned by emotions. To address this topic, we propose a historical perspective that encompasses the different disciplines and methods contributing to it. In detail, we review two main paradigms adopted in automatic music generation: rule-based and machine-learning models. Of note are the deep learning architectures that aim to generate high-fidelity music from textual descriptions. These models raise fundamental questions about the expressivity of music, including whether emotions can be represented with words or expressed through them. We conclude that, despite the limitations and ambiguity of language in expressing emotions through music, the use of deep learning with natural language has the potential to impact the creative industries by providing powerful tools to prompt and generate new musical works.
2023
Authors
Carvalho, N; Bernardes, G;
Publication
AIMC
Abstract
https://aimc2023.pubpub.org/pub/latent-spaces-tonal-music Variational Autoencoders (VAEs) have proven to be effective models for producing latent representations of cognitive and semantic value. We assess the degree to which VAEs trained on a prototypical tonal music corpus of 371 Bach chorales define latent spaces representative of the circle of fifths and the hierarchical relation of each key component pitch as drawn in music cognition. In detail, we compare the latent spaces of different VAE corpus encodings (piano roll, MIDI, ABC, Tonnetz, DFT of pitch, and pitch class distributions) in providing a pitch space for key relations that aligns with cognitive distances. We evaluate the model performance of these encodings using objective metrics that capture accuracy, mean square error (MSE), KL divergence, and computational cost. The ABC encoding performs best in reconstructing the original data, while the pitch DFT seems to capture more information in the latent space. Furthermore, an objective evaluation of 12 major or minor transpositions per piece is adopted to quantify the alignment of 1) intra- and inter-segment distances per key and 2) the key distances to cognitive pitch spaces. Our results show that pitch DFT VAE latent spaces align best with cognitive spaces and provide a common-tone space in which overlapping objects within a key form fuzzy clusters that impose a well-defined order of structural significance or stability, i.e., a tonal hierarchy. Tonal hierarchies of different keys can be used to measure key distances and the relationships of their in-key components at multiple hierarchies (e.g., notes and chords). The implementation of our VAE and the encodings framework are made available online.
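A minimal sketch of the kind of VAE that could embed one of these corpus encodings (e.g., a 12-dimensional pitch-class distribution per segment) into a latent space; the layer sizes and class names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SmallVAE(nn.Module):
    def __init__(self, in_dim=12, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(64, latent_dim)   # log-variance of q(z|x)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, in_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    # Reconstruction error (MSE) plus KL divergence to the unit Gaussian prior,
    # the two quantities reported among the evaluation metrics above.
    mse = ((recon - x) ** 2).sum(dim=1).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    return mse + kl
```

Distances between the latent means of segments in different keys could then be compared against cognitive pitch-space distances, in the spirit of the evaluation the abstract describes.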
2023
Authors
Forero, J; Mendes, M; Bernardes, G;
Publication
KUI
Abstract
This study explores the development of intelligent affective virtual environments generated by bimodal emotion recognition techniques and multimodal feedback. A semantic and acoustic analysis predicts the emotions conveyed by spoken language, fostering an expressive and transparent control structure. Textual contents and emotional predictions are mapped to virtual environments in real locations as audiovisual feedback. To demonstrate the application of this system, we developed a case study titled "En train d'oublier," focusing on a train cemetery in Uyuni, Bolivia. The train cemetery holds historical significance as a site where abandoned trains symbolize the passage of time and the interaction between human activities and nature's reclamation. The space is transformed into an immersive and emotionally poetic experience through oral language and affective virtual environments that activate memories, as the system uses the transcribed text to synthesize images and modifies the musical output based on the predicted emotional states. The proposed bimodal emotion recognition techniques achieve 94% and 89% accuracy. The audiovisual mapping strategy accounts for divergence between the two predictions, generating an intended tension between the graphical and the musical representation. Using video and web art techniques, we experimented with the generated environments to create diverse poetic proposals. © 2023 ACM.
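An illustrative sketch of the bimodal idea: keep separate semantic (text) and acoustic (speech) emotion predictions, fuse them, and measure their divergence to drive the audiovisual mapping. The label set and the probability vectors are hypothetical stand-ins, not the authors' models or data.

```python
import numpy as np

EMOTIONS = ["joy", "sadness", "anger", "calm"]  # assumed label set for illustration

def fuse_predictions(p_text: np.ndarray, p_audio: np.ndarray, eps: float = 1e-12):
    """p_text, p_audio: probability vectors over EMOTIONS from each modality."""
    p_text = np.clip(p_text, eps, 1.0)
    p_audio = np.clip(p_audio, eps, 1.0)
    fused = 0.5 * (p_text + p_audio)          # simple average fusion
    # Jensen-Shannon divergence between modalities: a high value signals the
    # tension between text-driven visuals and audio-driven music described above.
    js = 0.5 * np.sum(p_text * np.log(p_text / fused)) \
       + 0.5 * np.sum(p_audio * np.log(p_audio / fused))
    return EMOTIONS[int(np.argmax(fused))], float(js)

label, divergence = fuse_predictions(np.array([0.7, 0.1, 0.1, 0.1]),
                                     np.array([0.2, 0.5, 0.2, 0.1]))
```

A large divergence could, for instance, push the synthesized images toward the text-predicted emotion while the music follows the audio-predicted one.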