Publications

Publications by CTM

2023

Two-Stage Semantic Segmentation in Neural Networks

Authors
Silva, DTE; Cruz, R; Goncalves, T; Carneiro, D;

Publication
FIFTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION, ICMV 2022

Abstract
Semantic segmentation consists of classifying each pixel according to a set of classes. This process is particularly slow for high-resolution images, which are present in many applications, ranging from biomedicine to the automotive industry. In this work, we propose a two-stage algorithm targeted at segmenting high-resolution images. In stage 1, a lower-resolution interpolation of the image is fed to a first neural network, whose low-resolution output is resized to the original resolution. In stage 2, the probabilities resulting from stage 1 are divided into contiguous patches, and the less confident ones are collected and refined by a second neural network. The main novelty of this algorithm is the aggregation of the low-resolution result from stage 1 with the high-resolution patches from stage 2. We use the U-Net architecture for segmentation, evaluated on six databases. Our method achieves results similar to the baseline in terms of the Dice coefficient, with fewer arithmetic operations.
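The stage-2 patch selection described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the patch size, confidence threshold, and function name are assumptions, and confidence is taken here as the mean per-pixel probability of the argmax class.

```python
import numpy as np

def select_uncertain_patches(probs, patch=64, threshold=0.8):
    # probs: (classes, H, W) softmax output, resized to full resolution
    conf = probs.max(axis=0)              # per-pixel confidence of the argmax class
    H, W = conf.shape
    coords = []
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            if conf[y:y + patch, x:x + patch].mean() < threshold:
                coords.append((y, x))     # this patch goes to the stage-2 network
    return coords

# toy 2-class map: confident left half, ambiguous right half
probs = np.full((2, 64, 128), 0.5)
probs[0, :, :64], probs[1, :, :64] = 0.95, 0.05
print(select_uncertain_patches(probs))    # [(0, 64)]
```

Only the low-confidence patches are re-segmented at full resolution, which is where the claimed savings in arithmetic operations would come from.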

2023

Interpretability-Guided Human Feedback During Neural Network Training

Authors
Serrano e Silva, P; Cruz, R; Shihavuddin, ASM; Gonçalves, T;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

2023

Leveraging compatibility and diversity in computer-aided music mashup creation

Authors
Bernardo, G; Bernardes, G;

Publication
Personal and Ubiquitous Computing

Abstract
We advance Mixmash-AIS, a multimodal optimization music mashup creation model for loop recombination at scale. Our motivation is to (1) tackle current scalability limitations in state-of-the-art (brute force) computational mashup models while enforcing (2) the compatibility of audio loops and (3) a pool of diverse mashups that can accommodate user preferences. To this end, we adopt the artificial immune system (AIS) opt-aiNet algorithm to efficiently compute a population of compatible and diverse music mashups from loop recombinations. Optimal mashups result from local minima in a feature space representing harmonic, rhythmic, and spectral musical audio compatibility. We objectively assess the compatibility, diversity, and computational performance of Mixmash-AIS-generated mashups compared to a standard genetic algorithm (GA) and a brute-force (BF) approach. Furthermore, we conducted a perceptual test to validate the ability of the Mixmash-AIS objective evaluation function to capture user enjoyment of the computer-generated loop mashups. Our results show that while the GA stands as the most efficient algorithm, the AIS opt-aiNet outperforms both the GA and BF approaches in terms of compatibility and diversity. Our listening test showed that the Mixmash-AIS objective evaluation function significantly captures the perceptual compatibility of loop mashups (p < .001).
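The opt-aiNet idea behind Mixmash-AIS (a population of cells cloned and mutated toward local minima, with similar cells suppressed so multiple optima survive) can be sketched generically. This is a toy 1-D cost landscape, not the paper's harmonic/rhythmic/spectral loop-compatibility space; every parameter and name below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(x):
    # stand-in multimodal "incompatibility" landscape with several local minima;
    # the real model scores harmonic, rhythmic, and spectral loop compatibility
    return np.sin(3 * x) + 0.1 * x ** 2

def opt_ainet(n=20, n_clones=10, beta=1.0, sigma_s=0.5, iters=200):
    pop = rng.uniform(-4, 4, n)
    for _ in range(iters):
        f = cost(pop)
        norm = (f - f.min()) / (np.ptp(f) + 1e-9)      # 0 = best, 1 = worst
        survivors = []
        for x, nf in zip(pop, norm):
            # clonal expansion: better cells mutate with smaller steps;
            # the parent is kept, so a cell never gets worse
            clones = np.append(x + rng.normal(0, np.exp(-beta * (1 - nf)), n_clones), x)
            survivors.append(clones[np.argmin(cost(clones))])
        pop = np.asarray(survivors)
        # network suppression: keep only the fittest cell per neighbourhood,
        # which is what preserves a *diverse* set of optima
        memory = []
        for i in np.argsort(cost(pop)):
            if all(abs(pop[i] - m) > sigma_s for m in memory):
                memory.append(pop[i])
        # refill with random newcomers so exploration continues
        pop = np.concatenate([memory, rng.uniform(-4, 4, n - len(memory))])
    return np.sort(memory)

minima = opt_ainet()  # several well-separated local minima, not just the global one
```

The suppression radius is what trades compatibility for diversity: a larger radius yields fewer, more distinct mashup candidates.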

2023

Are words enough? On the semantic conditioning of affective music generation

Authors
Forero, J; Bernardes, G; Mendes, M;

Publication
AIMC

Abstract
https://aimc2023.pubpub.org/pub/9z68g7d2

Music has been commonly recognized as a means of expressing emotions. In this sense, an intense debate emerges from the need to verbalize musical emotions. This concern seems highly relevant today, considering the exponential growth of natural language processing using deep learning models, where it is possible to prompt semantic propositions to generate music automatically. This scoping review aims to analyze and discuss the possibilities of music generation conditioned by emotions. To address this topic, we propose a historical perspective that encompasses the different disciplines and methods contributing to it. In detail, we review the two main paradigms adopted in automatic music generation: rule-based and machine-learning models. Of note are the deep learning architectures that aim to generate high-fidelity music from textual descriptions. These models raise fundamental questions about the expressivity of music, including whether emotions can be represented with words or expressed through them. We conclude that, by overcoming the limitations and ambiguity of language in expressing emotions, the use of deep learning with natural language has the potential to impact the creative industries by providing powerful tools to prompt and generate new musical works.

2023

Exploring Latent Spaces of Tonal Music using Variational Autoencoders

Authors
Carvalho, N; Bernardes, G;

Publication
AIMC

Abstract
https://aimc2023.pubpub.org/pub/latent-spaces-tonal-music

Variational Autoencoders (VAEs) have proven to be effective models for producing latent representations of cognitive and semantic value. We assess the degree to which VAEs trained on a prototypical tonal music corpus of 371 Bach chorales define latent spaces representative of the circle of fifths and the hierarchical relation of each key component pitch, as drawn in music cognition. In detail, we compare the latent spaces of different VAE corpus encodings — piano roll, MIDI, ABC, Tonnetz, DFT of pitch, and pitch-class distributions — in providing a pitch space for key relations that aligns with cognitive distances. We evaluate the model performance of these encodings using objective metrics that capture accuracy, mean squared error (MSE), KL-divergence, and computational cost. The ABC encoding performs best in reconstructing the original data, while the pitch DFT seems to capture more information from the latent space. Furthermore, an objective evaluation of 12 major or minor transpositions per piece is adopted to quantify the alignment of 1) intra- and inter-segment distances per key and 2) the key distances to cognitive pitch spaces. Our results show that pitch DFT VAE latent spaces align best with cognitive spaces and provide a common-tone space where overlapping objects within a key are fuzzy clusters, which impose a well-defined order of structural significance or stability — i.e., a tonal hierarchy. Tonal hierarchies of different keys can be used to measure key distances and the relationships of their in-key components at multiple hierarchies (e.g., notes and chords). The implementation of our VAE and the encoding framework is made available online.
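The pitch DFT encoding compared in this paper can be illustrated with a small sketch (our own toy example, not the authors' released code): take the discrete Fourier transform of a normalised 12-bin pitch-class distribution and keep coefficients 1-6. A transposition is a circular shift of the distribution, so it only rotates the phases, which is why such spaces relate keys naturally.

```python
import numpy as np

def pc_dft(pc_dist):
    # DFT coefficients 1..6 of a normalised 12-bin pitch-class distribution;
    # coefficient 5 is the component usually linked to the circle of fifths
    pc = np.asarray(pc_dist, dtype=float)
    return np.fft.fft(pc / pc.sum())[1:7]

c_major = np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1])  # C major scale set
g_major = np.roll(c_major, 7)                              # transposed up a fifth
fc, fg = pc_dft(c_major), pc_dft(g_major)

# transposition is a circular shift: phases rotate, but the magnitude
# spectrum (the "shape" shared by all major keys) is invariant
assert np.allclose(np.abs(fc), np.abs(fg))
```

Distances between keys can then be measured in this coefficient space, where equal-magnitude, phase-shifted vectors place related keys close together.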

2023

En train d'oublier: toward affective virtual environments

Authors
Forero, J; Mendes, M; Bernardes, G;

Publication
KUI

Abstract
This study explores the development of intelligent affective virtual environments generated by bimodal emotion recognition techniques and multimodal feedback. A semantic and acoustic analysis predicts emotions conveyed by spoken language, fostering an expressive and transparent control structure. Textual contents and emotional predictions are mapped to virtual environments in real locations as audiovisual feedback. To demonstrate the application of this system, we developed a case study titled "En train d'oublier," focusing on a train cemetery in Uyuni, Bolivia. The train cemetery holds historical significance as a site where abandoned trains symbolize the passage of time and the interaction between human activities and nature's reclamation. The space is transformed into an immersive and emotionally poetic experience through oral language and affective virtual environments that activate memories, as the system utilizes the transcribed text to synthesize images and modifies the musical output based on the predicted emotional states. The proposed bimodal emotion recognition techniques achieve 94% and 89% accuracy. The audiovisual mapping strategy allows for considering divergence in predictions, generating an intended tension between the graphical and the musical representation. Using video and web art techniques, we experimented with the generated environments to create diverse poetic proposals. © 2023 ACM.
