Pinto, AS; Böck, S; Cardoso, JS; Davies, MEP;
The extraction of the beat from musical audio signals represents a foundational task in the field of music information retrieval. While great advances in performance have been achieved due to the use of deep neural networks, significant shortcomings still remain. In particular, performance is generally much lower on musical content that differs from that which is contained in existing annotated datasets used for neural network training, as well as in the presence of challenging musical conditions such as rubato. In this paper, we positioned our approach to beat tracking from a real-world perspective where an end-user targets very high accuracy on specific music pieces and for which the current state of the art is not effective. To this end, we explored the use of targeted fine-tuning of a state-of-the-art deep neural network based on a very limited temporal region of annotated beat locations. We demonstrated the success of our approach via improved performance across existing annotated datasets and a new annotation-correction approach for evaluation. Furthermore, we highlighted the ability of content-specific fine-tuning to learn both what is and what is not the beat in challenging musical conditions. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
Pinto, AS; Davies, MEP;
Lecture Notes in Computer Science - Perception, Representations, Image, Sound, Music