2020
Authors
Silva, JP; Oliveira, MA; Cardoso, CF; Ferreira, AJ;
Publication
IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation
Abstract
In this paper, we present a computationally efficient and fully parametric harmonic speech model that is suitable for real-time flexible frame-based analysis and synthesis implementation in the frequency domain. We carry out a performance comparison between this vocoder and similar ones, such as WORLD and HPMD. Then, a deliberate manipulation of the speaker's fundamental frequency micro-variations is performed in order to understand in which way it conveys prosodic and idiosyncratic information. We conclude our discussion by evaluating the impact of these manipulations through the realization of perceptual tests. © 2020 IEEE.
2021
Authors
Silva, J; Oliveira, M; Ferreira, A;
Publication
28th European Signal Processing Conference (EUSIPCO 2020)
Abstract
Whispered-voice to normal-voice conversion is typically achieved using codec-based analysis and re-synthesis, using statistical conversion of important spectral and prosodic features, or using data-driven end-to-end signal conversion. These approaches are, however, highly constrained by the architecture of the codec, the statistical projection, or the size and quality of the training data. In this paper, we presume direct implantation of voiced phonemes in whispered speech and we focus on fully flexible parametric models that i) can be independently controlled, ii) synthesize natural and linguistically correct voiced phonemes, iii) preserve idiosyncratic characteristics of a given speaker, and iv) are amenable to co-articulation effects through simple model interpolation. We use natural spoken and sung vowels to illustrate these capabilities in a signal modeling and re-synthesis process where spectral magnitude, phase structure, F0 contour and sound morphing can be independently controlled in arbitrary ways.
2015
Authors
Ferreira, A; Sinha, D;
Publication
139th Audio Engineering Society International Convention, AES 2015
Abstract
Narrowband parametric speech coding and wideband audio coding represent opposite coding paradigms involving audible information, namely in terms of the specificity of the audio material, target bit rates, audio quality and application scenarios. In this paper, we explore a new avenue addressing parametric coding of wideband speech, using the potential and accuracy provided by frequency-domain signal analysis and modeling techniques that typically belong to the realm of high-quality audio coding. A first analysis-synthesis validation framework is described that illustrates the decomposition, parametric representation and synthesis of perceptually and linguistically relevant speech components while preserving naturalness and speaker-specific information.
2025
Authors
Ferreira, JS; Jesus, MT; Leal, LM; Spratley, JEF;
Publication
Journal of Voice
Abstract
This paper addresses two intertwined challenges that are key in informing signal processing methods for restoring natural (voiced) speech from whispered speech. The first challenge involves characterizing and modeling the evolution of the harmonic phase/magnitude structure of a sequence of individual pitch periods in a voiced region of natural speech comprising sustained or co-articulated vowels. A novel algorithm segmenting individual pitch pulses is proposed, which is then used to obtain illustrative results highlighting important differences between sustained and co-articulated vowels, and suggesting practical synthetic voicing approaches. The second challenge involves model-based synthetic voicing restoration in real time and on the fly. Three implementation alternatives are described that differ in their signal reconstruction approaches: frequency-domain, combined frequency- and time-domain, and physiologically inspired filtering of individually generated glottal excitation pulses. The three alternatives are compared objectively using illustrative examples, and subjectively using the results of listening tests involving synthetic voicing of sustained and co-articulated vowels in word context. © 2025 Elsevier B.V., All rights reserved.
2024
Authors
Ferreira, A; Santos, V; Oliveira, M;
Publication
2024 IEEE Workshop on Signal Processing Systems, SiPS
Abstract
The phase response of all-pole (AP) models is known to be non-linear and highly dependent on the frequency response magnitude. The objective and perceptual impact of the group delay of AP models in the synthesis of vowel sounds has not been thoroughly addressed in the literature. In this paper, we use a dedicated frequency-domain framework so as to i) synthesize a plausible glottal excitation setting the ground-truth for the harmonic phase structure and replicating the fundamental frequency contour of natural vowels, ii) synthesize realistic vowel sounds through all-zero (AZ) and all-pole (AP) models sharing the same frequency response magnitude, and iii) assess the objective and perceptual impact of the group delay of AP models taking as a reference natural vowels and, in particular, the ground-truth harmonic phase structure of the glottal excitation. Our findings emphasize that the non-linear phase characteristics of AP models degrade the harmonic phase structure of synthetic vowels significantly beyond what is found in natural vowels; however, this degradation is not always clearly audible.
2024
Authors
Luis Jesus; Sara Castilho; Aníbal JS Ferreira; Maria Conceição Costa;
Publication
ISSP 2024 - 13th International Seminar on Speech Production