2018
Authors
Derogarian, F; Ferreira, JC; Grade Tavares, VM;
Publication
J. Mobile Multimedia
Abstract
2018
Authors
Santos, PV; Alves, JC; Ferreira, JC;
Publication
U.Porto Journal of Engineering
Abstract
2018
Authors
Vaz Freitas, S; Pestana, PM; Almeida, V; Ferreira, A;
Publication
Biomedical Signal Processing and Control
Abstract
Objectives: To describe the results of the acoustic analysis of a database of 90 voice samples with distinct dysphonia levels, using four different software programs, both commercial and open source. Study design: Exploratory, cross-sectional. Methods: The samples were analyzed by four software programs that perform acoustic evaluation, one open source (Praat) and three commercial (Multi Dimensional Voice Program, MDVP, by Kay Elemetrics; VoiceStudio by Seegnal; and Dr. Speech by Tiger Electronics), to compare the most commonly used acoustic measures (frequency, perturbation and noise measures). Results: There is a moderate to strong, positive and statistically significant correlation among the software programs. The mean F0 is not statistically different among the applications used. The other acoustic measures revealed statistically significant differences. Conclusion: Even though software programs are now easier to access and numerous acoustic measures have been proposed, not all of them are statistically representative or numerically similar across the different applications.
2018
Authors
Ferreira, AJ;
Publication
145th Audio Engineering Society International Convention, AES 2018
Abstract
Magnitude-oriented approaches dominate the voice analysis front-ends of most current technologies addressing e.g. speaker identification, speech coding/compression, voice reconstruction and re-synthesis. A popular technique is all-pole vocal tract modeling. The phase response of all-pole models is known to be non-linear and highly dependent on the magnitude frequency response. In this paper, we use a shift-invariant phase-related feature that is estimated from signal harmonics in order to study the impact of all-pole models on the phase structure of voiced sounds. We relate that impact to the phase structure that is found in natural voiced sounds to conclude on the physiological validity of the group delay of all-pole vocal tract modeling. Our findings emphasize that harmonic phase models are idiosyncratic, and this is important in speaker identification, and in fostering the quality and naturalness of synthetic and reconstructed speech. © 2018 KASHYAP.
2018
Authors
Ferreira, A;
Publication
ICETE 2018 - Proceedings of the 15th International Joint Conference on e-Business and Telecommunications
Abstract
In this paper we report on a number of speaker identification experiments that assume the existence of a phonetic-oriented segmentation scheme, which motivates the extraction of psychoacoustically-motivated phase and pitch related features. MFCC features are also considered for benchmarking. Emphasis is given to an innovative shift-invariant phase-related feature that is closely linked to the glottal source. A very simple statistical model is proposed and adapted in order to highlight the relative discrimination capabilities of different feature types. Results are presented for individual features, and the possibilities of fusing features at the speaker modeling stage, or fusing distances at the speaker identification stage, are also discussed.
2018
Authors
Ferreira, AJ; Tribolet, JM;
Publication
DAFx 2018 - Proceedings: 21st International Conference on Digital Audio Effects
Abstract
This paper addresses a phase-related feature that is time-shift invariant, and that expresses the relative phases of all harmonics with respect to that of the fundamental frequency. We identify the feature as Normalized Relative Delay (NRD) and we show that it is particularly useful to describe the holistic phase properties of voiced sounds produced by a human speaker, notably vowel sounds. We illustrate the NRD feature with real data obtained from five sustained vowels uttered by 20 female speakers and 17 male speakers. It is shown that not only do NRD coefficients carry idiosyncratic information, but their estimation is also quite stable and robust for all harmonics encompassing, for most vowels, at least the first four formant frequencies. The average NRD model that is estimated using data pertaining to all speakers in our database is compared to that of the idealized Liljencrants-Fant (LF) and Rosenberg glottal models. We also present results on the phase effects of linear-phase FIR and IIR vocal tract filter models when a plausible source excitation is used that corresponds to the derivative of the LF glottal flow model. These results suggest that the shape of NRD feature vectors is mainly determined by the glottal pulse and only marginally affected by either the group delay of the vocal tract filter model, or by the acoustic coupling between glottis and vocal tract structures.