Details
Name
Aníbal Ferreira
Position
Senior Researcher
Since
22 November 1995
Nationality
Portugal
Centre
Centre for Telecommunications and Multimedia
Contacts
+351222094299
anibal.ferreira@inesctec.pt
2025
Authors
da Silva, JMPP; Duarte Nunes, G; Ferreira, A
2025
Authors
Yamamura, F; Scalassara, R; Oliveira, A; Ferreira, JS
Publication
U.Porto Journal of Engineering
Abstract
Whispers are common and essential for secondary communication. Nonetheless, individuals with aphonia, including laryngectomees, rely on whispers as their primary means of communication. Because whispered and normal speech have distinct features, converting effectively between them remains a challenge that has prompted debate in the field of speech recognition. This study investigates the characteristics of whispered speech and proposes a system for converting whispered vowels into normal ones. The system is developed using multilayer perceptron networks and two types of generative adversarial networks. Three metrics are analyzed to evaluate the performance of the system: mel-cepstral distortion, root mean square error of the fundamental frequency, and the accuracy and F1-score of a vowel classifier. Overall, the multilayer perceptron networks achieved better results, with no significant differences observed between male and female voices or between the presence and absence of silence in the speech, except for an improvement in the accuracy of fundamental frequency estimation during the conversion process. © 2025, Universidade do Porto - Faculdade de Engenharia. All rights reserved.
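The first two evaluation metrics named above have common textbook definitions. The following is a minimal NumPy sketch of them, assuming time-aligned frames, mel-cepstra stored as (frames × coefficients) arrays with the energy coefficient c0 in column 0 and excluded from the distance, and unvoiced frames marked with F0 = 0. The function names are hypothetical, and this is not the evaluation code used in the study.

```python
import numpy as np

def mel_cepstral_distortion(c_ref, c_conv):
    """Frame-averaged MCD (dB) between two time-aligned (T, D+1) mel-cepstrum
    arrays. Column 0 (c0, energy) is excluded: an assumed, common convention."""
    c_ref = np.asarray(c_ref, dtype=float)
    c_conv = np.asarray(c_conv, dtype=float)
    diff = c_ref[:, 1:] - c_conv[:, 1:]
    # Per-frame MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))

def f0_rmse(f0_ref, f0_conv):
    """RMSE of F0 (Hz) over frames voiced in both contours (assumes F0 = 0
    marks an unvoiced frame)."""
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_conv = np.asarray(f0_conv, dtype=float)
    voiced = (f0_ref > 0) & (f0_conv > 0)
    if not np.any(voiced):
        raise ValueError("no frames are voiced in both contours")
    return float(np.sqrt(np.mean((f0_ref[voiced] - f0_conv[voiced]) ** 2)))

if __name__ == "__main__":
    # Toy, made-up inputs: two frames of 3 mel-cepstral coefficients each
    ref = np.array([[0.0, 1.0, 0.5], [0.0, 0.8, 0.2]])
    conv = np.array([[3.0, 0.9, 0.5], [1.0, 0.8, 0.4]])
    print("MCD (dB):", mel_cepstral_distortion(ref, conv))
    print("F0 RMSE (Hz):", f0_rmse([100.0, 0.0, 120.0], [110.0, 150.0, 110.0]))
```

Excluding c0 keeps the distance focused on spectral shape rather than overall level; restricting the F0 error to frames voiced in both contours avoids penalizing voicing decisions twice.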
2024
Authors
Oliveira, M; Santos, V; Saraiva, A; Ferreira, A
2024
Authors
Oliveira, M; Santos, V; Saraiva, A; Ferreira, A
Publication
SIGNALS
Abstract
Many natural signals exhibit quasi-periodic behavior and are conveniently modeled as combinations of harmonic sinusoids whose relative frequencies, magnitudes, and phases vary over time. The waveform shapes of these signals reflect important physical phenomena underlying their generation, so those parameters must be accurately estimated and modeled. In the literature, accurate phase estimation and modeling have received significantly less attention than frequency or magnitude estimation. This paper first addresses accurate DFT-based phase estimation of individual sinusoids across six scenarios involving two DFT-based filter banks and three different windows. It is shown that the phase estimation bias is less than 0.001 rad when the SNR is 2.5 dB or higher. Using the Cramér-Rao lower bound (CRLB) as a reference, it is demonstrated that one particular window offers performance of practical interest, approximating the CRLB more closely under favorable signal conditions and minimizing performance deviation under adverse conditions. The paper then describes the development of a shift-invariant phase-related feature that characterizes the harmonic phase structure. This feature motivates a new signal processing paradigm that greatly simplifies the parametric modeling, transformation, and synthesis of harmonic signals, and it also aids in understanding and reverse engineering the phasegram. The theory and results are discussed from a reproducible perspective, with dedicated experiments supported by code that allow the figures and results presented in the paper to be replicated, facilitating further research.
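The general idea behind DFT-based phase estimation and a shift-invariant harmonic phase feature can be illustrated with a small NumPy sketch. It assumes an odd-length frame, a symmetric Hann window, and a time index centered on the frame, so the phase read at the DFT bin nearest each harmonic approximates that harmonic's phase at the frame center (no interpolation or bias correction is applied). As a stand-in for a shift-invariant feature it uses the generic relative phase φ_h − h·φ_1; this is not necessarily the feature, windows, or filter banks studied in the paper, and all function names and signal parameters are made up for illustration.

```python
import numpy as np

def centered_dft_bin(x, k):
    """DFT bin k of a Hann-windowed, odd-length frame with the time index
    centered on the middle sample, so the phase refers to the frame center."""
    N = len(x)
    if N % 2 == 0:
        raise ValueError("use an odd frame length so the center is a sample")
    n = np.arange(N) - (N - 1) // 2            # -(N-1)/2 ... (N-1)/2
    w = np.hanning(N)                           # symmetric, so its transform is real
    return complex(np.sum(w * x * np.exp(-2j * np.pi * k * n / N)))

def harmonic_phases(x, f0_bins, n_harm):
    """Phase of each harmonic read at its nearest DFT bin (no bias correction)."""
    return np.array([np.angle(centered_dft_bin(x, int(round(h * f0_bins))))
                     for h in range(1, n_harm + 1)])

def wrap(p):
    """Wrap phases to (-pi, pi]."""
    return np.angle(np.exp(1j * np.asarray(p)))

def relative_phases(phi):
    """phi_h - h * phi_1, wrapped: cancels the linear phase a time shift adds."""
    h = np.arange(1, len(phi) + 1)
    return wrap(phi - h * phi[0])

# Made-up test signal: 3 harmonics, f0 = 20.3 bins, frame length 1023
N, f0_bins = 1023, 20.3
amps = np.array([1.0, 0.6, 0.3])
true_phi = np.array([0.4, -1.1, 2.0])          # phases at the frame center

def synth(shift=0.0):
    """Harmonic signal delayed by `shift` samples relative to the frame center."""
    n = np.arange(N) - (N - 1) // 2 - shift
    w0 = 2 * np.pi * f0_bins / N
    return sum(a * np.cos(h * w0 * n + p)
               for h, (a, p) in enumerate(zip(amps, true_phi), start=1))

phi_est = harmonic_phases(synth(), f0_bins, 3)
phi_shift = harmonic_phases(synth(shift=7.5), f0_bins, 3)
print("phase error (rad):", wrap(phi_est - true_phi))
print("absolute phase change:", wrap(phi_shift - phi_est))
print("relative phase change:", wrap(relative_phases(phi_shift) - relative_phases(phi_est)))
```

Shifting the signal changes every absolute phase, but the relative phases stay put: a delay of d samples adds −h·ω0·d to harmonic h, which subtracting h·φ_1 cancels.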
2024
Authors
Ferreira, A; Santos, V; Oliveira, M
Publication
2024 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS, SIPS
Abstract
The phase response of all-pole (AP) models is known to be non-linear and highly dependent on the frequency response magnitude. The objective and perceptual impact of the group delay of AP models on the synthesis of vowel sounds has not been thoroughly addressed in the literature. In this paper, we use a dedicated frequency-domain framework to i) synthesize a plausible glottal excitation that sets the ground truth for the harmonic phase structure and replicates the fundamental frequency contour of natural vowels, ii) synthesize realistic vowel sounds through all-zero (AZ) and AP models sharing the same frequency response magnitude, and iii) assess the objective and perceptual impact of the group delay of AP models, taking natural vowels and, in particular, the ground-truth harmonic phase structure of the glottal excitation as references. Our findings show that the non-linear phase characteristics of AP models degrade the harmonic phase structure of synthetic vowels significantly beyond what is found in natural vowels; however, this degradation is not always clearly audible.
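The contrast between the non-linear phase of an AP model and an AZ model with the same magnitude can be sketched with NumPy. The example assumes a single hypothetical two-pole resonator (a 500 Hz formant at an 8 kHz sampling rate, pole radius 0.95), computes its group delay with the standard polynomial formula, and builds a zero-phase AZ counterpart by inverse-transforming the sampled magnitude response. It is an illustration only, not the frequency-domain framework used in the paper, and the function name is hypothetical.

```python
import numpy as np

def poly_group_delay(coeffs, n_fft=4096):
    """Group delay (samples) of A(z) = sum_n a[n] z^-n on [0, pi],
    via tau_A(w) = Re{DFT(n * a[n]) / DFT(a[n])}."""
    a = np.asarray(coeffs, dtype=float)
    n = np.arange(len(a))
    A = np.fft.rfft(a, n_fft)
    B = np.fft.rfft(n * a, n_fft)
    return np.real(B / A)

# Hypothetical resonator: one formant at 500 Hz, fs = 8 kHz, pole radius 0.95
fs, formant, r = 8000.0, 500.0, 0.95
theta = 2 * np.pi * formant / fs
a = np.array([1.0, -2 * r * np.cos(theta), r ** 2])   # A(z); AP model is 1/A(z)

n_fft = 4096
w = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft      # rad/sample, 0..pi
gd_ap = -poly_group_delay(a, n_fft)                     # group delay of 1/A(z)

# Zero-phase AZ model with the same magnitude on the DFT grid:
# |H| is real and even, so its inverse DFT is a real, (circularly) even FIR.
mag = np.abs(1.0 / np.fft.fft(a, n_fft))
h_az = np.real(np.fft.ifft(mag))
# Its DFT equals |H| (real, non-negative): zero phase, hence zero group delay.

print("AP peak group delay:", gd_ap.max(), "samples at", w[gd_ap.argmax()], "rad")
print("AP group delay range:", gd_ap.min(), "to", gd_ap.max())
print("AZ matches |H|:", np.allclose(np.fft.fft(h_az), mag))
```

For this resonator the AP group delay peaks at roughly r/(1 − r) ≈ 19 samples near the formant and becomes slightly negative at high frequencies, whereas the AZ counterpart has zero group delay everywhere (a constant delay once shifted to be causal), even though both share the same magnitude on the DFT grid.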
Supervised theses
2023
Author
Gonçalo Duarte Nunes
Institution
UP-FEUP
2023
Author
Nélio David de Freitas Gonçalves
Institution
UP-FEUP