Details

  • Name

    Aníbal Ferreira
  • Role

    Senior Researcher
  • Since

    22nd November 1995
Publications

2025

A Review of Voicing Decision in Whispered Speech: From Rules to Machine Learning

Authors
da Silva, JMPP; Duarte Nunes, G; Ferreira, A;

Publication

Abstract

2025

Neural network models for whisper to normal speech conversion

Authors
Yamamura, F; Scalassara, R; Oliveira, A; Ferreira, JS;

Publication
U.Porto Journal of Engineering

Abstract
Whispers are common and essential for secondary communication. Nonetheless, individuals with aphonia, including laryngectomees, rely on whispers as their primary means of communication. Due to the distinct features between whispered and regular speech, debates have emerged in the field of speech recognition, highlighting the challenge of effectively converting between them. This study investigates the characteristics of whispered speech and proposes a system for converting whispered vowels into normal ones. The system is developed using multilayer perceptron networks and two types of generative adversarial networks. Three metrics are analyzed to evaluate the performance of the system: mel-cepstral distortion, root mean square error of the fundamental frequency, and accuracy with f1-score of a vowel classifier. Overall, the perceptron networks demonstrated better results, with no significant differences observed between male and female voices or the presence/absence of speech silence, except for improved accuracy in estimating the fundamental frequency during the conversion process. © 2025, Universidade do Porto - Faculdade de Engenharia. All rights reserved.

2025

Accurate Analysis of the Pitch Pulse-Based Magnitude/Phase Structure of Natural Vowels and Assessment of Three Lightweight Time/Frequency Voicing Restoration Methods

Authors
Ferreira, JS; Jesus, MT; Leal, LM; Spratley, JEF;

Publication
Journal of Voice

Abstract
This paper addresses two challenges that are intertwined and are key in informing signal processing methods restoring natural (voiced) speech from whispered speech. The first challenge involves characterizing and modeling the evolution of the harmonic phase/magnitude structure of a sequence of individual pitch periods in a voiced region of natural speech comprising sustained or co-articulated vowels. A novel algorithm segmenting individual pitch pulses is proposed, which is then used to obtain illustrative results highlighting important differences between sustained and co-articulated vowels, and suggesting practical synthetic voicing approaches. The second challenge involves model-based synthetic voicing restoration in real-time and on-the-fly. Three implementation alternatives are described that differ in their signal reconstruction approaches: frequency-domain, combined frequency- and time-domain, and physiologically inspired filtering of glottal excitation pulses individually generated. The three alternatives are compared objectively using illustrative examples, and subjectively using the results of listening tests involving synthetic voicing of sustained and co-articulated vowels in word context. © 2025 Elsevier B.V., All rights reserved.

2024

Demystifying DFT-Based Harmonic Phase Estimation, Transformation, and Synthesis

Authors
Oliveira, M; Santos, V; Saraiva, A; Ferreira, A;

Publication

Abstract
Many natural signals exhibit a quasi-periodic behavior and are conveniently modeled as a combination of several harmonic sinusoids whose relative frequencies, magnitudes and phases vary with time. The waveform shape of those signals reflects important physical phenomena underlying their generation, which requires that those parameters be accurately estimated and modeled. In the literature, accurate phase estimation and modeling has received much less research effort than frequency estimation, or magnitude estimation. First, this paper addresses accurate DFT-based phase estimation of individual sinusoids in six scenarios involving two DFT-based filter banks and three different windows. It is shown that bias in phase estimation is less than 1E-3 radians when the SNR is equal to or larger than 2.5 dB. Taking as a reference the Cramér-Rao Lower Bound, it is shown that one particular window offers a performance of practical interest by approximating better the CRLB when signal conditions are favorable, and by minimizing the performance deviation when signal conditions are adverse. Second, this paper explains how a shift-invariant phase-related feature can be devised that characterizes harmonic phase structure, which motivates a signal processing paradigm that greatly simplifies parametric modeling, transformation and synthesis of harmonic signals, in addition to facilitating the understanding and reverse engineering of the phasegram. Theory and results are discussed in a reproducible perspective using dedicated experiments that are supported with code allowing not only to replicate figures and results in this paper, but also to expand research.

2024

Demystifying DFT-Based Harmonic Phase Estimation, Transformation, and Synthesis

Authors
Oliveira, M; Santos, V; Saraiva, A; Ferreira, A;

Publication
SIGNALS

Abstract
Many natural signals exhibit quasi-periodic behaviors and are conveniently modeled as combinations of several harmonic sinusoids whose relative frequencies, magnitudes, and phases vary with time. The waveform shapes of those signals reflect important physical phenomena underlying their generation, requiring those parameters to be accurately estimated and modeled. In the literature, accurate phase estimation and modeling have received significantly less attention than frequency or magnitude estimation. This paper first addresses accurate DFT-based phase estimation of individual sinusoids across six scenarios involving two DFT-based filter banks and three different windows. It has been shown that bias in phase estimation is less than 0.001 radians when the SNR is equal to or larger than 2.5 dB. Using the Cramér-Rao lower bound as a reference, it has been demonstrated that one particular window offers performance of practical interest by better approximating the CRLB under favorable signal conditions and minimizing performance deviation under adverse conditions. This paper describes the development of a shift-invariant phase-related feature that characterizes the harmonic phase structure. This feature motivates a new signal processing paradigm that greatly simplifies the parametric modeling, transformation, and synthesis of harmonic signals. It also aids in understanding and reverse engineering the phasegram. The theory and results are discussed from a reproducible perspective, with dedicated experiments supported by code, allowing for the replication of figures and results presented in this paper and facilitating further research.

Supervised theses

2023

Whispered speech segmentation based on Deep Learning

Author
Gonçalo Duarte Nunes

Institution
UP-FEUP

2023

Vozeamento sintético de voz disfónica através da síntese digital de estruturas harmónicas em tempo real [Synthetic voicing of dysphonic voice through real-time digital synthesis of harmonic structures]

Author
Nélio David de Freitas Gonçalves

Institution
UP-FEUP

2023

Dysphonic to natural voice reconstruction based on adaptive phonetic segmentation and synthetic implantation

Author
João Miguel Pinto Pereira da Silva

Institution
UP-FEUP