2024
Authors
Luis Jesus; Sara Castilho; Aníbal JS Ferreira; Maria Conceição Costa;
Publication
ISSP 2024 - 13th International Seminar on Speech Production
Abstract
2025
Authors
Yamamura, F; Scalassara, R; Oliveira, A; Ferreira, JS;
Publication
U.Porto Journal of Engineering
Abstract
Whispers are common and essential for secondary communication. Nonetheless, individuals with aphonia, including laryngectomees, rely on whispers as their primary means of communication. Due to the distinct features between whispered and regular speech, debates have emerged in the field of speech recognition, highlighting the challenge of effectively converting between them. This study investigates the characteristics of whispered speech and proposes a system for converting whispered vowels into normal ones. The system is developed using multilayer perceptron networks and two types of generative adversarial networks. Three metrics are analyzed to evaluate the performance of the system: mel-cepstral distortion, root mean square error of the fundamental frequency, and accuracy with f1-score of a vowel classifier. Overall, the perceptron networks demonstrated better results, with no significant differences observed between male and female voices or the presence/absence of speech silence, except for improved accuracy in estimating the fundamental frequency during the conversion process. © 2025, Universidade do Porto - Faculdade de Engenharia. All rights reserved.
2025
Authors
da Silva, JMPP; Duarte Nunes, G; Ferreira, A;
Publication
Abstract
2024
Authors
Oliveira, M; Santos, V; Saraiva, A; Ferreira, A;
Publication
SIGNALS
Abstract
Many natural signals exhibit quasi-periodic behaviors and are conveniently modeled as combinations of several harmonic sinusoids whose relative frequencies, magnitudes, and phases vary with time. The waveform shapes of those signals reflect important physical phenomena underlying their generation, requiring those parameters to be accurately estimated and modeled. In the literature, accurate phase estimation and modeling have received significantly less attention than frequency or magnitude estimation. This paper first addresses accurate DFT-based phase estimation of individual sinusoids across six scenarios involving two DFT-based filter banks and three different windows. It has been shown that bias in phase estimation is less than 0.001 radians when the SNR is equal to or larger than 2.5 dB. Using the Cramér-Rao lower bound as a reference, it has been demonstrated that one particular window offers performance of practical interest by better approximating the CRLB under favorable signal conditions and minimizing performance deviation under adverse conditions. This paper describes the development of a shift-invariant phase-related feature that characterizes the harmonic phase structure. This feature motivates a new signal processing paradigm that greatly simplifies the parametric modeling, transformation, and synthesis of harmonic signals. It also aids in understanding and reverse engineering the phasegram. The theory and results are discussed from a reproducible perspective, with dedicated experiments supported by code, allowing for the replication of figures and results presented in this paper and facilitating further research.
2023
Authors
Oliveira, M; Almeida, V; Silva, J; Ferreira, A;
Publication
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Abstract
Cricket sounds are usually regarded as pleasant and, thus, can be used as suitable test signals in psychoacoustic experiments assessing the human listening acuity to specific temporal and spectral features. In addition, the simple structure of cricket sounds makes them prone to reverse engineering such that they can be analyzed and re-synthesized with desired alterations in their defining parameters. This paper describes cricket sounds from a parametric point of view, characterizes their main temporal and spectral features, namely jitter, shimmer and frequency sweeps, and explains a re-synthesis process generating modified natural cricket sounds. These are subsequently used in listening tests helping to shed light on the sound identification and discrimination capabilities of humans that are important, for example, in voice recognition. © 2023 IEEE.
2023
Authors
Silva, JM; Oliveira, MA; Saraiva, AF; Ferreira, AJS;
Publication
ACOUSTICS
Abstract
The estimation of the frequency of sinusoids has been the object of intense research for more than 40 years. Its importance in classical fields such as telecommunications, instrumentation, and medicine has been extended to numerous specific signal processing applications involving, for example, speech, audio, and music processing. In many cases, these applications run in real-time and, thus, require accurate, fast, and low-complexity algorithms. Taking the normalized Cramér-Rao lower bound as a reference, this paper evaluates the relative performance of nine non-iterative discrete Fourier transform-based individual sinusoid frequency estimators when the target sinusoid is affected by full-bandwidth quasi-harmonic interference, in addition to stationary noise. Three levels of the quasi-harmonic interference severity are considered: no harmonic interference, mild harmonic interference, and strong harmonic interference. Moreover, the harmonic interference is amplitude-modulated and frequency-modulated reflecting real-world conditions, e.g., in singing and musical chords. Results are presented for when the signal-to-noise ratio (SNR) varies between -10 dB and 70 dB, and they reveal that the relative performance of different frequency estimators depends on the SNR and on the selectivity and leakage of the window that is used, but also changes drastically as a function of the severity of the quasi-harmonic interference. In particular, when this interference is strong, the performance curves of the majority of the tested frequency estimators collapse to a few trends around and above 0.4% of the DFT bin width.