Publications - INESC TEC

Publications

Publications by Aníbal Ferreira

2023

Analysis and Re-Synthesis of Natural Cricket Sounds Assessing the Perceptual Relevance of Idiosyncratic Parameters

Authors
Oliveira, M; Almeida, V; Silva, J; Ferreira, A;

Publication
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Abstract
Cricket sounds are usually regarded as pleasant and, thus, can be used as suitable test signals in psychoacoustic experiments assessing the human listening acuity to specific temporal and spectral features. In addition, the simple structure of cricket sounds makes them prone to reverse engineering such that they can be analyzed and re-synthesized with desired alterations in their defining parameters. This paper describes cricket sounds from a parametric point of view, characterizes their main temporal and spectral features, namely jitter, shimmer and frequency sweeps, and explains a re-synthesis process generating modified natural cricket sounds. These are subsequently used in listening tests helping to shed light on the sound identification and discrimination capabilities of humans that are important, for example, in voice recognition. © 2023 IEEE.

2024

Demystifying DFT-Based Harmonic Phase Estimation, Transformation, and Synthesis

Authors
Oliveira, M; Santos, V; Saraiva, A; Ferreira, A;

Publication

Abstract
Many natural signals exhibit a quasi-periodic behavior and are conveniently modeled as a combination of several harmonic sinusoids whose relative frequencies, magnitudes and phases vary with time. The waveform shape of those signals reflects important physical phenomena underlying their generation, which requires that those parameters be accurately estimated and modeled. In the literature, accurate phase estimation and modeling have received much less research effort than frequency or magnitude estimation. First, this paper addresses accurate DFT-based phase estimation of individual sinusoids in six scenarios involving two DFT-based filter banks and three different windows. It is shown that bias in phase estimation is less than 1E-3 radians when the SNR is equal to or larger than 2.5 dB. Taking the Cramér-Rao Lower Bound (CRLB) as a reference, it is shown that one particular window offers a performance of practical interest by better approximating the CRLB when signal conditions are favorable, and by minimizing the performance deviation when signal conditions are adverse. Second, this paper explains how a shift-invariant phase-related feature can be devised that characterizes harmonic phase structure, which motivates a signal processing paradigm that greatly simplifies parametric modeling, transformation and synthesis of harmonic signals, in addition to facilitating the understanding and reverse engineering of the phasegram. Theory and results are discussed from a reproducibility perspective using dedicated experiments that are supported with code, allowing readers not only to replicate the figures and results in this paper but also to extend the research.

2024

Demystifying DFT-Based Harmonic Phase Estimation, Transformation, and Synthesis

Authors
Oliveira, M; Santos, V; Saraiva, A; Ferreira, A;

Publication
Signals

Abstract
Many natural signals exhibit quasi-periodic behaviors and are conveniently modeled as combinations of several harmonic sinusoids whose relative frequencies, magnitudes, and phases vary with time. The waveform shapes of those signals reflect important physical phenomena underlying their generation, requiring those parameters to be accurately estimated and modeled. In the literature, accurate phase estimation and modeling have received significantly less attention than frequency or magnitude estimation. This paper first addresses accurate DFT-based phase estimation of individual sinusoids across six scenarios involving two DFT-based filter banks and three different windows. It has been shown that bias in phase estimation is less than 0.001 radians when the SNR is equal to or larger than 2.5 dB. Using the Cramér-Rao lower bound as a reference, it has been demonstrated that one particular window offers performance of practical interest by better approximating the CRLB under favorable signal conditions and minimizing performance deviation under adverse conditions. This paper describes the development of a shift-invariant phase-related feature that characterizes the harmonic phase structure. This feature motivates a new signal processing paradigm that greatly simplifies the parametric modeling, transformation, and synthesis of harmonic signals. It also aids in understanding and reverse engineering the phasegram. The theory and results are discussed from a reproducible perspective, with dedicated experiments supported by code, allowing for the replication of figures and results presented in this paper and facilitating further research.

2025

A Review of Voicing Decision in Whispered Speech: From Rules to Machine Learning

Authors
da Silva, JMPP; Duarte Nunes, G; Ferreira, A;

Publication

Abstract

2024

On the mismatch between the phase structure of all-pole-based synthetic vowels and natural vowels

Authors
Ferreira, A; Santos, V; Oliveira, M;

Publication
2024 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS, SIPS

Abstract
The phase response of all-pole (AP) models is known to be non-linear and highly dependent on the frequency response magnitude. The objective and perceptual impact of the group delay of AP models in the synthesis of vowel sounds has not been thoroughly addressed in the literature. In this paper, we use a dedicated frequency-domain framework so as to i) synthesize a plausible glottal excitation setting the ground-truth for the harmonic phase structure and replicating the fundamental frequency contour of natural vowels, ii) synthesize realistic vowel sounds through all-zero (AZ) and all-pole (AP) models sharing the same frequency response magnitude, and iii) assess the objective and perceptual impact of the group delay of AP models taking as a reference natural vowels and, in particular, the ground-truth harmonic phase structure of the glottal excitation. Our findings emphasize that the non-linear phase characteristics of AP models degrade the harmonic phase structure of synthetic vowels significantly beyond what is found in natural vowels; however, that degradation is not always clearly audible.

2024

Attributes Associated with Consonantal Place and Voicing in Whispered Speech

Authors
Luis Jesus; Sara Castilho; Aníbal JS Ferreira; Maria Conceição Costa;

Publication
ISSP 2024 - 13th International Seminar on Speech Production

Abstract