Publications

Publications by Aníbal Ferreira

2016

Advances to a frequency-domain parametric coder of wideband speech

Authors
Ferreira, A; Sinha, D;

Publication
140th Audio Engineering Society International Convention 2016, AES 2016

Abstract
In recent years, tools in perceptual coding of high-quality audio have been tailored to capture highly detailed information regarding signal components, so that they have gained an intrinsic ability to represent audio parametrically. In a recent paper, we described a first validation model of such an approach applied to parametric coding of wideband speech. In this paper, we describe specific advances to that approach that improve coding efficiency and signal quality. Special focus is given to avoiding persistent transmission of phase information to the decoder, to the synthesis of both impulse-like and noise-based plosives using short-term windows, to improved spectral envelope modelling, and to allowing direct time-domain synthesis of the periodic content of speech in order to cope with fast F0 changes. A few examples of signal coding and transformation illustrate the impact of these improvements.

CloseRead Abstract

2017

CARMIE

Authors
Lobo, J; Ferreira, L; Ferreira, AJ;

Publication
Health Care Delivery and Clinical Science

Abstract
The incidence of chronic diseases is increasing, and monitoring patients in a home environment is recommended. Noncompliance with prescribed medication regimens is a concern, especially among older people. Heart failure is a chronic disease that requires patients to follow strict medication plans permanently. With the objective of helping these patients manage information about their medicines and increasing adherence, the personal medication advisor CARMIE was developed as a conversational agent capable of interacting with users, in Portuguese, through spoken natural language. The system architecture is based on a language parser, a dialog manager, and a language generator, integrated with existing tools for speech recognition and synthesis. All modules work together and interact with the user through an Android application, supporting users in managing information about their prescribed medicines. The authors also present a preliminary usability study and further considerations on CARMIE.

CloseRead Abstract

2018

First experiments on speaker identification combining a new shift-invariant phase-related feature (NRD), MFCCs and F0 information

Authors
Ferreira, A;

Publication
ICETE 2018 - Proceedings of the 15th International Joint Conference on e-Business and Telecommunications

Abstract
In this paper, we report on a number of speaker identification experiments that assume a phonetic-oriented segmentation scheme exists, which motivates the extraction of psychoacoustically motivated phase- and pitch-related features. MFCC features are also considered for benchmarking. Emphasis is given to an innovative shift-invariant phase-related feature that is closely linked to the glottal source. A very simple statistical model is proposed and adapted in order to highlight the relative discrimination capabilities of different feature types. Results are presented for individual features, and possibilities are discussed for fusing features at the speaker modeling stage or fusing distances at the speaker identification stage.

CloseRead Abstract

2019

Phonetic-oriented identification of twin speakers using 4-second vowel sounds and a combination of a shift-invariant phase feature (NRD), MFCCs and F0 information

Authors
Ferreira, AJ;

Publication
2019 AES International Conference on Audio Forensics

Abstract
Automatic speaker identification typically relies on sophisticated statistical modeling and classification, which require large amounts of data for good performance. However, in actual audio forensics casework, frequently only a few seconds of speech material are available. In this paper, we favor diversity in feature extraction, simple modeling and classification, and constructive combination of congruent classification scores. We use phase, spectral magnitude and F0-related features in speaker identification experiments on a database of 35 speakers, most of whom are twins. Using only 4.4 sec. of vowel-like sounds per speaker, we characterize the performance reached with individual features, as well as simple yet effective ways of fusing classification scores. Insights for further research are also presented.

CloseRead Abstract

2018

A holistic glottal phase-related feature

Authors
Ferreira, AJ; Tribolet, JM;

Publication
DAFx 2018 - Proceedings: 21st International Conference on Digital Audio Effects

Abstract
This paper addresses a phase-related feature that is time-shift invariant and that expresses the relative phases of all harmonics with respect to that of the fundamental frequency. We identify the feature as Normalized Relative Delay (NRD) and show that it is particularly useful to describe the holistic phase properties of voiced sounds produced by a human speaker, notably vowel sounds. We illustrate the NRD feature with real data obtained from five sustained vowels uttered by 20 female speakers and 17 male speakers. It is shown that not only do NRD coefficients carry idiosyncratic information, but their estimation is also quite stable and robust for all harmonics encompassing, for most vowels, at least the first four formant frequencies. The average NRD model estimated using data pertaining to all speakers in our database is compared to those of the idealized Liljencrants-Fant (LF) and Rosenberg glottal models. We also present results on the phase effects of linear-phase FIR and IIR vocal tract filter models when a plausible source excitation is used that corresponds to the derivative of the LF glottal flow model. These results suggest that the shape of NRD feature vectors is mainly determined by the glottal pulse and only marginally affected by either the group delay of the vocal tract filter model or the acoustic coupling between glottis and vocal tract structures.
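
The time-shift invariance mentioned above follows from a simple observation: delaying a harmonic signal with fundamental f0 by tau seconds subtracts 2*pi*k*f0*tau from the phase of harmonic k, which is exactly k times the change in the fundamental's phase. Any quantity of the form phi_k - k*phi_1 is therefore unaffected by the delay. The sketch below computes such a relative phase, divided by 2*pi and wrapped to [0, 1). It is an assumed formulation meant only to show the invariance; the exact NRD normalization and wrapping in the paper may differ, and the helper names are hypothetical.

```python
import math


def relative_phase_feature(phases):
    """Shift-invariant relative phase per harmonic (hypothetical NRD-like helper).

    phases[k - 1] is the measured phase, in radians, of harmonic k (k = 1..K),
    so phases[0] belongs to the fundamental. Returns values wrapped to [0, 1).
    """
    phi1 = phases[0]
    features = []
    for k, phi_k in enumerate(phases, start=1):
        # A delay changes phi_k and k * phi1 by the same amount, so it cancels here.
        rel = (phi_k - k * phi1) / (2.0 * math.pi)
        features.append(rel % 1.0)
    return features


def shift_phases(phases, f0, tau):
    """Harmonic phases after delaying the signal by tau seconds (f0 in Hz)."""
    return [p - 2.0 * math.pi * k * f0 * tau for k, p in enumerate(phases, start=1)]


def circular_distance(a, b):
    """Distance between two values on the unit circle [0, 1)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


# Usage: the feature is unchanged by an arbitrary delay of the signal.
phases = [0.3, -1.2, 2.5, 0.7]
original = relative_phase_feature(phases)
delayed = relative_phase_feature(shift_phases(phases, f0=120.0, tau=0.0037))
print(max(circular_distance(a, b) for a, b in zip(original, delayed)))  # ~0
```

The first coefficient is always zero by construction, since the fundamental is the phase reference; the information lies in the remaining harmonics.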

CloseRead Abstract

2020

Impact of a shift-invariant harmonic phase model in fully parametric harmonic voice representation and time/frequency synthesis

Authors
Ferreira, A; Silva, J; Brito, F; Sinha, D;

Publication
2020 IEEE International Conference on Acoustics, Speech, and Signal Processing

Abstract
Harmonic representation models are widely used, notably in speech coding and synthesis. In this paper, we describe two fully parametric harmonic representation and signal reconstruction alternatives that rely on a shift-invariant harmonic phase model and that implement accurate frame-based synthesis in the frequency-domain, and accurate pitch pulse-based synthesis in the time-domain. We use natural spoken and sung voice signals in order to assess the objective and subjective quality of both alternatives when parameters are exact, and when they are replaced by compact and shift-invariant harmonic phase and magnitude approximation models. We highlight the flexibility of these models and present results indicating that not only does the compact shift-invariant phase model cause a smaller impact than that caused by harmonic magnitude modeling, but it also compares favorably to results presented in the literature.
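
To make "harmonic representation" concrete: in its simplest form, a voiced frame is described by a fundamental frequency f0 plus one amplitude and one phase per harmonic, and the waveform is rebuilt as a sum of cosines at integer multiples of f0. The sketch below is a plain time-domain additive synthesizer under the assumption of constant parameters over the frame. It is not the frame-based frequency-domain or pitch pulse-based synthesis described in the paper, and the function name is hypothetical.

```python
import math


def synthesize_harmonics(f0, amps, phases, fs, n_samples):
    """Sum of harmonically related cosines (hypothetical, constant-parameter frame).

    amps[k - 1] and phases[k - 1] are the amplitude and phase (radians) of
    harmonic k; f0 and fs are in Hz. Harmonics at or above Nyquist are skipped.
    """
    out = []
    for n in range(n_samples):
        t = n / fs
        s = 0.0
        for k, (a, p) in enumerate(zip(amps, phases), start=1):
            if k * f0 >= fs / 2.0:
                break  # higher harmonics would alias; they only get higher from here
            s += a * math.cos(2.0 * math.pi * k * f0 * t + p)
        out.append(s)
    return out


# Usage: one period of a 100 Hz tone with three harmonics at 8 kHz.
frame = synthesize_harmonics(100.0, [1.0, 0.5, 0.25], [0.0, 0.3, -0.7], 8000.0, 80)
print(len(frame))  # 80 samples = one period
```

In a fully parametric coder, the amplitude and phase lists would come from compact models (for instance, a spectral envelope for magnitudes and a shift-invariant phase model), rather than being transmitted one value per harmonic.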

CloseRead Abstract