Publications

2021

Flexible parametric implantation of voicing in whispered speech under scarce training data

Authors
Silva, J; Oliveira, M; Ferreira, A

Publication
European Signal Processing Conference

Abstract
Whispered-voice to normal-voice conversion is typically achieved using codec-based analysis and re-synthesis, using statistical conversion of important spectral and prosodic features, or using data-driven end-to-end signal conversion. These approaches are however highly constrained by the architecture of the codec, the statistical projection, or the size and quality of the training data. In this paper, we presume direct implantation of voiced phonemes in whispered speech and we focus on fully flexible parametric models that i) can be independently controlled, ii) synthesize natural and linguistically correct voiced phonemes, iii) preserve idiosyncratic characteristics of a given speaker, and iv) are amenable to co-articulation effects through simple model interpolation. We use natural spoken and sung vowels to illustrate these capabilities in a signal modeling and re-synthesis process where spectral magnitude, phase structure, F0 contour and sound morphing can be independently controlled in arbitrary ways.

2020

Impact of a shift-invariant harmonic phase model in fully parametric harmonic voice representation and time/frequency synthesis

Authors
Ferreira, A; Silva, J; Brito, F; Sinha, D

Publication
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Abstract
Harmonic representation models are widely used, notably in speech coding and synthesis. In this paper, we describe two fully parametric harmonic representation and signal reconstruction alternatives that rely on a shift-invariant harmonic phase model and that implement accurate frame-based synthesis in the frequency-domain, and accurate pitch pulse-based synthesis in the time-domain. We use natural spoken and sung voice signals in order to assess the objective and subjective quality of both alternatives when parameters are exact, and when they are replaced by compact and shift-invariant harmonic phase and magnitude approximation models. We highlight the flexibility of these models and present results indicating that not only does the compact shift-invariant phase model cause a smaller impact than that caused by harmonic magnitude modeling, but it also compares favorably to results presented in the literature. © 2020 IEEE

2020

Manipulation of the Fundamental Frequency Micro-Variations using a Fully Parametric and Computationally Efficient Speech Model

Authors
Silva, JP; Oliveira, MA; Cardoso, CF; Ferreira, AJ

Publication
IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation

Abstract
In this paper, we present a computationally efficient and fully parametric harmonic speech model that is suitable for real-time flexible frame-based analysis and synthesis implementation in the frequency domain. We carry out a performance comparison between this vocoder and similar ones, such as WORLD and HPMD. Then, a deliberate manipulation of the speaker's fundamental frequency micro-variations is performed in order to understand in which way it conveys prosodic and idiosyncratic information. We conclude our discussion by evaluating the impact of these manipulations through the realization of perceptual tests. © 2020 IEEE.

2019

Phonetic-oriented identification of twin speakers using 4-second vowel sounds and a combination of a shift-invariant phase feature (NRD), MFCCs and F0 information

Authors
Ferreira, AJ

Publication
Proceedings of the AES International Conference (2019 AES International Conference on Audio Forensics)

Abstract
Automatic speaker identification typically relies on sophisticated statistical modeling and classification which requires large amounts of data for good performance. However, in actual audio forensics casework, frequently only a few seconds of speech material are available. In this paper, we favor diversity in feature extraction, simple modeling and classification, and constructive combination of congruent classification scores. We use phase, spectral magnitude and F0-related features in speaker identification experiments on a database of 35 speakers, most of whom are twins. Using only 4.4 sec. of vowel-like sounds per speaker, we characterize the performance that is reached with individual features and we characterize simple and yet effective ways of classification score fusion. Insights for further research are also presented.

Supervised theses

2020

Modelização precisa de filtro de trato vocal para reconstrução de voz disfónica (Accurate vocal tract filter modelling for dysphonic voice reconstruction)

Author
Marco António da Mota Oliveira

Institution
UP-FEUP

2020

Modelização de filtro de trato vocal para reconstrução de voz disfónica (Vocal tract filter modelling for dysphonic voice reconstruction)

Author
Marco António da Mota Oliveira

Institution
UP-FEUP

2020

Accurate glottal source estimation and modelling

Author
Bruno Miguel Silva Santos

Institution
UP-FEUP

2020

Dysphonic to natural voice reconstruction based on adaptive phonetic segmentation and synthetic implantation

Author
João Miguel Pinto Pereira da Silva

Institution
UP-FEUP

2019

AutoSpeech: Automatic Speech Analysis of Verbal Fluency for Older Adults

Author
João António Fernandes da Costa

Institution
UP-FEUP