Publications

2021

Flexible parametric implantation of voicing in whispered speech under scarce training data

Authors
Silva, J; Oliveira, M; Ferreira, A;

Publication
European Signal Processing Conference

Abstract
Whispered-voice to normal-voice conversion is typically achieved using codec-based analysis and re-synthesis, using statistical conversion of important spectral and prosodic features, or using data-driven end-to-end signal conversion. These approaches are, however, highly constrained by the architecture of the codec, the statistical projection, or the size and quality of the training data. In this paper, we presume direct implantation of voiced phonemes in whispered speech and we focus on fully flexible parametric models that i) can be independently controlled, ii) synthesize natural and linguistically correct voiced phonemes, iii) preserve idiosyncratic characteristics of a given speaker, and iv) are amenable to co-articulation effects through simple model interpolation. We use natural spoken and sung vowels to illustrate these capabilities in a signal modeling and re-synthesis process where spectral magnitude, phase structure, F0 contour and sound morphing can be independently controlled in arbitrary ways.

2020

Impact of a shift-invariant harmonic phase model in fully parametric harmonic voice representation and time/frequency synthesis

Authors
Ferreira, A; Silva, J; Brito, F; Sinha, D;

Publication
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Abstract
Harmonic representation models are widely used, notably in speech coding and synthesis. In this paper, we describe two fully parametric harmonic representation and signal reconstruction alternatives that rely on a shift-invariant harmonic phase model and that implement accurate frame-based synthesis in the frequency-domain, and accurate pitch pulse-based synthesis in the time-domain. We use natural spoken and sung voice signals in order to assess the objective and subjective quality of both alternatives when parameters are exact, and when they are replaced by compact and shift-invariant harmonic phase and magnitude approximation models. We highlight the flexibility of these models and present results indicating that not only does the compact shift-invariant phase model cause a smaller impact than that caused by harmonic magnitude modeling, but it also compares favorably to results presented in the literature. © 2020 IEEE

2020

Manipulation of the Fundamental Frequency Micro-Variations using a Fully Parametric and Computationally Efficient Speech Model

Authors
Silva, JP; Oliveira, MA; Cardoso, CF; Ferreira, AJ;

Publication
IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation

Abstract
In this paper, we present a computationally efficient and fully parametric harmonic speech model that is suitable for real-time flexible frame-based analysis and synthesis implementation in the frequency domain. We carry out a performance comparison between this vocoder and similar ones, such as WORLD and HPMD. Then, a deliberate manipulation of the speaker's fundamental frequency micro-variations is performed in order to understand in which way it conveys prosodic and idiosyncratic information. We conclude our discussion by evaluating the impact of these manipulations through the realization of perceptual tests. © 2020 IEEE.

2019

Phonetic-oriented identification of twin speakers using 4-second vowel sounds and a combination of a shift-invariant phase feature (NRD), MFCCs and F0 information

Authors
Ferreira, AJ;

Publication
Proceedings of the AES International Conference

Abstract
Automatic speaker identification typically relies on sophisticated statistical modeling and classification, which requires large amounts of data for good performance. However, in actual audio forensics casework, frequently only a few seconds of speech material are available. In this paper, we favor diversity in feature extraction, simple modeling and classification, and constructive combination of congruent classification scores. We use phase, spectral magnitude and F0-related features in speaker identification experiments on a database of 35 speakers, most of whom are twins. Using only 4.4 sec. of vowel-like sounds per speaker, we characterize the performance that is reached with individual features and we characterize simple and yet effective ways of classification score fusion. Insights for further research are also presented.

2018

Acoustic analysis of voice signal: Comparison of four applications software

Authors
Vaz Freitas, S; Pestana, PM; Almeida, V; Ferreira, A;

Publication
Biomedical Signal Processing and Control

Abstract
Objectives: To describe the results of the acoustic analysis of a database of 90 voice samples with distinct dysphonia levels, using four different - commercial and open source - software programs. Study design: Exploratory, transversal. Methods: The samples were analyzed by four different types of software programs that perform acoustical evaluation - one open source software (Praat) and three commercial ones (Multi Dimensional Voice Program - MDVP by Kay Elemetrics; VoiceStudio by Seegnal; and Dr. Speech by Tiger Electronics) - for comparison among the most commonly used acoustic measures (frequency, perturbation and noise measures). Results: There is a moderate to strong, positive and statistically significant correlation among the software programs. The mean F0 is not statistically different among the applications used. The other acoustic measures revealed statistically significant differences. Conclusion: Even though it is easier to access software programs and there are numerous proposals for acoustic measures, not all of them are statistically representative nor have numeric semblance among the different applications.

Supervised theses

2020

Accurate glottal source estimation and modelling

Author
Bruno Miguel Silva Santos

Institution
UP-FEUP

2020

Dysphonic to natural voice reconstruction based on adaptive phonetic segmentation and synthetic implantation

Author
João Miguel Pinto Pereira da Silva

Institution
UP-FEUP

2020

Modelização precisa de filtro de trato vocal para reconstrução de voz disfónica

Author
Marco António da Mota Oliveira

Institution
UP-FEUP

2020

Modelização de filtro de trato vocal para reconstrução de voz disfónica

Author
Marco António da Mota Oliveira

Institution
UP-FEUP

2019

Adaptation of an Harp for MIDI Implementation and Sound Amplification

Author
João Miguel Almeida Beleza

Institution
UP-FEUP