Publications

Publications by Aníbal Ferreira

2020

Manipulation of the Fundamental Frequency Micro-Variations using a Fully Parametric and Computationally Efficient Speech Model

Authors
Silva, JP; Oliveira, MA; Cardoso, CF; Ferreira, AJ;

Publication
IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation

Abstract
In this paper, we present a computationally efficient and fully parametric harmonic speech model that is suitable for real-time flexible frame-based analysis and synthesis implementation in the frequency domain. We carry out a performance comparison between this vocoder and similar ones, such as WORLD and HPMD. Then, a deliberate manipulation of the speaker's fundamental frequency micro-variations is performed in order to understand in which way it conveys prosodic and idiosyncratic information. We conclude our discussion by evaluating the impact of these manipulations through the realization of perceptual tests. © 2020 IEEE.

2021

Flexible parametric implantation of voicing in whispered speech under scarce training data

Authors
Silva, J; Oliveira, M; Ferreira, A;

Publication
28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020)

Abstract
Whispered-voice to normal-voice conversion is typically achieved using codec-based analysis and re-synthesis, using statistical conversion of important spectral and prosodic features, or using data-driven end-to-end signal conversion. These approaches are however highly constrained by the architecture of the codec, the statistical projection, or the size and quality of the training data. In this paper, we presume direct implantation of voiced phonemes in whispered speech and we focus on fully flexible parametric models that i) can be independently controlled, ii) synthesize natural and linguistically correct voiced phonemes, iii) preserve idiosyncratic characteristics of a given speaker, and iv) are amenable to co-articulation effects through simple model interpolation. We use natural spoken and sung vowels to illustrate these capabilities in a signal modeling and re-synthesis process where spectral magnitude, phase structure, F0 contour and sound morphing can be independently controlled in arbitrary ways.

2015

Frequency-domain parametric coding of wideband speech - A first validation model

Authors
Ferreira, A; Sinha, D;

Publication
139th Audio Engineering Society International Convention, AES 2015

Abstract
Narrow band parametric speech coding and wideband audio coding represent opposite coding paradigms involving audible information, namely in terms of the specificity of the audio material, target bit rates, audio quality and application scenarios. In this paper we explore a new avenue addressing parametric coding of wideband speech, using the potential and accuracy provided by frequency-domain signal analysis and modeling techniques that typically belong to the realm of high-quality audio coding. A first analysis-synthesis validation framework is described that illustrates the decomposition, parametric representation and synthesis of perceptually and linguistically relevant speech components while preserving naturalness and speaker specific information.

2022

Simple and effective signal processing pinpointing subtle premature ventricular contractions inferred from increasing physical effort

Authors
Ferreira, AJS;

Publication
2022 13th International Symposium on Communication Systems, Networks and Digital Signal Processing, CSNDSP 2022

Abstract
Premature ventricular contractions (PVC), or extrasystoles, represent a type of cardiac arrhythmia that is common among the general population and, notably, among athletes or individuals who exercise frequently. PVC may be asymptomatic and not clinically relevant when their rate is low, up to around 0.5%, or may be symptomatic and clinically relevant when it is high, in the order of or above 10%. ECG analysis in association with a cardiac stress test is important to detect and characterize PVC and to diagnose the heart condition and operation. In this paper, we describe and test a simple signal processing approach that can be used to effectively pinpoint subtle PVC occurrences in various physical effort conditions. In this regard, we discuss i) three important conditions to be met such that PVC are categorized as benign, ii) the design and implementation of a cardiac stress test and ECG data collection, iii) the algorithm analyzing and extracting information from the detected PVC occurrences, and iv) we present and discuss the obtained results, and conclude on their significance. © 2022 IEEE.

2023

Discriminative segmental cues to vowel height and consonantal place and voicing in whispered speech

Authors
Jesus, LMT; Castilho, S; Ferreira, A; Costa, MC;

Publication
JOURNAL OF PHONETICS

Abstract
Purpose: The acoustic signal attributes of whispered speech potentially carry sufficiently distinct information to define vowel spaces and to disambiguate consonant place and voicing, but what these attributes are and the underlying production mechanisms are not fully known. The purpose of this study was to define segmental cues to place and voicing of vowels and sibilant fricatives and to develop an articulatory interpretation of acoustic data. Method: Seventeen speakers produced sustained sibilants and oral vowels, disyllabic words, sentences and read a phonetically balanced text. All the tasks were repeated in voiced and whispered speech, and the sound source and filter analysed using the following parameters: Fundamental frequency, spectral peak frequencies and levels, spectral slopes, sound pressure level and durations. Logistic linear mixed-effects models were developed to understand what acoustic signal attributes carry sufficiently distinct information to disambiguate /i, a/ and /s, ʃ/. Results: Vowels were produced with significantly different spectral slope, sound pressure level, first and second formant frequencies in voiced and whispered speech. The low frequencies spectral slope of voiced sibilants was significantly different between whispered and voiced speech. The odds of choosing /a/ instead of /i/ were estimated to be lower for whispered speech when compared to voiced speech. Fricatives' broad peak frequency was statistically significant when discriminating between /s/ and /ʃ/. Conclusions: First formant frequency and relative duration of vowels are consistently used as height cues, and spectral slope and broad peak frequency are attributes associated with consonantal place of articulation. The relative duration of same-place voiceless fricatives was higher than voiced fricatives both in voiced and whispered speech. The evidence presented in this paper can be used to restore voiced speech signals, and to inform rehabilitation strategies that can safely explore the production mechanisms of whispering. © 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

2010

Singing voice resynthesis using vocal sound libraries

Authors
Fonseca, N; Ferreira, A;

Publication
13th International Conference on Digital Audio Effects, DAFx 2010 Proceedings

Abstract
Although resynthesis may seem a simple analysis/synthesis process, it is a quite complex task, even more when it comes to recreating a singing voice. This paper presents a system whose goal is to start with an original audio stream of someone singing and recreate the same performance (melody, phonetics, dynamics) using an internal vocal sound library (choir or solo voice). By extracting dynamics and pitch information, and looking for phonetic similarities between the original audio frames and the frames of the sound library, a completely new audio stream is created. The obtained audio results, although not perfect (mainly due to the existence of audio artifacts), show that this technological approach may become an extremely powerful audio tool.
