2007
Authors
Annadana, R; Harinarayanan, EV; Sinha, D; Ferreira, A;
Publication
Audio Engineering Society - 123rd Audio Engineering Society Convention 2007
Abstract
Low bit rate audio coding often results in the loss of a number of key audio attributes such as audio bandwidth and stereo separation. Additionally, there is typically a loss of detail, intelligibility, and/or warmth in the signal. Due to the proliferation, e.g. on the Internet, of low bit rate audio coded using a variety of coding schemes and bit rates over which the listener has no control, it is becoming increasingly attractive to incorporate processing tools in the player which can ensure a consistent listener experience. We describe a novel post-processing toolkit which incorporates tools for (i) Stereo Enhancement, (ii) Blind Bandwidth Extension, (iii) Automatic Noise Removal and Audio Enhancement, and (iv) Blind 2-to-5 channel upmixing. Algorithmic details, listening results, and audio demonstrations are presented.
2008
Authors
Sousa, R; Ferreira, A;
Publication
New Trends in Audio and Video - Signal Processing: Algorithms, Architectures, Arrangements, and Applications, NTAV / SPA 2008 - Conference Proceedings
Abstract
This paper evaluates several methods for estimating the Harmonic-to-Noise Ratio (HNR) of sustained vowels. The HNR estimation methods are mainly based on time, spectral, and cepstral signal representations. An algorithm was implemented for each method and tested with synthesized voice sounds in order to evaluate its accuracy. Tests were also conducted with real pathological voice sounds in order to evaluate the behaviour of the different methods under real conditions. © 2008 Division of Signal Processin.
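As a rough illustration of what a time-domain HNR estimate can look like, the sketch below uses the peak of the normalized autocorrelation within a plausible pitch-lag range and converts it to decibels as 10·log10(r / (1 − r)). This is a generic textbook-style approach and is not taken from the paper; the function names (`synth_tone`, `normalized_corr`, `estimate_hnr`), the default pitch range, and the clipping constant are all assumptions made for the example.

```python
import math
import random


def synth_tone(f0, fs, n, noise_std=0.0, seed=0):
    """Hypothetical test signal: unit-amplitude sine at f0 plus Gaussian noise."""
    rng = random.Random(seed)
    return [math.sin(2.0 * math.pi * f0 * k / fs) + rng.gauss(0.0, noise_std)
            for k in range(n)]


def normalized_corr(x, lag):
    """Normalized correlation between x and a copy of itself shifted by `lag`."""
    a = x[:len(x) - lag]
    b = x[lag:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0


def estimate_hnr(x, fs, f0_min=75.0, f0_max=500.0):
    """Return (HNR in dB, f0 estimate in Hz) from the autocorrelation peak.

    Assumption: the frame is voiced and its pitch lies in [f0_min, f0_max].
    """
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(len(x) - 1, int(fs / f0_min))
    if lag_min > lag_max:
        raise ValueError("signal too short for the requested pitch range")
    # The lag with the highest normalized correlation is taken as the period.
    best_lag = max(range(lag_min, lag_max + 1),
                   key=lambda lag: normalized_corr(x, lag))
    r = normalized_corr(x, best_lag)
    # Clip so that a perfectly periodic (or anti-correlated) frame stays finite.
    r = min(max(r, 1e-6), 1.0 - 1e-6)
    return 10.0 * math.log10(r / (1.0 - r)), fs / best_lag
```

Calling `estimate_hnr(synth_tone(200.0, 8000, 800, noise_std=0.3), 8000)` returns a pair of HNR (dB) and pitch (Hz). A noise-free tone saturates at the clipping limit, which is one reason practical estimators usually add windowing and bias correction that this sketch omits.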
2011
Authors
Sousa, R; Ferreira, A; Alku, P;
Publication
Models and Analysis of Vocal Emissions for Biomedical Applications - 7th International Workshop, MAVEBA 2011
Abstract
This paper describes an algorithm which enables harmonic and noise splitting of the glottal excitation of voiced speech. The algorithm applies a straightforward harmonic and noise splitter prior to glottal inverse filtering. The results show improved estimates of the glottal excitation in comparison to a known inverse filtering method.
2012
Authors
Mendes, D; Ferreira, A;
Publication
Proceedings of the AES International Conference
Abstract
Current state-of-the-art speaker identification systems achieve high performance in reasonably well controlled conditions. However, some scenarios still pose significant challenges, particularly in audio forensics, where voice records are typically just a few seconds long and are severely affected by distortion, interference, and abnormal speaking attitudes. In this paper we are inspired by the concept of minutiae in the context of fingerprinting, and try to extract localized, phase-related singularities from the speech signal denoting glottal source idiosyncratic information. First, we perform MFCC+GMM experiments in order to find the most effective phonetic segmentation of the speech signal for speaker modelling and discrimination. Secondly, we rely on effective phonetic segmentation and, in addition to MFCC features, we extract Normalized Relative Delays (NRDs) obtained from the phase of spectral harmonics. We use a Nearest Neighbour generalized classifier for speaker modelling and identification. Our results indicate that by combining a careful phonetic segmentation with the inclusion of phase-related information, performance in speaker identification may increase significantly. Copyright © 2012 Audio Engineering Society, Inc.
2005
Authors
Ferreira, AJS; Sinha, D;
Publication
Audio Engineering Society - 118th Convention Spring Preprints 2005
Abstract
Recent advances in perceptual audio coding are strongly based on the concept of bandwidth extension. Most techniques implementing bandwidth extension require an analysis/synthesis filter bank in addition to that used by the associated perceptual audio coder, which increases the overall system complexity and coding delay, and makes it difficult to correctly align the operation of the audio coder with the operation of the bandwidth extension technique. We present a new Accurate Spectral Replacement (ASR) technique that is based on a suitable decomposition of the MDCT filter bank, and that implements synthesis of sinusoidal components with an accuracy much higher than the natural frequency resolution of the filter bank. The ASR technique is described, its performance is assessed with both synthetic and natural audio signals, and its main areas of application are addressed. Audio demos are available at http://www.atc-labs.com/asr/.
2005
Authors
Ferreira, AJS; Sinha, D;
Publication
Audio Engineering Society - 119th Convention Fall Preprints 2005
Abstract
High-quality audio bit-rate reduction systems are widely used in many application areas involving audio broadcast, streaming and download services. With the advent of 3G mobile and wireless communication networks, there is a clear opportunity for new multimedia services, notably those relying on two-way high-quality audio communication. In this paper we describe a new source/perceptual audio coder that features low delay, intrinsic error robustness and high subjective audio quality at competitive compression ratios. The structure of the audio coder is described, with emphasis on its innovative approaches to semantic signal segmentation and decomposition, independent coding of sinusoidal and noise components, and bandwidth extension using Accurate Spectral Replacement. A few test results are presented that illustrate the operation and performance of the new coder.