2005
Authors
Sinha, D; Ferreira, AJS;
Publication
Audio Engineering Society - 119th Convention Fall Preprints 2005
Abstract
In this paper we describe the components of a novel audio coding algorithm capable of delivering high-fidelity, CD-like stereo audio at bit rates of 40-48 kbps and natural-sounding, FM-grade mono audio at bit rates of 18-22 kbps. Bandwidth extension has emerged as an important tool for the satisfactory performance of low bit rate audio codecs. We recently proposed two new bandwidth extension algorithms, Fractal Self-Similarity Model (FSSM) and Accurate Spectral Replacement (ASR), which belong to a new class of bandwidth extension techniques applied directly to a high-resolution frequency representation of the signal (e.g., MDCT or ODFT). The proposed coding scheme uses FSSM and ASR in an adaptive and complementary framework. Another important component of the proposed codec is a wideband psychoacoustic model that makes explicit use of the Comodulation Release of Masking (CMR) phenomenon. The codec also includes a novel parametric stereo coding technique. The proposed audio coding scheme is geared towards broadcast applications, where codec latency and encoder complexity are generally not overriding concerns. In this paper we present algorithmic details of the new codec, audio demonstrations, and comparisons with other audio coding schemes. Further information and audio demonstrations are available at http://www.atc-labs.com/teslapro.
2005
Authors
Sinha, D; Ferreira, AJS;
Publication
Audio Engineering Society - 119th Convention Fall Preprints 2005
Abstract
In this paper we describe a new family of smooth, power-complementary windows that exhibit a very high level of localization in both the time and frequency domains. This window family is parameterized by a "smoothness quotient". As the smoothness quotient increases, the window becomes increasingly localized in time (most of the energy is concentrated in the center half of the window) and in frequency (far-field rejection becomes increasingly strong, on the order of 150 dB or higher). A closed-form solution for such window functions exists, and the associated design procedure is described. The new class of windows is attractive for a number of applications, such as switching functions, equalization functions, or windows for overlap-add and modulated filter banks. An extension to the family of smooth windows that exhibits an improved near-field response in the frequency domain is also discussed. More information is available at http://www.atc-labs.com/technology/misc/windows.
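The abstract does not reproduce the closed-form window, so as a hedged illustration of the power-complementary property it mentions, the sketch below uses the standard sine window instead. This is a well-known substitute, not the authors' smoothness-quotient family, and the function names are hypothetical. For a 50%-overlap filter bank the condition checked is w[n]^2 + w[n + N/2]^2 = 1; a Hann window is included for contrast because it is complementary in amplitude rather than in power.

```python
import math

def sine_window(length):
    # Standard sine window; NOT the smoothness-quotient family from the paper.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def hann_window(length):
    # Hann window sampled at half-sample offsets, for contrast:
    # w[n] + w[n + N/2] = 1 (amplitude-complementary), but not in power.
    return [math.sin(math.pi * (n + 0.5) / length) ** 2 for n in range(length)]

def is_power_complementary(w, tol=1e-12):
    # Check w[n]^2 + w[n + N/2]^2 == 1 over the first half of an even-length window.
    half = len(w) // 2
    return all(abs(w[n] ** 2 + w[n + half] ** 2 - 1.0) <= tol for n in range(half))

print(is_power_complementary(sine_window(256)))  # True
print(is_power_complementary(hann_window(256)))  # False
```

The sine window passes because its second half is a cosine of the same argument, so the squared halves sum to one; the smooth windows described in the paper are reported to satisfy the same property with much stronger time and frequency localization.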
2005
Authors
Ferreira, AJS;
Publication
9th European Conference on Speech Communication and Technology
Abstract
Current signal processing techniques do not match the astonishing ability of the human auditory system to recognize isolated vowels, particularly in the case of female or child speech. Because didactic and clinical interactive applications that use sound as the main medium of interaction are needed, new signal features must be used that capture important perceptual cues more effectively than popular features such as formants. In this paper we propose the new concept of the Perceptual Spectral Cluster (PSC) and describe its implementation. Test results are presented for child and adult speech and indicate that features derived from the PSC concept permit reliable and robust identification of vowels, even at high pitches.
2006
Authors
Ferreira, AJS; Sinha, D;
Publication
Audio Engineering Society - 120th Convention Spring Preprints 2006
Abstract
3G mobile and wireless communication networks elicit new ways of multimedia human interaction and communication, notably two-way, high-quality audio communication. This is in line with both consumer expectations of new audio experiences and functionalities and the motivation of telecom operators to offer consumers new services and communication modalities. In this paper we describe the design and optimization of a monophonic audio coder (Audio Communication Coder, ACC) that features low-delay coding (< 50 ms) and intrinsic error robustness, while minimizing complexity and achieving competitive coding gains and audio quality at bit rates of around 32 kbit/s and higher. The ACC source, perceptual, and bandwidth extension tools are described, with emphasis on the structural and operational features that make ACC suitable for real-time, two-way audio communication. A few performance results are also presented. Audio demos are available at http://www.atc-labs.com/acc/.
2008
Authors
Ferreira, A;
Publication
SIGMAP 2008: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MULTIMEDIA APPLICATIONS
Abstract
Vowel recognition is frequently based on Linear Prediction (LP) analysis and formant estimation techniques. However, the performance of these techniques decreases for female or child speech because, at high pitch frequencies (F0), the magnitude spectrum is sparsely sampled, making formant estimation unreliable. In this paper we describe the implementation of a perceptually motivated approach to vowel recognition that is based on Perceptual Spectral Clusters (PSC) of harmonic partials. PSC-based features were evaluated in automatic recognition tests using the Mahalanobis distance and a database of five natural Portuguese vowel sounds uttered by 44 speakers, 27 of whom are children. LP-based features and Mel-Frequency Cepstral Coefficients (MFCCs) were also included in the tests as a reference. Results show that while the recognition performance of PSC features falls between that of LP-based features and that of MFCCs, normalizing the PSC features by F0 increases their performance, which then approaches that of MFCCs. PSC features are not only amenable to a psychophysical interpretation (as LP-based features are) but also have the potential to compete with global shape features such as MFCCs.
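The recognition tests above rely on the Mahalanobis distance. A minimal minimum-distance classifier along those lines might look like the sketch below. It assumes that each vowel class is modeled by its own mean vector and covariance matrix, which the abstract does not specify; the feature vectors stand in for PSC, LP-based, or MFCC features, and the two-dimensional data and labels are synthetic, not the Portuguese vowel database. All function names are hypothetical.

```python
import numpy as np

def fit_classes(features, labels):
    # Assumption: one mean vector and one covariance matrix per class.
    models = {}
    for c in np.unique(labels):
        x = features[labels == c]
        mean = x.mean(axis=0)
        cov = np.cov(x, rowvar=False)
        # Pseudo-inverse guards against near-singular covariance estimates.
        models[c] = (mean, np.linalg.pinv(cov))
    return models

def mahalanobis_sq(x, mean, inv_cov):
    # Squared Mahalanobis distance (x - mean)^T C^{-1} (x - mean).
    d = x - mean
    return float(d @ inv_cov @ d)

def classify(x, models):
    # Return the class label whose model is closest in Mahalanobis distance.
    return min(models, key=lambda c: mahalanobis_sq(x, *models[c]))

# Synthetic, illustrative data: two well-separated 2-D clusters.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
                      rng.normal(5.0, 1.0, size=(100, 2))])
labels = np.array(["a"] * 100 + ["e"] * 100)
models = fit_classes(features, labels)
print(classify(np.array([5.0, 5.0]), models))  # e
```

Using a per-class covariance lets each vowel cluster have its own spread and orientation in feature space; a pooled covariance shared by all classes would be a simpler alternative.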
2011
Authors
Sousa, R; Ferreira, A;
Publication
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5
Abstract
In this paper we introduce new phase-related features denoting the delay between the harmonics and the fundamental frequency of a periodic signal, notably of voiced singing. These features are identified as Normalized Relative Delay (NRD) and denote the phase contribution to the shape invariance of a periodic signal. Thus, NRDs are amenable to a physical and psychophysical interpretation and are structurally independent of the overall time shift of the signal, an important property that is shared with the magnitude spectrum in the case of a locally stationary signal. We describe the NRD and report on preliminary studies testing the discrimination capability of NRDs applied to singing signals.
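The abstract does not give the NRD formula. The sketch below shows one plausible formulation that has the stated property of independence from an overall time shift; it may differ from the authors' definition, and all names and signal parameters are hypothetical. A time shift changes the phase of harmonic k by k times the phase change of the fundamental, so phi_k - k*phi_1, taken modulo 2*pi and normalized to [0, 1), is unchanged.

```python
import numpy as np

def harmonic_phases(x, bin_step, n_harmonics):
    # Phase of each harmonic, read from the DFT bin at k * bin_step.
    # Assumes an integer number of periods per frame, so each harmonic
    # falls exactly on a bin (no leakage).
    spectrum = np.fft.rfft(x)
    return np.angle(spectrum[bin_step * np.arange(1, n_harmonics + 1)])

def relative_delay_features(phases):
    # Hypothetical NRD-like feature for harmonics 2..K:
    # (phi_k - k * phi_1) / (2*pi), wrapped to [0, 1).
    k = np.arange(1, len(phases) + 1)
    return np.mod((phases - k * phases[0]) / (2 * np.pi), 1.0)[1:]

def circular_diff(a, b):
    # Distance between values on the unit circle [0, 1).
    d = np.abs(a - b) % 1.0
    return np.minimum(d, 1.0 - d)

fs, n_samples, f0, n_harm = 8000, 800, 100, 5  # 10 periods per frame
rng = np.random.default_rng(1)
phi = rng.uniform(-np.pi, np.pi, n_harm)
n = np.arange(n_samples)
x = sum((1.0 / k) * np.cos(2 * np.pi * k * f0 * n / fs + phi[k - 1])
        for k in range(1, n_harm + 1))
y = np.roll(x, 37)  # circular time shift by 37 samples

bin_step = f0 * n_samples // fs  # = 10
fx = relative_delay_features(harmonic_phases(x, bin_step, n_harm))
fy = relative_delay_features(harmonic_phases(y, bin_step, n_harm))
print(np.max(circular_diff(fx, fy)) < 1e-9)  # True
```

The raw harmonic phases of x and y differ, but the relative features agree, which mirrors the shift-invariance the abstract attributes to NRDs. Real singing signals would require F0 estimation and windowed, non-bin-aligned phase estimation, which this sketch deliberately avoids.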