Speech and Vision Lab

Extraction and representation of prosodic features for language and speaker recognition

Research Area:	Uncategorized	Year:	2008
Type of Publication:	Article	Keywords:	Prosody; Vowel onset point; Intonation; Stress; Rhythm; Language recognition; Speaker recognition; Multilayer feedforward neural network; Autoassociative neural network
Authors:	Leena Mary, B. Yegnanarayana




Note:
http://www.sciencedirect.com/science/article/B6V1C-4SHVSRT-1/2/0120b8ce1cb86c9d44bb17c154aa7f88
Abstract:
In this paper, we propose a new approach for extracting and representing prosodic features directly from the speech signal. We hypothesize that prosody is linked to linguistic units such as syllables, and that it is manifested as changes in measurable parameters such as fundamental frequency (F0), duration and energy. In this work, the syllable-like unit is chosen as the basic unit for representing prosodic characteristics. An approximate segmentation of continuous speech into syllable-like units is obtained by automatically locating the vowel onset points (VOPs). Knowledge of the VOPs serves as a reference for extracting prosodic features from the speech signal. Quantitative parameters are used to represent the F0 and energy contours in each region between two consecutive VOPs. Prosodic features extracted using this approach may be useful in applications such as language or speaker recognition, where explicit phoneme/syllable boundaries are not easily available. The effectiveness of the derived prosodic features is evaluated for language recognition on the NIST language recognition evaluation 2003 and for speaker recognition on the extended data task of the NIST speaker recognition evaluation 2003.
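
The idea of summarizing the F0 and energy contours between consecutive VOPs can be illustrated with a minimal sketch. This is not the authors' parameterization: the function name `segment_prosodic_features`, the chosen statistics (mean, range, relative peak position), the 10 ms frame shift, and the convention that unvoiced frames carry F0 = 0 are all assumptions made for illustration. The VOP frame indices are assumed to be given; detecting them is outside the sketch.

```python
import numpy as np

def segment_prosodic_features(f0, energy, vops, frame_shift=0.01):
    """Summarize F0 and energy in each region between consecutive VOPs.

    f0, energy  : per-frame values of equal length
    vops        : increasing frame indices of vowel onset points (assumed given)
    frame_shift : seconds per frame (assumed 10 ms)
    """
    features = []
    for start, end in zip(vops[:-1], vops[1:]):
        seg_f0 = np.asarray(f0[start:end], dtype=float)
        seg_en = np.asarray(energy[start:end], dtype=float)
        voiced = seg_f0 > 0  # assumption: unvoiced frames are marked with F0 = 0
        if not voiced.any():
            continue  # skip regions with no voiced frames
        v = seg_f0[voiced]
        features.append({
            "duration_s": (end - start) * frame_shift,   # VOP-to-VOP duration
            "f0_mean": float(v.mean()),                  # average pitch level
            "f0_range": float(v.max() - v.min()),        # extent of pitch movement
            # position of the F0 peak as a fraction of the region length
            "f0_peak_pos": float(np.argmax(seg_f0)) / (end - start),
            "energy_mean": float(seg_en.mean()),
        })
    return features
```

Each returned dictionary describes one syllable-like region, so a whole utterance becomes a sequence of fixed-length feature vectors that a classifier could consume without explicit phoneme or syllable boundaries.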