Speech and Vision Lab

Intonation modeling for Indian languages
Research Area: Uncategorized
Year: 2009
Type of Publication: Article
Keywords: Intonation models; Feedforward neural network; Prediction accuracy; F0 of syllable; Classification and regression tree models
Authors: K. Sreenivasa Rao, B. Yegnanarayana  
In this paper, we propose models for predicting the intonation of the sequence of syllables in an utterance. The term intonation refers to the temporal changes of the fundamental frequency (F0). Neural networks are used to capture the implicit intonation knowledge in the sequence of syllables of an utterance. We focus on developing intonation models that predict the sequence of F0 values for a given sequence of syllables. Labeled broadcast news data in Hindi, Telugu and Tamil are used to develop neural network models that predict the F0 of syllables in these languages. The input to the neural network is a feature vector representing the positional, contextual and phonological constraints. The interaction between duration and intonation constraints can be exploited to improve the accuracy further. From these studies, we find that the models predict 88% of the syllable F0 values (pitch) within 15% of the actual F0. The performance of the intonation models is evaluated using objective measures such as average prediction error (μ), standard deviation (σ) and correlation coefficient (γ). The prediction accuracy is further evaluated using listening tests. The prediction performance of the proposed neural network models is compared with that of Classification and Regression Tree (CART) models.
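The objective measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give their exact definitions, so the code below assumes that μ is the mean absolute difference between actual and predicted F0, σ is the standard deviation of those absolute differences, γ is the Pearson correlation between actual and predicted values, and the "within 15%" figure is the share of syllables whose absolute error is at most 15% of the actual F0. The function name `evaluate_f0` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def evaluate_f0(actual, predicted, tolerance=0.15):
    """Compute assumed objective measures for predicted syllable F0 values (Hz)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    n = len(actual)

    # Absolute prediction error per syllable (assumed definition of the error).
    errors = [abs(a - p) for a, p in zip(actual, predicted)]

    # mu: average prediction error; sigma: standard deviation of the errors.
    mu = sum(errors) / n
    sigma = math.sqrt(sum((e - mu) ** 2 for e in errors) / n)

    # gamma: Pearson correlation between actual and predicted F0.
    # Note: this divides by zero if either sequence is constant.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    gamma = cov / (std_a * std_p)

    # Percentage of syllables predicted within `tolerance` of the actual F0.
    hits = sum(1 for a, e in zip(actual, errors) if e <= tolerance * a)
    percent_within = 100.0 * hits / n

    return {"mu": mu, "sigma": sigma, "gamma": gamma, "percent_within": percent_within}


# Made-up F0 values for four syllables, for illustration only.
actual_f0 = [100.0, 200.0, 150.0, 120.0]
predicted_f0 = [110.0, 190.0, 180.0, 120.0]
print(evaluate_f0(actual_f0, predicted_f0))
```

Under these assumptions, a model matching the reported result would yield a `percent_within` value of about 88 at the default tolerance of 0.15.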