|Type of Publication:||Article|
|Authors:||G. V. Ramana Rao, B. Yegnanarayana|
|Journal:||Computer Speech & Language||Volume:||5|
|Number:||4||Pages:||379 - 392|
This paper proposes a method for hypothesizing word boundaries in Hindi speech. The method rests on the observation that function words such as case markers, pronouns, and conjunctions occur frequently in Hindi text; spotting these frequently occurring patterns is proposed as a means of hypothesizing word boundaries in a speech-to-text conversion system for Hindi. The idea was first tested on correct text with all word boundaries (except sentence boundaries) removed; the results showed that nearly 67% of the word boundaries were correctly hypothesized. Later experiments, with input containing errors simulated to represent a speech environment, showed that the proposed method remains effective even at error levels as high as 50%. The implications of these results for the development of a speech-to-text conversion system for Hindi are discussed.
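The function-word spotting idea can be illustrated with a minimal sketch. The abstract does not give the authors' function-word list, symbol representation, or matching procedure, so everything below is an assumption made for illustration: the romanized example sentence, the two-entry word list (`ne`, `ko`), exact substring matching, and the helper names `strip_boundaries` and `hypothesize_boundaries` are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, Set, Tuple


def strip_boundaries(sentence: str) -> Tuple[str, Set[int]]:
    """Remove spaces from a sentence and record where the boundaries were.

    Returns the unbroken symbol stream and the set of character offsets
    at which a word boundary originally fell (sentence edges excluded).
    """
    words = sentence.split()
    true_boundaries: Set[int] = set()
    position = 0
    for word in words[:-1]:
        position += len(word)
        true_boundaries.add(position)
    return "".join(words), true_boundaries


def hypothesize_boundaries(stream: str, function_words: Iterable[str]) -> Set[int]:
    """Hypothesize a boundary before and after every spotted function word.

    Assumption: spotting is plain exact substring matching. Offsets at the
    very start or end of the stream are skipped, since sentence boundaries
    are treated as already known.
    """
    hypotheses: Set[int] = set()
    for word in function_words:
        start = stream.find(word)
        while start != -1:
            end = start + len(word)
            for offset in (start, end):
                if 0 < offset < len(stream):
                    hypotheses.add(offset)
            start = stream.find(word, start + 1)
    return hypotheses


if __name__ == "__main__":
    # Hypothetical romanized example; "ne" and "ko" are case markers.
    stream, truth = strip_boundaries("raam ne siitaa ko phal diyaa")
    guessed = hypothesize_boundaries(stream, ["ne", "ko"])
    hits = guessed & truth
    print(sorted(guessed))                      # [4, 6, 12, 14]
    print(len(hits), "of", len(truth))          # 4 of 5
    print("false alarms:", len(guessed - truth))  # false alarms: 0
```

In this toy sentence the boundary between the two content words `phal` and `diyaa` is missed because no function word is adjacent to it. With a larger word list, exact substring matching would also fire inside content words and produce false alarms, and it would need to be relaxed to approximate matching before it could tolerate the kind of erroneous input the abstract describes.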