Speech and Vision Lab

Segmentation of monologues in audio books for building synthetic voice

Research Area: Speech Synthesis
Year: 2011
Type of Publication: Article
Keywords: Audio books, forced alignment, large speech files, text to speech
Authors: Kishore S. Prahallad, A. W. Black




Abstract:
One of the issues in using audio books for building a synthetic voice is the segmentation of large speech files. Using the Viterbi algorithm to obtain phone boundaries on large audio files fails primarily because of its huge memory requirements. Earlier work has attempted to resolve this problem by using a large-vocabulary speech recognition system with a restricted dictionary and language model. In this paper, we propose suitable modifications to the Viterbi algorithm and demonstrate their usefulness for segmenting large speech files in audio books. The utterances obtained from these large speech files are used to build synthetic voices. We show that synthetic voices built from public-domain audio books have Mel-cepstral distortion scores in the range of 4–7, which is similar to those of voices built from studio-quality recordings such as CMU ARCTIC.
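
For readers unfamiliar with the metric, the following is a minimal sketch of how a Mel-cepstral distortion (MCD) score is commonly computed. The abstract does not state which variant the authors used, so this assumes the widely used definition: a per-frame value in dB, (10 / ln 10) * sqrt(2 * sum of squared coefficient differences), averaged over frames, with the 0th (energy) coefficient excluded. The function name and the assumption that the two sequences are already time-aligned are illustrative, not taken from the paper.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB between two time-aligned sequences of mel-cepstral vectors.

    Assumption: both inputs are lists of equal-length coefficient lists, one per
    frame, already aligned frame by frame (e.g. by dynamic time warping).
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("sequences must be non-empty and of equal length")

    # Constant factor of the common MCD definition: (10 / ln 10) * sqrt(2).
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Skip index 0, which carries overall energy rather than spectral shape.
        squared_diff = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared_diff)

    # Average the per-frame distortion over all frames.
    return total / len(ref_frames)
```

Lower values indicate that the synthesized cepstra are closer to the reference; identical sequences give a score of 0.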
