Speech and Vision Lab

Spectral mapping using artificial neural networks for voice conversion
Research Area: Uncategorized
Year: 2010
Type of Publication: Article
Keywords: Artificial neural networks, cross-lingual, error correction, speaker-specific characteristics, spectral mapping, voice conversion
Authors: Srinivas Desai, A. W. Black, B. Yegnanarayana, Kishore S. Prahallad
Abstract:
In this paper, we use artificial neural networks (ANNs) for voice conversion and exploit the mapping abilities of an ANN model to map the spectral features of a source speaker to those of a target speaker. A comparative study of voice conversion using an ANN model and the state-of-the-art Gaussian mixture model (GMM) is conducted. The results of voice conversion, evaluated using subjective and objective measures, confirm that an ANN-based VC system performs as well as a GMM-based VC system, and that the transformed speech is intelligible and possesses the characteristics of the target speaker. We also address the dependency of voice conversion techniques on parallel data between the source and the target speakers. While there have been efforts to use nonparallel data and speaker adaptation techniques, it is important to investigate techniques which capture the speaker-specific characteristics of a target speaker and avoid any need for the source speaker's data, either for training or for adaptation. We therefore propose a voice conversion approach that uses an ANN model to capture the speaker-specific characteristics of a target speaker, and demonstrate that such an approach can perform monolingual as well as cross-lingual voice conversion of an arbitrary source speaker.
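
As a rough illustration of the frame-wise spectral mapping described in the abstract, the sketch below trains a small feedforward network to map source-speaker spectral frames to target-speaker frames. This is not the authors' implementation: the use of PyTorch, the random demo data, the layer sizes, tanh activations, and the train_mapping/convert helper names are all assumptions made for illustration; the paper's actual architecture and feature set may differ.

# Minimal sketch of ANN-based spectral mapping for voice conversion.
# Assumptions (not from the paper): parallel, time-aligned spectral frames
# (e.g., MCEP-like vectors) for source and target speakers are available as
# NumPy arrays; the network shape and optimizer settings are illustrative.
import numpy as np
import torch
import torch.nn as nn

def train_mapping(src_feats: np.ndarray, tgt_feats: np.ndarray,
                  epochs: int = 50, lr: float = 1e-3) -> nn.Module:
    """Learn a frame-wise mapping from source to target spectral features."""
    dim = src_feats.shape[1]
    net = nn.Sequential(                 # simple feedforward regressor
        nn.Linear(dim, 128), nn.Tanh(),
        nn.Linear(128, 128), nn.Tanh(),
        nn.Linear(128, dim),             # predict the target spectral frame
    )
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()               # frame-level spectral distortion
    x = torch.tensor(src_feats, dtype=torch.float32)
    y = torch.tensor(tgt_feats, dtype=torch.float32)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        opt.step()
    return net

def convert(net: nn.Module, src_feats: np.ndarray) -> np.ndarray:
    """Map an arbitrary source utterance's frames to target-like frames."""
    with torch.no_grad():
        out = net(torch.tensor(src_feats, dtype=torch.float32))
    return out.numpy()

if __name__ == "__main__":
    # Toy demo with random data standing in for aligned parallel features.
    src = np.random.randn(500, 25).astype(np.float32)
    tgt = np.random.randn(500, 25).astype(np.float32)
    model = train_mapping(src, tgt)
    converted = convert(model, src)
    print(converted.shape)               # (500, 25): one mapped frame per input frame

In practice, the parallel frames would come from time-aligned utterances of the two speakers, and the converted frames would be fed to a vocoder to synthesize the transformed speech.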
Digital version