Speech and Vision Lab

Spectral mapping using artificial neural networks for voice conversion
Research Area: Uncategorized
Year: 2010
Type of Publication: Article
Keywords: Artificial neural networks, cross lingual, error correction, speaker specific characteristics, spectral mapping, voice conversion
Authors: Srinivas Desai, A. W. Black, B. Yegnanarayana, Kishore S. Prahallad  
In this paper, we use artificial neural networks (ANNs) for voice conversion (VC) and exploit the mapping abilities of an ANN model to map the spectral features of a source speaker to those of a target speaker. A comparative study of voice conversion using an ANN model and the state-of-the-art Gaussian mixture model (GMM) is conducted. The results, evaluated using subjective and objective measures, confirm that an ANN-based VC system performs as well as a GMM-based VC system, and that the transformed speech is intelligible and possesses the characteristics of the target speaker. We also address the dependency of voice conversion techniques on parallel data between the source and the target speakers. While there have been efforts to use nonparallel data and speaker adaptation techniques, it is important to investigate techniques that capture speaker-specific characteristics of a target speaker and avoid any need for the source speaker’s data, either for training or for adaptation. We propose a voice conversion approach that uses an ANN model to capture speaker-specific characteristics of a target speaker, and demonstrate that this approach can perform monolingual as well as cross-lingual voice conversion of an arbitrary source speaker.
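As a rough illustration of the spectral mapping idea in the abstract, the sketch below trains a small feed-forward network to map source feature frames to target feature frames. It is not the authors' architecture or training setup. The layer size, learning rate, function names (`train_mapping`, `convert`, `mse`), and the synthetic data are all assumptions made for the example. Random vectors stand in for time-aligned spectral frames from parallel recordings, and mean squared error stands in for an objective measure.

```python
import numpy as np


def train_mapping(src, tgt, hidden=32, epochs=1000, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh network that maps src frames to tgt frames.

    Hypothetical helper: plain full-batch gradient descent on squared error.
    """
    rng = np.random.default_rng(seed)
    n, d_in = src.shape
    d_out = tgt.shape[1]
    w1 = rng.normal(0.0, 0.1, (d_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, d_out))
    b2 = np.zeros(d_out)
    for _ in range(epochs):
        h = np.tanh(src @ w1 + b1)           # hidden activations
        err = (h @ w2 + b2) - tgt            # prediction error per frame
        # Backpropagate before touching any weights.
        g_w2 = h.T @ err / n
        g_b2 = err.mean(axis=0)
        g_h = (err @ w2.T) * (1.0 - h ** 2)  # tanh derivative
        g_w1 = src.T @ g_h / n
        g_b1 = g_h.mean(axis=0)
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return w1, b1, w2, b2


def convert(params, src):
    """Apply the trained mapping to source frames."""
    w1, b1, w2, b2 = params
    return np.tanh(src @ w1 + b1) @ w2 + b2


def mse(a, b):
    """Mean squared error between two frame matrices."""
    return float(np.mean((a - b) ** 2))


# Synthetic stand-in for parallel data: 400 aligned frames of 8-dim features.
rng = np.random.default_rng(1)
src_all = rng.normal(size=(400, 8))
true_map = rng.normal(0.0, 0.3, (8, 8))      # assumed "speaker difference"
tgt_all = np.tanh(src_all @ true_map) + 0.5

src_train, tgt_train = src_all[:300], tgt_all[:300]
src_test, tgt_test = src_all[300:], tgt_all[300:]

params = train_mapping(src_train, tgt_train)
converted = convert(params, src_test)

before = mse(src_test, tgt_test)   # no conversion: source frames used as-is
after = mse(converted, tgt_test)   # after the learned mapping
print(f"MSE without mapping: {before:.3f}, with mapping: {after:.3f}")
```

This setup relies on parallel, frame-aligned source and target data, which is exactly the dependency the second half of the abstract sets out to avoid.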