Research Area: Uncategorized
Year: 2010
Type of Publication: Article
Keywords: Artificial neural networks, cross lingual, error correction, speaker specific characteristics, spectral mapping, voice conversion
Authors: Srinivas Desai, A. W. Black, B. Yegnanarayana, Kishore S. Prahallad
Abstract:
In this paper, we use artificial neural networks
(ANNs) for voice conversion and exploit the mapping abilities
of an ANN model to map the spectral features of a
source speaker to those of a target speaker. A comparative study
of voice conversion using an ANN model and the state-of-the-art
Gaussian mixture model (GMM) is conducted. The results of voice
conversion, evaluated using subjective and objective measures,
confirm that an ANN-based VC system performs as well as a
GMM-based VC system, and that the transformed speech is
intelligible and possesses the characteristics of the target
speaker. In this paper, we also address the issue of dependency of
voice conversion techniques on parallel data between the source
and the target speakers. While there have been efforts to use
nonparallel data and speaker adaptation techniques, it is important
to investigate techniques which capture speaker-specific
characteristics of a target speaker, and avoid any need for source
speaker’s data either for training or for adaptation. In this paper,
we propose a voice conversion approach using an ANN model to
capture speaker-specific characteristics of a target speaker and
demonstrate that such a voice conversion approach can perform
monolingual as well as cross-lingual voice conversion of an arbitrary
source speaker.
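
The spectral mapping idea in the abstract can be illustrated with a toy sketch. It assumes parallel, already time-aligned source and target frames, which are replaced here by synthetic 2-dimensional vectors. The network size, learning rate, warp function, and the names `make_parallel_data` and `TinyMLP` are hypothetical choices for illustration only; the abstract does not specify the paper's features, architecture, or training procedure, and this sketch does not cover the approach that avoids source speaker data.

```python
import math
import random


def make_parallel_data(n, rng):
    """Synthetic stand-in for time-aligned source/target spectral frames."""
    pairs = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        # Hypothetical smooth nonlinear "speaker warp" (an assumption, not real data).
        y = [math.tanh(0.8 * x[0] + 0.3 * x[1]), 0.5 * x[1] - 0.2 * x[0] ** 2]
        pairs.append((x, y))
    return pairs


class TinyMLP:
    """One tanh hidden layer and a linear output layer, trained by per-sample SGD."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        # Gradient of 0.5 * ||y - target||^2 with respect to the output y.
        err = [yi - ti for yi, ti in zip(y, target)]
        # Backpropagate to the hidden layer before the output weights change.
        dh = [sum(err[k] * self.w2[k][j] for k in range(len(err))) * (1 - h[j] ** 2)
              for j in range(len(h))]
        for k, e in enumerate(err):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * e * hj
            self.b2[k] -= lr * e
        for j, d in enumerate(dh):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d * xi
            self.b1[j] -= lr * d


def mse(model, pairs):
    """Mean squared error between mapped source frames and target frames."""
    total = 0.0
    for x, t in pairs:
        _, y = model.forward(x)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, t))
    return total / len(pairs)


rng = random.Random(0)
data = make_parallel_data(200, rng)
model = TinyMLP(2, 8, 2, rng)
mse_before = mse(model, data)
for epoch in range(200):
    for x, t in data:
        model.train_step(x, t, lr=0.05)
mse_after = mse(model, data)
print(f"MSE before training: {mse_before:.4f}, after: {mse_after:.4f}")
```

In a real system the input and output vectors would be spectral features extracted from aligned recordings of the two speakers, and the mapped features would be passed to a vocoder for resynthesis.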