Research Area: Speech Recognition | Year: 2010
Type of Publication: Master's thesis | Keywords: automatic speaker verification, text-dependent, distant speech, high signal-to-noise ratio, pitch, duration
Authors: B. Avinash
Abstract:
Automatic speaker verification (ASV) is the task of verifying a person’s claimed
identity from his/her voice using a digital computer. Existing ASV systems
achieve high verification accuracy when the speech signal is collected close
to the mouth of the speaker ([...] speaker, for a text-dependent ASV
system. The distant speech signal is collected using a single-channel microphone. An
acoustic feature derived from short segments of speech signals is proposed for the ASV
task. The key idea is to exploit the high signal-to-noise ratio of short segments of
speech in the vicinity of impulse-like excitations. We demonstrate that the proposed
feature suffers less degradation with distance when compared to the widely used
Mel-frequency cepstral coefficients (MFCCs), and also yields better performance of
speaker verification than MFCCs. We propose a method of begin-end detection
based on the strength of the spectral peaks. A score normalization method is proposed
that considers only the robust regions of the speech signal. In addition, the
regions of the speech signal with high signal-to-reverberation ratio are identified, and
greater weight is given to these regions. These modifications are shown to result
in a systematic improvement in the performance of the speaker verification system.
The use of additional features of duration and pitch is shown to further improve the
performance of the speaker verification system for distant speech.