Research Area: | Signal Processing | Year: | 2014 |
Type of Publication: | In Proceedings |
Authors: | Sivanand Achanta, Nivedita Chennupati, Vishala Pannala, Mansi R, Kishore S. Prahallad |
Abstract: |
Robust voice activity detection (VAD) is a prerequisite for many speech-based applications such as speech recognition. We investigated two VAD techniques that use time-domain and frequency-domain characteristics of the speech signal. The temporal characteristic of the autocorrelation lag is able to discriminate between speech and nonspeech regions. In the frequency domain, the peak value of the magnitude spectrum in different sub-bands is used for VAD, as it varies slowly with time in speech regions compared to noise.
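
As an illustration of the two kinds of per-frame features named above, the following is a minimal sketch and not the method of the paper. It assumes the "autocorrelation lag" is the lag at which a frame's autocorrelation peaks within a search range, and that the sub-bands are equal-width slices of the DFT magnitude spectrum. The function names and the parameters `min_lag`, `max_lag`, and `n_bands` are hypothetical. How these features are tracked over time and thresholded is not specified here and is left out.

```python
import cmath
import math


def autocorr_peak_lag(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation.

    Assumption: a plain (unnormalized) autocorrelation over one frame.
    """
    n = len(frame)
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def subband_peaks(frame, n_bands):
    """Return the peak DFT magnitude in each of n_bands equal-width sub-bands.

    Assumption: only bins below the Nyquist bin are used, and any bins left
    over when the bin count is not divisible by n_bands are ignored.
    """
    n = len(frame)
    half = n // 2
    mags = []
    for k in range(half):
        # Direct DFT of bin k; slow, but dependency-free and fine for a sketch.
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s))
    width = half // n_bands
    return [max(mags[b * width:(b + 1) * width]) for b in range(n_bands)]


# Usage: a synthetic periodic frame (period of 20 samples, 80 samples long).
frame = [math.sin(2 * math.pi * t / 20) for t in range(80)]
lag = autocorr_peak_lag(frame, min_lag=10, max_lag=30)
peaks = subband_peaks(frame, n_bands=4)
```

In a detector along these lines, the per-frame values would be computed on successive frames and their variation over time compared against a threshold. That decision rule is an assumption about usage, not something stated in the abstract.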
The performance of the proposed methods is evaluated on the TIMIT database with noises from the NOISEX-92 database at various signal-to-noise ratio (SNR) levels. The experimental results show that VAD based on the autocorrelation lag performs consistently better than the method based on the maximum peak value of the autocorrelation function. However, it performs worse than our second approach and AMR-VAD2. Our second approach, i.e., VAD based on the maximum spectral amplitude in sub-bands, outperforms AMR-VAD2 and Sohn VAD for some noise conditions. Moreover, it is shown that a threshold independent of the noise types and their levels can be selected in the proposed method. |
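
For readers reproducing an evaluation of this kind, noisy test signals at a given SNR are commonly made by scaling the noise before adding it to clean speech. The sketch below shows that arithmetic only. It assumes SNR is defined as the ratio of average speech power to average noise power over the whole signal, which may differ from the procedure used in the paper. The function name `mix_at_snr` is hypothetical.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that speech power / noise power equals snr_db, then add.

    Assumption: both inputs are equal-length lists of samples and power is
    the mean squared sample value over the full signal.
    """
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    # Required noise power is p_speech / 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Usage with toy signals standing in for speech and noise recordings.
speech = [1.0, -1.0] * 50
noise = [0.5, 0.5, -0.5, -0.5] * 25
mixed = mix_at_snr(speech, noise, snr_db=10.0)
```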