Research Area: Neural Networks
Year: 2014
Type of Publication: In Proceedings
Authors: Sudarsana Reddy Kadiri, P. Gangamohan, B. Yegnanarayana
Abstract:
In this paper, we address the issue of speaker-specific emotion detection (neutral vs emotion) from speech signals, using models of neutral speech as reference. As emotional speech is produced by the human speech production mechanism, the emotion information is expected to lie in the features of both the excitation source and the vocal tract system. The Linear Prediction (LP) residual is used as the excitation source component and the Linear Prediction Coefficients (LPCs) as the vocal tract system component. A pitch synchronous analysis is performed. Separate Autoassociative Neural Network (AANN) models are developed to capture the information specific to neutral speech from the excitation source and the vocal tract system components. Experimental results show that the excitation source carries more emotion information than the vocal tract system: the accuracy of neutral vs emotion classification using excitation source information is 91%, which is 8% higher than the accuracy obtained using vocal tract system information. The Berlin EMO-DB database is used in this study. It is observed that the proposed emotion detection system provides an improvement of approximately 10% using excitation source features and 3% using vocal tract system features over a recently proposed emotion detection system that uses energy and pitch contour modeling with functional data analysis.
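To make the pipeline in the abstract concrete, below is a minimal Python sketch of the two feature streams and the neutral-reference AANN scoring. Fixed-length random frames stand in for the paper's pitch-synchronous segments, and the LP order, AANN layer sizes, residual segment length, and exp(-error) confidence score are illustrative assumptions, not the paper's settings.

```python
# Sketch: LP analysis splits each frame into a vocal-tract component (LPCs)
# and an excitation-source component (LP residual); one AANN per stream is
# trained to reconstruct neutral-speech features, and low reconstruction
# fidelity at test time suggests emotional speech.
import numpy as np
from scipy.signal import lfilter
from sklearn.neural_network import MLPRegressor


def lp_analysis(frame, order=10):
    """LP coefficients (autocorrelation method, Levinson-Durbin) and LP residual."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0] + 1e-12
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] += k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    residual = lfilter(a, [1.0], frame)   # inverse filter A(z): excitation source
    return a[1:], residual                # vocal-tract and source components


def train_neutral_aann(features, seed=0):
    """Autoassociative NN: trained to reconstruct neutral-speech feature vectors."""
    aann = MLPRegressor(hidden_layer_sizes=(24, 6, 24), activation="tanh",
                        max_iter=3000, random_state=seed)
    aann.fit(features, features)          # input == target: identity mapping
    return aann


def neutral_confidence(aann, features):
    """High reconstruction fidelity -> frame resembles the neutral model."""
    err = np.mean((aann.predict(features) - features) ** 2, axis=1)
    return np.exp(-err)                   # per-frame confidence in (0, 1]


# Usage: one AANN per feature stream; an utterance is flagged as emotional
# when its average confidence falls below a threshold tuned on neutral data.
frames = np.random.randn(200, 320)        # stand-in for pitch-synchronous frames
lpcs, residuals = zip(*(lp_analysis(f) for f in frames))
source_aann = train_neutral_aann(np.array(residuals)[:, :40])  # residual segments
tract_aann = train_neutral_aann(np.array(lpcs))                # LPC vectors
```

The abstract's finding that the excitation source is the stronger cue would correspond, in this sketch, to the residual-trained AANN separating neutral from emotional frames more cleanly than the LPC-trained one.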