Research Area: Speech Synthesis
Year: 2010
Type of Publication: Master's thesis
Keywords: Laughter, epoch, pitch period, strength of excitation, breathiness, loudness, zero-frequency resonator, synthesis
Authors: Sudheer K.
Abstract:
Laughter is a nonverbal vocalization that occurs often in continuous speech. It is produced
by the speech production mechanism using a highly variable physiological process. The
vocalized expression of laughter varies across gender, individuals and context. Despite its
variability, laughter is perceived naturally by humans.
Since laughter is produced by the human speech production mechanism, spectral features
have generally been used to study laughter acoustics. This work mainly aims
to show the significance of excitation information for the analysis of laugh signals. We
propose acoustic features motivated by the production characteristics of
laughter. The features are based on the pitch period (T0), the strength of excitation derived
from the zero-frequency filtered signal, the amount of breathiness, and a loudness measure.
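The following is a minimal Python sketch of how epochs, pitch periods, and strength of excitation could be read off a zero-frequency filtered signal. The abstract does not spell out the procedure, so this follows the commonly described form of zero-frequency filtering: difference the signal, pass it through two cascaded zero-frequency resonators, and remove the resulting trend by subtracting a local mean. The function names, the three trend-removal passes, the window of about 1.5 pitch periods, and the synthetic impulse-train input are assumptions made for illustration, not the thesis implementation.

```python
def subtract_local_mean(x, half_width):
    """Subtract a symmetric moving average (window shrinks near the edges)."""
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    last = len(x) - 1
    out = []
    for n in range(len(x)):
        h = min(half_width, n, last - n)  # keep the window symmetric at the edges
        mean = (prefix[n + h + 1] - prefix[n - h]) / (2 * h + 1)
        out.append(x[n] - mean)
    return out


def zero_frequency_filter(signal, mean_half_width, passes=3):
    """Zero-frequency filter a list of samples (illustrative sketch)."""
    # Difference the signal to suppress any DC offset.
    out = [signal[0]] + [signal[n] - signal[n - 1] for n in range(1, len(signal))]
    # Two cascaded ideal resonators at 0 Hz: y[n] = 2*y[n-1] - y[n-2] + x[n].
    for _ in range(2):
        y = [0.0] * len(out)
        for n, x in enumerate(out):
            prev1 = y[n - 1] if n >= 1 else 0.0
            prev2 = y[n - 2] if n >= 2 else 0.0
            y[n] = x + 2.0 * prev1 - prev2
        out = y
    # The resonator output grows polynomially; remove that trend.
    for _ in range(passes):
        out = subtract_local_mean(out, mean_half_width)
    return out


def find_epochs(zff):
    """Epochs at negative-to-positive zero crossings; strength is the local slope."""
    epochs, strengths = [], []
    for n in range(1, len(zff)):
        if zff[n - 1] < 0.0 <= zff[n]:
            epochs.append(n)
            strengths.append(zff[n] - zff[n - 1])
    return epochs, strengths


# Assumed test input: an impulse train with an 80-sample period
# (100 Hz at a hypothetical 8 kHz sampling rate).
period = 80
x = [1.0 if n % period == 0 else 0.0 for n in range(4000)]
zff = zero_frequency_filter(x, mean_half_width=60)  # window of about 1.5 periods
epochs, strengths = find_epochs(zff)
pitch_periods = [b - a for a, b in zip(epochs, epochs[1:])]
```

Away from the signal edges, the values in `pitch_periods` should track the spacing of the impulses, and `strengths` gives one slope value per epoch. On real laughter, the contours of these two quantities over successive epochs are the kind of excitation features the abstract refers to.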
Sudden bursts of air flow through the vocal tract are observed during
laughter. These result in faster vibration of the vocal folds, and hence a reduction
in the pitch period. Apart from the decrease in the pitch period, there is also a rising
pattern in the pitch period contour. Similar observations are made for the strength
of excitation, which rises sharply and then falls at almost the same
rate. Because of the increased air flow, laughter is typically accompanied by some amount
of breathiness, which is perceived as lower loudness and more noise. The
loudness and the non-deterministic (noise) component of the signal are estimated using measures
based on the Hilbert envelope. A method that uses these features is proposed for detecting
laughter in continuous speech. The method is tested on noisy data (TV show data) and
clean data (AMI Corpus), and the results are reported.
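The Hilbert envelope mentioned above is the magnitude of the analytic signal. The abstract does not define the loudness and noise measures built on it, so the sketch below only shows the envelope computation itself, using a plain discrete Fourier transform from the Python standard library. The function names and the cosine test tone are assumptions made for illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(n^2) discrete Fourier transform; fine for short frames."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(x):
            acc += v * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def hilbert_envelope(x):
    """Magnitude of the analytic signal of a real sequence."""
    n = len(x)
    spectrum = dft(x)
    # Keep DC (and Nyquist for even n), double the positive frequencies,
    # and zero the negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0
    analytic = dft([s * w for s, w in zip(spectrum, weights)], inverse=True)
    return [abs(z) for z in analytic]


# Assumed test input: a cosine of amplitude 0.5 with 8 full cycles in 256 samples.
tone = [0.5 * math.cos(2.0 * math.pi * 8 * t / 256) for t in range(256)]
envelope = hilbert_envelope(tone)
```

For a pure tone with a whole number of cycles in the frame, the envelope is flat at the tone's amplitude. For longer signals, an FFT-based routine such as `scipy.signal.hilbert` would be the more practical choice.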
A method is also proposed for synthesizing isolated laughter by modifying the above features.
The perceptual significance of each feature is studied using an analysis-by-synthesis
approach. The effect of each feature is studied by de-emphasizing
it in original laugh signals and performing perceptual evaluation on
the resulting signals. The study indicates that the pitch period pattern is the most significant
factor, followed by call-intercall duration, breathiness, and strength of excitation.