About SVL

The Speech and Vision Lab (SVL) at IIIT-H develops natural input and output interfaces to computers through speech and images. The focus is on speech input/output systems for Indian languages, with the objective of achieving speech translation from one Indian language to another. Other objectives are to develop person authentication systems that provide secure access to information using biometrics involving speech, video, fingerprint and similar inputs, and to develop systems for content-based information retrieval from non-textual data.

We have been focusing on the development of speech-to-text and text-to-speech systems for Indian languages. An unrestricted text-to-speech system has been developed for Hindi and other Indian languages. The system incorporates prosodic rules derived for Hindi. We are currently developing a speech-to-text system for Indian languages that uses syllable-like units as the basic sound units. We are also working on speaker recognition tasks, both text-dependent and text-independent, mainly for secure access to information. Enhancement of degraded speech has been another active area in the laboratory over the past few years, and several new techniques have been developed for enhancing speech degraded by additive noise and reverberation.

The SVL has also been active in developing signal processing algorithms for speech and image processing. Several signal processing methods have been developed for dealing with remotely sensed data. In particular, new models based on constraint satisfaction neural networks have been developed for classification of multi-spectral remotely sensed data. Neural network models are also being explored for several tasks in speech, image processing and decision making.

The Speech and Vision Lab is involved in research on:

Speech signal processing
Speech-to-text conversion
Text-to-speech conversion
Speaker recognition
Speech enhancement
Applications of neural networks
Image processing
Person authentication using biometrics
Information retrieval using audio and video indexing

Current Thrust

We are currently working on several areas related to speech and vision. The key activities are as follows:

Event-based analysis of speech
Speech enhancement using source features
Speech enhancement in multi-speaker environments
Development of a phonetic engine for Indian languages
Preparation of speech corpus for Indian languages
Speaker segmentation and tracking
Automatic prosody modeling and manipulation
Speaker recognition using source features
Information retrieval using audio and video indexing
Speaker verification using audio and visual cues
Person authentication using biometrics
Applications of neural networks
