HCI: Speech/Speaker Recognition System

Dr. Bharti W. Gawali
Associate Professor, Department of Computer Science & Information Technology
Dr. Babasaheb Ambedkar Marathwada University, Aurangabad
id:
Dr.Bharti Gawali 06/03/2012

This tutorial will focus on:
- Introduction to speech processing
- Salient features of a speech recognition system
- Feature extraction methods
- Speaker recognition system
- Some handouts

Introduction
The fundamental purpose of speech is communication, i.e., the transmission of messages. In the case of speech, the fundamental analog form of the message is an acoustic waveform, which we call the speech signal.

Production of speech
When we speak, we let air pass from our lungs through our mouth and nasal cavity, and this air stream is restricted and changed by our tongue and lips. The contractions and expansions of the lungs produce an acoustic wave, a sound. The resulting sounds, the vowels and consonants, are usually called phones. The phones are combined into words.

A block diagram of Human Speech Production

SPEECH CHAIN
The complete process of producing and perceiving speech, from the formulation of a message in the brain of a talker, to the creation of the speech signal, and finally to the understanding of the message by a listener, is called the speech chain: from message, to speech signal, to understanding.

Speech Chain
Message formulation -> Language code -> Neuromuscular controls -> Vocal tract system -> Generation of acoustic wave -> Transmission channel -> Neural transduction (feature extraction) -> Language translation -> Message understanding

Layers for describing speech
- Acoustics
- Phonetics
- Phonology
- Morphology
- Syntax
- Semantics

Events of Speech
[Figure: speech signal with silence]

Digital Representation of Speech
Analog-to-digital conversion has two steps: sampling and quantization (digitization). A signal is sampled by measuring its amplitude at a particular time; the sampling rate is the number of samples taken per second. To measure a wave accurately, at least two samples are needed in each cycle: one measuring the positive part of the wave and one measuring the negative part. The highest frequency that can be captured is therefore half the sampling rate (the Nyquist frequency). Quantization then represents each sampled amplitude as one of a finite set of integer levels.
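The two steps can be illustrated with a minimal sketch. The function names, the 1 kHz test tone, the 8 kHz sampling rate, and the 16-bit depth are all assumptions chosen for illustration; they do not come from the slides.

```python
import numpy as np

def sample_and_quantize(freq_hz, sample_rate_hz, duration_s, n_bits):
    """Sample a unit-amplitude sine wave and quantize it to signed n_bits integers."""
    n_samples = int(round(sample_rate_hz * duration_s))
    t = np.arange(n_samples) / sample_rate_hz       # sampling instants in seconds
    x = np.sin(2 * np.pi * freq_hz * t)             # amplitude measured at each instant
    max_level = 2 ** (n_bits - 1) - 1               # largest positive level, e.g. 32767 for 16 bits
    q = np.round(x * max_level).astype(np.int64)    # quantization to integer levels
    return t, x, q

def nyquist_frequency(sample_rate_hz):
    """Highest frequency that a given sampling rate can represent."""
    return sample_rate_hz / 2.0

# Hypothetical example: a 1 kHz tone sampled at 8 kHz (8 samples per cycle) for 10 ms.
t, x, q = sample_and_quantize(freq_hz=1000, sample_rate_hz=8000, duration_s=0.01, n_bits=16)
print(len(q), "samples; Nyquist frequency:", nyquist_frequency(8000), "Hz")
```

With 16 bits the rounding error per sample is at most half a quantization step, which is why higher bit depths give a cleaner digital copy of the waveform.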

Change in resonance changes sound.
The spectrum of a speech sound is shaped by resonances in the vocal tract, called formants. A change in resonance changes the sound. Thus the speech wave is the convolution of the source e(n) with the impulse response h(n) of the filter: $s(n) = e(n) * h(n)$. In the frequency domain, the convolution becomes a product: $S(\omega) = E(\omega)\,H(\omega)$.
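A small numerical check of the source-filter relation may help. The impulse-train source and the decaying-resonance filter below are toy signals assumed purely for illustration; they are not measured glottal or vocal-tract data.

```python
import numpy as np

# Hypothetical toy source e(n): an impulse train standing in for glottal pulses.
e = np.zeros(64)
e[::16] = 1.0

# Hypothetical toy filter h(n): a decaying resonance standing in for the vocal tract.
n = np.arange(32)
h = (0.9 ** n) * np.cos(2 * np.pi * 0.1 * n)

# Time domain: s(n) is the linear convolution of e(n) and h(n).
s = np.convolve(e, h)

# Frequency domain: S = E * H (pointwise), with both transforms zero-padded to len(s).
N = len(s)
S_from_product = np.fft.fft(e, N) * np.fft.fft(h, N)
S_direct = np.fft.fft(s)

print("Spectra agree:", np.allclose(S_direct, S_from_product))
```

Zero-padding both signals to the full output length is what makes the FFT product match linear (rather than circular) convolution.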

Speech processing can be divided into the following categories:
- Speech recognition, which deals with analysis of the linguistic content of a speech signal.
- Speaker recognition, where the aim is to recognize the identity of the speaker.
- Speech coding, a specialized form of data compression that is important in telecommunications.
- Speech synthesis, the artificial synthesis of speech, which usually means computer-generated speech.
- Speech enhancement, which improves the intelligibility and/or perceptual quality of a speech signal, for example by audio noise reduction.

Speech Recognition Basics
Speech recognition is the process of deriving the sequence of speech sounds that best matches the input speech signal. Each speech sound is characterized by the size and shape of the filter (the vocal cavity). The following terms are the basics needed for understanding speech recognition technology:
- Utterance
- Vocabularies
- Training

Approaches to speech recognition
- Template-based approaches, in which unknown speech is compared against a set of prerecorded words (templates).
- Knowledge-based approaches, in which "expert" knowledge about variations in speech is hand-coded into a system.
- Statistical-based approaches, in which variations in speech are modeled statistically (e.g., by Hidden Markov Models, or HMMs), using automatic learning procedures.
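The template-based approach can be sketched with dynamic time warping (DTW), one common way to compare an unknown utterance against stored templates of different lengths. The one-dimensional "feature" sequences and the `recognize` helper below are made up for illustration; a real system would compare sequences of feature vectors.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # cost[i, j] = cheapest alignment of the first i items of a with the first j items of b
    cost = np.full((len(a) + 1, len(b) + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[len(a), len(b)]

def recognize(unknown, templates):
    """Return the template word with the smallest DTW distance to the unknown sequence."""
    return min(templates, key=lambda word: dtw_distance(unknown, templates[word]))

# Hypothetical prerecorded templates (toy feature contours, not real speech).
templates = {
    "start": [0, 1, 3, 4, 3, 1, 0],
    "stop": [0, 4, 4, 0, 2, 2, 0],
}
# The same contour as "start", but spoken more slowly (some values repeated).
unknown = [0, 1, 1, 3, 4, 4, 3, 1, 0]
print(recognize(unknown, templates))
```

Because DTW lets one sequence stretch against the other, a slower rendition of the same word still matches its template closely.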

Types of Speech Recognition
- Isolated words. Example: "start", "stop", "ON", "OFF"
- Connected words. Example:
- Continuous speech. Example: "Today I am presenting a lecture."
- Spontaneous speech. Example: commentators

[Figure: Isolated Word]

[Figure: Continuous Sentences]

Block diagram of speech recognition
Signal -> Feature Extraction -> Matching in the acoustic domain (Acoustic Model) -> Matching in the symbolic domain (Language Model) -> Sentence Hypothesis
Speech recognition is a special case of pattern recognition.

Feature Extraction Technique
In a classification problem, speech feature extraction reduces the dimensionality of the input vector while preserving the discriminating power of the signal. In speaker identification and verification systems, the number of training and test vectors needed for classification grows with the dimension of the input, so feature extraction from the speech signal is required.

Cont. The following are some feature extraction methods:
- Linear Discriminant Analysis (LDA)
- Mel-frequency cepstral coefficients (MFCCs)
- Dynamic time warping
- Independent Component Analysis (ICA)
- Linear predictive coding
- Cepstral analysis
- Filter bank analysis
- Kernel-based feature extraction methods
- Wavelets
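As a rough illustration of how feature extraction reduces dimensionality, the sketch below applies plain cepstral analysis (one of the methods listed) frame by frame. The frame length, hop size, number of coefficients, and the synthetic test signal are assumptions for illustration, and this is a simplified real cepstrum rather than a full MFCC pipeline.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (any trailing partial frame is dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def cepstral_features(x, frame_len=256, hop=128, n_coeffs=13):
    """Real cepstrum of each windowed frame, truncated to the first n_coeffs coefficients."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    frames = frames * np.hamming(frame_len)                 # taper the frame edges
    spectrum = np.abs(np.fft.rfft(frames, axis=1))          # magnitude spectrum per frame
    log_spectrum = np.log(spectrum + 1e-10)                 # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_spectrum, n=frame_len, axis=1)
    return cepstrum[:, :n_coeffs]                           # keep the low-order coefficients

# Hypothetical test signal: one second of a 200 Hz tone plus a little noise at 8 kHz.
rng = np.random.default_rng(0)
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
x = np.sin(2 * np.pi * 200 * t) + 0.01 * rng.standard_normal(sample_rate)

feats = cepstral_features(x)
print(x.shape, "->", feats.shape)
```

Each 256-sample frame is reduced to 13 numbers, which is the kind of compact representation a classifier is then trained on.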

Speech Recognition Enables Many Applications
- Voice-based IVR systems and services that can remain available 24x7
- Indexing and searching of audio recordings, as in internet (Google) search
- Hands-busy or eyes-busy applications, where the user has objects to manipulate or equipment to control
- Telephony, where speech recognition is used in spoken dialogue systems for entering digits, recognizing words to accept collect calls, finding airplane or train information, call routing, etc.
- Interaction between computers and people with a disability that prevents them from typing or speaking

Speech Recognition Software
- CMU Sphinx Homepage:
- Praat Homepage:
- HTK htk.eng.cam.ac.uk/download.shtml
- Matlab
- SFS

Challenges in Speech Recognition
- Speaking style: clear, spontaneous, slurred or sloppy
- Speaking rate: fast or slow speech; the rate can change within a single sentence
- Emotional state: happy, sad, etc.
- Emphasis: stressed speech vs. unstressed speech
- Accents, dialects, foreign words
- Environmental or background noise
- Even the same person never speaks exactly the same way twice
- Large vocabulary and infinite language
- Absence of word boundary markers in continuous speech
- Inherent ambiguities: "I scream" or "Ice cream"?

PERFORMANCE OF SYSTEMS
The performance of speech recognition systems is usually specified in terms of accuracy and speed. Accuracy is usually rated with the word error rate (WER), whereas speed is measured with the real-time factor (processing time divided by the duration of the input).
$$\mathrm{WER} = \frac{S + D + I}{N}$$
where S is the number of substitutions, D is the number of deletions, I is the number of insertions, and N is the number of words in the reference.
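WER is conventionally computed as (S + D + I) / N, where the minimum number of substitutions, deletions, and insertions is found by a word-level edit distance. The sketch below assumes whitespace tokenization and a made-up hypothesis sentence; the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N via word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# Hypothetical recognizer output: "am" is deleted and "a" is replaced by "the".
reference = "today i am presenting a lecture"
hypothesis = "today i presenting the lecture"
print(round(word_error_rate(reference, hypothesis), 3))
```

Here one deletion and one substitution against a six-word reference give a WER of 2/6. Note that WER can exceed 1 when the hypothesis contains many insertions.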

Speaker Recognition System
Speaker recognition is the process of validating a user's claim to an identity using characteristics extracted from their voice. Work in this area began about four decades ago. It uses acoustic features of speech that differ between individuals. These acoustic patterns reflect both anatomy and learned behavioral patterns.

Each speaker recognition system has two phases: enrollment and verification.
During enrollment, the speaker's voice is recorded and typically a number of features are extracted to form a voice print, template, or model. In the verification phase, a speech sample or "utterance" is compared against a previously created voice print. For identification systems, the utterance is compared against multiple voice prints in order to determine the best match(es), while verification systems compare an utterance against a single voice print. Because of the process involved, verification is faster than identification.

Block diagram of Typical Speaker verification system
Input -> Signal Processing -> Pattern Matching (against the output of Model Generation) -> Decision Logic (Threshold Criterion) -> Accepted / Rejected
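The enrollment and verification phases, together with the threshold decision, can be sketched as follows. This is a deliberately simplified illustration: the voice print is just a mean feature vector, matching uses cosine similarity, and the speakers, feature values, and 0.9 threshold are all invented. Practical systems use richer models such as GMMs or HMMs.

```python
import numpy as np

def enroll(feature_frames):
    """Enrollment: build a (very simple) voice print as the mean feature vector."""
    return np.mean(np.asarray(feature_frames, dtype=float), axis=0)

def score(voice_print, feature_frames):
    """Pattern matching: cosine similarity between a voice print and a test utterance."""
    test = np.mean(np.asarray(feature_frames, dtype=float), axis=0)
    return float(np.dot(voice_print, test) / (np.linalg.norm(voice_print) * np.linalg.norm(test)))

def verify(voice_print, feature_frames, threshold=0.9):
    """Verification: accept the claimed identity if the score passes the threshold."""
    return score(voice_print, feature_frames) >= threshold

def identify(voice_prints, feature_frames):
    """Identification: compare against every enrolled voice print and return the best match."""
    return max(voice_prints, key=lambda name: score(voice_prints[name], feature_frames))

# Hypothetical speakers, each represented by a made-up 4-dimensional feature centre.
rng = np.random.default_rng(0)
alice_centre = np.array([1.0, 0.2, -0.5, 0.8])
bob_centre = np.array([-0.6, 1.0, 0.7, -0.2])

def utterance(centre, n_frames=50):
    """Simulate feature frames of one utterance as the speaker centre plus small noise."""
    return centre + 0.05 * rng.standard_normal((n_frames, centre.size))

voice_prints = {"alice": enroll(utterance(alice_centre)), "bob": enroll(utterance(bob_centre))}
test_utterance = utterance(alice_centre)

print("claim alice:", verify(voice_prints["alice"], test_utterance))
print("claim bob:", verify(voice_prints["bob"], test_utterance))
print("identified as:", identify(voice_prints, test_utterance))
```

Verification makes a single comparison against the claimed voice print, while identification scores the utterance against every enrolled speaker, which is why verification is the faster of the two.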

There are two basic modes of speaker verification, plus a prompted variant:
- Text-independent mode (relies on the voice characteristics of the speaker)
- Text-dependent mode (a predetermined text is used)
- Text-prompted speaker verification (the system prompts the speaker with the text to say)

Technology
The various technologies used to process and store voice prints include frequency estimation, hidden Markov models, Gaussian mixture models, pattern matching algorithms, neural networks, matrix representation, vector quantization, and decision trees.

[Figure: IVRS call flow between Farmer and IVRS System. Labels: Telephonic Card; Name of Crop; Symptom; Call; Connect to IVRS; Searching IVRS Database; Continue Call; Reply from Machine; Call Ended]

The Speech Recognition Tool

Books for Speech Recognition
- "Fundamentals of Speech Recognition". L. Rabiner & B. Juang. ISBN:
- "How to Build a Speech Recognition Application". B. Balentine, D. Morgan, and W. Meisel. ISBN:
- "Speech Recognition: Theory and C++ Implementation". C. Becchetti and L.P. Ricotti. ISBN:
- "Applied Speech Technology". A. Syrdal, R. Bennett, S. Greenspan. ISBN:
- "Speech Recognition: The Complete Practical Reference Guide". P. Foster, T. Schalk. ISBN:
- "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition". D. Jurafsky, J. Martin. ISBN:
- "Discrete-Time Processing of Speech Signals (IEEE Press Classic Reissue)". J. Deller, J. Hansen, J. Proakis. ISBN:
- "Statistical Methods for Speech Recognition (Language, Speech, and Communication)". F. Jelinek. ISBN:
- "Digital Processing of Speech Signals". L. Rabiner, R. Schafer. ISBN:
- "Foundations of Statistical Natural Language Processing". C. Manning, H. Schutze. ISBN:
- "Designing Effective Speech Interfaces". S. Weinschenk, D. T. Barker. ISBN:
