Speaker Recognition Sharat.S.Chikkerur Center for Unified Biometrics and Sensors


Speech Fundamentals
Characterizing speech:
- Content (speech recognition)
- Signal representation (vocoding): waveform or parametric (excitation, vocal tract)
- Signal analysis (gender determination, speaker recognition)
Terminology:
- Phonemes: the basic discrete units of speech. English has around 42 phonemes; phoneme inventories are language specific.
- Types of speech: voiced speech, unvoiced speech (fricatives), plosives
- Formants

Speech production
Speech production mechanism and speech production model: for voiced speech, an impulse train generator (controlled by the pitch period, with gain Av) drives a glottal pulse model G(z); for unvoiced speech, a noise source (with gain AN) provides the excitation. Either excitation is shaped by a vocal tract model V(z) (representing the roughly 17 cm vocal tract) followed by a radiation model R(z).
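As an illustration (not from the slides), a minimal Python sketch of this source-filter model for voiced speech; the pitch period and filter coefficients below are hypothetical, and G(z), V(z) and R(z) are folded into a single all-pole filter for brevity:

```python
import numpy as np

def synthesize_voiced(pitch_period, vocal_tract_a, n_samples):
    """Toy source-filter synthesis: an impulse train at the given pitch
    period excites an all-pole filter 1 / A(z), where
    A(z) = 1 + a1*z^-1 + ... + ap*z^-p."""
    excitation = np.zeros(n_samples)
    excitation[::pitch_period] = 1.0  # impulse train (voiced source)
    p = len(vocal_tract_a)
    y = np.zeros(n_samples)
    for n in range(n_samples):
        acc = excitation[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                acc -= vocal_tract_a[k - 1] * y[n - k]
        y[n] = acc
    return y
```

Replacing the impulse train with white noise models the unvoiced case.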

Nature of speech
Spectrogram of a speech signal (figure).

Vocal Tract modeling
Signal spectrum and smoothed signal spectrum (figures). The smoothed spectrum indicates the locations of each speaker's formants; it is obtained from the cepstral coefficients.

Parametric Representations: Formants
Formant frequencies:
- Characterize the frequency response of the vocal tract
- Used in the characterization of vowels
- Can be used to determine gender

Parametric Representations: LPC
Linear predictive coefficients:
- Used in vocoding
- Spectral estimation
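As a sketch of how LPC coefficients are typically estimated (the autocorrelation method with the Levinson-Durbin recursion; this is standard practice, not code from the project):

```python
import numpy as np

def lpc(x, order):
    """Estimate linear predictive coefficients a = [1, a1, ..., ap]
    via the autocorrelation method and Levinson-Durbin recursion."""
    n = len(x)
    # Autocorrelation for lags 0..order
    r = np.array([x[:n - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]  # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k = -acc / e  # reflection coefficient
        a_new = a.copy()
        a_new[1:i] = a[1:i] + k * a[1:i][::-1]
        a_new[i] = k
        a = a_new
        e *= (1.0 - k * k)
    return a
```

Applied to a frame of speech, 1/A(z) approximates the vocal tract spectral envelope.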

Parametric Representations: Cepstrum
Speech is modeled as an excitation p[n] (impulse train with gain Av, or noise with gain AN) convolved with the glottal, vocal tract and radiation responses G(z)V(z)R(z). The cepstrum is computed by a homomorphic system D[.] = IDFT[ LOG[ DFT[.] ] ], which turns convolution into addition: x1[n] * x2[n] → X1(z)X2(z) → log X1(z) + log X2(z) → x1'[n] + x2'[n]. In the cepstral domain the excitation and vocal tract components can therefore be separated.
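A minimal Python sketch of the real cepstrum computed this way (illustrative, not the project's Matlab code):

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum.
    Convolution in the time domain becomes addition in the cepstral
    domain, since log|X1(z)X2(z)| = log|X1(z)| + log|X2(z)|."""
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # guard against log(0)
    return np.fft.ifft(log_mag).real
```

Liftering (keeping only the low-quefrency coefficients) then yields the smoothed spectral envelope used for the formant plots above.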

Speaker Recognition
Definition: Speaker recognition is the task of recognizing a person based on his or her voice. It is one of the forms of biometric identification and depends on speaker-dependent characteristics.
Taxonomy:
- Speaker identification (text dependent or text independent)
- Speaker verification (text dependent or text independent)
- Speaker detection

Generic Speaker Recognition System
Enrollment: speech signal → A/D conversion → preprocessing → feature extraction (analysis frames → feature vectors) → speaker model
Verification: preprocessing → feature extraction → pattern matching against the speaker model → score
- Preprocessing: end point detection, pre-emphasis filter, segmentation
- Features: LAR, cepstrum, LPCC, MFCC
- Speaker models: stochastic models (GMM, HMM) or template models (DTW)
- Matching: distance measures
Choice of features: differentiating factors between speakers include vocal tract shape and behavioral traits; features should have high inter-speaker and low intra-speaker variation.

Our Approach
- Preprocessing: silence removal
- Feature extraction: cepstrum coefficients with cepstral normalization (long time average)
- Speaker model: polynomial function expansion; reference template
- Matching: dynamic time warping, distance computation
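The long-time-average cepstral normalization step can be sketched as follows (a minimal Python illustration of cepstral mean subtraction, not the project's Matlab code):

```python
import numpy as np

def cepstral_mean_normalize(cepstra):
    """Subtract the long-time average cepstrum from every frame.
    cepstra: (n_frames, n_coeffs) array. Removing the mean cancels
    stationary convolutional effects such as the recording channel."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```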

Silence Removal
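A simple energy-based silence removal sketch in Python (the frame length and threshold here are illustrative choices, not values from the slides):

```python
import numpy as np

def remove_silence(signal, frame_len=256, threshold_ratio=0.1):
    """Drop frames whose short-time energy falls below a fraction of
    the maximum frame energy (a basic energy endpoint detector)."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).sum(axis=1)
    keep = energy > threshold_ratio * energy.max()
    return frames[keep].reshape(-1)
```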

Pre-emphasis
The signal is passed through a first-order high-pass pre-emphasis filter, y[n] = x[n] - a*x[n-1] (typically a is about 0.95), to compensate for the spectral tilt of voiced speech.
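A one-line sketch of a standard first-order pre-emphasis filter in Python (the coefficient 0.95 is a common textbook choice, not taken from the slides):

```python
import numpy as np

def pre_emphasis(x, a=0.95):
    """First-order high-pass filter y[n] = x[n] - a*x[n-1],
    boosting high frequencies before further analysis."""
    y = np.empty(len(x), dtype=float)
    y[0] = x[0]
    y[1:] = x[1:] - a * np.asarray(x, dtype=float)[:-1]
    return y
```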

Segmentation
Short time analysis: the speech signal is segmented into overlapping 'analysis frames', and the signal is assumed to be stationary within each frame.
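Framing can be sketched as below (a Python illustration; the frame length, hop size, and Hamming window are conventional assumptions rather than parameters stated on the slides):

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a signal into overlapping analysis frames of frame_len
    samples, advancing hop samples per frame, with a Hamming taper."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)
```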

Feature Representation
Speech signal and spectrum of two users uttering 'ONE' (figures).

Speaker Model
Each utterance is represented as a sequence of frame-level feature vectors:
F1 = [a1 ... a10, b1 ... b10]
F2 = [a1 ... a10, b1 ... b10]
...
FN = [a1 ... a10, b1 ... b10]

Dynamic Time Warping
The DTW warping path in the n-by-m matrix is the path with the minimum average cumulative cost. The unmarked area is the constraint region through which the path is allowed to go.
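An unconstrained DTW distance can be sketched in a few lines of Python (illustrative only; it omits the path constraint region and the length normalization mentioned in the results):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping between feature sequences a (n x d) and
    b (m x d): minimum cumulative Euclidean cost over warping paths."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A band constraint (e.g. Sakoe-Chiba) would restrict |i - j| to keep the path near the diagonal.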

Results
- Distances are normalized w.r.t. the length of the speech signal
- Intra-speaker distance is less than inter-speaker distance
- The distance matrix is symmetric

Matlab Implementation

THANK YOU