SPEAKER RECOGNITION
A PRESENTATION BY SHAMALEE DESHPANDE

INTRODUCTION
Speaker Recognition:
* Automatically recognizing who is speaking
* Uses speaker-specific information contained in the speech waveform

INTRODUCTION
Two Approaches:
* Text-Dependent Recognition: uses keywords or sentences with the same text for both the templates and the recognition attempt
* Text-Independent Recognition: does not rely on a specific text being spoken

INTRODUCTION
* Classes of sound: voiced, unvoiced, plosive
* Production of pitch frequency and formants
* Glottal waveform
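A common way to illustrate the voiced/unvoiced distinction (not taken from the slides, but a standard rule of thumb) is to look at short-time energy and zero-crossing rate: voiced frames tend to have high energy and a low zero-crossing rate, unvoiced frames the opposite. A minimal sketch, assuming NumPy and placeholder thresholds:

```python
import numpy as np

def frame_features(frame):
    """Return short-time energy and zero-crossing rate for one frame."""
    frame = np.asarray(frame, dtype=float)
    energy = np.sum(frame ** 2) / len(frame)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    return energy, zcr

def is_voiced(frame, energy_thresh=1e-4, zcr_thresh=0.1):
    # Voiced speech: relatively high energy, few zero crossings.
    # Unvoiced speech (e.g. fricatives): lower energy, many zero crossings.
    # Thresholds are placeholders and depend on the recording level.
    energy, zcr = frame_features(frame)
    return energy > energy_thresh and zcr < zcr_thresh
```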

BLOCK DIAGRAM OF A SPEAKER RECOGNITION SYSTEM
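As a rough, generic outline of what such a system typically contains (feature extraction, comparison against enrolled speaker templates, and a decision stage), here is a minimal sketch; the function names are hypothetical and the actual diagram on the slide may differ:

```python
from typing import Callable, Dict
import numpy as np

def recognize_speaker(signal: np.ndarray,
                      extract_features: Callable[[np.ndarray], np.ndarray],
                      templates: Dict[str, np.ndarray],
                      distance: Callable[[np.ndarray, np.ndarray], float]) -> str:
    """Generic identification loop: score the input against every enrolled
    speaker template and return the closest match."""
    features = extract_features(signal)   # e.g. cepstral or LPC features
    scores = {spk: distance(features, tpl) for spk, tpl in templates.items()}
    return min(scores, key=scores.get)    # smallest distance wins
```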

DESIRABLE ATTRIBUTES OF A SPEAKER RECOGNITION SYSTEM
The chosen feature should:
* Occur naturally and frequently in speech
* Be easily measurable
* Not change over time or be affected by the speaker's health
* Not be affected by background noise
* Not be subject to mimicry

SOURCES OF VARIABILITY IN SPEECH
* Phonetic identity: two samples may correspond to different phonetic segments, e.g. a vowel versus a fricative
* Pitch: pitch, along with features such as breathiness and amplitude, can vary
* Speaker: differences due to source physiology and emotions
* Microphone
* Environment

POSSIBLE ACOUSTIC PARAMETERS
* Formant frequencies
* LPC
* Pitch
* Nasal coarticulation
* Gain

COMMON SPEAKER RECOGNITION TECHNIQUES
* Discrete Fourier Transform
* Linear Predictive Coding
* Cepstral Analysis
* Dynamic Time Warping
* Hidden Markov Models

DISCRETE / FAST FOURIER TRANSFORM
* Converts the time-domain signal into a frequency-domain representation
* The FFT reduces the computational complexity compared with direct DFT evaluation
* Processing steps: read L speech samples from the input, apply a window, append N - L zeros to the data, and compute the N-point DFT (see the sketch below)
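A minimal sketch of these steps, assuming NumPy and a Hamming window; the frame length L and FFT size N here are illustrative choices, not values from the slides:

```python
import numpy as np

def frame_spectrum(samples, L=400, N=512):
    """Window one frame of L samples, zero-pad to N, and return the DFT magnitude."""
    frame = np.asarray(samples[:L], dtype=float)
    frame = frame * np.hamming(L)            # windowing reduces spectral leakage
    frame = np.pad(frame, (0, N - L))        # append N - L zeros
    return np.abs(np.fft.rfft(frame, n=N))   # N-point FFT, keep the magnitude
```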

LINEAR PREDICTIVE CODING
* LPC models the speech-producing organs of the body as a buzzer at the end of a tube
* Buzzer: the glottal excitation, characterized by intensity and pitch
* Tube: the vocal tract, characterized by formants
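A minimal sketch of how LPC coefficients can be estimated with the autocorrelation method, assuming NumPy/SciPy; the model order p is an illustrative choice:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, p=12):
    """Estimate p LPC coefficients for one (windowed) speech frame
    using the autocorrelation method."""
    frame = np.asarray(frame, dtype=float)
    # Autocorrelation at lags 0..p
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(p + 1)])
    # Solve the Toeplitz normal equations R a = r[1..p]
    a = solve_toeplitz((r[:p], r[:p]), r[1:p + 1])
    return a   # predictor: s_hat[n] = sum_{k=1..p} a[k-1] * s[n-k]
```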

CEPSTRAL ANALYSIS
* A disadvantage of the DFT/FFT is that the formant frequencies may shift or overlap the pitch harmonics
* In cepstral analysis the vocal-tract (formant) contribution can be separated from the excitation (pitch) contribution
* The cepstrum is defined as the Fourier transform of the log of the power spectrum (in practice computed as the inverse DFT of the log spectrum)
* s(n) = p(n) * v(n)   (speech = pitch excitation convolved with the vocal-tract response)
* x(n) = w(n) s(n)   (windowed speech)
* S(ω) = P(ω) V(ω)   (convolution becomes multiplication after the Fourier transform)
* log S(ω) = log P(ω) + log V(ω)
* c(q) = IDFT{ log S(ω) } = c_p(q) + c_v(q)
* q: quefrency, c(q): (complex) cepstrum

CEPSTRAL ANALYSIS (block diagram)
Speech → Window → DFT → Log → IDFT → Cepstrum
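A minimal sketch of this pipeline, assuming NumPy; it computes the real cepstrum from the log magnitude spectrum, which is the commonly used practical form:

```python
import numpy as np

def real_cepstrum(frame):
    """Window -> DFT -> log -> IDFT, as in the block diagram above."""
    frame = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small offset avoids log(0)
    return np.fft.ifft(log_mag).real             # quefrency-domain cepstrum
```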

DYNAMIC TIME WARPING
* Incoming speech is usually compared frame by frame with a stored template
* Achieved via a pairwise comparison of feature vectors from the two sequences
* Disadvantage of direct frame-by-frame comparison: variation in the length of corresponding phonemes
* DTW accounts for the non-linear relation between the lengths of the two signals
* Used as a matching algorithm (a sketch of the alignment recursion follows below)
* Example: DTW alignment grid
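A minimal sketch of the DTW distance between two feature-vector sequences, assuming NumPy and Euclidean frame distances; function and variable names are illustrative:

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of feature vectors
    (shape: frames x features), using Euclidean frame distance."""
    A, B = np.atleast_2d(seq_a), np.atleast_2d(seq_b)
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            # Allow match, insertion, or deletion steps on the warping path.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]
```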

HIDDEN MARKOV MODELS
* The speech signal is identified through a search process rather than explicitly
* Comprises a hidden Markov chain representing temporal variability and an observable process representing spectral variability
* Portrayed as a stochastic pair (X, Y)
* An HMM is a finite state machine in which a probability density function p(x|s) is associated with each state s (a scoring sketch follows below)
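A minimal sketch of how an HMM scores an observation sequence with the scaled forward algorithm, assuming NumPy; the state-conditional densities p(x|s) are passed in as a function, and all names are illustrative:

```python
import numpy as np

def hmm_log_likelihood(observations, init_probs, trans_probs, emission_pdf):
    """Scaled forward algorithm: log-likelihood of an observation sequence.
    init_probs:  (S,)   initial state probabilities
    trans_probs: (S, S) transition matrix, trans_probs[i, j] = P(state j | state i)
    emission_pdf(x, s): state-conditional density p(x | s) for state index s
    """
    S = len(init_probs)
    emit = lambda x: np.array([emission_pdf(x, s) for s in range(S)])

    alpha = np.asarray(init_probs, dtype=float) * emit(observations[0])
    log_lik = np.log(alpha.sum())
    alpha /= alpha.sum()

    for x in observations[1:]:
        alpha = (alpha @ np.asarray(trans_probs, dtype=float)) * emit(x)
        scale = alpha.sum()          # rescaling avoids numerical underflow
        log_lik += np.log(scale)
        alpha /= scale
    return log_lik
```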

FUTURE RESEARCH
Extract and exploit all levels of information in the speech signal that convey speaker identity:
* Acoustic: spectral features conveying vocal-tract information
* Prosodic: features derived from pitch and energy tracks
* Phonetic: phone sequences that characterize speaker-specific pronunciations
* Idiolect: word usage that characterizes speaker-specific word patterns
* Linguistic: linguistic patterns that characterize speaker-specific conversational style

APPLICATIONS
* Access control: physical facilities, computer networks, and websites
* PC login and password reset
* Secured transactions: remote banking and online credit card purchase authentication
* Time and attendance: workplaces
* Law enforcement: forensics, parole