Scott Settembre CSE 734 : Cyber Physical Spaces

Speaker Recognition
Scott Settembre, ss424@cse.buffalo.edu
CSE 734: Cyber Physical Spaces

Overview
- Speaker Identification
- Speaker Validation
- Two types of Recognition methods: Text dependent vs. Text independent
- Speaker Recognition steps
- Conclusion / References
March 16, 2009 Scott Settembre [ss424@cse.buffalo.edu]

Speaker Identification
- Determines the speaker from a set of registered speakers
  - This is called "closed" set identification
  - The result is the best-matching speaker
- What if the speaker is not in the database?
  - This is called "open" set identification
  - The result can be a speaker or a no-match result
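
The difference between the two cases can be sketched in a few lines. This is a minimal illustration, not the presenter's system: the speaker names, similarity scores, and the open-set threshold are hypothetical, and the scores are assumed to be "higher is more similar".

```python
from typing import Dict, Optional

def identify_closed_set(scores: Dict[str, float]) -> str:
    """Closed set: the speaker is assumed to be enrolled, so return the best-scoring one."""
    return max(scores, key=lambda name: scores[name])

def identify_open_set(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Open set: return the best match only if its score reaches the threshold,
    otherwise None to signal a no-match result."""
    best = identify_closed_set(scores)
    return best if scores[best] >= threshold else None

# Hypothetical similarity scores of one utterance against three enrolled speakers.
scores = {"alice": 0.62, "bob": 0.81, "carol": 0.47}
print(identify_closed_set(scores))     # bob
print(identify_open_set(scores, 0.9))  # None: bob is best, but not similar enough
```

The open-set version reuses the closed-set decision and only adds a rejection test, which is why open-set identification inherits the same threshold-setting difficulty as validation.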

Speaker Identification Diagram
(Figure: block diagram with the components Actual Speaker Input, Normalization, Feature Extraction, Enrollment into the Speaker Database, Calculate similarity to each speaker template or model, Select best match, and Identification of Speaker.)

Speaker Validation
- Also called "Verification" or "Authentication"
- Determines whether the voice matches a particular registered speaker
- The result is the probability of a match or a similarity measure
- The similarity must exceed a particular threshold
  - A higher threshold produces more false negatives (true speakers rejected)
  - A lower threshold produces more false positives (impostors accepted)
- Voice variability and security issues make this threshold difficult to determine (more later)
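
The threshold trade-off can be made concrete by counting errors on a set of trial scores. The sketch below assumes hypothetical scores from genuine attempts (the true speaker) and impostor attempts, and an accept-if-score-is-at-least-the-threshold rule; the numbers are invented for illustration, not measured results.

```python
from typing import List, Tuple

def error_rates(genuine: List[float], impostor: List[float], threshold: float) -> Tuple[float, float]:
    """Return (false_reject_rate, false_accept_rate) for the rule: accept if score >= threshold."""
    false_rejects = sum(1 for s in genuine if s < threshold)     # true speaker turned away
    false_accepts = sum(1 for s in impostor if s >= threshold)   # impostor let in
    return false_rejects / len(genuine), false_accepts / len(impostor)

# Hypothetical similarity scores.
genuine = [0.91, 0.85, 0.78, 0.66, 0.95]
impostor = [0.40, 0.55, 0.71, 0.30, 0.62]

for t in (0.5, 0.7, 0.9):
    frr, far = error_rates(genuine, impostor, t)
    print(f"threshold={t:.1f}  false negatives={frr:.0%}  false positives={far:.0%}")
```

Raising the threshold from 0.5 to 0.9 moves these toy scores from 0% false negatives and 60% false positives to 60% false negatives and 0% false positives, which is the tension the slide describes.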

Speaker Validation Diagram
(Figure: block diagram with the components Actual Speaker Input plus claimed Speaker ID, Normalization, Feature Extraction, Enrollment into the Speaker Database, the speaker template or model for that ID, Calculate similarity to given template or model, Does similarity exceed threshold?, and Verification (Accept/Reject).)

Recognition Methods
Text Dependent
- Requires the user to speak the same text spoken at enrollment
  - Usually a name, password, or phrase
- Text prompting is used to combat deception
  - The system requires the user to repeat back a random phrase or list of numbers
- Video example from the Spoken Language Systems group at MIT CSAIL.
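
A text prompt only helps if it cannot be predicted, so that a recording of an earlier session is unlikely to contain the requested words. A minimal sketch of generating a random digit prompt, assuming digits are the prompt vocabulary and six digits is long enough (both are illustrative choices); checking that the response contains the right digits and the right voice is outside this sketch.

```python
import secrets

def make_prompt(num_digits: int = 6) -> str:
    """Return a random, space-separated digit string for the user to read aloud."""
    # secrets draws from the OS's cryptographic randomness, so prompts are hard to predict.
    return " ".join(str(secrets.randbelow(10)) for _ in range(num_digits))

prompt = make_prompt()
print("Please say:", prompt)
```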

Recognition Methods, cont.
Text Independent
- Non-invasive: does not require the user to actively answer prompts
- Longer enrollment phase required; more training data needed
- Focuses on a subset of audio/phonetic features
- Video example from Nathan Harrington at IBM developerWorks.

Speaker Recognition Steps
1. Input speech
2. Normalize captured speech
3. Feature extraction
4. Similarity matching
5. Decision/Threshold

Step 1. Input Speech
- Input fidelity varies by device: telephone, computer microphone, noise-cancelling headset, dedicated capture microphone, room microphones
- Noise: background noise, room echoes
- Variability in voice: speaking manner (rate and volume), sickness, aging, emotions, morning vs. evening voice

Step 2. Normalize Captured Speech
- Intersession variability and variability over time cause speech features to fluctuate
- Use of a "filter bank" is common
- Normalization helps remove these variations, but at a price: it can also remove some speaker-specific features
  - Parameter-domain normalization
  - Distance/similarity-domain normalization

Step 2.a. Normalization Techniques
- Parameter-domain normalization
  - Spectral equalization (i.e. signal processing)
  - Dampens large variations in features by averaging over time; useful for long utterances
  - Removes some speaker-specific features
- Distance/similarity-domain normalization
  - Various techniques that use the probabilities of known speakers who have already been enrolled
  - Useful for validation
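
One example of each domain, as a sketch under stated assumptions. For the parameter domain, cepstral mean subtraction is a widely used technique that matches "averaging over time": it subtracts each feature dimension's utterance-wide mean, cancelling a constant per-dimension offset such as a fixed channel coloration. For the score domain, the sketch uses cohort (T-norm style) normalization, which rescales the claimed speaker's score against the scores the same utterance gets from other enrolled speakers. Neither is confirmed as the method the slides had in mind, and the feature values and scores below are hypothetical.

```python
import statistics
from typing import List, Sequence

def cepstral_mean_subtraction(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Parameter domain: subtract each feature dimension's average over the utterance,
    which cancels any constant offset in that dimension."""
    dims = len(frames[0])
    means = [statistics.fmean(frame[d] for frame in frames) for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]

def cohort_normalize(raw_score: float, cohort_scores: Sequence[float]) -> float:
    """Score domain (T-norm style): how many standard deviations the claimed speaker's
    score sits above the scores this utterance gets against other enrolled speakers.
    Needs at least two cohort scores."""
    mu = statistics.fmean(cohort_scores)
    sigma = statistics.stdev(cohort_scores)
    return (raw_score - mu) / sigma

# Hypothetical 2-D feature frames; "shifted" simulates the same speech over another channel.
frames = [[1.0, 2.0], [3.0, 4.0], [2.0, 6.0]]
shifted = [[x + 0.5, y - 1.0] for x, y in frames]
print(cepstral_mean_subtraction(frames) == cepstral_mean_subtraction(shifted))
print(cohort_normalize(0.8, [0.4, 0.5, 0.6]))
```

The first print shows the constant channel offset disappearing after normalization. The same averaging also flattens any speaker trait that is constant over the utterance, which is the "price" noted on the previous slide.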

Step 3. Feature Extraction
- The input utterance is converted to a set of feature vectors
- Time alignment may be needed
- Calculate the similarity between each captured vector and the registered speaker template or model
(Figure: the word "Hello" broken into the units h, he, e, el, l, lo, o in parallel rows for the input and the template; aligned units are compared, e.g. h/h: .90 similarity, he/he: .60 similarity, .75 overall.)
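
Time alignment is commonly done with dynamic time warping (DTW), which pairs up frames from two sequences of different lengths so the total distance is minimal. The sketch below is one such alignment followed by averaging per-pair similarities along the path; the "overall" figure in the example above is consistent with this kind of averaging, since (0.90 + 0.60) / 2 = 0.75. The 1 / (1 + distance) similarity mapping and the 1-D features are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def dtw_path(a: Sequence[Vector], b: Sequence[Vector]) -> List[Tuple[int, int]]:
    """Dynamic time warping: align two frame sequences of possibly different lengths
    along the path of minimum cumulative Euclidean distance; returns (i, j) index pairs."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Walk back from the end, always stepping to the cheapest predecessor cell.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda c: cost[c[0]][c[1]])
    return path[::-1]

def utterance_similarity(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Average per-pair similarity along the DTW path. The 1 / (1 + distance)
    mapping from distance to similarity is an illustrative choice."""
    path = dtw_path(a, b)
    sims = [1.0 / (1.0 + math.dist(a[i], b[j])) for i, j in path]
    return sum(sims) / len(sims)

# Hypothetical 1-D features: the speaker holds the first sound for an extra frame.
template = [[0.0], [1.0], [2.0]]
spoken = [[0.0], [0.0], [1.0], [2.0]]
print(dtw_path(spoken, template))              # [(0, 0), (1, 0), (2, 1), (3, 2)]
print(utterance_similarity(spoken, template))  # 1.0
```

Without alignment, comparing frame 2 of the input with frame 2 of the template would pair different sounds; DTW maps the repeated first frame onto the same template frame instead.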

Side note: Analyzing speech "ah"
- Waveform (raw acoustic data)
- Spectrograph (frequency vs. amplitude)
- Formant (a resonance peak in the spectrum, which shows up as a continuous band over time)
Image attributed to Dr. Douglas Roland from lecture notes describing speech recognition.
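
A spectrogram is built by cutting the waveform into short overlapping frames and taking the magnitude spectrum of each one. The sketch below uses a naive DFT so it needs only the standard library. As a stand-in for "ah" it assumes two pure tones at 700 Hz and 1200 Hz, roughly where the first two formants of that vowel tend to lie; the tones, sample rate, frame length, and hop are all hypothetical choices, and a real vowel is far richer than two sinusoids.

```python
import cmath
import math
from typing import List

def magnitude_spectrum(frame: List[float]) -> List[float]:
    """Naive DFT magnitude for bins 0..N/2, i.e. the amplitude at each frequency."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def spectrogram(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """One magnitude spectrum per Hann-windowed frame: frequency content over time."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    return [magnitude_spectrum([signal[s + t] * window[t] for t in range(frame_len)])
            for s in range(0, len(signal) - frame_len + 1, hop)]

# Hypothetical stand-in for "ah": two tones near typical first/second formant regions.
rate = 8000
signal = [math.sin(2 * math.pi * 700 * t / rate) + 0.5 * math.sin(2 * math.pi * 1200 * t / rate)
          for t in range(1600)]
frames = spectrogram(signal, frame_len=400, hop=200)   # 400 samples = 50 ms, 20 Hz per bin
first = frames[0]
peak_bin = max(range(len(first)), key=first.__getitem__)
print(len(frames), peak_bin * rate / 400)              # frame count, strongest frequency in Hz
```

Each frame's spectrum peaks at the same bins, so on a plot these peaks would form the continuous horizontal bands that the slide labels as formants.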

Step 4. Similarity Matching
- Other pattern classification techniques can be used on the normalized input
  - Each speaker gets his/her own HMM, neural network, VQ codebook, etc.
- Another approach is to target specific phonemes or features
  - Example showing the targeting of vowel sounds, in particular the syllable "ah"
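
As a sketch of the "VQ codebook per speaker" option: each speaker's enrollment frames are summarized by a handful of centroids, and a test utterance is assigned to the speaker whose codebook reproduces its frames with the least average distortion. This assumes plain k-means for codebook training (LBG splitting is another common choice) and uses invented 2-D "features"; the `fake_frames` helper is hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]

def nearest(v: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to feature vector v."""
    return min(range(len(codebook)), key=lambda k: math.dist(v, codebook[k]))

def train_codebook(frames: Sequence[Vector], size: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain k-means VQ codebook: `size` centroids summarizing one speaker's frames."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(list(frames), size)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(size)]
        for v in frames:
            clusters[nearest(v, codebook)].append(v)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                codebook[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return codebook

def distortion(frames: Sequence[Vector], codebook: Sequence[Vector]) -> float:
    """Average distance from each frame to its nearest codeword; lower means a better match."""
    return sum(math.dist(v, codebook[nearest(v, codebook)]) for v in frames) / len(frames)

def identify(frames: Sequence[Vector], codebooks: Dict[str, List[List[float]]]) -> str:
    """Pick the enrolled speaker whose codebook quantizes the test frames best."""
    return min(codebooks, key=lambda name: distortion(frames, codebooks[name]))

# Hypothetical 2-D "features" drawn around a different center for each speaker.
gen = random.Random(1)

def fake_frames(cx: float, cy: float, n: int = 50) -> List[List[float]]:
    return [[cx + gen.gauss(0, 0.3), cy + gen.gauss(0, 0.3)] for _ in range(n)]

codebooks = {
    "alice": train_codebook(fake_frames(0.0, 0.0), size=4),
    "bob": train_codebook(fake_frames(3.0, 3.0), size=4),
}
print(identify(fake_frames(2.9, 3.1, n=20), codebooks))  # bob
```

Lower distortion plays the role of higher similarity here, so for verification the same score would be compared against a maximum allowed distortion rather than a minimum similarity.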

Example of Vowel Comparisons
(Charts attributed to Pasich, C. Speaker Identification MATLAB files, Connexions Web site. http://cnx.org/content/m14201/1.3/, Feb 16, 2007.)

Step 5. Decision/Threshold
- For speaker identification, simply take the registered speaker template with the highest similarity score
- For speaker verification, the similarity score must reach a minimum acceptable value

Conclusion: Why care?
- Speaker recognition will become ubiquitous
  - Cell phone applications: banking, security, logins
  - Forensic analysis (voiceprints)
  - Home automation (know thy user)
  - Google "speaker" search? (You know it's going to happen!)

References
- Video links:
  - MIT, CSAIL. http://www.youtube.com/watch?v=0ec1Gtnlq1k
  - IBM, developerWorks. http://www.youtube.com/watch?v=JJ_YzBaqzAo
- Cole, Ronald A., Editor (1996). Survey of the State of the Art in Human Language Technology. http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html
- Iyer, Manjunath Ramachandra (2007). "Differentially Fed Artificial Neural Networks for Speech Signal Prediction." In Hector Perez-Meana, Editor, Advances in Audio and Speech Signal Processing: Technologies and Applications (pp. 309-323). Hershey, PA: Idea Group Pub.
- Lung, Shung-Yung (2007). "Speaker Recognition." In Hector Perez-Meana, Editor, Advances in Audio and Speech Signal Processing: Technologies and Applications (pp. 371-407). Hershey, PA: Idea Group Pub.