Chapter 7 Speech Recognition Framework
• 7.1 The main form and application of speech recognition
• 7.2 The main factors of speech recognition
• 7.3 The active topics of speech recognition
• 7.4 The basic framework of speech recognition system

7.1 The main form and application of speech recognition (1)
• Speech Recognition -- takes a speech string as input and generates the corresponding word, word string (text), or transcription
• Speech Understanding -- takes a speech string as input and generates the corresponding response or actions
• Speaker Recognition -- takes a speech string as input and identifies or verifies the speaker
• Language Identification -- takes a speech string as input and identifies which language it belongs to

The main form and application of speech recognition (2)
• Speech Recognition
• Speech Navigation -- computer operation, speech control, intelligent toys, and parcel dispatch
• Speech Dictation -- dictation machines, speech dialing, and broadcast recording

The main form and application of speech recognition (3)
• Speech Understanding
• Speech Service -- services for people with disabilities, banking, traveling, and transportation, wherever dialog is needed
• Speech Communication -- bilingual speech communication and multilingual simultaneous interpretation

The main form and application of speech recognition (4)
• Speaker Recognition
• Speaker Verification -- access to secure departments or programs, banking, and other services
• Speaker Identification -- user recognition, voice checking for identifying criminal suspects

7.2 The main factors of speech recognition (1)
• Speech Style
• Isolated Words (IWR) -- there are obvious pauses (silence) between words, e.g. names of persons or places, or commands
• Connected Word Speech (CWR) -- e.g. continuous digit strings (telephone numbers or data)
• Continuous Speech (CSR) -- natural spoken language in sentences (utterances)
• In order of ease: CSR << CWR << IWR (continuous speech is by far the hardest)

The main factors of speech recognition (2)
• Speaker Dependent (SD)
• A speaker-dependent recognition system can only recognize speech from one or a few speakers. The speech models are trained only on those speakers' samples (corpus).
• Speaker Independent (SI)
• A speaker-independent recognition system can recognize speech from any speaker. In this case the speech models are trained on corpora from many speakers, and speaker adaptation at recognition time can further improve performance. SI recognition is much harder than SD.

The main factors of speech recognition (3)
• Vocabulary Size
• Small Vocabulary -- several hundred words
• Middle Vocabulary -- a thousand to several thousand words
• Large Vocabulary -- more than ten thousand words

The main factors of speech recognition (4)
• Other factors:
• Speech Quality -- microphone speech or telephone speech, recording environment, speaker's cooperation
• Task -- word recognition, transcription, word spotting, dialog, and translation are very different tasks
• Domain (specific or generic) and syntax constraints (loose or strict)

7.3 The active topics of speech recognition
• Broadcast recording (transcription) systems
• Telephone dialog systems
• Speaker adaptation
• Noise reduction
• Word spotting
• Class-based language models

7.4 The basic framework of speech recognition system (1)
• Input -- speech string (utterance) through a microphone or telephone, {x'(n)}
• Preprocessing -- windowing, framing, and pre-emphasis, {x_i(n)}
• Feature Extraction -- feature vector calculation frame by frame, {o_i}
• Decision Making -- from simple algorithms such as a minimum-distance classifier to complex ones such as HMMs (statistical acoustic and language models)

The basic framework of speech recognition system(2)  Input  Anti-aliasing filter with 300-4KHz  Sampling rate : 8KHz (telephone speech) to 16KHz (microphone speech)  Sampling precision : 8 bits (telephone speech) to 16 bits (microphone speech)  Sampling starting and ending determination (silence detecting and memory buffer to use)

The basic framework of speech recognition system (3)
• Preprocessing
• Window selection and windowing
• Framing -- frame length and frame shift selection (typically 25 ms and 10 ms)
• Pre-emphasis: y(n) = x(n) - α·x(n-1), where α is close to 1.0 (0.95 or 0.97); for simplicity it can be 15/16 ≈ 0.9375
• The goal is to emphasize the high-frequency components (see the sketch below)
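Below is a minimal Python sketch of this preprocessing chain (pre-emphasis, framing, windowing) using the 25 ms / 10 ms values quoted above. The Hamming window and the function name preprocess are assumptions for illustration, since the slide leaves the window type open.

```python
import numpy as np

def preprocess(x, fs=16000, alpha=0.97, frame_ms=25, shift_ms=10):
    """Pre-emphasize, frame, and window a speech signal (illustrative sketch).

    Assumes the signal is at least one frame long.
    """
    # Pre-emphasis: y(n) = x(n) - alpha * x(n-1), boosts high frequencies
    y = np.append(x[0], x[1:] - alpha * x[:-1])

    frame_len = int(fs * frame_ms / 1000)    # 400 samples at 16 kHz
    frame_shift = int(fs * shift_ms / 1000)  # 160 samples at 16 kHz
    n_frames = 1 + max(0, (len(y) - frame_len) // frame_shift)

    window = np.hamming(frame_len)           # assumed window choice
    frames = np.stack([
        y[i*frame_shift : i*frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
    return frames  # shape: (n_frames, frame_len)
```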

The basic framework of speech recognition system (4)
• Feature Extraction
• There are several ways to obtain a feature vector; only one is given here -- MFCC (mel-scale frequency cepstrum coefficients)
• The steps to get the MFCC of one frame:
• 1. FFT (with zero padding) to get X[k]:
  X[k] = Σ_{n=0}^{N-1} x[n]·exp(-j2πnk/N),  k = 0, ..., N-1
• 2. Apply M filters; the log-energy S[m] of filter m is computed from the power spectrum |X[k]|² weighted by the filter H_m[k]:

The basic framework of speech recognition system (5)
• S[m] = ln( Σ_{k=0}^{N-1} |X[k]|²·H_m[k] ),  m = 0, ..., M-1, where H_m[k] >= 0 and Σ_{k=0}^{N-1} H_m[k] = 1.
• Typically the H_m[k] are chosen as triangular filters:
  H_m[k] = 0                                                        for k < f[m-1]
         = 2(k - f[m-1]) / [ (f[m+1] - f[m-1]) (f[m] - f[m-1]) ]     for f[m-1] <= k < f[m]
         = 2(f[m+1] - k) / [ (f[m+1] - f[m-1]) (f[m+1] - f[m]) ]     for f[m] <= k <= f[m+1]
         = 0                                                        for k > f[m+1]
• so that Σ_{k=0}^{N-1} H_m[k] = 1 for all m, where the boundary points f[m] are uniformly spaced on the mel scale.
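The triangular filterbank above can be built as in the following sketch. The mel conversion formula (2595·log10(1 + f/700)) and the mapping from boundary frequencies to FFT bin indices are common choices assumed for illustration; the slide only states that the boundary points f[m] are uniformly spaced on the mel scale.

```python
import numpy as np

def mel(f_hz):
    """Hz -> mel (assumed 2595*log10(1 + f/700) formula; not specified on the slide)."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def inv_mel(m):
    """Mel -> Hz, inverse of the assumed formula above."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(M=24, nfft=512, fs=16000, f_low=0.0, f_high=None):
    """Build M triangular filters H[m, k] following the slide's definition.

    Assumes M is small enough relative to nfft that adjacent boundary
    bins f[m-1], f[m], f[m+1] are distinct.
    """
    if f_high is None:
        f_high = fs / 2.0
    # M + 2 boundary points uniformly spaced on the mel scale
    mel_points = np.linspace(mel(f_low), mel(f_high), M + 2)
    # map the boundary frequencies back to FFT bin indices f[m]
    f = np.floor((nfft + 1) * inv_mel(mel_points) / fs).astype(int)

    H = np.zeros((M, nfft // 2 + 1))
    for m in range(1, M + 1):
        left, center, right = f[m - 1], f[m], f[m + 1]
        for k in range(left, center):       # rising edge
            H[m - 1, k] = 2.0 * (k - left) / ((right - left) * (center - left))
        for k in range(center, right):      # falling edge
            H[m - 1, k] = 2.0 * (right - k) / ((right - left) * (right - center))
    return H  # shape (M, nfft//2 + 1)
```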

The basic framework of speech recognition system (6)
• The mel-frequency cepstrum is the DCT of the M filter outputs:
  c[n] = Σ_{k=0}^{M-1} S[k]·cos( πn(k + 1/2) / M ),  n = 0, ..., M-1
• M = 24~40, but n is usually truncated to about 12.
• Besides these 12 coefficients, their first- and second-order differences (delta and delta-delta) are often used as feature-vector components as well. The total number of components is about 36~39. A sketch of the full computation is given below.
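Putting the steps together, here is a sketch of the log filterbank energies, the DCT, and the delta features. It reuses the hypothetical preprocess and mel_filterbank helpers from the earlier sketches, and the 13-coefficient / 39-dimension configuration is one common choice rather than something fixed by the slide.

```python
import numpy as np

def mfcc_features(frames, H, n_ceps=13):
    """Frames -> MFCC vectors with delta and delta-delta (illustrative sketch).

    frames : windowed frames from preprocess(), shape (n_frames, frame_len)
    H      : mel filterbank from mel_filterbank(), shape (M, nfft//2 + 1)
    """
    nfft = 2 * (H.shape[1] - 1)
    # 1. FFT (zero-padded to nfft) and power spectrum |X[k]|^2
    power = np.abs(np.fft.rfft(frames, n=nfft, axis=1)) ** 2
    # 2. log filterbank energies S[m] (small constant avoids log(0))
    S = np.log(power @ H.T + 1e-10)
    # 3. DCT of the M log-energies, truncated to n_ceps coefficients:
    #    c[n] = sum_{k=0}^{M-1} S[k] * cos(pi * n * (k + 1/2) / M)
    M = S.shape[1]
    n = np.arange(n_ceps)[:, None]
    k = np.arange(M)[None, :]
    dct_basis = np.cos(np.pi * n * (k + 0.5) / M)
    c = S @ dct_basis.T                      # shape (n_frames, n_ceps)
    # 4. first- and second-order differences (delta, delta-delta)
    delta = np.gradient(c, axis=0)
    delta2 = np.gradient(delta, axis=0)
    return np.hstack([c, delta, delta2])     # ~39-dimensional feature vectors
```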

The basic framework of speech recognition system (7)
• Decision Making
• This is the last step, which determines what is in the input speech string. For isolated-word systems, template matching is still used; for connected or continuous speech, statistical models (HMMs and others) are used. We will discuss them in detail later. A template-matching sketch is given below.
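To illustrate the template-matching approach mentioned for isolated-word systems, here is a minimal dynamic time warping (DTW) sketch that compares an input feature sequence with stored word templates. The Euclidean local distance and the function names are illustrative assumptions, not the slide's prescribed method.

```python
import numpy as np

def dtw_distance(A, B):
    """DTW distance between two feature sequences A (m x d) and B (n x d)."""
    m, n = len(A), len(B)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])   # local Euclidean distance
            D[i, j] = cost + min(D[i - 1, j],            # insertion
                                 D[i, j - 1],            # deletion
                                 D[i - 1, j - 1])        # match
    return D[m, n]

def recognize(features, templates):
    """Return the word label whose reference template has the smallest DTW distance.

    templates : dict mapping word label -> MFCC sequence of a reference utterance
    """
    return min(templates, key=lambda w: dtw_distance(features, templates[w]))
```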