Pattern Comparison Techniques
By Sarita Jondhale


Pattern Comparison Techniques

The output of the front-end spectral analysis is a sequence of vectors. The test pattern T is a set of such vectors, and the reference pattern R is likewise a set of vectors. The goal of the pattern-comparison stage is to determine the dissimilarity between the vectors of T and the vectors of R; the recognized word corresponds to the reference pattern with minimum dissimilarity.

Pattern Comparison Techniques

Determining the global similarity of T and R raises the following problems:
- T and R are generally of unequal duration, because speaking rates differ across talkers.
- T and R need not line up in time in any simple, well-prescribed way, because different sounds cannot be varied in duration to the same degree: vowels are easily lengthened or shortened, but consonants change little in duration.
- We need a way to compare pairs of spectral vectors.
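As a minimal sketch of the comparison just described, the code below builds a frame-by-frame local distance matrix between a test pattern T and a reference pattern R of different lengths. The Euclidean distance and the toy 2-dimensional vectors are illustrative assumptions; real front ends produce higher-dimensional cepstral vectors and may use other distance measures.

```python
import math

def local_distance_matrix(T, R):
    """Distance from every vector in the test pattern T to every
    vector in the reference pattern R.  T and R may have different
    lengths (different speaking rates), so the result is a
    len(T) x len(R) matrix of local distances."""
    return [[math.dist(t, r) for r in R] for t in T]

# Toy 2-dimensional "spectral" vectors (illustrative only).
T = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]   # 3 test frames
D = local_distance_matrix(T, R=[(0.0, 0.0), (2.0, 2.0)])
```

A time-alignment procedure (e.g. dynamic time warping) would then search this matrix for the lowest-cost path, which is how the unequal-length problem above is usually handled.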

Speech Detection

Speech detection is also called endpoint detection. Its goal is to separate the speech signal from the background signal. Speech detection is needed in many telecommunications applications; for automatic speech recognition, endpoint detection is required to isolate the speech of interest so that a speech pattern or template can be created.
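As a rough illustration of what endpoint detection involves (not the specific algorithm covered later in these slides), a minimal short-time-energy detector might look like the following. The frame length, hop size, and threshold are illustrative assumptions for 8 kHz speech:

```python
import math

def short_time_energy(signal, frame_len=160, hop=80):
    """Log-energy contour: frame_len=160 and hop=80 correspond to
    20 ms frames with a 10 ms hop at an assumed 8 kHz sample rate."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        e = sum(x * x for x in frame)
        energies.append(10 * math.log10(e + 1e-12))  # avoid log(0)
    return energies

def detect_endpoints(energies, threshold_db=0.0):
    """Return (first, last) frame index above the threshold, or None
    if no frame exceeds it."""
    voiced = [i for i, e in enumerate(energies) if e > threshold_db]
    return (voiced[0], voiced[-1]) if voiced else None

# Synthetic "utterance": silence, a constant burst, silence.
signal = [0.0] * 800 + [0.5] * 800 + [0.0] * 800
contour = short_time_energy(signal)
endpoints = detect_endpoints(contour)
```

A fixed threshold like this works only in quiet conditions, which is exactly the limitation the next slides discuss.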

Speech Detection

Speech must be detected so as to provide the best patterns for recognition, where "best" means the patterns that yield the highest recognition accuracy.

Speech Detection

Accurate detection of speech is a simple problem when the speech is produced in a relatively noise-free environment; it becomes a difficult task when the environment is noisy.

Speech Detection

First factor: during speech, the talker produces sounds such as lip smacks, heavy breathing, and mouth clicks. A mouth click is produced by opening the lips just before or after speaking; the click noise is separate from the speech signal, but its energy level is comparable to that of the speech.

Speech Detection

Heavy breathing with speaking: unlike the mouth click, heavy-breathing noise is not separated from the speech, and it therefore makes accurate endpoint detection quite difficult.

Speech Detection

Second factor: environmental noise. The ideal environment for talking is a quiet room, with no acoustic noise sources other than the speaker.

Speech Detection

An ideal environment is not possible in practice. We have to consider speech produced:
- in noisy backgrounds (fans, machinery)
- in nonstationary environments (door slams, irregular road noise, car horns)
- with speech interference (from TV, radio, or background conversations)
- and in hostile circumstances (when the speaker is stressed)

Speech Detection

These interfering signals are somewhat speech-like, so accurate endpoint detection becomes difficult.

Speech Detection

Third factor: distortion introduced by the transmission system over which the speech signal is sent.

Speech Detection

Methods for speech detection are broadly classified into three approaches:
- the explicit approach
- the implicit approach
- the hybrid approach

The explicit approach

The speech signal is first measured and features are extracted. The speech detection method is then applied to locate and delimit the speech events. The detected speech is sent to the pattern-comparison algorithm, and finally the decision mechanism chooses the recognized word.

The explicit approach

For signals with a stationary, low-level noise background, this approach gives reasonably good detection accuracy. It often fails when the environment is noisy or the interference is nonstationary.

The implicit approach

This approach solves the speech detection problem simultaneously with the pattern matching and recognition-decision process. It recognizes that speech events are almost always accompanied by some acoustic background.

The implicit approach

The unmarked signal sequence is processed by the pattern-matching module, in which all possible endpoint sets are considered. The decision mechanism provides an ordered list of candidate words along with the corresponding speech locations. The final result is the best candidate and its associated endpoints.
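The exhaustive search over endpoint sets can be sketched as follows. Here `score_fn` is a hypothetical stand-in for the real pattern-comparison step (e.g. a DTW distance, with lower meaning better); the toy score and templates are assumptions for illustration only:

```python
def implicit_recognition(energies, templates, score_fn):
    """Implicit endpoint detection sketch: rather than fixing the
    endpoints first, every candidate (start, end) pair is scored
    against every reference template; the best overall match yields
    both the recognized word and its endpoints."""
    best = None
    n = len(energies)
    for start in range(n):
        for end in range(start + 1, n + 1):
            segment = energies[start:end]
            for word, template in templates.items():
                s = score_fn(segment, template)
                if best is None or s < best[0]:
                    best = (s, word, (start, end))
    return best  # (score, recognized word, endpoints)

# Toy score: mismatch in length plus mismatch in total energy.
score = lambda seg, tpl: abs(len(seg) - len(tpl)) + abs(sum(seg) - sum(tpl))
templates = {"yes": [1, 1, 1], "no": [5, 5]}
result = implicit_recognition([0, 1, 1, 1, 0], templates, score)
```

The nested loops over all (start, end) pairs are what make this approach computationally heavy, as the later slide on advantages and disadvantages notes.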

The implicit approach

Depending on the word recognized, the boundary locations can inherently differ with the implicit method (feedback). With the explicit method, only a single choice of boundary locations is made.

The implicit approach

Advantages and disadvantages: it requires heavy computation, but it offers higher detection accuracy than the explicit approach.

The hybrid approach

This is a combination of the implicit and explicit approaches. It uses the explicit method to obtain several candidate endpoint sets for recognition processing, and the implicit method to choose among the alternatives. The most likely candidate word and its corresponding endpoints are provided by the decision box, as in the implicit approach.
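The combination can be sketched like this: an explicit detector proposes a short list of candidate endpoint pairs, and only those segments are scored against the templates. The `score` function, templates, and candidate list are hypothetical placeholders, not the slides' specific algorithm:

```python
def hybrid_recognition(energies, templates, score_fn, candidates):
    """Hybrid sketch: score only the endpoint pairs proposed by an
    explicit detector, instead of all possible pairs.  The search
    cost stays close to the explicit method, while the final choice
    among alternatives works like the implicit method."""
    best = None
    for start, end in candidates:
        segment = energies[start:end]
        for word, template in templates.items():
            s = score_fn(segment, template)
            if best is None or s < best[0]:
                best = (s, word, (start, end))
    return best  # (score, recognized word, endpoints)

# Toy score: mismatch in length plus mismatch in total energy.
score = lambda seg, tpl: abs(len(seg) - len(tpl)) + abs(sum(seg) - sum(tpl))
templates = {"yes": [1, 1, 1], "no": [5, 5]}
# Suppose the explicit detector proposed two plausible endpoint sets.
result = hybrid_recognition([0, 1, 1, 1, 0], templates, score,
                            candidates=[(1, 4), (0, 5)])
```

Because only a handful of candidates are scored, the loop count is a small constant times the template count, rather than quadratic in the number of frames.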

The hybrid approach

Its computational load is comparable to the explicit method, and its accuracy is comparable to the implicit method.

Speech activity detection algorithm

Adaptive level equalization module: estimates the level of the acoustic background and uses the result to equalize the measured energy contour. Preliminary energy pulses, which are speech-like bursts, are then detected from the equalized energy contour. Finally, these energy-pulse endpoints are ordered to determine the possible sets of word endpoint pairs. (Here "contour" means the energy-versus-time curve, by analogy with a map contour line connecting points of equal height.)
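The first two stages of this pipeline can be sketched as below. The background estimate (contour minimum) and the 6 dB margin are illustrative assumptions; the slides do not give the real algorithm's parameters:

```python
def equalize_and_find_pulses(energies, margin_db=6.0):
    """Sketch of adaptive level equalization plus pulse detection:
    estimate the acoustic-background level, equalize the energy
    contour by subtracting it, then mark maximal runs of frames more
    than margin_db above the background as preliminary energy pulses."""
    background = min(energies)             # crude background estimate
    equalized = [e - background for e in energies]
    pulses, start = [], None
    for i, e in enumerate(equalized):
        if e > margin_db and start is None:
            start = i                      # a pulse begins
        elif e <= margin_db and start is not None:
            pulses.append((start, i - 1))  # the pulse ends
            start = None
    if start is not None:
        pulses.append((start, len(equalized) - 1))
    return pulses  # list of (first_frame, last_frame) pairs

# Energy contour in dB: background near -60 dB with two bursts.
pulses = equalize_and_find_pulses([-60, -60, -40, -40, -60, -45, -60])
```

The resulting pulse list would then be ordered and combined into candidate word endpoint pairs, as the slide describes for the final stage.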