
1 7-Speech Recognition
– Speech Recognition Concepts
– Speech Recognition Approaches
– Recognition Theories
– Bayes Rule
– Simple Language Model
– P(A|W)
– Network Types

2 7-Speech Recognition (Cont'd)
– HMM Calculating Approaches
– Neural Components
– Three Basic HMM Problems
– Viterbi Algorithm
– State Duration Modeling
– Training in HMM

3 Recognition Tasks
– Isolated Word Recognition (IWR)
– Connected Word (CW) and Continuous Speech Recognition (CSR)
– Speaker Dependent, Multiple Speaker, and Speaker Independent
– Vocabulary size
  – Small: < 20 words
  – Medium: 100–1,000 words
  – Large: 1,000–10,000 words
  – Very large: > 10,000 words

4 Speech Recognition Concepts
[Diagram: NLP and speech processing — speech synthesis maps text through a phone sequence to speech; speech recognition and speech understanding map speech back to text.]
– Speech recognition is the inverse of speech synthesis.

5 Speech Recognition Approaches
– Bottom-Up Approach
– Top-Down Approach
– Blackboard Approach

6 Bottom-Up Approach
[Diagram: Signal Processing → Feature Extraction → Segmentation → Sound Classification Rules → Phonotactic Rules → Lexical Access → Language Model → Recognized Utterance, with knowledge sources such as Voiced/Unvoiced/Silence decisions feeding the early stages.]

7 Top-Down Approach
[Diagram: Feature Analysis → Unit Matching System → Lexical Hypothesis → Syntactic Hypothesis → Semantic Hypothesis → Utterance Verifier/Matcher → Recognized Utterance, drawing on an inventory of speech recognition units, a word dictionary, a grammar, and a task model.]

8 Blackboard Approach
[Diagram: Environmental, Acoustic, Lexical, Syntactic, and Semantic processes all read from and write to a shared blackboard.]

9 Recognition Theories
– Articulatory-Based Recognition: uses a model of the articulatory system for recognition; so far the most successful of these theories.
– Auditory-Based Recognition: uses a model of the auditory system for recognition.
– Hybrid Recognition: a hybrid of the theories above.
– Motor Theory: models the intended gestures of the speaker.

10 Recognition Problem
– We have a sequence of acoustic symbols and want to find the words expressed by the speaker.
– Solution: find the most probable word sequence given the acoustic symbols.

11 Recognition Problem (Cont'd)
– A: acoustic symbols
– W: word sequence
– We should find $\hat{W}$ such that $\hat{W} = \arg\max_{W} P(W \mid A)$

12 Bayes Rule
$P(W \mid A) = \dfrac{P(A \mid W)\, P(W)}{P(A)}$

13 Bayes Rule (Cont'd)
– Since $P(A)$ does not depend on $W$:
$\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W)\, P(W)$
– $P(A \mid W)$ is the acoustic model; $P(W)$ is the language model.
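
As a concrete illustration of this decision rule, a minimal Python sketch; the candidate sequences and probability values are invented for illustration:

```python
# Toy illustration of W_hat = argmax_W P(A|W) * P(W).
# Candidates and probabilities below are invented for illustration only.

candidates = {
    "recognize speech":   {"p_a_given_w": 0.0010, "p_w": 0.0200},
    "wreck a nice beach": {"p_a_given_w": 0.0012, "p_w": 0.0002},
}

def decode(cands):
    # Score each candidate by the product of acoustic and language model.
    return max(cands, key=lambda w: cands[w]["p_a_given_w"] * cands[w]["p_w"])

print(decode(candidates))  # -> "recognize speech"
```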

14 Simple Language Model
– Computing the full word-sequence probability $P(W)$ directly is very difficult and would require a very large database, so trigram and bigram models are used instead.

15 Simple Language Model (Cont'd)
– Trigram: $P(W) = \prod_i P(w_i \mid w_{i-2}, w_{i-1})$
– Bigram: $P(W) = \prod_i P(w_i \mid w_{i-1})$
– Unigram: $P(W) = \prod_i P(w_i)$

16 Simple Language Model (Cont'd)
– Computing method:
$P(w_3 \mid w_1, w_2) = \dfrac{\text{number of occurrences of } w_3 \text{ after } w_1 w_2}{\text{total number of occurrences of } w_1 w_2} = \dfrac{C(w_1 w_2 w_3)}{C(w_1 w_2)}$
– Ad hoc method: a weighted interpolation of the trigram, bigram, and unigram estimates.
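
A minimal Python sketch of this counting method (the toy corpus is invented for illustration):

```python
from collections import Counter

def train_ngrams(corpus):
    """Estimate n-gram counts from a list of sentences."""
    unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
    for sentence in corpus:
        words = sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
        trigrams.update(zip(words, words[1:], words[2:]))
    return unigrams, bigrams, trigrams

def p_trigram(w1, w2, w3, bigrams, trigrams):
    # P(w3 | w1, w2) = C(w1 w2 w3) / C(w1 w2)
    if bigrams[(w1, w2)] == 0:
        return 0.0
    return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]

corpus = ["the cat sat on the mat", "the cat ran"]
uni, bi, tri = train_ngrams(corpus)
print(p_trigram("the", "cat", "sat", bi, tri))  # C(the cat sat)/C(the cat) = 1/2
```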

17 Error-Producing Factors
– Prosody (recognition should be prosody-independent)
– Noise (noise should be suppressed)
– Spontaneous speech

18 P(A|W) Computing Approaches
– Dynamic Time Warping (DTW)
– Hidden Markov Model (HMM)
– Artificial Neural Network (ANN)
– Hybrid Systems

19 Dynamic Time Warping


23 Dynamic Time Warping (Cont'd)
Search limitations:
– First and end intervals (endpoint constraints)
– Global limitation
– Local limitation

24 Dynamic Time Warping
Global limitation:
[Figure: global constraints restrict the warping path to a region around the diagonal.]

25 Dynamic Time Warping
Local limitation:
[Figure: allowed local path transitions between neighboring grid points.]
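
A minimal Python sketch of DTW under the simplest local constraint (steps of (1,0), (0,1), (1,1)) with endpoint constraints; a global band limit could be added by restricting the inner loop's range:

```python
import numpy as np

def dtw(x, y):
    """Dynamic time warping distance between two feature sequences.

    Local constraint: each step moves by (1,0), (0,1), or (1,1).
    Endpoint constraint: paths start at (0,0) and end at (n-1,m-1).
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])        # local distance
            D[i, j] = cost + min(D[i - 1, j],      # vertical step
                                 D[i, j - 1],      # horizontal step
                                 D[i - 1, j - 1])  # diagonal step
    return D[n, m]

print(dtw([1, 2, 3, 4], [1, 2, 2, 3, 4]))  # 0.0: the sequences align exactly
```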

26 Artificial Neural Network
[Figure: a simple computational element of a neural network — weighted inputs are summed and passed through a nonlinearity.]

27 Artificial Neural Network (Cont'd)
Neural network types:
– Perceptron
– Time-Delay Neural Network (TDNN)

28 Artificial Neural Network (Cont'd)
[Figure: single-layer perceptron.]

29 Artificial Neural Network (Cont'd)
[Figure: three-layer perceptron.]
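
A minimal numpy sketch of the forward pass of such a perceptron; the layer sizes and the sigmoid activation are illustrative choices:

```python
import numpy as np

def forward(x, layers):
    """Forward pass of a multilayer perceptron.

    Each computational element sums its weighted inputs plus a bias
    and applies a sigmoid nonlinearity.
    """
    for W, b in layers:
        x = 1.0 / (1.0 + np.exp(-(W @ x + b)))  # sigmoid activation
    return x

rng = np.random.default_rng(0)
# Three-layer perceptron: 12 inputs -> 8 hidden -> 4 hidden -> 2 outputs
layers = [(rng.normal(size=(8, 12)), np.zeros(8)),
          (rng.normal(size=(4, 8)), np.zeros(4)),
          (rng.normal(size=(2, 4)), np.zeros(2))]
print(forward(rng.normal(size=12), layers))
```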

30 2.5.4.2 Neural Network Topologies
[Figure]

31 TDNN
[Figure: time-delay neural network architecture.]

32 2.5.4.6 Neural Network Structures for Speech Recognition
[Figure]

33 2.5.4.6 Neural Network Structures for Speech Recognition (Cont'd)
[Figure]

34 Hybrid Methods
– Hybrid neural network and matched filter for recognition.
[Diagram: speech → acoustic features → delays → pattern classifier → output units.]

35 Neural Network Properties
– The system is simple, but training requires many iterations.
– Does not presuppose a specific model structure.
– Despite its simplicity, the results are good.
– The training set is large, so training should be done offline.
– Accuracy is relatively good.

36 Pre-processing
– Different preprocessing techniques are employed as the front end of speech recognition systems.
– The choice of preprocessing method depends on the task, the noise level, the modeling tool, etc.


43 The MFCC Method
– MFCC is based on how the human auditory system perceives sounds.
– Compared with other features, MFCC performs better in noisy environments.
– MFCC was originally proposed for speech recognition applications, but it also performs well for speaker recognition.
– The auditory unit of the human ear is the mel, obtained from the standard relation:
$\text{mel}(f) = 2595 \log_{10}\!\left(1 + \dfrac{f}{700}\right)$

44 Steps of the MFCC Method
– Step 1: map the signal from the time domain to the frequency domain using a short-time FFT:
$X(m) = \sum_{n=0}^{F-1} Z(n)\, W(n)\, W_F^{mn}, \qquad m = 0, \dots, F-1$
– Z(n): the speech signal
– W(n): a window function, such as the Hamming window
– $W_F = e^{-j 2\pi / F}$
– F: the length of the speech frame

45 Steps of the MFCC Method (Cont'd)
– Step 2: compute the energy in each filterbank channel:
$E(j) = \sum_{m} \left| X(m) \right|^2 H_j(m), \qquad j = 1, \dots, M$
– M is the number of mel-scale filterbank channels.
– $H_j(m)$ is the transfer function of the j-th filterbank filter.

46 Filter Distribution on the Mel Scale
[Figure: triangular filters spaced uniformly on the mel scale.]

47 Steps of the MFCC Method (Cont'd)
– Step 4: compress the spectrum and apply the DCT to obtain the MFCC coefficients:
$c_n = \sum_{j=1}^{M} \log E(j)\, \cos\!\left(\frac{\pi n (j - 0.5)}{M}\right), \qquad n = 0, \dots, L$
– n = 0, …, L is the order of the MFCC coefficients.

48 The Mel-Cepstrum Method
[Block diagram: time signal → framing → |FFT|² → mel-scaling → logarithm → IDCT → low-order coefficients (cepstra) → differentiator → delta and delta-delta cepstra.]
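
A compact numpy/scipy sketch of this pipeline; the frame length, hop, and filter counts are typical values rather than values from the slides, and the diagram's IDCT of the log mel spectrum is realized, as is common, with a DCT-II:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced uniformly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(1, n_filters + 1):
        l, c, r = bins[j - 1], bins[j], bins[j + 1]
        for k in range(l, c):                 # rising edge of triangle
            fb[j - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                 # falling edge of triangle
            fb[j - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    fb = mel_filterbank(n_filters, frame_len, sr)
    window = np.hamming(frame_len)
    ceps = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window      # framing + window
        power = np.abs(np.fft.rfft(frame)) ** 2                # |FFT|^2
        energies = fb @ power                                  # mel filterbank
        log_e = np.log(energies + 1e-10)                       # logarithm
        ceps.append(dct(log_e, type=2, norm='ortho')[:n_ceps]) # DCT, keep low order
    return np.array(ceps)

print(mfcc(np.random.randn(16000)).shape)  # (frames, 13)
```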

49 Mel-Cepstrum Coefficients (MFCC)
[Figure]

50 Properties of Mel-Cepstrum (MFCC) Features
– Maps the mel filterbank energies onto the directions of maximum variance (via the DCT).
– Makes the speech features approximately, though not completely, independent of one another (an effect of the DCT).
– Good performance in clean environments.
– Reduced performance in noisy environments.

51 Time-Frequency Analysis
– Short-term Fourier Transform
  – Standard way of frequency analysis: decompose the incoming signal into its constituent frequency components.
  – w(n): windowing function
  – N: frame length
  – p: step size
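
Written out with m as the frame index, a standard reconstruction consistent with the definitions above is:

```latex
X_m(k) = \sum_{n=0}^{N-1} x(n + m\,p)\, w(n)\, e^{-j 2\pi k n / N},
\qquad k = 0, \dots, N-1
```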

52 Critical Band Integration
– Related to the masking phenomenon: the threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise.
– Frequency components within a critical band are not resolved: the auditory system interprets the signals within a critical band as a whole.

53 Bark Scale
[Figure: the Bark frequency scale.]

54 Feature Orthogonalization
– Spectral values in adjacent frequency channels are highly correlated.
– This correlation yields a Gaussian model with many parameters: all the elements of the covariance matrix have to be estimated.
– Decorrelation is useful to improve the parameter estimation.

55 Cepstrum
– Computed as the inverse Fourier transform of the log magnitude of the Fourier transform of the signal.
– The log magnitude is real and symmetric, so the inverse transform is equivalent to the Discrete Cosine Transform.
– Approximately decorrelates the features.
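
A minimal numpy sketch of this computation:

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.

    Because log|X| is real and symmetric, the inverse transform is
    real-valued (and equivalent to a DCT up to scaling).
    """
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # avoid log(0)
    return np.fft.ifft(log_mag).real

print(real_cepstrum(np.random.randn(512))[:5])  # first few quefrency bins
```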

56–57 Principal Component Analysis (PCA)
– A mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components (PCs).
– Find an orthogonal basis such that the reconstruction error over the training set is minimized.
– This turns out to be equivalent to diagonalizing the sample autocovariance matrix.
– Complete decorrelation.
– Computes the principal dimensions of variability, but does not necessarily provide the optimal discrimination among classes.

58 PCA (Cont'd) — Algorithm
[Diagram: input = N-dim vectors → covariance matrix → eigenvalues and eigenvectors → transform matrix → apply transform → output = R-dim vectors.]
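
A minimal numpy sketch of this algorithm (the data shape and dimensions are illustrative):

```python
import numpy as np

def pca_transform(X, r):
    """PCA: diagonalize the sample covariance matrix, keep top-r axes.

    X: (num_samples, N) data matrix; returns (num_samples, r) projections.
    """
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # sort descending by variance
    T = eigvecs[:, order[:r]]               # transform matrix (N x r)
    return Xc @ T

X = np.random.randn(100, 13)
print(pca_transform(X, 5).shape)  # (100, 5)
```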

59 PCA (Cont'd)
[Figure: PCA in speech recognition systems.]

60 Linear Discriminant Analysis (LDA)
– Find an orthogonal basis such that the ratio of between-class variance to within-class variance is maximized.
– This also turns out to be a generalized eigenvalue–eigenvector problem.
– Complete decorrelation.
– Provides optimal linear separability, but only under quite restrictive assumptions.
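
A minimal numpy/scipy sketch of LDA as a generalized eigenproblem (the data and dimensions are illustrative):

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(X, labels, r):
    """LDA: maximize between-class vs. within-class scatter.

    Solved as the generalized eigenproblem  S_b v = lambda S_w v.
    """
    mean_all = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))
    Sb = np.zeros_like(Sw)
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)     # within-class scatter
        d = (mc - mean_all)[:, None]
        Sb += len(Xc) * (d @ d.T)         # between-class scatter
    eigvals, eigvecs = eigh(Sb, Sw)       # generalized symmetric eigenproblem
    return X @ eigvecs[:, np.argsort(eigvals)[::-1][:r]]

X = np.random.randn(60, 4)
y = np.repeat([0, 1, 2], 20)
print(lda_transform(X, y, 2).shape)  # (60, 2)
```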

61 PCA vs. LDA
[Figure: PCA finds the directions of maximum variance; LDA finds the directions of maximum class separability.]

62 Spectral Smoothing
– Formant information is crucial for recognition.
– Enhance and preserve the formant information by:
  – Truncating the number of cepstral coefficients
  – Linear prediction: peak-hugging property

63 Temporal Processing
– Capture the temporal features of the spectral envelope and provide robustness:
  – Delta features: first- and second-order differences; regression coefficients
  – Cepstral Mean Subtraction: normalizes channel effects and adjusts the spectral slope
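
A minimal numpy sketch of both techniques; the regression window width and feature shapes are typical choices, not from the slides:

```python
import numpy as np

def cms(features):
    """Cepstral mean subtraction: remove the per-utterance channel mean."""
    return features - features.mean(axis=0)

def delta(features, width=2):
    """First-order delta features via the standard regression formula:
    d_t = sum_k k*(c_{t+k} - c_{t-k}) / (2 * sum_k k^2)."""
    padded = np.pad(features, ((width, width), (0, 0)), mode='edge')
    num = sum(k * (padded[width + k:len(features) + width + k]
                   - padded[width - k:len(features) + width - k])
              for k in range(1, width + 1))
    return num / (2 * sum(k * k for k in range(1, width + 1)))

feats = np.random.randn(100, 13)   # e.g., MFCCs: (frames, coefficients)
d1 = delta(cms(feats))             # delta of mean-normalized cepstra
d2 = delta(d1)                     # delta-delta (acceleration)
print(d1.shape, d2.shape)
```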

64 RASTA (RelAtive SpecTral Analysis)
– Filters the temporal trajectories of some function of each of the spectral values to provide more reliable spectral features.
– The filter is usually a bandpass filter, maintaining the linguistically important spectral envelope modulations (roughly 1–16 Hz).
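
A sketch of RASTA filtering using the classic published filter coefficients; the causal form below omits the original four-frame advance term:

```python
import numpy as np
from scipy.signal import lfilter

def rasta_filter(log_spectra):
    """Bandpass-filter each spectral channel's temporal trajectory.

    Classic RASTA IIR filter:
        H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1),
    which passes roughly the 1-16 Hz modulations at typical frame rates.
    log_spectra: (frames, channels) array of log spectral values.
    """
    num = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    den = np.array([1.0, -0.98])
    return lfilter(num, den, log_spectra, axis=0)  # filter along time axis

print(rasta_filter(np.random.randn(200, 20)).shape)  # (200, 20)
```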


66 RASTA-PLP
[Figure: RASTA-PLP processing.]
