
Presentation on theme: "2.5.4.1 Basics of Neural Networks; 2.5.4.2 Neural Network Topologies" — Presentation transcript:

1 2.5.4.1 Basics of Neural Networks

2 2.5.4.2 Neural Network Topologies

3

4

5 TDNN

6 2.5.4.6 Neural Network Structures for Speech Recognition

7

8 3.1.1 Spectral Analysis Models

9

10 3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR

11 3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR

12 3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR

13 3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR

14 3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR

15 3.2.1 Types of Filter Bank Used for Speech Recognition

16 Nonuniform Filter Banks

17 Nonuniform Filter Banks

18 3.2.1 Types of Filter Bank Used for Speech Recognition

19 3.2.1 Types of Filter Bank Used for Speech Recognition

20 3.2.2 Implementations of Filter Banks. Instead of direct convolution, which is computationally expensive, we assume each bandpass filter impulse response is represented in terms of a fixed lowpass filter w(n).
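The slide's equation was lost in extraction, but the standard form of this representation is a fixed lowpass window modulated to each band center, h_i(n) = w(n)·e^(jω_i·n). A minimal sketch under that assumption (the channel count, tap count, and sample rate below are illustrative, not from the slides):

```python
import numpy as np

def modulated_filter_bank(num_channels=8, taps=64, sr=8000):
    """Build bandpass impulse responses h_i(n) = w(n) * exp(j*w_i*n):
    one fixed lowpass prototype w(n), shifted to each channel center."""
    w = np.hamming(taps)                     # fixed lowpass filter w(n)
    n = np.arange(taps)
    # Center frequencies of a uniform bank covering 0..sr/2.
    centers = (np.arange(num_channels) + 0.5) * sr / (2 * num_channels)
    return [w * np.exp(2j * np.pi * fc * n / sr) for fc in centers]
```

Because every channel shares the same prototype w(n), only the modulation differs, which is what makes the FFT-based implementation on later slides cheaper than direct convolution per channel.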

21 3.2.2 Implementations of Filter Banks

22 3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform

23 3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform

24 3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform

25 3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform

26 Linear Filter Interpretation of the STFT

27 3.2.2.4 FFT Implementation of a Uniform Filter Bank

28 Direct implementation of an arbitrary filter bank

29 3.2.2.5 Nonuniform FIR Filter Bank Implementations

30 3.2.2.7 Tree Structure Realizations of Nonuniform Filter Banks

31 3.2.4 Practical Examples of Speech-Recognition Filter Banks

32 3.2.4 Practical Examples of Speech-Recognition Filter Banks

33 3.2.4 Practical Examples of Speech-Recognition Filter Banks

34 3.2.4 Practical Examples of Speech-Recognition Filter Banks

35 3.2.5 Generalizations of Filter-Bank Analyzer

36 3.2.5 Generalizations of Filter-Bank Analyzer

37 3.2.5 Generalizations of Filter-Bank Analyzer

38 3.2.5 Generalizations of Filter-Bank Analyzer

39

40

41

42

43

44

45 The MFCC method. MFCC is based on how the human ear perceives sounds. MFCC performs better than other features in noisy environments. MFCC was originally proposed for speech recognition applications, but it also gives good performance in speaker recognition. The auditory unit of the human ear is the mel, obtained from the following relation:
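The mel relation itself was in an image on the slide; the widely used form is m = 2595·log10(1 + f/700). A minimal sketch under that assumption:

```python
import numpy as np

def hz_to_mel(f):
    # Common O'Shaughnessy form of the mel scale: m = 2595*log10(1 + f/700).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse mapping, mel back to Hz.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

By construction, 1000 Hz maps to approximately 1000 mel; the scale is near-linear below 1 kHz and logarithmic above, matching perceived pitch spacing.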

46 Steps of the MFCC method. Step 1: map the signal from the time domain to the frequency domain using a short-time FFT. Z(n): the speech signal; W(n): a window function, such as the Hamming window; W_F = e^(-j2π/F); m = 0, …, F-1; F: the length of the speech frame.

47 Steps of the MFCC method. Step 2: find the energy of each filter-bank channel, where M is the number of mel-scale filter banks and the weighting functions are the filter-bank filters.

48 Distribution of the filters on the mel scale

49 Steps of the MFCC method. Step 4: compress the spectrum and apply the DCT to obtain the MFCC coefficients. In the relation above, n = 0, …, L is the order of the MFCC coefficients.
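The steps above (short-time FFT, mel filter-bank energies, log compression, DCT) can be sketched for a single frame as follows; the filter count, coefficient count, and sample rate are illustrative assumptions, and the triangular-filter placement follows the usual mel-spaced construction rather than the slide's exact figure:

```python
import numpy as np

def mfcc_frame(frame, sr=16000, n_filters=20, n_ceps=13):
    """MFCCs for one speech frame (illustrative parameters)."""
    nfft = len(frame)
    # Step 1: short-time FFT of the Hamming-windowed frame (power spectrum).
    spec = np.abs(np.fft.rfft(frame * np.hamming(nfft))) ** 2
    # Step 2: triangular filters spaced uniformly on the mel scale.
    max_mel = 2595.0 * np.log10(1.0 + (sr / 2) / 700.0)
    hz = 700.0 * (10.0 ** (np.linspace(0, max_mel, n_filters + 2) / 2595.0) - 1.0)
    bins = np.round(hz / (sr / 2) * (nfft // 2)).astype(int)
    energies = np.zeros(n_filters)
    for i in range(n_filters):
        a, b, c = bins[i], bins[i + 1], bins[i + 2]
        for k in range(a, c):
            wgt = (k - a) / max(b - a, 1) if k < b else (c - k) / max(c - b, 1)
            energies[i] += wgt * spec[k]
    # Steps 3-4: log-compress the filter energies, then DCT to get cepstra.
    log_e = np.log(energies + 1e-10)
    n = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    return np.cos(np.pi * n * (m + 0.5) / n_filters) @ log_e
```

The final matrix product is a DCT-II over the M log filter energies, yielding the n = 0, …, L coefficients the slide refers to.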

50 The mel-cepstrum method: time signal → framing → |FFT|² → mel-scaling → logarithm → IDCT → low-order coefficients (cepstra) → differentiator → delta and delta-delta cepstra.

51 Time-frequency analysis. Short-term Fourier Transform: the standard way of frequency analysis, decomposing the incoming signal into its constituent frequency components. W(n): windowing function; N: frame length; p: step size.
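In terms of the notation above, a minimal STFT sketch: window each length-N frame with W(n), hop by p samples, and FFT each frame (the Hamming window and default sizes are assumptions):

```python
import numpy as np

def stft(x, frame_len=256, step=128):
    """Short-term Fourier transform: window (W), hop by step (p), FFT."""
    w = np.hamming(frame_len)                      # windowing function W(n)
    n_frames = 1 + (len(x) - frame_len) // step    # full frames that fit
    frames = np.stack([x[i * step : i * step + frame_len] * w
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)             # one spectrum per frame
```

Each row is the spectrum of one frame, so the result is the time-frequency representation the later slides build on.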

52 Critical band integration. Related to the masking phenomenon: the threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise. Frequency components within a critical band are not resolved; the auditory system interprets the signals within a critical band as a whole.

53 Bark scale

54 Feature orthogonalization. Spectral values in adjacent frequency channels are highly correlated. This correlation leads to a Gaussian model with many parameters: all elements of the covariance matrix must be estimated. Decorrelation helps improve parameter estimation.

55 Cepstrum. Computed as the inverse Fourier transform of the log magnitude of the Fourier transform of the signal. Because the log magnitude is real and symmetric, the transform is equivalent to the Discrete Cosine Transform. The resulting coefficients are approximately decorrelated.
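The definition above translates directly into code; a small constant is added before the log to avoid log(0) (an implementation detail, not from the slide):

```python
import numpy as np

def real_cepstrum(frame):
    """Inverse FFT of the log magnitude of the FFT of the signal."""
    spec = np.abs(np.fft.fft(frame)) + 1e-10   # magnitude spectrum (floored)
    return np.real(np.fft.ifft(np.log(spec)))  # real, symmetric result
```

Since log|X(k)| is real and even, the inverse FFT output is real and symmetric, which is why the same result can be obtained with a DCT.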

56 Principal Component Analysis. Find an orthogonal basis such that the reconstruction error over the training set is minimized; this turns out to be equivalent to diagonalizing the sample autocovariance matrix. Gives complete decorrelation. Computes the principal dimensions of variability, but does not necessarily provide optimal discrimination among classes.

57 Principal Component Analysis (PCA). A mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components (PCs). Find an orthogonal basis such that the reconstruction error over the training set is minimized; equivalent to diagonalizing the sample autocovariance matrix. Gives complete decorrelation; computes the principal dimensions of variability, but not necessarily the optimal discrimination among classes.

58 PCA (cont.) Algorithm: input = N-dim vectors; compute the covariance matrix; find its eigenvalues and eigenvectors; build the transform matrix from the leading eigenvectors; apply the transform; output = R-dim vectors.
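The algorithm on the slide (covariance matrix → eigenvalues/eigenvectors → transform matrix → R-dim output) can be sketched as:

```python
import numpy as np

def pca(X, r):
    """Project N-dim rows of X onto the top-r principal components."""
    Xc = X - X.mean(axis=0)             # center the training vectors
    C = np.cov(Xc, rowvar=False)        # N x N sample covariance matrix
    vals, vecs = np.linalg.eigh(C)      # eigenvalues ascending for symmetric C
    W = vecs[:, ::-1][:, :r]            # transform matrix: r leading eigenvectors
    return Xc @ W                       # output: R-dim vectors
```

Because the projection directions are eigenvectors of the sample covariance, the output components are completely decorrelated, as the slide states.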

59 PCA (cont.) PCA in speech recognition systems.

60 Linear Discriminant Analysis. Find an orthogonal basis such that the ratio of between-class variance to within-class variance is maximized. This also turns out to be a generalized eigenvalue-eigenvector problem. Gives complete decorrelation. Provides the optimal linear separability under quite restrictive assumptions.
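A sketch of the generalized eigenproblem the slide mentions: maximizing the between-class/within-class variance ratio leads to Sb·v = λ·Sw·v, solved below via Sw⁻¹·Sb (the small regularization term is an assumption for numerical stability, not from the slide):

```python
import numpy as np

def lda_directions(X, y, r):
    """Fisher LDA: r directions maximizing between-/within-class variance."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))                       # within-class scatter
    Sb = np.zeros((d, d))                       # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # Generalized eigenproblem Sb v = lambda Sw v, via Sw^{-1} Sb.
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw + 1e-8 * np.eye(d), Sb))
    order = np.argsort(-vals.real)              # leading eigenvectors first
    return vecs[:, order[:r]].real
```

The "quite restrictive assumptions" on the slide are, in the usual statement, Gaussian classes with equal covariance matrices, under which these directions are optimal for linear separation.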

61 PCA vs. LDA

62 Spectral smoothing. Formant information is crucial for recognition. To enhance and preserve the formant information: truncate the number of cepstral coefficients; use linear prediction, with its peak-hugging property.

63 Temporal processing. To capture the temporal features of the spectral envelope and to provide robustness: Delta features: first- and second-order differences; regression. Cepstral mean subtraction: normalizes channel effects and adjusts for spectral slope.
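Both operations above can be sketched in a few lines; the regression-style delta over a ±2-frame window is the common choice, assumed here rather than taken from the slide:

```python
import numpy as np

def delta(feats, w=2):
    """First-order regression delta over a +/- w frame window (rows = frames)."""
    padded = np.pad(feats, ((w, w), (0, 0)), mode='edge')   # replicate edges
    num = sum(t * (padded[w + t : len(feats) + w + t]
                   - padded[w - t : len(feats) + w - t])
              for t in range(1, w + 1))
    return num / (2 * sum(t * t for t in range(1, w + 1)))

def cepstral_mean_subtraction(feats):
    """Subtract the per-utterance cepstral mean to normalize channel effects."""
    return feats - feats.mean(axis=0)
```

Applying `delta` twice gives the second-order (delta-delta) features; CMS works because a fixed channel adds a constant offset to every frame's cepstrum.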

64 RASTA (RelAtive SpecTral Analysis). Filtering of the temporal trajectories of some function of each of the spectral values, to provide more reliable spectral features. The filter is usually a bandpass filter that maintains the linguistically important spectral envelope modulations (1-16 Hz).
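The slide only says "a bandpass filter"; a sketch using the commonly cited RASTA IIR coefficients (an assumption on my part, since the slide gives none) applied to one temporal trajectory of log spectral values:

```python
import numpy as np

def rasta_filter(traj):
    """Bandpass-filter one temporal trajectory of log spectral values.
    Coefficients are the commonly cited RASTA choices (assumed here):
    FIR numerator 0.1*[2, 1, 0, -1, -2] with a single pole at 0.94."""
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    a_pole = 0.94
    fir = np.convolve(traj, b)[:len(traj)]     # differentiating FIR part
    out = np.zeros(len(traj))
    for n in range(len(traj)):                 # leaky integrator (IIR part)
        out[n] = fir[n] + (a_pole * out[n - 1] if n > 0 else 0.0)
    return out
```

The FIR part removes the DC component (a constant trajectory decays to zero), while the pole smooths fast fluctuations, leaving the slow envelope modulations the slide highlights.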

65

66 RASTA-PLP

67

68

