Noise Robust Speech Recognition, Group SB740

1 Noise Robust Speech Recognition, Group SB740
Sections: Introduction, Methods, Results, Conclusion

2 Standard feature extraction
speech → Framing → FFT → Filter Bank → Cepstrum Coefficients → features
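The chain on this slide can be sketched in a few lines of NumPy. This is only an illustration under assumptions the slides do not state: a Hamming window, a magnitude spectrum, log compression before the cepstrum step, and a DCT-II for the cepstrum coefficients. The function names, frame length, hop size and number of coefficients are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def dct_ii(x, n_out):
    """Unnormalised DCT-II along the last axis, keeping n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def cepstral_features(x, filter_bank, frame_len=256, hop=128, n_ceps=13):
    """Framing -> FFT -> filter bank -> cepstrum coefficients.

    filter_bank has shape (n_bands, frame_len // 2 + 1).
    All default values are placeholders, not the project's settings.
    """
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    spectrum = np.abs(np.fft.rfft(frames, axis=1))      # one spectrum per frame
    band_energy = spectrum @ filter_bank.T              # filter bank outputs
    log_energy = np.log(np.maximum(band_energy, 1e-10)) # avoid log(0)
    return dct_ii(log_energy, n_ceps)
```

Passing a different `filter_bank` matrix changes only the third stage, so the same sketch covers both a Mel filter bank and any other filter distribution.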

3 Improved feature extraction
Framed FFT spectrum → Pre-Processing → Filter Bank → Cepstrum Coefficients → Post-Processing → features

4 Pre-Processing: Quantile Based Noise Estimation for spectral subtraction (QBNE)
• Assumes that each frequency band contains only noise for a fraction of the time, even during speech
• For each frequency band, the frames are sorted by amplitude
• A fixed q-value is used, equal for all frequency bands
• The intersection between the vertical line at q and each frequency band's sorted curve is the noise estimate
• Problem with mismatched training and test conditions
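A minimal sketch of the procedure described on this slide, assuming the input is a matrix of magnitude spectra for one utterance. The value q = 0.5 and the flooring rule in the subtraction step are placeholders, not values taken from the slides, and the function names are hypothetical.

```python
import numpy as np

def qbne_noise_estimate(mag, q=0.5):
    """Quantile-based noise estimate with one fixed q for all bands.

    mag: array of shape (n_frames, n_bins), magnitude spectra of one utterance.
    Returns one noise value per frequency bin.
    """
    sorted_mag = np.sort(mag, axis=0)               # sort frames by amplitude, per bin
    row = int(np.floor(q * (mag.shape[0] - 1)))     # position of the q-quantile
    return sorted_mag[row]

def spectral_subtraction(mag, noise, floor=0.01):
    """Subtract the noise estimate; the spectral floor is an assumed safeguard."""
    return np.maximum(mag - noise, floor * mag)
```

Because the quantile is taken over the whole utterance, this form needs all frames before it can produce an estimate.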

5 Pre-Processing: Adaptive Quantile Based Noise Estimation for spectral subtraction (AQBNE)
• Goal: improve the performance when training with low noise and testing with high noise
• Adapts to the utterance and noise levels
• Adjusts the q-value for each frequency band
• The result is a q-estimation curve as opposed to a fixed value
• High and low noise situations will converge to similar representations
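The slides do not say how the q-estimation curve is derived from the utterance and noise levels, so the sketch below covers only the part that is stated: a separate q-value for each frequency band instead of one fixed value. The array `q_per_bin` stands in for the q-estimation curve and is a hypothetical input.

```python
import numpy as np

def per_band_quantile(mag, q_per_bin):
    """Noise estimate with an individual q-value for each frequency bin.

    mag: array of shape (n_frames, n_bins).
    q_per_bin: sequence of n_bins values in [0, 1] (assumed to be given).
    """
    n_frames, n_bins = mag.shape
    sorted_mag = np.sort(mag, axis=0)
    rows = np.floor(np.asarray(q_per_bin) * (n_frames - 1)).astype(int)
    return sorted_mag[rows, np.arange(n_bins)]      # pick one row per bin
```

With a constant `q_per_bin` this reduces to the fixed-q case.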

6 Filter Bank: Speech Band Emphasizing Filter Bank (SBE)
Mel Frequency Cepstrum Coefficient (MFCC)
– Motivated by human perception and critical bands
Mel Frequency Filter Bank
– Triangular filters
– Highest resolution at low frequencies
– Resulting Importance Function
Speech Band Emphasizing Filter Bank
– Emphasizes the primary speech band
– Highest resolution at 1500 Hz
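The triangular Mel filter bank can be sketched as follows. The Hz-to-Mel formula used here is a common textbook variant and is an assumption, as are the function names; the slides do not give the exact formula or the number of filters.

```python
import numpy as np

def hz_to_mel(f):
    """Common Mel-scale formula (assumed; other variants exist)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_points(n_bands, sample_rate):
    """n_bands + 2 edge/centre frequencies, evenly spaced on the Mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_bands + 2)
    return mel_to_hz(mels)

def triangular_filter_bank(center_hz, n_fft_bins, sample_rate):
    """Build triangular filters from an increasing list of frequencies.

    Filter i rises from center_hz[i-1] to center_hz[i] and falls to center_hz[i+1].
    Returns an array of shape (len(center_hz) - 2, n_fft_bins).
    """
    bin_hz = np.linspace(0.0, sample_rate / 2.0, n_fft_bins)
    bank = np.zeros((len(center_hz) - 2, n_fft_bins))
    for i in range(1, len(center_hz) - 1):
        lo, mid, hi = center_hz[i - 1], center_hz[i], center_hz[i + 1]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        bank[i - 1] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank
```

The Mel points are most closely spaced at low frequencies, which is what gives the highest resolution there. Since `triangular_filter_bank` accepts any increasing list of frequencies, a distribution with its closest spacing elsewhere could be tried by passing a different list; the actual SBE layout is not given in the slides.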

7 Results
• QBNE with the Mel Frequency Filter Bank showed an improvement of 15%
• AQBNE with the SBE Filter Bank showed an improvement of 28%
• AQBNE with the SBE Filter Bank showed a remarkable result under highly mismatched conditions: 80% improvement, compared to 21% when using QBNE with the Mel Frequency Filter Bank

8 Conclusion
• AQBNE avoids describing speech signals during training to a level of detail which is unattainable during testing under noisy conditions
• The suggested SBE Filter Bank, though empirically chosen, indicates that filter distributions other than the standard Mel-scale may attain improved performance in noisy conditions

9 Presentation of Abstract
Agenda:
• Purpose of the abstract.
• Structure of the abstract.
• Content of the abstract.

10 Purpose of abstract
• Announcement for the 17th 7th-semester conference on the 21st of December 2004.
• Appetizer to attract the right audience.
• The abstract is written with its audience in mind: other 7th-semester students from the Institute of Electronic Systems in Aalborg and Esbjerg.

11 Structure of the abstract
Title:
• Topic: The long title gives a detailed description of the content: "Noise Robust Automatic Speech Recognition with Adaptive Quantile Based Noise Estimation and Speech Band Emphasizing Filter Bank"
• Nature: Noise estimation.
• Scope: Automatic speech recognition.
The text follows the IMRaD structure.

12 Structure of the abstract
Throughout the text important keywords are used:
• ASR, Noise Estimation, Feature Extraction.
Known methods are presented before new methods to create continuity.
Complexity increases through the abstract.

13 Content of the abstract
Introduction:
• Contains information about the initial problem, the proposals made in the paper and the field of operation.
• This is the shortest section in the abstract, but it contains a lot of keywords.

14 Content of the abstract
Methods:
• This section is the longest of the abstract. It contains references to known methods and introduces new methods and solutions.
• The first sentence in this section is linked to the introduction by the phrase "feature extraction".
• This section ends with an advertisement for the results.

15 Content of the abstract
Results:
• The methods that have improved the recognition performance are presented first.
• The best result is mentioned with the exact result compared to known methods.
• The proposed solutions that have not improved the recognition are mentioned last in the section.

16 Content of the abstract
Discussion:
• First the method that did not improve the recognition performance is explained.
• Secondly the methods that have improved the recognition performance are described.
• The abstract is concluded by the recommendations based on the results achieved in this project.

18 Structure of Paper
IMRaD model
• Introduction: Introduction
• Methods: Methods (PP, QBNE, AQBNE, SBE)
• Results: Experimental framework, Experimental results
• Discussion: Conclusion

19 Introduction
Problem definition
• Noise in speech signals has a dramatic effect on ASR.
Analysis
• Analysis of known methods.
• Interesting known methods (PP, QBNE, MFCC).
• Results: Develop new methods and combine different methods.

20 Methods
Known methods
• PP: Short presentation of method and implementation.
• QBNE: Short presentation of method and thorough description of implementation.
New methods
• AQBNE and SBE
– Motivation (Why is this a good method?)
– Implementation (Compared to QBNE and MFCC)

21 Results
Description of measurement instrument (HTK) and SpeechDat-Car database.
Results in tables

22 Results
Discussion of results in text.
Chosen results in graph.

23 Conclusion
Contains a summary of the important results, so it can be read and understood right after reading the abstract.

24 Worksheets
Agenda:
• Structure and organization
• Brief presentation of worksheets

25 Structure and organization
The worksheets are the basis for the paper and the implementation of our system
• Direct information about methods
• Necessary background knowledge
Give the group members the necessary knowledge to understand a subject
Write in English
The topic of the project was completely new to us
• Impossible to plan work for a long time period
• Discuss subjects, study, discuss new subjects
Writing procedure:
• The group discusses which subjects need to be investigated
• 1-2 persons work together and write a worksheet
• The group reads and gives feedback
• 1 person finishes it

26 Brief presentation of worksheets
1. Introduction
• States the aim of the project and our initial problem
2. Speech production
• Human speech characteristics
3. Hidden Markov Model
• Often used in speech recognition systems
4. Unwanted noise and effects
• Noise and effects that can affect our system
5. Java execution speed test
• Consideration of implementation language
6. Java processor blocks
• Documents the implementation of our system
7. Matlab related
• How to read sound files from the SpeechDat-Car database

27 Brief presentation of worksheets
8. Frontend Interfaces
• Input: SpeechDat-Car audio wave format, Output: HTK format
9. The standard frontend
• Transformation of the sampled audio data into feature vectors
10. Post-Processing
11. The Mel filterbank
12. Quantile Based Noise Estimation
13. Spectral subtraction
14. Experimental framework
• How we have tested the methods' influence on the speech recognition
15. Experimental results
• Describes our baseline and refers to App. A
16. Structure of abstract and paper
• Overview of the important elements
App. A: Raw results

28 Causality
Causal:
• Post-Processing
• Speech Band Emphasizing Filter Bank
Non-causal:
• (Adaptive) Quantile Based Noise Estimation

29 Ordinary (non-causal) QBNE
One discrete frequency (ω)
Entire utterance is used for noise estimate

30 Causal QBNE
One discrete frequency (ω)
Noise estimate updated for each new frame

31 Causal QBNE
Figure panels: n=0, n=1, n=2

32 Causal Adaptive QBNE

33 Causality
PP and SBE are inherently causal
QBNE and AQBNE can be made causal by using a buffer for the quantile
• Additional computational cost
• Reduced storage requirement
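One way to picture the buffered, causal variant is a sliding window of recent frames from which the quantile is recomputed on every new frame. This is a sketch under assumptions: the class name, the buffer length and the choice of a fixed-length sliding window are hypothetical, and the slides do not describe the actual buffer design.

```python
import numpy as np
from collections import deque

class CausalQBNE:
    """Quantile-based noise estimate that only uses past and current frames."""

    def __init__(self, q=0.5, buffer_len=50):
        self.q = q                                  # placeholder q-value
        self.buffer = deque(maxlen=buffer_len)      # oldest frames drop out

    def update(self, frame_mag):
        """Add one magnitude spectrum and return the current noise estimate."""
        self.buffer.append(np.asarray(frame_mag, dtype=float))
        history = np.sort(np.stack(list(self.buffer)), axis=0)
        row = int(np.floor(self.q * (len(self.buffer) - 1)))
        return history[row]
```

Re-sorting the buffer for every frame is where the additional computational cost appears, while the bounded buffer keeps storage independent of the utterance length.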

34 Closure
Agenda:
• Future work
• Project working process

35 Future work (1/2)
Implement causal AQBNE
• Find optimal q-estimation curve etc.

36 Future work (2/2)
Combine AQBNE and SBE with the advanced front-end (WI008)
Source: ETSI ES 202 050 V1.1.3 (2003-11)
Figure labels: AQBNE, SBE Filter-Bank

37 Project working process
Project reporting form
• No 3-week final report correction
• Worksheets are easier to write than report chapters
Difficult to parallelize tasks
• Few tasks
• Large groups
Information gathering
• State-of-the-art knowledge from scientific papers
• No textbooks with up-to-date information exist

