Published by Bridget Dickerson. Modified over 9 years ago.
2
PCS Research & Advanced Technology Labs, Speech Lab
How to deal with the noise in real systems?
Hsiao-Chun Wu
Motorola PCS Research and Advanced Technology Labs, Speech Laboratory
richardw@srl.css.mot.com
Phone: (815) 884-3071
3
PCS Research & Advanced Technology Labs, Speech Lab, November 14, 2000
Why do we need to study noise?
Noise exists everywhere, and it degrades the performance of signal processing in real systems. Because system engineers cannot avoid noise, modern “noise-processing” technology has been researched and designed to overcome this problem. Many related research areas have emerged as a result, such as signal detection, signal enhancement/noise suppression, and channel equalization.
4
How to deal with noise? Cut it off!!!!
Spectral Truncation
– Spectral Subtraction (1989)
Time Truncation
– Signal Detection
Spatial and/or Temporal Filtering
– Equalization
– Array Signal Separation (Blind Source Separation)
5
Session 1. On-line Automatic End-of-speech Detection Algorithm (Time Truncation)
1. Project goal.
2. Review of current methods.
3. Introduction to voice metric based end-of-speech detector.
4. Simulation results.
5. Conclusion.
6
1. Project Goal:
Problem
– Digit-dial recognition with unknown digit string length.
Solution 1
– Fixed-length window, such as 10 seconds? (Inconvenient for users.)
Solution 2
– Dynamic termination of data capture? (Needs a robust detection algorithm.)
7
Research and design a robust dynamic termination mechanism for the speech recognizer.
– A new on-line automatic end-of-speech detection algorithm with low computational complexity.
Design a more robust front end to improve the recognition accuracy of speech recognizers.
– The new algorithm can also reduce the unnecessary feature extraction performed on redundant noise.
8
2. Review of Current Methods:
Most speech detection algorithms fall into three categories.
Frame energy detection
– Short-term frame energy (20 msec) can be used for speech/noise classification.
– It is not robust at high background noise levels.
Zero-crossing rate detection
– Short-term zero-crossing rate can also be used for speech/noise classification.
– It is not robust across a wide variety of noise types.
Higher-order-spectral detection
– Short-term higher-order spectra can be used for speech/noise classification.
– It carries a heavy computational cost, and its threshold is difficult to determine in advance.
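The first two feature types in this review can be sketched in a few lines. This is a minimal illustration, not the detector used in the talk: the 8 kHz sampling rate, the synthetic noise-then-tone input, and the fixed energy threshold are all assumptions chosen for the example.

```python
import numpy as np

def frame_signal(x, frame_len):
    """Split a 1-D signal into non-overlapping frames, dropping the tail."""
    n_frames = len(x) // frame_len
    return np.reshape(x[:n_frames * frame_len], (n_frames, frame_len))

def short_term_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

# Illustrative input: 0.5 s of low-level noise followed by 0.5 s of a 200 Hz tone.
fs = 8000                      # assumed sampling rate (Hz)
frame_len = int(0.020 * fs)    # 20 ms frames -> 160 samples
rng = np.random.default_rng(0)
t = np.arange(fs // 2) / fs
noise = 0.01 * rng.standard_normal(fs // 2)
tone = 0.5 * np.sin(2 * np.pi * 200 * t)
x = np.concatenate([noise, tone])

frames = frame_signal(x, frame_len)
energy = short_term_energy(frames)
zcr = zero_crossing_rate(frames)
is_speech = energy > 1e-3      # hypothetical fixed energy threshold
```

A fixed threshold like this one is exactly what breaks down when the background noise level rises, which is the weakness the slide points out.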
9
3. Introduction to Voice Metric Based End-of-speech Detector:
End-of-speech detection using voice metric features is based on the Mel energies.
Voice metric features are robust over a wide variety of background noise.
The voice metric based speech/noise classifier was originally applied in the IS-127 CELP speech coder standard.
We modify and enhance the voice-metric features to design a new end-of-speech detector for the Motorola voice recognition front end (VR LITE III).
16
Voice metric score table
17
[Block diagram: end-of-speech detector added to the original VR LITE front end. Block labels: raw data, FFT, Mel-spectrum, SNR estimate, voice metric, voice metric scores, pre-S/N classifier, threshold adaptation, post-S/N classifier, EOS buffer, "Speech start?" decision (yes/no), silence duration threshold, data capture stops.]
18
[Flow diagram: speech input, segmentation of speech into frames, frame buffer, front end with end-of-speech detector (frame i), feature vector, VR LITE recognition engine. Decision "end of speech?": yes, data capture terminates; no, next frame i+1.]
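The frame-by-frame termination logic in these diagrams can be sketched as a small loop. This is a simplified illustration under stated assumptions: the per-frame speech/noise flags are taken as given (in the talk they come from the voice-metric classifier), and the function name and the `silence_frames` parameter are hypothetical.

```python
def detect_end_of_speech(speech_flags, silence_frames):
    """Return the index of the frame at which data capture would stop.

    speech_flags   : iterable of booleans, one per frame (True = speech).
    silence_frames : number of consecutive non-speech frames, counted only
                     after speech has started, that ends the capture.
    Returns None if the end of speech is never declared.
    """
    started = False      # corresponds to the "Speech start?" decision
    silent_run = 0       # current run of non-speech frames after speech
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            started = True
            silent_run = 0
        elif started:
            silent_run += 1
            if silent_run >= silence_frames:
                return i
    return None

# Short pause inside the utterance (frame 4) does not stop the capture;
# the three silent frames 7..9 do.
flags = [False, False, True, True, False, True, True,
         False, False, False, False, True]
stop_frame = detect_end_of_speech(flags, silence_frames=3)
never = detect_end_of_speech([False] * 10, silence_frames=3)
```

Counting silence only after speech has started keeps leading silence, before the user begins talking, from terminating the capture.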
19
[Figure: raw data with the end point and the detected end point for the string “2-2-9-1-7-8” in a car at 55 mph. Time marks: 6.51 seconds, 3.78 seconds, 4.81 seconds.]
20
[Figure: correct detection and false detection relative to the end point for the string “2-2-9-1-7-8” in a car at 55 mph, with the correct-detection time error and the false-detection time error marked. Time axis in seconds.]
21
4. Simulation Results:
(The simulation is run over the Motorola digit-string database, which includes 16 speakers and 15,166 variable-length digit strings in 7 different conditions. The silence threshold is 1.85 seconds.)
A. Receiver Operating Characteristic (ROC) curve:
The ROC curve relates the end-of-speech detection rate to the false (early) detection rate. We compare two methods: (1) the new voice-metric based end-of-speech detector and (2) the old speech/noise flag based end-of-speech detector.
22
[Figure: ROC curve. Axes: false detection rate (%), detection rate (%).]
23
B. String-accuracy-convergence (SAC) curve:
The SAC curve relates the string recognition accuracy to the false (early) detection rate. We compare two methods: (1) the new voice-metric based end-of-speech detector and (2) the old speech/noise flag based end-of-speech detector.
24
[Figure: SAC curve. Axes: false detection rate (%), string recognition accuracy (%).]
25
C. Table of detection results:
(This table shows the results on the Madison sub-database, which includes data files with 1.85 seconds or more of silence after the end of speech.)
26
(This table shows the results on the small database collected by Motorola PCS CSSRL. All digit strings are recorded in a fixed 15-second window.)

| Condition | Average Time Error | Average False Detection Time Error | Average Correct Detection Time Error | False Detection Rate | Number of Strings | Total Detection Rate | String Recognition Accuracy (with EOS) | String Recognition Accuracy (without EOS) |
| Overall | 1.82 seconds | 0 seconds | 1.82 seconds | 0% | 121 | 96.69% | 50.41% | 29.75% |
| Office, close-talk | 1.85 seconds | 0 seconds | 1.85 seconds | 0% | 21 | 100% | 66.67% | 61.90% |
| Office, arm-length | 1.84 seconds | 0 seconds | 1.84 seconds | 0% | 20 | 100% | 65.00% | 65.00% |
| Café, close-talk | 1.76 seconds | 0 seconds | 1.76 seconds | 0% | 40 | 100% | 40.00% | 15.00% |
| Café, arm-length | 1.85 seconds | 0 seconds | 1.85 seconds | 0% | 40 | 90% | 45.00% | 10.00% |
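The detection columns in a table like this could be computed as in the sketch below. The definitions are assumptions, since the slides do not spell them out: a false detection is taken to mean the detector fired before the true end of speech, the total detection rate counts every string for which an end point was declared, and the function name and input format are hypothetical.

```python
def summarize_detections(true_ends, detected_ends):
    """Summarize end-of-speech detection over a set of strings.

    true_ends     : true end-of-speech times in seconds.
    detected_ends : detected end times in seconds, or None when the
                    detector never declared an end point.
    """
    total = len(true_ends)
    early_errors = []   # false (early) detections: fired before the true end
    late_errors = []    # correct detections: fired at or after the true end
    for true_end, detected in zip(true_ends, detected_ends):
        if detected is None:
            continue
        if detected < true_end:
            early_errors.append(true_end - detected)
        else:
            late_errors.append(detected - true_end)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    n_detected = len(early_errors) + len(late_errors)
    return {
        "total_detection_rate": n_detected / total,
        "false_detection_rate": len(early_errors) / total,
        "avg_false_time_error": mean(early_errors),
        "avg_correct_time_error": mean(late_errors),
    }

# Made-up example: two correct detections, one early detection, one miss.
stats = summarize_detections([3.0, 4.0, 5.0, 6.0], [4.85, 3.5, None, 7.85])
```

Under these definitions, a detector that waits for a fixed silence duration and never fires early shows a correct-detection time error close to that silence duration.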
27
Analysis of the Simulation Result:
Why didn’t EOS detection work well in babble noise?
28
Optimal Detection Decision
Bayes classifier
Likelihood Ratio Test
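For reference, the standard form of the likelihood ratio test for a two-class speech/noise decision is written out below. The symbols are assumptions introduced here, not taken from the slide: $x$ is the feature observed for a frame, $H_1$ is the speech hypothesis, and $H_0$ is the noise hypothesis. With equal costs for the two kinds of error, the Bayes (minimum-error) classifier reduces to this test.

```latex
\Lambda(x) \;=\; \frac{p(x \mid H_1)}{p(x \mid H_0)}
\;\underset{H_0}{\overset{H_1}{\gtrless}}\;
\eta \;=\; \frac{P(H_0)}{P(H_1)}
```

The test separates the classes well only when the two conditional densities overlap little. When the background is itself speech-like, the densities can be expected to overlap more, which is one way to read the question raised about babble noise.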
29
[Figure: digit “one”, close-talking microphone, quiet office.]
30
[Figure: digit “one”, hands-free microphone, car at 55 mph.]
31
[Figure: digit “one”, far-talking microphone, cafeteria.]
32
5. Conclusion:
The new voice-metric based end-of-speech detector is robust over a wide variety of background noise.
The new detector adds only a small amount of computational complexity and can be implemented in real time.
The new detector can improve recognition performance by discarding the extra noise captured by a fixed data capture window.
The new detector needs further improvement in babble noise environments.
33
Session 2. Speech Enhancement Algorithms: Blind Source Separation Methods (Spatial and Temporal Filtering)
1. Motivation and research goal.
2. Statement of “blind source separation” problem.
3. Principles of blind source separation.
4. Criteria for blind source separation.
5. Application to blind channel equalization for digital communication systems.
6. Simulation and comparison.
7. Summary and conclusion.
34
1. Motivation:
Mimic the human auditory system to differentiate the subject signals from other sounds, such as interfering sources and background noise, for clear recognition of the subject contents.
‘One of the most striking facts about our ears is that we have two of them-- and yet we hear one acoustic world; only one voice per speaker.’ (E. C. Cherry and W. K. Taylor. Some further experiments on the recognition of speech, with one and two ears. Journal of the Acoustical Society of America, 26:554-559, 1954)
The “cocktail party effect” (the ability to focus one’s listening attention on a single talker among a cacophony of conversations and background noise) has been recognized for some time. This specialized listening ability may be due to characteristics of the human speech production system, the auditory system, or high-level perceptual and language processing.
35
Research Goal:
Design a preprocessor that applies digital signal processing speech enhancement algorithms. The input signals are collected through multiple-sensor (microphone) arrays. After the embedded signal processing algorithms have run, the output consists of clearly separated signals.
36
[Diagram: audio input, blind source separation algorithms, enhanced output.]
37
2. Problem Statement of Blind Source Separation:
What is “Blind Source Separation”?
Given N linearly mixed received input signals, we need to recover the M statistically independent sources as well as possible ([condition not preserved]).
38
Formulation of Blind Source Separation Problem:
A received signal vector from the array, X(t), is the original source vector S(t) passed through the channel distortion H(t), such that X(t) = H(t) S(t), where [definitions not preserved]. We need to estimate a separator W(t) such that [equation not preserved].
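The mixing model X(t) = H(t) S(t) can be made concrete with a small numerical sketch. Several simplifications here are assumptions, not the setup of the talk: the mixing is instantaneous and time-invariant (a constant 2 x 2 matrix), the two sources are synthetic, and the separator is obtained by inverting the known mixing matrix. A blind method has to estimate W from X alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Two independent, non-Gaussian sources (illustrative choices).
S = np.vstack([
    np.sign(rng.standard_normal(n_samples)),   # binary source
    rng.uniform(-1.0, 1.0, n_samples),          # uniform source
])

# Hypothetical instantaneous mixing matrix standing in for H(t).
H = np.array([[1.0, 0.6],
              [0.4, 1.0]])

X = H @ S                 # received signals at the two sensors
W = np.linalg.inv(H)      # ideal separator, available only because H is known here
Y = W @ X                 # separated outputs
```

In the blind setting the sources can in general be recovered only up to a permutation and a scaling of the outputs, so a successful separator makes W H close to a scaled permutation matrix and not necessarily the identity.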
39
3. Principles of Blind Source Separation:
The independence measurement: Shannon’s mutual information.
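For reference, the standard definition of Shannon's mutual information among the components of a random vector is given below. The notation is an assumption introduced here: $\mathbf{y} = (y_1, \dots, y_M)$ denotes the separator outputs and $H(\cdot)$ denotes differential entropy.

```latex
I(y_1, \dots, y_M)
\;=\; \sum_{i=1}^{M} H(y_i) \;-\; H(\mathbf{y})
\;=\; \int p(\mathbf{y}) \,
      \log \frac{p(\mathbf{y})}{\prod_{i=1}^{M} p(y_i)} \, d\mathbf{y}
\;\ge\; 0
```

The quantity is zero exactly when the joint density factors into the product of its marginals, that is, when the outputs are statistically independent. This is why it can serve as a measure of independence.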
40
4. Criteria to Separate Independent Sources:
– Constrained Entropy (Wu, IJCNN99)
– Hadamard Measure (Wu, ICA99)
– Frobenius Norm (Wu, NNSP97)
– Quadratic Gaussianity (Wu, NNSP99)
41
5. Application to Blind Single Channel Equalization for Digital Communication Systems:
We apply the minimization of the modified constrained entropy to adapt an equalizer w(t) = [w_0, w_1, ...] for a digital channel h(t). Assume a PAM signal constellation with symbols s(t) = [values not preserved], passing through a digital channel h(t) = [c(t, 0.11) + 0.8 c(t-1, 0.11) - 0.4 c(t-3, 0.11)] W_6T(t), where c(t, β) is the raised-cosine function with roll-off factor β and W_6T(t) is a rectangular window. The input signal to the equalizer is [equation not preserved], where n(t) is the background noise. We apply generalized anti-Hebbian learning to adapt w(t) such that [condition not preserved].
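The channel h(t) above can be evaluated numerically as in the following sketch. Several details are assumptions, because the slide does not state them: the symbol period is taken as T = 1, the raised-cosine pulse is the usual unit-peak form, and the rectangular window W_6T(t) is assumed to cover 0 <= t < 6T. The function names are hypothetical.

```python
import numpy as np

def raised_cosine(t, beta, T=1.0):
    """Unit-peak raised-cosine pulse with roll-off factor beta (beta > 0 assumed)."""
    t = np.asarray(t, dtype=float)
    denom = 1.0 - (2.0 * beta * t / T) ** 2
    singular = np.isclose(denom, 0.0)           # t = +/- T / (2 beta)
    safe_denom = np.where(singular, 1.0, denom)
    pulse = np.sinc(t / T) * np.cos(np.pi * beta * t / T) / safe_denom
    # Limit of the pulse at the singular points.
    limit = (np.pi / 4.0) * np.sinc(1.0 / (2.0 * beta))
    return np.where(singular, limit, pulse)

def channel(t, beta=0.11, T=1.0):
    """h(t) = [c(t) + 0.8 c(t-1) - 0.4 c(t-3)] * W_6T(t), with T = 1 by default."""
    t = np.asarray(t, dtype=float)
    window = ((t >= 0.0) & (t < 6.0 * T)).astype(float)   # assumed window support
    taps = (raised_cosine(t, beta, T)
            + 0.8 * raised_cosine(t - 1.0, beta, T)
            - 0.4 * raised_cosine(t - 3.0, beta, T))
    return taps * window

# Sampled at the symbol rate, the pulses collapse to single taps.
h_symbol_rate = channel(np.arange(6))
# Sampled four times per symbol, the pulse shapes become visible.
h_oversampled = channel(np.arange(0, 6, 0.25))
```

At symbol-rate sampling the raised-cosine pulse is zero at every nonzero integer, so the channel reduces to taps 1, 0.8, 0, -0.4 at delays 0 through 3 under these assumptions.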
42
[Figure: axes labeled signal-to-noise ratio (dB) and signal-to-interference ratio (dB).]
43
[Figure: axes labeled signal-to-noise ratio (dB) and bit error rate.]
44
6. Simulation and Comparison:
The simulation results compare our generalized anti-Hebbian learning, the SDIF algorithm, and Lee’s Infomax method (Lee, IJCNN97) on three real recordings downloaded from the Salk Institute, University of California at San Diego.
45
New VR LITE Front End: Blind Source Separation + End-of-speech Detection
46
7. Conclusion and Future Research:
The computational complexity of blind source separation needs to be reduced.
Test BSS for EOS detection with microphone arrays of the same kind.
Incorporate other array signal processing techniques (a beamformer?) to improve speech detection and recognition.