Speech Processing AEGIS RET All-Hands Meeting


1 Speech Processing
Applications of Images and Signals in High Schools
AEGIS RET All-Hands Meeting
Florida Institute of Technology, July 6, 2012

2 Contributors
Dr. Veton Këpuska, Faculty Mentor, FIT
Jacob Zurasky, Graduate Student Mentor, FIT
Becky Dowell, RET Teacher, BPS Titusville High

3 Timeline
1874: Alexander Graham Bell shows that the frequency harmonics of an electrical signal can be separated
1952: Bell Labs develops the first effective speech recognizer
1970s: DARPA funds speech understanding research: speech should be understood, not just recognized
1980s: Call-center and text-to-speech products become commercially available
1990s: PC processing power allows use of speech-recognition software by ordinary users
(DARPA: U.S. Defense Advanced Research Projects Agency. Source: Timeline of Speech Recognition.)

4 Motivation
Applications: call-center speech recognition; speech-to-text (e.g., dictation software); hands-free user interfaces (e.g., OnStar, Xbox Kinect, Siri)
Science fiction, 1968: Stanley Kubrick's 2001: A Space Odyssey
Science fact, 2011: Apple iPhone 4S with Siri

5 Difficulties
Continuous speech (word boundaries)
Noise: background noise, other speakers
Differences in speakers: dialects/accents, male/female voices

6 Motivation Speech recognition requires speech to first be characterized by a set of “features”. Features are used to determine what words are spoken. Our project implements the feature extraction stage of a speech processing application.

7 Speech Recognition
Front End (pre-processing): speech in, features out. Reduces the amount of data for the back end while keeping enough to accurately describe the signal (e.g., a 256-sample frame becomes 13 features); the output is a feature vector.
Back End (recognition): features in, recognized speech out. Statistical models classify each feature vector as a particular sound of speech.

8 Front-End Processing of Speech Recognizer
Pre-emphasis: high-pass filter to compensate for high-frequency roll-off in human speech
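The pre-emphasis step is a simple first-order high-pass filter. A minimal sketch, in Python (the project's own code is MATLAB); the filter coefficient 0.97 is a typical value assumed here, not one stated on the slides:

```python
def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is a common choice (assumed; the slides give no value).
    """
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]
```

The filter attenuates slowly varying (low-frequency) content and boosts the higher frequencies that roll off in human speech.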

9 Front-End Processing of Speech Recognizer
Pre-emphasis: high-pass filter to compensate for high-frequency roll-off in human speech
Window: separate the speech signal into frames; apply a window to smooth the edges of each frame
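Framing and windowing can be sketched as below. The 256-sample frame size follows the earlier slide; the hop size and the choice of a Hamming window are assumptions (Hamming is conventional in speech front ends, but the slides do not name a window):

```python
import math

def frame_signal(signal, frame_len=256, hop=128):
    """Split the signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window: tapers a frame's edges toward (near) zero."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1))
            for k in range(n)]

def apply_window(frame):
    """Multiply a frame pointwise by the window to smooth its edges."""
    return [s * w for s, w in zip(frame, hamming(len(frame)))]
```

Overlapping frames keep information near frame boundaries, which the window would otherwise suppress.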

10 Front-End Processing of Speech Recognizer
Pre-emphasis: high-pass filter to compensate for high-frequency roll-off in human speech
Window: separate the speech signal into frames; apply a window to smooth the edges of each frame
FFT: transform the signal from the time domain to the frequency domain; the human ear perceives sound based on frequency content
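What the FFT stage computes can be made explicit with a direct DFT. This O(N²) sketch produces exactly the values an FFT would; a real front end would use an FFT routine for speed:

```python
import cmath

def dft_magnitude(frame):
    """Magnitude spectrum of a frame via the DFT definition.

    X[k] = sum_n x[n] * exp(-2j*pi*k*n/N); an FFT computes the
    same result in O(N log N).
    """
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                    for n in range(N)))
            for k in range(N)]
```

For a constant (DC) input all the energy lands in bin 0, which is a handy sanity check.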

11 Front-End Processing of Speech Recognizer
Pre-emphasis: high-pass filter to compensate for high-frequency roll-off in human speech
Window: separate the speech signal into frames; apply a window to smooth the edges of each frame
FFT: transform the signal from the time domain to the frequency domain; the human ear perceives sound based on frequency content
Mel-scale: convert linear-scale frequency (Hz) to the logarithmic mel scale
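One standard formula for the mel scale (the slides do not give one) is mel = 2595·log10(1 + f/700), which is roughly linear below 1 kHz and logarithmic above, matching how the ear perceives pitch:

```python
import math

def hz_to_mel(f_hz):
    """Common mel-scale mapping: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse mapping, back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

By construction, 1000 Hz maps to about 1000 mels, while higher frequencies are compressed (8000 Hz is only about 2840 mels).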

12 Front-End Processing of Speech Recognizer
Pre-emphasis: high-pass filter to compensate for high-frequency roll-off in human speech
Window: separate the speech signal into frames; apply a window to smooth the edges of each frame
FFT: transform the signal from the time domain to the frequency domain; the human ear perceives sound based on frequency content
Mel-scale: convert linear-scale frequency (Hz) to the logarithmic mel scale
Log: take the log of the magnitudes (multiplication becomes addition) to allow separation of signals
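The "multiplication becomes addition" point can be shown numerically. In the source–filter view of speech, the spectrum is the product of an excitation and a vocal-tract response per bin; after the log, that product is a sum, which is what makes the two components separable. The magnitudes below are made-up illustrative numbers:

```python
import math

def log_spectrum(magnitudes, floor=1e-10):
    """Log of each magnitude; the floor avoids log(0)."""
    return [math.log(max(m, floor)) for m in magnitudes]

# Illustrative (made-up) per-bin magnitudes: speech = excitation * filter
excitation = [2.0, 4.0, 8.0]
vocal_tract = [0.5, 1.0, 2.0]
speech = [e * v for e, v in zip(excitation, vocal_tract)]

# In the log domain the product becomes a sum of the two log spectra
log_speech = log_spectrum(speech)
log_sum = [a + b for a, b in zip(log_spectrum(excitation),
                                 log_spectrum(vocal_tract))]
```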

13 Front-End Processing of Speech Recognizer
Pre-emphasis: high-pass filter to compensate for high-frequency roll-off in human speech
Window: separate the speech signal into frames; apply a window to smooth the edges of each frame
FFT: transform the signal from the time domain to the frequency domain; the human ear perceives sound based on frequency content
Mel-scale: convert linear-scale frequency (Hz) to the logarithmic mel scale
Log: take the log of the magnitudes (multiplication becomes addition) to allow separation of signals
IFFT: inverse FFT transforms the signal to the cepstral domain (the spectrum of the log spectrum, i.e. the Fourier transform of the log of the spectrum, capturing its rate of change); the result is the set of "features"
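Putting the last steps together: the real cepstrum is the inverse DFT of the log magnitude spectrum. A sketch, omitting the mel filterbank for brevity (a full front end would insert it between the FFT and the log); keeping the first ~13 coefficients gives a feature vector of the size mentioned earlier:

```python
import cmath
import math

def real_cepstrum(frame):
    """Inverse DFT of the log magnitude spectrum of a frame."""
    N = len(frame)
    spectrum = [abs(sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                        for n in range(N)))
                for k in range(N)]
    log_spec = [math.log(max(s, 1e-10)) for s in spectrum]
    return [sum(log_spec[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)).real / N
            for n in range(N)]

def features(frame, n_features=13):
    """Keep the first n_features cepstral coefficients as the feature vector."""
    return real_cepstrum(frame)[:n_features]
```

A unit impulse has a perfectly flat spectrum, so its log spectrum, and hence its cepstrum, is all zeros, which makes a convenient sanity check.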

14 Speech Analysis and Sound Effects (SASE) Project
Graphical User Interface (GUI)
Speech input: record and save audio, or load a sound file (*.wav, *.ulaw, *.au)
Graphs the entire audio signal; a "frame" is selected by clicking on the graph
Processes the selected speech frame and displays the output of each stage of processing
Displays a spectrogram


16 GUI Components

17 GUI Components: Plotting Axes

18 GUI Components: Plotting Axes, Buttons

19 MATLAB Code
Graphical User Interface (GUI): built with GUIDE (GUI Development Environment); callback functions; a work in progress; designed to be extendable
Stages of speech processing: implemented as modular functions for reusability

20 SASE Lab: interactive teaching tool (demo)


22 Future Work
Improve the GUI
Audio effects (e.g., echo, reverberation, chorus, flange)
Noise filtering

23 References
Ingle, Vinay K., and John G. Proakis. Digital Signal Processing Using MATLAB. 2nd ed. Toronto: Nelson, 2007.
Oppenheim, Alan V., and Ronald W. Schafer. Discrete-Time Signal Processing. 3rd ed. Upper Saddle River: Pearson, 2010.
Weeks, Michael. Digital Signal Processing Using MATLAB and Wavelets. Hingham, Mass.: Infinity Science Press, 2007.
Timeline of Speech Recognition.

24 Thank you! Questions?

25 Unit Plan
Introduction
Lesson #1: The Sound of a Sine Wave
Lesson #2: Frequency Analysis
Lesson #3: Filtering (work in progress)
Lesson #4: SASE Lab (work in progress)
Conclusion

