Speech Processing AEGIS RET All-Hands Meeting


Speech Processing: Applications of Images and Signals in High Schools AEGIS RET All-Hands Meeting, Florida Institute of Technology, July 6, 2012

Contributors Dr. Veton Këpuska, Faculty Mentor, FIT (vkepuska@fit.edu) Jacob Zurasky, Graduate Student Mentor, FIT (jzuraksy@my.fit.edu) Becky Dowell, RET Teacher, BPS Titusville High (dowell.jeanie@brevardschools.org)

Timeline 1874: Alexander Graham Bell proves that frequency harmonics from an electrical signal can be divided 1952: Bell Labs develops the first effective speech recognizer 1971-1976: DARPA program concludes that speech should be understood, not just recognized 1980s: Call-center and text-to-speech products become commercially available 1990s: PC processing power allows ordinary users to run speech recognition software DARPA: U.S. Defense Advanced Research Projects Agency Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

Motivation Applications: call-center speech recognition; speech-to-text (e.g., dictation software); hands-free user interfaces (e.g., OnStar, Xbox Kinect, Siri) Science fiction, 1968: Stanley Kubrick's 2001: A Space Odyssey http://www.youtube.com/watch?v=6MMmYyIZlC4 Science fact, 2011: Apple iPhone 4S Siri http://www.apple.com/iphone/features/siri.html

Difficulties Differences in speakers: dialects/accents, male/female voices Continuous speech: finding word boundaries Noise: background noise, other speakers

Motivation Speech recognition first requires that speech be characterized by a set of "features." The features are then used to determine which words were spoken. Our project implements the feature-extraction stage of a speech processing application.

Speech Recognition Front End (pre-processing): reduces the amount of data for the back end while keeping enough information to accurately describe the signal. The input is speech (a large amount of data, e.g., a frame of 256 samples); the output is a feature vector (a reduced data size, e.g., 13 features). Back End (recognition): statistical models classify each feature vector as a particular sound in speech, producing the recognized speech.

Front-End Processing of Speech Recognizer Pre-emphasis: high-pass filter to compensate for high-frequency roll-off in human speech
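The project's front end is implemented in MATLAB; as an illustration only, a minimal Python sketch of a pre-emphasis filter is below. The difference equation y[n] = x[n] - a*x[n-1] and the coefficient 0.97 are typical textbook choices, not values taken from these slides.

```python
def pre_emphasis(signal, alpha=0.97):
    """High-pass pre-emphasis filter: y[n] = x[n] - alpha * x[n-1].

    The first sample is passed through unchanged (x[-1] assumed 0).
    """
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]
```

A constant (low-frequency) input is strongly attenuated, while rapid sample-to-sample changes pass through, which is the desired high-pass behavior.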

Front-End Processing of Speech Recognizer Pre-emphasis: high-pass filter to compensate for high-frequency roll-off in human speech Window: separate the speech signal into frames; apply a window to smooth the edges of each framed speech signal
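Again as an illustrative sketch (the slides do not specify the window type or frame sizes; a Hamming window with 256-sample frames and 50% overlap is a common choice):

```python
import math

def hamming(N):
    """Hamming window of length N."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def frame_signal(signal, frame_len=256, hop=128):
    """Split the signal into overlapping frames and apply a Hamming window
    to smooth the edges of each frame."""
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * w[n] for n in range(frame_len)])
    return frames
```

The window tapers each frame toward zero at its edges, reducing the spectral leakage that abrupt frame boundaries would otherwise introduce in the FFT stage.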

Front-End Processing of Speech Recognizer Pre-emphasis: high-pass filter to compensate for high-frequency roll-off in human speech Window: separate the speech signal into frames; apply a window to smooth the edges of each framed speech signal FFT: transform the signal from the time domain to the frequency domain (the human ear perceives sound based on frequency content)

Front-End Processing of Speech Recognizer Pre-emphasis: high-pass filter to compensate for high-frequency roll-off in human speech Window: separate the speech signal into frames; apply a window to smooth the edges of each framed speech signal FFT: transform the signal from the time domain to the frequency domain (the human ear perceives sound based on frequency content) Mel-Scale: convert linear-scale frequency (Hz) to the logarithmic mel scale
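The slides do not give the mel formula; the widely used O'Shaughnessy form, mel = 2595 * log10(1 + f/700), is sketched below for illustration.

```python
import math

def hz_to_mel(f):
    """Map linear frequency in Hz to the (approximately logarithmic) mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping, mel back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

By construction the scale is roughly linear below about 1 kHz (1000 Hz maps to about 1000 mel) and compresses higher frequencies, matching how the ear's frequency resolution decreases with pitch.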

Front-End Processing of Speech Recognizer Pre-emphasis: high-pass filter to compensate for high-frequency roll-off in human speech Window: separate the speech signal into frames; apply a window to smooth the edges of each framed speech signal FFT: transform the signal from the time domain to the frequency domain (the human ear perceives sound based on frequency content) Mel-Scale: convert linear-scale frequency (Hz) to the logarithmic mel scale Log: take the log of the magnitudes (multiplication becomes addition) to allow separation of combined signals

Front-End Processing of Speech Recognizer Pre-emphasis: high-pass filter to compensate for high-frequency roll-off in human speech Window: separate the speech signal into frames; apply a window to smooth the edges of each framed speech signal FFT: transform the signal from the time domain to the frequency domain (the human ear perceives sound based on frequency content) Mel-Scale: convert linear-scale frequency (Hz) to the logarithmic mel scale Log: take the log of the magnitudes (multiplication becomes addition) to allow separation of combined signals IFFT: inverse FFT transforms the log spectrum into the cepstral domain (the spectrum of the log spectrum, i.e., its rate of change); the result is the set of "features"
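The last two stages can be sketched together: take the log of the magnitude spectrum, then an inverse transform back. This is the classic real cepstrum; it is a simplified stand-in for the project's mel-cepstral features, written in Python for illustration (the small floor added before the log is my addition to avoid log(0)).

```python
import cmath
import math

def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum of a frame."""
    N = len(frame)
    # Forward DFT
    spec = [sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
            for k in range(N)]
    # Log of the magnitudes: multiplicative spectral components become additive
    log_mag = [math.log(abs(s) + 1e-12) for s in spec]
    # Inverse DFT of the log spectrum; the result is real for real input
    return [sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)).real / N
            for n in range(N)]
```

A full recognizer front end would keep only the first dozen or so cepstral coefficients (e.g., 13) as the frame's feature vector, matching the 256-samples-to-13-features reduction described earlier.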

Speech Analysis and Sound Effects (SASE) Project Graphical User Interface (GUI) Speech input: record and save audio, or open a sound file (*.wav, *.ulaw, *.au) Graphs the entire audio signal Select a "frame" by clicking on the graph Processes the speech frame and displays the output of each stage of processing Displays a spectrogram

GUI Components

GUI Components Plotting Axes

GUI Components Plotting Axes Buttons

MATLAB Code Graphical User Interface (GUI): built with GUIDE (GUI Development Environment); callback functions; work in progress; extendable Stages of speech processing: modular functions for reusability

SASE Lab Interactive teaching tool Demo

Future Work Improve the GUI Audio effects (e.g., echo, reverberation, chorus, flange) Noise filtering
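Of the planned effects, echo is the simplest to sketch: mix a delayed, attenuated copy of the signal back into itself. The delay and gain values below are arbitrary illustrative choices, and this is Python rather than the project's MATLAB.

```python
def add_echo(signal, delay, gain=0.5):
    """Single echo: y[n] = x[n] + gain * x[n - delay]."""
    return [x + (gain * signal[n - delay] if n >= delay else 0.0)
            for n, x in enumerate(signal)]
```

Reverberation can be approximated by summing many such echoes with decreasing gains, while chorus and flange vary the delay length over time.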

References Ingle, Vinay K., and John G. Proakis. Digital Signal Processing Using MATLAB. 2nd ed. Toronto, Ont.: Nelson, 2007. Oppenheim, Alan V., and Ronald W. Schafer. Discrete-Time Signal Processing. 3rd ed. Upper Saddle River: Pearson, 2010. Weeks, Michael. Digital Signal Processing Using MATLAB and Wavelets. Hingham, Mass.: Infinity Science Press, 2007. Timeline of Speech Recognition. http://www.emory.edu/BUSINESS/et/speech/timeline.htm

Thank you! Questions?

Unit Plan Introduction Lesson #1: The Sound of a Sine Wave Lesson #2: Frequency Analysis Lesson #3: Filtering (work in progress) Lesson #4: SASE Lab (work in progress) Conclusion