Audio Workgroup Neuro-inspired Speech Recognition. Group Members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor, David Anderson.
Audio Workgroup Neuro-inspired Speech Recognition. Group Members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor, David Anderson, Shihab Shamma, Hynek Hermanski, Shih-Chii Liu, Giacomo Indiveri, Malcolm Slaney
Audio Workgroup Audio Projects: Localization; Speech Recognition; More ASR
Audio Workgroup Shihab is Running. See Shihab arriving in Telluride in 2004 (should happen around 4 PM today).
Audio Workgroup Localization Effort. Interaural Time Difference (ITD): estimated from the time difference between spikes of two matching channels. Interaural Intensity Difference (IID): difference of spike counts between the two cochleae. Azimuth: combination of ITD and IID. Experiments: ITD estimation from pure tones; azimuth estimation from music. [Figure labels: Speaker, Microphones]
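The two cues on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the workgroup's implementation: the function names, the nearest-spike pairing rule, the `max_lag` window, and the normalization of the count difference are all assumptions, and spike times are taken to be in seconds. The slide does not say how ITD and IID are combined into an azimuth, so that step is left out.

```python
import numpy as np

def estimate_itd(left_spikes, right_spikes, max_lag=1e-3):
    """Median time difference (right minus left) between nearest spike pairs.

    Both inputs are spike times from one pair of matching channels.
    Pairs further apart than max_lag (an assumed window) are ignored.
    """
    left = np.asarray(left_spikes, dtype=float)
    right = np.sort(np.asarray(right_spikes, dtype=float))
    diffs = []
    for t in left:
        # Look at the right-ear spikes just before and just after t.
        i = np.searchsorted(right, t)
        candidates = right[max(i - 1, 0): i + 1]
        if candidates.size == 0:
            continue
        d = candidates[np.argmin(np.abs(candidates - t))] - t
        if abs(d) <= max_lag:
            diffs.append(d)
    return float(np.median(diffs)) if diffs else float("nan")

def estimate_iid(left_counts, right_counts):
    """Normalized spike-count difference in [-1, 1]; positive means more left spikes."""
    l = float(np.sum(left_counts))
    r = float(np.sum(right_counts))
    total = l + r
    return (l - r) / total if total > 0 else 0.0

# Usage with made-up spike data: the right ear lags the left by 0.3 ms.
left = np.array([0.010, 0.020, 0.030])
itd = estimate_itd(left, left + 0.0003)
iid = estimate_iid([10, 20], [5, 5])
```

The median is used here only because it tolerates a few mismatched spike pairs; a cross-correlation of binned spike trains would be another reasonable choice.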
Audio Workgroup Localization Effort
Audio Workgroup FPAA/Mote – Word Recognition
Audio Workgroup FPAA/Mote – Word Recognition. Field Programmable Analog Array (FPAA) based analog cochlea (non-spiking) with envelope detection. MOTE-based pattern matching using matched filtering with receptive fields. Robosapien listens to the spoken commands….
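As a rough picture of the pattern-matching stage, the sketch below slides a stored channel-by-time template (standing in for a "receptive field") over the envelope outputs of a filter bank and reports the best normalized correlation. The function names, the normalization, and the toy templates are assumptions made for illustration; the slide does not describe the code that ran on the mote.

```python
import numpy as np

def matched_filter_score(envelopes, template):
    """Peak normalized correlation of a (channels x time) template slid along time."""
    env = np.asarray(envelopes, dtype=float)
    tpl = np.asarray(template, dtype=float)
    n_t = tpl.shape[1]
    tpl = tpl - tpl.mean()
    tpl_norm = np.linalg.norm(tpl)
    best = -np.inf  # stays -inf if the input is shorter than the template
    for start in range(env.shape[1] - n_t + 1):
        patch = env[:, start:start + n_t]
        patch = patch - patch.mean()
        denom = tpl_norm * np.linalg.norm(patch)
        score = float(np.sum(patch * tpl) / denom) if denom > 0 else 0.0
        best = max(best, score)
    return best

def recognize(envelopes, templates):
    """Return the best-scoring word and the score of every template."""
    scores = {word: matched_filter_score(envelopes, t) for word, t in templates.items()}
    return max(scores, key=scores.get), scores

# Usage with two hypothetical 2-channel templates and an input containing "go".
templates = {
    "go": np.array([[0, 1, 0], [1, 0, 1]]),
    "stop": np.array([[1, 1, 0], [0, 0, 1]]),
}
envelopes = np.zeros((2, 6))
envelopes[:, 2:5] = templates["go"]
word, scores = recognize(envelopes, templates)
```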
Audio Workgroup FPAA/Mote – Word Recognition. Status: FPAA (we are using a new FPAA): 2nd-order sections synthesized, but a full auditory filter bank is not yet up. MOTE: real-time communication with Matlab and sampling operational.
Audio Workgroup Relational Network (Simple). [Diagram labels: X, Y, Z, M, MX, MY, MZ, m] Patches of neurons; each measures one quantity. Bidirectional relations for feedback/feedforward. Thanks to Rodney Douglas.
Audio Workgroup ASR Relational Network. [Diagram: Cochlea, Delay, Phone Recognizer, Word Recognizer] A patch of neurons (one-of-N output). Note: We don't know how to represent delays. Bidirectional links enforce phoneme/word constraints.
Audio Workgroup Relational Advantages. Not an HMM: HMMs are great, but… Incorporate other knowledge: bottom-up perception; top-down word hypothesis. Hallucinate, based on experience: hear "ba.." and know that bad, bat, bar, bass, band follow.
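The "ba.." example can be made concrete with a toy top-down hypothesis step. The sketch below is only an illustration of the idea, not the relational network itself: it works on letters rather than phonemes, and the lexicon counts (and the extra non-matching word) are invented for the example.

```python
def word_hypotheses(prefix, lexicon):
    """Return words consistent with the heard prefix, weighted by renormalized counts."""
    matches = {w: c for w, c in lexicon.items() if w.startswith(prefix)}
    total = sum(matches.values())
    return {w: c / total for w, c in matches.items()} if total else {}

# Hypothetical experience counts; only the word list comes from the slide.
lexicon = {"bad": 5, "bat": 3, "bar": 4, "bass": 1, "band": 2, "dog": 6}
hypotheses = word_hypotheses("ba", lexicon)
```

In a relational network these weights would be fed back to the phone recognizer as expectations, while bottom-up evidence keeps narrowing the set.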
Audio Workgroup Silicon Cochlea (van Schaik, Liu, 2004). [Figure labels: basilar membrane (high frequency to low frequency), inner hair cells, ganglion cells]
Audio Workgroup Silicon Frequency Response Tone ramps into two cochleas
Audio Workgroup Cochlear Rate Profiles. [Figure panels: Left Cochlea, Right Cochlea; axis: spikes per utterance]
Audio Workgroup Learning Algorithms. Statistical: SAS (pick best channels for decision); least squares (for software demo). Liquid State Machine (LSM): take input to high dimensions with a spiking net. Spike Timing Dependent Plasticity (STDP): Giacomo/Srinjoy chip; Brader/Fusi. [Figure labels: Vowel 1, Vowel 2, LSM, Spiking Output]
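A least-squares readout of the kind listed for the software demo might look like the sketch below. It assumes the features are spike counts per cochlea channel and that the two classes are labeled +1 and -1; the function names, the bias column, and the toy data are illustrative choices, not details from the slides.

```python
import numpy as np

def train_least_squares(counts, labels):
    """Fit linear weights (plus bias) mapping per-channel spike counts to +1/-1 labels."""
    X = np.asarray(counts, dtype=float)
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    y = np.asarray(labels, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict(w, counts):
    """Classify each row of spike counts by the sign of the linear readout."""
    X = np.asarray(counts, dtype=float)
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(X @ w >= 0.0, 1, -1)

# Usage with made-up spike counts for two channels and two classes.
train_counts = [[20, 2], [18, 3], [3, 19], [2, 22]]
train_labels = [1, 1, -1, -1]
w = train_least_squares(train_counts, train_labels)
predicted = predict(w, [[19, 1], [1, 20]])
```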
Audio Workgroup Tone Results. Tone recognition: spike input from silicon cochlea. Training: two tones; duplicated input; positive and negative examples. Testing.
Audio Workgroup Phoneme Results. Phoneme recognition: spike input from silicon cochlea. Training: two phonemes; duplicated inputs; positive and negative examples. Testing.
Audio Workgroup Behind the Curtain
Audio Workgroup Hardware Overview. [Diagram labels: Cochlea, Learning, Phoneme, Word; PCI-AER (for remapping); Cochlea; Shih-Chii Liu; Giacomo Indiveri; Implemented in MATLAB]
Audio Workgroup Infrastructure Difficulties. Remapper: Because of problems with the AER mapper boards, remapping the AER data from the silicon cochlea to the learning chip had to be done in Matlab (very slow). Power: Unpredictable problems were caused by supply voltage variations of as much as 1 V. Sharing chips: The learning chip had to be shared with two other workgroups. PC replacement.
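For readers unfamiliar with the remapping step, the sketch below shows one way address-event remapping can be done in software: each event is a (source address, timestamp) pair, and a lookup table translates cochlea addresses into learning-chip addresses. The table layout, the `unmapped` sentinel, and the addresses in the usage example are assumptions; the actual Matlab code and address formats are not given in the slides.

```python
import numpy as np

def build_remap_table(n_cochlea_addresses, mapping, unmapped=-1):
    """Build a lookup table from cochlea address to learning-chip address."""
    table = np.full(n_cochlea_addresses, unmapped, dtype=np.int64)
    for src, dst in mapping.items():
        table[src] = dst
    return table

def remap_events(addresses, timestamps, table, unmapped=-1):
    """Translate event addresses through the table and drop unmapped events."""
    addresses = np.asarray(addresses, dtype=np.int64)
    timestamps = np.asarray(timestamps)
    targets = table[addresses]  # one vectorized lookup for all events
    keep = targets != unmapped
    return targets[keep], timestamps[keep]

# Usage with hypothetical addresses: cochlea channels 0 and 2 drive synapses 10 and 11.
table = build_remap_table(4, {0: 10, 2: 11})
targets, times = remap_events([0, 1, 2, 0], [5, 6, 7, 8], table)
```

Doing the lookup for a whole batch of events at once avoids a per-event loop, which tends to be the slow part of a software remapper in an array language.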