Neuromorphic Audition Group


1

2 Neuromorphic Audition Group Massimiliano Versace, Jonathan Tapson, Rico Moeckel, Cristina Muoz, Yingxue Wang, Shih-Chii Liu, Andre van Schaik, Hynek Hermansky, David Anderson, Malcolm Slaney, Andrew Schwartz, Tara Julia Hamilton, John Harris, Nima Mesgarani, Shihab Shamma

3 Outline
Field Programmable Analog Array (Dave)
Speaker Identification (Malcolm, Nima and Max)
Speech Recognition (Hynek, Misha, Jordon)
STRF Noise Suppression (Nima, Shihab, Dave)
Reconstructions from STRF/Modulation Detectors (Nima, Shihab)
Social sonar demonstration using silicon cochlea and RoboQuad toy (Toby and Malcolm)
Cochlear ITD Detector (Andrew, Malcolm, Shih-Chii)
Cochlear Periodicity Detector (Teddy, John, Malcolm, Shih-Chii)

4 FPAA

5 Speaker ID [Diagram: four parallel Features/Model pipelines feeding a Winner Take All stage. Labels: MFCC, STRF, GMM, ART]

6 Speaker ID - STRF

7 Speaker ID - ART (Malcolm Slaney, Heather Ames, Max Versace). A supervised Fuzzy Adaptive Resonance Theory neural network (ARTMAP) uses top-down expectations to learn categories. First test: three synthesized vowels (large clusters) spoken by three speakers (different colors), represented in a 2D feature space.

8 Speaker ID - ART Results
[Diagram labels: Speech input (.wav); Feature extraction (12 MFCC + E, first and second derivatives); Vowel extraction (half-wave rectify, lowpass filter, choice of high-energy time slices); Feature vectors for vowel data; Utterance-independent transformation (TBD); Transformed features; ARTMAP; Training: acoustic model of speaker identity; Testing: predicted speaker identity]
50% correct after 100 cross-validations (number of ARTMAP runs) on 10-speaker identification.
Continued work: 1. Improved vowel extraction 2. Utterance-independent transformation of feature space
Why we care? Top-Down, Online
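The slide lists half-wave rectification, lowpass filtering, and a choice of high-energy time slices. The following is a minimal sketch of one way those three steps could fit together; it is not the group's implementation. The function names, the one-pole filter, the smoothing factor `alpha`, and the frame-based selection are all assumptions made for illustration.

```python
def half_wave_rectify(signal):
    """Keep positive samples and zero out negative ones."""
    return [max(s, 0.0) for s in signal]


def one_pole_lowpass(signal, alpha=0.1):
    """Smooth a signal with a one-pole filter (alpha is an assumed value)."""
    out, state = [], 0.0
    for s in signal:
        state += alpha * (s - state)
        out.append(state)
    return out


def high_energy_frames(signal, frame_len, top_k):
    """Return indices of the top_k frames with the largest smoothed envelope."""
    envelope = one_pole_lowpass(half_wave_rectify(signal))
    n_frames = len(envelope) // frame_len
    energies = [
        sum(envelope[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]
    ranked = sorted(range(n_frames), key=lambda i: energies[i], reverse=True)
    return sorted(ranked[:top_k])


if __name__ == "__main__":
    # Synthetic input: silence, a burst of alternating samples, silence.
    demo = [0.0] * 20 + [1.0, -1.0] * 10 + [0.0] * 20
    print(high_energy_frames(demo, frame_len=20, top_k=1))
```

In this sketch the selected frame indices would then point at the feature vectors to keep for training.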

9 Speaker ID - Results
Test             % Correct   % Correct in 5 dB noise
MFCC (Baseline)  81.3%       81.0%
STRF             79.8%
ART              ~60%
Very preliminary work! The comparison is against a technology (MFCC+GMM) that has been perfected over decades.

10 ASR - Phoneme Posteriors

11 ASR - Combining Information. [Diagram labels: Training, Context, ?] Machines: P(word|sound) P(word|context). Humans: [1-P(word|sound)] [1-P(word|context)]. Maximize.

12 Inverse model: from neural responses to sound

13 Reconstruction of speech in white noise. Reconstructed speech is cleaner than the original noisy speech. [Figure panels: Original Spectrograms; Reconstructed Spectrograms]

14 Psychoacoustically-motivated Speech Enhancement. Perceptual loudness: L = (b*e(t))^a. By mapping loudness using the same type of function, noise can be decreased. Results from STRF processing.
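The loudness formula L = (b*e(t))^a can be tried on a toy envelope. The sketch below only applies the power law from the slide. The function name, the sample envelope, and the values of `a` and `b` are assumptions chosen for illustration. With an exponent above 1 (an assumption, since the slide gives no value), the mapping widens the gap between a strong component and a weak one, which is one way such a function could push low-level noise down.

```python
def loudness(envelope, a, b):
    """Apply L = (b * e)^a to each envelope sample (negative samples clipped)."""
    return [(b * max(e, 0.0)) ** a for e in envelope]


if __name__ == "__main__":
    # Toy envelope: a weak noise-like sample and a strong speech-like sample.
    env = [0.1, 1.0]
    mapped = loudness(env, a=2.0, b=1.0)  # a and b are assumed values
    print("input ratio :", env[1] / env[0])
    print("mapped ratio:", mapped[1] / mapped[0])
```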

15 Noise suppression using inverse model. Train G-filters to reconstruct clean stimuli from the corresponding noisy responses. Apply the trained filters to new noisy responses. [Figure panels: Cortical decomposition; Trained inverse filters]
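The train-then-apply procedure on this slide can be illustrated with a much simpler stand-in for the G-filters: one least-squares gain per channel. This is a sketch under that simplifying assumption and not the inverse model used in the talk. The function names and the toy data are hypothetical.

```python
def train_gains(noisy, clean, eps=1e-12):
    """Fit one least-squares gain per channel mapping noisy frames to clean.

    noisy, clean: lists of frames, each frame a list of channel values.
    """
    n_channels = len(noisy[0])
    gains = []
    for ch in range(n_channels):
        num = sum(n[ch] * c[ch] for n, c in zip(noisy, clean))
        den = sum(n[ch] ** 2 for n in noisy)
        gains.append(num / (den + eps))  # eps guards against division by zero
    return gains


def apply_gains(gains, frames):
    """Apply the trained per-channel gains to new noisy frames."""
    return [[g * x for g, x in zip(gains, frame)] for frame in frames]


if __name__ == "__main__":
    # Toy training pairs (hypothetical numbers, two channels).
    noisy_train = [[2.0, 1.0], [4.0, 3.0]]
    clean_train = [[1.0, 1.0], [2.0, 3.0]]
    gains = train_gains(noisy_train, clean_train)
    print(gains)
    print(apply_gains(gains, [[6.0, 2.0]]))
```

A full inverse filter would span time and frequency context instead of a single scalar per channel; the scalar case just keeps the training and application steps easy to follow.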

16 Noise Suppression for White, Jet and City Noise

17 RS Media
Linux version 2.4.18-rmk5-mx1ads-p3 (sam@estechsolution.com) (gcc version 2.95.3 20010315 (release)) #517 Fri Feb 16 11:40:45 HKT 2007
Processor: ARM/CIRRUS Arm920Tsid(wb) revision 0
Architecture: Motorola MX1ADS
On node 0 totalpages: 8192
zone(0): 8192 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Kernel command line: root=fe01 ro mem=32M
Console: colour dummy device 80x30
Calibrating delay loop... 98.50 BogoMIPS
Memory: 32MB = 32MB total
Memory: 30816KB available (1023K code, 316K data, 60K init)
Dentry-cache hash table entries: 4096 (order: 3, 32768 bytes)
Inode-cache hash table entries: 2048 (order: 2, 16384 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 8192 (order: 3, 32768 bytes)
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
ttySA0 at I/O 0x206000 (irq = 29) is a MX1ADS
ttySA1 at I/O 0x207000 (irq = 23) is a MX1ADS
pty: 256 Unix98 ptys configured
DMA Initializing

18 Cochlear - ITD Detector [Figure axes: Time, Position]

19 Cochlear - JAER Demo

20 Cochlear - Periodicity detector [Figure panels: Response to hiss; Response to coo]

21

22 When both channels are conditionally independent:
p_C p_A: probability of correct recognition in both channels
p_C (1 - p_A): correct in channel 1 but not in channel 2
p_A (1 - p_C): correct in channel 2 but not in channel 1
These three cases are mutually exclusive, thus the probability of correct recognition is
p = p_C p_A + p_C (1 - p_A) + p_A (1 - p_C) = p_C + p_A - p_C p_A
Probability of error:
e = 1 - p = 1 - p_C - p_A + p_C p_A = (1 - p_C)(1 - p_A) = e_C e_A
[Diagram: stimulus feeds a context (top-down) channel with probability p_C and an acoustic (bottom-up) channel with probability p_A, which lead to the decision]
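The identity on this slide is easy to check numerically. The sketch below sums the three mutually exclusive cases and compares the result with the product of the per-channel errors. The function names and the example probabilities are illustrative assumptions and do not come from the slides.

```python
def combined_correct(p_c, p_a):
    """Probability that at least one of two independent channels is correct."""
    for p in (p_c, p_a):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Sum of the three mutually exclusive cases listed on the slide.
    return p_c * p_a + p_c * (1 - p_a) + p_a * (1 - p_c)


def combined_error(p_c, p_a):
    """Error probability as the product of the channel errors, e_C * e_A."""
    return (1 - p_c) * (1 - p_a)


if __name__ == "__main__":
    p_c, p_a = 0.6, 0.7  # illustrative values only
    p = combined_correct(p_c, p_a)
    e = combined_error(p_c, p_a)
    print(f"p = {p:.2f}, e = {e:.2f}, p + e = {p + e:.2f}")
```

Under the independence assumption the combined error is never larger than the error of either channel alone, because each factor is at most 1.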

