SPECTRUM? Hynek Hermansky with Jordan Cohen, Sangita Sharma, and Pratibha Jain,

1 SPECTRUM? Hynek Hermansky with Jordan Cohen, Sangita Sharma, and Pratibha Jain,

2 /u/ /o/ /a/ /  / /iy/ (Newton, beer) –Helmholtz –Radio Rex (1917): “limited commercial success” (John Pierce, 1969)

3 SHORT-TERM SPECTRUM [figure: time-frequency plane; the short-term spectrum from a window of about 20 ms is passed to a classifier]
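
To make the "about 20 ms" short-term spectrum concrete, here is a minimal sketch in Python with numpy. The slide specifies only the window length. The 8 kHz sampling rate, 10 ms frame step, Hamming window and the function name `short_term_spectrum` are assumptions for illustration.

```python
import numpy as np

def short_term_spectrum(x, fs, win_ms=20.0, hop_ms=10.0):
    """Power spectrum of overlapping short frames (rows = frames, columns = FFT bins)."""
    win = int(round(fs * win_ms / 1000))   # samples per ~20 ms analysis window
    hop = int(round(fs * hop_ms / 1000))   # samples between successive frames (assumed 10 ms)
    n_frames = 1 + (len(x) - win) // hop
    window = np.hamming(win)               # taper each frame to reduce spectral leakage
    frames = np.stack([x[i * hop:i * hop + win] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

fs = 8000                                  # assumed sampling rate
t = np.arange(fs) / fs                     # 1 s of signal
x = np.sin(2 * np.pi * 1000 * t)           # 1 kHz test tone
S = short_term_spectrum(x, fs)
print(S.shape)                             # (99, 81)
```

Each row of `S` is one short-term spectrum. A conventional recognizer classifies these rows one at a time, which is the view the following slides question.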

4 Cortical receptive fields

5 ASR from TempoRAl Patterns (TRAP) [figure: time-frequency plane; a 1 sec temporal pattern of critical band energies, extending across phone “boundaries”, is classified, in contrast to the short-term spectrum from a window of about 20 ms]
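
The TRAP input can be sketched as a slice along time rather than along frequency: one critical band, about 1 s of context. This minimal sketch makes several assumptions. It uses a 10 ms frame step, so 1 s is 101 frames centred on the current frame. It uses log energies with the per-pattern mean removed. The 15-band toy data and the name `trap` are hypothetical.

```python
import numpy as np

def trap(band_energies, band, center, half_width=50):
    """~1 s temporal pattern (2*half_width + 1 frames at an assumed 10 ms step) of one band."""
    lo, hi = center - half_width, center + half_width + 1
    if lo < 0 or hi > band_energies.shape[1]:
        raise ValueError("pattern extends past the utterance")
    traj = np.log(band_energies[band, lo:hi] + 1e-10)   # log critical-band energy trajectory
    return traj - traj.mean()                            # assumed normalization: remove the mean

rng = np.random.default_rng(0)
E = rng.random((15, 300)) + 0.1     # toy data: 15 bands x 300 frames (3 s)
p = trap(E, band=5, center=150)
print(p.shape)                      # (101,)
```

In a TRAP system, one such pattern per band would feed a band-specific classifier trained on data.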

6 WHY 200-1000 ms? –because that’s where the information is (coarticulation): mutual information studies (Bilmes; Yang et al.) –psychophysics of hearing: 200 ms “critical time window” (forward masking, perception of loudness, perception of gaps, …) –physiology of hearing: time component of cortical receptive fields (Klein) –because “it works”: ETSI Aurora work [figure: a 200-1000 ms span on the time axis of the time-frequency plane]

7 WHY narrow frequency bands? –psychophysics of hearing: independence of processing within critical bands –physiology of hearing: mechanical selectivity of the cochlea; cortical receptive fields (e.g. Shamma) –because “it works”: multi-band ASR (Bourlard and Dupont; Hermansky et al.; …); decrease in ASR accuracy for wider frequency spans (Jain and Hermansky, Eurospeech 2003) [figure: a 1-3 Bark span on the frequency axis of the time-frequency plane]
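
To see what a 1-3 Bark span means in hertz, the sketch below converts between Hz and Bark. It uses Schroeder's approximation z = 6·asinh(f/600), the warping used in PLP analysis. Other Bark formulas give somewhat different numbers. The function names and the chosen center frequencies are for illustration only.

```python
import math

def hz_to_bark(f):
    """Schroeder's approximation of the Bark scale: z = 6 * asinh(f / 600)."""
    return 6.0 * math.asinh(f / 600.0)

def bark_to_hz(z):
    """Inverse of hz_to_bark."""
    return 600.0 * math.sinh(z / 6.0)

# Width in Hz of a 1 Bark and a 3 Bark band centred on a few frequencies
for center_hz in (500, 1000, 3000):
    z = hz_to_bark(center_hz)
    for span in (1.0, 3.0):
        lo, hi = bark_to_hz(z - span / 2), bark_to_hz(z + span / 2)
        print(f"{center_hz} Hz, {span} Bark: {lo:.0f}-{hi:.0f} Hz ({hi - lo:.0f} Hz wide)")
```

Because the scale is compressive, a fixed Bark span covers more hertz at higher frequencies. A "narrow" band therefore means narrow on the auditory scale, not in absolute hertz.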

8 Which features? –no knowledge is better than wrong knowledge –data cannot lie –speech evolved to be heard: data-derived processing is consistent with human-like processing (minus the irrelevant components of human cognitive processing) [figure: time-frequency plane → data-guided processing → features]

9 WHY data-guided processing? –features: some function of class posteriors –class posteriors form the most efficient feature set [e.g. Fukunaga] –posteriors of which classes? [figure: time-frequency plane → data-guided (trained on data) processing → features]
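
The "class posteriors" on this slide are P(class | observation). This small sketch obtains them with Bayes' rule from class-conditional likelihoods and priors. The numbers are made up, and the function name `class_posteriors` is hypothetical. In the data-guided setting the slide describes, the posteriors would instead come from a classifier trained on data.

```python
import numpy as np

def class_posteriors(log_likelihoods, log_priors):
    """P(class | x) from log p(x | class) and log P(class) via Bayes' rule."""
    joint = log_likelihoods + log_priors            # log p(x, class)
    joint -= joint.max(axis=-1, keepdims=True)      # subtract the max for numerical stability
    post = np.exp(joint)
    return post / post.sum(axis=-1, keepdims=True)  # dividing by the sum normalizes by p(x)

# hypothetical 3-class example with two observation frames
log_lik = np.log(np.array([[0.2, 0.5, 0.3],
                           [0.1, 0.1, 0.8]]))
log_prior = np.log(np.array([0.5, 0.25, 0.25]))
post = class_posteriors(log_lik, log_prior)
decisions = post.argmax(axis=1)                     # minimum-error (Bayes) decisions
```

Choosing the class with the largest posterior gives the minimum-error decision. For that reason posteriors are a compact feature set for classification.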

10 Speech Events: signal → frequency-selective hearing → event detection, p(event, frequency) → class (phoneme?) detection
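
The chain on this slide can be sketched as a two-stage detector. First, each frequency band has its own detector that outputs a distribution over events. This stands in for the slide's p(event, frequency) as a matrix of per-band event posteriors. Second, a merger maps the event posteriors of all bands to class posteriors. The sizes and the single linear-plus-softmax stages are hypothetical. The weights are random placeholders, not trained values.

```python
import numpy as np

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)   # stabilize before exponentiating
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def detect_classes(band_features, band_weights, merger_weights):
    """Two-stage sketch: per-band event posteriors, then a merger over all bands."""
    # stage 1: one detector per band -> event posteriors, shape (bands, events)
    event_post = np.stack([softmax(w @ f) for w, f in zip(band_weights, band_features)])
    # stage 2: the merger sees every band's event posteriors -> class posteriors
    return softmax(merger_weights @ event_post.ravel())

rng = np.random.default_rng(1)
n_bands, n_in, n_events, n_classes = 15, 101, 4, 10      # hypothetical sizes
feats = rng.standard_normal((n_bands, n_in))             # e.g. one temporal pattern per band
W_band = rng.standard_normal((n_bands, n_events, n_in)) * 0.1
W_merge = rng.standard_normal((n_classes, n_bands * n_events)) * 0.1
p_class = detect_classes(feats, W_band, W_merge)
```

In a real system both stages would be trained on labelled data. The sketch only shows how information flows from bands to events to classes.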

11 TRAP and TANDEM [figure: time-frequency plane feeding stacked data-processing blocks (trained systems); outputs labelled “class posteriors” and “some function of phoneme posteriors”; panels labelled TRAP and TANDEM]
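
One commonly described TANDEM variant computes "some function of phoneme posteriors" in two steps. It takes the logarithm of the posteriors and then decorrelates them with PCA (a Karhunen-Loève transform) before passing them to a conventional HMM/GMM recognizer. This sketch follows that variant. The frame and class counts, the 25 retained components and the name `tandem_features` are assumptions.

```python
import numpy as np

def tandem_features(posteriors, n_components=None, eps=1e-10):
    """Log phoneme posteriors decorrelated by PCA, one row per frame."""
    logp = np.log(posteriors + eps)              # log compresses the skewed posterior values
    centred = logp - logp.mean(axis=0)           # PCA operates on zero-mean data
    # rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    if n_components is not None:
        vt = vt[:n_components]                   # keep only the leading directions
    return centred @ vt.T

rng = np.random.default_rng(0)
raw = rng.random((200, 40))                      # toy: 200 frames x 40 phoneme classes
post = raw / raw.sum(axis=1, keepdims=True)      # each row sums to 1, like posteriors
feats = tandem_features(post, n_components=25)
```

The decorrelation step matters because diagonal-covariance Gaussian mixtures model uncorrelated features much better than raw, highly correlated posteriors.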

