
Slide 1: Experiments on the "stir-sir" paradigm using large vocabulary ASR. Kalle Palomäki, Adaptive Informatics Research Centre, Helsinki University of Technology

Slide 2: Introduction
Aim: test large vocabulary ASR on the stir-sir paradigm.
Motivation: a large vocabulary ASR system has learned phoneme models that are close to those of human listeners.
ASR: a newly trained English-English (British English) large vocabulary recogniser
– Trained on read Wall Street Journal articles
– Sampling rate 16 kHz

Slide 3: ASR details
Standard features: Mel-frequency cepstral coefficients (MFCCs) + power + deltas + accelerations
Triphone HMMs with acoustic likelihoods modelled by Gaussian mixture models
Supervised adaptation using constrained maximum likelihood linear regression (CMLLR)
– Can be formulated as a linear (affine) feature transformation (see the sketch below)
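The following is a minimal sketch of such a feature pipeline and of CMLLR viewed as an affine feature transform. It is not the actual recogniser's configuration: librosa, the parameter choices, and the placeholder transform matrices A and b are assumptions for illustration only.

```python
import numpy as np
import librosa

def extract_features(wav_path, sr=16000, n_mfcc=13):
    """MFCCs + log-energy ("power"), with delta and acceleration coefficients."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, T)
    deltas = librosa.feature.delta(mfcc)                     # first-order derivatives
    accels = librosa.feature.delta(mfcc, order=2)            # second-order (accelerations)
    log_energy = np.log(librosa.feature.rms(y=y) + 1e-10)    # frame-wise log-energy
    feats = np.vstack([mfcc, log_energy, deltas, accels])    # (3*n_mfcc + 1, T)
    return feats.T                                           # one feature vector per frame

def apply_cmllr(feats, A, b):
    """CMLLR as an affine feature transform, x_hat = A x + b.
    In the real system A and b are estimated from the adaptation data;
    here they are just placeholders of matching dimensionality."""
    return feats @ A.T + b
```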

Slide 4: Experiments
Three measures were tested (a toy sketch of the last two follows below):
– Free recognition result
– Recogniser chooses between "next_you'll_get_sir_to_click_on" and "next_you'll_get_stir_to_click_on"
– Temporally averaged log-probability of "t"
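A minimal sketch of the two-alternative forced choice and of the averaged log-probability measure. The scoring interface `score_fn` and the per-frame phone log-probability array are hypothetical stand-ins for the recogniser's internals, not its actual API.

```python
import numpy as np

def forced_choice(score_fn, audio, hyp_a, hyp_b):
    """Two-alternative forced choice: return the transcript whose forced
    alignment against the audio gets the higher acoustic log-likelihood.
    score_fn(audio, transcript) is a hypothetical scoring call."""
    return hyp_a if score_fn(audio, hyp_a) >= score_fn(audio, hyp_b) else hyp_b

def mean_log_prob_t(frame_log_probs, t_index, start, end):
    """Temporally averaged log-probability of the phone /t/ over the frames
    [start, end) aligned to the sir/stir region.
    frame_log_probs: array of shape (T, n_phones) with per-frame log-probabilities."""
    return float(np.mean(frame_log_probs[start:end, t_index]))
```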

Slide 5: Experiments
Experiment 1: "dry" models with no adaptation
Experiment 2: "dry" models adapted to the matching condition
– Near-near adapted with near-near data
– Far-far adapted with far-far data
– Supervised adaptation with utterances at the ends of the continuum
Experiment 3: "dry" models adapted to both "near-near" and "far-far"
– Supervised adaptation with utterances at the ends of the continuum

Slide 6: Exp. 1: "dry" models, no adaptation
Free recognition:
– near-near: "nantz two-a-days so far", "nursing care so far"
– far-far: "nantz th", "NMS death", or nothing at all
Choice between "next_you'll_get_sir_to_click_on", "next_you'll_get_stir_to_click_on" and the silence model:
– near-near: the choice changes between conditions 08 and 09
– far-far: silence in every condition

Slide 7: Exp. 1: "dry" models, no adaptation (results figure)

Slide 8: Exp. 2: "dry" models, adapted to the matching condition
Free recognition:
– near-near: "next month though the khon"
– far-far: "next he'll throw the khon"
Choice between "next_you'll_get_sir_to_click_on", "next_you'll_get_stir_to_click_on" and the silence model:
– near-near: the choice changes between conditions 03 and 04
– far-far: "sir" in every condition

Slide 9: Exp. 2: "dry" models, adapted to the matching condition (results figure)

Slide 10: Exp. 3: "dry" models, adapted to both conditions
Free recognition:
– near-near: "next month though the khon"
– far-far: "next month khon" or "nantz khon"
Choice between "next_you'll_get_sir_to_click_on", "next_you'll_get_stir_to_click_on" and the silence model:
– the choice switches between the sentences erratically

Slide 11: Exp. 3: "dry" models, adapted to both conditions (results figure)

Slide 12: Discussion and future directions
Current results are "unconvincing":
– Poor free recognition performance
– Especially poor far-far performance
– It may be hard to reach sensitivity similar to that of human listeners
Tricks to work around the poor performance (a sketch of the a priori mask idea follows below):
– Cooke (2006) uses a priori masks to find glimpses of speech
– Choose between two sentences rather than free recognition
– Measure log-probability instead of recognition performance
Open question: how to model compensation, which is the main issue
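The a priori masks mentioned above are oracle binary masks marking the time-frequency cells where speech dominates the noise ("glimpses"). The sketch below illustrates only that idea; it is not Cooke's implementation, and the STFT settings and SNR threshold are assumptions.

```python
import numpy as np
import librosa

def a_priori_mask(clean, noise, n_fft=512, hop=160, snr_thresh_db=0.0):
    """Oracle ("a priori") binary mask: keep time-frequency cells where the
    clean speech power exceeds the noise power by snr_thresh_db.
    Requires separate access to clean and noise signals of equal length."""
    S = np.abs(librosa.stft(clean, n_fft=n_fft, hop_length=hop)) ** 2
    N = np.abs(librosa.stft(noise, n_fft=n_fft, hop_length=hop)) ** 2
    local_snr_db = 10.0 * np.log10((S + 1e-12) / (N + 1e-12))
    return local_snr_db > snr_thresh_db   # True marks a "glimpsed" (reliable) cell
```

In a missing-data or glimpsing recogniser, only the cells marked True would be treated as reliable evidence.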

