Diamantino Caseiro and Isabel Trancoso INESC/IST, 2000 Large Vocabulary Recognition Applied to Directory Assistance Services.


1 Diamantino Caseiro and Isabel Trancoso
INESC/IST, 2000
Large Vocabulary Recognition Applied to Directory Assistance Services

2 Summary
SPEECHDAT telephone corpus
Training of general purpose acoustic models
Directory experiments and results
Conclusions and future work

3 Corpus Description
Multilingual telephone speech corpus
–SPEECHDAT(M): 1000 speakers, male 45%, female 55%
–SPEECHDAT II: 4000 speakers, male 46%, female 54%

4 Sub-Corpus Used
Train and Development (80% of speakers)
–W - Phonetically rich words (7h)
–A - Phonetically rich sentences (63h)
Test (20% of speakers)
–Directory Assistance Words (SPEECHDAT II)
  O1 - Spontaneous forename
  O2 - Spontaneous city name
  O3 - Read city name (set of 500)
  O7 - Read name and surname (set of 150)
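
The 80%/20% split above is stated in terms of speakers, which suggests that no speaker contributes to both the training and the test material. A minimal sketch of such a speaker-level partition follows; the speaker identifiers, the fixed seed, and the function name `split_speakers` are hypothetical, and the count of 4000 is borrowed from the corpus slide purely for illustration.

```python
import random

def split_speakers(speaker_ids, train_fraction=0.8, seed=0):
    """Partition speakers (not utterances) into train/dev and test sets."""
    ids = sorted(set(speaker_ids))
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    rng.shuffle(ids)
    cut = int(round(train_fraction * len(ids)))
    return set(ids[:cut]), set(ids[cut:])

# Hypothetical speaker identifiers, for illustration only.
speakers = [f"spk{i:04d}" for i in range(4000)]
train_dev, test = split_speakers(speakers)
print(len(train_dev), len(test))
```

Splitting on speakers rather than utterances keeps the test set speaker-independent, which is what a directory assistance service faces in practice.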

5 Feature Extraction
MFCC (Mel Frequency Cepstral Coefficients)
–14 Cepstra + 14 Δ Cepstra + Energy + Δ Energy
–Speech signal band-limited between 200 and 3800 Hz
–Hamming window of 25 ms every 10 ms
–Cepstral Mean Subtraction (CMS)
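
The framing and normalisation steps listed above can be sketched as follows. This is only an outline under stated assumptions: the 8 kHz sampling rate is assumed (it is typical for telephone speech and consistent with the 3800 Hz upper band limit, but the slides do not state it), the delta formula is a simple symmetric difference chosen for illustration, and the mel filterbank and cepstral transform themselves are omitted.

```python
import math

SAMPLE_RATE = 8000                        # assumption: not stated in the slides
FRAME_LEN = SAMPLE_RATE * 25 // 1000      # 25 ms window -> 200 samples
FRAME_SHIFT = SAMPLE_RATE * 10 // 1000    # 10 ms shift  -> 80 samples
FEATURE_DIM = 14 + 14 + 1 + 1             # cepstra + delta cepstra + energy + delta energy

def hamming(n):
    """Standard Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_signal(signal):
    """Cut the signal into overlapping, Hamming-windowed frames."""
    window = hamming(FRAME_LEN)
    frames = []
    for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_SHIFT):
        chunk = signal[start:start + FRAME_LEN]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-utterance mean from every cepstral dimension."""
    n, dim = len(cepstra), len(cepstra[0])
    mean = [sum(frame[d] for frame in cepstra) / n for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in cepstra]

def deltas(feats):
    """Symmetric-difference deltas (illustrative choice, edges are clamped)."""
    last = len(feats) - 1
    out = []
    for t in range(len(feats)):
        prev, nxt = feats[max(t - 1, 0)], feats[min(t + 1, last)]
        out.append([(b - a) / 2 for a, b in zip(prev, nxt)])
    return out
```

With these settings each frame yields a 30-dimensional vector, and consecutive frames overlap by 15 ms.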

6 Acoustic Modeling
Left-right continuous density HMMs
–39 Portuguese phones
–Silence and filler models with forward and backward skips
Gender-dependent models
HMM: Hidden Markov Model
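
To illustrate what "left-right" means for the phone models, the sketch below builds a transition matrix in which each state can only repeat or advance. The number of emitting states (3) and the self-loop probability are assumptions made for the example; the slides do not give them. Silence and filler models with forward and backward skips would have additional non-zero entries that are not shown here.

```python
def left_right_transitions(num_states=3, self_loop=0.6):
    """Rows are current states; columns are next states, the last being the exit."""
    matrix = [[0.0] * (num_states + 1) for _ in range(num_states)]
    for i in range(num_states):
        matrix[i][i] = self_loop            # stay in the same state
        matrix[i][i + 1] = 1.0 - self_loop  # move one state to the right
    return matrix

transitions = left_right_transitions()
for row in transitions:
    print(row)
```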

7 Acoustic Modeling
Word-internal tied-state triphones
–Tree-based clustering
–13k triphones
–8498 shared states
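
Word-internal triphones take their context only from phones inside the same word, so the first and last phone of a word keep a one-sided context. The sketch below expands a phone sequence accordingly. The `left-phone+right` label notation is a common convention assumed here, and the phone sequence is a made-up example, not taken from the system's lexicon.

```python
def word_internal_triphones(phones):
    """Expand one word's phone sequence into context-dependent labels."""
    last = len(phones) - 1
    labels = []
    for i, phone in enumerate(phones):
        left = f"{phones[i - 1]}-" if i > 0 else ""      # no context across word start
        right = f"+{phones[i + 1]}" if i < last else ""  # no context across word end
        labels.append(f"{left}{phone}{right}")
    return labels

# Hypothetical four-phone word.
print(word_internal_triphones(["k", "a", "z", "a"]))
```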

8 Model Topology

9 Train
Train monophones
Create triphones by cloning monophones
Train triphones to separate the distributions
Cluster triphone states using decision tree
Synthesize unseen triphones
Loop
–Train triphones
–Increase number of mixtures
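
The final loop alternates re-estimation with growth of the Gaussian mixtures. The slides do not say how the number of mixtures is increased; one common approach, assumed here purely for illustration, is to split the heaviest component into two copies with halved weights and slightly perturbed means. The sketch uses 1-D Gaussians, and the perturbation factor of 0.2 standard deviations is likewise an assumption.

```python
import math

def split_heaviest(mixture, perturb=0.2):
    """mixture: list of (weight, mean, variance); returns one more component."""
    idx = max(range(len(mixture)), key=lambda k: mixture[k][0])
    weight, mean, var = mixture[idx]
    shift = perturb * math.sqrt(var)
    rest = mixture[:idx] + mixture[idx + 1:]
    return rest + [(weight / 2, mean - shift, var), (weight / 2, mean + shift, var)]

def increase_mixtures(mixture, target):
    """Grow the mixture one component at a time up to `target` components."""
    while len(mixture) < target:
        mixture = split_heaviest(mixture)
        # A real system would re-train the models here before splitting again.
    return mixture

grown = increase_mixtures([(1.0, 0.0, 1.0)], target=4)
print(len(grown), sum(w for w, _, _ in grown))
```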

10 Train
Development set results
–2356 phonetically rich words

11 Directory Tasks
Spontaneous forename
–Recognition using a set of 750 frequent names (1)
–Recognition using 640 names from transcriptions (2)
Spontaneous city name
–Open vocabulary, 500 cities
City name
–Closed vocabulary, 500 cities
Forename and surname
–Closed vocabulary, 150 forenames and 150 surnames

12 Conclusions & Future Work
The results are promising, but further work is needed:
–Improve general purpose models
–Create task-specific models
–Fine-tune the recognition

