Creating Speech Recognizers Quickly Björn Bringert Department of Computer Science and Engineering Chalmers.

1 Creating Speech Recognizers Quickly Björn Bringert bringert@cs.chalmers.se Department of Computer Science and Engineering Chalmers University of Technology and Göteborg University

2 My goals ● For me – Understand how to build a simple speech recognizer. ● For others – Make it easier to build speech recognizer prototypes. – Allow quick experimentation with different options.

3 Existing components ● Grammatical Framework (GF) – GF is a high-level grammar formalism. – We'll assume that there is a GF grammar for what the recognizer should recognize. ● HTK (Hidden Markov Model Toolkit) – Free toolkit for building and using Hidden Markov Models. – General, but geared towards speech recognition.

4 Things you need to do ● Create a pronunciation dictionary – Now automatic for Swedish (results are still low quality). ● Create an acoustic model – Previously with HTK: lots of semi-automatic steps. – Now automatic given data (results are still low quality). ● Create a recognition grammar – Can now be generated from a GF grammar.

5 Recording data ● Generate utterances to record – Automatic given a GF grammar. ● Record utterances – A simple program prompts for each utterance, and records it.
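
To illustrate the idea of generating the utterances to record from a grammar, here is a minimal Python sketch. It does not use GF: the grammar is a hypothetical toy context-free grammar stored in a dict (`TOY_RULES`), and `expand` is an illustrative helper that simply enumerates every sentence the toy grammar can produce. It assumes the grammar is non-recursive, since a recursive one would have infinitely many sentences.

```python
from itertools import product

# Hypothetical toy grammar (not GF): nonterminal -> list of alternatives,
# where each alternative is a list of nonterminals and/or words.
TOY_RULES = {
    "order": [["I", "want", "item"], ["no", "item"]],
    "item": [["pizza"], ["beer"]],
}

def expand(symbol, rules):
    """Yield every word sequence derivable from `symbol`.

    Assumes the grammar is non-recursive, otherwise this never terminates.
    """
    if symbol not in rules:
        # Anything that is not a nonterminal is treated as a plain word.
        yield [symbol]
        return
    for alternative in rules[symbol]:
        # Expand each symbol of the alternative, then combine the results.
        parts = [list(expand(s, rules)) for s in alternative]
        for combo in product(*parts):
            yield [word for part in combo for word in part]

# One prompt per sentence; a recording tool could show these one at a time.
prompts = [" ".join(words) for words in expand("order", TOY_RULES)]
```

A recording program could then loop over `prompts`, display each one, and save the corresponding audio under a matching file name.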

6 Write pronunciation dictionary ● Markus Forsberg has implemented basic Swedish pronunciation rules. – We use these to generate pronunciations of all word forms in the grammar. ● Can also use the Lexin database + Functional Morphology to generate better pronunciations – Lemma pronunciations from Lexin – Word forms from Functional Morphology
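
As a rough sketch of what rule-based dictionary generation looks like, the Python below maps each word to a phone sequence and writes one "word phone phone ..." entry per line, sorted by word. The letter-to-phone table `TOY_LETTER_TO_PHONE` is a made-up placeholder, not the Swedish rules mentioned above, and real pronunciation rules are context dependent rather than one letter at a time. The one-entry-per-line, sorted layout follows what HTK dictionaries usually look like; treat that detail as an assumption to check against the HTK documentation.

```python
# Hypothetical one-letter-per-phone table; real rules depend on context.
TOY_LETTER_TO_PHONE = {"k": "k", "a": "a", "t": "t"}

def pronounce(word, letter_to_phone):
    """Map each known letter to one phone; unknown letters are skipped."""
    return [letter_to_phone[ch] for ch in word.lower() if ch in letter_to_phone]

def build_dictionary(words, letter_to_phone):
    """Return dictionary lines 'word phone phone ...', sorted by word."""
    entries = {w: pronounce(w, letter_to_phone) for w in words}
    return [f"{w} {' '.join(phones)}" for w, phones in sorted(entries.items())]

dictionary_lines = build_dictionary(["tak", "katt"], TOY_LETTER_TO_PHONE)
```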

7 Build the acoustic model 1. Transcribe the data using the dictionary. 2. Parametrize the data. 3. Train monophone models. 4. Select the closest pronunciations (using models), retrain. 5. Copy monophone models to make triphone models. 6. Train triphone models.
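
Step 5 relies on relabelling each phone with its left and right neighbours. The sketch below shows only that relabelling, using the `left-phone+right` naming convention commonly seen in HTK setups, with shorter labels at word edges (word-internal triphones). The function name `word_internal_triphones` is illustrative, and the actual cloning and training of the models is done by the toolkit, not by this code.

```python
def word_internal_triphones(phones):
    """Relabel a word's monophones as word-internal triphones.

    Each phone gets its left neighbour as 'left-' and its right neighbour
    as '+right'; phones at the word edges only get the context that exists.
    """
    labels = []
    for i, phone in enumerate(phones):
        label = phone
        if i > 0:
            label = f"{phones[i - 1]}-{label}"
        if i < len(phones) - 1:
            label = f"{label}+{phones[i + 1]}"
        labels.append(label)
    return labels

example = word_internal_triphones(["s", "p", "iy", "ch"])
```

Because the number of distinct triphones grows quickly with the phone inventory, this step is one reason the amount of training data matters for the triphone models.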

8 Create a recognition grammar ● We need a grammar to guide the recognizer – Remember: “recognize speech” / “wreck a nice beach” ● Speech recognition grammars are not fun to write – Simple context-free grammars, or finite automata. – Can generate from a GF grammar: ● GSL (Nuance) ● JSGF (Java Speech API) ● SRGS (W3C standard) ● SLF (HTK)
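
To give a feel for what a generated recognition grammar looks like, the sketch below prints a small context-free grammar in JSGF syntax. The input is a hypothetical toy grammar held in a Python dict, not a GF grammar, and `to_jsgf` is an illustrative helper rather than GF's actual converter. It only covers plain sequences and alternatives.

```python
def to_jsgf(name, rules, start):
    """Render a toy context-free grammar as JSGF text.

    `rules` maps each nonterminal to a list of alternatives; symbols that
    are keys of `rules` become <rule> references, everything else is a word.
    """
    lines = ["#JSGF V1.0;", f"grammar {name};"]
    for lhs, alternatives in rules.items():
        rhs = " | ".join(
            " ".join(f"<{s}>" if s in rules else s for s in alt)
            for alt in alternatives
        )
        # Only the start rule is exported to the recognizer.
        prefix = "public " if lhs == start else ""
        lines.append(f"{prefix}<{lhs}> = {rhs};")
    return "\n".join(lines)

toy_rules = {
    "order": [["I", "want", "item"]],
    "item": [["pizza"], ["beer"]],
}
jsgf_text = to_jsgf("orders", toy_rules, "order")
```

Restricting the recognizer to such a grammar is what lets it prefer "recognize speech" over acoustically similar word sequences that the grammar does not allow.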

9 Evaluation ● Keep some of the recorded data for evaluation. ● Evaluation is automatic, using the transcribed data, the recognition grammar, and the pronunciation dictionary.

10 Evaluation results ● L: The phone string length used, 1 for monophones and 3 for triphones. ● TS: Number of training utterances. ● TW: Total number of words in the 20 test utterances. ● CS: Percentage of whole test utterances which were recognized correctly. ● Acc: The accuracy (CW - I). ● CW: Percentage of the test words which were recognized correctly. ● D: Number of deletions as percentage of the number of test words. ● S: Number of substitutions as percentage of the number of test words. ● I: Number of insertions as percentage of the number of test words.
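
The measures defined above can be computed from pairs of reference and recognized word sequences. The Python below is a minimal sketch under one stated assumption: it aligns each pair with a uniform-cost edit distance (every substitution, deletion, and insertion costs 1), which may differ from the alignment weights an evaluation tool such as HTK's uses. The names `align_counts` and `score` are illustrative.

```python
def align_counts(ref, hyp):
    """Return (hits, substitutions, deletions, insertions) for one
    minimum-edit alignment of `hyp` against `ref` (uniform costs)."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # Each cell holds (errors, hits, substitutions, deletions, insertions).
    table = [[None] * cols for _ in range(rows)]
    table[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, rows):
        table[i][0] = (i, 0, 0, i, 0)      # only deletions
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, 0, j)      # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            e, h, s, d, n = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, h + 1, s, d, n)
            else:
                diag = (e + 1, h, s + 1, d, n)
            e, h, s, d, n = table[i - 1][j]
            deletion = (e + 1, h, s, d + 1, n)
            e, h, s, d, n = table[i][j - 1]
            insertion = (e + 1, h, s, d, n + 1)
            table[i][j] = min(diag, deletion, insertion, key=lambda c: c[0])
    return table[-1][-1][1:]

def score(pairs):
    """Compute TW, CS, CW, D, S, I and Acc over (reference, hypothesis) pairs."""
    hits = subs = dels = ins = words = exact = 0
    for ref, hyp in pairs:
        h, s, d, n = align_counts(ref, hyp)
        hits, subs, dels, ins = hits + h, subs + s, dels + d, ins + n
        words += len(ref)
        exact += ref == hyp
    pct = lambda count: 100.0 * count / words
    cw, i_pct = pct(hits), pct(ins)
    return {"TW": words, "CS": 100.0 * exact / len(pairs), "CW": cw,
            "D": pct(dels), "S": pct(subs), "I": i_pct, "Acc": cw - i_pct}
```

Note that Acc can be negative when the recognizer inserts many extra words, which is why it is reported alongside CW rather than instead of it.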

11 Future work ● Try with more data. – How good can we make the recognizer with this simple method? ● Tweak model / recognizer parameters. – Automatic tweaking using evaluation and machine learning? ● Improve Swedish pronunciation generation. ● Generate more phonetically diverse utterances. ● Improve data collection tool for larger-scale recordings.

12 Conclusions ● Creating a prototype recognizer has been reduced to: – Writing a GF grammar. – Recording data. – Writing a pronunciation dictionary (automatic for Swedish). ● Quality is still low; we should try with more data. ● Provides a platform for efficient experimentation.