
1 Dep. of Communication Technology, Tom Brøndsted, Speech Communication 06, P.1
MM9 Speech Communication
MM8 summary
– Brush-up
– Conclusions (what you hopefully learned!)
MM9
– Standard Speech API
– Hello World

2 Types of Speech Recognisers
1. “rule grammar recognition” = “command & control recognition”
2. “dictation”, “large vocabulary recognition”
3. other types (e.g. “Speech Commands” on mobile phones, DTW)
From MM7

3 Exercise with Dictation
Dictation is not “general recognition”
– It depends on the “topic” of the text data used for LM training
– E.g. ViaVoice performs better for dictation of business letters than for dictation of fairy tales!
Dictation performs better after adaptation to the user
– It is not 100% speaker-independent!

4 Exercise with the calculator
Speech recognition is not the same as speech understanding!
Understanding requires
– Parsing
– Context analysis

5 Dialogue System (text)
[Figure: dialogue system architecture, after James Allen: Natural Language Understanding, 1995. Labels: Recognition; Synthesis; Grammar & lexicon; Acoustic models]

6 Exercise JHVite 10
dansk advokat har afsløret afdelingsingeniør
dansk advokat har afsløret afhængig afdelingsingeniør
almindelig dansk advokat har afsløret afdelingsingeniør
afhængig afdelingsingeniør angrede
begejstret advokat dominerer
advokat afviser begejstret afdelingsingeniør
advokat angrede
dansk advokat angrede
advokat afviser en begejstret afdelingsingeniør

7 Exercise JHVite 10
$adjektiv = dansk | afhængig | begejstret | almindelig;
$substantiv = advokat | afdelingsingeniør;
$transverb = afviser | har afsløret;
$intransverb = angrede | dominerer;
$det = en | den;
$np = [$det] {$adjektiv} $substantiv;
$vp = ($transverb [$np]) | $intransverb;
$s = $np $vp;
($s)

8 Exercise JHVite 10
Use variables that correspond to normal grammatical categories (noun, verb, subject, predicate etc.)
Test the grammar
– Does it cover all sentences of a test set?
– Does it generate only sentences that are likely to be input to the system?
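The first test question (coverage of a test set) can be checked mechanically. The following is a minimal sketch in Python, assuming that `[...]` in the grammar means "optional" and `{...}` means "zero or more repetitions", and that the exercise text splits into the nine sentences listed in `TESTSET`. The helper names `accepts` and `coverage` are hypothetical and not part of any recogniser toolkit.

```python
import re

# Terminal classes copied from the grammar of the exercise.
ADJEKTIV = r"(?:dansk|afhængig|begejstret|almindelig)"
SUBSTANTIV = r"(?:advokat|afdelingsingeniør)"
TRANSVERB = r"(?:afviser|har afsløret)"
INTRANSVERB = r"(?:angrede|dominerer)"
DET = r"(?:en|den)"

# $np = [$det] {$adjektiv} $substantiv;   [] = optional, {} = zero or more (assumed)
NP = rf"(?:{DET} )?(?:{ADJEKTIV} )*{SUBSTANTIV}"
# $vp = ($transverb [$np]) | $intransverb;
VP = rf"(?:{TRANSVERB}(?: {NP})?|{INTRANSVERB})"
# $s = $np $vp;
S = re.compile(rf"{NP} {VP}")


def accepts(sentence: str) -> bool:
    """Return True if the whole sentence is generated by $s."""
    return S.fullmatch(sentence.strip()) is not None


def coverage(testset):
    """Return (fraction of accepted sentences, list of rejected sentences)."""
    rejected = [s for s in testset if not accepts(s)]
    return 1 - len(rejected) / len(testset), rejected


# Assumed segmentation of the exercise sentences.
TESTSET = [
    "dansk advokat har afsløret afdelingsingeniør",
    "dansk advokat har afsløret afhængig afdelingsingeniør",
    "almindelig dansk advokat har afsløret afdelingsingeniør",
    "afhængig afdelingsingeniør angrede",
    "begejstret advokat dominerer",
    "advokat afviser begejstret afdelingsingeniør",
    "advokat angrede",
    "dansk advokat angrede",
    "advokat afviser en begejstret afdelingsingeniør",
]

ratio, rejected = coverage(TESTSET)
print(f"coverage: {ratio:.0%}, rejected: {rejected}")
```

The second question (over-generation) is harder to automate: because `{$adjektiv}` allows any number of adjectives, the grammar also accepts strings such as "dansk dansk dansk advokat angrede", which a test set alone will not reveal.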

9 WHAT IS A SPEECH API?
(Conservative) state-of-the-art speech technology
– command and control speech recognition
– dictation speech recognition
– speech synthesis
SPEECH API
SPEECH APPLICATION, e.g. spoken language dialogue system
(grammar) (mark-up language)

10 SAPI
Microsoft + vendors (IBM etc.)
– Cross-vendor API
– Platform: Windows 32 systems (NT, 2K, XP)
– COM interface, MS Visual C++ 4.0, and other MS products
SAPI-compliant speech products:
– MS Whisper (free!), + “any” modern speech recogniser/synthesizer for Windows

11 JSAPI
Sun Microsystems + vendors (Apple Computer, Inc., AT&T, Dragon Systems, IBM, Novell, Inc., Philips, Texas Instruments Incorporated)
– cross-vendor API
– cross-platform API
– Java
JSAPI-compliant speech products:
– ViaVoice for Linux (was free!) and Win32 systems, various speech synthesis systems

12 JSAPI packages
three packages (collections of objects)
– javax.speech
– javax.speech.synthesis
– javax.speech.recognition
standard extension to the Java platform (“x”)
Personal Java, Embedded Java

13 javax.speech
centralized mechanism for
– a) registering new speech engines, and
– b) selecting available speech engines (from an application)
a locale defines the supported language (e.g. de_CH = Swiss German)
Additional features define
– names of speakers that have trained the recognizer,
– available synthetic voices
pausing/resuming, notification of events etc.

14 javax.speech.synthesis
Method javax.speech.synthesis.Synthesizer.speakPlainText
– argument: simple orthographic text
Method javax.speech.synthesis.Synthesizer.speak
– argument: JSML text, e.g. Message from John Doe regarding magazine article.

15 javax.speech.recognition
Interface javax.speech.recognition.FinalRuleResult
Interface javax.speech.recognition.Result
1-best list/n-best list; for each item in the list:
– list of tokens (“words”)
– list of tags
– name of the JSGF grammar accepting the input
– name of the public rule accepting the input

16 Java Speech Grammar Format (JSGF)
EBNF-equivalent “traditional style” (like SAPI's CFG format)
plus:
– Java-adapted style (e.g. grammar URLs!)
– “semantic tags” (synonymy, multilinguality)
– weights (enabling n-gram statistics)
– unification-grammar-like “action tags” (Sun Microsystems proposal)

17 JSGF: Java-adapted style
JSGF header: grammar name/import, e.g.
grammar dk.mydomain.application.mailBrowser
import
documentation comments /** - */
public rules vs. non-public (“private”) rules
public = ;
= ;
=man | woman | bird;

18 JSGF Tags
handling synonymy:
<country> = Australia {Oz} | (United States) {USA} | America {USA} | (U S of A) {USA};
handling multilinguality:
<greeting> = (howdy | good morning) {hi};
<greeting> = (ohayo | ohayogozaimasu) {hi};
<greeting> = (guten tag) {hi};
<greeting> = (bon jour) {hi};
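The point of the tags is that the application sees only the tag, never the surface wording or the language. A minimal sketch in Python of that idea, using the phrases and tags from the examples above; the table names `COUNTRY_TAGS` and `GREETING_TAGS` and the function `tag_for` are hypothetical illustrations, not part of JSAPI.

```python
# Phrase -> tag tables mirroring the synonymy and multilinguality examples.
COUNTRY_TAGS = {
    "australia": "Oz",
    "united states": "USA",
    "america": "USA",
    "u s of a": "USA",
}

GREETING_TAGS = {
    "howdy": "hi",               # English
    "good morning": "hi",
    "ohayo": "hi",               # Japanese
    "ohayogozaimasu": "hi",
    "guten tag": "hi",           # German
    "bon jour": "hi",            # French
}


def tag_for(utterance, table):
    """Return the tag attached to a recognised phrase, or None if unknown."""
    key = " ".join(utterance.lower().split())  # normalise case and spacing
    return table.get(key)


# The application logic only needs to branch on the tag.
for phrase in ("America", "U S of A", "United States"):
    print(phrase, "->", tag_for(phrase, COUNTRY_TAGS))
```

With this design, adding a new synonym or a new language means adding table rows (or grammar alternatives), while the code that reacts to `USA` or `hi` stays unchanged.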

19 JSGF Weights
probabilistic grammars (e.g. bigrams, trigrams) in JSGF
<size> = /10/ small | /2/ medium | /1/ large;
equivalent to probabilities (each weight divided by the sum of the weights, 13)
<size> = /10/13/ small | /2/13/ medium | /1/13/ large;
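The conversion from weights to probabilities is a plain normalisation. A minimal sketch in Python, using the weights 10, 2 and 1 from the example above; the function name `normalise_weights` is a hypothetical helper.

```python
from fractions import Fraction


def normalise_weights(alternatives):
    """Turn relative weights into probabilities that sum to 1."""
    total = sum(alternatives.values())
    return {word: Fraction(weight, total) for word, weight in alternatives.items()}


size_weights = {"small": 10, "medium": 2, "large": 1}
size_probs = normalise_weights(size_weights)

for word, prob in size_probs.items():
    print(f"{word}: {prob} = {float(prob):.3f}")
```

Exact fractions are used here only to make the correspondence with 10/13, 2/13 and 1/13 visible; floating-point values would serve equally well in practice.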

20 JSGF Action Tags (proposal)
Unification-grammar-like percolation mechanism, but no structure sharing/feature constraints
= (julie | "julie kay") {
cat = properNoun; // The word is a proper noun.
= juliek; // User's ID.
date = permanent; // Indicates permanent entry in address book.
};
= (( | | ){$user}) { this = $user; };

21 Language Models

22 N-grams
Sentence:
\[ S = w_1 w_2 \dots w_Q \]
Ideal sentence probability:
\[ P(S) = P(w_1 w_2 \dots w_Q) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1 w_2) \cdots P(w_Q \mid w_1 w_2 \dots w_{Q-1}) \]
Approximate conditional word probability:
\[ P(w_Q \mid w_1 w_2 \dots w_{Q-1}) \approx p(w_Q \mid w_{Q-N+1} \dots w_{Q-1}) \]
where N is a constant “windowing” size: Unigram (N=1), Bigram (N=2), Trigram (N=3)
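For N=2 the approximation can be tried out directly. The following minimal sketch in Python estimates bigram probabilities by relative frequency and multiplies them along a sentence. The three-sentence corpus and the boundary markers `<s>` and `</s>` are assumptions made for illustration; the function names are hypothetical.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count histories and bigrams, with sentence-boundary markers."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))   # (previous word, current word)
    return histories, bigrams


def sentence_probability(sentence, histories, bigrams):
    """P(S) approximated as the product of p(w_i | w_{i-1})."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    probability = 1.0
    for prev, cur in zip(words, words[1:]):
        if histories[prev] == 0:
            return 0.0                           # unseen history
        probability *= bigrams[(prev, cur)] / histories[prev]
    return probability


# Toy training corpus (assumed for illustration).
corpus = ["advokat angrede", "advokat dominerer", "dansk advokat angrede"]
histories, bigrams = train_bigrams(corpus)

p_seen = sentence_probability("advokat angrede", histories, bigrams)
p_new = sentence_probability("dansk advokat dominerer", histories, bigrams)
p_bad = sentence_probability("angrede advokat", histories, bigrams)
print(p_seen, p_new, p_bad)
```

Note that "dansk advokat dominerer" never occurs in the corpus but still receives a non-zero probability, because all of its bigrams were observed, whereas "angrede advokat" gets probability 0 without smoothing.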

23 Trigram smoothing (Jelinek)
Used when there are insufficient data for real trigrams
\[ P(w_3 \mid w_1 w_2) = p_1 \frac{F(w_1, w_2, w_3)}{F(w_1, w_2)} + p_2 \frac{F(w_1, w_2)}{F(w_1)} + p_3 \frac{F(w_1)}{\sum_i F(w_i)} \]
Where:
F is the number of occurrences of the string in its argument
\(\sum_i F(w_i)\) is the number of words in the corpus
p_1, p_2, p_3 are positive values and p_1 + p_2 + p_3 = 1
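The idea of mixing trigram, bigram and unigram relative frequencies can be sketched in a few lines of Python. Note the assumption: this sketch uses the widely used interpolation form in which the lower-order terms condition on the most recent words, F(w2,w3)/F(w2) and F(w3)/ΣF(wi), which differs in indexing from the formula as written on the slide. The weights 0.6, 0.3 and 0.1 are arbitrary illustrative values (in practice they are estimated from held-out data), and all function names are hypothetical.

```python
from collections import Counter


def train(tokens):
    """Count unigrams, bigrams and trigrams in a token sequence."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri


def ratio(numerator, denominator):
    """Relative frequency, defined as 0 when the history was never seen."""
    return numerator / denominator if denominator else 0.0


def smoothed_trigram(w1, w2, w3, uni, bi, tri, weights=(0.6, 0.3, 0.1)):
    """Linear interpolation of trigram, bigram and unigram estimates."""
    p1, p2, p3 = weights
    if min(weights) <= 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    total_words = sum(uni.values())
    return (p1 * ratio(tri[(w1, w2, w3)], bi[(w1, w2)])
            + p2 * ratio(bi[(w2, w3)], uni[w2])
            + p3 * ratio(uni[w3], total_words))


tokens = "a b c a b d a b c".split()
uni, bi, tri = train(tokens)

p_seen = smoothed_trigram("a", "b", "c", uni, bi, tri)    # trigram observed
p_unseen = smoothed_trigram("c", "b", "d", uni, bi, tri)  # trigram never observed
print(p_seen, p_unseen)
```

The unseen trigram still gets a non-zero estimate because the bigram and unigram terms contribute, which is the purpose of the smoothing.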

24 Clustering words in N-grams
N-grams of word classes, categorical N-grams:
– Words are “replaced” by (semantic, syntactic) categories before training (e.g. “w_day” for Monday, Tuesday...)
Data-driven clustering
Stemming (Porter)
…
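Replacing words by categories before training can be sketched as a simple preprocessing step. This minimal Python example uses the “w_day” category mentioned above; the class table, the two-sentence corpus and the function names are assumptions made for illustration.

```python
from collections import Counter

# Hand-written word classes: every weekday maps to the category "w_day".
WORD_CLASSES = {
    day: "w_day"
    for day in ("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
}


def to_classes(tokens):
    """Replace each word by its category; words without a category stay as they are."""
    return [WORD_CLASSES.get(token, token) for token in tokens]


def class_bigram_counts(sentences):
    """Count bigrams after the words have been mapped to categories."""
    counts = Counter()
    for sentence in sentences:
        tokens = to_classes(sentence.split())
        counts.update(zip(tokens, tokens[1:]))
    return counts


corpus = ["see you on Monday", "call me on Tuesday"]
counts = class_bigram_counts(corpus)
print(counts[("on", "w_day")])
print(to_classes("on Friday".split()))
```

Both training sentences now contribute to the same bigram ("on", "w_day"), and "on Friday", which never occurs in the corpus, maps onto that bigram as well. This is how categorical N-grams reduce data sparsity.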

25 N-gram problems
Long-distance dependencies exceeding n:
– [kommoden/bordet/stolene] i værelset på tredje etage skal males [rød, rødt, røde]
– (English: [the dresser/the table/the chairs] in the room on the third floor is to be painted red; the form of the final adjective, common gender, neuter or plural, depends on the first word of the sentence)
Stochastic grammars “freeze” human verbal behaviour at the state reflected in the training data. The verbal behaviour may change. Adaptive approach?
Finding corpora reflecting how humans will communicate with the final system
– (Human-human dialogues vs. WOZ experiments)

