© 2013 by Larson Technical Services

1 Outline
Grammar-based speech recognition
Statistical language model-based recognition
Speech Synthesis
Dialog Management
Natural Language Processing

2 Speech Synthesis (Text-To-Speech, TTS)
Processing stages, each with the knowledge source it draws on:
Structure Analysis (Structure Rules)
Text Normalization (Abbreviation and Acronym Database)
Text-to-phoneme Conversion (Pronunciation Lexicon)
Prosody Analysis (Prosody Rules)
Waveform Production (Phoneme-to-sound Database)

3 Concatenated vs. Parameter-based Speech Synthesis
Concatenated: isolate phonemes from recorded speech (“The dog barked”: dh eh d ao g b ah er k eh d), then concatenate the stored units to produce “red car” (er ed d k ah er).
Parameter-based: generate speech for “red car” (er ed d k ah er) from voice parameters.

4 Speech Synthesis ML
Stages: Structure Analysis, Text Normalization, Text-to-Phoneme Conversion, Prosody Analysis, Waveform Production
Structure Analysis
Markup support: p, s
Non-markup behavior: infer structure by automated text analysis
Critic: I don’t want to use TTS; it’s too difficult to understand.
Response: Developers can replay audio files, use TTS, or a combination of both. Developers can rely upon defaults from the TTS engine, or specify commands to override defaults.

5 Before and after Structure Analysis
Before structure analysis:
Dr. Smith lives at 214 Elm Dr. He weighs 214 lb. He plays bass guitar. He also likes to fish; last week he caught a 19 lb. bass.
After structure analysis:
<p>
<s>Dr. Smith lives at 214 Elm Dr.</s>
<s>He weighs 214 lb.</s>
<s>He plays bass guitar.</s>
<s>He also likes to fish; last week he caught a 19 lb. bass.</s>
</p>
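The sentence boundaries on this slide can be approximated with a small heuristic. The following is a minimal Python sketch, not how any particular TTS engine performs structure analysis. The function names `split_sentences` and `to_ssml` are hypothetical, and the rule for telling the title "Dr." from the street suffix "Dr." is an assumption chosen only to fit the slide's example.

```python
from xml.sax.saxutils import escape


def split_sentences(text):
    """Split text into sentences with a toy rule set.

    Assumption: a token ending in "." closes a sentence when the next token
    starts with a capital letter, unless the token is the title "Dr."
    (taken to be a title when it opens a sentence or follows a lowercase word).
    """
    tokens = text.split()
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if not tok.endswith("."):
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        is_title = tok == "Dr." and (
            len(current) == 1 or current[-2][0].islower()
        )
        if nxt is None or (nxt[0].isupper() and not is_title):
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


def to_ssml(text):
    """Wrap each detected sentence in <s> and the whole text in <p>."""
    body = "\n".join(f"  <s>{escape(s)}</s>" for s in split_sentences(text))
    return f"<p>\n{body}\n</p>"
```

A real engine would rely on far richer rules or a trained model; when its guess is wrong, explicit `p` and `s` markup overrides it.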

6 Speech Synthesis ML
Text Normalization
Markup support: say-as for dates, times, etc.; sub for aliasing
Non-markup behavior: automatically identify and convert constructs

7 After Text Normalization
<p>
<s><sub alias="doctor">Dr.</sub> Smith lives at 214 Elm <sub alias="drive">Dr.</sub></s>
<s>He weighs 214 <sub alias="pounds">lb.</sub></s>
<s>He plays bass guitar.</s>
<s>He also likes to fish; last week he caught a 19 <sub alias="pound">lb.</sub> bass.</s>
</p>
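As a rough illustration of the non-markup behavior, the abbreviation expansion on this slide could be sketched in Python as follows. This is a toy under stated assumptions, not an engine's actual algorithm. The function `normalize` and its two helper callbacks are hypothetical, it expects one sentence at a time (that is, text already split by structure analysis), and the rules for choosing "doctor" versus "drive" and "pound" versus "pounds" are guesses that fit the slide's example.

```python
import re


def _expand_dr(match):
    # Assumption: "Dr." followed by a capitalized word is a title,
    # otherwise it is read as the street suffix "drive".
    rest = match.string[match.end():]
    alias = "doctor" if re.match(r" [A-Z]", rest) else "drive"
    return f'<sub alias="{alias}">Dr.</sub>'


def _expand_lb(match):
    # Assumption: "lb." followed by another word is used adjectivally
    # ("a 19 pound bass"); otherwise it is the plural "pounds".
    rest = match.string[match.end():]
    alias = "pound" if re.match(r" \w", rest) else "pounds"
    return f'{match.group(1)} <sub alias="{alias}">lb.</sub>'


def normalize(sentence):
    """Wrap known abbreviations in SSML <sub> elements (one sentence at a time)."""
    out = re.sub(r"\bDr\.", _expand_dr, sentence)
    out = re.sub(r"(\d+) lb\.", _expand_lb, out)
    return out
```

For example, `normalize("He weighs 214 lb.")` wraps `lb.` with the alias `pounds`, while the same abbreviation in "a 19 lb. bass." receives `pound`.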

8 Speech Synthesis ML
Text-to-Phoneme Conversion
Markup support: phoneme, say-as
Non-markup behavior: look up in pronunciation dictionary

9 After Text-to-Phoneme Conversion
<p>
<s><sub alias="doctor">Dr.</sub> Smith lives at <say-as interpret-as="address">214</say-as> Elm <sub alias="drive">Dr.</sub></s>
<s>He weighs <say-as interpret-as="number">214</say-as> <sub alias="pounds">lb.</sub></s>
<s>He plays <phoneme alphabet="ipa" ph="beɪs">bass</phoneme> guitar.</s>
<s>He also likes to fish; last week he caught a <say-as interpret-as="number">19</say-as> <sub alias="pound">lb.</sub> <phoneme alphabet="ipa" ph="bæs">bass</phoneme>.</s>
</p>
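The two pronunciations of "bass" on this slide are a homograph problem: the spelling is the same and only context decides. A minimal Python sketch of one way to pick a pronunciation is shown below. The cue-word sets in `BASS_CUES` and the function `pronounce_bass` are invented for illustration; a production engine would more likely use part-of-speech tagging or a statistical model.

```python
import re

# Hypothetical context cues for each IPA pronunciation of "bass".
BASS_CUES = {
    "beɪs": {"guitar", "plays", "band", "drum"},   # the instrument
    "bæs": {"fish", "caught", "lake", "lb"},       # the fish
}


def pronounce_bass(sentence):
    """Wrap "bass" in a <phoneme> element chosen by counting cue words."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {ipa: len(words & cues) for ipa, cues in BASS_CUES.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        # No evidence either way: leave the word for the engine's default.
        return sentence
    tag = f'<phoneme alphabet="ipa" ph="{best}">bass</phoneme>'
    return re.sub(r"\bbass\b", tag, sentence)
```

When no cue word is present the sentence is returned unchanged, which mirrors the idea that markup overrides engine defaults only where the author needs it.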

10 Pronunciation Specification
Three ways to specify a pronunciation:
Within the text: replace "creek" by "krik"
With the phoneme command: <phoneme alphabet="ipa" ph="krik">creek</phoneme>
In the pronunciation lexicon:
<lexeme>
<grapheme>creek</grapheme>
<phoneme>krik</phoneme>
</lexeme>
The designer may have a preference for how words should be spoken, e.g., creek, aluminum.
Phonetic spellings sometimes don’t have the desired effect.
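The lexicon option can be illustrated with a short Python sketch that reads lexeme entries and applies them to individual words. The sample document follows the general shape of a W3C Pronunciation Lexicon Specification (PLS) file; the helper names `load_lexicon` and `mark_up` are hypothetical, and a real TTS engine would consult the lexicon internally rather than rewrite the text like this.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Sample lexicon containing the slide's single entry.
LEXICON = f"""<lexicon version="1.0" xmlns="{PLS_NS}"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>creek</grapheme>
    <phoneme>krik</phoneme>
  </lexeme>
</lexicon>"""


def load_lexicon(xml_text):
    """Return a dict mapping each lowercase grapheme to its phoneme string."""
    ns = {"pls": PLS_NS}
    root = ET.fromstring(xml_text)
    table = {}
    for lexeme in root.findall("pls:lexeme", ns):
        grapheme = lexeme.findtext("pls:grapheme", namespaces=ns)
        phoneme = lexeme.findtext("pls:phoneme", namespaces=ns)
        table[grapheme.strip().lower()] = phoneme.strip()
    return table


def mark_up(word, table):
    """Wrap a word in <phoneme> if the lexicon has an entry for it."""
    ph = table.get(word.lower())
    if ph is None:
        return word
    return f'<phoneme alphabet="ipa" ph="{ph}">{word}</phoneme>'
```

Keeping pronunciations in a lexicon means the preference is stated once and applies to every occurrence, whereas the `phoneme` element must be repeated wherever the word appears.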

11 Speech Synthesis ML
Prosody Analysis
Markup support: emphasis, break, prosody
Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax

12 Prosody Analysis (Initial text)
<prompt> Environmental control menu. Do you want to adjust the lighting or temperature? </prompt>

13 Prosody Analysis
<prompt>
Environmental control menu <break/>
<emphasis level="reduced">do you want to adjust the</emphasis>
<emphasis level="strong">lighting</emphasis> <break/>
or <emphasis level="strong">temperature?</emphasis>
</prompt>
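The pattern on this slide (pause after the menu title, de-emphasize the carrier phrase, stress each option) can be generated mechanically for any menu. The following Python sketch shows one possible way; the function `menu_prompt` and its parameters are hypothetical, and the choice of where to put breaks simply copies the slide.

```python
from xml.sax.saxutils import escape


def menu_prompt(title, lead_in, options):
    """Build a <prompt> that stresses each option and plays down the lead-in."""
    parts = [
        escape(title),
        "<break/>",
        f'<emphasis level="reduced">{escape(lead_in)}</emphasis>',
    ]
    last = len(options) - 1
    for i, option in enumerate(options):
        if i > 0:
            parts += ["<break/>", "or"]
        # The question mark goes on the final option, as on the slide.
        text = option + ("?" if i == last else "")
        parts.append(f'<emphasis level="strong">{escape(text)}</emphasis>')
    return "<prompt>" + " ".join(parts) + "</prompt>"
```

Calling `menu_prompt("Environmental control menu", "do you want to adjust the", ["lighting", "temperature"])` yields markup with the same breaks and emphasis levels as the slide.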

14 Speech Synthesis ML
Structure Analysis
Markup support: paragraph, sentence
Non-markup behavior: infer structure by automated text analysis
Text Normalization
Markup support: say-as for dates, times, etc.; sub for aliasing
Non-markup behavior: automatically identify and convert constructs
Text-to-Phoneme Conversion
Markup support: phoneme, say-as
Non-markup behavior: look up in pronunciation dictionary
Prosody Analysis
Markup support: emphasis, break, prosody
Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax
Waveform Production
Markup support: voice, audio*
*audio icons, branding, advertising

15 Prerecorded messages vs. Speech Synthesis
Prerecorded messages: natural sounding; easy to understand; static data; tedious to record and tag.
Speech Synthesis (TTS): artificial sounding; may be difficult to understand; computer-generated data; easy to specify.
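The two approaches can be combined: play a recording when one exists for a static prompt and fall back to TTS for computer-generated text. The Python sketch below illustrates the idea using the SSML `audio` element, whose content is rendered if the audio cannot be played. The function `prompt_for`, the `recordings` mapping, and the file name used in the example are all hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def prompt_for(text, recordings):
    """Return an SSML fragment for one prompt.

    recordings maps prompt text to an audio file URI (hypothetical data).
    Static prompts with a recording use <audio>, with the text kept as the
    TTS fallback; everything else is left as plain text for the synthesizer.
    """
    src = recordings.get(text)
    if src is None:
        return escape(text)
    return f"<audio src={quoteattr(src)}>{escape(text)}</audio>"
```

For instance, with `recordings = {"Welcome.": "welcome.wav"}`, the prompt "Welcome." becomes an `audio` element, while a dynamic sentence such as an account balance passes through as text to be synthesized.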

