© 2013 by Larson Technical Services

Name: © 2013 by Larson Technical Services
Uploaded: 2017-09-09T15:38:07+00:00
Duration: PTM13S26
Channel: Rudolph Reed
Description: © 2013 by Larson Technical Services

Outline
Grammar-based speech recognition; statistical language model-based recognition; speech synthesis; dialog management; natural language processing.

Speech Synthesis (Text-To-Speech, TTS)
Processing stages, each with the resource it draws on:
Structure Analysis (Structure Rules)
Text Normalization (Abbreviation and Acronym Database)
Text-to-phoneme Conversion (Pronunciation Lexicon)
Prosody Analysis (Prosody Rules)
Waveform Production (Phoneme-to-sound Database)

Concatenated vs. Parameter-based Speech Synthesis
Concatenated: isolate phonemes from recorded speech ("The dog barked": dh eh d ao g b ah er k eh d), then concatenate the needed units to produce "red car" (er ed d k ah er).
Parameter-based: generate speech for "red car" (er ed d k ah er) from voice parameters.
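The concatenation idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the sample values are invented, each phoneme is stored as a two-value segment, and the second unit of "red car" is read as "eh" (the slide text shows "ed"). A real concatenative synthesizer works on recorded audio and smooths the joins between units.

```python
def build_inventory(labeled_segments):
    """Map each phoneme label to the first recorded segment carrying that label."""
    inventory = {}
    for label, samples in labeled_segments:
        inventory.setdefault(label, samples)
    return inventory


def concatenate(labels, inventory):
    """Join the stored segments for the requested labels, in order."""
    waveform = []
    for label in labels:
        waveform.extend(inventory[label])
    return waveform


# Toy "recording" of "The dog barked": each phoneme is two made-up sample values.
recorded = [
    ("dh", [1, 1]), ("eh", [2, 2]), ("d", [3, 3]), ("ao", [4, 4]),
    ("g", [5, 5]), ("b", [6, 6]), ("ah", [7, 7]), ("er", [8, 8]),
    ("k", [9, 9]), ("eh", [10, 10]), ("d", [11, 11]),
]
inventory = build_inventory(recorded)

# Reuse the isolated units to assemble a phrase that was never recorded.
red_car = concatenate(["er", "eh", "d", "k", "ah", "er"], inventory)
```

The point of the sketch is only that "red car" is built entirely from units isolated from a different utterance.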

Speech Synthesis ML
Pipeline: Structure Analysis, Text Normalization, Text-to-Phoneme Conversion, Prosody Analysis, Waveform Production.
Structure Analysis. Markup support: p, s. Non-markup behavior: infer structure by automated text analysis.
Critic: I don't want to use TTS; it's too difficult to understand.
Response: Developers can replay audio files, use TTS, or use a combination of both. Developers can rely upon defaults from the TTS engine, or specify commands to override defaults.

Before and after Structure Analysis
Before structure analysis: Dr. Smith lives at 214 Elm Dr. He weighs 214 lb. He plays bass guitar. He also likes to fish; last week he caught a 19 lb. bass.
After structure analysis: <s> Dr. Smith lives at 214 Elm Dr. </s> <s> He weighs 214 lb. </s> <s> He plays bass guitar. </s> <s> He also likes to fish; last week he caught a 19 lb. bass. </s>
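A minimal sketch of how the non-markup behavior (inferring sentence structure from plain text) might work on this example. The heuristic and the names `TITLES`, `split_sentences`, and `mark_sentences` are assumptions for illustration, not how any particular TTS engine does it: a token ending in a stop mark closes a sentence when the next word is capitalized, except when "Dr." opens the sentence as a title.

```python
TITLES = {"Dr."}  # abbreviations treated as titles when they open a sentence


def split_sentences(text):
    """Split text into sentences using a capitalization heuristic."""
    tokens = text.split()
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        is_last = i == len(tokens) - 1
        ends_with_stop = tok.endswith((".", "?", "!"))
        next_is_capital = not is_last and tokens[i + 1][0].isupper()
        # "Dr." opening a sentence is a title ("Dr. Smith"), not a boundary.
        is_leading_title = tok in TITLES and len(current) == 1
        if ends_with_stop and (is_last or next_is_capital) and not is_leading_title:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


def mark_sentences(text):
    """Wrap each inferred sentence in an SSML <s> element."""
    return " ".join(f"<s>{s}</s>" for s in split_sentences(text))


text = ("Dr. Smith lives at 214 Elm Dr. He weighs 214 lb. He plays bass guitar. "
        "He also likes to fish; last week he caught a 19 lb. bass.")
sentences = split_sentences(text)
```

On the example text this yields four sentences: "Dr." before "Smith" and "lb." before "bass" do not end a sentence, while "Dr." after "Elm" does. The heuristic is fragile (a sentence ending in "lb." followed by a lowercase word would be missed), which is one reason explicit p and s markup is available.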

Speech Synthesis ML
Text Normalization. Markup support: say-as for dates, times, etc.; sub for aliasing. Non-markup behavior: automatically identify and convert constructs.

After Text Normalization
<s> Dr. Smith lives at 214 Elm Dr. </s> He weighs 214 lb. He plays bass guitar. He also likes to fish; last week he caught a 19 lb. bass.
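As a rough sketch of what normalization has to decide, the snippet below expands the abbreviations in the example with the SSML sub element and its alias attribute. The three rules and the function name `normalize` are assumptions that fit this one example only; a real normalizer relies on an abbreviation and acronym database and much richer context.

```python
import re


def normalize(sentence):
    """Expand abbreviations in one sentence using SSML <sub alias="...">."""
    # Title at the start of the sentence: "Dr. Smith" is read as "Doctor".
    out = re.sub(r'^Dr\.(?= [A-Z])', '<sub alias="Doctor">Dr.</sub>', sentence)
    # Street suffix after a name: "Elm Dr." is read as "Drive".
    out = re.sub(r'(?<=[a-z] )Dr\.', '<sub alias="Drive">Dr.</sub>', out)
    # Unit of weight: "lb." is read as "pounds".
    out = re.sub(r'\blb\.', '<sub alias="pounds">lb.</sub>', out)
    return out


address = normalize("Dr. Smith lives at 214 Elm Dr.")
weight = normalize("He weighs 214 lb.")
```

The same written form "Dr." gets two different spoken forms depending on position, which is the kind of construct the engine must identify automatically when no markup is supplied.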

Speech Synthesis ML
Text-to-Phoneme Conversion. Markup support: phoneme, say-as. Non-markup behavior: look up in pronunciation dictionary.

After Text-to-Phoneme Conversion
<s> Dr. Smith lives at <say-as interpret-as="address">214</say-as> Elm Dr. </s> He weighs <say-as interpret-as="number">214</say-as> lb. He plays <phoneme alphabet="ipa" ph="beɪs">bass</phoneme> guitar. He also likes to fish; last week he caught a <say-as interpret-as="number">19</say-as> lb. <phoneme alphabet="ipa" ph="bæs">bass</phoneme>.
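The two pronunciations of "bass" above can be chosen from context. The following is a deliberately naive sketch: the cue-word set `MUSIC_CUES` and the function `tag_bass` are hypothetical, and real engines use far more robust disambiguation. It only shows how a chosen sense maps onto the phoneme element.

```python
# IPA pronunciations taken from the example; the sense labels are assumptions.
BASS_PRONUNCIATIONS = {"music": "beɪs", "fish": "bæs"}

# Hypothetical cue words suggesting the musical sense.
MUSIC_CUES = {"guitar", "plays", "band"}


def tag_bass(sentence):
    """Wrap 'bass' in a phoneme element chosen from the surrounding words."""
    words = {w.strip(".,;").lower() for w in sentence.split()}
    sense = "music" if words & MUSIC_CUES else "fish"
    ph = BASS_PRONUNCIATIONS[sense]
    return sentence.replace(
        "bass", f'<phoneme alphabet="ipa" ph="{ph}">bass</phoneme>'
    )


instrument = tag_bass("He plays bass guitar.")
fish = tag_bass("last week he caught a 19 lb. bass.")
```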

Pronunciation Specification
A designer may have a preference for how words should be spoken, e.g., creek, aluminum. The pronunciation can be specified in three ways:
Within the text: replace "creek" by "krik".
With the phoneme command: <phoneme alphabet="ipa" ph="krik">creek</phoneme>
In the pronunciation lexicon: <lexeme> <grapheme>creek</grapheme> <phoneme>krik</phoneme> </lexeme>
Phonetic spellings sometimes don't have the desired effect.
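To show how a lexicon entry could be applied, here is a small sketch that reads lexeme entries and wraps matching words in phoneme elements. The bare `<lexicon>` root is a simplification assumed for the example (a real pronunciation lexicon document carries a namespace, version, and language attributes), and `load_lexicon` and `apply_lexicon` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Simplified lexicon built around the lexeme shown above.
LEXICON_XML = """
<lexicon>
  <lexeme>
    <grapheme>creek</grapheme>
    <phoneme>krik</phoneme>
  </lexeme>
</lexicon>
"""


def load_lexicon(xml_text):
    """Return a {grapheme: phoneme string} mapping from the lexicon XML."""
    root = ET.fromstring(xml_text)
    return {
        lexeme.findtext("grapheme"): lexeme.findtext("phoneme")
        for lexeme in root.findall("lexeme")
    }


def apply_lexicon(text, lexicon):
    """Wrap every word found in the lexicon in an SSML phoneme element."""
    out = []
    for word in text.split():
        core = word.strip(".,;?!")  # keep trailing punctuation outside the tag
        ph = lexicon.get(core.lower())
        if ph:
            word = word.replace(
                core, f'<phoneme alphabet="ipa" ph="{ph}">{core}</phoneme>'
            )
        out.append(word)
    return " ".join(out)


lexicon = load_lexicon(LEXICON_XML)
marked = apply_lexicon("The creek is high.", lexicon)
```

Keeping the preference in a lexicon means the designer states it once, instead of repeating a phoneme element at every occurrence of the word.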

Speech Synthesis ML
Prosody Analysis. Markup support: emphasis, break, prosody. Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax.

Prosody Analysis (Initial text)
<prompt> Environmental control menu. Do you want to adjust the lighting or temperature? </prompt>

Prosody Analysis
<prompt> Environmental control menu <break/> <emphasis level="reduced"> do you want to adjust the </emphasis> <emphasis level="strong"> lighting </emphasis> <break/> or <emphasis level="strong"> temperature? </emphasis> </prompt>
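When a developer overrides the default prosody for many menus, the markup can be generated instead of written by hand. The helper below is a sketch under assumptions: `menu_prompt` is a hypothetical function, and the pattern (reduced emphasis on the lead-in, strong emphasis on each option, a break before "or") simply mirrors the example above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def menu_prompt(title, lead_in, options):
    """Build a prompt that de-emphasizes the lead-in and stresses each option."""
    parts = [
        escape(title),
        "<break/>",
        f'<emphasis level="reduced">{escape(lead_in)}</emphasis>',
    ]
    for i, option in enumerate(options):
        if i > 0:
            parts.append("<break/> or")
        parts.append(f'<emphasis level="strong">{escape(option)}</emphasis>')
    return "<prompt>" + " ".join(parts) + "</prompt>"


prompt = menu_prompt(
    "Environmental control menu",
    "do you want to adjust the",
    ["lighting", "temperature?"],
)

# Parsing the result is a quick check that the generated markup is well-formed.
root = ET.fromstring(prompt)
```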

Speech Synthesis ML
Waveform Production. Markup support: voice, audio*.
*audio icons, branding, advertising

Prerecorded messages vs. Speech Synthesis
Prerecorded messages: natural sounding; easy to understand; static data; tedious to record and tag.
Speech Synthesis (TTS): artificial sounding; may be difficult to understand; computer-generated data; easy to specify.
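The two approaches can be combined in one prompt: static phrases play from recordings and computer-generated data is synthesized. The sketch below assumes a hypothetical mapping `RECORDED` from phrase to audio file name and a hypothetical function `render`; it uses the SSML audio element, whose text content is spoken as a fallback if the file cannot be played.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical table of phrases that have been prerecorded.
RECORDED = {"Your balance is": "balance_is.wav"}


def render(segments, recordings):
    """Use a recording where one exists, otherwise leave the text for TTS."""
    parts = []
    for text in segments:
        src = recordings.get(text)
        if src:
            # The enclosed text is the fallback if the audio cannot be played.
            parts.append(f"<audio src={quoteattr(src)}>{escape(text)}</audio>")
        else:
            parts.append(escape(text))
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = render(["Your balance is", "42 dollars"], RECORDED)
```

Here the fixed phrase keeps the natural sound of a recording, while the amount, which changes per caller, is left to the synthesizer.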
