SpeechBuilder: Facilitating Spoken Dialogue System Creation
Eugene Weinstein, Project Oxygen Core Team, MIT Laboratory for Computer Science

Presentation transcript:

1 SpeechBuilder: Facilitating Spoken Dialogue System Creation
Eugene Weinstein, Project Oxygen Core Team, MIT Laboratory for Computer Science
ecoder@mit.edu

2 Bridging the Experience Gap
Eugene Weinstein – MIT Lab for Computer Science, Oxygen Alliance 2003 Workshop – February 24-28, 2003
Developing robust, mixed-initiative spoken dialogue systems is difficult
 –Complex systems can be created by human-language technology experts
 –Novice developers must overcome a considerable technical challenge
SpeechBuilder aims to help novices rapidly create speech-based systems
 –Uses intuitive methods for specifying domain-specific constraints
 –Automatically configures HLT components using MIT GALAXY architecture
  *Leverages future technical advances
  *Encourages research on portability
[Diagram: SpeechBuilder and a Hub connected to Speech Recognition, Language Processing, Context Resolution, Dialogue Management, Language Generation, Speech Synthesis, Database Server, and Audio]

3 Baseline Configuration
Communication with Galaxy via simple HTTP protocol
Gives developer total control over application functionality
Example utterances and the parameters sent to the developer application:
 –“Turn on the lights in the kitchen”
  action=set&frame=(object=lights, room=kitchen,value=on)
 –“Show me the banks on Main Street”
  action=identify&frame=( object=(type=bank, on=(street=Main, ext=Street)))
[Diagram: SpeechBuilder Server containing a Hub connected to Speech Recognition, Language Processing, Speech Synthesis, Audio Server, and CGI Parameter Generation; CGI Parameter Generation talks to the Developer Application over HTTP]
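The parameter strings on this slide can be unpacked into nested key/value structures on the application side. The following is a minimal sketch, assuming the frame syntax is exactly what the two examples show (parenthesized, comma-separated key=value pairs that may nest); `parse_frame` and `parse_query` are hypothetical helper names, not part of SpeechBuilder.

```python
from urllib.parse import parse_qs


def parse_frame(text):
    """Parse '(key=value, key=(...))' into a nested dict.

    Assumption: the grammar is inferred from the two slide examples only.
    """
    pos = 0

    def parse_group():
        nonlocal pos
        if text[pos] != "(":
            raise ValueError("expected '(' at position %d" % pos)
        pos += 1  # skip '('
        result = {}
        while True:
            eq = text.index("=", pos)        # key runs up to the next '='
            key = text[pos:eq].strip()
            pos = eq + 1
            while text[pos] == " ":          # skip spaces before the value
                pos += 1
            if text[pos] == "(":             # nested group
                value = parse_group()
            else:                            # plain value ends at ',' or ')'
                end = pos
                while text[end] not in ",)":
                    end += 1
                value = text[pos:end].strip()
                pos = end
            result[key] = value
            while text[pos] == " ":
                pos += 1
            if text[pos] == ",":
                pos += 1
                continue
            if text[pos] == ")":
                pos += 1
                return result
            raise ValueError("unexpected character at position %d" % pos)

    return parse_group()


def parse_query(query):
    """Split 'action=...&frame=(...)' into an action name and a frame dict."""
    params = parse_qs(query)
    return params["action"][0], parse_frame(params["frame"][0])


action, frame = parse_query(
    "action=set&frame=(object=lights, room=kitchen,value=on)")
print(action, frame)
```

A developer application could then dispatch on `action` and read values such as `frame["room"]` without string handling of its own.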

4 Modified Baseline Configuration (this class)
Frame Relay server exposes Galaxy meaning representation to app
Still gives developer total control over application functionality
Example utterances and the semantic frames passed to the developer application:
 –“Turn on the lights in the kitchen”
  {c turn_management :parse_frame {c turn :object “lights” :room “kitchen” :value “on”}
 –“Show me the banks on Main Street”
  {c turn_management :parse_frame {c identify “type” bank :pred {p :on {:street “Main” :ext “Street”}}}
[Diagram: Hub connected to Speech Recognition, Language Processing, Speech Synthesis, Audio Server, CGI Parameter Generation, and Frame Relay Server; the Frame Relay Server sends the semantic frame to the Developer Application over a TCP socket]

5 Database Access Configuration **
For a speech-based interface to structured data
No programming required; specify table(s) and constraints
[Diagram: Hub connected to Speech Recognition, Language Processing, Discourse Resolution, Dialogue Management, Language Generation, Speech Synthesis, Audio Server, I/O Server, and Database Server (INFO)]

6 Creating a Speech-Based Application
Step 1: Off-line creation and compilation (Upload, Compile)
Step 2: On-line deployment (Query, Response)
[Diagram: SB (SpeechBuilder) and a Hub with ASR, NLU, Discourse, Dialog, NLG, TTS, Audio, and the INFO database]

7 Human Language Technologies
Hub
 –Galaxy programmable hub controls interactions between all components
Audio Server
 –Telephone or lightweight audio server
Speech Recognition
 –Generic acoustic models
 –Unknown word model
 –Class or hierarchical n-gram
Language Processing
 –N-best interface with ASR
 –Grammar from attributes & actions
 –Backs off to concept spotting
Context Resolution
 –New component performs concept inheritance & masking
 –Processes ‘E-form’
Dialogue Management
 –Generic server handles interaction
Language Generation
 –Generates ‘E-form’, SQL, & responses
 –Default entries made
Speech Synthesis
 –Commercial product
Database Server
 –Accesses back-end database

8 Extracting Database Information **
Some columns are used to access entries (e.g., Name)
 –Column entries must be incorporated into ASR & NLU
Some columns are only used in responses (e.g., Phone)
 –Column names must be incorporated into ASR & NLU

Name              Phone    Email           Office
Jim Glass         x3-1640  glass@mit.edu   603
Stephanie Seneff  x3-0451  seneff@mit.edu  643
Victor Zue        x3-8513  zue@mit.edu     601a

Example query: “What is the phone number for Victor Zue?”
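The distinction between access columns and response columns can be sketched in a few lines. This is only an illustration under stated assumptions: the substring matching below stands in for real ASR and NLU, and `build_vocabulary`, `answer`, and the `property` grouping of column names are hypothetical names chosen for the example.

```python
# The table from the slide, one dict per row.
ROWS = [
    {"Name": "Jim Glass", "Phone": "x3-1640", "Email": "glass@mit.edu", "Office": "603"},
    {"Name": "Stephanie Seneff", "Phone": "x3-0451", "Email": "seneff@mit.edu", "Office": "643"},
    {"Name": "Victor Zue", "Phone": "x3-8513", "Email": "zue@mit.edu", "Office": "601a"},
]

ACCESS_COLUMNS = ["Name"]                       # entries become vocabulary
RESPONSE_COLUMNS = ["Phone", "Email", "Office"]  # column names become vocabulary


def build_vocabulary(rows, access_columns, response_columns):
    """Collect the words a recognizer and parser would need to know."""
    vocab = {}
    for col in access_columns:
        # Access columns contribute their *entries* (e.g., every name).
        vocab[col.lower()] = sorted({row[col] for row in rows})
    # Response columns contribute only their *names* (e.g., "phone").
    vocab["property"] = [col.lower() for col in response_columns]
    return vocab


def answer(query, rows, vocab):
    """Naive keyword spotting: find a known name and a known property."""
    q = query.lower()
    name = next((n for n in vocab["name"] if n.lower() in q), None)
    prop = next((p for p in vocab["property"] if p in q), None)
    if name is None or prop is None:
        return None
    row = next(r for r in rows if r["Name"] == name)
    return row[prop.capitalize()]


vocab = build_vocabulary(ROWS, ACCESS_COLUMNS, RESPONSE_COLUMNS)
print(answer("What is the phone number for Victor Zue?", ROWS, vocab))
```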

9 Knowledge Representation
Concepts and actions form basis for understanding
 –Concepts become key/value entries in meaning representation
  *city: Boston, New York…
  *day: Monday, Tuesday
 –Actions provide sentence-level patterns of specific queries
  *“I want to fly from Boston to Taipei…” action=lookup_flight
 –Action text can be bracketed to define hierarchical concepts **
  *“I want to fly source=(from Boston) destination=(to Taipei)”
  *source=Boston destination=Taipei
 –Concepts and actions used to configure the following components
  *Speech Recognition
  *Natural Language Understanding
  *Discourse
Database columns define basic concepts
 –Column names can be grouped into concepts
  *property: phone, email…
  *weather: snow, rain…
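The bracketing notation can be illustrated with a small sketch. It assumes the `name=(phrase)` syntax exactly as written on the slide and a `city` concept that includes Taipei; `interpret` and `plain_text` are hypothetical helpers, and the substring lookup is a stand-in for the actual parser.

```python
import re

# Assumed concept definition; the slide lists only "Boston, New York…".
CONCEPTS = {"city": ["Boston", "New York", "Taipei"]}

# Matches bracketed spans such as source=(from Boston).
BRACKET = re.compile(r"(\w+)=\(([^)]*)\)")


def plain_text(bracketed):
    """Strip the bracket annotations to recover the example sentence."""
    return BRACKET.sub(r"\2", bracketed)


def interpret(bracketed):
    """Map each bracketed span to the concept value found inside it."""
    frame = {}
    for name, phrase in BRACKET.findall(bracketed):
        for values in CONCEPTS.values():
            for value in values:
                if value in phrase:
                    frame[name] = value
    return frame


example = "I want to fly source=(from Boston) destination=(to Taipei)"
print(plain_text(example))
print(interpret(example))
```

The two hierarchical keys (`source`, `destination`) reuse the same underlying `city` concept, which is the point of the bracketing.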

10 Language Modeling and Understanding
Concept usage can be fine-tuned to improve performance: **
By default, concepts are used for language modeling, parsing grammar, and meaning representation. Alternatively, a concept can be used:
 –For language modeling and parsing grammar only (i.e., no meaning)
 –For keyword spotting only (i.e., no role in language modeling)
 –For fine-grained language modeling with coarser meaning representation
[Figure: “Will it snow?” yields weather: snow. Surrounding vocabulary: rain, hail, snow, sprinkles, flurries, showers, breezy, rainy, snowy, snowfall, accumulation, rainfall, snowstorm, thunderstorm, blizzard]
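The last option, fine-grained vocabulary with a coarser meaning representation, can be pictured as a many-to-one mapping. The sketch below is an assumption-laden illustration: the slide does not say which surface word maps to which value, so the grouping in `SURFACE_TO_MEANING` is invented for the example, and `understand` is a hypothetical helper.

```python
import re

# Assumed grouping of surface words under coarse "weather" values.
SURFACE_TO_MEANING = {
    "snow": "snow", "snowy": "snow", "snowfall": "snow",
    "flurries": "snow", "snowstorm": "snow", "blizzard": "snow",
    "rain": "rain", "rainy": "rain", "rainfall": "rain",
    "showers": "rain", "sprinkles": "rain",
    "hail": "hail",
}


def understand(utterance):
    """Return a coarse meaning frame for the first weather word spotted."""
    for word in re.findall(r"[a-z]+", utterance.lower()):
        if word in SURFACE_TO_MEANING:
            return {"weather": SURFACE_TO_MEANING[word]}
    return {}


print(understand("Will it snow?"))
print(understand("Any flurries tonight?"))
```

The language model still sees every surface word, while the application only has to handle a handful of values.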

11 Current Status
SpeechBuilder has been operational for over two years
 –Used by over 50 developers from MIT and elsewhere
 –Used in undergraduate classes at MIT and Georgetown University
ASR capabilities benchmarked against main systems
 –Achieves same ASR performance as MIT Jupiter weather information system (6.8% word error rate on clean data) (phone #)
Several prototype systems have been developed
 –Information about faculty, staff and students at LCS and AI Labs (phone, email, room, voice messages, transfer, etc.)
 –Application to control the various physical items in a typical office (lights, curtains, TV, VCR, projector, etc.)
 –Others include TV schedules, real-time weather forecasts, hotel and restaurant information, etc.
SpeechBuilder used for initial design of many more complex domains

12 Ongoing and Future Work
Increase sophistication of discourse and dialogue manager to handle more complex dialogues
 –Enable finer specification of discourse capabilities
 –Add generic capabilities for times, dates, etc.
Incorporate confidence scoring and implement unsupervised training of acoustic and language models
Create functionality to allow developers to create domain-specific concatenative speech synthesis
Create alternative methods of domain specifications to streamline development
 –Advanced developers don’t necessarily use web interface
 –Allow for more efficient automatic generation of SpeechBuilder domains

13 Acknowledgements
Issam Bazzi, Scott Cyphers, Ed Filisko, Jim Glass, TJ Hazen, Lee Hetherington, Joe Polifroni, Stephanie Seneff, Michelle Spina, Eugene Weinstein, Jon Yi, Misha Zitser

14 SpeechBuilder Hands-on Activity
Eugene Weinstein, Project Oxygen Core Team, MIT Laboratory for Computer Science
ecoder@mit.edu

15 Modified Baseline Configuration (this class)
Frame Relay server exposes Galaxy meaning representation to app
Still gives developer total control over application functionality
[Diagram: Hub connected to Speech Recognition, Language Processing, Speech Synthesis, Audio Server, CGI Parameter Generation, and Frame Relay Server; the Frame Relay Server sends the semantic frame to the Developer Application over a TCP socket; additional label: Jaim]

16 SpeechBuilder API
Galaxy Frame Relay
 –Galaxy meaning representation provided through frame relay
 –Applications connect via TCP sockets
 –API provided in Perl, Python, and Java
  *This class: Python API
Python API
 –Python class galaxy.server.Server methods:
  *Constructor(machine,port,ID)
  *connect()
  *processMessage(blocking)
  *disconnect()
 –Python class galaxy.frame.Frame methods:
  *getAction()
  *getAttribute(attr_name)
  *getText()
  *toString()
[Diagram: Application connected to the Galaxy Frame Relay over a TCP socket through the Python API]
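A typical application loop built on these method names might look like the sketch below. Because the real `galaxy` package is not available here, the code uses hypothetical stand-ins (`StubServer`, `StubFrame`) that only mimic the method names listed on the slide. The return values, the idea that `processMessage` yields a frame or `None`, and the `set` action name (borrowed from the HTTP example earlier in the deck) are all assumptions, not documented behavior.

```python
class StubFrame:
    """Hypothetical stand-in for galaxy.frame.Frame (method names from the slide)."""

    def __init__(self, action, attributes, text):
        self._action = action
        self._attributes = attributes
        self._text = text

    def getAction(self):
        return self._action

    def getAttribute(self, attr_name):
        return self._attributes.get(attr_name)

    def getText(self):
        return self._text

    def toString(self):
        return "%s %r" % (self._action, self._attributes)


class StubServer:
    """Hypothetical stand-in for galaxy.server.Server; no network I/O."""

    def __init__(self, machine, port, ID):
        self.machine, self.port, self.ID = machine, port, ID
        self.connected = False
        self.pending = []  # frames queued for the demo (not in the real API)

    def connect(self):
        self.connected = True

    def processMessage(self, blocking):
        # Assumption: returns the next frame, or None when nothing is left.
        return self.pending.pop(0) if self.pending else None

    def disconnect(self):
        self.connected = False


def handle(frame):
    """Turn one frame into an application-level command string."""
    if frame.getAction() == "set":
        return "%s in %s -> %s" % (frame.getAttribute("object"),
                                   frame.getAttribute("room"),
                                   frame.getAttribute("value"))
    return "unhandled: " + frame.getText()


def run(server):
    """Connect, drain all available frames, then disconnect."""
    results = []
    server.connect()
    frame = server.processMessage(True)
    while frame is not None:
        results.append(handle(frame))
        frame = server.processMessage(True)
    server.disconnect()
    return results


server = StubServer("localhost", 12345, "demo")
server.pending.append(StubFrame(
    "set", {"object": "lights", "room": "kitchen", "value": "on"},
    "turn on the lights in the kitchen"))
print(run(server))
```

With the real classes, only the two stub definitions would be replaced by imports; the `handle` and `run` functions show the intended shape of the application side.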

