Where do we go from here? Research and Commercial Spoken Dialog Systems. Roberto Pieraccini CTO, Tell-Eureka Corporation, New York, NY Juan Huerta IBM.


1 Where do we go from here? Research and Commercial Spoken Dialog Systems
Roberto Pieraccini, CTO, Tell-Eureka Corporation, New York, NY
Juan Huerta, IBM T. J. Watson Research Center, Yorktown Heights, NY

2 The Spoken Dialog Landscape
(Diagram spanning ACADEMIC RESEARCH and INDUSTRIAL R&D: Dialog Linguistics, Voice User Interface, Dialog Engineering)

3 Two Different Goals
ACADEMIC RESEARCH: natural interaction, freedom of expression, user is in control; UNCONSTRAINED NATURAL LANGUAGE UNDERSTANDING, MIXED INITIATIVE DIALOG
INDUSTRIAL R&D: task completion, usability, system is in control; cost, time to market, business model, interoperability, standards; HAND-CRAFTED GRAMMARS, DIRECTED DIALOG

4 FAQs
Aren't human-like, free-form conversational systems more usable?
It depends.
- Speech recognition and understanding technology is still very limited.
- Even if we had perfect technology, that wouldn't guarantee usability.
- Two extremes: human agents vs. DTMF.
- Human agents are trained to follow well-defined scripts.

5 FAQs
Isn't natural language always the best design choice? Don't users always want freedom of expression?
It depends.
- More freedom of expression = more speech recognition errors, and users hate errors.
- More freedom of expression = more difficulty correcting errors and setting the dialog back on track.
- Some applications do very well without freedom of expression; others don't.

6 FAQs
Directed dialog or mixed initiative? Shouldn't users always be able to control the course of the dialog?
It depends.
- Without guidance, most users will be lost and won't know what to say or what the system's capabilities are.
- Structured interactions (as opposed to free-form ones) show a reduced rate of speech disfluencies (Oviatt, 1995). Fewer disfluencies = better ASR accuracy.
- Directed prompts make it possible to predict responses and tune grammars, which leads to less error-prone interactions.

7 A little bit of history
1995: The Birth of the Dialog Industry
- At that time the research world was searching for the holy grail of free-form, natural language spoken interaction (DARPA ATIS, Communicator).
- A couple of startup companies took a step back and realized that well-structured directed dialog can outperform free-form interaction for certain types of applications.
- They realized the value of Voice User Interface (VUI) design.
- A market for telephony-based speech applications started to appear and soon became a structured, mature industry.
- Industrial standards started to catch up with the convergence of speech and Web technology.

8 2005: The Commercial Spoken Dialog Landscape
- TECHNOLOGY VENDORS: speech recognition, TTS
- PLATFORM INTEGRATORS: IVR, VoiceXML, CTI, …
- TOOLS: authoring, tuning, prepackaged applications
- APPLICATION DEVELOPERS
- PROFESSIONAL SERVICES
- HOSTING
In 2004: $600M to $1,000M in revenue; more than 200 deployed applications in North America.
New, evolving standards guarantee interoperability of engines and platforms.

9 Two Different Architectures
ACADEMIC RESEARCH: general natural language understanding
INDUSTRIAL R&D: prompt-specific grammars

10 Architecture of a dialog system (the research view)
Components: SPEECH RECOGNIZER (language models), NATURAL LANGUAGE UNDERSTANDING (semantic models), DIALOG MANAGER, TEXT-TO-SPEECH SYNTHESIZER
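The following minimal Python sketch makes the research-view data flow concrete by passing one dialog turn through the four components. Every component is a hypothetical stub. Keyword spotting stands in for the language and semantic models. The slot names, prompts, and the `turn` helper are illustrative only.

```python
from dataclasses import dataclass, field

# Hypothetical stubs for the four research-view components.

def recognize(audio: str) -> str:
    """SPEECH RECOGNIZER stub: treat the 'audio' as an already-decoded transcript."""
    return audio.lower()

def understand(text: str) -> dict:
    """NLU stub: fill slots from the word after 'from' / 'to' (toy semantic model)."""
    words = text.split()
    frame = {}
    for marker, slot in (("from", "origin"), ("to", "dest")):
        if marker in words and words.index(marker) + 1 < len(words):
            frame[slot] = words[words.index(marker) + 1]
    return frame

@dataclass
class DialogManager:
    state: dict = field(default_factory=dict)

    def next_action(self, frame: dict) -> str:
        """Merge the new semantic frame, then ask for the first missing slot."""
        self.state.update(frame)
        for slot in ("origin", "dest"):
            if slot not in self.state:
                return f"ask_{slot}"
        return "retrieve_flights"

PROMPTS = {
    "ask_origin": "Where are you leaving from?",
    "ask_dest": "Where are you going?",
    "retrieve_flights": "Let me look up those flights.",
}

def synthesize(action: str) -> str:
    """TEXT-TO-SPEECH stub: return the text that would be spoken."""
    return PROMPTS[action]

def turn(dm: DialogManager, audio: str) -> str:
    """One turn: recognizer -> NLU -> dialog manager -> synthesizer."""
    return synthesize(dm.next_action(understand(recognize(audio))))
```

Start with `dm = DialogManager()`. The utterance "I need a flight to Denver" fills only `dest`, so the system asks for the origin. A follow-up "from Boston" then completes the frame.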

11 Commercial Conversational Architecture
The APPLICATION SERVER (PROMPTS, GRAMMARS) sends a PROMPT and a GRAMMAR to the VOICE BROWSER PLATFORM (ASR, TTS, PLAY), which returns the RECOGNITION RESULT.
Standards: VoiceXML, SRGS, SSML, MRCP, CCXML; EMMA? SCXML?
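The following is a minimal sketch of the application-server side. For one dialog step, it emits a VoiceXML document that carries the prompt and a reference to a prompt-specific SRGS grammar. The function name, field name, and grammar URI are hypothetical. The element names (`vxml`, `form`, `field`, `prompt`, `grammar`) follow VoiceXML 2.0. A real page would also send the result back to the server, for example with `<submit>`; the sketch omits that step.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

def build_field_page(field_name: str, prompt_text: str, grammar_uri: str) -> str:
    """Return a one-field VoiceXML 2.0 page: a prompt plus a prompt-specific grammar.

    The voice browser plays the prompt, recognizes against the grammar, and
    the recognition result goes back to the application server.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form")
    field = ET.SubElement(form, "field", {"name": field_name})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = prompt_text
    # Reference to an SRGS (XML form) grammar written for this prompt only.
    ET.SubElement(field, "grammar", {"src": grammar_uri, "type": "application/srgs+xml"})
    return ET.tostring(vxml, encoding="unicode")
```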

12 Speech and the Web: VoiceXML applications
The VoiceXML Browser sends an HTTP request over the Internet to the Web Server, which returns a static VoiceXML page; the Web Server connects to the BACKEND.

13 Speech and the Web: VoiceXML applications
The VoiceXML Browser sends an HTTP request over the Internet to the Web Server, which returns a VoiceXML document.
The BACKEND APPLICATION holds the Dialog Manager and the Application State and generates the VoiceXML dynamically.

14 Dialog Engineering: What is a Dialog Manager?
DECIDE WHICH FUNCTION TO CALL → GET RESULTS → UPDATE STATE

15 Dialog Engineering: What is a Dialog Manager?
DECIDE WHICH VoiceXML PAGE TO SERVE → GET RESULTS → UPDATE STATE
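When the dialog manager lives on the Web server, the decide/get-results/update-state cycle runs once per HTTP request. Because HTTP itself is stateless, the application state has to be kept on the server. The minimal Python sketch below assumes a hypothetical three-slot transfer task, hypothetical page names, and an in-memory session store keyed by call ID.

```python
from typing import Dict, Optional

SLOTS = ("origin", "destination", "amount")   # hypothetical transfer task
sessions: Dict[str, dict] = {}                # application state, one entry per call

def handle_request(session_id: str, result: Optional[dict] = None) -> str:
    """One HTTP round trip: get results, update state, decide the next page."""
    state = sessions.setdefault(session_id, {})
    if result:                   # GET RESULTS posted back by the voice browser
        state.update(result)     # UPDATE STATE
    for slot in SLOTS:           # DECIDE WHICH VoiceXML PAGE TO SERVE
        if slot not in state:
            return f"get_{slot}.vxml"
    return "confirm_transfer.vxml"
```

Each call gets its own state, so two simultaneous callers do not interfere with each other.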

16 Two Different Approaches to Dialog Management
ACADEMIC RESEARCH: DIALOG ENGINE. An engine based on general dialog principles, configured for different applications.
INDUSTRIAL R&D: USER EXPERIENCE. The user experience is completely specified by the designer and coded into the application platform.

17 VUI Completeness
- Successful commercial applications require detailed control of the user experience.
- No unpredictable behavior: every possible situation needs to be anticipated and specified.
- It is common industry practice to describe the Voice User Interface (VUI) fully in a specification document:
  - a graph with nodes and conditional transitions
  - system prompts and grammars specified in detail
- Design-develop-test cycles take place prior to full deployment.
- The programming paradigm should match the specification.
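One practical consequence of VUI completeness is that a specification can be checked mechanically before development starts. The sketch below assumes a hypothetical encoding of the spec, in which each node lists its prompts by type and its transitions. It also assumes a checklist of required prompt types. It reports any missing prompt and any transition that leads to an undefined node.

```python
# Assumed checklist: every node must say what to play in each of these situations.
REQUIRED_PROMPTS = ("initial", "retry", "timeout", "help")

# Hypothetical fragment of a VUI specification: prompts by type, plus transitions.
SPEC = {
    "main_menu": {
        "prompts": {"initial": "main_menu_I.wav", "retry": "main_menu_R.wav",
                    "timeout": "main_menu_T.wav", "help": "main_menu_H.wav"},
        "transitions": {"balance": "account_balance", "transfer": "get_origin"},
    },
    "account_balance": {
        "prompts": {"initial": "balance_I.wav", "retry": "balance_R.wav",
                    "timeout": "balance_T.wav"},
        "transitions": {"done": "main_menu"},
    },
}

def completeness_report(spec: dict) -> list:
    """List every situation the specification leaves unspecified."""
    problems = []
    for name, node in spec.items():
        for kind in REQUIRED_PROMPTS:
            if kind not in node.get("prompts", {}):
                problems.append(f"{name}: no '{kind}' prompt")
        for choice, target in node.get("transitions", {}).items():
            if target not in spec:
                problems.append(f"{name}: '{choice}' leads to undefined node '{target}'")
    return problems
```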

18 User Experience/VUI design
(Call-flow diagram)
Welcome → Main Menu → Account Balance | Transfer | Bill Payments | Exit
Transfer: Get Origin Account → Get Destination Account → Get Amount → "Transfer amount > origin account?"
  YES → Play Wrong Amount Message
  NO → Play Confirmation → "Confirmed?"
    YES → Go to Main Menu
    NO → "What is wrong?" → amount | destination account | origin account
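The Transfer branch of this call flow can be written as an explicit state machine, with one state per node and a transition chosen after each node. In the minimal sketch below, the node names mirror the diagram. Three things are assumptions: the scripted replies, the account balances, and returning to Get Amount after the wrong-amount message (the diagram does not show where that node leads).

```python
from typing import Dict, Iterable, List

def run_transfer(balances: Dict[str, int], replies: Iterable) -> List[str]:
    """Walk the Transfer branch of the call flow and return the nodes visited.

    `replies` are scripted user answers (hypothetical), consumed in order.
    """
    answers = iter(replies)
    state: Dict[str, object] = {}
    node, visited = "get_origin", []
    while node != "main_menu":
        visited.append(node)
        if node == "get_origin":
            state["origin"] = next(answers)
            node = "get_destination"
        elif node == "get_destination":
            state["destination"] = next(answers)
            node = "get_amount"
        elif node == "get_amount":
            state["amount"] = next(answers)
            # Conditional transition: transfer amount > origin account balance?
            too_much = state["amount"] > balances[state["origin"]]
            node = "play_wrong_amount" if too_much else "play_confirmation"
        elif node == "play_wrong_amount":
            node = "get_amount"              # assumption: ask for the amount again
        elif node == "play_confirmation":
            node = "main_menu" if next(answers) == "yes" else "what_is_wrong"
        elif node == "what_is_wrong":
            node = {"amount": "get_amount",
                    "destination account": "get_destination",
                    "origin account": "get_origin"}[next(answers)]
    visited.append(node)
    return visited
```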

19 User Experience/VUI design: Get Amount Interaction Module
(The Transfer call flow from slide 18, with the Get Amount node expanded.)
PROMPTS (type: wording [source])
Initial: "Please say the amount you would like to transfer from your" [get_amount_I_1.wav] <origin account> [TTS] "to your" [get_amount_I_2.wav] <destination account> [TTS] "in dollars and cents." [get_amount_I_3.wav]
Retry 1: same wording and sources as Initial
Retry 2: "Please say the amount you would like to have transferred, like one hundred dollars and fifty cents." [get_amount_R_2_1.wav]
Timeout 1: "I'm sorry, I didn't hear you." [get_amount_T_1_1.wav] "Please say the amount you would like to transfer from your" [get_amount_I_1.wav] <origin account> [TTS] "to your" [get_amount_I_2.wav] <destination account> [TTS]
Timeout 2: "I didn't hear you this time either. Please say the amount you would like to have transferred, like one hundred dollars and fifty cents." [get_amount_T_2_1.wav]
Help: "Please say how much do you wish to transfer. You can say the amount in dollars and cents, like, for instance, one hundred dollars and fifty cents." [get_amount_H.wav]
ACTIONS (condition: action)
If the amount is greater than the amount in the origin account: go to "Play Wrong Amount Message"
Else: go to "Play Confirmation"
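A prompt table like this one maps directly onto data. The module picks a row by event and attempt count, then resolves the TTS slots from the dialog state. The minimal sketch below follows the table's rows and file names. Mapping Retry to a "nomatch" event and Timeout to a "noinput" event is an assumption, as are the function and variable names. The table does not say what happens after a third failure, and the sketch leaves that case unhandled (it raises `KeyError`).

```python
# Recorded segments are .wav names; ("TTS", slot) is spoken by text-to-speech
# from the dialog state (here, the account names chosen earlier).
INITIAL = ["get_amount_I_1.wav", ("TTS", "origin"),
           "get_amount_I_2.wav", ("TTS", "destination"), "get_amount_I_3.wav"]

# (event, count) -> prompt segments, following the Get Amount prompt table.
GET_AMOUNT_PROMPTS = {
    ("initial", 0): INITIAL,
    ("nomatch", 1): INITIAL,                                 # Retry 1
    ("nomatch", 2): ["get_amount_R_2_1.wav"],                # Retry 2
    ("noinput", 1): ["get_amount_T_1_1.wav"] + INITIAL[:4],  # Timeout 1
    ("noinput", 2): ["get_amount_T_2_1.wav"],                # Timeout 2
    ("help", 1): ["get_amount_H.wav"],                       # Help
}

def render_prompt(event: str, count: int, state: dict) -> list:
    """Resolve TTS slots so the result is a playlist of wav files and TTS text."""
    segments = GET_AMOUNT_PROMPTS[(event, count)]
    return [state[seg[1]] if isinstance(seg, tuple) else seg for seg in segments]
```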

20 The Speech Application Lifecycle
Phases: requirements, VUI design, usability, VUI development, speech science, high-level system design, system engineering, integration, partial deployment, full deployment.
Roles: Analyst, VUI Designer, Speech Scientist, Architect, App Developer, Engineer, Project Manager.

21 Simple Things Should be Easy, Difficult Things Should be Possible
- CONTROL: the dialog programming paradigm should allow detailed control of the VUI.
  - A paradigm that is too low-level makes complex behavior hard to program.
- EXPRESSIVENESS: complex behavior needs to be expressible in a simple way.
  - Built-in behavior may be hard to bypass.

22 When Things Go Wrong: Speech Error Control
- Speech recognition is not perfect, and probably will not be in the foreseeable future.
- Speech recognition errors are extremely disruptive to the course of the dialog.
- Commercial dialog applications have developed robust strategies for error control:
  - directed dialog with matching grammars and prompts
  - two-step and one-step correction strategies
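The two correction strategies can be contrasted in a small sketch. The exact definitions used here are an assumption:
- In the two-step strategy, the confirmation accepts only yes or no, and a rejected value is asked for again in a separate turn.
- In the one-step strategy, the confirmation grammar also accepts a correction such as "no, Boston", which saves a turn.

The replies are scripted strings that stand in for recognition results, and `turns` counts system turns.

```python
from typing import Iterable, Tuple

def two_step(hypothesis: str, replies: Iterable[str]) -> Tuple[str, int]:
    """Confirm with yes/no only; after a 'no', re-ask the value in a new turn."""
    answers, turns = iter(replies), 0
    while True:
        turns += 1                             # "Did you say <hypothesis>?"
        if next(answers) == "yes":
            return hypothesis, turns
        turns += 1                             # "Sorry, please say it again."
        hypothesis = next(answers)

def one_step(hypothesis: str, replies: Iterable[str]) -> Tuple[str, int]:
    """The confirmation grammar also accepts 'no, <value>' as a correction."""
    answers, turns = iter(replies), 0
    while True:
        turns += 1                             # "Did you say <hypothesis>?"
        reply = next(answers)
        if reply == "yes":
            return hypothesis, turns
        if reply.startswith("no, "):
            hypothesis = reply[len("no, "):]   # corrected value in the same turn
        else:                                  # bare "no": fall back to re-asking
            turns += 1
            hypothesis = next(answers)
```

If the recognizer misheard "Boston" as "Austin", the two-step strategy needs three system turns to recover. The one-step strategy needs two, provided the caller uses the "no, Boston" form.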

23 Programmatic Dialog Management
A generic program implementing a VUI specification:

    if (answer == "balance") {
        PlayPrompt("CheckingOrSavings.wav");
        Recognize(&answer, "CheckingOrSavings.grm");
        if (answer == "checking") {
            GetCheckingAccountBalance(&balance);
            PlayComplexPrompt("BalanceOfCheckingIs.wav", balance);
        } else if (answer == "savings") {
            GetSavingsAccountBalance(&balance);
            PlayComplexPrompt("BalanceOfSavingsIs.wav", balance);
        }
    } else if (answer == "transfer") {
        PlayPrompt("SayFromAccount.wav");
        Recognize(&answer, "CheckingOrSavings.grm");
        if (answer == "checking") {
            ...

24 Smart developers hate to do the same things over and over
- Build libraries of reusable functions
- Dialog modules
  - Handle the full collection of one or more pieces of information (e.g., credit card, SSN, date, ...)
  - Manage re-prompts, timeouts, disambiguation, data normalization, etc.
- Develop design patterns and styles
- Build sample code frameworks
  - State machine frameworks
  - Code examples, templates, ...
- State machine engines
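The following sketch shows a dialog module as a reusable function. It collects one piece of information, counts timeouts and no-matches up to a retry limit, and delegates validation and normalization to a pluggable function. It is written under stated assumptions: the `collect` signature, the retry limit of 3, and both normalizers are hypothetical, and recognition results are scripted. The same loop is reused for an SSN and for a yes/no answer.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

def collect(recognized: Iterable[Optional[str]],
            normalize: Callable[[str], Any],
            max_attempts: int = 3) -> Tuple[Any, List[str]]:
    """Reusable dialog module: keep asking until a valid, normalized value arrives.

    Each item of `recognized` is one attempt's recognition result: None stands
    for a timeout (no input); text that `normalize` maps to None is a no-match.
    Returns (value, log); value is None if every attempt failed.
    """
    log: List[str] = []
    results = iter(recognized)
    for attempt in range(1, max_attempts + 1):
        text = next(results, None)
        if text is None:
            log.append(f"timeout {attempt}")   # would trigger a timeout re-prompt
            continue
        value = normalize(text)
        if value is None:
            log.append(f"nomatch {attempt}")   # would trigger a retry re-prompt
            continue
        return value, log
    return None, log

def normalize_ssn(text: str) -> Optional[str]:
    """Keep digits only; accept exactly nine and format them as NNN-NN-NNNN."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}" if len(digits) == 9 else None

def normalize_yes_no(text: str) -> Optional[bool]:
    """Map a few spoken variants onto a boolean."""
    return {"yes": True, "yeah": True, "no": False, "nope": False}.get(text.strip().lower())
```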

25 State Machine Call Flow
- Call flow is the simplest state machine model.
- Nodes correspond to prompts.
- Arcs correspond to user choices.
- Nodes roughly correspond to the application state.

26 (Call-flow diagram)
Balance or transfer?
  balance → Which account?
    checking → Give checking balance
    savings → Give savings balance
  transfer → From account?
    checking → Amount? → Make checking to savings transfer
    savings → Amount? → Make savings to checking transfer

27 Call-flow authoring in commercial IVR platforms

28 Early call-flow tools had several limitations
- Topology often restricted to trees
- Limited functionality of nodes
- Limited conditional language
- No recursion, encapsulation, or scoping
- No inheritance of node properties
- Limited mechanisms for handling external variables
- GUI drag-and-drop development environment
- Difficult to handle mixed initiative

29 A common misconception
Finite-state dialog managers need a branch for each possible situation, so they cannot handle mixed initiative because of the combinatorial explosion.

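30 (Diagram: an explicit finite-state graph for a flight query with three slots: origin, dest, and time. From the opening prompt "What?", a separate branch exists for each combination of slots the caller might supply (origin; dest; time; origin & dest; origin & time; dest & time). Each branch is followed by questions for the slots still missing (Origin?, Destination?, Time?), and all branches converge on FLIGHT.)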

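31 (Diagram: the same flight dialog as a compact graph. From "What?", the system visits Origin?, Destination?, and Time? only when the corresponding slot is still missing (!origin, !dest, !time), then proceeds to FLIGHT.)
FIA: Form Interpretation Algorithm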
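The key idea of the VoiceXML Form Interpretation Algorithm is to select the first unfilled form item, collect an answer, process it, and repeat. This lets three questions cover every combination of slots a caller might volunteer. The sketch below is a deliberately simplified version of that loop: it ignores guard conditions, blocks, and event handling, which the real FIA also covers. The prompt strings come from the diagram; the data layout and function name are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

FORM_ITEMS = ["origin", "dest", "time"]          # visited in document order
ITEM_PROMPTS = {"origin": "Origin?", "dest": "Destination?", "time": "Time?"}

def run_form(utterances: Iterable[Dict[str, str]]) -> Tuple[List[str], Dict[str, str]]:
    """Simplified select/collect/process loop in the spirit of the VoiceXML FIA.

    Each utterance is the set of slots the user supplied in one turn; a
    form-level grammar may fill several slots at once (mixed initiative).
    """
    played = ["What?"]                 # open initial prompt
    form: Dict[str, str] = {}
    turns = iter(utterances)
    while True:
        form.update(next(turns))       # collect and process the answer
        missing = [item for item in FORM_ITEMS if item not in form]
        if not missing:                # every item filled: look up the FLIGHT
            return played, form
        played.append(ITEM_PROMPTS[missing[0]])   # select: first unfilled item
```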

32 Rule Based Authoring
Rules for the flight dialog (slots origin, dest, time; goal FLIGHT):
!origin → ask_origin
!dest → ask_dest
!time → ask_time
nprompts = 0 → retrieve_flights
(Diagram: the actions ask_origin, ask_dest, and ask_time drawn as states, with transitions labeled !origin, !dest, !time and control outcomes RETURN, CONTINUE, END, STOP.)
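The rules can be held as data: a list of (condition, action) pairs evaluated against the dialog context. The semantics below are one plausible interpretation, not necessarily the authors'. Each ask action adds a request to the system turn and increments `nprompts`, so `retrieve_flights` fires only when nothing was missing. The encoding and the function name are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding of the slide's rules as (condition, action) pairs.
RULES: List[Tuple[Callable[[Dict], bool], str]] = [
    (lambda ctx: "origin" not in ctx, "ask_origin"),
    (lambda ctx: "dest" not in ctx, "ask_dest"),
    (lambda ctx: "time" not in ctx, "ask_time"),
    (lambda ctx: ctx["nprompts"] == 0, "retrieve_flights"),
]

def fire_rules(slots: Dict[str, str]) -> List[str]:
    """Evaluate every rule once, in order, and return the actions that fire.

    Assumption: ask_* actions add a request to the system turn and increment
    nprompts, so retrieve_flights fires only when no slot was missing.
    """
    ctx = dict(slots, nprompts=0)
    actions = []
    for condition, action in RULES:
        if condition(ctx):
            actions.append(action)
            if action.startswith("ask_"):
                ctx["nprompts"] += 1
    return actions
```

Under this reading, a caller who gives only the destination triggers ask_origin and ask_time in the same turn. Once all three slots are filled, retrieve_flights is the only action that fires.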

33 Bottom line
- A finite-state dialog controller is more powerful than we thought.
  - It can handle mixed-initiative dialogs.
- Applications can be authored in different forms:
  - Some do not put constraints on the topology of the state machine (e.g., "call flow").
  - Others use specific topologies (e.g., rules, FIA).
- VUI completeness can be managed.
- Simple things are easy. Are difficult things possible?

34 The complexity of dialog systems
(Chart) COMPLEXITY: LOW, MEDIUM, HIGH
Application classes: INFORMATIONAL, TRANSACTIONAL, PROBLEM SOLVING
Example applications: flight status, stock trading, package tracking, flight reservation, banking, customer care, technical support

35 Building more complex applications: Inference Based Dialog Managers
- Use an inference engine with a defined behavior
- Describe the application model rather than the user experience
- Infer the user goal and create plans for system actions

36 Example of Inference Based Dialog Management
application → book_flight OR book_hotel OR book_train
book_flight → get_dep_date AND ((get_itinerary AND get_time) OR get_flight_ID) AND get_airline
get_itinerary → get_origin AND get_destination
book_hotel → get_date_in AND get_date_out AND get_room_type AND get_city ...

37 Example of Inference Based Dialog Management
(Same rules as slide 36, with get_room_size in place of get_room_type.)
User: "I want to leave on May 3rd from New York."
Extracted concepts: DEP_DATE (May 3rd), ORIGIN (New York)

38 Example of Inference Based Dialog Management
(Same rules as slide 36, with get_room_size in place of get_room_type.)
User: "I want to leave on May 3rd from New York." → ORIGIN, DEP_DATE
STACK and AGENDA of pending goals: get_dest, get_time, get_airline
System prompts: "Where do you want to go?", "Which airline?"
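The agenda can be derived by treating the rules as an AND/OR goal tree and asking which concepts are still needed, given what the user has supplied. In the sketch below, the tree and concept names are transcribed from the rules; `book_train` is elided in the rules and left out. The OR policy is an assumption: pursue the alternative with the most concepts already filled, keeping the first one on ties. With DEP_DATE and ORIGIN known, this policy reproduces the agenda shown: get_dest, get_time, get_airline.

```python
from typing import List, Set, Tuple, Union

# A goal is a concept to collect (str) or ("AND" | "OR", [subgoals]).
Goal = Union[str, Tuple[str, list]]

ITINERARY: Goal = ("AND", ["origin", "dest"])
BOOK_FLIGHT: Goal = ("AND", ["dep_date",
                             ("OR", [("AND", [ITINERARY, "time"]), "flight_ID"]),
                             "airline"])
BOOK_HOTEL: Goal = ("AND", ["date_in", "date_out", "room_type", "city"])
APPLICATION: Goal = ("OR", [BOOK_FLIGHT, BOOK_HOTEL])   # book_train omitted

def progress(goal: Goal, known: Set[str]) -> int:
    """Number of concepts under `goal` that the user has already supplied."""
    if isinstance(goal, str):
        return int(goal in known)
    return sum(progress(g, known) for g in goal[1])

def missing(goal: Goal, known: Set[str]) -> List[str]:
    """Concepts still needed to satisfy `goal`, in left-to-right order."""
    if isinstance(goal, str):
        return [] if goal in known else [goal]
    op, subgoals = goal
    if op == "AND":
        return [c for g in subgoals for c in missing(g, known)]
    # OR: satisfied if any alternative is; otherwise pursue the alternative
    # with the most progress (max() keeps the first one on ties).
    if any(not missing(g, known) for g in subgoals):
        return []
    return missing(max(subgoals, key=lambda g: progress(g, known)), known)

def agenda(known: Set[str]) -> List[str]:
    """Turn the missing concepts into get_* actions for the dialog agenda."""
    return ["get_" + c for c in missing(APPLICATION, known)]
```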

39 Roadblocks to Deploying Engine Based Models in Commercial Applications
- VUI completeness and predictability of engine behavior
- Difficult things are possible, but are simple things easy?
- Application independence
- Developer training
- Fine tuning of the VUI
- Mapping to and from VUI specs, and round-tripping

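40 The Innovator's Dilemma
(Chart: PERFORMANCE vs. TIME, showing the MARKET NEEDS trend, a SUSTAINING TECHNOLOGY curve, and a DISRUPTIVE TECHNOLOGY curve. Example: optical photography (sustaining) vs. digital photography (disruptive).)
Source: Clayton M. Christensen, The Innovator's Dilemma, 1997.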

41 Speech Technology and the Innovator's Dilemma
(Chart: PERFORMANCE vs. TIME, showing MARKET NEEDS, RESEARCH SYSTEMS, and COMMERCIAL SYSTEMS curves, with TODAY marked.)

