Lecture 3: October 5, 2004 Dan Jurafsky


LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing
Lecture 3: October 5, 2004
Dan Jurafsky
11/16/2018 LING 138/238 Autumn 2004

Week 2: Dialogue and Conversational Agents
  Examples of spoken language systems
  Components of a dialogue system, focus on these 3:
    ASR
    NLU
    Dialogue management
  VoiceXML
  Grounding and Confirmation

Conversational Agents
  AKA:
    Spoken Language Systems
    Dialogue Systems
    Speech Dialogue Systems
  Applications:
    Travel arrangements (Amtrak, United Airlines)
    Telephone call routing
    Tutoring
    Communicating with robots
    Anything with limited screen/keyboard

A travel dialog: Communicator

Call routing: AT&T HMIHY (How May I Help You)

A tutorial dialogue: ITSPOKE

Dialogue System Architecture
  Simplest possible architecture: ELIZA
    Read-search/replace-print loop
  We’ll need something with more sophisticated dialogue control
  And speech
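The read-search/replace-print loop can be sketched in a few lines of Python. This is only an illustration of the idea, not the original ELIZA script: the three rules in RULES, the DEFAULT_REPLY text, and the function names respond and repl are all made up for this example.

```python
import re

# Hypothetical ELIZA-style rules: (pattern to search for, reply template).
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."


def respond(utterance: str) -> str:
    """Search for the first matching pattern and substitute into its template."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # Strip trailing punctuation so the reply reads naturally.
            groups = [g.rstrip(".!?") for g in match.groups()]
            return template.format(*groups)
    return DEFAULT_REPLY


def repl() -> None:
    """Read a line, search/replace, print the reply; no state is kept."""
    while True:
        try:
            line = input("> ")
        except EOFError:
            break
        if line.strip().lower() in {"quit", "bye"}:
            break
        print(respond(line))
```

Calling repl() starts an interactive session. Note that respond looks only at the current utterance, which is exactly the limitation the slide points to.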

Dialogue System Architecture

ASR engine
  ASR = Automatic Speech Recognition
  Job of ASR system is to go from speech (telephone or microphone) to words
  We will be studying this in a few weeks

ASR Overview (pic from Yook 2003)

ASR in Dialogue Systems
  ASR systems work better if we can constrain what words the speaker is likely to say.
  A dialogue system often has these constraints:
    System: What city are you departing from?
  Can expect sentences of the form
    I want to (leave|depart) from [CITYNAME]
    From [CITYNAME]
    [CITYNAME]
    etc.

ASR in Dialogue Systems
  Also, can adapt to speaker
  But! ASR is errorful
  So unlike ELIZA, can’t count on the words being correct
  As we will see, this fact about error plays a huge role in dialogue system design

Natural Language Understanding
  Also called NLU
  We will discuss this later in the quarter
  There are many ways to represent the meaning of sentences
  For speech dialogue systems, perhaps the most common is a simple one called “Frame and slot semantics”.
  Semantics = meaning

An example of a frame
  Show me morning flights from Boston to SF on Tuesday.
  SHOW:
    FLIGHTS:
      ORIGIN:
        CITY: Boston
        DATE: Tuesday
        TIME: morning
      DEST:
        CITY: San Francisco
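One way to hold such a frame in a program is a nested dictionary. The sketch below mirrors the slot names on the slide; the helper get_slot is a hypothetical convenience function, not part of any dialogue toolkit.

```python
from typing import Any, Dict, Optional

# The frame for "Show me morning flights from Boston to SF on Tuesday."
frame: Dict[str, Any] = {
    "SHOW": {
        "FLIGHTS": {
            "ORIGIN": {"CITY": "Boston", "DATE": "Tuesday", "TIME": "morning"},
            "DEST": {"CITY": "San Francisco"},
        }
    }
}


def get_slot(frame: Dict[str, Any], *path: str) -> Optional[Any]:
    """Follow a path of slot names; return None if any slot is unfilled."""
    node: Any = frame
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node
```

For example, get_slot(frame, "SHOW", "FLIGHTS", "DEST", "CITY") looks up the destination city, and asking for a slot that was never filled returns None rather than raising an error.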

How to generate this semantics?
  Many methods, as we will see in week 9
  Simplest: semantic grammars
    LIST -> show me | I want | can I see | …
    DEPARTTIME -> (after|around|before) HOUR | morning | afternoon | evening
    HOUR -> one|two|three…|twelve (am|pm)
    FLIGHTS -> (a) flight | flights
    ORIGIN -> from CITY
    DESTINATION -> to CITY
    CITY -> Boston | San Francisco | Denver | Washington

Semantics for a sentence
  LIST          Show me
  FLIGHTS       flights
  ORIGIN        from Boston
  DESTINATION   to San Francisco
  DEPARTDATE    on Tuesday
  DEPARTTIME    morning

Frame-filling
  We use a parser (week 10) to take these rules and apply them to the sentence.
  Resulting in a semantics for the sentence
  We can then write some simple code
    That takes the semantically labeled sentence
    And fills in the frame.
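As a rough illustration of the "simple code" step, the sketch below fills a flat frame directly from a sentence. It swaps the parser for regular-expression matching, which is a simplification and not the method the lecture describes. The CITY list comes from the semantic grammar in this lecture; the DAY list, the spelled-out HOUR list, the flat slot layout, and the name fill_frame are assumptions made for this example.

```python
import re
from typing import Dict

# CITY comes from the semantic grammar; DAY and the spelled-out HOUR
# alternatives are assumed here for illustration.
CITY = r"(?:Boston|San Francisco|Denver|Washington)"
DAY = r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"
HOUR = (r"(?:one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve)"
        r"(?: (?:am|pm))?")

# One pattern per slot; group 1 captures the slot value.
SLOT_PATTERNS = {
    "ORIGIN": rf"\bfrom ({CITY})\b",
    "DESTINATION": rf"\bto ({CITY})\b",
    "DEPARTDATE": rf"\bon ({DAY})\b",
    "DEPARTTIME": rf"\b((?:after|around|before) {HOUR}|morning|afternoon|evening)\b",
}


def fill_frame(sentence: str) -> Dict[str, str]:
    """Return a frame containing only the slots the sentence mentions."""
    frame: Dict[str, str] = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = re.search(pattern, sentence, re.IGNORECASE)
        if match:
            frame[slot] = match.group(1)
    return frame
```

Slots the user did not mention are simply absent from the result, which is what a dialogue manager needs in order to decide what to ask next.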

Other NLU Approaches
  Cascade of Finite-State Transducers
    Instead of a parser, we could use FSTs, which are very fast, to create the semantics.
  Or we could use “Syntactic rules with semantic attachments”
    This latter is what is done in VoiceXML, so we will see that today.

Generation and TTS
  Won’t say much about this today
  TTS next week!
  Generation: two main approaches
    Simple templates (prescripted sentences)
    Unification: use similar grammar rules as for parsing, but run them backwards!

Dialogue Manager
  ELIZA was the simplest dialogue manager
    Read-search/replace-print loop
    No state was kept; system did the same thing on every sentence
  A real dialogue manager needs to keep state
  We can’t keep asking the same question over and over!

Three architectures for dialogue management
  Finite State
  Frame-based
  Planning Agents

Finite State Dialogue Manager

Finite-state dialogue managers
  System completely controls the conversation with the user.
  It asks the user a series of questions
  Ignoring (or misinterpreting) anything the user says that is not a direct answer to the system’s questions
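A finite-state manager of this kind can be sketched as a table of states, each with one prompt and one successor. Everything named below (the FSA table, the CITIES and DAYS vocabularies, parse_city, parse_day, run) is hypothetical, and typed strings stand in for ASR output.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

CITIES = {"boston", "denver", "san francisco"}
DAYS = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}


def parse_city(text: str) -> Optional[str]:
    t = text.strip().lower()
    return t if t in CITIES else None


def parse_day(text: str) -> Optional[str]:
    t = text.strip().lower()
    return t if t in DAYS else None


# state -> (prompt, parser for a direct answer, next state or None when done)
FSA: Dict[str, Tuple[str, Callable[[str], Optional[str]], Optional[str]]] = {
    "origin": ("What city are you leaving from?", parse_city, "dest"),
    "dest": ("Where are you going?", parse_city, "date"),
    "date": ("What day would you like to leave?", parse_day, None),
}


def run(utterances: Iterable[str], start: str = "origin") -> Tuple[Dict[str, str], List[str]]:
    """Drive the dialogue; return the collected answers and the prompts issued."""
    answers: Dict[str, str] = {}
    prompts: List[str] = []
    state: Optional[str] = start
    heard = iter(utterances)
    while state is not None:
        prompt, parser, next_state = FSA[state]
        prompts.append(prompt)
        try:
            utterance = next(heard)
        except StopIteration:
            break
        value = parser(utterance)
        if value is not None:
            answers[state] = value
            state = next_state
        # Anything that is not a direct answer is ignored and the
        # same question is asked again.
    return answers, prompts
```

With the input ["Boston", "I want to go Tuesday", "Denver", "tuesday"], the second utterance is discarded because the system is in the dest state and only accepts a city, so "Where are you going?" is asked twice.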

Dialogue Initiative
  “Initiative” means who has control of the conversation at any point
  Single initiative
    System
    User
  Mixed initiative

System Initiative
  Systems which completely control the conversation at all times are called system initiative.
  Advantages:
    Simple to build
    User always knows what they can say next
    System always knows what user can say next
      Known words: Better performance from ASR
      Known topic: Better performance from NLU
  Disadvantage:
    Too limited

User Initiative
  User directs the system
  Generally, user asks a single question, system answers
  System can’t ask questions back, engage in clarification dialogue, confirmation dialogue
  Used for simple database queries
  User asks question, system gives answer
  Web search is user initiative dialogue.

Problems with System Initiative
  Real dialogue involves give and take!
  In travel planning, users might want to say something that is not the direct answer to the question.
  For example, answering more than one question in a sentence:
    Hi, I’d like to fly from Seattle Tuesday morning
    I want a flight from Milwaukee to Orlando one way leaving after 5 p.m. on Wednesday.

Single initiative + universals
  We can give users a little more flexibility by adding universal commands
  Universals: commands you can say anywhere
  As if we augmented every state of FSA with these
    Help
    Correct
  This describes many implemented systems
  But still doesn’t deal with mixed initiative

Mixed Initiative
  Conversational initiative can shift between system and user
  Simplest kind of mixed initiative: use the structure of the frame itself to guide dialogue
    Slot        Question
    ORIGIN      What city are you leaving from?
    DEST        Where are you going?
    DEPT DATE   What day would you like to leave?
    DEPT TIME   What time would you like to leave?
    AIRLINE     What is your preferred airline?

Frames are mixed-initiative
  User can answer multiple questions at once.
  System asks questions of user, filling any slots that user specifies
  When frame is filled, do database query
  If user answers 3 questions at once, system has to fill slots and not ask these questions again!
  Anyhow, we avoid the strict constraints on order of the finite-state architecture.
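The control strategy described here fits in a few lines: keep a frame, merge in whatever slots the user supplied, and ask about the first slot that is still empty. The slot names and questions below follow the slot/question pairs given in this lecture (with underscores added to DEPT_DATE and DEPT_TIME); the function names update and next_question are made up for this sketch, and the NLU step that produces the understood slots is assumed to exist elsewhere.

```python
from typing import Dict, List, Optional, Tuple

# Slot/question pairs from the lecture, in the order the system asks them.
SLOT_QUESTIONS: List[Tuple[str, str]] = [
    ("ORIGIN", "What city are you leaving from?"),
    ("DEST", "Where are you going?"),
    ("DEPT_DATE", "What day would you like to leave?"),
    ("DEPT_TIME", "What time would you like to leave?"),
    ("AIRLINE", "What is your preferred airline?"),
]


def update(frame: Dict[str, str], understood: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Merge every slot the user just specified, however many there are."""
    merged = dict(frame)
    merged.update({slot: value for slot, value in understood.items() if value is not None})
    return merged


def next_question(frame: Dict[str, str]) -> Optional[str]:
    """Ask about the first unfilled slot; None means the frame is complete."""
    for slot, question in SLOT_QUESTIONS:
        if slot not in frame:
            return question
    return None
```

If the user opens with "I'd like to fly from Seattle Tuesday morning", three slots are filled at once and next_question moves straight to the destination. When it returns None, the frame is complete and the database query can run.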

Multiple frames
  flights, hotels, rental cars
  Flight legs: Each flight can have multiple legs, which might need to be discussed separately
  Presenting the flights (if there are multiple flights meeting the user’s constraints)
    It has slots like 1ST_FLIGHT or 2ND_FLIGHT so the user can ask “how much is the second one”
  General route information:
    Which airlines fly from Boston to San Francisco?
  Airfare practices:
    Do I have to stay over Saturday to get a decent airfare?

Multiple Frames
  Need to be able to switch from frame to frame
  Based on what user says.
  Disambiguate which slot of which frame an input is supposed to fill, then switch dialogue control to that frame.

VoiceXML
  Voice eXtensible Markup Language
  An XML-based dialogue design language
  Makes use of ASR and TTS
  Deals well with simple, frame-based mixed initiative dialogue.
  Most common in commercial world (too limited for research systems)
  But useful to get a handle on the concepts.

VoiceXML
  Each dialogue is a <form>. (Form is the VoiceXML word for frame)
  Each <form> generally consists of a sequence of <field>s, with other commands

Sample vxml doc
  <form>
    <field name="transporttype">
      <prompt> Please choose airline, hotel, or rental car. </prompt>
      <grammar type="application/x-nuance-gsl">
        [airline hotel "rental car"]
      </grammar>
    </field>
    <block>
      <prompt> You have chosen <value expr="transporttype"/>. </prompt>
    </block>
  </form>

VoiceXML interpreter
  Walks through a VXML form in document order
  Iteratively selecting each item
  If multiple fields, visit each one in order.
  Special commands for events

Another vxml doc (1)
  <noinput>
    I'm sorry, I didn't hear you. <reprompt/>
  </noinput>
  <nomatch>
    I'm sorry, I didn't understand that. <reprompt/>
  </nomatch>

Another vxml doc (2)
  <form>
    <block> Welcome to the air travel consultant. </block>
    <field name="origin">
      <prompt> Which city do you want to leave from? </prompt>
      <grammar type="application/x-nuance-gsl">
        [(san francisco) denver (new york) barcelona]
      </grammar>
      <filled>
        <prompt> OK, from <value expr="origin"/> </prompt>
      </filled>
    </field>

Another vxml doc (3)
    <field name="destination">
      <prompt> And which city do you want to go to? </prompt>
      <grammar type="application/x-nuance-gsl">
        [(san francisco) denver (new york) barcelona]
      </grammar>
      <filled>
        <prompt> OK, to <value expr="destination"/> </prompt>
      </filled>
    </field>
    <field name="departdate" type="date">
      <prompt> And what date do you want to leave? </prompt>
      <filled>
        <prompt> OK, on <value expr="departdate"/> </prompt>
      </filled>
    </field>

Another vxml doc (4)
    <block>
      <prompt> OK, I have you are departing from <value expr="origin"/>
        to <value expr="destination"/> on <value expr="departdate"/>
      </prompt>
      send the info to book a flight...
    </block>
  </form>

A mixed initiative VXML doc
  Mixed initiative: user might answer a different question
  So VoiceXML interpreter can’t just evaluate each field of form in order
  User might answer field2 when system asked field1
  So need grammar which can handle all sorts of input:
    Field1
    Field2
    Field1 and field2
    etc.

VXML Nuance-style grammars
  Rewrite rules
    Wantsentence -> I want to (fly|go)
  Nuance VXML format is:
    () for concatenation, [] for disjunction
  Each rule has a name:
    Wantsentence (I want to [fly go])
    Airports [(san francisco) denver]
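To see how the two operators combine, here is a small Python matcher over the same idea. It is not a GSL parser and uses no Nuance API. Instead, Python tuples stand in for () concatenation, Python lists stand in for [] disjunction, and a string is a word or a fixed sequence of words. The names match, accepts, WANTSENTENCE and AIRPORTS are chosen for this example.

```python
from typing import List, Set, Tuple, Union

Expr = Union[str, Tuple, List]

# Tuples play the role of (), lists the role of [].
WANTSENTENCE: Expr = ("i want to", ["fly", "go"])
AIRPORTS: Expr = [("san", "francisco"), "denver"]


def match(expr: Expr, tokens: List[str], start: int = 0) -> Set[int]:
    """Return every token position at which expr can end when begun at start."""
    if isinstance(expr, str):
        words = expr.split()
        end = start + len(words)
        return {end} if tokens[start:end] == words else set()
    if isinstance(expr, list):
        # Disjunction: any one alternative may match.
        ends: Set[int] = set()
        for alternative in expr:
            ends |= match(alternative, tokens, start)
        return ends
    if isinstance(expr, tuple):
        # Concatenation: each part must match where the previous one ended.
        positions = {start}
        for part in expr:
            positions = {e for p in positions for e in match(part, tokens, p)}
        return positions
    raise TypeError(f"unsupported grammar element: {expr!r}")


def accepts(expr: Expr, sentence: str) -> bool:
    """True if the whole sentence is covered by the expression."""
    tokens = sentence.lower().split()
    return len(tokens) in match(expr, tokens)
```

Here accepts(WANTSENTENCE, "I want to fly") holds, while "I want to swim" is rejected because neither alternative of the disjunction matches the last word.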

Mixed-init VXML example (3)
  <noinput>
    I'm sorry, I didn't hear you. <reprompt/>
  </noinput>
  <nomatch>
    I'm sorry, I didn't understand that. <reprompt/>
  </nomatch>
  <form>
    <grammar type="application/x-nuance-gsl">
    <![CDATA[

Grammar
  Flight ( ?[
            (i [wanna (want to)] [fly go])
            (i'd like to [fly go])
            ([(i wanna)(i'd like a)] flight)
          ]
          [
            ( [from leaving departing] City:x) {<origin $x>}
            ( [(?going to)(arriving in)] City:x) {<dest $x>}
            ( [from leaving departing] City:x
              [(?going to)(arriving in)] City:y) {<origin $x> <dest $y>}
          ]
          ?please
        )

Grammar
  City [ [(san francisco) (s f o)] {return( "san francisco, california")}
         [(denver) (d e n)] {return( "denver, colorado")}
         [(seattle) (s t x)] {return( "seattle, washington")}
       ]
    ]]>
    </grammar>

Grammar
    <initial name="init">
      <prompt> Welcome to the air travel consultant. What are your travel plans? </prompt>
    </initial>
    <field name="origin">
      <prompt> Which city do you want to leave from? </prompt>
      <filled>
        <prompt> OK, from <value expr="origin"/> </prompt>
      </filled>
    </field>

Grammar
    <field name="dest">
      <prompt> And which city do you want to go to? </prompt>
      <filled>
        <prompt> OK, to <value expr="dest"/> </prompt>
      </filled>
    </field>
    <block>
      <prompt> OK, I have you are departing from <value expr="origin"/>
        to <value expr="dest"/>.
      </prompt>
      send the info to book a flight...
    </block>
  </form>

Grounding and Confirmation
  Dialogue is a collective act performed by speaker and hearer
  Common ground: set of things mutually believed by both speaker and hearer
  Need to achieve common ground, so hearer must ground or acknowledge speaker’s utterance.
  Clark (1996): Principle of closure.
    Agents performing an action require evidence, sufficient for current purposes, that they have succeeded in performing it

Clark and Schaefer: Grounding
  Continued attention: B continues attending to A
  Relevant next contribution: B starts in on next relevant contribution
  Acknowledgement: B nods or says continuer like uh-huh, yeah, assessment (great!)
  Demonstration: B demonstrates understanding A by paraphrasing or reformulating A’s contribution, or by collaboratively completing A’s utterance
  Display: B displays verbatim all or part of A’s presentation

Grounding examples
  Display:
    C: I need to travel in May
    A: And, what day in May did you want to travel?
  Acknowledgement:
    C: He wants to fly from Boston
    A: mm-hmm
    C: to Baltimore Washington International

Grounding Examples (2)
  Acknowledgement + next relevant contribution
    And, what day in May did you want to travel?
    And you’re flying into what city?
    And what time would you like to leave?

Grounding and Dialogue Systems
  Grounding is not just a tidbit about humans
  Is key to design of conversational agent
  Why?
    HCI researchers find users of speech-based interfaces are confused when system doesn’t give them an explicit acknowledgement signal
    Experiment with this

Confirmation
  Another reason for grounding
  Speech is a pretty errorful channel
    Hearer could misinterpret the speaker
  This is important in Conv. Agents
    Since we are using ASR, which is still really buggy.
  So we need to do lots of grounding and confirmation

Explicit confirmation
  S: Which city do you want to leave from?
  U: Baltimore
  S: Do you want to leave from Baltimore?
  U: Yes

Explicit confirmation
  U: I’d like to fly from Denver Colorado to New York City on September 21st in the morning on United Airlines
  S: Let’s see then. I have you going from Denver Colorado to New York on September 21st. Is that correct?
  U: Yes

Implicit confirmation: display
  U: I’d like to travel to Berlin
  S: When do you want to travel to Berlin?

  U: Hi I’d like to fly to Seattle Tuesday morning
  S: Traveling to Seattle on Tuesday, August eleventh in the morning. Your name?

Implicit vs. Explicit
  Complementary strengths
  Explicit: easier for users to correct system’s mistakes (can just say “no”)
  But explicit is cumbersome and long
  Implicit: much more natural, quicker, simpler (if system guesses right).

Implicit and Explicit
  Early systems: all-implicit or all-explicit
  Modern systems: adaptive
  How to decide?
    ASR system can give confidence metric.
    This expresses how convinced system is of its transcription of the speech
    If high confidence, use implicit confirmation
    If low confidence, use explicit confirmation
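The adaptive policy amounts to a threshold test on the recognizer's confidence score. In the sketch below the threshold of 0.8, the 0-to-1 confidence scale, and the function name confirm are assumptions for illustration; real ASR engines report confidence in their own ways, and a deployed system would tune the threshold.

```python
def confirm(city: str, confidence: float, next_question: str,
            threshold: float = 0.8) -> str:
    """Choose a confirmation strategy from ASR confidence (assumed in [0, 1])."""
    if confidence >= threshold:
        # High confidence: implicit confirmation. Display what was
        # understood and move on to the next question.
        return f"Traveling to {city}. {next_question}"
    # Low confidence: explicit confirmation. Ask a yes/no question
    # before going any further.
    return f"Do you want to travel to {city}?"
```

With a confident recognition of "Berlin" the system says "Traveling to Berlin." and immediately asks its next question. With a doubtful one it stops and asks "Do you want to travel to Berlin?", which costs an extra turn but lets the user fix the error with a plain "no".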

Next Lecture
  Dialogue acts
  More on VXML
  More on design of dialogue agents
  Evaluation of dialogue agents
  Don’t forget to look at the homework early!