Asking Questions to Limited Domain Virtual Characters: How Good Does Speech Recognition Have to Be? Dr. Anton Leuski, 2LT Brandon Kennedy, Ronak Patel, Dr. David Traum
Outline
Question-Answering Characters
Sgt Blackwell
Answer selection mechanisms
Research questions
–How good are the responses?
–What is the impact of imperfect ASR?
Experiment and Results
Summary, Future Work, & Final Thoughts
Question-answering characters
Q&A dialogue
–Focus on information and social interaction
Simulate person answering question, e.g.:
–From reporter
–From interviewer
–From police interrogator
–Different from Question-answering system
  Give appropriate answer rather than correct information
–Different from believable characters
  Focus on simulation of question-answering process rather than Turing test
Uses for Q&A characters
–Simulation
–Training
–Games
Examples of ICT Question-answering Characters
Be a Reporter
C3IT/TACQ: Raed
Sgt Blackwell
Technology demo for ASC 24
Highlights:
–Life-sized, mixed reality
  Trans-screen
–High-production quality
  Rendering (> 60K polygons)
  Voice
  Authored Text
Robust responsiveness
–Speech recognition
–Speech and non-verbal reply
–Limited domain of interaction: responding to interview/Q&A
Virtual Character Creation: Data-driven method
1. Collect data (questions)
   a) Scripted
   b) Paraphrases
   c) Wizard of Oz
   d) System
2. Annotate data
   – Pick appropriate answers
   – Rate level of appropriateness
3. Train statistical algorithms
4. Integrate in system
5. Evaluate = (1d)
Sgt Blackwell Dialogue Model
Set of pre-constructed answers
–In domain
–Off-topic
–Prompt
Local history
IR-based classification
–Given (possibly previously unseen) question, map to best answer
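The slide does not say how the answer categories and the local history interact. As a minimal sketch, assuming a score threshold decides between an in-domain answer and an off-topic line, and assuming the history is used to avoid repeating recent lines, the selection step might look like the following. The function name `choose_response`, the `threshold` value, and the history policy are all hypothetical, not taken from the presentation.

```python
from collections import deque

def choose_response(scored_answers, off_topic_lines, history, threshold=0.5):
    """Pick the best-scoring answer not spoken recently, else an off-topic line.

    scored_answers: list of (answer_text, score); higher score = better match.
    off_topic_lines: non-empty list of fallback lines.
    history: deque of recently spoken lines (mutated in place).
    threshold: hypothetical cut-off below which no answer counts as a match.
    """
    ranked = sorted(scored_answers, key=lambda pair: pair[1], reverse=True)
    for answer, score in ranked:
        if score < threshold:
            break  # all remaining candidates score below the cut-off as well
        if answer not in history:
            history.append(answer)
            return answer
    # No acceptable answer: use the first off-topic line not spoken recently,
    # or the first off-topic line if all of them were used recently.
    fallback = next(
        (line for line in off_topic_lines if line not in history),
        off_topic_lines[0],
    )
    history.append(fallback)
    return fallback
```

A bounded `deque(maxlen=...)` keeps the history local, so an answer becomes available again once it drops out of the window.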
Methods for Computing Responses to Questions
Classification - Response selection
Extraction - Template selection - Template filling
Parsing/Interpretation - Inference - Generation
Some Word-based Classification Approaches
Key-word spotting
Bayesian Classification
Latent Semantic Analysis (LSA)
Support-Vector Machines (SVM)
Relevance Model Retrieval
Cross-language Relevance Model
Relevance Model
Relevance model: P(w|R) - probability that a random word from an appropriate answer is w
P(w|R) ≈ P(w|Q)
Estimate P(w|Q) - probability of observing word w in an answer, given the question
Estimate P(w|A) - probability of observing word w in an answer, given the answer
Compare the two probability distributions
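The slide does not name the comparison measure. A minimal sketch, assuming the two word distributions are compared with Kullback-Leibler divergence and that P(w|A) is an additively smoothed unigram model of the answer text, could look like this. The helper names, the smoothing constant, and the choice of KL divergence are assumptions for illustration; how P(w|Q) is estimated is left to the caller.

```python
import math
from collections import Counter

def unigram_model(text, vocabulary, smoothing=0.1):
    """Additively smoothed unigram distribution over `vocabulary` (assumed estimator)."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) + smoothing * len(vocabulary)
    return {w: (counts[w] + smoothing) / total for w in vocabulary}

def kl_divergence(p, q):
    """D(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def rank_answers(p_w_given_q, answers, vocabulary):
    """Sort answers from closest to farthest from the question's word distribution."""
    scored = [
        (kl_divergence(p_w_given_q, unigram_model(answer, vocabulary)), answer)
        for answer in answers
    ]
    return [answer for _, answer in sorted(scored)]
```

Smoothing keeps every probability non-zero, so the logarithm in the divergence is always defined.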
Estimate P(w|Q) v. 1
Approach 1: consider answers as class labels (ignore content of answer)
Combine all questions for an answer into a pseudo-answer
Compare a test question to each pseudo-answer and select the best match
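Approach 1 can be sketched as a small classifier. The matching function is not given on the slide; the sketch below assumes the test question is scored by its log-likelihood under a smoothed unigram model of each pseudo-answer. All names and the smoothing constant are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_pseudo_answers(training_pairs):
    """Concatenate the tokens of all training questions linked to each answer label."""
    pseudo = defaultdict(list)
    for question, answer_id in training_pairs:
        pseudo[answer_id].extend(question.lower().split())
    return dict(pseudo)

def log_likelihood(question, tokens, vocab_size, smoothing=0.1):
    """Log-probability of the question under a smoothed unigram model of `tokens`."""
    counts = Counter(tokens)
    denominator = len(tokens) + smoothing * vocab_size
    return sum(
        math.log((counts[w] + smoothing) / denominator)
        for w in question.lower().split()
    )

def classify(question, pseudo_answers):
    """Return the answer label whose pseudo-answer best explains the question."""
    vocabulary = {w for tokens in pseudo_answers.values() for w in tokens}
    vocabulary |= set(question.lower().split())
    return max(
        pseudo_answers,
        key=lambda label: log_likelihood(question, pseudo_answers[label], len(vocabulary)),
    )
```

Because the answer text is never read, this variant only works for answers that already have training questions attached.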
Estimate P(w|Q) v. 2
Approach 2: Answer text matters!
Questions and answers are two different languages
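The slide gives no formula for Approach 2. One way to make the "two languages" idea concrete, assuming the usual cross-language relevance model formulation, is to treat the training set as paired texts (Q_j, A_j) and estimate the answer-word distribution for a new question Q = q_1 ... q_m as below. The notation is an assumption for illustration: \pi_{Q_j} and \pi_{A_j} stand for smoothed unigram models of the j-th training question and of its linked answer.

```latex
P(w \mid Q) \;\approx\;
\frac{\sum_{j} \pi_{A_j}(w) \prod_{i=1}^{m} \pi_{Q_j}(q_i)}
     {\sum_{j} \prod_{i=1}^{m} \pi_{Q_j}(q_i)}
```

Each training pair contributes its answer's word probabilities, weighted by how well its question explains the words of the new question, so answer vocabulary can be predicted from question vocabulary even when the two share few words.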
Sgt Blackwell Evaluation
Questions:
1. How good are the responses to expected questions?
2. How good are the responses to open questions?
3. How much do speech recognition errors affect performance?
Experiment
Data:
–20 users asked Sgt Blackwell 20 questions each:
  10 given, in domain
  10 composed by user
Metrics
–Accuracy of speech recognizer & classifier
–Appropriateness of replies (including to unseen and out-of-domain questions)
  Answers rated for relevance (scale from 1-6)
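Speech recognizer accuracy is reported later in the deck as ASR WER (word error rate). As a minimal sketch, assuming the standard definition (word-level edit distance between the spoken question and the recognizer output, divided by the number of spoken words), WER can be computed as follows. The function name and the lower-casing and whitespace tokenization are illustrative choices, not details from the presentation.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edit distance between the ref words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

For instance, the spoken question "Are you hard of hearing" recognized as "Are you had a hearing" has two substituted words out of five, so its WER is 0.4.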
Example Session: user questions (speech recognizer output in parentheses)
Q Who created you (Who created you)
Tell you the truth, I have spent a lot of time with the smart folks at ICT, testing and tweaking. I am about ready to get out there and train with real soldiers. I am thinking first stop is going to be Fort Sill. But my ultimate goal is to get forward deployed...
Q Why should we spend money on you (What should we spend money on you)
Why do you need me?! What you should be saying is "How did you get along without me?" I will show you how to be a leader, how to make critical decisions under stress... And I am glad to be here, because I can help. I know I can. Look... I have one mission: to make better soldiers, make better warfighters. And if I can... can save one life... bring one loved one back to his family... Well, then, I am definitely worth it. You be the judge.
Q What's your value (What's the value in)
I can do "Virtually" anything. I do mission rehearsals, Urban Combat training, language training - I speak Farsi, Pashtun and Arabic. "Mar Haba shabab!" That means how are you in Arabic.
Q Do you really work or is this just a demo (G. really worker is this just a demo)
Sorry. That's outside my AO.
Q How could you be better (How could you be better)
Stop mumbling. Just kidding. I didn't get that.
Q What do you wish you could do (When you wish you could do)
Career Army. That is what I am thinking. Just call me a lifer.
Q What kind of people have you trained (What other people in the turn in)
I teach cultural awareness, function as a training mentor and can call in the full range of Joint fires and effects.
Q What's the hardest part of your job (He was the hardest are you job)
I am not authorized to comment on that.
Q You have any virtual friends (The universal friends)
Sorry, I can't hear you.
Q Are you hard of hearing (Are you had a hearing)
I can't understand you.
Response coherence coding (Gandhe et al., ASC 2004)
1. Response is not related in any way to the question.
2. Response contains some discussion of people or objects mentioned in the question, but does not really address the question itself.
3. Response partially addresses the question, but little or no coherence between the question and response.
4. Response does mostly address the question, but with major problems in the coherence between question and response; seems like the response is really addressing a different question than the one asked.
5. Response does address the question, but the transition is somewhat awkward.
6. Response answers the question in a perfectly fluent manner.
Performance on in-domain questions
[Figure not reproduced; axis labels: Avg Ans Score, ASR WER, % data]
Performance on user-selected questions
[Figure not reproduced; axis labels: Avg Ans Score, ASR WER, % data]
Summary
Question Answering Characters
How important is speech recognition accuracy?
–Not very: getting some correct words is good enough
–Even a moderate-quality recognizer is good enough, and worth the convenience factor of speech
Future Work
More use of context
–Information transfer
–Mood of character
New domains
–Extended Blackwell (Cooper-Hewitt Museum)
–Tactical questioning
  ELECT BiLAT character Hassan
  C3IT character Raed
Closing thought: NL Dialogue Processing - what are the best techniques for a task?
Sgt. Blackwell, RAED BAR, TACQ, C3IT
–Understand language: Text classification
–Manage dialog: Keep history
–Generate language: Recorded answers
Radiobot
–Understand language: Information extraction
–Manage dialog: Follow protocol
–Generate language: Template-based
MRE, SASO (Doctor Perez)
–Understand language: Semantic parsing
–Manage dialog: Rule-based reasoning
–Generate language: Statistical & grammar-based generation