Asking Questions to Limited Domain Virtual Characters: How Good Does Speech Recognition Have to Be? Dr. Anton Leuski, 2LT Brandon Kennedy, Ronak Patel, Dr. David Traum

Outline
Question-Answering Characters
Sgt Blackwell
Answer selection mechanisms
Research questions
  – How good are the responses?
  – What is the impact of imperfect ASR?
Experiment and Results
Summary, Future Work, & Final Thoughts

Question-answering characters
Q&A dialogue
  – Focus on information and social interaction
Simulate a person answering questions, e.g.:
  – From a reporter
  – From an interviewer
  – From a police interrogator
  – Different from a question-answering system: give an appropriate answer rather than correct information
  – Different from believable characters: focus on simulation of the question-answering process rather than the Turing test
Uses for Q&A characters
  – Simulation
  – Training
  – Games

Examples of ICT Question-answering Characters
Be a Reporter
C3IT/TACQ: Raed
Sgt Blackwell

Technology demo for ASC 24
Highlights:
  – Life-sized, mixed reality
      Trans-screen
  – High-production quality
      Rendering (> 60K polygons)
      Voice
      Authored text
Robust responsiveness
  – Speech recognition
  – Speech and non-verbal reply
  – Limited domain of interaction: responding to interview/Q&A

Sgt Blackwell Video

Virtual Character Creation: Data-driven method
1. Collect data (questions)
   a) Scripted
   b) Paraphrases
   c) Wizard of Oz
   d) System
2. Annotate data
   – Pick appropriate answers
   – Rate level of appropriateness
3. Train statistical algorithms
4. Integrate in system
5. Evaluate = (1d)

WOZ Data collection

Sgt Blackwell Dialogue Model
Set of pre-constructed answers
  – In domain
  – Off-topic
  – Prompt
Local history
IR-based classification
  – Given a (possibly previously unseen) question, map it to the best answer

Methods for Computing Responses to Questions
Classification - Response selection
Extraction - Template selection - Template filling
Parsing/Interpretation - Inference - Generation

Some Word-based Classification Approaches
Key-word spotting
Bayesian Classification
Latent Semantic Analysis (LSA)
Support-Vector Machines (SVM)
Relevance Model Retrieval
Cross-language Relevance Model

Relevance Model
Relevance model: P(w|R), the probability that a random word from an appropriate answer is w.
P(w|R) ≈ P(w|Q)
Estimate P(w|Q): the probability of observing word w in an answer, given the question.
Estimate P(w|A): the probability of observing word w in an answer, given the answer.
Compare the two probabilities.
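The slide does not say how the two distributions are compared. One common choice, assumed here purely for illustration, is the Kullback-Leibler divergence between the question-conditioned and answer-conditioned word distributions, with the answer of smallest divergence selected:

```latex
D\bigl(P(\cdot \mid Q)\,\|\,P(\cdot \mid A)\bigr)
  = \sum_{w} P(w \mid Q)\,\log\frac{P(w \mid Q)}{P(w \mid A)},
\qquad
A^{*} = \arg\min_{A}\; D\bigl(P(\cdot \mid Q)\,\|\,P(\cdot \mid A)\bigr)
```

Here the sum runs over the answer vocabulary, and A* (a symbol introduced for this sketch) denotes the selected answer.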

Estimate P(w|Q) v. 1
Approach 1: consider answers as class labels (ignore the content of the answer)
  – Combine all questions for an answer into a pseudo-answer
  – Compare a test question to each pseudo-answer and select the best match
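Approach 1 can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the answer labels, the toy training pairs, the smoothing scheme, and all function names are assumptions. Questions linked to the same answer are pooled into one bag of words, and a test question is assigned to the pool under which it is most likely.

```python
import math
from collections import Counter, defaultdict

# Toy training data (hypothetical): (question, answer label) pairs.
TRAINING_PAIRS = [
    ("who created you", "creator"),
    ("who made you", "creator"),
    ("what is your name", "name"),
    ("who are you", "name"),
]

def tokenize(text):
    """Lowercase and split on whitespace, dropping question marks."""
    return text.lower().replace("?", " ").split()

def build_pseudo_answers(pairs):
    """Pool the words of all questions that share an answer label."""
    pooled = defaultdict(Counter)
    for question, label in pairs:
        pooled[label].update(tokenize(question))
    return pooled

def log_likelihood(question, counts, background, lam=0.5):
    """Log-probability of the question under a smoothed unigram model."""
    total = sum(counts.values())
    bg_total = sum(background.values())
    score = 0.0
    for word in tokenize(question):
        p_pool = counts[word] / total
        # Add-one background estimate keeps unseen words above zero.
        p_bg = (background[word] + 1) / (bg_total + len(background) + 1)
        score += math.log(lam * p_pool + (1 - lam) * p_bg)
    return score

def classify(question, pooled):
    """Return the label whose pseudo-answer best matches the question."""
    background = Counter()
    for counts in pooled.values():
        background.update(counts)
    return max(pooled, key=lambda a: log_likelihood(question, pooled[a], background))

pooled = build_pseudo_answers(TRAINING_PAIRS)
best = classify("Who built you?", pooled)
```

Because the answer text is never consulted, this variant can only exploit word overlap between the new question and the questions already seen for each answer.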

Estimate P(w|Q) v. 2
Approach 2: answer text matters!
  – Questions and answers are two different languages
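A rough sketch of the "two languages" idea follows, in the general style of a cross-language relevance model. The slides give no formula, so everything here is an assumption made for illustration: the toy question-answer pairs, the smoothing, the function names, and the use of a KL-style divergence for the final comparison. Each training pair is weighted by how well its question explains the new question, and the weighted mix of the paired answers' word models stands in for P(w|Q).

```python
import math
from collections import Counter

# Toy data (hypothetical): each training question is paired with answer text.
PAIRS = [
    ("who created you", "the folks at ict built me"),
    ("who made you", "the folks at ict built me"),
    ("what is your name", "my name is sergeant blackwell"),
]

def tokenize(text):
    return text.lower().replace("?", " ").replace(".", " ").split()

def smoothed_lm(tokens, background, lam=0.5):
    """Unigram model of `tokens`, interpolated with an add-one background model.
    Assumes `tokens` is non-empty."""
    counts = Counter(tokens)
    total = len(tokens)
    bg_total = sum(background.values())
    vocab = len(background)
    def prob(word):
        p_bg = (background[word] + 1) / (bg_total + vocab + 1)
        return lam * counts[word] / total + (1 - lam) * p_bg
    return prob

def answer_word_model(question, pairs):
    """Estimate P(w|Q) over answer words from question-side evidence."""
    q_bg = Counter(w for q, _ in pairs for w in tokenize(q))
    a_bg = Counter(w for _, a in pairs for w in tokenize(a))
    q_tokens = tokenize(question)
    # Weight of each pair: likelihood of the new question under that
    # pair's question-language model.
    weights = []
    for q, _ in pairs:
        p_q = smoothed_lm(tokenize(q), q_bg)
        weights.append(math.prod(p_q(w) for w in q_tokens))
    norm = sum(weights)
    answer_lms = [smoothed_lm(tokenize(a), a_bg) for _, a in pairs]
    model = {
        w: sum(wt * lm(w) for wt, lm in zip(weights, answer_lms)) / norm
        for w in a_bg
    }
    return model, a_bg

def select_answer(question, pairs):
    """Pick the answer whose own word model is closest to P(w|Q)."""
    model, a_bg = answer_word_model(question, pairs)
    def divergence(answer):
        p_a = smoothed_lm(tokenize(answer), a_bg)
        return sum(p * math.log(p / p_a(w)) for w, p in model.items())
    return min(sorted({a for _, a in pairs}), key=divergence)

chosen = select_answer("who built you", PAIRS)
```

Unlike the pseudo-answer variant, this version can prefer an answer because of the words it contains, since the comparison happens in the answer vocabulary.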

Sgt Blackwell Evaluation
Questions:
1. How good are the responses to expected questions?
2. How good are the responses to open questions?
3. How much do speech recognition errors affect performance?

Experiment
Data:
  – 20 users asked Sgt Blackwell 20 questions each:
      10 given, in domain
      10 composed by the user
Metrics
  – Accuracy of speech recognizer & classifier
  – Appropriateness of replies (including to unseen and out-of-domain questions)
      Answers rated for relevance (scale from 1-6)
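Speech recognizer accuracy is reported later in the deck as word error rate (WER). As a reminder of how that metric is usually computed, here is a minimal sketch using word-level edit distance; the function name is hypothetical, and the slides do not describe the scoring tool actually used.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length.
    Assumes the reference contains at least one word."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)

# A question from the example session and its recognized form.
wer = word_error_rate(
    "Why should we spend money on you",
    "What should we spend money on you",
)
```

In this pair one word out of seven is substituted, so the WER is 1/7, roughly 14%.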

Example Session: user questions
Each turn shows the question as asked, the speech recognizer's output in parentheses, and Sgt Blackwell's reply.

Q: Who created you (Who created you)
Tell you the truth, I have spent a lot of time with the smart folks at ICT, testing and tweaking. I am about ready to get out there and train with real soldiers. I am thinking first stop is going to be Fort Sill. But my ultimate goal is to get forward deployed...

Q: Why should we spend money on you (What should we spend money on you)
Why do you need me?! What you should be saying is "How did you get along without me?" I will show you how to be a leader, how to make critical decisions under stress... And I am glad to be here, because I can help. I know I can. Look... I have one mission: to make better soldiers, make better warfighters. And if I can... can save one life... bring one loved one back to his family... Well, then, I am definitely worth it. You be the judge.

Q: What's your value (What's the value in)
I can do "Virtually" anything. I do mission rehearsals, Urban Combat training, language training - I speak Farsi, Pashtun and Arabic. "Mar Haba shabab!" That means how are you in Arabic.

Q: Do you really work or is this just a demo (G. really worker is this just a demo)
Sorry. That's outside my AO.

Q: How could you be better (How could you be better)
Stop mumbling. Just kidding. I didn't get that.

Q: What do you wish you could do (When you wish you could do)
Career Army. That is what I am thinking. Just call me a lifer.

Q: What kind of people have you trained (What other people in the turn in)
I teach cultural awareness, function as a training mentor and can call in the full range of Joint fires and effects.

Q: What's the hardest part of your job (He was the hardest are you job)
I am not authorized to comment on that

Q: You have any virtual friends (The universal friends)
Sorry, I can't hear you

Q: Are you hard of hearing (Are you had a hearing)
I can't understand you.

Response coherence coding
Gandhe et al ASC
1. Response is not related in any way to the question.
2. Response contains some discussion of people or objects mentioned in the question, but does not really address the question itself.
3. Response partially addresses the question, but little or no coherence between the question and response.
4. Response does mostly address the question, but with major problems in the coherence between question and response; seems like the response is really addressing a different question than the one asked.
5. Response does address the question, but the transition is somewhat awkward.
6. Response answers the question in a perfectly fluent manner.

Performance on in-domain questions
[Chart not reproduced. Labels: Avg Ans Score, ASR WER, % data]

Performance on user-selected questions
[Chart not reproduced. Labels: Avg Ans Score, ASR WER, % data]

Summary
Question-answering characters
How important is speech recognition accuracy?
  – Not very: getting some of the words correct is good enough
  – Even a moderate-quality recognizer is good enough, and is worth it for the convenience of speech

Future Work
More use of context
  – Information transfer
  – Mood of character
New domains
  – Extended Blackwell (Cooper-Hewitt Museum)
  – Tactical questioning
      ELECT BiLAT character Hassan
      C3IT character Raed

Closing thought: NL Dialogue Processing - what are the best techniques for a task?

Sgt. Blackwell, RAED BAR, TACQ, C3IT
  – Manage dialog: keep history
  – Understand language: text classification
  – Generate language: recorded answers

Radiobot
  – Manage dialog: follow protocol
  – Understand language: information extraction
  – Generate language: template-based

MRE, SASO (Doctor Perez)
  – Manage dialog: rule-based reasoning
  – Understand language: semantic parsing
  – Generate language: statistical & grammar-based generation