
Spoken Dialogue for Intelligent Tutoring Systems: Opportunities and Challenges
Diane Litman
Computer Science Department
Learning Research & Development Center
University of Pittsburgh
HLT-NAACL 2006

Outline
• Motivation and History
• The ITSPOKE System and Corpora
• Opportunities and Challenges
  – Performance Evaluation
  – Affective Reasoning
  – Discourse Analysis
• Summing Up

What is Tutoring?
“A one-on-one dialogue between a teacher and a student for the purpose of helping the student learn something.” [Evens and Michael 2006]
Human Tutoring Excerpt [Thanks to Natalie Person and Lindsay Sears, Rhodes College]

Intelligent Tutoring Systems
• Students who receive one-on-one instruction perform as well as the top two percent of students who receive traditional classroom instruction [Bloom 1984]
• Unfortunately, providing every student with a personal human tutor is infeasible
  – Develop computer tutors instead

Tutorial Dialogue Systems
• Why is one-on-one tutoring so effective?
  “...there is something about discourse and natural language (as opposed to sophisticated pedagogical strategies) that explains the effectiveness of unaccomplished human [tutors].” [Graesser, Person et al. 2001]
• Working hypothesis regarding learning gains
  – Human Dialogue > Computer Dialogue > Text

Spoken Tutorial Dialogue Systems
• Most human tutoring involves face-to-face spoken interaction, while most computer dialogue tutors are text-based
• Can the effectiveness of dialogue tutorial systems be further increased by using spoken interactions?

A Brief History
• 1970 – Mid 1980s
  – SCHOLAR (Carbonell)
  – WHY (Stevens and Collins)
  – SOPHIE (Burton and Brown)
  – Meno-Tutor (Woolf and McDonald)
  – …
• Late 1980s – 1990s
  – CIRCSIM-Tutor (Evens, Michael and Rovick)
  – SHERLOCK II (Lesgold)
  – Unix Consultant (Wilensky et al.)
  – EDGE (Cawsey)
  – …
• Currently…
  – Why2-AutoTutor (Graesser et al.) (speech synthesis)
  – Why2-Atlas (VanLehn et al.)
  – CyclePad (Rose et al.)
  – Beetle (Moore et al.)
  – DIAG-NLG (Di Eugenio)
  – SCoT (Peters et al.) (spoken dialogue)
  – ITSPOKE (Litman et al.) (spoken dialogue)
  – …

Potential Benefits of Speech: I
• Self-explanation correlates with learning [Chi et al. 1994] and occurs more in speech [Hausmann and Chi 2002]
  – Tutor: The right side pumps blood to the lungs, and the left side pumps blood to the other parts of the body. Could you explain how that works?
  – Student 1 (self-explains): So the septum is a divider so that the blood doesn't get mixed up. So the right side is to the lungs, and the left side is to the body. So the septum is like a wall that divides the heart into two parts...it kind of like separates it so that the blood doesn't get mixed up...
  – Student 2 (doesn’t self-explain): right side pumps blood to lungs

Potential Benefits of Speech: II
• Speech contains prosodic information, providing new sources of information about the student for dialogue adaptation [Fox 1993; Litman and Forbes-Riley 2003; Pon-Barry et al. 2005]
• A correct but uncertain student turn
  – ITSPOKE: How does his velocity compare to that of his keys?
  – STUDENT: his velocity is constant

Potential Benefits of Speech: III
• Spoken computational environments may foster social relationships that may enhance learning
  – AutoTutor [Graesser et al. 2003]

Potential Benefits of Speech: IV
• Some applications inherently involve spoken language
  – Spoken Conversational Interface for Language Learning [Thanks to Stephenie Seneff, MIT and Cambridge]
  – Reading Tutors [Mostow, Cole]
• Others require hands-free interaction
  – Circuit Fix-It Shop [Smith 1992]

Why Should NLP Researchers Care?
• Many reasons why tutoring researchers are interested in spoken dialogue
• Why should spoken dialogue researchers become interested in tutoring?
  – Tutoring applications differ in many ways from typical spoken dialogue applications
  – Opportunities and Challenges!

The ITSPOKE System
• The back end is the Why2-Atlas system [VanLehn et al. 2002]
• Sphinx2 speech recognition and Cepstral text-to-speech

Two Types of Tutoring Corpora
• Human Tutoring
  – 14 students / 128 dialogues (physics problems)
  – 5948 student turns, 5505 tutor turns
• Computer Tutoring
  – ITSPOKE v1
    » 20 students / 100 dialogues
    » 2445 student turns, 2967 tutor turns
  – ITSPOKE v2
    » 57 students / 285 dialogues
    » both synthesized and pre-recorded tutor voices

ITSPOKE Experimental Procedure
• College students without physics
  – Read a small background document
  – Took a multiple-choice Pretest
  – Worked 5 problems (dialogues) with ITSPOKE
  – Took an isomorphic Posttest
• Goal was to optimize Learning Gain
  – e.g., Posttest – Pretest

Predictive Performance Modeling
• Opportunity
  – Spoken dialogue system evaluation methodologies can improve our understanding of how dialogue facilitates student learning [Forbes-Riley and Litman 2006]
• Challenges
  – How to measure system performance?
  – What are predictive interaction parameters?

Predictive Performance Modeling
• Understand why a spoken dialogue system fails or succeeds
• PARADISE [Walker et al. 1997]
  – Measure parameters (interaction costs and benefits) and performance in a system corpus
  – Train model via multiple linear regression over parameters, predicting performance:
    System Performance = $\sum_{i=1}^{n} w_i \cdot p_i$
  – Test model on new corpus
  – Predict performance during future system design
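The following is a minimal Python sketch of the PARADISE training step. It rests on several assumptions:
• Every parameter and the performance measure are z-score normalized. This follows the original PARADISE formulation and makes the weights comparable across parameters.
• The weights $w_i$ are fit by ordinary least squares through the normal equations.
• `fit_paradise`, `predict`, and the dict-of-lists input are hypothetical names, not a published tool's API.
A real study would use a statistics package and check the significance of each weight.

```python
from statistics import mean, pstdev


def zscore(values):
    """Normalize values to mean 0 and population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_paradise(params, performance):
    """Fit performance = sum_i w_i * p_i on z-scored data by least squares.

    params: hypothetical dict of parameter name -> per-student values.
    performance: per-student performance values (e.g. posttest scores).
    Returns a dict of parameter name -> standardized weight w_i.
    """
    names = list(params)
    cols = [zscore(params[name]) for name in names]
    y = zscore(performance)
    # Normal equations (X^T X) w = X^T y; every column is centered, so no intercept term.
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return dict(zip(names, solve(xtx, xty)))


def predict(weights, z_params):
    """Predicted (z-scored) performance for one student's z-scored parameters."""
    return sum(w * z_params[name] for name, w in weights.items())
```

Fitting on one corpus and then calling `predict` on z-scored parameters from another corpus mirrors the train and test steps listed above. Solving the normal equations by hand simply keeps the sketch dependency-free.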

Challenges
• System Performance
  – Prior evaluations used User Satisfaction
  – Is Student Learning more relevant for the tutoring domain?
• Interaction Parameters
  – Prior applications used Generic parameters
  – Are Task-Specific and Affective parameters also useful?

Findings
• Using PARADISE to predict Learning
  – Posttest = .86 * Time + .65 * Pretest - .54 * #Neutrals
• Useful Predictors
  – Traditional parameters
    » e.g., Elapsed Time, Dialogue and Turn Length
  – New parameters
    » e.g., Affect, Correctness

Contrasts with Non-Tutorial Dialogue
• User Satisfaction models are less useful
  – Tutoring systems are not designed to maximize User Satisfaction
• Interaction parameters for learning
  – Posttest = .86 * Time + .65 * Pretest - .54 * #Neutrals
    » longer dialogues are better
    » speech recognition problems don’t seem to matter
    » lack of some types of affect is bad
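Plugging values into the learned model shows how to read its signs. This sketch assumes, as in PARADISE, that the weights apply to z-score-normalized parameters. The keys `time`, `pretest`, and `neutrals` are hypothetical names for Time, Pretest, and #Neutrals.

```python
# Weights from the learning model on this slide; keys are hypothetical names
# for Time, Pretest and #Neutrals, each assumed to be z-score normalized.
LEARNING_MODEL = {"time": 0.86, "pretest": 0.65, "neutrals": -0.54}


def predicted_posttest_z(z_params):
    """Predicted posttest score in z-score units: sum of weight * z-value."""
    return sum(w * z_params[name] for name, w in LEARNING_MODEL.items())


long_dialogue = predicted_posttest_z({"time": 1.0, "pretest": 0.0, "neutrals": 0.0})
long_and_neutral = predicted_posttest_z({"time": 1.0, "pretest": 0.0, "neutrals": 1.0})
print(f"{long_dialogue:.2f} vs {long_and_neutral:.2f}")  # 0.86 vs 0.32
```

Hold pretest at the mean. Adding one standard deviation of neutral turns then lowers the prediction from 0.86 to 0.32, which is the sense in which a lack of affect is bad.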

Detecting and Responding to Student Affective States
• Opportunity
  – Affective spoken dialogue system technology can improve student learning and other measures of performance [Aist et al. 2002; Pon-Barry et al. 2006]
• Challenges
  – What to detect?
  – How to respond?
  – Pedagogical versus spoken dialogue performance?

Monitoring Student State (motivation)
Tutor26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
Student27: dammit (ASR: it is)
Tutor28: Could you please repeat that?
Student29: same (ASR: i same)
Tutor30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
Student31: zero (ASR: the zero)
Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario
Student33: oh gravity you already said this (ASR: beats gravity you're exert less)
Tutor34: Fine. Are there any other forces acting on the apple as it falls?
Student35: no why are you doing this again (ASR: no y and to it yes)
Tutor36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
Student37: downward you computer (ASR: downward you computer)

Affective Spoken Dialogue Systems: Standard Methodology
• Manual Annotation of Affect and Attitudes
  – Naturally-occurring spoken dialogue data [Ang et al. 2002; Lee et al. 2002; Batliner et al. 2003; Devillers et al. 2003; Shafran et al. 2003; Liscombe et al. 2005]
• Prediction via Machine Learning
  – Automatically extract features from user turns
  – Use different feature sets (e.g. prosodic, lexical) to predict affect
  – Significant reduction of baseline error
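The prediction step can be sketched as a small supervised classifier. Everything here is an assumption made for illustration: the turn fields, the hedge-word list, the invented toy turns, and the choice of a nearest-centroid classifier. That classifier stands in for whatever learner a real study uses. The last lines compare the model's error with a majority-class baseline, which is the usual reference point for a "reduction of baseline error".

```python
from collections import Counter
from math import dist
from statistics import mean, pstdev

HEDGE_WORDS = {"um", "uh", "maybe", "guess"}  # hypothetical lexical cue list


def extract_features(turn):
    """Map a student turn to (mean pitch, energy, duration, hedge count).

    The dict keys are hypothetical: prosodic values would come from a
    pitch/energy tracker, the words from a transcript or ASR output.
    """
    words = turn["text"].lower().split()
    hedges = sum(word in HEDGE_WORDS for word in words)
    return (turn["f0_mean"], turn["rms_energy"], turn["duration_s"], hedges)


class NearestCentroid:
    """Predict the label whose mean z-scored feature vector is closest."""

    def fit(self, vectors, labels):
        columns = list(zip(*vectors))
        self.mu = [mean(c) for c in columns]
        self.sigma = [pstdev(c) or 1.0 for c in columns]  # guard constant features
        groups = {}
        for v, label in zip(vectors, labels):
            groups.setdefault(label, []).append(self._scale(v))
        self.centroids = {
            label: tuple(mean(c) for c in zip(*vs)) for label, vs in groups.items()
        }
        return self

    def _scale(self, v):
        return tuple((x - m) / s for x, m, s in zip(v, self.mu, self.sigma))

    def predict(self, v):
        z = self._scale(v)
        return min(self.centroids, key=lambda label: dist(z, self.centroids[label]))


def error_rate(predicted, gold):
    return sum(p != g for p, g in zip(predicted, gold)) / len(gold)


# Toy, invented turns for illustration only (not ITSPOKE data).
train = [
    ({"text": "force", "f0_mean": 180, "rms_energy": 0.08, "duration_s": 0.6}, "certain"),
    ({"text": "downward", "f0_mean": 175, "rms_energy": 0.09, "duration_s": 0.7}, "certain"),
    ({"text": "gravity", "f0_mean": 185, "rms_energy": 0.085, "duration_s": 0.5}, "certain"),
    ({"text": "um maybe the direction", "f0_mean": 220, "rms_energy": 0.04, "duration_s": 2.1}, "uncertain"),
    ({"text": "uh i guess velocity", "f0_mean": 230, "rms_energy": 0.05, "duration_s": 1.9}, "uncertain"),
]
held_out = [
    ({"text": "um maybe force", "f0_mean": 225, "rms_energy": 0.045, "duration_s": 2.0}, "uncertain"),
    ({"text": "mass", "f0_mean": 178, "rms_energy": 0.09, "duration_s": 0.55}, "certain"),
]

model = NearestCentroid().fit([extract_features(t) for t, _ in train], [y for _, y in train])
gold = [y for _, y in held_out]
predicted = [model.predict(extract_features(t)) for t, _ in held_out]
majority = Counter(y for _, y in train).most_common(1)[0][0]
print("model error:", error_rate(predicted, gold))
print("majority-class baseline error:", error_rate([majority] * len(gold), gold))
```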


Challenge 1: What “emotions” to detect?
• Communicator and Customer Care Systems
  – Negative: Angry, Annoyed, Frustrated, Tired
  – Positive/Neutral: Amused, Cheerful, Delighted, Happy, Serious
  [Ang et al. 2002; Shafran et al. 2003; Lee and Narayanan 2005; Liscombe et al. 2005]
• Tutorial Dialogue Systems
  – Negative: Angry, Annoyed, Frustrated, Bored, Confused, Uncertain, Contempt, Disgusted, Sad
  – Positive/Neutral: Certain, Curious, Enthusiastic, Eureka
  [Litman and Forbes-Riley 2006; D’Mello et al. 2006]

Example Student States in ITSPOKE
ITSPOKE: What else do you need to know to find the box's acceleration?
Student: the direction [UNCERTAIN]
ITSPOKE: If you see a body accelerate, what caused that acceleration?
Student: force [CERTAIN]
ITSPOKE: Good job. Say there is only one force acting on the box. How is this force, the box's mass, and its acceleration related?
Student: velocity [UNCERTAIN]
ITSPOKE: Could you please repeat that?
Student: velocity [ANNOYED]

Challenge 2: How to respond?
• In tutoring, not all negatively valenced emotions are bad!
  – While frustration/anger/annoyance is often frustrating…
  – Frustration can also be an opportunity to learn
• Example from AutoTutor
  – neutral → flow → confusion → frustration → neutral
  [Thanks to Sidney D’Mello and Arthur Graesser, University of Memphis]

Example from ITSPOKE
ITSPOKE: What is the net force acting on the truck equal to?
Student: I don’t know [Uncertain, Frustrated]
ITSPOKE: Since net force is defined as the vector sum of all the forces exerted on an object, let's first find the forces acting on the truck. Try to name them.
Student: The impact force, the normal force, and gravity [Certain, Neutral]
ITSPOKE: Yep.

Uncertainty is also a Learning Opportunity
• Uncertainty represents one type of learning impasse [VanLehn et al. 2003]: an impasse motivates a student to take an active role in constructing a better understanding of the principle.
• Uncertainty is also associated with cognitive disequilibrium [Craig et al. 2004]: a state of failed expectations causing deliberation aimed at restoring equilibrium
  – Uncertainty positively correlates with learning

Do Human Tutors Respond to Student Uncertainty?
• A data-driven method for designing dialogue systems adaptive to student state [Forbes-Riley and Litman 2005]
  – extraction of “dialogue bigrams” from annotated human tutoring corpora
  – χ² analysis to identify dependent bigrams
  – generalizable to any domain with corpora labeled for user state and system response

Example Human Tutoring Excerpt
S: So the- when you throw it up the acceleration will stay the same? [Uncertain]
T: Acceleration uh will always be the same because there is- that is being caused by force of gravity which is not changing. [Restatement, Expansion]
S: mm-k. [Neutral]
T: Acceleration is– it is in- what is the direction uh of this acceleration- acceleration due to gravity? [Short Answer Question]
S: It’s- the direction- it’s downward. [Certain]
T: Yes, it’s vertically down. [Positive Feedback, Restatement]
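Dialogue bigram extraction can be sketched on the tags from the human tutoring excerpt above, with the student certainness tags lowercased. The `(speaker, tags)` turn format and the function names are hypothetical. The sketch pairs each student turn with the tutor turn that immediately follows it, then tallies whether that tutor turn contains a given act. Summed over a whole annotated corpus, tallies like these form an OBSERVED table of the kind analyzed in the bigram dependency slides.

```python
from collections import Counter

STATES = ("neutral", "certain", "uncertain", "mixed")


def dialogue_bigrams(turns):
    """Yield (student_tags, tutor_tags) for each student turn followed by a tutor turn.

    turns: list of (speaker, tags) in dialogue order; speaker is "S" or "T" and
    tags is a set of annotation labels (a hypothetical format).
    """
    for (spk1, tags1), (spk2, tags2) in zip(turns, turns[1:]):
        if spk1 == "S" and spk2 == "T":
            yield tags1, tags2


def contingency(turns, tutor_act):
    """Rows: student certainness state; columns: tutor [includes, omits] tutor_act."""
    counts = Counter()
    for student, tutor in dialogue_bigrams(turns):
        state = next((s for s in STATES if s in student), None)
        if state is not None:  # skip student turns without a certainness label
            counts[(state, tutor_act in tutor)] += 1
    return [[counts[(s, True)], counts[(s, False)]] for s in STATES]


# Tags copied from the human tutoring excerpt (student tags lowercased).
excerpt = [
    ("S", {"uncertain"}),
    ("T", {"Restatement", "Expansion"}),
    ("S", {"neutral"}),
    ("T", {"Short Answer Question"}),
    ("S", {"certain"}),
    ("T", {"Positive Feedback", "Restatement"}),
]
table = contingency(excerpt, "Positive Feedback")
for state, row in zip(STATES, table):
    print(state, row)
```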

Bigram Dependency Analysis
“Student Certainness – Tutor Positive Feedback” Bigrams

EXPECTED     Tutor Includes Pos   Tutor Omits Pos
neutral            439.46             2329.54
certain            175.21              928.79
uncertain          129.51              686.49
mixed               36.82              195.18

OBSERVED     Tutor Includes Pos   Tutor Omits Pos
neutral               252                2517
certain               273                 832
uncertain             185                 631
mixed                  71                 161

χ² = 225.92 (critical χ² value at p = .001 is 16.27)


Bigram Dependency Analysis (cont.)
• Less Tutor Positive Feedback after Student Neutral turns
• More Tutor Positive Feedback after “Emotional” turns
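The χ² test over the OBSERVED counts from the bigram dependency table can be sketched in plain Python. `chi_square` is a hypothetical helper; for a table of this size, SciPy's `chi2_contingency` computes the same Pearson statistic. Each expected count is the row total × column total / grand total. Recomputing from the printed counts should land close to the reported 225.92, though not exactly on it, because the slide's expected counts sum to 4921 turns while its observed counts sum to 4922. The final loop checks the direction of each deviation against the two observations above.

```python
def chi_square(observed):
    """Pearson chi-square statistic, degrees of freedom, and expected counts for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof, expected


STATES = ("neutral", "certain", "uncertain", "mixed")
# OBSERVED counts as printed on the slide: [tutor includes Pos, tutor omits Pos].
OBSERVED = [[252, 2517], [273, 832], [185, 631], [71, 161]]
CRITICAL_P001_DF3 = 16.27  # critical value quoted on the slide (p = .001, df = 3)

chi2, dof, expected = chi_square(OBSERVED)
print(f"chi2 = {chi2:.2f}, df = {dof}, significant: {chi2 > CRITICAL_P001_DF3}")
for state, obs, exp in zip(STATES, OBSERVED, expected):
    direction = "more" if obs[0] > exp[0] else "less"
    print(f"{state}: {direction} positive feedback than expected ({obs[0]} vs {exp[0]:.2f})")
```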

Findings
• Statistically significant dependencies exist between students’ state of certainty and the responses of an expert human tutor
  – After uncertain, tutor Bottoms Out and avoids expansions
  – After certain, tutor Restates
  – After mixed, tutor Hints
  – After any emotion, tutor increases Feedback
• Dependencies suggest adaptive strategies for implementation in computer tutoring systems

Challenge 3: Pedagogical versus spoken dialogue performance?
• Negative user emotions (e.g. frustration) are often associated with speech recognition problems [Boozer et al. 2003; Goldberg et al. 2003]
  – Is this also true in tutoring?
• Speech recognition problems negatively correlate with user satisfaction [Walker et al. 2002; Pon-Barry et al. 2006]
  – Is this also true for learning?

Findings
• Statistically significant dependencies exist between student state and speech recognition problems [Rotaru and Litman 2006]
  – Frustrated/Angry turns are rejected more often than expected
  – Uncertain turns have more problems than expected (certain turns have fewer)
  – Incorrect turns have more problems than expected (correct turns have fewer)
• Learning opportunities (e.g. uncertain and incorrect student states) have more speech recognition problems
  – However, speech recognition problems have not negatively correlated with learning [Litman and Forbes-Riley 2005; Pon-Barry et al. 2005]

Discourse Structure
• Opportunity
  – Dialogues with tutoring systems have more complex hierarchical discourse structures compared to many other types of dialogues
• Challenges
  – How can discourse structure be exploited in the context of spoken dialogue systems?

Exploiting Discourse Structure (Motivation)
• Average ITSPOKE dialogue is 20 minutes
• Student turns are hierarchically structured
  – Level 1: 1350 (57.3%)
  – Level 2: 643 (27.3%)
  – Level 3: 248 (10.5%)
  – Levels 4-6: 113 (4.8%)
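The level percentages follow directly from per-turn depth labels. In this sketch the per-level counts come from the slide. The slide does not give the split within levels 4-6, so all 113 of those turns are placed at level 4; this assumption does not affect the pooled percentage. `level_distribution` is a hypothetical helper. The counts sum to 2354 student turns.

```python
from collections import Counter


def level_distribution(levels, pool_from=4):
    """Percentage of student turns at each discourse depth, pooling depths >= pool_from."""
    buckets = Counter(min(level, pool_from) for level in levels)
    total = sum(buckets.values())
    return {lvl: round(100 * buckets[lvl] / total, 1) for lvl in sorted(buckets)}


# Per-level counts from the slide; the split within levels 4-6 is not given,
# so those 113 turns are all placed at level 4 here.
levels = [1] * 1350 + [2] * 643 + [3] * 248 + [4] * 113
print(len(levels), level_distribution(levels))  # 2354 {1: 57.3, 2: 27.3, 3: 10.5, 4: 4.8}
```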

Discourse Structure Annotation and Transitions
• Based on the Grosz & Sidner theory of discourse structure
  – Discourse segment → discourse segment purpose
  – Hierarchy of discourse segments
• Tutoring information encoded in a hierarchical structure
  – Human tutor manually authored dialogue paths for ITSPOKE
  – Automatic traversal of logs places utterances into the structure
[Figure: question hierarchy Q1, Q2, Q3, with subquestions Q2.1 and Q2.2 under Q2]

[Figure: ITSPOKE behavior and discourse structure annotation on the question hierarchy Q1, Q2, Q3 (with Q2.1 and Q2.2 under Q2)]

[Figure: Discourse structure transitions on the question hierarchy Q1, Q2, Q3 (with Q2.1 and Q2.2 under Q2)]
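Here is one plausible way to label transitions automatically. It assumes each tutor question is identified by its path in the hierarchy, so Q2.1 becomes `(2, 1)`. The rules are an illustrative reading of the transition names used in this talk (Push, Advance, PopUp, PopUpAdvance), not the published annotation definitions. Any other move falls into a hypothetical `Other` label.

```python
def transition(prev, curr):
    """Label the move from question path `prev` to question path `curr`.

    Paths index the question hierarchy: (2,) is Q2 and (2, 1) is Q2.1.
    The rules below are one plausible reading of the transition names in
    this talk, not the published annotation definitions.
    """
    if len(curr) == len(prev) + 1 and curr[:-1] == prev:
        return "Push"  # descend into a subsegment
    if len(curr) == len(prev) and curr[:-1] == prev[:-1] and curr[-1] > prev[-1]:
        return "Advance"  # next sibling at the same level
    if len(curr) < len(prev) and prev[: len(curr)] == curr:
        return "PopUp"  # return to an ancestor segment
    k = len(curr)
    if 0 < k < len(prev) and curr[:-1] == prev[: k - 1] and curr[-1] > prev[k - 1]:
        return "PopUpAdvance"  # leave the subsegment and move past its ancestor
    return "Other"  # hypothetical catch-all label


def label_path(questions):
    """Label each consecutive pair of questions in a dialogue."""
    return [transition(a, b) for a, b in zip(questions, questions[1:])]


# Q1 -> Q2 -> Q2.1 -> Q2.2 -> Q3, one possible walk through the figure's hierarchy.
walk = [(1,), (2,), (2, 1), (2, 2), (3,)]
print(label_path(walk))  # ['Advance', 'Push', 'Advance', 'PopUpAdvance']
```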

Findings
• Student correctness is predictive of student learning, but only after particular discourse transitions [Rotaru and Litman 2006]
  – e.g., after Pops (PopUp, PopUpAdvance)
    » incorrect turns negatively predict learning
    » correct turns positively predict learning
• Student certainness is more predictive only after particular transitions
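This sketch shows how a transition-conditioned parameter could be computed. It assumes each student turn carries the transition that led into it and a correctness flag; the format and helper names are hypothetical. Computed per student, a count such as the number of incorrect turns after Pops becomes one interaction parameter. Its correlation with learning is what the finding above reports. The per-student numbers in the last lines are invented for illustration.

```python
from math import sqrt

POPS = {"PopUp", "PopUpAdvance"}


def correctness_after(turns, transitions=POPS):
    """Count (correct, incorrect) student turns entered via one of `transitions`.

    turns: list of (transition_label, is_correct) pairs, one per student turn
    (a hypothetical format).
    """
    selected = [ok for label, ok in turns if label in transitions]
    correct = sum(selected)
    return correct, len(selected) - correct


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


student = [("Advance", True), ("PopUp", False), ("Push", False), ("PopUpAdvance", True), ("PopUp", True)]
print(correctness_after(student))  # (2, 1)

# Invented per-student values, for illustration only.
incorrect_after_pops = [1, 2, 3]
learning_gains = [3, 2, 1]
print(round(pearson(incorrect_after_pops, learning_gains), 3))  # -1.0
```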

Findings (cont.)
• While single discourse transitions are not predictive of learning, patterns in the discourse structure are
  – e.g., Advance-Advance and Push-Push both positively correlate with learning
• Statistically significant dependencies exist between discourse transitions and speech recognition
  – e.g., after both Pushes and Pops, more misrecognitions
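Transition patterns, and their link to recognition errors, can be counted in a similar way. The sketch assumes a per-dialogue list of transition labels and per-turn misrecognition flags; the formats and names are hypothetical. The rates it reports are descriptive only. Testing whether a dependency is significant would require a χ² test over the corresponding contingency table.

```python
from collections import Counter


def transition_patterns(transitions):
    """Count consecutive transition pairs, e.g. ("Advance", "Advance")."""
    return Counter(zip(transitions, transitions[1:]))


def misrecognition_rate(turns):
    """Fraction of misrecognized student turns after each transition label.

    turns: list of (transition_label, was_misrecognized) pairs (hypothetical format).
    """
    totals, errors = Counter(), Counter()
    for label, misrecognized in turns:
        totals[label] += 1
        errors[label] += int(misrecognized)
    return {label: errors[label] / totals[label] for label in totals}


seq = ["Push", "Push", "Advance", "Advance", "Advance", "PopUp"]
print(transition_patterns(seq)[("Advance", "Advance")])  # 2
print(misrecognition_rate([("Push", True), ("Push", False), ("Advance", False)]))  # {'Push': 0.5, 'Advance': 0.0}
```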

Summing Up: I
• Spoken Dialogue Systems are of great interest to researchers in Intelligent Tutoring
  – One-on-one tutoring is a powerful technique for helping students learn
  – Natural language dialogue contributes in a powerful way to the efficacy of one-on-one tutoring
  – Using presently available NLP technology, computer tutors can be built and can serve as a valuable aid to student learning


Summing Up: II
• Intelligent Tutoring in turn provides many opportunities and challenges for researchers in Spoken Dialogue Systems
  – Performance Evaluation
  – Affective Reasoning
  – Discourse Analysis
  – and many more!
    » Initiative, Cohesion/Coherence, Dialogue Acts, Turn-Taking, Reinforcement Learning, User Simulation, Question-Answering

Acknowledgements
• ITSPOKE group
  – Hua Ai, Kate Forbes-Riley, Alison Huettner, Beatriz Maeireizo-Tokeshi, Greg Nicholas, Amruta Purandare, Mihai Rotaru, Scott Silliman, Joel Tetreault, Art Ward
  – Columbia Collaborators: Julia Hirschberg, Jackson Liscombe, Jennifer Venditti
• NLP@Pitt
  – Jan Wiebe, Rebecca Hwa, Wendy Chapman, Paul Hoffmann, Behrang Mohit, Carol Nichols, Swapna Somasundaran, Theresa Wilson, Chenhai Xi
• Why2-Atlas and Human Tutoring groups
  – Kurt VanLehn, Pam Jordan, Uma Pappuswamy, Carolyn Rose
  – Micki Chi, Scotty Craig, Bob Hausmann, Margueritte Roy
• Art Graesser, Natalie Person, Sidney D’Mello, Lindsay Sears
• Stephenie Seneff
• Martha Evens

Thank You!
• Questions?
• Further Information
  – http://www.cs.pitt.edu/~litman/itspoke.html
• And in September, come to Pittsburgh for Interspeech 2006!
