Presentation is loading. Please wait.

Presentation is loading. Please wait.

A Tutorial Dialogue System that Adapts to Student Uncertainty Diane Litman Computer Science Department & Intelligent Systems Program & Learning Research.

Similar presentations


Presentation on theme: "A Tutorial Dialogue System that Adapts to Student Uncertainty Diane Litman Computer Science Department & Intelligent Systems Program & Learning Research."— Presentation transcript:

1 A Tutorial Dialogue System that Adapts to Student Uncertainty Diane Litman Computer Science Department & Intelligent Systems Program & Learning Research and Development Center

2 Outline  Motivation  The ITSPOKE System and Corpora  Detecting and Adapting to Student Uncertainty (joint work with Kate Forbes-Riley) – Uncertainty Detection and Adaptation – Experimental Evaluation »Wizard-of-Oz »Fully-Automated  Summing Up

3 Tutorial Dialogue Systems  Why is one-on-one tutoring so effective? “...there is something about discourse and natural language (as opposed to sophisticated pedagogical strategies) that explains the effectiveness of unaccomplished human [tutors].” [Graesser, Person et al. 2001]  Goal: improve Intelligent Tutoring Systems using Natural Language Processing

4 More generally... Natural Language Processing and Tools for Learning

5 More generally... Natural Language Processing and Tools for Learning Learning Language (reading, writing, speaking) Tutors Scoring

6 More generally... Natural Language Processing and Tools for Learning Learning Language (reading, writing, speaking) Using Language (to teach everything else) Tutors Scoring Conversational Tutors / Peers CSCL

7 More generally... Natural Language Processing and Tools for Learning Learning Language (reading, writing, speaking) Using Language (to teach everything else) Tutors Scoring Readability Processing Language Conversational Tutors / Peers CSCL Discourse Coding Lecture Retrieval Questioning & Answering

8 Outline  Motivation  The ITSPOKE System and Corpora  Detecting and Adapting to Student Uncertainty – Uncertainty Detection and Adaptation – Experimental Evaluation  Summing Up

9 ITSPOKE: Intelligent Tutoring Spoken Dialogue System  Back-end is Why2-Atlas [VanLehn, Jordan, Rose et al. 2002]  Speech Enhanced – Sphinx2 speech recognition – Cepstral text-to-speech  Reimplemented, other changes

10 10

11 ITSPOKE Corpora  Wizard Tutoring (ITSPOKE-WOZ) –81 students / 405 dialogues –human performs speech recognition, semantic analysis –computer performs dialogue management  Computer Tutoring (ITSPOKE-AUTO) –72 students / 360 dialogues

12 Experimental Procedure  College students without physics –Read a small background document –Took a multiple-choice Pretest –Worked 5 problems (dialogues) with ITSPOKE –Took an isomorphic Posttest  Goal was to optimize Learning Gain – e.g., Posttest – Pretest

13 Outline  Motivation  The ITSPOKE System and Corpora  Detecting and Adapting to Student Uncertainty – Uncertainty Detection and Adaptation – Experimental Evaluation  Summing Up

14 Why Uncertainty?  Most frequent student state in our dialogue corpora [Litman and Forbes-Riley 2004]  Focus of other learning sciences, speech and language processing, and psycholinguistic studies [Craig et al. 2004; Liscombe et al. 2005; Pon-Barry et al. 2006; Dijkstra et al. 2006] .73 Kappa [Forbes-Riley et al. 2008]

15 Corpus-Based Detection Methodology  Learn detection models from training corpora –Use spoken language processing to automatically extract features from user turns –Use extracted features (e.g., prosodic, lexical) to predict uncertainty annotations  Evaluate learned models on testing corpora –Significant reduction of error compared to baselines [Litman and Forbes-Riley 2006; Litman et al. 2007]

16 System Adaptation: How to Respond?  Theory-based –[VanLehn et al. 2003; Craig et al. 2004]  Corpus-based –How do humans respond? e.g. [Forbes-Riley, Rotaru, Litman, and Tetreault 2007] * –What are optimal responses? e.g. [Chi, VanLehn and Litman 2010] * * Best paper awards

17 Theory-Based Adaptation: Uncertainty as Learning Opportunity  Uncertainty represents one type of learning impasse, and is also associated with cognitive disequilibrium – An impasse motivates a student to take an active role in constructing a better understanding of the principle. [VanLehn et al. 2003] –A state of failed expectations causing deliberation aimed at restoring equilibrium. [Craig et al. 2004]  Hypothesis: The system should adapt to uncertainty in the same way it responds to other impasses (e.g., incorrectness)

18 Outline  Motivation  The ITSPOKE System and Corpora  Detecting and Adapting to Student Uncertainty – Uncertainty Detection and Adaptation – Experimental Evaluation  Summing Up

19 Adaptation to Student Uncertainty in ITSPOKE  Most systems respond only to (in)correctness  Literature suggests uncertain as well as incorrect student answers signal learning impasses  Experimentally manipulate tutor responses to student uncertainty, over and above correctness, and investigate impact on learning –Platform: Adaptive version(s) of ITSPOKE

20 Normal (non-adaptive) ITSPOKE  System Initiative Dialogue Format: –Tutor Question – Student Answer – Tutor Response  Tutor Response Types: –to Corrects (C): positive feedback (e.g. “Fine”) –to Incorrects (I): negative feedback (e.g. “Well…”) and »Bottom Out: correct answer with reasoning »Subdialogue: questions walk through reasoning

21  Our Prior Work: Rank correctness (C, I) + uncertainty (U, nonU) states in terms of impasse severity State:I+nonUI+UC+UC+nonU Severity:mostlessleastnone Adaptive ITSPOKE

22  Our Prior Work: Rank correctness (C, I) + uncertainty (U, nonU) states in terms of impasse severity State:I+nonUI+UC+UC+nonU Severity:mostlessleastnone  Adaptation Hypothesis: –ITSPOKE already resolves I impasses (I+nonU, I+U), but it ignores one type of U impasse (C+U) –Performance improvement if ITSPOKE provides additional content to resolve all impasses Adaptive ITSPOKE(s)

23  Simple Adaptation –Same response for all 3 impasses –Feedback on only (in)correctness  Complex Adaptation –Different responses for the 3 impasses –Feedback on both uncertainty and (in)correctness Two Uncertainty Adaptations

24 Simple Adaptation Example: C+U TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to? STUDENT1: The force of the car hitting it?? [C+U] TUTOR2: Fine. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]  Same TUTOR2 subdialogue if student was I+U or I+nonU

25 Experiment 1: ITSPOKE-WOZ  Wizard of Oz version of ITSPOKE –Human recognizes speech, annotates correctness and uncertainty –Provides upper-bound language performance  Conditions –Simple Adaptation: used same response for all impasses –Complex Adaptation: used different responses for each impasse –Normal Control: used original system (no adaptation) –Random Control: gave Simple Adaptation to random 20% of correct answers (to control for additional tutoring)

26 Results I: Learning MetricConditionNMeanDiffp Learning Gain (Posttest – Pretest) Normal Control21.183< Simple Adaptation.03 Random Control20.269- Simple Adaptation20.307- Complex Adaptation20.213- F(3, 77) = 3.275, p = 0.02

27 Results I: Learning MetricConditionNMeanDiffp Learning Gain (Posttest – Pretest) Normal Control21.183< Simple Adaptation.03 Random Control20.269- Simple Adaptation20.307- Complex Adaptation20.213-  Simple Adaptation yields more student learning than Normal Control (original ITSPOKE) [Forbes-Riley and Litman 2010] F(3, 77) = 3.275, p = 0.02

28 Results I: Learning MetricConditionNMeanDiffp Learning Gain (Posttest – Pretest) Normal Control21.183< Simple Adaptation.03 Random Control20.269- Simple Adaptation20.307- Complex Adaptation20.213-  Simple Adaptation yields more student learning than Normal Control (original ITSPOKE) [Forbes-Riley and Litman 2010]  Similar results for learning efficiency [Forbes-Riley and Litman 2009] F(3, 77) = 3.275, p = 0.02

29 Additional Evaluations - Metacognition  Do metacognitive performance measures differ across experimental conditions? –e.g., Monitoring Accuracy [Nietfield et al. 2006]  Do metacognitive and cognitive performance measures (i.e. learning) correlate?

30 Metacognitive Results  Simple (and random) increased monitoring accuracy compared to normal (p <.06 in paired contrasts)  Monitoring Accuracy is positively correlated with learning [Litman and Forbes-Riley 2009]

31 Experiment 2: ITSPOKE-AUTO  Fully automated ITSPOKE –Sphinx2 speech recognizer / TuTalk semantic analyzer »Correctness Accuracy of 85% –Weka uncertainty model »Logistic regression (includes lexical, prosodic, dialogue features) »Uncertainty Accuracy of 80%  Only 3 Conditions –Simple Adaptation –Normal Control –Random Control

32 Preliminary Results: ITSPOKE-AUTO  Simple Adaptation yields more student learning than Normal and Random Controls  Differences only significant for a subset of students  Noisy uncertainty detection is the system bottleneck  3 of the 4 metacognitive metrics remain correlated with learning [Forbes-Riley and Litman, 2010]

33 Current and Future Research  More sophisticated ITSPOKE adaptations –User modeling (domain knowledge, gender) –Multiple student states (disengagement) –Motivation [Ward 2010]  Remediate metacognition, not just domain content

34 Summing Up  Spoken dialogue contributes to the success of human tutors  Using presently available technology, successful tutorial dialogue systems can also be built  Adapting to uncertainty can further improve performance –Learning gains, efficiency, metacognition  Tutors can serve as platforms for learning science studies

35 Related Projects Natural Language Processing and Tools for Learning Learning Language (reading, writing, speaking) Using Language (to teach everything else) Processing Language Conversational Tutors

36 Related Projects Natural Language Processing and Tools for Learning Learning Language (reading, writing, speaking) Using Language (to teach everything else) Processing Language Conversational Tutors Tutor Abstraction and Specialization during Reflective Conversation [Katz/Jordan/Litman poster]

37 Related Projects Natural Language Processing and Tools for Learning Learning Language (reading, writing, speaking) Using Language (to teach everything else) Processing Language Conversational Tutors Tutor Abstraction and Specialization during Reflective Conversation [Katz/Jordan/Litman poster] Semantic Class Acquisition via Web-Learning [Lipschultz/Litman poster]

38 Related Projects Natural Language Processing and Tools for Learning Learning Language (reading, writing, speaking) Using Language (to teach everything else) Processing Language Computer-Supported Peer Review for Writing [Xiong/Litman/Schunn poster]

39 Acknowledgements  ITSPOKE group past and present –Hua Ai, Min Chi, Joanna Drummond, Kate Forbes-Riley, Heather Friedberg, Alison Huettner, Michael Lipschultz, Beatriz Maeireizo-Tokeshi, Greg Nicholas, Amruta Purandare, Mihai Rotaru, Scott Silliman, Joel Tetreault, Art Ward, Wenting Xiong  NLP@Pitt –Jan Wiebe, Rebecca Hwa, Wendy Chapman  Why2-Atlas and Human Tutoring groups –Kurt Vanlehn, Pamela Jordan, Carolyn Rose –Micki Chi, Scotty Craig, Bob Hausmann, Margueritte Roy, Sandra Katz

40 Thank You!  Questions?  Further Information –http://www.cs.pitt.edu/~litman/itspoke.html

41 The End

42 Example Student States in ITSPOKE ITSPOKE: What else do you need to know to find the box‘s acceleration? Student: the direction [UNCERTAIN] ITSPOKE : If you see a body accelerate, what caused that acceleration? Student: force [CERTAIN] ITSPOKE : Good job. Say there is only one force acting on the box. How is this force, the box's mass, and its acceleration related? Student: velocity [UNCERTAIN] ITSPOKE : Could you please repeat that? Student: velocity [ANNOYED]

43 WOZ-TUT Screenshot

44 Bigram Dependency Analysis EXPECTED Tutor IncludePos Tutor OmitsPos neutral439.462329.54 certain175.21928.79 uncertain129.51686.49 mixed36.82195.18 OBSERVED Tutor IncludesPos Tutor OmitsPos neutral2522517 certain273832 uncertain185631 mixed71161 χ2 = 225.92 (critical χ2 value at p =.001 is 16.27) - “Student Certainness – Tutor Positive Feedback” Bigrams

45 Bigram Dependency Analysis (cont.) EXPECTED Includes Pos Omits Pos neutral439.462329.54 OBSERVED Includes Pos Omits Pos neutral2522517 - Less Tutor Positive Feedback after Student Neutral turns

46 Bigram Dependency Analysis (cont.) EXPECTED Includes Pos Omits Pos neutral439.462329.54 certain175.21928.79 uncertain129.51686.49 mixed36.82195.18 OBSERVED Includes Pos Omits Pos neutral2522517 certain273832 uncertain185631 mixed71161 - Less Tutor Positive Feedback after Student Neutral turns - More Tutor Positive Feedback after “Emotional” turns

47 Survey Tutoring Uncertainty Spoken Dialogue

48 Learning Efficiency Results MetricConditionNMeanDiffp Normalized learning gain / total tutoring time in minutes Normal Control21.010< Simple Adapt.004 Random Control20.014- Simple Adaptation20.016- Complex Adaptation20.011< Simple Adapt.013  Given same amount of tutoring time, Simple Adaptation yields more student learning than either Normal Control or Complex Adaptation  Results also hold using raw learning gain, and total number of student turns F(3, 77) = 3.56, p = 0.02

49 Bias CorrectIncorrect NonUncertainCnonUInonU UncertainCUIU Bias scores greater than and less than zero indicate over-confidence and under-confidence, with zero indicating best performance

50 Discrimination CorrectIncorrect NonUncertainCnonUInonU UncertainCUIU Discrimination scores greater than zero indicate higher metacognitive performance, in terms of certainty for correct responses and uncertainty for incorrect responses

51 Results I: Means across Conditions Metacognitive Measure Complex Adaptation (20) Simple Adaptation (20) Random Control (20) Normal Control (21) Average Impasse Severity.59.60.73 Monitoring Accuracy.58.62.52 Bias-.01-.03-.01-.02 Discrimination.34.46.48.41  No statistically significant differences or trends for bias

52 Results I: Means across Conditions Metacognitive Measure Complex Adaptation (20) Simple Adaptation (20) Random Control (20) Normal Control (21) Average Impasse Severity.59.60.73 Monitoring Accuracy.58.62.52 Bias-.01-.03-.01-.02 Discrimination.34.46.48.41  Trend for discrimination differences overall (p =.09)  However, contrary to our predictions, complex reduced discrimination ability, compared to random and simple (p <.04 in paired contrasts)

53 Intelligent Tutoring

54 Corpus-Based Adaptation: How Do Human Tutors Respond?  An empirical method for designing dialogue systems adaptive to student state –extraction of “dialogue bigrams” from annotated human tutoring corpora –χ 2 analysis to identify dependent bigrams –generalizable to any domain with corpora labeled for user state and system response

55 Example Human Tutoring Excerpt S: So the- when you throw it up the acceleration will stay the same? [Uncertain] T: Acceleration uh will always be the same because there is- that is being caused by force of gravity which is not changing. [Restatement, Expansion] S: mm-k. [Neutral] T: Acceleration is– it is in- what is the direction uh of this acceleration- acceleration due to gravity? [Short Answer Question] S: It’s- the direction- it’s downward. [Certain] T: Yes, it’s vertically down. [Positive Feedback, Restatement]

56 Findings  Statistically significant dependencies exist between students’ state of certainty and the responses of an expert human tutor –After uncertain, tutor Bottoms Out and avoids expansions –After certain, tutor Restates –After any emotion, tutor increases Feedback  Dependencies suggest adaptive strategies for implementation in our computer tutor [Forbes-Riley and Litman 2010]

57 TUTOR2: That’s exactly right, but you seem unsure, so let’s sum up. [FEEDBACK] The net force on the truck is equal to the impact force on it. We can prove this just like we did for the car. First, we know that gravity and the normal force on the truck must cancel each other, otherwise the truck would not be at rest vertically. Second, we know that the impact force is the only horizontal force exerted on the truck. [LONG BOTTOM OUT] Complex Adaptation Example: C+U

58 TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to? STUDENT1: Zero?? [I+U] TUTOR2: That’s not correct, but don’t worry. You seem to be aware of your mistake. Let’s resolve it. [FEEDBACK] The net force on the truck is equal to the impact force on it. Let’s walk through this answer step by step. [SHORT BOTTOM OUT] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE] Complex Adaptation Example: I+U

59 TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to? STUDENT1: The force of gravity. [I+nonU] TUTOR2: I’m sorry, but I see an error in your answer. Let’s fix it. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE] Complex Adaptation Example: I+nonU

60 Discussion u Predictions versus results: - Complex Adaptation > Simple Adaptation > Random Control > Normal Control u Why didn’t Simple Adaptation and Complex Adaptation outperform Random Control? –Random Control adapted to some C+U, diminishing differences –Adapting to C+nonU may increase certainty u Why didn’t Complex Adaptation outperform Simple Adaptation? –Complex Adaptation’s human-based content responses were based on frequency, not effectiveness

61  Depending on if answer is C+U, I+U, I+nonU: –ITSPOKE gives same content but varies dialogue act »Based on human tutor responses significantly associated with C+U, I+U, I+nonU answers –ITSPOKE gives complex feedback on uncertainty and (in)correctness »Based on empathetic computer tutor literature (Wang et al., 2005; Hall et al., 2004; Burleson et al., 2004) Complex Adaptation to Uncertainty

62 Impasse Severity  Use the scalar value associated with each student turn to compute an average impasse severity, per student Nominal State:I+nonUI+UC+UC+nonU Scalar State:3 2 1 0 Severity:mostlessleastnone

63 Results II Metacognitive Measure (n=81)Rp Average Impasse Severity-.56.00 Monitoring Accuracy.42.00  Correlations of Metacognitive Measures with Posttest, after controlling for Pretest  Average Impasse Severity (where smaller is better) is negatively correlated with learning [Litman and Forbes-Riley 2009]

64 Additional Results II Metacognitive Measure (n=81)Rp Average Impasse Severity-.56.00 Monitoring Accuracy.42.00  Monitoring Accuracy (where higher is better) is positively correlated with learning [Litman and Forbes-Riley 2009]

65 Preliminary Results: ITSPOKE-AUTO Metacognitive Measure WOZAUTO RpRp Average Impasse Severity-.56.00-.40.00 Monitoring Accuracy.42.00.35.00  Impasse Severity and Monitoring Accuracy remain correlated with learning in ITSPOKE-AUTO corpus [Forbes-Riley and Litman, submitted]

66 Monitoring Accuracy CorrectIncorrect NonUncertainCnonUInonU UncertainCUIU The wizard's annotations for each student are first represented in an array, where each cell represents a mutually exclusive option motivated by Feeling of (Another’s) Knowing [Smith and Clark 1993; Brennan and Williams 1995] which is closely related to uncertainty [Dijkstra et al. 2006] The array is then used to compute monitoring accuracy

67 Monitoring Accuracy CorrectIncorrect NonUncertainCnonUInonU UncertainCUIU Ranges from -1 (no monitoring accuracy) to 1 (perfect monitoring accuracy)

68  Knowledge monitoring accuracy (HC) (Nietfeld et al., 2006)  Monitoring one’s own knowledge ≈ one’s Certainty level ≈ one’s Feeling of Knowing (FOK) –HC has been used to measure FOK accuracy (Smith & Clark, 1993): the accuracy with which one’s certainty corresponds to correctness  Feeling of Another’s Knowing (FOAK): inferring the FOK of someone else (Brennan & Williams, 1995) –We use HC to measure FOAK accuracy (our certainty is inferred) HC = (COR_CER + INC_UNC) – (INC_CER + COR_UNC) (COR_CER + INC_UNC) + (INC_CER + COR_UNC) Metacognitive Performance Metrics

69  Knowledge monitoring accuracy (HC) (Nietfeld et al., 2006)  Monitoring one’s own knowledge ≈ one’s Certainty level ≈ one’s Feeling of Knowing (FOK) –HC has been used to measure FOK accuracy (Smith & Clark, 1993): the accuracy with which one’s certainty corresponds to correctness  Feeling of Another’s Knowing (FOAK): inferring the FOK of someone else (Brennan & Williams, 1995) –We use HC to measure FOAK accuracy (our certainty is inferred) HC = (COR_CER + INC_UNC) – (INC_CER + COR_UNC) (COR_CER + INC_UNC) + (INC_CER + COR_UNC) Metacognitive Performance Metrics Denominator sums over all cases

70  Knowledge monitoring accuracy (HC) (Nietfeld et al., 2006)  Monitoring one’s own knowledge ≈ one’s Certainty level ≈ one’s Feeling of Knowing (FOK) –HC has been used to measure FOK accuracy (Smith & Clark, 1993): the accuracy with which one’s certainty corresponds to correctness  Feeling of Another’s Knowing (FOAK): inferring the FOK of someone else (Brennan & Williams, 1995) –We use HC to measure FOAK accuracy (our certainty is inferred) HC = (COR_CER + INC_UNC) – (INC_CER + COR_UNC) (COR_CER + INC_UNC) + (INC_CER + COR_UNC) Metacognitive Performance Metrics cases where (un)certainty and (in)correctness agree

71  Knowledge monitoring accuracy (HC) (Nietfeld et al., 2006)  Monitoring one’s own knowledge ≈ one’s Certainty level ≈ one’s Feeling of Knowing (FOK) –HC has been used to measure FOK accuracy (Smith & Clark, 1993): the accuracy with which certainty corresponds to correctness  Feeling of Another’s Knowing (FOAK): inferring the FOK of someone else (Brennan & Williams, 1995) –We use HC to measure FOAK accuracy (our uncertainty is inferred) HC = (COR_CER + INC_UNC) – (INC_CER + COR_UNC) (COR_CER + INC_UNC) + (INC_CER + COR_UNC) Metacognitive Performance Metrics cases where (un)certainty and (in)correctness are at odds

72  Knowledge monitoring accuracy (HC) (Nietfeld et al., 2006)  Monitoring one’s own knowledge ≈ one’s Certainty level ≈ one’s Feeling of Knowing (FOK) –HC has been used to measure FOK accuracy (Smith & Clark, 1993): the accuracy with which certainty corresponds to correctness  Feeling of Another’s Knowing (FOAK): inferring the FOK of someone else (Brennan & Williams, 1995) –We use HC to measure FOAK accuracy (our uncertainty is inferred) HC = (COR_CER + INC_UNC) – (INC_CER + COR_UNC) (COR_CER + INC_UNC) + (INC_CER + COR_UNC) Metacognitive Performance Metrics Scores range from -1 (no accuracy) to 1 (perfect accuracy)


Download ppt "A Tutorial Dialogue System that Adapts to Student Uncertainty Diane Litman Computer Science Department & Intelligent Systems Program & Learning Research."

Similar presentations


Ads by Google