
1 Detecting and Adapting to Student Uncertainty in a Spoken Tutorial Dialogue System
Diane Litman
Computer Science Department & Learning Research & Development Center
University of Pittsburgh, USA
Currently Leverhulme Visiting Professor
School of Informatics, University of Edinburgh, UK

2 Outline
• Affective Spoken Dialogue Systems
• The ITSPOKE System and Corpora
• Research Opportunities and Challenges
  – Affect Detection
  – System Adaptation
  – Experimental Evaluation
• Summing Up

3 Overview: Towards Adaptive Spoken Dialogue Systems
User Affect Detection
• Promising across affective states and applications, e.g.:
  • Craig et al., 2006
  • Litman & Forbes-Riley, 2006
  • Lee & Narayanan, 2005
  • Vidrascu & Devillers, 2005
  • Batliner et al., 2003
Adaptation
• Sparse, can be difficult to show adaptation improves performance
• Some used basic adaptations and showed likeability increases
• For other performance metrics, basic adaptations not clear a priori
Application          User Affect    System Adaptation
Health Assessment    Stress         Empathy [Liu & Picard 2005]
Gaming               Frustration    Apology [Klein et al. 2002]
Tutoring             ???????        ???????

4 Monitoring Student Affect (Motivation)
Tutor26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
Student27: dammit (ASR: it is)
Tutor28: Could you please repeat that?
Student29: same (ASR: i same)
Tutor30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
Student31: zero (ASR: the zero)
Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario
Student33: oh gravity you already said this (ASR: beats gravity you're exert less)
Tutor34: Fine. Are there any other forces acting on the apple as it falls?
Student35: no why are you doing this again (ASR: no y and to it yes)
Tutor36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
Student37: downward you computer (ASR: downward you computer)

5 Affective Tutorial Dialogue Systems
• Opportunity
  – Affective spoken dialogue system technology can improve student learning and other measures of performance [Aist et al. 2002; Pon-Barry et al. 2006]
• Challenges
  – What to detect?
  – How to respond?
  – Evaluation?

6 Outline
• Affective Spoken Dialogue Systems
• The ITSPOKE System and Corpora
• Research Opportunities and Challenges
  – Affect Detection
  – System Adaptation
  – Experimental Evaluation
• Summing Up

7 ITSPOKE: Motivation
• Current learning gap between human and computer tutors
  – Humans: learning increases of up to 2 standard deviations [Bloom 1984]
  – Computers: learning increases of only 1 standard deviation [Anderson 1995, VanLehn 2006]
• How to bridge this gap?
  – Currently only humans use full-fledged natural language dialogue
  – ITSPOKE: a platform for investigating the role of speech and affect in tutorial dialogue systems

8 Back-end is Why2-Atlas system [VanLehn, Jordan, Rose et al. 2002] Sphinx2 speech recognition and Cepstral text-to-speech


11 Two Types of Tutoring Corpora
• Human Tutoring
  – 14 students / 128 dialogues (physics problems)
  – 5948 student turns, 5505 tutor turns
• Computer Tutoring
  – ITSPOKE v1
    » 20 students / 100 dialogues
    » 2445 student turns, 2967 tutor turns
  – ITSPOKE v2
    » 57 students / 285 dialogues
    » both synthesized and pre-recorded tutor voices

12 ITSPOKE Experimental Procedure
• College students without physics
  – Read a small background document
  – Took a multiple-choice Pretest
  – Worked 5 problems (dialogues) with ITSPOKE
  – Took an isomorphic Posttest
• Goal was to optimize Learning Gain
  – e.g., Posttest – Pretest

13 Outline
• Affective Spoken Dialogue Systems
• The ITSPOKE System and Corpora
• Research Opportunities and Challenges
  – Affect Detection
  – System Adaptation
  – Experimental Evaluation
• Summing Up

14 Affect Detection in Spoken Dialogue: Empirical Methodology
• Manual Annotation of Affect and Attitudes
  – Naturally-occurring spoken dialogue data
  – [Ang et al. 2002; Lee et al. 2002; Batliner et al. 2003; Devillers et al. 2003; Shafran et al. 2003; Liscombe et al. 2005]
• Prediction via Machine Learning
  – Automatically extract features from user turns
  – Use different feature sets (e.g. prosodic, lexical) to predict affect
  – Significant reduction of baseline error
• Analytical approaches also possible


16 What to Annotate?
• Communicator and Customer Care Systems
  – Negative: Angry, Annoyed, Frustrated, Tired
  – Positive/Neutral: Amused, Cheerful, Delighted, Happy, Serious
  [Ang et al. 2002; Shafran et al. 2003; Lee and Narayanan 2005; Liscombe et al. 2005]
• Tutorial Dialogue Systems
  – Negative: Angry, Annoyed, Frustrated, Bored, Confused, Uncertain, Contempt, Disgusted, Sad
  – Positive/Neutral: Certain, Curious, Enthusiastic, Eureka
  [Litman and Forbes-Riley 2006; D’Mello et al. 2006]

17 Example Student Affect in ITSPOKE
ITSPOKE: What else do you need to know to find the box's acceleration?
Student: the direction [NEGATIVE: UNCERTAIN]
ITSPOKE: If you see a body accelerate, what caused that acceleration?
Student: force [POSITIVE: CERTAIN]
ITSPOKE: Good job. Say there is only one force acting on the box. How is this force, the box's mass, and its acceleration related?
Student: velocity [NEGATIVE: UNCERTAIN]
ITSPOKE: Could you please repeat that?
Student: velocity [NEGATIVE: ANNOYED]


19 How to Annotate?
• Trained Judges versus Self-Report
• Offline versus Online Coding
• Evaluation
  – Kappas of .32-.5 in [Ang et al. 2002; Narayanan 2002; Shafran et al. 2003; Litman and Forbes-Riley 2004]
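The Kappa figures on this slide measure inter-annotator agreement corrected for chance. The slide does not say which variant was used; the sketch below assumes Cohen's kappa for two annotators, and the label sequences are made up for illustration only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same turns."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of turns where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of six student turns (not from the corpus).
annotator_1 = ["neg", "neg", "pos", "neu", "neu", "neu"]
annotator_2 = ["neg", "neu", "pos", "neu", "neu", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Raw agreement in this toy example is 4/6, but kappa is lower because some of that agreement would be expected by chance.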

20 Prediction via Machine Learning
• Multiple feature types per student turn, e.g.
  – Acoustic-prosodic
  – Lexical
  – Identifiers
  – System and student performance
• Sample research questions
  – Relative utility of feature types
  – Impact of speech recognition
  – Speaker and task dependence
  – Impact of learning algorithm, amount of training data

21 Detecting Neg/Pos/Neu in ITSPOKE - Baseline Accuracy via Majority Class Prediction
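A majority-class baseline always predicts the most frequent label, so its accuracy equals that label's share of the data. The sketch below shows the computation, plus the relative reduction of baseline error mentioned earlier in the talk. The label counts and the 0.8 model accuracy are hypothetical, not results from ITSPOKE.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


def relative_error_reduction(baseline_acc, model_acc):
    """Fraction of the baseline's error that the model removes."""
    baseline_err = 1 - baseline_acc
    if baseline_err == 0:
        raise ValueError("baseline already has zero error")
    return (baseline_err - (1 - model_acc)) / baseline_err


# Hypothetical label distribution over ten student turns.
turn_labels = ["neu"] * 6 + ["neg"] * 3 + ["pos"]
baseline = majority_baseline_accuracy(turn_labels)
reduction = relative_error_reduction(baseline, 0.8)
```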

22 Detecting Neg/Pos/Neu in ITSPOKE -Use of prosodic (sp), recognized (asr) and/or actual (lex) lexical features outperforms baseline

23 Detecting Neg/Pos/Neu in ITSPOKE -As with other applications, highest predictive accuracies are obtained by combining multiple feature types [Litman and Forbes-Riley, Speech Communication 2006]

24 Detecting Neg/Pos/Neu in ITSPOKE - However, relative feature utility differs in tutoring (e.g., for speech features: temporal > energy > pitch)

25 Outline
• Affective Spoken Dialogue Systems
• The ITSPOKE System and Corpora
• Research Opportunities and Challenges
  – Affect Detection
  – System Adaptation
  – Experimental Evaluation
• Summing Up

26 System Adaptation: How to Respond?
• Our initial focus: responding to student uncertainty
  – Most frequent affective state in our data
  – Focus of other studies [VanLehn et al. 2003; Craig et al. 2006; Porayska-Pomsta et al. 2007; Pon-Barry et al. 2006]
  – .62 Kappa
• Approaches to adaptive system design
  – Theory-based
  – Data-driven

27 Theory-Based Adaptation
• In tutoring, not all negatively valenced affect is bad
  – Such states often represent learning opportunities
• E.g., uncertainty represents one type of learning impasse [VanLehn et al. 2003]
  – An impasse motivates a student to take an active role in constructing a better understanding of the principle
• Hypothesis: The system should adapt to student uncertainty in the same way it responds to other impasses

28 Example from ITSPOKE
ITSPOKE: What is the net force acting on the truck equal to?
Student: I don’t know [Uncertain, Frustrated]
ITSPOKE: Since net force is defined as the vector sum of all the forces exerted on an object, let's first find the forces acting on the truck. Try to name them.
Student: The impact force, the normal force, and gravity [Certain, Neutral]
ITSPOKE: Yep.

29 Data-Driven Adaptation: How Do Human Tutors Respond?
• An empirical method for designing dialogue systems adaptive to student affect
  – extraction of “dialogue bigrams” from annotated human tutoring corpora
  – χ2 analysis to identify dependent bigrams
  – generalizable to any domain with corpora labeled for user state and system response
  – [Forbes-Riley and Litman, Sigdial 2005; Forbes-Riley and Litman, ACII 2007; Forbes-Riley et al., NAACL-HLT 2007]

30 Example Human Tutoring Excerpt
S: So the- when you throw it up the acceleration will stay the same? [Uncertain]
T: Acceleration uh will always be the same because there is- that is being caused by force of gravity which is not changing. [Restatement, Expansion]
S: mm-k. [Neutral]
T: Acceleration is– it is in- what is the direction uh of this acceleration- acceleration due to gravity? [Short Answer Question]
S: It’s- the direction- it’s downward. [Certain]
T: Yes, it’s vertically down. [Positive Feedback, Restatement]
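The annotated excerpt above can be turned into "dialogue bigrams" by pairing each student turn's state with the acts in the tutor turn that follows it. The sketch below is one possible way to count such pairs for a single tutor act. The data layout and the function name `count_bigrams` are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

# Turns from the excerpt: (speaker, annotation labels for that turn).
excerpt = [
    ("student", ["Uncertain"]),
    ("tutor", ["Restatement", "Expansion"]),
    ("student", ["Neutral"]),
    ("tutor", ["Short Answer Question"]),
    ("student", ["Certain"]),
    ("tutor", ["Positive Feedback", "Restatement"]),
]


def count_bigrams(turns, tutor_act):
    """Count (student state, tutor turn includes tutor_act?) pairs
    over adjacent student -> tutor turns."""
    counts = Counter()
    for (spk_a, labels_a), (spk_b, labels_b) in zip(turns, turns[1:]):
        if spk_a == "student" and spk_b == "tutor":
            includes_act = tutor_act in labels_b
            for state in labels_a:
                counts[(state, includes_act)] += 1
    return counts


bigrams = count_bigrams(excerpt, "Positive Feedback")
```

Summed over a whole corpus, these counts give a contingency table of student state against presence of the tutor act, which is the input to the χ2 analysis.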

31 Bigram Dependency Analysis
“Student Certainness – Tutor Positive Feedback” Bigrams
χ2 = 225.92 (critical χ2 value at p = .001 is 16.27)

EXPECTED     Tutor IncludesPos   Tutor OmitsPos
neutral      439.46              2329.54
certain      175.21              928.79
uncertain    129.51              686.49
mixed        36.82               195.18

OBSERVED     Tutor IncludesPos   Tutor OmitsPos
neutral      252                 2517
certain      273                 832
uncertain    185                 631
mixed        71                  161
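The dependency test on this slide is a χ2 test of independence on the observed counts. The sketch below recomputes expected counts and the statistic from the observed table as transcribed here. Values recomputed this way may differ slightly from the figures on the slide, since the transcribed counts may not match the original data exactly. The function name is an assumption for illustration.

```python
def chi_square_independence(observed):
    """Pearson chi-square statistic, degrees of freedom, and expected
    counts for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof, expected


# Observed counts from the slide: [IncludesPos, OmitsPos] per student state.
observed = [
    [252, 2517],  # neutral
    [273, 832],   # certain
    [185, 631],   # uncertain
    [71, 161],    # mixed
]
chi2, dof, expected = chi_square_independence(observed)
# With 3 degrees of freedom, the slide's critical value at p = .001 is 16.27.
is_dependent = chi2 > 16.27
```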

32 Bigram Dependency Analysis (cont.)

EXPECTED     Includes Pos   Omits Pos
neutral      439.46         2329.54

OBSERVED     Includes Pos   Omits Pos
neutral      252            2517

- Less Tutor Positive Feedback after Student Neutral turns

33 Bigram Dependency Analysis (cont.)

EXPECTED     Includes Pos   Omits Pos
neutral      439.46         2329.54
certain      175.21         928.79
uncertain    129.51         686.49
mixed        36.82          195.18

OBSERVED     Includes Pos   Omits Pos
neutral      252            2517
certain      273            832
uncertain    185            631
mixed        71             161

- Less Tutor Positive Feedback after Student Neutral turns
- More Tutor Positive Feedback after “Emotional” turns

34 Findings
• Statistically significant dependencies exist between students’ state of certainty and the responses of an expert human tutor
  – After uncertain, tutor Bottoms Out & avoids expansions
  – After certain, tutor Restates
  – After mixed, tutor Hints
  – After any emotion, tutor increases Feedback
• Dependencies suggest adaptive strategies for implementation in computer tutoring systems

35 Outline
• Affective Spoken Dialogue Systems
• The ITSPOKE System and Corpora
• Research Opportunities and Challenges
  – Affect Detection
  – System Adaptation
  – Experimental Evaluation
• Summing Up

36 Approaches to Evaluation
• “Correlational” Studies, e.g.
  – Student uncertainty positively correlates with learning [Craig et al. 2004]
  – Adding uncertainty and frustration metrics to regression models increases model fit [Forbes-Riley et al. 2008]
• “Causal” Studies, e.g.
  – Adding human-provided emotional scaffolding to a reading tutor increases student persistence [Aist et al. 2002]
  – Experimentally manipulate tutor responses to student uncertainty and investigate impact on learning [Pon-Barry et al. 2006]

37 Adaptation to Student Uncertainty in ITSPOKE: A First Causal Evaluation
• Theoretically-motivated uncertainty adaptation
  – Most systems respond only to (in)correctness
  – Recall that literature suggests uncertain as well as incorrect student answers signal learning impasses
• Proposed Adaptation: Treat uncertain+correct student answers as if they were incorrect

38 Platform: Adaptive WOZ-TUT System
• Modified version of ITSPOKE
  – Dialogue manager adapts to uncertainty
    » system responses based on combined uncertainty and correctness
  – Full automation replaced by Wizard of Oz (WOZ) components
    » human wizard recognizes student speech
    » human also annotates both uncertainty and correctness

39 WOZ-TUT Screenshot

40 Experimental Design
• 3 conditions: (parameterized versions of WOZ-TUT)
  – Experimental: treat all uncertain+correct turns as incorrect
  – First Control: ignore uncertainty (logged)
  – Second Control: ignore uncertainty (logged), but treat a percentage of random correct answers as incorrect
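The three conditions differ only in when a correct answer is treated as incorrect, following the proposed adaptation of treating uncertain+correct answers as incorrect. The sketch below expresses that as a single decision function. The condition names, the `random_rate` parameter, and the function itself are hypothetical, since the slides do not give the actual parameterization or the percentage used in the second control.

```python
import random


def treat_as_incorrect(condition, correct, uncertain, random_rate=0.0, rng=random):
    """Decide whether the tutor should respond as if the answer were incorrect.

    condition:   "EXP", "CTRL1", or "CTRL2" (labels assumed for illustration)
    random_rate: assumed share of correct answers remediated at random in CTRL2
    """
    if not correct:
        return True  # incorrect answers are remediated in every condition
    if condition == "EXP":
        return uncertain  # contingent adaptation: uncertain+correct is remediated
    if condition == "CTRL1":
        return False  # uncertainty is logged but ignored
    if condition == "CTRL2":
        return rng.random() < random_rate  # random adaptation, ignores uncertainty
    raise ValueError(f"unknown condition: {condition}")


# An uncertain but correct answer under the experimental and first control conditions.
exp_decision = treat_as_incorrect("EXP", correct=True, uncertain=True)
ctrl1_decision = treat_as_incorrect("CTRL1", correct=True, uncertain=True)
```

Keeping the random control as a separate branch makes it possible to compare contingent remediation against the same amount of remediation delivered without regard to uncertainty.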

41 Treatments in Different Conditions
• TUTOR: What will the velocity of the object be a second after that (where the initial velocity is 9.8m/s and the acceleration is 9.8m/s²)?
• STUDENT: Nineteen point six meters per second?? [uncertain+correct]
• TUTOR in First Control Condition moves on: Good. So at every point in time during the fall of the man and his keys, how do their velocities compare with each other?
• TUTOR in Experimental Condition remediates: Okay. As we have seen, if a falling object has an acceleration of 9.8m/s², its velocity changes by 9.8m/s every second. So if a second after it began falling its velocity is 9.8m/s, a second later its velocity will be 9.8m/s + 9.8m/s = 19.6m/s. So what will its velocity be a second after that?

42 Experimental Procedure
• 60 subjects randomly assigned to 3 conditions (gender-balanced)
  – Native English speakers with no college physics
  – Procedure: 1) read background material, 2) took pretest, 3) worked training problem with WOZ-TUT, 4) took posttest, 5) worked isomorphic test problem with non-adaptive WOZ-TUT

43 Resulting Corpus
• 120 dialogues from 60 students (.ogg format, 20 hours)
• Student turns manually transcribed
• Tutor turns and Wizard annotations in log files
• Available through the Pittsburgh Science of Learning Center
• https://learnlab.web.cmu.edu/datashop/index.jsp
• [Forbes-Riley et al., LREC 2008]

                          Student   Tutor
Total Turns               2171      2531
Total Uncertain Turns     796       -
Total Words               13533     111829
Average Words per Turn    6.23      44.20

44 Evaluation Results (1)
• Learning Gains
  – short answer pre and posttests
  – no significant differences across conditions
• Dialogue-Based
  – incorrect and/or uncertain answers in the isomorphic test problem
  – no significant differences across conditions overall
  – however ...

45 Evaluation Results (2)
Comparing questions originally answered Correct+Uncertain (CU)

Answer to Same Question in      EXP                      CTRL1               CTRL2
Isomorphic Test Dialogue        (contingent adaptation)  (original system)   (random adaptation)
Total CU -> C                   4.5                      2.6                 5.1
Total CU -> nonU                3.5                      2.3                 4.0
Total CU -> CnonU               3.4                      2.2                 3.9

Significant Differences and Trends (compared to CTRL1)

46 Evaluation Results (3)
• Summary of Findings [Forbes-Riley et al., Intelligent Tutoring Systems 2008]
  – Correct+Uncertain answers are more likely to stay correct if they receive the uncertainty adaptation(s)
  – The uncertainty adaptations also reduce uncertainty
  – However, results are stronger in the random condition
• Problems with Experimental Design
  – One (rather than five) training problems
  – Use of vague “Okay” for positive feedback

47 Outline
• Affective Spoken Dialogue Systems
• The ITSPOKE System and Corpora
• Research Opportunities and Challenges
  – Affect Detection
  – System Adaptation
  – Experimental Evaluation
• Summing Up

48 Summing Up
• Affective Systems are receiving increasing attention in Spoken (Tutorial) Dialogue Research
• Many opportunities and challenges remain
  – Affect Detection
  – System Adaptation
  – Experimental Evaluation

49 Current Directions in ITSPOKE
• Affect Detection
  – bootstrapping approaches to annotation
  – further development of features
• System Adaptation
  – new methods for learning data-driven strategies
  – responding with tutor affect
• Experimental Evaluation
  – new WOZ-TUT experiment
    » 5 problems, clearer feedback, new empirical adaptation
  – future ITSPOKE (fully automated) experiment

50 Acknowledgements
• Kate Forbes-Riley
• ITSPOKE group
  – Hua Ai, Alison Huettner, Beatriz Maeireizo-Tokeshi, Greg Nicholas, Amruta Purandare, Mihai Rotaru, Scott Silliman, Joel Tetrault, Art Ward
  – Columbia Collaborators: Julia Hirschberg, Jackson Liscombe, Jennifer Venditti
• NLP@Pitt
• Why2-Atlas and Human Tutoring groups

51 Thank You! Questions?
• Further Information
  – http://www.cs.pitt.edu/~litman/itspoke.html
• Annotated WOZ-TUT Corpus
  – https://learnlab.web.cmu.edu/datashop/index.jsp

52 Corpus Description by Condition
• One-way ANOVAs showed no significant differences:
  • number of correct, uncertain, or uncertain+correct turns
  • number adapted-to turns (EXP vs CTRL2)

Training Problem                              EXP     CTRL1   CTRL2
Ave Turns                                     20.65   18.60   19.75
Ave Correct Turns                             13.80   12.55   14.20
Ave Uncertain Turns                           9.95    8.60    11.15
Ave Uncertain+Correct Turns                   4.75    3.75    6.10
Ave Adapted-To Turns                          4.75    0       3.65
Ave Uncertain+Correct and Adapted-To Turns    100%    0%      36%

