1 Metacognition and Learning in Spoken Dialogue Computer Tutoring
Kate Forbes-Riley and Diane Litman
Learning Research and Development Center, University of Pittsburgh, Pittsburgh, PA, USA

2 Outline
• Overview
• Spoken Dialogue Computer Tutoring Data
• Metacognitive Metrics based on Student Uncertainty and Correctness Labels
• Do Metacognitive Metrics Predict Learning?
• Conclusions, Future Work

3 Background
• Metacognition: an important measure of performance and learning
• Uncertainty: a metacognitive state in tutorial dialogue research
  • Signals learning impasses (e.g., VanLehn et al., 2003)
  • Correlates with learning (Litman & Forbes-Riley, 2009; Craig et al., 2004)
  • Computer tutor responses to it improve performance (Forbes-Riley & Litman, 2010; Aist et al., 2002; Tsukahara & Ward, 2001)
• Complex metrics: combine dimensions (uncertainty, correctness)
  • Learning impasse severity (Forbes-Riley et al., 2008)
  • Knowledge monitoring accuracy (Nietfeld et al., 2006)
  • Bias (Kelemen et al., 2000; Saadawi et al., 2009)
  • Discrimination (Kelemen et al., 2000; Saadawi et al., 2009)

4 Our Research
• Prior work: do metacognitive metrics predict learning in a wizarded spoken dialogue tutoring corpus? (Litman & Forbes-Riley, 2009)
  • Metrics computed on manually labeled uncertainty and correctness
  • All four complex metrics predicted learning
• Current work: do the metrics also predict learning in a comparable fully automated corpus?
  • One set computed on real-time automatic (noisy) labels
  • One set computed on post-experiment manual labels
  • Most complex metrics still predict learning (noisy or manual)
  • It is both worthwhile and feasible to remediate based on noisy metacognitive metrics

5 Spoken Dialogue Computer Tutoring Data
• ITSPOKE: speech-enhanced, modified version of the Why2-Atlas qualitative physics tutor (VanLehn, Jordan, Rosé et al., 2002)
• Two prior controlled experiments evaluated the utility of responding to uncertainty over and above correctness
  • Uncertainty and incorrectness are learning impasses (opportunities to learn) (e.g., VanLehn et al., 2003)
  • Enhanced ITSPOKE: response contingent on the student turn's combined uncertainty and correctness labels (impasse state)
  • Details in Forbes-Riley & Litman, 2010
• Procedure: reading, pretest, 5 problems, survey, posttest

6 Spoken Dialogue Computer Tutoring Data
• 1st experiment: ITSPOKE-WOZ corpus (wizarded)
  • 405 dialogues, 81 students
  • Speech recognition, uncertainty, and correctness labeling performed by a human
• 2nd experiment: ITSPOKE-AUTO corpus (fully automated)
  • 360 dialogues, 72 students
  • Manually transcribed and labeled after the experiment
  • Speech recognition accuracy: 74.6% (Sphinx2)
  • Correctness accuracy: 84.7% (TuTalk; Jordan et al., 2007)
  • Uncertainty accuracy: 80.3% (logistic regression model built with speech/dialogue features, trained on the ITSPOKE-WOZ corpus)

7 ITSPOKE-AUTO Corpus Excerpt
t1: [...] How does the man's velocity compare to that of the keys?
  sAUTO: his also the is same as that of his keys [incorrect+certain]
  sMANU: his velocity is the same as that of his keys [correct+uncertain]
t2: [...] What forces are exerted on the man after he releases his keys?
  sAUTO: the only force is [incorrect+certain]
  sMANU: the only force is [incorrect+uncertain]
t3: [...] What's the direction of the force of gravity on the man?
  sAUTO: that in the pull in the man vertically down [correct+certain]
  sMANU: gravity will be pulling the man vertically down [correct+certain]

8 Metacognitive Performance Metrics
• Metrics computed using four equations that combine uncertainty and correctness labels in different ways
• Metrics computed per student (over all 5 dialogues)
• Two sets of metrics:
  • one set computed from the real-time automatic (noisy) labels (-auto)
  • one set computed from the post-experiment manual labels (-manu)
• Metrics represent inferred (tutor-perceived) values, because uncertainty is labeled by the system or a human judge
• For each metric, we computed a partial Pearson correlation with posttest, controlling for pretest
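The slide's analysis step, partial Pearson correlation controlling for pretest, can be sketched via the standard residual method: regress both variables on the covariate and correlate the residuals. The data values below are hypothetical illustrations, not the corpus scores.

```python
import numpy as np

def partial_corr(x, y, z):
    """Partial Pearson correlation of x and y, controlling for z.

    Computed by regressing x and y on z (with an intercept) and
    correlating the residuals.
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    Z = np.column_stack([np.ones_like(z), z])        # design matrix with intercept
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # residuals of x given z
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]  # residuals of y given z
    return np.corrcoef(rx, ry)[0, 1]

# Hypothetical per-student scores: does a metric predict posttest
# over and above pretest?
pretest  = [0.4, 0.5, 0.6, 0.3, 0.7]
posttest = [0.6, 0.72, 0.85, 0.45, 0.9]
metric   = [0.2, 0.5, 0.6, 0.1, 0.8]
r = partial_corr(metric, posttest, pretest)
```

A library such as pingouin offers the same computation with significance testing; the bare-numpy version above just shows the mechanics.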

9 Metacognitive Performance Metrics
• Average Learning Impasse Severity (Forbes-Riley & Litman, 2008)
  • Uncertainty and incorrectness are learning impasses
  • We distinguish four impasse states: all combinations of binary uncertainty (UNC, CER) and correctness (INC, COR)
  • We rank impasse states by severity, based on impasse awareness
  • We label the state of each turn and compute the average impasse severity

State:    INC_CER   INC_UNC   COR_UNC    COR_CER
Severity: most (3)  less (2)  least (1)  none (0)
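The severity ranking above maps each (correctness, certainty) pair to a score, averaged over a student's turns. A minimal sketch (the turn labels below are hypothetical):

```python
# Severity ranking from the slide: INC_CER=3, INC_UNC=2, COR_UNC=1, COR_CER=0
SEVERITY = {
    ("incorrect", "certain"):   3,  # most severe: wrong and unaware of it
    ("incorrect", "uncertain"): 2,
    ("correct",   "uncertain"): 1,
    ("correct",   "certain"):   0,  # no impasse
}

def average_impasse_severity(turns):
    """Mean impasse severity over a student's labeled turns.

    `turns` is a list of (correctness, certainty) label pairs,
    one per student turn across all five dialogues.
    """
    return sum(SEVERITY[t] for t in turns) / len(turns)

# Hypothetical labeled turns for one student
turns = [("correct", "certain"), ("correct", "uncertain"),
         ("incorrect", "certain"), ("incorrect", "uncertain")]
avg = average_impasse_severity(turns)
```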

10 Metacognitive Performance Metrics
• Knowledge monitoring accuracy (HC) (Nietfeld et al., 2006)
  • Monitoring one's own knowledge ≈ one's certainty level ≈ one's Feeling of Knowing (FOK)
  • HC has been used to measure FOK accuracy (Smith & Clark, 1993): the accuracy with which one's certainty corresponds to one's correctness
  • Feeling of Another's Knowing (FOAK): inferring the FOK of someone else (Brennan & Williams, 1995)
  • We use HC to measure FOAK accuracy (our uncertainty labels are inferred)

HC = [(COR_CER + INC_UNC) - (INC_CER + COR_UNC)] / [(COR_CER + INC_UNC) + (INC_CER + COR_UNC)]

  • Numerator: cases where (un)certainty and (in)correctness agree, minus cases where they are at odds
  • Denominator: sums over all cases
  • Scores range from -1 (no accuracy) to 1 (perfect accuracy)
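The HC equation translates directly into a function over the four per-student counts. The counts in the example call are hypothetical:

```python
def hc(cor_cer, cor_unc, inc_cer, inc_unc):
    """Knowledge monitoring accuracy (HC).

    Cases where certainty and correctness agree, minus cases where
    they are at odds, divided by all cases. Ranges from -1 to 1.
    """
    agree = cor_cer + inc_unc    # certain-and-correct, uncertain-and-incorrect
    at_odds = inc_cer + cor_unc  # certain-but-incorrect, uncertain-but-correct
    return (agree - at_odds) / (agree + at_odds)

# Hypothetical counts for one student
score = hc(cor_cer=40, cor_unc=5, inc_cer=3, inc_unc=12)
```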

15 Metacognitive Performance Metrics
• Bias (Kelemen et al., 2000; Saadawi et al., 2009)
  • Measures how much more certainty than correctness there is
  • Scores less / greater than 0 indicate under- / over-confidence

Bias = (COR_CER + INC_CER) / (COR_CER + INC_CER + COR_UNC + INC_UNC)
     - (COR_CER + COR_UNC) / (COR_CER + INC_CER + COR_UNC + INC_UNC)

  • First numerator: total certain answers; second numerator: total correct answers
  • Each denominator sums over all cases
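Bias reduces to the proportion of certain answers minus the proportion of correct answers. A sketch with hypothetical counts:

```python
def bias(cor_cer, cor_unc, inc_cer, inc_unc):
    """Bias: proportion of certain answers minus proportion of correct answers.

    Negative scores indicate underconfidence; positive scores
    indicate overconfidence.
    """
    total = cor_cer + cor_unc + inc_cer + inc_unc
    certain = cor_cer + inc_cer
    correct = cor_cer + cor_unc
    return certain / total - correct / total

# Hypothetical counts: 43 certain vs. 45 correct answers -> slight underconfidence
b = bias(cor_cer=40, cor_unc=5, inc_cer=3, inc_unc=12)
```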

18 Metacognitive Performance Metrics
• Discrimination (Kelemen et al., 2000; Saadawi et al., 2009)
  • Measures one's ability to discriminate whether one is correct
  • Scores greater than 0 indicate higher performance

Discrimination = COR_CER / (COR_CER + COR_UNC) - INC_CER / (INC_CER + INC_UNC)

  • First term: proportion of correct answers given with certainty
  • Second term: proportion of incorrect answers given with certainty
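Discrimination compares how often certainty accompanies correct versus incorrect answers. A sketch with hypothetical counts:

```python
def discrimination(cor_cer, cor_unc, inc_cer, inc_unc):
    """Discrimination: P(certain | correct) minus P(certain | incorrect).

    Positive scores mean the student is more often certain when
    correct than when incorrect.
    """
    certain_when_correct = cor_cer / (cor_cer + cor_unc)
    certain_when_incorrect = inc_cer / (inc_cer + inc_unc)
    return certain_when_correct - certain_when_incorrect

# Hypothetical counts for one student
d = discrimination(cor_cer=40, cor_unc=5, inc_cer=3, inc_unc=12)
```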

21 Prior ITSPOKE-WOZ Corpus Results

Metric               Mean    SD     R     p
AV Impasse Severity   .63   .24   -.56   .00
HC                    .59   .16    .42   .00
Bias                 -.02   .12   -.21   .06
Discrimination        .42   .19    .32   .00
%Correct              .79   .09    .52   .00
%Uncertain            .23   .11   -.13   .24

• In ideal conditions, higher learning correlates with:
  • Less severe impasses (those that include uncertainty) / no impasses
  • Higher knowledge monitoring accuracy
  • Underconfidence about correctness
  • Better discrimination of when one is correct
  • Being correct

22 Current ITSPOKE-AUTO Corpus Results: -auto labels

Metric                     Mean    SD     R     p
AV Impasse Severity_auto    .96   .26   -.40   .00
HC_auto                     .42   .14    .35   .00
Bias_auto                   .21   .07   -.36   .00
Discrimination_auto         .19   .10   -.04   .77
%Correct_auto               .66   .10    .39   .00
%Uncertain_auto             .13   .07   -.15   .20

• In noisy/realistic conditions, higher learning still correlates with:
  • Less severe impasses / no impasses
  • Higher knowledge monitoring accuracy
  • Underconfidence about correctness
  • Being correct

23 Current ITSPOKE-AUTO Corpus Results: -manu labels

Metric                     Mean    SD     R     p
AV Impasse Severity_manu    .82   .23   -.50   .00
HC_manu                     .49   .13    .29   .02
Bias_manu                   .06   .13   -.19   .11
Discrimination_manu         .30   .14   -.03   .81
%Correct_manu               .72   .09    .52   .00
%Uncertain_manu             .22   .14   -.13   .28

• In corrected noisy conditions, higher learning still correlates with:
  • Less severe impasses / no impasses
  • Higher knowledge monitoring accuracy
  • Being correct

24 Discussion
• Does metacognition add value over correctness for predicting learning in ideal and realistic conditions?
• Recomputed the correlations controlling for both pretest and %Correct:
  • ITSPOKE-WOZ: all complex metrics correlate with posttest
  • ITSPOKE-AUTO: no metrics correlate with posttest
  • Metacognition adds value in ideal conditions
• Stepwise linear regression greedily selects from all metrics + pretest:
  • ITSPOKE-WOZ: selects HC after %Correct and pretest
  • ITSPOKE-AUTO: selects Impasse Severity_auto after pretest
  • Metacognition adds value in realistic conditions too

25 Conclusions
• Metacognitive performance metrics predict learning in a fully automated spoken dialogue computer tutoring corpus
  • Prior work: four metrics predict learning in a wizarded corpus
  • Three metrics still predict learning even with automated speech recognition, uncertainty, and correctness labeling: average impasse severity, knowledge monitoring accuracy, and bias
• Metacognitive metrics add value over correctness for predicting learning in ideal and realistic conditions
  • At least some metrics do (e.g., noisy average impasse severity)

26 Current and Future Work
• Use these results to inform a system modification aimed at improving metacognitive abilities (and therefore learning)
  • Feasible to use the fully automated system and noisy metacognitive metrics, rather than an expensive wizarded system
• Metacognitive metrics represent inferred values
  • Self-judged values differ from inferred ones (Pon-Barry & Shieber, 2010); expert-judged values are most reliable (D'Mello et al., 2008)
  • FOK ratings in future system versions can help measure metacognitive improvement
• The "Metacognition in ITS" literature will also inform the system modification (e.g., the AIED'07 and ITS'08 workshops)

27 Questions/Comments?
For further information, web search: ITSPOKE
Thank You!

28 Future Work (cont.)
• Why didn't Discrimination_auto, Discrimination_manu, and Bias_manu correlate with learning in ITSPOKE-AUTO?
• Due to NLP errors in ITSPOKE-AUTO?
  • Rerun the correlations over students with few speech recognition, uncertainty, and correctness errors, to see whether the results pattern like ITSPOKE-WOZ
• Due to different user populations?
  • Run ITSPOKE-AUTO on the ITSPOKE-WOZ corpus, then compute the noisy metric correlations, to see whether the results pattern like the ITSPOKE-AUTO corpus

29 Simple Adaptation to Uncertainty
• For C+U, I+U, and I+nonU answers:
  • ITSPOKE gives the same content with the same dialogue act
  • ITSPOKE gives feedback on (in)correctness

30 Simple Adaptation Example
TUTOR1: By the same reasoning that we used for the car, what's the overall net force on the truck equal to?
STUDENT1: The force of the car hitting it?? [C+U]
TUTOR2: Fine. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]
• The same TUTOR2 subdialogue is used if the student was I+U or I+nonU

