circle Towards Spoken Dialogue Systems for Tutorial Applications Diane Litman Reprise of LRDC Board of Visitors Meeting, April 2003.

2 Towards Spoken Dialogue Systems for Tutorial Applications
Diane Litman
Reprise of LRDC Board of Visitors Meeting, April 2003

3 Outline
• Introduction
• System and Corpora
• Two Pilot Studies
• Summary

4 (Spoken) Dialogue Tutoring
• Typed Dialogue > Text
  - greater adaptivity in dialogue (shorter cycle, sensitivity to student beliefs and misconceptions)
• Is Spoken Dialogue >> Text?
  - more natural; new sources of evidence for adaptation
• Speech R&D has only recently made such investigations possible

5 Research Questions
• What are the advantages – and disadvantages – of using speech over text in tutoring dialogues?
• Can acoustic/prosodic features in spoken tutoring dialogues be used to infer pedagogically significant information?
• Can the tutoring system make use of such inferences?

6 ITSPOKE: Intelligent Tutoring SPOKEn Dialogue System
• “Back-end” is Why2-Atlas
• Speech input via the Sphinx2 speech recognizer
• Speech output via the Festival text-to-speech synthesizer
• Other issues: internals, AAV, verbosity/interface, etc.
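The turn cycle on this slide (speech in, Why2-Atlas back-end, speech out) can be sketched as a three-stage pipeline. This is a minimal illustration under stated assumptions: `recognize`, `tutor_reply`, and `synthesize` are hypothetical placeholders standing in for Sphinx2, Why2-Atlas, and Festival, whose real interfaces are not modeled here.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in signatures; the real Sphinx2, Why2-Atlas and
# Festival interfaces are NOT modeled here.
Recognizer = Callable[[bytes], str]    # student audio -> recognized text
TutorBackEnd = Callable[[str], str]    # student text -> tutor text
Synthesizer = Callable[[str], bytes]   # tutor text -> tutor audio


@dataclass
class SpokenTutorPipeline:
    recognize: Recognizer
    tutor_reply: TutorBackEnd
    synthesize: Synthesizer

    def run_turn(self, student_audio: bytes) -> tuple[str, str, bytes]:
        """Process one student turn; return (recognized, reply, reply_audio)."""
        recognized = self.recognize(student_audio)  # speech input (ASR)
        reply = self.tutor_reply(recognized)        # text-based tutoring back-end
        reply_audio = self.synthesize(reply)        # speech output (TTS)
        return recognized, reply, reply_audio


if __name__ == "__main__":
    # Toy components so the sketch runs end to end.
    pipeline = SpokenTutorPipeline(
        recognize=lambda audio: audio.decode("utf-8"),
        tutor_reply=lambda text: "Right." if text == "no" else "Are you sure?",
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(pipeline.run_turn(b"no"))
```

In this shape the back-end only ever sees recognized text, which is one way a typed tutoring system could be reused behind a speech front end.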

7 System Status: Language Models (LMs)
• 55 dialogue-dependent language models, built by categorizing the prompts for 4551 typed student responses in the Why2-Atlas human-computer corpus; these will be enhanced with the human-human (h-h) spoken corpus
• Example: one LM for prompts that take “yes/no” type answers
Prompt: Just as the car starts moving, the string is vertical, so it can't exert any horizontal force on the dice. No other objects are touching the dice. So are there any horizontal forces on the dice as the car starts moving?
  User response   Count   Frequency (%)
  “yes”           2       8.33
  “yeah”          1       4.17
  “none”          1       4.17
  “no”            20      83.33
Prompt: When analyzing the motion of the two cars, one towing the other, can we treat them as a single compound body?
  User response   Count   Frequency (%)
  “no”            2       8.70
  “yes”           21      91.30
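The frequency columns above are each response's count divided by the total number of responses to that prompt. A minimal sketch of that arithmetic, using the counts for the “dice” prompt (the function name is illustrative, not from the ITSPOKE code):

```python
from collections import Counter


def relative_frequencies(counts: Counter) -> dict[str, float]:
    """Return each response's share of all responses, as a percentage."""
    total = sum(counts.values())
    return {response: 100.0 * n / total for response, n in counts.items()}


# Counts for the "horizontal forces on the dice" prompt (24 responses in total).
dice_prompt = Counter({"yes": 2, "yeah": 1, "none": 1, "no": 20})

# prints 'yes': 8.33%, 'yeah': 4.17%, 'none': 4.17%, 'no': 83.33% (one per line)
for response, pct in relative_frequencies(dice_prompt).items():
    print(f"{response!r}: {pct:.2f}%")
```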

8 System Status: LM Evaluation
• Different errors: rejections vs. misrecognitions
• Different metrics: Word Error Rate vs. Concept Accuracy
• Separate test set used for evaluation
“Yes/No” LM example:
        Raw Error   Raw Error (No Rejects)   Concept Err   Concept Err (No Rejects)
  Mean  52.7        40.5                     30.1          15.1
  STD   16.6        20.9                     23.7          18.7
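Word Error Rate is conventionally the word-level edit distance (substitutions + deletions + insertions) between recognizer output and a reference transcription, divided by the number of reference words. Below is a minimal sketch of that standard definition; it is not necessarily the scoring tool behind the numbers above. Concept accuracy differs in that it scores whether the recognized utterance maps to the intended concept, so hearing “nope” for “no” is a word error but not a concept error.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("no there are not", "no they are"))  # 2 errors / 4 words = 0.5
```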

9 ITSPOKE Corpora
• Human-Human Corpus
  - Target size: 20 subjects
  - Current size: ~10 subjects / 73 dialogues / 27 transcribed and turn-annotated
• Human-Computer Corpus
  - ITSPOKE: still in active development

10 ITSPOKE: Human-Human Corpus Transcription and Annotation

11 Emotion Pilot Study
Towards Emotion Prediction in Spoken Tutoring Dialogues [Human-Human]
Diane Litman, Kate Forbes, Scott Silliman
Proc. HLT-NAACL, May 2003

12 Motivation
• Human tutors listen to both “what” and “how” (e.g. “confident” vs. “uncertain”)
• Speech supplies acoustic-prosodic information about user state; some spoken dialogue applications already handle “problem” dialogues specially (Litman et al. 2001)
• Can the effectiveness of computer tutors increase by detecting and adapting to student state? (Evens 2001)

13 Goals
• Are there features in our human-human spoken tutoring dialogues that can automatically predict useful student states?
• Investigate such features in our human-computer spoken tutoring dialogues
• Construct our system to predict and respond to these features

14 Annotating Emotion
• 14 transcribed dialogues (n=553 student turns)
• Each student turn was annotated (based on the intuition of 1 coder) with one of 3 general categories:
  - negative (e.g. ‘uncertain’ or ‘frustrated’): n=141
  - positive (e.g. ‘confident’ or ‘certain’): n=167
  - neutral/indeterminate: n=248

15 Example Annotated Dialogue
Tutor: Now this law that force is equal to mass times acceleration, what's this law called? This is uh since this it is a very important basic uh fact uh it is it is a law of physics. Um you have you have read it in the background material. Can you recall it?
Student: Um no it was one of Newton's laws but I don't- remember which one. (laugh) (EMOTION = NEGATIVE)
Tutor: Right, right- That- is Newton's second law of motion.
Student: he I- Ok, because I remember one, two, and three, but I didn't know if there was a different name (EMOTION = POSITIVE)
Tutor: Yeah that's right you know Newton was a genius-

16 Predicting Emotion
• Ripper (machine learning program)
• Input:
  1) classes to be learned (our 3 emotion categories)
  2) names and possible values for a set of features (next slide)
  3) training examples with class and feature values (the annotated student turns)
• Output: an ordered set of if…then rules for classifying future examples
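Ripper's output is a decision list: rules are tried in order, the first rule whose conditions all hold assigns the class, and a final default catches everything else. The sketch below shows only how such an ordered rule set is applied to one example's features; it does not implement Ripper's rule-learning algorithm, and the `words` feature and toy rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

Features = dict[str, Any]
Condition = Callable[[Features], bool]


@dataclass
class Rule:
    conditions: list[Condition]  # all must hold (logical AND)
    label: str


def classify(rules: list[Rule], default: str, example: Features) -> str:
    """Apply an ordered rule list: the first rule whose conditions all hold wins."""
    for rule in rules:
        if all(cond(example) for cond in rule.conditions):
            return rule.label
    return default


# Hypothetical toy rule set over an assumed "words" feature.
rules = [
    Rule([lambda f: "don't" in f["words"]], "negative"),
    Rule([lambda f: "right" in f["words"]], "positive"),
]
print(classify(rules, "neutral", {"words": ["um", "i", "don't", "know"]}))  # negative
```

Because rule order matters, a turn containing both “right” and “don't” is labeled by whichever rule comes first (here, negative).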

17 Machine Learning Results
• Six turn features: Problem, Student, Duration, StartTime, Transcription, #Words
  - all features are automatically available in real time
• The cross-validated error rate (33.03%) is significantly lower than the majority-class baseline (55.69%)
• Learned ruleset:
  if (duration ≥ 0.65) & (text has “I”) then negative
  else if (duration ≥ 2.98) then negative
  else if (duration ≥ 0.93) & (startTime ≥ 297.62) then positive
  else if (text has “right”) then positive
  else neutral
• Results suggest there are indeed features that can be used to automatically predict emotion in tutoring dialogues
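The learned ruleset can be transcribed directly into code. A sketch under two assumptions not stated on the slide: `duration` and `start_time` use the same units as the thresholds (likely seconds), and “text has X” means X is one of the whitespace-separated tokens of the transcription. The function name is hypothetical.

```python
def predict_emotion(duration: float, start_time: float, transcription: str) -> str:
    """Slide 17's learned rules, applied in order (first match wins).

    Assumptions: times are in the slide's units, and "text has X" is read as
    "X is one of the whitespace-separated tokens of the transcription".
    """
    tokens = transcription.split()
    if duration >= 0.65 and "I" in tokens:
        return "negative"
    if duration >= 2.98:
        return "negative"
    if duration >= 0.93 and start_time >= 297.62:
        return "positive"
    if "right" in tokens:
        return "positive"
    return "neutral"


print(predict_emotion(1.2, 10.0, "I don't know"))  # negative
```

Under this whitespace tokenization, “right.” with a trailing period would not match “right”; a real system would need to fix a tokenization that matches how the training transcriptions were featurized.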

18 Current Directions
• Reliable Emotion Annotation Guidelines
  - natural as opposed to elicited data
  - relative to context and domain
• Wider variety of features from many knowledge sources (acoustic, prosodic, lexical, syntactic, semantic, discourse, local and global contextual dialogue features)
• Response Coding and Analysis

19 Text/Speech Pilot Study
A Comparison of Tutor and Student Behavior in Speech Versus Text Based Tutoring [Human-Human]
Carolyn Penstein Rosé, Diane Litman, Dumisizwe Bhembe, Kate Forbes, Scott Silliman, Ramesh Srivastava, Kurt VanLehn
Proc. Building Educational Applications Using Natural Language Processing, May 2003

20 Motivation
• Working hypothesis regarding learning gains: Human Dialogue > Computer Dialogue > Text
• Most dialogue tutors are text-based; however, self-explanation correlates with learning and occurs more in speech (Hausmann and Chi, 2002)
• Could the effectiveness of intelligent dialogue tutoring systems increase further if they were speech-based?

21 Additional Goals
• What aspects of dialogue correlate with learning gains?
  → construct our system to encourage them
• In text, larger student turn lengths and student-tutor word ratios correlate with learning (Rosé et al., 2003):
  - do the same dialogue features correlate with learning in speech?
  - do the same tutor actions elicit such features?

22 Experimental Procedure
• Three conditions
  - typed dialogue
  - spoken dialogue
  - reading targeted text
• Spoken versus typed conditions
  - input and output modalities differ
  - strict turn-taking in typed; overlaps in speech

23 Post-Test Results
  Condition               Mean        SD    N
  Spoken Dialogue         .72 (.70)   .15   6 (7)
  Typed Dialogue          .65 (.67)   .13   15 (19)
  Reading Targeted Text   .57         .13   20
• Spoken > Reading Targeted (p=.03, sigma=1.08) (p=.03, effect_size=1.23)
• Typed > Reading Targeted (p=.07, sigma=0.62) (p=.02, effect_size=0.68)
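The slide reports effect sizes without stating the formula. One common definition divides the difference in mean post-test scores by the control group's standard deviation (Glass's delta); another uses the pooled standard deviation of both groups (Cohen's d). The sketch below implements both as illustrations only: the table's values are rounded and the definition actually used is not given, so plugging them in will not necessarily reproduce the reported figures.

```python
import math


def glass_delta(mean_treatment: float, mean_control: float, sd_control: float) -> float:
    """Effect size in units of the control group's standard deviation."""
    return (mean_treatment - mean_control) / sd_control


def cohens_d(mean_a: float, sd_a: float, n_a: int,
             mean_b: float, sd_b: float, n_b: int) -> float:
    """Effect size in units of the pooled standard deviation of both groups."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Rounded table values: Spoken Dialogue (N=6) vs. Reading Targeted Text (N=20).
print(round(glass_delta(0.72, 0.57, 0.13), 2))
print(round(cohens_d(0.72, 0.15, 6, 0.57, 0.13, 20), 2))
```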

24 Time on Task Results
  Condition               Mean   SD    N
  Reading Targeted Text   85     38    20
  Spoken Dialogue         160    59    6
  Typed Dialogue          371    134   12
• Reading Targeted Text took less time than Spoken Dialogue (p=.001)
• Spoken Dialogue took less time than Typed Dialogue (p=.002)

25 Dialogue Differences: Text/Speech
  Condition   Participant   #turns/dialog   #words/dialog   #words/turn
  Text        Tutor         11.04           391.85          39.04
  Speech      Tutor         46.94           1199.14         26.78
  Text        Student       11.04           146.72          13.39
  Speech      Student       47.49           264.18          5.72
• Speech: n=24; Text: n=69
• Mean student turn length (correlated with learning gains in text) is shorter in speech (p<.001)

26 Similarities
• Relative proportion of student to tutor words doesn't differ across conditions (p=.242)
  Condition   #Student turns / #Tutor turns   #Student words / #Tutor words
  Text        1.00                            0.37
  Speech      0.99                            0.29
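A ratio averaged over dialogues need not equal the ratio of averaged counts: dividing the speech condition's mean student words per dialogue by its mean tutor words per dialogue (264.18 / 1199.14 ≈ 0.22) gives a different figure from the 0.29 reported for speech. The slides do not say how the ratios were aggregated; the sketch below, on made-up counts, only illustrates how the two aggregation choices can diverge.

```python
from statistics import mean

# Hypothetical per-dialogue word counts: (student_words, tutor_words).
dialogues = [(50, 100), (300, 1500), (120, 200)]

# Option 1: compute the ratio within each dialogue, then average the ratios.
mean_of_ratios = mean(s / t for s, t in dialogues)

# Option 2: average the counts first, then take a single ratio.
ratio_of_means = mean(s for s, _ in dialogues) / mean(t for _, t in dialogues)

print(f"mean of per-dialogue ratios: {mean_of_ratios:.2f}")
print(f"ratio of mean counts:        {ratio_of_means:.2f}")
```

On these made-up counts the two choices give about 0.43 and 0.26, because the long dialogue with a low ratio dominates the pooled counts but counts only once in the per-dialogue average.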

27 Current Directions
• Completion of corpus collection and transcription!
• Characterization of overlaps
• Learning gains analysis
• Coding of questions and other phenomena
• Other dialogue evaluation metrics

28 Summary
Goal: develop and apply language and speech technology to yield the next generation of intelligent tutoring systems
Expected Contributions:
- Empirical comparisons with typed tutorial dialogue
- Annotation scheme for student state
- Use of prosodic and other features to predict such states
- Exploitation of such predictions by the tutoring system

29 Why2-Atlas Ex. / ITSPOKE Ex. (Human-Human Corpora)
Why2-Atlas Ex.
Problem: Suppose that you released 3 identical balls of clay in a vacuum at exactly the same instant. They would all hit the ground at the same instant. Now you stick two of the balls together, forming one ball that is twice as heavy as the remaining, untouched clay ball. Both balls are released in a vacuum at exactly the same instant. Which ball hits the ground first?
Original Essay: Both balls will hit the ground at the same time. The balls are in free fall (only gravitational forces). The ratio of the masses and weight are equal.
----- 2 min into dialogue -----
Tutor: You have correctly said the balls are in free fall. What do you conclude from this fact?
Student: There is no air resistance and the balls’ accelerations are constant in regards to one another
Tutor: Right, so the conclusion is that they both have the same acceleration, the free fall acceleration. Now you have to show that time taken by them to reach the ground will be the same. How do you do that?
Student: F (net forces) / mass = f/m because F = m*a therefore ratio should equal one another
Tutor: But that does not tell you anything about the time taken. What physical quantities will determine the time taken. Will the initial velocity of the balls matter?
Student: yes, assuming starting at rest? sorry, kind of at a loss at the moment
ITSPOKE Ex.
Problem: If a car is able to accelerate at 2 m/s², what acceleration can it attain if it is towing another car of equal mass?
Original Essay: If the car is towing another car of equal mass, the maximum acceleration would be the same because the car would be towed behind and the friction caused would only be by the front of the first car.
----- 6.5 min into dialogue -----
Tutor: Now this law that force is equal to mass times acceleration, what's this law called? This is uh since this it is a very important basic uh fact uh it is it is a law of physics. Um you have you have read it in the background material. Can you recall it?
Student: Um no it was one of Newton's laws but I don't- remember which one. (laugh)
Tutor: Right, right- That- is Newton's second law of motion.
Student: he I- Ok, because I remember one, two, and three, but I didn't know if there was a different name
Tutor: Yeah that's right you know Newton was a genius-
Student: (laugh)
Tutor: and uh he looked at a large number of experiments and experimental data that was available and from that he could come to this general law and it is known as Newton's second law of motion. Um many many other scientists before him had seen all this data which was collected by scientists but had not concluded this now it looks very simple but to come to the conclusion from a mass of data was something which required the genius of Newton.
Student: mm hm

30 Pilot Study #1: FUTURE DIRECTIONS
• (Hausmann and Chi, 2002): student self-explanation correlates with learning
• (Rosé et al., 2003): larger average student turn lengths and student-tutor word ratios correlate with learning gains in the text condition, and occur after tutor open-ended questions/negative feedback
  - Smaller average student turn length in speech condition → less effective?
  - Similar student-tutor word ratios in both conditions → similarly effective?
  - Larger average number of student words in speech condition → more self-explanation → more effective?
  - Larger average number of tutor words in speech condition → more open-ended questions and negative feedback → more student self-explanation → more effective?
• Relative merits of speech- vs. text-based human-computer tutoring?
  - An intelligent spoken dialogue tutoring system will deal with more/different noise (ASR vs. spelling errors) → more clarifications/corrections → more self-explanation → more effective?
• Additional tutors with different tutoring styles

31 Pilot Study #2: Text-Feature Ruleset
• 1 feature: text in turn
Figure 2: Text-Feature Ruleset for Emotion Prediction (excerpt from 21 rules)
  if (text has “the”) & (text has “don't”) then negative
  else if (text has “I”) & (text has “don't”) then negative
  …
  else if (text has “um”) & (text has “ ”) then negative
  else if (text has “the”) & (text has “ ”) then negative
  …
  else if (text has “right”) then positive
  …
  else if (text has “so”) then positive
  …
  else if (text has “(laugh)”) & (text has “that's”) then positive
  …
  else neutral
• Estimated mean error and standard deviation: 39.03% +/- 2.40%, based on 25-fold cross-validation
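An estimate like “39.03% +/- 2.40%, based on 25-fold cross-validation” comes from splitting the labeled turns into k folds, training on k-1 folds, testing on the held-out fold, and reporting the mean and standard deviation of the k error rates. A minimal sketch of that procedure follows; RIPPER is not reimplemented, so a trivial majority-class learner stands in for it, and the toy data are made up.

```python
import random
from collections import Counter
from statistics import mean, stdev
from typing import Callable, Sequence

Example = tuple[str, str]  # (turn text, emotion label)
Learner = Callable[[Sequence[Example]], Callable[[str], str]]


def majority_class_learner(train: Sequence[Example]) -> Callable[[str], str]:
    """Stand-in learner: always predicts the most frequent training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda text: majority


def cross_validate(data: Sequence[Example], learner: Learner, k: int,
                   seed: int = 0) -> tuple[float, float]:
    """Return (mean, standard deviation) of the k held-out error rates."""
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and the number of examples")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal folds
    errors = []
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = learner(train)
        wrong = sum(model(text) != label for text, label in test)
        errors.append(wrong / len(test))
    return mean(errors), stdev(errors)


toy = [(f"turn {i}", "neutral" if i % 2 else "negative") for i in range(50)]
err_mean, err_sd = cross_validate(toy, majority_class_learner, k=5)
print(f"estimated error: {err_mean:.2%} +/- {err_sd:.2%}")
```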

32 Pilot Study #3: ACOUSTIC FEATURES (w/ Julia Hirschberg & Jennifer Venditti, Columbia U.)
• Acoustic features: (normalized) meanF0 (pitch); meanRMS (amplitude)
• 8 dialogues; 302 labelled turns (110 neutral; 100 negative; 92 positive)
• Descriptive results:
  - main effect of emotion (meanF0: p<0.001; meanRMS: p=.003)
  - neutral turns have a higher meanF0 than both negative and positive turns
  - neutral turns have a lower meanRMS than negative turns
  - positive turns show the highest meanRMS
• Predictive results:
  - acoustic features have the same utility as other features for predicting emotional category via RIPPER
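The slide lists (normalized) meanF0 and meanRMS per turn but does not say how normalization was done. As an illustration only, the sketch below computes mean RMS amplitude over fixed-length frames of a turn's samples, then z-score normalizes the per-turn values against the other turns from the same speaker; the frame length, the per-speaker z-score scheme, and the function names are all assumptions. F0 extraction requires a pitch tracker and is not shown.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def mean_rms(samples: Sequence[float], frame_len: int = 160) -> float:
    """Average root-mean-square amplitude over consecutive, non-overlapping frames."""
    if not samples:
        raise ValueError("no samples")
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    return mean(math.sqrt(sum(x * x for x in f) / len(f)) for f in frames)


def z_normalize(values: Sequence[float]) -> list[float]:
    """Z-score values against their own mean and population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical: three turns from one speaker, as raw sample sequences.
turns = [[0.1, -0.1] * 200, [0.3, -0.3] * 200, [0.2, -0.2] * 200]
per_turn_rms = [mean_rms(t) for t in turns]
print([round(z, 2) for z in z_normalize(per_turn_rms)])
```

Normalizing within a speaker is one way to keep a naturally loud or high-pitched student from looking “emotional” on every turn; whether the study normalized per speaker, per dialogue, or otherwise is not stated.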

33 Pilot Study #2-3: FUTURE DIRECTIONS
• Wider variety of features from many knowledge sources (acoustic, prosodic, lexical, syntactic, semantic, discourse, local and global contextual dialogue features)
• Reliable Emotion Annotation Guidelines (previous studies’ inter-annotator agreement around 70%, Kappa values around 0.47)
• Differences in human-human versus human-computer dialogues?
• Demonstrate that enhancing a spoken dialogue system to automatically predict and dynamically respond to student emotional states (reinforcing positive states, rectifying negative states) will measurably improve system performance
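Kappa corrects raw percent agreement for the agreement two annotators would reach by chance given their own label distributions: kappa = (p_o - p_e) / (1 - p_e). This is how roughly 70% raw agreement can correspond to a kappa of only about 0.47. A minimal sketch of Cohen's kappa for two annotators (the toy labels are made up):

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators independently pick the same label.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        # Both used one identical label throughout; kappa is undefined, return 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


a = ["negative", "positive", "neutral", "neutral"]
b = ["negative", "neutral", "neutral", "neutral"]
print(round(cohens_kappa(a, b), 3))  # observed 0.75, chance 0.4375 -> 0.556
```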

34 Why2-Atlas Screen Shot
4. A rock is thrown straight upward with a velocity v. What is its acceleration at the top of its path? What is its velocity when it comes back to the starting point? Why?
At the very top of the path, where the velocity is equal to 0, the acceleration will be -9.8 m/(s*s), the acceleration of gravity. It will no longer have any upward acceleration. When it comes back the its starting point (assuming that it is the ground) its velocity will be equal to 0 because its downward progress will be stopped by the ground
Tutor: What is the velocity just before it hits the ground?
Student: That will depend on the time that the ball is in the air/distance ball traveled.
Tutor: How will it compare with the velocity with which it was thrown up?
They will be equal. The height the ball reaches will depend on the initial velocity