Computational Extraction of Social and Interactional Meaning from Speech Dan Jurafsky and Mari Ostendorf Lecture 7: Dialog Acts & Sarcasm Mari Ostendorf.


2 Computational Extraction of Social and Interactional Meaning from Speech Dan Jurafsky and Mari Ostendorf Lecture 7: Dialog Acts & Sarcasm Mari Ostendorf Note: Uncredited examples are from Dialogue & Conversational Agents chapter.

3 H-H Conversation Dynamics (from Stolcke et al., CL 2000) (from Jurafsky book)

4 Human-Computer Dialog
Greeting: Welcome to the Communicator...
Request: I wanna go from Denver to...
Clarification Question: What time do you want to leave Denver?
Inform: I’d like to leave in the morning...
Response: Eight flight options were returned. Option 1...

5 Overview Dialog acts Definitions Important special cases Detection Role of prosody Sarcasm In speech In text

6 Overview Dialog acts Definitions Important special cases Detection Role of prosody Sarcasm

7 Speech/Dialog/Conversation Acts Characterize the purpose of an utterance Associated with sentences (or intonational phrases) Used for: Determining and controlling the “state” of a conversation in a spoken language system Conversation analysis, e.g. extracting social information Many different tag sets, depending on application

8 Example Dialog Acts

9 Aside: Speech vs. Text Speech/dialog/conversation act inventories were developed when conversations were spoken. Now conversations can also happen online or via text messaging. Dialog acts are relevant there too, and researchers are starting to study them. Some differences: text is impoverished relative to speech, so extra punctuation, emoticons, etc., are added; turn-taking and grounding also differ.

10 Overview Dialog acts Definitions Important special cases Detection Role of prosody Sarcasm

11 Special Cases Question detection → punctuation prediction; 4-category general set (statement, question, incomplete, backchannel) → cross-domain training and transfer; Agreement vs. disagreement → social analysis; Error corrections (for communication errors) → human-computer dialogs

12 Questions: Harder than you’d think… Indirect speech act

13 Correction Example

14 Overview Dialog acts Definitions Important special cases Detection Role of prosody Sarcasm

15 Automatic Detection Two problems: (1) classification given segmentation; (2) segmentation (often multiple DAs per turn). These are best treated jointly, but joint modeling can be computationally complex, so start with the known-segmentation case.
Example turn: ok uh let me pull up your profile and I’ll be right with you here and you said you wanted to travel next week
One segmentation: 1. ok uh let me pull up your profile and I’ll be right with you here 2. and you said you wanted to travel next week
Another segmentation: 1. ok 2. uh let me pull up your profile and 3. I’ll be right with you here and you said you wanted to travel next week

16 Looking at Segmentation (from Stolcke et al., CL 2000)

17 More Segmentation Challenges
A: Ok, so what do you think?
B: Well that’s a pretty loaded topic.
A: Absolutely.
B: Well, here in uh – Hang on just a minute, the dog is barking -- Ok, here in Oklahoma, we just went through a major educational reform…
A: After all these things, he raises hundreds of millions of dollars. I mean uh the fella
B: but he never stops talking about it.
A: but ok
B: Aren’t you supposed to y- I mean
A: well that’s a little- the Lord says
B: Does charity mean something if you’re constantly using it as a cudgel to beat your enemies over the- I’m better than you. I give money to charity.
A: Well look, now I…

18 Knowledge Sources for Classification Words and grammar “please,” “would you” – cue to request Aux inversion – cue to Y/N question “uh-huh,” “yeah” – often backchannels Prosody Rising final pitch – Y/N question, declarative question Pitch & energy can distinguish backchannel (yeah) from agreement, pitch reset may indicate incomplete Pitch accent type… (more on this) Conversational structure (context) Answers follow questions
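
The word and grammar cues on this slide can be sketched as simple binary features. This is only an illustration: the cue lists, the auxiliary and pronoun sets, and the function name `lexical_cues` are assumptions made for the example, and a real system would learn such cues from labeled data rather than hard-code them.

```python
import re

# Hypothetical cue lists for illustration only.
REQUEST_CUES = ("please", "would you")
BACKCHANNEL_WORDS = {"uh-huh", "yeah", "right", "mmm"}
AUXILIARIES = {"do", "does", "did", "is", "are", "was", "were", "can",
               "could", "will", "would", "should", "have", "has"}
SUBJECT_PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}


def lexical_cues(utterance):
    """Return binary word/grammar cue features for one utterance."""
    tokens = re.findall(r"[a-z'-]+", utterance.lower())
    padded = " " + " ".join(tokens) + " "
    return {
        # "please", "would you": cue to a request
        "request_cue": any(f" {cue} " in padded for cue in REQUEST_CUES),
        # auxiliary before subject pronoun: crude proxy for aux inversion
        "aux_inversion": (len(tokens) >= 2 and tokens[0] in AUXILIARIES
                          and tokens[1] in SUBJECT_PRONOUNS),
        # very short utterance made only of backchannel words
        "backchannel_word": (0 < len(tokens) <= 2
                             and all(t in BACKCHANNEL_WORDS for t in tokens)),
    }
```

Note that "yeah" fires the backchannel cue whether it is a backchannel or an agreement; separating those is where the prosodic cues listed on the slide come in.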

19 Feature extraction Words N-grams as features DA-dependent n-gram language model score Presence/absence of syntactic constituents Prosody (typically with normalization) Speaking rate Mean and variance of log energy Fundamental frequency: mean, variance, overall contour trend, utterance final contour shape, change in mean across utterance boundaries
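
A minimal sketch of the prosodic summary features listed above, assuming frame-level energy and F0 tracks are already available from some pitch tracker. The function names, the convention that F0 = 0 marks an unvoiced frame, and the use of speaker-level z-scoring as the normalization are all assumptions for illustration; the slide only says that normalization is typical.

```python
import math
from statistics import mean, pvariance


def slope(values):
    """Least-squares slope of values against frame index (contour trend)."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den if den else 0.0


def prosody_features(energy, f0, speaker_f0_mean, speaker_f0_std,
                     final_frames=3):
    """Summarize one utterance; F0 is z-normalized per speaker (assumed)."""
    log_e = [math.log(e) for e in energy]
    voiced = [f for f in f0 if f > 0]  # assumption: 0 marks unvoiced frames
    z = [(f - speaker_f0_mean) / speaker_f0_std for f in voiced]
    return {
        "log_energy_mean": mean(log_e),
        "log_energy_var": pvariance(log_e),
        "f0_mean": mean(z),
        "f0_var": pvariance(z),
        "f0_trend": slope(z),                         # overall contour trend
        "f0_final_slope": slope(z[-final_frames:]),   # utterance-final shape
    }
```

A positive `f0_final_slope` corresponds to the rising final pitch that cues yes/no and declarative questions.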

20 Combining Cues with Context With conversational structure: need a sequence model. Notation: $d = d_1, \dots, d_T$ is the dialog act sequence, $f$ = prosody features, $w$ = word/grammar features.
Direct model (e.g. conditional random field): $\hat{d} = \arg\max_d p(d \mid f, w)$, where $p(d \mid f, w) = \prod_t p(d_t \mid f_t, w_t, d_{t-1})$.
Generative model (e.g. HMM, or hidden event model): $\hat{d} = \arg\max_d p(f, w \mid d)\, p(d)$, where $p(f, w \mid d)\, p(d) = \prod_t p(f_t \mid d_t)\, p(w_t \mid d_t)\, p(d_t \mid d_{t-1})$.
Experimental results show small gain from context.
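
The generative (HMM-style) decoding on this slide can be sketched with a small Viterbi search over dialog act sequences. All probabilities below are made-up toy values, and the names `viterbi` and `logs` are hypothetical; the sketch only illustrates how a DA bigram term can change a locally ambiguous decision ("answers follow questions").

```python
import math


def logs(probs):
    """Convert a dict of probabilities to log probabilities."""
    return {k: math.log(v) for k, v in probs.items()}


def viterbi(obs_loglik, log_trans, log_prior):
    """Most likely DA sequence under an HMM.

    obs_loglik[t][d] = log p(f_t | d) + log p(w_t | d)
    log_trans[prev][cur] = log p(d_t = cur | d_{t-1} = prev)
    log_prior[d] = log p(d_1 = d)
    """
    tags = list(log_prior)
    score = {d: log_prior[d] + obs_loglik[0][d] for d in tags}
    backptrs = []
    for ll in obs_loglik[1:]:
        new_score, ptr = {}, {}
        for cur in tags:
            best = max(tags, key=lambda p: score[p] + log_trans[p][cur])
            new_score[cur] = score[best] + log_trans[best][cur] + ll[cur]
            ptr[cur] = best
        score = new_score
        backptrs.append(ptr)
    path = [max(tags, key=score.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy values (assumed): the second utterance looks slightly more like a
# question in isolation, but a statement is far more likely after a question.
prior = logs({"question": 0.5, "statement": 0.5})
trans = {"question": logs({"question": 0.1, "statement": 0.9}),
         "statement": logs({"question": 0.3, "statement": 0.7})}
obs = [logs({"question": 0.8, "statement": 0.2}),
       logs({"question": 0.55, "statement": 0.45})]
best_path = viterbi(obs, trans, prior)
```

In this toy case the second utterance is decoded as a statement, although its own observation scores slightly favor a question.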

21 Assuming Independent Segments No sequence model, but the DA prior (unigram) is still important.
Direct model: $\hat{d}_t = \arg\max_{d_t} p(d_t \mid f_t, w_t)$. Features can extend beyond the utterance to approximately capture context; need to handle nonhomogeneous cues or make them homogeneous.
Generative model: $\hat{d}_t = \arg\max_{d_t} p(f_t \mid d_t)\, p(w_t \mid d_t)\, p(d_t)$. Can predict $d_t$ using separate $w$ and $f$ classifiers, then do classifier combination.
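
For the independent-segment generative case, classifier combination amounts to adding log scores from the prosody model, the word model, and the unigram DA prior. The sketch below uses invented probabilities for an isolated "yeah"; the `prosody_weight` knob is an assumption added for illustration (it is not on the slide), used here only to switch prosody off.

```python
import math


def log_probs(probs):
    """Convert a dict of probabilities to log probabilities."""
    return {k: math.log(v) for k, v in probs.items()}


def classify_segment(log_p_f, log_p_w, log_prior, prosody_weight=1.0):
    """argmax_d [ weight * log p(f|d) + log p(w|d) + log p(d) ]."""
    scores = {d: prosody_weight * log_p_f[d] + log_p_w[d] + log_prior[d]
              for d in log_prior}
    return max(scores, key=scores.get), scores


# Toy values (assumed): the word "yeah" does not separate the two classes,
# the prior favors backchannels, and prosody favors agreement.
p_w = log_probs({"backchannel": 0.5, "agreement": 0.5})
p_f = log_probs({"backchannel": 0.2, "agreement": 0.6})
p_d = log_probs({"backchannel": 0.6, "agreement": 0.4})

with_prosody, _ = classify_segment(p_f, p_w, p_d)
without_prosody, _ = classify_segment(p_f, p_w, p_d, prosody_weight=0.0)
```

With these numbers the words-plus-prior decision is "backchannel", and adding the prosody score flips it to "agreement".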

22 Some Results (not directly comparable)
42 classes (Stolcke et al., CL 2000): hidden-event model, prosody & words (& context); 42-class accuracy: 62-65% on Switchboard ASR (68-71% on hand transcripts)
4 classes (Margolis et al., DANLP 2009): Liblinear, n-grams + length (no prosody), hand transcripts; 4-class accuracy: 89% Swbd, 84% MRDA; 4-class avg recall: 85% Swbd, 81% MRDA
2 classes (Margolis & Ostendorf, ACL 2011): Liblinear, n-grams + prosody, hand transcripts; question F-measure: 0.6 MRDA (recall = 92%)
3 classes (Galley et al., ACL 2004): Maxent, lexical-structural-duration features, hand transcripts; 3-class accuracy: 86% MRDA
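
The results above mix three metrics (accuracy, average recall, F-measure), which is one reason they are not directly comparable. The sketch below computes all three on a small invented label set; the data are toy values, not taken from any of the cited studies, and the functions assume every class is predicted at least once.

```python
def accuracy(gold, pred):
    """Fraction of segments whose predicted DA matches the gold DA."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def recall(gold, pred, label):
    """Of the segments truly labeled `label`, the fraction recovered."""
    relevant = [p for g, p in zip(gold, pred) if g == label]
    return relevant.count(label) / len(relevant)


def precision(gold, pred, label):
    """Of the segments predicted as `label`, the fraction that are correct."""
    retrieved = [g for g, p in zip(gold, pred) if p == label]
    return retrieved.count(label) / len(retrieved)


def average_recall(gold, pred):
    """Unweighted mean of per-class recall; rare classes count equally."""
    labels = sorted(set(gold))
    return sum(recall(gold, pred, c) for c in labels) / len(labels)


def f_measure(gold, pred, label):
    """Harmonic mean of precision and recall for one class."""
    p, r = precision(gold, pred, label), recall(gold, pred, label)
    return 2 * p * r / (p + r)


# Toy data (assumed): 8 statements, 2 questions, one question missed.
gold = ["statement"] * 8 + ["question"] * 2
pred = ["statement"] * 8 + ["statement", "question"]
```

Here accuracy is 0.9 while average recall is only 0.75, because the missed question weighs heavily in the rare class.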

23 Backchannel “Universals” What is in common with backchannels across languages? Short length, low energy, NOT the words. Example: English: uh-huh, right, yeah; Spanish: mmm, si, ya (→ mmm, yes, already). Experiment: cross-language DA classification for English vs. Spanish conversational telephone speech (Margolis et al., 2009). Classes: statement, question, incomplete, backchannel. Use automatic translation in cross-language classification.

24 Spanish vs. English DAs Backchannels: roughly 20% of DAs; lexical cues are useful within languages, so length is not used much; length is more important across languages. Questions: “es que” often starts a statement in Spanish, but its translation, “is that”, indicates a question in English.

25 Overview Dialog acts Role of prosody Sarcasm

26 Prosody Impact overall is small: from Stolcke et al., CL 2000 BUT, it can be important for some distinctions Other examples: right, so, absolutely, ok, thank you, …. Oh. (disappointment) vs. Oh! (I get it) Yeah: positive vs. negative

27 Question Detection From Margolis & Ostendorf, ACL 2011

28 Whatever! (Benus, Gravano & Hirschberg, 2007) Production: the 1st syllable is more likely to have a pitch accent for the negative interpretation. Perception: listeners’ negativity judgments from prosody on “whatever” alone are similar to those made with full context.

29 Overview Dialog acts Role of prosody Sarcasm In speech In text

30 Sarcasm Changing the default (or literal) meaning Objectives of sarcasm Make someone else feel bad or stupid Display anger or annoyance about something Inside joke Why is it interesting? More accurate sentiment detection More accurate agreement/disagreement detection General understanding of communication strategies

31 Negative positives in talk shows: yeah and i don't think you’re going to be going back … yeah oh yeah that's right yeah yeah yeah but … yeah well i well m my understanding is … yeah it it it gosh you know is that the standard that prosecutors use the maybe possibly she's telling the truth standard yeah i i don't think it was just the radical right yeah larry i i want to correct something randi said of course

32 Negative positives (cont.) -- right right th that's right that's right yeah you know what you're right but right right but but you you can't say that punching him … right but the but the psychiatrists in this case were not just … senators are not polling very well right then as a columnist who's offering opinions on what i think the right policy is it seems to me…

33 Yeah, right. (Tepperman et al., 2006) 131 instances of “yeah right” in Switchboard & Fisher, 23% annotated as sarcastic. Annotation: in isolation, very low agreement between human listeners (κ = 0.16)*; in context, still weak agreement (κ = 0.31); gold standard based on discussion. Observation: laughter is much more frequent around sarcastic versions. * “Prosody alone is not sufficient to discern whether a speaker is being sarcastic.”
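
To give a feel for what agreement values like these mean, here is a minimal sketch of Cohen's kappa for two annotators. The slide does not say which kappa variant was used, so the two-annotator form is an assumption, and the labels below are invented.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each annotator labeled at random
    # according to their own label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy annotations (assumed): S = sarcastic, N = not sarcastic.
annotator_1 = ["S", "S", "N", "N", "N", "N", "N", "N", "N", "N"]
annotator_2 = ["S", "N", "S", "N", "N", "N", "N", "N", "N", "N"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

The two toy annotators agree on 8 of 10 items, yet kappa is only 0.375, because most of that agreement is expected by chance when one label dominates.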

34 Sarcasm Detector Features: Prosody: relative pitch, duration & energy for each word Spectral: class-dependent HMM acoustic model score Context: laughter, gender, pause, Q/A DA, location in utterance Classifier: decision tree (WEKA) Implicit feature selection in tree training

35 Results Laughter is most important contextual feature Energy seems a little more important than pitch

36 Let’s do our own experiment absolutely Male Female yeah Male Female exactly Male Female

37 Overview Dialog acts Role of prosody Sarcasm In speech In text Davidov, Tsur & Rappoport, 2010 – DTR10 Gonzalez-Ibanez, Muresan & Wacholder, 2011 – GIMW11

38 Sarcasm in Twitter & Amazon
Twitter examples (DTR10): “thank you Janet Jackson for yet another year of Super Bowl classic rock!” / “He’s with his other woman: XBox 360. It’s 4:30 fool. Sure I can sleep through the gunfire” / “Wow GPRS data speeds are blazing fast.”
More twitter examples: That must suck. / I can't express how much I love shopping on black / that's what I love about Miami. Attention to detail in preserving historic landmarks of the / im just loving the positive vibes out of that!
Amazon examples (DTR10): “[I] Love The Cover” (book) / “Defective by design” (music player)
Negative positive

39 Twitter #sarcasm issues Problems (DTR10): used infrequently; used in non-sarcastic cases, e.g. to clarify a previous tweet (it was #Sarcasm); used when sarcasm is otherwise ambiguous (prosody surrogate?), which biases the data towards the most difficult cases. GIMW11 argues that the non-sarcastic cases are easily filtered by only using ones with #sarcasm at the end.

40 DTR10 Study Data: Twitter: 5.9M tweets, unconstrained context; Amazon: 66k reviews, known product context. Mechanical Turk annotation: κ = 0.34 on Amazon, κ = 0.41 on Twitter. Features: patterns of high-frequency words + content word (CW) slots, e.g. “[COMPANY] CW does not CW much”; punctuation. Classifier: k-NN. Semi-supervised labeling of training samples.
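
The pattern features can be sketched as follows: words that are frequent in the corpus are kept as-is, and the remaining words are replaced by a content-word slot `CW`. This is only an illustration of the pattern form shown on the slide; the count threshold, the toy counts, the example sentence, and the function name `to_pattern` are assumptions, and the tag `[COMPANY]` is taken as already substituted upstream.

```python
from collections import Counter


def to_pattern(tokens, corpus_counts, min_hfw_count):
    """Keep high-frequency words and tags; replace other words with CW."""
    out = []
    for tok in tokens:
        is_tag = tok.startswith("[") and tok.endswith("]")
        if is_tag or corpus_counts[tok] >= min_hfw_count:
            out.append(tok)
        else:
            out.append("CW")  # content-word slot
    return " ".join(out)


# Toy corpus counts (assumed); a Counter returns 0 for unseen words.
counts = Counter({"does": 50, "not": 60, "much": 30,
                  "apparently": 2, "care": 3})
tokens = ["[COMPANY]", "apparently", "does", "not", "care", "much"]
pattern = to_pattern(tokens, counts, min_hfw_count=10)
```

With these toy counts the sentence maps to the pattern `[COMPANY] CW does not CW much`, matching the example on the slide.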

41 DTR10 Results
Amazon results for different feature sets on gold standard:
Feature set       F-score
Punctuation       0.28
Patterns          0.77
Patts + punc      0.81
Enriched patts    0.40
Enriched punct    0.77
All (SASI)        0.83
Amazon/Twitter SASI results for eval paradigms:
Evaluation        F-score
Amazon - Turk     0.79
Twitter - Turk    0.83
Twitter – #Gold   0.55

42 GIMW11 Study Data: 2700 tweets, equal amounts of positive, negative and sarcastic (no neutral). Annotation by hashtags: sarcasm/sarcastic, happy/joy/lucky, sadness/angry/frustrated. Features: unigrams, LIWC classes (grouped), WordNet affect, interjections and punctuation, emoticons & ToUser. Classifier: SVM & logistic regression

43 Results Automatic system accuracy: 3-way S-P-N: 57%, 2-way S-NS: 65%. Equal difficulty in separating sarcastic from positive and negative. Human S-P-N labeling: 270-tweet subset, κ = 0.48; human “accuracy”: 43% unanimous, 63% avg. New humans, S-NS labeling: κ = 0.59; human “accuracy”: 59% unanimous, 67% avg; automatic: 68%. Accuracies & agreement go up for subset with emoticons. Conclusion: Humans are not so good at this task either…

44 Summary Dialog acts: purpose of an utterance in conversation; useful for punctuation in transcription, social analysis, and dialog management in human-computer interaction; detection leverages words, grammar, prosody & context. Prosody: matters for a small subset of DAs, but can matter a lot for these cases; is realized in continuous (range) and symbolic (accents) cues, and needs contextual normalization. Sarcasm: a difficult task! (for both text and speech)

45 Topics not covered … Joint segmentation and classification Semi-supervised learning Domain-dependent tag set differences etc.
