1 Automatic Hedge Detection
Morgan Ulinski and Julia Hirschberg, May 8, 2017

2 What is Hedging?
Hedge: a word or phrase that adds ambiguity or uncertainty and shows the speaker's lack of commitment to what they are saying
E.g., I think John will arrive tomorrow.
E.g., John may arrive tomorrow.
E.g., John will arrive tomorrow sort of early.

3 Hedge Types
Relational: words that show uncertainty in the relation between the speaker and the utterance. E.g., think, may, probably, in my opinion
Propositional: words that add uncertainty to the propositional content of the utterance, often expressing degree of quantity or frequency. E.g., some, frequently, kind of, sort of, among others

4 Detection of Non-Committed vs. Committed Belief
"Aaron may know/NCB that Bill had/CB an accident, and Chris told/CB me Doris knows/ROB. I hope/CB Bill gets/NA better soon!"
Detect hedge words in the sentence: "Aaron <hRel>may</hRel> know that Bill had an accident, and Chris told me Doris knows. I hope Bill gets better soon!"
Goal: Improve the CB tagger by adding hedge features

5 Hedge Features
Word features: HedgeToken, HedgeLemma, HedgeType (prop/rel)
Dependency features: HedgeToken{Child,Parent,DepAncestor,Sibling}, HedgeLemma{Child,Parent,DepAncestor,Sibling}, HedgeType{Child,Parent,DepAncestor,Sibling}
Sentence features: SentenceContainsHedge
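To make the feature set above concrete, here is a minimal sketch (not the authors' code) of reading such features off a dependency parse with spaCy; the HEDGE_LEXICON and its prop/rel labels are hypothetical stand-ins for the actual hedge list.

import spacy

nlp = spacy.load("en_core_web_sm")

# Hypothetical stand-in for the real hedge lexicon: lemma -> hedge type
HEDGE_LEXICON = {"think": "rel", "may": "rel", "probably": "rel",
                 "some": "prop", "frequently": "prop"}

def hedge_features(sentence):
    doc = nlp(sentence)
    feats = {"SentenceContainsHedge": False}
    def add(name, value):
        feats.setdefault(name, []).append(value)
    for tok in doc:
        htype = HEDGE_LEXICON.get(tok.lemma_.lower())
        if htype is None:
            continue
        feats["SentenceContainsHedge"] = True
        # Word features
        add("HedgeToken", tok.text)
        add("HedgeLemma", tok.lemma_)
        add("HedgeType", htype)
        # Dependency features: record the hedge's neighborhood in the parse
        # (the Lemma and Type variants would be built the same way)
        for child in tok.children:
            add("HedgeTokenChild", child.text)
        if tok.head is not tok:  # in spaCy, the root is its own head
            add("HedgeTokenParent", tok.head.text)
        for anc in tok.ancestors:
            add("HedgeTokenDepAncestor", anc.text)
        for sib in tok.head.children:
            if sib is not tok:
                add("HedgeTokenSibling", sib.text)
    return feats

print(hedge_features("I think John may arrive tomorrow."))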

6 Hedge Detection
Dictionary-based: simple lookup in a list of hedge words
Rule-based: use rules based on context, part of speech, and dependency structure
"I assume his train was late." (hedge) vs. "When will the President assume office?" (non-hedge)
Rule: If assume has a dependent with relation ccomp, mark it as a hedge. Otherwise, non-hedge.
"Her work is pretty good." (hedge) vs. "She has a pretty face." (non-hedge)
Rule: If the part of speech of pretty is adverb, mark it as a hedge. Otherwise, non-hedge.
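The two rules above translate directly into dependency and part-of-speech checks. A minimal sketch with spaCy, assuming its ccomp dependency label and ADV tag line up with the rules as stated:

import spacy

nlp = spacy.load("en_core_web_sm")

def is_hedge(token):
    """Apply the context rules from the slide to one candidate token."""
    if token.lemma_ == "assume":
        # Rule: "assume" is a hedge only when it takes a clausal complement
        return any(child.dep_ == "ccomp" for child in token.children)
    if token.lemma_ == "pretty":
        # Rule: "pretty" is a hedge only when used as an adverb
        return token.pos_ == "ADV"
    return False

for text in ["I assume his train was late.",
             "When will the President assume office?",
             "Her work is pretty good.",
             "She has a pretty face."]:
    doc = nlp(text)
    hedges = [t.text for t in doc if is_hedge(t)]
    print(text, "->", hedges or "no hedge")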

7 Baseline Belief Results (without hedge detection)*

Tag (count)       Precision   Recall   F-measure
ROB (256)         28.02       19.92    23.29
NCB (193)         44.93       16.06    23.66
NA (2762)         77.49       56.34    65.24
CB (4299)         69.80       74.78    72.21
Overall (49643)   70.69       64.62    67.52

CB: committed belief; NCB: non-committed belief; ROB: reported belief; NA: not applicable
*Experiments used 5-fold cross-validation on the 2014 DEFT Committed Belief Corpus (Release No. LDC2014E55).
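As a reading aid for these result tables: F-measure is the harmonic mean of precision and recall, F = 2PR / (P + R); for example, the ROB row gives 2 × 28.02 × 19.92 / (28.02 + 19.92) ≈ 23.29.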

8 Belief Results: Dictionary Lookup

Tag (count)       Precision   Recall   F-measure
ROB (256)         30.22       21.48    25.11
NCB (193)         49.28       17.62    25.95
NA (2762)         77.69       56.73    65.58
CB (4299)         70.27       75.04    72.58
Overall (49643)   71.18       65.01    67.95

Belief Results: Rule-Based Hedge Detector

Tag (count)       Precision   Recall   F-measure
ROB (256)         31.63       24.22    27.43
NCB (193)         50.60       21.76    30.43
NA (2762)         77.89       56.52    65.51
CB (4299)         70.58       74.95    72.70
Overall (49643)   71.36       65.07    68.07

Improvements are primarily in ROB and NCB, though these represent small portions of the overall corpus.

9 Current Plan
Data Acquisition: Obtain word sense annotation through crowd-sourcing
Analysis: Analyze the accuracy of Turker judgments
Classification: Train a classifier that takes WSD into account
Evaluation: Compare to a simple lexical hedge detector

12 Process
Data Acquisition: Obtain word sense annotation through crowd-sourcing
Analysis: Analyze the accuracy of Turker judgments
Classification: Train a classifier to disambiguate hedge from non-hedge uses
Evaluation: Compare to the current lexical hedge detectors

13 Obtaining Word Sense Annotations
For each hedge word we currently have (80 words, 40 multi-word phrases), get hedge and non-hedge definitions from WordNet
For each sentence containing hedge word(s), use the definitions to formulate a task for Amazon Mechanical Turk (AMT)
"The book takes [about] 400 pages to say what Veblen said in one sentence."
Does the [about] in this sentence mean: almost, approximately, near, on the verge of; regarding; other?
Previously: 133 hedge words counting all tenses; AMT annotation was gathered for only the 47 that had appropriate alternate definitions
Example senses of about: it was about 2 o'clock; he was about the lake; he was talking about John
Task design: 10 questions per HIT, with randomized question order and 1 gold-standard check question
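A minimal sketch of pulling candidate sense glosses from WordNet as raw material for such an AMT question, assuming NLTK with the WordNet data installed (nltk.download("wordnet")); deciding which senses count as hedge readings would still be manual:

from nltk.corpus import wordnet as wn

def sense_choices(word):
    """Return (definition, example) pairs for each WordNet sense of `word`."""
    choices = []
    for synset in wn.synsets(word):
        examples = synset.examples()
        choices.append((synset.definition(), examples[0] if examples else None))
    return choices

# Candidate answer options for an AMT question about "about"
for definition, example in sense_choices("about"):
    print("-", definition, "|", example)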

14 New Annotated Data
Potential hedges annotated: 20,683 total; 9,311 hedge (45.02%) and 11,372 non-hedge

Most frequent potential hedges:
Word    Count
about   2,124
think   1,724
like    1,507
know    1,399
could   915
The rarest items are multi-word phrases such as and so forth, to a certain extent, in some ways, and et cetera, with counts of 2 and 1.

Hedge incidence varies sharply by word: a little = 83.2% (a little girl vs. He's a little much for me to handle.), but about = 12.3% (I was talking about Steve. vs. He's about 5 inches tall.)
Table 4. Analysis of AMT-annotated data. (Forum posts from the 2014 DEFT Committed Belief Corpora, release nos. LDC2014E55, LDC2014E106, LDC2014E125.)

15 Future Work
Methods: SVMs, neural nets
Features:
Part of speech
Position of the hedge
Lemmatization
LIWC features
Bigrams/trigrams
Likelihood of hedge vs. non-hedge use for each word
Integrate the new (disambiguating) hedge detector into the BeST system
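As a toy illustration of the SVM direction listed above, here is a sketch using scikit-learn; the feature names and training examples are invented for illustration and are not the system's actual feature set:

from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Each instance: features of one candidate hedge in context; label 1 = hedge
train_feats = [
    {"word": "about", "pos": "ADV", "next_word": "400"},
    {"word": "about", "pos": "ADP", "next_word": "Steve"},
    {"word": "pretty", "pos": "ADV", "next_word": "good"},
    {"word": "pretty", "pos": "ADJ", "next_word": "face"},
]
train_labels = [1, 0, 1, 0]

# Vectorize the feature dicts and train a linear SVM
clf = make_pipeline(DictVectorizer(), LinearSVC())
clf.fit(train_feats, train_labels)
print(clf.predict([{"word": "about", "pos": "ADV", "next_word": "5"}]))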

16 Thank you!

