
1 Using Query Patterns to Learn the Durations of Events
Andrey Gusev, joint work with Nate Chambers, Pranav Khaitan, Divye Khilnani, Steven Bethard, and Dan Jurafsky

2 Examples of Event Durations
Talk to a friend – minutes
Driving – hours
Study for an exam – days
Travel – weeks
Run a campaign – months
Build a museum – years

3 Why are we interested in durations?
Event understanding: duration is an important aspectual property and can help build timelines of events.
Event coreference: duration may be a cue that events are coreferent, just as gender (learned from the web) helps nominal coreference.
Integration into search products, e.g. the queries "healthy sleep time for age groups" and "president term length in [country x]".

4 Approach 1: Supervised System
How can we learn event durations?

5 Dataset (Pan et al., 2006)
Labeled 58 documents from TimeBank with event durations; each event gets the average of its minimum and maximum labeled durations.
Example: "A Brooklyn woman who was watching her clothes dry in a laundromat."
Min duration – 5 min; max duration – 1 hour; average – (300 s + 3600 s) / 2 = 1950 seconds

6 Original Features (Pan et al., 2006)
Event properties: event token, lemma, POS tag.
Subject and object: head word of the syntactic subject and object of the event, along with their lemmas and POS tags.
Hypernyms: WordNet hypernyms for the event, its subject, and its object. Starting from the first synset of each lemma, three hypernyms were extracted from the WordNet hierarchy.

7 New Features
Event attributes: tense, aspect, modality, event class.
Named entity class of subjects and objects: person, organization, location, or other.
Typed dependencies: a binary feature for each typed dependency.
Reporting verbs: a binary feature for reporting verbs (say, report, reply, etc.).

8 Limitations of the Supervised Approach
Needs explicitly annotated datasets: data is sparse and limited, and restricted to the annotated domain.
Low inter-annotator agreement:
More than a day vs. less than a day – 87.7%
Duration buckets – 44.4%
Approximate duration buckets – 79.8%

9 Overcoming Supervised Limitations
Statistical web count approach: lots of text/data to draw on, not limited to the annotated domain, and implicit annotations from many sources.
Cf. Hearst (1998), Ji and Lin (2009).

10 Approach 2: Statistical Web Counts
How can we learn event durations?

11 Terms - Duration Buckets and Distributions
"talked for * seconds" – 1,638 hits
"talked for * minutes" – 61,816 hits
"talked for * hours" – 68,370 hits
"talked for * days" – 4,361 hits
"talked for * weeks" – 3,754 hits
"talked for * months" – 5,157 hits
"talked for * years" – 103,336 hits
Each time unit is a duration bucket; the hit counts across buckets form the event's duration bucket distribution.
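As a worked illustration, here is a minimal Python sketch of turning raw pattern hits into a duration bucket distribution. The hit counts are hard-coded from the slide; a real system would obtain them from search engine queries:

    # Duration buckets and raw web hit counts for "talked for * <unit>"
    BUCKETS = ["seconds", "minutes", "hours", "days", "weeks", "months", "years"]
    hits = {"seconds": 1638, "minutes": 61816, "hours": 68370, "days": 4361,
            "weeks": 3754, "months": 5157, "years": 103336}

    # Normalize the hit counts into a distribution over buckets
    total = sum(hits.values())
    distribution = {b: hits[b] / total for b in BUCKETS}
    for b in BUCKETS:
        print(f"{b:>8}: {distribution[b]:.3f}")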

12 Two Duration Prediction Tasks
Coarse grained prediction: "less than a day" or "longer than a day".
Fine grained prediction: second, minute, hour, etc.

13 Task 1: Coarse Grained Prediction

14 Yesterday Pattern for Coarse Grained Task
Patterns: "<event_past> yesterday" and "was <event_pastp> yesterday", where event_past is the past tense and event_pastp the past progressive tense of the event.
Normalize the yesterday pattern counts by counts of the event occurring in general.
Average the two ratios.
Find a threshold on the training set.

15 Example: "to say" with Yesterday Pattern
"said yesterday" – 14,390,865 hits; "said" – 1,693,080,248 hits (ratio ≈ 0.0085)
"was saying yesterday" – 29,626 hits; "was saying" – 14,167,103 hits (ratio ≈ 0.0021)
Average ratio ≈ 0.0053
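A minimal Python sketch of this computation, with the slide's hit counts hard-coded and the threshold t = 0.002 taken from slide 27. That ratios above the threshold indicate "less than a day" is an assumption here, on the reasoning that short events are the ones reported as happening "yesterday":

    # Hit counts from the slide for the event "to say"
    past_yesterday  = 14_390_865     # "said yesterday"
    past_total      = 1_693_080_248  # "said"
    pastp_yesterday = 29_626         # "was saying yesterday"
    pastp_total     = 14_167_103     # "was saying"

    # Normalize each yesterday count by the event's overall frequency
    ratio_past  = past_yesterday / past_total    # ~0.0085
    ratio_pastp = pastp_yesterday / pastp_total  # ~0.0021
    avg_ratio = (ratio_past + ratio_pastp) / 2   # ~0.0053

    THRESHOLD = 0.002  # tuned on the training set (slide 27)
    label = "less than a day" if avg_ratio > THRESHOLD else "more than a day"
    print(f"average ratio = {avg_ratio:.4f} -> {label}")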

16 Threshold for Yesterday Pattern

17 Task 2: Fine Grained Prediction

18 Fine Grained Durations from Web Counts
How long does the event "X" last? Ask the web: "X for * seconds", "X for * minutes", …
Output: a distribution over time units.
(Figure: duration distribution for "said".)

19 Not All Time Units are Equal
Need to look at the base distribution: "for * seconds", "for * minutes", …
In habituals, etc., people like to say "for years".

20 Conditional Frequencies for Buckets
Divide "X for * seconds" by "for * seconds", and likewise for each bucket; this reduces the credit for seeing "X for years".
(Figure: normalized duration distribution for "said".)
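A hedged sketch of this normalization: divide each event-specific count by the base count for the same bucket, then renormalize. The event counts are the ones from slide 11; the base counts below are placeholders standing in for real web counts:

    # Event-specific hits for "talked for * <unit>" (slide 11)
    event_hits = {"seconds": 1638, "minutes": 61816, "hours": 68370,
                  "days": 4361, "weeks": 3754, "months": 5157, "years": 103336}

    # Base hits for the bare patterns "for * <unit>" -- placeholder values
    base_hits = {"seconds": 1_000_000, "minutes": 2_500_000, "hours": 3_000_000,
                 "days": 2_000_000, "weeks": 1_500_000, "months": 1_800_000,
                 "years": 9_000_000}

    # Conditional frequency: credit for "X for years" shrinks because
    # "for years" is common in general (habituals, etc.)
    cond = {u: event_hits[u] / base_hits[u] for u in event_hits}
    total = sum(cond.values())
    distribution = {u: c / total for u, c in cond.items()}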

21 Double Peak Distribution
Some events have two interpretations, durative and iterative, and their distributions show this with two peaks.

22 Merging Patterns
Multiple patterns, with their distributions averaged; this reduces the noise from individual patterns.
A pattern needs more than 100 and fewer than 100,000 hits to be included.
(Figure: merged duration distribution for "said".)
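A minimal sketch of the merge, assuming each pattern yields a (distribution, total hits) pair; the function name and data layout are illustrative:

    BUCKETS = ["seconds", "minutes", "hours", "days", "weeks", "months", "years"]

    def merge_patterns(pattern_results):
        """Average the bucket distributions of the patterns whose total
        hit counts fall inside the (100, 100000) reliability window."""
        kept = [dist for dist, total_hits in pattern_results
                if 100 < total_hits < 100_000]
        if not kept:
            return None  # no reliable pattern for this event
        return {b: sum(d[b] for d in kept) / len(kept) for b in BUCKETS}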

23 Fine Grained Patterns
Patterns used: "for *", "spent *"
Patterns not used: "in *", "takes * to", "last *"

24 Evaluation and Results

25 Evaluation
TimeBank annotations (Pan, Mulkar and Hobbs 2006).
Coarse task: greater or less than a day.
Fine task: time units (seconds, minutes, hours, …, years); counted as correct if within 1 time unit.
Baseline: majority class (fine grained – months; coarse grained – greater than a day).
Compare with a re-implementation of the supervised system of Pan, Mulkar and Hobbs (2006).

26 New Split for TimeBank Dataset
Train – 1664 events (714 unique verbs)
Test – 471 events (274 unique verbs)
TestWSJ – 147 events (84 unique verbs)
Split info is available at http://cs.stanford.edu/~agusev/durations/

27 Web Counts System Scoring
Fine grained: smooth over the adjacent buckets, score(b_i) = b_{i-1} + b_i + b_{i+1}, and select the top-scoring bucket.
Coarse grained: either the "yesterday" classifier with a threshold (t = 0.002), or use the fine grained approach and derive the coarse grained bucket from the fine grained one.
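A short sketch of both scoring rules. Treating out-of-range neighbors as zero and mapping seconds/minutes/hours to "less than a day" are assumptions the slide leaves implicit:

    BUCKETS = ["seconds", "minutes", "hours", "days", "weeks", "months", "years"]

    def predict_fine(dist):
        """Pick the bucket maximizing score(b_i) = b_{i-1} + b_i + b_{i+1}."""
        vals = [dist[b] for b in BUCKETS]
        def score(i):
            left = vals[i - 1] if i > 0 else 0.0
            right = vals[i + 1] if i < len(vals) - 1 else 0.0
            return left + vals[i] + right
        return max(range(len(BUCKETS)), key=score)

    def predict_coarse(dist):
        """Derive the coarse label from the fine grained bucket
        (seconds/minutes/hours -> less than a day; assumed mapping)."""
        if predict_fine(dist) < BUCKETS.index("days"):
            return "less than a day"
        return "more than a day"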

28 Results
                  Coarse-Test  Fine-Test  Coarse-WSJ  Fine-WSJ
Baseline              62.4        59.2        57.1       52.4
Supervised            73.0        62.4        74.8       66.0
Bucket Counts         72.4        66.5        73.5       68.7
Yesterday Counts      70.7        N/A         74.8       N/A
Web counts perform as well as the fully supervised system.

29 Backoff Statistics ("Spent" Pattern)
                              Both  Subject  Object  None
Events in training dataset    356     446      195   548
Had at least 10 hits           38      68       41   372

30 Effect of the Event Context
The supervised classifier uses context in its features; the web counts system doesn't use the context of the events.
Including context yields significantly fewer hits, and backing off to more hits gives better accuracy than keeping the context.
What is the effect of subject/object context on the understanding of event duration?

31 Human Annotation: Mechanical Turk
Can humans do this task without context?

32 MTurk Setup
10 MTurk workers for each event.
Without context: the event alone – a choice for each duration bucket.
With context: the event with its subject/object – a choice for each duration bucket.

33 Sometimes Context Doesn't Matter
(Figures: duration distributions for "exploded" and "intolerant".)

34 Web counts vs. Turk distributions
(Figures: "said" (web counts) vs. "said" (MTurk).)

35 Web counts vs. Turk distributions
(Figures: "looking" (web counts) vs. "looking" (MTurk).)

36 Web counts vs. Turk distributions
(Figures: "considering" (web counts) vs. "considering" (MTurk).)

37 Results: Mechanical Turk Annotations
Compare accuracy for events with and without context.
                    Coarse-Test  Fine-Test  Coarse-WSJ  Fine-WSJ
Baseline                62.4        59.2        57.1       52.4
Event only              52.0        42.1        49.4       43.8
Event and context       65.0        56.7        70.1       59.9
Context significantly improves the accuracy of MTurk annotations.

38 Event Duration Lexicon
Distributions for the 1000 most frequent verbs from the NYT portion of Gigaword, with the 10 most frequent grammatical objects of each verb.
Due to thresholds, not all events have distributions.
Example entry: EVENT=to use, ID=e13-7, OBJ=computer, PATTERNS=2, DISTR=[0.009;0.337;0.238;0.090;0.130;0.103;0.092;0.002;]
http://cs.stanford.edu/~agusev/durations/
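A small sketch of reading one such entry. The field layout follows the example line above; the eight DISTR values presumably correspond to duration buckets, which the slide does not name:

    def parse_lexicon_entry(line):
        """Split a lexicon line of KEY=value fields and decode DISTR."""
        entry = {}
        for field in line.split(", "):
            key, value = field.split("=", 1)
            entry[key] = value
        # DISTR is a bracketed, semicolon-separated list with a trailing ';'
        entry["DISTR"] = [float(x)
                          for x in entry["DISTR"].strip("[]").split(";") if x]
        entry["PATTERNS"] = int(entry["PATTERNS"])
        return entry

    entry = parse_lexicon_entry(
        "EVENT=to use, ID=e13-7, OBJ=computer, PATTERNS=2, "
        "DISTR=[0.009;0.337;0.238;0.090;0.130;0.103;0.092;0.002;]")
    print(entry["EVENT"], entry["DISTR"])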

39 Summary
We learned aspectual information from the web: event durations from web counts are as accurate as a supervised system.
Web counts are domain-general and work well even without context.
New lexicon covering the 1000 most frequent verbs, each with its 10 most frequent objects.
MTurk suggests that context can improve the accuracy of event duration annotation.

40 Thanks! Questions?

