Using Query Patterns to Learn the Durations of Events
Andrey Gusev, joint work with Nate Chambers, Pranav Khaitan, Divye Khilnani, Steven Bethard, Dan Jurafsky

Examples of Event Durations
- Talk to a friend – minutes
- Driving – hours
- Study for an exam – days
- Travel – weeks
- Run a campaign – months
- Build a museum – years

Why are we interested in durations?
Event understanding
- Duration is an important aspectual property
- Can help build timelines of events
Event coreference
- Duration may be a cue that events are coreferent
- Gender (learned from the web) helps nominal coreference
Integration into search products
- Query: “healthy sleep time for age groups”
- Query: “president term length in [country x]”

Approach 1: Supervised System
How can we learn event durations?

Dataset (Pan et al., 2006)
Labeled 58 documents from TimeBank with event durations
Gold label is the average of the minimum and maximum labeled durations
“A Brooklyn woman who was watching her clothes dry in a laundromat.”
- Min duration – 5 min
- Max duration – 1 hour
- Average – 1950 seconds

Original Features (Pan et al., 2006)
Event properties
- Event token, lemma, POS tag
Subject and object
- Head words of the syntactic subject and object of the event, along with their lemmas and POS tags
Hypernyms
- WordNet hypernyms for the event, its subject, and its object. Starting from the first synset of each lemma, three hypernyms were extracted from the WordNet hierarchy.

New Features
Event attributes
- Tense, aspect, modality, event class
Named entity class of subjects and objects
- Person, organization, location, or other
Typed dependencies
- Binary feature for each typed dependency
Reporting verbs
- Binary feature for reporting verbs (say, report, reply, etc.)
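To make the feature set concrete, here is a minimal sketch of how such a feature dictionary might be assembled. The `event` object and all of its attribute names are hypothetical stand-ins for whatever parser, named-entity tagger, and TimeBank attribute output the real system used.

```python
def extract_features(event):
    """Sketch of the supervised feature dictionary; `event` and its
    attributes are hypothetical placeholders for real NLP pipeline output."""
    feats = {
        "token": event.token, "lemma": event.lemma, "pos": event.pos,
        "tense": event.tense, "aspect": event.aspect,
        "modality": event.modality, "class": event.event_class,
        "subj_ne": event.subject_ne,  # person/organization/location/other
        "obj_ne": event.object_ne,
        "reporting": event.lemma in {"say", "report", "reply", "tell"},
    }
    for dep in event.typed_deps:      # one binary feature per relation
        feats["dep_" + dep] = True
    return feats
```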

Limitations of the Supervised Approach
Need explicitly annotated datasets
- Sparse and limited data
- Limited to the annotated domain
Low inter-annotator agreement
- More than a day vs. less than a day – 87.7%
- Duration buckets – 44.4%
- Approximate duration buckets – 79.8%

Overcoming Supervised Limitations
Statistical web count approach
- Lots of text/data that can be used
- Not limited to the annotated domain
- Implicit annotations from many sources
Related work: Hearst (1998), Ji and Lin (2009)

Approach 2: Statistical Web Counts
How can we learn event durations?

Terms – Duration Buckets and Distributions
- “talked for * seconds”
- “talked for * minutes”
- “talked for * hours”
- “talked for * days”
- “talked for * weeks”
- “talked for * months”
- “talked for * years”
Duration bucket distribution: the hit count for each query [per-bucket counts not preserved in the transcript]
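As a rough sketch, the bucket queries and the resulting distribution could be computed as below; `web_hit_count` is a hypothetical function standing in for the search-engine API the system actually queried.

```python
BUCKETS = ["seconds", "minutes", "hours", "days", "weeks", "months", "years"]

def bucket_counts(event_phrase, web_hit_count):
    """Issue one wildcard query per duration bucket,
    e.g. "talked for * minutes", and collect the hit counts."""
    return {b: web_hit_count('"%s for * %s"' % (event_phrase, b))
            for b in BUCKETS}

def normalize(counts):
    """Turn raw hit counts into a distribution over the seven buckets."""
    total = sum(counts.values())
    return {b: c / total if total else 0.0 for b, c in counts.items()}
```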

Two Duration Prediction Tasks
Coarse-grained prediction
- “Less than a day” or “longer than a day”
Fine-grained prediction
- Second, minute, hour, etc.

Task 1: Coarse Grained Prediction

Yesterday Pattern for Coarse-Grained Task
Patterns: “<event_past> yesterday” and “was <event_pastp> yesterday”
- event_past = past tense of the event
- event_pastp = past progressive tense of the event
Normalize the yesterday-pattern counts by the counts of the event occurring in general
Average the two ratios
Find a threshold on the training set

Example: “to say” with the Yesterday Pattern
- “said yesterday” – 14,390,865 hits
- “said” – 1,693,080,248 hits
- “was saying yesterday” – 29,626 hits
- “was saying” – 14,167,103 hits
Average ratio ≈ 0.0053
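Plugging the slide's numbers into the two-ratio average looks roughly like this (a sketch; in the real system the hit counts come from a search engine):

```python
def yesterday_score(past_hits, past_total, pastp_hits, pastp_total):
    """Average the two normalized "yesterday" ratios."""
    return (past_hits / past_total + pastp_hits / pastp_total) / 2.0

# The "to say" numbers from the slide:
score = yesterday_score(14_390_865, 1_693_080_248, 29_626, 14_167_103)
# score ≈ 0.0053; events scoring above the threshold learned on the
# training set (t = 0.002, see the scoring slide) are predicted to
# last less than a day.
```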

Threshold for Yesterday Pattern

Task 2: Fine Grained Prediction

Fine Grained Durations from Web Counts
How long does the event “X” last? Ask the web:
- “X for * seconds”
- “X for * minutes”
- …
Output: a distribution over time units
[chart: duration distribution for “said”]

Not All Time Units are Equal
Need to look at the base distribution:
- “for * seconds”
- “for * minutes”
- …
In habituals etc., people like to say “for years”

Conditional Frequencies for Buckets
Divide “X for * seconds” by “for * seconds”
Reduces credit for seeing “X for years”
[chart: conditional distribution for “said”]
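A minimal sketch of that normalization, assuming both count dictionaries are keyed by the same bucket names:

```python
def conditional_distribution(event_counts, base_counts):
    """Divide each bucket's event-specific count ("X for * <bucket>") by
    the bucket's base count ("for * <bucket>"), then renormalize. This
    discounts buckets like "years" that are frequent for any event."""
    ratios = {b: event_counts[b] / base_counts[b] if base_counts[b] else 0.0
              for b in event_counts}
    total = sum(ratios.values())
    return {b: r / total if total else 0.0 for b, r in ratios.items()}
```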

Double Peak Distribution
Two interpretations:
- Durative
- Iterative
The distributions show this as two peaks

Merging Patterns
Multiple patterns; their distributions are averaged
Reduces noise from individual patterns
A pattern must have more than 100 and fewer than 100,000 hits
[chart: merged distribution for “said”]
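A sketch of the merging step under the stated hit-count bounds; the exact filtering rule is assumed from the slide:

```python
MIN_HITS, MAX_HITS = 100, 100_000  # bounds from the slide

def merge_patterns(distributions, total_hits):
    """Average the bucket distributions of every pattern whose total hit
    count lies strictly inside the bounds; drop the rest as noise."""
    kept = [d for d, h in zip(distributions, total_hits)
            if MIN_HITS < h < MAX_HITS]
    if not kept:
        return None  # nothing reliable; caller can back off
    return {b: sum(d[b] for d in kept) / len(kept) for b in kept[0]}
```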

Fine Grained Patterns Used
Patterns used: “for *”, “spent *”
Patterns not used: “in *”, “takes * to”, “last *”

Evaluation and Results

Evaluation
TimeBank annotations (Pan, Mulkar, and Hobbs 2006)
- Coarse task: greater or less than a day
- Fine task: time units (seconds, minutes, hours, …, years); counted as correct if within 1 time unit
Baseline: majority class
- Fine grained – months
- Coarse grained – greater than a day
Compare with a re-implementation of the supervised system (Pan, Mulkar, and Hobbs 2006)

New Split for TimeBank Dataset
- Train – 1664 events (714 unique verbs)
- Test – 471 events (274 unique verbs)
- TestWSJ – 147 events (84 unique verbs)
Split info is available at

Web Counts System Scoring
Fine grained
- Smooth over the adjacent buckets and select the top bucket
- score(b_i) = b_{i-1} + b_i + b_{i+1}
Coarse grained
- “Yesterday” classifier with a threshold (t = 0.002), or
- Use the fine-grained approach: select the coarse-grained bucket based on the fine-grained bucket
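A sketch of both scoring rules; the mapping from fine buckets to the two coarse classes (seconds/minutes/hours vs. the rest) is an assumption consistent with the task definition:

```python
BUCKETS = ["seconds", "minutes", "hours", "days", "weeks", "months", "years"]

def predict_fine(dist):
    """Smooth with adjacent buckets, score(b_i) = b_{i-1} + b_i + b_{i+1},
    and return the index of the top-scoring bucket."""
    vals = [dist[b] for b in BUCKETS]
    def score(i):
        left = vals[i - 1] if i > 0 else 0.0
        right = vals[i + 1] if i + 1 < len(vals) else 0.0
        return left + vals[i] + right
    return max(range(len(BUCKETS)), key=score)

def predict_coarse(dist):
    """Map the fine-grained winner onto the two coarse classes; buckets up
    through "hours" are assumed to mean "less than a day"."""
    return ("less than a day" if predict_fine(dist) <= BUCKETS.index("hours")
            else "more than a day")
```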

Results

                   Coarse–Test  Fine–Test  Coarse–WSJ  Fine–WSJ
Baseline                –           –           –          –
Supervised              –           –           –          –
Bucket Counts           –           –           –          –
Yesterday Counts      70.7         N/A        74.8        N/A

[most cell values were not preserved in the transcript]

Web counts perform as well as the fully supervised system

Backoff Statistics (“Spent” Pattern)

                             Both  Subject  Object  None
Events in training dataset    –      –       –      –
Had at least 10 hits          –      –       –      –

[cell values were not preserved in the transcript]

Effect of the Event Context
Supervised classifiers use context in their features
The web counts system doesn’t use the context of the events
- Significantly fewer hits when including context
- Better accuracy with more hits than with context
What is the effect of subject/object context on the understanding of event duration?

Human Annotation: Mechanical Turk Can humans do this task without context?

MTurk Setup
10 MTurk workers for each event
Without context
- Event – choice for each duration bucket
With context
- Event with subject/object – choice for each duration bucket

Sometimes Context Doesn’t Matter
[charts: duration distributions for “exploded” and “intolerant”]

Web counts vs. Turk distributions
[charts: “said” (web count) vs. “said” (MTurk)]

Web counts vs. Turk distributions
[charts: “looking” (web count) vs. “looking” (MTurk)]

Web counts vs. Turk distributions
[charts: “considering” (web count) vs. “considering” (MTurk)]

Results: Mechanical Turk Annotations
Compare accuracy:
- Event with context
- Event without context

                    Coarse–Test  Fine–Test  Coarse–WSJ  Fine–WSJ
Baseline                 –           –           –          –
Event only               –           –           –          –
Event and context        –           –           –          –

[cell values were not preserved in the transcript]

Context significantly improves the accuracy of MTurk annotations

Event Duration Lexicon
Distributions for the 1000 most frequent verbs from the NYT portion of Gigaword, with the 10 most frequent grammatical objects of each verb
Due to thresholds, not all events have distributions
Example entry:
EVENT=to use, ID=e13-7, OBJ=computer, PATTERNS=2, DISTR=[0.009;0.337;0.238;0.090;0.130;0.103;0.092;0.002;]
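Given the entry format shown above, a small parser might look like this (a sketch assuming fields are comma-separated key=value pairs and DISTR is a semicolon-separated list):

```python
def parse_lexicon_entry(line):
    """Parse one lexicon line, e.g.
    "EVENT=to use, ID=e13-7, OBJ=computer, PATTERNS=2, DISTR=[0.009;...]",
    into a dict, converting DISTR into a list of floats."""
    entry = {}
    for field in line.strip().split(", "):
        key, value = field.split("=", 1)
        if key == "DISTR":
            entry[key] = [float(x) for x in value.strip("[]").split(";") if x]
        else:
            entry[key] = value
    return entry
```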

Summary
- We learned aspectual information from the web
- Event durations from web counts are as accurate as a supervised system
- Web counts are domain-general and work well even without context
- New lexicon with the 1000 most frequent verbs and their 10 most frequent objects
- MTurk suggests that context can improve the accuracy of event duration annotation

Thanks! Questions?