RTE @ Stanford

Rajat Raina, Aria Haghighi, Christopher Cox, Jenny Finkel, Jeff Michels, Kristina Toutanova, Bill MacCartney, Marie-Catherine de Marneffe, Christopher D. Manning and Andrew Y. Ng

PASCAL Challenges Workshop, April 12, 2005
Our approach
- Represent sentences using syntactic dependencies, but also use semantic annotations, to try to handle language variability.
- Perform semantic inference over this representation, using linguistic knowledge sources.
- Compute a "cost" for inferring the hypothesis from the text: low cost → the hypothesis is entailed.
Outline of this talk
- Representation of sentences: syntax (parsing and post-processing); adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
Sentence processing
- Parse with a standard PCFG parser [Klein & Manning, 2003], trained on some extra sentences from recent news.
- Used a high-performing named entity recognizer (next slide); name variants can be matched with patterns, e.g. Al Qaeda: [Aa]l[ -]Qa'?[ie]da.
- Force the parse tree to be consistent with certain NE tags. Example:
  American Ministry of Foreign Affairs announced that Russia called the United States...
  (S (NP (NNP American_Ministry_of_Foreign_Affairs)) (VP (VBD announced) (...)))
Named Entity Recognizer
- Trained a robust conditional random field model [Finkel et al., 2003].
- Interpretation of numeric quantity statements. Example:
  T: Kessler's team conducted 60,643 face-to-face interviews with adults in 14 countries.
  H: Kessler's team interviewed more than 60,000 adults in 14 countries. → TRUE
- Annotate numerical values implied by "6.2 bn", "more than 60000", "around 10", ..., and by MONEY/DATE named entities.
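The kind of numeric-quantity normalization described above can be sketched roughly as follows. This is my own illustrative assumption: the function name, comparator vocabulary, and multiplier table are stand-ins, not the system's actual implementation.

```python
import re

# Hypothetical sketch of numeric-quantity annotation: map a phrase such as
# "more than 60,000" to a (comparator, value) pair so the inference step
# can check whether one quantity statement entails another.
MULTIPLIERS = {"bn": 1e9, "billion": 1e9, "million": 1e6, "m": 1e6, "k": 1e3}

def normalize_quantity(text):
    """Return (comparator, value), e.g. 'more than 60,000' -> ('>', 60000.0)."""
    comparator = "="
    t = text.lower()
    if "more than" in t or "over" in t:
        comparator = ">"
    elif "less than" in t or "under" in t:
        comparator = "<"
    elif "around" in t or "about" in t:
        comparator = "~"
    m = re.search(r"([\d,]+(?:\.\d+)?)\s*(bn|billion|million|m\b|k\b)?", t)
    if not m:
        return None
    value = float(m.group(1).replace(",", ""))
    if m.group(2):
        value *= MULTIPLIERS[m.group(2)]
    return comparator, value
```

Under this scheme, T's "60,643" annotates as ("=", 60643.0) and H's "more than 60,000" as (">", 60000.0), so the entailment check reduces to comparing numbers.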
Parse tree post-processing
- Recognize collocations using WordNet. Example: Shrek 2 rang up $92 million.
  (S (NP (NNP Shrek) (CD 2)) (VP (VBD rang_up) (NP (QP ($ $) (CD 92) (CD million)))) (..))
  Annotated: MONEY, 92000000
Parse tree → Dependencies (basic representations)
- Find syntactic dependencies: transform parse tree representations into typed syntactic dependencies, including a certain amount of collapsing and normalization.
- Example: Bill's mother walked to the grocery store.
  subj(walked, mother), poss(mother, Bill), to(walked, store), nn(store, grocery)
- Dependencies can also be written as a logical formula:
  mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)
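As a rough illustration of how typed dependencies become a conjunction of predicates, here is a simplified sketch. The variable assignment and the binary-relation rendering are my own; the slide's event predicate walked(E, A, C), which folds the subject and destination into the verb's arguments, is richer than this.

```python
# Typed dependencies for "Bill's mother walked to the grocery store.",
# rendered as a conjunction of unary word predicates and binary relation
# predicates over shared variables (illustrative only).
deps = [("subj", "walked", "mother"),
        ("poss", "mother", "Bill"),
        ("to", "walked", "store"),
        ("nn", "store", "grocery")]

def to_formula(deps):
    """Assign one fresh variable per word; emit relation atoms, then word atoms."""
    variables = {}
    def var(word):
        if word not in variables:
            variables[word] = chr(ord("A") + len(variables))
        return variables[word]
    atoms = [f"{rel}({var(head)}, {var(dep)})" for rel, head, dep in deps]
    atoms += [f"{word}({v})" for word, v in variables.items()]
    return " & ".join(atoms)
```

Calling to_formula(deps) yields atoms such as subj(A, B), poss(B, C), walked(A), and Bill(C), mirroring the slide's formula up to renaming of variables.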
Representations
- Dependency graph and logical formula:
  mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)
  [Figure: dependency graph with edges subj, to, poss, nn linking walked, mother, Bill, store, grocery]
- Can make the representation richer:
  "walked" is a verb (VBD); "Bill" is a PERSON (named entity); "store" is the location/destination (ARGM-LOC) of "walked"; ...
Annotations
- Parts-of-speech, named entities: already computed.
- Semantic roles. Example:
  T: C and D Technologies announced that it has closed the acquisition of Datel, Inc.
  H1: C and D Technologies acquired Datel Inc. → TRUE
  H2: Datel acquired C and D Technologies. → FALSE
- Use a state-of-the-art semantic role classifier to label verb arguments [Toutanova et al., 2005].
More annotations
- Coreference. Example:
  T: Since its formation in 1948, Israel ...
  H: Israel was established in 1948. → TRUE
  Use a conditional random field model for coreference detection.
- Note: appositive "references" were previously detected.
  T: Bush, the President of USA, went to Florida.
  H: Bush is the President of USA. → TRUE
- Other annotations: word stems (very useful); word senses (no performance gain in our system).
Event nouns
- Use a heuristic to find event nouns; augment the text representation using WordNet derivational links (NOUN → VERB).
- Example:
  T: ... witnessed the murder of police commander ...
  H: Police officer killed. → TRUE
  Text logical formula: murder(M) ∧ police_commander(P) ∧ of(M, P)
  Augment with: murder(E, M, P)
Outline of this talk
- Representation of sentences: syntax (parsing and post-processing); adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
Graph Matching Approach
- Why graph matching? The dependency tree has a natural graphical interpretation, and graph matching has been successful in other domains (e.g., lossy image matching).
- Input: hypothesis (H) and text (T) graphs, where vertices are words and phrases and edges are labeled dependencies.
- Output: cost of matching H to T (next slide).
- Toy example: T: John bought a BMW. H: John purchased a car. → TRUE
  [Figure: T graph with "bought" linked by ARG0 (Agent) to John (PERSON) and by ARG1 (Theme) to BMW; H graph with "purchased" linked to John and car]
Graph Matching: Idea
- Idea: align H to T so that vertices are similar and relations are preserved (as in machine translation).
- A matching M is a mapping from vertices of H to vertices of T: for each vertex v in H, M(v) is a vertex in T.
  [Figure: a matching between the H and T graphs of the toy example]
Graph Matching: Costs
- The cost of a matching, MatchCost(M), measures the "quality" of the matching M:
  VertexCost(M): compare vertices in H with matched vertices in T.
  RelationCost(M): compare edges (relations) in H with corresponding edges (relations) in T.
- MatchCost(M) = (1 − β) · VertexCost(M) + β · RelationCost(M)
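The cost combination above is a simple convex blend, which can be sketched directly. Averaging the per-vertex costs is an assumption on my part; the slides give only the top-level formula.

```python
# Minimal sketch of MatchCost(M) = (1 - beta) * VertexCost + beta * RelationCost.
def match_cost(vertex_costs, relation_cost, beta=0.45):
    """vertex_costs: one cost per hypothesis vertex; relation_cost: scalar in [0, 1]."""
    vertex_cost = sum(vertex_costs) / len(vertex_costs)
    return (1 - beta) * vertex_cost + beta * relation_cost

# Worked example from a later slide: exact match 0.0, synonym 0.2,
# hypernym 0.4, isomorphic graphs so RelationCost = 0.
cost = match_cost([0.0, 0.2, 0.4], 0.0)  # ≈ 0.55 * 0.2 + 0.45 * 0.0 = 0.11
```

A lower cost means a better alignment; the worked example in a later slide reproduces exactly this computation.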
Graph Matching: Costs
- VertexCost(M): for each vertex v in H and vertex M(v) in T:
  Do the vertex heads share the same stem and/or POS?
  Is the T vertex head a hypernym of the H vertex head?
  Are the vertex heads "similar" phrases? (next slide)
- RelationCost(M): for each edge (v, v′) in H and edge (M(v), M(v′)) in T:
  Are parent/child pairs in H parent/child in T?
  Are parent/child pairs in H ancestor/descendant in T?
  Do parent/child pairs in H share a common ancestor in T?
Digression: Phrase similarity
- Measures based on WordNet (Resnik/Lesk).
- Distributional similarity. Example: "run" and "marathon" are related. Use Latent Semantic Analysis to discover words that are distributionally similar (i.e., have common neighbors).
- Used a web-search-based measure: query google.com for all pages with "run", all pages with "marathon", and all pages with both "run" and "marathon".
- Learning paraphrases [similar to DIRT: Lin and Pantel, 2001].
- "World knowledge" (labor intensive): CEO = Chief_Executive_Officer; Philippines → Filipino. [Can add common facts: "Paris is the capital of France", ...]
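The web-count idea can be scored in several ways; the slides do not name the exact formula, so pointwise mutual information over hit counts is my assumption here, and the counts below are made up for illustration.

```python
import math

# Sketch of a web-count similarity: given page hit counts for each word
# alone and for both together, words that co-occur more often than chance
# (PMI > 0) are treated as related.
def pmi(hits_x, hits_y, hits_xy, total_pages):
    """Pointwise mutual information log(P(x,y) / (P(x) P(y))) from hit counts."""
    p_x = hits_x / total_pages
    p_y = hits_y / total_pages
    p_xy = hits_xy / total_pages
    return math.log(p_xy / (p_x * p_y))
```

With invented counts, pmi(1000, 100, 50, 10000) is positive ("run" and "marathon" co-occur 5x more than chance would predict), while a pair whose joint count matches the independence baseline scores near zero.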
Graph Matching: Example
- Vertex costs: exact match 0.0 (John), synonym match 0.2 (purchased/bought), hypernym match 0.4 (car/BMW).
- VertexCost: (0.0 + 0.2 + 0.4)/3 = 0.2
- RelationCost: 0 (graphs isomorphic)
- With β = 0.45 (say): MatchCost = 0.55 · 0.2 + 0.45 · 0.0 = 0.11
Outline of this talk
- Representation of sentences: syntax (parsing and post-processing); adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
Abductive inference
- Idea: represent the text and hypothesis as logical formulae. The hypothesis can be inferred from the text if and only if the hypothesis formula can be proved from the text formula.
- Toy example: T: John bought a BMW. H: John purchased a car. → TRUE
  T: John(A) ∧ BMW(B) ∧ bought(E, A, B)
  H: John(x) ∧ car(y) ∧ purchased(z, x, y)   Prove?
- Allow assumptions at various "costs":
  BMW(t) + $2 ⇒ car(t)
  bought(p, q, r) + $1 ⇒ purchased(p, q, r)
Abductive assumptions
- Assign costs to all assumptions of the form: P(p1, p2, ..., pm) ⇒ Q(q1, q2, ..., qn)
- Build an assumption cost model:
  Predicate match cost: phrase similarity cost? same named entity type? ...
  Argument match cost: same semantic role? constant similarity cost? ...
Abductive theorem proving
- Each assumption provides a potential proof step. Find the proof with the minimum total cost (uniform cost search). If there is a low-cost proof, the hypothesis is entailed.
- Example:
  T: John(A) ∧ BMW(B) ∧ bought(E, A, B)
  H: John(x) ∧ car(y) ∧ purchased(z, x, y)
- A possible proof by resolution refutation (for the earlier costs):
  $0  ¬John(x) ∨ ¬car(y) ∨ ¬purchased(z, x, y)   [Given: negation of hypothesis]
  $0  ¬car(y) ∨ ¬purchased(z, A, y)              [Unify with John(A)]
  $2  ¬purchased(z, A, B)                        [Unify with BMW(B)]
  $3  NULL                                       [Unify with purchased(E, A, B)]
  Proof cost = 3
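The minimum-cost proof search can be sketched with a uniform-cost search over which text literal discharges each hypothesis literal, paying the assumption cost when predicates differ. This is a deliberately simplified stand-in: the real system searches resolution proofs with unification, and the cost table here contains only the toy example's two assumptions.

```python
import heapq

# Assumption costs from the toy example: $2 for BMW -> car, $1 for
# bought -> purchased; identical predicates cost $0, everything else
# is not assumable (infinite cost).
ASSUMPTION_COST = {("BMW", "car"): 2, ("bought", "purchased"): 1}

def literal_cost(text_pred, hyp_pred):
    if text_pred == hyp_pred:
        return 0
    return ASSUMPTION_COST.get((text_pred, hyp_pred), float("inf"))

def proof_cost(text_preds, hyp_preds):
    """Uniform-cost search: cheapest way to discharge every hypothesis literal."""
    frontier = [(0, tuple(hyp_preds))]
    while frontier:
        cost, remaining = heapq.heappop(frontier)
        if not remaining:
            return cost          # all hypothesis literals discharged
        goal, rest = remaining[0], remaining[1:]
        for t in text_preds:
            step = literal_cost(t, goal)
            if step != float("inf"):
                heapq.heappush(frontier, (cost + step, rest))
    return float("inf")

print(proof_cost(["John", "BMW", "bought"], ["John", "car", "purchased"]))  # prints 3
```

The search recovers the same total cost of $3 as the resolution-refutation proof above: $0 for John, $2 for the BMW → car assumption, and $1 for bought → purchased.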
Abductive theorem proving
- Can automatically learn good assumption costs, starting from a labeled dataset (e.g., the PASCAL development set).
- Intuition: find assumptions that are used in the proofs of TRUE examples and lower their costs (by framing a log-linear model). Iterate. [Details: Raina et al., in submission]
Some interesting features
Examples of handling "complex" constructions in graph matching / abductive inference:
- Antonyms/negation: high cost for matching verbs if they are antonyms, or if one is negated and the other is not.
  T: Stocks fell. H: Stocks rose. → FALSE
  T: Clinton's book was not a hit. H: Clinton's book was a hit. → FALSE
- Non-factive verbs:
  T: John was charged for doing X. H: John did X. → FALSE
  Detectable because "doing" in the text has the non-factive "charged" as a parent, but "did" has no such parent.
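The antonym/negation check can be sketched as a penalty on the verb-matching cost. The word lists, function name, and penalty value below are illustrative stand-ins, not the system's actual tables.

```python
# Hypothetical sketch: matching two verbs is made expensive when exactly
# one of them is negated, or when the pair appears in an antonym list
# (here a tiny stand-in; the real system would consult WordNet).
ANTONYMS = {frozenset(("fell", "rose")), frozenset(("fall", "rise"))}

def verb_match_cost(v1, neg1, v2, neg2, base_cost=0.0, penalty=10.0):
    if neg1 != neg2:                     # one verb negated, the other not
        return penalty
    if frozenset((v1, v2)) in ANTONYMS:  # known antonym pair
        return penalty
    return base_cost
```

With this, "fell" vs. "rose" and "was not a hit" vs. "was a hit" both receive the high penalty, so the overall match cost exceeds the entailment threshold and the pairs are labeled FALSE.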
Some interesting features
- "Superlative check":
  T: This is the tallest tower in western Japan.
  H: This is the tallest tower in Japan. → FALSE
Outline of this talk
- Representation of sentences: syntax (parsing and post-processing); adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
Results: Combining inference methods
- Each system produces a score; separately normalize each system's score variance.
- If the normalized scores are s1 and s2, the final score is S = w1·s1 + w2·s2.
- Learn the classifier weights w1 and w2 on the development set using logistic regression.
- Two submissions:
  General: train one set of classifier weights for all RTE tasks.
  ByTask: train different classifier weights for each RTE task.
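The normalization-then-combination step can be sketched as follows. Standardizing each system's scores to zero mean and unit variance is my reading of "normalize each system's score variance"; the weights would come from the logistic regression fit on the development set.

```python
# Sketch of the score combination: standardize each system's scores
# (zero mean, unit variance), then take a weighted sum with learned
# weights w1, w2.
def combine(scores1, scores2, w1, w2):
    def standardize(xs):
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        sd = var ** 0.5 or 1.0          # guard against zero variance
        return [(x - mean) / sd for x in xs]
    s1, s2 = standardize(scores1), standardize(scores2)
    return [w1 * a + w2 * b for a, b in zip(s1, s2)]
```

Standardizing first matters because the graph-matching and abductive-proof scores live on different scales; without it, one system's raw magnitude would dominate the weighted sum.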
Results

                     General            ByTask
                     Accuracy  CWS      Accuracy  CWS
DevSet 1             64.8%     0.778    65.5%     0.805
DevSet 2             52.1%     0.578    55.7%     0.661
DevSet 1 + DevSet 2  58.5%     0.679    60.8%     0.743
Test Set             56.3%     0.620    55.3%     0.686

- Balanced predictions: 55.4% (General) and 51.2% (ByTask) predicted TRUE on the test set.
- Best other results: Accuracy = 58.6%, CWS = 0.617.
Results by task

       General            ByTask
       Accuracy  CWS      Accuracy  CWS
CD     79.3%     0.903    84.7%     0.926
IE     47.5%     0.493    55.0%     0.590
IR     56.7%     0.590    55.6%     0.604
MT     46.7%     0.480    47.5%     0.479
PP     58.0%     0.623    54.0%     0.535
QA     48.5%     0.478    43.9%     0.466
RC     52.9%     0.523    50.7%     0.480
Partial coverage results
- Can also draw coverage-CWS curves. For example (ByTask): at 50% coverage, CWS = 0.781; at 25% coverage, CWS = 0.873.
  [Figure: coverage-CWS curves for the General and ByTask systems]
- Task-specific optimization seems better!
Some interesting issues
- Phrase similarity:
  "away from the coast" ↔ "farther inland"
  "won victory in presidential election" ↔ "became President"
  "stocks get a lift" ↔ "stocks rise"
  "life threatening" ↔ "fatal"
- Dictionary definitions: "believe there is only one God" ↔ "are monotheistic"
- "World knowledge": "K Club, venue of the Ryder Cup, ..." ↔ "K Club will host the Ryder Cup"
Future directions
- Need more NLP components: better treatment of frequent nominalizations, parenthesized material, etc.
- Need much more ability to do inference: fine distinctions between meanings, and fine similarities (e.g., "reach a higher level" and "rise"). We need a high-recall, reasonable-precision similarity measure! Other resources (e.g., antonyms) are also very sparse.
- More task-specific optimization.
Thanks!