1 RTE @ Stanford
Rajat Raina, Aria Haghighi, Christopher Cox, Jenny Finkel, Jeff Michels, Kristina Toutanova, Bill MacCartney, Marie-Catherine de Marneffe, Christopher D. Manning and Andrew Y. Ng
PASCAL Challenges Workshop, April 12, 2005

2 Our approach
- Represent sentences using syntactic dependencies, but also add semantic annotations; try to handle language variability.
- Perform semantic inference over this representation, using linguistic knowledge sources.
- Compute a "cost" for inferring the hypothesis from the text: low cost → the hypothesis is entailed.

3 Outline of this talk
- Representation of sentences
  - Syntax: parsing and post-processing
  - Adding annotations to the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis

4 Sentence processing
- Parse with a standard PCFG parser [Klein & Manning, 2003], trained on some extra sentences from recent news.
- Used a high-performing Named Entity Recognizer (next slide); entity spelling variants are normalized with patterns such as Al Qaeda: [Aa]l[ -]Qa'?[ie]da.
- Force the parse tree to be consistent with certain NE tags.
- Example: American Ministry of Foreign Affairs announced that Russia called the United States...
  (S (NP (NNP American_Ministry_of_Foreign_Affairs)) (VP (VBD announced) (…)))
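
As a concrete illustration of the spelling-normalization pattern above, a minimal Python sketch; the regex is from the slide, while the canonical replacement token and the test strings are made up for illustration:

    import re

    # Pattern from the slide: matches common spelling variants of "Al Qaeda".
    AL_QAEDA = re.compile(r"[Aa]l[ -]Qa'?[ie]da")

    for s in ["al Qaida operatives", "Al-Qa'ida claimed", "Al Qaeda leaders"]:
        # Collapse every variant to a single canonical token.
        print(AL_QAEDA.sub("Al_Qaeda", s))
    # -> Al_Qaeda operatives / Al_Qaeda claimed / Al_Qaeda leaders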

5 Named Entity Recognizer
- Trained a robust conditional random field model [Finkel et al., 2003].
- Interpretation of numeric quantity statements. Example:
  T: Kessler's team conducted 60,643 face-to-face interviews with adults in 14 countries.
  H: Kessler's team interviewed more than 60,000 adults in 14 countries. TRUE
- Annotate numerical values implied by "6.2 bn", "more than 60000", "around 10", ... and by MONEY/DATE named entities.
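
A rough sketch of the numeric-quantity annotation idea, assuming a simple comparator-plus-value representation; the function name, multiplier table, and output format are illustrative, not the system's actual interface:

    import re

    MULTIPLIERS = {"bn": 1e9, "billion": 1e9, "m": 1e6, "million": 1e6, "thousand": 1e3}

    def parse_quantity(text):
        # Decide how the value should be compared ("more than 60,000" vs. "around 10").
        comparator = ">" if "more than" in text else ("~" if "around" in text else "=")
        m = re.search(r"([\d.,]+)\s*([a-zA-Z]+)?", text)
        value = float(m.group(1).replace(",", ""))
        value *= MULTIPLIERS.get((m.group(2) or "").lower(), 1)
        return comparator, value

    print(parse_quantity("more than 60,000"))  # ('>', 60000.0)
    print(parse_quantity("6.2 bn"))            # ('=', 6200000000.0)
    print(parse_quantity("around 10"))         # ('~', 10.0)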

6 Parse tree post-processing
- Recognize collocations using WordNet.
- Example: Shrek 2 rang up $92 million.
  (S (NP (NNP Shrek) (CD 2)) (VP (VBD rang_up) (NP (QP ($ $) (CD 92) (CD million)))) (..))
  The object phrase is annotated as MONEY with value 92,000,000.
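
A minimal sketch of the collocation check, assuming NLTK's WordNet interface is available; the helper name and the merge policy are illustrative:

    from nltk.corpus import wordnet as wn
    from nltk.stem import WordNetLemmatizer

    lemmatize = WordNetLemmatizer().lemmatize

    def is_collocation(w1, w2, pos="v"):
        # Merge adjacent tokens if their lemmatized combination is a WordNet entry.
        candidate = lemmatize(w1, pos) + "_" + lemmatize(w2, pos)
        return len(wn.synsets(candidate)) > 0

    print(is_collocation("rang", "up"))    # True if WordNet lists "ring_up"
    print(is_collocation("walked", "to"))  # False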

7 Parse tree → Dependencies
- Find syntactic dependencies: transform parse tree representations into typed syntactic dependencies, including a certain amount of collapsing and normalization.
- Example: Bill's mother walked to the grocery store.
  subj(walked, mother), poss(mother, Bill), to(walked, store), nn(store, grocery)
- Dependencies can also be written as a logical formula:
  mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)
These are the basic representations.
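
A toy sketch of writing dependencies as a flat conjunction. Note this mapping is simpler than the slide's (it does not collapse the verb and its arguments into a single event predicate such as walked(E, A, C)); the variable-naming scheme is illustrative:

    deps = [("subj", "walked", "mother"),
            ("poss", "mother", "Bill"),
            ("to",   "walked", "store"),
            ("nn",   "store",  "grocery")]

    def to_formula(deps):
        # One unary predicate per word plus one binary literal per dependency.
        words = sorted({w for _, g, d in deps for w in (g, d)})
        var = {w: chr(ord("A") + i) for i, w in enumerate(words)}
        unary = [f"{w}({var[w]})" for w in words]
        binary = [f"{rel}({var[g]}, {var[d]})" for rel, g, d in deps]
        return " ∧ ".join(unary + binary)

    print(to_formula(deps))
    # Bill(A) ∧ grocery(B) ∧ mother(C) ∧ store(D) ∧ walked(E) ∧ subj(E, C) ∧ poss(C, A) ∧ to(E, D) ∧ nn(D, B)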

8 Representations
- Two equivalent views: a dependency graph (walked connected to mother, Bill, store, grocery via subj, poss, to, nn edges) and a logical formula:
  mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)
- The representation can be made richer with annotations:
  "walked" is a verb (VBD); "Bill" is a PERSON (named entity); "store" is the location/destination (ARGM-LOC) of "walked"; ...

9 Annotations
- Parts-of-speech and named entities: already computed.
- Semantic roles. Example:
  T: C and D Technologies announced that it has closed the acquisition of Datel, Inc.
  H1: C and D Technologies acquired Datel Inc. TRUE
  H2: Datel acquired C and D Technologies. FALSE
- Use a state-of-the-art semantic role classifier to label verb arguments [Toutanova et al., 2005].

10 More annotations
- Coreference. Example:
  T: Since its formation in 1948, Israel ...
  H: Israel was established in 1948. TRUE
  Use a conditional random field model for coreference detection.
- Note: appositive "references" were previously detected.
  T: Bush, the President of USA, went to Florida.
  H: Bush is the President of USA. TRUE
- Other annotations: word stems (very useful); word senses (no performance gain in our system).

11 Event nouns
- Use a heuristic to find event nouns; augment the text representation using WordNet derivational links (NOUN ↔ VERB).
- Example:
  T: ... witnessed the murder of police commander ...
  H: Police officer killed. TRUE
  Text logical formula: murder(M) ∧ police_commander(P) ∧ of(M, P)
  Augment with: murder(E, M, P)
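
A small sketch of following WordNet derivational links from an event noun to its verb, assuming NLTK's WordNet (the exact lemmas returned depend on the WordNet version):

    from nltk.corpus import wordnet as wn

    def derived_verbs(noun):
        # Collect verb lemmas reachable from the noun via derivational links.
        verbs = set()
        for lemma in wn.lemmas(noun, pos=wn.NOUN):
            for related in lemma.derivationally_related_forms():
                if related.synset().pos() == "v":
                    verbs.add(related.name())
        return verbs

    print(derived_verbs("murder"))       # expected to include 'murder' (the verb)
    print(derived_verbs("acquisition"))  # expected to include 'acquire'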

12 Outline of this talk
- Representation of sentences
  - Syntax: parsing and post-processing
  - Adding annotations to the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis

13 Graph Matching Approach
- Why graph matching? The dependency tree has a natural graphical interpretation, and the approach has been successful in other domains (e.g., lossy image matching).
- Input: hypothesis (H) and text (T) graphs, where vertices are words and phrases and edges are labeled dependencies.
- Output: cost of matching H to T (next slide).
- Toy example: T: John bought a BMW. H: John purchased a car. TRUE
  [Figure: T graph with bought → John (ARG0/Agent, PERSON) and bought → BMW (ARG1/Theme); H graph with purchased → John (ARG0/Agent, PERSON) and purchased → car (ARG1/Theme).]

14 Graph Matching: Idea
- Idea: align H to T so that vertices are similar and relations are preserved (as in machine translation).
- A matching M is a mapping from the vertices of H to the vertices of T; for each vertex v in H, M(v) is a vertex in T.
  [Figure: the H graph (John, purchased, car) matched onto the T graph (John, bought, BMW).]
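
A minimal sketch of a matching M as a dictionary from H vertices to T vertices, found here by brute force; this is fine only for toy graphs, the real system would search more cleverly, and match_cost stands in for the cost function of the next slides:

    from itertools import product

    def best_matching(h_vertices, t_vertices, match_cost):
        # Try every mapping M: H -> T and keep the cheapest one.
        best, best_cost = None, float("inf")
        for assignment in product(t_vertices, repeat=len(h_vertices)):
            m = dict(zip(h_vertices, assignment))
            cost = match_cost(m)
            if cost < best_cost:
                best, best_cost = m, cost
        return best, best_cost

    # e.g. best_matching(["John", "purchased", "car"], ["John", "bought", "BMW"], match_cost)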

15 Graph Matching: Costs
- The cost of a matching, MatchCost(M), measures the "quality" of the matching M.
- VertexCost(M): compare vertices in H with their matched vertices in T.
- RelationCost(M): compare edges (relations) in H with the corresponding edges (relations) in T.
- MatchCost(M) = (1 − β) VertexCost(M) + β RelationCost(M)

16 Graph Matching: Costs
- VertexCost(M): for each vertex v in H and its match M(v) in T, ask:
  - Do the vertex heads share the same stem and/or POS?
  - Is the T vertex head a hypernym of the H vertex head?
  - Are the vertex heads "similar" phrases? (next slide)
- RelationCost(M): for each edge (v, v') in H and the corresponding pair (M(v), M(v')) in T, ask:
  - Are parent/child pairs in H parent/child in T?
  - Are parent/child pairs in H ancestor/descendant in T?
  - Do parent/child pairs in H share a common ancestor in T?
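
A sketch of these per-vertex and per-edge checks. The particular cost values echo the worked example on slide 19 (0.0 exact/stem, 0.2 synonym, 0.4 hypernym); everything else, including the t_graph helper with its edges set and is_ancestor method, is an assumed interface rather than the authors' code:

    def vertex_cost(h_head, t_head, same_stem, is_synonym, is_hypernym):
        # Cheaper the closer the two heads are lexically/semantically.
        if h_head == t_head or same_stem:
            return 0.0
        if is_synonym:
            return 0.2
        if is_hypernym:
            return 0.4
        return 1.0

    def relation_cost(h_edge, t_graph, m):
        v, v_child = h_edge
        if (m[v], m[v_child]) in t_graph.edges:    # parent/child preserved exactly
            return 0.0
        if t_graph.is_ancestor(m[v], m[v_child]):  # only an ancestor/descendant path
            return 0.3
        return 1.0                                 # relation not preserved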

17 Digression: Phrase similarity
- Measures based on WordNet (Resnik/Lesk).
- Distributional similarity: e.g., "run" and "marathon" are related. Latent Semantic Analysis is used to discover words that are distributionally similar (i.e., have common neighbors).
- A web-search based measure: query google.com for pages containing "run", pages containing "marathon", and pages containing both.
- Learning paraphrases [similar to DIRT: Lin and Pantel, 2001].
- "World knowledge" (labor intensive): CEO = Chief_Executive_Officer; Philippines → Filipino. [Can add common facts: "Paris is the capital of France", ...]
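
A sketch of a hit-count based relatedness score in the spirit of the web-search measure above, written as plain PMI over page counts; the hit counts themselves would have to come from a search API, so hits(...) in the usage comment is only a placeholder:

    import math

    def web_similarity(hits_w1, hits_w2, hits_both, total_pages):
        # Pointwise mutual information over estimated page frequencies.
        p1, p2, p12 = hits_w1 / total_pages, hits_w2 / total_pages, hits_both / total_pages
        return math.log(p12 / (p1 * p2)) if p12 > 0 else float("-inf")

    # e.g. web_similarity(hits("run"), hits("marathon"), hits("run marathon"), total_pages=N)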

18 Graph Matching: Costs
(Returning from the digression: the same VertexCost and RelationCost checks as on slide 16.)

19 Graph Matching: Example
- Per-vertex match costs: exact match (John → John) 0.0; synonym match (purchased → bought) 0.2; hypernym match (car → BMW) 0.4.
- VertexCost: (0.0 + 0.2 + 0.4) / 3 = 0.2
- RelationCost: 0.0 (the graphs are isomorphic)
- With β = 0.45 (say): MatchCost = 0.55 × 0.2 + 0.45 × 0.0 = 0.11
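
Reproducing the arithmetic of this example in a few lines (the pairing of costs to word pairs is as read off the figure):

    vertex_costs = [0.0, 0.2, 0.4]   # John->John (exact), purchased->bought (synonym), car->BMW (hypernym)
    vertex_cost = sum(vertex_costs) / len(vertex_costs)   # 0.2
    relation_cost = 0.0                                   # graphs are isomorphic
    beta = 0.45
    match_cost = (1 - beta) * vertex_cost + beta * relation_cost
    print(round(match_cost, 2))                           # 0.11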

20 Outline of this talk
- Representation of sentences
  - Syntax: parsing and post-processing
  - Adding annotations to the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis

21 Abductive inference
- Idea: represent the text and hypothesis as logical formulae. The hypothesis can be inferred from the text if and only if the hypothesis formula can be proved from the text formula.
- Toy example: T: John bought a BMW. H: John purchased a car. TRUE
  T: John(A) ∧ BMW(B) ∧ bought(E, A, B)
  H: John(x) ∧ car(y) ∧ purchased(z, x, y)
  Prove H from T?
- Allow assumptions at various "costs":
  BMW(t) + $2 ⇒ car(t)
  bought(p, q, r) + $1 ⇒ purchased(p, q, r)
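
A toy sketch of costed assumptions for this example; the dollar amounts are the ones on the slide, while the table-lookup representation is purely illustrative:

    # Assumption costs: (text predicate, hypothesis predicate) -> cost.
    ASSUMPTIONS = {
        ("BMW", "car"): 2.0,           # BMW(t)          => car(t),             cost $2
        ("bought", "purchased"): 1.0,  # bought(p, q, r) => purchased(p, q, r), cost $1
    }

    def assumption_cost(text_pred, hyp_pred):
        if text_pred == hyp_pred:
            return 0.0                 # identical predicates are free
        return ASSUMPTIONS.get((text_pred, hyp_pred), float("inf"))

    print(assumption_cost("BMW", "car"))           # 2.0
    print(assumption_cost("bought", "purchased"))  # 1.0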

22 Abductive assumptions
- Assign costs to all assumptions of the form: P(p1, p2, ..., pm) ⇒ Q(q1, q2, ..., qn)
- Build an assumption cost model:
  - Predicate match cost: phrase similarity cost? same named entity type? ...
  - Argument match cost: same semantic role? constant similarity cost? ...

23 Abductive theorem proving
- Each assumption provides a potential proof step.
- Find the proof with the minimum total cost (uniform cost search). If there is a low-cost proof, the hypothesis is entailed.
- Example:
  T: John(A) ∧ BMW(B) ∧ bought(E, A, B)
  H: John(x) ∧ car(y) ∧ purchased(z, x, y)
  A possible proof by resolution refutation (with the earlier assumption costs):
  $0  ¬John(x) ∨ ¬car(y) ∨ ¬purchased(z, x, y)   [Given: negation of hypothesis]
  $0  ¬car(y) ∨ ¬purchased(z, A, y)              [Unify with John(A)]
  $2  ¬purchased(z, A, B)                        [Unify with BMW(B), assuming BMW ⇒ car]
  $3  NULL                                       [Unify with purchased(E, A, B), assuming bought ⇒ purchased]
  Proof cost = 3
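
A generic uniform-cost search skeleton of the kind described here. States would be (hashable) sets of remaining negated hypothesis literals, and successors(state) would yield (assumption cost, new state) pairs for each possible resolution step; both are left abstract, so this is only a sketch of the control loop, not the prover itself:

    import heapq

    def cheapest_proof(start_state, successors, is_proved):
        # Frontier entries: (accumulated cost, tie-breaker, state).
        frontier = [(0.0, 0, start_state)]
        seen, counter = set(), 0
        while frontier:
            cost, _, state = heapq.heappop(frontier)
            if is_proved(state):         # e.g. the empty clause has been derived
                return cost
            if state in seen:
                continue
            seen.add(state)
            for step_cost, next_state in successors(state):
                counter += 1
                heapq.heappush(frontier, (cost + step_cost, counter, next_state))
        return float("inf")              # no proof found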

24 Abductive theorem proving
- Good assumption costs can be learned automatically, starting from a labeled dataset (e.g., the PASCAL development set).
- Intuition: find the assumptions used in the proofs for TRUE examples and lower their costs (by framing a log-linear model); iterate. [Details: Raina et al., in submission]

25 Some interesting features
Examples of handling "complex" constructions in graph matching / abductive inference:
- Antonyms/negation: high cost for matching verbs if they are antonyms, or if one is negated and the other is not.
  T: Stocks fell. H: Stocks rose. FALSE
  T: Clinton's book was not a hit. H: Clinton's book was a hit. FALSE
- Non-factive verbs:
  T: John was charged for doing X. H: John did X. FALSE
  Detected because "doing" in the text has the non-factive "charged" as a parent, but "did" in the hypothesis has no such parent.
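
A small sketch of the antonym test only (the negation check is omitted), assuming NLTK's WordNet and that verbs have already been reduced to their stems:

    from nltk.corpus import wordnet as wn

    def are_antonyms(verb1, verb2):
        # A high matching cost should be triggered when this returns True.
        for lemma in wn.lemmas(verb1, pos=wn.VERB):
            for ant in lemma.antonyms():
                if ant.name() == verb2:
                    return True
        return False

    print(are_antonyms("fall", "rise"))  # expected True: "Stocks fell" vs. "Stocks rose"
    print(are_antonyms("buy", "walk"))   # expected False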

26 Some interesting features
- "Superlative check":
  T: This is the tallest tower in western Japan.
  H: This is the tallest tower in Japan. FALSE

27 Outline of this talk
- Representation of sentences
  - Syntax: parsing and post-processing
  - Adding annotations to the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis

28 Results
- Combine inference methods: each system produces a score, and each system's score variance is separately normalized.
- If the normalized scores are s1 and s2, the final score is S = w1 s1 + w2 s2.
- The classifier weights w1 and w2 are learned on the development set using logistic regression.
- Two submissions: one set of classifier weights for all RTE tasks (General); different classifier weights for each RTE task (ByTask).
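
A sketch of this combination step: normalize each system's score variance on the development set, then learn the weights with logistic regression. scikit-learn is used purely for illustration (it postdates this work), and the function and variable names are made up:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def combine_scores(dev_s1, dev_s2, dev_labels, s1, s2):
        # Normalize each system's scores by its dev-set standard deviation.
        sd1, sd2 = np.std(dev_s1), np.std(dev_s2)
        X_dev = np.column_stack([np.asarray(dev_s1) / sd1, np.asarray(dev_s2) / sd2])
        clf = LogisticRegression().fit(X_dev, dev_labels)
        w1, w2 = clf.coef_[0]
        # Final score S = w1*s1 + w2*s2 on the normalized scales.
        return w1 * (s1 / sd1) + w2 * (s2 / sd2)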

29 Results

                       General            ByTask
                       Accuracy   CWS     Accuracy   CWS
  DevSet 1             64.8%      0.778   65.5%      0.805
  DevSet 2             52.1%      0.578   55.7%      0.661
  DevSet 1 + DevSet 2  58.5%      0.679   60.8%      0.743
  Test Set             56.3%      0.620   55.3%      0.686

- Balanced predictions: 55.4% and 51.2% predicted TRUE on the test set.
- Best other results: Accuracy = 58.6%, CWS = 0.617.

30 Results by task

        General            ByTask
        Accuracy   CWS     Accuracy   CWS
  CD    79.3%      0.903   84.7%      0.926
  IE    47.5%      0.493   55.0%      0.590
  IR    56.7%      0.590   55.6%      0.604
  MT    46.7%      0.480   47.5%      0.479
  PP    58.0%      0.623   54.0%      0.535
  QA    48.5%      0.478   43.9%      0.466
  RC    52.9%      0.523   50.7%      0.480

31 Partial coverage results
- Coverage–CWS curves can also be drawn. For example: at 50% coverage, CWS = 0.781; at 25% coverage, CWS = 0.873.
- Task-specific optimization (ByTask) seems better.
  [Figure: coverage–CWS curves for the General and ByTask systems.]

32 Some interesting issues
- Phrase similarity:
  away from the coast → farther inland
  won victory in presidential election → became President
  stocks get a lift → stocks rise
  life threatening → fatal
- Dictionary definitions:
  believe there is only one God → are monotheistic
- "World knowledge":
  K Club, venue of the Ryder Cup, ... → K Club will host the Ryder Cup

33 Future directions
- Need more NLP components: better treatment of frequent nominalizations, parenthesized material, etc.
- Need much more ability to do inference: fine distinctions between meanings, and fine similarities (e.g., "reach a higher level" and "rise"). We need a high-recall, reasonable-precision similarity measure! Other resources (e.g., antonyms) are also very sparse.
- More task-specific optimization.

34 Thanks!

