RTE @ Stanford

Rajat Raina, Aria Haghighi, Christopher Cox, Jenny Finkel, Jeff Michels, Kristina Toutanova, Bill MacCartney, Marie-Catherine de Marneffe, Christopher D. Manning and Andrew Y. Ng

PASCAL Challenges Workshop, April 12, 2005
Our approach
- Represent sentences using syntactic dependencies, but also use semantic annotations, to try to handle language variability.
- Perform semantic inference over this representation, using linguistic knowledge sources.
- Compute a "cost" for inferring the hypothesis from the text: low cost → the hypothesis is entailed.
Outline of this talk
- Representation of sentences: syntax (parsing and post-processing); adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
Sentence processing
- Parse with a standard PCFG parser [Klein & Manning, 2003], trained on some extra sentences from recent news.
- Used a high-performing named entity recognizer (next slide); name variants can be matched with patterns, e.g. Al Qaeda: [Aa]l[ -]Qa'?[ie]da.
- Force the parse tree to be consistent with certain NE tags. Example:
  American Ministry of Foreign Affairs announced that Russia called the United States...
  (S (NP (NNP American_Ministry_of_Foreign_Affairs)) (VP (VBD announced) (...)))
Named Entity Recognizer
- Trained a robust conditional random field model [Finkel et al., 2003].
- Interpretation of numeric quantity statements. Example:
  T: Kessler's team conducted 60,643 face-to-face interviews with adults in 14 countries.
  H: Kessler's team interviewed more than 60,000 adults in 14 countries. → TRUE
- Annotate numerical values implied by "6.2 bn", "more than 60000", "around 10", ..., and by MONEY/DATE named entities.
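The kind of numeric-quantity normalization described above can be sketched roughly as follows. This is my own illustrative assumption: the function name, comparator vocabulary, and multiplier table are stand-ins, not the system's actual implementation.

```python
import re

# Hypothetical sketch of numeric-quantity annotation: map a phrase such as
# "more than 60,000" to a (comparator, value) pair so the inference step
# can check whether one quantity statement entails another.
MULTIPLIERS = {"bn": 1e9, "billion": 1e9, "million": 1e6, "m": 1e6, "k": 1e3}

def normalize_quantity(text):
    """Return (comparator, value), e.g. 'more than 60,000' -> ('>', 60000.0)."""
    comparator = "="
    t = text.lower()
    if "more than" in t or "over" in t:
        comparator = ">"
    elif "less than" in t or "under" in t:
        comparator = "<"
    elif "around" in t or "about" in t:
        comparator = "~"
    m = re.search(r"([\d,]+(?:\.\d+)?)\s*(bn|billion|million|m\b|k\b)?", t)
    if not m:
        return None
    value = float(m.group(1).replace(",", ""))
    if m.group(2):
        value *= MULTIPLIERS[m.group(2)]
    return comparator, value
```

Under this scheme, T's "60,643" annotates as ("=", 60643.0) and H's "more than 60,000" as (">", 60000.0), so the entailment check reduces to comparing numbers.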
Parse tree post-processing
- Recognize collocations using WordNet. Example: Shrek 2 rang up $92 million.
  (S (NP (NNP Shrek) (CD 2)) (VP (VBD rang_up) (NP (QP ($ $) (CD 92) (CD million)))) (..))
  Annotated: MONEY, 92000000
Parse tree → Dependencies (basic representations)
- Find syntactic dependencies: transform parse tree representations into typed syntactic dependencies, including a certain amount of collapsing and normalization.
- Example: Bill's mother walked to the grocery store.
  subj(walked, mother), poss(mother, Bill), to(walked, store), nn(store, grocery)
- Dependencies can also be written as a logical formula:
  mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)
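As a rough illustration of how typed dependencies become a conjunction of predicates, here is a simplified sketch. The variable assignment and the binary-relation rendering are my own; the slide's event predicate walked(E, A, C), which folds the subject and destination into the verb's arguments, is richer than this.

```python
# Typed dependencies for "Bill's mother walked to the grocery store.",
# rendered as a conjunction of unary word predicates and binary relation
# predicates over shared variables (illustrative only).
deps = [("subj", "walked", "mother"),
        ("poss", "mother", "Bill"),
        ("to", "walked", "store"),
        ("nn", "store", "grocery")]

def to_formula(deps):
    """Assign one fresh variable per word; emit relation atoms, then word atoms."""
    variables = {}
    def var(word):
        if word not in variables:
            variables[word] = chr(ord("A") + len(variables))
        return variables[word]
    atoms = [f"{rel}({var(head)}, {var(dep)})" for rel, head, dep in deps]
    atoms += [f"{word}({v})" for word, v in variables.items()]
    return " & ".join(atoms)
```

Calling to_formula(deps) yields atoms such as subj(A, B), poss(B, C), walked(A), and Bill(C), mirroring the slide's formula up to renaming of variables.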
Representations
- Dependency graph and logical formula:
  mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)
  [Figure: dependency graph with edges subj, to, poss, nn linking walked, mother, Bill, store, grocery]
- Can make the representation richer:
  "walked" is a verb (VBD); "Bill" is a PERSON (named entity); "store" is the location/destination (ARGM-LOC) of "walked"; ...
Annotations
- Parts-of-speech, named entities: already computed.
- Semantic roles. Example:
  T: C and D Technologies announced that it has closed the acquisition of Datel, Inc.
  H1: C and D Technologies acquired Datel Inc. → TRUE
  H2: Datel acquired C and D Technologies. → FALSE
- Use a state-of-the-art semantic role classifier to label verb arguments [Toutanova et al., 2005].
More annotations
- Coreference. Example:
  T: Since its formation in 1948, Israel ...
  H: Israel was established in 1948. → TRUE
  Use a conditional random field model for coreference detection.
- Note: appositive "references" were previously detected.
  T: Bush, the President of USA, went to Florida.
  H: Bush is the President of USA. → TRUE
- Other annotations: word stems (very useful); word senses (no performance gain in our system).
Event nouns
- Use a heuristic to find event nouns; augment the text representation using WordNet derivational links (NOUN → VERB).
- Example:
  T: ... witnessed the murder of police commander ...
  H: Police officer killed. → TRUE
  Text logical formula: murder(M) ∧ police_commander(P) ∧ of(M, P)
  Augment with: murder(E, M, P)
Outline of this talk
- Representation of sentences: syntax (parsing and post-processing); adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
Graph Matching Approach
- Why graph matching? The dependency tree has a natural graphical interpretation, and graph matching has been successful in other domains (e.g., lossy image matching).
- Input: hypothesis (H) and text (T) graphs, where vertices are words and phrases and edges are labeled dependencies.
- Output: cost of matching H to T (next slide).
- Toy example: T: John bought a BMW. H: John purchased a car. → TRUE
  [Figure: T graph with "bought" linked by ARG0 (Agent) to John (PERSON) and by ARG1 (Theme) to BMW; H graph with "purchased" linked to John and car]
Graph Matching: Idea
- Idea: align H to T so that vertices are similar and relations are preserved (as in machine translation).
- A matching M is a mapping from vertices of H to vertices of T: for each vertex v in H, M(v) is a vertex in T.
  [Figure: a matching between the H and T graphs of the toy example]
Graph Matching: Costs
- The cost of a matching, MatchCost(M), measures the "quality" of the matching M:
  VertexCost(M): compare vertices in H with matched vertices in T.
  RelationCost(M): compare edges (relations) in H with corresponding edges (relations) in T.
- MatchCost(M) = (1 − β) · VertexCost(M) + β · RelationCost(M)
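The cost combination above is a simple convex blend, which can be sketched directly. Averaging the per-vertex costs is an assumption on my part; the slides give only the top-level formula.

```python
# Minimal sketch of MatchCost(M) = (1 - beta) * VertexCost + beta * RelationCost.
def match_cost(vertex_costs, relation_cost, beta=0.45):
    """vertex_costs: one cost per hypothesis vertex; relation_cost: scalar in [0, 1]."""
    vertex_cost = sum(vertex_costs) / len(vertex_costs)
    return (1 - beta) * vertex_cost + beta * relation_cost

# Worked example from a later slide: exact match 0.0, synonym 0.2,
# hypernym 0.4, isomorphic graphs so RelationCost = 0.
cost = match_cost([0.0, 0.2, 0.4], 0.0)  # ≈ 0.55 * 0.2 + 0.45 * 0.0 = 0.11
```

A lower cost means a better alignment; the worked example in a later slide reproduces exactly this computation.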
Graph Matching: Costs
- VertexCost(M): for each vertex v in H and vertex M(v) in T:
  Do the vertex heads share the same stem and/or POS?
  Is the T vertex head a hypernym of the H vertex head?
  Are the vertex heads "similar" phrases? (next slide)
- RelationCost(M): for each edge (v, v′) in H and edge (M(v), M(v′)) in T:
  Are parent/child pairs in H parent/child in T?
  Are parent/child pairs in H ancestor/descendant in T?
  Do parent/child pairs in H share a common ancestor in T?
Digression: Phrase similarity
- Measures based on WordNet (Resnik/Lesk).
- Distributional similarity. Example: "run" and "marathon" are related. Use Latent Semantic Analysis to discover words that are distributionally similar (i.e., have common neighbors).
- Used a web-search-based measure: query google.com for all pages with "run", all pages with "marathon", and all pages with both "run" and "marathon".
- Learning paraphrases [similar to DIRT: Lin and Pantel, 2001].
- "World knowledge" (labor intensive): CEO = Chief_Executive_Officer; Philippines → Filipino. [Can add common facts: "Paris is the capital of France", ...]
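The web-count idea can be scored in several ways; the slides do not name the exact formula, so pointwise mutual information over hit counts is my assumption here, and the counts below are made up for illustration.

```python
import math

# Sketch of a web-count similarity: given page hit counts for each word
# alone and for both together, words that co-occur more often than chance
# (PMI > 0) are treated as related.
def pmi(hits_x, hits_y, hits_xy, total_pages):
    """Pointwise mutual information log(P(x,y) / (P(x) P(y))) from hit counts."""
    p_x = hits_x / total_pages
    p_y = hits_y / total_pages
    p_xy = hits_xy / total_pages
    return math.log(p_xy / (p_x * p_y))
```

With invented counts, pmi(1000, 100, 50, 10000) is positive ("run" and "marathon" co-occur 5x more than chance would predict), while a pair whose joint count matches the independence baseline scores near zero.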
Graph Matching: Example
- Vertex costs: exact match 0.0 (John), synonym match 0.2 (purchased/bought), hypernym match 0.4 (car/BMW).
- VertexCost: (0.0 + 0.2 + 0.4)/3 = 0.2
- RelationCost: 0 (graphs isomorphic)
- With β = 0.45 (say): MatchCost = 0.55 · 0.2 + 0.45 · 0.0 = 0.11
Outline of this talk
- Representation of sentences: syntax (parsing and post-processing); adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
Abductive inference
- Idea: represent the text and hypothesis as logical formulae. The hypothesis can be inferred from the text if and only if the hypothesis formula can be proved from the text formula.
- Toy example: T: John bought a BMW. H: John purchased a car. → TRUE
  T: John(A) ∧ BMW(B) ∧ bought(E, A, B)
  H: John(x) ∧ car(y) ∧ purchased(z, x, y)   Prove?
- Allow assumptions at various "costs":
  BMW(t) + $2 ⇒ car(t)
  bought(p, q, r) + $1 ⇒ purchased(p, q, r)
Abductive assumptions
- Assign costs to all assumptions of the form: P(p1, p2, ..., pm) ⇒ Q(q1, q2, ..., qn)
- Build an assumption cost model:
  Predicate match cost: phrase similarity cost? same named entity type? ...
  Argument match cost: same semantic role? constant similarity cost? ...
Abductive theorem proving
- Each assumption provides a potential proof step. Find the proof with the minimum total cost (uniform cost search). If there is a low-cost proof, the hypothesis is entailed.
- Example:
  T: John(A) ∧ BMW(B) ∧ bought(E, A, B)
  H: John(x) ∧ car(y) ∧ purchased(z, x, y)
- A possible proof by resolution refutation (for the earlier costs):
  $0  ¬John(x) ∨ ¬car(y) ∨ ¬purchased(z, x, y)   [Given: negation of hypothesis]
  $0  ¬car(y) ∨ ¬purchased(z, A, y)              [Unify with John(A)]
  $2  ¬purchased(z, A, B)                        [Unify with BMW(B)]
  $3  NULL                                       [Unify with purchased(E, A, B)]
  Proof cost = 3
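The minimum-cost proof search can be sketched with a uniform-cost search over which text literal discharges each hypothesis literal, paying the assumption cost when predicates differ. This is a deliberately simplified stand-in: the real system searches resolution proofs with unification, and the cost table here contains only the toy example's two assumptions.

```python
import heapq

# Assumption costs from the toy example: $2 for BMW -> car, $1 for
# bought -> purchased; identical predicates cost $0, everything else
# is not assumable (infinite cost).
ASSUMPTION_COST = {("BMW", "car"): 2, ("bought", "purchased"): 1}

def literal_cost(text_pred, hyp_pred):
    if text_pred == hyp_pred:
        return 0
    return ASSUMPTION_COST.get((text_pred, hyp_pred), float("inf"))

def proof_cost(text_preds, hyp_preds):
    """Uniform-cost search: cheapest way to discharge every hypothesis literal."""
    frontier = [(0, tuple(hyp_preds))]
    while frontier:
        cost, remaining = heapq.heappop(frontier)
        if not remaining:
            return cost          # all hypothesis literals discharged
        goal, rest = remaining[0], remaining[1:]
        for t in text_preds:
            step = literal_cost(t, goal)
            if step != float("inf"):
                heapq.heappush(frontier, (cost + step, rest))
    return float("inf")

print(proof_cost(["John", "BMW", "bought"], ["John", "car", "purchased"]))  # prints 3
```

The search recovers the same total cost of $3 as the resolution-refutation proof above: $0 for John, $2 for the BMW → car assumption, and $1 for bought → purchased.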
Abductive theorem proving
- Can automatically learn good assumption costs, starting from a labeled dataset (e.g., the PASCAL development set).
- Intuition: find assumptions that are used in the proofs of TRUE examples and lower their costs (by framing a log-linear model). Iterate. [Details: Raina et al., in submission]
Some interesting features
Examples of handling "complex" constructions in graph matching / abductive inference:
- Antonyms/negation: high cost for matching verbs if they are antonyms, or if one is negated and the other is not.
  T: Stocks fell. H: Stocks rose. → FALSE
  T: Clinton's book was not a hit. H: Clinton's book was a hit. → FALSE
- Non-factive verbs:
  T: John was charged for doing X. H: John did X. → FALSE
  Detectable because "doing" in the text has the non-factive "charged" as a parent, but "did" has no such parent.
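The antonym/negation check can be sketched as a penalty on the verb-matching cost. The word lists, function name, and penalty value below are illustrative stand-ins, not the system's actual tables.

```python
# Hypothetical sketch: matching two verbs is made expensive when exactly
# one of them is negated, or when the pair appears in an antonym list
# (here a tiny stand-in; the real system would consult WordNet).
ANTONYMS = {frozenset(("fell", "rose")), frozenset(("fall", "rise"))}

def verb_match_cost(v1, neg1, v2, neg2, base_cost=0.0, penalty=10.0):
    if neg1 != neg2:                     # one verb negated, the other not
        return penalty
    if frozenset((v1, v2)) in ANTONYMS:  # known antonym pair
        return penalty
    return base_cost
```

With this, "fell" vs. "rose" and "was not a hit" vs. "was a hit" both receive the high penalty, so the overall match cost exceeds the entailment threshold and the pairs are labeled FALSE.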
Some interesting features
- "Superlative check":
  T: This is the tallest tower in western Japan.
  H: This is the tallest tower in Japan. → FALSE
Outline of this talk
- Representation of sentences: syntax (parsing and post-processing); adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
Results: Combining inference methods
- Each system produces a score; separately normalize each system's score variance.
- If the normalized scores are s1 and s2, the final score is S = w1·s1 + w2·s2.
- Learn the classifier weights w1 and w2 on the development set using logistic regression.
- Two submissions:
  General: train one set of classifier weights for all RTE tasks.
  ByTask: train different classifier weights for each RTE task.
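The normalization-then-combination step can be sketched as follows. Standardizing each system's scores to zero mean and unit variance is my reading of "normalize each system's score variance"; the weights would come from the logistic regression fit on the development set.

```python
# Sketch of the score combination: standardize each system's scores
# (zero mean, unit variance), then take a weighted sum with learned
# weights w1, w2.
def combine(scores1, scores2, w1, w2):
    def standardize(xs):
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        sd = var ** 0.5 or 1.0          # guard against zero variance
        return [(x - mean) / sd for x in xs]
    s1, s2 = standardize(scores1), standardize(scores2)
    return [w1 * a + w2 * b for a, b in zip(s1, s2)]
```

Standardizing first matters because the graph-matching and abductive-proof scores live on different scales; without it, one system's raw magnitude would dominate the weighted sum.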
Results

                     General            ByTask
                     Accuracy  CWS      Accuracy  CWS
DevSet 1             64.8%     0.778    65.5%     0.805
DevSet 2             52.1%     0.578    55.7%     0.661
DevSet 1 + DevSet 2  58.5%     0.679    60.8%     0.743
Test Set             56.3%     0.620    55.3%     0.686

- Balanced predictions: 55.4% (General) and 51.2% (ByTask) predicted TRUE on the test set.
- Best other results: Accuracy = 58.6%, CWS = 0.617.
Results by task

       General            ByTask
       Accuracy  CWS      Accuracy  CWS
CD     79.3%     0.903    84.7%     0.926
IE     47.5%     0.493    55.0%     0.590
IR     56.7%     0.590    55.6%     0.604
MT     46.7%     0.480    47.5%     0.479
PP     58.0%     0.623    54.0%     0.535
QA     48.5%     0.478    43.9%     0.466
RC     52.9%     0.523    50.7%     0.480
Partial coverage results
- Can also draw coverage-CWS curves. For example (ByTask): at 50% coverage, CWS = 0.781; at 25% coverage, CWS = 0.873.
  [Figure: coverage-CWS curves for the General and ByTask systems]
- Task-specific optimization seems better!
Some interesting issues
- Phrase similarity:
  "away from the coast" ↔ "farther inland"
  "won victory in presidential election" ↔ "became President"
  "stocks get a lift" ↔ "stocks rise"
  "life threatening" ↔ "fatal"
- Dictionary definitions: "believe there is only one God" ↔ "are monotheistic"
- "World knowledge": "K Club, venue of the Ryder Cup, ..." ↔ "K Club will host the Ryder Cup"
Future directions
- Need more NLP components: better treatment of frequent nominalizations, parenthesized material, etc.
- Need much more ability to do inference: fine distinctions between meanings, and fine similarities (e.g., "reach a higher level" and "rise"). We need a high-recall, reasonable-precision similarity measure! Other resources (e.g., antonyms) are also very sparse.
- More task-specific optimization.
Thanks!