1  COGEX at the Second RTE
Marta Tatu, Brandon Iles, John Slavick, Adrian Novischi, Dan Moldovan
Language Computer Corporation, April 10th, 2006

2  LCC's Submission to RTE2
- Linear combination of three entailment scores (see the sketch below):
  1. COGEX with constituency parse tree-derived logic forms
  2. COGEX with dependency parse tree-derived logic forms
  3. Lexical alignment between T and H
- For each pair i (Ti, Hi): if the λ-weighted combination of the three scores passes the decision threshold, then Ti entails Hi
- Lambda (λ) parameters learned on the development data for each task (IE, IR, QA, SUM)
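A minimal sketch of the score combination this slide describes; the score names, λ values, and threshold below are illustrative placeholders, not LCC's learned parameters.

```python
# Illustrative three-score combination; weights, names, and threshold are
# placeholders, not LCC's learned values.
from typing import Dict

def entails(scores: Dict[str, float], lambdas: Dict[str, float], threshold: float) -> bool:
    """Decide entailment for one (T, H) pair from a lambda-weighted sum of scores."""
    combined = sum(lambdas[name] * scores[name] for name in scores)
    return combined >= threshold

# Per-task lambdas would be learned on the RTE2 development data (hypothetical values).
lambdas_qa = {"cogex_constituency": 0.5, "cogex_dependency": 0.3, "lex_align": 0.2}
pair_scores = {"cogex_constituency": 0.71, "cogex_dependency": 0.64, "lex_align": 0.80}
print(entails(pair_scores, lambdas_qa, threshold=0.6))  # True
```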

3  Semantic-based Logic Approach
- Textual Entailment
  - Task definition: T entails H, denoted by T → H, if the meaning of H can be inferred from the meaning of T
  - inferred » logic (theorem prover + axioms)
  - meaning » semantics (semantic-enhanced representation)

4  Approach to RTE with COGEX
- Transform the two text fragments into 3-layered logic forms
  - Syntactic
  - Semantic
  - Temporal
- Automatically create axioms to be used during the proof
  - Lexical chain axioms
  - World knowledge axioms
  - Linguistic transformation axioms
  - Semantic and temporal axioms
- Load COGEX's SOS with T and ¬H, and its USABLE list of clauses with the generated axioms
- Search for a proof by iteratively removing clauses from the SOS and searching the USABLE list for possible inferences until a refutation is found (see the sketch below)
- If no contradiction is detected:
  - Relax arguments
  - Drop entire predicates from H
- Compute the proof score
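For illustration, a hedged sketch of the set-of-support refutation loop the slide outlines; clauses are modeled as frozensets of literals and resolve() is a hypothetical resolution helper, not COGEX's actual machinery.

```python
# Hedged sketch of a set-of-support refutation loop; clauses are frozensets of
# literals and resolve() is a hypothetical resolution helper, not COGEX's code.
from collections import deque

def refute(t_clauses, neg_h_clauses, axioms, resolve):
    """Return True if the empty clause (a contradiction) can be derived."""
    sos = deque(t_clauses + neg_h_clauses)   # set of support: T plus the negated H
    usable = list(axioms)                    # lexical, world-knowledge, linguistic axioms
    while sos:
        given = sos.popleft()                # remove one clause from the SOS
        usable.append(given)
        for clause in usable:
            for resolvent in resolve(given, clause):
                if resolvent == frozenset(): # empty clause: refutation found
                    return True
                sos.append(resolvent)        # new inferences go back into the SOS
    return False                             # no proof; the caller would relax and retry

# On failure, COGEX relaxes arguments or drops H predicates, lowers the proof
# score accordingly, and runs the search again.
```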

5  COGEX Enhancements (1/3)
- Logic Form Transformation
  - Negations (a toy rewrite is sketched below)
    - not_RB(x1,e1) & walk_VB(e1,x2,x3) » -walk_VB(e1,x2,x3)
    - not_RB(x1,e1) & walk_VB(e1,x2,x3) & fast_RB(x4,e1) » -fast_RB(x4,e1)
    - no/DT: case_NN(x1) & confirm_VB(e1,x2,x1) » -confirm_VB(e1,x2,x1)
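A toy sketch of the negation rewrite, covering only the simple verb case from the first example above; the (name, args, positive) predicate representation is an assumption made for illustration.

```python
# Toy negation rewrite for the simple verb case: not_RB(x, e) flips the sign of
# the predicate headed by event e. The (name, args, positive) representation and
# the single rule are illustrative assumptions (the adverb case is not handled).
def apply_negation(predicates):
    """predicates: list of (name, args, positive) triples; returns the rewritten list."""
    negated_events = {args[1] for name, args, _ in predicates if name == "not_RB"}
    out = []
    for name, args, positive in predicates:
        if name == "not_RB":
            continue                                  # consumed by the rewrite
        if args and args[0] in negated_events:        # predicate headed by a negated event
            positive = not positive
            negated_events.discard(args[0])           # negate only one predicate per not_RB
        out.append((name, args, positive))
    return out

lf = [("not_RB", ("x1", "e1"), True), ("walk_VB", ("e1", "x2", "x3"), True)]
print(apply_negation(lf))  # [('walk_VB', ('e1', 'x2', 'x3'), False)]
```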

6  COGEX Enhancements (1/3)
- Logic Form Transformation
  - Temporal normalization of date/time predicates (sketched below)
    - "13th of January 1990" vs. "January 13th, 1990"
    - 13th_of_January_1990_NN(x1) vs. January_13th_1990_NN(x1)
    - Normalized form: time_TMP(BeginFN(x1), year, month, day, hour, minute, second) & time_TMP(EndFN(x1), year, month, day, hour, minute, second)
    - time_TMP(BeginFN(x1), 1990, 1, 13, 0, 0, 0) & time_TMP(EndFN(x1), 1990, 1, 13, 23, 59, 59)
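A small sketch of the date normalization step, assuming a toy parser that covers only the two surface forms on the slide; COGEX's actual temporal normalizer handles far more patterns.

```python
# Hedged sketch: normalize a date expression into begin/end time_TMP tuples.
# The regex parsing is a toy covering only "<day>th of <Month> <year>" and
# "<Month> <day>th, <year>".
import re

MONTHS = {m: i + 1 for i, m in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"])}

def normalize_date(text):
    """Return (begin, end) tuples of (year, month, day, hour, minute, second)."""
    m = (re.search(r"(\d{1,2})(?:st|nd|rd|th)? of (\w+),? (\d{4})", text)
         or re.search(r"(\w+) (\d{1,2})(?:st|nd|rd|th)?,? (\d{4})", text))
    if not m:
        raise ValueError(f"unrecognized date: {text}")
    a, b, year = m.groups()
    day, month = (int(a), MONTHS[b]) if a.isdigit() else (int(b), MONTHS[a])
    begin = (int(year), month, day, 0, 0, 0)        # time_TMP(BeginFN(x), ...)
    end = (int(year), month, day, 23, 59, 59)       # time_TMP(EndFN(x), ...)
    return begin, end

print(normalize_date("13th of January 1990") == normalize_date("January 13th, 1990"))  # True
```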

7  COGEX Enhancements (1/3)
- Logic Form Transformation
  - Temporal context SUMO predicates (Clark et al., 2005)
    - (S, E1, E2): S is the temporal signal linking two events E1 and E2
    - during_TMP(e1,x1), earlier_TMP(e1,x1), …

8  Logic Forms Differences
- Generate LFs from two different sources
  - Constituency parse of the data
  - Dependency parse trees (data provided by the challenge organizers)
- Constituency LF:
  - Semantic information
  - Temporal information
- Dependency LF:
  - Captures the (long-range) syntactic dependencies better
  - Temporal normalization (only)
  - NEs imported from the constituency LF whenever the tokens matched (no control over tokenization)

9  Logic Forms Differences
- Gilda Flores was kidnapped on the 13th of January 1990.
- Constituency: Gilda_NN(x1) & Flores_NN(x2) & nn_NNC(x3,x1,x2) & _human_NE(x3) & kidnap_VB(e1,x9,x3) & on_IN(e1,x8) & 13th_NN(x4) & of_NN(x5) & January_NN(x6) & 1990_NN(x7) & nn_NNC(x8,x4,x5,x6,x7) & _date_NE(x8) & THM_SR(x3,e1) & TMP_SR(x8,e1) & time_TMP(BeginFN(x1), 1990, 1, 13, 0, 0, 0) & time_TMP(EndFN(x1), 1990, 1, 13, 23, 59, 59) & during_TMP(e1,x8)
- Dependency: Gilda_Flores_NN(x2) & _human_NE(x2) & kidnap_VB(e1,x4,x2) & on_IN(e1,x3) & 13th_NN(x3) & of_IN(x3,x1) & January_1990_NN(x1)

10  COGEX Enhancements (2/3)
- Axioms on Demand
  - Lexical chains (generation sketched below)
    - Consider the first k = 3 senses for each word
    - Maximum length of a lexical chain = 3
  - The DERIVATIONAL WordNet relation is ambiguous with respect to the role of the noun
    - Derivation-ACT: employ_VB(e1,x1,x2) → employment_NN(e1)
    - Derivation-AGENT: employ_VB(e1,x1,x2) → employer_NN(x1)
    - Derivation-THEME: employ_VB(e1,x1,x2) → employee_NN(x2)
  - Morphological derivations between adjectives and verbs
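A hedged sketch of on-demand lexical-chain generation over WordNet using NLTK; the search below mirrors the k = 3 senses and length-3 limits from the slide, but the relation set and output format are simplified stand-ins for COGEX's axiom generator.

```python
# Hedged sketch of on-demand lexical chains over WordNet via NLTK; the relation
# set is limited to hypernymy/hyponymy and the ACT/AGENT/THEME argument mapping
# for derivational axioms is not attempted here.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

K_SENSES = 3       # first k = 3 senses per word (as on the slide)
MAX_RELATIONS = 3  # maximum lexical-chain length, counted in relations (as on the slide)

def lexical_chains(word, pos=None):
    """Depth-first expansion; returns chains as lists of lemma names."""
    chains = []
    frontier = [[s] for s in wn.synsets(word, pos=pos)[:K_SENSES]]
    while frontier:
        path = frontier.pop()
        if len(path) > 1:
            chains.append([s.lemmas()[0].name() for s in path])
        if len(path) - 1 < MAX_RELATIONS:
            for nxt in path[-1].hypernyms() + path[-1].hyponyms():
                if nxt not in path:
                    frontier.append(path + [nxt])
    return chains

# Each chain would yield an axiom of the form start_concept -> end_concept.
for chain in lexical_chains("kidnap", pos=wn.VERB)[:5]:
    print(" -> ".join(chain))
```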

11  COGEX Enhancements (2/3)
- Axioms on Demand
  - Lexical chains
    - Augment with the NE predicate for NE target concepts
      - nicaraguan_JJ(x1,x2) → Nicaragua_NN(x1) & _country_NE(x1)
    - Discard lexical chains
      - with more than 2 HYPONYMY relations (H becomes too specific)
      - with a HYPONYMY followed by an ISA relation, e.g., Chicago_NN(x1) → Detroit_NN(x1) is rejected
      - which include general concepts: object/NN, act/VB, be/VB
    - n_i = number of hyponyms of concept c_i
    - N = number of concepts in c_i's hierarchy

12  More Axioms
- Another 73 world knowledge axioms
- Semantic Calculus: combinations of two semantic relations (82 axioms; a toy closure is sketched below)
  - ISA, KINSHIP, CAUSE are transitive relations
  - ISA_SR(x1,x2) & PAH_SR(x3,x2) → PAH_SR(x3,x1)
    - "Mike is a rich man" → "Mike is rich"
- Temporal reasoning axioms (Clark et al., 2005) (65 axioms)
  - Dates entail more general times: October 2000 → year 2000
  - during_TMP(e1,e2) & during_TMP(e2,e3) → during_TMP(e1,e3)
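To illustrate the Semantic Calculus idea, a toy forward-chaining closure over semantic-relation triples; the two rules encoded below (ISA transitivity and the ISA/PAH composition from the slide) are illustrative stand-ins for LCC's 82 axioms.

```python
# Toy Semantic Calculus closure: forward-chain two illustrative composition
# rules over (relation, arg1, arg2) triples until no new facts appear.
def close(facts):
    facts = set(facts)
    while True:
        new = set()
        for (r1, a, b) in facts:
            for (r2, c, d) in facts:
                # ISA is transitive: ISA_SR(a,b) & ISA_SR(b,d) -> ISA_SR(a,d)
                if r1 == "ISA_SR" and r2 == "ISA_SR" and b == c:
                    new.add(("ISA_SR", a, d))
                # ISA_SR(a,b) & PAH_SR(c,b) -> PAH_SR(c,a)  (the slide's example)
                if r1 == "ISA_SR" and r2 == "PAH_SR" and d == b:
                    new.add(("PAH_SR", c, a))
        if new <= facts:
            return facts
        facts |= new

facts = close({("ISA_SR", "Mike", "man"), ("PAH_SR", "rich", "man")})
print(("PAH_SR", "rich", "Mike") in facts)  # True: "Mike is a rich man" -> "Mike is rich"
```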

13  COGEX Enhancements (3/3)
- Proof Re-Scoring
  - Entities mentioned in T and H are existentially quantified
    - (T) smart people → people (H)
    - (T) people ↛ smart people (H)
  - Universally quantified T and H entities
    - (T) people → smart people (H)
    - (T) smart people ↛ people (H)

14  Shallow Lexical Alignment
- Compute the edit distance between T and H (sketched below)
  - Cost (deletion of a word from T) = 0
  - Cost (replacing a word from T with another in H) = ∞
  - Cost (inserting a word from H) = …
  - Edit distance between synonyms = 0
- Example alignment:
  - T: The Council of Europe | has (DEL) | 45 member states. | Three countries from … (DEL)
  - H: The Council of Europe | is made up by (INS) | 45 member states.
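A hedged sketch of the word-level edit distance with the costs listed above; the insertion cost of 1 per H word and the string-equality synonym test are assumed placeholders, since the slide leaves them unspecified.

```python
# Hedged sketch of the shallow alignment: word-level edit distance where
# deleting a T word is free and substitution is only allowed between synonyms.
# The insertion cost (1 per H word) is an assumed placeholder.
INF = float("inf")

def alignment_cost(t_words, h_words, are_synonyms, insert_cost=1.0):
    """Dynamic-programming edit distance from T to H with the costs above."""
    n, m = len(t_words), len(h_words)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i < n:                      # delete t_words[i]: cost 0
                d[i + 1][j] = min(d[i + 1][j], d[i][j])
            if j < m:                      # insert h_words[j]
                d[i][j + 1] = min(d[i][j + 1], d[i][j] + insert_cost)
            if i < n and j < m:            # substitute: free for synonyms, else forbidden
                sub = 0.0 if are_synonyms(t_words[i], h_words[j]) else INF
                d[i + 1][j + 1] = min(d[i + 1][j + 1], d[i][j] + sub)
    return d[n][m]

same_or_syn = lambda a, b: a.lower() == b.lower()   # stand-in synonym test (exact match only)
t = "The Council of Europe has 45 member states .".split()
h = "The Council of Europe is made up by 45 member states .".split()
print(alignment_cost(t, h, same_or_syn))  # 4.0: "is", "made", "up", "by" inserted
```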

15  Results
- Learned parameters:
  - IE: score given by COGEX_C with some correction from COGEX_D
  - IR: the highest contribution is made by LexAlign (~62%)
- COGEX_D better on IE, IR, QA (~69% accuracy)
- COGEX_C better on SUM (~66% accuracy)
- Three-way combination outperforms any individual result and any two-system combination

16  Results
- Higher accuracy on the SUM task
  - SUM is the highest-accuracy task for all systems (false entailment pairs had an H completely unrelated to the text T)
- IE: highest number of false positives
- Need more world knowledge
  - (QA task) "15 safety violations" → "numerous safety violations"
- Upper bound (human performance) for the RTE2 test set
  - 97% proportional agreement
  - Kappa agreement: K = 0.94 (good agreement)
  - Fewer controversial examples in this year's test
- Performance on the RTE1 test set: 69% accuracy

17  Future Work
- Other types of context: report, planning, etc.
  - Pairs (T: X said Y, H: Y) labeled as both TRUE and FALSE
- Need for more axioms
  - Paraphrase acquisition (phrase1 → phrase2)
  - Automatic gathering of semantic axioms
  - Lexical chains link only concepts
  - WordNet gloss axioms link a concept to a phrase

18  Thank You! Questions?

