Kindle: Knowledge and Inference via Description Logics for Natural Language
Dan Roth, University of Illinois at Urbana-Champaign; Martha Palmer, University of Pennsylvania
Presentation transcript:

Slide 1: Kindle: Knowledge and Inference via Description Logics for Natural Language
Dan Roth, University of Illinois at Urbana-Champaign; Martha Palmer, University of Pennsylvania
kindle: encourage, stimulate, promote, inspire
Cross-Cutting/Enabling Technologies

Slide 2: Fundamental Claim
Progress in natural language understanding requires the ability to learn, represent, and reason with respect to structured and relational data. Learning, representation, and reasoning take part at several levels of the understanding process. A unified knowledge representation of the text, providing a hierarchical encoding of the structural, relational, and semantic properties of the given text, is integrated with learning mechanisms that can induce such information from newly observed raw text, and is equipped with an inferential mechanism that supports inferences with respect to such representations.

Slide 3: Fundamental Task
Given: Q: Who acquired Overture?
Determine: A: "Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc last year."
The answer sentence entails the hypothesis "Yahoo acquired Overture"; equivalently, the hypothesis is subsumed by the answer. The task is to verify this, and to distinguish the correct answer from other candidates.

Slide 4: General Strategy
Given a sentence (the question) and a sentence (the answer):
- Represent each as a concept graph
- Embellish the representation
- Given a KB of semantic, structural, and pragmatic transformations (rules), find the optimal set of transformations that maps one sentence to the target sentence

Slide 5: Processes
- Generating a representation (learning + inference)
- Testing (extended) subsumption between two sentence representations (feature extraction + learning)
- Matching rules to the current representation, i.e., matching a substructure (a rule's body) to the representation, and finding the optimal mapping so as to allow choosing the best candidate (inference/optimization + learning)

Slide 6: Progress
- Representation: progress on the representation language; parsing and syntactic sugaring; from sentences to concept graphs: real-time mapping of sentences to a concept-graph representation (read: description logic representation), a learning + inference task; a demonstration
- Resources and inference rules: identification of required resources and types of rules
- Learning and inference for extended subsumption
- This talk: resources; progress on semantic parsing; a brief introduction to our Integer Linear Programming (ILP) inference framework, and some of the learning issues in it (a general framework, of broader appeal)
- Tools and collaborations

Slide 7: Inference with Classifiers
Scenario: global decisions in which several local decisions/components play a role, but with mutual dependencies on their outcomes.
Assume: it is possible to learn classifiers for the different sub-problems, with constraints on the classifiers' labels (known during training, or only at evaluation time).
Goal: incorporate the classifiers' predictions, along with the constraints, in making coherent decisions: decisions that respect the classifiers as well as domain/context-specific constraints.
Formally: global inference for the best assignment to all variables of interest, using an Integer Linear Programming formulation to study coherent inferences that respect domain- and task-specific constraints. (Learning decoupled from, or interleaved with, inference.)

Slide 8: Learning and Inference Problems in NLP
Most problems are not single classification problems (POS tagging, phrases, semantic entities, relations, parsing, WSD, semantic role labeling). Pipelining is a crude approximation: interactions occur across levels, and downstream decisions often interact with previous decisions, leading to propagation of errors. Occasionally, later-stage problems are easier, but upstream mistakes will not be corrected.
We propose global inference over the outcomes of different (learned) predictors as a way to break away from this paradigm; it is a vehicle for the study of non-pipeline approaches. It supports general constraint structure (not amenable to dynamic programming) and allows a flexible way to incorporate linguistic and structural constraints.

Slide 9: Applications
- Learning structured representations: learning a semantic parse by learning to make "local" decisions: candidate arguments; types of arguments; and constraints among them, e.g., the number or type of arguments the verb takes, which arguments can co-occur, etc. [Punyakanok, Roth, Yih, Zimak, COLING'04]
- Simultaneous identification of semantic categories and relations among them: learn semantic categories (entities); learn relations among them; use natural constraints such as Lives-In(Person, Location) to find a global solution [Roth & Yih, CoNLL'04] [Zelenko, SRA@ACE]
- Determining how to map one structure to another: which of the applicable transformation rules to apply; exploit constraints to prefer one set of rules over another
- Many others
All are combinatorial optimization problems of the same type.

Slide 10: Problem Setting
(Figure: a constraint graph over random variables x1, …, x8 and z, with constraints C(x1, x4) and C(x2, x3, x6, x7, x8).)
Random variables X; conditional distributions P, learned by classifiers; constraints C, each an arbitrary Boolean function defined on partial assignments (possibly with weights W on the constraints).
Goal: find the "best" assignment, i.e., the assignment that achieves the highest global accuracy. This is an integer programming problem:
X* = argmax_X P(X), subject to constraints C (+ W·C)
Learning can be interleaved with inference.
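To make the objective concrete, here is a minimal, hypothetical sketch (the variables, distributions, and constraint are invented for illustration) that finds X* by brute-force enumeration; the ILP formulation on the following slides replaces this exponential search with a solver.

from itertools import product

# Toy instance: three variables, each with a learned label distribution,
# plus one Boolean constraint on a partial assignment.
P = {
    "x1": {"A": 0.7, "B": 0.3},
    "x2": {"A": 0.4, "B": 0.6},
    "x3": {"A": 0.5, "B": 0.5},
}

def satisfies(assign):
    # Example constraint C: x1 and x2 may not both be labeled "A".
    return not (assign["x1"] == "A" and assign["x2"] == "A")

best, best_score = None, float("-inf")
for labels in product(*(P[v] for v in P)):
    assign = dict(zip(P, labels))
    if not satisfies(assign):
        continue
    score = 1.0
    for v, label in assign.items():
        score *= P[v][label]      # product of classifier probabilities
    if score > best_score:
        best, best_score = assign, score

print(best, best_score)          # X* = argmax_X P(X) subject to C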

Slide 11: Semantic Role Labeling
Assign a type likelihood ("my pearls = Arg1 | Arg2"): how likely is it that argument a is of type t? For all potential arguments a ∈ POTARG and all types t ∈ T, estimate P(argument a = type t).
Example: "I left my nice pearls to her", with bracketed candidate arguments, each scored over the possible types, e.g.:

  candidate    A0    C-A1    A1    Ø
  a1           0.3   0.2     0.2   0.3
  a2           0.6   0.0     0.0   0.4

Slide 12: Inference
Maximize the expected number of correct argument predictions: T* = argmax_T Σ_i P(a_i = t_i), subject to some constraints, structural and linguistic (e.g., R-A1 implies A1).
Candidate scores for "I left my nice pearls to her":

  candidate    A0    C-A1    A1    Ø
  a1           0.3   0.2     0.2   0.3
  a2           0.6   0.0     0.0   0.4
  a3           0.1   0.3     0.5   0.1
  a4           0.1   0.2     0.3   0.4

Independent max: cost = 0.3 + 0.6 + 0.5 + 0.4 = 1.8
Adding the non-overlapping constraint: cost = 0.3 + 0.4 + 0.5 + 0.4 = 1.6
Adding the linguistic constraint as well (the slide's color-coded rule, R-A1 implies A1): cost = 0.3 + 0.4 + 0.3 + 0.4 = 1.4

Slide 13: LP Formulation – Linear Cost
Cost function: Σ_{a ∈ POTARG} P(a = t) = Σ_{a ∈ POTARG, t ∈ T} P(a = t) · x_{a=t}
Indicator variables: x_{a1=A0}, x_{a1=A1}, …, x_{a4=AM-LOC}, x_{a4=Ø}, each in {0, 1}
Total cost = p(a1=A0) · x_{a1=A0} + p(a1=Ø) · x_{a1=Ø} + … + p(a4=Ø) · x_{a4=Ø}

Slide 14: Linear Constraints (1/2)
Binary values: ∀a ∈ POTARG, ∀t ∈ T: x_{a=t} ∈ {0, 1}
Unique labels: ∀a ∈ POTARG: Σ_{t ∈ T} x_{a=t} = 1
No overlapping or embedding: if a1 and a2 overlap, then x_{a1=Ø} + x_{a2=Ø} ≥ 1

Slide 15: Linear Constraints (2/2)
No duplicate argument classes: Σ_{a ∈ POTARG} x_{a=A0} ≤ 1
R-XXX: ∀a2 ∈ POTARG: Σ_{a ∈ POTARG} x_{a=A0} ≥ x_{a2=R-A0}
C-XXX: ∀a2 ∈ POTARG: Σ_{a ∈ POTARG, a before a2} x_{a=A0} ≥ x_{a2=C-A0}
Further constraints: exactly one argument of type Z (e.g., the verb); given a verb, which argument types may appear. Any Boolean rule can be encoded as a linear constraint. Experimental advantages have already been shown in several problems.
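A minimal sketch of this ILP over the toy scores from slide 12, written with the open-source PuLP solver; the candidate spans are invented for illustration, and this is not the toolkit used in the reported experiments.

from pulp import LpProblem, LpVariable, LpMaximize, lpSum

TYPES = ["A0", "C-A1", "A1", "O"]            # "O" stands for the null label Ø
# P(a = t) from slide 12; the (start, end) spans are hypothetical.
ARGS = {
    "a1": ((0, 1), [0.3, 0.2, 0.2, 0.3]),
    "a2": ((2, 5), [0.6, 0.0, 0.0, 0.4]),
    "a3": ((2, 4), [0.1, 0.3, 0.5, 0.1]),
    "a4": ((5, 7), [0.1, 0.2, 0.3, 0.4]),
}

prob = LpProblem("srl_inference", LpMaximize)
x = {(a, t): LpVariable(f"x_{a}_{t}".replace("-", "_"), cat="Binary")
     for a in ARGS for t in TYPES}

# Objective: expected number of correct argument predictions.
prob += lpSum(ARGS[a][1][i] * x[a, t] for a in ARGS for i, t in enumerate(TYPES))

# Unique label per candidate.
for a in ARGS:
    prob += lpSum(x[a, t] for t in TYPES) == 1

# No overlapping or embedding: two overlapping candidates cannot both be real.
def overlaps(s1, s2):
    return s1[0] < s2[1] and s2[0] < s1[1]

names = list(ARGS)
for i, a1 in enumerate(names):
    for a2 in names[i + 1:]:
        if overlaps(ARGS[a1][0], ARGS[a2][0]):
            prob += x[a1, "O"] + x[a2, "O"] >= 1

# No duplicate argument classes: at most one A0.
prob += lpSum(x[a, "A0"] for a in ARGS) <= 1

# C-XXX style rule (the "before a2" restriction omitted for brevity):
# a candidate labeled C-A1 requires an A1 somewhere.
for a2 in ARGS:
    prob += lpSum(x[a, "A1"] for a in ARGS) >= x[a2, "C-A1"]

prob.solve()
print({a: t for (a, t), var in x.items() if var.value() == 1})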

Slide 16: Extended Subsumption Inference
Q: What are the sexual discrimination allegations Morgan Stanley will fight against on July 7th?
A: Wall Street brokerage Morgan Stanley will defend itself on Wednesday against accusations it denied women promotions, allowed sexual groping, office strip shows, and other forms of sexual discrimination.

Slide 17: Extended Subsumption
(Figure: two predicate-argument graphs. Question: predicate fight, with Arg0 = Morgan Stanley, Arg1 = against sexual discrimination allegation (UNKNOWN), ArgM-TMP = on July 7th. Answer: predicate defend, with Arg0 = Morgan Stanley, Arg1 = itself, against accusation …, ArgM-TMP = on Wednesday.)
S1 ⊑ S2 if there exists a PROOF (a sequence of rule applications) such that S1' = PROOF(S1) ⊑_e S2. If there are several proofs, choose the "optimal" one. This can be formalized as optimizing an objective function:
min_PROOF Σ_{r ∈ PROOF} c_r · r
(Decoupling/interleaving learning and inference.)
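One natural reading of this objective is a least-cost search over rule applications. Below is a hypothetical sketch: the rule format (name, cost, apply), the representations, and the base subsumption test are all invented for illustration; uniform-cost search guarantees the first proof found is the cheapest.

import heapq

def cheapest_proof(s1, s2, rules, subsumes):
    """Uniform-cost search for the least-cost sequence of rule
    applications turning s1 into a representation that the base
    (extended) subsumption test accepts against s2.
    rules: iterable of (name, cost, apply) triples, where apply(rep)
    yields rewritten representations; representations must be hashable."""
    frontier = [(0.0, 0, s1, [])]        # (cost so far, tie-breaker, rep, proof)
    seen, tick = set(), 0
    while frontier:
        cost, _, rep, proof = heapq.heappop(frontier)
        if subsumes(rep, s2):
            return cost, proof           # popped in cost order, hence optimal
        if rep in seen:
            continue
        seen.add(rep)
        for name, c, apply_rule in rules:
            for new_rep in apply_rule(rep):
                tick += 1
                heapq.heappush(frontier, (cost + c, tick, new_rep, proof + [name]))
    return None                          # no proof: s1 does not entail s2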

Slide 18: Tools and Collaboration
Kindle@Penn, lexical resources for Kindle: VerbNet; PropBank; a mapping between PropBank and FrameNet.
CogComp tools: shallow parser; semantic parser; question classification; NE; I-Track (identification and tracing of entities); server to IBM, MITRE.

Slide 19: Resources: "Bank Map"
(Diagram: Treebank feeds PropBank (+frames) and NomBank (+frames); sense tags, coreference, and ontology links lead to OntoBank; related resources include VerbNet, FrameNet, WordNet, PropBank2 (events, DTB, …), and Chinese counterparts.)

Slide 20: Summary of Needed KINDLE Resources

  Processing level | Available resources | Projected resources
  Syntactic        | –                   | Paraphrase lexicon
  Lexical          | WordNet; (some) lexical KBs | Causal-verbs KB; grouped WN senses
  Grammatical      | PropBank            | Frame files and taggers
  Semantic         | WordNet; PropBank frame files, SR tagger; VerbNet; NomLex; FrameNet; named entity recognizer; I-TREC (coreference tagger) | NomBank taggers; FrameNet taggers; PropBank II; extended VerbNet; Tom Morton's coreference tagger; PB/VN/FrameNet mapping
  Discourse        | Penn Discourse Treebank | –
  World knowledge  | WordNet; CYC; Omega | –

Slide 21: Frames File Example: give
PropBank contains more than 4,000 framesets.
Roles: Arg0: giver; Arg1: thing given; Arg2: entity given to
Example (double object): The executives gave the chefs a standing ovation.
  Arg0: the executives
  REL: gave
  Arg2: the chefs
  Arg1: a standing ovation

Slide 22: NomBank Frames File Example: gift
(NomBank covers nominalizations, noun predicates, partitives, etc.)
Roles: Arg0: giver; Arg1: thing given; Arg2: entity given to
Example: Nancy's gift from her cousin was a complete surprise.
  Arg0: her cousin
  REL: gift
  Arg2: Nancy
  Arg1: gift

Slide 23: Frames File Example: give, with Thematic Role Labels
VerbNet, based on Levin classes, adds thematic role labels to the PropBank roles.
Roles: Arg0: giver; Arg1: thing given; Arg2: entity given to
Example (double object): The executives gave the chefs a standing ovation.
  Arg0 (Agent): the executives
  REL: gave
  Arg2 (Recipient): the chefs
  Arg1 (Theme): a standing ovation

Slide 24: Semantic Role Labeling
For each verb in a sentence: identify all constituents that fill a semantic role, and determine their roles. For the verb leave:
- A0 represents the leaver
- A1 represents the thing left
- A2 represents the benefactor
- AM-LOC is an adjunct indicating the location of the action
- V determines the verb
For example, on the deck's running sentence: [A0 I] [V left] [A1 my nice pearls] [A2 to her].

Slide 25: Approach to Semantic Role Labeling
- Pre-processing: a heuristic which filters out unwanted constituents with significant confidence
- Argument identification: a binary SVM classifier which identifies arguments
- Argument classification: a multi-class SVM classifier which tags arguments as ARG0-5, ARGA, or ARGM
A sketch of this two-stage cascade follows.
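A minimal sketch of the identification-then-classification cascade, assuming precomputed feature vectors (feature extraction is sketched after slide 26); it uses scikit-learn's SVC as a stand-in and is an illustration, not the authors' implementation.

import numpy as np
from sklearn.svm import SVC

class SrlPipeline:
    """Two-stage SRL (sketch): binary argument identification,
    then multi-class argument classification on the survivors."""

    def __init__(self):
        self.identifier = SVC(kernel="poly", degree=2)
        self.classifier = SVC(kernel="poly", degree=2)

    def fit(self, X, is_arg, roles):
        # X: feature matrix over candidate constituents (post-filtering);
        # is_arg: 0/1 identification labels; roles: ARG0-5/ARGA/ARGM labels.
        self.identifier.fit(X, is_arg)
        mask = is_arg == 1
        self.classifier.fit(X[mask], roles[mask])
        return self

    def predict(self, X):
        keep = self.identifier.predict(X) == 1
        labels = np.full(X.shape[0], "NULL", dtype=object)
        if keep.any():
            labels[keep] = self.classifier.predict(X[keep])
        return labels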

Slide 26: Original Features
The stochastic model of Gildea & Jurafsky (CL 2002) and Gildea & Palmer (ACL 2002). Basic features:
- Predicate (the verb)
- Phrase type (e.g., NP or S-BAR)
- Parse tree path
- Position (before/after the predicate)
- Voice (active/passive)
- Head word of the constituent
- Subcategorization frame
The path feature is illustrated in the sketch below.
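An illustrative reimplementation of the parse-tree path feature on top of NLTK's Tree (not the original code); positions are NLTK tree positions, and the arrow notation matches the slides.

from nltk import Tree

def path_feature(tree, const_pos, pred_pos):
    """Gildea & Jurafsky-style path: node labels from the constituent
    up to the lowest common ancestor, then down to the predicate."""
    # The lowest common ancestor is the longest shared position prefix.
    i = 0
    while i < min(len(const_pos), len(pred_pos)) and const_pos[i] == pred_pos[i]:
        i += 1
    up = [tree[const_pos[:j]].label() for j in range(len(const_pos), i - 1, -1)]
    down = [tree[pred_pos[:j]].label() for j in range(i + 1, len(pred_pos) + 1)]
    return "↑".join(up) + "↓" + "↓".join(down)

t = Tree.fromstring("(S (NP (DT The) (NN cat)) (VP (VBD sat)))")
print(path_feature(t, (0,), (1, 0)))    # NP↑S↓VP↓VBD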

Slide 27: Results (Gold-Standard Parses)

  Data | System (feature set)           | P  | R  | F1      | Cl-Acc
  2002 | G&P (Penn)                     | 71 | 64 | 67      | 77.0
  2002 | SVM Colorado (basic)           | 83 | 79 | 81      | 87.9
  2002 | SVM Penn (basic)               | –  | –  | –       | 93.1
  2002 | SVM Colorado (rich features)   | 89 | 85 | 87      | 91.0
  2004 | SVM Penn (basic)*              | 89 | 88 |         | 93.5
  2004 | SVM Colorado (rich features)** | 90 | 89 | 89 (91) | 93.0

  * Yi and Palmer, KBCS04; ** Pradhan et al., NAACL04

Slide 28: Discussion
Comparisons between the Colorado and Penn systems:
- Both systems are SVM-based.
- Kernel: Colorado uses a 2nd-degree polynomial kernel; Penn a 3rd-degree kernel (radial basis function).
- Multi-classification: Colorado uses a one-versus-others approach; Penn a pairwise approach (see the sketch below).
- Features: the same basic features; Colorado adds NE, head-word POS, partial path, verb classes, verb sense, head word of PP, first or last word/POS in the constituent, constituent tree distance, constituent relative features, temporal cue words, and dynamic class context (Pradhan et al., 2004).
Kernels allow the automatic exploration of feature combinations.
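The two multi-class strategies correspond to scikit-learn's meta-estimators; a small illustrative sketch on synthetic data (not the systems' actual configurations):

from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_informative=8, n_classes=4)

# Colorado-style: one-versus-others with a 2nd-degree polynomial kernel.
ovr = OneVsRestClassifier(SVC(kernel="poly", degree=2)).fit(X, y)
# Penn-style: pairwise (one-versus-one) with an RBF kernel.
ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)

print(ovr.predict(X[:5]), ovo.predict(X[:5]))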

Slide 29: Examining the Classification Features (Xue & Palmer, EMNLP04)
Path: the route between the constituent being classified and the predicate.
Path is not a good feature for classification:
- It does not discriminate among constituents at the same level.
- It does not have a full view of the subcat frame: it does not distinguish the subject of a transitive verb from the subject of an intransitive verb.
Path is the best feature for identification: it accurately captures the syntactic configuration between a constituent and the predicate.

Slide 30: Same Path, Two Different Args
(Tree: [S [NP0/arg0 The Supreme Court] [VP [VBD gave] [NP1/arg2 states] [NP2/arg1 more leeway to restrict abortion]]])
Arg1: VBD↑VP↓NP
Arg2: VBD↑VP↓NP

Slide 31: Possible Feature Combinations?
Head word of the constituent; POS of the head word; phrase type.
Problem: the same head word, POS, or phrase type may play different roles with regard to different verbs.
Solution: combine each with the predicate, as sketched below.
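A tiny illustrative sketch of predicate-conjoined features for a sparse classifier; the feature-naming scheme is invented:

def conjoin_with_predicate(pred, feats):
    # Turn verb-independent features into predicate-specific ones,
    # e.g. head=pearls becomes leave|head=pearls.
    return {f"{pred}|{name}={value}" for name, value in feats.items()}

print(conjoin_with_predicate("leave", {"head": "pearls", "pt": "NP", "hpos": "NNS"}))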

Slide 32: Other Features
- Position + voice
- Due to Colorado (Pradhan et al., 2004): first word of the current constituent; last word of the current constituent; left sibling of the current constituent

Slide 33: Results (Gold-Standard Parses)

  Data | System (feature set)                                  | P  | R  | F1      | Cl-Acc
  2002 | G&P                                                   | 71 | 64 | 67      | 77.0
  2002 | SVM Colorado (basic)                                  | 83 | 79 | 81      | 87.9
  2002 | SVM Penn (basic)                                      | –  | –  | –       | 93.1
  2002 | SVM Colorado (rich features)                          | 89 | 85 | 87      | 91.0
  2004 | SVM Penn (basic)*                                     | 89 | 88 |         | 93.5
  2004 | SVM Colorado (rich features)**                        | 90 | 89 | 89 (91) | 93.0
  2004 | MaxEnt Penn (designated features and combinations)*** | –  | –  | 88.5    | 93.0

  * Yi and Palmer, KBCS04; ** Pradhan et al., NAACL04; *** Xue and Palmer, EMNLP04

