Kindle: Knowledge and Inference via Description Logics for Natural Language. Dan Roth, University of Illinois at Urbana-Champaign; Martha Palmer, University of Pennsylvania.

Presentation transcript:

Slide 1: Kindle: Knowledge and Inference via Description Logics for Natural Language
Dan Roth, University of Illinois at Urbana-Champaign; Martha Palmer, University of Pennsylvania
(kindle: encourage, stimulate, promote, inspire)
Cross-Cutting/Enabling Technologies

Slide 2: Fundamental Claim
Progress in natural language understanding requires the ability to learn, represent, and reason with respect to structured and relational data. Learning, representing, and reasoning take place at several levels of the understanding process. We seek a unified knowledge representation of the text that provides a hierarchical encoding of its structural, relational, and semantic properties; that is integrated with learning mechanisms which can induce such information from newly observed raw text; and that is equipped with an inferential mechanism supporting inferences over such representations.

Slide 3: Fundamental Task
Given: Q: Who acquired Overture?
Determine: A: "Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc last year."
The answer sentence entails (its representation is subsumed by) the hypothesis "Yahoo acquired Overture", and must be distinguished from other candidates.

Slide 4: General Strategy
Given a sentence (question) and a sentence (answer):
- Represent each as a concept graph.
- Embellish the representation.
Given a KB of semantic, structural, and pragmatic transformations (rules):
- Find the optimal set of transformations that maps one sentence's representation to the target sentence's.

Slide 5: Processes
- Generating a representation: a learning process.
- Inference: testing (extended) subsumption between two sentence representations.
- Matching rules to the current representation, i.e., matching a substructure (a rule's body) to the representation: feature extraction (subsumption) plus inference and learning.
- Finding the optimal mapping, to allow choosing the best candidate: inference as optimization, plus learning.

Slide 6: Progress
- Representation: progress on the representation language; parsing and syntactic sugaring. From sentences to concept graphs: real-time mapping of sentences to a concept graph representation (read: a description logic representation), a learning + inference task; a demonstration.
- Resources and inference rules: identification of required resources and types of rules.
- Learning and inference for extended subsumption.
- Tools and collaborations.
This talk: resources; progress on semantic parsing; a brief introduction to our Integer Linear Programming (ILP) inference framework, and some of the learning issues in it (a general framework, of broader appeal).

Slide 7: Inference with Classifiers
Scenario: global decisions in which several local decisions/components play a role, with mutual dependencies among their outcomes.
Assume: it is possible to learn classifiers for the different sub-problems, and there are constraints on the classifiers' labels (known during training, or only at evaluation time).
Goal: incorporate the classifiers' predictions, along with the constraints, into coherent decisions: decisions that respect the classifiers as well as domain/context-specific constraints.
Formally: global inference for the best assignment to all variables of interest, using an Integer Linear Programming formulation to study coherent inferences that respect domain- and task-specific constraints. (Learning can be decoupled from, or interleaved with, inference.)

Slide 8: Learning and Inference Problems in NLP
Most problems are not single classification problems. Pipelining is a crude approximation: interactions occur across levels, and downstream decisions often interact with previous decisions.
- It leads to propagation of errors.
- Occasionally, later-stage problems are easier, but upstream mistakes will not be corrected.
(Figure: pipelines such as POS tagging, phrases, semantic entities, relations; and parsing, WSD, semantic role labeling.)
We propose global inference over the outcomes of different (learned) predictors as a way to break away from this paradigm:
- Supports general constraint structure (not amenable to dynamic programming).
- Allows a flexible way to incorporate linguistic and structural constraints.
- A vehicle for the study of non-pipeline approaches.

Slide 9: Applications
- Learning structured representations: learning a semantic parse by learning to make "local" decisions: candidate arguments; types of arguments; and constraints among them (the number or type of arguments the verb likes, which arguments can live together, etc.). [Punyakanok, Roth, Yih, Zimak; COLING'04]
- Simultaneous identification of semantic categories and relations among them: learn semantic categories (entities); learn relations among them; use natural constraints such as Lives-In(Person, Location) to find a global solution. [Roth, Yih; CoNLL'04]
- Determining how to map one structure to another: which of the applicable transformation rules to apply; exploit constraints to prefer one set of rules over another.
- Many, many others.
All are combinatorial optimization problems of the same type.

Slide 10: Problem Setting
(Figure: a graph of random variables x1..x8 and z, with constraints C(x1, x4) and C(x2, x3, x6, x7, x8).)
- Random variables X, with conditional distributions P (learned by classifiers).
- Constraints C: any Boolean function defined on partial assignments (with possible weights W on the constraints).
- Goal: find the "best" assignment, the one that achieves the highest global accuracy.
This is an integer programming problem: X* = argmax_X P · X, subject to the constraints C (plus W · C when constraints are weighted).
Learning can be interleaved with inference.
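To make the setting concrete, here is a minimal brute-force sketch (my illustration, not code from the talk; the variables, probabilities, and constraint are made up). It scores every assignment by the product of the classifiers' probabilities and keeps the best one that satisfies a Boolean constraint C; real problems replace this enumeration with the integer programming formulation above.

    from itertools import product

    # Hypothetical per-variable distributions P(x_i = v), as if produced by classifiers.
    P = {
        "x1": {0: 0.3, 1: 0.7},
        "x2": {0: 0.6, 1: 0.4},
        "x3": {0: 0.2, 1: 0.8},
    }

    def constraint_C(assignment):
        # Any Boolean function on assignments; here: x1 and x3 cannot both be 1.
        return not (assignment["x1"] == 1 and assignment["x3"] == 1)

    # Maximizing the product of local probabilities equals maximizing the
    # linear sum of their logs, which is what the ILP formulation optimizes.
    best_score, best = -1.0, None
    for values in product([0, 1], repeat=len(P)):
        assignment = dict(zip(P, values))
        if not constraint_C(assignment):
            continue  # discard assignments that violate C
        score = 1.0
        for var, val in assignment.items():
            score *= P[var][val]
        if score > best_score:
            best_score, best = score, assignment

    print(best, best_score)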

Slide 11: Semantic Role Labeling
Assign a type likelihood ("my pearls = Arg1 | Arg2"): how likely is it that argument a is of type t? For all potential arguments a in POTARG and types t in T, estimate P(argument a = type t).
(Figure: the sentence "I left my nice pearls to her" with bracketed candidate constituents and candidate labels A0, A1, C-A1, Ø.)

Slide 12: Inference
Maximize the expected number of correct argument predictions: T* = argmax_T Σ_i P(a_i = t_i), subject to structural and linguistic constraints (e.g., an R-A1 label requires an A1).
(Figure: labelings of "I left my nice pearls to her" under progressively stronger constraints: independent max, total score 1.8; non-overlapping, 1.6; with the label-dependency constraint added, 1.4.)

Slide 13: LP Formulation: Linear Cost
Cost function: Σ_{a ∈ POTARG, t ∈ T} P(a = t) · x_{a=t}
Indicator variables: x_{a1=A0}, x_{a1=A1}, ..., x_{a4=AM-LOC}, x_{a4=Ø} ∈ {0, 1}
Total cost = p(a1=A0) · x(a1=A0) + p(a1=Ø) · x(a1=Ø) + ... + p(a4=Ø) · x(a4=Ø)

Slide 14: Linear Constraints (1/2)
- Binary values: ∀a ∈ POTARG, ∀t ∈ T: x_{a=t} ∈ {0, 1}
- Unique labels: ∀a ∈ POTARG: Σ_{t ∈ T} x_{a=t} = 1
- No overlapping or embedding: if a1 and a2 overlap, then x_{a1=Ø} + x_{a2=Ø} ≥ 1

Slide 15: Linear Constraints (2/2)
- No duplicate argument classes: Σ_{a ∈ POTARG} x_{a=A0} ≤ 1
- R-XXX: ∀a2 ∈ POTARG: Σ_{a ∈ POTARG} x_{a=A0} ≥ x_{a2=R-A0}
- C-XXX: ∀a2 ∈ POTARG: Σ_{a ∈ POTARG, a before a2} x_{a=A0} ≥ x_{a2=C-A0}
- Exactly one argument of type Z (e.g., the verb).
- Given a verb, which argument types may appear.
Any Boolean rule can be encoded as a linear constraint. Experimental advantages have already been shown in several problems.
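As a concrete illustration, here is a minimal sketch of this ILP (my own, not code from the talk) using the open-source PuLP library; the candidate spans and scores are invented for the example.

    from pulp import LpProblem, LpVariable, LpMaximize, lpSum

    # Hypothetical candidate spans (token offsets) and classifier scores P(a = t);
    # "O" stands for the null label Ø (the candidate is not an argument).
    TYPES = ["A0", "A1", "R-A0", "O"]
    spans = {"a1": (0, 1), "a2": (2, 5), "a3": (3, 5)}   # a2 and a3 overlap
    scores = {
        "a1": {"A0": 0.7, "A1": 0.1, "R-A0": 0.1, "O": 0.1},
        "a2": {"A0": 0.1, "A1": 0.6, "R-A0": 0.1, "O": 0.2},
        "a3": {"A0": 0.2, "A1": 0.3, "R-A0": 0.4, "O": 0.1},
    }

    prob = LpProblem("SRL_inference", LpMaximize)
    x = {(a, t): LpVariable(f"x_{a}_{t}".replace("-", "_"), cat="Binary")
         for a in spans for t in TYPES}

    # Objective: maximize the summed scores of the chosen labels.
    prob += lpSum(scores[a][t] * x[a, t] for a in spans for t in TYPES)

    # Unique label per candidate.
    for a in spans:
        prob += lpSum(x[a, t] for t in TYPES) == 1

    # No overlapping arguments: overlapping candidates cannot both be real arguments.
    def overlaps(s1, s2):
        return s1[0] < s2[1] and s2[0] < s1[1]

    for a1 in spans:
        for a2 in spans:
            if a1 < a2 and overlaps(spans[a1], spans[a2]):
                prob += x[a1, "O"] + x[a2, "O"] >= 1

    # No duplicate argument classes.
    for t in ["A0", "A1"]:
        prob += lpSum(x[a, t] for a in spans) <= 1

    # R-A0 may appear only if some candidate is labeled A0.
    for a2 in spans:
        prob += lpSum(x[a, "A0"] for a in spans) >= x[a2, "R-A0"]

    prob.solve()
    print({a: next(t for t in TYPES if x[a, t].value() == 1) for a in spans})

Solving this picks the highest-scoring labeling in which overlapping candidates are not both arguments, no core label repeats, and R-A0 appears only alongside an A0.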

Slide 16: Extended Subsumption Inference
Q: What are the sexual discrimination allegations Morgan Stanley will fight against on July 7th?
A: Wall Street brokerage Morgan Stanley will defend itself on Wednesday against accusations it denied women promotions, allowed sexual groping, office strip shows, and other forms of sexual discrimination.

Slide 17: Extended Subsumption
(Figure: the two predicate-argument graphs: "fight" with Arg0 "Morgan Stanley", Arg1 "against sexual discrimination allegation" (UNKNOWN), ArgM-TMP "on July 7th"; and "defend" with Arg0 "Morgan Stanley", Arg1 "itself", "against accusation …", ArgM-TMP "on Wednesday".)
S1 ⊑ S2 if there exists a PROOF (a sequence of rule applications) such that S1' = PROOF(S1) ⊑_e S2.
If there are several proofs, choose the "optimal" one. This can be formalized as optimizing an objective function: min_PROOF Σ_{r ∈ PROOF} c_r · r.
Learning and inference can be decoupled or interleaved.
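One simple way to realize "choose the cheapest proof" is uniform-cost search over sequences of rule applications. The sketch below is my own illustration under strong simplifications: toy string rewrites stand in for transformations over concept graphs, and substring containment stands in for extended subsumption.

    import heapq

    # Toy transformation rules: (cost, match, replacement). In Kindle these would
    # be semantic/structural/pragmatic rewrites over concept graphs, not strings.
    RULES = [
        (0.5, "took over", "acquired"),
        (0.3, "last year", ""),
        (0.2, "search company ", ""),
    ]

    def cheapest_proof(source, target):
        """Uniform-cost search for the cheapest rule sequence mapping source to target."""
        frontier = [(0.0, source, [])]
        seen = set()
        while frontier:
            cost, state, proof = heapq.heappop(frontier)
            if target in state:          # crude stand-in for extended subsumption
                return cost, proof
            if state in seen:
                continue
            seen.add(state)
            for rule_cost, lhs, rhs in RULES:
                if lhs in state:
                    heapq.heappush(frontier,
                                   (cost + rule_cost, state.replace(lhs, rhs, 1),
                                    proof + [(lhs, rhs)]))
        return None

    src = "Yahoo took over search company Overture Services Inc last year"
    print(cheapest_proof(src, "Yahoo acquired Overture"))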

Slide 18: Tools and Collaboration
Lexical resources for Kindle: VerbNet; PropBank; a mapping between PropBank and FrameNet.
CogComp tools: shallow parser; semantic parser; question classification; NE; I-Track (identification and tracing of entities); server to IBM, MITRE.

Slide 19: Resources: "Bank Map"
(Diagram: the Treebank feeds PropBank (+frames) and NomBank (+frames); sense tags, coreference, and ontology links lead toward OntoBank; linked resources include VerbNet, FrameNet, WordNet, PropBank 2 (events, DTB, …), and Chinese counterparts.)

Slide 20: Summary of Needed KINDLE Resources (by processing level)
- Syntactic. Projected: paraphrase lexicon.
- Lexical. Available: WordNet; (some) causal verbs KB. Projected: grouped WordNet senses; lexical KBs.
- Grammatical. Available: PropBank (frame files and taggers).
- Semantic. Available: WordNet; PropBank frame files and SR tagger; VerbNet; NomLex; NomBank taggers; I-TREC (coreference tagger); FrameNet and FrameNet taggers; named entity recognizer. Projected: PropBank II; extended VerbNet; Tom Morton's coreference tagger; the PB/VN/FrameNet mapping.
- Discourse. Available: Penn Discourse Treebank.
- World knowledge. Available: WordNet; CYC; Omega.

Slide 21: Frames File Example: give
More than 4,000 framesets for PropBank.
Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to
Example (double object): The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL: gave
  Arg2: the chefs
  Arg1: a standing ovation
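For readers who want to work with such rolesets programmatically, the entry above could be held in memory like this (a hypothetical encoding of my own; the actual PropBank frames files are XML):

    # A hypothetical in-memory encoding of the "give" roleset shown above.
    GIVE_FRAMESET = {
        "lemma": "give",
        "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
    }

    # The double-object example, annotated with the roleset's labels.
    labeled_example = [
        ("Arg0", "The executives"),
        ("REL", "gave"),
        ("Arg2", "the chefs"),
        ("Arg1", "a standing ovation"),
    ]

    for label, span in labeled_example:
        descr = GIVE_FRAMESET["roles"].get(label, "predicate")
        print(f"{label:5s} ({descr}): {span}")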

Slide 22: NomBank Frames File Example: gift
(NomBank covers nominalizations, noun predicates, partitives, etc.)
Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to
Example: Nancy's gift from her cousin was a complete surprise.
  Arg0: her cousin
  REL: gift
  Arg2: Nancy
  Arg1: gift

Slide 23: Frames File Example: give, with Thematic Role Labels
VerbNet, based on Levin classes.
Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to
Example (double object): The executives gave the chefs a standing ovation.
  Arg0 (Agent): The executives
  REL: gave
  Arg2 (Recipient): the chefs
  Arg1 (Theme): a standing ovation

Slide 24: Semantic Role Labeling
For each verb in a sentence: identify all constituents that fill a semantic role and determine their roles. For "I left my nice pearls to her":
- A0 represents the leaver,
- A1 represents the thing left,
- A2 represents the benefactor,
- AM-LOC is an adjunct indicating the location of the action,
- V determines the verb.

Slide 25: Approach to Semantic Role Labeling
- Pre-processing: a heuristic that filters out unwanted constituents with high confidence.
- Argument identification: a binary SVM classifier that identifies arguments.
- Argument classification: a multi-class SVM classifier that tags arguments as ARG0-5, ARGA, and ARGM.
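A minimal sketch of this identification/classification cascade (my illustration: the feature vectors are made up, and the real systems used tree-derived features and their own SVM packages):

    import numpy as np
    from sklearn.svm import SVC

    # Made-up feature vectors for candidate constituents that survived pruning.
    X_train = np.array([[0.1, 1.0], [0.9, 0.2], [0.4, 0.8], [0.8, 0.1]])
    is_argument = np.array([1, 0, 1, 0])      # stage 1: argument vs. non-argument
    arg_labels = np.array(["ARG0", "ARG1"])   # stage 2 labels for the true arguments

    # Stage 1: binary identification.
    identifier = SVC(kernel="poly", degree=2).fit(X_train, is_argument)

    # Stage 2: multi-class classification, trained only on real arguments.
    classifier = SVC(kernel="poly", degree=2).fit(X_train[is_argument == 1], arg_labels)

    def label_candidates(X):
        """Run the identification then classification cascade on candidate features."""
        labels = []
        for x in X:
            if identifier.predict([x])[0] == 1:
                labels.append(classifier.predict([x])[0])
            else:
                labels.append("NONE")
        return labels

    print(label_candidates(np.array([[0.2, 0.9], [0.95, 0.1]])))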

Slide 26: Original Features
Stochastic model of Gildea & Jurafsky, CL'02; Gildea & Palmer, ACL'02. Basic features:
- Predicate (the verb)
- Phrase type (e.g., NP or S-BAR)
- Parse tree path
- Position (before/after the predicate)
- Voice (active/passive)
- Head word of the constituent
- Subcategorization frame
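The parse tree path records the chain of nonterminals from the constituent to the predicate. One common way to compute it, sketched here with NLTK's Tree (my illustration; the original systems had their own implementations):

    from nltk import Tree

    def tree_path(tree, src_pos, dst_pos):
        """Path feature: labels from src up to the lowest common ancestor
        (joined with ↑), then down to dst (joined with ↓). Assumes neither
        node dominates the other."""
        # Longest common prefix of the tree positions = lowest common ancestor.
        k = 0
        while k < min(len(src_pos), len(dst_pos)) and src_pos[k] == dst_pos[k]:
            k += 1
        up = [tree[src_pos[:i]].label() for i in range(len(src_pos), k - 1, -1)]
        down = [tree[dst_pos[:i]].label() for i in range(k + 1, len(dst_pos) + 1)]
        return "↑".join(up) + "↓" + "↓".join(down)

    t = Tree.fromstring(
        "(S (NP (PRP I)) (VP (VBD left) (NP (PRP$ my) (JJ nice) (NNS pearls))"
        " (PP (TO to) (NP (PRP her)))))")
    pred_pos = (1, 0)   # the VBD node "left"
    arg_pos = (1, 1)    # the NP "my nice pearls"
    print(tree_path(t, arg_pos, pred_pos))   # NP↑VP↓VBD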

Slide 27: Results (Gold Standard Parses)
Systems compared on precision (P), recall (R), F1, and classification accuracy (Cl-Acc):
- G&P (Penn), 2002 data
- SVM Colorado (basic)
- SVM Penn (basic)
- SVM Colorado (rich features)
- SVM Penn (basic)*
- SVM Colorado (rich features)**: P 90, R 89, F1 89 (91), Cl-Acc 93.0
*Yi and Palmer, KBCS'04; **Pradhan et al., NAACL'04

Slide 28: Discussion
Comparisons between the Colorado and Penn systems:
- Both systems are SVM-based.
- Kernel: Colorado uses a 2nd-degree polynomial kernel; Penn a 3rd-degree kernel (radial basis function).
- Multi-classification: Colorado takes a one-versus-others approach; Penn a pairwise approach.
- Features: the same basic features; Colorado adds NE, head word POS, partial path, verb classes, verb sense, head word of PP, first or last word/POS in the constituent, constituent tree distance, constituent relative features, temporal cue words, and dynamic class context (Pradhan et al., 2004).
Kernels allow the automatic exploration of feature combinations.
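In scikit-learn terms, the two kernel setups would look roughly like this (my illustration; the original systems predate scikit-learn and used other SVM packages):

    from sklearn.svm import SVC

    # Colorado-style: 2nd-degree polynomial kernel, one-versus-others multi-class.
    colorado_svm = SVC(kernel="poly", degree=2, decision_function_shape="ovr")

    # Penn-style: RBF kernel, pairwise (one-versus-one) multi-class.
    penn_svm = SVC(kernel="rbf", decision_function_shape="ovo")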

Slide 29: Examining the Classification Features
Path: the route between the constituent being classified and the predicate (Xue & Palmer, EMNLP'04).
Path is not a good feature for classification:
- It doesn't discriminate constituents at the same level.
- It doesn't have a full view of the subcategorization frame: it doesn't distinguish the subject of a transitive verb from the subject of an intransitive verb.
Path is the best feature for identification: it accurately captures the syntactic configuration between a constituent and the predicate.

Slide 30: Same Path, Two Different Args
(Figure: the parse tree of "The Supreme Court gave states more leeway to restrict abortion", with NP0 as arg0, NP1 as arg2, and NP2 as arg1; both NP1 and NP2 have the identical path VBD↑VP↓NP to the predicate.)

Slide 31: Possible Feature Combinations
- Head word of the constituent
- POS of the head word
- Phrase type
Problem: the same head word, POS, or phrase type may play different roles with regard to different verbs, so combine each with the predicate (sketched below).
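A minimal sketch of such conjoined features (my illustration; the feature-name scheme is made up):

    def combined_features(predicate, head_word, head_pos, phrase_type):
        """Conjoin constituent features with the predicate, so the same head
        word / POS / phrase type can behave differently per verb."""
        return {
            f"pred+head={predicate}|{head_word}": 1,
            f"pred+pos={predicate}|{head_pos}": 1,
            f"pred+phrase={predicate}|{phrase_type}": 1,
        }

    print(combined_features("give", "pearls", "NNS", "NP"))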

Slide 32: Other Features
- Position combined with voice.
- Due to Colorado (Pradhan et al., 2004): the first word of the current constituent; the last word of the current constituent; the left sibling of the current constituent.

Slide 33: Results (Gold Standard Parses)
The same comparison, with one system added:
- G&P, 2002 data
- SVM Colorado (basic)
- SVM Penn (basic)
- SVM Colorado (rich features)
- SVM Penn (basic)*
- SVM Colorado (rich features)**: P 90, R 89, F1 89 (91), Cl-Acc 93.0
- MaxEnt Penn (designated features and combinations)***
*Yi and Palmer, KBCS'04; **Pradhan et al., NAACL'04; ***Xue and Palmer, EMNLP'04