Prototype-Driven Grammar Induction. Aria Haghighi and Dan Klein, Computer Science Division, University of California, Berkeley

Grammar Induction [diagram with elements: Grammar Engineer, EM, Data, Output]

Central Questions
How do we specify what we want to learn? ("NPs are things like DT NN")
How do we fix observed errors? ("That’s not quite it!")

Experimental Set-up
Binary grammar: symbols {X1, X2, …, Xn} plus POS tags, with rules of the form Xi → Xj Xk
Data: WSJ-10 [7k sentences]
Evaluation: labeled F1; grammar upper bound: 86.1
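As a concrete illustration of the labeled F1 metric, the sketch below (my own illustration, not the evaluation script used in the paper) computes labeled bracket precision, recall, and F1 from gold and predicted constituent sets; the (label, start, end) encoding and the toy constituents are assumptions.

def labeled_f1(gold, predicted):
    """Labeled bracket precision, recall, and F1 over constituent sets."""
    gold_set, pred_set = set(gold), set(predicted)
    matched = len(gold_set & pred_set)
    precision = matched / len(pred_set) if pred_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Toy example: constituents are (label, start, end) triples over one sentence.
gold = [("NP", 0, 3), ("VP", 3, 7), ("PP", 4, 7), ("S", 0, 7)]
pred = [("NP", 0, 3), ("VP", 3, 7), ("NP", 5, 7), ("S", 0, 7)]
print(labeled_f1(gold, pred))   # (0.75, 0.75, 0.75)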

Unconstrained PCFG Induction
Learn a PCFG with EM using the Inside-Outside algorithm [Lari & Young ’90]
Results: [F1 chart; diagram of inside and outside probabilities over a span (i, j) of a sentence 0..n]
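For reference, here is a minimal sketch of the inside pass that the Inside-Outside algorithm builds on, for a binary PCFG over a POS-tag sentence; the toy grammar, probabilities, and sentence are illustrative assumptions, not the experimental setup.

from collections import defaultdict

# Sketch of the inside pass for a binary PCFG over a POS-tag sentence.
# The grammar and probabilities below are toy assumptions for illustration.
binary_rules = {                      # P(X -> Y Z)
    ("S", "NP", "VP"): 1.0,
    ("NP", "DT", "NN"): 1.0,
    ("VP", "VBD", "PP"): 1.0,
    ("PP", "IN", "NP"): 1.0,
}
sentence = ["DT", "NN", "VBD", "IN", "DT", "NN"]   # "The koala sat in the tree"

n = len(sentence)
inside = defaultdict(float)           # inside[(X, i, j)] = P(X derives tags i..j)
for i, tag in enumerate(sentence):    # width-1 spans are just the POS tags
    inside[(tag, i, i + 1)] = 1.0
for width in range(2, n + 1):         # build larger spans bottom-up
    for i in range(0, n - width + 1):
        j = i + width
        for (x, y, z), p in binary_rules.items():
            for k in range(i + 1, j):             # try every split point
                inside[(x, i, j)] += p * inside[(y, i, k)] * inside[(z, k, j)]

print(inside[("S", 0, n)])            # total probability of the tag sequence

The full algorithm adds a symmetric outside pass and uses the expected rule counts to re-estimate the rule probabilities at each EM iteration.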

Encoding Knowledge: What’s an NP? Semi-supervised learning

Encoding Knowledge: What’s an NP? For instance, DT NN, JJ NNS, NNP. Prototype learning

Prototype List

How to use prototypes?
Example sentence: DT The, JJ hungry, NN koala, VBD sat, IN in, DT the, NN tree, with candidate NP, PP, and VP constituents
Phrase / Prototype: NP: DT NN | PP: IN DT NN | VP: VBD IN DT NN

Distributional Similarity [Clark ’01; Klein & Manning ’01]
Prototype yields: NP: (DT NN), (JJ NNS), (NNP NNP) | VP: (VBD DT NN), (MD VB NN), (VBN IN NN) | PP: (IN NN), (IN PRP), (TO CD CD)
The candidate yield (DT JJ NN), with context distribution { ¦ __ VBD : 0.3, VBD __ ¦ : 0.2, IN __ VBD : 0.1, … }, is assigned proto=NP
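A rough sketch of the assignment step: compare a candidate yield's context distribution to the prototype yields' context distributions and take the closest label, or NONE if nothing is close. The divergence measure, threshold, and toy distributions below are assumptions, not the paper's exact similarity computation.

import math

def kl(p, q, eps=1e-6):
    """KL(p || q) over the union of contexts, with small-probability smoothing."""
    contexts = set(p) | set(q)
    return sum(p.get(c, eps) * math.log(p.get(c, eps) / q.get(c, eps))
               for c in contexts)

proto_contexts = {   # context distributions of prototype yields (toy numbers)
    "NP": {"¦ __ VBD": 0.4, "VBD __ ¦": 0.3, "IN __ VBD": 0.3},
    "VP": {"NN __ ¦": 0.6, "¦ __ ¦": 0.4},
}

def assign_proto(candidate, threshold=1.0):
    """Return the closest prototype label, or NONE if no label is close enough."""
    label, best = min(((lbl, kl(candidate, ctx)) for lbl, ctx in proto_contexts.items()),
                      key=lambda pair: pair[1])
    return label if best < threshold else "NONE"

dt_jj_nn = {"¦ __ VBD": 0.5, "VBD __ ¦": 0.3, "IN __ VBD": 0.2}
print(assign_proto(dt_jj_nn))   # -> NP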

Distributional Similarity
Same prototype yields as above; the candidate yield (IN DT) is not close to any prototype and is assigned proto=NONE

Prototype CFG+ Model
Example sentence DT The, JJ hungry, NN koala, VBD sat, IN in, DT the, NN tree, with each span carrying a proto feature (e.g. proto=NP, proto=NONE)

Prototype CFG+ Model
Example tree (with labels S, NP, VP, PP) over DT The, JJ hungry, NN koala, VBD sat, IN in, DT the, NN tree
Each labeled span contributes a CFG rule factor, e.g. P(DT NP | NP), and a proto-feature factor, e.g. P(proto=NP | NP)
P_CFG+(T, F) = ∏_{X(i,j) → α ∈ T} P(α | X) · P(p_{i,j} | X)
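The product above can be computed directly for a fixed tree. The sketch below scores one tree under the CFG+ model; the probability tables and the flat tree encoding are toy assumptions for illustration.

import math

# Sketch: score a fixed tree under the CFG+ model,
#   P_CFG+(T, F) = prod over spans X(i,j) -> alpha of P(alpha | X) * P(p_ij | X).
# Probability tables and the tree encoding below are toy assumptions.

rule_prob = {    # P(alpha | X)
    ("S", ("NP", "VP")): 0.9,
    ("NP", ("DT", "NN")): 0.5,
    ("VP", ("VBD", "PP")): 0.4,
    ("PP", ("IN", "NP")): 0.7,
}
proto_prob = {   # P(proto feature | X)
    ("S", "NONE"): 0.8,
    ("NP", "NP"): 0.6,
    ("VP", "VP"): 0.5,
    ("PP", "PP"): 0.6,
}

# Each entry: (label, children, proto feature observed on that span).
tree = [
    ("S",  ("NP", "VP"),  "NONE"),
    ("NP", ("DT", "NN"),  "NP"),
    ("VP", ("VBD", "PP"), "VP"),
    ("PP", ("IN", "NP"),  "PP"),
    ("NP", ("DT", "NN"),  "NP"),
]

log_score = sum(math.log(rule_prob[(label, children)])
                + math.log(proto_prob[(label, proto)])
                for label, children, proto in tree)
print(math.exp(log_score))   # joint score of the tree and its proto features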

Prototype CFG+ Induction
Experimental set-up: add prototypes (BLLIP corpus)
Results: [unlabeled F1 chart]

Constituent-Context Model [Klein & Manning ’02]
Example sentence DT The, JJ hungry, NN koala, VBD sat, IN in, DT the, NN tree
A constituent span contributes yield and context factors, e.g. P(VBD IN DT NN | +) and P(NN __ ¦ | +); a distituent span contributes, e.g., P(NN VBD | -) and P(JJ __ IN | -)

Constituent-Context Model
Bracketing matrix B over the example sentence; each span (i, j) has a yield y_{i,j} and a context c_{i,j}
P_CCM(B, F') = ∏_{(i,j)} P(B_{i,j}) · P(y_{i,j} | B_{i,j}) · P(c_{i,j} | B_{i,j})
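Likewise, the CCM score of a fixed bracketing is a product over all spans of the sentence. The sketch below is illustrative only; the bracketing, the yield/context tables, and the uniform backoff are assumptions (a real model estimates these distributions with EM).

import math

tags = ["DT", "NN", "VBD", "IN", "DT", "NN"]
brackets = {(0, 2), (3, 6), (4, 6), (2, 6), (0, 6)}   # spans marked as constituents

def p_bracket(is_constituent):
    return 0.2 if is_constituent else 0.8             # toy prior P(B_ij)

def p_yield(yield_tags, is_constituent):
    table = {("DT", "NN"): 0.3, ("IN", "DT", "NN"): 0.2}   # toy P(y | +)
    return table.get(yield_tags, 0.01) if is_constituent else 0.01

def p_context(context, is_constituent):
    table = {("¦", "VBD"): 0.3, ("VBD", "¦"): 0.2}         # toy P(c | +)
    return table.get(context, 0.01) if is_constituent else 0.01

padded = ["¦"] + tags + ["¦"]                # sentence-boundary markers
log_score = 0.0
for i in range(len(tags)):
    for j in range(i + 1, len(tags) + 1):
        b = (i, j) in brackets               # constituent (+) or distituent (-)
        yield_ij = tuple(tags[i:j])
        context_ij = (padded[i], padded[j + 1])
        log_score += (math.log(p_bracket(b))
                      + math.log(p_yield(yield_ij, b))
                      + math.log(p_context(context_ij, b)))

print(log_score)   # log P_CCM of this bracketing with its yields and contexts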

Intersected Model
The two models capture different aspects of syntax; combine them with intersected EM [Klein 2005; Liang et al. ’06]
P(T, F, F') = P_CCM(B(T), F') · P_CFG+(T, F)

Grammar Induction Experiments
Intersected CFG+ and CCM: add CCM brackets
Results: [F1 chart]

Reacting to Errors: possessive NPs (our tree vs. target tree)

Reacting to Errors: add the category NP-POS with prototype NN POS; new analysis [tree]

Error Analysis: modal VPs (our tree vs. target tree)

Reacting to Errors: add the category VP-INF with prototype VB NN; new analysis [tree]

Fixing Errors: supplement the prototype list with NP-POS and VP-INF. Results: [unlabeled F1 chart]

Results Summary: labeled F1 for PCFG, PROTO, CCM, Best, and the upper bound [chart]; roughly 50% error reduction from PCFG to Best

Conclusion: prototype-driven learning is a flexible, weakly supervised framework that merges distributional clustering techniques with structured models

Thank You!

Lots of numbers

More numbers

Yet more numbers