In Search of a More Probable Parse: Experiments with DOP* and the Penn Chinese Treebank
Aaron Meyers
Linguistics 490, Winter 2009

Syntax 101
Given a sentence, produce a syntax tree (parse)
Example: ‘Mary likes books’
Software which does this is known as a parser

Grammars
Context-Free Grammar (CFG)
▫ Simple rules describing potential configurations
▫ From example:
  S → NP VP
  NP → Mary
  VP → V NP
  V → likes
  NP → books
Problems with ambiguity
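To make the CFG concrete, here is a minimal sketch (plain Python, no external libraries) that parses ‘Mary likes books’ with exactly the five rules above using the CKY algorithm. The data structures and function names are illustrative, not part of the presented system.

```python
# Toy CFG from the slide, split into binary and lexical rules (already CNF-shaped).
binary_rules = {            # A -> B C
    ("NP", "VP"): "S",
    ("V", "NP"): "VP",
}
lexical_rules = {           # A -> word
    "Mary": "NP",
    "likes": "V",
    "books": "NP",
}

def cky_parse(words):
    n = len(words)
    # chart[i][j] maps a nonterminal to one subtree covering words[i:j]
    # (ambiguity is ignored; the toy grammar has a single parse).
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        lhs = lexical_rules[w]
        chart[i][i + 1][lhs] = (lhs, w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, left in chart[i][k].items():
                    for c, right in chart[k][j].items():
                        lhs = binary_rules.get((b, c))
                        if lhs:
                            chart[i][j][lhs] = (lhs, left, right)
    return chart[0][n].get("S")

print(cky_parse("Mary likes books".split()))
# ('S', ('NP', 'Mary'), ('VP', ('V', 'likes'), ('NP', 'books')))
```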

Tree Substitution Grammar (TSG)
Incorporates larger tree fragments
Substitution operator (◦) combines fragments
Context-free grammar is a trivial TSG
[figure: tree fragments combined with ◦ to form a complete parse tree]
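A minimal sketch of the substitution operator, assuming a simple tuple encoding of fragments in which a nonterminal with no children is an open substitution site. The encoding and function are illustrative only, not the representation used by the presented system.

```python
# fragment ◦ filler: plug filler into the leftmost open substitution site
# whose label matches filler's root nonterminal.

def is_open_site(node):
    # An open substitution site: a nonterminal with no children yet.
    return isinstance(node, tuple) and len(node) == 1

def substitute(fragment, filler):
    if is_open_site(fragment) and fragment[0] == filler[0]:
        return filler
    if isinstance(fragment, tuple):
        children = list(fragment[1:])
        for i, child in enumerate(children):
            new_child = substitute(child, filler)
            if new_child is not child:           # substitution happened below
                children[i] = new_child
                return (fragment[0],) + tuple(children)
    return fragment

# Fragment 1 covers the whole clause but leaves the object NP open;
# fragment 2 supplies it.
frag1 = ("S", ("NP", "Mary"), ("VP", ("V", "likes"), ("NP",)))
frag2 = ("NP", "books")
print(substitute(frag1, frag2))
# ('S', ('NP', 'Mary'), ('VP', ('V', 'likes'), ('NP', 'books')))
```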

Treebanks
Database of sentences and corresponding syntax trees
▫ Trees are hand-annotated
Penn Treebanks among most commonly used
Grammars can be created automatically from a treebank (training)
▫ Extract rules (CFG) or fragments (TSG) directly from trees

Learning Grammar from Treebank
Many rules or fragments will occur repeatedly
▫ Incorporate frequencies into grammar
▫ Probabilistic Context-Free Grammar (PCFG), Stochastic Tree Substitution Grammar (STSG)
Data-Oriented Parsing (DOP) model
▫ DOP1 (1992): type of STSG
▫ Describes how to extract fragments from a treebank for inclusion in grammar (model)
▫ Generally limit fragments to a certain max depth
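As an illustration of reading a probabilistic grammar off a treebank, here is a small sketch that counts CFG rules in a toy tuple-encoded treebank and converts the counts to relative frequencies. DOP1 fragment extraction works analogously over tree fragments rather than single rules; the toy trees and names below are illustrative.

```python
from collections import Counter, defaultdict

def rules(tree):
    """Yield (lhs, rhs) for every local tree; leaves are treated as words."""
    if isinstance(tree, str):
        return
    lhs, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (lhs, rhs)
    for c in children:
        yield from rules(c)

# A two-tree toy "treebank", just to show the counting.
treebank = [
    ("S", ("NP", "Mary"), ("VP", ("V", "likes"), ("NP", "books"))),
    ("S", ("NP", "Mary"), ("VP", ("V", "likes"), ("NP", "Mary"))),
]

counts = Counter(r for tree in treebank for r in rules(tree))
lhs_totals = defaultdict(int)
for (lhs, rhs), n in counts.items():
    lhs_totals[lhs] += n

# Relative-frequency PCFG: P(A -> alpha) = count(A -> alpha) / count(A).
pcfg = {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}
for (lhs, rhs), p in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}   {p:.2f}")
# e.g. NP -> Mary 0.75, NP -> books 0.25, S -> NP VP 1.00, ...
```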

Penn Chinese Treebank
Latest version 6.0 (2007)
▫ Xinhua newswire (7339 sentences)
▫ Sinorama news magazine (7106 sentences)
▫ Hong Kong news (519 sentences)
▫ ACE Chinese broadcast news (9246 sentences)

Penn Chinese Treebank and DOP
Latest version 6.0 (2007)
▫ Xinhua newswire (7339 sentences)
▫ Sinorama news magazine (7106 sentences)
▫ Hong Kong news (519 sentences)
▫ ACE Chinese broadcast news (9246 sentences)
Previous experiments (2004) with Penn Chinese Treebank and DOP1
▫ 1473 trees selected from Xinhua newswire
▫ Fragment depth limited to three levels or less

An improved DOP model: DOP*
Challenges with DOP1 model
▫ Computationally inefficient (exponential increase in number of fragments extracted)
▫ Statistically inconsistent
A new estimator: DOP* (2005)
▫ Limits fragment extraction by estimating optimal fragments using subsets of training corpus
  Linear rather than exponential increase in fragments
▫ Statistically consistent (accuracy increases as size of training corpus increases)

Research Question & Hypothesis
Will a DOP* parser applied to the Penn Chinese Treebank show significant improvement in accuracy for a model incorporating fragments up to depth five compared to a model incorporating only fragments up to depth three?
Hypothesis: Yes, accuracy will significantly increase
▫ Deeper fragments allow parser to capture non-local dependencies in syntax usage/preference

Selecting training and testing data
Subset of Xinhua newswire (2402 sentences)
▫ Includes only IP trees (no headlines or fragments)
Excluded sentences of average or greater length
Remaining 1402 sentences divided three times into random training/test splits
▫ Each test split has 140 sentences
▫ Other 1262 sentences used for training
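A minimal sketch of the splitting step, using the sentence counts from the slide (1402 trees, 140 per test split, 1262 for training). The function, variable names, random seed, and use of plain Python shuffling are illustrative assumptions, not a description of the actual setup.

```python
import random

def make_splits(trees, n_splits=3, test_size=140, seed=0):
    """Return n_splits (train, test) pairs drawn at random from `trees`."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        shuffled = trees[:]
        rng.shuffle(shuffled)
        # First test_size trees become the test split, the rest are training data.
        splits.append((shuffled[test_size:], shuffled[:test_size]))
    return splits

# splits = make_splits(all_1402_trees)
# each element is a (1262-tree training set, 140-tree test set) pair
```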

Preparing the trees
Penn Treebank converted to dopdis format
Chinese characters converted to alphanumeric codes
Standard tree normalizations
▫ Removed empty nodes
▫ Removed A over A and X over A unaries
▫ Stripped functional tags
Original: (IP (NP-PN-SBJ (NR 上海) (NR 浦东)) (VP …
Converted: (ip,[(np,[(nr,[(hmeiahodpp_,[])]),(nr,[(hodoohmejc_,[])])]),(vp, …
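A rough sketch of the normalizations listed above, using NLTK's Tree class to read Penn-style bracketings. The exact rules used in the experiment (and the dopdis output format) are not shown on the slide, so the details here, including the filled-in VP in the example, are assumptions; only the A-over-A case of the unary removal is demonstrated.

```python
from nltk.tree import Tree

def normalize(tree):
    if isinstance(tree, str):          # a terminal (word)
        return tree
    # Remove empty elements (traces etc.), annotated under -NONE- in the Penn treebanks.
    kids = [normalize(k) for k in tree
            if not (isinstance(k, Tree) and k.label() == "-NONE-")]
    kids = [k for k in kids if k is not None]
    if not kids:
        return None                    # node became empty: drop it too
    # Strip functional tags: NP-PN-SBJ -> NP.
    label = tree.label().split("-")[0].split("=")[0]
    # Collapse A-over-A unaries: (NP (NP ...)) -> (NP ...).
    if len(kids) == 1 and isinstance(kids[0], Tree) and kids[0].label() == label:
        return kids[0]
    return Tree(label, kids)

# The VP here is filled in purely for illustration.
t = Tree.fromstring("(IP (NP-PN-SBJ (NR 上海) (NR 浦东)) (VP (VV 开发)))")
print(normalize(t))
# (IP (NP (NR 上海) (NR 浦东)) (VP (VV 开发)))
```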

Training & testing the parser
DOP* parser is created by training a model with the training trees
The parser is then tested by processing the test sentences
▫ Parse trees returned by parser are compared with original parse trees from treebank
Standard evaluation metrics computed: labeled recall, labeled precision, and f-score (harmonic mean of precision and recall)
Repeated for each depth level and test/training split
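A simplified sketch of PARSEVAL-style labeled bracketing scores: each tree is reduced to a multiset of (label, start, end) constituents and compared with the gold tree. Real evaluation tools such as evalb apply additional conventions (for example, excluding preterminals), so this is illustrative only, with toy tuple-encoded trees.

```python
from collections import Counter

def constituents(tree, start=0):
    """Return (span_length, Counter of (label, start, end)) for a tuple-encoded tree."""
    if isinstance(tree, str):
        return 1, Counter()
    length, spans = 0, Counter()
    for child in tree[1:]:
        child_len, child_spans = constituents(child, start + length)
        length += child_len
        spans += child_spans
    spans[(tree[0], start, start + length)] += 1
    return length, spans

def prf(gold_tree, test_tree):
    _, gold = constituents(gold_tree)
    _, test = constituents(test_tree)
    correct = sum((gold & test).values())      # matching labeled brackets
    recall = correct / sum(gold.values())
    precision = correct / sum(test.values())
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

gold = ("S", ("NP", "Mary"), ("VP", ("V", "likes"), ("NP", "books")))
test = ("S", ("NP", "Mary"), ("VP", ("V", "likes")), ("NP", "books"))
print(prf(gold, test))  # labeled precision, recall, F-score: (0.8, 0.8, 0.8)
```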

Parsing Results

Depth   Labeled Recall   Labeled Precision   F-score
1       …                58.14%              58.57%
3       …                67.42%              69.47%
5       …                67.80%              69.96%

Other interesting statistics
[table: #Fragments Extracted, Total Training Time (hours), Total Testing Time (hours), and Seconds / Sentence for each fragment depth]
Training time at depth-3 and depth-5 is similar, even though depth-5 has a much higher fragment count
Testing time at depth-5, though, is ten times higher than testing time at depth-3!

Conclusion
Obtain parsing results for the other two test/training splits; if similar:
Increasing fragment extraction depth from three to five does not significantly improve accuracy for a DOP* parser over the Penn Chinese Treebank
▫ Determine statistical significance
▫ Any practical benefit is negated by increased parsing time

Future Work
Increase size of training corpus
▫ DOP* estimation consistency: accuracy should increase as larger training corpus used
Perform experiment with DOP1 model
▫ Accuracy obtained with DOP* lower than previous experiments using DOP1 (Hearne & Way 2004)
Qualitative analysis
▫ What constructions are captured more accurately?

Future Work
Perform experiments with other corpora
▫ Other sections of Chinese Treebank
▫ Other treebanks: Penn Arabic Treebank, …
Increase capacity and stability of dopdis system
▫ Encountered various failures on larger runs, crashing after as long as 36 hours
▫ Efficiency could be increased by larger memory support (64-bit architecture), storage and indexing using a relational database system