

Treebank-Based Acquisition of Multilingual LFG Resources for Parsing, Generation and Transfer
Josef van Genabith, National Centre for Language Technology (NCLT), Dublin City University, Ireland
Treebank Workshop, NAACL 2007

Lexical-Functional Grammar (LFG)
– "Shallow" grammar: defines a language (a set of strings).
– "Deep" grammar: as above, plus a mapping from strings to a "meaning" representation (predicate-argument structure, dependencies, simple logical form, …); this usually involves some form of long-distance dependency (LDD) resolution.
– Deep grammars (HPSG, LFG, CCG, TAG, …) are usually hand-crafted, and are very difficult and expensive to scale to unrestricted text.
– Hence the motivation for treebank-based deep grammar acquisition (LFG/CCG/HPSG/TAG/DepGr/…)!
LFG [Kaplan and Bresnan, 1982; Dalrymple, 2001; Bresnan, 2001]:
– Constraint-based ("unification"), lexicalised, with c(onstituent)-structure and f(unctional)-structure.
– c-structure: surface configuration (CFG trees).
– f-structure: abstract grammatical functions/relations (SUBJ, OBJ, OBL, COMP, XCOMP, ADJN, POSS, APP, …), an AVM (feature-structure) encoding of dependencies/predicate-argument structure.
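To make the AVM view of f-structure concrete, the sketch below writes a hand-built f-structure for "John loves Mary" as nested Python dicts and flattens it into (relation, head, dependent) triples, the kind of dependency representation that f-structures can be compared against. The dict encoding, the simplified 'lemma<args>' PRED strings and the function names are illustrative assumptions, not the NCLT implementation; set-valued attributes such as ADJN are not handled.

```python
# A hypothetical f-structure for "John loves Mary" as nested dicts (AVMs).
# PRED values use a simplified 'lemma<governable functions>' notation.
FSTRUCT = {
    "PRED": "love<SUBJ,OBJ>",
    "TENSE": "pres",
    "SUBJ": {"PRED": "John", "NUM": "sg"},
    "OBJ": {"PRED": "Mary", "NUM": "sg"},
}


def lemma(pred):
    """Strip the argument list from a PRED value: 'love<SUBJ,OBJ>' -> 'love'."""
    return pred.split("<", 1)[0]


def triples(fstruct):
    """Flatten an f-structure into (relation, head, dependent) triples.

    A sub-f-structure under attribute GF yields (gf, head, dependent-lemma);
    an atomic feature yields (feature, head, value), e.g. ('tense', 'love', 'pres').
    """
    result = set()
    head = lemma(fstruct["PRED"])
    for attr, value in fstruct.items():
        if attr == "PRED":
            continue
        if isinstance(value, dict):
            result.add((attr.lower(), head, lemma(value["PRED"])))
            result |= triples(value)  # recurse into the embedded f-structure
        else:
            result.add((attr.lower(), head, value))
    return result


for triple in sorted(triples(FSTRUCT)):
    print(triple)
```

For this input `triples(FSTRUCT)` yields five triples, among them ("subj", "love", "John"), ("obj", "love", "Mary") and ("num", "Mary", "sg").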

[Slide figure: Lexical-Functional Grammar (LFG)]

Lexical-Functional Grammar (LFG)
– Treebank: trees. How do we get from trees to f-structures?
– What's missing are the equations!
– Automatic f-structure annotation algorithm: traverses the tree and assigns LFG equations
– Principle-based c-structure/f-structure interface
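Once a tree carries equations, the f-structure follows by solving them. The toy solver below illustrates this under a hypothetical tuple encoding of trees: each daughter carries either the head equation ↑=↓ (written "=") or an equation (↑GF)=↓ (written as the GF name), and leaves carry lexical features. Solving top-down builds the f-structure; an atomic feature clash raises an error. This is a much simplified stand-in for full LFG constraint solving (no set-valued adjuncts, no constraining equations).

```python
# Hypothetical tree encoding: (annotation, category, body).
#   annotation: "=" for the head equation ↑=↓, or a grammatical function
#               name such as "SUBJ" for (↑SUBJ)=↓ (ignored at the root).
#   body: a list of daughter nodes, or, at a leaf, a dict of lexical features.
TREE = (None, "S", [
    ("SUBJ", "NP", [("=", "NNP", {"PRED": "John", "NUM": "sg"})]),
    ("=", "VP", [
        ("=", "VBZ", {"PRED": "love<SUBJ,OBJ>", "TENSE": "pres"}),
        ("OBJ", "NP", [("=", "NNP", {"PRED": "Mary", "NUM": "sg"})]),
    ]),
])


def solve(node, up=None):
    """Solve the annotations top-down and return the node's f-structure (↓)."""
    annotation, _category, body = node
    if up is None:                  # root: start a fresh f-structure
        down = {}
    elif annotation == "=":         # ↑=↓: mother and daughter share one f-structure
        down = up
    else:                           # (↑GF)=↓: daughter is the value of GF in the mother
        down = up.setdefault(annotation, {})
    if isinstance(body, dict):      # leaf: merge lexical features into ↓
        for attr, value in body.items():
            if attr in down and down[attr] != value:
                raise ValueError(f"feature clash on {attr}: {down[attr]!r} vs {value!r}")
            down[attr] = value
    else:
        for daughter in body:
            solve(daughter, down)
    return down


print(solve(TREE))
```

Because the VP and VBZ nodes are both annotated ↑=↓, the verb's PRED and TENSE end up in the same f-structure as the SUBJ and OBJ values contributed by the two NPs.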

F-Structure Annotation Algorithm
The algorithm exploits:
– Categorial information (NP, VP, VBZ, …)
– Configurational information: local head, left/right of head; e.g. the leftmost NP sister to the right of a V(erbal) head: (↑OBJ)=↓
– Morphological information: e.g. him: (↑OBJ)=↓
– "Functional" tag information: -LGS: (↑PASSIVE)=+, -SBJ, -CLR, …
– Trace/co-indexation information: traces and co-indexation are translated into the corresponding re-entrancies at f-structure
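The configurational principle above (the leftmost NP sister to the right of a verbal head gets (↑OBJ)=↓) can be sketched as a function over a single local tree. The rule set below is a tiny hypothetical subset chosen for illustration, not the algorithm's actual annotation matrix; the head index is assumed to come from a separate head-finding step, and anything not covered falls through to an adjunct default.

```python
VERBAL = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def annotate(mother, daughters, head):
    """Return one LFG equation per daughter of a local tree.

    mother:    category of the mother node, e.g. "VP"
    daughters: daughter categories from left to right, e.g. ["VBD", "NP", "PP"]
    head:      index of the head daughter (assumed to come from a head table)

    Hypothetical, much simplified left-right context rules.
    """
    equations = []
    object_assigned = False
    for i, category in enumerate(daughters):
        if i == head:
            equations.append("↑=↓")
        elif (mother == "VP" and daughters[head] in VERBAL and i > head
              and category == "NP" and not object_assigned):
            # leftmost NP sister to the right of a verbal head
            equations.append("(↑OBJ)=↓")
            object_assigned = True
        elif mother == "S" and category == "NP" and i < head:
            # NP to the left of the head of S
            equations.append("(↑SUBJ)=↓")
        else:
            # catch-all default in this sketch
            equations.append("↓∈(↑ADJUNCT)")
    return equations


print(annotate("VP", ["VBD", "NP", "PP"], 0))
print(annotate("S", ["NP", "VP"], 1))
```

For VP → VBD NP PP with the verb as head this gives ↑=↓, (↑OBJ)=↓ and ↓∈(↑ADJUNCT); for S → NP VP with VP as head it gives (↑SUBJ)=↓ and ↑=↓.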

F-Structure Annotation Algorithm
– Annotation modules: Left-Right Context Annotation Principles; Coordination Annotation Principles; Catch-All and Clean-Up; Traces
– Output: Proto F-Structures; Proper F-Structures
– Supporting components: Head-Lexicalization [Magerman, 1994]; Lemmatization + Macros; Lexical Entries; Defaults; "Functional Tags"
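Head-lexicalization identifies the head daughter of each local tree, so that left-right context principles can be stated relative to it. The sketch below uses a hypothetical, heavily abbreviated head table in the spirit of Magerman (1994). For each priority category it scans the daughters in the given direction and takes the first match; the head word of a phrase is then the head word of its head daughter. The table entries and tree encoding are assumptions for illustration only.

```python
# Hypothetical, heavily abbreviated head table:
# mother category -> (scan direction, daughter categories in priority order).
HEAD_TABLE = {
    "S": ("left", ["VP", "S", "SBAR"]),
    "VP": ("left", ["VBD", "VBN", "VBZ", "VB", "VBG", "VBP", "VP"]),
    "NP": ("right", ["NN", "NNP", "NNS", "NNPS", "NP"]),
    "PP": ("left", ["IN", "TO", "PP"]),
}


def find_head(mother, daughters):
    """Return the index of the head daughter of a local tree."""
    direction, priorities = HEAD_TABLE.get(mother, ("left", []))
    order = list(range(len(daughters)))
    if direction == "right":
        order.reverse()
    for wanted in priorities:       # try each priority category in turn
        for i in order:             # scanning daughters in the table's direction
            if daughters[i] == wanted:
                return i
    return order[0]                 # fallback: first daughter in scan order


def head_word(tree):
    """Percolate head words upwards; tree = (category, children) and
    pre-terminals are (tag, word) with a string as second element."""
    category, children = tree
    if isinstance(children, str):   # pre-terminal: the word itself
        return children
    head = find_head(category, [child[0] for child in children])
    return head_word(children[head])


SENTENCE = ("S", [("NP", [("NNP", "John")]),
                  ("VP", [("VBZ", "loves"), ("NP", [("NNP", "Mary")])])])
print(head_word(SENTENCE))
```

With this table the head word of the example sentence is "loves": S selects its VP daughter, and VP selects VBZ.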

[Slide figure: Treebank Annotation: Control & Wh-Rel. LDD]
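Translating traces and co-indexation into f-structure re-entrancies can be sketched for subject control. In Penn-II, "John wants to leave" marks the controller as NP-SBJ-1 and the understood subject of the infinitive as (NP-SBJ (-NONE- *-1)). The sketch assumes a hypothetical intermediate f-structure in which the antecedent carries "INDEX": 1 and the trace position holds a {"TRACE": 1} placeholder. Resolving the placeholder to the very same Python object models the re-entrancy.

```python
def resolve_traces(fstruct):
    """Turn {"TRACE": i} placeholders into re-entrancies.

    Every placeholder is replaced by the very same dict object as the
    sub-f-structure carrying "INDEX": i; the auxiliary INDEX attribute is
    removed afterwards. Mutates and returns fstruct.
    """
    antecedents = {}

    def collect(f):
        for value in f.values():
            if isinstance(value, dict):
                if "INDEX" in value:
                    antecedents[value["INDEX"]] = value
                collect(value)

    def link(f):
        for attr, value in f.items():   # reassigning existing keys is safe here
            if isinstance(value, dict):
                if "TRACE" in value:
                    f[attr] = antecedents[value["TRACE"]]
                else:
                    link(value)

    collect(fstruct)
    link(fstruct)
    for antecedent in antecedents.values():
        del antecedent["INDEX"]
    return fstruct


# Hypothetical intermediate f-structure for "John wants to leave".
F = {
    "PRED": "want<SUBJ,XCOMP>",
    "SUBJ": {"PRED": "John", "INDEX": 1},
    "XCOMP": {"PRED": "leave<SUBJ>", "SUBJ": {"TRACE": 1}},
}
resolve_traces(F)
print(F["XCOMP"]["SUBJ"] is F["SUBJ"])
```

After resolution, John is simultaneously the SUBJ of want and the SUBJ of leave (the same object), which is how a shared value in an AVM is usually represented in code.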

Multilingual Treebank-Based LFG Resources
– English + Penn-II: parsers (+ LDD resolution), generators, subcat-frame extraction, bootstrapping of new treebank resources (QuestionBank), transfer
– Pilots/proof of concept for multilingual treebank-based LFG acquisition:
  – German: TIGER (Cahill et al. 2003, 2005)
  – Chinese: CTB (Burke et al. 2004)
  – Spanish: Cast3LB (O'Donovan et al. 2005; Chrupala and van Genabith 2006)
– GramLab Project ( ): Chinese, Japanese, Arabic, Spanish, French and German
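Subcat-frame extraction can be pictured as reading off, for every PRED in the automatically produced f-structures, which governable grammatical functions it co-occurs with, then counting lemma-frame pairs. The sketch below is an illustrative approximation under assumed conventions (nested dicts, lemma-only PRED values, a fixed list of governable functions). It ignores refinements such as particles, passive frames or OBL subtypes.

```python
from collections import Counter

# Governable grammatical functions treated as subcategorised arguments
# (a simplified, assumed inventory; adjuncts are deliberately excluded).
GOVERNABLE = ("SUBJ", "OBJ", "OBJ2", "OBL", "COMP", "XCOMP")


def count_frames(fstruct, counts):
    """Add one (lemma, frame) count for every PRED-bearing node.

    The frame is the tuple of governable functions present at that node.
    """
    if "PRED" in fstruct:
        frame = tuple(gf for gf in GOVERNABLE if gf in fstruct)
        counts[(fstruct["PRED"], frame)] += 1
    for value in fstruct.values():
        if isinstance(value, dict):
            count_frames(value, counts)
    return counts


def frame_probabilities(counts, lemma):
    """Relative frequency of each frame observed with a given lemma."""
    total = sum(n for (pred, _), n in counts.items() if pred == lemma)
    return {frame: n / total
            for (pred, frame), n in counts.items() if pred == lemma}


# Hypothetical f-structures with lemma-only PRED values.
CORPUS = [
    {"PRED": "believe", "SUBJ": {"PRED": "John"}, "OBJ": {"PRED": "Mary"}},
    {"PRED": "believe", "SUBJ": {"PRED": "Mary"},
     "COMP": {"PRED": "leave", "SUBJ": {"PRED": "John"}}},
    {"PRED": "believe", "SUBJ": {"PRED": "we"}, "OBJ": {"PRED": "her"}},
]

counts = Counter()
for fs in CORPUS:
    count_frames(fs, counts)
print(frame_probabilities(counts, "believe"))
```

On this toy input believe is seen twice with (SUBJ, OBJ) and once with (SUBJ, COMP), giving relative frequencies of 2/3 and 1/3.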

Multilingual Treebank-Based LFG Resources

Language   Treebank    Size             Coding/Data
English    Penn-II     50,000           CFG+traces+FT
Chinese    CTB 5.1     18,000           CFG+traces+FT
Japanese   KTC 4.0     38,000           Dep (+traces)
German     TIGER 2.0   50,000           Graphs+CFG+Dep
German     TüBa-D/Z    22,000           CFG+Dep+f-traces
Spanish    Cast3LB     3,500            CFG+Dep+f-traces
Arabic     ATB         300,000 (words)
French     P7T         20,000           CFG+Dep+f-traces
Total                  > 200,000

Q2
What was missing in the treebank resource?
– F-structures, predicate-argument structure, dependencies => f-structure annotation algorithm
– Limited domain in Penn-II (as in most treebanks) => bootstrap grammar and QuestionBank (4,000 questions from TREC and CCG)
– GFs, active/passive, declarative/interrogative/imperative, control, raising, LDDs, pro-drop, zero anaphora, tense/aspect, …
What was done by hand?
– The f-structure annotation algorithm (principle-based c-/f-structure interface)
– No restructuring and no clean-up of the treebank (unlike CCG/HPSG/TAG, but see P7T)
– No manual additions (unlike CCG/HPSG/TAG)
– Future work …

Q3 Methodological Issues: Quality Assurance
– Evaluation against hand-crafted/hand-corrected gold-standard DepBanks:
  – PARC 700
  – CBS 500
  – PropBank
  – Our own gold-standard DepBanks for English, Chinese, Japanese, German, Arabic, Spanish and French ( )
– CCG-style evaluation against automatically annotated gold (silver) standard DepBanks based on WSJ Section 23 trees (CCG, HPSG)
– Quality of the annotation process and parsing resources: treebank-based LFG parsing resources statistically significantly outperform XLE and RASP (PARC 700 and CBS 500)
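Dependency-based evaluation of this kind scores a parser's output triples against gold-standard triples with precision, recall and F-score. Below is a minimal sketch for a single sentence, assuming triples are compared by exact match. Comparing different systems against a DepBank usually also requires mapping between their dependency formats, which this sketch does not model; the triples shown are made up for illustration.

```python
def precision_recall_f(gold, predicted):
    """Score predicted dependency triples against gold triples (exact match)."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


GOLD = {("subj", "love", "John"), ("obj", "love", "Mary"), ("tense", "love", "pres")}
PREDICTED = {("subj", "love", "John"), ("obj", "love", "John")}
print(precision_recall_f(GOLD, PREDICTED))
```

Here one of two predicted triples is correct, so P = 0.5, R = 1/3 and F = 0.4. Over a whole test set, a common choice is to sum the correct, predicted and gold counts across sentences before dividing (micro-averaging).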

Q4 Phrase Structure or Dependencies? Both!
Why?
– Phrase structure is good for parsing and generation => tap into lots of mature, efficient and well-understood technology (but see dependency parsing)
– Dependencies are close to f-structures/predicate-argument structures …
– Penn-II: CFG trees + traces/co-indexation + "functional" labels/tags
– TIGER: graphs + CFG categories + grammatical function labels + LDDs through crossing edges
– Cast3LB/P7T/TüBa-D/Z: CFG trees + grammatical function labels + LDDs through GF paths
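The Penn-II encoding packs category, "functional" tags and co-indexation into a single node label (e.g. NP-SBJ-1), with traces appearing as leaves under -NONE- (e.g. *-1). The sketch below assumes whitespace-separated bracketed input. It reads such a tree and splits each label into (category, function tags, co-index). Gapping indices such as NP=2 are simply dropped, and trace leaves are kept as plain strings; both are simplifications.

```python
import re

TOKENS = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(text):
    """Parse a bracketed tree into (label, children); words are plain strings."""
    tokens = TOKENS.findall(text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                                 # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])     # a word or a trace such as *-1
                pos += 1
        pos += 1                                 # consume ")"
        return (label, children)

    return node()


def split_label(label):
    """'NP-SBJ-1' -> ('NP', ['SBJ'], 1); '-NONE-' -> ('-NONE-', [], None)."""
    if label.startswith("-"):                    # -NONE-, -LRB-, -RRB- are atomic
        return label, [], None
    category, *tags = label.split("-")
    category = category.split("=")[0]            # drop gapping indices (NP=2)
    tags = [tag.split("=")[0] for tag in tags]
    index = int(tags.pop()) if tags and tags[-1].isdigit() else None
    return category, tags, index


def labelled_nodes(tree):
    """Yield (category, function tags, co-index) for each node, pre-order."""
    label, children = tree
    if label:
        yield split_label(label)
    for child in children:
        if isinstance(child, tuple):
            yield from labelled_nodes(child)


TREE = parse_tree("(S (NP-SBJ-1 (NNP John)) (VP (VBZ wants) (S (NP-SBJ (-NONE- *-1))"
                  " (VP (TO to) (VP (VB leave))))))")
for n in labelled_nodes(TREE):
    print(n)
```

The first nodes yielded are ("S", [], None) and ("NP", ["SBJ"], 1). The co-index 1 is what a later step would use to link the *-1 trace to its antecedent.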

Q5 & Q6 Pros/Cons
Formalism-specific treebank?
– Bad! It limits usefulness, user group, …
– Better to have a generic treebank with CFG + dependency labels + LDDs + other feature labels (as required), and then extract LFG/HPSG/CCG/TAG/dependency grammars from it
Grammar first vs. treebank first?
– Depends on what you want to do …
– If you want high-quality, wide-coverage resources (that can parse unrestricted text), then it's definitely better to do treebanking first (or use bootstrapping)
– Problem: many traditionally trained linguists see treebanking as a menial task
– In fact it is a highly qualified and interesting task: empirical linguistics, confronting rather than inventing data
– Sociological task: how to make treebanking/bootstrapping sexy?

Some Resources
– ESSLLI 2006 course material: Treebank-Based Acquisition of LFG, HPSG and CCG Resources. J. van Genabith, Y. Miyao and J. Hockenmaier.
– LFG parser demo:
– A. Cahill and J. van Genabith. Robust PCFG-Based Generation using Automatically Acquired LFG-Approximations. COLING/ACL 2006, Sydney, Australia.
– J. Judge, A. Cahill and J. van Genabith. QuestionBank: Creating a Corpus of Parse-Annotated Questions. COLING/ACL 2006, Sydney, Australia.
– R. O'Donovan, M. Burke, A. Cahill, J. van Genabith and A. Way. Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks. Computational Linguistics, 2005.
– A. Cahill, M. Forst, M. Burke, M. McCarthy, R. O'Donovan, C. Rohrer, J. van Genabith and A. Way. Treebank-Based Acquisition of Multilingual Unification Grammar Resources. Journal of Research on Language and Computation, Kluwer Academic Press, 2005.
– R. O'Donovan, A. Cahill, J. van Genabith and A. Way. Automatic Acquisition of Spanish LFG Resources from the CAST3LB Treebank. In Proceedings of the Tenth International Conference on LFG, Bergen, Norway, 2005.
– M. Burke, O. Lam, A. Cahill, R. Chan, R. O'Donovan, A. Bodomo, J. van Genabith and A. Way. Treebank-Based Acquisition of a Chinese Lexical-Functional Grammar. In Proceedings of the PACLING-18 Conference, Waseda University, Tokyo, Japan, pages , 2004.
– A. Cahill, M. Burke, R. O'Donovan, J. van Genabith and A. Way. Long-Distance Dependency Resolution in Automatically Acquired Wide-Coverage PCFG-Based LFG Approximations. In Proceedings of ACL-04, pp , Barcelona, Spain, 2004.
– A. Cahill, M. McCarthy, J. van Genabith and A. Way. Parsing with PCFGs and Automatic F-Structure Annotation. In M. Butt and T. Holloway-King (eds.): LFG'02, Athens, Greece, CSLI Publications, Stanford, CA, pp