1 Partial Dependency Parsing for Irish Elaine Uí Dhonnchadha & Josef Van Genabith.

2 Aims of the Research
- To be able to parse and/or chunk unrestricted Irish text
- To account for as many of the syntactic phenomena of Irish as possible in an efficient and principled way
- To use open-source software as far as possible

3 Outline of the Talk
- Background
- Stages of Development for Dependency Parser
- Chunker
- Future Work

4 Irish Language – some facts
- Celtic language
  - Goidelic (Irish, Manx, Scottish Gaelic)
  - Brittonic (Breton, Cornish, Welsh)
- Verb – Subject – Object sentence word order
  Chaith Seán an liathróid.
  Threw Seán the ball.
  V S O
  'Seán threw the ball'
- Fixed word order

5 Irish Language
- Inflectional language
  - gender: fem/masc
  - case: common/genitive/vocative
  - verbs inflected for number and person: chuala mé 'I heard' (analytic), chualamar 'we heard' (synthetic)
- Initial mutation of words
  - cailín 'girl', an chailín 'of the girl' (genitive)
  - arán 'bread', an t-arán 'the bread'
  - seachtain 'week', an tseachtain 'the week'
  - bord 'table', ar an mbord 'on the table'
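
Because of initial mutation, surface forms such as chailín or mbord cannot be looked up in a lexicon directly. The following is a minimal sketch, assuming a simple candidate-generation approach (not the finite-state analyser described in this talk): the hypothetical function `demutate` strips the mutation types shown above and returns the word plus possible base forms, which a lexicon lookup would then filter.

```python
# Hypothetical sketch: propose unmutated lookup candidates for an Irish token.
# Covers only the mutation types on this slide (lenition, eclipsis,
# t-prothesis); candidates are not checked against any lexicon here.

LENITABLE = set("bcdfgmpst")
VOWELS = set("aeiouáéíóú")

# Eclipsis prefix -> radical initial consonant it stands in front of.
ECLIPSIS = {"mb": "b", "gc": "c", "nd": "d", "bhf": "f", "ng": "g", "bp": "p", "dt": "t"}


def demutate(word: str) -> list[str]:
    """Return the (lower-cased) word itself followed by candidate base forms."""
    w = word.lower()
    candidates = [w]
    # t-prothesis before a vowel: an t-arán -> arán
    if w.startswith("t-") and len(w) > 2 and w[2] in VOWELS:
        candidates.append(w[2:])
    # t-prothesis before s: an tseachtain -> seachtain
    if w.startswith("ts"):
        candidates.append(w[1:])
    # eclipsis: ar an mbord -> bord
    for prefix, radical in ECLIPSIS.items():
        if w.startswith(prefix) and len(w) > len(prefix):
            candidates.append(radical + w[len(prefix):])
    # lenition (h after the initial consonant): an chailín -> cailín
    if len(w) > 2 and w[0] in LENITABLE and w[1] == "h":
        candidates.append(w[0] + w[2:])
    return candidates


print(demutate("chailín"))  # ['chailín', 'cailín']
```

Over-generation is deliberate in this design: a form like bhf also produces a spurious lenition candidate, on the assumption that the lexicon rejects it.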

6 Irish Language
- Prepositions inflected for person and number
  Labhair sé liom faoi
  Spoke he with-me about-it
  'He spoke to me about it'
  Tabhair dom é
  Give to-me it
  'Give it to me'
- Full paradigm for every preposition
  liom 'with-me', leat 'with-you', leis 'with-him/it', etc.
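
Since every preposition has a full person/number paradigm, a tokenizer or lexicon needs to map each inflected form back to a preposition plus pronoun features. A minimal sketch, assuming a plain lookup table: only the paradigms of le and do are listed, and the output string merely imitates the `lemma+Tag+Tag` style of the analyses in Example (1); it is not the real analyser's tag sequence.

```python
# Hypothetical lexicon fragment: inflected prepositions (prepositional pronouns)
# -> (preposition lemma, person, number, gender). Only "le" and "do" are covered.
PREP_PRONOUNS = {
    "liom": ("le", 1, "Sg", None), "leat": ("le", 2, "Sg", None),
    "leis": ("le", 3, "Sg", "Masc"), "léi": ("le", 3, "Sg", "Fem"),
    "linn": ("le", 1, "Pl", None), "libh": ("le", 2, "Pl", None),
    "leo": ("le", 3, "Pl", None),
    "dom": ("do", 1, "Sg", None), "duit": ("do", 2, "Sg", None),
    "dó": ("do", 3, "Sg", "Masc"), "di": ("do", 3, "Sg", "Fem"),
    "dúinn": ("do", 1, "Pl", None), "daoibh": ("do", 2, "Pl", None),
    "dóibh": ("do", 3, "Pl", None),
}


def analyse(token):
    """Return an illustrative 'lemma+Prep+Pron+Person+Number(+Gender)' string, or None."""
    entry = PREP_PRONOUNS.get(token.lower())
    if entry is None:
        return None
    lemma, person, number, gender = entry
    tags = [lemma, "Prep", "Pron", f"{person}P", number]
    if gender:
        tags.append(gender)
    return "+".join(tags)


print(analyse("liom"))  # le+Prep+Pron+1P+Sg
```

A real lexicon would also have to keep a second reading for leis, which is the form le takes before the article (leis an).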

7 Irish Language
- Verbal noun: used in progressives, perfects, infinitives, etc.
  - De-verbal nouns: bris (v) 'break', briseadh (vn) 'breaking'
  - De-agentive nouns: feirmeoir (n) 'farmer', feirmeoireacht (vn) 'farming'
- Progressive
  Tá mé ag oscailt an dorais
  Is me at opening(vn) the door(gen)
  'I am opening the door'
- 'After' perfect
  Tá mé tar_éis an doras a oscailt
  Is me after the door PRT opening(vn)
  'I am after opening the door'

8 Parsing Methodology
Dependency Analysis & Constituency Analysis
- Dependency Analysis
  - Relationships between pairs of words
  - Grammatical functions and head-modifier dependencies
  - Root and terminal nodes
- Constituency Analysis
  - Phrase structure rules, e.g. S = NP VP
  - Hierarchical structure: root, phrase categories, leaf/terminal nodes

9 Dependency Analysis
Issues in the theoretical syntax of Irish on which there is no clear consensus:
- The non-adjacency of verb and object in a VSO language, i.e. difficulties with the VP
- Some periphrastic aspectual constructions in Irish, e.g. the progressive aspect has more nominal than verbal characteristics …
Dependency analysis includes semantic as well as syntactic information.
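
The VP problem can be made concrete with the VSO example from slide 4. A small sketch, using our own illustrative head indices for Chaith Seán an liathróid: the verb plus the object's subtree does not form a contiguous span, because the subject sits between them. A dependency analysis simply records the arcs and never needs that span to be a constituent.

```python
# Illustrative analysis of "Chaith Seán an liathróid" ('Seán threw the ball').
# Head indices and labels are our own assumption, not the parser's output.
words = ["Chaith", "Seán", "an", "liathróid"]
heads = [None, 0, 3, 0]              # None marks the root (the verb)
labels = ["ROOT", "SUBJ", "DET", "OBJ"]


def subtree(i):
    """Indices of token i and everything that (transitively) depends on it."""
    result = {i}
    for j, h in enumerate(heads):
        if h == i:
            result |= subtree(j)
    return result


def is_contiguous(indices):
    s = sorted(indices)
    return s == list(range(s[0], s[-1] + 1))


verb = 0
obj = labels.index("OBJ")
vp = {verb} | subtree(obj)          # verb + object phrase = would-be VP
print(sorted(vp), is_contiguous(vp))  # [0, 2, 3] False
```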

10 Dependency Parsing
- A dependency analysis looks at dependencies between pairs of words (which do not have to be adjacent) in a sentence
- The tokens present in the input string are annotated without introducing any abstract categories (e.g. phrasal nodes)
  - i.e. a dependency analysis consists of a root and leaf nodes, without intermediate levels
- Grammatical functions such as subject, object and predicate, as well as various types of prepositional phrase (e.g. adverbial, aspectual, predicative), are annotated
- Clauses and head-modifier dependencies are identified

11 Dependency Parsing
- Surface-oriented, bottom-up parsing
- Dependency relations between pairs of tokens
  - Grammatical functions
  - Head-modifier relations
- Tokens are not necessarily adjacent, e.g.:
  Bhris an fear a rúitín
  Broke the man his ankle
  V Det N Det N
  'The man broke his ankle'
  Arc labels: S (subject: fear), DO (direct object: rúitín)
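
A dependency analysis like the one above can be stored as one head index and one label per token, with no phrasal nodes. A minimal sketch, assuming a CoNLL-style convention (1-based indices, head 0 for the root); the S and DO labels come from the slide, while ROOT and DET are hypothetical. The check confirms the arcs form a single-rooted tree, and the object arc spans four positions, showing that head and dependent need not be adjacent.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int    # 1-based position in the sentence
    form: str
    pos: str
    head: int   # index of the head token; 0 = artificial root
    rel: str


# Bhris an fear a rúitín  'The man broke his ankle'
sentence = [
    Token(1, "Bhris", "V", 0, "ROOT"),
    Token(2, "an", "Det", 3, "DET"),
    Token(3, "fear", "N", 1, "S"),
    Token(4, "a", "Det", 5, "DET"),
    Token(5, "rúitín", "N", 1, "DO"),
]


def is_tree(tokens):
    """Exactly one root, and every head chain reaches it without a cycle."""
    heads = {t.idx: t.head for t in tokens}
    if list(heads.values()).count(0) != 1:
        return False
    for start in heads:
        seen, node = set(), start
        while node != 0:
            if node in seen or node not in heads:
                return False
            seen.add(node)
            node = heads[node]
    return True


for t in sentence:
    head = "ROOT" if t.head == 0 else sentence[t.head - 1].form
    print(f"{t.form} --{t.rel}--> {head}")
print(is_tree(sentence))  # True
```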

12 Previous NLP Work
- Tokenization & morphological analysis
  - Finite-State Morphology (Karttunen & Beesley, 1999; 2003)
  - Finite-State Morphological Analyser & Generator for Irish (Uí Dhonnchadha, 2002)
- POS tagging and parsing
  - Constraint Grammar (CG): Karlsson et al. (1995)
  - Constraint Grammar Parser CG-2 (Tapanainen, 1996)
  - VISL CG3 (Bick et al., )
- Chunking
  - Partial Parsing via Finite-State Cascades (Abney, 1996)

13 Stages of Development
1. Define the syntactic phenomena to include
2. Gather test data
3. Decide on the parsing methodology
4. Decide on a tag set for dependency and grammatical relations
5. Develop linguistic rules for dependency analysis
6. Test the rules
7. Evaluate the results

14 Syntactic Phenomena
- Sources of information
  - Grammar books
  - Previous research on aspects of Irish syntax
- Simple declarative sentences (incl. negative and interrogative)
- Relative clauses
- Copular constructions
- Non-finite complements
- Adjuncts

15 Test Data (Gold Standard)
- Sample sentences
  - Short invented grammatical sentences (225) based on grammar books etc.
  - Automatically POS tagged, then manually checked and corrected
  - Dependency tagged, then manually checked and corrected
  - Chunked, then manually checked and corrected
- Corpus data
  - 250 real sentences randomly selected from the 3,000-sentence Gold Standard POS-tagged corpus
  - Dependency tagged, chunked, and manually checked and corrected

16 Tag Set
- Grammatical: @CLB, etc.
- Unlabelled: @P<, etc.
- By convention, tags start with the @ symbol to distinguish them from morphosyntactic tags, e.g. “Fuair” faigh
This tag set follows the style of tags described for English (Karlsson, 1995) and for Danish (Bick, 2003).[1] However, CG does not prescribe a fixed list of tags, which allows us to tailor the tag set to the language.
[1] Other languages are also detailed on the VISL website:
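
The @ convention makes it easy to separate the two kinds of tag mechanically. A minimal sketch, assuming input in the VISL CG stream format (a `"<wordform>"` line followed by indented `"lemma" tag tag ...` reading lines); the hypothetical `parse` function splits each reading into morphosyntactic tags and @-prefixed syntactic tags. The morphological tags are taken from Example (1) and @P< from this slide.

```python
import re

STREAM = (
    '"<Fuair>"\n'
    '\t"faigh" Verb VT PastInd\n'
    '"<siopa>"\n'
    '\t"siopa" Noun Masc Com Sg DefArt @P<\n'
)

COHORT = re.compile(r'^"<(.+)>"$')          # "<wordform>"
READING = re.compile(r'^\s+"(.+?)"\s*(.*)$')  # indented "lemma" tag tag ...


def parse(stream):
    """Return a list of cohorts, each with its readings split by tag type."""
    cohorts = []
    for line in stream.splitlines():
        if m := COHORT.match(line):
            cohorts.append({"form": m.group(1), "readings": []})
        elif m := READING.match(line):
            tags = m.group(2).split()
            cohorts[-1]["readings"].append({
                "lemma": m.group(1),
                "morph": [t for t in tags if not t.startswith("@")],
                "syn": [t for t in tags if t.startswith("@")],
            })
    return cohorts


for c in parse(STREAM):
    print(c["form"], c["readings"][0]["syn"])
```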

17 Dependency Tags: Verbs and Copulas
- finite main verb: rith
- finite main verb including subject: ritheamar, 'we …'
- relative finite main verb: a chuala mé, 'that I …'
- relative finite main verb incl. subject: a chualamar, 'that we …'
- finite auxiliary verb: Tá sé ag cócaireacht, 'He is …'
- finite auxiliary verb including subject: táimid, 'we …'
- relative finite auxiliary verb: atá siad, 'that/which they …'
- relative finite auxiliary verb including subject: atáimid, 'that/which we …'
- copula including subject: Seo an fear …, 'This is the …'
- interrogative copula: cé leis an leabhar, 'whose is the …'
- bare infinitive: Ba mhaith liom fanacht, 'I would like to stay'

18 Dependency Tags: Grammatical
- subject: Chonaic Seán Máire, 'Seán saw …'
- subject of aspectual phrase: bhí sé ag obair, 'he was …'
- subject of infinitive (intrans.): an obair a bheith déanta, 'the work to be …'
- subject or object of relative clause: a chonaic an bhean, 'that the woman saw' OR 'that saw the …'
- subject of relative clause: a rinne sé, 'that he …'
- object: Chonaic Seán Máire, 'Seán saw …'
- object of aspectual: ag déanamh oibre, 'doing …'
- object of infinitive: bainne a ól, 'to drink …'
- predicate: Tá sé mór, 'It is …'
- unlabelled noun head, e.g. list item, apposition, or fragment: 1) dathuithe, 2) leasaithigh, '1) colours, 2) …'
- co-ordinating conjunction: agus
- clause boundary: e.g. agus 'and' when followed by a verb, and subordinating conjunctions, etc.

19 Dependency Tags: Head Modifiers (Unlabelled)
- adverbial particle dependent on the adjective to the right: go ciúin
- pre-modifier dependent on the first noun to the right: an
- pre-verbal particle dependent on a verb to the right: ní
- adverbial post…
- noun post-modifier: teach mór, 'big …'
- noun dependent on the preceding preposition: ag an doras, 'at the …'
- noun dependent on a compound preposition (in the genitive case): tar éis na Nollag, 'after …'
- pronoun post-modifier: é féin
- dependent on predicate: Is deas an lá é, 'It is a nice day', i.e. 'Is nice the day'
- adverbial: anocht
- augment pronoun dependent on the subject to the right: Is é Seán …, 'It/He, Seán is …'

20 Dependency Tags: Prepositional Heads
- adverbial adjunct: ag an doras, 'at the …'
- head of an aspectual: ag rith, '(at) …'
- 'at X' meaning 'X has': ag Seán, 'Seán has', i.e. 'at Seán'
- negative: gan dul, 'without …'
- oblique PP head: do Mháire, 'to …'
- preposition + subject pronoun: D'éirigh liom, 'I succeeded', i.e. 'success was with …'
- predicative: Is liom é, 'It is mine', i.e. 'Is with me it'
- stative: ina rí, 'is a king', i.e. 'in his king(hood)'

21 Parsing Methodology: Constraint Grammar
Aims (Karlsson et al., 1995):
- assign the appropriate morphological and syntactic information according to the context of each token or larger structure in the text;
- assign an analysis to every string in the input, bearing in mind that unrestricted text will contain typographical errors, non-sentential fragments, and dialectal and colloquial material;
- if an ambiguity cannot be resolved, retain the alternative analyses rather than forcing a (possibly incorrect) choice.

22 Constraint Grammar Principles
Differences between CG and other parsing methodologies (Karlsson, 1995, p. 37):
- Unlike a context-free grammar, a Constraint Grammar does not attempt to define the set of grammatical sentences in a language.
  - '... everything is licensed which is not explicitly ruled out'
  - this makes it more robust in handling unrestricted text
- It does not aim to produce a minimal set of general rules: a CG grammar can contain many lexically specific rules to handle special cases.
- It does not attempt to determine constituent structure.
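
The principle that everything is licensed unless ruled out, and that unresolved ambiguity is kept, can be sketched with a REMOVE-style constraint. This is a simplified illustration, not CG-3 itself: readings are sets of tags, the hypothetical `remove_if` drops readings that a context test rules out, and, as in CG, it never removes the last remaining reading of a word.

```python
# Each cohort is a list of alternative readings; each reading is a set of tags.

def remove_if(cohorts, i, tag, condition):
    """REMOVE-style constraint: drop readings of cohort i carrying `tag`
    when condition(cohorts, i) holds, but never empty the cohort."""
    if not condition(cohorts, i):
        return
    survivors = [r for r in cohorts[i] if tag not in r]
    if survivors:  # the last remaining reading(s) are never removed
        cohorts[i] = survivors


def after_article(cohorts, i):
    """Context test: the previous cohort has an article reading."""
    return i > 0 and any("Art" in r for r in cohorts[i - 1])


sentence = [
    [{"Art"}],             # article
    [{"Noun"}, {"Verb"}],  # ambiguous: noun or verb reading
]
remove_if(sentence, 1, "Verb", after_article)
print(sentence[1])  # [{'Noun'}]
```

If every reading of a word carried the tag, the cohort would be left untouched, which is how the parser keeps going on unexpected input instead of failing.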

23 CG Dependency Rules
General form:
MAP TARGET (POS) IF (CONDITIONS);
e.g.
MAP TARGET (Verb) IF (NOT 0 VSYNTH OR AUX) (NOT -1 RELPART) (NOT -2 RELPART);
i.e. map the tag onto a Verb reading unless the verb is itself a synthetic form or an auxiliary, or a relative particle occurs one or two words to its left.
SETS
LIST VSYNTH = (Verb 1P) (Verb 2P) (Verb 3P) (Verb Auto) ;
LIST AUX = ("bí") ("téigh") ("tosaigh") ("tosnaigh") ("féad") ("caith") ("féach") ;
LIST RELPART = (Vb Rel) (Prep Rel) ;
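
To show what the contextual tests in this rule check, here is a rough Python re-implementation of that single rule. It is a sketch under stated assumptions: each word has one (already disambiguated) reading; lemmas are stored as quoted tags such as '"bí"'; a set matches if a reading contains all tags of any set member; and the mapped tag, which the slide does not show, is replaced by the hypothetical @FMV.

```python
# Sets from the slide; lemma sets are written as quoted-lemma tags.
VSYNTH = [{"Verb", "1P"}, {"Verb", "2P"}, {"Verb", "3P"}, {"Verb", "Auto"}]
AUX = [{f'"{lemma}"'} for lemma in
       ["bí", "téigh", "tosaigh", "tosnaigh", "féad", "caith", "féach"]]
RELPART = [{"Vb", "Rel"}, {"Prep", "Rel"}]
TARGET = [{"Verb"}]


def matches(reading, tag_set):
    """A reading matches a set if it contains every tag of some member."""
    return any(member <= reading for member in tag_set)


def at(cohorts, i, offset, tag_set):
    """Positional test; a position outside the sentence never matches."""
    j = i + offset
    return 0 <= j < len(cohorts) and matches(cohorts[j], tag_set)


def map_fmv(cohorts):
    """Add the hypothetical @FMV tag where the rule applies; return indices."""
    mapped = []
    for i, reading in enumerate(cohorts):
        if (matches(reading, TARGET)
                and not at(cohorts, i, 0, VSYNTH + AUX)
                and not at(cohorts, i, -1, RELPART)
                and not at(cohorts, i, -2, RELPART)):
            reading.add("@FMV")
            mapped.append(i)
    return mapped


fuair = [{'"faigh"', "Verb", "VT", "PastInd"}, {'"sé"', "Pron"}]
print(map_fmv(fuair))  # [0]
```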

24 Order of Implementation of Rules
Dependency analysis is carried out in the following order:
1. Clause boundaries
2. Verbs and/or copulas
3. Preposition heads
4. All dependent modifiers
5. Subject
6. Predicates of copular constructions
7. Object(s)
8. Adverbials
9. Other
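
A sketch of why this ordering helps, applied to Example (1), Fuair sé leabhar ins an siopa. It is heavily simplified and relies on assumptions: only five of the nine stages are modelled, and both the rules and the tag names (@FMV, @SUBJ, @OBJ, @ADVL, @>N, @P<) are hypothetical VISL-style stand-ins. Each pass tags only tokens that earlier passes left untagged, so by the time the subject and object passes run, siopa has already been claimed as the dependent of the preposition.

```python
tokens = [("Fuair", "Verb"), ("sé", "Pron"), ("leabhar", "Noun"),
          ("ins", "Prep"), ("an", "Art"), ("siopa", "Noun")]


def verbs(toks, tags):
    for i, (_, pos) in enumerate(toks):
        if pos == "Verb" and tags[i] is None:
            tags[i] = "@FMV"


def prep_heads(toks, tags):
    for i, (_, pos) in enumerate(toks):
        if pos == "Prep" and tags[i] is None:
            tags[i] = "@ADVL"


def modifiers(toks, tags):
    for i, (_, pos) in enumerate(toks):
        if tags[i] is not None:
            continue
        if pos == "Art":
            tags[i] = "@>N"  # pre-modifier of the noun to the right
        elif pos == "Noun":
            j = i - 1        # skip articles to find a governing preposition
            while j >= 0 and toks[j][1] == "Art":
                j -= 1
            if j >= 0 and toks[j][1] == "Prep":
                tags[i] = "@P<"


def first_untagged_after_verb(toks, tags, wanted):
    for i in range(len(toks)):
        if tags[i] == "@FMV":
            for j in range(i + 1, len(toks)):
                if tags[j] is None and toks[j][1] in wanted:
                    return j
    return None


def subject(toks, tags):
    j = first_untagged_after_verb(toks, tags, {"Pron", "Noun"})
    if j is not None:
        tags[j] = "@SUBJ"


def obj(toks, tags):
    j = first_untagged_after_verb(toks, tags, {"Noun"})
    if j is not None:
        tags[j] = "@OBJ"


PASSES = [verbs, prep_heads, modifiers, subject, obj]

tags = [None] * len(tokens)
for rule_pass in PASSES:
    rule_pass(tokens, tags)
print(list(zip([w for w, _ in tokens], tags)))
```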

25 Example (1)
Fuair     faigh+Verb+VT+PastInd
sé        sé+Pron+Pers+3P+Sg+Masc+Sbj
leabhar   leabhar+Noun+Masc+Com+Sg
ins       i+Prep+Art+Sg
an        an+Art+Sg+Def
siopa     siopa+Noun+Masc+Com+Sg+DefArt
Fuair sé leabhar ins an siopa
Got he book in the shop
V Pro N Prep
'He got a book in the shop'
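
Analyses of this form have to be converted into the cohort format that a CG parser reads before the mapping rules can apply. A minimal sketch, assuming the VISL CG stream format (a `"<wordform>"` line, then one indented `"lemma" tags` line per analysis); the function name `to_cg` is hypothetical. An ambiguous token would simply get several reading lines, so the ambiguity is passed on for CG to resolve.

```python
def to_cg(tokens):
    """tokens: list of (wordform, [analysis, ...]) with analyses 'lemma+Tag+Tag'."""
    lines = []
    for form, analyses in tokens:
        lines.append(f'"<{form}>"')
        for analysis in analyses:
            lemma, *tags = analysis.split("+")
            lines.append(f'\t"{lemma}" ' + " ".join(tags))
    return "\n".join(lines)


example = [
    ("Fuair", ["faigh+Verb+VT+PastInd"]),
    ("sé", ["sé+Pron+Pers+3P+Sg+Masc+Sbj"]),
]
print(to_cg(example))
```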

26 Example (1)
Fuair sé leabhar ins an siopa
Got he book in the shop
V Pro N Prep
'He got a book in the shop'

27 Example (1)
root
Fuair sé leabhar ins an siopa
Got he book in the shop
V Pro N Prep
'He got a book in the shop'

28 Example (2)
Chonaic Máire an fear a bhí ag ithe
Saw Máire the man that was at eating
V N Det N Rel V
'Máire saw the man that was eating'
ag ithe 'eating'

29 Development/Test Cycle
POS Tagged Text → CG Mapping Rules → Dependency Analysis → Test against Gold Std. → revise CG Mapping Rules

30 Evaluation of Dependency Analysis
- Sample sentences: 225 short grammatical sentences
  - Precision (test suite):
- Gold Standard Dependency Analysis Corpus
  - 250 sentences randomly selected from the 3,000-sentence Gold Standard POS-tagged corpus
  - Gold Standard Development Set (150 sentences)
  - Gold Standard Test Set (100 sentences)
  - Columns reported for each set: Tot. Tokens, Punct. Tokens, Tokens, Correct, Incorrect, F-Score
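
One way to compute token-level scores like these, excluding punctuation as the column headings suggest, is sketched below. The precision/recall definitions are our own assumption, not necessarily the exact ones used in this evaluation: because CG may leave several candidate tags on a token, or none, precision is taken as correct tags over tags assigned and recall as correct tokens over non-punctuation tokens. The tag names are hypothetical.

```python
def score(gold, predicted, is_punct):
    """gold: one tag per token; predicted: a set of candidate tags per token."""
    correct = assigned = total = 0
    for g, p, punct in zip(gold, predicted, is_punct):
        if punct:
            continue  # punctuation tokens are excluded from scoring
        total += 1
        assigned += len(p)
        if g in p:
            correct += 1
    precision = correct / assigned if assigned else 0.0
    recall = correct / total if total else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


gold = ["@FMV", "@SUBJ", "@OBJ", "."]
predicted = [{"@FMV"}, {"@SUBJ", "@OBJ"}, set(), {"."}]
print(score(gold, predicted, [False, False, False, True]))
```

With exactly one tag on every token, precision, recall and F-score all reduce to plain accuracy.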

31 Chunking
Using the dependency annotations and a regular-expression grammar (implemented using the Xerox Finite-State Tools), we can identify phrase-like structures, described by Abney (1991) as 'chunks'.

32 Implementation
- Regular expressions and Xerox FST
- Chunks: [NP .. ], [V .. ], etc.
- PP with embedded NP: [PP .. [NP .. ] ]
- Conjunction with embedded conjoint: [CJ2 .. [?] ]
  [NP úlla ] [CJ2 agus [NP oráistí NP] ]   'apples and oranges'
- Aspectual phrases: [ASP [PP-ASP .. [NP ..] ] ([OA ..]) ]
  [ASP [PP-ASP ag [NP dúnadh ] ] [OA an dorais ] ]   'closing the door'
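
The embedding of NPs inside PPs can be produced by a cascade: first bracket the noun chunks, then run a second pattern over the partly bracketed string. This sketch is in the spirit of Abney's finite-state cascades but uses Python's `re` module in place of Xerox FST. The tags are hypothetical VISL-style stand-ins, and the bracket style copies the examples on the following slides.

```python
import re

# Level 1: NP = pre-modifiers (@>N) followed by a nominal head.
NP = re.compile(r"(?<!\S)((?:\S+/@>N )*\S+/@(?:SUBJ|OBJ|P<))")
# Level 2: PP = preposition head followed by an already-built NP chunk.
PP = re.compile(r"(?<!\S)(\S+/@ADVL \[NP .*? NP\])")
V = re.compile(r"(?<!\S)(\S+/@FMV)")


def chunk(tokens):
    s = " ".join(f"{w}/{t}" for w, t in tokens)
    s = NP.sub(r"[NP \1 NP]", s)     # level 1: noun chunks
    s = PP.sub(r"[PP \1 PP]", s)     # level 2: PPs containing an NP
    s = V.sub(r"[V \1 V]", s)
    return re.sub(r"/@\S+", "", s)   # drop the tags, keep the brackets


tokens = [("Fuair", "@FMV"), ("sé", "@SUBJ"), ("leabhar", "@OBJ"),
          ("ins", "@ADVL"), ("an", "@>N"), ("siopa", "@P<")]
print(chunk(tokens))
# [V Fuair V] [NP sé NP] [NP leabhar NP] [PP ins [NP an siopa NP] PP]
```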

33 Example (3)
"<Tá>"
    "bí" Verb VI
"<sé>"
    "sé" Pron Pers 3P Sg Masc
"<ag>"
    "ag" Prep
"<rith>"
    "rith" Verbal Noun
Tá sé ag rith
'He is running'
[S [V Tá ] [NP sé NP] [ASP [PP-ASP ag [NP rith NP] PP-ASP] ASP] S]

34 Regular Expression Chunker
###########################################################
# Verb Chunk Dependency Tags
###########################################################
define VTag
define VSTag
define PreVTag
# Verb Pre & Post Modifiers
define PreVStr [TokLemMTag PreVTag SP];
# Verb Chunk
define VStr [TokLemMTag VTag SP];
define VChunk [PreVStr* VStr];
define VChunkBr "[V " ... " ] "];
# Verb_Subject Chunk
define VSStr [TokLemMTag VSTag SP];
define VSChunk [PreVStr* VSStr];
define VSChunkBr "[VS " ... " ] "];
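
The VChunk definition above says a verb chunk is any number of pre-verbal elements followed by a verb. A rough Python equivalent, under assumptions: the contents of PreVTag and VTag are not given on the slide, so the tag names in PREV_TAGS and V_TAGS are hypothetical, and each token is reduced to one symbol so that the pattern [PreVStr* VStr] becomes the regex `p*v`. For this pattern, Python's leftmost greedy matching brackets the same spans as an xfst leftmost-longest bracketing rule would.

```python
import re

PREV_TAGS = {"@>V"}            # hypothetical pre-verbal particle tag
V_TAGS = {"@FMV", "@FAUX"}     # hypothetical verb tags


def symbol(tag):
    """p = pre-verbal element, v = verb, o = anything else."""
    return "p" if tag in PREV_TAGS else "v" if tag in V_TAGS else "o"


def bracket_verbs(tokens):
    codes = "".join(symbol(tag) for _, tag in tokens)
    words = [w for w, _ in tokens]
    # Insert brackets right-to-left so earlier match offsets stay valid.
    for m in reversed(list(re.finditer(r"p*v", codes))):
        words[m.start():m.end()] = ["[V"] + words[m.start():m.end()] + ["]"]
    return " ".join(words)


print(bracket_verbs([("Ní", "@>V"), ("fhaca", "@FMV"), ("mé", "@SUBJ"), ("é", "@OBJ")]))
# [V Ní fhaca ] mé é
```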

35 Example (4)
Tá mé ag déanamh cáca.
'I am making a cake'
[S [V Tá V] [NP mé NP] [ASP [PP-ASP ag [NP déanamh NP] PP-ASP] [OA cáca OA] ASP] ..+Punct+Fin S]

36 Corpus Data
Ach sin an toradh is measa a fhéadfadh tarlú don pháirtí agus déarfaidís leat nár cóir an iomad airde a thabhairt do na pobalbhreitheanna nach raibh riamh fabhrach do na páirtithe beaga.
'But that is the worst possible result for the party, and they would say to you that it is not right to pay too much attention to the opinion polls that were never favourable to small parties.'

37 Dependency Analysis
[S [CONJ Ach ] [COP Sin ] [NP an toradh is measa NP] [VP a fhéadfadh ] [INF tarlú INF] [PP don [NP pháirtí NP] PP] [CB agus ] [VS déarfaidís [PP leat PP] [COP nár ] [PRED cóir ] [INF an iomad airde [I a thabhairt I] INF] [PP do [NP na pobalbhreitheanna NP] PP] [V nach raibh ] [PRED riamh ] fabhrach ] [PP do [NP na páirtithe beaga NP] PP] . +Punct+Fin S]

38 Evaluation of Chunker
The evalb program was used to evaluate the bracketing of the 250 sentences.

Development Set (150 sentences)
                         All sentences   Len<40
Number of sentences      150             120
Bracketing Recall        96.26           97.31
Bracketing Precision     98.15           98.57
Bracketing F-Measure     97.20           97.94

Test Set (100 sentences)
                         All sentences   Len<40
Number of sentences      100             85
Bracketing Recall        92.89           94.09
Bracketing Precision     94.12           94.09
Bracketing F-Measure     93.50           –
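
The bracketing scores can be understood as matching labelled spans between the gold and test bracketings. The following is a simplified sketch in the spirit of evalb, not a reimplementation of it (evalb has further settings, e.g. for punctuation and which labels count): each chunk becomes a (label, start, end) span over word positions, and spans are compared as multisets. The bracket syntax follows the examples above, where "[NP" opens a chunk and "NP]" or "]" closes it.

```python
from collections import Counter


def spans(bracketed):
    """Multiset of (label, start, end) spans over word positions."""
    stack, out, pos = [], Counter(), 0
    for tok in bracketed.split():
        if tok.startswith("["):
            stack.append((tok[1:], pos))
        elif tok.endswith("]"):
            label, start = stack.pop()
            out[(label, start, pos)] += 1
        else:
            pos += 1  # an ordinary word
    return out


def bracket_prf(gold, test):
    g, t = spans(gold), spans(test)
    matched = sum((g & t).values())
    p = matched / sum(t.values()) if t else 0.0
    r = matched / sum(g.values()) if g else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


gold = "[S [V Tá V] [NP sé NP] [ASP [PP-ASP ag [NP rith NP] PP-ASP] ASP] S]"
test = "[S [V Tá V] [NP sé NP] [PP ag [NP rith NP] PP] S]"
print(bracket_prf(gold, test))  # precision 0.8, recall 4/6, F 8/11
```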

39 Future Work
Parsing is partial to date, as we have not yet addressed:
- Co-ordination
  He packed his [clothes] and [shoes]
  [He packed his clothes] and [left]
- PP-attachment
  [He] [stabbed] [the man with the knife]
  [He] [stabbed] [the man] [with the knife]
- PP-function: locative vs. stative adjunct vs. indirect object
- Adding more information to the FS lexicons, e.g. noun subclasses and subcategorisation frames for verbs
Irish Text Processing Tools: