1 Partial Dependency Parsing for Irish Elaine Uí Dhonnchadha & Josef Van Genabith
2 Aims of the Research: To be able to parse and/or chunk unrestricted Irish text. To account for as many of the syntactic phenomena of Irish as possible in an efficient and principled way. To use open-source software as far as possible.
3 Outline of the Talk: Background; Stages of Development for Dependency Parser; Chunker; Future Work
4 Irish Language – some facts
Celtic language: Goidelic (Irish, Manx, Scottish Gaelic); Brittonic (Breton, Cornish, Welsh)
Verb – Subject – Object sentence word order:
  Chaith  Seán  an liathróid.
  Threw   Seán  the ball.
  V       S     O
  ‘Seán threw the ball’
Fixed word order
5 Irish Language
Inflectional language. Gender: fem/masc. Case: common/genitive/vocative. Verbs inflected for number and person: chuala mé ‘I heard’ (analytic), chualamar ‘we heard’ (synthetic).
Initial mutation of words: cailín ‘girl’, an chailín ‘the girl’; arán ‘bread’, an t-arán ‘the bread’; seachtain ‘week’, an tseachtain ‘the week’; bord ‘table’, ar an mbord ‘on the table’.
6 Irish Language
Prepositions inflected for person and number.
  Labhair  sé  liom     faoi
  Spoke    he  with-me  about-it
  ‘He spoke to me about it’
  Tabhair  dom    é
  Give     to-me  it
  ‘Give it to me’
Full paradigm for every preposition: liom ‘with-me’, leat ‘with-you’, leis ‘with-him/it’, etc.
7 Irish Language
Verbal noun: used in progressives, perfects, infinitives, etc.
De-verbal nouns: bris(v) ‘break’, briseadh(vn) ‘breaking’
De-agentive nouns: feirmeoir(n) ‘farmer’, feirmeoireacht(vn) ‘farming’
Progressive:
  Tá  mé  ag  oscailt      an   dorais
  Is  me  at  opening(vn)  the  door(gen)
  ‘I am opening the door’
After Perfect:
  Tá  mé  tar_éis  an   doras  a    oscailt
  Is  me  after    the  door   PRT  opening(vn)
  ‘I am after opening the door’
8 Parsing Methodology: Dependency Analysis & Constituency Analysis
Dependency Analysis: relationships between pairs of words; grammatical functions and head-modifier dependencies; root and terminal nodes.
Constituency Analysis: phrase structure rules, e.g. S = NP VP; hierarchical structure with root, phrase categories and leaf/terminal nodes.
9 Dependency Analysis
Issues in the theoretical syntax of Irish on which there is no clear consensus … The non-adjacency of verb and object in a VSO language, i.e. difficulties with VP. Some periphrastic aspectual constructions in Irish, e.g. progressive aspect has more nominal than verbal characteristics …
Dependency Analysis includes semantic as well as syntactic information.
10 Dependency Parsing
A dependency analysis looks at dependencies between pairs of words (which do not have to be adjacent) in a sentence.
The tokens present in the input string are annotated without introducing any abstract categories (e.g. phrasal nodes), i.e. a dependency analysis consists of a root and leaf nodes, without intermediate levels.
Grammatical functions such as subject, object and predicate, as well as various types of prepositional phrase (e.g. adverbial, aspectual, predicative), are annotated.
Clauses and head-modifier dependencies are identified.
11 Dependency Parsing
Surface-oriented, bottom-up parsing. Dependency relations between pairs of tokens: grammatical functions and head-modifier relations. Tokens not necessarily adjacent.
  V      Det  N     Det  N
  Bhris  an   fear  a    rúitín
  Broke  the  man   his  ankle
  ‘The man broke his ankle’
Arc labels shown in the diagram: S, DO
12 Previous NLP Work
Tokenization & Morphological Analysis: Finite-State Morphology (Karttunen, Beesley, 1999; 2003); Finite-State Morphological Analyser & Generator for Irish (Uí Dhonnchadha, 2002)
POS Tagging and Parsing: Constraint Grammar (CG): Karlsson et al (1995); Constraint Grammar Parser CG-2 (Tapanainen, 1996); VISL CG3 (Bick et al, 2003...) http://visl.sdu.dk
Chunking: Partial Parsing via Finite-State Cascades (Abney, 1996)
13 Stages of Development: Define the Syntactic Phenomena to include; Gather Test Data; Decide on Parsing Methodology; Decide on a Tag-Set for dependency and grammatical relations; Develop Linguistic Rules for dependency analysis; Test the rules; Evaluate the results
14 Syntactic Phenomena
Sources of Information: Grammar books; Previous research on aspects of Irish Syntax
Simple declarative sentences (incl. neg. and interrogative); Relative clauses; Copular constructions; Non-finite complements; Adjuncts
15 Test Data (Gold Standard)
Sample Sentences: short invented grammatical sentences (225) based on grammar books etc. Automatically POS tagged and manually checked and corrected. Dependency tagged and manually checked and corrected. Chunked and manually checked and corrected.
Corpus Data: 250 real sentences randomly selected from the 3000 sentence Gold Standard POS Tagged Corpus. Dependency tagged, chunked and manually checked and corrected.
16 Tag Set
Grammatical Functions: @SUBJ, @OBJ, @FMV, @FAUX, @CLB, etc.
Unlabelled dependencies: @>N, @N<, @P<, etc.
Tags start with the @ symbol, by convention, to distinguish them from morphosyntactic tags, e.g. “Fuair” faigh+Verb+VTI+PastInd+Len+@FMV
This tagset follows the style of tags described for English (Karlsson, 1995) and for Danish (Bick, 2003). However, there is no prescribed list of tags for CG, which allows us to tailor the tagset to the language. Other languages are also detailed on the VISL website: http://visl.sdu.dk/corpus_linguistics.html
17 Dependency Tags: Verbs and Copulas
@FMV: finite main verb, e.g. rith 'run'
@FMV_SUBJ: finite main verb including subject, e.g. ritheamar 'we ran'
@FMV_REL: relative finite main verb, e.g. a chuala mé 'that I heard'
@FMV_REL_SUBJ: relative finite main verb incl. subject, e.g. a chualamar 'that we heard'
@FAUX: finite auxiliary verb, e.g. Tá sé ag cócaireacht 'He is cooking'
@FAUX_SUBJ: finite auxiliary verb including subject, e.g. táimid 'we are'
@FAUX_REL: relative finite auxiliary verb, e.g. atá siad 'that/which they are'
@FAUX_REL_SUBJ: relative finite auxiliary verb including subject, e.g. atáimid 'that/which we are'
@COP: copula, e.g. Is
@COP_SUBJ: copula including subject, e.g. Seo an fear... 'This is the man...'
@COP_WH: interrogative copula, e.g. cé leis an leabhar 'whose is the book'
@INF: bare infinitive, e.g. Ba mhaith liom fanacht 'I would like to stay'
18 Dependency Tags: Grammatical Relations
@SUBJ: subject, e.g. Chonaic Seán Máire 'Seán saw Máire'
@SUBJ_ASP: subject of aspectual phrase, e.g. bhí sé ag obair 'he was working'
@SUBJ_INF: subject of infinitive (intrans), e.g. an obair a bheith déanta 'the work to be done'
@SUBJ_OR_OBJ: subject or obj. of relative clause, e.g. a chonaic an bhean 'that the woman saw' OR 'that saw the woman'
@SUBJ_REL: subject of relative clause, e.g. a rinne sé 'that he made'
@OBJ: object, e.g. Chonaic Seán Máire 'Seán saw Máire'
@OBJ_ASP: object of aspectual, e.g. ag déanamh oibre 'doing work'
@OBJ_INF: object of infinitive, e.g. bainne a ól 'to drink milk'
@PRED: predicate, e.g. Tá sé mór 'It is big'
@NP: unlabelled noun head (list item, apposition, or fragment), e.g. 1) dathuithe, 2) leasaithigh '1) colours, 2) additives'
@CC: co-ordinating conjunction, e.g. agus 'and'
@CLB: clause boundary, e.g. agus ‘and’ when followed by a verb, and subordinating conjs. etc.
19 Dependency Tags: Head Modifiers (Unlabelled Dependencies)
@>ADJ: adverbial particle dependent on the adjective to the right, e.g. go ciúin 'quietly'
@>N: pre-modifier dependent on the first noun to the right, e.g. an 'the'
@>V: pre-verbal particle dependent on a verb to the right, e.g. ní 'not'
@ADVL<: adverbial post-modifier
@N<: noun post-modifier, e.g. teach mór 'big house'
@P<: noun dependent on the preceding prep., e.g. ag an doras 'at the door'
@PC<: noun dependent on compound preposition (the noun is in the genitive case), e.g. tar éis na Nollag 'after Christmas'
@PN<: pronoun post-mod., e.g. é féin 'himself'
@PRED<: dependent on predicate, e.g. Is deas an lá é 'It is a nice day', i.e. Is nice the day it
@ADVL: adverbial, e.g. anocht 'tonight'
@AUG>SUBJ: augment pronoun dependent on subj. to the right, e.g. Is é Seán … 'It/He, Seán is…'
20 Dependency Tags: Prepositional Phrases
@PP_ADVL: head of adverbial adjunct, e.g. ag an doras 'at the door'
@PP_ASP: head of an aspectual, e.g. ag rith '(at) running'
@PP_HAS: ‘at X’ meaning ‘X has’, e.g. ag Seán 'Seán has', i.e. at Seán (possession)
@PP_NEG: negative, e.g. gan dul 'without going'
@PP_OBL: oblique PP head, e.g. do Mháire ‘to Máire’
@PP_SUBJ: prep + subj pronoun, e.g. D'éirigh liom 'I succeeded', i.e. 'success was with me'
@PP_PRED: predicative, e.g. Is liom é 'It is mine', i.e. Is with me it (ownership)
@PP_STAT: stative, e.g. ina rí 'is a king', i.e. 'in his king(hood)'
21 Parsing Methodology: Constraint Grammar
Aims (Karlsson et al., 1995): assign the appropriate morphological and syntactic information according to the context of each token or larger structure in the text; assign an analysis to every string in the input, bearing in mind that unrestricted text will contain typographical errors, non-sentential fragments, dialectal and colloquial material; if an ambiguity cannot be resolved, retain the alternative analyses rather than forcing a (possibly incorrect) choice.
22 Constraint Grammar Principles
Differences between CG and other parsing methodologies (Karlsson, 1995, p37):
Unlike a context-free grammar, a Constraint Grammar does not attempt to define the set of grammatical sentences in a language. ‘... everything is licensed which is not explicitly ruled out’, which makes it more robust in handling unrestricted text.
It does not aim to produce a minimal set of general rules: a CG grammar can contain many lexically specific rules to handle special cases.
It does not attempt to determine constituency structure.
23 CG Dependency Rules
MAP (@TAG) TARGET (POS) IF (CONDITIONS);
e.g.
MAP (@FMV) TARGET (Verb) IF (NOT 0 VSYNTH OR AUX) (NOT -1 RELPART) (NOT -2 RELPART);
SETS
LIST VSYNTH = (Verb 1P) (Verb 2P) (Verb 3P) (Verb Auto) ;
LIST AUX = ("bí") ("téigh") ("tosaigh") ("tosnaigh") ("féad") ("caith") ("féach");
LIST RELPART = (Vb Rel) (Prep Rel) ;
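To make the reading of the @FMV rule concrete, here is a minimal Python sketch of its logic. It is not VISL CG3 or CG-2 and does not reproduce their semantics in full: the token representation (a dict with a lemma and a set of tags), the helper names, and the treatment of positions outside the sentence (a NOT condition is taken to hold there) are all assumptions made for illustration. The lemma and tags given for chuala are likewise illustrative.

```python
# Sets from the slide, encoded as "all tags in one tuple must be present".
VSYNTH = [{"Verb", "1P"}, {"Verb", "2P"}, {"Verb", "3P"}, {"Verb", "Auto"}]
AUX = {"bí", "téigh", "tosaigh", "tosnaigh", "féad", "caith", "féach"}  # lemmas
RELPART = [{"Vb", "Rel"}, {"Prep", "Rel"}]


def in_tag_set(token, tag_set):
    """True if the token carries every tag of at least one member of tag_set."""
    return any(required <= token["tags"] for required in tag_set)


def neighbour(tokens, i, offset):
    """Token at relative position `offset`, or None outside the sentence."""
    j = i + offset
    return tokens[j] if 0 <= j < len(tokens) else None


def map_fmv(tokens):
    """Add @FMV to verbs that are not synthetic, not auxiliaries,
    and not preceded (at -1 or -2) by a relative particle."""
    for i, tok in enumerate(tokens):
        if "Verb" not in tok["tags"]:
            continue  # TARGET (Verb)
        if in_tag_set(tok, VSYNTH) or tok["lemma"] in AUX:
            continue  # (NOT 0 VSYNTH OR AUX)
        preceded_by_rel = any(
            nb is not None and in_tag_set(nb, RELPART)
            for nb in (neighbour(tokens, i, -1), neighbour(tokens, i, -2))
        )
        if not preceded_by_rel:  # (NOT -1 RELPART) (NOT -2 RELPART)
            tok["tags"].add("@FMV")
    return tokens


# "Fuair sé leabhar": readings taken from Example (1).
MAIN = map_fmv([
    {"form": "Fuair", "lemma": "faigh", "tags": {"Verb", "VT", "PastInd"}},
    {"form": "sé", "lemma": "sé", "tags": {"Pron", "Pers", "3P", "Sg", "Masc", "Sbj"}},
    {"form": "leabhar", "lemma": "leabhar", "tags": {"Noun", "Masc", "Com", "Sg"}},
])

# "a chuala mé": the verb follows a relative particle, so the rule must not fire.
# The lemma and tags for "chuala" are assumed for illustration.
REL = map_fmv([
    {"form": "a", "lemma": "a", "tags": {"Part", "Vb", "Rel", "Direct"}},
    {"form": "chuala", "lemma": "clois", "tags": {"Verb", "VTI", "PastInd", "Len"}},
    {"form": "mé", "lemma": "mé", "tags": {"Pron", "Pers", "1P", "Sg"}},
])

print([t["form"] for t in MAIN if "@FMV" in t["tags"]])
print([t["form"] for t in REL if "@FMV" in t["tags"]])
```

In this sketch the pronoun sé carries 3P but is never considered, because the rule only targets tokens tagged Verb; a real grammar would go on to map @FMV_REL to chuala with a separate rule.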
24 Order of Implementation of Rules Dependency Analysis is carried out in the following order: Clause Boundaries Verbs and/or Copulas Preposition Heads All Dependent Modifiers Subject Predicates of Copular Constructions Object(s) Adverbials Other
25 Example (1)
  Fuair    faigh+Verb+VT+PastInd
  sé       sé+Pron+Pers+3P+Sg+Masc+Sbj
  leabhar  leabhar+Noun+Masc+Com+Sg
  ins      i+Prep+Art+Sg
  an       an+Art+Sg+Def
  siopa    siopa+Noun+Masc+Com+Sg+DefArt

  Fuair  sé     leabhar  ins       an   siopa
  Got    he     book     in        the  shop
  V      Pro    N        Prep      Det  N
  @FMV   @SUBJ  @OBJ     @PP_ADVL  @>N  @P<
26 Example (1)
  Fuair  sé     leabhar  ins       an   siopa
  Got    he     book     in        the  shop
  V      Pro    N        Prep      Det  N
  @FMV   @SUBJ  @OBJ     @PP_ADVL  @>N  @P<
  ‘He got a book in the shop’
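The direction markers in the tags (> for "head to the right", < for "head to the left") are enough to recover head-modifier links for Example (1). The following Python sketch is one possible way to do that and is not the authors' implementation. The function names are hypothetical, and attaching @SUBJ, @OBJ and @PP_ADVL to the nearest verb on the left is an assumption that only suits a simple single-clause VSO sentence like this one.

```python
# Example (1) from the slides: (word, coarse POS, dependency tag).
SENTENCE = [
    ("Fuair", "V", "@FMV"),
    ("sé", "Pro", "@SUBJ"),
    ("leabhar", "N", "@OBJ"),
    ("ins", "Prep", "@PP_ADVL"),
    ("an", "Det", "@>N"),
    ("siopa", "N", "@P<"),
]

VERB_TAGS = {"@FMV", "@FAUX"}
CLAUSE_LEVEL_TAGS = {"@SUBJ", "@OBJ", "@PP_ADVL"}


def find(tokens, start, step, predicate):
    """Index of the nearest token matching `predicate`, scanning from
    `start` in direction `step` (+1 right, -1 left); None if not found."""
    i = start
    while 0 <= i < len(tokens):
        if predicate(tokens[i]):
            return i
        i += step
    return None


def resolve_heads(tokens):
    """Map each token index to the index of its head (None for the root)."""
    heads = {}
    for i, (_, _, tag) in enumerate(tokens):
        if tag in VERB_TAGS:
            heads[i] = None  # finite verb is the root of the clause
        elif tag == "@>N":
            # pre-modifier: first noun to the right
            heads[i] = find(tokens, i + 1, +1, lambda t: t[1] == "N")
        elif tag == "@P<":
            # noun dependent on the preceding preposition
            heads[i] = find(tokens, i - 1, -1, lambda t: t[1] == "Prep")
        elif tag in CLAUSE_LEVEL_TAGS:
            # assumption: clause-level functions attach to the verb on the left
            heads[i] = find(tokens, i - 1, -1, lambda t: t[2] in VERB_TAGS)
    return heads


HEADS = resolve_heads(SENTENCE)
for i, (word, _, tag) in enumerate(SENTENCE):
    head = "root" if HEADS[i] is None else SENTENCE[HEADS[i]][0]
    print(f"{word:8} {tag:9} -> {head}")
```

Non-adjacent dependencies fall out naturally here: the determiner an attaches to siopa, siopa to the preposition ins, and ins to the verb Fuair three tokens earlier.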
27 Example (1)
  root
  Fuair  sé     leabhar  ins       an   siopa
  Got    he     book     in        the  shop
  V      Pro    N        Prep      Det  N
  @FMV   @SUBJ  @OBJ     @PP_ADVL  @>N  @P<
28 Example (2)
  Chonaic  Máire  an   fear       a     bhí    ag       ithe
  Saw      Máire  the  man        that  was    at       eating
  V        N      Det  N          Rel   V      Prep     VN
  @FMV     @SUBJ  @>N  @SUBJ_REL  @>V   @FAUX  @PP_ASP  @P<
29 Development/Test Cycle (diagram labels): POS Tagged Text; CG Mapping Rules; Dependency Analysis; Test against Gold Std.
30 Evaluation of Dependency Analysis
Sample Sentences: 225 short grammatical sentences. Precision (Test Suite):
Gold Standard Dependency Analysis Corpus: 250 sentences randomly selected from the 3,000 sentence Gold Standard POS Tagged Corpus
Gold Standard Development Set (150 Sentences)
  Tot Tokens  Punct. Tokens  Tokens  Correct  Incorrect  F-Score
  4403        444            3959    3706     253        93.60
Gold Standard Test Set (100 Sentences)
  Tot Tokens  Punct. Tokens  Tokens  Correct  Incorrect  F-Score
  2555        282            2273    2143     130        94.28
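The slide does not say how the scores were computed. As a sanity check, the short Python sketch below assumes the score is the percentage of correctly tagged tokens among the non-punctuation tokens; the function name is hypothetical. The counts are those on the slide: 4403 total, 444 punctuation and 3706 correct for the development set, and 2555, 282 and 2143 for the test set. Under that assumption the result agrees with the reported 93.60 and 94.28 to within about 0.01.

```python
def token_score(total_tokens, punct_tokens, correct):
    """Percentage of correct tags among non-punctuation tokens.
    Assumption: this is how the reported score was derived."""
    scored = total_tokens - punct_tokens
    return 100.0 * correct / scored


DEV_SCORE = token_score(4403, 444, 3706)
TEST_SCORE = token_score(2555, 282, 2143)

print(f"development set: {DEV_SCORE:.2f}")
print(f"test set:        {TEST_SCORE:.2f}")
```

The counts are also internally consistent: in both sets, total minus punctuation equals correct plus incorrect (3959 and 2273 tokens respectively).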
31 Chunking Using the Dependency Annotations and a Regular Expression Grammar (implemented using Xerox Finite-State Tools) we can identify phrase-like structures, described by Abney (1991) as 'chunks'.  For details see http://www.cis.upenn.edu/~cis639/docs/xfst.html 
32 Implementation
Regular expressions and Xerox FST
Chunks: [NP.. ], [V.. ] etc.
PP with embedded NP: [PP.. [NP.. ] ]
Conjunction with embedded conjoint: [CJ2.. [?] ]
  [NP úlla ] [CJ2 agus [NP oráistí NP] ] ‘apples and oranges’
Aspectual phrases: [ASP [PP-ASP.. [NP..] ] ([OA..]) ]
  [ASP [PP-ASP ag [NP dúnadh ] ] [OA an dorais ] ] ‘closing the door’
33 Example (3)
  "<Tá>"    "bí" Verb VI PresInd @FAUX              Is
  "<sé>"    "sé" Pron Pers 3P Sg Masc Sbj @SUBJ     he
  "<ag>"    "ag" Prep Simp @PP_ASP                  at
  "<rith>"  "rith" Verbal Noun VTI @P<              running
  ‘He is running’

  [S
    [V Tá bí+Verb+VI+PresInd+@FAUX ]
    [NP sé sé+Pron+Pers+3P+Sg+Masc+Sbj+@SUBJ NP]
    [ASP [PP-ASP ag ag+Prep+Simp+@PP_ASP [NP rith rith+Verbal+Noun+VTI+@P< NP] PP-ASP] ASP]
  S]
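The cascade idea behind the chunker can be sketched with ordinary regular expressions. The Python below uses the standard `re` module as a stand-in for Xerox xfst, so it is not the authors' grammar: the stage order, the patterns, the word/@TAG encoding and the function names are all assumptions, and the morphological tags are dropped from the output for readability.

```python
import re

# Dependency tags that mark the head of a noun chunk (illustrative subset).
NP_HEADS = r"SUBJ|SUBJ_ASP|OBJ|P<"


def strip_tags(text):
    """Remove the /@TAG suffix from every token in a chunk."""
    return re.sub(r"/@\S+", "", text)


def chunk(tagged):
    """Bracket a list of (word, dependency_tag) pairs with a small
    cascade of regular-expression rewrites."""
    text = " ".join(f"{word}/{tag}" for word, tag in tagged)

    # Stage 1: noun chunks = optional @>N pre-modifiers + a nominal head.
    text = re.sub(
        rf"((?:\S+/@>N )*\S+)/@(?:{NP_HEADS})(?=\s|$)",
        lambda m: f"[NP {strip_tags(m.group(1))} NP]",
        text,
    )
    # Stage 2: finite verbs.
    text = re.sub(r"(\S+)/@(?:FMV|FAUX)(?=\s|$)", r"[V \1 ]", text)
    # Stage 3: prepositional heads absorb the NP chunk that follows them.
    text = re.sub(
        r"(\S+)/@PP_ASP (\[NP [^\]]+ NP\])",
        r"[ASP [PP-ASP \1 \2 PP-ASP] ASP]",
        text,
    )
    text = re.sub(r"(\S+)/@PP_ADVL (\[NP [^\]]+ NP\])", r"[PP \1 \2 PP]", text)

    return f"[S {text} S]"


EXAMPLE_3 = [("Tá", "@FAUX"), ("sé", "@SUBJ"), ("ag", "@PP_ASP"), ("rith", "@P<")]
EXAMPLE_1 = [
    ("Fuair", "@FMV"), ("sé", "@SUBJ"), ("leabhar", "@OBJ"),
    ("ins", "@PP_ADVL"), ("an", "@>N"), ("siopa", "@P<"),
]

print(chunk(EXAMPLE_3))
print(chunk(EXAMPLE_1))
```

Each stage only looks at the output of the stages before it, which is what lets the PP rule treat an already bracketed NP as a single unit. The bracket structure produced for Example (3) matches the slide once the morphological tags are left out.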
35 Example (4)
  Tá       bí+Verb+VI+PresInd+@FAUX          Is
  mé       mé+Pron+Pers+1P+Sg+@SUBJ_ASP      I
  ag       ag+Prep+Simp+@PP_ASP              at
  déanamh  déanamh+Verbal+Noun+VTI+@P<
36 Corpus Data
Ach sin an toradh is measa a fhéadfadh tarlú don pháirtí agus déarfaidís leat nár cóir an iomad airde a thabhairt do na pobalbhreitheanna nach raibh riamh fabhrach do na páirtithe beaga.
‘But that is the worst possible result for the party and they would say to you that it is not right to pay too much attention to the opinion polls that were never favourable to small parties.’
37 Dependency Analysis
[S
  [CONJ Ach ach+Conj+Subord+@CLB ]
  [COP Sin sin+Cop+Pro+Dem+@COP_SUBJ ]
  [NP an an+Art+Sg+Def+@>N toradh toradh+Noun+Masc+Com+Sg+DefArt+@PRED is is+Part+Sup+@>ADJ measa olc+Adj+Comp+@N< NP]
  [VP a a+Part+Vb+Rel+Direct+@CLB fhéadfadh féad+Verb+VTI+Cond+Len+@FAUX_REL ]
  [INF tarlú tarlú+Verbal+Noun+VTI+@INF INF]
  [PP don do+Prep+Art+Sg+@PP_ADVL [NP pháirtí páirtí+Noun+Masc+Com+Sg+Len+@P< NP] PP]
  [CB agus agus+Conj+Coord+@CLB ]
  [VS déarfaidís abair+Verb+VTI+Cond+3P+Pl+@FMV_SUBJ ]
  [PP leat le+Pron+Prep+2P+Sg+@PP_ADVL PP]
  [COP nár is+Cop+Past+Rel+Neg+@CLB ]
  [PRED cóir cóir+Adj+Base+@PRED ]
  [INF an an+Art+Sg+Def+@>N iomad iomad+Subst+Noun+Sg+@OBJ_INF airde aird+Noun+Fem+Gen+Sg+@N< [I a a+Prep+Simp+@PP_INF thabhairt tabhairt+Verbal+Noun+VTI+Len+@P< I] INF]
  [PP do do+Prep+Simp+@PP_ADVL [NP na na+Art+Pl+Def+@>N pobalbhreitheanna pobalbhreith+Noun+Fem+Com+Pl+@P< NP] PP]
  [V nach nach+Part+Vb+Neg+Rel+@CLB raibh bí+Verb+PastInd+Neg+Len+@FMV_REL ]
  [PRED riamh riamh+Adv+Its+@>ADJ fabhrach fabhrach+Adj+Base+@PRED ]
  [PP do do+Prep+Simp+@PP_ADVL [NP na na+Art+Pl+Def+@>N páirtithe páirtí+Noun+Masc+Com+Pl+DefArt+@P< beaga beag+Adj+Com+NotSlen+Pl+@N< NP] PP]
  . +Punct+Fin
S]
38 Evaluation of Chunker
Evalb program used to evaluate bracketing of 250 sentences.
150 Development Set Sentences
                        ALL SENTENCES  SENTENCES Len<40
  Number of sentence    150            120
  Bracketing Recall     96.26          97.31
  Bracketing Precision  98.15          98.57
  Bracketing F-Measure  97.20          97.94
100 Test Set Sentences
                        ALL SENTENCES  SENTENCES Len<40
  Number of sentence    100            85
  Bracketing Recall     92.89          94.09
  Bracketing Precision  94.12          94.09
  Bracketing F-Measure  93.50          94.09
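The bracketing F-measure is the harmonic mean of bracketing precision and recall, F = 2PR / (P + R). The Python sketch below recomputes it from the recall and precision figures on the slide; the function and variable names are illustrative only.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F-measure) as given on the slide.
REPORTED = {
    "dev, all sentences": (96.26, 98.15, 97.20),
    "dev, len < 40": (97.31, 98.57, 97.94),
    "test, all sentences": (92.89, 94.12, 93.50),
    "test, len < 40": (94.09, 94.09, 94.09),
}

for name, (recall, precision, reported) in REPORTED.items():
    computed = f_measure(precision, recall)
    print(f"{name:20} computed {computed:.2f}  reported {reported:.2f}")
```

All four reported F-measures agree with the recomputed values to two decimal places.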
39 Future Work
The parsing is partial to date, as we have not addressed:
Co-ordination: He packed his [clothes] and [shoes] vs. [He packed his clothes] and [left]
PP-attachment: [He] [stabbed] [the man with the knife] vs. [He] [stabbed] [the man] [with the knife]
PP-function: locative vs. stative; adjunct vs. indirect object
Adding additional info in the FS Lexicons, e.g. noun sub-classes, subcategorisation frames for verbs
Irish Text Processing Tools: http://www.scss.tcd.ie/Elaine.UiDhonnchadha/irish.utf8.htm