Published by Helen Ball
1
Linguist’s Assistant: An Analysis of the Required Modifications when Converting a Computational Tagalog Grammar into an Ayta Mag-Indi Grammar
The 13th National NLP Research Symposium
April 21st, 2017
Dr. Tod Allman
Graduate Institute of Applied Linguistics
2
Presentation Overview
1) Introduction to Linguist’s Assistant (LA)
2) Results of the Ayta Mag-Indi Project
3) Surface Structure Similarities between Tagalog and Ayta Mag-Indi
4) Deep Structure Similarities between Tagalog and Ayta Mag-Indi
5) Tests to Determine the Quality of LA’s Texts
3
Linguist’s Assistant: A Multilingual Natural Language Generator
Generates initial draft translations that are:
easily understandable,
grammatically perfect,
semantically equivalent to the source documents, and
at approximately a fifth-grade reading level.
Employs linguistic techniques rather than stochastic techniques.
The texts quadruple the productivity of experienced mother-tongue translators.
4
Model of Linguist’s Assistant
Show video after this slide.
5
Five Components of every NLG System
1) Semantic Representations
2) Ontology
3) Transfer Grammar
4) Synthesizing Grammar
5) Lexicon
6
LA’s Semantic Representational System
Semantically simple concepts in structurally simple sentences.
7
LA’s Feature System - Nouns
Number: Singular, Dual, Trial, Quadrial, Plural, Paucal
Participant Tracking: First Mention, Routine, Generic, Interrogative, Frame Inferable, …
Polarity: Affirmative, Negative
Proximity: Near Speaker and Listener, Near Speaker, Near Listener, Remote within Sight, Remote out of Sight
Person: First, Second, Third, First Inclusive, First Exclusive
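The noun features listed above can be pictured as a small typed record attached to each noun. The sketch below is a hypothetical encoding and not LA's internal format. The class names, the default values, and the `adjust_number` helper are assumptions made for illustration. The helper shows the kind of change a feature adjustment rule might make, collapsing a Number value the target language lacks (for example Dual) into Plural.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Number(Enum):
    SINGULAR = "Singular"
    DUAL = "Dual"
    TRIAL = "Trial"
    QUADRIAL = "Quadrial"
    PLURAL = "Plural"
    PAUCAL = "Paucal"


class Polarity(Enum):
    AFFIRMATIVE = "Affirmative"
    NEGATIVE = "Negative"


class Proximity(Enum):
    NEAR_SPEAKER_AND_LISTENER = "Near Speaker and Listener"
    NEAR_SPEAKER = "Near Speaker"
    NEAR_LISTENER = "Near Listener"
    REMOTE_WITHIN_SIGHT = "Remote within Sight"
    REMOTE_OUT_OF_SIGHT = "Remote out of Sight"


class Person(Enum):
    FIRST = "First"
    SECOND = "Second"
    THIRD = "Third"
    FIRST_INCLUSIVE = "First Inclusive"
    FIRST_EXCLUSIVE = "First Exclusive"


@dataclass(frozen=True)
class NounFeatures:
    """Hypothetical feature bundle for one noun; the defaults are arbitrary."""
    number: Number = Number.SINGULAR
    # The slide's Participant Tracking list is open-ended ("…"), so a plain string is used.
    participant_tracking: str = "Routine"
    polarity: Polarity = Polarity.AFFIRMATIVE
    proximity: Optional[Proximity] = None  # None means "not specified"
    person: Person = Person.THIRD


def adjust_number(features: NounFeatures, supported: set) -> NounFeatures:
    """Hypothetical feature adjustment: map a Number value the target lacks to Plural."""
    if features.number in supported:
        return features
    # Frozen dataclass: build a modified copy instead of mutating the original.
    return replace(features, number=Number.PLURAL)
```

For example, `adjust_number(NounFeatures(number=Number.DUAL), {Number.SINGULAR, Number.PLURAL})` returns a copy whose number is `Number.PLURAL`. The other features are left untouched.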
8
LA’s Ontology Concepts are Precisely Defined
break-A: someone breaks something (John broke the window.)
break-B: to break a bone (John broke his leg.)
break-C: to break or disobey a law (John broke the law.)
break-D: something breaks, intransitive (The window broke.)
break-E: to break a promise (John broke his promise.)
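Each sense of ‘break’ is a separate concept, so a lexicon can pair each concept ID with its own target-language word, and a synthesizer never has to guess which ‘break’ is meant. The following is a minimal sketch of such a sense inventory, assuming concept IDs of the form lemma-letter as on the slide. The `Concept` type and the `senses` helper are hypothetical names.

```python
from typing import NamedTuple


class Concept(NamedTuple):
    concept_id: str  # lemma plus a sense letter, e.g. "break-A"
    gloss: str
    example: str


# The five senses of "break" from the slide, each stored as its own concept.
ONTOLOGY = [
    Concept("break-A", "someone breaks something", "John broke the window."),
    Concept("break-B", "to break a bone", "John broke his leg."),
    Concept("break-C", "to break or disobey a law", "John broke the law."),
    Concept("break-D", "something breaks (intransitive)", "The window broke."),
    Concept("break-E", "to break a promise", "John broke his promise."),
]

# Direct lookup by precise concept ID.
BY_ID = {c.concept_id: c for c in ONTOLOGY}


def senses(lemma: str) -> list:
    """Return every concept whose ID is the lemma followed by '-' and a sense letter."""
    return [c for c in ONTOLOGY if c.concept_id.rsplit("-", 1)[0] == lemma]
```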
9
LA’s Transfer Grammar
10
LA’s Transfer Grammar
1) Insert Complex Concepts: ‘to sign,’ ‘foreigner,’ ‘blind,’ etc.
2) Collocation Correction: ‘ganda’ -> ‘buti’ (good book, person, food)
3) Theta Grid Adjustment Rules:
X respects Y -> X lifts up Y’s name
X loves Y -> X sits happily with Y
X obeys Y -> X hears Y’s talk
4) Structural Adjustment Rules
‘Ganda’ is the word usually used to translate the English word ‘good.’ But for a ‘good book,’ ‘good food,’ or ‘good man,’ we use ‘buti.’
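Two of the transfer rule types above can be pictured as table-driven rewrites. A collocation correction swaps a word based on its head noun. A theta grid adjustment restructures a whole clause, for example turning ‘X respects Y’ into ‘X lifts up Y’s name.’ This is an illustrative sketch, not LA's rule formalism. The `NP`/`Clause` types and the function names are hypothetical, and concepts are written as English labels for readability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NP:
    head: str
    possessor: Optional["NP"] = None  # NP("name", possessor=Y) stands for "Y's name"


@dataclass(frozen=True)
class Clause:
    predicate: str
    subject: NP
    obj: NP


# Collocation correction: (default stem, head noun) -> stem to use instead.
COLLOCATIONS = {("ganda", noun): "buti" for noun in ("book", "person", "food")}


def correct_collocation(stem: str, head_noun: str) -> str:
    """Return the corrected stem, or the default stem if no collocation rule applies."""
    return COLLOCATIONS.get((stem, head_noun), stem)


def adjust_theta_grid(c: Clause) -> Clause:
    """Rewrite clauses whose predicate is expressed idiomatically in the target."""
    if c.predicate == "respect":  # X respects Y -> X lifts up Y's name
        return Clause("lift up", c.subject, NP("name", possessor=c.obj))
    if c.predicate == "love":  # X loves Y -> X sits happily with Y
        return Clause("sit happily with", c.subject, c.obj)
    if c.predicate == "obey":  # X obeys Y -> X hears Y's talk
        return Clause("hear", c.subject, NP("talk", possessor=c.obj))
    return c  # no rule applies; keep the clause unchanged
```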
11
LA’s Synthesizing Grammar
12
LA’s Synthesizing Grammar
1) Spellout Rules: insert case markers (ang/si, ng/ni, sa/kay)
2) Phrase Structure Rules: put constituents in their proper order
3) Pronoun Rules: identify where pronouns can be used
4) Morphophonemic Rules: change the relativizer ‘na’ to ‘-ng’
“The man that saw John …” -> “lalaking nakakita kay Juan …”
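The morphophonemic rule above can be sketched as a function over spelled-out words. In standard Tagalog, the linker ‘na’ becomes the suffix ‘-ng’ after a vowel-final word and ‘-g’ after an n-final word, and it stays a separate ‘na’ otherwise. This sketch works on orthography only. The function name `link` is hypothetical, and LA's actual rule may be stated differently.

```python
VOWELS = set("aeiou")


def link(word: str, next_word: str) -> str:
    """Join a word to what follows it with the Tagalog linker (orthographic sketch)."""
    last = word[-1].lower()
    if last in VOWELS:
        return f"{word}ng {next_word}"  # lalaki + na -> lalaking
    if last == "n":
        return f"{word}g {next_word}"  # n-final word + na -> word + -g
    return f"{word} na {next_word}"  # other consonants: 'na' stays a separate word
```

For example, `link("lalaki", "nakakita kay Juan")` gives “lalaking nakakita kay Juan,” matching the slide, while `link("bahay", "malaki")` keeps the separate linker: “bahay na malaki.”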
13
Ayta Mag-Indi
Approximately 5,000 speakers
Spoken near Pampanga
Language status: stable and developing
Lexical similarity with Filipino: 38%
Lexical similarity with Kapampangan: 51%
Source: Ethnologue.com
14
Number of Meetings (Tagalog / Ayta Mag-Indi)
Story #1 (2 pages): 38 / 6
50 / 7
Note: For Ayta Mag-Indi I didn’t have to change a single rule in the transfer grammar. The only rules I changed were in the synthesizing grammar, particularly the spellout rules and the morphophonemic rules, which I discuss later.
15
Transfer Grammar Rules
Rule type (columns: Edited, New)
Complex Concept Insertion Rules
Feature Adjustment Rules
Styles of Direct Speech
Target Tense/Aspect/Mood Rules
Relative Clause Strategies
Collocation Correction Rules
Genitival Noun-Noun Relationships
Theta Grid Adjustment Rules
Structural Adjustment Rules
16
Example of a Transfer Rule
17
Synthesizing Grammar Rules
Rule type (columns: Edited, New)
Feature Copying Rules
Spellout Rules: 5
Clitic Rules
Movement Rules
Phrase Structure Rules: 1
Pronoun Identification Rules
Pronoun Spellout Rules
Morphophonemic Rules: 34
Find/Replace Rules
18
Example of a Synthesizing Rule
19
Tagalog / Ayta Mag-Indi Case Markers
Columns: Common; Proper (each cell: Tagalog / Ayta Mag-Indi)
Ergative: ng / un; ni / -n
Absolutive: ang / ya; si / si
Oblique: sa / sa; kay / kan
The Tagalog stem is ‘suntok’ and the Ayta Mag-Indi stem is ‘dugê’.
English: John hit Bill.
Tagalog: Sinuntok ni Juan si Bill.
Ayta Mag-Indi: Dinugun Juan si Bill.
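A spellout rule of this kind amounts to a lookup keyed by case and by whether the noun is common or proper. Only the column read from the table differs between the two languages. The following is a minimal sketch built from the case-marker table on this slide. The function names are hypothetical, and the sketch does not model how the Ayta Mag-Indi proper ergative ‘-n’ attaches to the preceding word (as in ‘Dinugun Juan’).

```python
# Case markers from this slide: (case, proper noun?) -> (Tagalog, Ayta Mag-Indi)
CASE_MARKERS = {
    ("ergative", False): ("ng", "un"),
    ("ergative", True): ("ni", "-n"),
    ("absolutive", False): ("ang", "ya"),
    ("absolutive", True): ("si", "si"),
    ("oblique", False): ("sa", "sa"),
    ("oblique", True): ("kay", "kan"),
}
LANGUAGES = ("Tagalog", "Ayta Mag-Indi")


def case_marker(language: str, case: str, proper: bool) -> str:
    """Pick the marker for one noun phrase (ValueError for an unknown language)."""
    return CASE_MARKERS[(case, proper)][LANGUAGES.index(language)]


def spell_out_np(noun: str, case: str, proper: bool, language: str) -> str:
    """Prefix the noun with its case marker; clitic attachment of '-n' is not modeled."""
    return f"{case_marker(language, case, proper)} {noun}"


# "John hit Bill," in the word order of the Tagalog example sentence.
clause = ["Sinuntok",
          spell_out_np("Juan", "ergative", True, "Tagalog"),
          spell_out_np("Bill", "absolutive", True, "Tagalog")]
print(" ".join(clause) + ".")  # Sinuntok ni Juan si Bill.
```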
20
LA Rule that inserts Tagalog Case Markers
21
LA Rule that inserts Ayta Mag-Indi Case Markers
22
Tagalog / Ayta Mag-Indi Personal Pronouns
Columns: Absolutive; Ergative; Oblique (each cell: Tagalog / Ayta Mag-Indi)
1st Sg.: ako / aku; ko / ku; akin / kangku
1st Incl.: tayo / kitamu; natin / tamu; atin / kantamu
1st Excl.: kami / kay; namin / yan; amin / kanyan
2nd Sg.: ka / ka; mo / mu; iyo / kamu
2nd Pl.: kayo / kaw; ninyo / yu; inyo / kamuyu
3rd Sg.: siya / ya; niya / na; kanya / kana
3rd Pl.: sila / sila; nila / la; kanila / kalla
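Seen as data, a pronoun spellout table maps person, number, and case to a form. In the sketch below, switching from Tagalog to Ayta Mag-Indi only switches which column is read. The forms come from the table on this slide. The person keys and the helper names `pronoun` and `shared_forms` are hypothetical.

```python
# Pronoun forms from this slide: person -> {case: (Tagalog, Ayta Mag-Indi)}
PRONOUNS = {
    "1sg":   {"absolutive": ("ako", "aku"),     "ergative": ("ko", "ku"),      "oblique": ("akin", "kangku")},
    "1incl": {"absolutive": ("tayo", "kitamu"), "ergative": ("natin", "tamu"), "oblique": ("atin", "kantamu")},
    "1excl": {"absolutive": ("kami", "kay"),    "ergative": ("namin", "yan"),  "oblique": ("amin", "kanyan")},
    "2sg":   {"absolutive": ("ka", "ka"),       "ergative": ("mo", "mu"),      "oblique": ("iyo", "kamu")},
    "2pl":   {"absolutive": ("kayo", "kaw"),    "ergative": ("ninyo", "yu"),   "oblique": ("inyo", "kamuyu")},
    "3sg":   {"absolutive": ("siya", "ya"),     "ergative": ("niya", "na"),    "oblique": ("kanya", "kana")},
    "3pl":   {"absolutive": ("sila", "sila"),   "ergative": ("nila", "la"),    "oblique": ("kanila", "kalla")},
}
LANGUAGES = ("Tagalog", "Ayta Mag-Indi")


def pronoun(person: str, case: str, language: str) -> str:
    """Spell out one pronoun for the given language."""
    return PRONOUNS[person][case][LANGUAGES.index(language)]


def shared_forms() -> list:
    """List (person, case, form) cells where both languages use the same form."""
    return [(p, c, t) for p, cases in PRONOUNS.items()
            for c, (t, a) in cases.items() if t == a]
```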
23
Tagalog / Ayta Mag-Indi Possessive Pronouns
1st Sg.: aking / ku
1st Incl.: ating / tamu
1st Excl.: aming / yan
2nd Sg.: iyong / mu
2nd Pl.: inyong / yu
3rd Sg.: kanyang / na
3rd Pl.: kanilang / la
24
Tagalog Pronoun Rule
25
Tagalog Possessive Pronouns
“I saw John’s book.” -> “Nakita ko ang aklat ni Juan.”
“I saw his book.” -> “Nakita ko ang kanyang aklat.”
If we put the possessive pronoun after the noun, the sentence becomes “Nakita ko ang aklat niya.”
“I saw my book.” -> “Nakita ko ang aking aklat.” or “Nakita ko ang aklat ko.”
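Each Tagalog possessive pronoun has two forms. The form placed before the noun is the one in the possessive pronoun table (‘kanyang’). The form placed after the noun matches the ergative personal pronoun (‘niya’). A minimal sketch of that choice is shown below, assuming the absolutive marker ‘ang’ as in the examples. The table layout and the `possessed_np` name are hypothetical.

```python
# person: (form before the noun, form after the noun), from the pronoun tables
POSSESSIVE = {
    "1sg": ("aking", "ko"),
    "1incl": ("ating", "natin"),
    "1excl": ("aming", "namin"),
    "2sg": ("iyong", "mo"),
    "2pl": ("inyong", "ninyo"),
    "3sg": ("kanyang", "niya"),
    "3pl": ("kanilang", "nila"),
}


def possessed_np(noun: str, person: str, before_noun: bool = True, marker: str = "ang") -> str:
    """Build a possessed noun phrase with the pronoun either before or after the noun."""
    pre, post = POSSESSIVE[person]
    return f"{marker} {pre} {noun}" if before_noun else f"{marker} {noun} {post}"
```

For example, `possessed_np("aklat", "3sg")` gives “ang kanyang aklat,” and `possessed_np("aklat", "3sg", before_noun=False)` gives “ang aklat niya.”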
26
Tagalog Possessive Pronouns
27
Ayta Mag-Indi Possessive Pronouns
“I saw John’s book.” -> “Nakit kuy libron Juan.”
“I saw his book.” -> “Nakit kuy libron na.”
28
Ayta Mag-Indi Possessive Pronouns
29
Ayta Mag-Indi Morphophonemic Rule
30
Tagalog and Ayta Mag-Indi Particles
Particle: Tagalog / Ayta Mag-Indi
Relativizer: na / ya
Complementizer
Possessive Marker: ni (mata ni Juan) / -n (mata-n Juan)
Adjectivizer: ma- (ma-linis)
Adverbializer: (ma-buti) / (ma-ngêd)
Verb Phrase Ligature: -ng (Pumarito ka-ng mabilis …) / -n (Maku ka-n tambêng …)
Glosses: ‘mata ni Juan’ = John’s eyes; ‘malinis’ = clean; ‘mabuti’ = thoroughly; ‘pumarito kang mabilis’ = ‘come quickly …’
31
Similarities at Surface Structure
English: Title: Melissa’s Eyes are Sore
Tagalog: Pamagat: Makirot ang mga mata ni Melissa.
Ayta Mag-Indi: Pamagat: Makirot ya mani matan Melissa.
32
Similarities at Surface Structure
English: But Melissa was not happy because her eyes were very sore.
Tagalog: Ngunit hindi masaya si Melissa dahil napakakirot ng kanyang mga mata.
Ayta Mag-Indi: Nuwa asê masaya si Melissa gawan napakakirot un mani mata na.
Tagalog can also put a possessive pronoun after the noun, but it uses a different form. This sentence would then end “… ng mga mata niya.”
33
Similarities at Deep Structure: Deletion of Verb Phrase Ligature
English: Melissa shouted, “Alex, come into my house.”
Tagalog: Sumigaw si Melissa, “Alex, pumarito ka sa loob ng aking bahay.”
Ayta Mag-Indi: Nan-angaw si Melissa, “Alex, maku ka sa lalên bali ku.”
Both languages delete their verb phrase ligature (‘-ng’ in Tagalog, ‘-n’ in Ayta Mag-Indi) when the next word is a preposition that begins with ‘sa’.
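Because the ligature deletion condition is the same in both languages, one rule parameterized by the ligature string covers both. The sketch below makes two assumptions. First, it assumes the only preposition involved is the word ‘sa’, as in both examples on the slides. Second, it assumes the verb phrase ends in a vowel, so Tagalog takes ‘-ng’ rather than a separate ‘na’. The function name is hypothetical.

```python
def attach_ligature(vp: str, next_phrase: str, ligature: str) -> str:
    """Join a verb phrase to the next phrase, dropping the ligature before 'sa'."""
    first_word = next_phrase.split()[0].lower()
    if first_word == "sa":  # assumed: the preposition starts with the word 'sa'
        return f"{vp} {next_phrase}"
    return f"{vp}{ligature} {next_phrase}"  # e.g. ka + ng -> kang
```

With `ligature="ng"` this gives “pumarito kang mabilis” but “pumarito ka sa loob ng aking bahay.” With `ligature="n"` it gives “maku kan tambêng” but “maku ka sa lalên bali ku.”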
34
Similarities at Deep Structure: Pronominal Length
English: He gave a book to you.
Tagalog: Binigyan ka niya ng libro.
(gave you he Ergative book)
Ayta Mag-Indi: Binyan na ka-n libru.
(gave he you-Ergative book)
In both languages, single-syllable pronouns precede multi-syllable pronouns. In the Tagalog example, the single-syllable ‘ka’ precedes the two-syllable ‘niya’. In the Ayta Mag-Indi example, both pronouns are single-syllable, so the subject pronoun precedes the indirect object pronoun.
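The ordering constraint can be expressed as a stable sort on a single key, namely whether a pronoun has more than one syllable. The sort is stable, so when the key ties the subject stays first, as in the Ayta Mag-Indi example. This is a sketch with two assumptions. It counts syllables by counting vowel letters, which is a heuristic. It also applies the subject-first tie-break to any tie, whereas the slide states it only for two single-syllable pronouns.

```python
VOWELS = set("aeiouê")


def syllables(word: str) -> int:
    """Approximate the syllable count as the number of vowel letters (heuristic)."""
    return sum(ch in VOWELS for ch in word.lower())


def order_pronouns(subject: str, other: str) -> list:
    """Put single-syllable pronouns before longer ones; stable, so ties keep the subject first."""
    return sorted([subject, other], key=lambda p: syllables(p) > 1)
```

For example, `order_pronouns("niya", "ka")` gives `["ka", "niya"]` as in “Binigyan ka niya …”, and `order_pronouns("na", "ka")` gives `["na", "ka"]` as in “Binyan na ka-n …”.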
35
Malayo-Polynesian Language Family Tree
The next language I might work in is Rinconada, which is very closely related to Bikol, so it should be much more similar to Tagalog than Ayta Mag-Indi is.
36
Things LA Cannot Do
gising “to wake up”: gumising vs. nagising
takas “to escape”: tumakas vs. nakatakas
‘Gumising’ means to wake up because you’ve had enough sleep, but ‘nagising’ means to wake up because of some disturbance such as a loud jeepney, a dream, or an earthquake. ‘Tumakas’ means a premeditated escape, but ‘nakatakas’ means a spur-of-the-moment escape. Our semantic representations don’t include this kind of information, so the software can’t choose between these forms. Instead, we use whichever one we think is most common.
37
Experiments for Testing the Content and Quality of the Texts
Backtranslation Experiments
Comprehension Questions
Productivity Experiments
Quality Experiments
38
Quality Experiments for Jula
LA / Manual / Equal: 12 / 11 / 17
39
Quality Experiments for Korean
LA / Manual / Equal: 88 / 71 / 33
40
Quality Experiments for Tagalog
LA / Manual / Equal: 53 / 60 / 56
24 control questionnaires – 2 outliers
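To compare the three quality experiments on a common scale, the raw counts can be turned into proportions. The sketch below rests on several assumptions. It assumes each number counts judgments that preferred LA's text, preferred the manually translated text, or rated the two equal. It assumes the unlabeled first Jula column is "LA", as on the Korean and Tagalog slides. It also ignores the Tagalog control questionnaires and outliers. The dictionary and helper names are hypothetical.

```python
from fractions import Fraction

# Counts from the quality-experiment slides, in the order (LA, Manual, Equal).
RESULTS = {
    "Jula": (12, 11, 17),
    "Korean": (88, 71, 33),
    "Tagalog": (53, 60, 56),
}


def summarize(la: int, manual: int, equal: int) -> dict:
    """Total judgments, share preferring LA, and share rating LA at least as good."""
    total = la + manual + equal
    return {
        "total": total,
        "la_share": Fraction(la, total),
        "la_or_equal_share": Fraction(la + equal, total),
    }


for language, counts in RESULTS.items():
    s = summarize(*counts)
    print(f"{language}: n={s['total']}, LA preferred {float(s['la_share']):.1%}, "
          f"LA or equal {float(s['la_or_equal_share']):.1%}")
```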