Development of a German-English Translator, Felix Zhang

1 Development of a German-English Translator
Felix Zhang

2 Summary of Quarter 1
- Rule-based part-of-speech tagging
- Morphological analysis
- Created a dictionary
- Completely avoided statistical methods

3 Scope Expanded
- Now includes statistical methods
  - Part-of-speech tagging using a corpus
  - Rule-based tagging only as a backup

4 Statistical language processing
- State of the art
  - Estimate the probability that an n-gram translates into a given phrase in the other language
- This project's method is much simpler than current techniques
  - Context-free
  - Based on frequency of occurrence
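The state-of-the-art idea on this slide can be made concrete with the usual relative-frequency estimate from phrase-based statistical translation: the probability that a German n-gram f translates as an English n-gram e is count(f, e) / count(f). This is a minimal sketch, not the project's method; it assumes the aligned n-gram pairs have already been extracted from a bilingual corpus, and the function name and sample pairs are hypothetical.

```python
from collections import Counter


def translation_table(aligned_pairs):
    """Estimate p(english | german) by relative frequency.

    aligned_pairs: (german_ngram, english_ngram) tuples, assumed to have been
    extracted beforehand from a word-aligned bilingual corpus (hypothetical input).
    """
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(src for src, _ in aligned_pairs)
    # count(f, e) / count(f) for every observed pair
    return {
        (src, tgt): n / source_counts[src]
        for (src, tgt), n in pair_counts.items()
    }


pairs = [
    ("der Mann", "the man"),
    ("der Mann", "the man"),
    ("der Mann", "man"),
    ("die Kinder", "the children"),
]
table = translation_table(pairs)
print(table[("der Mann", "the man")])  # → 0.6666666666666666
```

Each probability depends only on the n-gram pair counts, with no surrounding context, which is what makes a purely frequency-based approach simple to implement.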

5 New Components
- Lemmatizer
- Noun-verb agreement
- Inflection
- Dictionary lookup
- Noun-phrase chunking
- Statistical part-of-speech tagging

6 Lemmatizer
- Breaks words down into their root forms
- Takes information from the morphological analysis
- Skips stop words (here, the articles der and die)
- Sample input: “Der Mann macht die Kinder” (“The man makes the children”)
- Output: [['Mann', ['Mann']], ['macht', ['machen']], ['Kinder', ['Kinder', 'Ki', 'Kinde', 'Kind']]]
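A minimal suffix-stripping sketch of such a lemmatizer, assuming the simplified tags printed in the full program run ('art', 'nou', 'ver'). The stop-word list, the suffix lists, and the function names are hypothetical. It does not reproduce the project's candidate list exactly (it yields only ['Kinder', 'Kind'] for Kinder), and its verb rule fails on stem-changing verbs such as liest.

```python
STOP_WORDS = {"der", "die", "das", "den", "dem", "des", "ein", "eine"}  # hypothetical
NOUN_SUFFIXES = ("er", "en", "e", "s")  # hypothetical plural/case endings
VERB_ENDINGS = ("st", "t", "e")  # present-tense endings (du; er/sie/es/ihr; ich)


def verb_infinitive(word):
    """Guess the infinitive of a regular present-tense verb: macht -> machen."""
    for ending in VERB_ENDINGS:
        if word.endswith(ending):
            return word[: -len(ending)] + "en"
    return word  # already looks like an infinitive (or is unknown)


def lemmatize(tagged):
    """tagged: (word, tag) pairs; returns [word, candidate_roots] lists."""
    result = []
    for word, tag in tagged:
        if word.lower() in STOP_WORDS:
            continue  # stop words are not lemmatized
        if tag == "ver":
            candidates = [verb_infinitive(word)]
        elif tag == "nou":
            # keep the word itself, plus each form with a known suffix removed
            candidates = [word] + [
                word[: -len(s)]
                for s in NOUN_SUFFIXES
                if word.endswith(s) and len(word) > len(s) + 1
            ]
        else:
            candidates = [word]
        result.append([word, candidates])
    return result


sentence = [("Der", "art"), ("Mann", "nou"), ("macht", "ver"),
            ("die", "art"), ("Kinder", "nou")]
print(lemmatize(sentence))
# → [['Mann', ['Mann']], ['macht', ['machen']], ['Kinder', ['Kinder', 'Kind']]]
```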

7 Dictionary Lookup
- Contains all pronouns and definite articles
- Plus a small sample of nouns and verbs for testing
- Looks up the lemmatized words
- Output: [['der', 'the'], ['Mann', 'man'], ['macht', 'make'], ['die', 'the'], ['Kinder', 'child']]
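A minimal sketch of the lookup step: try the surface form first (which catches articles and pronouns), then each lemma candidate in order, and stop at the first hit. The toy dictionary and the function names are hypothetical; unlike the slide's output, this sketch keeps the original capitalization of the input word ('Der').

```python
# Hypothetical toy dictionary: German root form -> English root form.
DICTIONARY = {
    "der": "the", "die": "the", "das": "the",
    "er": "he", "es": "it",
    "Mann": "man", "Kind": "child",
    "machen": "make", "sehen": "see",
}


def lookup(word, candidates):
    """Return the translation of the first form found in the dictionary.

    Tries the lowercased word (for sentence-initial articles), the word itself,
    then each lemma candidate in order; returns None if nothing matches.
    """
    for form in [word.lower(), word, *candidates]:
        if form in DICTIONARY:
            return DICTIONARY[form]
    return None


def translate_roots(words, lemmas):
    """words: the sentence; lemmas: [word, candidates] lists from the lemmatizer."""
    candidates_for = {word: candidates for word, candidates in lemmas}
    return [[word, lookup(word, candidates_for.get(word, []))] for word in words]


# Lemmatizer output copied from the slide above.
lemmas = [['Mann', ['Mann']], ['macht', ['machen']],
          ['Kinder', ['Kinder', 'Ki', 'Kinde', 'Kind']]]
print(translate_roots(["Der", "Mann", "macht", "die", "Kinder"], lemmas))
# → [['Der', 'the'], ['Mann', 'man'], ['macht', 'make'], ['die', 'the'], ['Kinder', 'child']]
```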

8 Noun phrase chunking
- Groups noun phrases into “chunks”
- Example: “The old man greets young children.”
  - Groupings: [The old man], [greets], [young children]
- Used for building parse trees and for noun-verb agreement
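A minimal tag-based chunker reproduces the grouping in the example: a run of articles and adjectives is held until a noun closes it, and every other word becomes a chunk of its own. The tag names follow the program run plus a hypothetical 'adj' tag; real German noun phrases (genitive attributes, prepositional phrases) would need more rules.

```python
def chunk(tagged):
    """Group (word, tag) pairs into chunks.

    Tags are assumed to be 'art', 'adj', 'nou', 'ver' (the 'adj' tag is
    hypothetical). An article/adjective run followed by a noun forms one
    noun-phrase chunk; every other word becomes its own chunk.
    """
    chunks, current = [], []
    for word, tag in tagged:
        if tag in ("art", "adj"):
            current.append(word)  # determiner or modifier: wait for the noun
        elif tag == "nou":
            current.append(word)
            chunks.append(current)  # the noun closes the noun phrase
            current = []
        else:
            if current:  # flush an unfinished phrase (e.g. a stray article)
                chunks.append(current)
                current = []
            chunks.append([word])
    if current:
        chunks.append(current)
    return chunks


example = [("The", "art"), ("old", "adj"), ("man", "nou"),
           ("greets", "ver"), ("young", "adj"), ("children", "nou")]
print(chunk(example))
# → [['The', 'old', 'man'], ['greets'], ['young', 'children']]
```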

9 Statistical Tagging
- Monolingual corpus: the TIGER Corpus (German)
- Tags assigned by frequency of tag occurrence
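One simple way to tag by frequency of tag occurrence is a most-frequent-tag (unigram) tagger, with a rule-based guess as the backup described on slide 3. This sketch assumes the corpus has already been read into (word, tag) pairs; it does not parse the TIGER file format, and the toy corpus, the function names, and the `guess_tag` rules are hypothetical.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_corpus):
    """Map each word to the tag it carries most often in the corpus."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def guess_tag(word):
    """Hypothetical rule-based backup for words never seen in the corpus."""
    if word[:1].isupper():
        return "nou"  # German nouns are capitalized
    if word.endswith(("en", "t")):
        return "ver"  # typical infinitive / present-tense endings
    return "unk"


def tag_sentence(words, model):
    return [[word, model.get(word) or guess_tag(word)] for word in words]


toy_corpus = [("der", "art"), ("die", "art"), ("die", "art"),
              ("die", "pro"), ("Mann", "nou")]
model = train_unigram_tagger(toy_corpus)
print(tag_sentence(["der", "Mann", "macht", "die", "Kinder"], model))
# → [['der', 'art'], ['Mann', 'nou'], ['macht', 'ver'], ['die', 'art'], ['Kinder', 'nou']]
```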

10 Noun-verb Agreement
- Disambiguation example: Der Mann sieht die Kinder. (The man sees the children.)
  - Der Mann: feminine singular indirect object, or masculine singular subject
  - Die Kinder: feminine singular subject / direct object, or plural subject / direct object
  - Sieht: third person singular; or, judging by the -t ending alone, second person plural (the actual second person plural form is seht)
- Der Mann “agrees” with the verb (same number and person) only as a masculine singular subject, so that reading is selected
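Noun-verb agreement can be sketched as a filter over the readings produced by the morphological analysis. The sketch assumes the reading format printed in the full program run ([case, gender] for singular nouns, [case, 'pl'] for plurals, [person, number] for verbs); the function names are hypothetical, and taking the first agreeing noun as the subject is a simplifying assumption, since German word order does not guarantee that the subject comes first.

```python
def number_of(reading):
    """Assumed encoding: [case, 'mas'/'fem'/'neu'] is singular, [case, 'pl'] is plural."""
    return "pl" if reading[1] == "pl" else "sing"


def agree(nouns, verb_readings):
    """Pick the subject and prune readings by noun-verb agreement.

    nouns: [word, readings] lists in sentence order.
    verb_readings: [person, number] lists, e.g. [['3', 'sing'], ['2', 'pl']].
    A noun phrase is third person, so a nominative reading agrees with the
    verb when ['3', <same number>] is among the verb readings.
    """
    for i, (word, readings) in enumerate(nouns):
        subject = [r for r in readings
                   if r[0] == "nom" and ["3", number_of(r)] in verb_readings]
        if subject:
            verbs = [v for v in verb_readings
                     if v[0] == "3" and any(v[1] == number_of(r) for r in subject)]
            pruned = list(nouns)
            pruned[i] = [word, subject]
            return pruned, verbs
    return nouns, verb_readings  # no agreeing subject found: leave everything


nouns = [["Mann", [["nom", "mas"], ["dat", "fem"]]],
         ["Kinder", [["nom", "fem"], ["akk", "fem"], ["nom", "pl"], ["akk", "pl"]]]]
verb = [["3", "sing"], ["2", "pl"]]
print(agree(nouns, verb))
```

With the readings from the program run, this keeps only ['nom', 'mas'] for Mann and ['3', 'sing'] for macht, and leaves the readings of Kinder untouched.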

11 Inflection
- Simple in English
  - Plurals: add -s or -es
  - Third-person singular verbs: add -s or -es
- Not yet added: past tense
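A minimal sketch of the English rule: append -es after a sibilant ending (s, x, z, ch, sh) and -s otherwise. The irregular-plural table and the function names are hypothetical additions; without such a table the rule produces "childs", as in the full program run. Other spelling rules (carry → carries, go → goes) are not handled.

```python
# Hypothetical exception table for nouns the -s/-es rule gets wrong.
IRREGULAR_PLURALS = {"man": "men", "woman": "women", "child": "children"}


def add_s(word):
    """Append -es after a sibilant (s, x, z, ch, sh), otherwise -s."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    return word + "s"


def pluralize(noun):
    return IRREGULAR_PLURALS.get(noun, add_s(noun))


def third_person_singular(verb):
    return add_s(verb)


print(third_person_singular("make"), pluralize("child"), pluralize("box"))
# → makes children boxes
```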

12 Full run of program
fzhang@kilauea ~/research $ python proj.py
Part of speech tags: [['der', 'art'], ['Mann', 'nou'], ['macht', 'ver'], ['die', 'art'], ['Kinder', 'nou']]
Morphological analysis: [[['Mann', 'nou'], [['nom', 'mas'], ['dat', 'fem']]], [['macht', 'ver'], [['3', 'sing'], ['2', 'pl']], 'pres'], [['Kinder', 'nou'], [['nom', 'fem'], ['akk', 'fem'], ['nom', 'pl'], ['akk', 'pl']]]]
Disambiguated after noun-verb agreement: [[['Mann', 'nou'], [['nom', 'nou']]], [['macht', 'ver'], [['3', 'sing']], 'pres'], [['Kinder', 'nou'], [['nom', 'fem'], ['akk', 'fem'], ['nom', 'pl'], ['akk', 'pl']]]]
Lemmatized: [['Mann', ['Mann']], ['macht', ['machen']], ['Kinder', ['Kinder', 'Ki', 'Kinde', 'Kind']]]
Root translated: [['der', 'the'], ['Mann', 'man'], ['macht', 'make'], ['die', 'the'], ['Kinder', 'child']]
Inflected: man makes child child childs childs

13 Results
- Noun-verb agreement can disambiguate readings and identify the subject, reducing the analyses to 2-3 possibilities
- Statistical methods are NOT too complex to implement
  - Tagging should reach 90% accuracy

14 Problems
- Irregular verbs: stem changes
  - Third person singular conjugation
    - Expected (regular rule): lesen → er lest
    - Actual: lesen → er liest
- Strong verbs vs. weak verbs: past tenses
  - Weak: machen → gemacht
  - Strong: gehen → gegangen
  - Past-tense forms must be included in the dictionary
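The contrast on this slide can be sketched as regular rules plus stored exceptions: the rule builds er lest and gemacht, while liest and gegangen have to come from a table. The exception tables and function names are hypothetical, and the regular rules also ignore other cases, such as stems ending in -t or -d (arbeiten → arbeitet) and inseparable prefixes (besuchen → besucht).

```python
# Hypothetical exception tables: forms the regular rules cannot produce.
STEM_CHANGE_3SG = {"lesen": "liest", "sehen": "sieht", "geben": "gibt"}
STRONG_PARTICIPLES = {"gehen": "gegangen", "lesen": "gelesen", "sehen": "gesehen"}


def stem(infinitive):
    """machen -> mach, wandern -> wander."""
    if infinitive.endswith("en"):
        return infinitive[:-2]
    return infinitive[:-1]


def present_3sg(infinitive):
    """er/sie/es form: stored form for stem-changing verbs, else stem + t."""
    return STEM_CHANGE_3SG.get(infinitive, stem(infinitive) + "t")


def past_participle(infinitive):
    """Weak verbs: ge + stem + t; strong verbs must come from the table."""
    return STRONG_PARTICIPLES.get(infinitive, "ge" + stem(infinitive) + "t")


print(present_3sg("machen"), present_3sg("lesen"), stem("lesen") + "t")
# → macht liest lest
print(past_participle("machen"), past_participle("gehen"))
# → gemacht gegangen
```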

15 Problems
- The corpus file is huge (42 megabytes)
  - Impractical: the program takes a long time to run
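One common mitigation is to read the corpus once, line by line, and cache the derived word-to-tag table so that later runs skip the large file. This sketch assumes a hypothetical plain-text export with one tab-separated word/tag pair per line (not the TIGER corpus's own distribution format); the function names and cache paths are illustrative.

```python
import json
import os
from collections import Counter, defaultdict


def build_tag_table(lines):
    """Count tags per word from an iterable of tab-separated word/tag lines."""
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:  # skip blank separator lines and malformed rows
            counts[fields[0]][fields[1]] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def load_tag_table(corpus_path, cache_path):
    """Reuse the cached table unless the corpus file is newer than the cache."""
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(corpus_path)):
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)
    with open(corpus_path, encoding="utf-8") as f:
        table = build_tag_table(f)  # iterating a file object streams one line at a time
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(table, f, ensure_ascii=False)
    return table
```

Only the small per-word table stays in memory, and after the first run only the JSON cache is read.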

16 Future research
- Implement more statistical methods
  - Morphological information
  - Actual translation, using a bilingual corpus
- Create a parse tree based on an actual grammar
- Find a method for predicting stem changes in strong verbs

