Published by Nathan Simpson. Modified over 9 years ago.
1
Development of a German-English Translator
Felix Zhang
Period 5
2007-2008
Thomas Jefferson High School for Science and Technology
Computer Systems Research Lab
2
Summary of Quarter 2
NP Chunking
Lemmatization
Dictionary Lookup
Inflection
Noun-verb agreement
3
Scope for this quarter
Focus less on statistical methods
Get rudimentary grammar system working
Fix all the bugs I’ve made since September
4
New and Modified Components
More info stored in NP chunking
Better noun-verb agreement
Grammar
–Element Assignment
–Priority Number Assignment
5
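Noun-verb agreement
Simple method to eliminate more ambiguities

    def eliminateother(attribs, sub, closest):
        # Remove nominative readings from every noun other than the subject.
        # `closest` is accepted but not used in this method.
        for x in attribs:
            if x[0][1] == "nou" and x != sub:
                # Rebuild the reading list in place; calling remove() while
                # iterating over the same list can skip elements.
                x[1][:] = [y for y in x[1] if y[0] != "nom"]
        return attribs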
6
Noun phrase chunking
Now used for English sentences
Stores more info for later methods
“the man make the children”
NP Chunked English:
[[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]]],
 ['make', 'ver', [['3', 'pl'], 'pres']],
 [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]]]]
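A minimal sketch of how chunking of this kind might work, assuming tokens arrive as [word, tag, ...] lists and that a noun phrase is any run of articles and adjectives closed by a noun. The function name chunk_noun_phrases is hypothetical and is not taken from proj.py.

```python
def chunk_noun_phrases(tokens):
    """Group article/adjective runs that end in a noun into one chunk.

    Hypothetical helper: each token is a list such as ['man', 'nou', [...]].
    """
    chunks = []
    pending = []  # articles and adjectives waiting for their noun
    for token in tokens:
        tag = token[1]
        if tag in ("art", "adj"):
            pending.append(token)
        elif tag == "nou":
            # The noun closes the phrase; morphological info stays attached.
            chunks.append(pending + [token])
            pending = []
        else:
            # Any other token ends an unfinished phrase; leftover modifiers
            # are emitted as bare tokens.
            chunks.extend(pending)
            pending = []
            chunks.append(token)
    chunks.extend(pending)
    return chunks


tokens = [
    ["the", "art"],
    ["man", "nou", [["akk", "mas"], ["dat", "pl"]]],
    ["make", "ver", [["3", "pl"], "pres"]],
    ["the", "art"],
    ["small", "adj"],
    ["child", "nou", [["nom", "pl"]]],
]
chunked = chunk_noun_phrases(tokens)
```

On this input the sketch yields three chunks: two noun phrases with the verb left between them as a bare token.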
7
Element Assignment
Based on linguistic information
If case is nominative, chunk is subject
If accusative, chunk is direct object
[[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'],
 ['make', 'ver', [['3', 'pl'], 'pres'], 'mverb'],
 [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]
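The two case rules can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: assign_elements is a hypothetical name, the noun is assumed to be the last token of each phrase, nominative is checked before accusative, and dative and auxiliary verbs are not handled.

```python
def assign_elements(chunks):
    """Append 'sub', 'dobj' or 'mverb' to each chunk (hypothetical sketch)."""
    labelled = []
    for chunk in chunks:
        if isinstance(chunk[0], str):
            # Bare token such as ['make', 'ver', ...]
            label = "mverb" if chunk[1] == "ver" else None
        else:
            noun = chunk[-1]  # assumed: the noun closes the phrase
            cases = [reading[0] for reading in noun[2]]
            if "nom" in cases:
                label = "sub"
            elif "akk" in cases:
                label = "dobj"
            else:
                label = None
        # Chunks with no matching rule are passed through unlabelled.
        labelled.append(chunk + [label] if label else chunk)
    return labelled


chunks = [
    [["the", "art"], ["man", "nou", [["akk", "mas"], ["dat", "pl"]]]],
    ["make", "ver", [["3", "pl"], "pres"]],
    [["the", "art"], ["small", "adj"], ["child", "nou", [["nom", "pl"]]]],
]
labelled = assign_elements(chunks)
roles = [chunk[-1] for chunk in labelled]
```

Here roles comes out as dobj, mverb, sub, in German word order.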
8
Priority Assignment
Each sentence element is assigned priority number
Based on position in English sentence
Assignments:
–sub 1
–mverb 2
–auxverb 3
–iobj 4
–dobj 5
Sort by number for English grammar
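The reordering step amounts to a sort keyed on the priority table above. A minimal sketch, assuming each chunk carries its element label as the last item; PRIORITY and to_english_order are hypothetical names, and unlike the program's output this version does not prepend the number to the chunk.

```python
# Priority numbers as listed on the slide.
PRIORITY = {"sub": 1, "mverb": 2, "auxverb": 3, "iobj": 4, "dobj": 5}


def to_english_order(labelled_chunks):
    """Return the chunks sorted by the priority of their element label."""
    return sorted(labelled_chunks, key=lambda chunk: PRIORITY[chunk[-1]])


labelled = [
    [["the", "art"], ["man", "nou", [["akk", "mas"], ["dat", "pl"]]], "dobj"],
    ["make", "ver", [["3", "pl"], "pres"], "mverb"],
    [["the", "art"], ["small", "adj"], ["child", "nou", [["nom", "pl"]]], "sub"],
]
ordered = to_english_order(labelled)
order = [chunk[-1] for chunk in ordered]
```

Because sorted is stable, two chunks with the same label would keep their original relative order.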
9
Full run of program
input: “den Mann machen die kleinen Kinder”
The small children make the man

fzhang@ltsp1 ~/research $ python proj.py
Part of speech tags:
[['den', 'art'], ['Mann', 'nou'], ['machen', 'ver'], ['die', 'art'], ['kleinen', 'adj'], ['Kinder', 'nou']]
Morphological analysis:
[[['Mann', 'nou'], [['akk', 'mas'], ['dat', 'pl']]], [['machen', 'ver'], [['1', 'pl'], ['3', 'pl'], 'pres']], [['kleinen', 'adj'], [['nom', 'pl'], ['akk', 'pl']]], [['Kinder', 'nou'], [['nom', 'pl'], ['akk', 'pl']]]]
Disambiguated after noun-verb agreement:
[[['Mann', 'nou'], [['akk', 'mas'], ['dat', 'pl']]], [['machen', 'ver'], [['3', 'pl'], 'pres']], [['kleinen', 'adj'], [['nom', 'pl'], ['akk', 'pl']]], [['Kinder', 'nou'], [['nom', 'pl']]]]
Lemmatized:
[['Mann', ['Mann', 'Man']], ['machen', ['machen']], ['kleinen', ['klein']], ['Kinder', ['Kind']]]
Root translated:
[['den', 'the'], ['Mann', 'man'], ['machen', 'make'], ['die', 'the'], ['kleinen', 'small'], ['Kinder', 'child']]
NP Chunked English:
[[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]]], ['make', 'ver', [['3', 'pl'], 'pres']], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]]]]
Inflected (only works before chunking):
['the', 'the']
['man', ['akk', 'mas'], 'man']
['man', ['dat', 'pl'], 'mans']
['make', ['3', 'pl'], 'make']
['the', 'the']
['small', 'small']
['child', ['nom', 'pl'], 'childs']
Assigned an element type:
[[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]
Assigned priority:
[['5', ['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['2', 'make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], ['1', ['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]
Rearranged to English structure:
[['1', ['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub'], ['2', 'make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], ['5', ['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj']]
10
Problems
Ambiguities (again)
–One ambiguity can change the entire structure of the sentence
–“I gave a horse the hat” vs. “I gave the hat a horse”
–Attempt at all permutations possible
User disambiguation
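One way to attempt every permutation is to enumerate each combination of case readings and keep only those in which no two nouns claim the same case. The sketch below assumes a hypothetical input where both nouns could be either dative or accusative; candidate_readings and the sample data are illustrative and do not come from the program.

```python
from itertools import product


def candidate_readings(phrases):
    """Yield every combination of one case reading per noun.

    `phrases` maps a noun to its list of possible cases (hypothetical format).
    """
    nouns = list(phrases)
    for combo in product(*(phrases[noun] for noun in nouns)):
        yield dict(zip(nouns, combo))


# Hypothetical ambiguity: each noun may be indirect (dat) or direct (akk) object.
phrases = {"horse": ["dat", "akk"], "hat": ["dat", "akk"]}

# Keep only the readings in which each noun has a different case.
candidates = [
    reading
    for reading in candidate_readings(phrases)
    if len(set(reading.values())) == len(reading)
]
```

The two surviving candidates correspond to the two English sentences on the slide, which is where a user could be asked to choose.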
11
Problems
Inflexible
–Grammar can only be rearranged in one specific way
–Subject – Main verb – Indirect – Direct – Auxiliary Verb
–Does not accommodate prepositions, conjunctions, etc.
12
Future research
Implement more statistical methods
–Morphological info
–Actual translation – bilingual corpus
Create better parse tree – Dependency grammar