1 Phrase alignment of Estonian-German parallel treebanks Heli Uibo and Krista Liin, University of Tartu Martin Volk, Stockholm University
2 Aim and motivation Aim – to align the phrases of two corpora that are translations of each other. Motivation: –Example-Based Machine Translation (EBMT) –Cross-language and translation studies
3 Phrase alignment: example
4 Existing resource – The Sofie Parallel Treebank (password protected): http://omilia.uio.no/sofie/ 9 European languages, including German and Estonian; initiated by the Nordic Treebank Network; chapters 1-2 of Jostein Gaarder’s novel “Sophie’s World”; sentences aligned; syntactic structure and functions annotated, but different annotation schemes used: –German – TIGER (http://www.ims.uni-stuttgart.de/projekte/TIGER/) –Estonian – VISL (http://beta.visl.sdu.dk)
5 Automatic alignment of Estonian-German NPs This is the first automatic alignment of Estonian-X parallel corpora below the sentence level. We started with the automatic alignment of NPs, because –noun phrases carry an important part of the sentence's meaning; –NPs are the most frequent phrase type in these languages.
6 The NP alignment method 1. Find all noun phrases in the parallel sentences. Sofie legte dann immer einen dicken Stapel Post auf den Küchentisch, ehe sie auf ihr Zimmer ging, um ihre Aufgaben zu machen. Tavaliselt pani ta paksu pataka posti köögilauale, enne kui läks üles oma tuppa koolitöid tegema.
7 The NP alignment method 2. Find all correspondences between the noun phrases. Sofie legte dann immer einen dicken Stapel Post auf den Küchentisch, ehe sie auf ihr Zimmer ging, um ihre Aufgaben zu machen. Tavaliselt pani ta paksu pataka posti köögilauale, enne kui läks üles oma tuppa koolitöid tegema. 3. Remove overlapping correspondences.
8 The NP alignment method To accomplish steps 2 and 3, we used online dictionaries (ET-EN and DE-EN) and annotation information: 2. To set the correspondences between Estonian and German NPs –Translate all NP heads to English; –Find the intersections of the translations; –If a pair of NPs is related by translation, set a correspondence between them. 3. To remove overlapping correspondences –Use proper names as milestones; –Look at the locations of the NPs in the sentence.
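Steps 2 and 3 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `NP` class, the tiny `DE_EN` and `ET_EN` dictionaries, the relative positions, and the greedy "closest position first" rule for removing overlaps are all hypothetical, and the proper-name milestones are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NP:
    head: str       # lemma of the head noun
    rel_pos: float  # relative position of the NP in its sentence (0..1)


# Hand-made toy dictionaries (assumption): head lemma -> English translations.
DE_EN = {
    "Stapel": {"pile", "stack"},
    "Küchentisch": {"kitchen table"},
    "Zimmer": {"room"},
    "Aufgabe": {"task", "homework"},
}
ET_EN = {
    "patakas": {"pile", "bundle"},
    "köögilaud": {"kitchen table"},
    "tuba": {"room"},
    "koolitöö": {"homework", "schoolwork"},
}


def candidate_pairs(de_nps, et_nps):
    """Step 2: pair NPs whose heads share at least one English translation."""
    pairs = []
    for d in de_nps:
        for e in et_nps:
            if DE_EN.get(d.head, set()) & ET_EN.get(e.head, set()):
                pairs.append((d, e))
    return pairs


def remove_overlaps(pairs):
    """Step 3 (assumed heuristic): keep each NP in at most one pair,
    preferring pairs whose relative sentence positions are closest."""
    chosen, used_de, used_et = [], set(), set()
    for d, e in sorted(pairs, key=lambda p: abs(p[0].rel_pos - p[1].rel_pos)):
        if d not in used_de and e not in used_et:
            chosen.append((d, e))
            used_de.add(d)
            used_et.add(e)
    return chosen


# NP heads from the example sentence pair; positions are rough guesses.
de_nps = [NP("Stapel", 0.25), NP("Küchentisch", 0.45),
          NP("Zimmer", 0.7), NP("Aufgabe", 0.9)]
et_nps = [NP("patakas", 0.3), NP("köögilaud", 0.4),
          NP("tuba", 0.75), NP("koolitöö", 0.9)]

for d, e in remove_overlaps(candidate_pairs(de_nps, et_nps)):
    print(d.head, "<->", e.head)
```

When one head matches several NPs on the other side, `candidate_pairs` returns all of them and `remove_overlaps` keeps only the pair that is closest in position, which is one simple way to read "look at the locations of the NPs in the sentence".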
9 Results 53 sentence pairs. 134 possible NP matches were found, of which 75 were selected. Precision: 84%. Recall: 53%.
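For reference, precision and recall over alignment pairs are usually computed as in the sketch below. The slides do not give the size of the gold standard or the number of correct matches, so the NP identifiers and both sets here are made-up toy data, not the reported experiment.

```python
def precision_recall(selected, gold):
    """Precision = correct / selected; recall = correct / gold."""
    correct = len(selected & gold)
    precision = correct / len(selected) if selected else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical alignment pairs (German NP id, Estonian NP id).
selected = {("d1", "e1"), ("d2", "e2"), ("d3", "e4")}
gold = {("d1", "e1"), ("d2", "e2"), ("d3", "e3"), ("d4", "e5")}

p, r = precision_recall(selected, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```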
10 Sources of errors Different tree structures (the German trees are deeper). Translation problems: we used English as an intermediary language to find German-Estonian word correspondences (there is no free German-Estonian electronic dictionary). An NP in one language may correspond to a different phrase type or to a part of an NP in the other language. A PP in German often corresponds to an NP in Estonian –Much of the grammatical information that is expressed by prepositions in German or English is expressed by grammatical cases in Estonian.
11 Alternative approach – statistical An alternative to using bilingual electronic dictionaries is the use of statistical word alignment methods. This approach has been evaluated by Samuelsson (2004) for the phrase alignment of a German-Swedish parallel treebank. We intend to test it on a German-Estonian treebank as well, although we are aware that the structural differences between German and Estonian make automatic word alignment more difficult.
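To give a feel for what a statistical method works with, the sketch below scores word pairs by a Dice coefficient over sentence-level co-occurrence. This is a deliberately simple stand-in chosen for illustration; the slides do not say which word alignment method Samuelsson (2004) used, and the function name and the lowercased three-sentence corpus (taken from the sentence pairs on slide 13) are assumptions.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Dice(s, t) = 2 * c(s, t) / (c(s) + c(t)), where counts are the
    number of sentence pairs in which the word types occur."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        s_types, t_types = set(src.split()), set(tgt.split())
        src_count.update(s_types)
        tgt_count.update(t_types)
        joint.update(product(s_types, t_types))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
    }


# Toy Estonian-German bitext, lowercased and without punctuation.
bitext = [
    ("jorunn arvas et inimaju on nagu keerukas elektronarvuti",
     "jorunn hielt das menschliche gehirn für einen komplizierten computer"),
    ("alguses tuli ta koos jorunniga",
     "das erste stück war sie mit jorunn zusammen gegangen"),
    ("nad olid rääkinud robotitest",
     "sie hatten sich über roboter unterhalten"),
]

scores = dice_scores(bitext)
print(scores[("jorunn", "jorunn")])
print(scores[("jorunniga", "jorunn")])
```

The Estonian form "jorunniga" (comitative case) is counted as a different word type from "jorunn", so the evidence for the name is split across two forms. This is a small instance of the structural differences mentioned above; with so little data, many unrelated word pairs also receive high scores.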
12 Treebank tools Tools exist for monolingual treebanks: –editors, e.g. Annotate –treebank query tools (tgrep, TIGERSearch) Special software tools for building and using parallel treebanks are needed. We have developed an alignment viewer based on SVG (Scalable Vector Graphics). Still to be implemented: –alignment editor (currently being developed at Stockholm University) –phrase alignment test tool
13 Alignment visualization: Index file in HTML Tree overview  EEDENI AED...[0, 1] Der Garten Eden  lõppude lõpuks pidi miski kunagi tekkima mittemillestki. [1, 2]... schließlich und endlich mußte doch irgendwann irgend etwas aus null und nichts entstanden sein...  Alguses tuli ta koos Jorunniga. [3, 4] Das erste Stück war sie mit Jorunn zusammen gegangen.  Nad olid rääkinud robotitest. [4, 5] Sie hatten sich über Roboter unterhalten.  Jorunn arvas, et inimaju on nagu keerukas elektronarvuti. [5, 6] Jorunn hielt das menschliche Gehirn für einen komplizierten Computer.
14 After a click… an SVG picture
15 Conclusion and perspectives Our first attempt to align the noun phrases in the Estonian-German parallel treebank led to satisfactory results (84% precision, 53% recall). The results could be improved if –different phrase types were taken into consideration; –a more exact dictionary were used; –the Estonian syntactic trees were deepened, making their annotation depth more similar to that of the German trees.