Soft Cross-lingual Syntax Projection for Dependency Parsing


1 Soft Cross-lingual Syntax Projection for Dependency Parsing
Zhenghua Li, Min Zhang, Wenliang Chen {zhli13, minzhang, Soochow University, China. Good afternoon, everyone. I am Zhenghua Li, and we are from Soochow University, China.

2 Dependency parsing: a bilingual example
Figure: English "$0 I1 eat2 the3 fish4 with5 a6 fork7" with arc labels root, subj, obj, det, pmod; Chinese "$0 我1 用2 叉子3 吃4 鱼5" ("I use fork to eat fish") with arc label vv. Dependency parsing captures the syntax of a sentence as a set of bilexical dependencies.
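To make "a set of bilexical dependencies" concrete, here is a minimal Python sketch that stores the English example as a head array, where index 0 is the artificial root $0. The head assignments are an assumption (a conventional analysis in which "with" attaches to "eat" and "fork" to "with"), not read off the original figure, and the helper names are hypothetical.

```python
# Words of the example; index 0 is the artificial root $0.
words = ["$0", "I", "eat", "the", "fish", "with", "a", "fork"]

# ASSUMED head of each word (heads[0] is unused): a conventional analysis,
# not taken from the original slide.
heads = [None, 2, 0, 4, 2, 2, 7, 5]


def arcs(heads):
    """Return the tree as a set of bilexical (head, modifier) index pairs."""
    return {(h, m) for m, h in enumerate(heads) if m > 0}


def is_tree(heads):
    """True if every word reaches the root 0 by following heads, with no cycle."""
    for m in range(1, len(heads)):
        seen, node = set(), m
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True


for h, m in sorted(arcs(heads), key=lambda arc: arc[1]):
    print(f"{words[h]} -> {words[m]}")
```

Each printed line is one dependency, e.g. "$0 -> eat" for the root arc. The check does not test projectivity.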

3 Big picture (semi-supervised)
Figure: English Treebank → English Parser → parse the English side of the Bitext (e.g. "I love this game" / 我 爱 这 运动) → project English parse trees into Chinese → Chinese labeled data with partial trees, combined with the Chinese Treebank → larger training data. This is the big picture. Our work belongs to semi-supervised parsing. The idea is to obtain more high-quality labeled data for Chinese from bilingual text by projecting English parse trees into Chinese, and then to combine the manually labeled data with the automatically collected data to train new parsing models, in order to improve state-of-the-art Chinese dependency parsers.

4 Syntax projection
Figure: "I1 eat2 the3 fish4 with5 a6 fork7" aligned to "$0 我1 用2 叉子3 吃4 鱼5". What is syntax projection? Simply speaking, syntax projection means projecting a syntactic tree from one language to the other on bilingual text, with the help of word alignments. For dependency structures, one basic operation is to project the dependencies from one language to the other.

5 Challenges
Syntactic non-isomorphism across languages; different annotation choices (guidelines); partial (incomplete) parse trees resulting from projection; parsing errors on the source side; word alignment errors.
However, there are a few obstacles in this research line. First, syntactic non-isomorphism across languages is the most severe and challenging problem, especially when the source and target languages belong to different language families, such as English and Chinese. Second, even within the same language, there are many different annotation choices to make during treebanking. One typical example is the noun coordination phrase: should the first or the last conjunct be the head of the whole phrase, and where should the conjunction be attached? Third, projecting a source-language tree into the target-language sentence via word alignments usually yields only a partial tree, so the problem is how to make use of target-language sentences with partial trees. Finally, parsing errors on the source side and word alignment errors make the bilingual constraints and projected structures noisy, and therefore unreliable.

6 Cross-language non-isomorphism
Figure: "eat2 the3 fish4 with5 a6 fork7" aligned to "$0 我1 用2 叉子3 吃4 鱼5", where 用 is glossed "use (verb)"; the Chinese sentence reads "I use fork to eat fish". Here is an example of cross-language non-isomorphism. On the English side, the prepositional phrase “with a fork” modifies the verb “eat”. On the Chinese side, however, “with” is translated into a verb, 用 (use), which becomes the root word of the sentence, and 吃 (eat) actually modifies 用 (use) in the Chinese parse tree. Therefore, the dependency relation between the two corresponding words is reversed across the two languages. In fact, there are many different kinds of such phenomena.

7 Different annotation choices
Coordination structure as an example. Figure: five different dependency analyses of "fish and bird". Within one language, the same syntactic structure can be annotated differently, depending on the choices of the annotation-guideline designers. Here is an example of the coordination structure: for this simple noun coordination phrase, there are five different annotation styles. Therefore, different languages may adopt different annotation guidelines for the same syntactic structures, which also leads to wrong projections.

8 Challenges: all these factors can lead to bad projections!
Syntactic non-isomorphism across languages; different annotation choices (guidelines); partial (incomplete) parse trees resulting from projection; parsing errors on the source side; word alignment errors.

9 Why is it called soft projection?
Project fewer but reliable dependencies: put quality before quantity. Careful/gentle/conservative projection. Wrong projection → training noise. The idea of soft projection is that we only project fewer but more reliable dependencies. It can also be understood as careful or gentle projection. We do not want wrong projections, since they become noise during training.

10 Big picture (semi-supervised)
Figure: the big-picture pipeline (English Treebank, Chinese Treebank, Bitext with "I love this game" / 我 爱 这 运动, English Parser), now with a Chinese Parser that filters the Chinese labeled data with partial trees, obtained by projecting English parse trees into Chinese, before it is combined into larger training data. But how do we decide which projections are bad? The idea is to use a baseline Chinese parser to verify each projected dependency and to filter out obviously bad dependencies based on their marginal probabilities.

11 Step 1: word alignment and English parsing on bitext
Figure: the big-picture pipeline, with the example sentence pair "$0 I1 eat2 the3 fish4 with5 a6 fork7" / "$0 我1 用2 叉子3 吃4 鱼5". In detail, our method works in four steps. Step 1 is to word-align the bitext and parse its English side with the English parser.
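Step 1 yields, for each sentence pair, an English parse and a word alignment. Below is a minimal Python sketch of the alignment side. It assumes alignments arrive as whitespace-separated "e-c" index pairs (following the slides' 1-based numbering) and that only one-to-one links are kept for projection; the slides do not say how many-to-one links are treated (they list this as a future direction), so that filter is an assumption. The example links and function names are hypothetical.

```python
from collections import Counter


def parse_alignment(text):
    """Parse pairs like '1-1 2-4' into a set of (english_idx, chinese_idx) tuples."""
    links = set()
    for pair in text.split():
        e, c = pair.split("-")
        links.add((int(e), int(c)))
    return links


def one_to_one(links):
    """Keep only links whose English and Chinese words each appear in exactly one link."""
    src = Counter(e for e, _ in links)
    tgt = Counter(c for _, c in links)
    return {e: c for e, c in links if src[e] == 1 and tgt[c] == 1}


# Hypothetical alignment for "I1 eat2 the3 fish4 with5 a6 fork7" / "我1 用2 叉子3 吃4 鱼5";
# "the" and "a" stay unaligned.
links = parse_alignment("1-1 2-4 4-5 5-2 7-3")
align = one_to_one(links)
align[0] = 0  # the root $0 is aligned to $0 by convention
```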

12 Step 2: project English tree into Chinese (direct correspondence assumption)
Figure: the big-picture pipeline (Treebank, Bitext with "I love this game" / 我 爱 这 运动, English Parser), highlighting the projection of English parse trees into Chinese, which yields Chinese labeled data with partial trees.

13 Step 2: project English tree into Chinese (direct correspondence assumption)
Figure: the English tree over "$0 I1 eat2 the3 fish4 with5 a6 fork7" is projected onto "$0 我1 用2 叉子3 吃4 鱼5" through the word alignments.
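Projection under the direct correspondence assumption can be sketched in a few lines of Python: an English dependency h → m whose two words are both aligned is copied onto the aligned Chinese words. The English heads and the one-to-one alignment below are assumptions for the running example, and `project` is a hypothetical name, not the authors' code.

```python
# ASSUMED English heads for "$0 I1 eat2 the3 fish4 with5 a6 fork7" (index = word position).
en_heads = [None, 2, 0, 4, 2, 2, 7, 5]
# Hypothetical one-to-one alignment, English position -> Chinese position ($0 -> $0).
align = {0: 0, 1: 1, 2: 4, 4: 5, 5: 2, 7: 3}


def project(en_heads, align, n_target):
    """Direct correspondence assumption: an English arc h -> m whose two words are
    both aligned becomes the Chinese arc align[h] -> align[m].
    Returns a partial head array over $0 + n_target words (None = no head)."""
    zh_heads = [None] * (n_target + 1)
    for m, h in enumerate(en_heads):
        if m == 0 or h not in align or m not in align:
            continue  # skip the root slot and arcs touching unaligned words
        zh_heads[align[m]] = align[h]
    return zh_heads


zh_words = ["$0", "我", "用", "叉子", "吃", "鱼"]
zh_heads = project(en_heads, align, n_target=5)
for m in range(1, len(zh_words)):
    h = zh_heads[m]
    print(zh_words[m], "<-", zh_words[h] if h is not None else None)
```

In this toy pair every Chinese word happens to receive a head, but the projected arc 吃4 → 用2 (the image of eat → with) reverses the Chinese analysis, which is the non-isomorphism discussed earlier. Words without a one-to-one alignment would stay unattached, giving a partial tree.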

14 Step 3: filter projected structures with baseline Chinese Parser
Figure: the big-picture pipeline (English Treebank, Chinese Treebank, Bitext with "I love this game" / 我 爱 这 运动, English Parser), now with a baseline Chinese Parser filtering the Chinese labeled data with partial trees obtained by projecting English parse trees into Chinese.

15 Relationship between prob and accuracy

16 Step 3: filter projected structures with baseline Chinese Parser
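Figure: on "eat2 the3 fish4 with5 a6 fork7" / "$0 我1 用2 叉子3 吃4 鱼5", the baseline Chinese parser assigns the projected dependency from 吃4 (eat, head) to 用2 (use, modifier) a marginal probability of only 0.01. According to the Chinese annotation guideline, if a sentence has two predicates, the first one should be the head, so this projected dependency is filtered out.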

17 Step 3: filter projected structures with baseline Chinese Parser
Figure: projected dependencies on "$0 I1 eat2 the3 fish4 with5 a6 fork7" / "$0 我1 用2 叉子3 吃4 鱼5", annotated with the marginal probabilities 0.01, 0.02, and 0.2. Similarly, we filter out two bad projected dependencies, marked in grey, based on a probability threshold. In fact, the blue dependency is also wrong; however, since its probability is higher than the threshold, it survives. Such a bad dependency will certainly hurt the training process. We also propose a simple strategy to alleviate such noise, which will be discussed later.
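The filtering step amounts to a threshold test on the baseline parser's arc marginal probabilities. In this Python sketch, the threshold 0.1 is the value the slides adopt and 0.01 for 吃4 → 用2 is the value shown on slide 16; the other marginals, including which arcs receive 0.02 and 0.2, are invented for illustration, and `filter_projection` is a hypothetical name. Keeping arcs with p ≥ threshold makes a threshold of 0.0 equivalent to no filtering (DCA).

```python
def filter_projection(zh_heads, marginals, threshold=0.1):
    """Drop every projected arc h -> m whose baseline marginal p(h -> m) is below
    `threshold`; the modifier m then becomes unattached (None)."""
    kept = list(zh_heads)
    for m, h in enumerate(zh_heads):
        if h is not None and marginals.get((h, m), 0.0) < threshold:
            kept[m] = None
    return kept


# Projected heads for "$0 我1 用2 叉子3 吃4 鱼5" (None for the root slot).
zh_heads = [None, 4, 4, 2, 0, 4]
# Hypothetical marginals keyed by (head, modifier); only 0.01 is from the slides.
marginals = {(4, 1): 0.2, (4, 2): 0.01, (2, 3): 0.9, (0, 4): 0.02, (4, 5): 0.95}
partial = filter_projection(zh_heads, marginals)
print(partial)
```

Two arcs are dropped, and the 0.2 arc plays the role of the wrong-but-surviving blue dependency: it clears the threshold, so filtering alone cannot remove it.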

18 Step 3: filter projected structures with baseline Chinese Parser
Figure: the pair "$0 I1 eat2 the3 fish4 with5 a6 fork7" / "$0 我1 用2 叉子3 吃4 鱼5" after filtering, with 用 glossed "use" and 吃 glossed "eat". Here is the structure after filtering: a partial tree.

19 Step 4: combine the data to train a new Chinese Parser
Figure: the big-picture pipeline (English Treebank, Chinese Treebank, Bitext with "I love this game" / 我 爱 这 运动, English Parser, Chinese Parser). The filtered Chinese labeled data with partial trees, obtained by projecting English parse trees into Chinese, is combined with the Chinese Treebank into larger training data to train a new parser.

20 How to handle data with partial tree annotation
Convert the partial tree annotation into a forest annotation (ambiguous labelings): for an unattached word, add links from all other words to it. Figure: "$0 我1 用2 叉子3 吃4 鱼5", with 用 glossed "use" and 吃 glossed "eat".
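The conversion can be sketched as building a set of candidate heads for every word. This Python sketch assumes a head-array input where None marks an unattached word; the function name and the example partial tree are hypothetical. A fully annotated tree maps to singleton sets everywhere, so tree annotation is the special case of forest annotation.

```python
def to_forest(partial_heads):
    """Turn a partial tree into candidate-head sets (ambiguous labelings).
    An attached word keeps its single head; an unattached word may take any
    other position, including the root $0, as its head."""
    n = len(partial_heads)
    forest = [None]  # slot 0 is the root $0 and has no head
    for m in range(1, n):
        h = partial_heads[m]
        forest.append({h} if h is not None else {i for i in range(n) if i != m})
    return forest


# Partial tree for "$0 我1 用2 叉子3 吃4 鱼5" after filtering (hypothetical).
forest = to_forest([None, 4, None, 2, None, 4])
```

The forest is then the set of well-formed trees consistent with these head sets; head choices that would create cycles are excluded by the parser's tree constraint, not by the sets themselves.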

21 How to handle data with partial tree annotation
Maximize the mixed likelihood of the manually labeled data with tree annotation and the automatically collected data with forest annotation. A tree annotation can be understood as a special case of a forest annotation. How do we train a parser using data with forest annotation?
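Written out, the mixed objective might look as follows. The notation is an assumption, not taken from the slides: N manually labeled sentences x_i with trees d_i, M automatically collected sentences x_j with forests F_j, and a log-linear (CRF) parser with weights w, features f, and partition function Z summing over all trees Y(x). The last line shows why inside-outside is needed: the gradient of a forest term is the difference between a feature expectation restricted to F and one over all trees.

```latex
$$\mathcal{L}(\mathbf{w}) = \sum_{i=1}^{N} \log p(\mathbf{d}_i \mid \mathbf{x}_i; \mathbf{w})
  + \sum_{j=1}^{M} \log p(\mathcal{F}_j \mid \mathbf{x}_j; \mathbf{w})$$

$$p(\mathbf{d} \mid \mathbf{x}; \mathbf{w})
  = \frac{\exp\{\mathbf{w} \cdot \mathbf{f}(\mathbf{x}, \mathbf{d})\}}{Z(\mathbf{x}; \mathbf{w})},
\qquad
Z(\mathbf{x}; \mathbf{w}) = \sum_{\mathbf{d}' \in \mathcal{Y}(\mathbf{x})} \exp\{\mathbf{w} \cdot \mathbf{f}(\mathbf{x}, \mathbf{d}')\}$$

$$p(\mathcal{F} \mid \mathbf{x}; \mathbf{w}) = \sum_{\mathbf{d} \in \mathcal{F}} p(\mathbf{d} \mid \mathbf{x}; \mathbf{w})$$

$$\frac{\partial \log p(\mathcal{F} \mid \mathbf{x}; \mathbf{w})}{\partial \mathbf{w}}
  = \sum_{\mathbf{d} \in \mathcal{F}} \frac{p(\mathbf{d} \mid \mathbf{x}; \mathbf{w})}{p(\mathcal{F} \mid \mathbf{x}; \mathbf{w})}\, \mathbf{f}(\mathbf{x}, \mathbf{d})
  - \sum_{\mathbf{d} \in \mathcal{Y}(\mathbf{x})} p(\mathbf{d} \mid \mathbf{x}; \mathbf{w})\, \mathbf{f}(\mathbf{x}, \mathbf{d})$$
```

With a single-tree forest F = {d}, the first sum reduces to f(x, d) and the usual CRF gradient is recovered.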

22 Train with ambiguous labelings
Refer to Täckström+ 13 and several earlier papers. Maximize the likelihood of the data, i.e., maximize the probability of each forest, which is the sum of the probabilities of all the trees in the forest. The training problem can be solved with the inside-outside algorithm.
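For intuition, the forest probability can be checked by brute force on toy sentences. This Python sketch enumerates every head assignment that forms a tree (exponential, so only for a handful of words) in place of inside-outside. It assumes a first-order arc-factored score rather than the second-order model used in the experiments, and it also admits non-projective trees, which projective dynamic programs exclude. `scores[h][m]` and all function names are hypothetical.

```python
import itertools
import math


def is_tree(heads):
    """heads[m] is the head of word m (heads[0] unused); all words must reach root 0."""
    for m in range(1, len(heads)):
        seen, node = set(), m
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True


def all_trees(n):
    """Enumerate every dependency tree over words 1..n by brute force."""
    choices = [[h for h in range(n + 1) if h != m] for m in range(1, n + 1)]
    for combo in itertools.product(*choices):
        heads = (None,) + combo
        if is_tree(heads):
            yield heads


def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def forest_log_prob(scores, forest):
    """log p(F | x) = log sum_{d in F} exp s(d) - log Z(x), where
    s(d) = sum_m scores[d[m]][m] and forest[m] is the set of allowed heads of m."""
    n = len(forest) - 1
    trees = list(all_trees(n))

    def score(d):
        return sum(scores[d[m]][m] for m in range(1, n + 1))

    log_z = log_sum_exp([score(d) for d in trees])
    in_forest = [score(d) for d in trees
                 if all(d[m] in forest[m] for m in range(1, n + 1))]
    if not in_forest:
        return -math.inf  # no valid tree is consistent with the forest
    return log_sum_exp(in_forest) - log_z


# Two words, all arc scores 0: three trees, each with probability 1/3.
zero = [[0.0] * 3 for _ in range(3)]
print(math.exp(forest_log_prob(zero, [None, {0}, {0, 1}])))  # ≈ 2/3
```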

23 Experiments: data statistics and parser
Second-order dependency parser (McDonald & Pereira 06), CRF-based and probabilistic; SGD training (20K + 1M training data).

24 Relationship between prob and accuracy

25 Effect of filtering threshold
Projection ratio: 44%, 31%, 26%. This figure shows the effect of different filtering thresholds (the number in parentheses; the second parameter, 1.0, is the supplement threshold and means no supplement, so it can be ignored here), together with the projection ratio. A threshold of 0.0 means no filtering, which is also called DCA: its projection ratio is the highest, but its performance is very bad. We adopt 0.1. The projection ratio is the proportion of words that obtain head words after projection.
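The projection ratio as defined above can be computed directly from the partial head arrays. A minimal Python sketch, assuming None marks a word without a head; the function name and the example trees are hypothetical. Since filtering only removes heads, raising the filtering threshold can never increase this ratio.

```python
def projection_ratio(partial_trees):
    """Proportion of target words (excluding the root $0) that obtained a head."""
    total = attached = 0
    for heads in partial_trees:
        for h in heads[1:]:
            total += 1
            if h is not None:
                attached += 1
    return attached / total if total else 0.0


# Hypothetical partial trees for two 5-word sentences.
before = [[None, 4, 4, 2, 0, 4], [None, 2, 0, None, 3, None]]
after = [[None, 4, None, 2, None, 4], [None, 2, 0, None, None, None]]
print(projection_ratio(before), projection_ratio(after))
```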

26 Supplement the projected structures with baseline Chinese parser
Even after filtering, the projected structures may still contain wrong dependencies. Use the baseline Chinese parser to add more high-probability dependencies (allowing multiple heads for a single word, which reduces potential noise).
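The supplement step, following the rule spelled out in slide 40, can be sketched as follows: a word that keeps a projected head after filtering also receives every other candidate head whose baseline marginal exceeds a supplement threshold. The comparison is strict (p > threshold) so that a threshold of 1.0 means no supplement, as slide 29 states. The threshold 0.5, the attachment of the 0.2/0.7 pair from slides 27-28 to 我1, and the function name are all assumptions.

```python
def supplement(kept_heads, marginals, threshold):
    """For each word that kept a projected head after filtering, also allow every
    other head h whose baseline marginal p(h -> m) exceeds `threshold`.
    Returns candidate-head sets; unattached words stay None."""
    result = [None]  # slot 0 is the root $0
    for m in range(1, len(kept_heads)):
        h0 = kept_heads[m]
        if h0 is None:
            result.append(None)  # handled later by the forest conversion
            continue
        extra = {h for (h, mod), p in marginals.items()
                 if mod == m and h != h0 and p > threshold}
        result.append({h0} | extra)
    return result


# Filtered partial tree for "$0 我1 用2 叉子3 吃4 鱼5" (hypothetical).
kept = [None, 4, None, 2, None, 4]
# Hypothetical marginals keyed by (head, modifier).
marginals = {(4, 1): 0.2, (2, 1): 0.7, (0, 2): 0.9, (2, 3): 0.9, (4, 5): 0.95}
print(supplement(kept, marginals, threshold=0.5))
```

Here 我1 ends up with two candidate heads, the surviving projected one (0.2) and the parser's suggestion (0.7), so training can spread probability over both instead of committing to the possibly wrong projection.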

27 Supplement the projected structures with baseline Chinese parser
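Figure: on "$0 I1 eat2 the3 fish4 with5 a6 fork7" / "$0 我1 用2 叉子3 吃4 鱼5" (用 glossed "use", 吃 glossed "eat"), a surviving projected dependency with marginal probability 0.2.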

28 Supplement the projected structures with baseline Chinese parser
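Figure: on "$0 I1 eat2 the3 fish4 with5 a6 fork7" / "$0 我1 用2 叉子3 吃4 鱼5" (用 glossed "use", 吃 glossed "eat"), the baseline parser suggests another candidate head with probability 0.7, which is added alongside the projected dependency with probability 0.2.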

29 Effect of supplement threshold
1.0 means no supplement

30 Effect of supplement threshold

31 Effect of supplement threshold
The supplement process makes the training curve more stable

32 Final results on CTB5 test
DCA (projection without filtering) performs badly.

33 Comparison with (Jiang+ 10) on CTB5X test
Our baseline is much stronger, and our improvement is larger and significant.

34 Recent works on multilingual dependency parsing
Semi-supervised: bilingual word reordering information (Huang & Sagae 09); projection to build a local classifier (Jiang & Liu 10). Unsupervised: projection (Ganchev+ 09); delexicalized (McDonald+ 11; Täckström+ 12, 13); hybrid (McDonald+ 11; Ma & Xia 14).

35 Conclusions
We propose a simple semi-supervised framework to derive high-quality labeled training data from bitext. We use target-language marginal probabilities to control the quality of the projected structures (quite simple and effective). We use a forest-based training method to make use of partial annotations (a very general framework).

36 Future directions
Project more dependencies from source-language parse trees? What if two target-language words align to the same source-language word? More complex correspondences between source and target trees? So far, we only consider direct projection of single dependencies in the source language; what about more complex structures in the source language, or some structural transformations? We have tried to automatically collect structural correspondences between subtree structures in the two languages based on POS tags, and then to use such patterns to project more dependencies into the target language, …

37 Future directions
More elegant ways to handle: word alignment errors (word alignment probabilities?); source-language parsing errors (parsing probabilities?); cross-lingual non-isomorphism (very difficult!); annotation guideline differences. Universal dependency parsing? (See the earlier invited talk by Prof. Nivre.) Joint word alignment and bilingual dependency parsing, to handle all of the above issues in a unified framework?

38 Thanks for your time! Questions?

39 Build local classifiers via projection (Jiang & Liu 10)
Semi-supervised; projects edges. Step 1: projection to obtain dependency/non-dependency classification instances. Step 2: build a target-language local dependency/non-dependency classifier. Step 3: feed the outputs of the classifier into a supervised parser as extra weights during the test phase. Jiang and Liu (2010) propose a projection method. First, they project the source-language parse tree into the target language to obtain some dependency/non-dependency instances. Second, they build ... Finally, they integrate the outputs of the classifier into a supervised target-language parser as extra weights during the test phase. In this way, the target-language parser is enhanced to achieve higher accuracy. The example shows the projection process: the blue arcs are dependency instances, and the red arcs are non-dependency instances. They only use such dependency/non-dependency instances to build a local classifier, and therefore avoid the partial tree problem. [bib]: Kai Liu, Yajuan Lü, Wenbin Jiang, Qun Liu. Bilingually-Guided Monolingual Dependency Grammar Induction. Simplify their work and give a simple example.

40 Supplement the projected structures with baseline Chinese parser
If a word obtains a head from projection (which also survives filtering) and the baseline Chinese parser suggests another high-probability candidate head, then insert that head candidate into the projected structure. OK, let me summarize part D. In this part, we focus on the idea of using a source-language treebank and/or bitext to help target-language parsing. We distinguish the methods from different perspectives: semi-supervised or unsupervised; projection vs. delexicalized methods; projecting hard edges, transferring edge weights, or using bilingual constraints.

41 Multilingual dependency parsing becomes a hot topic
Pioneered by Hwa+ 05. Motivations: a more accurate parser in one language may help a less accurate one in another language (this paper); a difficult syntactic ambiguity in one language may be easy to resolve in another language; rich labeled resources in one language can be transferred to build parsers for another language (unsupervised). This work belongs to a broader research line, namely multilingual dependency parsing. The goal is to improve target-language dependency parsing by making use of bilingual constraints or source-language resources. There are a few motivations behind this research line. First, a difficult syntactic ambiguity in one language, … For example, PP attachment is very challenging in English; in Chinese, however, it is quite easy to attach a prepositional phrase to the correct position. Second, a more … For example, English dependency parsers can achieve 90% UAS, whereas Chinese parsers only achieve 80%. Such an accuracy gap may be due to the intrinsic difficulty of the languages or the scale of available labeled data. Therefore, it is reasonable to suppose that the English parser can help the Chinese parser on bilingual text. Third, when the target language has no labeled training data, it is natural and interesting to transfer labeled resources from the source language to build the target-language parser. Intrinsic complexity of languages (Chinese vs. English); scale of labeled resources. More slides: give a few examples, e.g. English vs. Chinese PP attachment, PP boundaries in Chinese, the Chinese DE structure, and attributive clauses.

