
1 Exploiting Reducibility in Unsupervised Dependency Parsing
David Mareček and Zdeněk Žabokrtský, Institute of Formal and Applied Linguistics, Charles University in Prague.
EMNLP conference, July 12, 2012, Jeju Island, Korea.

2 Outline
Unsupervised Dependency Parsing
 - Motivations
Reducibility
 - What is reducibility?
 - Computing reducibility scores
Employing reducibility in unsupervised dependency parsing
 - Dependency model
 - Inference: Gibbs sampling of projective dependency trees
 - Results

3 Motivations for unsupervised dependency parsing
Parsing without using any treebank or any language-specific rules.
For under-resourced languages or domains?
 - Every new treebank is expensive and time-consuming.
 - However, semi-supervised methods are probably more useful than completely unsupervised ones.
Universality across languages
 - A parser independent of any particular linguistic theory.
 - Treebanks differ in the ways they capture various linguistic phenomena.
An unsupervised parser might find more suitable structures than the ones we have in treebanks
 - It might work better in final applications (MT, QA, ...).
 - GIZA++ is also unsupervised, yet it is universal and widely used.
 - Dependency parsing is similar to the word alignment task.

4 REDUCIBILITY

5 Reducibility
Definition: A word (or a sequence of words) is reducible if we can remove it from the sentence without violating the grammaticality of the rest of the sentence.
Some conference participants missed the last bus yesterday.
 - "Some participants missed the last bus yesterday." (removing "conference": REDUCIBLE)
 - "Some conference participants the last bus yesterday." (removing "missed": NOT REDUCIBLE)

6 Hypothesis
If a word (or sequence of words) is reducible in a particular sentence, it is a leaf (or a subtree) in the dependency structure.
[Figure: dependency tree of "Some conference participants missed the last bus yesterday."]

7 Hypothesis (continued)
If a word (or sequence of words) is reducible in a particular sentence, it is a leaf (or a subtree) in the dependency structure.
 - It mostly holds across languages.
 - Problems occur mainly with function words:
   PREPOSITIONAL PHRASES: They are at the conference.
   DETERMINERS: I am in the pub.
   AUXILIARY VERBS: I have been sitting there.
Let's try to recognize reducible words automatically...

8 Recognition of reducible words
We remove the word from the sentence. But how can we automatically recognize whether the rest of the sentence is grammatical or not?
 - Hardly... (we don't have any grammar yet)
If we have a large corpus, we can search for the reduced sentence in it.
 - If it is in the corpus -> it is (possibly) grammatical.
 - If it is not in the corpus -> we do not know.
We will find only a few reducible words this way...
 - very low recall

9 Other possibilities?
Could we take a smaller context than the whole sentence?
 - Does not work at all for free word-order languages.
Why not use part-of-speech tags instead of words?
 - DT NN VBS IN DT NN. -> DT NN VBS DT NN.
 - ... but the preposition IN should not be reducible, even though the reduced tag sequence is attested.
Solution:
 - We use the few (very sparse) reducible words found in the corpus to estimate "reducibility scores" for PoS tags (or PoS tag sequences).

10 Computing reducibility scores
For each possible PoS unigram, bigram and trigram:
 - Find all its occurrences in the corpus.
 - For each such occurrence, remove the respective words and search for the rest of the sentence in the corpus.
 - If it occurs at least once elsewhere in the corpus, the occurrence is proclaimed reducible.
Reducibility of a PoS n-gram = relative number of its reducible occurrences.
Example for the trigram "IN DT NN":
 - "I saw her in the theater." (PRP VBD PRP IN DT NN.): removing "in the theater" leaves "I saw her." (PRP VBD PRP.), which occurs in the corpus, so this occurrence is reducible.
 - "She was sitting on the balcony and wearing a blue dress." (PRP VBD VBG IN DT NN CC VBG DT JJ NN.): removing "on the balcony" leaves a sentence not found in the corpus, so this occurrence is not reducible.
 - R("IN DT NN") = 1/2

11 Computing reducibility scores
r(g) ... number of reducible occurrences of the PoS n-gram g
c(g) ... number of all its occurrences
Reducibility of the PoS n-gram g is the relative number of its reducible occurrences: R(g) = r(g) / c(g).
For each possible PoS unigram, bigram and trigram:
 - Find all its occurrences in the corpus.
 - For each such occurrence, remove the respective words and search for the rest of the sentence in the corpus.
 - If it occurs at least once elsewhere in the corpus, the occurrence is proclaimed reducible.
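A minimal Python sketch of this procedure, assuming the corpus is available as a list of (words, tags) sentence pairs and using exact matching of the reduced word sequence against whole corpus sentences (the paper's implementation may differ, e.g. in smoothing or in how the reduced sequence is searched for):

```python
from collections import defaultdict

def reducibility_scores(corpus, max_n=3):
    """Estimate R(g) = r(g) / c(g) for PoS n-grams (n = 1..max_n).

    corpus: list of (words, tags) pairs, both lists of strings.
    """
    sentences = {tuple(words) for words, _ in corpus}
    r = defaultdict(int)  # reducible occurrences of each tag n-gram
    c = defaultdict(int)  # all occurrences of each tag n-gram
    for words, tags in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                gram = tuple(tags[i:i + n])
                c[gram] += 1
                rest = tuple(words[:i] + words[i + n:])
                # the occurrence is reducible if the reduced sentence
                # is attested as a sentence elsewhere in the corpus
                if rest and rest in sentences:
                    r[gram] += 1
    return {g: r[g] / c[g] for g in c}
```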

12 Examples of reducibility scores
Reducibility scores of the English PoS tags, induced from the English Wikipedia corpus.

13 DEPENDENCY TREE MODEL

14 Dependency tree model
Consists of four submodels:
 - edge model
 - fertility model
 - distance model
 - reducibility model
Simplifications:
 - we use only PoS tags, not word forms
 - we induce projective trees only

15 Edge model
P(dependent tag | edge direction, parent tag)
 - "Rich get richer" principle on dependency edges
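A sketch of how such a "rich get richer" edge model can be maintained during sampling, using Dirichlet-style smoothing over counts of edges currently in the sampled treebank; the hyperparameter alpha and the uniform prior are illustrative assumptions, not the paper's exact values:

```python
from collections import defaultdict

class EdgeModel:
    """P(dependent tag | edge direction, parent tag) with rich-get-richer counts."""

    def __init__(self, num_tags, alpha=1.0):
        self.alpha = alpha
        self.prior = 1.0 / num_tags          # uniform prior over dependent tags
        self.counts = defaultdict(int)       # (direction, parent_tag, dep_tag) -> count
        self.totals = defaultdict(int)       # (direction, parent_tag) -> count

    def prob(self, dep_tag, direction, parent_tag):
        c = self.counts[(direction, parent_tag, dep_tag)]
        t = self.totals[(direction, parent_tag)]
        return (c + self.alpha * self.prior) / (t + self.alpha)

    def update(self, dep_tag, direction, parent_tag, delta=1):
        # delta=+1 when an edge enters the sampled treebank, -1 when it leaves
        self.counts[(direction, parent_tag, dep_tag)] += delta
        self.totals[(direction, parent_tag)] += delta
```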

16 Fertility model
P(number of left and right children | parent tag)
 - "Rich get richer" principle
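The fertility submodel can be kept in the same rich-get-richer fashion, this time over (number of left children, number of right children) pairs per parent tag; again, the smoothing constants below are illustrative assumptions:

```python
from collections import defaultdict

class FertilityModel:
    """P(number of left and right children | parent tag) with rich-get-richer counts."""

    def __init__(self, alpha=1.0, prior=0.1):
        self.alpha, self.prior = alpha, prior
        self.counts = defaultdict(int)  # (parent_tag, n_left, n_right) -> count
        self.totals = defaultdict(int)  # parent_tag -> count

    def prob(self, n_left, n_right, parent_tag):
        c = self.counts[(parent_tag, n_left, n_right)]
        return (c + self.alpha * self.prior) / (self.totals[parent_tag] + self.alpha)

    def update(self, n_left, n_right, parent_tag, delta=1):
        self.counts[(parent_tag, n_left, n_right)] += delta
        self.totals[parent_tag] += delta
```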

17 Distance model Longer edges are less probable.
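One simple decreasing form that captures "longer edges are less probable", shown purely as an illustration (the exact parameterization used in the paper may differ):

P_{\mathrm{dist}}(d) \propto 1/d, where d is the distance between a dependent word and its parent.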

18 Reducibility model
Probability of a subtree is proportional to its reducibility score.
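Following the slide's statement, the submodel can be written (normalization constant omitted) as

P_{\mathrm{red}}(\text{subtree covering } w_i \ldots w_j) \propto R(g),

where g is the PoS n-gram of the words w_i ... w_j and R is the reducibility score from slide 11.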

19 Probability of the treebank
The probability of the whole treebank, which we want to maximize:
 - multiplication over all four submodels and all words in the corpus
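Schematically (a sketch; smoothing and hyperparameters omitted), the quantity being maximized is

P(\mathrm{treebank}) = \prod_{i} P_{\mathrm{edge}}(\cdot)\, P_{\mathrm{fert}}(\cdot)\, P_{\mathrm{dist}}(\cdot)\, P_{\mathrm{red}}(\cdot),

with the product running over all words i in the corpus.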

20 Gibbs sampling: bracketing notation
Each projective dependency tree can be expressed by a unique bracketing.
 - Each bracket pair belongs to one node and delimits its descendants from the rest of the sentence.
 - Each bracketed segment contains just one word that is not embedded deeper; this node is the head of the segment.
Example bracketing for the tag sequence DT NN VB RB IN DT JJ NN:
(((DT) NN) VB (RB) (IN ((DT) (JJ) NN)))
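A small Python sketch (with assumed input conventions) of how a projective tree can be turned into this bracketing; applied to the slide's example tree it yields the bracketing shown above:

```python
def tree_to_brackets(tags, heads):
    """Convert a projective dependency tree to its bracketed form.

    tags  - list of PoS tags, e.g. ["DT", "NN", "VB", "RB", "IN", "DT", "JJ", "NN"]
    heads - list of 0-based head indices, -1 for the word attached to the root
    Each node's bracket encloses the node itself and all of its descendants.
    """
    children = [[] for _ in tags]
    roots = []
    for i, h in enumerate(heads):
        (roots if h < 0 else children[h]).append(i)

    def render(i):
        left = [render(c) for c in children[i] if c < i]
        right = [render(c) for c in children[i] if c > i]
        return "(" + " ".join(left + [tags[i]] + right) + ")"

    return " ".join(render(r) for r in roots)
```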

21 Gibbs sampling: small change
 - Choose one non-root node and remove its bracket.
 - Add another bracket which does not violate projectivity.
Example: removing the bracket of IN from the bracketing on the previous slide leaves (((DT) NN) VB (RB) IN ((DT) (JJ) NN)); the candidate brackets to add, with the probabilities shown on the slide, are:
 - (IN ((DT) (JJ) NN))        0.0012
 - ((RB) IN ((DT) (JJ) NN))   0.0009
 - ((RB) IN)                  0.0011
 - (((DT) NN) VB (RB))        0.0023
 - (((DT) NN) VB)             0.0018
 - (VB (RB))                  0.0004
 - (VB)                       0.0016
 - (IN)                       0.0006
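The re-insertion step then samples one of these candidate brackets with probability proportional to the model probability of the resulting tree. A minimal sketch of that sampling step, with `tree_probability` standing in for the product of the four submodels (a hypothetical helper, not the paper's code):

```python
import random

def sample_bracket(candidates, tree_probability):
    """Pick a candidate bracket proportionally to the probability of the
    tree it produces (e.g. the values 0.0012, 0.0009, ... on the slide)."""
    weights = [tree_probability(c) for c in candidates]
    threshold = random.random() * sum(weights)
    running = 0.0
    for cand, w in zip(candidates, weights):
        running += w
        if running >= threshold:
            return cand
    return candidates[-1]
```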

22 Gibbs sampling: decoding
After 200 iterations:
 - we run the MST algorithm
 - edge weights = occurrences of individual edges in the treebank during the last 100 sampling iterations
 - the output trees may be non-projective
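A sketch of this decoding step under the following assumptions: each stored tree is a list of (head, dependent) index pairs, and the maximum spanning arborescence is computed with networkx (the original system presumably uses its own MST implementation):

```python
from collections import Counter
import networkx as nx

def decode(sampled_trees):
    """Aggregate edge occurrences over the stored sampling iterations and
    decode with a maximum spanning tree over the aggregated weights."""
    counts = Counter(edge for tree in sampled_trees for edge in tree)
    g = nx.DiGraph()
    for (head, dep), c in counts.items():
        g.add_edge(head, dep, weight=c)
    # The result may be non-projective even though every sampled tree was projective.
    arborescence = nx.maximum_spanning_arborescence(g)
    return sorted(arborescence.edges())
```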

23 Evaluation
CoNLL 2006/2007 test data
 - all the sentences (all lengths)
 - punctuation was removed before the evaluation
 - directed attachment score
Wikipedia corpus for estimating reducibility scores
 - 85 mil. tokens for English ... 3 mil. tokens for Japanese
Impact of the reducibility model (directed attachment score):

  Reducibility   English   German   Czech
  without          30.7      26.2    22.0
  with             46.8      36.5    47.2

24 Results
Directed attachment scores on CoNLL 2006/2007 test data: Spitkovsky 2012 vs. Mareček 2012.

  CoNLL           Spi 2012   Mar 2012
  Arabic 06         10.9       26.5
  Arabic 07         44.9       27.9
  Basque 07         33.3       26.8
  Bulgarian 07      65.2       46.0
  Catalan 07        62.1       47.0
  Chinese 06        63.2        -
  Chinese 07        57.0        -
  Czech 06          55.1       49.5
  Czech 07          54.2       48.0
  Danish 06         22.2       38.6
  Dutch 06          46.6       44.2
  English 07        29.6       49.2
  German 06         39.1       44.8
  Greek 06          26.9       20.2
  Hungarian 07      58.2       51.8
  Italian 07        40.7       43.3
  Japanese 06       22.7       50.8
  Portuguese 06     72.4       50.6
  Slovenian 06      35.2       18.1
  Spanish 06        28.2       51.9
  Swedish 06        50.7       48.2
  Turkish 06        34.4        -
  Turkish 07        44.8       15.7
  Average           42.9       40.0

25 Conclusions
 - I have introduced the reducibility feature, which is useful in unsupervised dependency parsing.
 - The reducibility scores for individual PoS tag n-grams are computed on a large corpus and then used in the induction algorithm on a smaller corpus.
State-of-the-art?
 - It might have been in January 2012.
Future work:
 - Employ lexicalized models.
 - Improve reducibility: a different way of dealing with function words.

26 Thank you for your attention.

