
1 How much do word embeddings encode about syntax? Jacob Andreas and Dan Klein UC Berkeley

2 Everybody loves word embeddings [Figure: embedding-space visualization showing the determiners few, most, that, the, a, each, this, every] [Collobert 2011, Mikolov 2013, Freitag 2004, Schuetze 1995, Turian 2010]

3 What might embeddings bring? Cathleen complained about the magazine’s shoddy editorial quality. [Slide highlights the rare words Cathleen and editorial, alongside the embedding neighbors Mary, executive, and average]

4 Today’s question Can word embeddings trained with surface context improve a state-of-the-art constituency parser? (no)

5 Embeddings and parsing: Pre-trained word embeddings are useful for a variety of NLP tasks. Can they improve a constituency parser? – (not very much) [Cite XX, Cite XX, Cite XX]

6 Three hypotheses: Vocabulary expansion (good for OOV words, e.g. Cathleen / Mary); Statistic pooling (good for medium-frequency words, e.g. average, editorial, executive); Embedding structure (good for features, e.g. transitivity, tense)

7 Vocabulary expansion hypothesis: Embeddings help handling of out-of-vocabulary words (e.g. the unseen Cathleen and the known Mary)

8 Vocabulary expansion [Figure: embedding space containing the training words John, Mary, Pierre, yellow, enormous, and hungry, plus the unseen word Cathleen]

9 Vocabulary expansion [Figure: to parse "Cathleen complained about the magazine’s shoddy editorial quality.", the OOV word Cathleen is mapped to its embedding neighbor Mary]
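A minimal sketch of the vocabulary-expansion idea, assuming a dict of pre-trained vectors (embeddings) and the set of words seen in parser training (train_vocab); the function name and setup are illustrative, not the authors' exact implementation:

    import numpy as np

    def nearest_in_vocab(word, embeddings, train_vocab):
        # Replace an OOV word with its closest in-vocabulary word by cosine
        # similarity, so the parser can reuse that word's lexical statistics.
        if word in train_vocab or word not in embeddings:
            return word  # known word, or no vector available: leave unchanged
        v = embeddings[word]
        v = v / np.linalg.norm(v)
        best, best_sim = word, -1.0
        for cand in train_vocab:
            if cand in embeddings:
                u = embeddings[cand]
                sim = float(np.dot(v, u) / np.linalg.norm(u))
                if sim > best_sim:
                    best, best_sim = cand, sim
        return best

    # e.g. nearest_in_vocab("Cathleen", embeddings, train_vocab) might return
    # "Mary", letting the parser score the unseen word like a similar known one.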

10 Vocab. expansion results [Chart: Baseline vs. +OOV]

11 Vocab. expansion results [Chart: Baseline vs. +OOV]

12 Vocab. expansion results [Chart: Baseline vs. +OOV, trained on 300 sentences]

13 Statistic pooling hypothesis: Embeddings help handling of medium-frequency words (e.g. average, editorial, executive)

14 Statistic pooling [Figure: embedding neighborhood of executive, kind, giant, editorial, and average, each labeled with its observed tag set: {NN, JJ}, {NN}, {NN, JJ}, {JJ}, {NN}]

15 Statistic pooling [Figure: after pooling across embedding neighbors, the words share the tag set {NN, JJ}]

16 Statistic pooling [Figure: occurrences of editorial tagged NN draw on the pooled neighborhood statistics]
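A rough sketch of the pooling idea, assuming per-word tag counts from the treebank (tag_counts) and a precomputed embedding-neighbor list (neighbors); the smoothing weight and the toy data are made up for illustration:

    from collections import Counter

    def pooled_tag_counts(word, tag_counts, neighbors, alpha=0.1):
        # Smooth a word's observed tag counts with counts pooled from its
        # embedding neighbors, weighted by alpha.
        pooled = Counter(tag_counts.get(word, Counter()))
        for nb in neighbors.get(word, []):
            for tag, count in tag_counts.get(nb, Counter()).items():
                pooled[tag] += alpha * count
        return pooled

    # "editorial" is seen only as NN, but neighbors "executive" and "average"
    # are also seen as JJ, so JJ now gets some probability mass:
    tag_counts = {"editorial": Counter({"NN": 3}),
                  "executive": Counter({"NN": 5, "JJ": 4}),
                  "average":   Counter({"NN": 2, "JJ": 6})}
    neighbors = {"editorial": ["executive", "average"]}
    print(pooled_tag_counts("editorial", tag_counts, neighbors))
    # roughly Counter({'NN': 3.7, 'JJ': 1.0})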

17 Statistic pooling results [Chart: Baseline vs. +Pooling]

18 Statistic pooling results [Chart: Baseline vs. +Pooling, trained on 300 sentences]

19 Embedding structure hypothesis: The organization of the embedding space directly encodes useful features (e.g. transitivity, tense)

20 Embedding structure [Figure: the verbs vanished, dined, vanishing, dining, devoured, assassinated, devouring, and assassinating arranged along "transitivity" and "tense" directions in embedding space] [Huang 2011]

21 Embedding structure [Figure: same space, with dined highlighted and tagged VBD] [Huang 2011]
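If directions in the embedding space really encode properties like tense or transitivity, they could be handed to the parser directly as lexical features. A toy sketch of that idea, where the chosen dimensions and the sign-based discretization are assumptions for illustration only:

    def embedding_features(word, tag, embeddings, dims=(0, 1, 2)):
        # Turn a few embedding dimensions into indicator features for a
        # (word, tag) pair; each dimension is binarized by its sign.
        feats = [f"TAG={tag}"]
        vec = embeddings.get(word)
        if vec is None:
            return feats  # no vector: fall back to a tag-only feature
        for d in dims:
            bucket = "pos" if vec[d] > 0 else "neg"
            feats.append(f"TAG={tag}&dim{d}={bucket}")
        return feats

    # If dimension 1 loosely tracked tense, "dined" and "devoured" would share
    # a feature like "TAG=VBD&dim1=pos", letting the parser pool evidence
    # across them.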

22 Embedding structure results [Chart: Baseline vs. +Features]

23 Embedding structure results [Chart: Baseline vs. +Features, trained on 300 sentences]

24 To summarize [Chart: summary of all conditions, trained on 300 sentences]

25 Combined results [Chart: Baseline vs. +OOV +Pooling]

26 Combined results [Chart: Baseline vs. +OOV +Pooling, trained on 300 sentences]

27 What about… Domain adaptation? (no significant gain) French? (no significant gain) Other kinds of embeddings? (no significant gain)

28 Why didn’t it work? Context clues often provide enough information to reason around words with incomplete or incorrect statistics. The parser already has robust OOV and small-count models. Sometimes the "help" from embeddings is worse than nothing: bifurcate, Soap, homered, Paschi, tuning, unrecognized.

29 What about other parsers? Dependency parsers (continuous representations as a syntactic abstraction) [Koo 2008, Bansal 2014]; neural network parsers (continuous representations as a structural requirement) [Henderson 2004, Socher 2013]

30 What didn’t we try? Hard clustering (some evidence that this is useful for morphologically rich languages) [Candito 2009]; a nonlinear feature-based model; embeddings in higher constituents (e.g. in a CRF parser)

31 Conclusion: Embeddings provide no apparent benefit to a state-of-the-art parser for: – OOV handling – Parameter pooling – Lexicon features. Code online at http://cs.berkeley.edu/~jda

