2
Towards Understanding
Christopher Manning, Stanford University. Good evening. I'd like to tell you about what computers can now do with material in human languages. We're finding good ways to get computers to understand human language texts, and a key part of that is having computers know enough about the world. Indeed, these two things are related, because texts are knowledge. Have a good first sentence! Trinity College Library, Dublin. The long room.
3
1980s Natural Language Processing
Example grammar rule and lexical entry (LFG-style functional annotations):
VP → { V (NP: (↑ OBJ)=↓) (NP: (↑ OBJ2)=↓) (XP: (↑ XCOMP)=↓) | VP VP }
salmon  N  (↑ PRED)='SALMON', (↑ PERSON)=3, { (↑ NUM)=SG | (↑ NUM)=PL }
What didn't work: trying to teach a computer all the rules of grammar.
Why? Because language is too ambiguous and context-dependent.
4
Learning language: P(late | a, year) = 0.0087; P(NN | DT, a, project) = 0.9
How/WRB does/VBZ a/DT project/NN get/VB to/TO be/VB a/DT year/NN late/JJ ? …/: One/CD day/NN at/IN a/DT time/NN .
P(late | a, year) = 0.0087    P(NN | DT, a, project) = 0.9
The computer will also learn about the language by reading a lot of text, perhaps annotated with labels/supervision by humans. Statistical NLP gave soft, usage-based models of language, and led to the technologies we use today for speech recognition, spelling correction, question answering, and machine translation. I got into statistical natural language processing. Wrote the first decent book. Made me well-known. But although these models had continuous variables in the probabilities, all the models were built over categorical, symbolic representations of words, part-of-speech tags, and other categories.
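As a toy illustration of the kind of statistic behind a number like P(late | a, year), here is a minimal sketch of maximum-likelihood trigram estimation. The corpus and counts are invented for illustration; this is not the talk's actual model or data.

```python
# Minimal sketch: estimating a conditional word probability such as
# P(late | a, year) from trigram counts over a toy corpus (MLE).
from collections import Counter

corpus = "a year late is better than never a year late again".split()

trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigrams = Counter(zip(corpus, corpus[1:]))

def cond_prob(w1, w2, w3):
    """P(w3 | w1, w2) = count(w1, w2, w3) / count(w1, w2)."""
    if bigrams[(w1, w2)] == 0:
        return 0.0
    return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]

print(cond_prob("a", "year", "late"))
```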
5
The traditional word representation
motel = [0 0 0 0 … 0 0 1 0 … 0 0] — a one-hot vector. Dimensionality: 50K (speech/PTB) – 500K (big vocab) – 13M (Google 1T). motel AND hotel = 0
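A minimal sketch of what this symbolic representation amounts to, with a tiny made-up vocabulary standing in for the 50K–13M word case: every word is a one-hot vector, so the dot product of any two different words is 0 and no similarity is captured.

```python
# Sketch of the traditional one-hot word representation: each word is a
# sparse vector with a single 1, so distinct words share no similarity.
import numpy as np

vocab = {"motel": 0, "hotel": 1, "walrus": 2}  # toy vocabulary

def one_hot(word, size=len(vocab)):
    v = np.zeros(size)
    v[vocab[word]] = 1.0
    return v

motel, hotel = one_hot("motel"), one_hot("hotel")
print(motel @ hotel)  # 0.0 -- "motel" and "hotel" look totally unrelated
```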
6
Word distributions → word representations
Contexts of the word "linguistics":
…through corpus linguistics, large chunks…
…the study of language and linguistics…
The field of linguistics is concerned…
…written like a linguistics text book…
Phonology is the branch of linguistics that…
linguistics = [0.286, 0.792, −0.177, −0.107, 0.109, −0.542, 0.349, 0.271, 0.487]
Word distributions are used to collect lexical statistics and to build super-useful word representations. Vector space semantics has just been mega-useful. Up to now, better lexical semantics has clearly given us more operational value than trying to model first-order logic in NLP. Meaning representations are at the other end of the spectrum from signal processing, but they share with signals that human beings have at best fairly weak ideas about how to represent the useful information in them. [Bengio et al. 2003, Collobert & Weston 2008, Turian 2010, Mikolov 2013, etc.]
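To make the idea concrete, here is a toy sketch of my own (not the talk's pipeline): build a word-context co-occurrence matrix from a couple of sentences and reduce it with SVD to get small dense word vectors.

```python
# Toy distributional-semantics sketch: co-occurrence counts -> SVD -> dense vectors.
import numpy as np

sentences = [
    "linguistics is the study of language".split(),
    "phonology is a branch of linguistics".split(),
]

vocab = sorted({w for s in sentences for w in s})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))

window = 2
for sent in sentences:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[index[w], index[sent[j]]] += 1

# Low-rank word vectors from the co-occurrence counts.
U, S, Vt = np.linalg.svd(counts)
word_vectors = U[:, :3] * S[:3]
print(word_vectors[index["linguistics"]])
```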
7
Encoding meaning in vector differences [Pennington et al., to appear 2014]
Crucial insight: Ratios of co-occurrence probabilities can encode meaning components.

                            x = solid   x = gas   x = water   x = random
P(x | ice)                  large       small     large       small
P(x | steam)                small       large     large       small
P(x | ice) / P(x | steam)   large       small     ~1          ~1
8
Encoding meaning in vector differences [Pennington et al., to appear 2014]
Crucial insight: Ratios of co-occurrence probabilities can encode meaning components.

                            x = solid    x = gas      x = water    x = fashion
P(x | ice)                  1.9 × 10⁻⁴   6.6 × 10⁻⁵   3.0 × 10⁻³   1.7 × 10⁻⁵
P(x | steam)                2.2 × 10⁻⁵   7.8 × 10⁻⁴   2.2 × 10⁻³   1.8 × 10⁻⁵
P(x | ice) / P(x | steam)   8.9          8.5 × 10⁻²   1.36         0.96
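A small sketch of the calculation behind this table, using invented co-occurrence counts rather than the real corpus statistics; only the pattern of the ratios (large for "solid", small for "gas", near 1 for "water" and "fashion") is the point.

```python
# Toy illustration of the slide's insight: ratios P(x | ice) / P(x | steam)
# pick out meaning components. The counts below are made up.
cooc = {
    "ice":   {"solid": 190, "gas": 7, "water": 300, "fashion": 2, "total": 1_000_000},
    "steam": {"solid": 2, "gas": 78, "water": 220, "fashion": 2, "total": 1_000_000},
}

def p(x, w):
    return cooc[w][x] / cooc[w]["total"]

for x in ["solid", "gas", "water", "fashion"]:
    print(x, round(p(x, "ice") / p(x, "steam"), 2))
```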
9
GloVe: A new model for learning word representations [Pennington et al., to appear 2014]
Q: How can we capture this in a word vector space? A: A log-bilinear model.
Fast training
Scalable to huge corpora
Good performance even with a small corpus and small vectors
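Here is a rough sketch of a GloVe-style weighted least-squares objective over log co-occurrence counts; the toy sizes, random initialization, and weighting constants are illustrative choices, not the paper's tuned setup.

```python
# Sketch of a GloVe-style log-bilinear objective: for each count X_ij we want
# w_i . w~_j + b_i + b~_j  ≈  log X_ij, weighted by f(X_ij).
import numpy as np

rng = np.random.default_rng(0)
V, dim = 20, 8                               # toy vocabulary and vector size
X = rng.integers(0, 50, size=(V, V))         # toy co-occurrence counts

W = 0.1 * rng.standard_normal((V, dim))          # word vectors
W_tilde = 0.1 * rng.standard_normal((V, dim))    # context vectors
b = np.zeros(V)
b_tilde = np.zeros(V)

def weight(x, x_max=100.0, alpha=0.75):
    """Down-weight rare co-occurrences, cap frequent ones."""
    return np.minimum(1.0, (x / x_max) ** alpha)

def glove_loss():
    mask = X > 0                              # only observed co-occurrences count
    pred = W @ W_tilde.T + b[:, None] + b_tilde[None, :]
    err = pred - np.log(np.where(mask, X, 1))
    return np.sum(weight(X) * mask * err ** 2)

print(glove_loss())
```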
10
Word similarities
Nearest words to frog:
1. frogs
2. toad
3. litoria
4. leptodactylidae
5. rana
6. lizard
7. eleutherodactylus
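Nearest-neighbour lists like this come from cosine similarity in the vector space. A minimal sketch, using random stand-in vectors rather than real GloVe vectors, so the ranking here is arbitrary:

```python
# Nearest neighbours by cosine similarity in a word vector space.
import numpy as np

rng = np.random.default_rng(1)
words = ["frog", "frogs", "toad", "litoria", "lizard", "keyboard"]
vectors = {w: rng.standard_normal(50) for w in words}  # placeholder vectors

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest(query, k=3):
    q = vectors[query]
    scores = [(cosine(q, vectors[w]), w) for w in words if w != query]
    return sorted(scores, reverse=True)[:k]

print(nearest("frog"))
```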
11
Word analogy task [Mikolov, Yih & Zweig 2013a]
Model                         Dimensions   Corpus size   Performance (Syn + Sem)
CBOW (Mikolov et al. 2013b)   300          1.6 billion   36.1
                              1000         6 billion     63.7
SVD                                                       7.3
SVD, log scaled                                          60.1
GloVe (this work)                                        71.7

Capturing attributes of meaning through a word analogy task.
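The analogy task itself is solved by simple vector arithmetic: for "a is to b as c is to ?", pick the word whose vector is closest to x_b − x_a + x_c. A sketch with random placeholder vectors (so the output here is not meaningful):

```python
# Word analogy by vector arithmetic and cosine similarity.
import numpy as np

rng = np.random.default_rng(2)
words = ["king", "queen", "man", "woman", "paris", "france"]
vecs = {w: rng.standard_normal(50) for w in words}  # placeholder vectors

def analogy(a, b, c):
    target = vecs[b] - vecs[a] + vecs[c]
    return max(
        (w for w in words if w not in (a, b, c)),
        key=lambda w: target @ vecs[w]
        / (np.linalg.norm(target) * np.linalg.norm(vecs[w])),
    )

print(analogy("man", "king", "woman"))  # ideally "queen" with real vectors
```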
12
Machine translation with bilingual neural language models [Devlin et al., ACL 2014]
The low-dimensional distributed representation lets you condition on many things at once. You just can't do that with categorical, symbolic representations.
13
Machine translation with bilingual neural language models [Devlin et al., ACL 2014]
NIST 2012 Open MT Arabic Results

NNJM on "Baseline":
  "Baseline Hiero"            43.4
  + NNJM                      49.7   (+6.3 BLEU)

Ranked systems (Arabic):
  1st Place (BBN)             49.5
  2nd Place                   47.5
  …
  9th Place                   44.0
  10th Place                  41.2

NNJM on best system:
  Previous best BBN system    49.8
  + NNJM                      52.8   (+3.0 BLEU)

"Baseline Hiero" features: (1) rule probabilities, (2) lexical smoothing, (3) KN LM, (4) word penalty, (5) concatenation penalty.

The low-dimensional distributed representation lets you condition on many things at once. You just can't do that with categorical, symbolic representations.
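To show roughly what "conditioning on many things at once" looks like with distributed representations, here is a toy feed-forward scoring sketch in the spirit of a neural joint model: source-window and target-history embeddings are concatenated and fed through one hidden layer. The sizes and architecture details are my own illustrative choices, not Devlin et al.'s system.

```python
# Toy sketch of a bilingual feed-forward next-word scorer: embed a window of
# source words plus the previous target words, concatenate, and score the
# next target word with one hidden layer.
import numpy as np

rng = np.random.default_rng(3)
V_src, V_tgt, dim, hidden = 100, 100, 16, 32

E_src = 0.1 * rng.standard_normal((V_src, dim))   # source embeddings
E_tgt = 0.1 * rng.standard_normal((V_tgt, dim))   # target embeddings

n_src_context, n_tgt_context = 5, 3
W1 = 0.1 * rng.standard_normal(((n_src_context + n_tgt_context) * dim, hidden))
W2 = 0.1 * rng.standard_normal((hidden, V_tgt))

def next_word_logits(src_window, tgt_history):
    """Concatenate context embeddings and run one hidden layer."""
    x = np.concatenate([E_src[src_window].ravel(), E_tgt[tgt_history].ravel()])
    h = np.tanh(x @ W1)
    return h @ W2  # unnormalised scores over the target vocabulary

logits = next_word_logits(src_window=[1, 2, 3, 4, 5], tgt_history=[7, 8, 9])
print(logits.shape)  # (100,)
```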
14
Sentence structure: Dependency parsing
15
Universal Stanford Dependencies [de Marneffe et al., LREC 2014]
A common dependency representation and label set applicable across languages.
16
Sentence structure: Dependency parsing
17
Deep Learning Dependency Parser [Chen & Manning, forthcoming 2014]
A transition-based (i.e., shift-reduce) dependency parser.
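For concreteness, here is a minimal arc-standard shift-reduce loop with the transition sequence hard-coded; in the actual parser the transitions are predicted by a neural network classifier over features of the stack and buffer.

```python
# Minimal arc-standard shift-reduce dependency parsing loop.
def parse(words, transitions):
    stack, buffer, arcs = [0], list(range(1, len(words) + 1)), []  # 0 is ROOT
    for t in transitions:
        if t == "SHIFT":
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":            # top of stack heads the word below it
            head, dep = stack[-1], stack.pop(-2)
            arcs.append((head, dep))
        elif t == "RIGHT-ARC":           # word below the top heads the top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# "He eats fish": He <- eats -> fish, with eats attached to ROOT (index 0).
words = ["He", "eats", "fish"]
gold = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
print(parse(words, gold))  # [(2, 1), (2, 3), (0, 2)]
```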
18
Deep Learning Dependency Parser [Chen & Manning, forthcoming 2014]
Parser type        Parser                   LAS (Label & Attach)   Sentences / sec
Transition-based   MaltParser (stackproj)   86.9                   469
                   Our parser               89.6                   654
Graph-based        MSTParser                87.6                    10
                   TurboParser (full)       89.7                     8
19
Grounding language meaning with images [Socher, Karpathy, Le, Manning & Ng, TACL 2014]
Idea: Map sentences and images into a joint space
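One simple way to realize this idea (a sketch with random placeholder features and linear maps, not the paper's recursive-network model) is to project image features and sentence vectors into a shared space and rank images by inner product with the sentence.

```python
# Sketch of a joint sentence-image space: two linear projections into a
# common space, then rank images for a sentence by inner product.
import numpy as np

rng = np.random.default_rng(4)
img_dim, sent_dim, joint_dim = 4096, 50, 100

W_img = 0.01 * rng.standard_normal((img_dim, joint_dim))
W_sent = 0.01 * rng.standard_normal((sent_dim, joint_dim))

images = rng.standard_normal((10, img_dim))   # e.g. CNN features per image
sentence = rng.standard_normal(sent_dim)      # e.g. a composed sentence vector

scores = (images @ W_img) @ (sentence @ W_sent)
print(np.argsort(-scores)[:3])  # indices of the top-3 ranked images
```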
20
Example dependency tree and image
21
Evaluation Data of [Rashtchian, Young, Hodosh & Hockenmaier 2010]
1000 images, 5 descriptions each; used as 800 train, 100 dev, 100 test
22
Results for image search
Model                            Mean rank
Random                           52.1
Recurrent NN                     19.2
Constituency Tree Recursive NN   16.1
kCCA                             15.9
Bag of Words                     14.6
Dependency Tree Recursive NN     12.5

Lower is better!
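The "mean rank" number is just the average position of the correct image in each query's ranked list. A small sketch of the metric with made-up scores:

```python
# Mean rank: average 1-based position of the gold image in each ranked list.
import numpy as np

# scores[i, j] = model score of candidate image j for query sentence i;
# by construction here, the correct image for query i is image i.
scores = np.array([
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
])

ranks = []
for i in range(scores.shape[0]):
    order = np.argsort(-scores[i])                       # best-scoring first
    ranks.append(int(np.where(order == i)[0][0]) + 1)    # rank of gold image

print(ranks, "mean rank =", np.mean(ranks))
```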
23
How to represent the meaning of texts [Le and Mikolov, ICML 2014, Paragraph Vector]
Much work in deep learning has favored flat representations or uniform convolutional mappings. These get good results, slightly beating our results on compositional sentiment. I've been interested in the compositional structure of language, which has many uses in understanding meaning.
24
Political Ideology Detection Using Recursive Neural Networks [Iyyer, Enns, Boyd-Graber & Resnik 2014] Positive versus negative discourse
26
Extracting Semantic Relationships [Socher, Huval, Manning & Ng, EMNLP 2012]
My [apartment]e1 has a pretty large [kitchen]e2 → component-whole relationship (e2, e1). Use neural nets over structures to extract semantic relationships, which is useful for information extraction and thesaurus construction applications.
27
We need to unlock the knowledge that is language by doing interpretation of language.
A central part of this is being able to see meaning relationships. We enable this with distributed representations showing meaning components, and with compositional representations of how meanings are combined. These representations are best when they make the meaning relationships as manifest as possible. It's an exciting time now in NLP, because we have good tools and exciting new methods to take the next steps towards natural language understanding and knowledge.
28
When you move from a single big vector to word-to-word meaning relationships, it is much more powerful.
29
Image credits and permissions
Slides 2 and 27: Trinity College Dublin library. Free-use picture.
Slide 3: From a paper of the author (Manning 1992, "Romance is so complex").
Slide 4: PR2 robot reading. CC BY-SA 3.0 by Troy Straszheim.
30
Image credits and permissions
Slide 10: Images from Wikipedia. (i) Public domain, http://en.wikipedia.org/wiki/Litoria#mediaviewer/File:Caerulea_cropped(2).jpg (ii) Brian Gratwick (iii) Grand-Duc, Niabot (iv) Public domain.
Slides 19, 20, 21, 25: Images from the PASCAL Visual Object Challenge 2008 data set.