Latent Semantic Analysis:
Is it a solution to Plato’s problem? [And 10 other questions & answers.]
10 questions
How did this paper change our lives?
What is Plato's problem?
Oh no! Not more philosophy?
How can Plato's problem be solved?
What kind of solution do we need?
What is latent semantic analysis?
How is an LSA model constructed?
How is the LSA model used?
What's a cosine between vectors?
What are some cool empirical findings?
Is LSA psychologically plausible?
How did this paper change our lives?
Because I saw a talk by Landauer on this work, I became interested in latent semantic analysis [LSA]. Because I was interested in LSA, I became interested in Curt Burgess's HAL model. Because I was interested in HAL, I decided to come to Edmonton, where Lori Buchanan was working on it. Because I came to Edmonton, here I am teaching Psych 357. If Landauer hadn’t written this paper, we probably wouldn’t have the mutual pleasure of knowing each other as we do.
What is Plato's problem?
Meno (in the Platonic dialogue named after him) asks: How can one ever investigate what one does not know? He saw two problems:
i.) How can you propose what you do not know as the object of your search?
ii.) How will you recognize what you do not know as the thing you did not know if you do (by chance) find it?
More generally, the problem is that there is a gap between what we experience and what we know, with the latter seeming to be larger than the former is able to support.
Oh no! Not more philosophy?
Not at all (indeed, the opposite). Plato's problem is exactly the poverty-of-the-stimulus/failure-of-induction problem. It is thus central to syntactic knowledge, as well as to many other dimensions of linguistic knowledge (wherever we make fine-grained untaught distinctions: e.g. prosody, phonology, and semantics).
How can Plato’s problem be solved?
i.) Plato's solution was recollection of knowledge gained in a previous life, famously demonstrated in the Meno by showing that a slave boy 'knows' the Pythagorean Theorem.
ii.) Some favour the idea of innate knowledge, the modern equivalent of recollection of a previous life.
The basic common principle is one we already know and love in Psych 357: we need some source of strong additional constraints on the problem (= information) to narrow down the size of the search space.
What kind of solution do we need?
That is: what properties are desirable in a scientifically acceptable explanation of how constraints on a search space operate?
i.) They must be sufficient.
ii.) They must be well-defined.
iii.) They must be psychologically plausible.
What is latent semantic analysis?
LSA is an algorithmically well-defined way of measuring lexical co-occurrence in a set of texts. The assumption is that co-occurrence says something about semantics: words about the same things are likely to occur in the same contexts. If we have many words and contexts, small differences in co-occurrence probabilities can be compiled together to give information about semantics. Think of Twenty Questions: no single question might be sufficient to identify an unknown object, but twenty questions usually are sufficient.
How is an LSA model constructed?
i.) Build a matrix with rows representing words and columns representing contexts (a document or word string).
ii.) Enter in each cell (= a word × document intersection) a count of how many times that word occurred in that document.
iii.) Transform the matrix.
i.) Build a matrix with rows representing words and columns representing contexts (a document or word string):

               Sonnets   Learn C   A day at the zoo   …
    dog
    zebra
    computer
ii.) Enter in each cell (= a word × document intersection) a count of how many times that word occurred in that document:

               Sonnets   Learn C   A day at the zoo   …
    dog             6         1                  7
    zebra           2                           46
    computer                123
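Below is a minimal sketch of steps (i) and (ii) in Python. The toy documents and their contents are illustrative assumptions, not the actual corpora used in the paper.

```python
from collections import Counter

# Toy corpus: three 'contexts' (documents); the contents are made up.
documents = {
    "Sonnets":          "shall i compare thee to a summer day the dog ran",
    "Learn C":          "a pointer stores the address of a variable",
    "A day at the zoo": "the zebra and the dog watched the lions all day",
}

# Rows: one per distinct word across all documents.
vocabulary = sorted({w for text in documents.values() for w in text.split()})
row = {w: i for i, w in enumerate(vocabulary)}

# Cell (word, document) = how many times that word occurred in that document.
counts = [[0] * len(documents) for _ in vocabulary]
for j, text in enumerate(documents.values()):
    for word, n in Counter(text.split()).items():
        counts[row[word]][j] = n
```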
iii.) Transform the matrix
a.) Control for word frequency. The log transform compresses the effects of frequency.
b.) Control for the number of contexts each word appeared in. Words that occur in few contexts are more informative about those contexts (= reduce uncertainty about their context more) than words that appear in many different contexts. E.g., knowing the word 'computer' was common places more constraints on what the document is about than knowing the word 'the' was common.
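Here is a sketch of transforms (a) and (b) using one common log-entropy weighting scheme; the exact weighting in the original paper may differ in detail, so treat this as an illustration rather than the paper's method.

```python
import numpy as np

def log_entropy(counts: np.ndarray) -> np.ndarray:
    """counts: a words x documents matrix of raw occurrence counts."""
    counts = counts.astype(float)
    # (a) The log transform compresses the effect of raw word frequency.
    local = np.log1p(counts)
    # (b) Down-weight words spread evenly over many contexts ('the'),
    # up-weight words concentrated in a few contexts ('computer').
    totals = np.maximum(counts.sum(axis=1, keepdims=True), 1e-12)
    p = counts / totals
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    entropy = -plogp.sum(axis=1)                    # low = informative word
    global_weight = 1.0 - entropy / np.log(counts.shape[1])
    return local * global_weight[:, None]
```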
iii.) Transform the matrix
c.) Singular value decomposition. This reduces dimensionality by 'projecting' the tens of thousands of context dimensions onto a smaller number (roughly 300). A mathematical projection is roughly the same as a real projection: think of shining a light through a three-dimensional pattern and tracing the shadow it casts to get a two-dimensional projection. The 'discarded' dimensions are those that are least informative = have low variance = are redundant (e.g. a word like 'the' occurred in every context, or a word like 'antidisestablishmentarianism' occurred in hardly any contexts).
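A sketch of step (c) with NumPy's SVD; the 300 dimensions follow the slide, and everything else here is illustrative.

```python
import numpy as np

def reduce_dimensions(weighted: np.ndarray, k: int = 300) -> np.ndarray:
    """weighted: words x documents matrix after the transform above.
    Returns a words x k matrix: each word projected into the latent space."""
    U, S, Vt = np.linalg.svd(weighted, full_matrices=False)
    k = min(k, S.size)
    # Keep the k highest-variance dimensions; the discarded ones are the
    # least informative / most redundant, as described above.
    return U[:, :k] * S[:k]
```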
How is the LSA model used?
To get a measure of how related one word is to another, measure the distance between the rows (word vectors) for the two words. This gives you a measure of how different the contexts of the two words were: that is, how much the two words' occurrence counts differ across contexts. You can also take the distance between two document vectors (columns) to get a measure of how related the documents are. You can measure distance by taking the cosine between two vectors.
Huh? What’s a cosine between vectors?
They probably forgot to mention in your Grade 9 trigonometry class (as they did in mine) that the cosine is extensible to dimensions above 2. Typical teaching: always the special case, never the general. The dot product of two vectors is the sum of the products of corresponding entries in the two vectors: i.e. (x1*x2) + (y1*y2) + (z1*z2), for two vectors of length 3. The dot product of two vectors equals the cosine of the angle between them, multiplied by the lengths of those vectors. Therefore, the cosine is the dot product divided by the product of the two vector lengths: cos θ = (a · b) / (|a| |b|).
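A worked example of that definition, for vectors of any dimensionality (three here, but the same code handles 300):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # dot(a, b) = |a| * |b| * cos(angle), so cos(angle) is:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([2.0, 4.0, 6.0])
print(cosine(v1, v2))                            # 1.0: same direction
print(cosine(v1, np.array([-1.0, -2.0, -3.0])))  # -1.0: opposite direction
```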
What are some cool empirical findings?
i.) LSA models can pass the TOEFL
ii.) LSA can learn the meanings of words it has never encountered
iii.) LSA can explain some priming effects
iv.) LSA replicates human number judgments
v.) LSA can mark essays
vi.) LSA-like measures predict LD RTs
i.) LSA models can pass the TOEFL
On a 4-alternative multiple-choice TOEFL, the model got 51.5% correct (corrected for guessing). The chance score is 25%. Real foreign applicants hoping to attend American universities averaged 52.7%.
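A sketch of how a model 'takes' a TOEFL item under these assumptions: the stem word and the four alternatives are looked up in the reduced words × k matrix, and the model answers with the alternative whose vector has the highest cosine with the stem. The function name and interface are hypothetical.

```python
import numpy as np

def answer_toefl_item(stem_vec: np.ndarray, alternatives: list) -> int:
    """Return the index of the alternative most similar to the stem word."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return int(np.argmax([cos(stem_vec, v) for v in alternatives]))
```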
ii.) LSA can learn the meanings of words it had never encountered
So can children! By substituting words with nonsense words and controlling access, they showed that the model could learn the meanings of words it had never encountered. This replicated (and explained) an odd result that had been found in human children, and estimated that most word knowledge is inductive rather than direct. The result is not odd when you consider that the meaning of a word is distributed across all vectors with which it shares contexts. You can learn a lot about lions, even if you have never heard of them before, by knowing they are something like tigers.
iii.) LSA can explain some priming effects
The model can explain some priming work using homographs: e.g. testing 'mole' (the animal) versus 'mole' (the beauty mark). If context is marked by word form (either phonological or orthographic), then these words will indeed get overlapping contexts even though they are semantically different.
iv.) LSA replicates human number judgments
Previous work has shown that judgments about number size are best modelled on the assumption that numbers are represented as the log of their values. That is, people 'scale down' large numbers. LSA arrived at the same representation from the numbers' contextual occurrences.
v.) LSA can mark essays
LSA judgments of the quality of sentences correlate at r = 0.81 with expert ratings. LSA can judge how good an essay (on a well-defined set topic) is by computing the average distance between the essay to be marked and a set of model essays. The correlations are equal to between-human correlations. "If you wrote a good essay and scrambled the words you would get a good grade," Landauer said. "But try to get the good words without writing a good essay!"
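A sketch of the essay-marking idea: a new essay is represented as the average of its word vectors ('folded in' to the latent space), then scored by its mean cosine to the model essays. word_vecs is a hypothetical mapping from words to rows of the reduced matrix, and this folding-in recipe is an assumption, not necessarily the paper's exact procedure.

```python
import numpy as np

def fold_in(text: str, word_vecs: dict) -> np.ndarray:
    """Represent a document as the average of its known word vectors."""
    vecs = [word_vecs[w] for w in text.lower().split() if w in word_vecs]
    return np.mean(vecs, axis=0)

def essay_score(essay: str, model_essays: list, word_vecs: dict) -> float:
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    e = fold_in(essay, word_vecs)
    return float(np.mean([cos(e, fold_in(m, word_vecs))
                          for m in model_essays]))
```

Note that averaging word vectors ignores word order entirely, which is exactly the property Landauer's quip is about: a scrambled good essay gets the same score as the original.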
vi.) LSA-like measures predict LD RTs
An LSA-like measure for single words can predict human reaction times in lexical decision. We used the 10 words on each side of the target word as a 'document' and computed distances between all words. Words close to their nearest neighbours are recognized more quickly than words far from them, after controlling for other known variables.
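A sketch of the windowing step described above: each token's ±10-word neighbourhood is treated as a mini-'document', after which the count matrix, transform, and nearest-neighbour cosines proceed as before. The tokenization and function name are assumptions.

```python
from collections import Counter

def window_documents(tokens: list, half_width: int = 10):
    """Yield (target word, bag of surrounding words) for each position,
    using up to `half_width` words on each side as the 'document'."""
    for i, target in enumerate(tokens):
        lo = max(0, i - half_width)
        hi = min(len(tokens), i + half_width + 1)
        window = tokens[lo:i] + tokens[i + 1:hi]
        yield target, Counter(window)
```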
Is LSA psychologically plausible?
Well, the above evidence suggests it might be, and it is nicely consistent with much of our talk about mapping between schemas. Neurophilosopher Paul Churchland has written: "Explanatory understanding consists of the activation of a specific prototype vector in a well-trained network. It consists in the apprehension of the problematic case as an instance of a general type, a type for which the creature has a detailed and well-informed representation. Such a representation allows the creature to anticipate aspects of the case so far unperceived, and to deploy practical techniques appropriate to the case at hand." (Paul Churchland, A Neurocomputational Perspective: The Nature of Mind and the Structure of Science)