Natural Language Processing
1 Natural Language Processing
CSE 592 Applications of AI, Winter 2003. Topics: Information Retrieval, Speech Recognition, Syntactic Parsing, Semantic Interpretation.

2 Example Applications
Spelling and grammar checkers. Finding information on the WWW. Spoken language control systems: banking, shopping. Classification systems for messages, articles. Machine translation tools.

3 The Dream

4 Information Retrieval
(Thanks to Adam Carlson)

5 Motivation and Outline
Background: definitions. The problem: 100,000+ pages. The solution: ranking docs via the vector space model and probabilistic approaches. Extensions: relevance feedback, clustering, query expansion, etc.

6 What is Information Retrieval?
Given a large repository of documents, how do I get at the ones that I want? Examples: Lexis/Nexis, medical reports, AltaVista. Different from databases: unstructured (or semi-structured) data; information is (typically) text; requests are (typically) word-based.

7 Information Retrieval Task
Start with a set of documents. The user specifies an information need: keyword query, Boolean expression, or high-level description. The system returns a list of documents, ordered according to relevance. Known as the ad hoc retrieval problem.

8 Measuring Performance
Precision: the proportion of selected items that are correct, tp / (tp + fp). Recall: the proportion of target items that were selected, tp / (tp + fn). A precision-recall curve shows the tradeoff. (Figure: overlap of the docs the system returned with the actually relevant docs, with regions labeled tp, fp, fn, tn.)
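The two metrics are easy to compute directly; a minimal sketch in Python (the document ids and the relevant set are invented for illustration):

```python
def precision_recall(returned, relevant):
    """Precision and recall for a set of returned document ids."""
    returned, relevant = set(returned), set(relevant)
    tp = len(returned & relevant)                     # true positives
    precision = tp / len(returned) if returned else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

# System returned 4 docs; 3 of the 5 actually relevant docs are among them.
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 4, 5, 6})
# p = 3/4 = 0.75, r = 3/5 = 0.6
```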

9 Basic IR System
Use word overlap to determine relevance. Word overlap alone is inaccurate, so rank documents by similarity to the query, computed using the vector space model.

10 Vector Space Model
CSE 473 Artificial Intelligence, 2017/4/17
Represent documents as a matrix: words are rows, documents are columns, and cell (i, j) contains the number of times word i appears in document j. The similarity between two documents is the cosine of the angle between the vectors representing those documents. The vector space model, due to Gerard Salton, is probably the most commonly used representation for information retrieval.

11 Vector Space Example
a: System and human system engineering testing of EPS
b: A survey of user opinion of computer system response time
c: The EPS user interface management system
d: Human machine interface for ABC computer applications
e: Relation of user perceived response time to error measurement
f: The generation of random, binary, ordered trees
g: The intersection graph of paths in trees
h: Graph minors IV: Widths of trees and well-quasi-ordering
i: Graph minors: A survey

12 Vector Space Example cont.
(Figure: documents a, b, and c plotted as vectors in the space of the terms "system", "user", and "interface".)

13 Similarity in Vector Space
Measures word overlap; normalizes for different-length vectors. Other metrics exist.

14 Answering a Query Using Vector Space
Represent the query as a vector. Compute distances to all documents. Rank according to distance. Example: "computer system".
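The ranking procedure above can be sketched in a few lines of Python; the toy vocabulary and term counts are invented for illustration:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy term-document matrix over the vocabulary [system, user, computer]
docs = {
    "a": [2, 0, 0],
    "b": [1, 1, 1],
    "c": [1, 1, 0],
}
query = [1, 0, 1]      # "computer system" as a vector
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
# doc b ranks first: it mentions both query terms
```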

15 Common Improvements
The vector space model doesn't handle morphology (eat, eats, eating) and favors common terms. Possible fixes: stemming (convert each word to a common root form), stop lists, and term weighting.

16 Handling Common Terms
Stop list: a list of words to ignore ("a", "and", "but", "to", etc.). Term weighting: words which appear everywhere aren't very good discriminators, so give higher weight to rare words.

17 tf * idf

18 Inverse Document Frequency
IDF provides high values for rare words and low values for common words. For a collection of N documents, idf(w) = log(N / df(w)), where df(w) is the number of documents containing w.
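Putting the two previous slides together, a minimal sketch of tf*idf weighting, assuming the common formulation idf(w) = log(N / df(w)) (the toy documents are invented):

```python
import math
from collections import Counter

def tfidf(docs):
    """tf*idf weights per document, with idf(w) = log(N / df(w))."""
    N = len(docs)
    # df(w): number of documents containing word w
    df = Counter(w for doc in docs for w in set(doc))
    return [
        {w: tf * math.log(N / df[w]) for w, tf in Counter(doc).items()}
        for doc in docs
    ]

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ate"]]
weights = tfidf(docs)
# "the" appears in every doc, so its idf is log(3/3) = 0: no discriminating power.
```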

19 Probabilistic IR
The vector space model is robust in practice but mathematically ad hoc. How to generalize to more complex queries? (intel OR microsoft) AND (NOT stock). Alternative approach: model the problem as finding the documents with the highest probability of being relevant to the query. Requires making some simplifying assumptions about the underlying probability distributions. In certain cases it can be shown to yield the same results as the vector space model.

20 Probability Ranking Principle
For a given query Q, find the documents D that maximize the odds that the document is relevant (R):

21 Probability Ranking Principle
For a given query Q, find the documents D that maximize the odds that the document is relevant (R): Probability of document relevance to any query – i.e., the inherent quality of the document

22 Probability Ranking Principle
For a given query Q, find the documents D that maximize the odds that the document is relevant (R): Probability that if document is indeed relevant, then the query is in fact Q But where do we get that number?
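The formula image itself did not survive the transcript; judging from the annotations on these three slides, the intended expression is presumably the standard odds form of the principle via Bayes' rule:

```latex
O(R \mid Q, D)
  = \frac{P(R \mid Q, D)}{P(\neg R \mid Q, D)}
  = \frac{P(Q \mid R, D)\, P(R \mid D)}{P(Q \mid \neg R, D)\, P(\neg R \mid D)}
```

Here P(R | D) is the query-independent document quality described on slide 21, and P(Q | R, D) is the "if the document is indeed relevant, the query is in fact Q" term described on slide 22.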

23 Bayesian nets for text retrieval
(Figure: a two-part network. The document network links documents d1, d2 to words w1, w2, w3; the query network links concepts c1, c2, c3 through query operators q1, q2 (AND/OR/NOT) to the information need q0.)

24 Bayesian nets for text retrieval
Same network as the previous slide; the document network is computed once for the entire collection.

25 Bayesian nets for text retrieval
Same network as the previous slide; the query network is computed for each query.

26 Conditional Probability Tables
P(d) = prior probability that document d is relevant. Uniform model: P(d) = 1 / number of docs; in general, document quality P(r | d). P(w | d) = probability that a random word from document d is w (term frequency). P(c | w) = probability that a given document word w has the same meaning as a query word c (thesaurus). P(q | c1, c2, …) = canonical forms of the operators AND, OR, NOT, etc.

27 Example
(Figure: documents Hamlet and Macbeth in the document network link to the words "reason", "trouble", "double"; the query network combines "reason", "trouble", and "two" through OR, NOT, and AND into the user query.)

28 Details
Set the head q0 of the user query to "true". Compute the posterior probability P(D | q0). The "user information need" doesn't have to be a query; it can be a user profile, e.g., other documents the user has read. Instead of just words, can include phrases and inter-document links. Link matrices can be modified over time: user feedback, and the promise of "personalization".

29 Extensions
Meeting the demands of web-based systems: modified ranking functions for the web, relevance feedback, query expansion, document clustering, Latent Semantic Indexing, and other IR tasks.

30 IR on the Web
Query AltaVista with "Java": almost 10^7 pages found. Avoiding latency: the user wants (initial) results fast. Solution: rank documents using word overlap, with a special data structure, the inverted index.

31 Improved Ranking on the Web
Not just arbitrary documents: can use HTML tags and other properties. Query term in <TITLE></TITLE>. Query term in an <IMG>, <HREF>, etc. tag. Check the date of the document (prefer recent docs). PageRank (Google).

32 PageRank
Idea: good pages link to other good pages. Round 1: count in-links. Problems? Round 2: sum weighted in-links. Round 3: and again, and again… Implementation: repeated random walk on a snapshot of the web; weight ∝ frequency visited.
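The repeated random-walk idea can be approximated by power iteration over the link graph; this is a minimal sketch of the idea on the slide, not Google's actual implementation (the tiny graph is invented):

```python
def pagerank(links, d=0.85, iters=50):
    """Power iteration; links[p] = list of pages that p links to.
    d is the damping factor of the random walk."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}         # random jump mass
        for p, outs in links.items():
            if not outs:                              # dangling page: spread evenly
                for q in pages:
                    new[q] += d * rank[p] / n
            else:                                     # weighted in-links
                for q in outs:
                    new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

# "Good pages link to other good pages": both a and b link to hub
r = pagerank({"a": ["hub"], "b": ["hub"], "hub": ["a"]})
```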

33 Relevance Feedback
System returns an initial set of documents. User identifies relevant documents. System refines the query to get documents more like those identified by the user: add words common to relevant docs; reposition the query vector closer to relevant docs. Lather, rinse, repeat…
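The "reposition the query vector" step is classically done with the Rocchio update; a sketch with textbook default weights (alpha, beta, gamma, and the vectors are illustrative, not from the slides):

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant docs and away from
    non-relevant ones. Vectors are plain lists of term weights."""
    n_r, n_nr = max(len(relevant), 1), max(len(nonrelevant), 1)
    return [
        alpha * q
        + beta * sum(d[i] for d in relevant) / n_r
        - gamma * sum(d[i] for d in nonrelevant) / n_nr
        for i, q in enumerate(query)
    ]

q2 = rocchio([1.0, 0.0], relevant=[[1.0, 1.0]], nonrelevant=[[0.0, 1.0]])
# the query gains weight on term 2 from the relevant doc: approximately [1.75, 0.6]
```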

34 Query Expansion Given query, add words to improve recall Example
Workaround for synonym problem Example boat ® boat OR ship Can involve user feedback or not Can use thesaurus or other online source WordNet

35 Document Clustering Group similar documents
Similar means “close in vector space”. If a document is relevant, return the whole cluster. Can be combined with relevance feedback. Example system: GROUPER.

36 Clustering Algorithms
K-means: initialize k cluster centers; loop (assign all documents to the closest center; move cluster centers to better fit the assignment) until little movement. Hierarchical Agglomerative Clustering: initialize each document to a singleton cluster; loop (merge the two closest clusters) until k clusters exist. Many ways to measure the distance between clusters.
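The k-means loop above can be sketched directly; the data, the distance (squared Euclidean), and k are illustrative:

```python
import random

def kmeans(docs, k, iters=20, seed=0):
    """Minimal k-means on document vectors."""
    rng = random.Random(seed)
    centers = rng.sample(docs, k)                     # initialize k centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in docs:                                # assign to closest center
            j = min(range(k),
                    key=lambda c: sum((x - y) ** 2
                                      for x, y in zip(d, centers[c])))
            clusters[j].append(d)
        for j, cl in enumerate(clusters):             # move centers to fit
            if cl:
                centers[j] = [sum(xs) / len(cl) for xs in zip(*cl)]
    return clusters

clusters = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
# the two well-separated pairs end up in separate clusters
```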

37 Latent Semantic Indexing
Creates a modified vector space that captures transitive co-occurrence information: if docs A & B don't share any words with each other, but both share lots of words with doc C, then A & B will be considered similar. Simulates query expansion and document clustering (sort of).

38 Variations on a Theme
Text categorization: assign each document to a category; example: automatically put web pages in the Yahoo hierarchy. Routing & filtering: match documents with users; example: a news service that allows subscribers to specify “send news about high-tech mergers”.

39 Speech Recognition TO BE COMPLETED

40 Syntactic Parsing Semantic Interpretation
TO BE COMPLETED

41 NLP Research Areas Morphology: structure of words
Syntactic interpretation (parsing): create a parse tree of a sentence. Semantic interpretation: translate a sentence into the representation language. Pragmatic interpretation: take the current situation into account. Disambiguation: there may be several interpretations; choose the most probable.

42 Some Difficult Examples
From the newspapers: Squad helps dog bite victim. Helicopter powered by human flies. Levy won’t hurt the poor. Once-sagging cloth diaper industry saved by full dumps. Ambiguities: Lexical: meanings of ‘hot’, ‘back’. Syntactic: I heard the music in my room. Referential: The cat ate the mouse. It was ugly.

43 Parsing
Context-free grammars:
EXPR -> NUMBER
EXPR -> VARIABLE
EXPR -> (EXPR + EXPR)
EXPR -> (EXPR * EXPR)
(2 + X) * (17 + Y) is in the grammar. (2 + (X)) is not. Why do we call them context-free?

44 Using CFG’s for Parsing
Can natural language syntax be captured using a context-free grammar? Yes, no, sort of, for the most part, maybe. Words: nouns, adjectives, verbs, adverbs. Determiners: the, a, this, that. Quantifiers: all, some, none. Prepositions: in, onto, by, through. Connectives: and, or, but, while. Words combine into phrases: NP, VP.

45 An Example Grammar
S -> NP VP
VP -> V NP
NP -> NAME
NP -> ART N
ART -> a | the
V -> ate | saw
N -> cat | mouse
NAME -> Sue | Tom

46 Example Parse The mouse saw Sue.
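The example grammar on slide 45 is small enough to handle with a hand-rolled recursive-descent recognizer; a sketch (it only accepts or rejects a sentence, it does not build the parse tree shown on the slide):

```python
# Lexicon from slide 45's grammar
LEX = {"ART": {"a", "the"}, "V": {"ate", "saw"},
       "N": {"cat", "mouse"}, "NAME": {"Sue", "Tom"}}

def parse_np(words, i):
    """Try to match an NP starting at position i; return the next
    position on success, None on failure."""
    if i < len(words) and words[i] in LEX["NAME"]:
        return i + 1                                  # NP -> NAME
    if (i + 1 < len(words) and words[i] in LEX["ART"]
            and words[i + 1] in LEX["N"]):
        return i + 2                                  # NP -> ART N
    return None

def parse_s(words):
    j = parse_np(words, 0)                            # S -> NP VP
    if j is None or j >= len(words) or words[j] not in LEX["V"]:
        return False
    k = parse_np(words, j + 1)                        # VP -> V NP
    return k == len(words)

parse_s("the mouse saw Sue".split())   # True
parse_s("mouse the saw".split())       # False
```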

47 Ambiguity
“Sue bought the cat biscuits”
S -> NP VP
VP -> V NP
VP -> V NP NP
NP -> N
NP -> N N
NP -> Det NP
Det -> the
V -> ate | saw | bought
N -> cat | mouse | biscuits | Sue | Tom

48 Example: Chart Parsing
Three main data structures: a chart, a key list, and a set of edges. (Figure: the chart is a 4-by-4 table indexed by starting point and length; each cell names a terminal or non-terminal.)

49 Key List and Edges
Key list: a push-down stack of chart entries, e.g. “the”, “box”, “floats”. Edges: rules that can be applied to chart entries to build up larger entries. (Figure: chart holding “the”, “box”, “floats” as length-1 entries at positions 1, 2, 3, plus the edge det -> the o.)

50 Chart Parsing Algorithm
Loop while there are entries in the key list:
1. Remove an entry from the key list.
2. If the entry is already in the chart, add its edge list and break.
3. Add the entry from the key list to the chart.
4. For all rules that begin with the entry's type, add an edge for that rule.
5. For all edges that need the entry next, add an extended edge (see below).
6. If the edge is finished, add an entry to the key list with its type, start point, length, and edge list.
To extend an edge e with chart entry c: create a new edge e'; set start(e') to start(e); set end(e') to end(c); set rule(e') to rule(e) with "o" moved beyond c; set righthandside(e') to righthandside(e) + c.

51 Try it
S -> NP VP
VP -> V
NP -> Det N
Det -> the
N -> box
V -> floats

52 Semantic Interpretation
Our goal: translate sentences into a logical form. But sentences convey more than true/false: "It will rain in Seattle tomorrow." vs. "Will it rain in Seattle tomorrow?" A sentence can be analyzed by its propositional content and its speech act: tell, ask, request, deny, suggest.

53 Propositional Content
We develop a logic-like language for representing propositional content, handling word-sense ambiguity and scope ambiguity. Proper names --> objects (John, Alon). Nouns --> unary predicates (woman, house). Verbs --> transitive: binary predicates (find, go); intransitive: unary predicates (laugh, cry). Quantifiers: most, some. Example: "John loves Mary" --> Loves(John, Mary).

54 From Syntax to Semantics
ADD SLIDES ON SEMANTIC INTERPRETATION

55 Word Sense Disambiguation
ADD SLIDES!

56 Statistical NLP
Consider the problem of tagging parts of speech: "The box floats". "The" → Det; "box" → N; "floats" → V. Given a sentence w(1,n), where w(i) is the i-th word, we want to find the tags t(i) assigned to each word w(i).

57 The Equations
Find the t(1,n) that maximizes P[t(1,n)|w(1,n)] = P[w(1,n)|t(1,n)] * P[t(1,n)] / P[w(1,n)]. Since P[w(1,n)] is fixed, we only need to maximize P[w(1,n)|t(1,n)] * P[t(1,n)]. Assume that a word depends only on its own tag and a tag depends only on the previous tag: P[w(j)|w(1,j-1),t(1,j)] = P[w(j)|t(j)], and P[t(j)|w(1,j-1),t(1,j-1)] = P[t(j)|t(j-1)]. Thus we want to maximize the product P[w(1)|t(1)] * P[t(1)] * P[w(2)|t(2)] * P[t(2)|t(1)] * … * P[w(n)|t(n)] * P[t(n)|t(n-1)].

58 Example
"The box floats": given a corpus (a training set). Assignment one: t(1)=Det, t(2)=V, t(3)=V. P(V|Det) is rather low, and so is P(V|V); thus this is less likely than Assignment two: t(1)=Det, t(2)=N, t(3)=V. P(N|Det) is high, and P(V|N) is high, thus it is more likely! In general, we can use Hidden Markov Models to find these probabilities. (Figure: trellis of the tags Det, N, V over the words "the", "box", "floats".)
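Choosing the best assignment under this HMM is usually done with the Viterbi algorithm; a sketch with made-up probabilities chosen to mirror the example (none of the numbers come from a real corpus):

```python
def viterbi(words, tags, p_trans, p_emit, p_init):
    """Most likely tag sequence under the HMM of slide 57: each tag
    depends on the previous tag, each word on its own tag."""
    # best[t] = (probability, tag sequence) of the best path ending in tag t
    best = {t: (p_init.get(t, 0) * p_emit.get((words[0], t), 0), [t])
            for t in tags}
    for w in words[1:]:
        best = {t: max(
            ((p * p_trans.get((prev, t), 0) * p_emit.get((w, t), 0), seq + [t])
             for prev, (p, seq) in best.items()),
            key=lambda x: x[0])
            for t in tags}
    return max(best.values(), key=lambda x: x[0])[1]

tags = ["det", "N", "V"]
p_init = {"det": 1.0}
p_trans = {("det", "N"): 0.9, ("det", "V"): 0.1,
           ("N", "V"): 0.8, ("N", "N"): 0.2, ("V", "V"): 0.1}
p_emit = {("the", "det"): 1.0, ("box", "N"): 0.6, ("box", "V"): 0.4,
          ("floats", "V"): 0.7, ("floats", "N"): 0.3}
viterbi("the box floats".split(), tags, p_trans, p_emit, p_init)
# → ['det', 'N', 'V'], matching Assignment two
```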

59 Experiments
Charniak and colleagues did experiments on a collection of documents called the Brown Corpus, where tags were assigned by hand. 90% of the corpus is used for training and the other 10% for testing. They show they can get 95% accuracy with HMMs. A really simple algorithm, assigning to w the highest-probability tag P(t|w), already gets 91% accuracy!

60 Natural Language Summary
Parsing: context-free grammars with features. Semantic interpretation: translate sentences into a logic-like language; use additional domain knowledge for word-sense disambiguation; use context to disambiguate references.

