Natural Language Processing
1 Natural Language Processing
CSE 592 Applications of AI, Winter 2003. Topics: Information Retrieval, Speech Recognition, Syntactic Parsing, Semantic Interpretation.

2 Example Applications
Spelling and grammar checkers. Finding information on the WWW. Spoken language control systems: banking, shopping. Classification systems for messages, articles. Machine translation tools.

3 The Dream

4 Information Retrieval
(Thanks to Adam Carlson)

5 Motivation and Outline
Background: definitions. The problem: 100,000+ pages. The solution: ranking docs via the vector space model and probabilistic approaches. Extensions: relevance feedback, clustering, query expansion, etc.

6 What is Information Retrieval?
Given a large repository of documents, how do I get at the ones that I want? Examples: Lexis/Nexis, medical reports, AltaVista. Different from databases: unstructured (or semi-structured) data; information is (typically) text; requests are (typically) word-based.

7 Information Retrieval Task
Start with a set of documents. The user specifies an information need: keyword query, Boolean expression, or high-level description. The system returns a list of documents, ordered according to relevance. Known as the ad hoc retrieval problem.

8 Measuring Performance
Precision: the proportion of selected items that are correct, tp / (tp + fp). Recall: the proportion of target items that were selected, tp / (tp + fn). A precision-recall curve shows the tradeoff. (Figure: overlap of the docs the system returned with the actually relevant docs, with regions labeled tp, fp, fn, tn.)
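The two metrics are easy to compute directly; a minimal sketch in Python (the document ids and the relevant set are invented for illustration):

```python
def precision_recall(returned, relevant):
    """Precision and recall for a set of returned document ids."""
    returned, relevant = set(returned), set(relevant)
    tp = len(returned & relevant)                     # true positives
    precision = tp / len(returned) if returned else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

# System returned 4 docs; 3 of the 5 actually relevant docs are among them.
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 4, 5, 6})
# p = 3/4 = 0.75, r = 3/5 = 0.6
```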

9 Basic IR System
Use word overlap to determine relevance. Word overlap alone is inaccurate, so rank documents by similarity to the query, computed using the vector space model.

10 Vector Space Model
CSE 473 Artificial Intelligence, 2017/4/17
Represent documents as a matrix: words are rows, documents are columns, and cell (i, j) contains the number of times word i appears in document j. The similarity between two documents is the cosine of the angle between the vectors representing those documents. The vector space model, due to Gerard Salton, is probably the most commonly used representation for information retrieval.

11 Vector Space Example
a: System and human system engineering testing of EPS
b: A survey of user opinion of computer system response time
c: The EPS user interface management system
d: Human machine interface for ABC computer applications
e: Relation of user perceived response time to error measurement
f: The generation of random, binary, ordered trees
g: The intersection graph of paths in trees
h: Graph minors IV: Widths of trees and well-quasi-ordering
i: Graph minors: A survey

12 Vector Space Example cont.
(Figure: documents a, b, and c plotted as vectors in the space of the terms "system", "user", and "interface".)

13 Similarity in Vector Space
Measures word overlap; normalizes for different-length vectors. Other metrics exist.

14 Answering a Query Using Vector Space
Represent the query as a vector. Compute distances to all documents. Rank according to distance. Example: "computer system".
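The ranking procedure above can be sketched in a few lines of Python; the toy vocabulary and term counts are invented for illustration:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy term-document matrix over the vocabulary [system, user, computer]
docs = {
    "a": [2, 0, 0],
    "b": [1, 1, 1],
    "c": [1, 1, 0],
}
query = [1, 0, 1]      # "computer system" as a vector
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
# doc b ranks first: it mentions both query terms
```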

15 Common Improvements
The vector space model doesn't handle morphology (eat, eats, eating) and favors common terms. Possible fixes: stemming (convert each word to a common root form), stop lists, and term weighting.

16 Handling Common Terms
Stop list: a list of words to ignore ("a", "and", "but", "to", etc.). Term weighting: words which appear everywhere aren't very good discriminators, so give higher weight to rare words.

17 tf * idf

18 Inverse Document Frequency
IDF provides high values for rare words and low values for common words. For a collection of N documents, idf(w) = log(N / df(w)), where df(w) is the number of documents containing w.
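Putting the two previous slides together, a minimal sketch of tf*idf weighting, assuming the common formulation idf(w) = log(N / df(w)) (the toy documents are invented):

```python
import math
from collections import Counter

def tfidf(docs):
    """tf*idf weights per document, with idf(w) = log(N / df(w))."""
    N = len(docs)
    # df(w): number of documents containing word w
    df = Counter(w for doc in docs for w in set(doc))
    return [
        {w: tf * math.log(N / df[w]) for w, tf in Counter(doc).items()}
        for doc in docs
    ]

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ate"]]
weights = tfidf(docs)
# "the" appears in every doc, so its idf is log(3/3) = 0: no discriminating power.
```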

19 Probabilistic IR
The vector space model is robust in practice but mathematically ad hoc. How to generalize to more complex queries? (intel OR microsoft) AND (NOT stock). Alternative approach: model the problem as finding the documents with the highest probability of being relevant to the query. Requires making some simplifying assumptions about the underlying probability distributions. In certain cases it can be shown to yield the same results as the vector space model.

20 Probability Ranking Principle
For a given query Q, find the documents D that maximize the odds that the document is relevant (R):

21 Probability Ranking Principle
For a given query Q, find the documents D that maximize the odds that the document is relevant (R): Probability of document relevance to any query – i.e., the inherent quality of the document

22 Probability Ranking Principle
For a given query Q, find the documents D that maximize the odds that the document is relevant (R): Probability that if document is indeed relevant, then the query is in fact Q But where do we get that number?
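The formula image itself did not survive the transcript; judging from the annotations on these three slides, the intended expression is presumably the standard odds form of the principle via Bayes' rule:

```latex
O(R \mid Q, D)
  = \frac{P(R \mid Q, D)}{P(\neg R \mid Q, D)}
  = \frac{P(Q \mid R, D)\, P(R \mid D)}{P(Q \mid \neg R, D)\, P(\neg R \mid D)}
```

Here P(R | D) is the query-independent document quality described on slide 21, and P(Q | R, D) is the "if the document is indeed relevant, the query is in fact Q" term described on slide 22.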

23 Bayesian nets for text retrieval
(Figure: a two-part network. The document network links documents d1, d2 to words w1, w2, w3; the query network links concepts c1, c2, c3 through query operators q1, q2 (AND/OR/NOT) to the information need q0.)

24 Bayesian nets for text retrieval
Same network as the previous slide; the document network is computed once for the entire collection.

25 Bayesian nets for text retrieval
Same network as the previous slide; the query network is computed for each query.

26 Conditional Probability Tables
P(d) = prior probability that document d is relevant. Uniform model: P(d) = 1 / number of docs; in general, document quality P(r | d). P(w | d) = probability that a random word from document d is w (term frequency). P(c | w) = probability that a given document word w has the same meaning as a query word c (thesaurus). P(q | c1, c2, …) = canonical forms of the operators AND, OR, NOT, etc.

27 Example
(Figure: documents Hamlet and Macbeth in the document network link to the words "reason", "trouble", "double"; the query network combines "reason", "trouble", and "two" through OR, NOT, and AND into the user query.)

28 Details
Set the head q0 of the user query to "true". Compute the posterior probability P(D | q0). The "user information need" doesn't have to be a query; it can be a user profile, e.g., other documents the user has read. Instead of just words, can include phrases and inter-document links. Link matrices can be modified over time: user feedback, and the promise of "personalization".

29 Extensions
Meeting the demands of web-based systems: modified ranking functions for the web, relevance feedback, query expansion, document clustering, Latent Semantic Indexing, and other IR tasks.

30 IR on the Web
Query AltaVista with "Java": almost 10^7 pages found. Avoiding latency: the user wants (initial) results fast. Solution: rank documents using word overlap, with a special data structure, the inverted index.

31 Improved Ranking on the Web
Not just arbitrary documents: can use HTML tags and other properties. Query term in <TITLE></TITLE>. Query term in an <IMG>, <HREF>, etc. tag. Check the date of the document (prefer recent docs). PageRank (Google).

32 PageRank
Idea: good pages link to other good pages. Round 1: count in-links. Problems? Round 2: sum weighted in-links. Round 3: and again, and again… Implementation: repeated random walk on a snapshot of the web; weight ∝ frequency visited.
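The repeated random-walk idea can be approximated by power iteration over the link graph; this is a minimal sketch of the idea on the slide, not Google's actual implementation (the tiny graph is invented):

```python
def pagerank(links, d=0.85, iters=50):
    """Power iteration; links[p] = list of pages that p links to.
    d is the damping factor of the random walk."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}         # random jump mass
        for p, outs in links.items():
            if not outs:                              # dangling page: spread evenly
                for q in pages:
                    new[q] += d * rank[p] / n
            else:                                     # weighted in-links
                for q in outs:
                    new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

# "Good pages link to other good pages": both a and b link to hub
r = pagerank({"a": ["hub"], "b": ["hub"], "hub": ["a"]})
```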

33 Relevance Feedback
System returns an initial set of documents. User identifies relevant documents. System refines the query to get documents more like those identified by the user: add words common to relevant docs; reposition the query vector closer to relevant docs. Lather, rinse, repeat…
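The "reposition the query vector" step is classically done with the Rocchio update; a sketch with textbook default weights (alpha, beta, gamma, and the vectors are illustrative, not from the slides):

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant docs and away from
    non-relevant ones. Vectors are plain lists of term weights."""
    n_r, n_nr = max(len(relevant), 1), max(len(nonrelevant), 1)
    return [
        alpha * q
        + beta * sum(d[i] for d in relevant) / n_r
        - gamma * sum(d[i] for d in nonrelevant) / n_nr
        for i, q in enumerate(query)
    ]

q2 = rocchio([1.0, 0.0], relevant=[[1.0, 1.0]], nonrelevant=[[0.0, 1.0]])
# the query gains weight on term 2 from the relevant doc: approximately [1.75, 0.6]
```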

34 Query Expansion Given query, add words to improve recall Example
Workaround for synonym problem Example boat ® boat OR ship Can involve user feedback or not Can use thesaurus or other online source WordNet

35 Document Clustering Group similar documents
Similar means “close in vector space”. If a document is relevant, return the whole cluster. Can be combined with relevance feedback. Example system: GROUPER.

36 Clustering Algorithms
K-means: initialize k cluster centers; loop (assign all documents to the closest center; move cluster centers to better fit the assignment) until little movement. Hierarchical Agglomerative Clustering: initialize each document to a singleton cluster; loop (merge the two closest clusters) until k clusters exist. Many ways to measure the distance between clusters.
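The k-means loop above can be sketched directly; the data, the distance (squared Euclidean), and k are illustrative:

```python
import random

def kmeans(docs, k, iters=20, seed=0):
    """Minimal k-means on document vectors."""
    rng = random.Random(seed)
    centers = rng.sample(docs, k)                     # initialize k centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in docs:                                # assign to closest center
            j = min(range(k),
                    key=lambda c: sum((x - y) ** 2
                                      for x, y in zip(d, centers[c])))
            clusters[j].append(d)
        for j, cl in enumerate(clusters):             # move centers to fit
            if cl:
                centers[j] = [sum(xs) / len(cl) for xs in zip(*cl)]
    return clusters

clusters = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
# the two well-separated pairs end up in separate clusters
```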

37 Latent Semantic Indexing
Creates a modified vector space that captures transitive co-occurrence information: if docs A & B don't share any words with each other, but both share lots of words with doc C, then A & B will be considered similar. Simulates query expansion and document clustering (sort of).

38 Variations on a Theme
Text categorization: assign each document to a category; example: automatically put web pages in the Yahoo hierarchy. Routing & filtering: match documents with users; example: a news service that allows subscribers to specify “send news about high-tech mergers”.

39 Speech Recognition TO BE COMPLETED

40 Syntactic Parsing Semantic Interpretation
TO BE COMPLETED

41 NLP Research Areas Morphology: structure of words
Syntactic interpretation (parsing): create a parse tree of a sentence. Semantic interpretation: translate a sentence into the representation language. Pragmatic interpretation: take the current situation into account. Disambiguation: there may be several interpretations; choose the most probable.

42 Some Difficult Examples
From the newspapers: Squad helps dog bite victim. Helicopter powered by human flies. Levy won’t hurt the poor. Once-sagging cloth diaper industry saved by full dumps. Ambiguities: Lexical: meanings of ‘hot’, ‘back’. Syntactic: I heard the music in my room. Referential: The cat ate the mouse. It was ugly.

43 Parsing
Context-free grammars:
EXPR -> NUMBER
EXPR -> VARIABLE
EXPR -> (EXPR + EXPR)
EXPR -> (EXPR * EXPR)
(2 + X) * (17 + Y) is in the grammar. (2 + (X)) is not. Why do we call them context-free?

44 Using CFG’s for Parsing
Can natural language syntax be captured using a context-free grammar? Yes, no, sort of, for the most part, maybe. Words: nouns, adjectives, verbs, adverbs. Determiners: the, a, this, that. Quantifiers: all, some, none. Prepositions: in, onto, by, through. Connectives: and, or, but, while. Words combine into phrases: NP, VP.

45 An Example Grammar
S -> NP VP
VP -> V NP
NP -> NAME
NP -> ART N
ART -> a | the
V -> ate | saw
N -> cat | mouse
NAME -> Sue | Tom

46 Example Parse The mouse saw Sue.
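The example grammar on slide 45 is small enough to handle with a hand-rolled recursive-descent recognizer; a sketch (it only accepts or rejects a sentence, it does not build the parse tree shown on the slide):

```python
# Lexicon from slide 45's grammar
LEX = {"ART": {"a", "the"}, "V": {"ate", "saw"},
       "N": {"cat", "mouse"}, "NAME": {"Sue", "Tom"}}

def parse_np(words, i):
    """Try to match an NP starting at position i; return the next
    position on success, None on failure."""
    if i < len(words) and words[i] in LEX["NAME"]:
        return i + 1                                  # NP -> NAME
    if (i + 1 < len(words) and words[i] in LEX["ART"]
            and words[i + 1] in LEX["N"]):
        return i + 2                                  # NP -> ART N
    return None

def parse_s(words):
    j = parse_np(words, 0)                            # S -> NP VP
    if j is None or j >= len(words) or words[j] not in LEX["V"]:
        return False
    k = parse_np(words, j + 1)                        # VP -> V NP
    return k == len(words)

parse_s("the mouse saw Sue".split())   # True
parse_s("mouse the saw".split())       # False
```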

47 Ambiguity
“Sue bought the cat biscuits”
S -> NP VP
VP -> V NP
VP -> V NP NP
NP -> N
NP -> N N
NP -> Det NP
Det -> the
V -> ate | saw | bought
N -> cat | mouse | biscuits | Sue | Tom

48 Example: Chart Parsing
Three main data structures: a chart, a key list, and a set of edges. (Figure: the chart is a 4-by-4 table indexed by starting point and length; each cell names a terminal or non-terminal.)

49 Key List and Edges
Key list: a push-down stack of chart entries, e.g. “the”, “box”, “floats”. Edges: rules that can be applied to chart entries to build up larger entries. (Figure: chart holding “the”, “box”, “floats” as length-1 entries at positions 1, 2, 3, plus the edge det -> the o.)

50 Chart Parsing Algorithm
Loop while there are entries in the key list:
1. Remove an entry from the key list.
2. If the entry is already in the chart, add its edge list and break.
3. Add the entry from the key list to the chart.
4. For all rules that begin with the entry's type, add an edge for that rule.
5. For all edges that need the entry next, add an extended edge (see below).
6. If the edge is finished, add an entry to the key list with its type, start point, length, and edge list.
To extend an edge e with chart entry c: create a new edge e'; set start(e') to start(e); set end(e') to end(c); set rule(e') to rule(e) with "o" moved beyond c; set righthandside(e') to righthandside(e) + c.

51 Try it
S -> NP VP
VP -> V
NP -> Det N
Det -> the
N -> box
V -> floats

52 Semantic Interpretation
Our goal: translate sentences into a logical form. But sentences convey more than true/false: "It will rain in Seattle tomorrow." vs. "Will it rain in Seattle tomorrow?" A sentence can be analyzed by its propositional content and its speech act: tell, ask, request, deny, suggest.

53 Propositional Content
We develop a logic-like language for representing propositional content, handling word-sense ambiguity and scope ambiguity. Proper names --> objects (John, Alon). Nouns --> unary predicates (woman, house). Verbs --> transitive: binary predicates (find, go); intransitive: unary predicates (laugh, cry). Quantifiers: most, some. Example: "John loves Mary" --> Loves(John, Mary).

54 From Syntax to Semantics
ADD SLIDES ON SEMANTIC INTERPRETATION

55 Word Sense Disambiguation
ADD SLIDES!

56 Statistical NLP
Consider the problem of tagging parts of speech: "The box floats". "The" → Det; "box" → N; "floats" → V. Given a sentence w(1,n), where w(i) is the i-th word, we want to find the tags t(i) assigned to each word w(i).

57 The Equations
Find the t(1,n) that maximizes P[t(1,n)|w(1,n)] = P[w(1,n)|t(1,n)] * P[t(1,n)] / P[w(1,n)]. Since P[w(1,n)] is fixed, we only need to maximize P[w(1,n)|t(1,n)] * P[t(1,n)]. Assume that a word depends only on its own tag and a tag depends only on the previous tag: P[w(j)|w(1,j-1),t(1,j)] = P[w(j)|t(j)], and P[t(j)|w(1,j-1),t(1,j-1)] = P[t(j)|t(j-1)]. Thus we want to maximize the product P[w(1)|t(1)] * P[t(1)] * P[w(2)|t(2)] * P[t(2)|t(1)] * … * P[w(n)|t(n)] * P[t(n)|t(n-1)].

58 Example
"The box floats": given a corpus (a training set). Assignment one: t(1)=Det, t(2)=V, t(3)=V. P(V|Det) is rather low, and so is P(V|V); thus this is less likely than Assignment two: t(1)=Det, t(2)=N, t(3)=V. P(N|Det) is high, and P(V|N) is high, thus it is more likely! In general, we can use Hidden Markov Models to find these probabilities. (Figure: trellis of the tags Det, N, V over the words "the", "box", "floats".)
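Choosing the best assignment under this HMM is usually done with the Viterbi algorithm; a sketch with made-up probabilities chosen to mirror the example (none of the numbers come from a real corpus):

```python
def viterbi(words, tags, p_trans, p_emit, p_init):
    """Most likely tag sequence under the HMM of slide 57: each tag
    depends on the previous tag, each word on its own tag."""
    # best[t] = (probability, tag sequence) of the best path ending in tag t
    best = {t: (p_init.get(t, 0) * p_emit.get((words[0], t), 0), [t])
            for t in tags}
    for w in words[1:]:
        best = {t: max(
            ((p * p_trans.get((prev, t), 0) * p_emit.get((w, t), 0), seq + [t])
             for prev, (p, seq) in best.items()),
            key=lambda x: x[0])
            for t in tags}
    return max(best.values(), key=lambda x: x[0])[1]

tags = ["det", "N", "V"]
p_init = {"det": 1.0}
p_trans = {("det", "N"): 0.9, ("det", "V"): 0.1,
           ("N", "V"): 0.8, ("N", "N"): 0.2, ("V", "V"): 0.1}
p_emit = {("the", "det"): 1.0, ("box", "N"): 0.6, ("box", "V"): 0.4,
          ("floats", "V"): 0.7, ("floats", "N"): 0.3}
viterbi("the box floats".split(), tags, p_trans, p_emit, p_init)
# → ['det', 'N', 'V'], matching Assignment two
```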

59 Experiments
Charniak and colleagues did experiments on a collection of documents called the Brown Corpus, where tags were assigned by hand. 90% of the corpus is used for training and the other 10% for testing. They show they can get 95% accuracy with HMMs. A really simple algorithm, assigning to w the highest-probability tag P(t|w), already gets 91% accuracy!

60 Natural Language Summary
Parsing: context-free grammars with features. Semantic interpretation: translate sentences into a logic-like language; use additional domain knowledge for word-sense disambiguation; use context to disambiguate references.

