(Some issues in) Text Ranking

Recall: General Framework
Crawl
– Use XML structure
– Follow links to get new pages
Retrieve relevant documents
– Today
Rank
– PageRank, HITS
– Rank Aggregation

Relevant documents
Usually: relevant with respect to a keyword, a set of keywords, a logical expression, ...
Closely related to ranking
– "How relevant" a document is can be considered another measure
Usually done as a separate step
– Recall the online vs. offline issue
But some techniques are reusable

Defining Relevant Documents
Common strategy: treat text documents as a "bag of words" (BOW)
– Denote by BOW(D) the bag of words of a document D
– A bag rather than a set (i.e. multiplicity is kept)
– Words are typically stemmed (reduced to their root form)
– Loses structure, but simplifies life
Simple definition:
– A document D is relevant to a keyword W if W is in BOW(D)

Cont.
Simple variant
– The level of relevance of D to W is the multiplicity of W in BOW(D)
– Problem: bias towards long documents
– So divide by the document length |BOW(D)|
– This is called term frequency (TF)
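
A minimal sketch of these definitions in Python (the tokenization here is a simplification; a real system would also stem, e.g. with a Porter stemmer):

    from collections import Counter

    def bag_of_words(document: str) -> Counter:
        """Very rough BOW: lowercase and split on whitespace (no stemming)."""
        return Counter(document.lower().split())

    def tf(word: str, document: str) -> float:
        """Term frequency: multiplicity of the word divided by the document length."""
        bow = bag_of_words(document)
        return bow[word.lower()] / sum(bow.values()) if bow else 0.0

    print(tf("liverpool", "Liverpool beat Everton and Liverpool fans celebrated"))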

A different angle
Given a document D, what are the "most important" words in D?
Clearly, high term frequency should be considered
Rank terms according to TF?

Ranking according to TF
A          2022
Is         1023
He          350
...
Liverpool    25
Beatles      12

IDF
Observation: if W is rare in the document set, but appears many times in a document D, then W is "important" for D
IDF(W) = log(|Docs| / |Docs'|)
– Docs is the set of all documents in the corpus; Docs' is the subset of documents that contain W
TFIDF(D,W) = TF(W,D) * IDF(W)
– "Correlation" of D and W
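
Continuing the sketch above (the corpus is modelled as a plain Python list of strings, an assumption for illustration only):

    import math

    def idf(word: str, corpus: list[str]) -> float:
        """IDF(W) = log(|Docs| / |Docs'|); returns 0 if no document contains the word."""
        containing = sum(1 for doc in corpus if word.lower() in bag_of_words(doc))
        return math.log(len(corpus) / containing) if containing else 0.0

    def tfidf(document: str, word: str, corpus: list[str]) -> float:
        return tf(word, document) * idf(word, corpus)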

Inverted Index
For every term we keep a list of all the documents in which it appears
The list is sorted by TFIDF score
The scores are also kept
Given a keyword, it is then easy to return the top-k documents
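
A toy in-memory inverted index along these lines (the names and the structure are illustrative assumptions, not a production design; everything is recomputed from the sketches above):

    from collections import defaultdict

    def build_index(corpus: list[str]) -> dict:
        """Map each term to a postings list of (TFIDF score, document id), best first."""
        index = defaultdict(list)
        for doc_id, doc in enumerate(corpus):
            for word in bag_of_words(doc):
                index[word].append((tfidf(doc, word, corpus), doc_id))
        for postings in index.values():
            postings.sort(reverse=True)      # highest TFIDF first
        return index

    def top_k(index: dict, keyword: str, k: int) -> list[int]:
        """Ids of the k documents with the highest TFIDF for the keyword."""
        return [doc_id for _score, doc_id in index.get(keyword.lower(), [])[:k]]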

Ranking
Now assume that these documents are web pages
How do we return the most relevant?
How do we combine this with other rankings (e.g. PageRank)?
How do we answer Boolean queries?
– X1 AND (X2 OR X3)

Rank Aggregation
To combine TFIDF, PageRank, ...
To combine TFIDF with respect to different keywords

Part-of-Speech Tagging
So far we have considered documents only as bags of words
– Computationally efficient, easy to program, BUT
– We lose structure that may be very important
– E.g. perhaps we are (more) interested in documents in which W is often the sentence subject?
Part-of-speech tagging is useful for
– Ranking
– Machine translation
– Word-sense disambiguation
– ...

Part-of-Speech Tagging
Tag this word. This word is a tag.
He dogs like a flea
The can is in the fridge
The sailor dogs me every day

A Learning Problem
Training set: a tagged corpus
– Most famous is the Brown Corpus, with about 1M words
The goal is to learn a model from the training set, and then tag untagged text
Performance is tested on a test set

Simple Algorithm
Assign to each word its most popular tag in the training set
Problem: ignores context
– "dogs" and "tag" will always be tagged as nouns...
– "can" will always be tagged as a verb
Still, achieves around 80% correctness on real-life test sets
– Goes up to as high as 90% when combined with some simple rules
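
A minimal sketch of this baseline, trained here on NLTK's copy of the Brown Corpus (the choice of NLTK and the NN fallback for unknown words are illustrative assumptions; any tagged corpus would do):

    from collections import Counter, defaultdict
    import nltk

    nltk.download("brown", quiet=True)
    from nltk.corpus import brown

    # Count tag frequencies per word in the training set
    counts = defaultdict(Counter)
    for word, tag in brown.tagged_words():
        counts[word.lower()][tag] += 1

    # Most popular tag per word; unknown words fall back to NN (a common convention)
    most_popular = {w: c.most_common(1)[0][0] for w, c in counts.items()}

    def baseline_tag(sentence: list[str]) -> list[tuple[str, str]]:
        return [(w, most_popular.get(w.lower(), "NN")) for w in sentence]

    print(baseline_tag(["The", "sailor", "dogs", "me", "every", "day"]))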

Hidden Markov Model (HMM)
Model: sentences are generated by a probabilistic process
– In particular, a Markov chain whose states correspond to parts of speech
– Transitions are probabilistic
– In each state a word is emitted
– The emitted word is again chosen probabilistically, based on the state

HMM
An HMM consists of:
– A set of N states
– A set of M symbols (words)
– An N×N matrix of transition probabilities, Ptrans
– A vector of size N of initial state probabilities, Pstart
– An N×M matrix of emission probabilities, Pout
"Hidden" because we see only the outputs, not the sequence of states traversed
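
One convenient way to hold these parameters in Python, used in the sketches that follow (plain dictionaries keyed by state and word labels, an assumption made for readability; real implementations usually use matrices):

    from dataclasses import dataclass

    @dataclass
    class HMM:
        states: list        # the N hidden states (POS tags)
        pstart: dict        # pstart[s]: initial probability of state s
        ptrans: dict        # ptrans[(s, s2)]: probability of a transition from s to s2
        pout: dict          # pout[(s, w)]: probability of emitting word w in state s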

Example

3 Fundamental Problems
1) Compute the probability of a given observation sequence (= sentence)
2) Given an observation sequence, find the most likely hidden state sequence
– This is tagging
3) Given a training set, find the model that would make the observations most likely

Tagging
Find the most likely sequence of states that led to an observed output sequence
Problem: exponentially many possible sequences!

Viterbi Algorithm
Dynamic programming
V_t(k) is the probability of the most probable state sequence
– generating the first t+1 observations X_0, ..., X_t
– and terminating at state k
V_0(k) = Pstart(k) * Pout(k, X_0)
V_t(k) = Pout(k, X_t) * max_k' { V_{t-1}(k') * Ptrans(k', k) }

Finding the path
Note that we are interested in the most likely path, not only in its probability
So at each point we need to keep track of the argmax
– Combine them to form a sequence
What about top-k?
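
A sketch of the recurrence with backpointers, using the HMM container above (probabilities are multiplied directly here; real taggers work in log space to avoid underflow):

    def viterbi(hmm: HMM, observations: list) -> list:
        """Return the most likely state sequence for the observed words."""
        V = [{}]       # V[t][k]: best probability of a path ending in state k at time t
        back = [{}]    # back[t][k]: predecessor state achieving that best probability
        for k in hmm.states:
            V[0][k] = hmm.pstart.get(k, 0.0) * hmm.pout.get((k, observations[0]), 0.0)
            back[0][k] = None
        for t in range(1, len(observations)):
            V.append({})
            back.append({})
            for k in hmm.states:
                prev, score = max(
                    ((kp, V[t-1][kp] * hmm.ptrans.get((kp, k), 0.0)) for kp in hmm.states),
                    key=lambda pair: pair[1])
                V[t][k] = hmm.pout.get((k, observations[t]), 0.0) * score
                back[t][k] = prev
        # Reconstruct the path by following the backpointers from the best final state
        state = max(V[-1], key=V[-1].get)
        path = [state]
        for t in range(len(observations) - 1, 0, -1):
            state = back[t][state]
            path.append(state)
        return list(reversed(path))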

Complexity
O(T * |S|^2), where T is the sequence (= sentence) length and |S| is the number of states (= the number of possible tags)

Computing the probability of a sequence
Forward probabilities: α_t(k) is the probability of seeing the sequence X_0 ... X_t and terminating at state k
Backward probabilities: β_t(k) is the probability of seeing the sequence X_{t+1} ... X_n given that the Markov process is at state k at time t

Computing the probabilities
Forward algorithm
α_0(k) = Pstart(k) * Pout(k, X_0)
α_t(k) = Pout(k, X_t) * Σ_k' { α_{t-1}(k') * Ptrans(k', k) }
P(X_0, ..., X_n) = Σ_k α_n(k)
Backward algorithm
β_t(k) = P(X_{t+1} ... X_n | state at time t is k)
β_n(k) = 1 for all k
β_t(k) = Σ_k' { Ptrans(k, k') * Pout(k', X_{t+1}) * β_{t+1}(k') }
P(X_0, ..., X_n) = Σ_k Pstart(k) * Pout(k, X_0) * β_0(k)
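
A sketch of the forward pass with the same HMM container (the backward pass is symmetric; again, real code would use log probabilities or rescaling):

    def forward(hmm: HMM, observations: list) -> float:
        """Probability of the observation sequence, summed over all state paths."""
        alpha = {k: hmm.pstart.get(k, 0.0) * hmm.pout.get((k, observations[0]), 0.0)
                 for k in hmm.states}
        for x in observations[1:]:
            alpha = {k: hmm.pout.get((k, x), 0.0) *
                        sum(alpha[kp] * hmm.ptrans.get((kp, k), 0.0) for kp in hmm.states)
                     for k in hmm.states}
        return sum(alpha.values())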

Learning the HMM probabilities
Expectation-Maximization (EM) algorithm:
1. Start with initial probabilities
2. Compute E_ij, the expected number of transitions from i to j while generating a sequence, for each i, j (see next)
3. Set the probability of a transition from i to j to E_ij / (Σ_k E_ik)
4. Do similarly for the emission probabilities
5. Repeat 2-4 using the new model, until convergence

Estimating the expectancies
By sampling
– Re-run a random execution of the model 100 times
– Count the transitions
By analysis
– Use Bayes' rule on the formula for sequence probability
– This is called the forward-backward algorithm
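
For reference, the analytic estimate combines the forward and backward probabilities defined earlier; this is the standard forward-backward (Baum-Welch) formula, stated here in the slides' notation rather than taken from the slides:

    P(state_t = i, state_{t+1} = j | X) = α_t(i) * Ptrans(i, j) * Pout(j, X_{t+1}) * β_{t+1}(j) / P(X)
    E_ij = Σ_t P(state_t = i, state_{t+1} = j | X)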

Accuracy
Tested experimentally
Exceeds 96% on the Brown Corpus
– Trained on one half and tested on the other half
Compare with the 80-90% of the trivial algorithm
The hard cases are few, but they are very hard...

NLTK
Natural Language Toolkit
Open-source Python modules for NLP tasks
– Including stemming, POS tagging, and much more
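
For example, a minimal sketch of the built-in tagger and stemmer (the resource names passed to nltk.download vary slightly across NLTK versions):

    import nltk
    from nltk.stem import PorterStemmer

    nltk.download("punkt", quiet=True)                        # tokenizer models
    nltk.download("averaged_perceptron_tagger", quiet=True)   # default POS tagger

    tokens = nltk.word_tokenize("The sailor dogs me every day")
    print(nltk.pos_tag(tokens))             # list of (word, tag) pairs
    print(PorterStemmer().stem("flies"))    # stemmed form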