Tolerant Retrieval Review Questions


1 Tolerant Retrieval Review Questions

2 Storing a Rotated Lexicon
Suggest a structure that could be used to store a rotated lexicon.
Suppose that you had:
W words
an average word length of n
a longest word of length x
How much space would the rotated lexicon require?
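One possible structure, offered only as a sketch and not as the intended answer, is a sorted array of (rotation, word) pairs searched by binary search. The end marker `$` and the function names `rotations`, `build_rotated_lexicon` and `prefix_lookup` are assumptions made for illustration. Under this layout a word of length n contributes n + 1 rotations, so W words give roughly W(n + 1) entries; storing every rotation as a full string costs about (n + 1) characters per entry, while storing only a word pointer and an offset keeps the cost per entry constant.

```python
from bisect import bisect_left


def rotations(word, end="$"):
    """Return every rotation of word + end marker (assumed marker: '$')."""
    s = word + end
    return [s[i:] + s[:i] for i in range(len(s))]


def build_rotated_lexicon(words):
    """Sorted list of (rotation, original word) pairs."""
    return sorted((rot, w) for w in words for rot in rotations(w))


def prefix_lookup(entries, prefix):
    """Return the words that have some rotation starting with prefix."""
    # (prefix,) sorts before any (rotation, word) whose rotation >= prefix.
    i = bisect_left(entries, (prefix,))
    found = set()
    while i < len(entries) and entries[i][0].startswith(prefix):
        found.add(entries[i][1])
        i += 1
    return found


words = ["hello", "help", "yelp"]
lexicon = build_rotated_lexicon(words)

# One entry per rotation: sum of (len(word) + 1) over all words.
entry_count = len(lexicon)

# Wildcard query hel* is rotated to the prefix "$hel".
starts_with_hel = prefix_lookup(lexicon, "$hel")

# Wildcard query *p is rotated to the prefix "p$".
ends_with_p = prefix_lookup(lexicon, "p$")
```

A sorted array keeps the example short; a B-tree or trie over the same rotations would serve the same purpose.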

3 Storing a Digram Index
Suggest a structure that could be used to store a digram index.
Suppose that you had:
W words
an average word length of n
a longest word of length x
How much space would the digram index require?
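As a rough illustration, a digram index can be held as a dictionary from each digram to the set of words containing it. This is a sketch under stated assumptions: words are padded with a boundary marker `$`, and the names `bigrams`, `build_digram_index` and `candidates` are hypothetical. With padding, a word of length n yields at most n + 1 digrams, so the index holds at most about W(n + 1) postings, while the number of distinct keys is bounded by the square of the alphabet size.

```python
from collections import defaultdict


def bigrams(word, boundary="$"):
    """Set of digrams of the word, padded with an assumed boundary marker."""
    s = boundary + word + boundary
    return {s[i:i + 2] for i in range(len(s) - 1)}


def build_digram_index(words):
    """Map each digram to the set of words that contain it."""
    index = defaultdict(set)
    for w in words:
        for g in bigrams(w):
            index[g].add(w)
    return index


def candidates(index, query_bigrams):
    """Words containing every digram in query_bigrams."""
    sets = [index.get(g, set()) for g in query_bigrams]
    return set.intersection(*sets) if sets else set()


words = ["hello", "help", "yelp"]
index = build_digram_index(words)

# Total postings stored across all digrams.
posting_count = sum(len(ws) for ws in index.values())

# Query hel* decomposes into the digrams $h, he, el.
hel_star = candidates(index, ["$h", "he", "el"])
```

Matches from the intersection still need a final check against the original pattern, since sharing digrams does not guarantee that they occur in the required order.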

4 Questions
Suppose that:
there are 1000 documents
50 documents are relevant to the query
30 query results are returned, including 20 relevant documents
What is the precision? What is the recall?
How can perfect precision be achieved?
How can perfect recall be achieved?
Using these scores, how can search engine quality be automatically assessed?
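The two scores follow directly from their standard definitions: precision is the fraction of returned results that are relevant, and recall is the fraction of relevant documents that were returned. A minimal sketch, with the hypothetical helper name `precision_recall`, using exact fractions:

```python
from fractions import Fraction


def precision_recall(relevant_total, returned_total, relevant_returned):
    """Return (precision, recall) as exact fractions."""
    precision = Fraction(relevant_returned, returned_total)
    recall = Fraction(relevant_returned, relevant_total)
    return precision, recall


# Numbers from the slide: 50 relevant, 30 returned, 20 of those relevant.
precision, recall = precision_recall(50, 30, 20)
print(precision, recall)  # 2/3 2/5
```

The collection size of 1000 does not enter either formula.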

5 Edit Distance
What is the edit distance between “hello” and “yelp”, assuming a unit cost function?
What is the edit distance if the cost of insert is 1, the cost of delete is 1, and the cost of rename (substituting one character for another) is 3?
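Both variants can be checked with the standard dynamic-programming recurrence for weighted edit distance. The sketch below assumes that "rename" means replacing one character with another; the function name `edit_distance` and its parameter names are illustrative.

```python
def edit_distance(s, t, ins=1, dele=1, sub=1):
    """Weighted edit distance from s to t, keeping one DP row at a time."""
    # prev[j] = cost of turning the empty prefix of s into t[:j].
    prev = [j * ins for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [i * dele]  # turning s[:i] into the empty string
        for j, b in enumerate(t, 1):
            cur.append(min(
                prev[j] + dele,                        # delete a
                cur[j - 1] + ins,                      # insert b
                prev[j - 1] + (0 if a == b else sub),  # match or rename
            ))
        prev = cur
    return prev[-1]


unit_cost = edit_distance("hello", "yelp")
weighted_cost = edit_distance("hello", "yelp", ins=1, dele=1, sub=3)
print(unit_cost, weighted_cost)  # 3 5
```

With rename at cost 3, a delete followed by an insert (cost 2) is always cheaper, so the second distance reduces to counting insertions and deletions only.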

6 Jaccard Coefficient
Given sets S of size n and T of size m, with S∩T of size k, what is the Jaccard coefficient of S and T?
Compute the Jaccard coefficient of the bigrams of “believe” and “beleive”.
If the edit distance of words s and t is 1, what are the maximum and minimum values of the Jaccard coefficient of the bigrams of s and t?
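The Jaccard coefficient is |S∩T| / |S∪T|, which in the notation above is k / (n + m − k). The sketch below applies it to bigram sets, assuming bigrams are taken without boundary markers and treated as a set (duplicates collapse); a different bigram convention would change the numbers. The helper names are hypothetical.

```python
from fractions import Fraction


def bigram_set(word):
    """Set of adjacent character pairs, with no boundary padding (assumption)."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def jaccard(s, t):
    """Jaccard coefficient |S ∩ T| / |S ∪ T| as an exact fraction."""
    union = s | t
    if not union:
        return Fraction(1)  # convention chosen here for two empty sets
    return Fraction(len(s & t), len(union))


a = bigram_set("believe")   # be, el, li, ie, ev, ve
b = bigram_set("beleive")   # be, el, le, ei, iv, ve
coefficient = jaccard(a, b)
print(coefficient)  # 1/3
```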

7 Spelling Correction
Suppose the user typed the words “plane piot”.
“Piot” is a real word (Peter Piot was Under-Secretary-General of the United Nations).
Possible corrections (as determined by your dictionary) are “pivot” and “pilot”.
The probability of deleting a “v” immediately after an “i” is 0.02, and the probability of deleting an “l” immediately after an “i” is 0.01.
The probability of correctly typing a word is 0.9.
There are 1000 words in the corpus.
The word “piot” appears once, “pivot” appears twice, “pilot” appears 10 times and “plane” appears 20 times.
The phrase “plane pilot” appears 9 times; “plane pivot” and “plane piot” do not appear at all.
What is the best spelling correction when using an interpolation of bigram and unigram models, choosing λ = 0.5?
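One way to work through the numbers is a noisy-channel score: the probability of the typing event times an interpolated language-model probability, λ·P(w | previous word) + (1 − λ)·P(w). This is a sketch under assumptions the slide does not spell out: the 1000 corpus words are treated as the token count for unigram probabilities, the bigram probability is count("plane w") / count("plane"), and all variable and function names are illustrative.

```python
def interpolated_prob(word, prev, unigram_counts, bigram_counts,
                      total_words, lam=0.5):
    """lam * P(word | prev) + (1 - lam) * P(word), from raw counts."""
    p_unigram = unigram_counts.get(word, 0) / total_words
    prev_count = unigram_counts.get(prev, 0)
    pair_count = bigram_counts.get((prev, word), 0)
    p_bigram = pair_count / prev_count if prev_count else 0.0
    return lam * p_bigram + (1 - lam) * p_unigram


# Counts and error probabilities taken from the slide.
unigram_counts = {"piot": 1, "pivot": 2, "pilot": 10, "plane": 20}
bigram_counts = {("plane", "pilot"): 9}
total_words = 1000

# P(typed "piot" | intended word): correct typing, dropped v, dropped l.
channel = {"piot": 0.9, "pivot": 0.02, "pilot": 0.01}

scores = {
    cand: channel[cand] * interpolated_prob(
        cand, "plane", unigram_counts, bigram_counts, total_words, lam=0.5)
    for cand in channel
}
best = max(scores, key=scores.get)
print(best)  # pilot
```

Under these assumptions the scores are roughly 0.00045 for “piot”, 0.00002 for “pivot” and 0.0023 for “pilot”, so the bigram evidence for “plane pilot” outweighs the high probability of having typed “piot” correctly.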

