Tolerant Retrieval Review Questions

Storing a Rotated Lexicon
Suggest a structure that could be used to store a rotated lexicon.
Suppose that you have:
- W words
- an average word length of n
- a longest word of length x
How much space would the rotated lexicon require?
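To make the space question concrete, here is a minimal sketch of one possible structure: a sorted list of (rotation, word) pairs. The "$" end-of-word marker and the function names are assumptions for illustration, not something the slide specifies.

```python
def rotations(word, end="$"):
    """Return every rotation of the word with an end marker appended (assumed "$")."""
    marked = word + end
    return [marked[i:] + marked[:i] for i in range(len(marked))]


def rotated_lexicon(words):
    """Build a sorted list of (rotation, original word) pairs."""
    entries = [(rot, w) for w in words for rot in rotations(w)]
    entries.sort()  # sorted order allows binary search on a rotated prefix
    return entries


lexicon = rotated_lexicon(["hello", "help"])
print(len(lexicon))  # "hello$" has 6 rotations, "help$" has 5
```

Under these assumptions a word of length n contributes n + 1 entries, so W words give roughly W(n + 1) entries. Whether each entry holds a full string or a pointer into the original lexicon is left open, as in the question.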

Storing a Digram Index
Suggest a structure that could be used to store a digram index.
Suppose that you have:
- W words
- an average word length of n
- a longest word of length x
How much space would the digram index require?
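One possible structure, sketched here under stated assumptions, is a map from each digram to the set of words containing it. The sketch uses no word-boundary markers, and the function names are hypothetical.

```python
from collections import defaultdict


def digrams(word):
    """Return the set of adjacent character pairs in the word (no boundary markers)."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def build_digram_index(words):
    """Map each digram to the set of words that contain it."""
    index = defaultdict(set)
    for w in words:
        for g in digrams(w):
            index[g].add(w)
    return index


index = build_digram_index(["hello", "help"])
print(sorted(index["he"]))
```

With this layout a word of length n adds at most n - 1 postings, so the index holds at most about W(n - 1) postings in total, plus one key per distinct digram.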

Questions
Suppose that:
- there are 1000 documents
- 50 documents are relevant to the query
- 30 query results are returned, including 20 relevant documents
What is the precision? Recall?
How can perfect precision be achieved?
How can perfect recall be achieved?
Using these scores, how can search engine quality be automatically assessed?
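The two measures can be sketched with the standard definitions: precision is the fraction of returned results that are relevant, and recall is the fraction of relevant documents that are returned. The function and parameter names are illustrative.

```python
def precision_recall(returned, relevant_returned, relevant_total):
    """Compute (precision, recall) from result counts."""
    precision = relevant_returned / returned
    recall = relevant_returned / relevant_total
    return precision, recall


# Counts from the slide: 30 results returned, 20 of them relevant, 50 relevant overall.
p, r = precision_recall(returned=30, relevant_returned=20, relevant_total=50)
print(f"precision={p:.3f} recall={r:.3f}")
```

Note that the collection size (1000 documents) does not enter either formula.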

Edit Distance
What is the edit distance between "hello" and "yelp", assuming a unit cost function?
What is the edit distance if the cost of insert is 1, the cost of delete is 1, and the cost of rename is 3?
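Both questions can be checked with the usual dynamic-programming table for edit distance. This is a minimal sketch with configurable costs; the parameter names are assumptions, and "rename" is treated as substituting one character for another.

```python
def edit_distance(s, t, ins=1, dele=1, sub=1):
    """Weighted edit distance from s to t using a full DP table."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele  # delete every character of s[:i]
    for j in range(1, n + 1):
        d[0][j] = j * ins   # insert every character of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = s[i - 1] == t[j - 1]
            d[i][j] = min(
                d[i - 1][j] + dele,                       # delete s[i-1]
                d[i][j - 1] + ins,                        # insert t[j-1]
                d[i - 1][j - 1] + (0 if same else sub),   # match or rename
            )
    return d[m][n]


print(edit_distance("hello", "yelp"))         # unit costs
print(edit_distance("hello", "yelp", sub=3))  # rename costs 3
```

When a rename costs more than a delete plus an insert, the optimal script never renames, which is why the second answer depends only on the longest common subsequence.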

Jaccard Coefficient
Given sets S of size n and T of size m, with S∩T of size k, what is the Jaccard coefficient of S and T?
Compute the Jaccard coefficient of the bigrams of "believe" and "beleive".
If the edit distance of words s and t is 1, what are the maximum and minimum values of the Jaccard coefficient of the bigrams of s and t?
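The Jaccard coefficient is |S∩T| / |S∪T|, which for the sizes given is k / (n + m - k). A small sketch for the bigram case follows; it assumes bigrams are taken without word-boundary markers, and the function names are illustrative.

```python
def bigrams(word):
    """Return the set of adjacent character pairs in the word (no boundary markers)."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def jaccard(a, b):
    """Jaccard coefficient of two sets; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


s, t = bigrams("believe"), bigrams("beleive")
print(sorted(s & t), len(s | t))
print(jaccard(s, t))
```

Adding boundary markers (for example "$b" and "e$") would change the counts, so the result depends on that choice.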

Spelling Correction
Suppose the user typed the words "plane piot".
- "Piot" is a real word (Peter Piot was Under-Secretary-General of the United Nations)
- possible corrections (as determined by your dictionary) are "pivot" and "pilot"
- the probability of deleting a "v" immediately after an "i" is 0.02, and the probability of deleting an "l" immediately after an "i" is 0.01
- the probability of correctly typing a word is 0.9
- there are 1000 words in the corpus
- the word "piot" appears once, "pivot" appears twice, "pilot" appears 10 times and "plane" appears 20 times
- the phrase "plane pilot" appears 9 times; "plane pivot" and "plane piot" do not appear at all
What is the best spelling correction when using an interpolation of bigram and unigram models, choosing λ = 0.5?
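One way to set up the calculation is a noisy-channel score: the probability of the typing event multiplied by an interpolated language-model probability of the candidate following "plane". The sketch below is one reading of the question, not a definitive answer key. It assumes that "plane" is typed correctly, that "1000 words in the corpus" means 1000 tokens, and that λ weights the bigram estimate; all names are illustrative.

```python
LAMBDA = 0.5        # interpolation weight on the bigram model (assumed role of λ)
CORPUS_SIZE = 1000  # assumed to be the total number of word tokens

unigram_count = {"piot": 1, "pivot": 2, "pilot": 10, "plane": 20}
bigram_count = {("plane", "pilot"): 9, ("plane", "pivot"): 0, ("plane", "piot"): 0}

# P(typed "piot" | intended word), taken from the slide.
channel = {"piot": 0.9, "pivot": 0.02, "pilot": 0.01}


def interpolated(prev, word):
    """λ * P(word | prev) + (1 - λ) * P(word), using maximum-likelihood estimates."""
    p_bigram = bigram_count.get((prev, word), 0) / unigram_count[prev]
    p_unigram = unigram_count[word] / CORPUS_SIZE
    return LAMBDA * p_bigram + (1 - LAMBDA) * p_unigram


def score(prev, word):
    """Noisy-channel score: channel probability times language-model probability."""
    return channel[word] * interpolated(prev, word)


scores = {w: score("plane", w) for w in channel}
best = max(scores, key=scores.get)
for w, sc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{w}: {sc:.5f}")
print("best:", best)
```

Under these assumptions the bigram evidence for "plane pilot" outweighs the higher channel probability of leaving "piot" unchanged.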