Information Retrieval Review LBSC 796/INFM 718R
Structure of IR Systems
- IR process model
- System architecture
- Information needs: visceral, conscious, formalized, compromised
- Utility vs. relevance
- Known-item vs. ad hoc search
Supporting the Search Process
[Diagram: the IR process model. On the user side, Source Selection, Query Formulation, Search, Selection, Examination, and Delivery; on the system side, Document Acquisition and Indexing; linked through the Query, Ranked List, Document, Index, and Collection.]
Relevance
- Relevance relates a topic and a document
  - Duplicates are equally relevant, by definition
  - Constant over time and across users
- Pertinence relates a task and a document
  - Accounts for quality, complexity, language, ...
- Utility relates a user and a document
  - Accounts for prior knowledge
Taylor’s Model of Question Formation
- Q1: Visceral need
- Q2: Conscious need
- Q3: Formalized need
- Q4: Compromised need (the query)
- End-user search and intermediated search engage the need at different stages of this progression
Evidence from Content and Ranked Retrieval
- Inverted indexing: postings, postings file
- Bag of terms: segmentation, phrases, stemming, stopwords
- Boolean retrieval
- Vector space ranked retrieval: TF, IDF, length normalization, BM25 (see the sketch below)
- Blind relevance feedback
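A minimal sketch of the BM25 weight mentioned above, in its common Robertson-style form; the parameter values k1 = 1.2 and b = 0.75 are assumed typical defaults, not values given in the slides.

    import math

    def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, num_docs,
                   k1=1.2, b=0.75):
        """Score one document for a query with a standard BM25 formula.

        doc_tf: dict term -> term frequency in this document
        df:     dict term -> number of documents containing the term
        k1, b:  assumed tuning parameters (typical defaults)
        """
        score = 0.0
        for term in query_terms:
            tf = doc_tf.get(term, 0)
            if tf == 0 or term not in df:
                continue
            idf = math.log((num_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
            score += idf * norm
        return score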
An “Inverted Index”
A term index (keyed by leading letters: A, AL, B, BA, BR, ...) points into the postings file. Postings for the eight-document example (term → documents containing the term):

    aid      4, 8
    all      2, 4, 6
    back     1, 3, 7
    brown    1, 3, 5, 7
    come     2, 4, 6, 8
    dog      3, 5
    fox      3, 5, 7
    good     2, 4, 6, 8
    jump     3
    lazy     1, 3, 5, 7
    men      2, 4, 8
    now      2, 6, 8
    over     1, 3, 5, 7, 8
    party    6, 8
    quick    1, 3
    their    1, 5, 7
    time     2, 4, 6
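A minimal sketch of how such an index could be built, assuming documents arrive as already-tokenized bags of terms (the tiny collection in the example is hypothetical):

    from collections import defaultdict

    def build_inverted_index(docs):
        """docs: dict mapping doc_id -> list of (already stemmed/stopped) terms.
        Returns dict mapping term -> sorted list of doc_ids (the postings list)."""
        index = defaultdict(set)
        for doc_id, terms in docs.items():
            for term in terms:
                index[term].add(doc_id)
        return {term: sorted(ids) for term, ids in index.items()}

    # Example: two tiny documents
    docs = {1: ["quick", "brown", "fox"], 2: ["now", "time", "good", "men"]}
    print(build_inverted_index(docs))   # {'brown': [1], 'fox': [1], ...}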
A Partial Solution: TF*IDF
- High TF is evidence of meaning
- Low DF is evidence of term importance (equivalently, high “IDF”)
- Multiply them to get a “term weight”
- Add up the weights for each query term (see the sketch below)
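A minimal sketch of that scoring rule, using the log10(N/DF) form of IDF that the cosine-normalization example on the next slide uses; the toy collection passed in is assumed:

    import math

    def tf_idf_weights(docs):
        """docs: dict doc_id -> dict term -> raw term frequency.
        Returns dict doc_id -> dict term -> tf * idf weight."""
        n = len(docs)
        df = {}
        for terms in docs.values():
            for term in terms:
                df[term] = df.get(term, 0) + 1
        return {d: {t: tf * math.log10(n / df[t]) for t, tf in terms.items()}
                for d, terms in docs.items()}

    def score(query_terms, weights):
        """Rank documents by the sum of their weights for the query terms."""
        scores = {d: sum(w.get(t, 0.0) for t in query_terms)
                  for d, w in weights.items()}
        return sorted(scores, key=scores.get, reverse=True)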
Cosine Normalization Example

    Term          IDF     TF (D1 D2 D3 D4)   TF*IDF (D1 D2 D3 D4)     Normalized (D1 D2 D3 D4)
    complicated   0.301    -   -   5   2      -     -     1.51  0.60   -     -     0.57  0.69
    contaminated  0.125    4   1   3   -      0.50  0.13  0.38  -      0.29  0.13  0.14  -
    fallout       0.125    5   -   4   3      0.63  -     0.50  0.38   0.37  -     0.19  0.44
    information   0.000    6   3   3   2      0     0     0     0      0     0     0     0
    interesting   0.602    -   1   -   -      -     0.60  -     -      -     0.62  -     -
    nuclear       0.301    3   -   7   -      0.90  -     2.11  -      0.53  -     0.79  -
    retrieval     0.125    -   6   1   4      -     0.75  0.13  0.50   -     0.77  0.05  0.57
    siberia       0.602    2   -   -   -      1.20  -     -     -      0.71  -     -     -

Document lengths: D1 = 1.70, D2 = 0.97, D3 = 2.67, D4 = 0.87
Query: contaminated retrieval → ranking 2, 4, 1, 3 (compare to 2, 3, 1, 4 without normalization)
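A minimal sketch that reproduces the ranking above: each document's vector of tf*idf weights is divided by its Euclidean length before the query-term weights are summed. The raw term frequencies are taken from the table.

    import math

    # Raw term frequencies from the table: term -> [tf in D1, D2, D3, D4]
    tf = {
        "complicated":  [0, 0, 5, 2],
        "contaminated": [4, 1, 3, 0],
        "fallout":      [5, 0, 4, 3],
        "information":  [6, 3, 3, 2],
        "interesting":  [0, 1, 0, 0],
        "nuclear":      [3, 0, 7, 0],
        "retrieval":    [0, 6, 1, 4],
        "siberia":      [2, 0, 0, 0],
    }
    n_docs = 4

    idf = {t: math.log10(n_docs / sum(1 for x in row if x > 0))
           for t, row in tf.items()}
    weights = {t: [x * idf[t] for x in row] for t, row in tf.items()}
    lengths = [math.sqrt(sum(weights[t][d] ** 2 for t in weights))
               for d in range(n_docs)]

    query = ["contaminated", "retrieval"]
    scores = [sum(weights[t][d] / lengths[d] for t in query) for d in range(n_docs)]
    ranking = sorted(range(n_docs), key=lambda d: scores[d], reverse=True)
    print([d + 1 for d in ranking])   # [2, 4, 1, 3]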
Interaction
- Query formulation vs. query by example
- Summarization: indicative vs. informative
- Clustering
- Visualization: projection, starfield, contour maps
Evaluation
- Criteria: effectiveness, efficiency, usability
- Measures of effectiveness: recall, precision, F-measure, mean average precision
- User studies
Set-Based Effectiveness Measures (see the sketch below)
- Precision: how much of what was found is relevant?
  - Often of interest, particularly for interactive searching
- Recall: how much of what is relevant was found?
  - Particularly important for law, patents, and medicine
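A minimal sketch of these set-based measures, assuming the retrieved and relevant documents are given as Python sets of document IDs:

    def precision(retrieved, relevant):
        """Fraction of retrieved documents that are relevant."""
        return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

    def recall(retrieved, relevant):
        """Fraction of relevant documents that were retrieved."""
        return len(retrieved & relevant) / len(relevant) if relevant else 0.0

    def f_measure(retrieved, relevant, beta=1.0):
        """Weighted harmonic mean of precision and recall (F1 when beta = 1)."""
        p, r = precision(retrieved, relevant), recall(retrieved, relevant)
        if p == 0 and r == 0:
            return 0.0
        return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)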
Accuracy and Exhaustiveness
[Venn diagram over the space of all documents: the Relevant and Retrieved sets overlap in Relevant + Retrieved; everything outside both is Not Relevant + Not Retrieved.]
Mean Average Precision
- Average of precision at each retrieved relevant document
- Relevant documents not retrieved contribute zero to the score (see the sketch below)

    Precision at ranks 1-10:  1/1, 1/2, 1/3, 1/4, 2/5, 3/6, 3/7, 4/8, 4/9, 4/10
    Precision at ranks 11-20: 5/11, 5/12, 5/13, 5/14, 5/15, 6/16, 6/17, 6/18, 6/19, 6/20

Relevant documents were retrieved at ranks 1, 5, 6, 8, 11, and 16. Assume a total of 14 relevant documents: the 8 relevant documents not retrieved contribute eight zeros.
MAP = (1/1 + 2/5 + 3/6 + 4/8 + 5/11 + 6/16) / 14 = 0.2307
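A minimal sketch of (uninterpolated) average precision for one ranked list; MAP is then simply the mean of this value over a set of queries. The example reproduces the ranking above.

    def average_precision(ranked_doc_ids, relevant, total_relevant):
        """Uninterpolated average precision for one query.

        ranked_doc_ids: documents in ranked order
        relevant:       set of relevant document IDs
        total_relevant: total number of relevant documents (retrieved or not)
        """
        hits, total = 0, 0.0
        for rank, doc in enumerate(ranked_doc_ids, start=1):
            if doc in relevant:
                hits += 1
                total += hits / rank       # precision at this relevant document
        return total / total_relevant      # unretrieved relevant docs add zero

    # Example from the slide: relevant documents at ranks 1, 5, 6, 8, 11, 16 of 20,
    # with 14 relevant documents in total
    ranking = list(range(1, 21))
    relevant_ranks = {1, 5, 6, 8, 11, 16}
    print(round(average_precision(ranking, relevant_ranks, 14), 4))   # 0.2307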
Blair and Maron (1985)
- A classic study of retrieval effectiveness
  - Earlier studies used unrealistically small collections
- Studied an archive of documents for a lawsuit
  - 40,000 documents, ~350,000 pages of text
  - 40 different queries
  - Used IBM’s STAIRS full-text system
- Approach:
  - Lawyers wanted at least 75% of all relevant documents
  - Precision and recall evaluated only after the lawyers were satisfied with the results

David C. Blair and M. E. Maron (1985). An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System. Communications of the ACM, 28(3), 289-299.
Blair and Maron’s Results
- Mean precision: 79%
- Mean recall: 20% (!!)
- Why was recall so low?
  - Users can’t anticipate the terms used in relevant documents
    (“accident” might be referred to as “event”, “incident”, “situation”, “problem”, ...)
  - Differing technical terminology
  - Slang, misspellings
- Other findings:
  - Searches by the two lawyers had similar performance
  - The lawyers’ recall was not much different from the paralegals’
Web Search
- Crawling
- PageRank (see the sketch below)
- Anchor text
- The Deep Web (i.e., database-generated content)
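A minimal sketch of PageRank as power iteration over a tiny link graph; the damping factor of 0.85 and the toy graph are assumptions for illustration, not values from the slides.

    def pagerank(links, damping=0.85, iterations=50):
        """links: dict page -> list of pages it links to.
        Returns dict page -> PageRank score (scores sum to 1)."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1 - damping) / n for p in pages}
            for p, outlinks in links.items():
                if not outlinks:                     # dangling page: spread evenly
                    for q in pages:
                        new_rank[q] += damping * rank[p] / n
                else:
                    for q in outlinks:
                        new_rank[q] += damping * rank[p] / len(outlinks)
            rank = new_rank
        return rank

    # Toy graph: A and B link to each other, C links to A
    print(pagerank({"A": ["B"], "B": ["A"], "C": ["A"]}))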
Evidence from Behavior
- Implicit feedback
- Privacy risks
- Recommender systems
Evidence from Metadata
- Standards (e.g., Dublin Core)
- Controlled vocabulary
- Text classification
- Information extraction
Filtering
- Retrieval: information needs differ over a stable collection
- Filtering: the collection changes while the information needs stay stable
Multimedia IR
- Image retrieval: color histograms
- Video: motion detection (camera, object)
- Video: shot structure (boundary detection, classification)
- Video: OCR (closed captions, on-screen captions, scene text)