Efficient Result-set Merging Across Thousands of Hosts
Simulating an Internet-scale GIR application with the GOV2 Test Collection
Christopher Fallen
Arctic Region Supercomputing Center, University of Alaska Fairbanks

Overview
– Internet-scale GIR simulation for the TREC 2006 Terabyte Track
– Using the distribution of relevance judgments from the TREC Terabyte Tracks to limit result-set length
– Collection ranking

TREC 2006 Terabyte Track
– GOV2 test collection is the publicly available text crawled from http://*.gov in early 2004
  – 426 GB of text
  – 25 million documents
– Retrieve ranked result sets for 50 topics
  – Top results from each group are placed in the relevance pool
– 140,000 relevance judgments (qrels) for 149 topics (three years of TREC Terabyte Tracks)
  – {qrels} := {(topic, document, relevance)}
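The qrels triples above are the raw material for everything that follows. Below is a minimal loading sketch (not from the talk), assuming the standard four-column TREC qrels layout (topic, iteration, docno, relevance) and a hypothetical file name:

```python
# Minimal sketch: load TREC-format qrels into the {(topic, document, relevance)}
# set described above. The file name "qrels.terabyte.all" is hypothetical.
from collections import Counter

def load_qrels(path):
    """Return a set of (topic, docno, relevance) tuples from a TREC qrels file."""
    qrels = set()
    with open(path) as fh:
        for line in fh:
            topic, _iteration, docno, relevance = line.split()
            qrels.add((topic, docno, int(relevance)))
    return qrels

if __name__ == "__main__":
    qrels = load_qrels("qrels.terabyte.all")           # hypothetical file name
    by_grade = Counter(rel for _, _, rel in qrels)     # judgments per relevance grade
    print(len(qrels), "judgments;", dict(by_grade))
```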

A GIR simulation using GOV2
– Suppose each host provides an Index/Search (I/S) service
  – The query processor (QP) must merge ranked results from every responding host
– Partition GOV2 by hostname and create one index per host
  – 17,000 hostnames
  – {Mean, median} size = {1,500, 34} documents
  – Standard deviation = 18,000 documents
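A small bookkeeping sketch of the partition-by-host step, assuming a hypothetical tab-separated docno-to-URL mapping file for GOV2; it groups documents by hostname and reports the kind of size statistics quoted above:

```python
# Sketch only: group GOV2 documents by hostname and summarize host sizes.
# The mapping file "gov2_urls.tsv" (docno <TAB> url) is an assumption.
from collections import defaultdict
from statistics import mean, median, pstdev
from urllib.parse import urlparse

def host_sizes(mapping_path):
    """Return {hostname: number of documents} from a docno->URL mapping file."""
    sizes = defaultdict(int)
    with open(mapping_path) as fh:
        for line in fh:
            docno, url = line.rstrip("\n").split("\t")
            sizes[urlparse(url).hostname] += 1
    return sizes

if __name__ == "__main__":
    sizes = host_sizes("gov2_urls.tsv")                # hypothetical mapping file
    counts = list(sizes.values())
    print(len(counts), "hosts")
    print("mean", mean(counts), "median", median(counts), "stdev", pstdev(counts))
```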

Ranked distribution of GOV2 host sizes

Consequences of search-by-host for result-set merging
– Hundreds of ranked lists (one per responding host) per topic
  – Many short lists
  – Several long lists
  – Large bandwidth requirement for each query
– Merging algorithms must be robust to both long and short result-sets and to the large number of result-sets

“Logistic Regression” Merge
– Find the logistic curve parameters that best fit the relevance score-at-rank data
– Parameter estimation needs several results per result-set
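A hedged sketch of the fitting step: the exact parameterization of the logistic curve and the use of scipy.optimize.curve_fit are my assumptions, not necessarily what the original run used. It fits per-host parameters from (rank, score) pairs; the fitted curve then supplies comparable scores for merging:

```python
# Sketch, assuming a logistic curve in log(rank) and least-squares fitting.
import numpy as np
from scipy.optimize import curve_fit

def logistic(rank, a, b):
    """Relevance score modeled as a logistic function of log(rank)."""
    return 1.0 / (1.0 + np.exp(a + b * np.log(rank)))

def fit_host(ranks, scores):
    """Fit (a, b) to one host's score-at-rank data; needs several points."""
    params, _cov = curve_fit(logistic, np.asarray(ranks, float),
                             np.asarray(scores, float), p0=(0.0, 1.0))
    return params

if __name__ == "__main__":
    ranks = [1, 2, 3, 5, 8, 13, 21]                       # toy data
    scores = [0.92, 0.85, 0.70, 0.55, 0.38, 0.21, 0.10]
    a, b = fit_host(ranks, scores)
    print("fitted a=%.3f b=%.3f" % (a, b))
    # A merged score for any (host, rank) pair is then logistic(rank, a, b).
```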

Distribution of Relevance Judgments
– For most topics, there are many more non-relevant qrels than relevant qrels
– Across all topics, the number of non-relevant, relevant, and very relevant qrels is strongly correlated with host size
– The hosts that contain relevant qrels also contain non-relevant qrels
  – But the relevant documents are probably near the top of each host’s result-set!

(# relevant)/(# non-relevant) qrels

Skimming the top n documents from each host
– Is there a simple functional relationship between the number of likely relevant documents in a host (one that retrieves any documents at all) and the size of the host?
– A proportional model of relevance for each relevance score r is simple…
  – (# r qrels from host) = C_r(topic) × |host|

Skimming the top n documents from each host
– … and the constant of proportionality C_r(topic) can be measured from TREC Terabyte Track data, then averaged over topics to get a single topic-independent constant
– Does the model adequately describe the data?
  – A posteriori: Does the model describe TREC data?
  – A priori: Can the model be used to truncate result-sets based on host size without affecting IR performance?
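One way to measure C_r(topic) from the qrels, sketched under the assumption that a docno-to-hostname lookup and per-host sizes (as in the earlier sketches) are available; the function names and toy data are illustrative only:

```python
# Sketch: C_r(topic) measured as (# r qrels from host)/|host|, averaged per topic.
from collections import defaultdict
from statistics import mean, pstdev

def estimate_C(qrels, doc_host, host_size, grade=1):
    """Return {topic: mean C_r(topic)} for one relevance grade r."""
    per_topic_host = defaultdict(int)            # (topic, host) -> # grade-r qrels
    for topic, docno, rel in qrels:
        if rel == grade and docno in doc_host:
            per_topic_host[(topic, doc_host[docno])] += 1
    per_topic = defaultdict(list)
    for (topic, host), n in per_topic_host.items():
        per_topic[topic].append(n / host_size[host])
    return {topic: mean(vals) for topic, vals in per_topic.items()}

if __name__ == "__main__":
    # Toy inputs; in practice these come from the qrels and host-partition sketches.
    qrels = {("701", "GX001-01-000", 1), ("701", "GX002-07-000", 0)}
    doc_host = {"GX001-01-000": "www.nasa.gov", "GX002-07-000": "www.epa.gov"}
    host_size = {"www.nasa.gov": 2000, "www.epa.gov": 500}
    C_by_topic = estimate_C(qrels, doc_host, host_size, grade=1)
    print("mean C =", mean(C_by_topic.values()), "stdev =", pstdev(C_by_topic.values()))
```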

Proportional relevance model
– Two-way ANOVA applied to the host-by-topic table of values (# r qrels from host)/|host| = C_r
– For the relevant and very-relevant qrels only, most of the total variance lies between topics rather than within topics
  – C_rel varies little across hosts for a fixed topic
  – The standard deviation of C_rel across topics is larger than its mean, so the standard deviation can be used as a conservative estimate and the mean as an aggressive estimate
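A sketch of the two-way ANOVA on the host-by-topic table of C_r values, using statsmodels' formula API as an assumed stand-in; the slides do not say which software was used:

```python
# Sketch: two-way (host, topic) ANOVA on C_r values via an OLS model.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def anova_host_topic(rows):
    """rows: iterable of (host, topic, c_value). Returns the ANOVA table."""
    df = pd.DataFrame(rows, columns=["host", "topic", "c"])
    model = smf.ols("c ~ C(host) + C(topic)", data=df).fit()
    return sm.stats.anova_lm(model, typ=2)

if __name__ == "__main__":
    toy = [("hostA", "701", 0.0004), ("hostA", "702", 0.0010),   # toy table
           ("hostB", "701", 0.0005), ("hostB", "702", 0.0012)]
    print(anova_host_topic(toy))
```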

Proportional relevance model
– Select the top-performing topics from one run of the TREC 2006 Terabyte Track
– Truncate the result-set from each host using the aggressive estimate C_rel = 0.0005
– Merge the truncated result-sets and compare the IR performance of the merged list against the list merged from the full result-sets
  – No statistically significant difference in performance between the two merged lists was observed, even after discarding more than 30% of the results from the truncated set
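A sketch of the truncation experiment described above, with the cutoff C = 0.0005 taken from the aggressive estimate; the data layout and the paired t-test over per-topic average precision are assumptions standing in for whatever comparison the original run used:

```python
# Sketch: keep only the top ceil(C * |host|) results per host, then compare
# per-topic effectiveness of the truncated and full merges with a paired test.
import math
from scipy.stats import ttest_rel

def truncate(result_lists, host_size, C=0.0005):
    """result_lists: {host: ranked list of docnos}. Keep top ceil(C*|host|) per host."""
    return {host: docs[:max(1, math.ceil(C * host_size[host]))]
            for host, docs in result_lists.items()}

def compare(ap_full, ap_truncated):
    """Paired t-test over per-topic average precision values."""
    return ttest_rel(ap_full, ap_truncated)

if __name__ == "__main__":
    lists = {"www.nasa.gov": [f"n{i}" for i in range(10)],
             "www.epa.gov": [f"e{i}" for i in range(4)]}
    sizes = {"www.nasa.gov": 20000, "www.epa.gov": 600}
    print(truncate(lists, sizes))              # keeps ceil(0.0005 * |host|) docs each
    ap_full = [0.31, 0.22, 0.45, 0.18, 0.40]   # toy per-topic average precision
    ap_trunc = [0.30, 0.22, 0.44, 0.18, 0.39]
    t, p = compare(ap_full, ap_trunc)
    print("paired t-test: t=%.3f p=%.3f" % (t, p))
```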

Relevance-ranking hosts
– Assume that web documents are grouped non-arbitrarily by hostname according to content
– Then many orderings or rankings are possible on (host, document) pairs
  – Dictionary order
  – Round-robin
– Truncating the ranked list of hosts may lead to increased search efficiency with a negligible IR performance penalty
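For illustration, a sketch of the two (host, document) orderings named above, round-robin interleaving versus dictionary (hostname) order; the slides only name the orderings, so the implementation details are mine:

```python
# Sketch: two simple orderings over (host, document) pairs.
from itertools import chain, zip_longest

def round_robin(result_lists):
    """Interleave one document per host per pass, skipping exhausted hosts."""
    hosts = sorted(result_lists)                           # stable host order
    passes = zip_longest(*(result_lists[h] for h in hosts))
    return [doc for one_pass in passes for doc in one_pass if doc is not None]

def dictionary_order(result_lists):
    """All documents from the first hostname, then the next, and so on."""
    return list(chain.from_iterable(result_lists[h] for h in sorted(result_lists)))

if __name__ == "__main__":
    lists = {"www.epa.gov": ["e1", "e2"], "www.nasa.gov": ["n1", "n2", "n3"]}
    print(round_robin(lists))        # ['e1', 'n1', 'e2', 'n2', 'n3']
    print(dictionary_order(lists))   # ['e1', 'e2', 'n1', 'n2', 'n3']
```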

Retrieve from only 1 in 5 hosts?

Future work
– Collection ranking
  – Maximum document relevance score
  – Minimum query projection residual into a reduced collection term-document subspace
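As a sketch of the second idea, a query can be scored against a collection by the residual left after projecting it onto a rank-k subspace of that collection's term-document matrix (an LSI-style construction assumed here; the slide only names the idea):

```python
# Sketch: query projection residual against a rank-k term-document subspace.
import numpy as np

def projection_residual(term_doc, query_vec, k=2):
    """||q - U_k U_k^T q|| for the rank-k left singular subspace of term_doc."""
    U, _s, _Vt = np.linalg.svd(term_doc, full_matrices=False)
    Uk = U[:, :k]
    q = np.asarray(query_vec, float)
    return np.linalg.norm(q - Uk @ (Uk.T @ q))

if __name__ == "__main__":
    A = np.array([[2., 0., 1.],      # toy 4-term x 3-document matrix for one host
                  [1., 1., 0.],
                  [0., 3., 1.],
                  [0., 0., 2.]])
    q = np.array([1., 0., 1., 0.])   # toy query vector over the same 4 terms
    print("residual:", projection_residual(A, q, k=2))
    # Collections would be ranked by ascending residual (smaller = better match).
```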