
INFORMATION RETRIEVAL: MEASUREMENT OF RELEVANCE EFFECTIVENESS
Adrienn Skrop

Relevance
"a state or quality of being to the purpose; a state or quality of being related to the subject or matter at hand" [The Cambridge English Dictionary, Grandreams Limited, London, English Edition, 1990]

Effectiveness
The effectiveness of an information retrieval system means how well (or how badly) it performs.

Effectiveness categories
 Relevance
 Efficiency
 Utility
 User satisfaction
Within each category, there are different specific effectiveness measures.

Relevance measures
 Precision: the proportion of retrieved documents that are relevant.
 Recall: the proportion of relevant documents that are retrieved.
 Fallout: the proportion of nonrelevant documents that are retrieved.

Efficiency measures
 cost of search
 amount of search time

Utility measures
 worth of search results in some currency

User satisfaction measures
 user's satisfaction with precision
 intermediary's understanding of the request

Relevance effectiveness
Relevance effectiveness is the ability of a retrieval method or system to return relevant answers. The traditional relevance effectiveness measures are precision, recall and fallout.

Notations
Let
 D denote a collection of documents,
 q a query,
 Δ > 0 denote the total number of documents relevant to query q,
 λ > 0 denote the number of documents retrieved in response to query q,
 κ denote the number of retrieved and relevant documents.
It is reasonable to assume that the total number of documents to be searched, M, is greater than the number of relevant documents, i.e., |D| = M > Δ, so that nonrelevant documents exist and fallout is well defined.

Formal definitions
 Recall ρ is defined as ρ = κ / Δ.
 Precision π is defined as π = κ / λ.
 Fallout φ is defined as φ = (λ - κ) / (M - Δ).
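To make the definitions concrete, here is a minimal Python sketch (not part of the original slides) that computes the three measures for a single query from a set of retrieved document ids and a set of relevant document ids. The function name and the example numbers are chosen purely for illustration.

```python
def relevance_measures(retrieved, relevant, collection_size):
    """Compute recall, precision and fallout for one query.

    retrieved       -- set of document ids returned by the system (size lambda)
    relevant        -- set of document ids judged relevant (size Delta)
    collection_size -- total number of documents M in the collection
    """
    kappa = len(retrieved & relevant)      # retrieved AND relevant
    delta = len(relevant)                  # total relevant
    lam = len(retrieved)                   # total retrieved
    recall = kappa / delta if delta else 0.0
    precision = kappa / lam if lam else 0.0
    fallout = (lam - kappa) / (collection_size - delta) if collection_size > delta else 0.0
    return recall, precision, fallout

# Toy example: M = 100 documents, 4 relevant, 5 retrieved, 3 of them relevant.
print(relevance_measures({1, 2, 3, 8, 9}, {1, 2, 3, 4}, 100))
# -> (0.75, 0.6, 0.0208...)
```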

Visual representation of precision, recall and fallout

Properties
 0 ≤ ρ ≤ 1, 0 ≤ π ≤ 1, 0 ≤ φ ≤ 1,
 ρ = 0 ⟺ π = 0 (both are zero exactly when κ = 0),
 π = 1 ⟺ φ = 0 (every retrieved document is relevant),
 κ = Δ = λ ⟹ ρ = π = 1 and φ = 0 (perfect retrieval).

Precision-Recall Graph Method
The precision-recall graph method is used to measure retrieval effectiveness under laboratory conditions, i.e., in a controlled and repeatable manner. In this measurement method, test databases (test collections) are used. Each test collection is constructed by specialists and has a fixed structure.

Test collection
 The documents d are given.
 The queries q are given.
 The relevance list is given, i.e., it is known exactly which document is relevant to which query.

The methodology
1. For every query, retrieval should be performed (using the retrieval method whose relevance effectiveness is to be measured).
2. The hit list is compared with the relevance list corresponding to the query under focus.
The following recall levels are considered standard: 0.1; 0.2; 0.3; 0.4; 0.5; 0.6; 0.7; 0.8; 0.9; 1 (the levels can also be given as percentages, for example 0.1 = 10%).

The methodology cont'd
3. For every query, pairs of recall and precision are computed.
4. If a computed recall value is not a standard level, it is approximated.
5. The precision values corresponding to equal recall levels are averaged.

Approximation
When the computed recall value r is not equal to a standard level, the following interpolation method can be used to calculate the precision value p(r_j) corresponding to the standard recall level r_j:
p(r_j) = max { p(r) : r_{j-1} < r ≤ r_j },
where r_j, j = 2,...,10, denotes the j-th standard recall level.

Averaging
For all queries q_i, the precision values p_i(r_j) are averaged at each standard recall level as follows:
P(r_j) = (1/n) Σ_{i=1}^{n} p_i(r_j),
where n denotes the number of queries used.
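As an illustration of the interpolation and averaging steps, the following Python sketch (not from the slides) interpolates one query's recall-precision pairs to the standard levels and then averages over queries. The helper names and the toy recall-precision pairs are assumptions made for the example.

```python
# Standard recall levels 0.1, 0.2, ..., 1.0 as listed in the methodology.
STANDARD_LEVELS = [round(0.1 * j, 1) for j in range(1, 11)]

def interpolated_precision(recall_precision_pairs, levels=STANDARD_LEVELS):
    """recall_precision_pairs: (recall, precision) pairs for one query.
    Returns the precision at each standard level using the rule
    p(r_j) = max{ p(r) : r_{j-1} < r <= r_j } (0.0 if no pair falls in the interval)."""
    result = []
    prev = 0.0
    for r_j in levels:
        candidates = [p for r, p in recall_precision_pairs if prev < r <= r_j]
        result.append(max(candidates) if candidates else 0.0)
        prev = r_j
    return result

def average_precision_at_levels(per_query_pairs):
    """Average the interpolated precision values over all queries."""
    per_query = [interpolated_precision(pairs) for pairs in per_query_pairs]
    n = len(per_query)
    return [sum(values) / n for values in zip(*per_query)]

# Toy example with two queries:
q1 = [(0.25, 1.0), (0.5, 0.67), (0.75, 0.6), (1.0, 0.5)]
q2 = [(0.5, 1.0), (1.0, 0.4)]
print(average_precision_at_levels([q1, q2]))
```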

Typical precision-recall graph

Measurement of Search Engine Effectiveness
The measurement of the relevance effectiveness of a Web search engine is typically (due to the characteristics of the Web) user centred. It is an experimentally established fact that the majority of users examine, in general, only the first two pages of a hit list. Thus, a search engine should rank the most relevant pages within the first few result pages.

Problems
The traditional measures cannot always be computed on the Web (for example, recall and fallout, since the total number of relevant documents is not known).
⇓
Search engines require measures other than the traditional ones.

New measures
When elaborating such new measures, one tries, on the one hand, to reuse traditional measures (for example, precision, which can also be calculated for the hit list of a search engine) and, on the other hand, to take into account the specific characteristics of the Web.

M-L-S Method
The principles of the method are as follows:
 Definition of relevance categories: each hit of a hit list returned in response to a query is assigned to exactly one category.
 Definition of groups: the hit list is divided into groups s_i having weights c_i (i = 1,...,m).
 Weighting of hits: the value of the first n-precision is defined as the sum of the weights of the relevant hits divided by the maximum possible sum of weights.

M-L-S Method cont'd
The M-L-S method measures the capability of a search engine to rank relevant hits within the first 5 or 10 hits of the hit list.

M-L-S method (first 5/10-precision)
1. Select the search engine to be measured.
2. Define relevance categories.
3. Define groups.
4. Define weights.
5. Give queries q_i (i = 1,...,s).
6. Compute P5_i and/or P10_i for q_i (i = 1,...,s).
7. The first 5/10-precision of the search engine is the average
Pk = (1/s) Σ_{i=1}^{s} Pk_i, where k = 5 or k = 10.

Relevance categories
The relevance categories are as follows:
 0-category (irrelevant hit),
 1-category (relevant hit).

First 5 groups
When measuring first 5-precision, the first five hits are divided into two groups as follows:
 group 1: the first two hits (on the grounds that they appear on the first screen),
 group 2: the following three hits.

First 10 groups
When measuring first 10-precision, the first ten hits are grouped into the following three groups:
 group 1: the first two hits,
 group 2: the next three hits,
 group 3: the remaining five hits.
Groups 1 and 2 are based on the assumption that, in practice, the most important hits are the first five (usually shown on the first screen). Hits within the same group get equal weights. The weights reflect that the user is more satisfied if the relevant hits appear on the first screen.

Weights
 For first 5-precision, the weights are:
 group 1: 10,
 group 2: 5.
 For first 10-precision, the weights are:
 group 1: 20,
 group 2: 17,
 group 3: 10.
 Other, proportional values may also be used.

Definition of queries
This is a very important step.
 It is advisable to define a topic first, and the queries after that. The topic should be broad enough, as the goal is to see how well the search engine performs at a general level.
 In order to avoid bias, define both general and specialised queries.
 As most users prefer unstructured queries, such queries should be defined.
 It is very important that the weights be defined prior to obtaining any hits; otherwise our assessments would be more subjective or biased, because in that case we would already know how the search engine 'behaves' for certain queries.

P5 measure
P5 is the weighted sum of the relevant hits within the first five hits, divided by 35.
 In the denominator, 35 is the weighted sum in the best case (i.e., when the first five hits are all relevant): (2 × 10) + (3 × 5) = 35.
 For every missing hit out of the five, 5 is subtracted from this maximum.

P10 measure
The P10 measure is defined in a similar manner: the weighted sum of the relevant hits within the first ten hits, divided by the maximum weighted sum, which, with the weights above, is (2 × 20) + (3 × 17) + (5 × 10) = 141 when all ten hits are relevant.
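A possible implementation sketch of the first 5/10-precision described above, assuming binary (0/1) relevance judgments for the first k hits and the group weights given earlier. The function names, the handling of missing hits, and the example judgments are assumptions made for illustration.

```python
# Illustrative sketch (not from the slides) of the first 5/10-precision.
P5_WEIGHTS = [10, 10, 5, 5, 5]                        # group 1: first two, group 2: next three
P10_WEIGHTS = [20, 20, 17, 17, 17, 10, 10, 10, 10, 10]

def weighted_first_k_precision(relevance, weights):
    """relevance: list of 0/1 judgments for the first k returned hits
    (may be shorter than k if fewer hits were returned).
    Returns weighted sum of relevant hits / maximum achievable sum."""
    hits = relevance[:len(weights)]
    achieved = sum(w for w, rel in zip(weights, hits) if rel)
    # Assumption: for every missing hit the corresponding weight is dropped
    # from the maximum, mirroring "5 is subtracted for every missing hit".
    maximum = sum(weights[:len(hits)])
    return achieved / maximum if maximum else 0.0

def average_first_k_precision(per_query_relevance, weights):
    """Average the measure over several queries (step 7 of the method)."""
    scores = [weighted_first_k_precision(r, weights) for r in per_query_relevance]
    return sum(scores) / len(scores)

# Toy example: two queries, judgments for the first five hits of each.
queries = [[1, 1, 0, 1, 0], [0, 1, 1, 1, 1]]
print(average_first_k_precision(queries, P5_WEIGHTS))
```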

RP Method
The RP method is used to measure the relevance effectiveness of meta-search engines. A Web meta-search engine uses the hit lists of other search engines to produce its own hit list. Thus, taking into account the definition of precision, a method to compute a relative precision (referred to as the RP method) can be given.

RP method idea
If the hits of a meta-search engine are compared to the hits of the search engines used, then a relative precision can be defined for the meta-search engine.

Notations
Let
 q be a query,
 V be the number of hits returned by the meta-search engine under focus, and
 T be the number of those hits, out of these V, that were ranked within the first m hits of at least one of the search engines used.

Relative precision
The relative precision RP_{q,m} of the meta-search engine is calculated as
RP_{q,m} = T / V.
 The value of m can be, for example, m = 5 or m = 10.
 The value of relative precision should be computed for several queries, and an average should be taken.

Relative Precision of a Web meta-search engine
 Select the meta-search engine to be measured.
 Define queries q_i, i = 1,...,n.
 Define the value of m; typically m = 5 or m = 10.
 Perform searches for every q_i using the meta-search engine as well as the search engines it uses, i = 1,...,n.
 Compute the relative precision for q_i as RP_{q_i,m} = T_i / V_i.
 Compute the average: RP_m = (1/n) Σ_{i=1}^{n} RP_{q_i,m}.
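A minimal Python sketch (not from the slides) of the RP computation under the assumptions above: relative precision is taken as T / V, the function names are illustrative, and the toy result identifiers are made up.

```python
# For each query, V is the number of hits returned by the meta-search engine
# and T is the number of those hits that appear in the top-m list of at least
# one underlying search engine.

def relative_precision(meta_hits, engine_hit_lists, m=10):
    """meta_hits: list of result identifiers (e.g. URLs) from the meta-search engine.
    engine_hit_lists: list of ranked result lists, one per underlying engine."""
    top_m = set()
    for hits in engine_hit_lists:
        top_m.update(hits[:m])                  # union of the engines' first m hits
    V = len(meta_hits)
    T = sum(1 for h in meta_hits if h in top_m)
    return T / V if V else 0.0

def average_relative_precision(per_query_data, m=10):
    """per_query_data: list of (meta_hits, engine_hit_lists) pairs, one per query."""
    scores = [relative_precision(meta, engines, m) for meta, engines in per_query_data]
    return sum(scores) / len(scores)

# Toy example with one query and two underlying engines:
meta = ["u1", "u2", "u3", "u4"]
engines = [["u2", "u5", "u1"], ["u9", "u3"]]
print(average_relative_precision([(meta, engines)], m=5))   # -> 0.75
```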