Basic Implementation and Evaluations. Aj. Khuanlux Mitsophonsiri, CS.426 INFORMATION RETRIEVAL.

2 Simple Tokenizing
- Analyze text into a sequence of discrete tokens (words).
- Sometimes punctuation, numbers (1999), and case (Republican vs. republican) can be a meaningful part of a token; however, frequently they are not.
- The simplest approach is to ignore all numbers and punctuation and use only case-insensitive, unbroken strings of alphabetic characters as tokens (sketched below).
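A minimal sketch of this simplest approach, in Python; the function name and regular expression are mine, not from the slides:

```python
import re

def simple_tokenize(text):
    """Lowercase the text and keep only unbroken runs of alphabetic characters."""
    # Numbers and punctuation are simply dropped, per the "simplest approach" above.
    return re.findall(r"[a-z]+", text.lower())

print(simple_tokenize("Republican wins in 1999, the e-mail says!"))
# -> ['republican', 'wins', 'in', 'the', 'e', 'mail', 'says']
```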

3 Tokenizing HTML
- Should text in HTML commands not typically seen by the user be included as tokens? For example, words appearing in URLs or in the “meta text” of images.
- The simplest approach is to exclude all HTML tag information (between “<” and “>”) from tokenization.
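A sketch of that exclusion step, assuming a single regular-expression pass is acceptable; names and the example page are illustrative:

```python
import re

def strip_html_tags(html):
    """Drop everything between '<' and '>' so tag text is never tokenized."""
    return re.sub(r"<[^>]*>", " ", html)

page = '<a href="http://example.com/report">Budget report</a> <img src="chart.png" alt="chart">'
tokens = re.findall(r"[a-z]+", strip_html_tags(page).lower())
print(tokens)  # the URL and alt text inside the tags are excluded
# -> ['budget', 'report']
```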

4 Stopwords
- It is typical to exclude high-frequency words (e.g., function words: “a”, “the”, “in”, “to”; pronouns: “I”, “he”, “she”, “it”).
- Stopwords are language dependent. VSR uses a standard set of about 500 for English.
- For efficiency, store strings for stopwords in a hashtable to recognize them in constant time (see the sketch below).
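A minimal sketch of the constant-time lookup idea, using Python's hash-based set in place of a hashtable; the word list here is a tiny stand-in for the ~500-word English list:

```python
# A hash-based set gives expected O(1) membership tests, playing the role of the hashtable.
STOPWORDS = {"a", "the", "in", "to", "i", "he", "she", "it"}  # tiny stand-in for a full list

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

print(remove_stopwords(["the", "budget", "report", "in", "it"]))
# -> ['budget', 'report']
```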

5 Stemming
- Reduce tokens to the “root” form of words to recognize morphological variation: “computer”, “computational”, and “computation” are all reduced to the same token, “compute”.
- Correct morphological analysis is language specific and can be complex.
- Stemming “blindly” strips off known affixes (prefixes and suffixes) in an iterative fashion.

6 Porter Stemmer
- A simple procedure for removing known affixes in English without using a dictionary.
- Can produce unusual stems that are not English words: “computer”, “computational”, and “computation” are all reduced to the same token, “comput”.
- May conflate (reduce to the same token) words that are actually distinct.
- Does not recognize all morphological derivations.

7 Porter Stemmer Errors
- Errors of “commission” (distinct words conflated to the same stem): organization, organ → organ; police, policy → polic; arm, army → arm.
- Errors of “omission” (related words not conflated): cylinder, cylindrical; create, creation; Europe, European.
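One way to reproduce the behaviour on the last two slides is an off-the-shelf Porter implementation; this sketch assumes the NLTK package is installed, and exact stems can differ slightly between Porter variants:

```python
from nltk.stem import PorterStemmer  # assumes the nltk package is installed

stemmer = PorterStemmer()
# Words taken from the two slides above; exact stems can differ between Porter variants.
for word in ["computer", "computational", "computation",
             "organization", "organ", "police", "policy",
             "arm", "army", "cylinder", "cylindrical",
             "create", "creation", "Europe", "European"]:
    print(word, "->", stemmer.stem(word))
```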

8 Evaluation

9 Why System Evaluation?
- There are many retrieval models, algorithms, and systems; which one is the best?
- What is the best component for: the ranking function (cosine, …), term selection (stopword removal, stemming, …), term weighting (TF, TF-IDF, …)?
- How far down the ranked list will a user need to look to find some/all relevant documents?

10 Difficulties in Evaluating IR Systems
- Effectiveness is related to the relevancy of retrieved items.
- Even if relevancy is binary, it can be a difficult judgment to make.
- Relevancy, from a human standpoint, is:
  - Subjective: depends upon a specific user's judgment.
  - Situational: relates to the user's current needs.
  - Cognitive: depends on human perception and behavior.
  - Dynamic: changes over time.

11 Human Labeled Corpora
- Start with a corpus of documents.
- Collect a set of queries for this corpus.
- Have one or more human experts exhaustively label the relevant documents for each query.
- Typically assumes binary relevance judgments.
- Requires considerable human effort for large document/query corpora.

12 Precision and Recall
- Precision: the ability to retrieve top-ranked documents that are mostly relevant.
- Recall: the ability of the search to find all of the relevant items in the corpus.

13 Precision and Recall
[Diagram: the entire document collection split by “retrieved vs. not retrieved” and “relevant vs. irrelevant”, giving four regions: retrieved & relevant, not retrieved but relevant, retrieved & irrelevant, and not retrieved & irrelevant.]

14 Determining Recall is Difficult
- The total number of relevant items is sometimes not available. Two workarounds:
  - Sample across the database and perform relevance judgments on the sampled items.
  - Apply different retrieval algorithms to the same database for the same query; the aggregate of relevant items found is taken as the total relevant set.
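A toy sketch of the second workaround (often called pooling): union the results of several systems for one query, have assessors judge that pool, and treat the relevant documents found in it as the total relevant set. All document IDs and judgments below are invented:

```python
# Top-ranked results from three hypothetical systems for the same query
runs = {
    "system_a": ["d1", "d4", "d7", "d9"],
    "system_b": ["d2", "d4", "d5", "d9"],
    "system_c": ["d1", "d3", "d8", "d9"],
}

pool = set().union(*runs.values())       # the documents sent to assessors for judgment
judged_relevant = {"d1", "d4", "d9"}     # hypothetical judgments over the pool
total_relevant = pool & judged_relevant  # used as the "total relevant set" when computing recall
print(sorted(total_relevant))
# -> ['d1', 'd4', 'd9']
```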

15 Trade-off between Recall and Precision
[Plot: precision versus recall. The ideal system sits at high recall and high precision; a high-precision, low-recall system returns relevant documents but misses many useful ones; a high-recall, low-precision system returns most relevant documents but includes lots of junk.]

16 Computing Recall/Precision Points
- For a given query, produce the ranked list of retrievals.
- Adjusting a threshold on this ranked list produces different sets of retrieved documents, and therefore different recall/precision measures.
- Mark each document in the ranked list that is relevant.
- Compute a recall/precision pair for each position in the ranked list that contains a relevant document (see the sketch below).
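A minimal sketch of that procedure: walk down the ranked list and emit a (recall, precision) pair at every position holding a relevant document. The ranking and judgments are invented for illustration:

```python
def recall_precision_points(ranking, relevant):
    """Return a (recall, precision) pair at each rank that holds a relevant document."""
    points, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points

ranking = ["d3", "d1", "d7", "d5", "d2"]  # hypothetical ranked retrieval for one query
relevant = {"d1", "d2", "d9"}             # hypothetical relevance judgments
print(recall_precision_points(ranking, relevant))
# -> [(0.33, 0.50), (0.67, 0.40)], approximately
```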

17 Common Representation

                 Relevant    Not Relevant
  Retrieved         A             B
  Not Retrieved     C             D

- Relevant = A+C; Retrieved = A+B; Collection size = A+B+C+D
- Precision = A/(A+B)
- Recall = A/(A+C)
- Miss = C/(A+C)
- False alarm = B/(B+D)
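The four cells translate directly into the measures above; a worked sketch with made-up counts:

```python
A, B, C, D = 20, 10, 5, 965  # made-up counts for the four cells

precision   = A / (A + B)    # 20 / 30  = 0.67 (approx.)
recall      = A / (A + C)    # 20 / 25  = 0.80
miss        = C / (A + C)    #  5 / 25  = 0.20
false_alarm = B / (B + D)    # 10 / 975 = 0.01 (approx.)
print(precision, recall, miss, false_alarm)
```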

18 Precision and Recall Example
[Table: two ranked result lists (Ranking 1 and Ranking 2) for the same query, with the relevant documents marked and a recall/precision pair computed at each relevant position.]

19 Average Precision of a Query
- We often want a single-number effectiveness measure, e.g. so that a machine learning algorithm can detect improvement.
- Average precision is widely used in IR.
- It is calculated by averaging the precision values at the points where recall increases (see the sketch below).
[Table: the recall/precision values of the two example rankings, with average precisions of 53.2% and 42.3%.]
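A sketch of that averaging rule: take the precision at each rank where a relevant document appears (i.e., where recall increases) and divide by the total number of relevant documents, so relevant documents never retrieved contribute zero. Data are invented:

```python
def average_precision(ranking, relevant):
    """Average the precision values at each rank where a relevant document appears."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    # Dividing by len(relevant) means relevant documents never retrieved contribute 0.
    return sum(precisions) / len(relevant) if relevant else 0.0

print(average_precision(["d3", "d1", "d7", "d5", "d2"], {"d1", "d2", "d9"}))
# -> (0.5 + 0.4 + 0) / 3 = 0.30
```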

20 Average Recall/Precision Curve
- Typically average performance over a large set of queries.
- Compute the average precision at each standard recall level across all queries.
- Plot average precision/recall curves to evaluate overall system performance on a document/query corpus (sketched below).
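Averaging across queries is commonly done at the 11 standard recall levels (0.0, 0.1, ..., 1.0) using interpolated precision, where the precision at a level is the maximum precision at any recall greater than or equal to it. A sketch under that convention (the interpolation rule is standard practice, not stated on the slide):

```python
def interpolated_precision_at_standard_levels(points):
    """points: (recall, precision) pairs for one query, e.g. from recall_precision_points()."""
    levels = [round(0.1 * i, 1) for i in range(11)]  # 0.0, 0.1, ..., 1.0
    interpolated = []
    for level in levels:
        # Interpolated precision: best precision at any recall >= this level (0 if none).
        candidates = [p for r, p in points if r >= level]
        interpolated.append(max(candidates) if candidates else 0.0)
    return levels, interpolated

# Averaging these 11 values level-by-level across all queries gives the curve on this slide.
print(interpolated_precision_at_standard_levels([(0.33, 0.5), (0.67, 0.4), (1.0, 0.25)]))
```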

21 Compare Two or More Systems
- The curve closest to the upper right-hand corner of the graph indicates the best performance.

22 Fallout Rate
Problems with both precision and recall:
- The number of irrelevant documents in the collection is not taken into account.
- Recall is undefined when there is no relevant document in the collection.
- Precision is undefined when no document is retrieved.
Fallout addresses the first problem: fallout = B/(B+D) (the false alarm rate from slide 17), the fraction of irrelevant documents that are retrieved.

23 Subjective Relevance Measure
- Novelty ratio: the proportion of items retrieved and judged relevant by the user of which they were previously unaware; measures the ability to find new information on a topic.
- Coverage ratio: the proportion of relevant items retrieved out of the total relevant documents known to the user prior to the search; appropriate when the user wants to locate documents they have seen before (e.g., the budget report for Year 2000).

24 Other Factors to Consider
- User effort: work required from the user in formulating queries, conducting the search, and screening the output.
- Response time: the time interval between receipt of a user query and the presentation of system responses.
- Form of presentation: the influence of the search output format on the user's ability to utilize the retrieved materials.
- Collection coverage: the extent to which any/all relevant items are included in the document corpus.