

1 Basic Implementation and Evaluations. Aj. Khuanlux Mitsophonsiri, CS.426 Information Retrieval

2 Simple Tokenizing. Analyze text into a sequence of discrete tokens (words). Sometimes punctuation (e-mail), numbers (1999), and case (Republican vs. republican) can be a meaningful part of a token; frequently, however, they are not. The simplest approach is to ignore all numbers and punctuation and use only case-insensitive, unbroken strings of alphabetic characters as tokens.
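A minimal sketch of this simplest approach (the function name and the sample sentence are illustrative only): lowercase the text and keep only unbroken runs of alphabetic characters.

```python
import re

def tokenize(text):
    """Return case-insensitive, unbroken strings of alphabetic characters,
    discarding all numbers and punctuation."""
    return re.findall(r"[a-z]+", text.lower())

print(tokenize("E-mail the 1999 Republican report."))
# -> ['e', 'mail', 'the', 'republican', 'report']
# Note how "e-mail" splits in two and "1999" disappears: exactly the kind
# of information this simplest approach throws away.
```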

3 Tokenizing HTML. Should text in HTML commands not typically seen by the user be included as tokens? Examples include words appearing in URLs and words appearing in the "meta text" of images. The simplest approach is to exclude all HTML tag information (everything between "<" and ">") from tokenization.
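A hedged sketch of that exclusion using a naive regular expression (real pages with scripts, comments, or malformed markup call for a proper HTML parser; the example markup is made up):

```python
import re

def strip_tags(html):
    # Drop everything between "<" and ">" before tokenizing.
    # Naive: does not handle attribute values containing ">", <script> bodies, etc.
    return re.sub(r"<[^>]*>", " ", html)

html = '<a href="http://example.com/report">Annual report</a> <img alt="chart">'
print(strip_tags(html))
# -> ' Annual report   '  (only the visible text survives)
# The URL and the image "meta text" disappear with the tags; including them
# would require pulling those attribute values out before stripping.
```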

4 Stopwords. It is typical to exclude high-frequency words (e.g. function words: "a", "the", "in", "to"; pronouns: "I", "he", "she", "it"). Stopwords are language dependent; VSR uses a standard set of about 500 for English. For efficiency, store the stopword strings in a hashtable so they can be recognized in constant time.
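A short sketch of constant-time stopword recognition; Python's built-in set is hash-based, and the tiny list here merely stands in for the roughly 500-word English list mentioned above.

```python
# Tiny stand-in for the ~500-word English stopword list.
STOPWORDS = {"a", "the", "in", "to", "i", "he", "she", "it"}

def remove_stopwords(tokens):
    # Membership in a set is a hash lookup, i.e. constant time per token.
    return [t for t in tokens if t not in STOPWORDS]

print(remove_stopwords(["the", "cat", "sat", "in", "a", "hat"]))
# -> ['cat', 'sat', 'hat']
```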

5 Stemming. Reduce tokens to the "root" form of words to recognize morphological variation: "computer", "computational", and "computation" are all reduced to the same token, "compute". Correct morphological analysis is language specific and can be complex. Stemming "blindly" strips off known affixes (prefixes and suffixes) in an iterative fashion.
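A deliberately crude sketch of "blind" iterative suffix stripping. The suffix list is invented for illustration and is far simpler than a real stemmer's rule set; note that the common stem it produces, "comput", is not an English word, which foreshadows the Porter stemmer discussion on the next slides.

```python
# Invented suffix list for illustration; a real stemmer applies ordered
# rules with conditions on what remains after each strip.
SUFFIXES = ["ational", "ation", "er", "ing", "s"]

def crude_stem(word):
    # Keep stripping any known suffix until none applies; a minimum-length
    # guard stops us from stripping a word down to almost nothing.
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word

for w in ["computer", "computational", "computation"]:
    print(w, "->", crude_stem(w))
# All three conflate to the stem "comput".
```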

6 Porter Stemmer. A simple procedure for removing known affixes in English without using a dictionary. It can produce unusual stems that are not English words ("computer", "computational", and "computation" all reduce to the same stem, "comput"), it may conflate (reduce to the same token) words that are actually distinct, and it does not recognize all morphological derivations.
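If the third-party NLTK package is available (an assumption; it is not part of the course's VSR code), its PorterStemmer shows the behaviour described above. Exact stems can vary slightly between Porter implementations.

```python
# Requires: pip install nltk
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
for word in ["computer", "computational", "computation", "organization", "organ"]:
    print(word, "->", stemmer.stem(word))
# The three "comput*" words reduce to one stem that is not itself an English
# word, and "organization" collapses onto the stem of "organ", one of the
# commission errors listed on the next slide.
```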

7 Porter Stemmer Errors. Errors of "commission" (distinct words conflated): organization, organ → organ; police, policy → polic; arm, army → arm. Errors of "omission" (related words not conflated): cylinder, cylindrical; create, creation; Europe, European.

8 Evaluation

9 Why System Evaluation? There are many retrieval models, algorithms, and systems; which one is the best? What is the best component for: the ranking function (cosine, …), term selection (stopword removal, stemming, …), term weighting (TF, TF-IDF, …)? How far down the ranked list will a user need to look to find some/all relevant documents?

10 Difficulties in Evaluating IR Systems. Effectiveness is related to the relevancy of retrieved items. Even if relevancy is binary, it can be a difficult judgment to make. Relevancy, from a human standpoint, is: subjective (depends upon a specific user's judgment), situational (relates to the user's current needs), cognitive (depends on human perception and behavior), and dynamic (changes over time).

11 Human Labeled Corpora. Start with a corpus of documents, collect a set of queries for this corpus, and have one or more human experts exhaustively label the relevant documents for each query. This typically assumes binary relevance judgments and requires considerable human effort for large document/query corpora.
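A minimal sketch of how such binary judgments are often stored once collected (the query and document ids are hypothetical): a mapping from query id to the set of documents the assessors marked relevant.

```python
# Hypothetical relevance judgments ("qrels"): query id -> relevant doc ids.
qrels = {
    "q1": {"d3", "d7", "d9"},
    "q2": {"d2"},
}

def is_relevant(query_id, doc_id):
    # Binary judgment: relevant iff the assessors listed the document.
    return doc_id in qrels.get(query_id, set())

print(is_relevant("q1", "d7"), is_relevant("q1", "d4"))  # True False
```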

12 Precision and Recall. Precision: the ability to retrieve top-ranked documents that are mostly relevant. Recall: the ability of the search to find all of the relevant items in the corpus.

13 Precision and Recall. [Diagram: the entire document collection divided into retrieved vs. not retrieved and relevant vs. irrelevant, giving four regions: retrieved and relevant, retrieved and irrelevant, not retrieved but relevant, and not retrieved and irrelevant.]

14 Determining Recall is Difficult. The total number of relevant items is sometimes not available. Workarounds: sample across the database and perform relevance judgments on the sampled items, or apply different retrieval algorithms to the same database for the same query and take the aggregate of the relevant items found as the total relevant set (pooling).

15 Trade-off between Recall and Precision. [Plot: precision (0 to 1) against recall (0 to 1). The ideal system sits in the upper right. A system that returns only clearly relevant documents but misses many useful ones has high precision and low recall; one that returns most relevant documents but includes lots of junk has high recall and low precision.]

16 Computing Recall/Precision Points. For a given query, produce the ranked list of retrievals. Adjusting a threshold on this ranked list produces different sets of retrieved documents, and therefore different recall/precision measures. Mark each document in the ranked list that is relevant, then compute a recall/precision pair for each position in the ranked list that contains a relevant document.
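A sketch of that procedure, assuming a ranked list of document ids and a set of relevant ids (both hypothetical; the list is arranged so the relevant documents fall at ranks 1, 3, 6, 9, and 10, matching Ranking 1 on slide 18):

```python
def recall_precision_points(ranked, relevant):
    """Walk down the ranked list and emit a (recall, precision) pair at
    every position that holds a relevant document."""
    points, hits = [], 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points

ranked = ["d1", "n1", "d2", "n2", "n3", "d3", "n4", "n5", "d4", "d5"]
print(recall_precision_points(ranked, {"d1", "d2", "d3", "d4", "d5"}))
# -> approximately [(0.2, 1.0), (0.4, 0.67), (0.6, 0.5), (0.8, 0.44), (1.0, 0.5)]
```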

17 Common Representation.

                  Relevant    Not Relevant
  Retrieved          A              B
  Not Retrieved      C              D

Relevant = A + C; Retrieved = A + B; Collection size = A + B + C + D.
Precision = A / (A + B). Recall = A / (A + C). Miss = C / (A + C). False alarm (fallout) = B / (B + D).
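The same measures as a direct transcription of the table, with hypothetical counts:

```python
# Hypothetical contingency counts for one query over a 1,000-document collection.
A, B, C, D = 20, 40, 30, 910   # ret&rel, ret&irrel, missed relevant, the rest

precision   = A / (A + B)      # 20/60  = 0.33...
recall      = A / (A + C)      # 20/50  = 0.40
miss        = C / (A + C)      # 30/50  = 0.60
false_alarm = B / (B + D)      # 40/950 = 0.042...
print(precision, recall, miss, false_alarm)
```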

18 Precision and Recall Example. [The slide shows two ranked lists of ten documents, with the relevant documents marked by arrows; five documents are relevant in total. The recall/precision values at each rank are:]
Ranking 1:
  Recall:    0.2  0.2  0.4  0.4  0.4  0.6  0.6  0.6  0.8  1.0
  Precision: 1.0  0.5  0.67 0.5  0.4  0.5  0.43 0.38 0.44 0.5
Ranking 2:
  Recall:    0.0  0.2  0.2  0.2  0.4  0.6  0.8  1.0  1.0  1.0
  Precision: 0.0  0.5  0.33 0.25 0.4  0.5  0.57 0.63 0.55 0.5

19 Average Precision of a Query. We often want a single-number effectiveness measure, e.g. for a machine learning algorithm to detect improvement. Average precision is widely used in IR; calculate it by averaging the precision values at the points where recall increases.
Ranking 1:
  Recall:    0.2  0.2  0.4  0.4  0.4  0.6  0.6  0.6  0.8  1.0
  Precision: 1.0  0.5  0.67 0.5  0.4  0.5  0.43 0.38 0.44 0.5
  Average precision: 53.2%
Ranking 2:
  Recall:    0.0  0.2  0.2  0.2  0.4  0.6  0.8  1.0  1.0  1.0
  Precision: 0.0  0.5  0.33 0.25 0.4  0.5  0.57 0.63 0.55 0.5
  Average precision: 42.3%
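One detail worth flagging: the 53.2% and 42.3% figures above equal the mean of the precision values at all ten ranks, whereas averaging only at the ranks where recall increases, as the text describes, gives roughly 62.2% and 52.0% for these two rankings. The sketch below (hypothetical document ids arranged to reproduce the two rankings) implements the latter, more common, definition.

```python
def average_precision(ranked, relevant):
    """Average the precision values at the ranks where a relevant document
    appears, i.e. at the points where recall increases."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

relevant = {"d1", "d2", "d3", "d4", "d5"}
ranking1 = ["d1", "n1", "d2", "n2", "n3", "d3", "n4", "n5", "d4", "d5"]
ranking2 = ["n1", "d1", "n2", "n3", "d2", "d3", "d4", "d5", "n4", "n5"]
print(average_precision(ranking1, relevant))   # about 0.62
print(average_precision(ranking2, relevant))   # about 0.52
```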

20 Average Recall/Precision Curve. Typically average performance over a large set of queries: compute the average precision at each standard recall level across all queries, and plot the averaged precision/recall curve to evaluate overall system performance on a document/query corpus.
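A sketch of one common way to do this (interpolated precision at the 11 standard recall levels 0.0, 0.1, ..., 1.0, averaged over queries); the helper reuses the recall_precision_points() output from the slide-16 sketch, and the function names are my own:

```python
def interpolated_precision(points, level):
    # Highest precision observed at any recall >= the given level (0 if none).
    return max((p for r, p in points if r >= level), default=0.0)

def average_curve(per_query_points):
    """per_query_points: one list of (recall, precision) points per query.
    Returns the average interpolated precision at each standard recall level."""
    levels = [i / 10 for i in range(11)]
    n = len(per_query_points)
    return [(lvl,
             sum(interpolated_precision(pts, lvl) for pts in per_query_points) / n)
            for lvl in levels]

# Usage sketch:
# curves = [recall_precision_points(run, rel) for run, rel in query_runs]
# for level, avg_prec in average_curve(curves):
#     print(level, avg_prec)
```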

21 Compare Two or More Systems. The curve closest to the upper right-hand corner of the graph indicates the best performance.

22 Fallout Rate. Problems with both precision and recall: the number of irrelevant documents in the collection is not taken into account; recall is undefined when there is no relevant document in the collection; precision is undefined when no document is retrieved.

23 Subjective Relevance Measures. Novelty ratio: the proportion of items retrieved and judged relevant by the user of which they were previously unaware; it measures the ability to find new information on a topic. Coverage ratio: the proportion of relevant items retrieved out of the total relevant documents known to the user prior to the search; it matters when the user wants to locate documents they have seen before (e.g., the budget report for Year 2000).
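Both ratios are simple proportions; a tiny worked sketch with hypothetical counts:

```python
# Hypothetical counts for one search session.
retrieved_relevant       = 12  # items retrieved and judged relevant by the user
new_to_user              = 8   # of those, items the user was previously unaware of
known_relevant_before    = 8   # relevant items the user already knew about before searching
known_relevant_retrieved = 4   # of those known items, how many the search returned

novelty_ratio  = new_to_user / retrieved_relevant                   # 8/12 ≈ 0.67
coverage_ratio = known_relevant_retrieved / known_relevant_before   # 4/8  = 0.50
print(novelty_ratio, coverage_ratio)
```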

24 Other Factors to Consider. User effort: the work required from the user in formulating queries, conducting the search, and screening the output. Response time: the time interval between receipt of a user query and the presentation of system responses. Form of presentation: the influence of the search output format on the user's ability to utilize the retrieved materials. Collection coverage: the extent to which any/all relevant items are included in the document corpus.

