Published by Eduardo Hatch
Modified over 4 years ago
© CvR SIGIR2002
Landmarks in Information Retrieval: the message out of the bottle. Keith van Rijsbergen, Tampere, 12th August 2002
Introductory Remarks: Exclusions (IE, TM, ...); Commercial successes and failures; Caveats; Why we have survived; Where we were, where we are, where we are going.
Pre-history: Smee (1850); Wells (1936); Bush (1945); Bagley (1951), MIT; Fairthorne (1945-52), RAE; Luhn (1958); Mooers (1952)
Experimental Methodology: Cleverdon (Cranfield); Lancaster (Medlars); Keen (Cranfield/Smart); Saracevic (CWRU); Salton (Smart); Sparck Jones (Ideal Test Collection); Blair & Maron (Stairs); Harman (TREC)
Evaluation: ABNO/OBNA (Fairthorne); Precision, Recall -> trade-off (Cleverdon); Probabilistic versions (Swets); Measure-theoretic (Bollman)
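The precision/recall trade-off named on this slide can be illustrated with a small sketch. It uses the standard set-based definitions; the function name, document ids, and the toy ranking are hypothetical and not taken from the talk.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one retrieved set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Convention assumed here: an empty set yields 0.0 rather than an error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked output and relevance judgments.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}

# Walking down the ranking, recall can only grow while precision tends to fall.
for k in range(1, len(ranking) + 1):
    p, r = precision_recall(ranking[:k], relevant)
    print(f"cutoff {k}: precision={p:.2f} recall={r:.2f}")
```

Retrieving more documents never lowers recall but usually dilutes precision, which is the trade-off attributed to Cleverdon above.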
‘the world in 1980 according to Belver Griffith’: Who is missing?
Landmarks: Luhn’s tf weighting; Architecture; Relevance Feedback; Stemming; Poisson Model -> BM25; Statistical weighting tf*idf; Various models
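As an illustration of the tf*idf landmark, here is a minimal sketch of one common textbook variant: raw term frequency multiplied by log(N/df). The slide does not say which variant is meant, so the formula choice, the function name, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = tf * log(N / df), with raw counts for tf.
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


# Hypothetical tokenised documents.
docs = [["ir", "ir", "model"], ["ir", "logic"]]
for w in tf_idf(docs):
    print(w)
```

A term that occurs in every document gets weight zero under this variant, however frequent it is, which is the intuition behind statistical weighting.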
Luhn’s curve
What about evaluation? (Diagram labels: Information Problem; Indexed Objects; Query; Fictive Objects; Representation; Compare)
Architecture (Brenda Gerrie, 1983)
Time I (highlights for me)
1952 Mooers coins IR
1958 International Conference on Scientific Information
1960 Cranfield I
1960 Maron and Kuhns paper
1961 Towards IR, RAF
1961 (-1965) Smart built
1964 Washington conference on Association Methods
1966 Cranfield II
1968 Salton’s first book
197- Cranfield conferences
1975 CvR’s book
1975 Ideal test collection
1976 KSJ/SER JASIS paper
Time II
1978 1st SIGIR
1979 1st BCSIRSG
1980 1st joint ACM/BCS conference on IR
1981 KSJ book on IR Experiments
1982 Belkin et al ASK hypothesis
1983 - Okapi started
1985 RIAO-1
1986 CvR logic model
1990 Deerwester et al, LSI paper
1991 CoLIS 1 (in Tampere!)
1991 - Inquiry started
1992 Ingwersen’s book
1992 TREC-1
1998 Croft Ponte paper on language models
Dimensions
Matching: Exact Match / Partial (best) Match
Inference: Deduction / Induction
Model: Deterministic / Probabilistic
Classification: Monothetic / Polythetic
Query Language: Artificial / Natural
Query Definition: Complete / Incomplete
Query Dependence: Yes / No
Items wanted: Matching / Relevant
Error response: Sensitive / Insensitive
Logic: Classical / Non-classical
Representation: a priori / a posteriori
Language Models: Logical / Statistical
Probabilistic Retrieval: Maron and Kuhns; Miller (following Goffman); SER/KSJ; Croft
Vector Space Model: Salton; Murray; Rocchio
Logical Model. For: Mooers/Fairthorne (1960+); Hillman (1965); Cooper/Maron (1970+); CvR (1986); Nie/Amati/Bruza/Huibers (1990+). Against: Bar-Hillel (1950+); Kasher (1966)
Buried Treasure: Dependence (e.g. C.T. Yu); Unified Probabilistic Model (Maron/Cooper/SER); Co-relevance (Ivie); Stochastic Processes (Mandelbrot/Herdan); Brouwerian Logics (Hillman); Error Analysis (Hughes/Cover/Duda)
Hypotheses/Principles: P & R trade-off (ABNO/OBNA); Exhaustivity/Specificity; Cluster Hypothesis; Association Hypothesis; Probability Ranking Principle; Logical Uncertainty Principle; ASK; Polyrepresentation
Items may be associated without apparent meaning, but exploiting their association may help retrieval.
Postulates of Impotence (according to Swanson, 1988)
An information need cannot be expressed independent of context.
It is impossible to instruct a machine to translate a request into adequate search terms.
A document’s relevance depends on other seen documents.
It is never possible to verify whether all relevant documents have been found.
Machines cannot recognise meaning -> can’t beat human indexing etc.
...more postulates
Word-occurrence statistics can neither represent meaning nor substitute for it.
The ability of an IR system to support an iterative process cannot be evaluated in terms of single-iteration human relevance judgment.
You can have either subtle relevance judgments or highly effective mechanised procedures, but not both.
Thus, consistently effective fully automatic indexing and retrieval is not possible.
? Conclusions
Matching
Co-ordination is positively correlated with external relevance. (Jackson, 1969: Association Hypothesis)
The larger the number of matching descriptive items, for a request and document, the more likely the document is to be relevant to the request. (Sparck Jones, 1971: Relevance Hypothesis)
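The idea that more matching descriptive items make relevance more likely can be sketched as simple co-ordination level matching: score each document by the number of terms it shares with the request. This is a minimal sketch; the function names and toy data are hypothetical, and ties are broken alphabetically only to keep the output deterministic.

```python
def coordination_level(query_terms, doc_terms):
    """Number of distinct terms shared by the request and the document."""
    return len(set(query_terms) & set(doc_terms))


def rank_by_coordination(query_terms, docs):
    """Rank document ids by decreasing co-ordination level.

    `docs` maps a document id to its list of index terms.
    """
    scored = [(coordination_level(query_terms, terms), doc_id)
              for doc_id, terms in docs.items()]
    # Highest score first; ties broken by document id (an arbitrary choice).
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical indexed collection and request.
docs = {"a": ["x", "y"], "b": ["x"], "c": ["z"]}
print(rank_by_coordination(["x", "y"], docs))
```

Unlike exact (Boolean) matching, this produces a ranking, so documents that match only part of the request are still retrieved, just lower down.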
Inference
It is a common fallacy, underwritten at this date by the investment of several million dollars in a variety of retrieval hardware, that the algebra of Boole (1847) is the appropriate formalism for retrieval design ... The ‘logic’ of Brouwer, as invoked by Fairthorne, is one such weakening of the postulate system, ... (Mooers, 1961)
Another one: Logical Uncertainty Principle (CvR, 1986)
Classification
Co-occurrence [of terms] as a basis for grouping makes for good swops, i.e. permits substitutions which retrieve relevant rather than irrelevant documents. (Sparck Jones, 1971: Classification Hypothesis)
If an index term is good at discriminating relevant from non-relevant documents, then any closely associated index term is also likely to be good at this. (CvR, 1979: Association Hypothesis)
Closely associated documents tend to be relevant to the same requests. (CvR, 1971: Cluster Hypothesis)
Models: Vector Space/LSI; Probabilistic; Logical
Query Language: Artificial/Natural; Multilingual/cross-lingual; images; none at all
Query Definition: Complete/Incomplete; Independence/Dependence; Weighted/Unweighted; Query Expansion/one shot (feedback, web); Sense disambiguation; Cross-lingual
Query Dependence: Relevance Feedback; Ostensive Retrieval; Context; Query Expansion
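Relevance feedback in the vector space tradition is often illustrated with Rocchio's query reformulation, which moves the query toward the centroid of judged-relevant documents and away from the centroid of judged non-relevant ones. The slide names no specific method, so the choice of Rocchio here, the function name, the sparse-dict representation, and the alpha/beta/gamma values are illustrative assumptions.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query as a {term: weight} dict.

    Vectors are sparse dicts; alpha, beta, gamma are illustrative defaults.
    """
    terms = set(query)
    for doc in relevant + non_relevant:
        terms |= set(doc)

    def centroid(docs, term):
        # Mean weight of `term` over a set of judged documents.
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    new_query = {}
    for t in terms:
        w = (alpha * query.get(t, 0.0)
             + beta * centroid(relevant, t)
             - gamma * centroid(non_relevant, t))
        if w > 0:  # negative weights are dropped, a common simplification
            new_query[t] = w
    return new_query


# Hypothetical query and judged documents.
q = {"ir": 1.0}
print(rocchio(q, [{"ir": 1.0, "model": 2.0}], [{"logic": 1.0}]))
```

Terms that appear only in relevant documents enter the new query, which is also one simple form of the query expansion listed above.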
Items wanted: Relevance; ASK (Anomalous State of Knowledge); Situated Relevance
Error response: Precision and Recall
Logic: standard/non-standard; probabilistic logic; information flow/logic
Representation: Discrimination/Representation; Specificity/Exhaustivity
Language Models: NLP; Montague Semantics; Stochastic