Published by Eduardo Hatch
Modified over 2 years ago
© CvR SIGIR2002
Landmarks in Information Retrieval: the message out of the bottle. Keith van Rijsbergen, Tampere, 12th August 2002
Introductory Remarks: exclusions (IE, TM, ..); commercial successes and failures; caveats; why we have survived; where we were, where we are, where we are going.
Pre-history: Smee (1850); Wells (1936); Bush (1945); Bagley (1951), MIT; Fairthorne ( ), RAE; Luhn (1958); Mooers (1952)
Experimental Methodology: Cleverdon (Cranfield); Lancaster (Medlars); Keen (Cranfield/Smart); Saracevic (CWRU); Salton (Smart); Sparck Jones (Ideal Test Collection); Blair & Maron (Stairs); Harman (TREC)
Evaluation: ABNO/OBNA (Fairthorne); precision, recall -> trade-off (Cleverdon); probabilistic versions (Swets); measure-theoretic (Bollmann)
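The precision/recall trade-off named on the Evaluation slide can be illustrated with a minimal sketch. The function name `precision_recall`, the document identifiers, and the toy ranking are assumptions made for illustration; none of them come from the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved list against a set of relevant items."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: fraction of retrieved items that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant items that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical ranked output and relevance judgments.
ranking = ["d1", "d2", "d3", "d4", "d5", "d6"]
relevant = {"d1", "d3", "d6"}

# Reading further down the ranking raises recall while precision falls.
for cutoff in (1, 3, 6):
    p, r = precision_recall(ranking[:cutoff], relevant)
    print(cutoff, round(p, 2), round(r, 2))
```

In this toy case, retrieving more documents finds more of the relevant ones but also admits more non-relevant ones, which is the trade-off the slide attributes to Cleverdon.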
‘the world in 1980 according to Belver Griffith’ Who is missing?
Landmarks: Luhn’s tf weighting; architecture; relevance feedback; stemming; Poisson model -> BM25; statistical weighting, tf*idf; various models
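The slide only names tf*idf; as a rough illustration, here is a minimal sketch of one common textbook variant (raw term frequency times the natural log of N/df). The exact formula, the function name `tf_idf`, and the toy documents are assumptions, not something the talk specifies.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Map each doc id to {term: tf * log(N / df)} for tokenised documents.

    documents: dict of doc id -> list of tokens (assumed already tokenised).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))
    weights = {}
    for doc_id, tokens in documents.items():
        term_freq = Counter(tokens)
        weights[doc_id] = {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
    return weights


# Hypothetical two-document collection.
docs = {
    "a": ["retrieval", "retrieval", "index"],
    "b": ["index", "query"],
}
for doc_id, term_weights in tf_idf(docs).items():
    print(doc_id, term_weights)
```

Under this variant a term that occurs in every document ("index" here) gets weight zero, while a term that is frequent in one document and rare in the collection is weighted up.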
Luhn’s curve
What about evaluation? (diagram labels: Information Problem, Indexed Objects, Query, Fictive Objects, Representation, Compare)
Architecture (Brenda Gerrie, 1983)
Time I (highlights for me)
  1952 Mooers coins IR
  1958 International Conference on Scientific Information
  1960 Cranfield I
  1960 Maron and Kuhns paper
  1961 Towards IR, RAF
  1961 (-1965) Smart built
  1964 Washington conference on Association Methods
  1966 Cranfield II
  1968 Salton’s first book
  197- Cranfield conferences
  1975 CvR’s book
  1975 Ideal test collection
  1976 KSJ/SER JASIS paper
Time II
  1st SIGIR
  1st BCSIRSG
  1st joint ACM/BCS conference on IR
  1981 KSJ book on IR Experiments
  1982 Belkin et al, ASK hypothesis
  Okapi started
  1985 RIAO
  CvR logic model
  1990 Deerwester et al, LSI paper
  1991 CoLIS 1 (in Tampere!)
  1991 – Inquery started
  1992 Ingwersen’s book
  1992 TREC
  Croft Ponte paper on language models
Dimensions
  Matching: Exact Match / Partial (best) Match
  Inference: Deduction / Induction
  Model: Deterministic / Probabilistic
  Classification: Monothetic / Polythetic
  Query Language: Artificial / Natural
  Query Definition: Complete / Incomplete
  Query Dependence: Yes / No
  Items wanted: Matching / Relevant
  Error response: Sensitive / Insensitive
  Logic: Classical / Non-classical
  Representation: a priori / a posteriori
  Language Models: Logical / Statistical
Probabilistic Retrieval: Maron and Kuhns; Miller (following Goffman); SER/KSJ; Croft
Vector Space Model: Salton; Murray; Rocchio
Logical Model
  For: Mooers/Fairthorne 1960+; Hillman 1965; Cooper/Maron 1970+; CvR 1986; Nie/Amati/Bruza/Huibers 1990+
  Against: Bar-Hillel 1950+; Kasher 1966
Buried Treasure: dependence (e.g. C.T. Yu); unified probabilistic model (Maron/Cooper/SER); co-relevance (Ivie); stochastic processes (Mandelbrot/Herdan); Brouwerian logics (Hillman); error analysis (Hughes/Cover/Duda)
Hypotheses/Principles
  P & R trade-off – ABNO/OBNA
  Exhaustivity/Specificity
  Cluster Hypothesis
  Association Hypothesis
  Probability Ranking Principle
  Logical Uncertainty Principle
  ASK
  Polyrepresentation
  Items may be associated without apparent meaning, but exploiting their association may help retrieval.
Postulates of Impotence (according to Swanson, 1988)
  An information need cannot be expressed independent of context.
  It is impossible to instruct a machine to translate a request into adequate search terms.
  A document’s relevance depends on other seen documents.
  It is never possible to verify whether all relevant documents have been found.
  Machines cannot recognise meaning -> can’t beat human indexing etc.
….more postulates
  Word-occurrence statistics can neither represent meaning nor substitute for it.
  The ability of an IR system to support an iterative process cannot be evaluated in terms of single-iteration human relevance judgment.
  You can have either subtle relevance judgments or highly effective mechanised procedures, but not both.
  Thus, consistently effective fully automatic indexing and retrieval is not possible.
? Conclusions
Matching
  Co-ordination is positively correlated with external relevance. (Jackson, 1969 – Association Hypothesis)
  The larger the number of matching descriptive items, for a request and document, the more likely the document is to be relevant to the request. (Sparck Jones, Relevance Hypothesis)
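The idea that more matching descriptive items make a document more likely to be relevant can be sketched as co-ordination level matching: score each document by the number of request terms it shares, then rank by that count. This is a minimal sketch; the function names, the tie-breaking rule, and the toy index are assumptions for illustration and do not come from the slides.

```python
def coordination_level(request_terms, document_terms):
    """Count the distinct descriptive items shared by a request and a document."""
    return len(set(request_terms) & set(document_terms))


def rank_by_coordination(request_terms, documents):
    """Rank doc ids by descending co-ordination level (ties broken by doc id)."""
    scored = [
        (coordination_level(request_terms, terms), doc_id)
        for doc_id, terms in documents.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored]


# Hypothetical index: doc id -> descriptive items.
index = {
    "d1": ["boolean", "retrieval", "logic"],
    "d2": ["probabilistic", "retrieval"],
    "d3": ["cluster", "analysis"],
}
request = ["probabilistic", "retrieval", "model"]
print(rank_by_coordination(request, index))
```

Unlike exact Boolean matching, this gives a partial (best) match ordering: documents that share only some of the request terms are still returned, just lower in the ranking.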
Inference
  It is a common fallacy, underwritten at this date by the investment of several million dollars in a variety of retrieval hardware, that the algebra of Boole (1847) is the appropriate formalism for retrieval design….. The ‘logic’ of Brouwer, as invoked by Fairthorne, is one such weakening of the postulate system,…… (Mooers, 1961)
  Another one: Logical Uncertainty Principle (CvR, 1986)
Classification
  Co-occurrence [of terms] as a basis for grouping makes for good swops, i.e. permits substitutions which retrieve relevant rather than irrelevant documents. (Sparck Jones – Classification Hypothesis)
  If an index term is good at discriminating relevant from non-relevant documents, then any closely associated index term is also likely to be good at this. (CvR, 1979 – Association Hypothesis)
  Closely associated documents tend to be relevant to the same requests. (CvR – Cluster Hypothesis)
Models: Vector Space/LSI; Probabilistic; Logical
Query Language: artificial/natural; multilingual/cross-lingual; images; none at all
Query Definition: complete/incomplete; independence/dependence; weighted/unweighted; query expansion/one shot (feedback, web); sense disambiguation; cross-lingual
Query Dependence: relevance feedback; ostensive retrieval; context; query expansion
Items wanted: relevance; ASK (Anomalous State of Knowledge); situated relevance
Error response: precision and recall
Logic: standard/non-standard; probabilistic logic; information flow/logic
Representation: discrimination/representation; specificity/exhaustivity
Language Models: NLP; Montague semantics; stochastic