Chapter 2: Information Retrieval
DSCI 5240
Instructor: Dr. Nick Evangelopoulos
Presented by: Qiuxia Wu



Introduction
Definition: Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources.

History of Modern IR
- For over 4000 years, humans have been designing tools to improve information storage and retrieval.
- Vannevar Bush's 1945 paper: "As We May Think"
- The first automated information retrieval systems appeared in the 1950s and 1960s.
- SMART (the System for the Mechanical Analysis and Retrieval of Text)
  - conceived at Harvard University and flourished at Cornell University
  - developed under the leadership of Gerard Salton
  - the first practical implementation of an IR system
- The basic theoretical foundations of SMART still play a major role in today's IR systems.

Modern Information Retrieval
- Document representation
  - Using keywords
  - Relative weight of keywords
- Query representation
  - Keywords
  - Relative importance of keywords

Retrieval Models
Retrieval models match a query against documents in order to:
- separate documents into relevant and non-relevant classes
- rank the documents according to their relevance

Retrieval Models
- Boolean model
- Vector space model
- Probabilistic models

Boolean Retrieval Model
- One of the simplest and most efficient retrieval mechanisms
- Based on set theory and Boolean algebra
- Uses the conventional numeric representation of false as 0 and true as 1
- The Boolean model is interested only in the presence or absence of a term in a document
- To build it, replace every nonzero value in the term-document matrix with 1
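
The binarization step and the set-theoretic reading of Boolean queries can be sketched as follows. This is a minimal illustration, not part of the original slides: the term counts, the three terms, and the helper names `binarize` and `docs_with` are all hypothetical.

```python
# Hypothetical term-frequency matrix: rows are terms, columns are documents D0..D3.
term_counts = {
    "retrieval": [3, 0, 1, 0],
    "boolean":   [0, 2, 1, 0],
    "vector":    [1, 0, 0, 4],
}

def binarize(counts):
    """Replace every nonzero count with 1 (term present) and keep 0 (term absent)."""
    return {term: [1 if c > 0 else 0 for c in row] for term, row in counts.items()}

def docs_with(incidence, term):
    """Return the set of document indices whose incidence entry for `term` is 1."""
    return {doc for doc, present in enumerate(incidence[term]) if present}

incidence = binarize(term_counts)

# Boolean operators map onto set operations.
both = docs_with(incidence, "retrieval") & docs_with(incidence, "boolean")    # AND
either = docs_with(incidence, "retrieval") | docs_with(incidence, "boolean")  # OR
all_docs = set(range(4))
not_vector = all_docs - docs_with(incidence, "vector")                        # NOT
```

Note that the result of each query is an unordered set: a document is either in it or not, with no score attached.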

Boolean Model: Advantages
- Simplicity and efficiency of implementation
- Binary values can be stored using bits
  - reduced storage requirements
  - retrieval using bitwise operations is efficient
- Boolean retrieval was adopted by many commercial bibliographic systems
- Boolean queries are akin to database queries
- Bibliographic systems therefore behave more like database systems than information retrieval systems
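
To illustrate the point about bit storage and bitwise retrieval, here is a small sketch that packs one incidence row per term into a single Python integer. The rows, the collection size, and the helper names `to_bitmask` and `matching_docs` are assumptions made for the example.

```python
def to_bitmask(row):
    """Pack a 0/1 incidence row into an int; bit i is set when document i contains the term."""
    mask = 0
    for doc, present in enumerate(row):
        if present:
            mask |= 1 << doc
    return mask

def matching_docs(mask, n_docs):
    """Unpack a bitmask back into a sorted list of document indices."""
    return [doc for doc in range(n_docs) if (mask >> doc) & 1]

N_DOCS = 6
# Hypothetical incidence rows for two terms over six documents.
k_a = to_bitmask([1, 0, 1, 1, 0, 0])
k_b = to_bitmask([0, 0, 1, 0, 1, 0])

and_hits = matching_docs(k_a & k_b, N_DOCS)
or_hits = matching_docs(k_a | k_b, N_DOCS)
# NOT must be masked to the collection size, since Python ints are unbounded.
not_a_hits = matching_docs(~k_a & ((1 << N_DOCS) - 1), N_DOCS)
```

Each query operator costs one machine-level bitwise operation per word of the bitmask, which is where the efficiency claim comes from.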

Boolean Model: Disadvantages
- A document is either relevant or nonrelevant to the query
- It is not possible to assign a degree of relevance
- Complicated Boolean queries are difficult for users to formulate
- Boolean queries tend to retrieve either too few or too many documents. For example:
  - K0 AND K4 retrieved only 1 out of 6 documents
  - K0 OR K4 retrieved 5 out of a possible 6 documents

Vector Space Model
- Both the documents and the queries are represented as vectors
- Each term receives a weight based on its frequency in the document
- More sophisticated weighting schemes will be studied later
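
A minimal sketch of the idea, assuming raw term counts as weights and cosine similarity as the matching function. The slides do not name a similarity measure, so cosine is an assumption here (it is a common choice), and the vocabulary, texts, and function names are hypothetical.

```python
import math
from collections import Counter

def tf_vector(text, vocabulary):
    """Represent `text` as a list of raw term frequencies, one entry per vocabulary term."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

vocabulary = ["information", "retrieval", "boolean", "vector"]
doc = tf_vector("vector retrieval ranks information by vector similarity", vocabulary)
query = tf_vector("vector retrieval", vocabulary)
score = cosine_similarity(doc, query)
```

The document and the query go through the same `tf_vector` function, so both end up in the same vector space and can be compared directly.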

VSM versus Boolean Model
- Queries are easier to express: users can attach relative weights to terms
- A descriptive query can be transformed into a query vector, in the same way as a document
- Matching between a query and a document is not exact: each document is assigned a degree of similarity
- Documents are ranked by their similarity scores instead of being split into relevant/nonrelevant classes
- Users can go through the ranked list until their information needs are met
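
The contrast with Boolean retrieval can be sketched with a user-weighted query and a ranked result list. The document vectors, the query weights, and the choice of cosine similarity are assumptions for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def rank(doc_vectors, query_vector):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(vec, query_vector)) for doc_id, vec in doc_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical term-frequency vectors over the terms (retrieval, boolean, vector).
doc_vectors = {
    "D1": [2, 0, 1],
    "D2": [0, 3, 0],
    "D3": [1, 1, 1],
}
# The user marks "vector" as twice as important as "retrieval" and ignores "boolean".
query_vector = [1, 0, 2]

ranking = rank(doc_vectors, query_vector)
```

Every document receives a score, including D2, which shares no term with the query and lands at the bottom with a score of 0 rather than being excluded outright.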

Probabilistic Retrieval Model
- Sparck Jones (1976): the classical probabilistic retrieval model, also known as the binary independence retrieval model
- Formulates IR in a probabilistic framework
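
The slides do not give the scoring formula, so the following is only a sketch under stated assumptions: it uses the Robertson/Sparck Jones relevance weight with the usual 0.5 smoothing, and scores a document by summing the weights of the query terms it contains. All counts and names below are hypothetical.

```python
import math

def rsj_weight(N, n, R, r):
    """Robertson/Sparck Jones term weight with 0.5 smoothing.

    N: documents in the collection      n: documents containing the term
    R: known relevant documents         r: relevant documents containing the term
    """
    odds_in_relevant = (r + 0.5) / (R - r + 0.5)
    odds_in_nonrelevant = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(odds_in_relevant / odds_in_nonrelevant)

def bim_score(doc_terms, query_terms, weights):
    """Sum the weights of query terms present in the document (binary presence only)."""
    return sum(weights[t] for t in query_terms if t in doc_terms)

# Hypothetical statistics: 100 documents, 10 of them judged relevant.
weights = {
    "retrieval": rsj_weight(N=100, n=20, R=10, r=8),
    "system":    rsj_weight(N=100, n=60, R=10, r=6),
}
score = bim_score({"retrieval", "model"}, ["retrieval", "system"], weights)
```

A term that is concentrated in the relevant documents ("retrieval" here) gets a large positive weight, while a term spread evenly across the collection ("system") gets a weight near zero. Summing per-term weights is only valid because the model treats terms as independent.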

Comments on Probabilistic Retrieval
- The assumption of probabilistic independence is not realistic
- Two-stage retrieval is more complicated
- The performance gain over the VSM is debatable

Evaluation of Retrieval Performance
- Precision vs. recall
- F-measure
- Average precision

Precision and Recall
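
The formulas on this slide did not survive in the transcript. As a sketch, assuming the standard definitions (precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that are retrieved), they can be computed as below. The document IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set judged against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

precision, recall = precision_recall(
    retrieved=["D1", "D2", "D3", "D4"],
    relevant=["D2", "D4", "D5", "D6", "D7"],
)
```

Here 2 of the 4 retrieved documents are relevant, and 2 of the 5 relevant documents were found.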

F-measure
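
The slide's formula is missing from the transcript. Assuming the standard definition, the F-measure is the weighted harmonic mean of precision and recall, and `beta=1` gives the balanced F1 score. The function name and input values are illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta > 1 favours recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

f1 = f_measure(0.5, 0.4)
```

Because it is a harmonic mean, the score stays low unless both precision and recall are reasonably high.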

Average Precision
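
The slide's formula is missing from the transcript. One common definition, assumed here, averages the precision measured at each rank where a relevant document appears and divides by the total number of relevant documents. The ranked list and relevance judgments are hypothetical.

```python
def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k where a relevant document appears.

    Relevant documents that are never retrieved contribute 0, because the sum is
    divided by the total number of relevant documents.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)

ap = average_precision(["D3", "D1", "D7", "D2", "D5"], {"D3", "D7", "D5"})
```

Unlike precision and recall on an unordered set, this measure rewards rankings that place relevant documents near the top.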