Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.


1 Modern Information Retrieval, Chapter 2: Modeling

2 Can keywords be used to represent a document or a query?
- Using keywords as the query and matching as the query processing does not, in general, generate good results.
- Ranking algorithm, document relevance, and IR model

3 Taxonomy of IR models

4 Ad hoc and filtering retrieval
- Ad hoc retrieval: the document collection is static; queries are submitted.
- Filtering retrieval: the queries are static; documents stream in.
  - A user profile describes the user's preferences.
  - Keywords, relevance feedback, and dynamic keyword adjustment
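The filtering setting can be illustrated with a minimal sketch: a fixed keyword profile is applied to each document as it arrives. The function name `matches`, the `threshold` parameter, and the sample stream are all assumptions made for illustration; the slides do not specify how a profile is matched.

```python
def matches(profile, document, threshold=1):
    """Return True if the document shares at least `threshold` terms with the profile."""
    terms = set(document.lower().split())
    return len(profile & terms) >= threshold

# Hypothetical static user profile (the "query" that stays fixed).
profile = {"retrieval", "ranking"}

# Hypothetical stream of incoming documents.
stream = [
    "Ranking functions for web search",
    "Cooking pasta at home",
    "Information retrieval models",
]

# Each arriving document is tested against the profile once.
delivered = [doc for doc in stream if matches(profile, doc)]
print(delivered)
```

Relevance feedback would then adjust `profile` over time, which this sketch leaves out.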

5 Formal characterization of IR models

6 Classic IR
- Index terms
  - Deciding on the importance of a term is difficult.
  - Consider a term's semantics as well as its distribution across all documents.
  - Weights are used to quantify the importance of the index terms for describing the document contents.

7 The mutual independence assumption simplifies the task of fast ranking computation.

8 Boolean model
- Index term weights are binary.
- The query is a Boolean expression.
  - not, and, or as connectives
  - Users might find it difficult to specify their information needs.
- Dominant model for commercial systems

9 Advantages and disadvantages
- Each document is either relevant or non-relevant.
Given the vector (0,1,0), is document d_j an answer?
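The question on this slide can be worked through with a small sketch. The query itself did not survive in the transcript, so the code assumes a hypothetical query `ka AND (kb OR NOT kc)` over three index terms, and assumes that (0,1,0) holds the document's binary weights for (ka, kb, kc). Under those assumptions the document is not an answer, because ka is absent.

```python
from itertools import product

def query(ka, kb, kc):
    """Hypothetical Boolean query: ka AND (kb OR NOT kc)."""
    return bool(ka and (kb or not kc))

# Assumed binary index term weights of the document for (ka, kb, kc).
d_j = (0, 1, 0)
is_answer = query(*d_j)
print(is_answer)  # False

# Equivalent view: the document is an answer only if its weight vector is
# one of the conjunctive components of the query's disjunctive normal form.
dnf = {bits for bits in product((0, 1), repeat=3) if query(*bits)}
print(sorted(dnf))
print(d_j in dnf)  # False
```

The outcome is strictly binary: a document that misses one required term is rejected outright, with no notion of a partial match.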

10 Vector model
- Given a set of index terms, allows partial matching and ranking by a similarity measure
- Coordinate matching
  - The number of query index terms contained in a document decides the similarity degree.
  - Three drawbacks: term frequency, term scarcity, document size
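Coordinate matching can be sketched in a few lines. The document names and contents below are made up for illustration; the score is simply the number of distinct query terms that occur in the document.

```python
def coordinate_match(query_terms, doc_terms):
    """Count the distinct query terms that appear in the document."""
    return len(set(query_terms) & set(doc_terms))

# Hypothetical toy collection.
docs = {
    "d1": "information retrieval models".split(),
    "d2": "boolean retrieval".split(),
    "d3": "cooking recipes".split(),
}
query = "information retrieval".split()

scores = {name: coordinate_match(query, terms) for name, terms in docs.items()}
ranking = sorted(docs, key=lambda name: scores[name], reverse=True)
print(scores)
print(ranking)
```

Because the score is only a count of shared terms, it ignores how often a term occurs, how rare the term is in the collection, and how long the document is, which are the three drawbacks listed on the slide.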

11
- $\mathrm{sim}(d_j, q) = d_j \cdot q$
  - favors long documents
- $\mathrm{sim}(d_j, q) = (d_j \cdot q) / |d_j|$
- $\mathrm{sim}(d_j, q) = 1 - D(d_j, q)$
  - discriminates against long documents
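The effect of document length on these three measures can be checked numerically. This is a sketch under assumptions: the vectors hold made-up raw term frequencies, and D is taken to be the Euclidean distance, which the slide does not state.

```python
import math

def inner(d, q):
    """Inner product d . q."""
    return sum(x * y for x, y in zip(d, q))

def length_normalized(d, q):
    """Inner product divided by the document vector length |d|."""
    return inner(d, q) / math.sqrt(inner(d, d))

def distance_based(d, q):
    """1 - D(d, q), with D assumed to be the Euclidean distance."""
    return 1 - math.sqrt(sum((x - y) ** 2 for x, y in zip(d, q)))

q = [1, 1, 0]
short_doc = [1, 1, 0]  # hypothetical short document
long_doc = [3, 3, 4]   # hypothetical long document on a broader topic

for name, sim in [("inner", inner),
                  ("normalized", length_normalized),
                  ("distance", distance_based)]:
    print(name, round(sim(short_doc, q), 3), round(sim(long_doc, q), 3))
```

With these numbers the plain inner product ranks the long document first, while the distance-based measure penalizes it heavily even though it contains every query term.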

12

13 Computing index term weights
- Term frequency, tf factor: how well the term describes the document contents
- Inverse document frequency, idf factor: how well the term distinguishes the document from the rest of the collection
- How to balance these two effects?
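One common way to balance the two factors is to multiply them. The sketch below assumes a tf factor normalized by the most frequent term in the document and an idf factor of log(N / n_i), where N is the number of documents and n_i the number of documents containing the term. The slides do not show the exact formula, so treat this as one plausible choice, applied to a made-up collection.

```python
import math
from collections import Counter

# Hypothetical toy collection.
docs = {
    "d1": "retrieval model retrieval ranking".split(),
    "d2": "boolean model".split(),
    "d3": "vector model ranking".split(),
}
N = len(docs)

# Document frequency: in how many documents each term occurs.
df = Counter(term for terms in docs.values() for term in set(terms))

def tf_idf(terms):
    """Weight each term by (normalized tf) * log(N / df)."""
    freq = Counter(terms)
    max_freq = max(freq.values())
    return {t: (f / max_freq) * math.log(N / df[t]) for t, f in freq.items()}

weights = {name: tf_idf(terms) for name, terms in docs.items()}
for name, w in weights.items():
    print(name, {t: round(v, 3) for t, v in w.items()})
```

A term such as "model" that occurs in every document gets weight 0, since it does nothing to tell documents apart, while a frequent and rare term such as "retrieval" in d1 gets the highest weight.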

14

15
- The term-weighting scheme improves retrieval performance.
- The partial matching strategy allows approximate query results.
- The results are ranked by the similarity degree.
- The vector model is a popular retrieval model nowadays due to its simplicity and performance.
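Ranking by similarity degree can be sketched with the cosine of the angle between the document and query vectors, the measure usually associated with the vector model. The formula is not shown in this transcript, so its use here is an assumption, and the weight vectors are invented.

```python
import math

def cosine(d, q):
    """Cosine of the angle between d and q; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(d, q))
    norm = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

# Hypothetical term weights over three index terms.
docs = {
    "d1": [0.5, 0.8, 0.0],
    "d2": [0.0, 0.4, 0.9],
    "d3": [0.0, 0.0, 0.7],
}
q = [1.0, 1.0, 0.0]

ranked = sorted(docs, key=lambda name: cosine(docs[name], q), reverse=True)
for name in ranked:
    print(name, round(cosine(docs[name], q), 3))
```

Here d2 matches only one of the two query terms but still receives a nonzero score and a place in the ranking, which is the partial matching behavior the slide describes.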

