1
Modern Information Retrieval, Chapter 2: Modeling
2
Can keywords be used to represent a document or a query?
In general, treating keywords as the query and plain matching as query processing does not produce good results.
Ranking algorithm, document relevance, and the IR model
3
Taxonomy of IR models
4
Ad hoc and filtering retrieval
Ad hoc retrieval: the document collection is static and new queries are submitted.
Filtering retrieval: the queries are static and documents arrive as a stream.
A user profile describes the user's preferences.
Keywords, relevance feedback, and dynamic keyword adjustment
5
Formal characterization of IR models
6
Classic IR
Index terms
Deciding on the importance of a term is difficult.
Consider a term's semantics as well as its distribution across all documents.
Weights are used to quantify the importance of the index terms for describing the document contents.
7
The mutual independence assumption for index term weights simplifies the task of computing rankings quickly.
8
Boolean model
Index term weights are binary.
A query is a Boolean expression with not, and, or as connectives.
Users might find it difficult to specify their information needs as Boolean expressions.
It is the dominant model for commercial systems.
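As a minimal sketch of the Boolean model described above, the Python below stores each document as a set of index terms (binary weights) and evaluates a query written as a Boolean expression. The toy collection, the document names, and the helper `boolean_search` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy collection: each document is reduced to the set of index
# terms it contains, i.e. every term weight is binary (present or absent).
docs = {
    "d1": {"information", "retrieval", "model"},
    "d2": {"information", "theory"},
    "d3": {"retrieval", "evaluation"},
}


def boolean_search(predicate):
    """Return the sorted names of documents whose term set satisfies the query."""
    return sorted(name for name, terms in docs.items() if predicate(terms))


# Query: information AND (retrieval OR NOT theory)
hits = boolean_search(
    lambda t: "information" in t and ("retrieval" in t or "theory" not in t)
)
print(hits)
```

Every document either satisfies the expression or it does not, so the result is an unranked set rather than an ordered list.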
9
Advantages and disadvantages
Each document is either relevant or non-relevant.
Given = (0,1,0), is document d_j an answer?
10
Vector model
Given a set of index terms, the vector model allows partial matching and ranks documents by a similarity measure.
Coordinate matching: the number of query index terms contained in a document determines its degree of similarity.
Coordinate matching has three drawbacks: it does not account for term frequency, term scarcity, or document size.
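Coordinate matching can be sketched in a few lines of Python. The documents, the query, and the function name `coordinate_match` are hypothetical; the sketch only counts how many distinct query terms a document contains and ranks by that count.

```python
def coordinate_match(query_terms, doc_terms):
    """Count how many distinct query terms occur in the document."""
    return len(set(query_terms) & set(doc_terms))


# Hypothetical toy collection of tokenized documents.
docs = {
    "d1": "information retrieval model".split(),
    "d2": "retrieval retrieval retrieval evaluation".split(),
    "d3": "boolean model".split(),
}
query = "retrieval model".split()

scores = {name: coordinate_match(query, terms) for name, terms in docs.items()}
# Rank by descending score, breaking ties by document name.
ranking = sorted(scores, key=lambda name: (-scores[name], name))
print(scores, ranking)
```

Here d2 repeats "retrieval" three times but scores the same as d3, which shows the term frequency drawback: repeated occurrences earn no extra credit.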
11
sim(d_j, q) = d_j · q (favors long documents)
sim(d_j, q) = (d_j · q) / |d_j|
sim(d_j, q) = 1 - D(d_j, q) (discriminates against long documents)
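The effect of document length on the three similarity measures above can be checked with a small Python sketch. It assumes that D is the Euclidean distance, which the slide does not state, and it uses hypothetical term-count vectors in which the long document is the short one repeated three times.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))


def sim_inner(d, q):
    """sim(d, q) = d . q"""
    return dot(d, q)


def sim_normalized(d, q):
    """sim(d, q) = (d . q) / |d|"""
    return dot(d, q) / norm(d)


def sim_distance(d, q):
    """sim(d, q) = 1 - D(d, q), with D assumed to be Euclidean distance."""
    return 1 - norm([a - b for a, b in zip(d, q)])


q = [1, 1, 0]
short_doc = [1, 1, 0]
long_doc = [3, 3, 0]  # same content as short_doc, repeated three times

for sim in (sim_inner, sim_normalized, sim_distance):
    print(sim.__name__, sim(short_doc, q), sim(long_doc, q))
```

With these vectors the inner product scores the long document higher, the length-normalized form scores both documents equally, and the distance-based form scores the long document lower.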
13
Computing index term weights
Term frequency (tf factor): how well the term describes the document contents.
Inverse document frequency (idf factor): how well the term distinguishes the document from the other documents in the collection.
How should these two effects be balanced?
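One common way to balance the two factors is to multiply them. The Python sketch below assumes a specific scheme that this slide does not spell out: tf is the raw count divided by the largest count in the document, idf is log(N / n_i) where N is the number of documents and n_i is the number of documents containing the term, and the weight is their product. The function name `tf_idf_weights` and the toy documents are hypothetical, and every document is assumed to be non-empty.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Map each document name to a {term: tf * idf} dictionary.

    Assumed scheme: tf = count / max count in the document,
    idf = log(N / n_i). Documents must be non-empty.
    """
    n_docs = len(docs)
    # n_i: number of documents in which term i appears at least once.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    weights = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        max_count = max(counts.values())
        weights[name] = {
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Hypothetical toy collection of tokenized documents.
docs = {
    "d1": "information retrieval retrieval".split(),
    "d2": "information theory".split(),
    "d3": "information model".split(),
}
weights = tf_idf_weights(docs)
print(weights["d1"])
```

In this collection "information" appears in every document, so its idf is log(1) = 0 and it contributes nothing to any document's weight, while "retrieval" is both frequent in d1 and rare in the collection and receives the largest weight.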
15
The term-weighting scheme improves retrieval performance.
The partial matching strategy allows approximate query results.
The results are ranked by degree of similarity.
The vector model is a popular retrieval model today because of its simplicity and performance.