
Advanced Search Features. Dr. Susan Gauch.


1 Advanced Search Features Dr. Susan Gauch

2 Pruning Search Results
• If a query term has many postings:
    • It is inefficient to add all postings to the accumulator and then sort the results
    • Just reading all postings from the inverted file does not scale when a word may occur in a billion documents
• So, process only the highest-weighted postings for a given query term
• How many to use?
    • Several thousand, so that there is a chance of adding weights from multiple query terms for a given document
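The query-time pruning described on this slide can be sketched as follows. This is a minimal illustration, not the course's implementation: the in-memory `index` and `idf` dictionaries, the function name `pruned_search`, and the scoring rule (rtf times idf, summed per document) are assumptions made for the example.

```python
from collections import defaultdict

def pruned_search(index, idf, query_terms, p, k):
    """Score documents using at most p postings per query term.

    index: term -> list of (doc_id, rtf), assumed sorted by rtf, highest first
    idf:   term -> inverse document frequency
    p:     maximum number of postings to read per term
    k:     number of results to return
    """
    accumulator = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, [])
        # Read only the p highest-weighted postings; the tail is never touched.
        for doc_id, rtf in postings[:p]:
            accumulator[doc_id] += rtf * idf[term]
    ranked = sorted(accumulator.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

With p set to several thousand, a document that ranks well for more than one query term still has a good chance of receiving weight from each of them.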

3 Pruning Search Results
• Implementation
    • Must sort all postings for a given term by weight during indexing
    • Since all postings for a given term have the same idf, sort postings by rtf during indexing
    • Can also affect incremental indexing
• Keep at most P postings for any given term
    • Sorted in order by rtf (highest first)
    • If only processing p postings per term (max) at query time, only keep P = p*4 in the inverted file
• Run experiments on P
    • How many postings do you need to process to get unchanged top results?
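A rough sketch of the indexing-time step, assuming the postings for each term fit in memory as a list of `(doc_id, rtf)` pairs. The function name `build_pruned_postings` and the `factor` parameter are hypothetical; the default of 4 mirrors the P = p*4 rule on the slide.

```python
def build_pruned_postings(term_postings, p, factor=4):
    """Sort each term's postings by rtf (highest first) and keep at most P = p * factor.

    term_postings: term -> list of (doc_id, rtf) in arbitrary order
    """
    max_kept = p * factor
    pruned = {}
    for term, postings in term_postings.items():
        # idf is constant within one term, so sorting by rtf equals sorting by weight.
        by_weight = sorted(postings, key=lambda posting: posting[1], reverse=True)
        pruned[term] = by_weight[:max_kept]
    return pruned
```

One way to run the experiment on P is to build the index with several values of `factor`, issue the same queries against each, and compare the top results with those from an unpruned index.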

4 Pruning Search Results
• Incremental Indexing
    • Puts a bound on possible growth of postings file
        • Only ever storing P postings for a given term
    • Makes adding to the postings slower
        • Must insert new posting in right location in list of postings for the term by weight
    • Have a max of P postings per term
        • Can pre-allocate P posting records per term
        • Never have to move postings around
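The bounded, weight-ordered insert can be sketched like this. A Python list stands in for the pre-allocated block of P posting records, and the name `insert_posting` and its return value are assumptions for illustration; `max_postings` is assumed to be at least 1.

```python
def insert_posting(postings, doc_id, rtf, max_postings):
    """Insert (doc_id, rtf) into a list kept sorted by rtf, highest first.

    The list never grows beyond max_postings entries.
    Returns True if the posting was stored, False if it was too light to keep.
    """
    # A full list rejects anything no heavier than its current lightest posting.
    if len(postings) >= max_postings and rtf <= postings[-1][1]:
        return False
    # Binary search for the first position whose rtf is smaller than the new one.
    low, high = 0, len(postings)
    while low < high:
        mid = (low + high) // 2
        if postings[mid][1] >= rtf:
            low = mid + 1
        else:
            high = mid
    postings.insert(low, (doc_id, rtf))
    if len(postings) > max_postings:
        postings.pop()  # evict the lightest posting to stay within the bound
    return True
```

The extra search and shift within the term's own block is the cost the slide refers to; because each term owns exactly P slots, postings belonging to other terms are never relocated.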

5 Bounded Accumulator
• If you create a bounded-size accumulator
    • Want it to store the highest-weighted results
• Can achieve best results by adding the highest-weighted postings to the accumulator first
    • Then make minor adjustments by adding lower-weight postings
• This is achieved by processing query terms with the highest idf first
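A minimal sketch of a bounded accumulator that processes query terms in decreasing idf order. The slide does not say what happens when the accumulator is full; the policy used here (existing entries can still be adjusted, new documents are dropped) is an assumption, as are the names `bounded_accumulate` and `max_entries`.

```python
def bounded_accumulate(index, idf, query_terms, max_entries):
    """Accumulate document weights in a dictionary of at most max_entries documents.

    index: term -> list of (doc_id, rtf), assumed sorted by rtf, highest first
    idf:   term -> inverse document frequency
    """
    accumulator = {}
    # Highest-idf terms first, so the heaviest postings claim the slots.
    ordered_terms = sorted(query_terms, key=lambda term: idf.get(term, 0.0), reverse=True)
    for term in ordered_terms:
        term_idf = idf.get(term, 0.0)
        for doc_id, rtf in index.get(term, []):
            weight = rtf * term_idf
            if doc_id in accumulator:
                accumulator[doc_id] += weight  # minor adjustment to an existing entry
            elif len(accumulator) < max_entries:
                accumulator[doc_id] = weight
            # Otherwise the accumulator is full and the new document is skipped (assumed policy).
    return accumulator
```

Under this policy a low-idf term can only refine the scores of documents already admitted by rarer terms, which matches the "minor adjustments" idea on the slide.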

6 Wildcards
• Usually not implemented in web search engines
• Wildcards at the end:
    • Nation*
    • Matches nation, nations, nationality, nationalization, …
• Requires:
    • Sorted dictionary (inefficient; could use B+ Tree instead of hashtable)
• Stemming:
    • Map words to stems during indexing
    • Store stems in dict file
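To illustrate why a trailing wildcard needs a sorted dictionary: in sorted order, all terms sharing a prefix sit next to each other, so a binary search finds the first one and a short scan collects the rest. The sketch below uses a sorted Python list in place of a B+ tree, and the function name `prefix_matches` is hypothetical. It assumes the dictionary and the query prefix use the same case.

```python
import bisect

def prefix_matches(sorted_terms, prefix):
    """Return every term in an ascending-sorted list that starts with prefix."""
    # The first term >= prefix is where any matches must begin.
    position = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    while position < len(sorted_terms) and sorted_terms[position].startswith(prefix):
        matches.append(sorted_terms[position])
        position += 1
    return matches
```

For example, calling `prefix_matches(terms, "nation")` on a sorted term list answers the query `nation*`. A hash table cannot support this lookup because it keeps no ordering between keys.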

