Search Engine Caching: "Rank-preserving two-level caching for scalable search engines", Patricia Correia Saraiva et al., September 2001

Presentation transcript:

Search Engine Caching
Rank-preserving two-level caching for scalable search engines, Patricia Correia Saraiva et al., September 2001
Predictive Caching and Prefetching of Query Results in Search Engines, Ronny Lempel and Shlomo Moran, September
Presented by Adam "So, is this gonna be on the test?" Edelman

The Problem
The user: "I want my results now!" But...
–Over 4 billion web pages
–Over 1 million queries per minute
How do we keep response times down as the web grows?

Search Engine Statistics
63.7% of search phrases appear only once in a one-billion-query log
The 25 most popular queries in the log account for 1.5% of submissions
Considerable time and processing power can be saved through well-implemented caching

Search Engine Statistics 58% of the users view only the first page of results (the top-10 results) No more than 12% of users browse through more than 3 result pages. We do not need to cache large result sets for a given query

What do we Cache?
36% of all queries have been retrieved before
Can we apply caching even if the query does not exactly match any previous query?

What do we Cache?
Saraiva et al. propose a two-level cache
In addition to caching query results, we also cache inverted lists for popular terms
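
A minimal sketch of this two-level lookup, to make the flow concrete. This is not the paper's implementation: `result_cache`, `list_cache`, `index.read_inverted_list`, and `ranker.rank` are all assumed interfaces.

```python
def process_query(query, result_cache, list_cache, index, ranker):
    """Two-level lookup sketch: try the query-result cache first, then the
    per-term inverted-list cache, and touch disk only on a list miss."""
    results = result_cache.get(query)
    if results is not None:
        return results                                 # level-1 hit: no ranking work
    lists = {}
    for term in query.split():
        inv_list = list_cache.get(term)
        if inv_list is None:
            inv_list = index.read_inverted_list(term)  # disk read (assumed API)
            list_cache.put(term, inv_list)
        lists[term] = inv_list
    results = ranker.rank(lists)[:50]                  # keep only top references
    result_cache.put(query, results)
    return results
```

A full-query hit skips ranking entirely; a term hit still ranks but avoids the disk read, which is why the two levels complement each other.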

Query Cache Implementation
Store only the first 50 references per query
–~25KB per query
Query logs show that miss ratios do not improve drastically once the query result cache exceeds 10MB
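
As a rough sanity check on these numbers: 25KB for 50 references works out to about 0.5KB per cached reference, so a 10MB result cache holds on the order of 10MB / 25KB ≈ 400 distinct queries.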

Inverted List Cache Implementation
For this data set, 50-75% of inverted lists contain documents where the term appears only once
Use a 4KB inverted list size per term
–More work needs to be done here
Asymptotic behavior is apparent after the cache exceeds 200MB
Use 250MB for the inverted list cache
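
At a fixed 4KB per list, a 250MB inverted-list cache holds roughly 250MB / 4KB ≈ 64,000 term lists, enough to keep the most popular terms resident.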

Two-Level Cache Implementation
Combine the previous two caches
270MB total cache
–Accounts for only 6.5% of the overall index size
Tested over a log of 100K queries to TodoBR

Two-Level Cache Results
Compared against 270MB caches holding only query results, only inverted lists, and no cache at all
Queries processed reduced by 62%
–A 21% improvement over the query-result-only cache
Page fetches from the database reduced by 95%
–A 3% improvement over the inverted-list-only cache

Two-Level Cache Results
At more than 20 queries per second, the two-level cache issues only 20% of the disk reads of no cache
The two-level cache can handle 64 queries per second, against 22 per second with no cache

How do we Cache?
Saraiva et al. use a least recently used (LRU) replacement policy for cache maintenance
Users search in sessions; the next query will probably be related to the previous query
Can we use this to improve caching?
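
A minimal LRU sketch (a generic illustration built on Python's OrderedDict, not Saraiva et al.'s code) that could back either cache level:

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache: a hit moves the entry to the tail of the
    queue, and eviction removes the entry at the head (sketch)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()           # ordered oldest -> newest

    def get(self, key):
        if key not in self.entries:
            return None                        # miss
        self.entries.move_to_end(key)          # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
```

Both caches in the two-level sketch above could be instances of this class, one keyed by query string and one by term.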

Probability Driven Cache (PDC)
Lempel and Moran propose a cache based on the probability of a result page being requested

Page Least Recently Used (PLRU)
Allocate a page queue that can accommodate a certain number of result pages
When the queue is full and a new page needs to be cached, the least recently used page is removed from the cache
Achieves hit ratios around 30% for warm, large caches
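
In PLRU the cached unit is an individual result page rather than a whole result set, so the LRUCache sketch above can be reused with (query, page number) keys; a hypothetical usage:

```python
# Capacity counts result pages, not queries (entries are hypothetical).
page_cache = LRUCache(capacity=100_000)
page_cache.put(("free software", 1), ["doc12", "doc7"])    # page 1 of results
page_cache.put(("free software", 2), ["doc31", "doc4"])    # page 2 of results
hit = page_cache.get(("free software", 1))                 # hit: moves to tail
```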

Page Segmented LRU (PSLRU)
Maintains two LRU segments: a protected segment and a probationary segment
Pages are first placed in the probationary segment; if requested again, they are moved to the protected segment
Pages evicted from the protected segment are moved to the probationary segment
Pages evicted from the probationary segment are removed from the cache
Consistently outperforms PLRU, although the difference is very small
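
A sketch of the segmented policy under these rules, with two ordered maps standing in for the segments (capacities are illustrative):

```python
from collections import OrderedDict

class SLRUCache:
    """Segmented LRU sketch: new pages enter the probationary segment;
    a second access promotes them to the protected segment."""
    def __init__(self, prob_capacity, prot_capacity):
        self.prob = OrderedDict()              # probationary segment
        self.prot = OrderedDict()              # protected segment
        self.prob_capacity = prob_capacity
        self.prot_capacity = prot_capacity

    def get(self, key):
        if key in self.prot:
            self.prot.move_to_end(key)
            return self.prot[key]
        if key in self.prob:
            value = self.prob.pop(key)         # second access: promote
            self._insert_prot(key, value)
            return value
        return None

    def put(self, key, value):
        self._insert_prob(key, value)          # new pages start probationary

    def _insert_prot(self, key, value):
        self.prot[key] = value
        self.prot.move_to_end(key)
        if len(self.prot) > self.prot_capacity:
            k, v = self.prot.popitem(last=False)
            self._insert_prob(k, v)            # demote back to probationary

    def _insert_prob(self, key, value):
        self.prob[key] = value
        self.prob.move_to_end(key)
        if len(self.prob) > self.prob_capacity:
            self.prob.popitem(last=False)      # evicted pages leave the cache
```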

Topic LRU (TLRU)
Let t(q) denote the topic of the query q
When a query q arrives, any cached result page of topic t(q) is moved to the tail of the queue
Each topic's pages will reside contiguously in the queue
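
Continuing the LRUCache sketch, the TLRU move-to-tail step might look like this; `topic_of` is an assumed function mapping a query to its topic, and keys are (query, page number) pairs as before:

```python
class TLRUCache(LRUCache):
    """Topic LRU sketch: a query refreshes every cached page of its topic."""
    def touch_topic(self, query, topic_of):
        topic = topic_of(query)                # t(q), assumed classifier
        for key in list(self.entries):         # key = (query, page_number)
            if topic_of(key[0]) == topic:
                self.entries.move_to_end(key)  # whole topic moves to the tail
```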

Topic SLRU (TSLRU)
All pages are initially inserted into the probationary segment
In addition to promoting re-requested pages from probationary to protected, we also promote all pages of topic t(q)
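
And the corresponding TSLRU step, extending the SLRU sketch (again with the assumed `topic_of` classifier):

```python
class TSLRUCache(SLRUCache):
    """Topic SLRU sketch: besides normal promotion on re-access, a query
    promotes every probationary page of its topic t(q) to protected."""
    def promote_topic(self, query, topic_of):
        topic = topic_of(query)
        for key in list(self.prob):
            if key not in self.prob:           # evicted by an earlier demotion
                continue
            if topic_of(key[0]) == topic:
                value = self.prob.pop(key)
                self._insert_prot(key, value)  # topic-wide promotion
```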