Bloom Filters


Slide 1: Lookup questions
Does item "x" exist in a set or multiset?
The data set may be very big or expensive to access.
Filter out lookup questions with negative results before accessing the data.
Allow false positive errors, as they only cost us an extra data access.
Don't allow false negative errors, because they result in wrong answers.

Slide 2: Bloom Filter [B70]
Encoding an attribute $a \in U$:
Maintain a bit vector V of size m.
Use k hash functions $h_1, \dots, h_k$, with $h_i : U \to [1..m]$.
Encoding: for item x, "turn on" bits $V[h_1(x)], \dots, V[h_k(x)]$.
Lookup: for item x, check bits $V[h_1(x)], \dots, V[h_k(x)]$. If all equal 1, return "Probably Yes"; otherwise return "Definitely No".
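The encoding and lookup rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the class name `BloomFilter`, the method names, and the use of salted SHA-256 to stand in for the k hash functions are all assumptions, and positions are 0-based rather than the slide's [1..m].

```python
import hashlib


class BloomFilter:
    """Bit vector V of size m probed by k hash functions (0-based positions)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # the bit vector V

    def _positions(self, item: str) -> list:
        # Assumption: h_i(x) is simulated by salting SHA-256 with the index i.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, item: str) -> None:
        # Encoding: turn on bits V[h_1(x)]..V[h_k(x)].
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # Lookup: all k bits set means "Probably Yes"; any clear bit means "Definitely No".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=4)
bf.add("http://example.com/a")
print(bf.might_contain("http://example.com/a"))  # True: an inserted item is never missed
```

A lookup for an item that was never added usually returns False, but it may return True when other items happen to have set all of its k bits.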

Slide 3: Bloom Filter
[Figure: item x is hashed by $h_1(x), h_2(x), h_3(x), \dots, h_k(x)$ to positions in the bit vector $V_0 \dots V_{m-1}$.]

Slide 4: Bloom Errors
[Figure: items a, b, c and d have been encoded in the bit vector $V_0 \dots V_{m-1}$. Item x didn't appear, yet its bits $h_1(x), h_2(x), h_3(x), \dots, h_k(x)$ are already set.]

Slide 5: Error Estimation
Assumption: hash functions are perfectly random.
Notation: k = number of hash functions, n = number of keys, m = vector length.
Probability of a bit being 0 after hashing all n elements: $(1 - 1/m)^{kn} \approx e^{-kn/m}$.
Let $p = e^{-kn/m}$. The probability of a false positive, i.e. that k random bits are all of value 1, is $(1 - p)^k = (1 - e^{-kn/m})^k$.
Assuming we are given m and n, the optimal k is $k = (m/n) \ln 2$.
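The step from the false positive probability to the optimal k is not spelled out on the slide. A sketch of the standard argument follows, under the same perfectly-random-hash assumption and for large m, with a worked example whose ratio m/n = 8 is chosen purely for illustration.

```latex
\begin{align*}
\Pr[\text{bit} = 0] &= \left(1 - \tfrac{1}{m}\right)^{kn} \approx e^{-kn/m} = p \\
\Pr[\text{false positive}] &\approx (1 - p)^k = \left(1 - e^{-kn/m}\right)^k \\
\frac{d}{dk}\Big[k \ln\left(1 - e^{-kn/m}\right)\Big] = 0
  &\;\Rightarrow\; e^{-kn/m} = \tfrac{1}{2}
  \;\Rightarrow\; k_{\mathrm{opt}} = \frac{m}{n}\ln 2 \\[4pt]
\text{Example } (m/n = 8):\quad k_{\mathrm{opt}} &= 8 \ln 2 \approx 5.55 \\
k = 6:\quad p &= e^{-0.75} \approx 0.472, \qquad (1 - p)^6 \approx 0.0216
\end{align*}
```

Since k must be an integer, one would in practice evaluate the neighbouring integers (here 5 and 6) and keep the one with the lower error.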

Slide 6: Bloom Filter Tradeoffs
Three factors: m (vector length), k (number of hash functions) and n (number of keys).
Normally, n and m are given, and we select k.
Small k:
– Fewer computations.
– The actual number of bits accessed (nk) is smaller, so the chance of a "step over" is smaller too.
– However, fewer bits (k) need to be stepped over to generate an error.
For big k, the exact opposite holds.
Not surprisingly, when k is optimal, the "hit ratio" (ratio of bits flipped in the array) is exactly 0.5.
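The tradeoff can be made concrete by sweeping k for fixed m and n. The sketch below uses the usual approximation $e^{-kn/m}$ for the probability that a bit is still 0; the function names and the sizes m = 8000, n = 1000 are hypothetical values picked for illustration.

```python
import math


def fill_ratio(m: int, n: int, k: float) -> float:
    # Expected fraction of bits set to 1 after inserting n keys (1 - p).
    return 1.0 - math.exp(-k * n / m)


def false_positive_rate(m: int, n: int, k: float) -> float:
    # A lookup errs when all k probed bits are already 1.
    return fill_ratio(m, n, k) ** k


m, n = 8000, 1000  # hypothetical sizes: 8 bits per key
for k in range(1, 11):
    print(k, round(fill_ratio(m, n, k), 3), round(false_positive_rate(m, n, k), 4))

# Best integer k in the sweep, next to the real-valued optimum (m/n) ln 2.
best_k = min(range(1, 11), key=lambda k: false_positive_rate(m, n, k))
k_opt = (m / n) * math.log(2)
print(best_k, round(k_opt, 2), round(fill_ratio(m, n, k_opt), 3))  # 6 5.55 0.5
```

Small k leaves the vector sparse but needs few collisions to err; large k fills the vector. The minimum sits where about half the bits are set.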

Slide 7: Summary Cache [FCAB00]
Proxy servers maintain a local cache to minimize expensive Internet requests.
A proxy must maintain an efficient lookup method into the cache.
The lookup structure must be stored in DRAM for performance.
The structure must be compact, as DRAM is expensive and is also used for "Hot Items" storage and more.
Pages are usually replaced in the cache using an LRU algorithm.

Slide 8: ICP – Request Handling
[Figure: a client, four proxy caches and the Internet.]

Slide 9: Internet Cache Protocol (ICP)
Allows for scaling out when using proxies.
A protocol that supports discovery and retrieval of documents from neighboring caches.
Establishes a hierarchy of proxy caches.
If a page is not found in the local proxy cache, the proxy searches for the page in neighboring proxies.
If the page is not found anywhere, it is fetched from the Internet.

Slide 10: ICP – Request Handling
[Figure: a client, four proxy caches and the Internet.]

Slide 11: Summary Cache
Each proxy maintains a Bloom Filter representing its local cache.
It also holds Bloom Filters representing the caches of other proxies.
Updates to Bloom Filters are exchanged periodically, or after a certain percentage of the documents in the cache has been replaced.
An ICP request is sent only to a proxy that supposedly holds the requested document, i.e. one whose Bloom Filter reports a hit.
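The routing decision described here can be sketched as follows. The names `Summary`, `peers_to_query` and the peer labels are hypothetical, and salted SHA-256 is used in place of the MD5-based scheme the deck describes later; the point is only that a proxy consults its copies of the peers' filters before sending any ICP request.

```python
import hashlib


def positions(url: str, m: int, k: int) -> list:
    # Assumption: k hash values derived from salted SHA-256.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{url}".encode("utf-8")).digest()[:8], "big") % m
        for i in range(k)
    ]


class Summary:
    """Bloom filter summarizing the URLs cached at one proxy."""

    def __init__(self, m: int = 4096, k: int = 4) -> None:
        self.m, self.k = m, k
        self.bits = [0] * m

    def add(self, url: str) -> None:
        for pos in positions(url, self.m, self.k):
            self.bits[pos] = 1

    def might_contain(self, url: str) -> bool:
        return all(self.bits[pos] for pos in positions(url, self.m, self.k))


def peers_to_query(url: str, peer_summaries: dict) -> list:
    # Send an ICP request only to peers whose summary reports a probable hit.
    return [name for name, summary in peer_summaries.items() if summary.might_contain(url)]


summaries = {"proxy-b": Summary(), "proxy-c": Summary()}  # hypothetical peers
summaries["proxy-b"].add("http://example.com/page")
print(peers_to_query("http://example.com/page", summaries))  # ['proxy-b']
```

If the returned list is empty, the proxy would skip the inter-proxy query entirely and fetch the page from the Internet.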

Slide 12: ICP – With Summary Cache
[Figure: a client, the Internet and four proxy caches.]

Slide 13: Summary Cache – Bloom Filters
To support deletions and updates, the proxy maintains the Bloom Filter V and also an array of counters C, initially set to 0.
The Bloom Filter is filled with the contents of the cache.
Each bit in the Bloom Filter is allotted 4 bits for its counter.
On insertion of item x, all counters $C[h_j(x)]$ are increased (to a maximum of 15).
On deletion of item x, the same counters are decreased.
When C[i] increases from 0 to 1, V[i] is turned on.
When C[i] decreases from 1 to 0, V[i] is turned off.
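A counting Bloom filter along these lines might look like the sketch below. The class and method names are illustrative, salted SHA-256 stands in for the hash functions, and one detail is an assumption not stated on the slide: a counter that has saturated at 15 is never decremented, so that a bit is not cleared while items may still map to it.

```python
import hashlib

MAX_COUNT = 15  # a 4-bit counter saturates at 15


class CountingBloomFilter:
    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.counters = [0] * m  # C, internal to the proxy
        self.bits = [0] * m      # V, the filter that is exchanged

    def _positions(self, item: str) -> list:
        # Assumption: salted SHA-256 stands in for h_1..h_k.
        return [
            int.from_bytes(hashlib.sha256(f"{j}:{item}".encode("utf-8")).digest()[:8], "big") % self.m
            for j in range(self.k)
        ]

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            if self.counters[pos] < MAX_COUNT:
                self.counters[pos] += 1
            self.bits[pos] = 1  # counter is now >= 1, so the bit is on

    def delete(self, item: str) -> None:
        # Only valid for items that were previously inserted.
        for pos in self._positions(item):
            # Assumption: saturated counters stay at 15 to avoid false negatives.
            if 0 < self.counters[pos] < MAX_COUNT:
                self.counters[pos] -= 1
                if self.counters[pos] == 0:
                    self.bits[pos] = 0  # counter went 1 -> 0, turn the bit off

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


cbf = CountingBloomFilter(m=256, k=4)
cbf.insert("a")
cbf.insert("b")
cbf.delete("a")
print(cbf.might_contain("b"))  # True: deleting "a" leaves the bits of "b" set
```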

Slide 14: Summary Cache – Bloom Filters
Hashing scheme:
– Generate 128 bits using MD5 on the URL.
– Divide them into segments of M bits (usually 32).
– Calculate the modulus of each segment by m, providing 128/M hash values (4, for 32-bit segments).
– If 128 bits are not enough, calculate MD5 of the URL concatenated with itself.
Bloom Filter exchange:
– The header contains the MD5 properties and the size of the array.
– If the refresh rate is high, send only deltas.
– Otherwise, send the entire Bloom Filter.
– Bit counts are internal and not exchanged.
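The hashing scheme can be sketched with Python's `hashlib`. The function name `md5_hash_values`, the big-endian interpretation of each segment, and the handling of a third or later round (appending the URL again each time) are assumptions made for illustration; the slide only specifies MD5, M-bit segments, the modulus by m, and hashing the URL concatenated with itself when more bits are needed.

```python
import hashlib


def md5_hash_values(url: str, m: int, k: int = 4, segment_bits: int = 32) -> list:
    """Return k positions in [0, m) derived from MD5 of the URL."""
    step = segment_bits // 8  # assumption: segment size is a whole number of bytes
    values = []
    data = url
    while len(values) < k:
        digest = hashlib.md5(data.encode("utf-8")).digest()  # 128 bits
        for start in range(0, len(digest), step):
            # Assumption: each segment is read as a big-endian unsigned integer.
            segment = int.from_bytes(digest[start:start + step], "big")
            values.append(segment % m)
            if len(values) == k:
                break
        data += url  # not enough bits: hash the URL concatenated with itself
    return values


print(md5_hash_values("http://example.com/index.html", m=1_000_003))
```

With 32-bit segments one MD5 digest yields the four hash values mentioned on the slide; asking for more than four triggers the second MD5 round.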

Slide 15: Summary Cache – Errors
False Misses:
– The requested document is cached at some remote proxy, but the summary does not reflect that fact.
– The hit ratio is reduced, and a redundant Internet access is performed.
False Hits:
– The document is not at a remote proxy, but the summary suggests that it is.
– An inter-proxy query message is wasted.
Remote Stale Hits:
– The document is cached at a remote proxy, but is stale.
– Occurs in both ICP and Summary Cache.
– Might not be a totally wasted effort, as delta compression can be used.

Slide 16: Implementation – Squid
Squid is publicly available web proxy cache software.
Summary Cache is implemented in Squid v
A variation called cache digest is implemented in Squid 1.2b20.