A Formal Analysis of Conservative Update Based Approximate Counting. Gil Einziger and Roy Friedman, Technion, Haifa.


1 A Formal Analysis of Conservative Update Based Approximate Counting. Gil Einziger and Roy Friedman, Technion, Haifa

2 Approximate Counting: We wish to count the number of occurrences of various items from a very large domain. To gain space efficiency, we are willing to tolerate only an approximate count.

3 Bloom Filters: An array BF of m bits and k hash functions {h_1,…,h_k}, each mapping objects to the range [0,…,m-1]. Adding an object obj to the Bloom filter is done by computing h_1(obj),…,h_k(obj) and setting the corresponding bits in BF. Checking set membership for an object cand is done by computing h_1(cand),…,h_k(cand) and verifying that all corresponding bits are set. Example (m=11, k=3): after adding o1 with h_1(o1)=0, h_2(o1)=7, h_3(o1)=5, bits 0, 5 and 7 of BF are set. A query for o1 succeeds (√); a query for o2 with h_1(o2)=0, h_2(o2)=7, h_3(o2)=4 fails (×) because bit 4 is not set.
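The add and membership-check steps on this slide can be sketched in a few lines of Python. This is only an illustration: the class name `BloomFilter`, the `index_fn` parameter, and the `SLIDE_HASHES` lookup table are hypothetical, and the table stands in for real hash functions so that the slide's hash values can be reproduced exactly.

```python
class BloomFilter:
    """An array of m bits probed at k positions per object (sketch)."""

    def __init__(self, m, index_fn):
        self.bits = [0] * m
        self.index_fn = index_fn  # maps an object to its k bit positions

    def add(self, obj):
        # Set every bit selected by the k hash values.
        for i in self.index_fn(obj):
            self.bits[i] = 1

    def might_contain(self, cand):
        # True only if all k bits are set; may be a false positive.
        return all(self.bits[i] for i in self.index_fn(cand))


# Hash values copied from the slide (m=11, k=3); a lookup table
# replaces real hash functions for the sake of the example.
SLIDE_HASHES = {"o1": (0, 7, 5), "o2": (0, 7, 4)}

bf = BloomFilter(11, SLIDE_HASHES.__getitem__)
bf.add("o1")
print(bf.might_contain("o1"))  # True
print(bf.might_contain("o2"))  # False: bit 4 was never set
```

In practice `index_fn` would compute k independent hashes modulo m; the table is used here only to keep the run deterministic.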

4 Counting Bloom Filters: A vector of counters (instead of bits). A counting Bloom filter supports the operations: – Increment: increment by 1 all entries that correspond to the results of the k hash functions. – Decrement: decrement by 1 all entries that correspond to the results of the k hash functions. – Estimate (instead of get): return the minimal value of all corresponding entries. Example (m=11, k=3; h_1(o1)=0, h_2(o1)=7, h_3(o1)=5): o1's counters hold 3, 6 and 8; after Increment(o1) they hold 4, 7 and 9, so Estimate(o1)=4.
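The three operations can be sketched as follows. The names `CountingBloomFilter`, `index_fn` and `SLIDE_HASHES` are hypothetical, the lookup table stands in for real hash functions, and the placement of the starting values 3, 6 and 8 on positions 0, 7 and 5 is an assumption, since the transcript does not preserve which counter held which value.

```python
class CountingBloomFilter:
    """Counters instead of bits: increment, decrement, estimate (sketch)."""

    def __init__(self, m, index_fn):
        self.counters = [0] * m
        self.index_fn = index_fn  # maps an object to its k counter positions

    def increment(self, obj):
        for i in self.index_fn(obj):
            self.counters[i] += 1

    def decrement(self, obj):
        for i in self.index_fn(obj):
            self.counters[i] -= 1

    def estimate(self, obj):
        # The smallest counter is the tightest upper bound on the count.
        return min(self.counters[i] for i in self.index_fn(obj))


SLIDE_HASHES = {"o1": (0, 7, 5)}  # hash values from the slide
cbf = CountingBloomFilter(11, SLIDE_HASHES.__getitem__)
# Assumed placement of the starting values 3, 6, 8.
cbf.counters[0], cbf.counters[7], cbf.counters[5] = 3, 6, 8

cbf.increment("o1")
print(cbf.estimate("o1"))  # 4
cbf.decrement("o1")
print(cbf.estimate("o1"))  # 3
```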

5 Conservative Update Technique: Give up the ability to Decrement in favor of accuracy/space efficiency. – During an Increment operation, only update the lowest counters. Example (SBF-MI, m=11, k=3; h_1(o1)=0, h_2(o1)=7, h_3(o1)=5): with o1's counters at 3, 6 and 8, Increment(o1) only adds to the first entry, which is the lowest (3->4). Empirically shown to improve accuracy, by up to two orders of magnitude for some workloads. – But not formally understood.
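A minimal sketch of the conservative update rule, under the same assumptions as the slide's example: the class name `ConservativeUpdateFilter`, the `index_fn` parameter and the `SLIDE_HASHES` table are hypothetical, and the placement of 3, 6 and 8 on positions 0, 7 and 5 is assumed. Only the counters that currently hold the minimum are incremented, and there is no decrement.

```python
class ConservativeUpdateFilter:
    """Counting filter that increments only the lowest counters (sketch)."""

    def __init__(self, m, index_fn):
        self.counters = [0] * m
        self.index_fn = index_fn  # maps an object to its k counter positions

    def increment(self, obj):
        positions = set(self.index_fn(obj))
        lowest = min(self.counters[i] for i in positions)
        for i in positions:
            # Counters above the minimum already over-count this object.
            if self.counters[i] == lowest:
                self.counters[i] += 1

    def estimate(self, obj):
        return min(self.counters[i] for i in self.index_fn(obj))


SLIDE_HASHES = {"o1": (0, 7, 5)}  # hash values from the slide
cu = ConservativeUpdateFilter(11, SLIDE_HASHES.__getitem__)
# Assumed placement of the starting values 3, 6, 8.
cu.counters[0], cu.counters[7], cu.counters[5] = 3, 6, 8

cu.increment("o1")
print([cu.counters[i] for i in (0, 7, 5)])  # [4, 6, 8]
print(cu.estimate("o1"))  # 4
```

The estimate is the same as a plain counting Bloom filter would give after this increment, but the two larger counters are left untouched, so other items that share them are over-counted less.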

6 Motivation: Applications: – Network measurements and heavy hitters. – Network security: anomaly detection. – Cache admission policy. Additional applications in other fields, e.g. databases and natural language processing.

7 TinyLFU - Cache Admission Policy (PDP 2014): The access distribution of most content is skewed, often modeled using Zipf-like functions, power-law, etc. [Figure: frequency vs. rank. A small number of very popular items (for example, ~50% of the weight), followed by a long heavy tail (for example, ~50% of the weight).]

8 Eviction and Admission Policies: [Diagram: Cache, Victim, New Item, Winner.] Eviction policy: "One of you guys should leave…" Admission policy: "Is the new item any better than the victim?" What is the common answer?
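The admission question on this slide can be sketched as a comparison of frequency estimates. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, an exact `Counter` stands in for the approximate counting structure, and resolving ties in favor of the victim is an assumption.

```python
from collections import Counter


def should_admit(estimate, candidate, victim):
    """Admit the new item only if it looks more frequent than the victim."""
    # Assumption: on a tie the victim keeps its place in the cache.
    return estimate(candidate) > estimate(victim)


# Exact counts as a stand-in for an approximate counter.
freq = Counter(["a", "a", "a", "b", "c", "c"])


def estimate(item):
    return freq[item]


print(should_admit(estimate, "b", "a"))  # False: 1 access vs. 3
print(should_admit(estimate, "a", "c"))  # True: 3 accesses vs. 2
```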

9 Conservative Update - Intuition: Conservative Update allows counting just the head items with high accuracy, so our cache can make educated admission decisions. [Figure labels: Desired items, Undesired items.]

10 Admission Policy Example: [Figure: hit rate vs. cache size, without an admission policy and with a frequency-based admission policy. Annotations: "More memory", "Better cache management".]

11 The Basic Observation: CBF = LCS. [Figure: a counter array redrawn as stacked levels of 1 bits.] A CBF is exactly like an LCS. If we can quantify how many items are inserted into each level of the LCS, we can bound the error.

12 Simple Observations: It is useful to discuss the number of items that are inserted into each level of the LCS. Since all levels are considered the same, the false positive probability of each level is determined only by the number of items inserted into that level. A false positive at a higher level implies a false positive at all lower levels.
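The level view can be sketched as follows, assuming that level i of the LCS has a bit set wherever the counter value is at least i (this is one reading of the figure on the previous slide, not a definition taken from the transcript). The function names and the counter values are hypothetical.

```python
def level_bits(counters, level):
    """Bit vector for one level: 1 where the counter reaches that level."""
    return [1 if c >= level else 0 for c in counters]


def highest_positive_level(counters, positions):
    """Highest level at which all of an item's bits are set (0 if none)."""
    highest = 0
    for level in range(1, max(counters) + 1):
        if all(level_bits(counters, level)[i] for i in positions):
            highest = level
    return highest


counters = [2, 1, 0, 3, 1]  # hypothetical counter array
for level in (3, 2, 1):
    print(level, level_bits(counters, level))
# 3 [0, 0, 0, 1, 0]
# 2 [1, 0, 0, 1, 0]
# 1 [1, 1, 0, 1, 1]

positions = (0, 3, 4)  # hypothetical hash positions of one item
print(highest_positive_level(counters, positions))  # 1
print(min(counters[i] for i in positions))          # 1
```

Under this reading every bit set at level i+1 is also set at level i, which is why a positive answer at a higher level carries over to all lower levels, and the highest positive level equals the minimum-counter estimate.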

13 The Model: Known (constant) distribution. Large enough sample: we assume that we can make a 'characteristic' histogram. Formally, we know how many items are going to appear each number of times.

14 Lower Bound: Denote by A[i] the number of items that are actually inserted into level i. By definition: [equation not preserved in the transcript]. A min/max argument about the lowest level that could have experienced a false positive yields the following: [equation not preserved in the transcript].

15 Upper Bound: Derived similarly, by upper bounding A[i]. Requires a few further assumptions. Technical details are in the paper.

16 Accurate Configuration – Uniform

17 Accurate Configuration – Zipf 1

18 Inaccurate Configuration – Uniform

19 Inaccurate Configuration – Zipf 1

20 Real Trace – Counting TCP packets

21 Summary: A simple analysis of an extensively used approximate counting optimization. First to analyze it for general distributions. Lower and upper bounds on the model. Good indicator on real workloads. An extended version is published as a tech report. Thank you.

