Approximate Counting Algorithm


Approximate Counting Algorithm Ariel Rosenfeld

Counter A counter that ranges from 0 to M requires $\log_2 M$ bits. For large data, $\log_2 M$ is still a lot. Using probability, this can be reduced to $\log_2 \log_2 M$ bits, at the cost of a small probability of error.
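To make the saving concrete, here is a small sketch that compares the two bit counts for a few values of M. The helper names `bits_exact` and `bits_exponent` are made up for this illustration, and the counts are rounded up to whole bits.

```python
def bits_exact(m: int) -> int:
    """Bits needed to store any integer in 0..m exactly (about log2 M)."""
    return max(1, m.bit_length())


def bits_exponent(m: int) -> int:
    """Bits needed to store only an exponent k with 2**k >= m (about log2 log2 M)."""
    k_max = max(1, (m - 1).bit_length())  # smallest k such that 2**k >= m
    return k_max.bit_length()             # bits needed to store any k in 0..k_max


for m in (2**16, 2**32, 2**64):
    print(f"M = 2^{m.bit_length() - 1}: exact {bits_exact(m)} bits, "
          f"exponent only {bits_exponent(m)} bits")
```

For a 64-bit range the exponent alone fits in a handful of bits, which is the whole point of the approach.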

The Idea Count a large number of events using a small amount of memory by incrementing the counter probabilistically. Invented in 1977 by Robert Morris. Analyzed in 1982 by Philippe Flajolet.

Applications Gathering statistics on a large number of events; streaming data frequency; data compression; etc.

Counting Because we give up accuracy, we approximate the count by a power of two, $2^k$, and keep only the exponent $k$. If the approximate count is $M = 2^k$, we store only $k$ in binary form, which takes about $\log_2 \log_2 M$ bits. How do we know when to increase $k$?

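Probability! Generate $c$ pseudo-random bits, where $c$ is the current value of the counter. If all of them are 1, increment the counter. What is the probability? $2^{-c}$. How to check it efficiently? Take the logical AND of the $c$ bits: the result is 1 only if every bit is 1. Simply add the result to the counter.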
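The increment step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `morris_increment` is hypothetical, and comparing the drawn bits against the all-ones pattern stands in for ANDing the individual bits together.

```python
import random


def morris_increment(c: int, rng: random.Random) -> int:
    """Return the counter value after one probabilistic increment.

    Draws c pseudo-random bits and checks whether all of them are 1,
    which happens with probability 2**-c.
    """
    if c == 0:
        return 1  # no bits to draw: the increment always succeeds
    bits = rng.getrandbits(c)         # c pseudo-random bits packed into an int
    all_ones = (1 << c) - 1           # the pattern with all c bits set
    return c + int(bits == all_ones)  # add the result (0 or 1) to the counter


rng = random.Random(0)  # seeded only to make the run repeatable
c = 0
for _ in range(1024):
    c = morris_increment(c, rng)
print("counter:", c, "estimated count:", 2**c)
```

The stored value only ever grows by 0 or 1 per event, so the counter stays small even when the number of events is large.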

Example

Another view

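Analysis What is the probability of an increment? $2^{-C}$, where $C$ is the current counter value. After $n$ increments, with the counter initialized to $C = 1$ (probabilistic explanation in the article): $E(2^C) = n + 2$ and $\mathrm{Var}(2^C) = n(n+1)/2$. Small chance of being "far off".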
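The two formulas can be checked with a short derivation. The sketch below assumes the counter starts at $C_0 = 1$ (the starting value under which these formulas hold) and writes $C_n$ for the counter after $n$ increments. That notation is introduced here for illustration.

```latex
\begin{align*}
E\!\left[2^{C_{n+1}} \mid C_n = c\right]
  &= 2^{-c}\cdot 2^{c+1} + \left(1 - 2^{-c}\right) 2^{c} = 2^{c} + 1, \\
E\!\left[2^{C_n}\right]
  &= 2^{C_0} + n = n + 2, \\
E\!\left[4^{C_{n+1}} \mid C_n = c\right]
  &= 2^{-c}\cdot 4^{c+1} + \left(1 - 2^{-c}\right) 4^{c} = 4^{c} + 3\cdot 2^{c}, \\
E\!\left[4^{C_n}\right]
  &= 4 + 3\sum_{i=0}^{n-1} (i + 2) = 4 + \frac{3n(n+3)}{2}, \\
\mathrm{Var}\!\left(2^{C_n}\right)
  &= 4 + \frac{3n(n+3)}{2} - (n+2)^2 = \frac{n(n+1)}{2}.
\end{align*}
```

Each increment therefore raises the expected value of $2^C$ by exactly 1, which is why $2^C$ tracks the number of events up to a constant offset.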

Example The increment was called 1024 times. The counter should hold about 10, since $2^{10} = 1024$. The chance of being off by more than 1 is ~8%.
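One way to explore this example is to compute the counter's distribution directly instead of simulating it. The sketch below is an illustration under stated assumptions: the counter starts at 1, floating-point probabilities are used, and the function names are hypothetical. It prints the probability of ending more than 1 away from 10, which can be compared with the figure quoted above.

```python
def counter_distribution(n: int, start: int = 1) -> dict:
    """Distribution of the counter after n probabilistic increments.

    Maps each reachable counter value to its probability. The starting
    value of 1 is an assumption made for this sketch.
    """
    dist = {start: 1.0}
    for _ in range(n):
        nxt = {}
        for c, p in dist.items():
            up = 2.0 ** -c  # probability of incrementing from value c
            for value, mass in ((c, p * (1.0 - up)), (c + 1, p * up)):
                if mass > 0.0:  # drop states whose probability underflows to zero
                    nxt[value] = nxt.get(value, 0.0) + mass
        dist = nxt
    return dist


def prob_off_by_more_than(dist: dict, target: int, slack: int) -> float:
    """Probability that the counter differs from target by more than slack."""
    return sum(p for c, p in dist.items() if abs(c - target) > slack)


dist = counter_distribution(1024)
print(f"P(|C - 10| > 1) = {prob_off_by_more_than(dist, 10, 1):.4f}")
```

The same distribution can be used to check the mean and variance of $2^C$ numerically for any number of increments.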