Approximate Counting Algorithm Ariel Rosenfeld
Counter A counter ranging from 0 to M requires log2 M bits. For large M, log2 M is still a lot. Using probability, we can reduce this to log2 log2 M bits, at the cost of a small chance of error.
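To make the savings concrete, here is a quick sketch in Python (the choice M = 10^9 is mine, purely for illustration):

```python
import math

M = 10**9  # illustrative maximum count
exact_bits = math.ceil(math.log2(M + 1))        # bits for an exact 0..M counter
approx_bits = math.ceil(math.log2(exact_bits))  # bits to store only the exponent
# An exact counter needs 30 bits; storing just the exponent needs 5.
```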
The Idea Count a large number of events using a small amount of memory, while accepting some probabilistic error. Introduced in 1977 by Robert Morris; analyzed in 1982 by Philippe Flajolet.
Applications Gathering statistics on a large number of events, streaming data frequency, data compression, etc.
Counting Because we give up accuracy, we approximate the count by a power of two, 2^k, and keep only the exponent. Representation: if the approximate count is M = 2^k, we store only k in binary, which takes log2 log2 M bits. How do we know when to increase k?
Probability! Generate c pseudo-random bits, where c is the current value of the counter, and increment only if all of them are 1. What is the probability of that? 2^-c. How to check it efficiently? AND the c bits together and simply add the result (0 or 1) to the counter.
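A minimal Python sketch of this increment rule (the function name is mine, not from the slides):

```python
import random

def morris_increment(c, rng=random):
    """Increment counter c with probability 2^-c.

    Generates c pseudo-random bits and ANDs them together;
    the result is 1 only when all bits are 1, so it can simply
    be added to the counter.
    """
    result = 1
    for _ in range(c):  # with c = 0 there are no bits, so we always increment
        result &= rng.getrandbits(1)
    return c + result
```

Starting from c = 0, the first call always increments, and each later increment becomes exponentially rarer, which is exactly what keeps c near log2 of the true count.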
Example
Another view
Analysis What is the probability of an increment? 2^-C, where C is the current counter value. After n increments (probabilistic derivation in the article): E[2^C] = n + 2 and Var(2^C) = n(n + 1)/2, so 2^C - 2 is an unbiased estimate of n, with only a small chance of being far off.
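A quick Monte Carlo check of E[2^C] = n + 2 (a sketch; the parameter values are arbitrary, and drawing a uniform number below 2^-c is equivalent to all c random bits being 1):

```python
import random

def morris_mean(n, trials, seed=1):
    """Average 2^C over many independent Morris counters,
    each incremented n times; should approach n + 2."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        c = 0
        for _ in range(n):
            if rng.random() < 2.0 ** -c:  # the all-c-bits-are-1 event
                c += 1
        total += 2 ** c
    return total / trials
```

With n = 100 and a few thousand trials, the average lands close to the predicted value of 102.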
Example Increment was called 1024 times, so the correct counter value should be 10 (2^10 = 1024). The chance of the counter being more than 1 off is ~8%.
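This figure can be checked empirically (a sketch; the trial count is arbitrary):

```python
import random

def fraction_far_off(n=1024, trials=3000, seed=2):
    """Fraction of independent runs whose final counter differs
    from 10 by more than 1 after n = 1024 increments."""
    rng = random.Random(seed)
    far = 0
    for _ in range(trials):
        c = 0
        for _ in range(n):
            if rng.random() < 2.0 ** -c:  # increment with probability 2^-c
                c += 1
        if abs(c - 10) > 1:
            far += 1
    return far / trials
```

The returned fraction should land in the neighborhood of 0.08, matching the slide's ~8% claim.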