# When Random Sampling Preserves Privacy

Kamalika Chaudhuri (U.C. Berkeley), Nina Mishra (U. Virginia)


The Problem
- Setting:
  - Table: a set of rows
  - Sanitizer: releases each row with probability p
- What are the conditions under which this sanitizer preserves privacy?
- Pipeline: Database → Sanitizer → Sanitized Database
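The sanitizer described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `sanitize` and the optional `rng` argument are assumptions introduced here so the example is reproducible.

```python
import random


def sanitize(table, p, rng=None):
    """Release each row of `table` independently with probability p.

    `sanitize` and `rng` are hypothetical names used for illustration.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if rng is None:
        rng = random.Random()
    # rng.random() is uniform on [0, 1), so a row is kept with probability p.
    return [row for row in table if rng.random() < p]


table = ["a", "a", "b", "c", "a", "b"]
sanitized = sanitize(table, p=0.5, rng=random.Random(0))
print(sanitized)
```

Passing a seeded `random.Random` makes a run repeatable; in an actual release the randomness would of course not be fixed.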

Search Data
- AOL released user search data:
  - Replaced usernames with random ids

Search Data
- Kamalika: “Berkeley restaurants”, “Low degree spanning trees”, “Tickets to India”, “Privacy sampling”, “Airfare Santa Barbara”
- Cynthia: “Traffic on 101N”, “Restaurants Mountain View”, “Rank Aggregation”, “Memory bound functions”, “Crypto registration”
- Nina: “Falafel Charlottesville”, “Query Auditing”, “Clustering streaming”, “Tickets to SFO”, “Privacy sampling”

U.S. Census Data
- Random sample of preprocessed data:
  - Removing unique values
  - Merging cells with fewer than a threshold number of individuals

Privacy Definition [DMNS06, …]
- ε-Indistinguishability:
  - Two tables T, T' differ by a single row
  - S: output of the sanitizer
  - Pr[S | T] ≤ (1 + ε) Pr[S | T']

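An Example
- Cannot always get ε-indistinguishability with random sampling:
  - T: n rows with value 0
  - T': n - 1 rows with value 0, 1 row with value 1
  - S: 1 row with value 1, s - 1 rows with value 0
- S can be produced from T' but never from T, so Pr[S | T] = 0 while Pr[S | T'] > 0, and no finite ε bounds one by (1 + ε) times the other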
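The counterexample can be checked numerically. The sketch below treats a sample as a multiset of values, so its probability under per-row sampling is a product of binomial terms, one per distinct value. The helper name `sample_probability` and the concrete numbers n = 10, s = 3, p = 0.3 are assumptions chosen for illustration only.

```python
from collections import Counter
from math import comb


def sample_probability(table, sample, p):
    """Probability that keeping each row of `table` independently with
    probability p yields exactly the value counts found in `sample`.

    Hypothetical helper: rows are identified with their values.
    """
    table_counts = Counter(table)
    sample_counts = Counter(sample)
    prob = 1.0
    for value in set(table_counts) | set(sample_counts):
        c = table_counts[value]   # rows with this value in the table
        j = sample_counts[value]  # rows with this value in the sample
        if j > c:
            return 0.0            # the sample cannot come from this table
        prob *= comb(c, j) * p**j * (1 - p) ** (c - j)
    return prob


n, s, p = 10, 3, 0.3
T = [0] * n                    # n rows with value 0
T_prime = [0] * (n - 1) + [1]  # n - 1 rows with value 0, one row with value 1
S = [1] + [0] * (s - 1)        # one row with value 1, s - 1 rows with value 0

print(sample_probability(T, S, p))        # zero: T has no row with value 1
print(sample_probability(T_prime, S, p))  # strictly positive
```

Because one probability is zero and the other is not, their ratio is unbounded, which is why a pure multiplicative guarantee fails here.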

Privacy Definition [DKMMiNa06, BDMN05]
- (ε, δ)-Indistinguishability:
  - Two tables T, T' differ by a single row
  - S: output of the sanitizer
  - With probability at least 1 - δ, Pr[S | T] ≤ (1 + ε) Pr[S | T']

An Example
- Cannot always get (ε, δ)-indistinguishability for all tables:
  - A table where all rows have unique values

When does Random Sampling preserve Privacy?
- Parameters:
  - (ε, δ)-indistinguishability
  - k: number of distinct values in T
  - t: number of values which occur at most log(k/δ)/ε times in T
- Theorem: This can be guaranteed if
  - p < ε (if t = 0)
  - p < Õ(εδ/t)

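Classification of Values
- For (ε, δ)-indistinguishability, a value v is classified by the number of rows with value v:
  - Rare value: at most log(k/δ)/ε rows
  - Infrequent value: between log(k/δ)/ε and log(k/δ)/p rows
  - Common value: above log(k/δ)/p rows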
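A small sketch of this three-way classification follows. The thresholds log(k/δ)/ε and log(k/δ)/p come from the slides; the natural logarithm, the treatment of counts exactly at the upper threshold as common, and the name `classify_values` are assumptions made here.

```python
from collections import Counter
from math import log


def classify_values(table, eps, delta, p):
    """Label each distinct value in `table` as rare, infrequent or common.

    Hypothetical helper. Assumes natural log and p < eps, so that the
    rare threshold lies below the common threshold.
    """
    counts = Counter(table)
    k = len(counts)                      # number of distinct values
    rare_max = log(k / delta) / eps      # rare: at most this many rows
    common_min = log(k / delta) / p      # common: at least this many rows (assumed)
    classes = {}
    for value, c in counts.items():
        if c <= rare_max:
            classes[value] = "rare"
        elif c < common_min:
            classes[value] = "infrequent"
        else:
            classes[value] = "common"
    return classes


table = ["x"] * 1000 + ["y"] * 60 + ["z"] * 2
print(classify_values(table, eps=0.5, delta=0.1, p=0.05))
```

With these illustrative parameters the thresholds are roughly 6.8 and 68 rows, so each of the three values falls into a different class.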

Rare Values
- If a rare value v is observed in a random sample, Pr[S | T'] > (1 + ε/log(k/δ)) Pr[S | T]
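A short worked calculation shows why observing a rare value is damaging. It is a sketch under one stated assumption: T has c rows with value v, T' has c + 1 such rows (T' is T plus one extra row with value v), and the sample S contains j ≥ 1 rows with value v. All other values contribute identical factors to both probabilities and cancel.

```latex
\frac{\Pr[S \mid T']}{\Pr[S \mid T]}
  = \frac{\binom{c+1}{j}\, p^{j} (1-p)^{c+1-j}}{\binom{c}{j}\, p^{j} (1-p)^{c-j}}
  = \frac{c+1}{c+1-j}\,(1-p),
\qquad\text{so for } j = 1:\quad
\frac{\Pr[S \mid T']}{\Pr[S \mid T]} = \Bigl(1 + \frac{1}{c}\Bigr)(1-p).
```

When c is small, the factor 1 + 1/c is far from 1. For a value that is rare in the sense c ≤ log(k/δ)/ε, it is at least 1 + ε/log(k/δ), so a sample containing v separates T from T' much more than one containing only well-populated values.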

Common Values
- For a common value v, Pr[S | T] ≈ Pr[S | T']
- Typically, the number of rows with a common value is close to its expectation

Infrequent Values
- For an infrequent value v, Pr[S | T] ≈ Pr[S | T']
- Typically, the number of rows with an infrequent value is at most log(k/δ) away from its expected value

Properties of a Good Sample
- A sample S is ε-indistinguishable if:
  - It contains no rare values
  - The number of rows with common value v is within a constant factor of its expectation
  - The number of rows with infrequent value v is at most an additive O(log(k/δ)) more than its expected value
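The three conditions can be turned into a checker, shown here as a rough sketch. The slides leave the constants unspecified, so `factor` (the constant factor for common values) and `slack` (the constant inside the O(log(k/δ)) term) are hypothetical parameters with arbitrary defaults, as are the function name and the use of the natural logarithm.

```python
from collections import Counter
from math import log


def is_good_sample(table, sample, eps, delta, p, factor=2.0, slack=1.0):
    """Check the three 'good sample' conditions for `sample` drawn from `table`.

    `factor` and `slack` are assumed constants; the slides do not fix them.
    """
    table_counts = Counter(table)
    sample_counts = Counter(sample)
    k = len(table_counts)
    log_term = log(k / delta)
    rare_max = log_term / eps
    common_min = log_term / p
    for value, c in table_counts.items():
        j = sample_counts[value]
        expected = p * c                 # expected number of sampled rows
        if c <= rare_max:
            if j > 0:                    # a rare value must not appear at all
                return False
        elif c >= common_min:
            if not (expected / factor <= j <= expected * factor):
                return False             # common: within a constant factor
        elif j > expected + slack * log_term:
            return False                 # infrequent: additive slack only
    return True


table = ["x"] * 1000 + ["y"] * 60 + ["z"] * 2
good = ["x"] * 50 + ["y"] * 3
print(is_good_sample(table, good, eps=0.5, delta=0.1, p=0.05))
print(is_good_sample(table, good + ["z"], eps=0.5, delta=0.1, p=0.05))
```

In the example, adding a single row with the rare value `"z"` is enough to make the sample fail the check.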

When does Random Sampling preserve Privacy?
- Such a sample occurs with probability at least 1 - δ if
  - p < ε (if t = 0)
  - p < Õ(εδ/t)

Utility of Random Sampling
- Assuming no rare values:
  - Error in the frequency of each value: additive 1/√n
- [DMNS06] estimates the histogram with an additive error of 1/n in each frequency
- Sampling may give a compact representation of the histogram
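To get a feel for the utility side, the sketch below estimates each value's frequency from a sample and measures the largest additive error against the true frequencies. It is an empirical illustration only and does not verify the 1/√n rate; the function names and the example table are assumptions.

```python
import random
from collections import Counter


def estimate_frequencies(table, p, rng=None):
    """Sample each row with probability p and return the fraction of the
    sample taken up by each value (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    sample = [row for row in table if rng.random() < p]
    if not sample:
        return {}
    counts = Counter(sample)
    return {value: c / len(sample) for value, c in counts.items()}


def max_frequency_error(table, estimates):
    """Largest additive gap between true and estimated frequencies."""
    n = len(table)
    true_freq = {value: c / n for value, c in Counter(table).items()}
    return max(abs(true_freq[v] - estimates.get(v, 0.0)) for v in true_freq)


table = ["a"] * 7000 + ["b"] * 3000
estimates = estimate_frequencies(table, p=0.1, rng=random.Random(1))
print(estimates)
print(max_frequency_error(table, estimates))
```

The sample is roughly a tenth the size of the table, which is the "compact representation" the slide refers to, at the cost of some additive error in each frequency.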

Conclusions
- Random sampling preserves privacy only when there are few rare values
- With rare values, the probability of failure can be high:
  - δ on the order of 1/n, as opposed to 1/2^n [DKMMiNa06, BDMN05]
- Error in estimating the frequency of each value can be high:
  - Additive 1/√n, as opposed to 1/n of [DMNS06]

Thank You
