Foundations of Privacy Lecture 4 Lecturer: Moni Naor.

2 Recap of last week’s lecture
– Differential privacy
– Sensitivity:
  – Global sensitivity of query $q: U^n \to \mathbb{R}^d$: $GS_q = \max_{D,D'} \|q(D) - q(D')\|_1$, the maximum taken over adjacent $D, D'$
  – Local sensitivity of query $q$ at point $D$: $LS_q(D) = \max_{D'} \|q(D) - q(D')\|_1$, the maximum taken over $D'$ adjacent to $D$
  – Smooth sensitivity: $S^*_f(x) = \max_y \{ LS_f(y) \, e^{-\beta \cdot dist(x,y)} \}$
– Histograms
– Differential privacy of the median
– Exponential mechanism

3 Histograms
– Inputs $x_1, x_2, \ldots, x_n$ in domain $U$
– Domain $U$ partitioned into $d$ disjoint bins $S_1, \ldots, S_d$
– $q(x_1, x_2, \ldots, x_n) = (n_1, n_2, \ldots, n_d)$ where $n_j = \#\{i : x_i \in S_j\}$
– Can view as $d$ queries: $q_j$ counts the number of points in set $S_j$
– For adjacent $D, D'$, only one answer can change, and it can change by at most 1
– Global sensitivity of the answer vector is 1
– Sufficient to add $Lap(1/\varepsilon)$ noise to each query and still get $\varepsilon$-privacy
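
Below is a minimal sketch of the noisy histogram, written in Python with only the standard library. It makes two assumptions. First, adjacency means adding or removing one entry, which is the setting where only one count can change. Second, `bin_of` is a hypothetical helper that maps a point to its bin index. Laplace noise is drawn as the difference of two exponentials because `random` has no Laplace sampler.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Lap(scale): the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_histogram(
    xs: Sequence[float],
    bin_of: Callable[[float], int],  # hypothetical: maps a point to its bin index in 0..d-1
    d: int,
    epsilon: float,
    rng: random.Random,
) -> List[float]:
    """Return the d bin counts, each perturbed with independent Lap(1/epsilon) noise."""
    counts = Counter(bin_of(x) for x in xs)
    return [counts.get(j, 0) + laplace(1.0 / epsilon, rng) for j in range(d)]


# Usage: five points in [0, 1) split into d = 4 equal-width bins.
xs = [0.1, 0.2, 0.7, 0.9, 0.95]
noisy = private_histogram(xs, lambda x: min(int(x * 4), 3), d=4, epsilon=0.5, rng=random.Random(1))
print([round(c, 2) for c in noisy])
```

Each bin gets its own noise draw, but the scale stays $1/\varepsilon$ no matter how large $d$ is. This works because one entry affects only one bin.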

4 The Exponential Mechanism [McSherry Talwar]
– A general mechanism that yields differential privacy
– May yield utility/approximation
– Is defined and evaluated by considering all possible answers
– The definition does not yield an efficient way of evaluating it
– Application/original motivation: approximate truthfulness of auctions
  – Collusion resistance
  – Compatibility

5 Side bar: Digital Goods Auction
– Some product with zero cost of production
– $n$ individuals with valuations $v_1, v_2, \ldots, v_n$
– Auctioneer wants to maximize profit

6 Example of the Exponential Mechanism
– Data: $x_i$ = website visited by student $i$ today
– Range: $Y$ = {website names}
– For each name $y$, let $q(y, X) = \#\{i : x_i = y\}$
– Goal: output the most frequently visited site
– Procedure: given $X$, output website $y$ with probability proportional to $e^{\varepsilon q(y,X)}$
– Popular sites are exponentially more likely than rare ones
– Website scores don't change too quickly: $q(y,X)$ is the size of a subset of $X$, so changing one entry changes each score by at most 1

7 Setting
– For input $D \in U^n$ want to find $r \in R$
– Base measure $\mu$ on $R$, usually uniform
– Score function $q': U^n \times R \to \mathbb{R}$ assigns any pair $(D, r)$ a real value
  – Want to maximize it (approximately)
The exponential mechanism
– Assign output $r \in R$ with probability proportional to $e^{\varepsilon q'(D,r)} \mu(r)$
– Normalizing factor: $\sum_r e^{\varepsilon q'(D,r)} \mu(r)$
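
For a finite range $R$, the mechanism can be sketched as follows. This is an illustrative sketch, not the construction from the lecture. It assumes the score is passed as a Python function `score(D, r)` and the base measure as an optional function `mu(r)`, which defaults to uniform. Both function names are hypothetical. The maximum exponent is subtracted before exponentiating so large scores do not overflow, and this does not change the normalized probabilities.

```python
import math
import random
from typing import Callable, Dict, Hashable, Optional, Sequence


def exp_mech_probs(
    D,
    outputs: Sequence[Hashable],
    score: Callable,
    epsilon: float,
    mu: Optional[Callable[[Hashable], float]] = None,
) -> Dict[Hashable, float]:
    """Pr[output = r] = e^{eps*score(D,r)} mu(r) / sum_r' e^{eps*score(D,r')} mu(r')."""
    mu = mu or (lambda r: 1.0)  # default: uniform base measure
    scaled = {r: epsilon * score(D, r) for r in outputs}
    top = max(scaled.values())  # subtract the max exponent to avoid overflow
    weights = {r: math.exp(s - top) * mu(r) for r, s in scaled.items()}
    z = sum(weights.values())  # the normalizing factor, up to the common factor e^{top}
    return {r: w / z for r, w in weights.items()}


def exponential_mechanism(D, outputs, score, epsilon, mu=None, rng=None):
    """Draw one output r according to exp_mech_probs."""
    rng = rng or random.Random()
    probs = exp_mech_probs(D, outputs, score, epsilon, mu)
    rs = list(probs)
    return rng.choices(rs, weights=[probs[r] for r in rs], k=1)[0]


# Usage on the website example: the score of a site is its number of visits.
visits = ["a.com", "b.com", "a.com", "a.com", "c.com"]
sites = ["a.com", "b.com", "c.com", "d.com"]
count = lambda X, y: X.count(y)
print(exp_mech_probs(visits, sites, count, epsilon=1.0))
print(exponential_mechanism(visits, sites, count, epsilon=1.0, rng=random.Random(7)))
```

The whole range is enumerated, so the running time grows with $|R|$. This matches the slide 4 remark that the definition does not yield an efficient way of evaluating the mechanism.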

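8 The exponential mechanism is private
– Let $\Delta = \max_{D,D',r} |q'(D,r) - q'(D',r)|$, the maximum taken over adjacent $D, D'$
– Claim: the exponential mechanism yields a $2 \cdot \varepsilon \cdot \Delta$ differentially private solution
– $\Pr[\text{output} = r \text{ on input } D] = e^{\varepsilon q'(D,r)} \mu(r) / \sum_{r'} e^{\varepsilon q'(D,r')} \mu(r')$
– $\Pr[\text{output} = r \text{ on input } D'] = e^{\varepsilon q'(D',r)} \mu(r) / \sum_{r'} e^{\varepsilon q'(D',r')} \mu(r')$
– For adjacent $D, D'$, the numerators differ by a factor of at most $e^{\varepsilon\Delta}$, and so do the normalizers; hence the ratio is bounded by $e^{2\varepsilon\Delta}$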
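
The claim can be checked numerically on small inputs. The sketch below uses a uniform base measure and the counting score from the website example, which has $\Delta = 1$. It computes the exact output distributions on a database and on a few adjacent ones, then reports the largest log-ratio. The function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def exp_mech_probs(D, outputs: Sequence, score: Callable, epsilon: float) -> List[float]:
    """Exponential-mechanism output distribution with a uniform base measure."""
    scaled = [epsilon * score(D, r) for r in outputs]
    top = max(scaled)
    w = [math.exp(s - top) for s in scaled]
    z = sum(w)
    return [x / z for x in w]


def max_log_ratio(D1, D2, outputs, score, epsilon) -> float:
    """max_r |ln(Pr[r | D1] / Pr[r | D2])|: the privacy loss for this pair of inputs."""
    p1 = exp_mech_probs(D1, outputs, score, epsilon)
    p2 = exp_mech_probs(D2, outputs, score, epsilon)
    return max(abs(math.log(a / b)) for a, b in zip(p1, p2))


outputs = ["a", "b", "c"]
count = lambda X, y: X.count(y)  # one changed entry moves each score by <= 1, so Delta = 1
eps = 0.5
D = ["a", "a", "b"]
for D_adj in (["a", "b", "b"], ["a", "a", "c"], ["a", "a", "b", "a"]):
    loss = max_log_ratio(D, D_adj, outputs, count, eps)
    print(D_adj, round(loss, 4), "<=", 2 * eps * 1)
```

In the first two pairs the normalizer happens to be identical, so the loss is only $\varepsilon\Delta$. The factor of 2 in the claim covers cases where both the numerator and the normalizer move.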

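9 Laplace Noise as Exponential Mechanism
– On query $q: U^n \to \mathbb{R}$ let $q'(D,r) = -|q(D) - r|$
– $\Pr[\text{noise} = y] = e^{-\varepsilon|y|} / \int_y e^{-\varepsilon|y|}\,dy = \frac{\varepsilon}{2} e^{-\varepsilon|y|}$, i.e., the noise is $Lap(1/\varepsilon)$
– Laplace distribution $Y = Lap(b)$ has density $\Pr[Y = y] = \frac{1}{2b} e^{-|y|/b}$
[Figure: the Laplace density plotted against $y$]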
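
The normalization and the sensitivity of this score can be worked out directly. The derivation below assumes a uniform (Lebesgue) base measure on $\mathbb{R}$. The normalizer $2/\varepsilon$ does not depend on $D$, so here only the numerator ratio matters. That ratio gives $e^{\varepsilon\Delta}$ rather than the general $e^{2\varepsilon\Delta}$ from the privacy claim.

```latex
\int_{-\infty}^{\infty} e^{-\varepsilon|y|}\,dy \;=\; 2\int_{0}^{\infty} e^{-\varepsilon y}\,dy \;=\; \frac{2}{\varepsilon}
\qquad\Longrightarrow\qquad
\Pr[\text{noise} = y] \;=\; \frac{\varepsilon}{2}\, e^{-\varepsilon|y|} \;=\; \frac{1}{2b}\, e^{-|y|/b}\Big|_{b = 1/\varepsilon}

\big|\,q'(D,r) - q'(D',r)\,\big| \;=\; \big|\,|q(D)-r| - |q(D')-r|\,\big| \;\le\; |q(D) - q(D')| \;\le\; GS_q
\qquad\Longrightarrow\qquad \Delta \le GS_q
```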

10 Any Differentially Private Mechanism is an Instance of the Exponential Mechanism
– Let $M$ be a differentially private mechanism
– Take $q'(D,r)$ to be $\log \Pr[M(D) = r]$
– Remaining issue: accuracy
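
The reduction can be written out in a few lines. This sketch makes three assumptions: $R$ is discrete, $\mu$ is the counting measure, and $\Pr[M(D) = r] > 0$ for every output considered. Let $\varepsilon_M$ denote the privacy parameter of $M$. The exponential mechanism is run with multiplier 1 in the exponent. The output distribution is exactly that of $M$, and $\Delta$ is at most $\varepsilon_M$. The construction says nothing about how good the outputs are, which is the remaining issue of accuracy.

```latex
\Pr[\mathrm{EM}(D) = r]
= \frac{e^{q'(D,r)}}{\sum_{r'} e^{q'(D,r')}}
= \frac{\Pr[M(D)=r]}{\sum_{r'} \Pr[M(D)=r']}
= \Pr[M(D)=r]

\Delta = \max_{D,D',r} \big|\log \Pr[M(D)=r] - \log \Pr[M(D')=r]\big| \le \varepsilon_M
```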

11 Private Ranking
– Each element $i \in \{1, \ldots, n\}$ has a real-valued score $S_D(i)$ based on a data set $D$
– Goal: output the $k$ elements with the highest scores
– Privacy: data set $D$ consists of $n$ entries in domain $\mathcal{D}$
  – Differential privacy: protects the privacy of the entries in $D$
– Condition: insensitive scores
  – For any element $i$ and any data sets $D, D'$ that differ in one entry: $|S_D(i) - S_{D'}(i)| \le 1$

12 Approximate ranking
– Let $S_k$ be the $k$-th highest score based on data set $D$
– An output list is $\alpha$-useful if:
  – Soundness: no element in the output has score less than $S_k - \alpha$
  – Completeness: every element with score greater than $S_k + \alpha$ is in the output
[Figure: the score axis divided into three regions: Score $\le S_k - \alpha$; $S_k - \alpha \le$ Score $\le S_k + \alpha$; $S_k + \alpha \le$ Score]
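
The two conditions translate directly into a checker. The sketch below is illustrative and assumes scores are given as a dictionary from element to score. The function name is hypothetical.

```python
from typing import Dict, Hashable, Iterable


def is_alpha_useful(scores: Dict[Hashable, float], output: Iterable[Hashable], k: int, alpha: float) -> bool:
    """Check the soundness and completeness conditions for an output list."""
    s_k = sorted(scores.values(), reverse=True)[k - 1]  # the k-th highest score S_k
    out = set(output)
    # Soundness: no output element scores below S_k - alpha.
    sound = all(scores[i] >= s_k - alpha for i in out)
    # Completeness: every element scoring above S_k + alpha is output.
    complete = all(i in out for i, s in scores.items() if s > s_k + alpha)
    return sound and complete


scores = {"a": 10, "b": 9, "c": 5, "d": 4}  # k = 2, so S_k = 9
print(is_alpha_useful(scores, ["a", "b"], k=2, alpha=0))  # True: the exact top 2
print(is_alpha_useful(scores, ["a", "c"], k=2, alpha=1))  # False: c = 5 < 9 - 1
```

Elements whose scores fall within $\alpha$ of $S_k$ may appear or not. This slack is what gives a private algorithm room to be both private and useful.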

13 Two Approaches
– Score perturbation
  – Perturb the scores of the elements with noise
  – Pick the top $k$ elements in terms of noisy scores
  – Fast and simple implementation
  – Question: what sort of noise should be added? What sort of guarantees?
– Exponential sampling
  – Run the exponential mechanism $k$ times
  – More complicated and slower implementation
  – What sort of guarantees?
– Note: each input affects all scores
– Homework
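
Both approaches can be sketched mechanically. The slide leaves the choice of noise and the resulting guarantees as homework. The sketch therefore uses Laplace noise with a free parameter `noise_scale` and a free per-round parameter `eps_round`, both hypothetical. It makes no claim about which settings give differential privacy. The exponential-sampling version removes each chosen element before the next round.

```python
import math
import random
from typing import Dict, Hashable, List


def laplace(scale: float, rng: random.Random) -> float:
    """Lap(scale) as a difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def top_k_score_perturbation(
    scores: Dict[Hashable, float], k: int, noise_scale: float, rng: random.Random
) -> List[Hashable]:
    """Add independent Laplace noise to every score, then keep the k largest noisy scores."""
    noisy = {i: s + laplace(noise_scale, rng) for i, s in scores.items()}
    return sorted(noisy, key=noisy.get, reverse=True)[:k]


def top_k_exponential_sampling(
    scores: Dict[Hashable, float], k: int, eps_round: float, rng: random.Random
) -> List[Hashable]:
    """Run the exponential mechanism k times, removing each chosen element from the pool."""
    remaining = dict(scores)
    chosen = []
    for _ in range(k):
        items = list(remaining)
        top = max(remaining.values())  # shift exponents for numerical stability
        weights = [math.exp(eps_round * (remaining[i] - top)) for i in items]
        pick = rng.choices(items, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


scores = {"a": 10.0, "b": 9.0, "c": 5.0, "d": 4.0}
rng = random.Random(2)
print(top_k_score_perturbation(scores, 2, noise_scale=1.0, rng=rng))
print(top_k_exponential_sampling(scores, 2, eps_round=0.5, rng=rng))
```

Because one input can shift every score by up to 1, calibrating `noise_scale` and `eps_round` is the nontrivial part. That calibration is deliberately left open here.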

14 Exponential Mechanism: Simple Example (almost free) private lunch
– Database of $n$ individuals, lunch options $\{1, \ldots, k\}$; each individual likes or dislikes each option (1 or 0)
– Goal: output a lunch option that many like
– For each lunch option $j \in [k]$, $\ell(j)$ is the number of individuals who like $j$
– Exponential mechanism: output $j$ with probability proportional to $e^{\varepsilon \ell(j)}$
– Actual probability: $e^{\varepsilon \ell(j)} / \sum_i e^{\varepsilon \ell(i)}$, where the denominator is the normalizer
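
A standard calculation, not taken from the slides, shows why this output is useful. Let $OPT = \max_j \ell(j)$. Every option with $\ell(j) \le OPT - c$ has weight at most $e^{\varepsilon(OPT - c)}$, and the normalizer is at least $e^{\varepsilon \cdot OPT}$. So $\Pr[\ell(\text{output}) \le OPT - c] \le k e^{-\varepsilon c}$, and choosing $c = (\ln k + t)/\varepsilon$ makes this at most $e^{-t}$. The sketch below computes the exact probabilities for hypothetical like-counts and compares them with this bound.

```python
import math
from typing import List, Sequence


def lunch_probs(likes: Sequence[int], epsilon: float) -> List[float]:
    """Pr[option j] = e^{eps*l(j)} / sum_i e^{eps*l(i)}, where likes[j] = l(j)."""
    top = max(likes)
    w = [math.exp(epsilon * (lj - top)) for lj in likes]  # shift by the max for stability
    z = sum(w)
    return [x / z for x in w]


def bad_mass(likes: Sequence[int], epsilon: float, c: float) -> float:
    """Probability that the chosen option is liked by at most OPT - c individuals."""
    opt = max(likes)
    return sum(p for p, lj in zip(lunch_probs(likes, epsilon), likes) if lj <= opt - c)


likes = [40, 38, 10, 5, 0]  # hypothetical counts l(j) for k = 5 lunch options
eps, k = 0.5, len(likes)
for c in (2, 5, 10):
    print(c, bad_mass(likes, eps, c), "<=", k * math.exp(-eps * c))
```

Each individual changes each $\ell(j)$ by at most 1, so $\Delta = 1$. By the privacy claim, this sampler is therefore $2\varepsilon$-differentially private.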

15 Synthetic DB: Output is a DB
[Figure: queries 1, 2, ... are posed to a sanitizer holding the database, which returns answers 1, 2, 3, ...]
– Synthetic DB: the output is also a DB (of entries from the same universe $X$); the user reconstructs answers by evaluating each query on the output DB
– Compatible with existing software and people
– Consistent answers

16 Answering More Queries Using the Exponential Mechanism
– Differential privacy for every set $C$ of counting queries
– Error is $\tilde{O}(n^{2/3} \log|C|)$
– Remarkable: hope for rich private analysis of small DBs!
  – Quantitative: #queries ≫ DB size
  – Qualitative: the output of the sanitizer is a synthetic DB, i.e., a DB itself

17 Counting Queries
– Queries with low sensitivity
– Counting queries: $C$ is a set of predicates $c: U \to \{0,1\}$
– Query: how many participants of $D$ satisfy $c$?
– Relaxed accuracy: answer each query within $\alpha$ additive error w.h.p.
– Not so bad: error is anyway inherent in statistical analysis
– Assume all queries are given in advance (non-interactive)
[Figure: database $D$ of size $n$ drawn from universe $U$, with a query $c$]

18 Utility and Privacy Can’t Always Be Achieved Simultaneously
– Impossibility results for counting queries: a DB with $n$ participants can’t have $o(\sqrt{n})$ error on $O(n)$ queries [DiNi, DwMcTa07, DwYe08]
– In all these cases there is a strong privacy violation: almost the entire DB is compromised
– What can we do?

19 Huge DBs [Dwork Nissim]
– DB of size $n \gg$ #queries $|C|$: add independent noise to the answer on every query
– Noise per query ~ #queries
– For accuracy, need #queries ≤ $n$
– May be reasonable for huge internet-scale DBs: privacy “for free”
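
Here is a minimal sketch of the per-query noise approach. It assumes the Laplace mechanism is used. The $|C|$ counting answers together have $L_1$ sensitivity at most $|C|$, so adding $Lap(|C|/\varepsilon)$ to each answer gives $\varepsilon$-differential privacy. The noise per query is therefore about $|C|/\varepsilon$, and the answers are accurate only when $|C| \ll n$. The function name and the sample queries are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    """Lap(scale) as a difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def answer_counting_queries(
    db: Sequence, queries: Sequence[Callable[[object], bool]], epsilon: float, rng: random.Random
) -> List[float]:
    """Answer every counting query with independent Lap(|C|/epsilon) noise."""
    scale = len(queries) / epsilon  # noise per query grows linearly with #queries
    return [sum(1 for x in db if c(x)) + laplace(scale, rng) for c in queries]


db = list(range(100_000))  # stand-in for a large DB
queries = [lambda x: x % 2 == 0, lambda x: x < 1000, lambda x: x % 7 == 3]
print(answer_counting_queries(db, queries, epsilon=1.0, rng=random.Random(3)))
```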

20 What about smaller DBs?
– DB of size $n <$ #queries $|C|$: by the impossibility results, can’t have $o(\sqrt{n})$ error
– Error must be $\Omega(\sqrt{n})$

21 The BLR Algorithm [Blum Ligett Roth 08]
– For DBs $F$ and $D$: $dist(F,D) = \max_{q \in C} |q(F) - q(D)|$
– Intuition: far-away DBs get smaller probability
– Algorithm on input DB $D$: sample from a distribution on DBs of size $m$ ($m < n$), where DB $F$ gets picked with probability $\propto e^{-\varepsilon \cdot dist(F,D)}$
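
A brute-force sketch for a tiny universe is shown below. It makes two assumptions that the slides do not spell out. First, a query's count on $F$ is rescaled by $n/m$ so it is comparable to the count on $D$. Second, candidate DBs are size-$m$ multisets over $U$ with a uniform base measure. For a fixed $F$, changing one entry of $D$ moves each count by at most 1, so $dist(F,\cdot)$ also changes by at most 1. The function names are hypothetical.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence, Tuple


def blr_dist(F: Sequence, D: Sequence, queries: Sequence[Callable]) -> float:
    """max over queries of |(n/m) * count_F - count_D|: F's counts are rescaled to size n."""
    n, m = len(D), len(F)
    return max(
        abs(n / m * sum(1 for x in F if c(x)) - sum(1 for x in D if c(x)))
        for c in queries
    )


def blr_sample(D, U, queries, m, epsilon, rng):
    """Pick a size-m multiset F over U with probability proportional to e^{-eps * dist(F, D)}."""
    candidates: List[Tuple] = list(itertools.combinations_with_replacement(U, m))
    dists = [blr_dist(F, D, queries) for F in candidates]
    best = min(dists)  # shift exponents for numerical stability
    weights = [math.exp(-epsilon * (d - best)) for d in dists]
    return list(rng.choices(candidates, weights=weights, k=1)[0])


U = [0, 1, 2, 3]
D = [0, 0, 1, 1, 2, 2, 3, 3]
queries = [lambda x: x < 2, lambda x: x % 2 == 0, lambda x: x == 3]
print(blr_sample(D, U, queries, m=4, epsilon=1.0, rng=random.Random(5)))
```

This sketch enumerates every candidate, so its cost grows exponentially in $m$. That is the super-polynomial running time discussed for BLR.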

22 The BLR Algorithm
– Idea: in general, do not use a large DB; sample and answer accordingly
– There is a DB of size $m$ that answers each query with sufficient accuracy

23 The BLR Algorithm: 2ε-Privacy
– For adjacent $D, D'$ and every $F$: $|dist(F,D) - dist(F,D')| \le 1$
– Probability of $F$ under $D$: $e^{-\varepsilon \cdot dist(F,D)} / \sum_{G \text{ of size } m} e^{-\varepsilon \cdot dist(G,D)}$
– Probability of $F$ under $D'$: the numerator and the denominator can each change by at most an $e^{\varepsilon}$ factor $\Rightarrow$ $2\varepsilon$-privacy

24 The BLR Algorithm: Error $\tilde{O}(n^{2/3} \log|C|)$
– There exists $F_{good}$ of size $m = \tilde{O}((n/\alpha)^2 \cdot \log|C|)$ such that $dist(F_{good}, D) \le \alpha$
– $\Pr[F_{good}] \sim e^{-\varepsilon\alpha}$
– For any $F_{bad}$ with $dist \ge 2\alpha$: $\Pr[F_{bad}] \sim e^{-2\varepsilon\alpha}$
– Union bound: $\sum_{\text{bad DB } F_{bad}} \Pr[F_{bad}] \sim |U|^m e^{-2\varepsilon\alpha}$
– For $\alpha = \tilde{O}(n^{2/3} \log|C|)$: $\Pr[F_{good}] \gg \sum \Pr[F_{bad}]$

25 The BLR Algorithm: Running Time
– Generating the distribution by enumeration: need to enumerate every size-$m$ database, where $m = \tilde{O}((n/\alpha)^2 \cdot \log|C|)$
– Running time $\approx |U|^{\tilde{O}((n/\alpha)^2 \cdot \log|C|)}$

26 Conclusion
– Offline algorithm, $2\varepsilon$-differential privacy for any set $C$ of counting queries
– Error $\alpha$ is $\tilde{O}(n^{2/3} \log|C| / \varepsilon)$
– Super-poly running time: $|U|^{\tilde{O}((n/\alpha)^2 \cdot \log|C|)}$

27 Can we Efficiently Sanitize?
– The good news: if the universe is small, we can sanitize EFFICIENTLY, in time poly($|C|$, $|U|$)
– The bad news: we cannot do much better, namely sanitize in time sub-poly($|C|$) AND sub-poly($|U|$)

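28 How Efficiently Can We Sanitize?
[Figure: a grid of running times, sub-poly vs. poly in $|C|$ against sub-poly vs. poly in $|U|$; the poly($|C|$, $|U|$) cell is marked “Good news!” and the remaining cells are marked with question marks]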

29 The Good News: Can Sanitize When Universe is Small
– Efficient sanitizer for query set $C$
– DB size $n \ge \tilde{O}(|C|^{o(1)} \log|U|)$
– Error is ~ $n^{2/3}$
– Runtime poly($|C|$, $|U|$)
– Output is a synthetic database
– Compare to [Blum Ligett Roth]: $n \ge \tilde{O}(\log|C| \log|U|)$, runtime super-poly($|C|$, $|U|$)

30 Recursive Algorithm
[Figure: nested query sets $C_0 = C \supseteq C_1 \supseteq C_2 \supseteq \cdots \supseteq C_b$]
– Start with DB $D$ and large query set $C$
– Repeatedly choose a random subset $C_{i+1}$ of $C_i$: shrink the query set by a (small) factor

31 Recursive Algorithm
– Start with DB $D$ and large query set $C$
– Repeatedly choose a random subset $C_{i+1}$ of $C_i$: shrink the query set by a (small) factor
– End recursion: sanitize $D$ w.r.t. the small query set $C_b$
– Output is good for all queries in the small set $C_{i+1}$
– Extract utility on almost all queries in the large set $C_i$
– Fix the remaining “underprivileged” queries in the large set $C_i$

