
1 Boosting and Differential Privacy
Cynthia Dwork, Microsoft Research

2 The Power of Small, Private Miracles
Joint work with Guy Rothblum and Salil Vadhan

3 Boosting [Schapire, 1989]
- A general method for improving the accuracy of any given learning algorithm
- Example: learning to recognize spam e-mail
- The "base learner" receives labeled examples and outputs a heuristic
- Labels are {+1, -1}
- Run the base learner many times; combine the resulting heuristics

4-5 [Diagram: the boosting loop. The base learner receives S, labeled examples drawn from the current distribution D, and outputs hypotheses A_1, A_2, …, each of which does well on a 1/2 + η fraction of D. Each round, D is updated (how?), a termination test is run, and on termination the A_1, A_2, … are combined into the final hypothesis A.]

6 Boosting for People [Variant of AdaBoost, FS95]
- Initial distribution D is uniform on the database rows
- S is always a set of k elements drawn according to D^k
- Combiner is majority
- Weight update (a code sketch follows below):
  - If a row is correctly classified by the current A, decrease its weight by a factor of e ("subtract 1 from the exponent")
  - If incorrectly classified by the current A, increase its weight by a factor of e ("add 1 to the exponent")
  - Re-normalize to obtain the updated D
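
A minimal sketch of this round update, assuming NumPy, ±1 predictions from the current hypothesis, and true ±1 labels (the function and variable names are illustrative, not from the talk):

```python
import numpy as np

def boost_round_update(D, preds, labels):
    """One 'boosting for people' weight update: rows the current
    hypothesis got right have their weight multiplied by e^-1
    ("subtract 1 from the exponent"), rows it got wrong by e^+1
    ("add 1 to the exponent"); then re-normalize to a distribution."""
    c = np.where(preds == labels, 1.0, -1.0)   # c_t(i) = +1 if correct, -1 if not
    W = D * np.exp(-c)                         # multiply each weight by exp(-c_t(i))
    return W / W.sum()                         # divide by the normalizer N_t
```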

7 Why Does it Work?
Update rule: multiply the weight of row i by exp(-c_t(i)), where the sign of c_t(i) records whether A_t(i) is correct, then re-normalize by N_t:
$$\begin{aligned}
D_{t+1}(i) &= \frac{D_t(i)\,\exp(-c_t(i))}{N_t} \\
N_t\,D_{t+1}(i) &= D_t(i)\,\exp(-c_t(i)) \\
N_t N_{t-1}\cdots N_1\,D_{t+1}(i) &= D_1(i)\,\exp\Big(-\sum_s c_s(i)\Big) \\
\Big(\prod_s N_s\Big)\,D_{t+1}(i) &= \frac{1}{m}\,\exp\Big(-\sum_s c_s(i)\Big) \\
\sum_i \Big(\prod_s N_s\Big)\,D_{t+1}(i) &= \frac{1}{m}\sum_i \exp\Big(-\sum_s c_s(i)\Big) \\
\prod_s N_s &= \frac{1}{m}\sum_i \exp\Big(-\sum_s c_s(i)\Big)
\end{aligned}$$
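
A quick numeric check of the unrolled identity above (illustrative; random ±1 values stand in for the correctness indicators c_t(i)):

```python
import numpy as np

rng = np.random.default_rng(1)
m, T = 50, 8
D = np.full(m, 1.0 / m)                   # D_1 is uniform on m rows
cs, Ns = [], []
for _ in range(T):
    c = rng.choice([1.0, -1.0], size=m)   # c_t(i): +1 if A_t(i) correct, -1 if not
    W = D * np.exp(-c)
    Ns.append(W.sum())                    # normalizer N_t
    D = W / Ns[-1]
    cs.append(c)

# D_{T+1}(i) = (1/m) exp(-sum_s c_s(i)) / prod_s N_s
unrolled = np.exp(-np.sum(cs, axis=0)) / (m * np.prod(Ns))
assert np.allclose(D, unrolled)
```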


11 $\prod_s N_s = \frac{1}{m}\sum_i \exp\big(-\sum_s c_s(i)\big)$
- $\prod_s N_s$ is shrinking exponentially (the rate depends on η)
  - The normalizers are sums of weights, and at the start of each round the weights sum to 1
  - "More" weight decreases (because the base learner is good) than increases
  - That is, more weight has its exponent shrink than otherwise
- $\sum_i \exp(-\sum_s c_s(i)) = \sum_i \exp(-y_i \sum_s A_s(i))$
- This is an upper bound on the number of incorrectly classified examples:
  - If $y_i \ne \mathrm{sign}[\sum_s A_s(i)]$ (the majority of $A_1(i), A_2(i), \ldots$), then $y_i \sum_s A_s(i) < 0$, so $\exp(-y_i \sum_s A_s(i)) \ge 1$
- Therefore the number of incorrectly classified examples is exponentially small in t

12 [Diagram: the boosting-for-people loop, annotated. Initially D is uniform on the DB rows; each A_t does well on a 1/2 + η fraction of D; weights move by -1/+1 in the exponent and are re-normalized; the combiner is majority. Privacy? Terminate?]

13 Private Boosting for People
- The base learner must be differentially private
- The main concern is rows whose weight grows too large
  - This affects the termination test, sampling, and re-normalizing
- Similar to a problem arising when learning in the presence of noise
- Similar solution: smooth boosting (a code sketch follows below)
  - Remove (give up on) elements that become too heavy
  - Carefully! Removing one heavy element and re-normalizing may cause another element to become heavy…
  - Ensure this is rare (else we give up on too many elements and hurt accuracy)
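
A minimal sketch of the "give up on heavy elements" step, under the simplifying assumption that "too heavy" means exceeding a fixed probability cap (the talk defers the precise rule to the iterative-smoothing slide it skips):

```python
import numpy as np

def drop_heavy_rows(D, cap):
    """Repeatedly remove rows whose normalized weight exceeds `cap` and
    re-normalize; the loop is needed because re-normalizing can push
    previously light rows over the cap."""
    D = D / D.sum()
    alive = np.ones(D.size, dtype=bool)
    while True:
        heavy = alive & (D > cap)
        if not heavy.any():
            return D, alive                # D is 0 on the removed rows
        alive &= ~heavy                    # give up on the heavy rows
        if not alive.any():
            raise ValueError("cap too small: every row was removed")
        D = np.where(alive, D, 0.0)
        D = D / D.sum()                    # re-normalize the survivors
```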

14 Iterative Smoothing
- Not today.

15 Boosting for Queries?
- Goal: given a database DB and a set Q of low-sensitivity queries, produce an object O (e.g., a synthetic database) such that for all q ∈ Q we can extract from O an approximation of q(DB)
- Assume the existence of an (ε₀, δ₀)-dp base learner producing an object O that does well on more than half of D:
$$\Pr_{q \sim D}\big[\,|q(O) - q(DB)| \le \lambda\,\big] \ge 1/2 + \eta$$

16 [Diagram: the same boosting loop, now over queries. Initially D is uniform on Q; the base learner receives S, queries sampled from D, and outputs objects A_1, A_2, …, each doing well on a 1/2 + η fraction of D; D is updated and the A_t are combined.]

17 [Diagram: boosting for queries, annotated. Initially D is uniform on Q; weights move by -1/+1 in the exponent and are re-normalized; the combiner is the median. Privacy? An individual can affect many queries at once! Terminate?]

18 Privacy is Problematic
- In smooth boosting for people, at each round an individual has only a small effect on the probability distribution
- In boosting for queries, an individual can affect the quality of q(A_t) simultaneously for many q
- As time progresses, the distributions on neighboring databases could evolve completely differently, yielding very different A_t's
- Slightly ameliorated by sampling (with only a few samples, maybe we can avoid the q's on the edge?)
- How can we make the re-weighting less sensitive?

19 Private Boosting for Queries [Variant of AdaBoost]
- Initial distribution D is uniform on the queries in Q
- S is always a set of k elements of Q, drawn according to D^k
- Combiner is median [viz. Freund92]
- Weight update for queries (a code sketch follows below):
  - If q is very well approximated by A_t (error at most λ), decrease its weight by a factor of e ("-1" in the exponent)
  - If very poorly approximated (error at least λ + μ), increase its weight by a factor of e ("+1")
  - In between, let the exponent scale with the distance from the midpoint (down or up):
$$2\,\big(|q(DB) - q(A_t)| - (\lambda + \mu/2)\big)/\mu \qquad \text{(sensitivity } 2\rho/\mu\text{)}$$
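
A minimal sketch of this sensitivity-reduced update, assuming `errs` holds |q(DB) - q(A_t)| for each query (names are illustrative, not from the talk):

```python
import numpy as np

def query_update_exponent(errs, lam, mu):
    """Exponent added to each query's log-weight: -1 when the error is
    at most lam (very well approximated), +1 when it is at least
    lam + mu (very poorly approximated), linear in between. One row of
    DB moves each error by at most rho, hence the exponent by 2*rho/mu."""
    raw = 2.0 * (errs - (lam + mu / 2.0)) / mu
    return np.clip(raw, -1.0, 1.0)

def update_query_weights(D, errs, lam, mu):
    """One round of private boosting for queries: weight *= e^exponent."""
    W = D * np.exp(query_update_exponent(errs, lam, mu))
    return W / W.sum()
```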

20 Theorem (minus some parameters)
- Let all q ∈ Q have sensitivity ≤ ρ
- Run the query-boost algorithm for T = log|Q|/η² rounds with μ = ((log|Q|/η²)² ρ √k)/ε
- The resulting object O is ((ε + Tε₀), Tδ₀)-dp and, whp, gives (λ + μ)-accurate answers to all the queries in Q
- Better privacy (smaller ε) gives worse utility (larger μ)
- A better base learner (smaller k, larger η) helps

21 Proving Privacy
- Technique #1: Pay Your Debt and Move On
  - Fix A_1, A_2, …, A_t and record the D vs. D′ confidence gain: "pay your debt"
  - Focus on the gain in the selection of S ∈ Q^k in round t+1: "move on"
  - Based on the distributions D_{t+1} and D′_{t+1} determined in round t; call them D, D′
- Technique #2: Evolution of Confidence [DiDwN03]
  - "Delay payment until the final reckoning"
  - Choose q_1, q_2, …, in turn
  - For each q ∈ Q, bound |ln(D[q]/D′[q])| ≤ A and the expectation |E_{q∼D} ln(D[q]/D′[q])| ≤ B
  - Then
$$\Pr_{q_1,\ldots,q_k}\Big[\Big|\sum_i \ln\big(D[q_i]/D'[q_i]\big)\Big| > z\sqrt{k}\,(A + B) + kB\Big] < \exp(-z^2/2)$$

22 Bounding $E_{q\sim D} \ln(D[q]/D'[q])$
Assume D and D′ are A-dp with respect to one another, for A < 1. Then $0 \le E_{q\sim D}\ln[D(q)/D'(q)] \le 2A^2$ (that is, B ≤ 2A²).
Note $E_{q\sim D}\ln[D(q)/D'(q)] = \mathrm{KL}(D\|D') = \sum_q D(q)\ln[D(q)/D'(q)]$, which is always ≥ 0. So
$$\begin{aligned}
\mathrm{KL}(D\|D') &\le \mathrm{KL}(D\|D') + \mathrm{KL}(D'\|D) \\
&= \sum_q \Big[D(q)\big(\ln[D(q)/D'(q)] + \ln[D'(q)/D(q)]\big) + \big(D'(q)-D(q)\big)\ln[D'(q)/D(q)]\Big] \\
&\le \sum_q \big(0 + |D'(q)-D(q)|\,A\big) \\
&= A \sum_q \big[\max(D(q),D'(q)) - \min(D(q),D'(q))\big] \\
&\le A \sum_q \big(e^A - 1\big)\min(D(q),D'(q)) \\
&\le 2A^2 \quad \text{when } A < 1
\end{aligned}$$
(using max(D(q),D′(q)) ≤ e^A min(D(q),D′(q)), and e^A - 1 ≤ 2A for A < 1). Compare DiDwN03.
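
A quick numeric sanity check of this bound (illustrative): build a random pair of nearby distributions, measure their realized max divergence A, and verify 0 ≤ KL(D‖D′) ≤ 2A²:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.dirichlet(np.ones(100))                        # random distribution
Dp = D * np.exp(rng.uniform(-0.3, 0.3, size=D.size))   # multiplicative noise
Dp /= Dp.sum()                                         # re-normalize

A = np.abs(np.log(D / Dp)).max()                       # realized max divergence
kl = float(np.sum(D * np.log(D / Dp)))                 # KL(D || D')
assert A < 1 and 0.0 <= kl <= 2 * A * A
print(f"A = {A:.3f}   KL = {kl:.5f}   2A^2 = {2*A*A:.5f}")
```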

23 Motivation and Application
- Boosting for People
  - Logistic regression for 3000+ dimensional data
  - A slight twist on CM did pretty well (ε = 1.5)
  - Thought about alternatives
- Boosting for Queries
  - Reducing the dependence on the concept class in the work on synthetic databases in DNRRV09 (Salil's talk)
  - We had over-interpreted the polytime DiNi-style attacks (we were spoiled)
    - Can't have cn queries with error o(√n)
    - BLR08: can have cn queries with error O(n^{2/3})
    - DNRRV09: O(n^{1/2} |Q|^{o(1)})
    - Now: O(n^{1/2} log² |Q|)
  - The result is more general
  - We only know of a base learner for counting queries

24 [Diagram: the boosting loop from slides 4-5, repeated in closing.]

