Boosting and Differential Privacy
Cynthia Dwork, Microsoft Research

The Power of Small, Private, Miracles
Joint work with Guy Rothblum and Salil Vadhan

Boosting [Schapire, 1989]
 General method for improving the accuracy of any given learning algorithm
 Example: learning to recognize spam
 The "base learner" receives labeled examples and outputs a heuristic
 Labels are {+1, −1}
 Run it many times; combine the resulting heuristics

[Diagram: the boosting loop. The base learner receives S, labeled examples drawn from D, and outputs heuristics A_1, A_2, …, each doing well on ½ + η of D; after each round D is updated and a termination test is applied; the final output A combines A_1, A_2, ….]

[Diagram repeated, with the added question: how should D be updated?]

Boosting for People [variant of AdaBoost, FS95]
 Initial distribution D is uniform on the database rows
 S is always a set of k elements drawn according to D (a sample from D^k)
 Combiner is majority
 Weight update:
 If a row is correctly classified by the current A, decrease its weight by a factor of e ("subtract 1 from the exponent")
 If incorrectly classified by the current A, increase its weight by a factor of e ("add 1 to the exponent")
 Re-normalize to obtain the updated D
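A minimal sketch of one round of this weight update, in Python; the row representation and the way A_t is obtained are placeholders, not part of the algorithm described in the talk.

import numpy as np

def boost_round(weights, predictions, labels):
    """One AdaBoost-style round for people: ±1 exponent update, then renormalize.

    weights: current distribution D_t over the m database rows
    predictions: the current heuristic A_t evaluated on each row (+1/-1)
    labels: true labels y_i (+1/-1)
    """
    # c_t(i) = +1 if A_t classifies row i correctly, -1 otherwise
    c = np.where(predictions == labels, 1.0, -1.0)
    # multiply each weight by exp(-c_t(i)): decrease if correct, increase if wrong
    new_weights = weights * np.exp(-c)
    # re-normalize so the weights again form a distribution D_{t+1}
    return new_weights / new_weights.sum()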

Why Does It Work?
Update rule: multiply the weight of row i by exp(−c_t(i)), where c_t(i) = +1 if A_t classifies i correctly and −1 otherwise. With N_t the normalizer in round t and m the number of rows:

D_{t+1}(i) = D_t(i)·exp(−c_t(i)) / N_t
N_t·D_{t+1}(i) = D_t(i)·exp(−c_t(i))
N_t·N_{t−1}⋯N_1·D_{t+1}(i) = D_1(i)·exp(−Σ_s c_s(i))
(∏_s N_s)·D_{t+1}(i) = (1/m)·exp(−Σ_s c_s(i))
Σ_i (∏_s N_s)·D_{t+1}(i) = (1/m)·Σ_i exp(−Σ_s c_s(i))
∏_s N_s = (1/m)·Σ_i exp(−Σ_s c_s(i))

(The last step uses Σ_i D_{t+1}(i) = 1.)

∏_s N_s = (1/m)·Σ_i exp(−Σ_s c_s(i))
 ∏_s N_s shrinks exponentially (at a rate depending on η):
 The normalizers are sums of weights, and at the start of each round the weights sum to 1
 Because the base learner does well on more than half of D, more weight decreases (has its exponent shrink) than increases
 Σ_i exp(−Σ_s c_s(i)) = Σ_i exp(−y_i·Σ_s A_s(i))
 This is an upper bound on the number of incorrectly classified examples:
 If y_i ≠ sign[Σ_s A_s(i)] (= majority{A_1(i), A_2(i), …}), then y_i·Σ_s A_s(i) < 0, so exp(−y_i·Σ_s A_s(i)) ≥ 1
 Therefore the number of incorrectly classified examples is exponentially small in t
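A small numerical illustration of this bound, not from the talk: run the ±1-exponent update with a toy "base learner" and check that the training error of the majority vote is at most ∏_s N_s. The 60%-correct toy hypotheses below are only there to exercise the arithmetic; they are not a real weak learner.

import numpy as np

rng = np.random.default_rng(1)
m, T = 200, 40
y = rng.choice([-1, 1], size=m)            # true labels
D = np.full(m, 1.0 / m)                    # initial distribution, uniform
votes = np.zeros(m)                        # running sum of A_s(i)
prod_N = 1.0                               # product of the normalizers

for _ in range(T):
    # Toy hypothesis: agrees with y on a random ~60% of the rows.
    correct = rng.random(m) < 0.6
    A = np.where(correct, y, -y)
    votes += A
    c = y * A                              # +1 if correct, -1 otherwise
    w = D * np.exp(-c)
    N = w.sum()                            # normalizer for this round
    prod_N *= N
    D = w / N

train_err = np.mean(np.sign(votes) != y)   # error of the majority vote
assert train_err <= prod_N + 1e-12         # the bound from the slide
print(train_err, prod_N)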

[Diagram: the boosting loop for people — D is initially uniform on the database rows, weights are updated by −1/+1 in the exponent and re-normalized, the combiner is majority, each A_t does well on ½ + η of D, and the annotation asks: Privacy?]

Private Boosting for People
 The base learner must be differentially private
 The main concern is rows whose weight grows too large
 This affects the termination test, sampling, and re-normalizing
 Similar to a problem arising when learning in the presence of noise
 Similar solution: smooth boosting
 Remove (give up on) elements that become too heavy
 Carefully! Removing one heavy element and re-normalizing may cause another element to become heavy…
 Ensure this is rare (else we give up on too many elements and hurt accuracy)
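A minimal sketch of the "give up on heavy elements" step, assuming a cap of cap_factor times the uniform weight; the cap value and the iterate-until-stable loop are illustrative choices, not the exact procedure from the talk.

import numpy as np

def cap_heavy_elements(weights, cap_factor=4.0, max_passes=100):
    """Zero out rows whose weight exceeds cap_factor/m, renormalize, repeat.

    Renormalizing can push new rows over the cap, so iterate until stable.
    Returns the smoothed distribution and a boolean mask of surviving rows.
    """
    m = len(weights)
    alive = np.ones(m, dtype=bool)
    w = weights / weights.sum()
    for _ in range(max_passes):
        too_heavy = alive & (w > cap_factor / m)
        if not too_heavy.any():
            break
        alive &= ~too_heavy          # give up on the heavy rows
        w = np.where(alive, w, 0.0)  # remove their weight
        w = w / w.sum()              # renormalize over the survivors
    return w, alive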

Iterative Smoothing  Not today.

Boosting for Queries?
 Goal: given a database DB and a set Q of low-sensitivity queries, produce an object O (e.g., a synthetic database) such that for every q ∈ Q we can extract from O an approximation of q(DB)
 Assume the existence of an (ε₀, δ₀)-dp base learner producing an object O that does well on more than half of D:
Pr_{q∼D} [ |q(O) − q(DB)| ≤ λ ] ≥ ½ + η

[Diagram: the boosting loop for queries — D is now a distribution over Q, initially uniform; the base learner receives S, queries sampled from D, and outputs A_1, A_2, …, each doing well on ½ + η of D.]

[Diagram: boosting for queries — D initially uniform on Q, −1/+1 updates in the exponent with re-normalization, combiner is the median, with the annotation: Privacy? An individual can affect many queries at once!]

Privacy is Problematic
 In smooth boosting for people, at each round an individual has only a small effect on the probability distribution
 In boosting for queries, an individual can affect the quality of q(A_t) simultaneously for many q
 As time progresses, the distributions on neighboring databases could evolve completely differently, yielding very different A_t's
 Slightly ameliorated by sampling (if only a few samples are drawn, maybe the q's on the edge can be avoided?)
 How can we make the re-weighting less sensitive?

Private Boosting for Queries [variant of AdaBoost]
 Initial distribution D is uniform on the queries in Q
 S is always a set of k queries drawn according to D (an element of Q^k)
 Combiner is the median [viz. Freund92]
 Weight update for queries:
 If q is very well approximated by A_t, decrease its weight by a factor of e ("−1")
 If q is very poorly approximated by A_t, increase its weight by a factor of e ("+1")
 In between, scale the exponent with the distance from the midpoint (down or up):
2·( |q(DB) − q(A_t)| − (λ + μ/2) ) / μ      (sensitivity 2ρ/μ)
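A minimal sketch of this query re-weighting in Python; lam (λ) and mu (μ) are the accuracy and slack parameters from the slide, and clipping the exponent to [−1, 1] is how the "very well" / "very poorly" cases are folded in.

import numpy as np

def reweight_queries(weights, errors, lam, mu):
    """One round of the query-boosting weight update.

    errors[i] = |q_i(DB) - q_i(A_t)|, the error of the current object on query i.
    The exponent is -1 if the error is below lam ("very well approximated"),
    +1 if it is above lam + mu ("very poorly approximated"),
    and scales linearly with the distance from the midpoint lam + mu/2 in between.
    """
    exponent = 2.0 * (errors - (lam + mu / 2.0)) / mu
    exponent = np.clip(exponent, -1.0, 1.0)
    new_weights = weights * np.exp(exponent)   # multiply by e^{-1} ... e^{+1}
    return new_weights / new_weights.sum()     # renormalize to a distribution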

Theorem (minus some parameters)
 Let all q ∈ Q have sensitivity ≤ ρ
 Run the query-boosting algorithm for T = log|Q| / η² rounds with μ = ( (log|Q| / η²)²·ρ·√k ) / ε
 The resulting object O is (ε + T·ε₀, T·δ₀)-dp and, whp, gives (λ + μ)-accurate answers to all the queries in Q
 Better privacy (smaller ε) gives worse utility (larger μ)
 A better base learner (smaller k, larger η) helps
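As a quick illustration of how these parameters interact, here is a small Python helper that plugs in the theorem's formulas (constants omitted); the numbers fed to it below are made up for illustration only.

import math

def query_boost_params(num_queries, eta, rho, k, eps, eps0, delta0):
    """Rounds, error slack, and composed privacy cost from the theorem."""
    T = math.log(num_queries) / eta**2        # number of boosting rounds
    mu = (T**2) * rho * math.sqrt(k) / eps    # error slack added on top of lambda
    eps_total = eps + T * eps0                # privacy: (eps + T*eps0, T*delta0)-dp
    delta_total = T * delta0
    return T, mu, eps_total, delta_total

# Illustrative (made-up) settings: 2^20 queries, eta = 0.25, sensitivity rho = 1, k = 1000
print(query_boost_params(2**20, 0.25, 1.0, 1000, 1.0, 0.01, 1e-9))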

Proving Privacy
 Technique #1: Pay Your Debt and Move On
 Fix A_1, A_2, …, A_t and record the D vs. D′ confidence gain ("pay your debt")
 Focus on the gain from the selection of S ∈ Q^k in round t+1 ("move on")
 Based on the distributions D_{t+1} and D′_{t+1} determined in round t; call them D, D′
 Technique #2: Evolution of Confidence [DiDwN03]
 "Delay payment until the final reckoning"
 Choose q_1, q_2, …, in turn
 For each q ∈ Q, bound the worst case |ln( D[q] / D′[q] )| ≤ A and the expectation |E_{q∼D} ln( D[q] / D′[q] )| ≤ B
 Then Pr_{q_1,…,q_k} [ |Σ_i ln( D[q_i] / D′[q_i] )| > z·√k·(A + B) + k·B ] < exp(−z²/2)

Bounding E_{q∼D} ln( D[q] / D′[q] )
Assume D, D′ are A-dp with respect to one another, for A < 1. Then 0 ≤ E_{q∼D} ln[ D(q)/D′(q) ] ≤ 2A² (that is, B ≤ 2A²).

KL(D‖D′) = Σ_q D(q)·ln[ D(q)/D′(q) ]; this is always ≥ 0. So

KL(D‖D′) ≤ KL(D‖D′) + KL(D′‖D)
= Σ_q D(q)·( ln[ D(q)/D′(q) ] + ln[ D′(q)/D(q) ] ) + ( D′(q) − D(q) )·ln[ D′(q)/D(q) ]
≤ Σ_q 0 + |D′(q) − D(q)|·A
= A·Σ_q [ max(D(q), D′(q)) − min(D(q), D′(q)) ]
≤ A·Σ_q [ e^A·min(D(q), D′(q)) − min(D(q), D′(q)) ]
= A·Σ_q (e^A − 1)·min(D(q), D′(q))
≤ 2A² when A < 1

Compare DiDwN03.
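A small numerical sanity check of this bound, not from the talk: draw random pairs of distributions that are A-dp with respect to one another and confirm that KL(D‖D′) ≤ 2A².

import numpy as np

rng = np.random.default_rng(0)
A = 0.5  # the pairs below are A-dp with respect to one another, A < 1

for _ in range(1000):
    d = rng.dirichlet(np.ones(20))
    # Multiply by per-point factors in [e^{-A/2}, e^{A/2}] and renormalize; the
    # normalizer also lies in [e^{-A/2}, e^{A/2}], so the ratio D(q)/D'(q) stays
    # within [e^{-A}, e^{A}], i.e. the pair really is A-dp wrt one another.
    factors = np.exp(rng.uniform(-A / 2, A / 2, size=20))
    d_prime = d * factors
    d_prime /= d_prime.sum()
    kl = float(np.sum(d * np.log(d / d_prime)))   # KL(D || D')
    assert 0.0 <= kl <= 2 * A**2, kl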

Motivation and Application
 Boosting for people
 Logistic regression for dimensional data
 A slight twist on CM did pretty well (ε = 1.5)
 Thought about alternatives
 Boosting for queries
 Reducing the dependence on the concept class in the work on synthetic databases in DNRRV09 (Salil's talk)
 We had over-interpreted the polytime DiNi-style attacks (we were spoiled):
 Can't have cn queries with error o(√n)
 BLR08: can have cn queries with error O(n^{2/3})
 DNRRV09: O(n^{1/2}·|Q|^{o(1)})
 Now: O(n^{1/2}·log²|Q|)
 The result is more general
 We only know of a base learner for counting queries

[Diagram: the generic boosting loop repeated — base learner, updated distribution D, combined output A, each A_t doing well on ½ + η of D, termination test.]