# Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds (Raef Bassily, Adam Smith, Abhradeep Thakurta)

## Presentation transcript

Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds

Raef Bassily (Penn State), Adam Smith (Penn State; work done at BU/Harvard), Abhradeep Thakurta (Yahoo! Labs; work done at MSR/Stanford)

Privacy in Statistical Databases

[Figure: users contribute records $x_1, x_2, \dots, x_n$ to a trusted curator running an algorithm $A$. Government, researchers, businesses (or a malicious adversary) send queries to $A$ and receive answers. External sources of information: the internet, social networks, anonymized datasets.]

Two conflicting goals: utility vs. privacy. Balancing these goals is tricky:

- No control over external sources of information
- Anonymization is unreliable: [Narayanan-Shmatikov'08], [Korolova'11], [Calandrino et al.'12], …

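Differential privacy [Dwork-McSherry-Nissim-Smith'06]

[Figure: an algorithm $A$ with local random coins is run on a dataset $x = (x_1, x_2, \dots, x_n)$ and on a neighboring dataset $x' = (x_1, x_2', \dots, x_n)$.]

Datasets $x$ and $x'$ are called neighbors if they differ in one record. Require: neighboring datasets induce close distributions on outputs.

Def.: A randomized algorithm $A$ is $(\epsilon, \delta)$-differentially private if, for all datasets $x$ and $x'$ that differ in one element, and for all events $S$,

$$\Pr[A(x) \in S] \le e^{\epsilon} \, \Pr[A(x') \in S] + \delta.$$

"Almost same" conclusions will be reached independent of whether any individual opts into or opts out of the dataset.

Think of […]

Two regimes:

- $\epsilon$-differential privacy (the case $\delta = 0$)
- $(\epsilon, \delta)$-differential privacy, $\delta > 0$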
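To make the definition concrete, here is a minimal sketch that checks the ε-differential privacy inequality (the δ = 0 regime) for randomized response on a single bit. Randomized response is not part of the talk; it is an assumed toy mechanism, and the function names are illustrative only.

```python
import math

def randomized_response_probs(bit: int, epsilon: float) -> dict:
    """Output distribution of randomized response on one private bit.

    The true bit is reported with probability e^eps / (1 + e^eps),
    and the flipped bit otherwise.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}

def worst_case_ratio(epsilon: float) -> float:
    """Largest ratio Pr[A(x) = o] / Pr[A(x') = o] over outputs o,
    where x and x' are the two neighboring one-record datasets (0 and 1)."""
    p0 = randomized_response_probs(0, epsilon)
    p1 = randomized_response_probs(1, epsilon)
    return max(max(p0[o] / p1[o], p1[o] / p0[o]) for o in (0, 1))

epsilon = 0.5
ratio = worst_case_ratio(epsilon)
# The ratio never exceeds e^eps, which is exactly the eps-DP requirement.
print(ratio, math.exp(epsilon))
```

The worst-case ratio matches the bound up to floating-point error, so this toy mechanism sits exactly at the edge of what ε-differential privacy allows.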

This work: construct efficient and differentially private algorithms for convex empirical risk minimization with optimal excess risk.

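Convex empirical risk minimization

- Dataset $D = \{d_1, \dots, d_n\}$.
- Convex set $C \subseteq \mathbb{R}^p$.
- Loss function $L(\theta; D) = \sum_{i=1}^{n} \ell(\theta; d_i)$, where $\ell(\cdot\,; d)$ is convex for all $d$.

Goal: find a "parameter" $\theta \in C$ that minimizes the empirical risk. The actual minimizer is $\theta^* = \arg\min_{\theta \in C} L(\theta; D)$.

Output $\hat{\theta} \in C$ such that the excess risk $L(\hat{\theta}; D) - L(\theta^*; D)$ is small.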

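Examples

- Median: $\ell(\theta; d) = |\theta - d|$
- Linear regression: $\ell(\theta; (x, y)) = (y - \langle \theta, x \rangle)^2$
- Support vector machine: $\ell(\theta; (x, y)) = \mathrm{hinge}(y \langle \theta, x \rangle)$, where $\mathrm{hinge}(z) = \max(0, 1 - z)$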
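The three example losses can be written down in a few lines. This is a minimal sketch using the standard textbook forms of each loss; the function names, the list-based vectors, and the choice to sum (rather than average) over records are assumptions made for illustration.

```python
def median_loss(theta, d):
    """Absolute loss |theta - d| for a scalar parameter and scalar record."""
    return abs(theta - d)

def squared_loss(theta, example):
    """Linear regression loss (y - <theta, x>)^2 for a record (x, y)."""
    x, y = example
    prediction = sum(t * xi for t, xi in zip(theta, x))
    return (y - prediction) ** 2

def hinge_loss(theta, example):
    """SVM loss max(0, 1 - y <theta, x>) for a record (x, y) with y in {-1, +1}."""
    x, y = example
    margin = y * sum(t * xi for t, xi in zip(theta, x))
    return max(0.0, 1.0 - margin)

def empirical_risk(loss, theta, data):
    """Sum of per-record losses (assumed convention; an average differs by 1/n)."""
    return sum(loss(theta, d) for d in data)

data = [1.0, 2.0, 10.0]
risk_at_median = empirical_risk(median_loss, 2.0, data)   # 1 + 0 + 8
risk_elsewhere = empirical_risk(median_loss, 3.0, data)   # 2 + 1 + 7
print(risk_elsewhere - risk_at_median)  # excess risk of theta = 3.0
```

Each per-record loss is convex in `theta`, and the median and hinge losses are not differentiable everywhere, which is why results that do not require smoothness matter for these examples.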

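Private convex ERM [Chaudhuri-Monteleoni'08, Chaudhuri-Monteleoni-Sarwate'11]

Studied by [Chaudhuri-et-al'11, Rubinstein-et-al'11, Kifer-Smith-Thakurta'12, Smith-Thakurta'13, …]

[Figure: a differentially private algorithm $A$ takes a dataset $D$, a convex set $C$, a loss $\ell$, and random coins, and outputs $\hat{\theta}$.]

- Privacy: $A$ is differentially private in its input dataset $D$.
- Utility: measured by the (worst-case) expected excess risk, $\mathbb{E}\big[L(\hat{\theta}; D)\big] - L(\theta^*; D)$, where the expectation is over the random coins of $A$. (Recall that $\theta^* = \arg\min_{\theta \in C} L(\theta; D)$.)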

Why care about privacy in ERM?

- Dual form of SVM: typically contains a subset of the exact data points in the clear.
- Median: the minimizer is always a data point.
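The median case is easy to see in code. The following sketch uses made-up salary figures (hypothetical data, not from the talk) and shows that releasing the exact minimizer of the absolute loss releases one person's record verbatim.

```python
def median_risk(theta, data):
    """Empirical risk for the median example: sum of |theta - d_i|."""
    return sum(abs(theta - d) for d in data)

def exact_minimizer(data):
    """For an odd number of records, the minimizer is the middle order statistic."""
    return sorted(data)[len(data) // 2]

# Hypothetical private records.
salaries = [41_000, 52_500, 48_200, 97_300, 50_100]
theta_star = exact_minimizer(salaries)

# The non-private answer is one individual's exact salary.
print(theta_star, theta_star in salaries)
```

A private algorithm has to return a point that is close to this minimizer in risk without reproducing the record itself, which is what the excess-risk bounds quantify.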

Contributions

1. New algorithms with optimal excess risk, assuming:
   - The loss function is Lipschitz.
   - The parameter set $C$ is bounded.

   (Separate set of algorithms for strongly convex loss.)
2. Matching lower bounds.

Best previous work [Chaudhuri-et-al'11, Kifer et al.'12] additionally assumes $\ell$ is smooth (bounded second derivative). This work improves the bounds by a factor of […].

- Non-smooth loss is common: SVM, median, …
- Applying their technique in general requires smoothing the loss, introducing extra error.

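Results – upper bounds (dataset size $= n$, $C \subseteq \mathbb{R}^p$)

Assumes $\ell$ is 1-Lipschitz on a parameter set $C$ of diameter 1.

| Loss | Privacy | Excess risk | Technique |
|---|---|---|---|
| Lipschitz | $\epsilon$-DP | … | Exponential sampling (inspired by [McSherry-Talwar'07]) |
| Lipschitz | $(\epsilon, \delta)$-DP | … | Noisy stochastic gradient descent (rigorous analysis of & improvements to [McSherry-Williams'10], [Jain-Kothari-Thakurta'12] and [Chaudhuri-Sarwate-Song'13]) |
| $\Lambda$-strongly convex | $\epsilon$-DP | … | Localization (new technique) |
| $\Lambda$-strongly convex | $(\epsilon, \delta)$-DP | … | Noisy stochastic gradient descent (or localization) |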

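Results – lower bounds

| Loss | Privacy | Excess risk | Form of $\ell$ used |
|---|---|---|---|
| Lipschitz | $\epsilon$-DP | … | Linear: … |
| Lipschitz | $(\epsilon, \delta)$-DP | … | Linear: … |
| Strongly convex | $\epsilon$-DP | … | Quadratic: … |
| Strongly convex | $(\epsilon, \delta)$-DP | … | Quadratic: … |

The $\epsilon$-DP bounds follow by reduction from $\epsilon$-DP release of 1-way marginals [HT'10]. The $(\epsilon, \delta)$-DP bounds follow by reduction from $(\epsilon, \delta)$-DP release of 1-way marginals [BUV'13].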

Optimal Noisy Stochastic Gradient Descent Algorithm

Noisy stochastic gradient descent algorithm

Inputs: data $D = \{d_1, \dots, d_n\}$, 1-Lipschitz loss $\ell$, convex set $C$, and privacy parameters.

1. Choose an arbitrary $\theta_1 \in C$.
2. For $t = 1, \dots, T$, at iteration $t$:
   - Draw a fresh data sample $d$ uniformly at random from $D$.
   - Compute the gradient $\nabla \ell(\theta_t; d)$ and add random noise $b_t$ to it.
   - Take a step with learning rate $\eta_t$ and project back onto $C$: $\theta_{t+1} = \Pi_C\big(\theta_t - \eta_t \, [\, n \nabla \ell(\theta_t; d) + b_t \,]\big)$.
3. Repeat for $T$ iterations, drawing a fresh data sample each time, then output $\theta_T$.
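The loop on these slides (sample a record, add noise to its gradient, step, project) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the Euclidean-ball parameter set, the Gaussian noise, the `1/sqrt(t)` learning rate, and all function names are assumptions. The noise scale is left as a free parameter, so the code shows the mechanics only and does not by itself certify any privacy guarantee. It also works with the per-record gradient directly and leaves any rescaling by the dataset size to the learning rate.

```python
import math
import random

def project_to_ball(theta, radius):
    """Euclidean projection onto C = {theta : ||theta||_2 <= radius}."""
    norm = math.sqrt(sum(t * t for t in theta))
    if norm <= radius:
        return list(theta)
    return [t * radius / norm for t in theta]

def noisy_sgd(data, grad, dim, radius, num_steps, noise_std, learning_rate, rng):
    """Projected SGD with Gaussian noise added to every sampled gradient."""
    theta = [0.0] * dim  # arbitrary starting point inside C
    for t in range(1, num_steps + 1):
        d = rng.choice(data)  # fresh uniform sample, drawn with replacement
        noisy_grad = [g + rng.gauss(0.0, noise_std) for g in grad(theta, d)]
        step = learning_rate(t)
        theta = project_to_ball(
            [th - step * g for th, g in zip(theta, noisy_grad)], radius
        )
    return theta

def median_grad(theta, d):
    """A subgradient of the 1-Lipschitz loss |theta - d| in one dimension."""
    diff = theta[0] - d
    return [0.0 if diff == 0 else math.copysign(1.0, diff)]

rng = random.Random(0)
data = [0.1, 0.2, 0.25, 0.3, 0.9]
theta_hat = noisy_sgd(
    data, median_grad, dim=1, radius=1.0, num_steps=2000,
    noise_std=0.5, learning_rate=lambda t: 1.0 / math.sqrt(t), rng=rng,
)
print(theta_hat)
```

Because the loss is 1-Lipschitz, every sampled gradient has norm at most 1, which is what allows a fixed noise scale to mask the contribution of any single record.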

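Privacy of the noisy SGD

The noisy SGD algorithm is $(\epsilon, \delta)$-differentially private.

- Key point: sampling amplifies privacy [KLNRS'08]. Each iteration touches only one uniformly sampled record out of $n$, so a step that is $\epsilon_0$-differentially private with respect to the sampled record is roughly $(2\epsilon_0 / n)$-differentially private with respect to the whole dataset (for $\epsilon_0 \le 1$).
- After $T$ iterations, by strong composition [DRV'10], privacy degrades from a per-iteration $\epsilon'$ to roughly $\epsilon' \sqrt{T \log(1/\delta)}$.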
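The two ingredients of this argument can be put into numbers. The sketch below uses the standard textbook statements of amplification by sampling and of strong (advanced) composition for pure ε-DP steps; treating each step as pure ε-DP, and the specific values of `n`, `eps_record`, `num_steps`, and `delta_slack`, are simplifying assumptions for illustration rather than the parameters used in the talk.

```python
import math

def amplify_by_sampling(eps_step, q):
    """Privacy of an eps_step-DP step applied to a record that is
    included with probability q: log(1 + q * (e^eps - 1))."""
    return math.log(1.0 + q * (math.exp(eps_step) - 1.0))

def strong_composition(eps_step, num_steps, delta_slack):
    """Total epsilon after num_steps adaptive eps_step-DP steps, at the
    cost of an additional delta_slack, by strong composition [DRV'10]."""
    return (
        eps_step * math.sqrt(2.0 * num_steps * math.log(1.0 / delta_slack))
        + num_steps * eps_step * (math.exp(eps_step) - 1.0)
    )

n = 1000            # dataset size (illustrative)
eps_record = 1.0    # privacy of one step w.r.t. the sampled record (illustrative)
eps_step = amplify_by_sampling(eps_record, 1.0 / n)
eps_total = strong_composition(eps_step, num_steps=10_000, delta_slack=1e-5)
naive_total = 10_000 * eps_step  # basic composition, for comparison

print(eps_step, eps_total, naive_total)
```

With these numbers the amplified per-step cost is below `2 * eps_record / n`, and strong composition gives a total well below the linear bound from basic composition, which is the square-root saving the slide refers to.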

Optimal Exponential Sampling Algorithm

Exponential sampling algorithm

Define a probability distribution over $C$ with density proportional to $\exp(-c \cdot L(\theta; D))$, where the scale $c$ is proportional to $\epsilon$. Output a sample from this distribution.

An instance of the exponential mechanism [McSherry-Talwar'07].

- Efficient construction based on rapidly mixing MCMC:
  - Uses [Applegate-Kannan'91] as a subroutine.
  - Provides a purely multiplicative convergence guarantee.
  - Does not follow directly from existing results.
- Tight utility analysis via a "peeling" argument:
  - Exploits the structure of convex functions: $A_1, A_2, \dots$ are decreasing in volume.
  - Shows that […] when […].
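The sampling step can be illustrated on a one-dimensional toy problem. This sketch replaces the MCMC sampler mentioned on the slide with brute-force enumeration over a finite grid, which is only feasible in very low dimension; the grid, the `epsilon / (2 * sensitivity)` scaling (the usual exponential-mechanism calibration), and the function names are assumptions.

```python
import math
import random

def exponential_sampling_on_grid(data, loss, grid, epsilon, sensitivity, rng):
    """Sample theta from the grid with probability proportional to
    exp(-epsilon * L(theta; D) / (2 * sensitivity))."""
    risks = [sum(loss(theta, d) for d in data) for theta in grid]
    best = min(risks)
    # Subtracting the minimum risk leaves the distribution unchanged
    # and avoids overflow in exp.
    weights = [
        math.exp(-epsilon * (r - best) / (2.0 * sensitivity)) for r in risks
    ]
    return rng.choices(grid, weights=weights, k=1)[0]

def median_loss(theta, d):
    return abs(theta - d)

data = [0.1, 0.2, 0.25, 0.3, 0.9]          # records in [0, 1]
grid = [i / 100 for i in range(101)]       # discretization of C = [0, 1]
rng = random.Random(0)

# Replacing one record in [0, 1] changes the summed absolute loss by at most 1.
private_theta = exponential_sampling_on_grid(
    data, median_loss, grid, epsilon=1.0, sensitivity=1.0, rng=rng
)
print(private_theta)
```

Small values of `epsilon` spread the output over the whole interval, while very large values concentrate it on the true minimizer, which makes the privacy versus utility trade-off directly visible.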

Summary

1. New algorithms with optimal excess risk, assuming:
   - The loss function is Lipschitz.
   - The parameter set $C$ is bounded.

   (Separate set of algorithms for strongly convex loss.)
2. Matching lower bounds.

Not in this talk:

- New localization technique: optimal algorithm for strongly convex loss.
- Generalization error guarantees: not known to be tight in general.
