Generalization bounds for uniformly stable algorithms


1 Generalization bounds for uniformly stable algorithms
Vitaly Feldman (Google Brain), with Jan Vondrak

2 Uniform stability
Domain $Z$ (e.g. $X \times Y$); dataset $S = (z_1, \dots, z_n) \in Z^n$; learning algorithm $A : Z^n \to W$; loss function $\ell : W \times Z \to \mathbb{R}^+$.
Uniform stability [Bousquet, Elisseeff 02]: $A$ has uniform stability $\gamma$ w.r.t. $\ell$ if for all neighboring $S, S'$ and all $z \in Z$:
$$\left|\ell(A(S), z) - \ell(A(S'), z)\right| \le \gamma$$
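To make the definition concrete, here is a minimal Python sketch (not from the talk; `A`, `loss`, and the data pool are hypothetical placeholders) that estimates a lower bound on an algorithm's uniform stability by swapping single examples:

```python
import numpy as np

def estimate_stability(A, loss, S, z_pool, trials=100, seed=0):
    """Empirical lower bound on gamma: max over random single-example swaps
    of |loss(A(S), z) - loss(A(S'), z)|. True uniform stability takes a sup
    over all neighboring datasets, so this only certifies a lower bound."""
    rng = np.random.default_rng(seed)
    gamma_hat = 0.0
    for _ in range(trials):
        i = rng.integers(len(S))                        # coordinate to replace
        S_prime = list(S)
        S_prime[i] = z_pool[rng.integers(len(z_pool))]  # neighboring dataset
        w, w_prime = A(S), A(S_prime)
        z = z_pool[rng.integers(len(z_pool))]           # evaluation point
        gamma_hat = max(gamma_hat, abs(loss(w, z) - loss(w_prime, z)))
    return gamma_hat
```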

3 Generalization error
Probability distribution $P$ over $Z$; $S \sim P^n$.
Population loss: $\mathbf{E}_P[\ell(w)] = \mathbf{E}_{z \sim P}[\ell(w,z)]$
Empirical loss: $\mathcal{E}_S[\ell(w)] = \frac{1}{n}\sum_{i=1}^{n} \ell(w, z_i)$
Generalization error/gap: $\Delta_S[\ell(A)] = \mathbf{E}_P[\ell(A(S))] - \mathcal{E}_S[\ell(A(S))]$
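As a simulation-only sketch (in practice $P$ is unknown; `P_sampler` is a hypothetical sampler for $P$), the gap can be estimated by Monte Carlo:

```python
import numpy as np

def generalization_gap(w, loss, S, P_sampler, mc=10_000):
    """Delta_S[loss(w)]: population loss (Monte Carlo estimate) minus empirical loss."""
    population = np.mean([loss(w, P_sampler()) for _ in range(mc)])
    empirical = np.mean([loss(w, z) for z in S])
    return population - empirical
```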

4 Stochastic convex optimization
$W = \mathbb{B}_2^d(1) \doteq \{w : \|w\|_2 \le 1\}$; for all $z \in Z$, $\ell(w,z)$ is convex and 1-Lipschitz in $w$.
Minimize $\mathbf{E}_P[\ell(w)] \doteq \mathbf{E}_{z\sim P}[\ell(w,z)]$ over $W$; let $L^* = \min_{w\in W} \mathbf{E}_P[\ell(w)]$.
For all $P$, with $A$ being SGD with rate $\eta = 1/\sqrt{n}$:
$$\Pr_{S\sim P^n}\left[\mathbf{E}_P[\ell(A(S))] \ge L^* + O\!\left(\sqrt{\tfrac{\log(1/\delta)}{n}}\right)\right] \le \delta$$
Yet the uniform convergence error can be $\gtrsim \sqrt{d/n}$, and ERM might have generalization error $\gtrsim \sqrt{d/n}$ [Shalev-Shwartz, Shamir, Srebro, Sridharan '09; Feldman '16].

5 Stable optimization
Strongly convex ERM [BE 02, SSSS 09]: $A_\lambda(S) = \operatorname{argmin}_{w\in W}\ \mathcal{E}_S[\ell(w)] + \frac{\lambda}{2}\|w\|_2^2$.
$A_\lambda$ is $\frac{1}{\lambda n}$-uniformly stable and minimizes $\mathcal{E}_S[\ell(w)]$ within $\frac{\lambda}{2}$.
Gradient descent on smooth losses [Hardt, Recht, Singer 16]: $A_T(S)$ runs $T$ steps of GD on $\mathcal{E}_S[\ell(w)]$ with $\eta = \frac{1}{\sqrt{T}}$.
$A_T$ is $\frac{\sqrt{T}}{n}$-uniformly stable and minimizes $\mathcal{E}_S[\ell(w)]$ within $\frac{2}{\sqrt{T}}$.

6 Generalization bounds
For $A, \ell$ with range $[0,1]$ and uniform stability $\gamma \in \left[\frac{1}{n}, 1\right]$:
$\mathbf{E}_{S\sim P^n}\left[\Delta_S[\ell(A)]\right] \le \gamma$ [Rogers, Wagner 78]
$\Pr_{S\sim P^n}\left[\Delta_S[\ell(A)] \ge \gamma\sqrt{n \log(1/\delta)}\right] \le \delta$ [Bousquet, Elisseeff 02]; vacuous when $\gamma \ge 1/\sqrt{n}$
NEW: $\Pr_{S\sim P^n}\left[\Delta_S[\ell(A)] \ge \sqrt{\gamma \log(1/\delta)}\right] \le \delta$
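To see the improvement numerically, a small comparison of the two deviation terms (constants omitted, as in the slide's statements):

```python
import numpy as np

def be02_dev(gamma, n, delta):
    return gamma * np.sqrt(n * np.log(1.0 / delta))  # vacuous once gamma >= 1/sqrt(n)

def new_dev(gamma, delta):
    return np.sqrt(gamma * np.log(1.0 / delta))

n, delta = 10_000, 0.01
for gamma in (1.0 / n, 1.0 / np.sqrt(n), 0.1):
    print(f"gamma={gamma:.4f}  BE02={be02_dev(gamma, n, delta):.3f}  "
          f"new={new_dev(gamma, delta):.3f}")
```

The two bounds coincide at $\gamma = 1/n$; for larger $\gamma$ the [BE 02] term grows past 1 while the new one stays meaningful.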

7 Comparison
[Plot: generalization error as a function of $\gamma$ for $n = 100$, comparing the [BE 02] bound with this work; $1/\sqrt{n}$ marked on the $\gamma$ axis.]

8 Second moment
NEW (and TIGHT): $\mathbf{E}_{S\sim P^n}\left[\Delta_S[\ell(A)]^2\right] \le \gamma^2 + \frac{1}{n}$
Previously [Devroye, Wagner 79; BE 02]: $\mathbf{E}_{S\sim P^n}\left[\Delta_S[\ell(A)]^2\right] \le \gamma + \frac{1}{n}$
Chebyshev: $\Pr_{S\sim P^n}\left[\Delta_S[\ell(A)] \ge \frac{\gamma + 1/\sqrt{n}}{\sqrt{\delta}}\right] \le \delta$
[Plot: generalization error vs. $\gamma$, comparing the [BE 02] bound with this work.]

9 Implications
Differentially private prediction [Dwork, Feldman 18]: $Z = X\times Y$. A randomized algorithm $A(S,x)$ is an $\epsilon$-DP prediction algorithm if for all neighboring $S, S'$ and all $x\in X$:
$$D_\infty\!\left(A(S,x)\,\|\,A(S',x)\right) \le \epsilon$$
For any loss $L: Y\times Y\to[0,1]$, $\mathbf{E}_A[L(A(S,x),y)]$ has uniform stability $e^\epsilon - 1 \approx \epsilon$.
This gives stronger generalization bounds for DP prediction algorithms, e.g. improving a bound of the form
$$\Pr_{S\sim P^n}\left[\mathbf{E}_P[\ell(A(S))] \ge L^* + O\!\left(\tfrac{1}{\delta^{1/4}\sqrt{n}}\right)\right] \le \delta
\quad\text{to}\quad
\Pr_{S\sim P^n}\left[\mathbf{E}_P[\ell(A(S))] \ge L^* + O\!\left(\tfrac{\sqrt{\log(1/\delta)}}{n^{1/3}}\right)\right] \le \delta$$

10 Data-dependent functions
Consider $M : Z^n \times Z \to \mathbb{R}$, e.g. $M(S,z) \equiv \ell(A(S), z)$.
$M$ has uniform stability $\gamma$ if for all neighboring $S, S'$: $\|M(S,\cdot) - M(S',\cdot)\|_\infty \le \gamma$, i.e., $|M(S,z) - M(S',z)| \le \gamma$ for all $z\in Z$.
Generalization error/gap: $\Delta_S[M] = \mathbf{E}_P[M(S)] - \mathcal{E}_S[M(S)]$

11 Generalization in expectation
Goal: $\mathbf{E}_{S\sim P^n}\left[\mathbf{E}_P[M(S)] - \mathcal{E}_S[M(S)]\right] \le \gamma$.
First, $\mathbf{E}_{S\sim P^n}\left[\mathcal{E}_S[M(S)]\right] = \mathbf{E}_{S\sim P^n,\, i\sim[n]}\left[M(S, z_i)\right]$.
By stability, for all $i$ and $z\in Z$: $M(S, z_i) \le M(S^{i\leftarrow z}, z_i) + \gamma$, and hence $M(S, z_i) \le \mathbf{E}_{z\sim P}\left[M(S^{i\leftarrow z}, z_i)\right] + \gamma$. Therefore
$$\mathbf{E}_{S\sim P^n,\, i\sim[n]}\left[M(S, z_i)\right] \le \mathbf{E}_{S\sim P^n,\, z\sim P,\, i\sim[n]}\left[M(S^{i\leftarrow z}, z_i)\right] + \gamma = \mathbf{E}_{S\sim P^n,\, z\sim P}\left[M(S,z)\right] + \gamma = \mathbf{E}_{S\sim P^n}\left[\mathbf{E}_P[M(S)]\right] + \gamma,$$
using that $(S^{i\leftarrow z}, z_i)$ and $(S, z)$ are identically distributed (both are $n+1$ i.i.d. draws from $P$). The same argument with the stability inequality reversed ($M(S, z_i) \ge M(S^{i\leftarrow z}, z_i) - \gamma$) bounds the gap in the other direction, which gives the goal.
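A simulation sketch of the expectation bound in an assumed toy setting: $M(S,z) = \min(1, |\mathrm{mean}(S) - z|)$ with data in $[-1,1]$ has uniform stability $2/n$ (replacing one point moves the mean by at most $2/n$, and clipping only contracts), so the average gap should stay below that, up to Monte Carlo error:

```python
import numpy as np

rng = np.random.default_rng(1)

def M(S, z):
    return min(1.0, abs(S.mean() - z))  # 1-Lipschitz in mean(S), clipped to [0,1]

n, trials, mc = 50, 2000, 200
gamma = 2.0 / n                         # uniform stability of M in this toy setting
gaps = []
for _ in range(trials):
    S = rng.uniform(-1.0, 1.0, n)
    population = np.mean([M(S, rng.uniform(-1.0, 1.0)) for _ in range(mc)])
    empirical = np.mean([M(S, z) for z in S])
    gaps.append(population - empirical)
print(np.mean(gaps), "<=", gamma)
```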

12 Concentration via McDiarmid
For all neighboring $S, S'$: $\left|\Delta_S[M] - \Delta_{S'}[M]\right| \le 2\gamma + \frac{1}{n}$, since:
1. $\left|\mathbf{E}_{z\sim P}[M(S,z)] - \mathbf{E}_{z\sim P}[M(S',z)]\right| \le \gamma$
2. $\left|\mathcal{E}_S[M(S)] - \mathcal{E}_{S'}[M(S')]\right| \le \left|\mathcal{E}_S[M(S)] - \mathcal{E}_S[M(S')]\right| + \left|\mathcal{E}_S[M(S')] - \mathcal{E}_{S'}[M(S')]\right| \le \gamma + \frac{1}{n}$
McDiarmid: $\Pr_{S\sim P^n}\left[\Delta_S[M] \ge \mu + \left(2\gamma + \frac{1}{n}\right)\sqrt{n\log(1/\delta)}\right] \le \delta$, where $\mu = \mathbf{E}_{S\sim P^n}\left[\Delta_S[M]\right] \le \gamma$.
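Plugging in numbers shows why this route stalls at the same $\gamma\sqrt{n}$ barrier as [BE 02], which motivates the technique on the next slides; a one-line evaluation with illustrative parameters:

```python
import numpy as np

def mcdiarmid_dev(gamma, n, delta):
    """Tail threshold mu + bounded-differences term, with mu <= gamma."""
    return gamma + (2.0 * gamma + 1.0 / n) * np.sqrt(n * np.log(1.0 / delta))

print(mcdiarmid_dev(0.01, 10_000, 0.01))  # ~4.3: vacuous for a [0,1]-valued loss
```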

13 Proof technique
Based on [Nissim, Stemmer 15; BNSSSU 16]. Let $S_1,\dots,S_m \sim P^n$ for $m = \frac{\ln 2}{\delta}$.
Reduction: for any distribution $Q$ over $\mathbb{R}$,
$$\Pr_{v\sim Q}\left[v \ge 2\,\mathbf{E}_{v_1,\dots,v_m\sim Q}\left[\max\{0, v_1,\dots,v_m\}\right]\right] \le \frac{\ln 2}{m}$$
So it suffices to bound $\mathbf{E}_{S_1,\dots,S_m\sim P^n}\left[\max_{j\in[m]} \Delta_{S_j}[M]\right]$.
For $\ell = \operatorname{argmax}_j \Delta_{S_j}[M]$, we would like $\mathcal{E}_{S_\ell}[M(S_\ell)] \approx \mathbf{E}_P[M(S_\ell)]$, but selecting the argmax is UNSTABLE!
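A Monte Carlo illustration of the reduction for one assumed choice $Q = N(0,1)$ (any $Q$ works; this just makes the inequality visible numerically):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 20
# estimate E[max(0, v_1, ..., v_m)] for Q = N(0,1)
e_max = np.mean([max(0.0, rng.standard_normal(m).max()) for _ in range(5000)])
tail = np.mean(rng.standard_normal(500_000) >= 2.0 * e_max)
print(tail, "<=", np.log(2) / m)  # e.g. ~1e-4 <= ~0.035
```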

14 Stable max
Exponential mechanism [McSherry, Talwar 07]: $\mathrm{EM}_\alpha$ samples $j \propto e^{\alpha\, \Delta_{S_j}[M]}$.
Stable: $\mathrm{EM}_\alpha$ is $2\alpha\left(2\gamma + \frac{1}{n}\right)$-differentially private.
$$\mathbf{E}_{S_1,\dots,S_m\sim P^n,\ \ell=\mathrm{EM}_\alpha}\left[\mathcal{E}_{S_\ell}[M(S_\ell)]\right] \le \mathbf{E}_{S_1,\dots,S_m\sim P^n,\ \ell=\mathrm{EM}_\alpha}\left[\mathbf{E}_P[M(S_\ell)]\right] + \exp\!\left(2\alpha\left(2\gamma+\tfrac{1}{n}\right)\right) - 1 + \gamma$$
Approximates the max:
$$\mathbf{E}_{S_1,\dots,S_m\sim P^n,\ \ell=\mathrm{EM}_\alpha}\left[\Delta_{S_\ell}[M]\right] \ge \mathbf{E}_{S_1,\dots,S_m\sim P^n}\left[\max_{j\in[m]} \Delta_{S_j}[M]\right] - \frac{\ln m}{\alpha}$$
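A minimal sketch of the selection step (the score here is the empirical gap $\Delta_{S_j}[M]$, whose sensitivity $2\gamma + 1/n$ from slide 12 is what gives the privacy parameter stated above):

```python
import numpy as np

def em_select(deltas, alpha, rng):
    """Exponential mechanism: sample index j with probability proportional
    to exp(alpha * deltas[j]). With scores of sensitivity 2*gamma + 1/n,
    the selection is 2*alpha*(2*gamma + 1/n)-differentially private."""
    logits = alpha * np.asarray(deltas, dtype=float)
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p /= p.sum()
    return rng.choice(len(deltas), p=p)
```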

15 Game over
$$\mathbf{E}_{S_1,\dots,S_m\sim P^n}\left[\max_{j\in[m]} \Delta_{S_j}[M]\right] \le \frac{\ln m}{\alpha} + \exp\!\left(2\alpha\left(2\gamma+\tfrac{1}{n}\right)\right) - 1 + \gamma$$
Pick $\alpha = \sqrt{\frac{\ln m}{\gamma + 1/n}}$ to get $O\!\left(\sqrt{\left(\gamma + \tfrac{1}{n}\right)\ln\tfrac{1}{\delta}}\right)$.
Recall the reduction: for $Q$ over $\mathbb{R}$, $\Pr_{v\sim Q}\left[v \ge 2\,\mathbf{E}_{v_1,\dots,v_m\sim Q}\left[\max\{0, v_1,\dots,v_m\}\right]\right] \le \frac{\ln 2}{m}$, so with $m = \frac{\ln 2}{\delta}$ this expected-max bound yields the high-probability bound.
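A quick sanity check (illustrative parameters, not from the talk) that minimizing the right-hand side over $\alpha$ lands at the advertised $\sqrt{(\gamma + 1/n)\ln(1/\delta)}$ scale, up to the constants hidden by the $O(\cdot)$:

```python
import numpy as np

gamma, n, delta = 1e-3, 10_000, 0.01
m = np.log(2.0) / delta
c = 2.0 * gamma + 1.0 / n
alphas = np.linspace(1.0, 2000.0, 20_000)
bound = np.log(m) / alphas + np.expm1(2.0 * alphas * c) + gamma
target = np.sqrt((gamma + 1.0 / n) * np.log(1.0 / delta))
print(bound.min(), target)  # same scale, up to a small constant factor
```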

16 Conclusions
Better understanding of uniform stability; a new technique for high-probability generalization bounds.
Open problems: close the gap between the upper and lower bounds; high-probability generalization without strong uniformity.

