StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent
Tyler B. Johnson and Carlos Guestrin, University of Washington
Coordinate descent: a simple and effective optimization algorithm
Fast in practice; well understood in theory; no learning rate or other parameters to tune 😃
Lasso objective: the solution is sparse; the majority of weights equal 0
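For reference, the standard Lasso objective referred to here (notation assumed: design matrix A, labels b, regularization parameter λ > 0):

```latex
\min_{x \in \mathbb{R}^d} \; \tfrac{1}{2} \lVert A x - b \rVert_2^2 + \lambda \lVert x \rVert_1
```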
Nonnegative Lasso objective
StingyCD can also solve the standard Lasso, and it is straightforward to extend to linear SVM.
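The nonnegative Lasso constrains the weights to be nonnegative, so the ℓ1 penalty becomes a linear term (again in assumed notation):

```latex
\min_{x \in \mathbb{R}^d} \; \tfrac{1}{2} \lVert A x - b \rVert_2^2 + \lambda \langle \mathbf{1}, x \rangle
\quad \text{subject to} \quad x \geq 0
```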
Inside an iteration of CD
Maintain the residuals vector r = b − Ax. For the chosen coordinate i, compute the update δ from ⟨A_i, r⟩, then set x_i ← x_i + δ and r ← r − δA_i.
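A minimal NumPy sketch of one such iteration for the nonnegative Lasso, assuming dense columns and precomputed squared column norms (variable names are mine, not from the slides):

```python
import numpy as np

def cd_update(i, x, r, A, lam, col_sq_norms):
    """One coordinate descent update for
    min 0.5*||A x - b||^2 + lam*sum(x), x >= 0, with r = b - A @ x."""
    a_i = A[:, i]
    grad_term = a_i @ r - lam                 # costs time proportional to nnz(A_i)
    delta = max(-x[i], grad_term / col_sq_norms[i])
    x[i] += delta
    r -= delta * a_i                          # keep residuals in sync
    return delta
```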
Major drawback of CD: "zero updates"
A zero update leaves x unchanged, yet computing the partial gradient ⟨A_i, r⟩ still costs time proportional to the nonzeros of A_i. Due to sparsity, zero updates are very common. Zero updates are wasteful!
StingyCD: skip updates that are guaranteed to be zero
Checking the skip condition requires just constant time (details in the three changes below).
Geometry of a zero update
When x_i = 0, the update is zero exactly when the residuals vector r lies in a halfspace determined by column A_i and λ.
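Concretely, with the update rule above (a restatement in my notation, since the slide's figure is not reproduced here):

```latex
x_i = 0 \;\Longrightarrow\;
\delta = \max\!\left(0,\; \tfrac{\langle A_i, r\rangle - \lambda}{\lVert A_i \rVert^2}\right),
\qquad
\delta = 0 \;\iff\; \langle A_i, r \rangle \le \lambda
\;\iff\; r \in H_i := \{\, r' : \langle A_i, r' \rangle \le \lambda \,\}.
```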
StingyCD
StingyCD makes 3 simple changes to CD:
Change 1: Reference residuals vector
Maintain a copy r_ref of the residuals vector as a reference; the reference is updated only infrequently (once every several epochs).
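A sketch of what a reference refresh might look like; the cached quantities below are my assumptions about reasonable bookkeeping, not necessarily the paper's exact implementation:

```python
import numpy as np

def refresh_reference(r, A):
    """Set a new reference residuals vector and the caches that depend on it."""
    r_ref = r.copy()
    a_dot_rref = A.T @ r_ref   # cache <A_i, r_ref> for every column i (one full pass)
    q = 0.0                    # squared distance ||r - r_ref||^2 resets to zero
    return r_ref, a_dot_rref, q
```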
Change 2: Track reference distance
Maintain q = ‖r − r_ref‖² as the residuals change; each CD update can refresh q in constant time.
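The reason this tracking is cheap: after a CD update δ on coordinate i, q changes by a quantity built from terms the update already has on hand (⟨A_i, r⟩ is computed for the update itself, and ⟨A_i, r_ref⟩ is cached whenever the reference is refreshed). In assumed notation:

```latex
r^{+} = r - \delta A_i
\quad\Longrightarrow\quad
q^{+} = \lVert r^{+} - r_{\mathrm{ref}} \rVert^2
      = q - 2\delta \bigl( \langle A_i, r \rangle - \langle A_i, r_{\mathrm{ref}} \rangle \bigr)
        + \delta^2 \lVert A_i \rVert^2 .
```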
Change 3: Threshold reference distance
Before updating coordinate i, compare q to a precomputed threshold τ_i; if the test passes, the update is guaranteed to be zero and is skipped.
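One way to write the resulting test (my formulation of the ball-in-halfspace argument; the paper's τ_i may use different sign conventions): skip coordinate i when

```latex
x_i = 0,
\qquad
\langle A_i, r_{\mathrm{ref}} \rangle \le \lambda,
\qquad
q < \tau_i := \frac{\bigl( \lambda - \langle A_i, r_{\mathrm{ref}} \rangle \bigr)^2}{\lVert A_i \rVert^2} .
```

Under these conditions ‖r − r_ref‖ is smaller than the distance from r_ref to the boundary of the zero-update halfspace H_i, so r still lies in H_i and the update must be zero.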
Summary of StingyCD changes
Before each iteration, check the skip condition (constant time); the bookkeeping it relies on (tracking q) is also constant time per update.
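Putting the three changes together, a compact and purely illustrative NumPy sketch of the resulting loop, in the notation used above; this is a reading aid under my assumptions, not the authors' reference implementation:

```python
import numpy as np

def stingycd(A, b, lam, n_epochs=50, ref_period=5):
    """Illustrative StingyCD-style solver for the nonnegative Lasso
    min 0.5*||A x - b||^2 + lam*sum(x), subject to x >= 0."""
    n, d = A.shape
    x = np.zeros(d)
    r = b.astype(float)                        # residuals r = b - A x (x starts at 0)
    col_sq = np.maximum((A ** 2).sum(axis=0), 1e-12)   # guard all-zero columns
    r_ref = r.copy()                           # Change 1: reference residuals
    a_dot_rref = A.T @ r_ref                   # cached <A_i, r_ref> for each column
    q = 0.0                                    # Change 2: q = ||r - r_ref||^2

    for epoch in range(n_epochs):
        if epoch > 0 and epoch % ref_period == 0:   # Change 1: infrequent refresh
            r_ref = r.copy()
            a_dot_rref = A.T @ r_ref
            q = 0.0
        for i in range(d):
            gap = lam - a_dot_rref[i]
            # Change 3: constant-time skip test (update is provably zero)
            if x[i] == 0.0 and gap >= 0.0 and q < gap ** 2 / col_sq[i]:
                continue
            a_i = A[:, i]
            a_dot_r = a_i @ r
            delta = max(-x[i], (a_dot_r - lam) / col_sq[i])
            if delta != 0.0:
                x[i] += delta
                r -= delta * a_i
                # Change 2: constant-time update of q = ||r - r_ref||^2
                q += -2.0 * delta * (a_dot_r - a_dot_rref[i]) + delta ** 2 * col_sq[i]
    return x
```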
Reference update trade-off
Updating the reference more often keeps r_ref close to r (so more updates can be skipped), but each reference update requires a full pass over the data.
Scheduling reference updates
[Plot: relative time to converge as a function of the reference-update schedule.]
StingyCD empirical performance
[Plot: relative suboptimality vs. time (s) for CD, CD + safe screening, and StingyCD.]
Skipping more updates with StingyCD+
StingyCD skips only updates that are provably zero; StingyCD+ relaxes this to also skip updates that are unlikely to be useful.
Probability of a useful update
StingyCD+ models the probability that each update is useful (i.e., nonzero) and computes this probability efficiently with a lookup table.
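The slides do not spell out the model, so here is one hedged illustration of how a lookup table could arise: if the current residuals were modeled as uniformly distributed on the sphere of radius √q around r_ref (an assumption for illustration; the paper's exact model may differ), then for a coordinate with x_i = 0 the probability that ⟨A_i, r⟩ exceeds λ depends only on a single normalized distance, which can be tabulated once:

```python
import numpy as np

def build_lookup_table(dim, n_bins=512, n_samples=200_000, seed=0):
    """Tabulate F(t) = P(<u, e1> > t) for u uniform on the unit sphere in R^dim
    (Monte Carlo; illustration only)."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_samples, dim))
    first = g[:, 0] / np.linalg.norm(g, axis=1)        # first coordinate of a unit vector
    ts = np.linspace(-1.0, 1.0, n_bins)
    return ts, np.array([(first > t).mean() for t in ts])

def prob_useful_update(a_dot_rref_i, col_norm_i, lam, q, ts, table):
    """P(<A_i, r> > lam) under the sphere model, for a coordinate with x_i = 0.
    The probability depends only on one normalized ratio, hence the lookup table."""
    t = (lam - a_dot_rref_i) / (col_norm_i * np.sqrt(q) + 1e-12)
    return float(np.interp(np.clip(t, -1.0, 1.0), ts, table))
```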
StingyCD+ empirical performance
[Plots: relative suboptimality vs. time (s) for CD, CD + safe screening, StingyCD, and StingyCD+.]
Combining StingyCD+ with other methods
Popular sparse logistic regression algorithms include approximate proximal Newton and working-set algorithms. Both rely on Lasso subproblem solvers, so we compare CD and StingyCD+ as subproblem solvers.
Sparse logistic regression results
[Plots: relative suboptimality vs. time for CD ProxNewt, CD ProxNewt with working sets, StingyCD+ ProxNewt, and StingyCD+ ProxNewt with working sets; shown in seconds for one dataset and minutes for another.]
Takeaways
StingyCD makes simple changes to CD: it avoids wasteful computation, further gains are possible with relaxations (StingyCD+), and it can be combined with other methods. Future directions: extend to more problem settings and apply "stingy updates" to other algorithms. Thank you!