StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent

1 StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent
Tyler B. Johnson and Carlos Guestrin, University of Washington

2 Coordinate descent
Simple and good optimization algorithm
Fast in practice
Understood with theory
No learning rate or other parameters 😃

3 Lasso objective
Solution is sparse: the majority of weights equal 0
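
The equation shown on this slide is not captured in the transcript. For orientation, the standard Lasso objective can be written as below. The symbols are notation assumed here, not taken from the slide: A is the design matrix, b the target vector, x the weight vector, and λ > 0 the regularization strength.

```latex
\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2}\,\lVert Ax - b \rVert_2^2 + \lambda \lVert x \rVert_1
```

Larger values of λ push more entries of x to exactly 0, which is the sparsity the slide refers to.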

5 Nonnegative Lasso objective
StingyCD can also solve normal Lasso
Also straightforward to extend to linear SVM
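
The objective itself is missing from the transcript. A common way to write nonnegative Lasso, using assumed notation (A for the design matrix, b for the targets, x for the weights, λ > 0 for the regularization strength), is:

```latex
\min_{x \ge 0} \; \tfrac{1}{2}\,\lVert Ax - b \rVert_2^2 + \lambda \sum_{i=1}^{n} x_i
```

Because every weight is constrained to be nonnegative, the L1 penalty reduces to a plain sum of the weights.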

6 Inside an iteration of CD
Residuals vector
For chosen coordinate, compute
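
The formulas on this slide are not in the transcript. As a rough sketch of what one CD iteration for nonnegative Lasso typically looks like, the code below assumes the objective ½‖Ax − b‖² + λ·Σxᵢ with x ≥ 0 and residuals r = b − Ax. The names `cd_update`, `cols`, and `lam` are illustrative choices, not names from the talk.

```python
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cd_update(cols: List[List[float]], x: List[float], r: List[float],
              i: int, lam: float) -> float:
    """One coordinate descent update for nonnegative Lasso (a sketch).

    cols[i] is column i of A (assumed nonzero); r = b - A x is kept
    up to date in place. Returns the step delta applied to x[i].
    """
    a = cols[i]
    # Unconstrained minimizer along coordinate i, measured from x[i].
    step = (dot(a, r) - lam) / dot(a, a)
    # Clip so that x[i] + delta stays nonnegative.
    delta = max(-x[i], step)
    if delta != 0.0:
        x[i] += delta
        for k in range(len(r)):
            r[k] -= delta * a[k]
    return delta
```

Note that the inner product `dot(a, r)` costs time proportional to the number of entries in the column, even when the resulting `delta` turns out to be 0.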

7 Major drawback of CD
“Zero updates”
Zero updates are wasteful!
Due to sparsity, zero updates are very common!
Computing gradient requires time

8 StingyCD
Skip updates guaranteed to be zero
Skip condition requires just constant time

9 Geometry of a zero update
Residuals vector
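
The figure for this slide is not in the transcript. One way to state the geometry, assuming the nonnegative Lasso objective ½‖Ax − b‖² + λ·Σxᵢ with x ≥ 0, residuals r = b − Ax, and A_i denoting column i of A (all assumed notation):

```latex
\delta_i = \max\left\{ -x_i,\; \frac{\langle A_i, r \rangle - \lambda}{\lVert A_i \rVert^2} \right\},
\qquad
x_i = 0 \;\text{ and }\; \langle A_i, r \rangle \le \lambda
\;\Longrightarrow\; \delta_i = 0 .
```

So for a weight currently at 0, the update is zero exactly when the residuals vector r lies in the halfspace {r : ⟨A_i, r⟩ ≤ λ}.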

10 StingyCD
StingyCD makes 3 simple changes to CD

11 Change 1: Reference residuals vector
Reference updated infrequently (once every several epochs)

19 Change 2: Track reference distance
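
The formulas for these slides are not in the transcript. A derivation consistent with the slide title, under assumed notation (r is the current residuals vector, r^ref the reference residuals vector, q the squared distance between them, A_i the updated column, δ the step applied to weight i):

```latex
q = \lVert r - r^{\mathrm{ref}} \rVert^2 ,
\qquad
q' = \lVert (r - \delta A_i) - r^{\mathrm{ref}} \rVert^2
   = q - 2\delta\,\langle A_i,\, r - r^{\mathrm{ref}} \rangle + \delta^2 \lVert A_i \rVert^2 .
```

Here ⟨A_i, r⟩ is already computed for the CD update, while ⟨A_i, r^ref⟩ and ‖A_i‖² can be cached when the reference is refreshed, so updating q adds only constant work per iteration.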

26 Change 3: Threshold reference distance
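
The slide content is not in the transcript, so the following is a hedged sketch of one way to threshold the reference distance. It assumes nonnegative Lasso with residuals r, a reference residuals vector `r_ref`, and q = ‖r − r_ref‖². If the ball of radius √q around `r_ref` lies inside the halfspace ⟨A_i, r⟩ ≤ λ, then r does too, and an update to a zero weight must be zero. The names `thresholds`, `can_skip`, and `tau` are illustrative.

```python
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def thresholds(cols: List[List[float]], r_ref: List[float],
               lam: float) -> List[float]:
    """Signed squared distance from r_ref to each hyperplane <A_i, r> = lam.

    Positive when r_ref is inside the halfspace <A_i, r> <= lam,
    negative when it is outside. Columns are assumed nonzero.
    """
    taus = []
    for a in cols:
        gap = lam - dot(a, r_ref)
        sign = 1.0 if gap >= 0.0 else -1.0
        taus.append(sign * gap * gap / dot(a, a))
    return taus


def can_skip(x_i: float, q: float, tau_i: float) -> bool:
    """Constant-time check: the update is guaranteed to be zero."""
    return x_i == 0.0 and q <= tau_i
```

The thresholds only change when the reference is refreshed, so the per-iteration check compares two stored numbers.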

28 Summary of StingyCD changes
Before each iteration, check skip condition
Constant time
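
Putting the three changes together, a minimal sketch might look like the following. It is not the authors' implementation: it assumes nonnegative Lasso, a fixed refresh schedule (`ref_every` epochs), and random permutation order, and the names `stingy_cd`, `ref_every`, and `taus` are hypothetical.

```python
import random
from typing import List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def stingy_cd(cols: List[List[float]], b: List[float], lam: float,
              epochs: int = 50, ref_every: int = 5,
              seed: int = 0) -> Tuple[List[float], int]:
    """CD for nonnegative Lasso that skips provably-zero updates (sketch).

    cols[i] is column i of A (assumed nonzero).
    Returns the weights and the number of skipped updates.
    """
    n = len(cols)
    x = [0.0] * n
    r = list(b)                          # residuals r = b - A x, with x = 0
    sq_norms = [dot(a, a) for a in cols]
    rng = random.Random(seed)
    skipped = 0
    for epoch in range(epochs):
        if epoch % ref_every == 0:
            # Change 1: refresh the reference residuals (here: r itself)
            # and cache <A_i, r_ref> plus the per-coordinate thresholds.
            ref_dots = [dot(a, r) for a in cols]
            taus = []
            for i in range(n):
                gap = lam - ref_dots[i]
                sign = 1.0 if gap >= 0.0 else -1.0
                taus.append(sign * gap * gap / sq_norms[i])
            q = 0.0                      # squared distance ||r - r_ref||^2
        for i in rng.sample(range(n), n):
            # Change 3: constant-time skip condition.
            if x[i] == 0.0 and q <= taus[i]:
                skipped += 1
                continue
            a_dot_r = dot(cols[i], r)
            delta = max(-x[i], (a_dot_r - lam) / sq_norms[i])
            if delta == 0.0:
                continue
            # Change 2: constant-time update of the reference distance.
            q += -2.0 * delta * (a_dot_r - ref_dots[i]) + delta * delta * sq_norms[i]
            q = max(q, 0.0)              # guard against rounding below zero
            x[i] += delta
            for k in range(len(r)):
                r[k] -= delta * cols[i][k]
    return x, skipped
```

In floating point, the incrementally tracked `q` can drift slightly from the true squared distance, so a careful implementation may want a small safety margin in the skip test.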

29 Reference update trade-off

30 Scheduling reference updates
[Plot: relative time to converge]

31 StingyCD empirical performance
[Plot: relative suboptimality vs. time (s); methods: CD, CD + Safe screening, StingyCD]

32 Skipping more updates with StingyCD+

35 Probability of useful update
StingyCD+ models the probability each update is useful (i.e., nonzero)
Efficiently compute probability with lookup table

36 StingyCD+ empirical performance
[Plot: relative suboptimality vs. time (s); methods: CD, CD + Safe screening, StingyCD, StingyCD+]

37 StingyCD+ empirical performance
[Plot: relative suboptimality vs. time (s); methods: CD, CD + Safe screening, StingyCD, StingyCD+]

38 Combining StingyCD+ with other methods
Popular sparse logistic regression algorithms:
Approximate proximal Newton
Working set algorithms
Both rely on Lasso subproblem solvers
Compare CD, StingyCD+ as subproblem solvers

39 Sparse logistic regression results
[Plot: relative suboptimality vs. time (s); methods: CD ProxNewt, CD ProxNewt w/ Working Sets, StingyCD+ ProxNewt, StingyCD+ ProxNewt w/ Working Sets]

40 Sparse logistic regression results
[Plot: relative suboptimality vs. time (min); methods: CD ProxNewt, CD ProxNewt w/ Working Sets, StingyCD+ ProxNewt, StingyCD+ ProxNewt w/ Working Sets]

41 Takeaways
StingyCD makes simple changes to CD
Avoids wasteful computation
Further gains possible with relaxations
Can combine with other methods
Future directions
Extend to more problem settings
Apply “stingy updates” to other algorithms
Thank you!

