1
**Concurrency Control for Machine Learning**

Joseph E. Gonzalez, Post-doc, UC Berkeley AMPLab

In collaboration with Xinghao Pan, Stefanie Jegelka, Tamara Broderick, and Michael I. Jordan

2
**Serial Machine Learning Algorithm**

Model Parameters Data

3
**Parallel Machine Learning**

Model Parameters Data

4
**Parallel Machine Learning**

Model Parameters Data

Concurrency: more machines = less time

Correctness: serial equivalence

5
**Coordination-free**

Model Parameters Data

6
**Concurrency Control**

Model Parameters Data

7
**Serializability**

Model Parameters Data

8
**Research Summary**

Coordination Free (e.g., Hogwild): Provably fast and correct under key assumptions.

Concurrency Control (e.g., Mutual Exclusion): Provably correct and fast under key assumptions.

Research Focus

9
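**Optimistic Concurrency Control: Mechanism for ensuring correctness**

[Figure: Coordination-free, Optimistic Concurrency Control, and Mutual exclusion placed on Low-to-High axes, one of them labeled Stability & Correctness. Optimistic Concurrency Control is marked "?" and carries the note "Conflicts are rare".]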

10
Optimistic Concurrency Control to parallelize: Non-Parametric Clustering and Sub-modular Maximization

11
**Optimistic Concurrency Control**

Model Parameters Data

Optimistic updates (Concurrency)

Validation: detect conflict

Resolution: fix conflict (Correctness)

Hsiang-Tsung Kung and John T. Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems (TODS), 6(2):213–226, 1981.
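The three phases on this slide (optimistic update, validation, resolution) can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: `VersionedModel`, `occ_update`, and the version counter are hypothetical names, and the "other worker" is simulated inside the compute function so the example stays single-threaded. In a real multi-threaded system the validate-and-commit step would have to be atomic.

```python
from dataclasses import dataclass


@dataclass
class VersionedModel:
    """Shared state plus a version counter (hypothetical helper)."""
    value: int = 0
    version: int = 0


def occ_update(model, compute, max_retries=10):
    """Apply compute(value) optimistically; retry when validation fails."""
    for _ in range(max_retries):
        # Optimistic phase: read without locking and compute a proposal.
        seen_version, seen_value = model.version, model.value
        proposal = compute(seen_value)
        # Validation: did anyone commit since we read?
        if model.version == seen_version:
            model.value, model.version = proposal, seen_version + 1
            return True
        # Resolution: drop the stale proposal and try again.
    return False


model = VersionedModel()
calls = {"n": 0}


def increment_with_interference(v):
    """Increment, but simulate a conflicting commit during the first call."""
    calls["n"] += 1
    if calls["n"] == 1:
        model.value += 100
        model.version += 1
    return v + 1


committed = occ_update(model, increment_with_interference)
```

The first attempt is rejected because the version changed underneath it; the second attempt recomputes from the fresh value and commits. When conflicts are rare, most updates pay only the cost of the version check.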

12
**Example: Serial DP-means Clustering**

Sequential!

Brian Kulis and Michael I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of the 29th International Conference on Machine Learning, 2012.
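For readers unfamiliar with DP-means, the sketch below shows why the serial algorithm is inherently sequential: whether a point opens a new cluster depends on every cluster opened by the points before it. It is a simplified illustration, assuming squared Euclidean distance, a threshold named `lam`, and an empty initial set of centers (the first point opens the first cluster); the function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def dp_means(points, lam, max_iters=100):
    """Serial DP-means sketch: returns (centers, assignment)."""
    centers = []
    assignment = [None] * len(points)
    for _ in range(max_iters):
        changed = False
        # Assignment pass: strictly in order, each point sees all
        # clusters created by earlier points in this same pass.
        for i, x in enumerate(points):
            if centers:
                j = min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
                d = sq_dist(x, centers[j])
            else:
                j, d = -1, float("inf")
            if d > lam:
                # Too far from every center: open a new cluster at x.
                centers.append(tuple(x))
                j = len(centers) - 1
            if assignment[i] != j:
                assignment[i] = j
                changed = True
        # Update pass: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [points[i] for i in range(len(points)) if assignment[i] == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
        if not changed:
            break
    return centers, assignment


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centers, assignment = dp_means(points, lam=1.0)
```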

13
**Example: OCC DP-means Clustering**

Assumption: no new cluster created nearby

Validation

Resolution: first proposal wins
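One way to picture the assumption, validation, and resolution steps is the single assignment pass below. It is a sketch under stated assumptions rather than the authors' implementation: the "parallel" phase is simulated with an ordinary loop over a batch that only reads a stale snapshot of the centers, the validator runs serially, and `occ_dp_means_pass`, `batch_size`, and `lam` are hypothetical names. The mean-update step of DP-means is omitted.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(x, centers):
    """Return (index, squared distance) of the closest center, or (-1, inf)."""
    best, best_d = -1, float("inf")
    for k, c in enumerate(centers):
        d = sq_dist(x, c)
        if d < best_d:
            best, best_d = k, d
    return best, best_d


def occ_dp_means_pass(points, lam, batch_size):
    """One OCC assignment pass: returns (centers, assignment, rejected)."""
    centers, rejected = [], 0
    assignment = [None] * len(points)
    for start in range(0, len(points), batch_size):
        snapshot = list(centers)  # stale view shared by all workers
        proposals = []
        # Optimistic phase (parallelizable): assume no new cluster nearby.
        for i in range(start, min(start + batch_size, len(points))):
            j, d = nearest(points[i], snapshot)
            if d > lam:
                proposals.append(i)  # propose a new cluster at points[i]
            else:
                assignment[i] = j
        # Validation phase (serial): check proposals against the clusters
        # accepted earlier in this batch; the first proposal wins.
        first_new = len(snapshot)
        for i in proposals:
            j, d = nearest(points[i], centers[first_new:])
            if d > lam:
                centers.append(points[i])
                assignment[i] = len(centers) - 1
            else:
                assignment[i] = first_new + j  # rejected: join the winner
                rejected += 1
    return centers, assignment, rejected


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
occ_centers, occ_assignment, occ_rejected = occ_dp_means_pass(points, 1.0, batch_size=4)
serial_centers, serial_assignment, _ = occ_dp_means_pass(points, 1.0, batch_size=1)
```

With `batch_size=1` every point sees fully up-to-date centers, so the pass degenerates to the serial order. On this toy input the batched run creates the same clusters, at the price of two rejected proposals.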

14
**Optimistic Concurrency Control for DP-means**

Correctness

Theorem: OCC DP-means is serializable.

Corollary: OCC DP-means preserves the theoretical properties of DP-means.

Concurrency

Theorem: The expected overhead of OCC DP-means, in terms of the number of rejected proposals, does not depend on the size of the data set.

15
**Evaluation: Amazon EC2**

~140 million data points; 1, 2, 4, 8 machines

[Plot: OCC DP-means runtime compared with projected linear scaling.]

16
**Summary**

Optimistic Concurrency Control to parallelize:

Non-Parametric Clustering

Sub-modular Maximization (next)

17
**Motivating Example: Bidding on Keywords**

Keywords: Apple, iPhone, Android, Games, xBox, Samsung, Microwave, Appliances

Common Queries: “How big is Apple iPhone”, “iPhone vs Android”, “best Android and iPhone games”, “Samsung sues Apple over iPhone”, “Samsung Microwaves”, “Appliance stores in SF”, “Playing games on a Samsung TV”, “xBox game of the year”

18
**Motivating Example: Bidding on Keywords**

Keywords (labeled A through H): Apple, iPhone, Android, Games, xBox, Samsung, Microwave, Appliances

Common Queries (labeled 1 through 8): “How big is Apple iPhone”, “iPhone vs Android”, “best Android and iPhone games”, “Samsung sues Apple over iPhone”, “Samsung Microwaves”, “Appliance stores in SF”, “Playing games on a Samsung TV”, “xBox game of the year”

19
**Motivating Example: Bidding on Keywords**

Each keyword has a cost and each query has a value.

Costs (keywords): A $2, B $5, C $1, D $2, E $5, F $1, G $4, H $2

Values (queries): 1 $2, 2 $2, 3 $4, 4 $4, 5 $3, 6 $6, 7 $5, 8 $1

20
**Motivating Example: Bidding on Keywords**

Purchase keyword B to cover its queries.

Revenue: $12

Cost: $5

Profit: $7

21
**Motivating Example: Bidding on Keywords**

Purchase keywords B and C.

Revenue: $12

Cost: $5 + $1

Profit: $7, then $6

Submodularity = Diminishing Returns
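Diminishing returns can be checked numerically on a small weighted-coverage function. The sketch below is illustrative only: the slide's keyword-to-query edges are not recoverable from the transcript, so the `COVERS` mapping is a hypothetical choice, picked so that buying keyword B alone reproduces the $12 revenue, $5 cost, and $7 profit figures of the example. The `COST` and `VALUE` numbers are a subset of those shown on the slides.

```python
# Hypothetical coverage edges (an assumption, not taken from the slides).
COVERS = {
    "A": {1, 4},
    "B": {1, 2, 3, 4},
    "C": {2, 3},
}
VALUE = {1: 2, 2: 2, 3: 4, 4: 4}   # dollars earned per covered query
COST = {"A": 2, "B": 5, "C": 1}    # dollars paid per purchased keyword


def revenue(keywords):
    """Total value of all queries covered by at least one purchased keyword."""
    covered = set()
    for k in keywords:
        covered |= COVERS[k]
    return sum(VALUE[q] for q in covered)


def profit(keywords):
    """Revenue minus the cost of the purchased keywords."""
    return revenue(keywords) - sum(COST[k] for k in keywords)


def marginal_gain(f, keywords, extra):
    """Change in f when `extra` is added to the set `keywords`."""
    return f(set(keywords) | {extra}) - f(set(keywords))


gain_c_alone = marginal_gain(revenue, set(), "C")     # C added to nothing
gain_c_after_b = marginal_gain(revenue, {"B"}, "C")   # C added after B
```

Under these assumed edges, keyword C is worth less once B has been bought, because B already covers the same queries. That drop in marginal gain as the set grows is the diminishing-returns property.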

22
**Motivating Example: Bidding on Keywords**

Purchase keywords A, B, C, and D.

Revenue: $20

Cost: $10

Profit: $10

Costs (keywords): A $2, B $5, C $1, D $2, E $5, F $1, G $4, H $2

Values (queries): 1 $2, 2 $2, 3 $4, 4 $4, 5 $3, 6 $6, 7 $5, 8 $1

23
**Motivating Example: Bidding on Keywords**

Purchase keywords A, C, D, and F.

Revenue: $20 + $6

Cost: $10 - $4

Profit: $10, then $20

NP-Hard in General

24
**Submodular Maximization**

NP-Hard in General

Buchbinder et al. [FOCS’12] proposed the randomized double-greedy algorithm, which achieves the provably optimal approximation guarantee of ½OPT.

25
**Double Greedy Algorithm**

Process keywords serially.

Current keyword: A. Compute f(A, X, Y) and compare it with rand on a scale that runs from "Add X" to "Rem. Y".

[Figure: keywords A to F, queries 1 to 6, Set X, Set Y, and the keywords to purchase.]

26
**Double Greedy Algorithm**

Process keywords serially.

Current keyword: B. Compute f(B, X, Y) and compare it with rand on a scale that runs from "Add X" to "Rem. Y".

[Figure: keywords A to F, queries 1 to 6, Set X, Set Y, and the keywords to purchase.]

27
**Double Greedy Algorithm**

Process keywords serially.

Current keyword: C. Compute f(C, X, Y) and compare it with rand on a scale that runs from "Add X" to "Rem. Y".

[Figure: keywords A to F, queries 1 to 6, Set X, Set Y, and the keywords to purchase.]
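The serial procedure on these slides follows the usual statement of the randomized double greedy algorithm, sketched below. The function and variable names are hypothetical, and the set function `f` is supplied by the caller; the ½OPT guarantee assumes `f` is non-negative and submodular, which the sketch does not check. The toy objective is a plain sum of weights, chosen only because it makes every random decision deterministic.

```python
import random


def double_greedy(elements, f, rng=None):
    """Randomized double greedy sketch: returns the chosen subset."""
    rng = rng or random.Random()
    X = set()            # grows from the empty set
    Y = set(elements)    # shrinks from the full set
    for e in elements:   # strictly serial: each step sees the latest X and Y
        gain_add = f(X | {e}) - f(X)   # benefit of adding e to X
        gain_rem = f(Y - {e}) - f(Y)   # benefit of removing e from Y
        a = max(gain_add, 0.0)
        b = max(gain_rem, 0.0)
        # Threshold on [0, 1]; by convention add when both gains are zero.
        p_add = 1.0 if a + b == 0 else a / (a + b)
        if rng.random() < p_add:
            X.add(e)
        else:
            Y.discard(e)
    return X             # X and Y coincide once every element is processed


WEIGHTS = {"A": 3, "B": -2, "C": 1}   # toy weights (hypothetical)


def weight_sum(subset):
    """Toy objective: total weight of the chosen elements."""
    return sum(WEIGHTS[e] for e in subset)


chosen = double_greedy(["A", "B", "C"], weight_sum, rng=random.Random(0))
```

For the toy objective the threshold is always 0 or 1, so positive-weight elements are always added and negative-weight ones always removed, whatever the seed.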

28
**Concurrency Control Double Greedy Algorithm**

Process keywords in parallel. Sets X and Y are shared by all processors.

Within each processor: compute f(A, Xbnd, Ybnd), where Xbnd is a subset of the true X and Ybnd is a superset of the true Y.

Because the bounds are not exact, the scale from "Add X" to "Rem. Y" contains an uncertainty region.

[Figure: keywords A to F, queries 1 to 6, Set X, Set Y, and the keywords to purchase.]

29
**Concurrency Control Double Greedy Algorithm**

Process keywords in parallel. Sets X and Y are shared by all processors.

Within each processor: compute f(A, Xbnd, Ybnd), where Xbnd is a subset of the true X and Ybnd is a superset of the true Y.

Safe: rand falls outside the uncertainty region.

Unsafe: rand falls inside the uncertainty region. Must validate.

[Figure: keywords A to F, queries 1 to 6, Set X, Set Y, and the keywords to purchase.]
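The safe and unsafe cases can be sketched as a three-way classification of the random draw. This is an illustration under assumptions, not the system from the talk: it takes as given that each processor has already turned its bounds on X and Y into an interval for each of the two marginal gains, and the names `threshold`, `classify`, `add_gain_bounds`, and `rem_gain_bounds` are hypothetical.

```python
def threshold(gain_add, gain_rem):
    """Probability of 'add' in double greedy, with gains clipped at zero."""
    a, b = max(gain_add, 0.0), max(gain_rem, 0.0)
    return 1.0 if a + b == 0 else a / (a + b)


def classify(u, add_gain_bounds, rem_gain_bounds):
    """Decide locally if possible; otherwise defer to the validator.

    u                -- the random draw in [0, 1)
    add_gain_bounds  -- (min, max) of the gain from adding to X
    rem_gain_bounds  -- (min, max) of the gain from removing from Y
    """
    a_min, a_max = add_gain_bounds
    b_min, b_max = rem_gain_bounds
    # The true threshold lies between these two values.
    p_lo = threshold(a_min, b_max)
    p_hi = threshold(a_max, b_min)
    if u < p_lo:
        return "add"        # safe: below every possible threshold
    if u >= p_hi:
        return "remove"     # safe: at or above every possible threshold
    return "validate"       # unsafe: inside the uncertainty region


# With add gain in [2, 4] and remove gain in [1, 3], the uncertainty
# region spans 2/(2+3) = 0.4 up to 4/(4+1) = 0.8.
decisions = [classify(u, (2.0, 4.0), (1.0, 3.0)) for u in (0.1, 0.5, 0.9)]
```

The tighter the bounds, the narrower the uncertainty region and the fewer draws need to go through the validator. When the bounds are exact, the region is empty and the rule reduces to the serial decision.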

30
**Concurrency Control Double Greedy Algorithm: System Design**

Implemented in multicore (shared memory).

Model Server (Validator): holds Set X and Set Y, a Validation Queue, and the Published Bounds (X,Y).

Thread 1: reads Bound (X,Y), computes f(D, Xbnd, Ybnd), and issues the transaction "Add D to X".

Thread 2: reads Bound (X,Y), computes f(E, Xbnd, Ybnd), lands in the uncertainty region (Fail), and E goes to the Validation Queue.

31
**Provable Properties**

Correctness

Theorem: CC double greedy is serializable.

Corollary: CC double greedy preserves the optimal approximation guarantee of ½OPT.

Concurrency

Lemma: CC has bounded overhead. Set cover with costs: 2τ. Sparse max cut: 2cτ/n.

32
**Provable Properties – coord free?**

Theorem: CF double greedy is serializable.

Lemma: CF double greedy achieves approximation guarantee of ½OPT – ¼

Correctness: depends on uncertainty region, of similar order to the CC overhead!

Lemma: CC has bounded overhead. Set cover with costs: 2τ. Sparse max cut: 2cτ/n.

Concurrency

33
**Provable Properties – coord free?**

Theorem: CF double greedy is serializable.

Lemma: CF double greedy achieves approximation guarantee of ½OPT – ¼

Correctness: depends on uncertainty region, of similar order to the CC overhead!

Concurrency: CF has no coordination overhead.

34
Early Results

35
**Runtime and Strong-Scaling**

[Plots: Concurrency Ctrl. and Coordination Free, one panel per dataset.]

IT-2004: Italian Web-graph (41M Vertices, 1.1B Edges)

UK-2005: UK Web-graph (39M Vertices, 921M Edges)

Arabic-2005: Arabic Web-graph (22M Vertices, 631M Edges)

36
**Coordination and Guarantees**

[Plots: Increase in Coordination and Decrease in Objective (marked "Bad"), one panel per dataset.]

IT-2004: Italian Web-graph (41M Vertices, 1.1B Edges)

UK-2005: UK Web-graph (39M Vertices, 921M Edges)

Arabic-2005: Arabic Web-graph (22M Vertices, 631M Edges)

37
**Summary**

New primitives for robust parallel algorithm design

Exploit properties in ML algorithms

Introduced parallel algorithms for: DP-Means, Submodular Maximization

Future Work: Integrate with Velox Model Server
