Machine Learning in DryadLINQ. Kannan Achan, Mihai Budiu. MSR-SVC, 1/30/2008.

Presentation transcript:

1 Machine Learning in DryadLINQ. Kannan Achan, Mihai Budiu. MSR-SVC, 1/30/2008

2 Goal

3 The Software Stack [Diagram layers: Windows Server, Cluster Services, Distributed Filesystem (Cosmos), Dryad, DryadLINQ, Large Vector, Machine learning, Data analysis]

4 Dryad

5 Dryad Jobs [Figure: a job graph of vertices (processes) labeled M, X, and R, grouped into stages and connected by channels, reading input files and writing output files]

6 LINQ and C#

7 LINQ

    Collection<T> collection;
    bool IsLegal(Key k);
    string Hash(Key k);

    var results = from c in collection
                  where IsLegal(c.key)
                  select new { hash = Hash(c.key), c.value };
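The query on this slide is a filter followed by a projection. The sketch below is a plain Python analogue, not DryadLINQ code. The `Record` type, the sample data, and the bodies of `is_legal` and `hash_key` are hypothetical stand-ins, since the slide only declares the signatures of `IsLegal` and `Hash`.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical element type with the two fields the query touches.
    key: str
    value: int


def is_legal(key: str) -> bool:
    # Hypothetical predicate; the slide only declares IsLegal(Key).
    return key.isalnum()


def hash_key(key: str) -> str:
    # Hypothetical hash; the slide only declares Hash(Key).
    return format(sum(key.encode()) % 256, "02x")


collection = [Record("alice", 1), Record("b@d", 2), Record("carol", 3)]

# where IsLegal(c.key)  select new { Hash(c.key), c.value }
results = [(hash_key(c.key), c.value) for c in collection if is_legal(c.key)]
```

One difference worth keeping in mind: a LINQ query is evaluated lazily, while this list comprehension runs immediately.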

8 DryadLINQ = LINQ + Dryad

    Collection<T> collection;
    bool IsLegal(Key k);
    string Hash(Key k);

    var results = from c in collection
                  where IsLegal(c.key)
                  select new { hash = Hash(c.key), c.value };

[Figure labels: C# collection, C# results, vertex code, query plan (Dryad job), data]

9 Recall: The Software Stack [Diagram layers: Windows Server, Cluster Services, Distributed Filesystem (Cosmos), Dryad, DryadLINQ, Large Vector, Machine learning, Data analysis]

10 Very Large Vector Library: PartitionedVector<T> and Scalar<T> [Figure: a PartitionedVector<T> drawn as several blocks of T, a Scalar<T> as a single T]

11 Operations on Large Vectors: Map 1. f : T → U is applied to each element of the vector; preserves partitioning.
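A minimal sketch of the element-wise map, assuming a partitioned vector can be modeled as a list of partitions, each a plain list that would live on one machine. The name `map1` and this representation are illustrative assumptions, not the library's API.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

# Assumption: a partitioned vector is a list of partitions (plain lists).
PartitionedVector = List[List[T]]


def map1(vec: List[List[T]], f: Callable[[T], U]) -> List[List[U]]:
    """Apply f to every element; partition boundaries are left unchanged."""
    return [[f(x) for x in part] for part in vec]


v = [[1, 2, 3], [4, 5], [6]]
squares = map1(v, lambda x: x * x)
```

Because each output partition depends only on the matching input partition, every partition can be processed independently.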

12 Map 2 (Pairwise). f : (T, U) → V is applied to corresponding elements of a vector of T and a vector of U, giving a vector of V.

13 Map 3 (Vector-Scalar). f : (T, U) → V combines each element of a vector with one shared scalar, giving a vector of V.
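The pairwise and vector-scalar maps can be sketched the same way, again modeling a partitioned vector as a list of lists. The names `map2` and `map3` are hypothetical, and the requirement that both inputs of the pairwise map be partitioned identically is an assumption made here to keep the sketch simple; the slides do not state it.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")
V = TypeVar("V")


def map2(left: List[List[T]], right: List[List[U]],
         f: Callable[[T, U], V]) -> List[List[V]]:
    """Pairwise map: combine corresponding elements of two vectors."""
    # Assumption: both vectors have the same partition sizes.
    if [len(p) for p in left] != [len(p) for p in right]:
        raise ValueError("vectors must be partitioned identically")
    return [[f(a, b) for a, b in zip(pl, pr)] for pl, pr in zip(left, right)]


def map3(vec: List[List[T]], scalar: U,
         f: Callable[[T, U], V]) -> List[List[V]]:
    """Vector-scalar map: combine every element with one shared value."""
    return [[f(a, scalar) for a in part] for part in vec]


sums = map2([[1, 2], [3]], [[10, 20], [30]], lambda a, b: a + b)
scaled = map3([[1, 2], [3]], 10, lambda a, s: a * s)
```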

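14 Reduce (Fold). f : (U, U) → U [Figure: f combines the elements within each partition, then a final f combines the per-partition results into a single U]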
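A sketch of a two-level fold over a list-of-lists vector: reduce inside each partition, then reduce the partial results. The function name `fold` is hypothetical. The result matches a sequential fold only when `f` is associative.

```python
from functools import reduce
from typing import Callable, List, TypeVar

U = TypeVar("U")


def fold(vec: List[List[U]], f: Callable[[U, U], U]) -> U:
    """Two-level fold: reduce each partition, then reduce the partials."""
    # Empty partitions contribute nothing and are skipped.
    partials = [reduce(f, part) for part in vec if part]
    # Raises TypeError if every partition is empty (no initial value).
    return reduce(f, partials)


total = fold([[1, 2, 3], [4, 5], [6]], lambda a, b: a + b)
largest = fold([[3, 1], [7], [2, 5]], max)
```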

15 Linear Algebra: T, U, V = [three symbols lost in extraction]

16 Linear Regression. Data: pairs $(x_t, y_t)$. Find: a matrix $A$. S.t.: $A x_t \approx y_t$.

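17 Analytic Solution: $A = \left(\sum_t y_t x_t^T\right)\left(\sum_t x_t x_t^T\right)^{-1}$ [Figure: partitions X[0], X[1], X[2] and Y[0], Y[1], Y[2] feed Map stages that compute X×X^T and Y×X^T; Reduce stages (Σ) sum each; the first sum is inverted and multiplied with the second to give A]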

18 Linear Regression Code

    Matrices xx = x.PairwiseOuterProduct(x);    // x[t] * x[t]^T for each t
    OneMatrix xxs = xx.Sum();                   // sum over t
    Matrices yx = y.PairwiseOuterProduct(x);    // y[t] * x[t]^T for each t
    OneMatrix yxs = yx.Sum();                   // sum over t
    OneMatrix xxinv = xxs.Map(a => a.Inverse());
    OneMatrix A = yxs.Map(xxinv, (a, b) => a.Multiply(b));
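The same four steps (pairwise outer products, sums, inverse, multiply) can be checked on a single machine. The sketch below is plain Python with exact `Fraction` arithmetic, not the Large Vector library. The helper names and the tiny data set, generated from y = 2·x0 + 3·x1, are assumptions made for illustration.

```python
from fractions import Fraction
from functools import reduce


def outer(a, b):
    """Outer product a * b^T of two vectors, as a list of rows."""
    return [[ai * bj for bj in b] for ai in a]


def mat_add(p, q):
    """Element-wise sum of two equally sized matrices."""
    return [[u + v for u, v in zip(rp, rq)] for rp, rq in zip(p, q)]


def mat_mul(p, q):
    """Matrix product p * q."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*q)]
            for row in p]


def inverse(m):
    """Gauss-Jordan inverse of a square matrix; raises if singular."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(xs, ys):
    """A = (sum_t y_t x_t^T) * (sum_t x_t x_t^T)^-1."""
    xxs = reduce(mat_add, [outer(x, x) for x in xs])             # map, then sum
    yxs = reduce(mat_add, [outer(y, x) for y, x in zip(ys, xs)])  # map, then sum
    return mat_mul(yxs, inverse(xxs))


# Hypothetical data: y = 2*x0 + 3*x1, so the fit should recover [[2, 3]].
xs = [[1, 0], [0, 1], [1, 1], [2, 1]]
ys = [[2 * x0 + 3 * x1] for x0, x1 in xs]
A = fit(xs, ys)
```

The two `reduce` calls are the only steps that need all the data; everything after them operates on small matrices.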

19 Expectation Maximization [Figure: 160 lines; 3 iterations shown]
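The slides do not show the EM code or state the model. The parameter names π, σ, μ on the EM Structure backup slide suggest a Gaussian mixture, so the sketch below assumes a one-dimensional Gaussian mixture and shows a single-machine EM iteration. All names and the sample data are hypothetical. The E step is an independent computation per data point, and the M step consists of weighted sums, which is the map-then-reduce shape used elsewhere in the talk.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_step(data, pi, mu, sigma):
    """One EM iteration for a 1-D Gaussian mixture (assumed model)."""
    k = len(pi)
    # E step: responsibility of each component for each point.
    resp = []
    for x in data:
        w = [pi[j] * normal_pdf(x, mu[j], sigma[j]) for j in range(k)]
        total = sum(w)
        resp.append([wj / total for wj in w])
    # M step: re-estimate the parameters from weighted sums.
    n_j = [sum(r[j] for r in resp) for j in range(k)]
    new_pi = [n / len(data) for n in n_j]
    new_mu = [sum(r[j] * x for r, x in zip(resp, data)) / n_j[j]
              for j in range(k)]
    new_sigma = [
        # The floor keeps a component from collapsing to zero width.
        max(1e-6, math.sqrt(
            sum(r[j] * (x - new_mu[j]) ** 2 for r, x in zip(resp, data)) / n_j[j]))
        for j in range(k)
    ]
    return new_pi, new_mu, new_sigma


# Hypothetical data: two well-separated groups around 0 and 10.
data = [-0.2, -0.1, 0.0, 0.1, 0.2, 9.8, 9.9, 10.0, 10.1, 10.2]
pi, mu, sigma = [0.5, 0.5], [1.0, 8.0], [1.0, 1.0]
for _ in range(20):
    pi, mu, sigma = em_step(data, pi, mu, sigma)
```

This sketch does not guard against a component that receives no weight at all, or a point that is far from every component; a robust version would work with log-densities.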

20 Understanding Botnet Traffic using EM: 3 GB data, 15 clusters, 60 computers, 50 iterations, 9000 processes, 50 minutes

21 Conclusions. Dryad simplifies programming large clusters. DryadLINQ = declarative programming for Dryad jobs. The Large Vector library provides simple mathematical primitives on top of DryadLINQ. Matlab-style coding for writing distributed numeric computations. [Diagram layers: Win, Cluster Services, Distributed Filesystem, Dryad, DryadLINQ, Large Vector, ML, Data analysis]

22 Backup Slides

23 Chaining [Figure labels: X[0], X[1], X[2], Y[0], Y[1], Y[2], X×X^T, Y×X^T, Σ (two, plus a row of six), [ ]^-1, *, A]

24 EM Structure [Figure labels: E stage, input size, π, σ, μ, all parameters]

