Published by Karley Baptiste. Modified over 2 years ago.

1
Machine Learning in DryadLINQ
Kannan Achan, Mihai Budiu
MSR-SVC, 1/30/2008

2
Goal

3
The Software Stack
[Figure: layered stack with boxes labeled Windows Server, Cluster Services, Distributed Filesystem (Cosmos), Dryad, DryadLINQ, Large Vector, Machine learning, and Data analysis.]

4
Dryad

5
Dryad Jobs
[Figure: a job graph. Vertices (processes) labeled R, X, and M are grouped into stages and connected by channels; the graph reads input files and writes output files.]

6
LINQ and C#

7
LINQ
Collection collection;
bool IsLegal(Key);
string Hash(Key);
var results = from c in collection
              where IsLegal(c.key)
              select new { Hash(c.key), c.value };
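The query on this slide can be tried on a single machine with plain LINQ to Objects. The sketch below is only an illustration: the `Entry` type and the bodies of `IsLegal` and `Hash` are hypothetical stand-ins, since the slide shows their signatures only. One detail differs from the slide: in compilable C#, an anonymous-type member initialized from a method call needs an explicit name, so the sketch uses `HashedKey = Hash(c.Key)`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; the slide only shows that items have a key and a value.
public class Entry
{
    public string Key;
    public int Value;
}

public static class LinqDemo
{
    // Hypothetical predicate: a key is "legal" when it is non-empty.
    public static bool IsLegal(string key) => key.Length > 0;

    // Hypothetical hash: upper-cases the key (a placeholder, not a real hash).
    public static string Hash(string key) => key.ToUpperInvariant();

    public static List<string> Query(IEnumerable<Entry> collection)
    {
        // Same shape as the slide's query: filter, then project.
        var results = from c in collection
                      where IsLegal(c.Key)
                      select new { HashedKey = Hash(c.Key), c.Value };
        return results.Select(r => r.HashedKey + "=" + r.Value).ToList();
    }

    public static void Main()
    {
        var collection = new List<Entry>
        {
            new Entry { Key = "a", Value = 1 },
            new Entry { Key = "",  Value = 2 },   // filtered out by IsLegal
            new Entry { Key = "b", Value = 3 },
        };
        foreach (var line in Query(collection))
            Console.WriteLine(line);
    }
}
```

The query is only a description of the computation until it is enumerated, which is the property that lets a system such as DryadLINQ turn it into a distributed plan instead of running it locally.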

8
DryadLINQ = LINQ + Dryad
Collection collection;
bool IsLegal(Key k);
string Hash(Key);
var results = from c in collection
              where IsLegal(c.key)
              select new { Hash(c.key), c.value };
[Figure labels: C# (collection); query plan (Dryad job); vertex code; data; C# (results).]

9
Recall: The Software Stack
[Figure: layered stack with boxes labeled Windows Server, Cluster Services, Distributed Filesystem (Cosmos), Dryad, DryadLINQ, Large Vector, Machine learning, and Data analysis.]

10
Very Large Vector Library
[Figure: a PartitionedVector made of several partitions holding elements of type T, and a Scalar holding a single T.]

11
Operations on Large Vectors: Map 1
[Figure: f is applied to each partition of a vector of T and produces a vector of U.]
f preserves partitioning
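A single-machine sketch of this element-wise map may help. It assumes a partitioned vector can be modeled as an array of arrays (one inner array per partition); that representation and the name `Map` are illustrative choices, not the Large Vector library's API.

```csharp
using System;
using System.Linq;

public static class MapDemo
{
    // Applies f to every element, partition by partition.
    // The output has the same number of partitions, each of the same size.
    public static U[][] Map<T, U>(T[][] partitions, Func<T, U> f) =>
        partitions.Select(p => p.Select(f).ToArray()).ToArray();

    public static void Main()
    {
        int[][] v = { new[] { 1, 2 }, new[] { 3 }, new[] { 4, 5, 6 } };
        string[][] w = Map(v, x => "#" + x);
        // Prints: #1,#2 | #3 | #4,#5,#6
        Console.WriteLine(string.Join(" | ", w.Select(p => string.Join(",", p))));
    }
}
```

Because each partition is processed independently, no data has to move between partitions, which is what "preserves partitioning" means here.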

12
Map 2 (Pairwise)
[Figure: f combines corresponding elements of a vector of T and a vector of U into a vector of V.]
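The pairwise map can be sketched the same way. This version assumes both inputs are arrays of arrays with identical partition layouts, and the name `MapPairwise` is hypothetical.

```csharp
using System;
using System.Linq;

public static class PairwiseDemo
{
    // Combines element i of `left` with element i of `right` using f.
    // Both vectors must be partitioned identically.
    public static V[][] MapPairwise<T, U, V>(T[][] left, U[][] right, Func<T, U, V> f)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("partition counts differ");
        return left.Zip(right, (p, q) =>
        {
            if (p.Length != q.Length)
                throw new ArgumentException("partition sizes differ");
            return p.Zip(q, f).ToArray();
        }).ToArray();
    }

    public static void Main()
    {
        int[][] a = { new[] { 1, 2 }, new[] { 3 } };
        int[][] b = { new[] { 10, 20 }, new[] { 30 } };
        int[][] sum = MapPairwise(a, b, (x, y) => x + y);
        // Prints: 11,22 | 33
        Console.WriteLine(string.Join(" | ", sum.Select(p => string.Join(",", p))));
    }
}
```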

13
Map 3 (Vector-Scalar)
[Figure: f combines each element of a vector of T with a single scalar U and produces a vector of V.]
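For the vector-scalar variant, the same scalar is supplied to f alongside every element. As before, the array-of-arrays model and the name `MapScalar` are assumptions made for illustration.

```csharp
using System;
using System.Linq;

public static class VectorScalarDemo
{
    // Applies f(element, scalar) to every element; the scalar is shared by all partitions.
    public static V[][] MapScalar<T, U, V>(T[][] vector, U scalar, Func<T, U, V> f) =>
        vector.Select(p => p.Select(x => f(x, scalar)).ToArray()).ToArray();

    public static void Main()
    {
        int[][] v = { new[] { 4, 6 }, new[] { 8 } };
        int offset = 6;
        // Example use: shift every element by a value computed elsewhere.
        int[][] shifted = MapScalar(v, offset, (x, c) => x - c);
        // Prints: -2,0 | 2
        Console.WriteLine(string.Join(" | ", shifted.Select(p => string.Join(",", p))));
    }
}
```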

14
Reduce (Fold)
[Figure: f folds the elements of a vector of U into a single U.]
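One plausible way to realize such a fold over partitions is in two levels: fold each partition locally, then fold the partial results. The sketch below assumes f is associative (otherwise the two-level result can differ from a sequential fold); the name `Reduce` and the array-of-arrays model are illustrative.

```csharp
using System;
using System.Linq;

public static class ReduceDemo
{
    // Two-level fold: a local fold per partition, then a fold over the partial results.
    // Assumes f is associative. Throws InvalidOperationException if every partition is empty.
    public static U Reduce<U>(U[][] partitions, Func<U, U, U> f) =>
        partitions.Where(p => p.Length > 0)
                  .Select(p => p.Aggregate(f))   // one partial result per non-empty partition
                  .Aggregate(f);                 // combine the partial results

    public static void Main()
    {
        int[][] v = { new[] { 1, 2 }, new[] { 3 }, new[] { 4, 5, 6 } };
        // Prints: 21
        Console.WriteLine(Reduce(v, (a, b) => a + b));
    }
}
```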

15
Linear Algebra
[Slide content not recoverable from the transcript: it defines the element types T, U, V.]

16
Linear Regression
Data: pairs $(x_t, y_t)$
Find: a matrix $A$
S.t.: $A x_t \approx y_t$

17
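Analytic Solution
$A = \left(\sum_t y_t x_t^T\right)\left(\sum_t x_t x_t^T\right)^{-1}$
[Figure: partitions X[0], X[1], X[2] and Y[0], Y[1], Y[2] feed a Map stage that computes X×X^T and Y×X^T; a Reduce stage sums (Σ) each set of products; the first sum is inverted and multiplied with the second to give A.]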
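For readers who want the step behind this closed form, here is a short derivation. It assumes the usual least-squares objective, which the slide does not state explicitly, and that $\sum_t x_t x_t^T$ is invertible.

```latex
\begin{align*}
E(A) &= \sum_t \lVert A x_t - y_t \rVert^2 \\
\frac{\partial E}{\partial A} &= 2 \sum_t (A x_t - y_t)\, x_t^T = 0 \\
A \sum_t x_t x_t^T &= \sum_t y_t x_t^T \\
A &= \Bigl(\sum_t y_t x_t^T\Bigr) \Bigl(\sum_t x_t x_t^T\Bigr)^{-1}
\end{align*}
```

Both sums are over independent per-sample terms, which is why they fit the Map (outer products) and Reduce (summation) stages in the figure.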

18
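Linear Regression Code
// Map: per-sample outer products x_t x_t^T. Reduce: their sum.
Matrices xx = x.PairwiseOuterProduct(x);
OneMatrix xxs = xx.Sum();
// The same for y_t x_t^T.
Matrices yx = y.PairwiseOuterProduct(x);
OneMatrix yxs = yx.Sum();
// Invert the first sum, then multiply: A = yxs * xxs^-1.
OneMatrix xxinv = xxs.Map(a => a.Inverse());
OneMatrix A = yxs.Map(xxinv, (a, b) => a.Multiply(b));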
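The same computation can be checked on a single machine without the Large Vector library. The sketch below is a stand-in, not the library's implementation: samples are plain `double[]` arrays, matrices are `double[,]`, and all helper names (`Outer`, `Add`, `Multiply`, `Inverse`, `Fit`) are hypothetical. The inverse uses Gauss-Jordan elimination with partial pivoting, chosen here only for brevity.

```csharp
using System;
using System.Linq;

public static class RegressionDemo
{
    // Outer product a * b^T.
    public static double[,] Outer(double[] a, double[] b)
    {
        var m = new double[a.Length, b.Length];
        for (int i = 0; i < a.Length; i++)
            for (int j = 0; j < b.Length; j++)
                m[i, j] = a[i] * b[j];
        return m;
    }

    // Element-wise sum of two equally sized matrices.
    public static double[,] Add(double[,] a, double[,] b)
    {
        var m = (double[,])a.Clone();
        for (int i = 0; i < m.GetLength(0); i++)
            for (int j = 0; j < m.GetLength(1); j++)
                m[i, j] += b[i, j];
        return m;
    }

    public static double[,] Multiply(double[,] a, double[,] b)
    {
        int rows = a.GetLength(0), inner = a.GetLength(1), cols = b.GetLength(1);
        var m = new double[rows, cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                for (int k = 0; k < inner; k++)
                    m[i, j] += a[i, k] * b[k, j];
        return m;
    }

    // Gauss-Jordan inverse with partial pivoting on the augmented matrix [a | I].
    public static double[,] Inverse(double[,] a)
    {
        int n = a.GetLength(0);
        var w = new double[n, 2 * n];
        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++) w[i, j] = a[i, j];
            w[i, n + i] = 1.0;
        }
        for (int col = 0; col < n; col++)
        {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.Abs(w[r, col]) > Math.Abs(w[pivot, col])) pivot = r;
            if (Math.Abs(w[pivot, col]) < 1e-12)
                throw new InvalidOperationException("singular matrix");
            for (int j = 0; j < 2 * n; j++)
            {
                double t = w[col, j]; w[col, j] = w[pivot, j]; w[pivot, j] = t;
            }
            double d = w[col, col];
            for (int j = 0; j < 2 * n; j++) w[col, j] /= d;
            for (int r = 0; r < n; r++)
            {
                if (r == col) continue;
                double factor = w[r, col];
                for (int j = 0; j < 2 * n; j++) w[r, j] -= factor * w[col, j];
            }
        }
        var inv = new double[n, n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                inv[i, j] = w[i, n + j];
        return inv;
    }

    // A = (sum_t y_t x_t^T) * (sum_t x_t x_t^T)^-1, mirroring the slide's steps.
    public static double[,] Fit(double[][] x, double[][] y)
    {
        var xxs = x.Select(v => Outer(v, v)).Aggregate((a, b) => Add(a, b));
        var yxs = y.Zip(x, (yt, xt) => Outer(yt, xt)).Aggregate((a, b) => Add(a, b));
        return Multiply(yxs, Inverse(xxs));
    }

    public static void Main()
    {
        // Samples generated from y = 2*x0 + 3*x1, so A should come out close to [2, 3].
        double[][] x = { new[] { 1.0, 0.0 }, new[] { 0.0, 1.0 }, new[] { 1.0, 1.0 } };
        double[][] y = { new[] { 2.0 }, new[] { 3.0 }, new[] { 5.0 } };
        double[,] a = Fit(x, y);
        Console.WriteLine(a[0, 0] + " " + a[0, 1]);
    }
}
```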

19
Expectation Maximization
160 lines; 3 iterations shown

20
Understanding Botnet Traffic using EM
3 GB data; 15 clusters; 60 computers; 50 iterations; 9000 processes; 50 minutes

21
Conclusions
Dryad simplifies programming large clusters
DryadLINQ = declarative programming for Dryad jobs
The Large Vector library provides simple mathematical primitives on top of DryadLINQ
Matlab-style coding for writing distributed numeric computations
[Figure: the software stack (Win, Cluster Services, Distributed Filesystem, Dryad, DryadLINQ, Large Vector, ML, Data analysis).]

22
Backup Slides

23
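Chaining
[Figure: the Analytic Solution dataflow (X[0], X[1], X[2] and Y[0], Y[1], Y[2] feeding X×X^T and Y×X^T, then Σ, inverse, and multiply to give A), drawn with additional Σ vertices.]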

24
EM Structure
[Figure labels: E stage; input size; π; σ; μ; all parameters.]
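The labels π, σ, and μ suggest a Gaussian mixture model, although the slides do not give the model or the code. Under that assumption, one EM iteration for a one-dimensional mixture can be sketched as an E step that maps over the input and an M step that reduces the E-step output into new parameters. Everything below (the names, the 1-D restriction, the variance floor) is an illustrative choice and not the authors' 160-line program.

```csharp
using System;
using System.Linq;

public static class EmDemo
{
    static double Gaussian(double x, double mu, double sigma) =>
        Math.Exp(-0.5 * Math.Pow((x - mu) / sigma, 2)) / (sigma * Math.Sqrt(2 * Math.PI));

    // One EM iteration for a K-component 1-D Gaussian mixture; updates pi, mu, sigma in place.
    public static void Step(double[] data, double[] pi, double[] mu, double[] sigma)
    {
        int k = pi.Length;

        // E step (a map over the input): responsibility of each component for each point.
        double[][] resp = data.Select(x =>
        {
            double[] w = Enumerable.Range(0, k)
                                   .Select(j => pi[j] * Gaussian(x, mu[j], sigma[j]))
                                   .ToArray();
            double total = w.Sum();
            return w.Select(v => v / total).ToArray();
        }).ToArray();

        // M step (sums over the E-step output): re-estimate all parameters.
        for (int j = 0; j < k; j++)
        {
            double nj = resp.Sum(r => r[j]);
            double mean = data.Zip(resp, (x, r) => r[j] * x).Sum() / nj;
            double variance = data.Zip(resp, (x, r) => r[j] * (x - mean) * (x - mean)).Sum() / nj;
            pi[j] = nj / data.Length;
            mu[j] = mean;
            // Floor the variance so a component cannot collapse to zero width (an assumption).
            sigma[j] = Math.Sqrt(Math.Max(variance, 1e-6));
        }
    }

    public static void Main()
    {
        double[] data = { -0.5, 0.0, 0.5, 9.5, 10.0, 10.5 };
        double[] pi = { 0.5, 0.5 }, mu = { 1.0, 8.0 }, sigma = { 1.0, 1.0 };
        for (int iter = 0; iter < 10; iter++)
            Step(data, pi, mu, sigma);
        // The two means should settle near the two cluster centers, 0 and 10.
        Console.WriteLine(mu[0] + " " + mu[1]);
    }
}
```

The E step needs the full input but only the small parameter set, while the M step produces only the small parameter set, which is consistent with the "input size" and "all parameters" labels on the slide.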
