
1 Distributed computing using Dryad
Michael Isard
Microsoft Research Silicon Valley

2 Distributed Data-Parallel Computing
Cloud
– Transparent scaling
– Resource virtualization
Commodity clusters
– Fault tolerance with good performance
Workloads beyond standard SQL, HPC
– Data-mining, graph analysis, …
– Semi-structured/unstructured data

3 Execution layer
This talk: system-level middleware
– Yuan Yu will describe the DryadLINQ programming model on Saturday
Algorithm -> execution plan by magic

4 Problem domain
Large inputs
– Tens of GB is a “small test dataset”
– Single job up to hundreds of TB
– Semi-structured data is common
Not latency sensitive
– Overhead of seconds for a trivial job
– A large job could take days
– Batch computation, not online queries
– Simplifies fault tolerance, caching, etc.

5 Talk overview
Some typical computations
DAG implementation choices
The Dryad execution engine
Comparison with MapReduce
Discussion

6 Map
Independent transformation of a dataset
– for each x in S, output x' = f(x)
E.g. simple grep for word w
– output line x only if x contains w

7 Map
Independent transformation of a dataset
– for each x in S, output x' = f(x)
E.g. simple grep for word w
– output line x only if x contains w
[Diagram: S -> f -> S']

8 Map
Independent transformation of a dataset
– for each x in S, output x' = f(x)
E.g. simple grep for word w
– output line x only if x contains w
[Diagram: each partition is mapped independently: S1 -> f -> S1', S2 -> f -> S2', S3 -> f -> S3']
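A minimal Python sketch of this map stage (not Dryad code): each partition is transformed independently, and the grep example emits a line only if it contains the word. The names map_partition and grep are illustrative.

```python
# Map: apply f independently to every record x of a partition;
# f returns a (possibly empty) list of outputs, so it can also filter.
def map_partition(partition, f):
    out = []
    for x in partition:
        out.extend(f(x))
    return out

# Simple grep for word w: output the line only if it contains w.
def grep(w):
    return lambda line: [line] if w in line else []

partitions = [["the quick fox", "lazy dog"], ["fox again", "nothing here"]]
result = [map_partition(p, grep("fox")) for p in partitions]
# result == [["the quick fox"], ["fox again"]]
```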

9 Reduce
Grouping plus aggregation
– 1) Group x in S according to key selector k(x)
– 2) For each group g, output r(g)
E.g. simple word count
– group by k(x) = x
– for each group g, output the key (word) and the count of g

10 Reduce
Grouping plus aggregation
– 1) Group x in S according to key selector k(x)
– 2) For each group g, output r(g)
E.g. simple word count
– group by k(x) = x
– for each group g, output the key (word) and the count of g
[Diagram: S -> G (group) -> r (reduce) -> S']
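A minimal Python sketch of the two reduce steps on this slide, with word count as the example; reduce_dataset, k, and r are illustrative names, not part of any Dryad API.

```python
from collections import defaultdict

# Reduce: group records by a key selector k, then apply r to each group.
def reduce_dataset(S, k, r):
    groups = defaultdict(list)
    for x in S:
        groups[k(x)].append(x)               # 1) group x in S by k(x)
    return [r(g) for g in groups.values()]   # 2) for each group g, output r(g)

# Word count: group by the word itself, output (word, count of group).
words = ["fox", "dog", "fox", "fox"]
counts = reduce_dataset(words, k=lambda x: x, r=lambda g: (g[0], len(g)))
# counts == [("fox", 3), ("dog", 1)]
```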

11 Reduce
[Diagram: S -> G (group) -> r (reduce) -> S']

12 Reduce
D is distribute, e.g. by hash or range
[Diagram: inputs S1, S2, S3 each pass through D (distribute); each output partition S1', S2', S3' is produced by G (group) then r (reduce)]

13 Reduce
ir is initial reduce, e.g. compute a partial sum
[Diagram: each input partition is grouped (G) and partially reduced (ir) before D (distribute); each output partition groups (G) the incoming partial results and applies the final reduce (r)]
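The distributed plan on slides 12 and 13 could be sketched in Python roughly as follows, with word count again as the example: each input partition computes partial counts (the initial reduce ir), D hash-distributes them, and each output partition merges what arrives (the final reduce r). All function names are illustrative.

```python
from collections import Counter

# Initial reduce (ir): partial word counts within one input partition.
def initial_reduce(partition):
    return Counter(partition)

# Distribute (D): route each (word, partial count) to an output partition by hash.
def distribute(partial, n_out):
    buckets = [Counter() for _ in range(n_out)]
    for word, c in partial.items():
        buckets[hash(word) % n_out][word] += c
    return buckets

# Final reduce (r): merge the partial counts arriving at one output partition.
def final_reduce(incoming):
    total = Counter()
    for partial in incoming:
        total += partial
    return dict(total)

inputs = [["fox", "dog", "fox"], ["dog", "cat"], ["fox"]]
n_out = 2
partials = [distribute(initial_reduce(p), n_out) for p in inputs]
outputs = [final_reduce([p[i] for p in partials]) for i in range(n_out)]
# Each word's total count ends up in exactly one output partition.
```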

14 K-means
Set of points P, initial set of cluster centres C
Iterate until convergence:
– For each c in C
Initialize count_c, centre_c to 0
– For each p in P
Find the c in C that minimizes dist(p,c)
Update: count_c += 1, centre_c += p
– For each c in C
Replace c <- centre_c / count_c
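A plain Python sketch of one iteration of the k-means loop written on this slide, using 2-D points and a fixed number of iterations in place of the convergence test; kmeans_step is an illustrative name.

```python
import math

def kmeans_step(P, C):
    k, d = len(C), len(C[0])
    count = [0] * k
    centre = [[0.0] * d for _ in range(k)]
    for p in P:
        # Find the c in C that minimizes dist(p, c).
        j = min(range(k), key=lambda i: math.dist(p, C[i]))
        count[j] += 1
        centre[j] = [s + x for s, x in zip(centre[j], p)]
    # Replace c <- centre_c / count_c (keep the old centre if its cluster is empty).
    return [[s / count[j] for s in centre[j]] if count[j] else C[j]
            for j in range(k)]

P = [(0, 0), (0, 1), (10, 10), (10, 11)]
C = [(0, 0), (10, 10)]
for _ in range(5):          # iterate (until convergence in the real algorithm)
    C = kmeans_step(P, C)
# C is now approximately [[0, 0.5], [10, 10.5]]
```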

15 K-means
[Diagram: iterations unrolled; centres C0 and points P feed an update stage (ac, cc) producing C1, then C2, then C3]

16 K-means
[Diagram: points partitioned into P1, P2, P3; each iteration runs per-partition stages (ac) whose results are combined (cc) to produce C1, C2, C3 from C0]

17 Graph algorithms
Set N of nodes with data (n,x)
Set E of directed edges (n,m)
Iterate until convergence:
– For each node (n,x) in N
For each outgoing edge n->m in E, compute n_m = f(x,n,m)
– For each node (m,x) in N
Find the set of incoming updates i_m = {n_m : n->m in E}
Replace (m,x) <- (m, r(i_m))
E.g. power iteration (PageRank)
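A small Python sketch of this generic node-update iteration, instantiated as a simplified power iteration (PageRank without damping or dangling-node handling): f spreads a node's value evenly over its out-edges and r sums the incoming updates. The function iterate and the tiny example graph are illustrative.

```python
from collections import defaultdict

def iterate(N, E, f, r, steps):
    # N: dict node -> data x; E: list of directed edges (n, m).
    out_edges = defaultdict(list)
    for n, m in E:
        out_edges[n].append(m)
    for _ in range(steps):
        incoming = defaultdict(list)
        # For each node (n, x), send an update n_m = f(x, n, m) along each out-edge.
        for n, x in N.items():
            for m in out_edges[n]:
                incoming[m].append(f(x, n, m))
        # Replace each node's data with r applied to its incoming updates.
        N = {m: r(incoming[m]) for m in N}
    return N

# Simplified power iteration: spread value evenly over out-edges, sum arrivals.
E = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
outdeg = defaultdict(int)
for n, _ in E:
    outdeg[n] += 1
N = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
ranks = iterate(N, E, f=lambda x, n, m: x / outdeg[n], r=sum, steps=20)
```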

18 PageRank
[Diagram: iterations unrolled; node data N0 and edges E feed per-iteration stages (ae, de) producing N1, then N2, then N3]

19 PageRank
[Diagram: the unrolled plan with node data and edges partitioned (N0_1..N0_3, E1..E3); each iteration runs per-partition stages (ae, cc) and distribute vertices (D) to produce the next N1, N2, N3 partitions]

20 DAG abstraction
Absence of cycles
– Allows re-execution for fault tolerance
– Simplifies scheduling: no deadlock
Cycles can often be replaced by unrolling
– Unsuitable for fine-grain inner loops
Very popular
– Databases, functional languages, …

21 Rewrite graph at runtime
Loop unrolling with convergence tests
Adapt partitioning scheme at run time
– Choose #partitions based on runtime data volume (see the sketch below)
– Broadcast Join vs. Hash Join, etc.
Adaptive aggregation and distribution trees
– Based on data skew and network topology
Load balancing
– Data/processing skew (cf. work-stealing)
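A rough Python illustration of the "#partitions from runtime data volume" rewrite mentioned above; the 1 GB target per partition and the clamping bounds are made-up numbers, not values used by Dryad.

```python
def choose_partition_count(observed_bytes,
                           target_bytes_per_partition=1 << 30,  # ~1 GB, illustrative
                           min_parts=1, max_parts=10_000):
    # Pick enough partitions that each holds roughly the target volume,
    # clamped to sane bounds; a real system would also consider skew and topology.
    parts = -(-observed_bytes // target_bytes_per_partition)  # ceiling division
    return max(min_parts, min(max_parts, parts))

# e.g. 250 GB observed at run time -> 250 downstream partitions
n = choose_partition_count(250 * (1 << 30))
```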

22 Push vs Pull
Databases typically 'pull' using the iterator model
– Avoids buffering
– Can prevent unnecessary computation
But the DAG must be fully materialized
– Complicates rewriting
– Prevents resource virtualization in a shared cluster
[Diagram: a two-input reduce plan; S1 and S2 pass through D (distribute), then G (group) and r (reduce) produce S1' and S2']
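A toy Python contrast between the two styles (not how any real engine is written): in the pull model the consumer drives an iterator on demand, while in the push model the producer runs eagerly and writes into a buffer that a downstream vertex can read whenever it is scheduled.

```python
# Pull: the consumer drives its upstream iterator on demand (no buffering).
def pull_map(source, f):
    for x in source:
        yield f(x)

upper = pull_map(iter(["a", "b", "c"]), str.upper)
next(upper)   # 'A', computed only when requested

# Push: the producer runs to completion and pushes results downstream,
# e.g. into a buffer (file/channel) that a later vertex reads.
def push_map(source, f, emit):
    for x in source:
        emit(f(x))

buffer = []
push_map(["a", "b", "c"], str.upper, buffer.append)
# buffer == ['A', 'B', 'C']; the downstream vertex can run whenever it is scheduled
```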

23 Fault tolerance
Buffer data in (some) edges
Re-execute on failure using buffered data
Speculatively re-execute for stragglers
The 'push' model makes this very simple
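A schematic Python sketch of re-execution from buffered edges: because a vertex is a deterministic function of its (already buffered) inputs, a failed attempt can simply be run again; the retry count and error handling here are arbitrary choices, not Dryad's.

```python
def run_vertex_with_retries(vertex_fn, buffered_inputs, max_attempts=3):
    # The inputs are already buffered on upstream edges, so a failed attempt
    # can be re-executed without recomputing anything upstream.
    for attempt in range(1, max_attempts + 1):
        try:
            return vertex_fn(buffered_inputs)
        except Exception:
            if attempt == max_attempts:
                raise
            # A real scheduler would re-run the vertex, possibly on another machine.
```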

24 Dryad
General-purpose execution engine
– Batch processing on immutable datasets
– Well-tested on large clusters
Automatically handles
– Fault tolerance
– Distribution of code and intermediate data
– Scheduling of work to resources

25 Dryad System Architecture
[Architecture diagram (progressive build): a scheduler coordinating execution across the cluster]

26 Dryad System Architecture
[Architecture diagram, continued]

27 Dryad System Architecture
[Architecture diagram, continued]

28 Dryad Job Model
Directed acyclic graph (DAG)
Clean abstraction
– Hides cluster services
– Clients manipulate graphs
Flexible and expressive
– General-purpose programs
– Complicated execution plans

29 Dryad Inputs and Outputs
Partitioned data set
– Records do not cross partition boundaries
– Data on compute machines: NTFS, SQLServer, …
Optional semantics
– Hash-partition, range-partition, sorted, etc.
Loading external data
– Partitioning “automatic”
– File system chooses sensible partition sizes
– Or known partitioning from user

30 Channel abstraction
[Diagram: the reduce plan from slide 13 (G, ir, D on each input; G, r on each output); each edge between vertices is a channel]

31 Push vs Pull
Channel types define connected components
– Shared-memory or TCP channels must be gang-scheduled
Pull within a gang, push between gangs

32 MapReduce (Hadoop)
MapReduce restricts
– Topology of the DAG
– Semantics of the function in a compute vertex
Sequence of instances for non-trivial tasks
[Diagram: the reduce plan from slide 13 with a map vertex f inserted on each input S1, S2, S3 before G and ir]

33 MapReduce complexity
Simple to describe the MapReduce system
Can be hard to map an algorithm to the framework
– cf. k-means: combine C+P, broadcast C, iterate, …
– HIVE, PigLatin etc. mitigate programming issues
Implementation not uniform
– Different fault-tolerance for mappers, reducers
– Add more special cases for performance
Hadoop introducing TCP channels, pipelines, …
– Dryad has the same state machine everywhere

34 Discussion
The DAG abstraction supports many computations
– Can be targeted by high-level languages!
– Run-time rewriting extends applicability
DAG-structured jobs scale to large clusters
– Over 10k computers in large Dryad clusters
– Transient failures common, disk failures daily
Trade off fault-tolerance against performance
– Buffer vs TCP, still a manual choice in the Dryad system
– Also external vs in-memory working set

35 Conclusion
Dryad well-tested, scalable
– Daily use supporting Bing for over 3 years
Applicable to a large number of computations
– 250-computer cluster at MSR SVC, Mar->Nov 09
47 distinct users (~50 lab members + interns)
15k jobs (tens of millions of processes executed)
Hundreds of distinct programs
– Network trace analysis, privacy-preserving inference, light-transport simulation, decision-tree training, deep belief network training, image feature extraction, …

