Programming clusters with DryadLINQ Mihai Budiu Microsoft Research, Silicon Valley Association of C and C++ Users (ACCU) Mountain View, CA, April 13, 2011.


1 Programming clusters with DryadLINQ Mihai Budiu Microsoft Research, Silicon Valley Association of C and C++ Users (ACCU) Mountain View, CA, April 13, 2011

2 Goal

3 Design Space. Diagram labels: Throughput (batch), Latency (interactive), Internet, Data center, Data-parallel, Shared memory

4 Data-Parallel Computation. Layers: Application, Language, Execution, Storage. Systems shown: Parallel Databases; Map-Reduce; GFS, BigTable; Cosmos, Azure, SQL Server; Dryad; DryadLINQ, Scope; Sawzall, FlumeJava; Hadoop; HDFS, S3; Pig, Hive. Language flavors: SQL; ≈SQL; LINQ, SQL; Sawzall, Java

5 Software Stack: Talk Outline. Diagram labels: Windows Server, Cluster services, Cluster storage, Dryad, DryadLINQ, Windows Server, Applications

6 DRYAD. Diagram labels: Windows Server, Cluster services, Cluster storage, Dryad, DryadLINQ, Windows Server, Applications

7 Dryad. Continuously deployed since 2006. Running on >> 10^4 machines. Sifting through > 10 PB of data daily. Runs on clusters of > 3000 machines. Handles jobs with > 10^5 processes each. Platform for a rich software ecosystem. Used by >> 100 developers. Written at Microsoft Research, Silicon Valley. Image: The Dryad by Evelyn De Morgan

8 Dryad = Execution Layer. Job (application) ≈ Pipeline; Dryad ≈ Shell; Cluster ≈ Machine

9 2-D Piping
Unix Pipes (1-D): grep | sed | sort | awk | perl
Dryad (2-D): grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
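The "2-D" idea can be imitated inside one process: every stage of the pipeline runs as several instances, one per data partition. The sketch below is only an analogy, not Dryad code. It uses PLINQ on a hypothetical in-memory input, and the grep/sed/sort stand-ins are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: four partitions of text lines
// (stand-ins for files stored on different machines).
string[][] partitions =
{
    new[] { "error: disk full", "ok" },
    new[] { "ok", "error: timeout" },
    new[] { "error: disk full" },
    new[] { "ok", "ok" },
};

// Stages 1 and 2 ("grep" then "sed"): one instance per partition,
// executed in parallel by PLINQ.
List<string>[] stageOutputs = partitions
    .AsParallel().AsOrdered()
    .Select(p => p.Where(line => line.StartsWith("error"))       // grep
                  .Select(line => line.Replace("error: ", ""))    // sed
                  .ToList())
    .ToArray();

// Stage 3 ("sort"): a single instance that consumes every upstream output.
List<string> sorted = stageOutputs
    .SelectMany(x => x)
    .OrderBy(s => s, StringComparer.Ordinal)
    .ToList();

Console.WriteLine(string.Join(", ", sorted)); // disk full, disk full, timeout
```

The width of each stage (4, 4, 1 here) plays the role of the exponents on the slide.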

10 Virtualized 2-D Pipelines

11 Virtualized 2-D Pipelines

12 Virtualized 2-D Pipelines

13 Virtualized 2-D Pipelines

14 Virtualized 2-D Pipelines: 2D, DAG, multi-machine, virtualized

15 Dryad Job Structure. Diagram labels: Input files, Vertices (processes), Channels, Stage, Output files. Example vertices: grep, sed, sort, awk, perl

16 Channels: finite streams of items. Kinds: distributed filesystem files (persistent), SMB/NTFS files (temporary), TCP pipes (inter-machine), memory FIFOs (intra-machine). Diagram labels: X, M, Items

17 Dryad System Architecture. Diagram labels: Job manager, job schedule, control plane, data plane, cluster, NS, Sched, RE, V (vertices), Files, TCP, FIFO, Network

18 Fault Tolerance

19 DRYADLINQ. Diagram labels: Windows Server, Cluster services, Cluster storage, Dryad, DryadLINQ, Windows Server, Applications

20 LINQ: Dryad => DryadLINQ

21 LINQ = .NET + Queries
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);
var results = from c in collection where IsLegal(c.key) select new { hash = Hash(c.key), c.value };
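A runnable version of this query over an in-memory list may help. The bodies of IsLegal and Hash and the sample data below are hypothetical, since the slide only declares the two functions. In C#, an anonymous-type member built from a method call needs an explicit name, so the sketch calls it hash.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the slide's IsLegal and Hash (not defined there).
bool IsLegal(string key) => key.Length > 0 && char.IsLetter(key[0]);
string Hash(string key) => key.ToUpperInvariant();

// Hypothetical sample collection of (key, value) pairs.
var collection = new List<(string key, int value)>
{
    ("alpha", 1), ("42", 2), ("beta", 3),
};

// Query syntax, as on the slide: filter, then project to an anonymous type.
var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

foreach (var r in results)
    Console.WriteLine($"{r.hash} {r.value}");
// ALPHA 1
// BETA 3
```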

22 Collections and Iterators. class Collection<T> : IEnumerable<T>; Elements of type T; Iterator (current element)
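The relationship between a collection and its iterator can be seen by driving the enumerator by hand. This is a minimal sketch; the Evens sequence is a made-up example and not part of the talk.

```csharp
using System;
using System.Collections.Generic;

// An iterator method: the compiler generates the IEnumerator<int>
// state machine behind "yield return".
IEnumerable<int> Evens(int limit)
{
    for (int i = 0; i < limit; i += 2)
        yield return i;
}

// A foreach loop is shorthand for GetEnumerator / MoveNext / Current.
using (IEnumerator<int> it = Evens(7).GetEnumerator())
{
    while (it.MoveNext())
        Console.WriteLine(it.Current); // the iterator's current element
}
```

Because elements are produced on demand, a query over such a collection can stream data instead of materializing it first.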

23 DryadLINQ Data Model. Diagram labels: Partition, Collection, .NET objects

24 DryadLINQ = LINQ + Dryad
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);
var results = from c in collection where IsLegal(c.key) select new { hash = Hash(c.key), c.value };
Diagram labels: C# collection, results, C# vertex code, Query plan (Dryad job), Data

25 Demo

26 Example: counting lines
var table = PartitionedTable.Get<LineRecord>(file);
int count = table.Count();
Plan labels: Parse, Count; Sum
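The two plan stages can be imitated on in-memory data: each partition counts its own lines, and one final step sums the partial counts. This sketch uses plain LINQ to Objects, not the DryadLINQ API, and the partition contents are made up.

```csharp
using System;
using System.Linq;

// Hypothetical partitioned input: each inner array is one partition of lines.
string[][] partitions =
{
    new[] { "a b", "c" },
    new[] { "d e f" },
    new[] { "g", "h i", "j" },
};

// Stage 1 (one instance per partition): count the local lines.
int[] partialCounts = partitions.Select(p => p.Count()).ToArray();

// Stage 2 (single instance): sum the partial counts.
int count = partialCounts.Sum();

Console.WriteLine(count); // 6
```

Only one integer per partition crosses between the stages, which is why counting parallelizes so well.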

27 Example: counting words
var table = PartitionedTable.Get<LineRecord>(file);
int count = table.SelectMany(l => l.line.Split(' ')).Count();
Plan labels: Parse, SelectMany, Count; Sum

28 Example: counting unique words
var table = PartitionedTable.Get<LineRecord>(file);
int count = table.SelectMany(l => l.line.Split(' ')).GroupBy(w => w).Count();
Plan labels: GroupBy; Count; HashPartition
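Counting unique words needs a repartitioning step, because the same word can occur in several input partitions. A minimal in-memory sketch of hash partitioning follows. The Bucket function, the bucket count, and the data are assumptions for illustration, not DryadLINQ internals.

```csharp
using System;
using System.Linq;

// Hypothetical partitioned input.
string[][] partitions =
{
    new[] { "the quick fox", "the dog" },
    new[] { "quick quick dog" },
};
const int buckets = 3; // hypothetical number of downstream instances

// A deterministic toy hash (string.GetHashCode is randomized per process
// on modern .NET, so it is avoided here).
int Bucket(string w) => w.Sum(ch => (int)ch) % buckets;

// Stage 1: split lines into words and route every word to a bucket by hash.
var routed = partitions
    .SelectMany(p => p.SelectMany(line => line.Split(' ')))
    .ToLookup(w => Bucket(w));

// Stage 2: each bucket holds all occurrences of its words, so its local
// distinct count is exact; the answer is the sum over the buckets.
int count = Enumerable.Range(0, buckets)
    .Select(b => routed[b].GroupBy(w => w).Count())
    .Sum();

Console.WriteLine(count); // 4
```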

29 Example: word histogram
var table = PartitionedTable.Get<LineRecord>(file);
var result = table.SelectMany(l => l.line.Split(' ')).GroupBy(w => w).Select(g => new { word = g.Key, count = g.Count() });
Plan labels: GroupBy; Count; GroupBy, Count; HashPartition
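The repeated GroupBy and Count labels suggest that counting happens twice: once locally in each partition and once more after repartitioning. Under that reading, a sketch on made-up in-memory data could look like this (plain LINQ to Objects, not the DryadLINQ API).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical partitioned input.
string[][] partitions =
{
    new[] { "a b a" },
    new[] { "b a", "c" },
};

// Stage 1 (per partition): a local GroupBy + Count yields partial histograms.
var partial = partitions.Select(p =>
    p.SelectMany(line => line.Split(' '))
     .GroupBy(w => w)
     .Select(g => (word: g.Key, count: g.Count()))
     .ToList());

// Stage 2 (after repartitioning by word): add up the partial counts per word.
Dictionary<string, int> histogram = partial
    .SelectMany(x => x)
    .GroupBy(t => t.word)
    .ToDictionary(g => g.Key, g => g.Sum(t => t.count));

foreach (var kv in histogram.OrderBy(kv => kv.Key, StringComparer.Ordinal))
    Console.WriteLine($"{kv.Key}: {kv.Value}");
// a: 3
// b: 2
// c: 1
```

Aggregating locally first means that at most one record per distinct word leaves each partition.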

30 Example: high-frequency words
var table = PartitionedTable.Get<LineRecord>(file);
var result = table.SelectMany(l => l.line.Split(' ')).GroupBy(w => w).Select(g => new { word = g.Key, count = g.Count() }).OrderByDescending(t => t.count).Take(100);
Plan labels: Sort; Take; Mergesort; Take
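A top-k query does not need a full distributed sort: every member of the global top k must also be in the top k of its own partition. The sketch below illustrates this on hypothetical (word, count) pairs with k = 2. It is an in-memory analogy, not the plan DryadLINQ generates.

```csharp
using System;
using System.Linq;

// Hypothetical (word, count) pairs, already spread over two partitions.
var partitions = new[]
{
    new (string word, int count)[] { ("a", 5), ("b", 1), ("c", 9) },
    new (string word, int count)[] { ("d", 7), ("e", 2) },
};
const int k = 2;

// Stage 1 (per partition): sort locally and keep the local top k.
var localTop = partitions
    .Select(p => p.OrderByDescending(t => t.count).Take(k).ToArray());

// Stage 2 (single instance): merge the candidates, take the global top k.
var top = localTop
    .SelectMany(x => x)
    .OrderByDescending(t => t.count)
    .Take(k)
    .ToList();

foreach (var t in top)
    Console.WriteLine($"{t.word} {t.count}");
// c 9
// d 7
```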

31 Example: words by frequency
var table = PartitionedTable.Get<LineRecord>(file);
var result = table.SelectMany(l => l.line.Split(' ')).GroupBy(w => w).Select(g => new { word = g.Key, count = g.Count() }).OrderByDescending(t => t.count);
Plan labels: Sample, Histogram, Broadcast, Range-partition, Sort
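The plan labels describe a sample-based distributed sort. A simplified in-memory sketch follows, with several assumptions: it sorts plain integers in ascending order rather than word counts in descending order, it "samples" by taking the first two keys of each partition where a real system would sample randomly, and the splitter rule is a plain even-quantile pick.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical partitioned keys.
int[][] partitions =
{
    new[] { 42, 7, 19, 88 },
    new[] { 3, 64, 25 },
    new[] { 71, 12, 50, 95, 30 },
};
const int outputs = 3; // hypothetical number of output ranges

// Sample: collect a few keys from every partition.
int[] sample = partitions.SelectMany(p => p.Take(2)).OrderBy(x => x).ToArray();

// Histogram: choose outputs - 1 splitters at even quantiles of the sample.
int[] splitters = Enumerable.Range(1, outputs - 1)
    .Select(i => sample[i * sample.Length / outputs])
    .ToArray();

// Broadcast + range-partition: every key is routed with the same splitters.
int RangeOf(int key) => splitters.Count(s => s <= key);
var ranges = partitions.SelectMany(p => p).ToLookup(x => RangeOf(x));

// Sort: each range sorts locally; concatenating the ranges in order
// yields a totally ordered result.
List<int> sorted = Enumerable.Range(0, outputs)
    .SelectMany(r => ranges[r].OrderBy(x => x))
    .ToList();

Console.WriteLine(string.Join(" ", sorted));
// 3 7 12 19 25 30 42 50 64 71 88 95
```

The quality of the sample only affects how evenly the ranges are balanced, not the correctness of the order.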

32 Example: Map-Reduce
public static IQueryable<S> MapReduce<T,M,K,S>(
    IQueryable<T> input,
    Func<T, IEnumerable<M>> mapper,
    Func<M,K> keySelector,
    Func<IGrouping<K,M>,S> reducer)
{
    var map = input.SelectMany(mapper);
    var group = map.GroupBy(keySelector);
    var result = group.Select(reducer);
    return result;
}
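The same three-operator composition can be tried out on in-memory data. The sketch below is an analogue over IEnumerable with plain delegates. Query providers for IQueryable sources typically take Expression<Func<...>> parameters instead, so this is not a drop-in for the slide's method. The word-count usage is a made-up example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory analogue of the slide's MapReduce: SelectMany, GroupBy, Select.
IEnumerable<S> MapReduce<T, M, K, S>(
    IEnumerable<T> input,
    Func<T, IEnumerable<M>> mapper,
    Func<M, K> keySelector,
    Func<IGrouping<K, M>, S> reducer)
{
    var map = input.SelectMany(mapper);      // map: one input -> many records
    var group = map.GroupBy(keySelector);    // shuffle: group records by key
    var result = group.Select(reducer);      // reduce: one output per group
    return result;
}

// Usage: word count over two hypothetical lines.
string[] lines = { "a b a", "b c" };
var counts = MapReduce(
        lines,
        line => line.Split(' '),
        word => word,
        g => (word: g.Key, count: g.Count()))
    .ToList();

foreach (var c in counts)
    Console.WriteLine($"{c.word} {c.count}");
// a 2
// b 2
// c 1
```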

33 Map-Reduce Plan [Figure: query plan graph. Vertex legend: M = map, Q = sort, G1 = groupby, R = reduce, D = distribute, MS = mergesort, G2 = groupby, X = consumer. Stage labels: map, partial aggregation, reduce, dynamic]

34 Expectation Maximization: 160 lines, 3 iterations shown

35 Probabilistic Index Maps. Diagram labels: Images, features

36 Language Summary: Where, Select, GroupBy, OrderBy, Aggregate, Join

37 What Is It Good For?

38 What is Kinect?

39 Input device

40 The Innards (Source: iFixit)

41 Projected IR pattern (Source: www.ros.org)

42 Depth computation (Source: http://nuit-blanche.blogspot.com/2010/11/unsing-kinect-for-compressive-sensing.html)

43 Kinect video output: 30 Hz frame rate; 57° field of view; 8-bit VGA RGB, 640 x 480; 11-bit depth, 320 x 240

44 Depth map (Source: www.insidekinect.com)

45 Vision Problem: What is a human? Recognize players from depth map; at frame rate; minimal resource usage

46 XBox 360 Hardware (Source: http://www.pcper.com/article.php?aid=940&type=expert). Triple Core PowerPC 970, 3.2GHz; Hyperthreaded, 2 threads/core; 500 MHz ATI graphics card; DirectX 9.5; 512 MB RAM; 2005 performance envelope. Must handle real-time vision AND a modern game

47 Why is it hard?

48 Generic Extensible Architecture. Diagram labels: Sensor, Raw data, Expert 1, Expert 2, Expert 3, Stateless, Skeleton estimates, Arbiter, probabilistic, fuses the hypotheses, Stateful, Final estimate

49 One Expert: Pipeline Stages. Diagram labels: Sensor, Depth map, Background segmentation, Player separation, Body Part Classifier, Body Part Identification, Skeleton

50 Sample test frames

51 The Classifier. Input: Depth map. Output: Body parts. Classifier runs on GPU @ 320x240

52 Getting the Ground Truth. Start from ground-truth data: depth paired with body parts. Train classifier to work across: pose; scene position; height, body shape

53 Use synthetic data (3D avatar model). Inject noise

54 suit / sensors; expensive; very accurate; high frame rate; large space; calibration

55 Learn from Data. Diagram labels: Training examples, Machine learning, Classifier

56 Cluster-based training. Diagram labels: Training examples, Machine learning, Dryad, DryadLINQ, Classifier. > Millions of input frames; > 10^20 objects manipulated; sparse, multi-dimensional data; complex datatypes (images, video, matrices, etc.)

57 Highly efficient parallelization [Figure; axes: time, machine]

58 CONCLUSIONS

59 Conclusions

60 I can finally explain to my son what I do for a living…

