Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Software and Services Group 1 Concurrent Collections (CnC) Kath Knobe Technology Pathfinding and Innovation (TPI) Software and Services Group (SSG) Intel.

Similar presentations


Presentation on theme: "1 Software and Services Group 1 Concurrent Collections (CnC) Kath Knobe Technology Pathfinding and Innovation (TPI) Software and Services Group (SSG) Intel."— Presentation transcript:

1 1 Software and Services Group 1 Concurrent Collections (CnC) Kath Knobe Technology Pathfinding and Innovation (TPI) Software and Services Group (SSG) Intel

2 2 Software and Services Group 2 Cholesky Performance Intel 2-socket x 4-core 2.8 GHz + Intel MKL 10.2 IPDPS10 (best paper) Aparna Chandramowlishwaran

3 3 Software and Services Group 3 Eigensolver Performance Intel 2-socket x 4-core 2.8 GHz + Intel® Math Kernel Libraries 10.2 Multithreaded Intel® MKL Baseline Concurrent Collections Higher is better

4 4 Software and Services Group 4 The Problem Most serial languages over-constrain orderings Require arbitrary serialization Allow for overwriting of data The decision of if and when to execute are bound together Most parallel programming languages are embedded within serial languages Inherit problems of serial languages They are too specific wrt type of parallelism in the application and wrt the type target architecture For the tuning expert, they dont provide quite enough control

5 5 Software and Services Group 5 explicitly parallel languages (over-constrained) Concurrent Collections (only semantically required constraints) explicitly serial languages (over-constrained) Raise the level of the programming model just enough to avoid over-constraints The Solution

6 6 Software and Services Group 6 6 Introduction The Big Idea Simple C++ Example Execution Agenda

7 7 Software and Services Group 7 7 So Whats the Big Idea? Dont specify what operations run in parallel Difficult and depends on target Specify the semantic ordering constraints only Easier and depends only on application Semantic ordering constraints: The meaning of the program require some computations to execute before others.

8 8 Software and Services Group 8 8 Exactly Two Sources of Semantic Ordering Requirements Producer / Consumer (Data Dependence) Producer must execute before consumer Controller / Controllee (Control Dependence) Controller must execute before controllee

9 9 Software and Services Group 9 9 -The domain expert does not need to know about parallelism Separation of Concerns Between Domain Expert and Tuning Expert -The tuning expert does not need to know about the domain - CnC maximizes flexibility for good performance Goal: separation of concerns The work of the domain expert Semantic correctness Constraints required by the application The work of the tuning expert Architecture Actual parallelism Locality Overhead Load balancing Distribution among processors Scheduling within a processor

10 10 Software and Services Group 10 Concurrent Collections Spec -The domain expert does not need to know about parallelism Separation of Concerns Between Domain Expert and Tuning Expert -The tuning expert does not need to know about the domain - CnC maximizes flexibility for good performance Goal: separation of concerns The work of the domain expert Semantic correctness Constraints required by the application The work of the tuning expert Architecture Actual parallelism Locality Overhead Load balancing Distribution among processors Scheduling within a processor

11 11 Software and Services Group 11 Concurrent Collections Spec The application problem -The domain expert does not need to know about parallelism Separation of Concerns Between Domain Expert and Tuning Expert -The tuning expert does not need to know about the domain - CnC maximizes flexibility for good performance Goal: separation of concerns The work of the domain expert Semantic correctness Constraints required by the application The work of the tuning expert Architecture Actual parallelism Locality Overhead Load balancing Distribution among processors Scheduling within a processor

12 12 Software and Services Group 12 Mapping to target platform Concurrent Collections Spec The application problem -The domain expert does not need to know about parallelism Separation of Concerns Between Domain Expert and Tuning Expert -The tuning expert does not need to know about the domain - CnC maximizes flexibility for good performance Goal: separation of concerns The work of the domain expert Semantic correctness Constraints required by the application The work of the tuning expert Architecture Actual parallelism Locality Overhead Load balancing Distribution among processors Scheduling within a processor

13 13 Software and Services Group 13 Notation (foo) [x] Computation Step Data Item Control Tag (foo) [x] TextualGraphical

14 14 Software and Services Group 14 Producer - consumer (step1)(step2) [item] Exactly Two Sources of Ordering Requirements Producer / Consumer (Data Dependence) Producer must execute before consumer Controller / Controllee (Control Dependence) Controller must execute before controllee

15 15 Software and Services Group 15 Producer - consumer Controller - controllee (step1)(step2) [item] (step1)(step2) Exactly Two Sources of Ordering Requirements Producer / Consumer (Data Dependence) Producer must execute before consumer Controller / Controllee (Control Dependence) Controller must execute before controllee

16 16 Software and Services Group 16 loop k = … loop j = … loop i = … Z(i, j, k) = = A(k, j) end Tag collections: new concept (step1) (step2) Loop iteration space is a sequence,, … Restricted to integer indices. Body accesses values. IF = WHEN

17 17 Software and Services Group 17 Tag collections: new concept (step1) (step2) (Step2) components of tag are k, j and i loop k = … loop j = … loop i = … Z(i, j, k) = … … = A(k, j) end A tag collection is a generalization of an iteration space An unordered set, not an ordered sequence Not just integer indices. Also graph nodes, tree nodes, set elements, … Body accesses values. IF WHEN

18 18 Software and Services Group 18 Tag collections: new concept (step1) (step2) (Step2) components of tag are k, j and i loop k = … loop j = … loop i = … Z(i, j, k) = Z(i-1, j, k) + … … = A(k, j) end [z]

19 19 Software and Services Group 19 Tag collections: new concept (step1) (step2) (Step2) Single component of tag is k loop k = … loop j = … loop i = … Z(i, j, k) = = A(k, j) end

20 20 Software and Services Group 20 Introduction The Big Idea Simple C++ Example Execution Agenda

21 21 Software and Services Group 21 Cholesky factorization

22 22 Software and Services Group 22 Cholesky Cholesky factorization

23 23 Software and Services Group 23 Cholesky Trisolve Cholesky factorization

24 24 Software and Services Group 24 Cholesky TrisolveUpdate Cholesky factorization

25 25 Software and Services Group 25 Cholesky Cholesky factorization

26 26 Software and Services Group 26 Example: The White Board Drawing What are the high level operations? What are the chunks of data? What are the producer/consumer relationships? What are the inputs and outputs? (Cholesky) [array] (Trisolve) (Update)

27 27 Software and Services Group 27 (Cholesky: iter) [array: iter, row, col] (Trisolve: row, iter) (Update: col, row, iter) Make it precise enough to execute Distinguish among the instances? What are the distinct control tag collections? What steps produce them?

28 28 Software and Services Group 28 Essentially making it ready for parallel execution But without explicit thinking about parallelism Focus on domain/application knowledge Result is: Parallel Deterministic (wrt results) Race-free

29 29 Software and Services Group 29 (Correction filter: K, cell) (Cell tracker: K) (Arbitrator initial: K) (Arbitrator final: K) [Input Image: K] [Histograms: K] [Motion corrections: K, cell] [Labeled cells initial: K] [Predicted states: K, cell] [Cell candidates: K ] [State measurements: K] [Labeled cells final: K] [Final states: K] (Prediction filter: K, cell) (Cell detector: K) Cell tracker

30 30 Software and Services Group 30 Another Application: Face Tracker Look for face in all possible windows of an image Example: in a 3 x 3 image there are 14 windows Sequence of classifiers. Any can determine: not face 30 Four 2 X 2 windows Nine 1 x 1 windows One 3 x 3 window

31 31 Software and Services Group 31 Example: The White Board Drawing What are the high level operations? What are the chunks of data? What are the producer/consumer relationships? What are the inputs and outputs? 31 [image] (classifier1) (classifier2) (classifier3)

32 32 Software and Services Group 32 Make it precise enough to execute (classifier1) (classifier2) (classifier3) Distinguish among the instances? What are the distinct control tag collections? What steps produce them? [image]

33 33 Software and Services Group 33 Objects – summary Step Collections - Are tagged. A step has access to its tag value - Performs gets and puts - Functional. Only side-effects are put objects Item Collections –Means of communication among step instances (data dependence) –Dynamic single assignment Each instance is associated with exactly one contents. –Are tagged Tag Collections –Means of communication among step instances (control dependence) –A tag collection may control multiple step collections –Determines what step instances will execute (foo) [x]

34 34 Software and Services Group 34 Relationships – summary Prescription >Every step collection is prescribed. >The relationship is always the identity function. >A tag collection may prescribe multiple step collections. Consumer - Corresponds to gets in steps - A step may consume multiple distinct item collections [x], [y] -> (foo) - A step may consume multiple instances of items from a given collection [x: neighbors(i)] -> (foo: i) Producer - Corresponds to puts in steps - A step may produce to multiple distinct collections (foo) -> [x], [y] - A step may produce multiple instances to a given collection (foo: i) -> [x: neighbors(i)] (step1) (step2) [item] (step1) [item] (step1)

35 35 Software and Services Group 35.cnc file // delcarations [BlockedMatrix * array: int, int, int]; // Control relations :: (Trisolve: row, iter); // Producer and consumer relations [array: row, iter, iter], [array: iter, iter, iter+1] -> (Trisolve: row, iter); (Trisolve: row, iter) -> [array: row, iter, iter+1]; [array: iter, row, col] (Trisolve)

36 36 Software and Services Group 36 Coding Hints: Trisolve StepReturnValue_t Trisolve( cholesky_graph_t& graph, const Tag_t& TS_tag) { // For each input item for this step retrieve the item using the proper tag // User code to create item tag here BlockedMatrix *... = graph.array.Get(Tag_t(...)); string span = // Step implementation logic goes here... // For each output item for this step, put the new item using the proper tag // User code to create item tag here graph.array.Put(Tag_t(...),...); span); } return CNC_Success; } [array: iter, row, col] (Trisolve)

37 37 Software and Services Group 37 A model not a language: 1 Variety of ways to access the model >Textual representation > Class library > Graphical user interface

38 38 Software and Services Group 38 A model not a language: 2 There are variants within the model – Distinct trade-offs >Efficiency >Ease-of-use >Generality >Guarantees >… Possible different functionality >Continuous (or not) >Tag functions are analyzable (or not) >Real-time (or not) >…

39 39 Software and Services Group 39 Program execution: Attributes are monotonically acquired Item instance

40 40 Software and Services Group 40 Program execution: Attributes are monotonically acquired available Item instance

41 41 Software and Services Group 41 Program execution: Attributes are monotonically acquired available Item instance Tag instance

42 42 Software and Services Group 42 Program execution: Attributes are monotonically acquired available Item instance Tag instance available

43 43 Software and Services Group 43 Program execution: Attributes are monotonically acquired available Item instance Tag instance available Step instance Inputs-available prescribed enabledexecuted

44 44 Software and Services Group 44 Program execution: Attributes are monotonically acquired available Item instance Tag instance available Step instance Inputs-available prescribed enabledexecuted

45 45 Software and Services Group 45 Program execution: Attributes are monotonically acquired available Item instance Tag instance available Step instance Inputs-available prescribed enabledexecuted

46 46 Software and Services Group 46 Program execution: Attributes are monotonically acquired available Item instance Tag instance available Step instance Inputs-available prescribed enabledexecuted

47 47 Software and Services Group 47 Program execution: Attributes are monotonically acquired available Item instance Tag instance available Step instance Inputs-available prescribed enabledexecuted

48 48 Software and Services Group 48 Program execution: Attributes are monotonically acquired available Item instance Tag instance available Step instance Inputs-available prescribed enabledexecuted

49 49 Software and Services Group 49 Program execution: Attributes are monotonically acquired dead Item instance Tag instance available Step instance Inputs-available prescribed enabledexecuted

50 50 Software and Services Group 50 Program execution: Attributes are monotonically acquired dead Item instance Tag instance dead Step instance Inputs-available prescribed enabledexecuted

51 51 Software and Services Group 51 Introduction and history The Big Idea Simple C++ Examples Execution Agenda

52 52 Software and Services Group 52 Plays well with a variety of runtime approaches graindistributionschedule HP TStreams Intel CnC Georgia Tech CnC HP TStreams Rice CnC static dynamic staticdynamicdynamic 1 staticdynamic dynamic 2 dynamic 1 We want to allow you to plug your own scheduler Mandviwala et. al. LCPC 07 [] 2

53 53 Software and Services Group 53 Plays well with a variety of tuning experts The domain expert / later time Different person / different expertise Static analysis Dynamic Runtime (This is the option currently available on our website.) …

54 54 Software and Services Group 54 Plays well with various types of parallelism Possible parallelism Data / loop parallelism Task parallelism Pipeline parallelism Tree based parallelism Fork/join Rescheduling serial executions Memory hierarchy opt Power Could have different runtime schedulers for distinct subgraphs of the application

55 55 Software and Services Group 55 CnC Plays Nicely With Serial Languages Intel: C++ / TBB Rice: Java / Habanaro Intel: Haskell (preliminary) Rice:.NET (preliminary) … 55

56 56 Software and Services Group 56 CnC Plays Nicely With target architectures Shared memory / distributed memory Homogeneous / heterogeneous Flat / hierarchical … 56

57 57 Software and Services Group 57 TPI Kath Knobe Geoff Lowney Mark Hampton Ryan Newton Frank Schlimbach ICL Chih-Ping Chen Melanie Blower Shin Lee Steve Rose Leo Treggiari Ganesh Rao Nikolay Kurtov Jeff Arnold (soon) Mario Deilmann The academic community Rice University Vivek Sarkar Zoran Budimlic Sagnak Tasirlar David Peixotto Mike Burke (+ IBM) Georgia Tech Rich Vuduc Aparna Chandramowlishwaran Novosibirsk Nikolay Kurtov UCLA Jens Palsberg The HP community Carl Offner Alex Nelson The Intel community

58 58 Software and Services Group 58 CnC10 Workshop Co-located with Languages and Compilers for parallel Computers (LCPC) at Rice University in October 2010

59 59 Software and Services Group 59 Intel (C++): Rice (Java):

60 60 Software and Services Group 60


Download ppt "1 Software and Services Group 1 Concurrent Collections (CnC) Kath Knobe Technology Pathfinding and Innovation (TPI) Software and Services Group (SSG) Intel."

Similar presentations


Ads by Google