Download presentation
Presentation is loading. Please wait.
Published byAusten Phelps Modified over 10 years ago
1
1 Software and Services Group 1 Concurrent Collections (CnC) Kath Knobe Technology Pathfinding and Innovation (TPI) Software and Services Group (SSG) Intel
2
2 Software and Services Group 2 Cholesky Performance Intel 2-socket x 4-core Nehalem @ 2.8 GHz + Intel MKL 10.2 IPDPS10 (best paper) Aparna Chandramowlishwaran
3
3 Software and Services Group 3 Eigensolver Performance Intel 2-socket x 4-core Nehalem @ 2.8 GHz + Intel® Math Kernel Libraries 10.2 Multithreaded Intel® MKL Baseline Concurrent Collections Higher is better
4
4 Software and Services Group 4 The Problem Most serial languages over-constrain orderings Require arbitrary serialization Allow for overwriting of data The decision of if and when to execute are bound together Most parallel programming languages are embedded within serial languages Inherit problems of serial languages They are too specific wrt type of parallelism in the application and wrt the type target architecture For the tuning expert, they dont provide quite enough control
5
5 Software and Services Group 5 explicitly parallel languages (over-constrained) Concurrent Collections (only semantically required constraints) explicitly serial languages (over-constrained) Raise the level of the programming model just enough to avoid over-constraints The Solution
6
6 Software and Services Group 6 6 Introduction The Big Idea Simple C++ Example Execution Agenda
7
7 Software and Services Group 7 7 So Whats the Big Idea? Dont specify what operations run in parallel Difficult and depends on target Specify the semantic ordering constraints only Easier and depends only on application Semantic ordering constraints: The meaning of the program require some computations to execute before others.
8
8 Software and Services Group 8 8 Exactly Two Sources of Semantic Ordering Requirements Producer / Consumer (Data Dependence) Producer must execute before consumer Controller / Controllee (Control Dependence) Controller must execute before controllee
9
9 Software and Services Group 9 9 -The domain expert does not need to know about parallelism Separation of Concerns Between Domain Expert and Tuning Expert -The tuning expert does not need to know about the domain - CnC maximizes flexibility for good performance Goal: separation of concerns The work of the domain expert Semantic correctness Constraints required by the application The work of the tuning expert Architecture Actual parallelism Locality Overhead Load balancing Distribution among processors Scheduling within a processor
10
10 Software and Services Group 10 Concurrent Collections Spec -The domain expert does not need to know about parallelism Separation of Concerns Between Domain Expert and Tuning Expert -The tuning expert does not need to know about the domain - CnC maximizes flexibility for good performance Goal: separation of concerns The work of the domain expert Semantic correctness Constraints required by the application The work of the tuning expert Architecture Actual parallelism Locality Overhead Load balancing Distribution among processors Scheduling within a processor
11
11 Software and Services Group 11 Concurrent Collections Spec The application problem -The domain expert does not need to know about parallelism Separation of Concerns Between Domain Expert and Tuning Expert -The tuning expert does not need to know about the domain - CnC maximizes flexibility for good performance Goal: separation of concerns The work of the domain expert Semantic correctness Constraints required by the application The work of the tuning expert Architecture Actual parallelism Locality Overhead Load balancing Distribution among processors Scheduling within a processor
12
12 Software and Services Group 12 Mapping to target platform Concurrent Collections Spec The application problem -The domain expert does not need to know about parallelism Separation of Concerns Between Domain Expert and Tuning Expert -The tuning expert does not need to know about the domain - CnC maximizes flexibility for good performance Goal: separation of concerns The work of the domain expert Semantic correctness Constraints required by the application The work of the tuning expert Architecture Actual parallelism Locality Overhead Load balancing Distribution among processors Scheduling within a processor
13
13 Software and Services Group 13 Notation (foo) [x] Computation Step Data Item Control Tag (foo) [x] TextualGraphical
14
14 Software and Services Group 14 Producer - consumer (step1)(step2) [item] Exactly Two Sources of Ordering Requirements Producer / Consumer (Data Dependence) Producer must execute before consumer Controller / Controllee (Control Dependence) Controller must execute before controllee
15
15 Software and Services Group 15 Producer - consumer Controller - controllee (step1)(step2) [item] (step1)(step2) Exactly Two Sources of Ordering Requirements Producer / Consumer (Data Dependence) Producer must execute before consumer Controller / Controllee (Control Dependence) Controller must execute before controllee
16
16 Software and Services Group 16 loop k = … loop j = … loop i = … Z(i, j, k) = = A(k, j) end Tag collections: new concept (step1) (step2) Loop iteration space is a sequence,, … Restricted to integer indices. Body accesses values. IF = WHEN
17
17 Software and Services Group 17 Tag collections: new concept (step1) (step2) (Step2) components of tag are k, j and i loop k = … loop j = … loop i = … Z(i, j, k) = … … = A(k, j) end A tag collection is a generalization of an iteration space An unordered set, not an ordered sequence Not just integer indices. Also graph nodes, tree nodes, set elements, … Body accesses values. IF WHEN
18
18 Software and Services Group 18 Tag collections: new concept (step1) (step2) (Step2) components of tag are k, j and i loop k = … loop j = … loop i = … Z(i, j, k) = Z(i-1, j, k) + … … = A(k, j) end [z]
19
19 Software and Services Group 19 Tag collections: new concept (step1) (step2) (Step2) Single component of tag is k loop k = … loop j = … loop i = … Z(i, j, k) = = A(k, j) end
20
20 Software and Services Group 20 Introduction The Big Idea Simple C++ Example Execution Agenda
21
21 Software and Services Group 21 Cholesky factorization
22
22 Software and Services Group 22 Cholesky Cholesky factorization
23
23 Software and Services Group 23 Cholesky Trisolve Cholesky factorization
24
24 Software and Services Group 24 Cholesky TrisolveUpdate Cholesky factorization
25
25 Software and Services Group 25 Cholesky Cholesky factorization
26
26 Software and Services Group 26 Example: The White Board Drawing What are the high level operations? What are the chunks of data? What are the producer/consumer relationships? What are the inputs and outputs? (Cholesky) [array] (Trisolve) (Update)
27
27 Software and Services Group 27 (Cholesky: iter) [array: iter, row, col] (Trisolve: row, iter) (Update: col, row, iter) Make it precise enough to execute Distinguish among the instances? What are the distinct control tag collections? What steps produce them?
28
28 Software and Services Group 28 Essentially making it ready for parallel execution But without explicit thinking about parallelism Focus on domain/application knowledge Result is: Parallel Deterministic (wrt results) Race-free
29
29 Software and Services Group 29 (Correction filter: K, cell) (Cell tracker: K) (Arbitrator initial: K) (Arbitrator final: K) [Input Image: K] [Histograms: K] [Motion corrections: K, cell] [Labeled cells initial: K] [Predicted states: K, cell] [Cell candidates: K ] [State measurements: K] [Labeled cells final: K] [Final states: K] (Prediction filter: K, cell) (Cell detector: K) Cell tracker
30
30 Software and Services Group 30 Another Application: Face Tracker Look for face in all possible windows of an image Example: in a 3 x 3 image there are 14 windows Sequence of classifiers. Any can determine: not face 30 Four 2 X 2 windows Nine 1 x 1 windows One 3 x 3 window
31
31 Software and Services Group 31 Example: The White Board Drawing What are the high level operations? What are the chunks of data? What are the producer/consumer relationships? What are the inputs and outputs? 31 [image] (classifier1) (classifier2) (classifier3)
32
32 Software and Services Group 32 Make it precise enough to execute (classifier1) (classifier2) (classifier3) Distinguish among the instances? What are the distinct control tag collections? What steps produce them? [image]
33
33 Software and Services Group 33 Objects – summary Step Collections - Are tagged. A step has access to its tag value - Performs gets and puts - Functional. Only side-effects are put objects Item Collections –Means of communication among step instances (data dependence) –Dynamic single assignment Each instance is associated with exactly one contents. –Are tagged Tag Collections –Means of communication among step instances (control dependence) –A tag collection may control multiple step collections –Determines what step instances will execute (foo) [x]
34
34 Software and Services Group 34 Relationships – summary Prescription >Every step collection is prescribed. >The relationship is always the identity function. >A tag collection may prescribe multiple step collections. Consumer - Corresponds to gets in steps - A step may consume multiple distinct item collections [x], [y] -> (foo) - A step may consume multiple instances of items from a given collection [x: neighbors(i)] -> (foo: i) Producer - Corresponds to puts in steps - A step may produce to multiple distinct collections (foo) -> [x], [y] - A step may produce multiple instances to a given collection (foo: i) -> [x: neighbors(i)] (step1) (step2) [item] (step1) [item] (step1)
35
35 Software and Services Group 35.cnc file // delcarations [BlockedMatrix * array: int, int, int]; // Control relations :: (Trisolve: row, iter); // Producer and consumer relations [array: row, iter, iter], [array: iter, iter, iter+1] -> (Trisolve: row, iter); (Trisolve: row, iter) -> [array: row, iter, iter+1]; [array: iter, row, col] (Trisolve)
36
36 Software and Services Group 36 Coding Hints: Trisolve StepReturnValue_t Trisolve( cholesky_graph_t& graph, const Tag_t& TS_tag) { // For each input item for this step retrieve the item using the proper tag // User code to create item tag here BlockedMatrix *... = graph.array.Get(Tag_t(...)); string span = // Step implementation logic goes here... // For each output item for this step, put the new item using the proper tag // User code to create item tag here graph.array.Put(Tag_t(...),...); span); } return CNC_Success; } [array: iter, row, col] (Trisolve)
37
37 Software and Services Group 37 A model not a language: 1 Variety of ways to access the model >Textual representation > Class library > Graphical user interface
38
38 Software and Services Group 38 A model not a language: 2 There are variants within the model – Distinct trade-offs >Efficiency >Ease-of-use >Generality >Guarantees >… Possible different functionality >Continuous (or not) >Tag functions are analyzable (or not) >Real-time (or not) >…
39
39 Software and Services Group 39 Program execution: Attributes are monotonically acquired Item instance
40
40 Software and Services Group 40 Program execution: Attributes are monotonically acquired available Item instance
41
41 Software and Services Group 41 Program execution: Attributes are monotonically acquired available Item instance Tag instance
42
42 Software and Services Group 42 Program execution: Attributes are monotonically acquired available Item instance Tag instance available
43
43 Software and Services Group 43 Program execution: Attributes are monotonically acquired available Item instance Tag instance available Step instance Inputs-available prescribed enabledexecuted
44
44 Software and Services Group 44 Program execution: Attributes are monotonically acquired available Item instance Tag instance available Step instance Inputs-available prescribed enabledexecuted
45
45 Software and Services Group 45 Program execution: Attributes are monotonically acquired available Item instance Tag instance available Step instance Inputs-available prescribed enabledexecuted
46
46 Software and Services Group 46 Program execution: Attributes are monotonically acquired available Item instance Tag instance available Step instance Inputs-available prescribed enabledexecuted
47
47 Software and Services Group 47 Program execution: Attributes are monotonically acquired available Item instance Tag instance available Step instance Inputs-available prescribed enabledexecuted
48
48 Software and Services Group 48 Program execution: Attributes are monotonically acquired available Item instance Tag instance available Step instance Inputs-available prescribed enabledexecuted
49
49 Software and Services Group 49 Program execution: Attributes are monotonically acquired dead Item instance Tag instance available Step instance Inputs-available prescribed enabledexecuted
50
50 Software and Services Group 50 Program execution: Attributes are monotonically acquired dead Item instance Tag instance dead Step instance Inputs-available prescribed enabledexecuted
51
51 Software and Services Group 51 Introduction and history The Big Idea Simple C++ Examples Execution Agenda
52
52 Software and Services Group 52 Plays well with a variety of runtime approaches graindistributionschedule HP TStreams Intel CnC Georgia Tech CnC HP TStreams Rice CnC static dynamic staticdynamicdynamic 1 staticdynamic dynamic 2 dynamic 1 We want to allow you to plug your own scheduler Mandviwala et. al. LCPC 07 [] 2
53
53 Software and Services Group 53 Plays well with a variety of tuning experts The domain expert / later time Different person / different expertise Static analysis Dynamic Runtime (This is the option currently available on our website.) …
54
54 Software and Services Group 54 Plays well with various types of parallelism Possible parallelism Data / loop parallelism Task parallelism Pipeline parallelism Tree based parallelism Fork/join Rescheduling serial executions Memory hierarchy opt Power Could have different runtime schedulers for distinct subgraphs of the application
55
55 Software and Services Group 55 CnC Plays Nicely With Serial Languages Intel: C++ / TBB Rice: Java / Habanaro Intel: Haskell (preliminary) Rice:.NET (preliminary) … 55
56
56 Software and Services Group 56 CnC Plays Nicely With target architectures Shared memory / distributed memory Homogeneous / heterogeneous Flat / hierarchical … 56
57
57 Software and Services Group 57 TPI Kath Knobe Geoff Lowney Mark Hampton Ryan Newton Frank Schlimbach ICL Chih-Ping Chen Melanie Blower Shin Lee Steve Rose Leo Treggiari Ganesh Rao Nikolay Kurtov Jeff Arnold (soon) Mario Deilmann The academic community Rice University Vivek Sarkar Zoran Budimlic Sagnak Tasirlar David Peixotto Mike Burke (+ IBM) Georgia Tech Rich Vuduc Aparna Chandramowlishwaran Novosibirsk Nikolay Kurtov UCLA Jens Palsberg The HP community Carl Offner Alex Nelson The Intel community
58
58 Software and Services Group 58 CnC10 Workshop Co-located with Languages and Compilers for parallel Computers (LCPC) at Rice University in October 2010 kath.knobe@intel.com
59
59 Software and Services Group 59 Intel (C++): http://whatif.intel.com Rice (Java): http://habanero.rice.edu/cnc.html
60
60 Software and Services Group 60
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.