Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 2012


1 Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 2012 Authors: Haitao Wei, Junqing Yu, Huafei Yu, Mingkang Qin, Guang R. Gao Chih-Sheng Lin

2 Outline Introduction Background ▫DFBrook Stream Language ▫Architecture – Godson-T Software Pipelining Scheduling with Resource Constraints Experiments and Evaluation Related Works Conclusion 2

4 Multi-core Architectures Multi-core architectures have become the mainstream solution and industry standard, from servers to desktop platforms and handheld devices ▫Examples: IBM’s Cell, Nvidia’s GPUs, ICT’s Godson, MIT’s Raw multi-core processor ▫Multiple cores increase the available computation capability ▫but push the performance burden onto the compiler and programmer, who must effectively exploit the coarse-grained parallelism across the cores 4

5 Stream Programming Model The stream programming model is one approach to exposing this coarse-grained parallelism Stream languages ▫StreamIt, Brook, CUDA, SPUR and Cg ▫are motivated by applications in media processing domains ▫are based on synchronous dataflow (SDF) or regular stream flow graphs (RSFG) 5

6 Regular Stream Flow Graph (RSFG) Node ▫represents a computation task (actor) ▫has an independent instruction stream and address space ▫fires repeatedly in a periodic schedule Arc (Edge) ▫represents the communication (flow of data) between nodes ▫data travels through a communication channel 6
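
A periodic schedule for such a graph fires each node a fixed number of times per iteration. For synchronous dataflow graphs these counts come from the standard balance equations: on every arc, tokens produced per iteration must equal tokens consumed. The Python sketch below solves them for a small connected graph. The function name, the edge tuple layout, and the three-node example are illustrative assumptions and do not come from the slides.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Return the smallest positive firing counts that balance every arc.

    edges: list of (src, dst, produced_per_firing, consumed_per_firing).
    Hypothetical helper; assumes the graph is connected.
    """
    # Propagate relative firing rates outward from the first actor.
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Every arc must balance, otherwise no periodic schedule exists.
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent rates: no periodic schedule exists")
    # Scale the fractional rates to the smallest integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}


# A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
example = repetition_vector(
    ["A", "B", "C"],
    [("A", "B", 2, 3), ("B", "C", 1, 2)],
)
print(example)  # {'A': 3, 'B': 2, 'C': 1}
```

In this example one iteration of the periodic schedule fires A three times, B twice, and C once, which leaves every channel with the same number of tokens it started with.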

7 Software Pipelining Software pipelining ▫is an efficient method to exploit the coarse-grained parallelism in stream programs ▫treats the whole program as a loop and one periodic schedule as one iteration of that loop Stream programs can be easily and naturally mapped to communication-exposed multi-core architectures ▫but the gains from parallel execution can be overshadowed by the cost of communication and synchronization 7

8 Software Pipelining (Cont.) The performance metric of software pipelining ▫the initiation rate of successive iterations Rate-optimal schedule ▫the schedule with the maximum initiation rate (minimum initiation interval) Resource limitations ▫processor capability, the size of each PE’s local memory, interconnect bandwidth and direct memory access (DMA) 8
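
To make the initiation interval concrete, the sketch below computes a common resource-constrained lower bound on it: no schedule can start iterations faster than the total work divided by the number of PEs, nor faster than the longest single task. This is the textbook bound, offered as an assumption about what "rate optimal" is measured against; the slides do not state the paper's exact bound, and the function name and workload numbers are hypothetical.

```python
def min_initiation_interval(work, num_pes):
    """Lower bound on the initiation interval (II).

    work: dict mapping actor name to its execution time per iteration.
    Hypothetical helper; ignores communication and memory limits.
    """
    total = sum(work.values())
    balanced = -(-total // num_pes)   # integer ceiling of total / num_pes
    longest = max(work.values())      # one actor cannot be split across PEs
    return max(balanced, longest)


work = {"A": 30, "B": 20, "C": 50, "D": 20}
print(min_initiation_interval(work, 2))  # 60
print(min_initiation_interval(work, 3))  # 50
```

With three PEs the bound is set by actor C rather than by load balance, so adding further PEs would not raise the initiation rate for this workload.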

9 Goal To orchestrate an efficient software pipelining schedule that achieves the optimal computation rate while minimizing the communication cost and satisfying the resource constraints of the system 9

10 CMRO and ROMC CMRO (Communication Minimized Rate-Optimal) ▫minimizes the communication cost at the optimal computation rate ▫formulated as a unified Integer Linear Programming (ILP) problem ROMC (Rate-Optimal with Memory Constraints) ▫formulated as a unified integer quadratic programming problem ▫transformed into an ILP problem by using stage adjustment optimization 10
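
The CMRO objective can be illustrated on a toy graph. The sketch below is not the ILP from the paper (the transcript does not preserve it) and it does not call an ILP solver; it swaps in plain brute-force enumeration over actor-to-PE assignments, keeps only those whose per-PE load fits within a given initiation interval, and picks the one with the least data crossing PE boundaries. It covers only the space (assignment) side and ignores stage assignment. All names and numbers are hypothetical.

```python
from itertools import product


def min_comm_assignment(work, edges, num_pes, ii):
    """Brute-force stand-in for a communication-minimizing assignment.

    work:  dict actor -> execution time per iteration
    edges: dict (src, dst) -> data volume per iteration
    ii:    initiation interval every PE's load must fit into
    Returns (cost, assignment) or None if no assignment fits.
    """
    actors = sorted(work)
    best = None
    for choice in product(range(num_pes), repeat=len(actors)):
        pe = dict(zip(actors, choice))
        load = [0] * num_pes
        for actor in actors:
            load[pe[actor]] += work[actor]
        if max(load) > ii:
            continue  # violates the rate constraint
        # Only arcs whose endpoints sit on different PEs cost communication.
        cost = sum(vol for (s, d), vol in edges.items() if pe[s] != pe[d])
        if best is None or cost < best[0]:
            best = (cost, pe)
    return best


work = {"A": 30, "B": 30, "C": 40, "D": 20}
edges = {("A", "B"): 8, ("B", "C"): 2, ("C", "D"): 6}
print(min_comm_assignment(work, edges, num_pes=2, ii=70))
# (2, {'A': 0, 'B': 0, 'C': 1, 'D': 1})
```

Putting every actor on one PE would cost no communication but needs an interval of 120, so the rate constraint forces a cut, and the search cuts the cheapest arc (B to C). Enumeration grows exponentially with the number of actors, which is why an ILP solver is the practical choice for real graphs.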

11 Outline Introduction Background ▫DFBrook Stream Language ▫Architecture – Godson-T Software Pipelining Scheduling with Resource Constraints Experiments and Evaluation Related Works Conclusion 11

12 DFBrook Stream Language DFBrook: an extension of Brook for SDF 12

13 Target Architecture – Godson-T A communication-exposed multi-core platform 13

14 Outline Introduction Background ▫DFBrook Stream Language ▫Architecture – Godson-T Software Pipelining Scheduling with Resource Constraints Experiments and Evaluation Related Works Conclusion 14

15 CMRO Schedule – Problem Definition 15

16 CMRO Schedule – Problem Definition (Cont.) 16

17 Example of Stream Graph and DDG Stream Graph Data Dependency Graph 17

18 CMRO Problem 18

19 Continued with the previous example SGMS (Stream Graph Modulo Schedule) ▫lacks the consideration of communication 19

20 Continued with the previous example CMRO 20

21 ILP Formulation - Space 21

22 ILP Formulation - Space(Cont.) 22

23 ILP Formulation - Space(Cont.) 23

24 ILP Formulation - Space(Cont.) 24

25 ILP Formulation - Time 25

26 ILP Formulation – Time(Cont.) 26

27 ILP Formulation for CMRO Problem 27

28 Rate-Optimal Schedule with Memory Constraints (ROMC) 28

29 ROMC (Cont.) Considerations ▫All the buffers used by an instance are allocated statically in the memory of the processor to which the instance is assigned ▫In the software pipelining schedule, multiple buffers are introduced to cover the distance in stages between two connected instances 29
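
The link between stage distance and memory use can be sketched as follows. The buffer-counting formulas on the later slides are not preserved in this transcript, so the code makes two simplifying assumptions of its own: an arc whose consumer runs `d` stages after its producer needs `d + 1` copies of its buffer, and all copies are charged to the consumer's PE. Function and variable names are hypothetical.

```python
def buffer_bytes_per_pe(pe, stage, edges, num_pes):
    """Estimate statically allocated buffer bytes on each PE.

    pe:    dict actor -> PE index
    stage: dict actor -> pipeline stage
    edges: dict (src, dst) -> buffer size in bytes for one iteration
    Assumption (not from the slides): stage distance d needs d + 1 copies,
    all charged to the consumer's PE.
    """
    usage = [0] * num_pes
    for (src, dst), size in edges.items():
        distance = stage[dst] - stage[src]
        if distance < 0:
            raise ValueError("consumer scheduled in an earlier stage than producer")
        usage[pe[dst]] += (distance + 1) * size
    return usage


pe = {"A": 0, "B": 0, "C": 1}
edges = {("A", "B"): 1024, ("B", "C"): 2048}

print(buffer_bytes_per_pe(pe, {"A": 0, "B": 0, "C": 2}, edges, 2))  # [1024, 6144]
print(buffer_bytes_per_pe(pe, {"A": 0, "B": 0, "C": 1}, edges, 2))  # [1024, 4096]
```

Under these assumptions, pulling C one stage closer to B brings PE 1 from 6144 bytes down to 4096, which is the kind of saving a stage adjustment aims for when a PE's memory limit is tight.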

30 Example of Buffer Allocation Schemes 30

31 ROMC(Cont.) 31

32 Solving ROMC Problem 32

33 Solving ROMC Problem 33

34 Stage Assignment and Adjustment Optimization Process 34

35 Stage Assignment and Adjustment Optimization Process (Cont.) 35 Key: The stage of a DMA node can be adjusted to reduce the buffer usage of victim processors

36 Buffer Usage Calculation 36 The number of input buffers in each PE’s memory

37 Buffer Usage Calculation(Cont.) 37 The number of output buffers in each PE’s memory

38 Stage Adjustment Optimization 38

39 Stage Adjustment Optimization(Cont.) 39

40 Stage Adjustment Optimization(Cont.) 40

41 Outline Introduction Background ▫DFBrook Stream Language ▫Architecture – Godson-T Software Pipelining Scheduling with Resource Constraints Experiments and Evaluation Related Works Conclusion 41

42 Experiment Infrastructure and Methodology Scheduler ▫implemented in DFBrook to generate code for the software pipelining schedules Experimental Platform ▫Godson-T architecture simulator Solving ILPs ▫the commercial solver CPLEX 42

43 Comparison 43

44 Comparison(Cont.) 44

45 ROMC Schedule Performance Number of processors = 9 MinMem = 16KB for all benchmarks MaxMem = 512KB for imgsmth, Gauss and aveMotion; 32KB for others 45

46 ROMC vs Conservative Estimate Method (CEM) *: both schedulers find a feasible solution +: only ROMC finds a solution, while the solution produced by CEM does not meet the memory constraints 46

47 Scalability (over single processor) 47

48 ROMC ILP Solving Time (in CPU seconds) In 70% of the cases, the ROMC scheduler obtains an optimal solution in less than 6 minutes 48

49 CMRO ILP Solving Time 49

50 CMRO Performance Improvement 50

51 Outline Introduction Background ▫DFBrook Stream Language ▫Architecture – Godson-T Software Pipelining Scheduling with Resource Constraints Experiments and Evaluation Related Works Conclusion 51

52 Related Works Scheduling of stream graphs ▫Ptolemy: model of computation and scheduling on SDF ▫Regular Stream Flow Graphs (RSFG) can be statically scheduled at compile time Stream compilation ▫Coarse-grained task, data and pipeline parallelism have been exploited for StreamIt on the Raw architecture 52

53 Related Works (Cont.) Software pipelining is a well-known technique for loop optimization and has recently been used to schedule stream programs ▫LP formulation for the minimum buffer requirements of rate-optimal software pipelining of RSFGs SGMS for StreamIt applications on multi-core architectures ▫focuses on balancing the work partition but does not consider the cost of communication 53

54 Outline Introduction Background ▫DFBrook Stream Language ▫Architecture – Godson-T Software Pipelining Scheduling with Resource Constraints Experiments and Evaluation Related Works Conclusion 54

55 Conclusion ▫A unified ILP formulation that combines the requirements of rate-optimal software pipelining and minimum inter-core communication overhead ▫Consideration of memory constraints ▫Implementation on the DFBrook language and the Godson-T architecture ▫Good performance improvement compared with other schedules 55

56 Thank you for listening 56

