Improving Loop-Level Parallelism 陈健 2002/11 Copyright © 2002 Intel Corporation.

1 Improving Loop-Level Parallelism 陈健 2002/11 Copyright © 2002 Intel Corporation

2 Agenda
• Introduction
• Who Cares?
• Definition
• Loop Dependence and Removal
• Dependency Identification Lab
• Summary

3 Introduction
• Loops must meet certain criteria…
  – Iteration Independence
  – Memory Disambiguation
  – High Loop Count
  – Etc…

4 Who Cares
• Achieving true parallelism:
  – OpenMP
  – Auto Parallelization…
• Explicit instruction-level parallelism (ILP):
  – Streaming SIMD (MMX, SSE, SSE2, …)
  – Software Pipelining on Intel® Itanium™ Processor
  – Remove Dependencies for the Out-of-Order Core
  – More instructions run in parallel on the Intel Itanium Processor
• Automatic compiler parallelization
  – High Level Optimizations

5 Definition int a[MAX]; for (J=0;J

6 Legend
• OpenMP: True Parallelism
• SIMD: Vectorization
• SWP: Software Pipelining
• OOO: Out-of-Order Core
• ILP: Instruction Level Parallelism
• Green: Benefits from concept
• Yellow: Some benefit from concept
• Red: No benefit from concept

7 Agenda

8 Flow Dependency
• Read After Write
• Cross-Iteration Flow Dependence: Variables written then read in different iterations
for (J=1; J
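Because the slide's own loop is cut off in this transcript, the following is only a minimal sketch of a cross-iteration flow dependence, not the original example. The function names `prefix_sum_in_place` and `shifted_sum` are hypothetical, and `shifted_sum` assumes `a` and `b` do not overlap.

```c
#include <stddef.h>

/* Flow (read-after-write) dependence: iteration j reads a[j - 1],
   which iteration j - 1 has just written. Iterations must run in order. */
void prefix_sum_in_place(int *a, size_t n)
{
    for (size_t j = 1; j < n; j++) {
        a[j] = a[j - 1] + a[j];
    }
}

/* No cross-iteration dependence: every iteration reads only b[] and
   writes only its own a[j], so iterations may run in any order.
   Assumes a and b do not overlap. */
void shifted_sum(int *a, const int *b, size_t n)
{
    for (size_t j = 1; j < n; j++) {
        a[j] = b[j - 1] + b[j];
    }
}
```

The first loop computes a running total, so its result depends on iteration order; the second produces the same output whichever iteration runs first.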

9 Anti-Dependency
• Write After Read
• Cross-Iteration Anti-Dependence: Variables read then written in different iterations
for (J=1; J
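As an illustration only (the slide's loop is truncated here), an anti-dependence can be sketched as a loop that reads an element a later iteration overwrites. One common way to remove it is to read from an untouched copy; the names `shift_left_serial`, `shift_left_independent`, and the `old` buffer are assumptions made for this sketch.

```c
#include <stddef.h>

/* Anti (write-after-read) dependence: iteration j reads a[j + 1],
   and iteration j + 1 later overwrites that same element. */
void shift_left_serial(int *a, size_t n)
{
    for (size_t j = 0; j + 1 < n; j++) {
        a[j] = a[j + 1] + 1;
    }
}

/* Dependence removed: reads come from a separate copy of the original
   values, so no iteration can clobber another iteration's input.
   Assumes old holds a copy of a's initial contents and does not overlap a. */
void shift_left_independent(int *a, const int *old, size_t n)
{
    for (size_t j = 0; j + 1 < n; j++) {
        a[j] = old[j + 1] + 1;
    }
}
```

The extra copy costs memory and one pass over the data, which is the usual price of removing this kind of dependence.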

10 Output Dependency
• Write After Write
• Cross-Iteration Output Dependence: Variables written then written again in a different iteration
for (J=1; J
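The sketch below is a hypothetical example of an output dependence, since the slide's code did not survive extraction. In `fill_serial`, element `a[j + 1]` is written in iteration `j` and again in iteration `j + 1`, so the final contents depend on which write lands last. `fill_independent` is one possible rewrite that keeps only the writes that survive in serial order.

```c
#include <stddef.h>

/* Output (write-after-write) dependence: a[j + 1] is written by
   iteration j and then overwritten by iteration j + 1.
   a must hold n + 1 elements; b and c hold n each. */
void fill_serial(int *a, const int *b, const int *c, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        a[j] = b[j];
        a[j + 1] = c[j];
    }
}

/* Same final result with one write per element: only the last
   c[] value survives in serial order, so it is stored once after the loop. */
void fill_independent(int *a, const int *b, const int *c, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        a[j] = b[j];
    }
    if (n > 0) {
        a[n] = c[n - 1];
    }
}
```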

11 Intra-Iteration Dependency
• Dependency within an iteration
• Hurts ILP
• May be automatically removed by compiler
K = 1; for (J=1; J
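The slide's loop is truncated, so what follows is a guess at the kind of pattern it describes rather than the original code. In the hypothetical `gather_chained`, the second load in each iteration cannot be issued until the first `k++` completes; `gather_unchained` computes both addresses from the same `k`, leaving the two statements free to execute side by side. Note that `k` still carries a value from one iteration to the next in both versions.

```c
#include <stddef.h>

/* Within each iteration, the second statement waits on the k++ that
   follows the first. b must hold at least 2 * n + 1 elements. */
void gather_chained(int *a, int *c, const int *b, size_t n)
{
    size_t k = 1;
    for (size_t j = 0; j < n; j++) {
        a[j] = b[k];
        k++;            /* the next load depends on this update */
        c[j] = b[k];
        k++;
    }
}

/* Both loads use the same k, so neither depends on the other
   inside the iteration; k is advanced once at the end. */
void gather_unchained(int *a, int *c, const int *b, size_t n)
{
    size_t k = 1;
    for (size_t j = 0; j < n; j++) {
        a[j] = b[k];
        c[j] = b[k + 1];
        k += 2;
    }
}
```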

12 for (J=1; J

13 for (J=1;J

14 Induction Variables
• Induction variables are incremented on each trip through the loop
• Fix by replacing increment expressions with a pure function of the loop index
i1 = 0; i2 = 0; for(J=0,J
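A minimal sketch of this fix follows. The slide's loop body is lost, so the increments used here (`i1 += 1` and `i2 += j`) and the function names are assumptions chosen only to show the technique: each running counter is replaced by a closed-form expression in the loop index `j`.

```c
#include <stddef.h>

/* Induction variables i1 and i2 carry values between iterations,
   so iteration j cannot start until iteration j - 1 has finished. */
void record_incremental(size_t *out1, size_t *out2, size_t n)
{
    size_t i1 = 0;
    size_t i2 = 0;
    for (size_t j = 0; j < n; j++) {
        i1 += 1;
        i2 += j;
        out1[j] = i1;
        out2[j] = i2;
    }
}

/* Each value is a pure function of j:
   i1 = j + 1 and i2 = 0 + 1 + ... + j = j * (j + 1) / 2. */
void record_closed_form(size_t *out1, size_t *out2, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        size_t i1 = j + 1;
        size_t i2 = j * (j + 1) / 2;
        out1[j] = i1;
        out2[j] = i2;
    }
}
```

With the closed forms in place, no iteration needs a value produced by another iteration.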

15 Reductions
• Reductions collapse array data to scalar data via associative operations:
• Take advantage of associativity and compute partial sums or local maxima in private storage
• Next, combine partial results into the shared result, taking care to synchronize access
for (J=0; J
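The partial-sum idea can be sketched in plain C without any threading library. The chunk count `NUM_CHUNKS` and the function name are hypothetical; in a threaded version each chunk would be handled by a different thread, and the final combining loop is the step that would need synchronization. OpenMP's `reduction(+:sum)` clause automates the same pattern.

```c
#include <stddef.h>

enum { NUM_CHUNKS = 4 };  /* hypothetical number of workers */

/* Sum an array by accumulating one private partial sum per chunk,
   then combining the partial sums into a single result. */
long sum_with_partials(const int *a, size_t n)
{
    long partial[NUM_CHUNKS] = {0};

    /* The chunks are independent of each other: each one touches
       only its own partial[c] and its own slice of a[]. */
    for (size_t c = 0; c < NUM_CHUNKS; c++) {
        size_t begin = n * c / NUM_CHUNKS;
        size_t end = n * (c + 1) / NUM_CHUNKS;
        for (size_t j = begin; j < end; j++) {
            partial[c] += a[j];
        }
    }

    /* Combine step: the only place the shared total is updated. */
    long total = 0;
    for (size_t c = 0; c < NUM_CHUNKS; c++) {
        total += partial[c];
    }
    return total;
}
```

Integer addition is associative, so regrouping the sum this way gives the same answer as a single serial loop; floating-point sums may differ slightly in the last bits.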

16 Data Ambiguity and the Compiler void func(int *a, int *b) { for (J=0;J
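The body of `func` is truncated in this transcript, so the sketch below uses a hypothetical loop body to show why two pointer parameters are ambiguous to the compiler. If the caller passes overlapping pointers, the loop acquires a flow dependence that is invisible from inside the function. The C99 `restrict` qualifier is one way to promise the compiler that the arrays do not overlap; whether the original slide recommended it is not known.

```c
#include <stddef.h>

/* The compiler cannot tell whether a and b overlap, so it must
   assume a write to a[j] may change a later b[j + 1]. */
void copy_plus_one(int *a, const int *b, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        a[j] = b[j] + 1;
    }
}

/* restrict is the caller's promise that a and b do not overlap,
   which lets the compiler treat the iterations as independent.
   Calling this with overlapping arrays is undefined behavior. */
void copy_plus_one_restrict(int *restrict a, const int *restrict b, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        a[j] = b[j] + 1;
    }
}
```

Calling `copy_plus_one(x + 1, x, 3)` on `x = {1, 0, 0, 0}` makes each iteration read the value the previous one wrote, which is exactly the case the compiler has to guard against.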

17 Function Calls for (J=0;J

18 Function Calls with State
• Many routines maintain state across calls:
  – Memory allocation
  – Pseudo-random number generators
  – I/O routines
  – Graphics libraries
  – Third-party libraries
• Parallel access to such routines is unsafe unless synchronized
• Check documentation for specific functions to determine thread-safety
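To make the hidden-state problem concrete, here is a small sketch using a hand-written linear congruential generator. It is an illustration, not a real library routine: `next_shared`, `next_explicit`, and the constants are assumptions for this example. The first function keeps its state in a static variable, so concurrent callers would race on it; the second takes the state as a parameter, so each thread can own a separate copy.

```c
/* Hidden state shared by every caller: unsafe to call from
   several threads at once without synchronization. */
static unsigned int hidden_state = 1u;

unsigned int next_shared(void)
{
    hidden_state = hidden_state * 1103515245u + 12345u;
    return hidden_state;
}

/* State supplied by the caller: each thread can keep its own
   variable, so calls on different states never interfere. */
unsigned int next_explicit(unsigned int *state)
{
    *state = *state * 1103515245u + 12345u;
    return *state;
}
```

Moving state into a caller-owned argument is a design choice that trades a slightly longer call for freedom from locking.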

19 A Simple Test
1. Reverse the loop order and rerun in serial
2. If results are unchanged, the loop is independent*
Original: for(J=0;J compute(J,...) }
Reverse: for(J=MAX-1;J>=0;J--){ compute(J,...) }
*Exception: Loops with induction variables
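The reversal test can be sketched as a small harness. Everything named here (`body_fn`, `reversal_changes_result`, the two sample loop bodies, and the fixed size `N`) is hypothetical. The harness runs a loop body forward and backward on copies of the same input and reports whether the results differ. Agreement on one input is evidence of independence, not proof.

```c
#include <stddef.h>
#include <string.h>

enum { N = 8 };  /* hypothetical array length for the experiment */

typedef void (*body_fn)(int *a, size_t j);

/* Independent body: touches only a[j]. */
void independent_body(int *a, size_t j)
{
    a[j] = a[j] * 2;
}

/* Dependent body: reads a[j - 1], written by the previous iteration. */
void dependent_body(int *a, size_t j)
{
    if (j > 0) {
        a[j] += a[j - 1];
    }
}

/* Returns 1 if running the loop in reverse order changes the result. */
int reversal_changes_result(body_fn body, const int *init)
{
    int fwd[N];
    int rev[N];
    memcpy(fwd, init, sizeof fwd);
    memcpy(rev, init, sizeof rev);

    for (size_t j = 0; j < N; j++) {
        body(fwd, j);               /* original order */
    }
    for (size_t j = N; j-- > 0; ) {
        body(rev, j);               /* reversed order */
    }
    return memcmp(fwd, rev, sizeof fwd) != 0;
}
```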

20 Summary
• Loop independence: loop iterations are independent of each other
• Explained its importance
  – ILP and Parallelism
• Identified common causes of loop dependence
  – Flow Dependency, Anti-Dependency, Output Dependency
• Taught some methods of fixing loop dependence
• Reinforced concepts through lab
