1 Improving Loop-Level Parallelism — Chen Jian, 2002/11. Copyright © 2002 Intel Corporation

2 Agenda  Introduction  Who Cares?  Definition  Loop Dependence and Removal  Dependency Identification Lab  Summary

3 Introduction  Loops must meet certain criteria… –Iteration Independence –Memory Disambiguation –High Loop Count –Etc…

4 Who Cares  Achieving true parallelism: –OpenMP –Auto Parallelization…  Explicit instruction-level parallelism, ILP (Instruction Level Parallelism) –Streaming SIMD (MMX, SSE, SSE2, …) –Software Pipelining on the Intel® Itanium™ Processor –Remove dependencies for the Out-of-Order Core –More instructions run in parallel on the Intel Itanium Processor  Automatic compiler parallelization –High Level Optimizations

5 Definition  Loop Independence: iteration Y of a loop is independent of when or whether iteration X happens. An independent loop:
int a[MAX];
for (J=0; J<MAX; J++) {
  a[J] = b[J];
}

6 Legend  OpenMP: True Parallelism  SIMD: Vectorization  SWP: Software Pipelining  OOO: Out-of-Order Core  ILP: Instruction Level Parallelism  Green: Benefits from concept  Yellow: Some benefit from concept  Red: No benefit from concept

7 Agenda

8 Flow Dependency  Read After Write  Cross-Iteration Flow Dependence: variables written, then read, in different iterations
for (J=1; J<MAX; J++) {
  A[J] = A[J-1];
}
First iterations: A[1]=A[0]; A[2]=A[1];
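The effect of this flow dependence can be demonstrated directly: running the iterations in a different order produces a different result. A minimal sketch (the function names and the 8-element size are illustrative, not from the slides):

```c
#include <stddef.h>

#define LEN 8

/* Serial order: iteration j reads the value iteration j-1 just wrote,
 * so every element ends up equal to a[0]. */
void flow_serial(int a[LEN]) {
    for (size_t j = 1; j < LEN; j++)
        a[j] = a[j - 1];
}

/* Reversed order: each iteration now reads the *original* a[j-1],
 * so the array is shifted right by one instead. */
void flow_reversed(int a[LEN]) {
    for (size_t j = LEN - 1; j >= 1; j--)
        a[j] = a[j - 1];
}
```

Because the two orders disagree, the iterations are not independent and the loop cannot be parallelized as written.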

9 Anti-Dependency  Write After Read  Cross-Iteration Anti-Dependence: variables read, then written, in different iterations
for (J=1; J<MAX; J++) {
  A[J] = A[J+1];
}
First iterations: A[1]=A[2]; A[2]=A[3];
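One way to remove an anti-dependence is to read from a private copy of the array, so no iteration can observe another iteration's write. A sketch (names and size are illustrative; the loop stops at LEN-1 so the read stays in bounds):

```c
#include <string.h>

#define LEN 8

/* a[j] = a[j+1] reads an element that a later iteration overwrites.
 * Reading from a snapshot removes the cross-iteration conflict,
 * so the iterations can then run in any order. */
void shift_left(int a[LEN]) {
    int snapshot[LEN];
    memcpy(snapshot, a, sizeof snapshot);
    for (int j = 1; j < LEN - 1; j++)
        a[j] = snapshot[j + 1];
}
```

The cost is an extra copy; whether that pays off depends on how much parallelism the transformed loop exposes.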

10 Output Dependency  Write After Write  Cross-Iteration Output Dependence: variables written, then written again, in a different iteration
for (J=1; J<MAX; J++) {
  A[J] = B[J];
  A[J+1] = C[J];
}
First iterations: A[1]=B[1]; A[2]=C[1]; A[2]=B[2]; A[3]=C[2];
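Since only the last write to each element survives serial execution, this output dependence can be removed by writing each element's final value exactly once. A sketch comparing the two forms (names and size are illustrative; the bound is LEN-1 so A[j+1] stays in range):

```c
#include <string.h>

#define LEN 8

/* Original form: A[j+1] = C[j] is overwritten by the next
 * iteration's A[j+1] = B[j+1], except for the final element. */
void with_output_dep(int a[LEN], const int b[LEN], const int c[LEN]) {
    for (int j = 1; j < LEN - 1; j++) {
        a[j] = b[j];
        a[j + 1] = c[j];
    }
}

/* Each element written exactly once with its final value:
 * no write/write conflict remains between iterations. */
void without_output_dep(int a[LEN], const int b[LEN], const int c[LEN]) {
    for (int j = 1; j < LEN - 1; j++)
        a[j] = b[j];
    a[LEN - 1] = c[LEN - 2];
}
```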

11 Intra-Iteration Dependency  Dependency within an iteration  Hurts ILP  May be automatically removed by compiler
K = 1;
for (J=1; J<MAX; J++) {
  A[J] = A[J] + 1;
  B[K] = A[K] + 1;
  K = K + 2;
}
First iteration: A[1] = A[1] + 1; B[1] = A[1] + 1;

12 Remove Dependencies  Best choice  Requirement for true parallelism  Not all dependencies can be removed
Before:
for (J=1; J<MAX; J++) {
  A[J] = A[J-1] + 1;
}
After:
for (J=1; J<MAX; J++) {
  A[J] = A[0] + J;
}
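The transformation on this slide can be checked mechanically: the recurrence and its closed form produce identical arrays. A sketch (names and size are illustrative):

```c
#include <string.h>

#define LEN 8

/* Flow-dependent recurrence: a[j] needs the previous iteration's result. */
void recurrence(int a[LEN]) {
    for (int j = 1; j < LEN; j++)
        a[j] = a[j - 1] + 1;
}

/* Dependence removed: a[j] = a[0] + j uses only loop-invariant data
 * and the loop index, so every iteration is independent. */
void closed_form(int a[LEN]) {
    for (int j = 1; j < LEN; j++)
        a[j] = a[0] + j;
}
```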

13 Increasing ILP without removing dependencies  Good: unroll the loop  Make sure the compiler can't or didn't do this for you  Compiler should not apply common sub-expression elimination  Also notice that if this is floating-point data, precision could be altered
Before:
for (J=1; J<MAX; J++) {
  A[J] = A[J-1] + B[J];
}
After:
for (J=1; J<MAX; J+=2) {
  A[J]   = A[J-1] + B[J];
  A[J+1] = A[J-1] + (B[J] + B[J+1]);
}
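Unrolling by two can be verified the same way: both statements in the unrolled body read only a[j-1], so they no longer form a single serial chain, yet the results match the rolled loop. A sketch with integer data (so the floating-point precision caveat does not apply); names and sizes are illustrative, and an even trip count is assumed:

```c
#include <string.h>

#define LEN 9   /* trip count LEN-1 = 8 is even, as unrolling by 2 requires */

void rolled(int a[LEN], const int b[LEN]) {
    for (int j = 1; j < LEN; j++)
        a[j] = a[j - 1] + b[j];
}

/* Both statements depend only on a[j-1], so the two adds can issue
 * in parallel; the dependence chain is half as long. */
void unrolled(int a[LEN], const int b[LEN]) {
    for (int j = 1; j < LEN; j += 2) {
        a[j]     = a[j - 1] + b[j];
        a[j + 1] = a[j - 1] + (b[j] + b[j + 1]);
    }
}
```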

14 Induction Variables  Induction variables are incremented on each trip through the loop  Fix by replacing increment expressions with a pure function of the loop index
Before:
i1 = 0; i2 = 0;
for (J=0; J<MAX; J++) {
  i1 = i1 + 1;
  B(i1) = ...
  i2 = i2 + J;
  A(i2) = ...
}
After:
for (J=0; J<MAX; J++) {
  B(J+1) = ...
  A((J*J + J)/2) = ...
}
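The closed forms can be checked against the incremental updates: after the trip with index j, i1 holds j + 1 and i2 holds 0 + 1 + … + j = (j² + j)/2. A sketch (function names are illustrative):

```c
/* Value of i2 on trip j, computed incrementally (loop-carried state). */
int i2_incremental(int j) {
    int i2 = 0;
    for (int k = 0; k <= j; k++)
        i2 = i2 + k;
    return i2;
}

/* Same value as a pure function of the loop index: no carried state,
 * so iterations that use it are independent of each other. */
int i2_closed(int j) {
    return (j * j + j) / 2;
}
```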

15 Reductions  Reductions collapse array data to scalar data via associative operations:
for (J=0; J<MAX; J++)
  sum = sum + c[J];
 Take advantage of associativity and compute partial sums or local maxima in private storage  Next, combine partial results into the shared result, taking care to synchronize access
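The partial-sum idea can be sketched as two halves summed independently (as two threads might), then combined. Integer addition is fully associative, so the split is exact; for floating point, rounding could make the results differ slightly. Names and sizes are illustrative:

```c
#define LEN 100

/* Straightforward serial reduction: every iteration depends on sum. */
int sum_serial(const int c[LEN]) {
    int sum = 0;
    for (int j = 0; j < LEN; j++)
        sum = sum + c[j];
    return sum;
}

/* Two independent partial sums, then one combine step. In a threaded
 * version each partial sum lives in private storage and only the
 * final combine needs synchronization. */
int sum_partial(const int c[LEN]) {
    int s0 = 0, s1 = 0;
    for (int j = 0; j < LEN / 2; j++)
        s0 = s0 + c[j];
    for (int j = LEN / 2; j < LEN; j++)
        s1 = s1 + c[j];
    return s0 + s1;
}
```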

16 Data Ambiguity and the Compiler
void func(int *a, int *b) {
  for (J=0; J<MAX; J++) {
    a[J] = b[J];
  }
}
 Are the loop iterations independent?  The C++ compiler has no idea  No chance for optimization: to generate error-free code, the compiler must assume that a and b overlap
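C99's restrict qualifier is one way to resolve this ambiguity: it is a promise from the programmer, not something the compiler verifies, that the pointers do not overlap, which frees the compiler to reorder or vectorize the loop. A sketch (assumes a C99 compiler; passing overlapping pointers to the restrict version would be undefined behavior):

```c
#include <stddef.h>

/* Without restrict the compiler must assume a and b may overlap
 * and keep the loads and stores in strict order. */
void copy_maybe_aliased(int *a, const int *b, size_t n) {
    for (size_t j = 0; j < n; j++)
        a[j] = b[j];
}

/* restrict promises no overlap, so the iterations are provably
 * independent and the loop can be vectorized. */
void copy_no_alias(int *restrict a, const int *restrict b, size_t n) {
    for (size_t j = 0; j < n; j++)
        a[j] = b[j];
}
```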

17 Function Calls
for (J=0; J<MAX; J++) {
  compute(a[J], b[J]);
  a[J][1] = sin(b[J]);
}
 Generally, function calls inhibit ILP  Exceptions: –Transcendentals –Compiles with IPO (interprocedural optimization)

18 Function Calls with State  Many routines maintain state across calls: –Memory allocation –Pseudo-random number generators –I/O routines –Graphics libraries –Third-party libraries  Parallel access to such routines is unsafe unless synchronized  Check documentation for specific functions to determine thread-safety
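A concrete example of hidden state: the C library's rand() keeps its seed in a global, so unsynchronized calls from multiple threads race, while POSIX rand_r() takes the state as an explicit parameter, letting each thread own a private seed. A sketch (assumes a POSIX environment where rand_r is available):

```c
#define _POSIX_C_SOURCE 200112L  /* make rand_r visible under strict modes */
#include <stdlib.h>

/* Each caller passes its own state, so there is no shared hidden
 * seed and no synchronization is needed between threads. */
int next_random(unsigned *state) {
    return rand_r(state);
}
```

Two threads seeded identically get identical, reproducible streams without ever touching shared state.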

19 A Simple Test 1. Reverse the loop order and rerun in serial 2. If results are unchanged, the loop is independent*
Forward:
for (J=0; J<MAX; J++) {
  compute(J, ...)
}
Reversed:
for (J=MAX-1; J>=0; J--) {
  compute(J, ...)
}
*Exception: loops with induction variables
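The reversal test can itself be sketched in code: an independent loop gives identical results in either order, while a flow-dependent loop does not. This is a quick check, not a proof; as the slide's exception notes, some dependence patterns slip past it. Names and size are illustrative:

```c
#include <string.h>

#define LEN 8

/* Independent loop: passes the test, same result both ways. */
void independent_fwd(int a[LEN], const int b[LEN]) {
    for (int j = 0; j < LEN; j++)
        a[j] = 2 * b[j];
}

void independent_rev(int a[LEN], const int b[LEN]) {
    for (int j = LEN - 1; j >= 0; j--)
        a[j] = 2 * b[j];
}

/* Flow-dependent loop: fails the test, the two orders disagree. */
void dependent_fwd(int a[LEN]) {
    for (int j = 1; j < LEN; j++)
        a[j] = a[j - 1];
}

void dependent_rev(int a[LEN]) {
    for (int j = LEN - 1; j >= 1; j--)
        a[j] = a[j - 1];
}
```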

20 Summary  Loop Independence: loop iterations are independent of each other  Explained its importance –ILP and Parallelism  Identified common causes of loop dependence –Flow Dependency, Anti-Dependency, Output Dependency  Taught some methods of fixing loop dependence  Reinforced concepts through lab

