INSTRUCTION-LEVEL PARALLEL PROCESSORS Chapter No. 4.


1 INSTRUCTION-LEVEL PARALLEL PROCESSORS Chapter No. 4

2 Evolution of ILP-processors

3 What is instruction-level parallelism?
- The potential for executing certain instructions in parallel, because they are independent.
- Any technique for identifying and exploiting such opportunities.

4 Instruction-level parallelism (ILP)
- Basic blocks
  - Sequences of instructions that appear between branches
  - Usually no more than 5 or 6 instructions!
- Loops
  - for ( i=0; i

5 Principle of Operation of ILP-Processors

6 Pipelined Operation

7

8
- A number of functional units are employed in sequence to perform a single computation.
- Each functional unit represents a certain stage of the computation.
- A pipeline allows overlapped execution of instructions.
- It increases the processor’s overall throughput.
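The throughput gain from overlapped execution can be sketched with a simple cycle count. This is a minimal sketch under stated assumptions, not a model of any real processor: every stage takes exactly one cycle, there are no stalls, and the function names and the 100-instruction workload are hypothetical.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """The first instruction fills the pipeline; every later instruction
    completes one cycle after its predecessor (assumes no stalls)."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


# Hypothetical workload: 100 instructions on a 5-stage pipeline.
n, k = 100, 5
print(unpipelined_cycles(n, k))  # 500
print(pipelined_cycles(n, k))    # 104
```

For long instruction streams the ideal speedup approaches the number of stages; dependencies between instructions are what keep real pipelines below that bound.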

9 Superscalar Processors
- Increase the ability of the processor to use instruction-level parallelism.
- Multiple instructions are issued every cycle
  - multiple pipelines operating in parallel.
- The processor receives a sequential stream of instructions.
- The decode and execute unit then issues multiple instructions to the multiple execution units in each cycle.

10 Superscalar Approach

11 VLIW architecture
- VLIW stands for Very Long Instruction Word.
- A VLIW architecture receives multi-operation instructions, i.e. instructions with multiple fields for the control of the execution units (EUs).
- The basic structure of superscalar and VLIW processors is the same
  - multiple EUs, each capable of parallel execution on data fetched from a register file.

12 VLIW Approach

13 Dependencies Between Instructions
- Data dependency
- Control dependency
- Resource dependency

14 Dependencies Between Instructions

15 Data Dependency
- An instruction j depends on data from a previous instruction i
  - cannot execute j before the earlier instruction i
  - cannot execute j and i simultaneously.
- Data dependences are properties of the program
  - whether they lead to a data hazard or a stall in a pipeline depends on the pipeline organization.

16 Data Dependency
- Data dependencies can be differentiated according to the data involved and according to their type.
  - The data involved in a dependency may come from a register or from memory
  - the dependency may occur either in straight-line code or in a loop.

17 Data Dependency

18 Data Dependency in Straight-line Code
- Straight-line code may contain three different types of dependencies:
  - RAW (read after write)
  - WAR (write after read)
  - WAW (write after write)
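The three types can be told apart by comparing the destination and source names of two instructions in program order. The following is a minimal sketch; the `Instr` record and the `dependencies` function are hypothetical names introduced for illustration, and each instruction is assumed to write exactly one location.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    op: str                 # mnemonic, for readability only
    dest: str               # register or memory name that is written
    srcs: Tuple[str, ...]   # register or memory names that are read


def dependencies(first: Instr, second: Instr) -> Set[str]:
    """Return the dependency types of `second` on `first` (program order)."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")    # second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")    # second overwrites what first still has to read
    if first.dest == second.dest:
        found.add("WAW")    # both write the same location
    return found


# load r1, a ; add r2, r1, r1
print(dependencies(Instr("load", "r1", ("a",)), Instr("add", "r2", ("r1", "r1"))))
# mul r1, r2, r3 ; add r2, r4, r5
print(dependencies(Instr("mul", "r1", ("r2", "r3")), Instr("add", "r2", ("r4", "r5"))))
# mul r1, r2, r3 ; add r1, r4, r5
print(dependencies(Instr("mul", "r1", ("r2", "r3")), Instr("add", "r1", ("r4", "r5"))))
```

The three calls report RAW, WAR and WAW respectively, one set per pair.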

19 RAW (Read after Write)
- Read After Write (RAW)
    i1: load r1, a;
    i2: add r2, r1, r1;
- i2 reads r1, which i1 writes, so i2 needs the result of the load.
- Assume a pipeline of Fetch/Decode/Execute/Mem/Writeback

20 Name Dependencies
- A “name dependence” occurs when 2 instructions use the same register or memory location (a name) but there is no flow of data between the 2 instructions
- There are 2 types:
  - Antidependencies: occur when an instruction j writes a register or memory location that instruction i reads, and i is executed first
    - Corresponds to a WAR hazard
  - Output dependencies: occur when instruction i and instruction j write the same register or memory location
    - Protected against by checking for WAW hazards

21 Write after Read (WAR)
- Write after Read (WAR)
    i1: mul r1, r2, r3;   r1 <= r2 * r3
    i2: add r2, r4, r5;   r2 <= r4 + r5
- If instruction i2 (add) is executed before instruction i1 (mul) for some reason, then i1 (mul) could read the wrong value for r2.

22 Write after Read (WAR)
- One reason for delaying i1 would be a stall for the ‘r3’ value being produced by a previous instruction. Instruction i2 could proceed because it has all its operands, thus causing the WAR hazard.
- Use register renaming to eliminate the WAR dependency: replace the destination r2 of i2 with some other register that is not in use.
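Register renaming can be sketched as a single pass that gives every written register a fresh name and redirects later reads to it. This is a simplified illustration, not the mechanism of any particular processor: the `rename` function, the tuple encoding `(op, dest, srcs)` and the fresh names `p1`, `p2`, `p3` are assumptions, and the third instruction is a hypothetical consumer added to show how reads follow the new names.

```python
def rename(instrs, free_regs):
    """Rename each destination to an unused register and update later reads.

    instrs:    list of (op, dest, srcs) tuples in program order
    free_regs: iterable of register names that are not in use (assumed
               to be at least as long as instrs)
    """
    mapping = {}              # architectural name -> most recent new name
    free = iter(free_regs)
    renamed = []
    for op, dest, srcs in instrs:
        # Reads use the latest name of each source, if it was renamed earlier.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        new_dest = next(free)
        mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


program = [
    ("mul", "r1", ("r2", "r3")),   # i1 reads r2
    ("add", "r2", ("r4", "r5")),   # i2 writes r2: WAR with i1
    ("sub", "r6", ("r2", "r1")),   # hypothetical consumer of both results
]
renamed = rename(program, ["p1", "p2", "p3"])
for instr in renamed:
    print(instr)
```

After renaming, i2 writes p2 while i1 still reads the original r2, so the two can execute in either order; the true (RAW) dependencies of the consumer on p1 and p2 are preserved.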

23 Write after Write (WAW)
- Write after Write (WAW)
    i1: mul r1, r2, r3;   r1 <= r2 * r3
    i2: add r1, r4, r5;   r1 <= r4 + r5
- If instruction i1 (mul) finishes AFTER instruction i2 (add), then register r1 would get the wrong value. Instruction i1 could finish after instruction i2 if separate execution units were used for instructions i1 and i2.

24 Write after Write (WAW)
- One way to solve this hazard is to simply let instruction i1 proceed normally, but disable its write stage.

25 Data Dependency in Loops
- Instructions belonging to a particular loop iteration may be dependent on instructions belonging to previous loop iterations.
- This type of dependency is referred to as a recurrence or inter-iteration data dependency.
    do
      X(I) = A*X(I-1) + B
    end do

26 Data Dependency in Loops
- Loops are a “common case” in pretty much any program, so this is worth mentioning. Consider:
    for (j=0; j<=100; j++) {
        A[j+1] = A[j] + C[j];     /*S1*/
        B[j+1] = B[j] + A[j+1];   /*S2*/
    }

27 Data Dependency in Loops
- Now, look at the dependence of S1 on an earlier iteration of S1
  - This is a loop-carried dependence; in other words, the dependence exists between different iterations of the loop
  - Successive iterations of S1 must execute in order
- The dependence of S2 on S1 is within an iteration and is not loop-carried
  - If this were the only dependence, multiple iterations of the loop could execute in parallel
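Whether a dependence between two array accesses is loop-carried can be read off the subscripts. The sketch below assumes subscripts of the form j + constant and a loop step of 1; the function names are hypothetical. An element written at index j + w in iteration j is read at index j' + r in iteration j', so the two iterations are w - r apart.

```python
def dependence_distance(write_offset: int, read_offset: int) -> int:
    """Number of iterations between the write of an element and its read."""
    return write_offset - read_offset


def classify(write_offset: int, read_offset: int) -> str:
    d = dependence_distance(write_offset, read_offset)
    if d > 0:
        return f"loop-carried, distance {d}"
    if d == 0:
        return "within one iteration"
    return f"read occurs {-d} iteration(s) before the write"


# S1 writes A[j+1] and S1 reads A[j]: dependence of S1 on an earlier S1.
s1_on_s1 = classify(1, 0)
# S1 writes A[j+1] and S2 reads A[j+1]: dependence of S2 on S1.
s2_on_s1 = classify(1, 1)
print(s1_on_s1)  # loop-carried, distance 1
print(s2_on_s1)  # within one iteration
```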

28 Data Dependency in Graphs
- Data dependency can also be represented by graphs, with labeled arcs denoting RAW, WAR or WAW dependencies respectively.
    i1 -> i2   (i2 is dependent on i1)

29 Control Dependence
- Control dependence determines the ordering of an instruction with respect to a branch instruction
  - if the branch is taken, the instruction is executed
  - if the branch is not taken, the instruction is not executed
- An instruction that is control dependent on a branch cannot be moved before the branch
  - instructions from the then part of an if-statement cannot be executed before the branch

30 Control Dependence
- An instruction that is not control dependent on a branch cannot be moved after the branch
  - other instructions cannot be moved into the then part of an if-statement

31 Control Dependence

32
- Frequent conditional branches impose a heavy performance constraint on ILP-processors.
- A higher rate of instruction issue per cycle raises the probability of encountering a conditional control dependency in each cycle.

33 Control Dependence

34 Control Dependency Graph

35 Resource Dependency
- An instruction is resource dependent on a previously issued instruction if it requires a hardware resource which is still being used by the previously issued instruction
  - If, for instance, there is only a single division unit available, then in the code sequence
      i1: div r1, r2, r3
      i2: div r4, r2, r5
  - i2 is resource dependent on i1
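The effect of a resource dependency on timing can be sketched by tracking when each unit becomes free. The model is deliberately simple and its details are assumptions: units are not pipelined, instructions issue in order with at most one issue per cycle, and the 4-cycle divide latency is a hypothetical figure.

```python
def start_cycles(ops, unit_count, latency):
    """Return the cycle in which each operation starts executing.

    ops:        operation kinds in program order, e.g. ["div", "div"]
    unit_count: kind -> number of units of that kind
    latency:    kind -> cycles a unit stays busy per operation
    """
    free_at = {kind: [0] * n for kind, n in unit_count.items()}
    starts = []
    earliest = 0                       # in-order issue
    for kind in ops:
        units = free_at[kind]
        idx = min(range(len(units)), key=units.__getitem__)  # soonest-free unit
        start = max(earliest, units[idx])
        units[idx] = start + latency[kind]
        starts.append(start)
        earliest = start + 1           # at most one issue per cycle
    return starts


one_unit = start_cycles(["div", "div"], {"div": 1}, {"div": 4})
two_units = start_cycles(["div", "div"], {"div": 2}, {"div": 4})
print(one_unit)   # [0, 4]
print(two_units)  # [0, 1]
```

With a single divider the second div waits until the first one releases the unit, although the two instructions share no data dependency; a second divider removes the wait.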

36 Instruction Scheduling
- Why is instruction scheduling needed?
- Instruction scheduling involves:
  - Detection: detect where dependencies occur in the code
  - Resolution: remove dependencies from the code
- Two approaches to instruction scheduling
  - static approach
  - dynamic approach

37 Instruction Scheduling

38 Static Scheduling
- Detection and resolution are accomplished by the compiler, which avoids dependencies by reordering the code.
- VLIW processors expect dependency-free code generated by an ILP compiler.
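What a compiler does here can be sketched as packing instructions into groups that are free of mutual dependencies. The following is a simplified greedy scheduler written for illustration, not the algorithm of any actual ILP compiler: the `schedule` function, the tuple encoding `(op, dest, srcs)`, the sample program and the unit latency of every instruction are assumptions, and WAR pairs are conservatively kept in separate groups.

```python
def schedule(instrs, width):
    """Pack instructions into groups of at most `width` independent instructions.

    instrs: list of (op, dest, srcs) tuples in program order
    Returns a list of groups, each a list of instruction indices.
    """
    group_of = []    # group index chosen for each instruction
    groups = []
    for i, (_, dest, srcs) in enumerate(instrs):
        earliest = 0
        for j in range(i):
            _, d_prev, s_prev = instrs[j]
            # RAW, WAR or WAW with an earlier instruction: go in a later group.
            if d_prev in srcs or dest in s_prev or dest == d_prev:
                earliest = max(earliest, group_of[j] + 1)
        g = earliest
        while g < len(groups) and len(groups[g]) >= width:
            g += 1                      # group is full, try the next one
        while len(groups) <= g:
            groups.append([])
        groups[g].append(i)
        group_of.append(g)
    return groups


program = [
    ("load", "r1", ("a",)),         # 0
    ("load", "r2", ("b",)),         # 1
    ("add",  "r3", ("r1", "r2")),   # 2
    ("mul",  "r4", ("r1", "r1")),   # 3
    ("sub",  "r5", ("r3", "r4")),   # 4
]
groups = schedule(program, width=2)
print(groups)  # [[0, 1], [2, 3], [4]]
```

Each inner list corresponds to one multi-operation instruction that could be handed to a VLIW processor; five sequential instructions fit into three groups because the two loads, and then the add and mul, are independent of each other.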

39 Dynamic Scheduling
- Performed by the processor
- Contains two windows
  - Issue window
    - contains all fetched instructions which are intended for issue in the next cycle.
    - The issue window’s width is equal to the issue rate.
    - All instructions in the window are checked for dependencies that may exist.

40 Dynamic Scheduling
  - Execution window
    - contains all those instructions which are still in execution and whose results have not yet been produced.
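The interplay of the two windows can be sketched as a per-cycle check of the issue window against the results still pending in the execution window. This is a minimal sketch assuming in-order issue, where a blocked instruction also blocks the instructions behind it; the function name, the tuple encoding `(op, dest, srcs)` and the sample instructions are hypothetical.

```python
def select_for_issue(issue_window, execution_window):
    """Return the instructions from the issue window that may issue this cycle.

    Both windows hold (op, dest, srcs) tuples. A register is 'busy' while an
    instruction that writes it has not yet produced its result.
    """
    busy = {dest for _, dest, _ in execution_window}
    issued = []
    for op, dest, srcs in issue_window:
        # RAW: a source is still being produced. WAW: the destination is.
        if dest in busy or any(s in busy for s in srcs):
            break                       # in-order issue stops at the first blocked instruction
        issued.append((op, dest, srcs))
        busy.add(dest)                  # later instructions in the window see this write
    return issued


execution_window = [("div", "r1", ("r2", "r3"))]     # result r1 not ready yet
issue_window = [
    ("add", "r4", ("r5", "r6")),    # independent: can issue
    ("sub", "r7", ("r1", "r4")),    # needs r1: blocked
    ("mul", "r8", ("r9", "r9")),    # independent, but behind a blocked instruction
]
issued = select_for_issue(issue_window, execution_window)
print(issued)  # [('add', 'r4', ('r5', 'r6'))]
```

A processor that issues out of order would let the mul bypass the blocked sub, at the cost of additional checks for WAR dependencies.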

41 Instruction Scheduling in ILP-processors
- Detection and resolution of dependencies
- Parallel optimization
- ILP-instruction scheduling

42 Parallel Optimization
- Parallel optimization is achieved by reordering the sequence of instructions through appropriate code transformations for parallel execution.
- Also known as code restructuring or code reorganization.

43 ILP-instruction Scheduling

44 Preserving Sequential Consistency
- To maintain the logical integrity of the program
    div r1, r2, r3;
    add r5, r6, r7;
    jz anywhere;

45 Preserving Sequential Consistency

