
Presentation on theme: "INSTRUCTION-LEVEL PARALLEL PROCESSORS"— Presentation transcript:

Chapter No. 4

2 Evolution of ILP-processors

3 What is instruction-level parallelism?
The potential for executing certain instructions in parallel, because they are independent. Any technique for identifying and exploiting such opportunities.

4 Instruction-level parallelism (ILP)
Basic blocks: sequences of instructions that appear between branches. Usually no more than 5 or 6 instructions!
Loops: for (i=0; i<N; i++) x[i] = x[i] + s;
We can only realize ILP by finding sequences of independent instructions. Dependent instructions must be separated by a sufficient amount of time.

5 Principal Operation of ILP-Processors

6 Pipelined Operation

7 Pipelined Operation

8 Pipelined Operation
A number of functional units are employed in sequence to perform a single computation. Each functional unit represents a certain stage of the computation. Pipelining allows overlapped execution of instructions, which increases the processor's overall throughput.
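The throughput gain from overlapped execution can be estimated with a small sketch. It assumes an idealized pipeline in which every stage takes exactly one cycle and no stalls occur; the function names are illustrative and do not come from the slides.

```python
def sequential_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction passes through all stages
    before the next one starts (no overlap)."""
    return num_instructions * num_stages


def pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed in an ideal pipeline: the first instruction takes
    num_stages cycles, every further one completes one cycle later."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    n, k = 100, 5  # hypothetical workload: 100 instructions, 5 stages
    print(sequential_cycles(n, k), pipeline_cycles(n, k))
```

For long instruction streams the ratio of the two counts approaches the number of stages, which is the ideal speedup; real pipelines fall short of it because of the dependencies discussed later.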

9 Superscalar Processors
Superscalar processors increase the ability of the processor to use instruction-level parallelism. Multiple instructions are issued every cycle to multiple pipelines operating in parallel. The processor receives a sequential stream of instructions. The decode and execute unit then issues multiple instructions to the multiple execution units in each cycle.

10 Superscalar Approach

11 VLIW architecture
VLIW stands for Very Long Instruction Word. A VLIW architecture receives multi-operation instructions, i.e. instructions with multiple fields for the control of the execution units (EUs). The basic structure of superscalar and VLIW processors is the same: multiple EUs, each capable of parallel execution on data fetched from a register file.

12 VLIW Approach

13 Dependencies Between Instructions
Data dependency, control dependency, resource dependency

14 Dependencies Between Instructions

15 Data Dependency
An instruction j depends on data from a previous instruction i: j cannot execute before the earlier instruction i, and j and i cannot execute simultaneously. Data dependences are properties of the program; whether one leads to a data hazard or a stall in a pipeline depends on the pipeline organization.

16 Data Dependency
Data dependencies can be differentiated according to the data involved and according to their type. The data involved in a dependency may be in a register or in memory. The dependency may occur either in straight-line code or in a loop.

17 Data Dependency

18 Data Dependency in Straight-line Code
Straight-line code may contain three different types of dependencies: RAW (read after write), WAR (write after read), WAW (write after write).

19 RAW (Read after Write)
i1: load r1, a;
i2: add r2, r1, r1;
Assume a pipeline of Fetch/Decode/Execute/Mem/Writeback. i2 reads r1, which i1 writes, so i2 cannot use r1 until the loaded value is available.

20 Name Dependencies
A “name dependence” occurs when two instructions use the same register or memory location (a name) but there is no flow of data between the two instructions. There are two types.
Antidependence: occurs when an instruction j writes a register or memory location that instruction i reads, and i is executed first. Corresponds to a WAR hazard.
Output dependence: occurs when instruction i and instruction j write the same register or memory location. Protected against by checking for WAW hazards.

21 Write after Read (WAR)
i1: mul r1, r2, r3;   r1 <= r2 * r3
i2: add r2, r4, r5;   r2 <= r4 + r5
If instruction i2 (add) is executed before instruction i1 (mul) for some reason, then i1 (mul) could read the wrong value for r2.

22 Write after Read (WAR)
One reason for delaying i1 would be a stall for the ‘r3’ value being produced by a previous instruction. Instruction i2 could proceed because it has all its operands, thus causing the WAR hazard. Register renaming eliminates the WAR dependency: replace r2 as the destination of i2 (and in any later instruction that reads that result) with some other register that has not been used yet.
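Register renaming can be sketched in a few lines. This is a minimal illustration, not a description of any particular processor: instructions are assumed to be tuples of the form (opcode, destination, sources...), and the fresh register names p1, p2, ... are hypothetical.

```python
def rename_registers(program, fresh_names):
    """Give every destination a fresh register name and redirect later
    reads of that destination to the new name. This removes WAR and WAW
    (name) dependencies while keeping RAW (true) dependencies intact."""
    mapping = {}                # architectural name -> current renamed name
    fresh = iter(fresh_names)
    renamed = []
    for op, dest, *srcs in program:
        # Sources read the most recent renamed version, if there is one.
        new_srcs = [mapping.get(s, s) for s in srcs]
        new_dest = next(fresh)  # a register that has not been used yet
        mapping[dest] = new_dest
        renamed.append((op, new_dest, *new_srcs))
    return renamed


if __name__ == "__main__":
    program = [
        ("mul", "r1", "r2", "r3"),  # i1 reads r2
        ("add", "r2", "r4", "r5"),  # i2 writes r2: WAR with i1
    ]
    for instr in rename_registers(program, ["p1", "p2"]):
        print(instr)
```

After renaming, i2 writes p2 while i1 still reads the original r2, so the two instructions can complete in either order.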

23 Write after Write (WAW)
i1: mul r1, r2, r3;   r1 <= r2 * r3
i2: add r1, r4, r5;   r1 <= r4 + r5
If instruction i1 (mul) finishes AFTER instruction i2 (add), then register r1 would get the wrong value. Instruction i1 could finish after instruction i2 if separate execution units were used for instructions i1 and i2.

24 Write after Write (WAW)
One way to solve this hazard is to simply let instruction i1 proceed normally, but disable its write stage.
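The three straight-line cases (RAW, WAR, WAW) can be checked mechanically for a pair of instructions. The sketch below assumes each instruction is described only by one destination and a tuple of source names; the Instr type and the classify function are illustrative names, not part of the slides.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    dest: str              # register (or memory name) written
    srcs: Tuple[str, ...]  # registers (or memory names) read


def classify(i: Instr, j: Instr) -> Set[str]:
    """Dependencies of the later instruction j on the earlier instruction i."""
    deps = set()
    if i.dest in j.srcs:
        deps.add("RAW")    # j reads what i writes
    if j.dest in i.srcs:
        deps.add("WAR")    # j overwrites what i still has to read
    if i.dest == j.dest:
        deps.add("WAW")    # both write the same name
    return deps


if __name__ == "__main__":
    load = Instr("r1", ("a",))
    add = Instr("r2", ("r1", "r1"))
    print(classify(load, add))
```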

25 Data Dependency in Loops
Instructions belonging to a particular loop iteration may be dependent on instructions belonging to previous loop iterations. This type of dependency is referred to as a recurrence or inter-iteration data dependency.
do
  X(I) = A*X(I-1) + B
end do

26 Data Dependency in Loops
Loops are a “common case” in pretty much any program, so this is worth mentioning. Consider:
for (j=0; j<=100; j++) {
    A[j+1] = A[j] + C[j];    /* S1 */
    B[j+1] = B[j] + A[j+1];  /* S2 */
}

27 Data Dependency in Loops
Now, look at the dependence of S1 on an earlier iteration of S1. This is a loop-carried dependence; in other words, the dependence exists between different iterations of the loop. Successive iterations of S1 must execute in order. The dependence of S2 on S1 (through A[j+1]) is within an iteration and is not loop-carried. If that were the only dependence, multiple iterations of the loop could execute in parallel.
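The practical meaning of a loop-carried dependence can be shown by running iterations in a different order. The sketch below is an illustration under simple assumptions: Python lists stand in for the arrays, and the `order` parameter (not in the slides) controls the sequence in which iterations execute. The independent loop is the x[i] = x[i] + s example from slide 4; the recurrence is statement S1.

```python
def independent_loop(x, s, order):
    """x[i] = x[i] + s: no iteration reads a value written by another."""
    x = list(x)
    for i in order:
        x[i] = x[i] + s
    return x


def recurrence_loop(a, c, order):
    """A[j+1] = A[j] + C[j]: iteration j reads what iteration j-1 wrote."""
    a = list(a)
    for j in order:
        a[j + 1] = a[j] + c[j]
    return a


if __name__ == "__main__":
    forward, backward = [0, 1, 2, 3], [3, 2, 1, 0]
    # Same result in any order: the iterations could run in parallel.
    print(independent_loop([1, 2, 3, 4], 10, forward),
          independent_loop([1, 2, 3, 4], 10, backward))
    # Different results: the iterations must run in program order.
    print(recurrence_loop([1, 0, 0, 0, 0], [1, 1, 1, 1], forward),
          recurrence_loop([1, 0, 0, 0, 0], [1, 1, 1, 1], backward))
```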

28 Data Dependency in Graphs
Data dependencies can also be represented by graphs, with edges labelled δt, δa or δo to denote RAW, WAR or WAW dependencies respectively. An edge from i1 to i2 means that i2 is dependent on i1.
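A dependency graph for straight-line code can be built in one pass over the instructions. This is a minimal sketch under stated assumptions: each instruction is a (destination, sources) pair, edges are returned as (earlier index, later index, kind) triples, and the function name is illustrative.

```python
def dependency_graph(program):
    """Return sorted (earlier, later, kind) edges for straight-line code.
    program is a list of (dest, srcs) pairs in program order."""
    edges = []
    last_writer = {}  # name -> index of the most recent instruction writing it
    readers = {}      # name -> indices that read it since its last write
    for j, (dest, srcs) in enumerate(program):
        for s in set(srcs):
            if s in last_writer:
                edges.append((last_writer[s], j, "RAW"))
        for i in readers.get(dest, []):
            edges.append((i, j, "WAR"))
        if dest in last_writer:
            edges.append((last_writer[dest], j, "WAW"))
        # Record this instruction's reads, then its write.
        for s in set(srcs):
            readers.setdefault(s, []).append(j)
        last_writer[dest] = j
        readers[dest] = []
    return sorted(edges)


if __name__ == "__main__":
    program = [
        ("r1", ("a",)),        # i1: load r1, a
        ("r2", ("r1", "r1")),  # i2: add r2, r1, r1
        ("r1", ("r4", "r5")),  # i3: add r1, r4, r5
    ]
    for edge in dependency_graph(program):
        print(edge)
```

Tracking only the most recent writer keeps RAW edges limited to true flows of data, rather than linking a read to every earlier write of the same name.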

29 Control Dependence
Control dependence determines the ordering of an instruction with respect to a branch instruction: if the branch is taken, the instruction is executed; if the branch is not taken, the instruction is not executed. An instruction that is control dependent on a branch cannot be moved before the branch: instructions from the then part of an if statement cannot be executed before the branch.

30 Control Dependence
An instruction that is not control dependent on a branch cannot be moved after the branch: other instructions cannot be moved into the then part of an if statement.

31 Control Dependence

32 Control Dependence
Frequent conditional branches impose a heavy performance constraint on ILP-processors. A higher instruction issue rate per cycle raises the probability of encountering a conditional control dependency in each cycle.

33 Control Dependence

34 Control Dependency Graph

35 Resource Dependency
An instruction is resource dependent on a previously issued instruction if it requires a hardware resource which is still being used by the previously issued instruction. If, for instance, there is only a single division unit available, then in the code sequence
i1: div r1, r2, r3
i2: div r4, r2, r5
i2 is resource dependent on i1.

36 Instruction Scheduling
Why is instruction scheduling needed? Instruction scheduling involves:
Detection: detecting where dependencies occur in the code.
Resolution: removing dependencies from the code.
There are two approaches to instruction scheduling: the static approach and the dynamic approach.

37 Instruction Scheduling

38 Static Scheduling
Detection and resolution are accomplished by the compiler, which avoids dependencies by reordering the code. VLIW processors expect dependency-free code generated by an ILP compiler.
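How a compiler might pack independent instructions into VLIW-style groups can be sketched as follows. This is a simplified illustration, not the algorithm of any particular compiler: it assumes unit latency, treats every dependency as a strict ordering constraint, and takes the dependency edges as given. The function name and the `width` parameter (operations per long instruction word) are hypothetical.

```python
def schedule_bundles(num_instrs, edges, width):
    """Place instructions 0..num_instrs-1 into bundles of at most `width`
    operations. edges holds (earlier, later) pairs with earlier < later;
    a later instruction goes into a bundle after all it depends on."""
    slot_of = {}
    bundles = []
    for j in range(num_instrs):  # program order: predecessors are placed first
        slot = 0
        for earlier, later in edges:
            if later == j:
                slot = max(slot, slot_of[earlier] + 1)
        # Skip bundles that are already full.
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(j)
        slot_of[j] = slot
    return bundles


if __name__ == "__main__":
    # Four instructions; instruction 1 depends on instruction 0.
    print(schedule_bundles(4, [(0, 1)], width=2))
```

In this example instruction 2 is moved up next to instruction 0, so the dependent instruction 1 no longer delays the independent work behind it.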

39 Dynamic Scheduling
Dynamic scheduling is performed by the processor, which contains two windows: an issue window and an execution window.
The issue window contains all fetched instructions which are intended for issue in the next cycle. The issue window's width is equal to the issue rate. All instructions in the window are checked for dependencies that may exist.

40 Dynamic Scheduling
The execution window contains all those instructions which are still in execution and whose results have not yet been produced.
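The interplay of the two windows can be sketched with a small in-order issue simulation. The model is deliberately simplified and rests on assumptions not stated in the slides: every instruction has the same fixed latency, operands are read at issue, and issue stops at the first blocked instruction. The `ready_at` table plays the role of the execution window by recording which results are still outstanding.

```python
def simulate_issue(program, width, latency):
    """Return the cycle in which each instruction issues.
    program: list of (dest, srcs); width: issue rate; latency: cycles >= 1."""
    issue_cycle = []
    ready_at = {}  # name -> cycle at which its pending result becomes available
    next_instr, cycle = 0, 0
    while next_instr < len(program):
        issued_now = 0
        # The issue window holds at most `width` instructions per cycle.
        while next_instr < len(program) and issued_now < width:
            dest, srcs = program[next_instr]
            # RAW: a source is still being produced.
            if any(ready_at.get(s, 0) > cycle for s in srcs):
                break
            # WAW: an earlier write to the same destination is still pending.
            if ready_at.get(dest, 0) > cycle:
                break
            ready_at[dest] = cycle + latency
            issue_cycle.append(cycle)
            next_instr += 1
            issued_now += 1
        cycle += 1
    return issue_cycle


if __name__ == "__main__":
    program = [
        ("r1", ("a",)),
        ("r2", ("r1", "r1")),  # depends on the first instruction
        ("r3", ("r4", "r5")),
        ("r6", ("r7", "r8")),
    ]
    print(simulate_issue(program, width=2, latency=1))
```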

41 Instruction Scheduling in ILP-processors
ILP-instruction scheduling comprises detection and resolution of dependencies, and parallel optimization.

42 Parallel Optimization
Parallel optimization is achieved by reordering the sequence of instructions through appropriate code transformations for parallel execution. It is also known as code restructuring or code reorganization.

43 ILP-instruction Scheduling

44 Preserving Sequential Consistency
To maintain the logical integrity of the program, parallel execution must produce the same results as sequential execution. Example:
div r1, r2, r3;
add r5, r6, r7;
jz anywhere;

45 Preserving Sequential Consistency

