Computer Architecture: Pipelines & Superscalars


1 Computer Architecture Pipelines & Superscalars

2 Pipelines - Data Hazards
Code:
    lw  $4, 0($1)
    add $15, $1, $1
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
The last four instructions all depend on the result ($2) produced by the sub!
MIPS arithmetic instructions have the format: op dest, src a, src b
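The dependence claim on this slide can be checked mechanically. The following is a minimal sketch, not part of the original slides: the helper names `regs` and `raw_dependents` are hypothetical, and the parser only understands the handful of instruction forms used in this example.

```python
import re

PROGRAM = [
    "lw $4, 0($1)",
    "add $15, $1, $1",
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]

def regs(instr):
    """Return (register written or None, set of registers read)."""
    op, rest = instr.split(None, 1)
    found = re.findall(r"\$\d+", rest)
    if op == "sw":                    # sw reads both the data and the base register
        return None, set(found)
    return found[0], set(found[1:])   # otherwise the first operand is the destination

def raw_dependents(program, producer_index):
    """Indices of later instructions that read the producer's result."""
    dest, _ = regs(program[producer_index])
    dependents = []
    for i in range(producer_index + 1, len(program)):
        written, read = regs(program[i])
        if dest in read:
            dependents.append(i)
        if written == dest:           # a later write to dest ends this dependence
            break
    return dependents

# The sub is at index 2; the four instructions after it all read $2.
print(raw_dependents(PROGRAM, 2))    # prints [3, 4, 5, 6]
```

The same function reports that the final sw also depends on the earlier add through $15, which is why the sw cannot be moved above that add.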

3 Pipelines - Data Hazards
Examine the pipeline (ignore the first two instructions!)
$2 is only updated in time for the add!

4 Pipelines - Data Hazards
Compiler solution: insert NOOPs
Inefficient!
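A rough sketch of what NOOP insertion could look like in a compiler pass. It assumes a classic five-stage pipeline with no forwarding, in which a consumer must sit at least three slots after the instruction that writes its source register (consistent with the earlier observation that only the add sees the new $2). `MIN_DISTANCE`, `regs` and `insert_nops` are hypothetical names introduced for illustration.

```python
import re

MIN_DISTANCE = 3  # assumption: slots between a register write and its first safe read

def regs(instr):
    """Return (register written or None, set of registers read)."""
    op, rest = instr.split(None, 1)
    found = re.findall(r"\$\d+", rest)
    if op == "sw":                    # sw writes no register
        return None, set(found)
    return found[0], set(found[1:])

def insert_nops(program, min_distance=MIN_DISTANCE):
    """Pad the program with nops so every RAW dependence is far enough apart."""
    out = []
    last_write = {}                   # register -> position in `out` of its latest writer
    for instr in program:
        written, read = regs(instr)
        # Earliest slot at which all source registers have been written back.
        earliest = max(
            (last_write[r] + min_distance for r in read if r in last_write),
            default=0,
        )
        while len(out) < earliest:
            out.append("nop")
        if written is not None:
            last_write[written] = len(out)
        out.append(instr)
    return out

original = [
    "lw $4, 0($1)", "add $15, $1, $1", "sub $2, $1, $3",
    "and $12, $2, $5", "or $13, $6, $2", "add $14, $2, $2", "sw $15, 100($2)",
]
padded = insert_nops(original)
print(padded.count("nop"))            # prints 2
```

Under these assumptions the original order needs two wasted slots between the sub and the and, which is the inefficiency the slide complains about. Moving the sub to the front of the sequence leaves enough distance that no nops are needed.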

5 Pipelines - Data Hazards
Second compiler solution: reorder
Original order:
    lw  $4, 0($1)
    add $15, $1, $1
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
Reordered:
    sub $2, $1, $3        reads $1 and $3, writes $2
    lw  $4, 0($1)
    add $15, $1, $1
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
The lw and add must not define $1 or $3!

6 Pipelines - Data Hazards
Second compiler solution: reorder
    sub $2, $1, $3        reads $1 and $3, writes $2
    lw  $4, 0($1)
    add $15, $1, $1
    and $12, $2, $5       first use of $2
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

7 Pipelines - Data Hazards
Compiler analyses dependencies
  Register definitions
  Register use
Read After Write (RAW) dependency
No dependencies: instruction can be moved!
    sub $2, $1, $3        $2 written
    lw  $4, 0($1)
    add $15, $1, $1
    and $12, $2, $5       uses of $2 (this and the next three instructions)
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

8 Pipelines - Data Hazards
Hardware solution: value forwarding
Hardware detects the dependency (scoreboard)
Forwards the result to EX for subsequent use, before it is written back in WB
Transparent to software!
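To see what forwarding buys, the stall count for a dependent pair can be sketched as below. This is a simplified model under stated assumptions, not the slide author's: a five-stage pipeline where, without forwarding, a consumer must be three slots behind its producer; with forwarding, an ALU result is usable by the very next instruction, while a loaded value (textbook load-use case, not covered on the slide) still costs one slot. The function name `stall_cycles` is hypothetical.

```python
def stall_cycles(producer_op, distance, forwarding):
    """Stall cycles for a consumer `distance` slots after its producer.

    distance = 1 means the consumer immediately follows the producer.
    """
    if forwarding:
        # Assumption: load data is only available after MEM, ALU results after EX.
        required = 2 if producer_op == "lw" else 1
    else:
        # Assumption: the consumer must read the register file after WB has written it.
        required = 3
    return max(0, required - distance)

# sub $2, $1, $3 followed immediately by and $12, $2, $5
print(stall_cycles("sub", 1, forwarding=False))  # prints 2
print(stall_cycles("sub", 1, forwarding=True))   # prints 0
```

In this model the two stall cycles between the sub and the and disappear once forwarding is enabled, with no change to the program.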

9 Data Hazards - classification
Read after Write (RAW)
  Instruction 1 must write before instruction 2 reads
Write after Write (WAW)
  Instructions 1 and 2 both write
  Instruction 2 must write after 1
Write after Read (WAR)
  Instruction 1 reads
  Instruction 2 writes (overwrites)
  Instruction 2 must not write before 1 reads
Reordering algorithms must consider all three!
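The three hazard classes reduce to simple set tests on the registers each instruction reads and writes. Below is a minimal sketch of such a classifier, plus a legality check for hoisting an instruction upwards; `regs`, `hazards` and `can_hoist` are hypothetical helper names, and the parser covers only the instruction forms used in this lecture's example.

```python
import re

def regs(instr):
    """Return (register written or None, set of registers read)."""
    op, rest = instr.split(None, 1)
    found = re.findall(r"\$\d+", rest)
    if op == "sw":                    # sw writes no register
        return None, set(found)
    return found[0], set(found[1:])

def hazards(first, second):
    """Hazard classes that constrain `second` to stay after `first`."""
    w1, r1 = regs(first)
    w2, r2 = regs(second)
    kinds = set()
    if w1 is not None and w1 in r2:
        kinds.add("RAW")              # second reads what first writes
    if w1 is not None and w1 == w2:
        kinds.add("WAW")              # both write the same register
    if w2 is not None and w2 in r1:
        kinds.add("WAR")              # second overwrites what first reads
    return kinds

def can_hoist(program, src, dst):
    """True if program[src] may move up to position dst (dst < src)."""
    moved = program[src]
    return all(not hazards(program[i], moved) for i in range(dst, src))

PROGRAM = [
    "lw $4, 0($1)", "add $15, $1, $1", "sub $2, $1, $3",
    "and $12, $2, $5", "or $13, $6, $2", "add $14, $2, $2", "sw $15, 100($2)",
]
print(can_hoist(PROGRAM, 2, 0))       # prints True: sub may move above lw and add
print(can_hoist(PROGRAM, 3, 2))       # prints False: and needs the sub's $2
```

A real scheduler would also have to consider memory dependences between loads and stores, which this register-only sketch ignores.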

10 Lecture 5 - Key Points
Data Hazards
  RAW - most common
  WAW
  WAR
Compiler
  Looks for dependencies, then re-orders
Hardware
  Scoreboard: monitors dependencies, ensures correct operation
  Value forwarding hardware: forwards results from EX stage

11 Pipelines - Exceptions
Caused by overflow, underflow
Example: add $1, $2, $1
  Overflow detected in EX stage
  Causes jump to exception handler, as for a branch: remainder of pipeline flushed
  but compiler needs original $1 causing overflow
  → Register must not be overwritten
  EX stage needs to squash WB operation
Precise Exception problem - more later!
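The squash requirement can be illustrated with a small sketch of a 32-bit signed add that suppresses the register write when overflow is detected, so the original operand survives. The names `add32` and `execute_add`, and the dictionary used as a register file, are assumptions made for illustration only.

```python
def add32(a, b):
    """Return (wrapped result, overflow flag) for a 32-bit two's-complement add.

    Assumes a and b are already valid signed 32-bit values.
    """
    raw = a + b
    wrapped = (raw + 2**31) % 2**32 - 2**31   # wrap into [-2**31, 2**31 - 1]
    return wrapped, wrapped != raw

def execute_add(regfile, dest, src_a, src_b):
    """Model EX + WB for `add dest, src_a, src_b`; returns False on overflow."""
    result, overflow = add32(regfile[src_a], regfile[src_b])
    if overflow:
        return False                  # squash WB: dest keeps its original value
    regfile[dest] = result
    return True

# add $1, $2, $1 with $1 holding the largest positive value
regfile = {1: 2**31 - 1, 2: 1}
ok = execute_add(regfile, dest=1, src_a=2, src_b=1)
print(ok, regfile[1] == 2**31 - 1)    # prints False True
```

Because the destination is also a source here, writing the wrapped result would destroy the very operand that caused the overflow.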

12 Pipelines - Depth
Pipeline can't be too deep
Hazards are frequent → many stalls in deep pipelines
[Figure: relative performance vs. pipeline depth; the deep end of the curve is marked "Too Deep!"]

13 Pipelines - Depth
Pipeline can't be too deep
Hazards are frequent → many stalls in deep pipelines
[Figure: relative performance vs. pipeline depth; regions marked "Superpipelined" and "Too Deep!"]
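The shape of such a curve can be reproduced with a toy model. Everything in the sketch below is an illustrative assumption rather than data from the lecture: the cycle time is taken as the total logic delay divided by the depth plus a fixed latch overhead, and each hazard is assumed to cost a number of stall cycles proportional to the depth.

```python
def relative_performance(depth, hazard_rate=0.1, logic_delay=10.0, latch_overhead=0.5):
    """Instructions per unit time in a toy pipeline model (all parameters assumed)."""
    cycle_time = logic_delay / depth + latch_overhead   # shorter stages, fixed overhead
    cpi = 1.0 + hazard_rate * (depth - 1)               # deeper pipe, longer stalls
    return 1.0 / (cpi * cycle_time)

for depth in (1, 2, 4, 8, 13, 20, 40):
    print(depth, round(relative_performance(depth), 3))
```

With these made-up parameters, performance climbs quickly at first, flattens out in the low teens of stages, and then falls as stall cycles dominate, which is the "too deep" region.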

14 CISC and pipelines
High speed CISC processors are pipelined
  Overlap IF, EX
Variable instruction length, running time (number of microcode cycles)
  → pipeline imbalance
  → "backup" in pipe stages
  → complicate hazard detection
Complex addressing modes
  → auto-increment updates address register
  → multiple memory accesses required
  → smooth pipeline flow more difficult!

15 Instruction Queues
Vital performance determinant: rate of instruction fetch
High performance processors
  Fetching multiple instructions in each cycle is common
  Use wide datapath to memory
  PowerPC bits = 4 instructions
Despatch unit
  Examines dependencies
  Determines which instructions can be despatched

16 Instruction Queues
Queue "matches" fetch/despatch rates
General strategy for matching producers and consumers
  Use of FIFO-style queues
  Absorb asynchronous delivery / consumption rates
  Provides elasticity in pipelines
Producer → FIFO → Consumer (differing instantaneous rates)
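The elasticity argument can be made concrete with a small cycle-by-cycle simulation. This is a sketch under assumed numbers: a fetch unit that delivers four instructions every other cycle and a despatch unit that can take two per cycle. The function name `simulate` and the simplification that instructions which do not fit in the queue are simply not fetched are both assumptions.

```python
from collections import deque

def simulate(fetch_pattern, dispatch_rate, capacity):
    """Return the number of instructions despatched in each cycle."""
    queue = deque()
    dispatched_per_cycle = []
    for fetched in fetch_pattern:
        # Producer: enqueue what was fetched, limited by free queue space.
        space = capacity - len(queue)
        for _ in range(min(fetched, space)):
            queue.append("instr")
        # Consumer: despatch up to dispatch_rate instructions from the head.
        taken = min(dispatch_rate, len(queue))
        for _ in range(taken):
            queue.popleft()
        dispatched_per_cycle.append(taken)
    return dispatched_per_cycle

bursty_fetch = [4, 0, 4, 0, 4, 0]
print(simulate(bursty_fetch, dispatch_rate=2, capacity=8))  # prints [2, 2, 2, 2, 2, 2]
print(simulate(bursty_fetch, dispatch_rate=2, capacity=2))  # prints [2, 0, 2, 0, 2, 0]
```

With enough queue capacity the bursty producer and the steady consumer both run at their average rate; with too little, the consumer idles every other cycle.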

17 Superscalar Processors

18 PowerPC organisation
PowerPC 601, ~1993
3-way superscalar: Integer, Branch, Floating Point
A newer machine will have more functional units here!
New - look in the "Example Processors" section of the Web notes
[Figure: PowerPC 601 block diagram, with the boundary of the Si die marked]

19 Superscalar Processors
Multiple functional units
PowerPC 604 → 6-way superscalar
  Branch Unit
  Load/Store Unit
  3 Integer Units
  Floating Point Unit
Despatch unit
  Sends "ready" instructions to all free units
  PowerPC 604: potential 4 instructions/cycle (pipeline lengths are different!)
  Reality: 2-3 instructions/cycle? (program dependent!)

20 Superscalar Processors
Mix of functional units
Up to 8-way superscalar common now
  2 Floating point units (usually have ~3 cycle latency)
  3 Integer arithmetic units
  Branch unit
  Load / store unit
  + …?
Marketing departments can play some games with the 'n' of an n-way superscalar!

21 Superscalar - Maximum throughput
Instruction Issue Unit (IIU) is the key!
  If the IIU only issues 4 instructions per cycle, an n-way superscalar (n >> 4) can still only complete 4 instructions / cycle!
IIU has many tasks
  Pre-fetch instructions
    At least one cache line!
  Check dependencies
    Has data required by this instruction been computed yet?
    Keeps register 'scoreboard'
      Mark registers which will be written by instructions already issued
    It's a small dataflow machine (see later!)
  Check availability of functional units
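The issue unit's per-cycle decision can be sketched as follows. This is a simplified, in-order model under stated assumptions, not a description of any real PowerPC: instructions are hypothetical `(unit, dest, sources)` tuples, the scoreboard is a set of registers with writes still pending, and issue stops at the first instruction that cannot go.

```python
def issue_cycle(window, pending_writes, free_units, issue_width):
    """Choose the instructions to issue this cycle, in program order.

    Returns (issued instructions, updated scoreboard of pending registers).
    """
    issued = []
    pending = set(pending_writes)     # scoreboard: registers awaiting a write
    units = dict(free_units)          # functional unit type -> free count
    for unit, dest, sources in window:
        if len(issued) == issue_width:
            break                     # issue width is the hard throughput limit
        if units.get(unit, 0) == 0:
            break                     # no free functional unit of this type
        if pending & set(sources) or dest in pending:
            break                     # operand not computed yet, or write still pending
        units[unit] -= 1
        pending.add(dest)             # mark dest as "will be written"
        issued.append((unit, dest, sources))
    return issued, pending

units = {"int": 3, "fp": 2, "ls": 1, "branch": 1}
independent = [("int", "r%d" % i, ("r20", "r21")) for i in range(3)]
independent += [("fp", "f1", ("f2", "f3")), ("fp", "f4", ("f5", "f6")), ("ls", "r9", ("r22",))]
issued, _ = issue_cycle(independent, set(), units, issue_width=4)
print(len(issued))                    # prints 4: six ready instructions, but width is 4

dependent = [("int", "r1", ("r2", "r3")), ("int", "r4", ("r1", "r5"))]
issued, scoreboard = issue_cycle(dependent, set(), units, issue_width=4)
print(len(issued))                    # prints 1: the second instruction waits for r1
```

Clearing scoreboard entries when results are written back is left to the caller in this sketch.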

