Pipelining.

Presentation on theme: "Pipelining."— Presentation transcript:

1 Pipelining

2 Pipelining
[Figure: stage vs. time diagrams for three stages (s1, s2, s3), without pipeline and with pipeline.]

3 Pipelining
[Figure: stage vs. time diagrams for stages s1, s2, s3, without pipeline and with pipeline.]
s = number of stages, n = number of tasks, t = time per stage
Without pipeline: $T_1 = s \cdot t \cdot n$
With pipeline: $T_s = s \cdot t + (n-1) \cdot t$
Speedup: $T_1 / T_s = \frac{s \cdot n}{s + (n-1)} = \frac{s}{s/n + (1 - 1/n)}$
As $n \to \infty$: Speedup $\to s$
Throughput: $n / T_s$

4 Pipelining
[Figure: stage vs. time diagrams for stages s1, s2, s3, without pipeline and with pipeline.]
$T_1 = s \cdot t \cdot n$, $T_s = s \cdot t + (n-1) \cdot t$, Speedup = $T_1 / T_s$, Throughput = $n / T_s$

Example for s = 3:

n        T1      Ts                 Speedup           Throughput
1        3t      3t                 1                 1/3t
10       30t     3t + 9t = 12t      30/12 = 2.5       10/12t
100      300t    3t + 99t = 102t    300/102 ≈ 2.9     100/102t
n → ∞                               → 3 (= s)         → 1/t
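The table can be reproduced with a few lines of Python. This is a minimal sketch of the two timing formulas from the slide. The function name `pipeline_times` and the choice to measure time in units of t (so `t=1`) are assumptions made for illustration.

```python
def pipeline_times(s, n, t=1):
    """Return (T1, Ts, speedup, throughput) for s stages, n tasks, stage time t."""
    t1 = s * t * n              # without pipeline: each task runs all s stages alone
    ts = s * t + (n - 1) * t    # with pipeline: fill once, then one task per stage time
    return t1, ts, t1 / ts, n / ts


# Reproduce the s = 3 rows of the table, with time measured in units of t.
for n in (1, 10, 100):
    t1, ts, speedup, _ = pipeline_times(s=3, n=n)
    print(f"n={n:<4} T1={t1}t  Ts={ts}t  speedup={speedup:.2f}  throughput={n}/{ts}t")
```

For large n the printed speedup approaches s = 3, which matches the limit row of the table.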

5 Pipelining
Slowest stage determines the pipeline performance.
[Figure: stage vs. time diagrams for stages s1, s2, s3, without pipeline and with pipeline.]
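To see why the slowest stage sets the pace, the sketch below computes when each task leaves a linear pipeline whose stages have different delays. It assumes a simplified model that is not taken from the slides: a stage starts a task as soon as the task has left the previous stage and the stage itself is free, with unlimited buffering between stages. The function name and the delay values are hypothetical.

```python
def completion_times(stage_delays, n):
    """Finish time of each of n tasks in a linear pipeline (simplified model)."""
    stage_free = [0] * len(stage_delays)   # stage_free[j]: when stage j finished its last task
    done = []
    for _ in range(n):
        ready = 0                           # when the current task is available to the next stage
        for j, delay in enumerate(stage_delays):
            start = max(ready, stage_free[j])
            stage_free[j] = start + delay
            ready = stage_free[j]
        done.append(ready)
    return done


print(completion_times([1, 1, 1], 4))   # balanced stages
print(completion_times([1, 3, 1], 4))   # middle stage three times slower
```

In the unbalanced case, successive tasks finish 3 time units apart, which is the delay of the slowest stage, even though the other stages need only 1 unit each.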

6 Pipelining
Deep pipeline
[Figure: stage vs. time diagrams for a 3-stage pipeline (s1, s2, s3) and a 6-stage pipeline (s1 to s6) obtained by splitting s2 into s21, s22, s23 and s3 into s31, s32.]

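7 Computational Pipelines
[Figure: an unpipelined circuit, with combinatorial logic followed by a register (Reg) driven by the clock, and a pipelined circuit, with combinatorial logic blocks A, B and C, each followed by a register (R), all driven by the same clock.]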

8 Limitations of Pipelining
Nonuniform partitioning
  Stage delays may be nonuniform.
  Throughput is limited by the slowest stage.
Deep pipelining
  Large number of stages.
  Modern processors have deep pipelines (15 or more stages) to increase the clock rate.
[Figure: two pipelined circuits of combinatorial logic blocks A, B and C separated by registers (R) on a common clock, annotated with delays in ps.]
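Both limitations can be illustrated numerically. The sketch below assumes a simple model that goes beyond the slide text: the clock period equals the largest stage delay plus a fixed register delay. All delay values are hypothetical.

```python
REG_PS = 20   # hypothetical register delay per stage, in ps


def clock_period(stage_delays_ps, reg_ps=REG_PS):
    """Clock period in ps: the slowest stage plus the register delay (assumed model)."""
    return max(stage_delays_ps) + reg_ps


# The same 300 ps of logic partitioned in three hypothetical ways.
designs = {
    "3 uniform stages": [100, 100, 100],
    "3 nonuniform stages": [50, 150, 100],
    "6 uniform stages": [50] * 6,
}

for name, delays in designs.items():
    period = clock_period(delays)
    print(f"{name:<20} period = {period} ps, throughput = {1000 / period:.2f} tasks/ns")
```

Under these assumptions the nonuniform split is slower than the uniform one, although the total logic delay is identical. Doubling the stage count from 3 to 6 shortens the period from 120 ps to 70 ps, which is less than a factor of two because the register delay does not shrink.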

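9 Parallel Adder
[Figure: 4-bit parallel adder built from a chain of four full adders (FA), computing a4 a3 a2 a1 + b4 b3 b2 b1 = x4 x3 x2 x1. The first FA takes a1,b1 and produces x1, the second takes a2,b2 and produces x2, the third takes a3,b3 and produces x3, and the fourth takes a4,b4 and produces x4.]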

10 Pipelined Parallel Adder
Four 4-bit additions are streamed through a chain of four FAs: a + b = x, c + d = y, e + f = z, g + h = w.
[Figure, step 0: a4,b4 a3,b3 a2,b2 a1,b1 are at the input of the first FA; nothing has been computed yet.]

11 Pipelined Parallel Adder
[Figure, step 1: c4,d4 c3,d3 c2,d2 c1,d1 are at the input. After FA 1: a4,b4 a3,b3 a2,b2 x1.]

12 Pipelined Parallel Adder
[Figure, step 2: e4,f4 e3,f3 e2,f2 e1,f1 are at the input. After FA 1: c4,d4 c3,d3 c2,d2 y1. After FA 2: a4,b4 a3,b3 x2 x1.]

13 Pipelined Parallel Adder
[Figure, step 3: g4,h4 g3,h3 g2,h2 g1,h1 are at the input. After FA 1: e4,f4 e3,f3 e2,f2 z1. After FA 2: c4,d4 c3,d3 y2 y1. After FA 3: a4,b4 x3 x2 x1.]

14 Pipelined Parallel Adder
[Figure, step 4: After FA 1: g4,h4 g3,h3 g2,h2 w1. After FA 2: e4,f4 e3,f3 z2 z1. After FA 3: c4,d4 y3 y2 y1. After FA 4: x4 x3 x2 x1, the first sum is complete.]
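The animation on the preceding slides can be mimicked in software. The following is a rough sketch, not a hardware description: each stage is one full adder that handles one bit position, and a register after each stage carries the operands, the partial sum and the carry to the next stage. The names `full_adder` and `pipelined_add` and the sample operands are assumptions for illustration.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def pipelined_add(pairs, width=4):
    """Stream operand pairs through a `width`-stage pipelined ripple-carry adder.

    Returns a list of (cycle, total), where cycle is the clock cycle in which
    the sum left the last stage. Stage i adds bit i of the pair it holds.
    """
    regs = [None] * width           # regs[i]: register after stage i
    pending = list(pairs)
    results = []
    cycle = 0
    while pending or any(r is not None for r in regs):
        cycle += 1
        new_regs = [None] * width
        for i in range(width):
            if i == 0:
                # A new pair enters with an empty partial sum and carry 0.
                src = (*pending.pop(0), 0, 0) if pending else None
            else:
                src = regs[i - 1]   # value latched in the previous cycle
            if src is None:
                continue
            a, b, partial, carry = src
            bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
            new_regs[i] = (a, b, partial | (bit << i), carry)
        regs = new_regs
        if regs[-1] is not None:
            _, _, partial, carry = regs[-1]
            results.append((cycle, partial | (carry << width)))
    return results


print(pipelined_add([(5, 3), (7, 9), (15, 15), (0, 1)]))
```

With four pairs and four stages, the first sum appears in cycle 4 and the last in cycle 7, in line with Ts = s . t + (n-1) . t for s = 4, n = 4, t = 1.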

15 Floating-point Arithmetic Pipeline
Pipelined Floating-point Addition
  Subtract exponents (E): subtract the exponents to check whether they are equal.
  Align mantissas (M): compare the exponents and shift the mantissas until the exponents are equal.
  Add mantissas (A)
  Normalize result (N)
[Figure: operands n1 and n2 pass through the stages E, M, A, N.]
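The four stages can be sketched as four small functions. This is a simplified illustration under stated assumptions, not an IEEE 754 implementation: numbers are positive, a number is a pair (mantissa, exponent) with value mantissa * 2**exponent, the mantissa is an integer of `P` bits, and bits shifted out during alignment are simply dropped. All names and the width `P = 8` are hypothetical.

```python
P = 8   # hypothetical mantissa width in bits


def stage_e(x, y):
    """E: subtract the exponents."""
    (m1, e1), (m2, e2) = x, y
    return m1, e1, m2, e2, e1 - e2


def stage_m(m1, e1, m2, e2, diff):
    """M: shift the mantissa with the smaller exponent right until exponents match."""
    if diff >= 0:
        return m1, m2 >> diff, e1
    return m1 >> -diff, m2, e2


def stage_a(m1, m2, e):
    """A: add the aligned mantissas."""
    return m1 + m2, e


def stage_n(m, e):
    """N: normalize so the mantissa fits in P bits with its top bit set."""
    while m >= 1 << P:
        m, e = m >> 1, e + 1
    while 0 < m < 1 << (P - 1):
        m, e = m << 1, e - 1
    return m, e


def fp_add(x, y):
    """Pass one operand pair through E, M, A, N in order."""
    return stage_n(*stage_a(*stage_m(*stage_e(x, y))))


m, e = fp_add((192, -7), (128, -9))   # 1.5 + 0.25
print(m, e, m * 2.0 ** e)
```

In a pipelined unit, each of these functions would sit between two registers, so that a new operand pair can enter stage E while earlier pairs occupy M, A and N.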

16 Instruction Execution Pipeline
Instruction fetch cycle (IF)
  Fetch current instruction from memory
  Increment PC
Instruction decode / register fetch cycle (ID)
  Decode instruction
  Compute possible branch target
  Read registers from the register file
Execution / effective address cycle (EX)
  Form the effective address
  ALU performs the operation specified by the opcode
Memory access (MEM)
  Memory read for load instruction
  Memory write for store instruction
Write-back cycle (WB)
  Write result into register file
[Figure: IF → ID → EX → MEM → WB]

17 Instruction Execution Pipeline
[Figure: stage vs. time diagram of the five stages IF, ID, EX, MEM, WB, with successive instructions overlapped in time.]
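A text version of such a stage vs. time diagram can be generated as follows. This is a minimal sketch that assumes an ideal pipeline: one instruction enters per cycle and nothing stalls. The function name `pipeline_diagram` and the row labels i1, i2, ... are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions):
    """Return one text row per instruction; each column is one clock cycle."""
    total_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        cells = ["   "] * total_cycles
        for k, stage in enumerate(STAGES):
            cells[i + k] = f"{stage:<3}"    # instruction i is in stage k during cycle i + k
        rows.append(f"i{i + 1}: " + " ".join(cells).rstrip())
    return rows


for row in pipeline_diagram(4):
    print(row)
```

Each row is shifted one cycle to the right of the row above it, so in steady state all five stages work on different instructions at once.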

18 Control (Branch) Hazards
Pipeline Hazards
Control (Branch) Hazards: arise from pipelining of instructions (e.g. branch) that change the PC.

Example: for i = n to 1: ci = ai + bi

LOOP: LOAD  100,X
      ADD   200,X
      STORE 300,X
      DECX
      BNE   LOOP
      ...

[Figure: stage vs. time diagram with the stages IF, ID, EX, MEM, WB.]
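The cost of the branch in this loop can be estimated with a back-of-the-envelope model. The sketch below assumes details that the slide does not state: a 5-stage pipeline, a fixed penalty of 2 lost cycles for every taken branch, and no other stalls. The function name and the penalty value are hypothetical.

```python
def loop_cycles(body_len, iterations, stages=5, branch_penalty=2):
    """Return (ideal_cycles, cycles_with_branch_penalty) for a simple loop.

    Assumed model: the closing branch is taken on every iteration except the
    last, and each taken branch costs `branch_penalty` extra cycles.
    """
    instructions = body_len * iterations
    taken = iterations - 1                  # the final BNE falls through
    ideal = stages + instructions - 1       # fill the pipeline, then one instruction per cycle
    return ideal, ideal + branch_penalty * taken


# The loop on the slide has 5 instructions: LOAD, ADD, STORE, DECX, BNE.
ideal, actual = loop_cycles(body_len=5, iterations=10)
print(ideal, actual)
```

Under these assumptions, 10 iterations take 72 cycles instead of the ideal 54, so the branch at the end of a short loop body accounts for a noticeable share of the run time.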

19 A Modern Processor Intel Core i7

