Lecture 4: CPU Performance. A Modern Processor Intel Core i7.

1 Lecture 4: CPU Performance

2 A Modern Processor Intel Core i7

3 Processor Performance. Two bounds characterize the maximum performance. Latency bound: applies when operations must be performed in strict sequence (e.g. because of a data dependency); it is the minimum time needed to perform the operations sequentially. Throughput bound: characterizes the raw computing capacity of the processor's functional units; it is the maximum number of operations per cycle.
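A minimal sketch of the two bounds, to make the difference concrete. The latency and capacity figures are illustrative assumptions, not measurements of any particular processor, and the function names are hypothetical.

```python
from math import ceil

def latency_bound_cycles(n_ops: int, latency: int) -> int:
    """Minimum cycles when each operation must wait for the previous result."""
    return n_ops * latency

def throughput_bound_cycles(n_ops: int, ops_per_cycle: float) -> int:
    """Minimum cycles when operations are independent and only the
    capacity of the functional units limits the rate."""
    return ceil(n_ops / ops_per_cycle)

# Assumed numbers: 1000 multiplications, 3-cycle latency,
# one multiplication can be started per cycle.
n = 1000
print("dependent chain:", latency_bound_cycles(n, latency=3), "cycles")
print("independent ops:", throughput_bound_cycles(n, ops_per_cycle=1), "cycles")
```

The same operation count gives very different bounds depending on whether the operations form a dependency chain.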

4 Pipelining [Figure: time-versus-stages diagrams for three stages s1, s2, s3, shown without pipeline and with pipeline]

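5 Pipelining. Notation: $s$ = number of stages, $n$ = number of tasks, $t$ = time per stage.
Without pipeline: $T_1 = s \cdot t \cdot n$
With pipeline: $T_p = s \cdot t + (n-1) \cdot t$
Speedup $= \frac{T_1}{T_p} = \frac{s \cdot n}{s + (n-1)} = \frac{s}{s/n + (1 - 1/n)}$
Speedup $\to s$ as $n \to \infty$
Throughput $= \frac{n}{T_p}$
[Figure: time-versus-stages diagrams for stages s1, s2, s3, without pipeline and with pipeline]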
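The formulas on this slide can be checked numerically. The sketch below is a direct transcription of $T_1$, $T_p$, speedup, and throughput; the function name and the sample values of $s$, $n$, $t$ are assumptions chosen for illustration.

```python
def pipeline_times(s: int, n: int, t: float):
    """Return (T1, Tp, speedup, throughput) for s stages, n tasks, stage time t."""
    t1 = s * t * n              # without pipeline: every task takes s*t
    tp = s * t + (n - 1) * t    # with pipeline: fill once, then one task per t
    return t1, tp, t1 / tp, n / tp

# Speedup approaches s (here 3) as the number of tasks grows.
for n in (1, 10, 1000):
    t1, tp, speedup, throughput = pipeline_times(s=3, n=n, t=1.0)
    print(f"n={n:5d}  T1={t1:7.1f}  Tp={tp:7.1f}  speedup={speedup:.3f}")
```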

6 Pipelining. The slowest stage determines the pipeline performance. [Figure: time-versus-stages diagrams for stages s1, s2, s3, without pipeline and with pipeline]

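7 Computational Pipelines [Figure: a single block of combinatorial logic followed by a register (Reg), and the same logic split into three blocks, Comb. logic A, B, and C, each followed by a register (R), all driven by a common clock]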

8 Limitations of Pipelining
Nonuniform partitioning: stage delays may be nonuniform; throughput is limited by the slowest stage.
Deep pipelining: large number of stages; modern processors have deep pipelines (15 or more stages) to increase the clock rate.
[Figure: a three-stage pipeline with combinatorial logic delays A = 50 ps, B = 150 ps, C = 100 ps, each followed by a 20 ps register; and a pipeline of uniform stages A, B, C, … with 50 ps of logic and a 20 ps register each]
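A short sketch of why the slowest stage limits throughput, using the delays from the slide's figure (50, 150, 100 ps of logic, 20 ps per register). The uniform case assumes six 50 ps stages so that the total logic delay is the same 300 ps; the slide's ellipsis does not say how many stages there are, so that count is an assumption, as is the function name.

```python
def pipeline_clock(stage_delays_ps, register_delay_ps):
    """Return (cycle time in ps, throughput in giga-operations/s, latency in ps)."""
    # The clock must be slow enough for the slowest stage plus its register.
    cycle = max(stage_delays_ps) + register_delay_ps
    # One operation completes per cycle; 1 op per ps equals 1000 giga-ops/s.
    throughput_gops = 1000.0 / cycle
    # Every stage, fast or slow, occupies one full clock cycle.
    latency = cycle * len(stage_delays_ps)
    return cycle, throughput_gops, latency

nonuniform = pipeline_clock([50, 150, 100], 20)
uniform = pipeline_clock([50] * 6, 20)   # assumed stage count
print("nonuniform: cycle=%d ps, %.2f Gops/s, latency=%d ps" % nonuniform)
print("uniform:    cycle=%d ps, %.2f Gops/s, latency=%d ps" % uniform)
```

In the nonuniform case the 50 ps and 100 ps stages sit idle for part of every cycle, which is the cost of uneven partitioning.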

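9 Pipelined Parallel Adder [Figure, step 1: operand pairs (a1,b1), (a2,b2), (a3,b3), (a4,b4) at the adder inputs]

10 Pipelined Parallel Adder [Figure, step 2: sum a1+b1 formed; pairs (a2,b2), (a3,b3), (a4,b4) pending; new pairs (c1,d1), (c2,d2), (c3,d3), (c4,d4) entered]

11 Pipelined Parallel Adder [Figure, step 3: sums a1+b1, a2+b2, c1+d1 formed; pairs (a3,b3), (a4,b4), (c2,d2), (c3,d3), (c4,d4) pending; new pairs (e1,f1), (e2,f2), (e3,f3), (e4,f4) entered]

12 Pipelined Parallel Adder [Figure, step 4: sums a1+b1, a2+b2, a3+b3, c1+d1, c2+d2, e1+f1 formed; pairs (a4,b4), (c3,d3), (c4,d4), (e2,f2), (e3,f3), (e4,f4) pending; new pairs (g1,h1), (g2,h2), (g3,h3), (g4,h4) entered]

13 Pipelined Parallel Adder [Figure, step 5: sums a1+b1 through a4+b4, c1+d1, c2+d2, c3+d3, e1+f1, e2+f2, g1+h1 formed; pairs (c4,d4), (e3,f3), (e4,f4), (g2,h2), (g3,h3), (g4,h4) pending]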

14 Instruction Execution Pipeline
1. Instruction fetch cycle (IF): fetch the current instruction from memory; increment the PC.
2. Instruction decode / register fetch cycle (ID): decode the instruction; compute the possible branch target; read registers from the register file.
3. Execution / effective address cycle (EX): form the effective address; the ALU performs the operation specified by the opcode.
4. Memory access cycle (MEM): memory read for a load instruction; memory write for a store instruction.
5. Write-back cycle (WB): write the result into the register file.
[Figure: the five stages in order IF, ID, EX, MEM, WB]

15 Instruction Execution Pipeline [Figure: time-versus-stages diagram of successive instructions overlapping in the stages IF, ID, EX, MEM, WB]
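The overlap of the five stages can be drawn as a small text diagram. This is a sketch of an ideal pipeline with no hazards or stalls; the function name is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions: int):
    """Return one row per instruction; row[c] is the stage occupied in cycle c."""
    total_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for k, stage in enumerate(STAGES):
            row[i + k] = stage      # instruction i enters IF in cycle i
        rows.append(row)
    return rows

# Each row is shifted one cycle to the right of the previous one.
for i, row in enumerate(pipeline_diagram(4)):
    print(f"i{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Four instructions finish in 5 + 4 - 1 = 8 cycles instead of 20, which matches the $T_p$ formula with $t = 1$.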

16 Pipeline Hazards: 1. Structural hazards 2. Data hazards 3. Control hazards

17 Pipeline Hazards: Structural Hazards. Arise from resource conflicts when the hardware cannot support all possible combinations of instructions simultaneously in overlapped execution. [Figure: time-versus-stages diagram (IF, ID, EX, MEM, WB) with the datapath resources Mem, Reg, ALU, Mem, Reg, showing a stall (bubble)]
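One common source of a structural hazard is a single memory port shared by instruction fetch and data access. The sketch below assumes that situation (the slide does not state which resource conflicts) and an otherwise ideal five-stage pipeline; the function name is hypothetical.

```python
def memory_port_conflicts(uses_memory):
    """uses_memory[i] is True if instruction i is a load or store.

    Returns (cycle, i, j) triples where instruction i is in MEM while
    instruction j is in IF, assuming one shared memory port, no stalls,
    and instruction i entering IF in cycle i.
    """
    conflicts = []
    for i, is_mem in enumerate(uses_memory):
        j = i + 3                      # MEM comes 3 stages after IF
        if is_mem and j < len(uses_memory):
            conflicts.append((i + 3, i, j))
    return conflicts

# A load followed by four other instructions: the load's MEM stage
# collides with the fetch of the fourth instruction (index 3).
print(memory_port_conflicts([True, False, False, False, False]))
```

Each reported conflict would cost one stall (bubble) unless the hardware provides separate instruction and data memories.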

18 Pipeline Hazards: Data Hazards. Arise when an instruction depends on the results of a previous instruction in a way that is exposed by the overlapping of instructions. Example sequence:
ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11
[Figure: time-versus-stages diagram (IF, ID, EX, MEM, WB) with the datapath resources Mem, Reg, ALU, Mem, Reg]
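The dependences in the example sequence can be found mechanically. This sketch assumes the first operand of each instruction is the destination register and the remaining operands are sources; the tuple encoding and function name are illustrative choices.

```python
program = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R4", "R1", "R5"),
    ("AND", "R6", "R1", "R7"),
    ("OR",  "R8", "R1", "R9"),
    ("XOR", "R10", "R1", "R11"),
]

def raw_dependences(instrs):
    """Return (producer index, consumer index, register) for each
    read-after-write dependence on the most recent earlier writer."""
    last_writer = {}
    deps = []
    for idx, (_, dest, *sources) in enumerate(instrs):
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], idx, reg))
        last_writer[dest] = idx     # record the write after reading sources
    return deps

for producer, consumer, reg in raw_dependences(program):
    print(f"instruction {consumer} reads {reg} written by instruction {producer}")
```

Every later instruction reads R1, so all four depend on the ADD; whether a given dependence actually becomes a hazard depends on how far apart the two instructions are in the pipeline.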

19 Pipeline Hazards: Data Hazards. Forwarding (bypassing). [Figure: four overlapping instructions, each passing through IF, ID, EX, MEM, WB on the datapath resources Mem, Reg, ALU, Mem, Reg]
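A rough sketch of what forwarding saves. The assumptions here are not from the slides: a classic five-stage pipeline, a register file written in the first half of WB and read in the second half of ID, an ALU instruction as the producer, and a consumer considered in isolation.

```python
def stall_cycles(distance: int, forwarding: bool) -> int:
    """Stall cycles for a consumer `distance` instructions after an ALU producer.

    Without forwarding the consumer's ID stage must not come before the
    producer's WB stage, which is 3 stages after ID.
    With forwarding the ALU result is fed directly to the next EX stage.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if forwarding:
        return 0
    return max(0, 3 - distance)

# Columns: distance, stalls without forwarding, stalls with forwarding.
for d in (1, 2, 3):
    print(d, stall_cycles(d, forwarding=False), stall_cycles(d, forwarding=True))
```

Under these assumptions an instruction immediately after its producer would wait two cycles without forwarding and none with it.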

20 Pipeline Hazards: Control (Branch) Hazards. Arise from pipelining of instructions (e.g. branches) that change the PC. Example loop: for i = n to 1: $c_i = a_i + b_i$
LOOP: LOAD 100,X
ADD 200,X
STORE 300,X
DEC X
BNE LOOP
...
[Figure: time-versus-stages diagram (IF, ID, EX, MEM, WB)]

21 Pipeline Hazards: Control (Branch) Hazards. Freeze (flush). Code: BRA L1 ... L1: NEXT [Figure: time-versus-stages diagram (IF, ID, EX, MEM, WB) for BRA L1 followed by NEXT]

22 Pipeline Hazards: Control (Branch) Hazards. Predicted-not-taken. Code: BNE L1 NEXT ... L1: NEXT [Figure: time-versus-stages diagrams (IF, ID, EX, MEM, WB) for the two cases, branch not taken and branch taken]

23 Pipeline Hazards: Control (Branch) Hazards. Predicted-taken. Code: BNE L1 NEXT ... L1: NEXT [Figure: time-versus-stages diagrams (IF, ID, EX, MEM, WB) for the two cases, branch not taken and branch taken]
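The two static schemes can be compared by counting how often the fixed guess is wrong. The sketch below models a loop-closing branch such as BNE LOOP, which is taken on every iteration except the last; the iteration count and function name are assumptions for illustration.

```python
def mispredictions(outcomes, predict_taken: bool) -> int:
    """Count branches whose actual outcome differs from a fixed static prediction."""
    return sum(1 for taken in outcomes if taken != predict_taken)

# A loop of n iterations: the branch is taken n-1 times, then falls through.
n = 100
outcomes = [True] * (n - 1) + [False]

print("predicted-taken:    ", mispredictions(outcomes, predict_taken=True))
print("predicted-not-taken:", mispredictions(outcomes, predict_taken=False))
```

Each misprediction costs the cycles spent on wrongly fetched instructions, so for this kind of branch predicted-taken wastes far fewer cycles, provided the target address is known early enough.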

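24 Pipeline Hazards: Control (Branch) Hazards. Delayed branch. Execution order: branch instruction, sequential successor, branch target if taken.
Code with an empty delay slot:
ADD R1,R2,R3
if (R2=0) branch L1
(delay slot)
NEXT
...
L1: NEXT
Code with the ADD placed after the branch:
if (R2=0) branch L1
ADD R1,R2,R3
NEXT
...
L1: NEXT
[Figure: time-versus-stages diagrams (IF, ID, EX, MEM, WB) for the two cases, branch not taken and branch taken]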

25 Levels of Parallelism
Bit-level parallelism: within arithmetic logic circuits.
Instruction-level parallelism: multiple instructions execute per clock cycle.
Memory system parallelism: overlap of memory operations with computation.
Operating system parallelism: more than one processor; multiple jobs run in parallel on SMP.
Loop level
Procedure level

