
1
**Lecture 4: CPU Performance**

2
**A Modern Processor: Intel Core i7**

3
**Processor Performance**

Two lower bounds on execution time characterize the maximum performance:

- Latency bound: occurs when operations must be performed in strict sequence (e.g., because of a data dependency). It is the minimum time needed to perform the operations sequentially.
- Throughput bound: characterizes the raw computing capacity of the processor's functional units. It is the maximum number of operations per cycle.
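As a rough illustration of the two bounds, the sketch below counts cycles for a chain of dependent operations and for the same number of independent operations. The function names and the figures used (3-cycle latency, 2 operations per cycle) are hypothetical and are not taken from the lecture.

```python
import math

def latency_bound_cycles(n_ops, latency):
    """Cycles for a chain of n_ops operations, each waiting on the previous result."""
    return n_ops * latency

def throughput_bound_cycles(n_ops, ops_per_cycle):
    """Fewest cycles for n_ops independent operations at ops_per_cycle capacity."""
    return math.ceil(n_ops / ops_per_cycle)

# Hypothetical functional unit: 3-cycle latency, 2 operations started per cycle.
print(latency_bound_cycles(100, 3))     # 300
print(throughput_bound_cycles(100, 2))  # 50
```

Under these assumed figures, the dependent chain is six times slower than the independent case, which is why breaking dependency chains matters.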

4
**Pipelining**

[Figure: stages s1, s2, s3 plotted against time, without pipeline and with pipeline.]

5
**Pipelining**

[Figure: stages s1, s2, s3 plotted against time, without pipeline and with pipeline.]

With s stages, n tasks, and time t per stage:

$$T_1 = s \cdot t \cdot n$$

$$T_p = s \cdot t + (n-1) \cdot t$$

$$\text{Speedup} = \frac{T_1}{T_p} = \frac{s \cdot n}{s + (n-1)} = \frac{s}{s/n + (1 - 1/n)}$$

$$\lim_{n \to \infty} \text{Speedup} = s$$

$$\text{Throughput} = \frac{n}{T_p}$$
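The timing formulas above can be checked numerically. The following is a minimal sketch; the function name `pipeline_times` and the sample values are illustrative assumptions.

```python
def pipeline_times(s, n, t):
    """Return (T1, Tp, speedup, throughput) for s stages, n tasks, time t per stage."""
    t1 = s * t * n              # without pipeline: every task passes all stages alone
    tp = s * t + (n - 1) * t    # with pipeline: fill once, then one task per stage time
    return t1, tp, t1 / tp, n / tp

# Hypothetical example: 5 stages, 100 tasks, 1 time unit per stage.
t1, tp, speedup, throughput = pipeline_times(5, 100, 1)
print(t1, tp, round(speedup, 2), round(throughput, 3))

# The speedup approaches the number of stages as the task count grows.
print(pipeline_times(5, 10**6, 1)[2])
```

With few tasks the fill time dominates and the speedup stays well below s; only long task streams come close to it.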

6
**Pipelining**

[Figure: stages s1, s2, s3 of unequal length plotted against time, without pipeline and with pipeline.]

The slowest stage determines the pipeline performance.
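One way to see the effect of the slowest stage is to assume a clocked pipeline in which every stage is given the same cycle time, equal to the longest stage delay. That clocking model, the function names, and the delays below are assumptions made for illustration.

```python
def unpipelined_time(stage_delays, n):
    """Each of n tasks runs through all stages before the next task starts."""
    return n * sum(stage_delays)

def pipelined_time(stage_delays, n):
    """All stages advance in lockstep, so the cycle equals the slowest stage."""
    cycle = max(stage_delays)
    return (len(stage_delays) + n - 1) * cycle

# Hypothetical delays (arbitrary time units) for three stages, 4 tasks.
print(unpipelined_time([1, 2, 1], 4), pipelined_time([1, 2, 1], 4))
# Same number of stages, but balanced, for comparison.
print(unpipelined_time([1, 1, 1], 4), pipelined_time([1, 1, 1], 4))
```

In the unbalanced case the middle stage doubles the cycle time, so the pipelined run gains much less than in the balanced case.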

7
**Computational Pipelines**

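[Figure: a block of combinatorial logic followed by a register (Reg) driven by a clock, and the same logic split into three blocks (Comb. logic A, B, C), each followed by a register (R) on a shared clock.]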

8
**Limitations of Pipelining**

- Nonuniform partitioning
  - Stage delays may be nonuniform.
  - Throughput is limited by the slowest stage.
- Deep pipelining
  - Large number of stages.
  - Modern processors have deep pipelines (15 or more stages) to increase the clock rate.

[Figure: pipelines built from combinatorial logic blocks A, B, C separated by registers (R) on a shared clock, annotated with per-block delays in ps; only the value 50 ps survives in this transcript.]
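A common reason deep pipelines give diminishing returns is that each added stage also adds a pipeline register, whose delay does not shrink. The sketch below assumes the logic divides evenly across stages; the 300 ps of total logic and 20 ps per register are hypothetical values, not figures recovered from the slide.

```python
def clock_period_ps(total_logic_ps, stages, register_ps):
    """Clock period when the logic is split evenly and each stage ends in a register."""
    return total_logic_ps / stages + register_ps

def throughput_gips(period_ps):
    """Instructions per nanosecond (billions per second) at one instruction per cycle."""
    return 1000.0 / period_ps

# Hypothetical circuit: 300 ps of logic, 20 ps per pipeline register.
for stages in (1, 3, 6, 15):
    period = clock_period_ps(300, stages, 20)
    print(stages, period, round(throughput_gips(period), 2))
```

Under these assumptions, going from 3 to 6 stages halves the logic per stage but raises throughput by well under a factor of two, because the register delay becomes a larger share of each cycle.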

9
**Pipelined Parallel Adder**

[Figure: adder pipeline snapshot. Pairs a,b: (a1,b1), (a2,b2), (a3,b3), (a4,b4) all pending.]

10
**Pipelined Parallel Adder**

[Figure: adder pipeline snapshot. Pairs a,b: a1+b1 added; (a2,b2), (a3,b3), (a4,b4) pending. Pairs c,d: all four pending.]

11
**Pipelined Parallel Adder**

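[Figure: adder pipeline snapshot. Pairs a,b: a1+b1, a2+b2 added; (a3,b3), (a4,b4) pending. Pairs c,d: c1+d1 added; (c2,d2), (c3,d3), (c4,d4) pending. Pairs e,f: all four pending.]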

12
**Pipelined Parallel Adder**

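[Figure: adder pipeline snapshot. Pairs a,b: a1+b1, a2+b2, a3+b3 added; (a4,b4) pending. Pairs c,d: c1+d1, c2+d2 added; (c3,d3), (c4,d4) pending. Pairs e,f: e1+f1 added; (e2,f2), (e3,f3), (e4,f4) pending. Pairs g,h: all four pending.]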

13
**Pipelined Parallel Adder**

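[Figure: adder pipeline snapshot. Pairs a,b: a1+b1, a2+b2, a3+b3, a4+b4 all added. Pairs c,d: c1+d1, c2+d2, c3+d3 added; (c4,d4) pending. Pairs e,f: e1+f1, e2+f2 added; (e3,f3), (e4,f4) pending. Pairs g,h: g1+h1 added; (g2,h2), (g3,h3), (g4,h4) pending.]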
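The staggered progress in these snapshots can be imitated with a small simulation. It assumes one possible reading of the diagrams: the subscript is a bit position, each stage adds one bit of an operand pair plus the carry from the previous stage, and a new pair enters every cycle. The names `add_bit` and `pipelined_add` are hypothetical.

```python
def add_bit(job, k):
    """Add bit k of a job (a, b, partial_sum, carry); None marks an empty stage."""
    if job is None:
        return None
    a, b, total, carry = job
    s = ((a >> k) & 1) + ((b >> k) & 1) + carry
    return (a, b, total | ((s & 1) << k), s >> 1)

def pipelined_add(pairs, width=4):
    """Return (sums, cycles) for operand pairs streamed through a width-stage adder."""
    stages = [None] * width          # stages[k] holds the job whose bit k was just added
    pending = list(pairs)
    results, cycles = [], 0
    while pending or any(job is not None for job in stages):
        # Move every job one stage forward, adding its next bit on the way.
        for k in range(width - 1, 0, -1):
            stages[k] = add_bit(stages[k - 1], k)
        # A new pair enters the first stage if any remain.
        if pending:
            a, b = pending.pop(0)
            stages[0] = add_bit((a, b, 0, 0), 0)
        else:
            stages[0] = None
        cycles += 1
        # A job that has passed the last stage is complete; keep its carry-out.
        if stages[-1] is not None:
            _, _, total, carry = stages[-1]
            results.append(total | (carry << width))
            stages[-1] = None
    return results, cycles

print(pipelined_add([(3, 5), (15, 15), (9, 6), (0, 0)]))
```

Four pairs through four stages finish in 4 + (4 - 1) = 7 cycles in this model, matching Tp = s·t + (n-1)·t with t = 1.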

14
**Instruction Execution Pipeline**

- Instruction fetch cycle (IF)
  - Fetch the current instruction from memory.
  - Increment the PC.
- Instruction decode / register fetch cycle (ID)
  - Decode the instruction.
  - Compute the possible branch target.
  - Read registers from the register file.
- Execution / effective address cycle (EX)
  - Form the effective address.
  - The ALU performs the operation specified by the opcode.
- Memory access (MEM)
  - Memory read for a load instruction.
  - Memory write for a store instruction.
- Write-back cycle (WB)
  - Write the result into the register file.

[Figure: the five stages in sequence: IF, ID, EX, MEM, WB.]

15
**Instruction Execution Pipeline**

[Figure: successive instructions overlapped in the IF, ID, EX, MEM, WB stages, plotted as stages against time.]
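The overlapped execution in this diagram can be reproduced as a text chart, assuming an ideal pipeline with no stalls in which one instruction enters per cycle. The function name `pipeline_chart` is a hypothetical helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_chart(n_instructions):
    """One row per instruction, one column per cycle; '.' marks an idle slot."""
    total_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = ["."] * total_cycles
        for k, name in enumerate(STAGES):
            row[i + k] = name   # instruction i reaches stage k in cycle i + k
        rows.append(row)
    return rows

for row in pipeline_chart(3):
    print(" ".join(f"{cell:>3}" for cell in row))
```

Each row is shifted one cycle to the right of the one above it, so three instructions complete in 5 + 3 - 1 = 7 cycles.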

16
**Pipeline Hazards**

- Structural hazards
- Data hazards
- Control hazards

17
**Pipeline Hazards: Structural Hazards**

Arise from resource conflicts when the hardware cannot support all possible combinations of instructions simultaneously in overlapped execution.

[Figure: overlapped instructions in the IF, ID, EX, MEM, WB stages (using Mem, Reg, ALU, Mem, Reg) plotted as stages against time, with a stall (bubble) inserted.]

18
**Pipeline Hazards: Data Hazards**

Arise when an instruction depends on the results of a previous instruction in a way that is exposed by the overlapping of instructions.

    ADD R1, R2, R3
    SUB R4, R1, R5
    AND R6, R1, R7
    OR  R8, R1, R9
    XOR R10, R1, R11

[Figure: the instructions overlapped in the IF, ID, EX, MEM, WB stages (using Mem, Reg, ALU, Mem, Reg), plotted as stages against time.]
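The dependencies in this instruction sequence can be listed mechanically. The sketch below assumes the first operand is the destination register and the other two are sources, which is a common convention but is not stated on the slide; `raw_dependencies` is a hypothetical helper.

```python
program = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R4", "R1", "R5"),
    ("AND", "R6", "R1", "R7"),
    ("OR", "R8", "R1", "R9"),
    ("XOR", "R10", "R1", "R11"),
]

def raw_dependencies(instrs):
    """Return (producer_index, consumer_index, register) for each read-after-write pair."""
    deps = []
    last_writer = {}   # register name -> index of the most recent instruction writing it
    for i, (_op, dest, src1, src2) in enumerate(instrs):
        for src in (src1, src2):
            if src in last_writer:
                deps.append((last_writer[src], i, src))
        last_writer[dest] = i
    return deps

for producer, consumer, reg in raw_dependencies(program):
    print(f"{program[consumer][0]} reads {reg} written by {program[producer][0]}")
```

Under that operand convention, all four later instructions read R1, which the ADD writes; whether each read actually causes a hazard depends on how far the consumer is behind the producer in the pipeline.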

19
**Pipeline Hazards: Data Hazards**

Forwarding (bypassing)

[Figure: four overlapped instructions, each passing through IF, ID, EX, MEM, WB (using Mem, Reg, ALU, Mem, Reg).]
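The benefit of forwarding can be estimated with a simplified timing model. The sketch below rests on several assumptions that the slide does not state: only ALU instructions are modelled, the first operand is the destination, the register file can be written and then read within the same cycle, and forwarding delivers every ALU result in time for the next instruction's EX stage. The helper `total_stalls` is hypothetical.

```python
program = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R4", "R1", "R5"),
    ("AND", "R6", "R1", "R7"),
    ("OR", "R8", "R1", "R9"),
    ("XOR", "R10", "R1", "R11"),
]

def total_stalls(instrs, forwarding):
    """Stall cycles for a non-empty, in-order ALU instruction list in a 5-stage pipeline."""
    id_cycle = []      # cycle in which each instruction performs ID (register read)
    last_writer = {}   # register name -> index of the most recent writer
    for i, (_op, dest, src1, src2) in enumerate(instrs):
        ready = id_cycle[-1] + 1 if id_cycle else 2   # first ID happens in cycle 2
        if not forwarding:
            for src in (src1, src2):
                if src in last_writer:
                    # The producer's WB is three cycles after its ID.
                    ready = max(ready, id_cycle[last_writer[src]] + 3)
        id_cycle.append(ready)
        last_writer[dest] = i
    ideal_last_id = 2 + len(instrs) - 1
    return id_cycle[-1] - ideal_last_id

print(total_stalls(program, forwarding=False))
print(total_stalls(program, forwarding=True))
```

In this model the SUB waits two cycles for the ADD's write-back when there is no forwarding, and the instructions behind it are delayed by the same amount; with forwarding the sequence runs without stalls.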

20
**Pipeline Hazards: Control (Branch) Hazards**

Arise from pipelining of instructions (e.g., branches) that change the PC.

    LOOP: LOAD  100,X
          ADD   200,X
          STORE 300,X
          DECX
          BNE   LOOP
          ...

High-level equivalent: for i = n to 1, ci = ai + bi

[Figure: the loop instructions in the IF, ID, EX, MEM, WB stages, plotted as stages against time.]

21
**Pipeline Hazards: Control (Branch) Hazards**

Freeze (flush)

    BRA L1
    ...
    L1: NEXT
    NEXT

[Figure: the instructions in the IF, ID, EX, MEM, WB stages, plotted as stages against time.]

22
**Pipeline Hazards: Control (Branch) Hazards**

Predicted-not-taken

    BNE L1
    NEXT
    ...
    L1: NEXT

[Figure: the instructions in the IF, ID, EX, MEM, WB stages, plotted as stages against time, for the not-taken and taken cases.]

23
**Pipeline Hazards: Control (Branch) Hazards**

Predicted-taken

    BNE L1
    NEXT
    ...
    L1: NEXT

[Figure: the instructions in the IF, ID, EX, MEM, WB stages, plotted as stages against time, for the not-taken and taken cases.]
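The cost of these branch-handling schemes is often compared through the average cycles per instruction. The sketch below assumes an ideal CPI of 1 and a fixed penalty per branch outcome; the branch frequency, taken fraction, and penalties are hypothetical numbers chosen only to show the calculation.

```python
def cpi_with_branches(branch_freq, taken_frac, penalty_taken, penalty_not_taken):
    """Average CPI when each branch outcome costs a fixed number of stall cycles."""
    avg_penalty = taken_frac * penalty_taken + (1 - taken_frac) * penalty_not_taken
    return 1.0 + branch_freq * avg_penalty

# Hypothetical workload: 20% branches, 60% of them taken, 1-cycle penalty.
freeze = cpi_with_branches(0.2, 0.6, penalty_taken=1, penalty_not_taken=1)
predicted_not_taken = cpi_with_branches(0.2, 0.6, penalty_taken=1, penalty_not_taken=0)
print(round(freeze, 3), round(predicted_not_taken, 3))
```

With these assumed numbers, freezing costs a cycle on every branch, while predicting not-taken pays only when the branch is taken. Predicted-taken can be modelled by passing a lower taken penalty, provided the hardware knows the target early enough.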

24
**Pipeline Hazards: Control (Branch) Hazards**

Delayed branch

Before scheduling:

    ADD R1,R2,R3
    if (R2=0) branch L1
    delay slot
    NEXT
    ...
    L1: NEXT

After scheduling (the ADD fills the delay slot):

    if (R2=0) branch L1
    ADD R1,R2,R3
    NEXT
    ...
    L1: NEXT

[Figure: the branch instruction, its sequential successor, and the branch target if taken, in the IF, ID, EX, MEM, WB stages, plotted as stages against time, for the not-taken and taken cases.]

25
**Levels of Parallelism**

- Bit level parallelism
  - Within arithmetic logic circuits.
- Instruction level parallelism
  - Multiple instructions execute per clock cycle.
- Memory system parallelism
  - Overlap of memory operations with computation.
- Operating system parallelism
  - More than one processor.
  - Multiple jobs run in parallel on SMP.
  - Loop level.
  - Procedure level.
