CPUs (Overheads for Computers as Components, © 2000 Morgan Kaufman)
- CPU performance
- CPU power consumption

Elements of CPU performance
- Cycle time.
- CPU pipeline.
- Memory system.

Pipelining
- Several instructions are executed simultaneously at different stages of completion.
- Various conditions can cause pipeline bubbles that reduce utilization:
  - branches;
  - memory system delays;
  - etc.

Pipeline structures
- Both the ARM and the SHARC have 3-stage pipelines:
  - fetch instruction from memory;
  - decode opcode and operands;
  - execute.

ARM pipeline execution (pipeline diagram)
    time (cycles)    1        2        3        4        5
    add r0,r1,#5     fetch    decode   execute
    sub r2,r3,r6              fetch    decode   execute
    cmp r2,#3                          fetch    decode   execute

Performance measures
- Latency: time it takes for an instruction to get through the pipeline.
- Throughput: number of instructions executed per time period.
- Pipelining increases throughput without reducing latency.
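A small worked example makes the latency/throughput distinction concrete. The sketch below assumes a 3-stage pipeline with a 1 ns stage time and no stalls; these numbers are illustrative, not taken from the slides.

#include <stdio.h>

/* Latency vs. throughput for an ideal n-stage pipeline.
 * Assumed numbers: 3 stages, 1 ns per stage, no stalls. */
int main(void) {
    const int stages = 3;            /* fetch, decode, execute */
    const double stage_ns = 1.0;     /* assumed cycle time */
    const long n_instr = 1000;

    double latency_ns = stages * stage_ns;              /* one instruction, start to finish */
    /* The first instruction fills the pipe; after that, one completes per cycle. */
    double total_ns = (stages + (n_instr - 1)) * stage_ns;
    double throughput = n_instr / total_ns;              /* instructions per ns */

    printf("latency = %.1f ns, throughput = %.3f instructions/ns\n",
           latency_ns, throughput);
    return 0;
}

With the pipe full, one instruction completes per cycle, so throughput approaches one instruction per nanosecond even though each instruction still takes 3 ns of latency.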

Pipeline stalls
- If every step cannot be completed in the same amount of time, the pipeline stalls.
- Bubbles introduced by the stall increase latency and reduce throughput.

ARM multi-cycle LDMIA instruction (pipeline diagram)
- ldmia r0,{r2,r3} occupies the execute stage for two cycles (ex ld r2, then ex ld r3).
- The following sub r2,r3,r6 and cmp r2,#3 are each held up one cycle behind it before they can execute.

Control stalls
- Branches often introduce stalls (branch penalty).
  - Stall time may depend on whether the branch is taken.
- May have to squash instructions that already started executing.
- Don't know what to fetch until the condition is evaluated.

ARM pipelined branch (pipeline diagram)
- bne foo spends extra cycles in the execute stage while the branch is resolved.
- The sub r2,r3,r6 fetched and decoded behind it is squashed.
- The target instruction add r0,r1,r2 at foo is fetched only after the branch resolves, so its execution is delayed.
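A common way to quantify the branch penalty is an effective-CPI estimate. The sketch below uses assumed values for branch frequency, taken fraction, and penalty; only the idea, not the numbers, comes from the slides.

#include <stdio.h>

/* Effective CPI with a branch penalty (assumed numbers for illustration). */
int main(void) {
    const double base_cpi = 1.0;       /* ideal pipelined CPI */
    const double branch_freq = 0.15;   /* assumed: 15% of instructions are branches */
    const double taken_freq = 0.6;     /* assumed: 60% of branches are taken */
    const double penalty = 2.0;        /* assumed: 2 bubble cycles per taken branch */

    double cpi = base_cpi + branch_freq * taken_freq * penalty;
    printf("effective CPI = %.2f\n", cpi);   /* 1.18 with these assumptions */
    return 0;
}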

Delayed branch
- To increase pipeline efficiency, the delayed branch mechanism requires that the n instructions after the branch always be executed, whether or not the branch is taken.
- SHARC supports delayed and non-delayed branches.
  - Specified by a bit in the branch instruction.
  - 2-instruction branch delay slot.

Example: SHARC code scheduling
    L1=5;
    DM(I0,M1)=R1;
    L8=8;
    DM(I8,M9)=R2;
- The CPU cannot use a DAG on the cycle just after loading the DAG's register.
  - The CPU performs a NOP between the register assignment and the DM access.

Rescheduled SHARC code
    L1=5;
    L8=8;
    DM(I0,M1)=R1;
    DM(I8,M9)=R2;
- Avoids two NOP cycles.

Example: ARM execution time
- Determine the execution time of the FIR filter:
    for (i=0; i<N; i++)
        f = f + c[i]*x[i];
- Only the branch in the loop test may take more than one cycle.
  - BLT loop takes 1 cycle best case, 3 worst case.
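A rough cycle-count model for this loop can be written directly; the per-block cycle counts below are assumptions for illustration (only the 1-cycle/3-cycle BLT figure comes from the slide).

#include <stdio.h>

/* Rough execution-time model for the FIR loop on a simple pipelined CPU.
 * Cycle counts per block are assumptions for illustration. */
int main(void) {
    const long N = 100;
    const long t_init = 2;      /* assumed: loop initialization */
    const long t_body = 4;      /* assumed: load c[i], load x[i], multiply, accumulate */
    const long t_update = 2;    /* assumed: increment i, compare */
    const long t_branch = 3;    /* worst case from the slide: BLT takes 3 cycles */

    long total = t_init + N * (t_body + t_update + t_branch);
    printf("estimated cycles for N=%ld: %ld\n", N, total);
    return 0;
}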

Superscalar execution
- A superscalar processor can execute several instructions per cycle.
  - Uses multiple pipelined data paths.
- Programs execute faster, but it is harder to determine how much faster.

Data dependencies
- Execution time depends on operands, not just the opcode.
- A superscalar CPU checks data dependencies dynamically:
    add r2,r0,r1
    add r3,r2,r5    ; data dependency: r2 is written by the first add and read by the second

Memory system performance
- Caches introduce indeterminacy in execution time.
  - Depends on the order of execution.
- Cache miss penalty: added time due to a cache miss.
- Several reasons for a miss: compulsory, conflict, capacity.
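The usual way to fold the miss penalty into an execution-time estimate is an average-access-time calculation; the hit time, miss rate, and penalty below are assumed values.

#include <stdio.h>

/* Average memory access time with a cache (assumed parameters). */
int main(void) {
    const double t_hit = 1.0;        /* assumed: 1 cycle on a hit */
    const double miss_rate = 0.05;   /* assumed: 5% of accesses miss */
    const double t_penalty = 20.0;   /* assumed: 20 extra cycles on a miss */

    double t_avg = t_hit + miss_rate * t_penalty;
    printf("average access time = %.1f cycles\n", t_avg);   /* 2.0 cycles here */
    return 0;
}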

CPU power consumption
- Most modern CPUs are designed with power consumption in mind to some degree.
- Power vs. energy:
  - heat depends on power consumption;
  - battery life depends on energy consumption.

CMOS power consumption
- Voltage drops: power consumption is proportional to V².
- Toggling: more activity means more power.
- Leakage: a basic circuit characteristic; can be eliminated by disconnecting power.
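The first two bullets are captured by the standard first-order CMOS dynamic power model, P = a·C·V²·f (activity factor, switched capacitance, supply voltage squared, clock frequency). A minimal sketch with assumed parameter values:

#include <stdio.h>

/* First-order CMOS dynamic power: P = a * C * V^2 * f
 * (a = activity factor, C = switched capacitance, V = supply, f = clock).
 * All parameter values below are assumptions for illustration. */
int main(void) {
    const double a = 0.2;        /* fraction of gates toggling per cycle */
    const double C = 1e-9;       /* total switched capacitance, farads */
    const double V = 3.3;        /* supply voltage, volts */
    const double f = 100e6;      /* clock frequency, Hz */

    double P = a * C * V * V * f;                 /* watts */
    printf("dynamic power = %.3f W\n", P);        /* ~0.218 W with these numbers */
    return 0;
}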

CPU power-saving strategies
- Reduce the power supply voltage.
- Run at a lower clock frequency.
- Disable function units with control signals when not in use.
- Disconnect parts from the power supply when not in use.
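Because dynamic power scales with V² but only linearly with f, lowering the voltage saves energy per task, while lowering the frequency alone mostly just stretches the task out. A small comparison sketch; all numbers are assumed:

#include <stdio.h>

/* Energy per task under voltage/frequency scaling, using P = a*C*V^2*f.
 * Energy = P * time, and time for a fixed task scales as 1/f.
 * All numbers are assumptions for illustration. */
int main(void) {
    const double a = 0.2, C = 1e-9;            /* activity factor, capacitance */
    const double cycles = 1e6;                 /* cycles needed by the task */

    /* Case 1: full speed.  Case 2: half frequency, reduced voltage. */
    const double V1 = 3.3, f1 = 100e6;
    const double V2 = 2.0, f2 = 50e6;

    double E1 = a * C * V1 * V1 * f1 * (cycles / f1);   /* joules */
    double E2 = a * C * V2 * V2 * f2 * (cycles / f2);
    printf("E(full speed)  = %.3e J\n", E1);
    printf("E(scaled down) = %.3e J\n", E2);   /* lower: the V^2 term dominates */
    return 0;
}

Note that the frequency cancels out of the energy-per-task figure; the saving comes entirely from the lower voltage.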

Power management styles
- Static power management: does not depend on CPU activity.
  - Example: a user-activated power-down mode.
- Dynamic power management: based on CPU activity.
  - Example: disabling function units when they are idle.

Application: PowerPC 603 energy features
- Provides doze, nap, and sleep modes.
- Dynamic power management features:
  - Uses static logic.
  - Can shut down unused execution units.
  - Cache organized into subarrays to minimize the amount of active circuitry.

PowerPC 603 activity
Percentage of time units are idle for SPEC integer and floating-point benchmarks:

    unit              Specint92   Specfp92
    D cache           29%         28%
    I cache           29%         17%
    load/store        35%         17%
    fixed-point       38%         76%
    floating-point    99%         30%
    system register   89%         97%

Power-down costs
- Going into a power-down mode costs:
  - time;
  - energy.
- Must determine whether going into the mode is worthwhile.
- Can model CPU power states with a power state machine.
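A power state machine can be sketched directly as a table of states, power levels, and transition costs. The sketch below is purely structural; every numeric value in it is a placeholder, not data from the slides.

#include <stdio.h>

/* Minimal power state machine model.  All numeric values here are
 * placeholders for illustration; real numbers come from the datasheet. */
typedef enum { STATE_RUN, STATE_IDLE, STATE_SLEEP, N_STATES } pstate_t;

static const char  *state_name[N_STATES]     = { "run", "idle", "sleep" };
static const double state_power_mw[N_STATES] = { 500.0, 100.0, 1.0 };   /* placeholder */

/* transition_us[from][to]: time to switch states, microseconds (placeholder). */
static const double transition_us[N_STATES][N_STATES] = {
    /* to:      run     idle    sleep */
    /* run  */ { 0,      10,     100   },
    /* idle */ { 10,     0,      100   },
    /* sleep*/ { 10000,  10000,  0     },
};

int main(void) {
    pstate_t from = STATE_RUN, to = STATE_SLEEP;
    printf("%s -> %s: %.0f us, power %.1f -> %.1f mW\n",
           state_name[from], state_name[to], transition_us[from][to],
           state_power_mw[from], state_power_mw[to]);
    return 0;
}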

Application: StrongARM SA-1100 power saving
- The processor takes two supplies:
  - VDD is the main 3.3V supply.
  - VDDX is 1.5V.
- Three power modes:
  - Run: normal operation.
  - Idle: stops the CPU clock, with logic still powered.
  - Sleep: shuts off most chip activity; entering takes 3 steps of about 30 µs each; wakeup takes > 10 ms.

SA-1100 power state machine (figure)
- States and power levels: run = 400 mW, idle = 50 mW, sleep = 0.16 mW.
- Transition times labeled in the figure: 10 µs (run/idle), 90 µs (entering sleep), 160 ms (sleep back to run).
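Using the figures on this slide, one can estimate how long an idle interval must be before sleeping pays off. The sketch assumes the chip burns run-level power during the 90 µs sleep entry and the 160 ms wakeup; that assumption is mine, not the slide's.

#include <stdio.h>

/* Is it worth sleeping through an idle interval of T seconds on the SA-1100?
 * Power numbers are from the slide; the assumption that transitions burn
 * run-level power (400 mW) is for illustration only. */
int main(void) {
    const double P_run = 0.400, P_idle = 0.050, P_sleep = 0.00016;  /* watts */
    const double t_enter = 90e-6, t_wake = 160e-3;                  /* seconds */

    for (double T = 0.1; T <= 2.0; T += 0.5) {
        double E_idle  = P_idle * T;
        double E_sleep = P_run * (t_enter + t_wake)
                       + P_sleep * (T - t_enter - t_wake);
        printf("T = %.1f s: idle = %.1f mJ, sleep = %.1f mJ -> %s\n",
               T, 1e3 * E_idle, 1e3 * E_sleep,
               E_sleep < E_idle ? "sleep" : "stay idle");
    }
    return 0;
}

With these assumptions the break-even point falls somewhere above one second, which is why short idle intervals are better served by the idle mode.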