1 EENG449b/Savvides Lec 5.1 1/27/04 January 27, 2004 Prof. Andreas Savvides Spring 2004 http://www.eng.yale.edu/courses/eeng449bG EENG 449bG/CPSC 439bG Computer Systems Lecture 5 FP Pipelining & Dynamically Scheduled Pipelines and Overview of ARM Architecture Part I

2 Floating-Point Support in Pipelines
Floating-point operations take more than 1 or 2 cycles to complete, which introduces:
  - Structural hazards
  - Data hazards
Multiple functional units are required:
  - Integer unit for loads, stores and integer ALU operations
  - FP and integer multiplier
  - FP adder that handles FP add, subtract and conversion
  - FP and integer divider
Initiation interval: the number of cycles that must elapse between issuing two operations of a given type

3 Multiple FUs and Latencies
Functional unit                        Latency   Initiation interval
Integer ALU                            0         1
Data memory (integer and FP loads)     1         1
FP add                                 3         1
FP multiply                            6         1
FP divide                              24        25
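The table can be read as a small lookup for stall estimates. The sketch below is illustrative only: it assumes the usual textbook reading of latency (the number of intervening cycles between a producer and a dependent consumer), full forwarding, and one instruction issued per cycle. The unit keys and function names are hypothetical.

```python
# (latency, initiation interval) per functional unit, copied from the table.
UNITS = {
    "int_alu": (0, 1),
    "mem": (1, 1),
    "fp_add": (3, 1),
    "fp_mul": (6, 1),
    "fp_div": (24, 25),
}


def raw_stall_cycles(producer_unit, distance=1):
    """Stall cycles for a consumer placed `distance` instructions after its producer.

    Assumes full forwarding and single issue; distance=1 means back to back.
    """
    latency, _ = UNITS[producer_unit]
    return max(0, latency - (distance - 1))


def earliest_next_issue(unit, previous_issue_cycle):
    """Earliest cycle at which a second operation may start on the same unit."""
    _, interval = UNITS[unit]
    return previous_issue_cycle + interval


if __name__ == "__main__":
    print(raw_stall_cycles("fp_add"))          # dependent instruction right after an FP add
    print(earliest_next_issue("fp_div", 10))   # next divide after one issued in cycle 10
```

Under these assumptions only the divider, with an initiation interval of 25, prevents back-to-back operations of the same type.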

4 Support for Multiple Outstanding Operations
Additional pipeline registers needed

5 Hazards in Longer Pipelines
1. The divide unit is not fully pipelined, so structural hazards can occur.
2. Instructions have varying running times, so the number of register writes required in a cycle can be larger than 1.
3. WAW hazards are possible, since instructions no longer reach WB in order.
4. Instructions can complete in a different order than the one in which they were issued, causing problems with exceptions.
5. Because operations have longer latencies, stalls for RAW hazards will be more frequent.

6 FP Pipeline Hazards Example
Figure A.34
Simultaneous writeback can be handled in two ways:
  - Stall an instruction in the ID stage
  - Stall the instruction when it tries to enter WB
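A simultaneous writeback can be spotted by computing the cycle in which each instruction reaches WB. The following is a minimal sketch, assuming the pipeline shape IF, ID, n execution stages, MEM, WB, with 7 execution stages for FP multiply, 4 for FP add and 1 for a load. The fetch cycles in the example are hypothetical, chosen so that the writes collide.

```python
from collections import defaultdict

# Number of execution stages per operation (assumed: latency + 1 for FP units).
EX_STAGES = {"MUL.D": 7, "ADD.D": 4, "L.D": 1}


def writeback_cycle(op, fetch_cycle):
    """Cycle in which `op` reaches WB: IF, ID, n EX stages, MEM, then WB."""
    return fetch_cycle + 2 + EX_STAGES[op] + 1


def writeback_conflicts(program):
    """Return {cycle: [ops]} for every cycle with more than one register write.

    `program` is a list of (op, fetch_cycle) pairs for register-writing instructions.
    """
    by_cycle = defaultdict(list)
    for op, fetch_cycle in program:
        by_cycle[writeback_cycle(op, fetch_cycle)].append(op)
    return {cycle: ops for cycle, ops in by_cycle.items() if len(ops) > 1}


if __name__ == "__main__":
    program = [("MUL.D", 1), ("ADD.D", 4), ("L.D", 7)]
    print(writeback_conflicts(program))
```

With a single write port, two of the three colliding instructions would have to be stalled, either in ID before they issue or at the entrance to WB.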

7 Checks for Detecting Hazards
Three checks must be performed before a multicycle instruction can issue in the ID stage:
Check for structural hazards
  - Wait until the required functional unit is not busy and make sure a register write port is available when it will be needed
Check for a RAW data hazard
  - Wait until the source registers are not listed as pending destinations
Check for a WAW data hazard
  - Determine whether any instruction that has already issued has the same destination register as this instruction. If so, stall the issue of the instruction in ID.
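The three checks can be sketched as a single function evaluated in ID. This is an illustrative model only, not the actual control logic: the `Instr` record, the set-based bookkeeping and all names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    unit: str        # functional unit the instruction needs
    dest: str        # destination register
    srcs: tuple      # source registers
    wb_cycle: int    # cycle in which it would use the register write port


def stall_reason(instr, busy_units, pending_dests, reserved_wb_cycles):
    """Return why `instr` must stall in ID, or None if it may issue.

    busy_units: units that cannot accept a new operation this cycle
    pending_dests: destination registers of instructions already issued
    reserved_wb_cycles: cycles in which the write port is already claimed
    """
    if instr.unit in busy_units or instr.wb_cycle in reserved_wb_cycles:
        return "structural"
    if any(src in pending_dests for src in instr.srcs):
        return "RAW"
    if instr.dest in pending_dests:
        return "WAW"
    return None


if __name__ == "__main__":
    add = Instr(unit="fp_add", dest="F10", srcs=("F0", "F8"), wb_cycle=12)
    # F0 is still being produced by an earlier divide, so the add must wait.
    print(stall_reason(add, busy_units={"fp_div"}, pending_dests={"F0"},
                       reserved_wb_cycles=set()))
```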

8 MIPS R4000 Pipeline
Decomposes the 5-stage pipeline into a deeper 8-stage pipeline (superpipelining)
  - Achieves higher clock rates => better performance
Extra stages come from decomposing memory accesses
Longer pipelines increase the amount of forwarding needed and the branch delays

9 Branch Delay Cycles
Branch outcome needs 3 cycles

10 Dynamically Scheduled Pipelines
Simple pipelines result in hazards that require stalling.
Static scheduling: compilers rearrange instructions to avoid stalls.
Dynamic scheduling: the processor executes instructions out of order to minimize stalls.
Dynamic scheduling requires splitting the ID stage into two stages:
  - Issue: decode instructions, check for structural hazards
  - Read operands: wait until there are no data hazards, then read operands
  - Also need to know when each instruction begins and ends execution
Requires a lot more bookkeeping! More when we discuss Tomasulo's algorithm in chapter 3…

11 Scoreboarding
Scoreboarding: a technique that allows out-of-order execution when resources are available and there are no data dependencies. It originated in the CDC 6600 in the mid-1960s.
The scoreboard is fully responsible for instruction execution and hazard detection
  - Requires changes in the number of functional units and the latency of operations
  - Needs to keep track of the status of all instructions in execution
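The hazard checks a scoreboard performs can be sketched as three predicates over the instructions it is tracking. This follows the standard textbook description of scoreboarding (issue, read operands, execute, write result) rather than anything shown on the slide. The `Entry` record, the unit names and the assumption of one instruction per named unit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    seq: int                     # position in program order
    unit: str                    # functional unit it occupies
    dest: str                    # destination register
    srcs: tuple                  # source registers
    operands_read: bool = False  # set once the read-operands stage completes


def can_issue(new, active):
    """Issue only if the unit is free and no active instruction writes the same register (WAW)."""
    return all(e.unit != new.unit and e.dest != new.dest for e in active)


def can_read_operands(entry, active):
    """Wait until no earlier active instruction is still going to write a source (RAW)."""
    return not any(e.seq < entry.seq and e.dest in entry.srcs for e in active)


def can_write_result(entry, active):
    """Wait until every earlier instruction that reads our destination has read it (WAR)."""
    return not any(
        e.seq < entry.seq and entry.dest in e.srcs and not e.operands_read
        for e in active
    )


if __name__ == "__main__":
    div = Entry(0, "div", "F0", ("F2", "F4"))
    add = Entry(1, "add1", "F10", ("F0", "F8"))
    sub = Entry(2, "add2", "F8", ("F8", "F14"))
    active = [div, add, sub]  # issued, result not yet written
    print(can_read_operands(add, active))  # ADD.D waits for F0 from DIV.D
    print(can_write_result(sub, active))   # SUB.D must not overwrite F8 yet
```

Here `active` holds instructions that have issued but not yet written their results. An entry would be removed once its write completes, which is what eventually releases the waiting instructions.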

12 Scoreboarding II

13 More Hazards
WAR and WAW hazards are now possible!
WAR example:
    DIV.D F0, F2, F4
    ADD.D F10, F0, F8
    SUB.D F8, F8, F14
  WAR hazard on F8 if SUB.D executes before ADD.D has read its operands.
WAW example:
    DIV.D F0, F2, F4
    ADD.D F10, F0, F8
    SUB.D F10, F8, F14
  WAW hazard on F10 if SUB.D writes its result before ADD.D does.
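The dependences in the two sequences can be checked mechanically by comparing the registers each instruction reads and writes. A minimal sketch: the tuple encoding `(dest, srcs)` and the function name are assumptions made for illustration.

```python
def hazards(earlier, later):
    """Hazard kinds between two instructions, `earlier` preceding `later` in program order.

    Each instruction is a (dest, srcs) pair of register names.
    """
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    found = set()
    if e_dest in l_srcs:
        found.add("RAW")  # later reads what earlier writes
    if l_dest in e_srcs:
        found.add("WAR")  # later writes what earlier reads
    if l_dest == e_dest:
        found.add("WAW")  # both write the same register
    return found


if __name__ == "__main__":
    div = ("F0", ("F2", "F4"))       # DIV.D F0, F2, F4
    add = ("F10", ("F0", "F8"))      # ADD.D F10, F0, F8
    sub_war = ("F8", ("F8", "F14"))  # SUB.D F8, F8, F14
    sub_waw = ("F10", ("F8", "F14")) # SUB.D F10, F8, F14
    print(hazards(div, add))
    print(hazards(add, sub_war))
    print(hazards(add, sub_waw))
```

The RAW dependence of ADD.D on DIV.D is what delays ADD.D long enough for SUB.D to overtake it in an out-of-order pipeline.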

14 Refer to figures A.52 – A.54 for example scoreboard tables
Scoreboarding is limited by:
  - The amount of parallelism among instructions
  - The number of scoreboard entries
  - The number and types of functional units
  - The presence of antidependences and output dependences

15 Announcements
The example on page 44 of the textbook is wrong
  - The CPI for FPSQR is not included in the computation of CPI…
  - Everything after that is affected…
Midterm I: Thursday, Feb. 19
  - Covers Chapters 1, 2, Appendix A and microcontroller material from class
Readings for next class and project-related material are posted on the class website

16 ARM Architecture Part I

17 Where is ARM Today?

22 Not the case when you have loads and stores!

32 Microcontroller View

33 Price/Performance/Peripheral Tradeoffs
For many consumer electronics, cost is an issue
  - ARM7TDMI cores have less hardware and cost less
  - At today's prices you can get an ARM7-based chip for < $5.00
Power tradeoffs
  - Power performance is given in Watts/MIPS, but
  - Lifetime is a bandwidth vs. throughput issue
      - Bandwidth vs. throughput of battery life

34 ML675001/67Q5002/67Q5003 Features
ARM7TDMI
Memory options: ROM-less (ML675001), 256KB MCP Flash (ML67Q5002), 512KB MCP Flash (ML67Q5003)
8KB unified cache
32KB RAM
Interrupts: 25 + 1 FIQ
I2C (1-ch x master)
DMA (2-ch)
Timers (7 x 16-bit)
WDT (16-bit)
PWM (2 x 16-bit)
UART (2-ch) / SIO (1-ch)
GPIO (5 x 8-bit)
ADC (4-ch x 10-bit)
Up to 66MHz
-40 to +85 °C
Package: 144 LFBGA, 144 QFP

35 Next Time
  - Power Metrics
  - Dynamic Voltage Scaling
  - Microcontroller Programming Cycle

