Presentation on theme: "CML CML CS 230: Computer Organization and Assembly Language Aviral Shrivastava Department of Computer Science and Engineering School of Computing and Informatics."— Presentation transcript:
CS 230: Computer Organization and Assembly Language
  Aviral Shrivastava
  Department of Computer Science and Engineering, School of Computing and Informatics, Arizona State University
  Slides courtesy: Prof. Yann Hang Lee, ASU; Prof. Mary Jane Irwin, PSU; Andy Carle, UCB
Announcements
  Project 3
    – MIPS Assembler
  Project 4
    – MIPS Simulator
    – Due Nov 10, 2009
  Quiz 4
    – Nov 5, 2009
    – Single-cycle implementation
  Finals
    – Tuesday, Dec 08, 2009
    – Please come on time (you'll need all the time)
    – Open book, notes, and internet
    – No communication with any other human
Single Cycle - Abstract View
  Abstract View
    – elements that operate on data values (combinational)
    – elements that contain state (sequential)
  Implementation
    – Design the datapath
    – Design the control
  [Figure: abstract datapath. PC supplies the Address to Instruction Memory; the instruction drives the Register File (Reg Addr, Write Data, Read Data), whose outputs feed the ALU; the ALU result is the Address for Data Memory (Write Data, Read Data).]
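Single cycle Datapath
  [Figure: single-cycle datapath. Components: PC, Instruction Memory (Read Address), Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2), ALU (ovf, zero), Data Memory (Address, Write Data, Read Data), Sign Extend (16 to 32 bits), an adder for PC + 4, and an adder with Shift left 2 for the branch target. Control inputs: RegWrite, ALU control, ALUSrc, MemRead, MemWrite, MemtoReg, PCSrc, Jump. Jump path: the 26-bit field is shifted left 2 to 28 bits and combined with PC+4[31-28] to form a 32-bit address.]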
Single cycle Datapath + Control
  [Figure: the single-cycle datapath with the Control Unit added. Instruction fields: Instr[31-26] to the Control Unit, Instr[25-21] and Instr[20-16] to the Register File read addresses, Instr[20-16] or Instr[15-11] to the write address (selected by RegDst), Instr[15-0] to Sign Extend (16 to 32 bits), Instr[5-0] to ALU control, Instr[25-0] to the jump path (shifted left 2 to 28 bits and combined with PC+4[31-28]). Control signals: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite, Jump, PCSrc.]
Single cycle Control Unit
  Completely determined by the instruction opcode field
    – Note that a multiplexor whose control input is 0 has a definite action, even if it is not used in performing the operation

  Instr   Opcode  RegDst  ALUSrc  MemtoReg  RegWr  MemRd  MemWr  Branch  ALUOp1  ALUOp0
  R-type  000000  1       0       0         1      X      0      0       1       X
  lw      100011  0       1       1         1      1      0      0       0       0
  sw      101011  X       1       X         0      X      1      0       0       0
  beq     000100  X       0       X         0      X      0      1       X       1
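
Because the control signals depend only on the opcode, the control unit can be read as a lookup table. The following is a minimal Python sketch of that idea, using the signal values from the slide's table ("X" is a don't-care). The names `SIGNALS`, `CONTROL_TABLE`, and `decode_control` are illustrative assumptions, not part of the course materials.

```python
# Main control unit as a lookup table keyed by the 6-bit opcode (Instr[31-26]).
# Values are copied from the slide's table; "X" marks a don't-care.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWr", "MemRd",
           "MemWr", "Branch", "ALUOp1", "ALUOp0")

CONTROL_TABLE = {
    0b000000: "1001X001X",  # R-type
    0b100011: "011110000",  # lw
    0b101011: "X1X0X1000",  # sw
    0b000100: "X0X0X01X1",  # beq
}


def decode_control(instruction: int) -> dict:
    """Return the control signals for a 32-bit instruction word."""
    opcode = (instruction >> 26) & 0x3F
    if opcode not in CONTROL_TABLE:
        raise ValueError(f"opcode {opcode:06b} is not in the slide's table")
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))


# lw $t0, 4($sp) encodes as 0x8FA80004 (opcode 100011).
lw_signals = decode_control(0x8FA80004)
print(lw_signals["MemRd"], lw_signals["RegWr"])  # prints "1 1"
```

A real control unit is combinational logic rather than a dictionary, but the mapping it implements is the same.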
Disadvantages of Single Cycle Implementation
  Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
    – especially problematic for more complex instructions like floating point multiply
  Is wasteful of area, since some functional units must be duplicated because they cannot be “shared” during an instruction's execution
    – e.g., need separate adders to do PC update and branch target address calculations, as well as an ALU to do R-type arithmetic/logic operations and data memory address calculations
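
To see why the slowest instruction sets the clock, here is a small Python sketch. The unit latencies are hypothetical numbers chosen for illustration (they do not come from the slides), and the per-instruction paths are a simplified assumption about which units each instruction class uses.

```python
# Hypothetical latencies in picoseconds for each functional unit.
# These numbers are assumptions for illustration, not values from the slides.
UNIT_DELAY_PS = {
    "imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100,
}

# Functional units each instruction class passes through within its one cycle.
PATHS = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "lw":     ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw":     ["imem", "reg_read", "alu", "dmem"],
    "beq":    ["imem", "reg_read", "alu"],
}


def path_delay(instr: str) -> int:
    """Total delay of the units one instruction class uses."""
    return sum(UNIT_DELAY_PS[unit] for unit in PATHS[instr])


def single_cycle_period() -> int:
    """The clock must be long enough for the slowest instruction."""
    return max(path_delay(instr) for instr in PATHS)


for instr in PATHS:
    print(instr, path_delay(instr), "ps needed,", single_cycle_period(), "ps clock")
```

Under these assumed latencies, every instruction pays for the full `lw` path even when it needs far less time.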
How to make it fast?
  Parallelism
  Short-cuts, or Caching, or Bypassing
  Prediction
  Skip some work
  First form of parallelism is Pipelining
Pipelining: It's Natural!
  Laundry Example
    – Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
  Washer takes 30 minutes
  Dryer takes 40 minutes
  “Folder” takes 20 minutes
  [Figure: the four loads, labeled A, B, C, D]
Pipelined Laundry
  Pipelined laundry takes 3.5 hours for 4 loads
  [Figure: timeline from 6 PM to midnight with loads A, B, C, D in task order; stage times 30, 40, 20 minutes]
  Note: More time to do project 4
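
The 3.5-hour figure can be checked with a short simulation. This is a sketch under two stated assumptions: each stage handles one load at a time, and a finished load may wait (in a hamper) until the next stage is free. The function name `pipelined_finish_time` is illustrative.

```python
def pipelined_finish_time(stage_minutes, loads):
    """Minutes until the last load leaves the last stage.

    Each stage handles one load at a time, and a load cannot enter a stage
    until it has finished the previous one.
    """
    stage_free_at = [0] * len(stage_minutes)  # when each stage next becomes idle
    finish = 0
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free_at[s])
            ready = start + minutes
            stage_free_at[s] = ready
        finish = ready
    return finish


stages = [30, 40, 20]  # washer, dryer, folder (minutes, from the slides)
sequential = 4 * sum(stages)
pipelined = pipelined_finish_time(stages, 4)
print(sequential / 60, "hours sequential;", pipelined / 60, "hours pipelined")
# prints "6.0 hours sequential; 3.5 hours pipelined"
```

The pipelined total is 30 + 4 × 40 + 20 = 210 minutes: the 40-minute dryer is the bottleneck that every load queues behind.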
Pipelining Lessons
  Multiple tasks operate simultaneously
  Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
  Pipeline rate is limited by the slowest pipeline stage
  Potential speedup = number of pipe stages
  Unbalanced lengths of pipe stages reduce speedup
  Also, need time to “fill” and “drain” the pipeline
  [Figure: pipelined laundry timeline, 6 PM to 9 PM, loads A, B, C, D in task order; stage times 30, 40, 20 minutes]
Pipelining: Some terms
  If you're doing laundry or implementing a μP (microprocessor), each stage where something is done is called a pipe stage
    – In the laundry example, the washer, dryer, and folding table are pipe stages; clothes enter at one end and exit at the other
    – In a μP, instructions enter at one end and have been executed when they leave
    – Another example: auto assembly line
  Throughput is how often stuff comes out of a pipeline
Technical details
  If times for all S stages are equal to T:
    – Time for one initiation to complete is still S·T
    – Time between two initiations = T, not S·T
    – Initiations per second = 1/T
  Pipelining: overlap multiple executions of the same sequence
    – Improves THROUGHPUT, not the time to perform a single operation
  Other examples:
    – Automobile assembly plant, chemical factory, garden hose, cooking
More technical details
  Book's approach to drawing pipeline timing diagrams…
    – Time runs left-to-right, in units of stage time
    – Each “row” below corresponds to a distinct initiation
    – Boundary between 2 column entries: pipeline register (i.e. hamper)
    – Must look at column contents to see what stage is doing what

  0       1       2       3       4       5       6
  Wash 1  Dry 1   Fold 1  Pack 1
          Wash 2  Dry 2   Fold 2  Pack 2
                  Wash 3  Dry 3   Fold 3  Pack 3
                          Wash 4  Dry 4   Fold 4  Pack 4
                                  Wash 5  Dry 5   Fold 5
                                          Wash 6  Dry 6

  Time for N initiations to complete: N·T + (S-1)·T
  Throughput: time per initiation = T + (S-1)·T/N → T
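
The two formulas above are easy to evaluate directly. The sketch below assumes S equal-length stages of time T, as on the slide; the function names are illustrative.

```python
def total_time(n, stages, t):
    """Time for n initiations to complete: n*T + (S-1)*T."""
    return n * t + (stages - 1) * t


def time_per_initiation(n, stages, t):
    """Average time per initiation; approaches T as n grows."""
    return total_time(n, stages, t) / n


# Four stages (wash, dry, fold, pack), one time unit each.
for n in (1, 6, 100, 1000):
    print(n, total_time(n, 4, 1), time_per_initiation(n, 4, 1))
```

With a single initiation the average is the full S·T latency; the (S-1)·T fill cost is amortized as more initiations follow.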
Ideal pipeline speedup
  Unpipelined: Latch, then four blocks of combinational logic (delay = τ each), then Latch
    – delay for 1 piece of data = 4τ + latch setup (assume small)
    – approximate delay for 1000 pieces of data = 4000τ
  Pipelined: the same four blocks of combinational logic (delay = τ each), separated by latches
    – delay for 1 piece of data = 4(τ + latch setup)
    – approximate delay for 1000 pieces of data = 3τ + 1000τ
  Ideal speedup = # of pipeline stages
  Speedup for 1000 pieces of data = 4000τ / 1003τ ≈ 4
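
The speedup arithmetic can be sketched in Python, including the latch setup time that the slide assumes is small. The function name `speedup` and the 10% latch overhead in the second call are assumptions for illustration only.

```python
def speedup(n, stages, logic_delay, latch_setup=0.0):
    """Unpipelined time divided by pipelined time for n pieces of data."""
    # Unpipelined: each piece passes through all the logic, then one latch.
    unpipelined = n * (stages * logic_delay + latch_setup)
    # Pipelined: each stage's cycle includes its own latch setup;
    # (stages - 1) cycles fill the pipe, then one piece completes per cycle.
    cycle = logic_delay + latch_setup
    pipelined = (stages - 1 + n) * cycle
    return unpipelined / pipelined


print(speedup(1000, 4, 1.0))        # 4000/1003, just under 4
print(speedup(1000, 4, 1.0, 0.1))   # lower once latch overhead is counted
```

The ideal speedup equals the stage count only when the latch overhead is negligible and the number of pieces is large compared with the pipeline depth.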
The “new look” dataflow
  [Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Components: PC, Inst. Memory, adder (+4), Register File, Sign Extend (16 to 32 bits), Comp., ALU, Data Mem., and multiplexers. Labels: Branch taken, IR 6..10, IR 11..15, MEM/WB.IR.]
  Data must be stored from one stage to the next in pipeline registers/latches. These hold temporary values between clocks, plus the information needed for execution.
Another way to look at it…
  Clock Number
  Inst. #   1    2    3    4    5    6    7    8
  Inst. i   IF   ID   EX   MEM  WB
  Inst. i+1      IF   ID   EX   MEM  WB
  Inst. i+2           IF   ID   EX   MEM  WB
  Inst. i+3                IF   ID   EX   MEM  WB
  [Figure: program execution order (in instructions) plotted against time; each instruction passes through IM, Reg, ALU, DM, Reg, offset by one clock from the previous instruction]
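
A diagram like the one above can be generated mechanically: instruction k occupies stage s in clock k + s. Here is a minimal Python sketch, assuming an ideal five-stage pipeline with no stalls; the function names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions):
    """Return one row per instruction; row[c] is the stage occupied in clock c+1."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters IF in clock i+1
        rows.append(row)
    return rows


def format_schedule(rows):
    """Render the schedule as fixed-width text, one line per instruction."""
    if not rows:
        return ""
    header = "Inst. #   " + "".join(f"{c:<5}" for c in range(1, len(rows[0]) + 1))
    lines = [header.rstrip()]
    for i, row in enumerate(rows):
        label = "Inst. i" if i == 0 else f"Inst. i+{i}"
        lines.append((f"{label:<10}" + "".join(f"{cell:<5}" for cell in row)).rstrip())
    return "\n".join(lines)


print(format_schedule(pipeline_schedule(4)))
```

Reading a column of the output top to bottom shows which instruction each stage is working on in that clock.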
Questions about control signals
  The following discussion is relevant to a single instruction
  Q: Are all control signals active at the same time?
  Q: Can we generate all these signals at the same time?
Passing control w/pipe registers
  Analogy: send instruction with car on assembly line
    – “Install Corinthian leather interior on car 6 @ stage 3”
  [Figure: control generation from the instruction; control signals grouped as EX, M, and WB travel through the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. Signals shown: RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, MemtoReg, RegWrite. Labels: strip off signals for execution phase; strip off signals for memory phase; strip off signals for write-back phase.]
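
The idea of carrying control signals along with the instruction and stripping them off stage by stage can be sketched in Python. The grouping of signals into EX, M, and WB below is an assumption based on the order of the labels in the figure, and the function names `split_control` and `strip_off` are illustrative.

```python
# Assumed grouping of control signals by the phase that consumes them.
EX_SIGNALS = ("RegDst", "ALUOp", "ALUSrc")
M_SIGNALS = ("Branch", "MemRead", "MemWrite")
WB_SIGNALS = ("MemtoReg", "RegWrite")


def split_control(signals):
    """Group freshly generated control signals for storage in ID/EX."""
    return {
        "EX": {name: signals[name] for name in EX_SIGNALS},
        "M": {name: signals[name] for name in M_SIGNALS},
        "WB": {name: signals[name] for name in WB_SIGNALS},
    }


def strip_off(bundle, phase):
    """Use one phase's signals and forward the rest to the next pipeline register."""
    used = bundle[phase]
    remaining = {name: group for name, group in bundle.items() if name != phase}
    return used, remaining


# Control values for lw, matching the single-cycle control table.
lw = {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1, "Branch": 0,
      "MemRead": 1, "MemWrite": 0, "MemtoReg": 1, "RegWrite": 1}

id_ex = split_control(lw)                     # generated during decode
ex_used, ex_mem = strip_off(id_ex, "EX")      # execution phase
m_used, mem_wb = strip_off(ex_mem, "M")       # memory phase
wb_used, leftover = strip_off(mem_wb, "WB")   # write-back phase
print(sorted(ex_mem), sorted(mem_wb), leftover)
```

Each pipeline register therefore holds fewer control bits than the one before it, since signals already consumed do not need to travel further.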
A Pipelined Processor
  Pipeline latches: pass the status and result of the current instruction to the next stage
  Comparison:
  [Figure: timing across clock cycles 1 to 10. Single-cycle: lw (Ifetch, Dec/Reg, Exec, Mem, Wr) followed by sw (Ifetch, Dec/Reg, Exec, Mem). Pipelined: three instructions, each going through Ifetch, Dec/Reg, Exec, Mem, Wr, overlapped in time.]
Yoda says…
  Ohhh. Great warrior. Wars not make one great.