CSC 2405 Computer Systems II Advanced Topics. Instruction Set Architecture.


1 CSC 2405 Computer Systems II Advanced Topics

2 Instruction Set Architecture

3 Chapter 4 Instruction Set Architecture
Assembly Language View
– Processor state: registers, memory, …
– Instructions: addl, movl, leal, …
– How instructions are encoded as bytes
Layer of Abstraction
– Above: how to program the machine. The processor executes instructions in a sequence.
– Below: what needs to be built. Use a variety of tricks to make it run fast, e.g., execute multiple instructions simultaneously.
[Figure: the ISA as the layer between Application Program, Compiler, and OS above and CPU Design, Circuit Design, and Chip Layout below]

4 Instruction Set Architectures
Basic ISA Classes

Stack     Accumulator   Register (register-memory)   Register (load-store)
Push A    Load A        Load R1, A                   Load R1, A
Push B    Add B         Add R1, B                    Load R2, B
Add       Store C       Store C, R1                  Add R3, R1, R2
Pop C                                                Store C, R3

The differences between the address classes are easiest to see in the examples here, all of which implement the sequence for C = A + B. Register architectures are the class that won out. The more registers on the CPU, the better.
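To make the contrast concrete, here is a minimal sketch that interprets the stack and load-store sequences for C = A + B. The tuple-based instruction encoding and the function names `run_stack` and `run_load_store` are illustrative assumptions, not part of any real ISA.

```python
def run_stack(program, memory):
    """Interpret a stack-machine program: operands are implicit (top of stack)."""
    stack = []
    for op, *args in program:
        if op == "push":            # push memory[name] onto the stack
            stack.append(memory[args[0]])
        elif op == "add":           # pop two operands, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "pop":           # pop the top of stack into memory[name]
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack op: {op}")
    return memory


def run_load_store(program, memory):
    """Interpret a load-store program: only load/store touch memory."""
    regs = {}
    for op, *args in program:
        if op == "load":            # load Rd, name
            regs[args[0]] = memory[args[1]]
        elif op == "add":           # add Rd, Rs1, Rs2 (registers only)
            dst, src1, src2 = args
            regs[dst] = regs[src1] + regs[src2]
        elif op == "store":         # store name, Rs
            memory[args[0]] = regs[args[1]]
        else:
            raise ValueError(f"unknown load-store op: {op}")
    return memory


STACK_PROGRAM = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
LOAD_STORE_PROGRAM = [
    ("load", "R1", "A"),
    ("load", "R2", "B"),
    ("add", "R3", "R1", "R2"),
    ("store", "C", "R3"),
]

stack_result = run_stack(STACK_PROGRAM, {"A": 2, "B": 3})
load_store_result = run_load_store(LOAD_STORE_PROGRAM, {"A": 2, "B": 3})
```

Both programs leave the same value in C. The stack version names no operands in its Add, while the load-store version names three registers and confines memory access to Load and Store.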

5 80x86 Instruction Frequency

6 Relative Frequency of Control Instructions
Design hardware to handle branches quickly, since branches are the most frequent control instructions.

7 CISC Instruction Sets
– Complex Instruction Set Computer
– Dominant style through the mid-80s
Stack-oriented instruction set
– Use stack to pass arguments, save program counter
– Explicit push and pop instructions
Arithmetic instructions can access memory
– addl %eax, 12(%ebx,%ecx,4) requires a memory read and write
– Complex address calculation
Condition codes
– Set as side effect of arithmetic and logical instructions
Philosophy
– Add instructions to perform “typical” programming tasks

8 RISC Instruction Sets
– Reduced Instruction Set Computer
– Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley)
Fewer, simpler instructions
– Might take more instructions to get a given task done
– Can execute them with small and fast hardware
Register-oriented instruction set
– Many more (typically 32) registers
– Use for arguments, return pointer, temporaries
Only load and store instructions can access memory
– Similar to Y86 mrmovl and rmmovl
No condition codes
– Test instructions return 0/1 in register

9 Example RISC Instruction Formats
Register-Register (R-type), e.g., ADD R1, R2, R3
– Fields: Op [31:26], rs1 [25:21], rs2 [20:16], rd [15:11], remaining bits [10:0] (the figure splits them at bits 6/5; the field labels are not in the transcript)
– Used for ALU reg. operations, read/write special registers, and moves
Register-Immediate (I-type), e.g., SUB R1, R2, #3
– Fields: Op [31:26], rs1 [25:21], rd [20:16], immediate [15:0]
– Used for ALU imm. operations, loads and stores, conditional branch, jump (and link)
Jump / Call (J-type), e.g., JUMP end func
– Fields: Op [31:26], offset added to PC [25:0]
– Used for jump, jump and link, trap, and return from exception
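As a rough illustration of fixed-width fields, the sketch below packs and unpacks an I-type word using the bit positions read off the slide (Op in bits 31–26, rs1 in 25–21, rd in 20–16, immediate in 15–0). The opcode value and the sign-extended immediate are assumptions made for the example; the slide does not give them.

```python
OP_BITS, REG_BITS = 6, 5


def encode_i_type(op, rs1, rd, imm):
    """Pack Op[31:26], rs1[25:21], rd[20:16], immediate[15:0] into one word."""
    if not 0 <= op < (1 << OP_BITS):
        raise ValueError("opcode does not fit in 6 bits")
    for reg in (rs1, rd):
        if not 0 <= reg < (1 << REG_BITS):
            raise ValueError("register number does not fit in 5 bits")
    if not -(1 << 15) <= imm < (1 << 15):
        raise ValueError("immediate does not fit in 16 signed bits")
    return (op << 26) | (rs1 << 21) | (rd << 16) | (imm & 0xFFFF)


def decode_i_type(word):
    """Unpack the fields; the immediate is sign-extended (an assumption)."""
    imm = word & 0xFFFF
    if imm & 0x8000:
        imm -= 1 << 16
    return {
        "op": (word >> 26) & 0x3F,
        "rs1": (word >> 21) & 0x1F,
        "rd": (word >> 16) & 0x1F,
        "imm": imm,
    }


# Hypothetical opcode for "SUB R1, R2, #3"; real ISAs assign their own values.
HYPOTHETICAL_SUB_IMM_OPCODE = 0b001010
word = encode_i_type(HYPOTHETICAL_SUB_IMM_OPCODE, rs1=2, rd=1, imm=3)
fields = decode_i_type(word)
```

Because every field sits at a fixed position, decoding needs only shifts and masks, which is part of why such formats suit simple, fast hardware.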

10 CISC vs. RISC
Original Debate
– Strong opinions!
– CISC proponents: easy for compiler, fewer code bytes
– RISC proponents: better for optimizing compilers, can make run fast with simple chip design
Current Status
– For desktop processors, choice of ISA is not a technical issue. With enough hardware, can make anything run fast; code compatibility is more important.
– For embedded processors, RISC makes sense: smaller, cheaper, less power

11 Logic Design

12 Overview of Logic Design
Fundamental Hardware Requirements
– Communication: how to get values from one place to another
– Computation
– Storage
Bits are Our Friends
– Everything expressed in terms of values 0 and 1
– Communication: low or high voltage on wire
– Computation: compute Boolean functions
– Storage: store bits of information

13 Digital Signals
– Use voltage thresholds to extract discrete values from continuous signal
– Simplest version: 1-bit signal. Either high range (1) or low range (0), with guard range between them.
– Not strongly affected by noise or low-quality circuit elements. Can make circuits simple, small, and fast.
[Figure: voltage vs. time, with the signal read as 0, 1, 0]

14 Computing with Logic Gates
– Outputs are Boolean functions of inputs
– Respond continuously to changes in inputs, with some small delay
[Figure: voltage vs. time for a, b, and a && b, showing the rising delay and falling delay]

15 Combinational Circuits
Acyclic Network of Logic Gates
– Continuously responds to changes on primary inputs
– Primary outputs become (after some delay) Boolean functions of primary inputs
[Figure: acyclic network with primary inputs and primary outputs]

16 Bit Equality
– Generate 1 if a and b are equal
Hardware Control Language (HCL)
– Very simple hardware description language. Boolean operations have syntax similar to C logical operations.
– We’ll use it to describe control logic for processors
HCL Expression: bool eq = (a&&b)||(!a&&!b)
[Figure: “Bit equal” circuit with inputs a, b and output eq]

17 Word Equality
– 32-bit word size
– HCL representation: equality operation, generates Boolean value
Word-Level Representation: a single “=” block with inputs A and B and output Eq
HCL Representation: bool Eq = (A == B)
[Figure: 32 “Bit equal” circuits compare a31/b31 … a0/b0, producing eq31 … eq0, which are combined into Eq]
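The two equality slides can be sketched in a few lines: `bit_eq` mirrors the HCL expression for one bit, and `word_eq` applies it to each of the 32 bit positions and combines the results. The function names and the use of Python integers as words are assumptions made for illustration.

```python
WORD_SIZE = 32  # word size from the slide


def bit_eq(a, b):
    """Mirror of the HCL expression: bool eq = (a&&b)||(!a&&!b)."""
    return (a and b) or ((not a) and (not b))


def word_eq(word_a, word_b, width=WORD_SIZE):
    """Word equality as the AND of per-bit equality signals eq0 ... eq31."""
    bit_results = [
        bit_eq(bool((word_a >> i) & 1), bool((word_b >> i) & 1))
        for i in range(width)
    ]
    return all(bit_results)
```

For example, `word_eq(0xDEADBEEF, 0xDEADBEEF)` is true, while `word_eq(1, 3)` is false because bit 1 differs.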

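18 1-Bit Latch
[Figure: D latch with Data (D) and Clock (C) inputs, internal R and S signals, and outputs Q+ and Q–. Latching (C = 1): the outputs follow the input d. Storing (C = 0): the outputs hold the stored value q.]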

19 Registers
– Stores word of data. Different from program registers seen in assembly code.
– Collection of edge-triggered latches
– Loads input on rising edge of clock
[Figure: register structure. Eight edge-triggered latches share one Clock; inputs i7 … i0 feed the D inputs and outputs o7 … o0 come from Q+.]
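The difference between a level-sensitive latch and an edge-triggered register shows up clearly in a small behavioral model. This is only a sketch: the class names are hypothetical, each element stores a single bit, and gate delays are ignored.

```python
class DLatch:
    """Level-sensitive: the output follows D whenever the clock is high."""

    def __init__(self):
        self.q = 0

    def update(self, d, clock):
        if clock:               # latching: pass D through
            self.q = d
        return self.q           # storing: hold the previous value


class EdgeTriggeredBit:
    """Edge-triggered: D is sampled only when the clock goes from 0 to 1."""

    def __init__(self):
        self.q = 0
        self._prev_clock = 0

    def update(self, d, clock):
        if clock and not self._prev_clock:   # rising edge
            self.q = d
        self._prev_clock = clock
        return self.q


# (d, clock) pairs; D changes while the clock is still high in step 3.
inputs = [(1, 0), (1, 1), (0, 1), (1, 0), (0, 1)]
latch, reg_bit = DLatch(), EdgeTriggeredBit()
latch_trace = [latch.update(d, c) for d, c in inputs]
reg_trace = [reg_bit.update(d, c) for d, c in inputs]
```

In the third step the latch output changes because the clock is still high, while the edge-triggered element keeps the value it captured on the rising edge.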

20 Random-Access Memory
– Stores multiple words of memory. Address input specifies which word to read or write.
Register file
– Holds values of program registers (%eax, %esp, etc.)
– Register identifier serves as address. ID 8 implies no read or write performed.
Multiple Ports
– Can read and/or write multiple words in one cycle
– Each port has separate address and data input/output
[Figure: register file with read ports A (srcA, valA) and B (srcB, valB), write port W (dstW, valW), and a Clock input]

21 Basic Logic Gates
NOTE: okay to use just a circle for NOT.

22 More than 2 Inputs?
AND/OR can take any number of inputs.
– AND = 1 if all inputs are 1.
– OR = 1 if any input is 1.
– Similar for NAND/NOR.
Can implement with multiple two-input gates.

23 Logical Completeness
Can implement ANY truth table with AND, OR, NOT.

A B C | D
0 0 0 | 0
0 0 1 | 0
0 1 0 | 1
0 1 1 | 0
1 0 0 | 0
1 0 1 | 1
1 1 0 | 0
1 1 1 | 0

1. AND combinations that yield a "1" in the truth table.
2. OR the results of the AND gates.
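The two-step recipe can be sketched directly: build one AND term per row whose output is 1, then OR the terms together. The helper name `sum_of_products` is an assumption; the table is the one from the slide.

```python
# Truth table from the slide: (A, B, C) -> D
TRUTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 0, (1, 0, 1): 1, (1, 1, 0): 0, (1, 1, 1): 0,
}


def sum_of_products(table):
    """Build a function from a truth table using only AND, OR, NOT."""
    # Step 1: keep the input rows that yield a 1.
    one_rows = [row for row, out in table.items() if out == 1]

    def circuit(*values):
        # One AND gate per kept row; an input is inverted where the row has 0.
        and_outputs = [
            all((v if want else not v) for v, want in zip(values, row))
            for row in one_rows
        ]
        # Step 2: OR the results of the AND gates.
        return int(any(and_outputs))

    return circuit


D = sum_of_products(TRUTH_TABLE)
```

For this table the result corresponds to D = (!A && B && !C) || (A && !B && C).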

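24 DeMorgan's Law
Converting AND to OR (with some help from NOT)
Consider the following gate, an AND with inverted inputs and an inverted output:

A B | !A !B | !A && !B | !(!A && !B)
0 0 |  1  1 |    1     |     0
0 1 |  1  0 |    0     |     1
1 0 |  0  1 |    0     |     1
1 1 |  0  0 |    0     |     1

The last column is the same as A OR B. To convert AND to OR (or vice versa), invert inputs and output.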
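Since there are only four input combinations, the law is easy to check exhaustively. A minimal sketch, with hypothetical helper names:

```python
from itertools import product


def or_via_and(a, b):
    """OR built from an AND gate with inverted inputs and inverted output."""
    return not ((not a) and (not b))


def and_via_or(a, b):
    """AND built from an OR gate with inverted inputs and inverted output."""
    return not ((not a) or (not b))


# Check both directions of the conversion on every input combination.
demorgan_holds = all(
    or_via_and(a, b) == (a or b) and and_via_or(a, b) == (a and b)
    for a, b in product([False, True], repeat=2)
)
```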

25 Decoder
n inputs, 2^n outputs
– exactly one output is 1 for each possible input pattern
[Figure: 2-bit decoder]
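A behavioral sketch of an n-input decoder follows. The function name and the choice to pass the input pattern as an integer are assumptions for illustration.

```python
def decoder(value, n):
    """Return the 2**n outputs of an n-input decoder as a list of 0/1.

    `value` is the input pattern read as an unsigned integer; output i is 1
    only when i equals that pattern.
    """
    if not 0 <= value < 2 ** n:
        raise ValueError("input pattern does not fit in n bits")
    return [1 if i == value else 0 for i in range(2 ** n)]


two_bit_outputs = [decoder(v, 2) for v in range(4)]
```

For the 2-bit case, input pattern 2 gives outputs `[0, 0, 1, 0]`, and every pattern turns on exactly one of the four outputs.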

26 Sequential Processors

27 Sequential HW Structure
State
– Program counter register (PC)
– Condition code register (CC)
– Register file
– Memories: access same memory space. Data: for reading/writing program data. Instruction: for reading instructions.
Instruction Flow
– Read instruction at address specified by PC
– Process through stages
– Update program counter
[Figure: sequential datapath. Stages: Fetch, Decode, Execute, Memory, Write back, PC. Blocks: Instruction memory, PC increment, Register file, ALU, CC, Data memory. Signals: icode, ifun, rA, rB, valC, valP, srcA, srcB, dstA, dstB, valA, valB, aluA, aluB, Bch, valE, Addr, Data, valM, newPC.]

28 Sequential Stages
Fetch
– Read instruction from instruction memory
Decode
– Read program registers
Execute
– Compute value or address
Memory
– Read or write data
Write Back
– Write program registers
PC
– Update program counter
[Figure: the same sequential datapath as on the previous slide]

29 Instruction Decoding
Instruction Format
– Instruction byte: icode:ifun
– Optional register byte: rA:rB
– Optional constant word: valC
[Figure: example encoding 5 0 | rA rB | D, with fields icode, ifun, rA, rB, valC; the register byte and constant word are marked optional]
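A minimal sketch of splitting an instruction into these fields is shown below. It assumes a 4-byte little-endian constant word, and it takes the "optional" decisions as explicit flags (`need_regids`, `need_valc`, both hypothetical names) instead of deriving them from icode as real control logic would.

```python
def decode(instr, need_regids, need_valc):
    """Split instruction bytes into icode:ifun, optional rA:rB, optional valC."""
    icode, ifun = instr[0] >> 4, instr[0] & 0xF   # high and low nibble
    pos = 1
    r_a = r_b = val_c = None
    if need_regids:                               # optional register byte
        r_a, r_b = instr[pos] >> 4, instr[pos] & 0xF
        pos += 1
    if need_valc:                                 # optional constant word
        # Assumption: 4-byte word stored least significant byte first.
        val_c = int.from_bytes(instr[pos:pos + 4], "little")
        pos += 4
    return {"icode": icode, "ifun": ifun, "rA": r_a, "rB": r_b,
            "valC": val_c, "length": pos}


# Bytes in the shape of the slide's example: 50, rA:rB, then D.
example = decode(bytes([0x50, 0x15, 0x0C, 0x00, 0x00, 0x00]),
                 need_regids=True, need_valc=True)
```

Here the first byte 0x50 yields icode 5 and ifun 0, the register byte 0x15 yields rA = 1 and rB = 5, and the constant word decodes to 12.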

30 Sequential Summary
Implementation
– Express every instruction as series of simple steps
– Follow same general flow for each instruction type
– Assemble registers, memories, predesigned combinational blocks
– Connect with control logic
Limitations
– Too slow to be practical
– In one cycle, must propagate through instruction memory, register file, ALU, and data memory
– Would need to run clock very slowly
– Hardware units only active for fraction of clock cycle

31 Pipelined Processors

32 What Is Pipelining
Computers execute billions of instructions, so instruction throughput is what matters.
IDEA: Divide instruction execution into several pipeline stages, for example IF, ID, EX, MEM, WB. Different instructions are in different pipeline stages simultaneously.
The length of the longest pipeline stage determines the cycle time.
Desirable pipeline features (e.g., RISC):
– all instructions same length
– registers located in same place in instruction format
– memory operands only in loads or stores

33 What Is Pipelining
Laundry Example
Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold.
– Washer takes 30 minutes
– Dryer takes 40 minutes
– “Folder” takes 20 minutes

34 What Is Pipelining
Sequential laundry takes 6 hours for 4 loads. If they learned pipelining, how long would laundry take?
[Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, each taking 30 + 40 + 20 minutes back to back]

35 What Is Pipelining
Start work ASAP: pipelined laundry takes 3.5 hours for 4 loads.
[Figure: timeline starting at 6 PM; loads A, B, C, D in task order, overlapping in time]
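The 6-hour and 3.5-hour figures can be reproduced with a short calculation. This sketch assumes each stage handles one load at a time and that a load can wait between stages for as long as needed; the function names are illustrative.

```python
STAGE_MINUTES = [30, 40, 20]   # washer, dryer, "folder"


def sequential_minutes(loads, stages):
    """Each load finishes all stages before the next load starts."""
    return loads * sum(stages)


def pipelined_minutes(loads, stages):
    """Each load starts a stage as soon as both it and the stage are free."""
    stage_free_at = [0] * len(stages)   # when each stage finishes its last load
    for _ in range(loads):
        ready_at = 0                    # when this load is ready for the next stage
        for s, duration in enumerate(stages):
            start = max(ready_at, stage_free_at[s])
            ready_at = start + duration
            stage_free_at[s] = ready_at
    return stage_free_at[-1]


sequential = sequential_minutes(4, STAGE_MINUTES)
pipelined = pipelined_minutes(4, STAGE_MINUTES)
```

With four loads this gives 360 minutes sequentially and 210 minutes pipelined (30 + 4 × 40 + 20), a speedup of about 1.7 compared with the ideal of 3 for three stages. The 40-minute dryer sets the pace once the pipeline is full.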

36 Pipelining Lessons
– Pipelining doesn’t help latency of a single task; it helps throughput of the entire workload
– Pipeline rate is limited by the slowest pipeline stage
– Multiple tasks operate simultaneously
– Potential speedup = number of pipe stages
– Unbalanced lengths of pipe stages reduce speedup
– Time to “fill” the pipeline and time to “drain” it reduce speedup
[Figure: the pipelined laundry timeline for loads A, B, C, D, starting at 6 PM]

37 Real-World Pipelines: Car Washes
Idea
– Divide process into independent stages
– Move objects through stages in sequence
– At any given time, multiple objects are being processed
[Figure: sequential, parallel, and pipelined car washes]

38 Pipeline Diagrams
Unpipelined
– Cannot start new operation until previous one completes
3-Way Pipelined
– Up to 3 operations in process simultaneously
[Figure: timelines of OP1, OP2, OP3. Unpipelined operations run back to back; pipelined operations each pass through stages A, B, C and overlap.]

39 Data Dependencies
System
– Each operation depends on result from preceding one
[Figure: combinational logic feeding a clocked register whose output feeds back to the logic; timeline of OP1, OP2, OP3]

40 Data Hazards
– Result does not feed back around in time for next operation
– Pipelining has changed behavior of system
[Figure: combinational logic blocks A, B, C separated by clocked registers; timeline of OP1 to OP4, each passing through A, B, C]

41 One Memory Port/Structural Hazards
[Figure: pipeline diagram over clock cycles 1 to 7, instructions in order Load, Instr 1, Instr 2, Instr 3, Instr 4, each passing through Ifetch, Reg, ALU, DMem, Reg]
With a single memory port, the Load's DMem access in cycle 4 conflicts with the Ifetch of Instr 3 in the same cycle.

42 One Memory Port/Structural Hazards
[Figure: pipeline diagram over clock cycles 1 to 7, instructions in order Load, Instr 1, Instr 2, Stall, Instr 3; a bubble occupies the stalled slot and Instr 3 is fetched one cycle later]
How do you “bubble” the pipe?

43 Data Hazard on R1
Time (clock cycles); stages IF, ID/RF, EX, MEM, WB
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Figure: each instruction passes through Ifetch, Reg, ALU, DMem, Reg, offset by one cycle from the instruction before it]

44 Three Generic Data Hazards
Read After Write (RAW): Instr J tries to read operand before Instr I writes it.
I: add r1,r2,r3
J: sub r4,r1,r3
Caused by a “dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

45 Three Generic Data Hazards
Write After Read (WAR): Instr J writes operand before Instr I reads it.
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.

46 Three Generic Data Hazards
Write After Write (WAW): Instr J writes operand before Instr I writes it.
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”.
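The three cases can be told apart by comparing the registers each instruction reads and writes. The sketch below uses a hypothetical `(dest, sources)` representation and reports the dependences between an earlier instruction I and a later instruction J; whether a dependence becomes an actual hazard depends on the pipeline.

```python
def classify_hazards(instr_i, instr_j):
    """Return the dependence types between I (earlier) and J (later).

    Each instruction is a (dest, sources) pair, e.g. ("r1", ("r2", "r3"))
    for "add r1,r2,r3".
    """
    dest_i, sources_i = instr_i
    dest_j, sources_j = instr_j
    found = set()
    if dest_i in sources_j:     # J reads what I writes
        found.add("RAW")
    if dest_j in sources_i:     # J writes what I reads
        found.add("WAR")
    if dest_i == dest_j:        # J writes what I writes
        found.add("WAW")
    return found


# The instruction pairs from the three slides.
raw = classify_hazards(("r1", ("r2", "r3")), ("r4", ("r1", "r3")))
war = classify_hazards(("r4", ("r1", "r3")), ("r1", ("r2", "r3")))
waw = classify_hazards(("r1", ("r4", "r3")), ("r1", ("r2", "r3")))
```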

47 Data Forwarding
Naïve Pipeline
– Register isn’t written until completion of write-back stage
– Source operands read from register file in decode stage, so the value needs to be in the register file at start of stage
Observation
– Value generated in execute or memory stage
Trick
– Pass value directly from generating instruction to decode stage
– Needs to be available at end of decode stage

48 Forwarding to Avoid Data Hazard
Time (clock cycles)
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Figure: each instruction passes through Ifetch, Reg, ALU, DMem, Reg, offset by one cycle; the value of r1 is forwarded to the later instructions]
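A rough timing model shows what forwarding buys for this instruction sequence. It is a sketch under simplifying assumptions, not a model of a specific processor: five stages, one instruction entering decode per cycle, every instruction an ALU operation whose result is produced in execute, and, without forwarding, a result readable in decode only in the cycle after write-back completes. The function name `decode_cycles` is hypothetical.

```python
def decode_cycles(program, forwarding):
    """Return the cycle in which each instruction occupies the decode stage.

    `program` is a list of (dest, sources) pairs in program order.
    """
    readable_from = {}   # register -> first decode cycle that can see its value
    cycles = []
    next_decode = 2      # first instruction: fetch in cycle 1, decode in cycle 2
    for dest, sources in program:
        cycle = max([next_decode] + [readable_from.get(r, 0) for r in sources])
        cycles.append(cycle)
        # Execute is cycle + 1 and write-back is cycle + 3.
        # Forwarding: usable by a decode that overlaps the producer's execute.
        # No forwarding: usable only after write-back has completed.
        readable_from[dest] = cycle + 1 if forwarding else cycle + 4
        next_decode = cycle + 1
    return cycles


PROGRAM = [
    ("r1", ("r2", "r3")),     # add r1,r2,r3
    ("r4", ("r1", "r3")),     # sub r4,r1,r3
    ("r6", ("r1", "r7")),     # and r6,r1,r7
    ("r8", ("r1", "r9")),     # or  r8,r1,r9
    ("r10", ("r1", "r11")),   # xor r10,r1,r11
]

without_forwarding = decode_cycles(PROGRAM, forwarding=False)
with_forwarding = decode_cycles(PROGRAM, forwarding=True)
```

Under these assumptions the sub instruction waits three extra cycles for r1 without forwarding, and the instructions behind it are delayed by the same amount; with forwarding the five instructions decode in consecutive cycles.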

