Computer Architecture & Assembly Language, Spring 2001
Dr. Richard Spillman
Lecture 24: RISC II


Semester Topics
(Course-map diagram, PLU. Labels: CPU architecture, CPU, disk, memory, I/O, ALU, assembly, microprogramming, alternatives, cache, virtual, structure, operation, network.)

Review: Last Lecture
The case for CISC; introduction to RISC; RISC II architecture; SPARC architecture.

Outline
MIPS architecture; PowerPC architecture; Alpha architecture. (It's too much, my circuits hurt!)

MIPS Architecture
The MIPS (Microprocessor without Interlocked Pipeline Stages) is a 64-bit, 1.3-million-transistor, 1-micron CMOS microprocessor with 447 pins. It is based on a design developed at Stanford and is now used in Silicon Graphics systems.
It contains two logically independent processors: the main CPU, which is a 32-bit RISC processor, and an internal coprocessor, CP0, which manages memory, exceptions, and operating system functions. It can support up to three other coprocessors, one of which is a floating-point processor.

MIPS Registers
There are 32 general-purpose registers, each 32 or 64 bits wide depending on the mode selected by the operating system.
R0 always reads as 0. R31 is the link register. By convention, R29 is the stack pointer. Also by convention, the compiler uses R8 to R15 to hold temporary results whose values are not saved across procedure calls.
Two special registers, HI and LO, hold the results of multiplication and division: a multiply places the high and low words of the product in HI and LO, and a divide places the quotient in LO and the remainder in HI.

MIPS Instructions
All instructions are a fixed 32 bits long, in three basic formats:
I-type (immediate): OP (bits 31-26), rs (25-21), rt (20-16), immediate (15-0). rs is the source register field and rt is the target register field.
J-type (jump): OP (bits 31-26), target (25-0). The 26-bit target field is extended with two 0s on the right (shifted left by 2), since all instructions are aligned on a 4-byte boundary.
R-type (register): OP (bits 31-26), rs (25-21), rt (20-16), rd (15-11), shamt (10-6), funct (5-0). rd is the destination register, shamt the shift amount, and funct the function code.
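As a sketch of how the fields above are pulled out of an instruction word, the Python below extracts every field by bit position. The function names are hypothetical. The jump helper assumes the standard MIPS rule, not stated on the slide, that the upper 4 bits of the jump destination come from the address of the instruction after the jump (PC + 4). The example word 0x00801025 is the standard encoding of OR R2, R4, R0.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of a 32-bit word, bit 31 being the leftmost."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word: int) -> dict:
    """Split an instruction word into every field of the three formats.
    Only the opcode tells you which view is meaningful."""
    return {
        "op": bits(word, 31, 26),
        "rs": bits(word, 25, 21),
        "rt": bits(word, 20, 16),
        "rd": bits(word, 15, 11),
        "shamt": bits(word, 10, 6),
        "funct": bits(word, 5, 0),
        "immediate": bits(word, 15, 0),
        "target": bits(word, 25, 0),
    }


def jump_address(pc: int, word: int) -> int:
    """J-type destination: the 26-bit target with two 0s appended (shift left by 2),
    combined with the upper 4 bits of PC + 4 (assumed standard MIPS rule)."""
    return ((pc + 4) & 0xF0000000) | (bits(word, 25, 0) << 2)


f = decode(0x00801025)
print(f["op"], f["rs"], f["rt"], f["rd"], hex(f["funct"]))   # prints 0 4 0 2 0x25
```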

Instruction Issues
The MIPS follows the RISC concept by requiring all computational instructions to take their operands from registers or immediate data.
Other unusual features: there is no register-to-register move instruction; an OR instruction is used instead. MOV R2, R4 is written OR R2, R4, R0, that is, R4 is ORed with R0 (which is always 0) and the result is stored in R2. The jump and branch instructions are also limited by the lack of a flag register.
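A minimal sketch of why OR with R0 works as a move: the toy register file below (hypothetical names) hardwires R0 to zero, so OR rd, rs, R0 simply copies rs into rd. The encoder assumes the standard MIPS encoding of OR as an R-type instruction with opcode 0 and funct code 0x25, which comes from the MIPS instruction set rather than from this slide.

```python
class RegisterFile:
    """Toy 32-entry MIPS register file in which R0 always reads as zero."""

    def __init__(self) -> None:
        self._regs = [0] * 32

    def __getitem__(self, n: int) -> int:
        return self._regs[n]

    def __setitem__(self, n: int, value: int) -> None:
        if n != 0:                          # writes to R0 are discarded
            self._regs[n] = value & 0xFFFFFFFF


def execute_or(regs: RegisterFile, rd: int, rs: int, rt: int) -> None:
    """OR rd, rs, rt: rd <- rs | rt."""
    regs[rd] = regs[rs] | regs[rt]


def encode_r(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack R-type fields into a 32-bit word using the R-type bit layout."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


regs = RegisterFile()
regs[4] = 0xCAFE
execute_or(regs, rd=2, rs=4, rt=0)          # "MOV R2, R4" written as OR R2, R4, R0
print(hex(regs[2]))                         # prints 0xcafe
print(hex(encode_r(0, 4, 0, 2, 0, 0x25)))   # prints 0x801025
```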

Conditional Jumps
There are two conditional test-and-branch instructions: BEQ (branch on equal) and BNE (branch on not equal). Both are I-type instructions that compare rs with rt and, if the condition holds, branch using the 16-bit offset.
A less-than test is provided by the SLT (set on less than) instruction: SLT rd, rs, rt sets rd to 1 if rs < rt and to 0 otherwise. Follow it with a BEQ or BNE that compares rd with R0.
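The sketch below models the SLT-then-BNE idiom together with the branch target calculation. It assumes two standard MIPS rules that the slide does not spell out: SLT compares its operands as signed 32-bit integers, and the 16-bit branch offset counts words and is added, after sign extension, to the address of the instruction following the branch. All function names are hypothetical.

```python
def to_signed32(x: int) -> int:
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def slt(rs_val: int, rt_val: int) -> int:
    """SLT rd, rs, rt: the value written to rd (1 if rs < rt as signed, else 0)."""
    return 1 if to_signed32(rs_val) < to_signed32(rt_val) else 0


def branch_target(branch_pc: int, imm16: int) -> int:
    """Target of a BEQ/BNE located at branch_pc: (PC + 4) + sign-extended offset * 4."""
    return (branch_pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def branch_if_less(rs_val: int, rt_val: int, bne_pc: int, imm16: int):
    """'Branch if rs < rt' as SLT rd, rs, rt followed by BNE rd, R0, offset.
    Returns the target address if the branch is taken, otherwise None."""
    rd = slt(rs_val, rt_val)
    if rd != 0:                  # BNE rd, R0: R0 is always 0
        return branch_target(bne_pc, imm16)
    return None


print(hex(branch_if_less(3, 7, 0x00400010, 0xFFFC)))  # backward branch; prints 0x400004
```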

Instruction Pipeline
The pipeline is one of the major features of the MIPS architecture (it's part of its name). It uses a five-stage pipeline: IF, RD, ALU, MEM, WB.
IF (instruction fetch): the instruction address is calculated and the instruction is loaded from the instruction cache.
RD (read): the instruction is partially decoded and the register operands it requires are read from the register file.
ALU: the operation is performed or, for a load/store, the effective address is calculated.
MEM (memory): any required memory operand is accessed (read for a load, written for a store).
WB (writeback): the result is written to its destination register.
RESULT: each instruction takes 5 cycles to complete, but once the pipeline is full one instruction finishes every cycle, so the throughput is 1 instruction per cycle.
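To make the latency versus throughput point concrete, the Python below prints a classic pipeline chart for an ideal five-stage pipeline, assuming no stalls and one new instruction entering per cycle. The function name is hypothetical.

```python
STAGES = ["IF", "RD", "ALU", "MEM", "WB"]


def pipeline_chart(n_instr: int) -> list[list[str]]:
    """Row i shows the stage instruction i occupies in each clock cycle,
    assuming an ideal pipeline: one new instruction per cycle, no stalls."""
    total_cycles = n_instr + len(STAGES) - 1
    chart = []
    for i in range(n_instr):
        row = ["."] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage          # instruction i enters stage s in cycle i + s
        chart.append(row)
    return chart


for i, row in enumerate(pipeline_chart(4)):
    print(f"I{i + 1}: " + " ".join(f"{cell:>3}" for cell in row))
```

With n instructions the ideal pipeline needs n + 4 cycles, so for large n the average approaches one instruction per cycle even though each instruction still spends 5 cycles in the pipe.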

Pipeline Problems
There are two problems with this (and any) pipeline: jumps and branches may invalidate instructions already in the pipe, and data dependencies may mean that the correct data is not yet available when an instruction in the pipe requires it. Unlike the SPARC, the MIPS has no hardware to handle these problems.

MIPS Solution
Pipeline problems are solved in software.
RULE: the instruction placed immediately after a branch (the branch delay slot) is executed whether or not the branch is taken.
Requirement: the compiler must fill the branch delay slot with a useful instruction or a NOP. In addition, the compiler must look for data dependencies, such as an instruction that uses the result of the load just before it (the load delay slot), and reorder the code or insert NOPs (stall cycles) to avoid them.
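A deliberately simple sketch of what the compiler has to do, assuming one branch delay slot and one load delay slot as in the original MIPS: it only pads with NOPs and never searches for a useful instruction to move into a slot, which a real compiler would try first. The Instr representation and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    op: str
    dest: Optional[int] = None        # register written, if any
    srcs: tuple[int, ...] = ()        # registers read


NOP = Instr("NOP")
BRANCHES = {"BEQ", "BNE", "J"}


def insert_nops(code: list[Instr]) -> list[Instr]:
    """Conservative scheduler: never reorders, only pads with NOPs."""
    out: list[Instr] = []
    for i, ins in enumerate(code):
        out.append(ins)
        nxt = code[i + 1] if i + 1 < len(code) else None
        if ins.op in BRANCHES:
            out.append(NOP)           # branch delay slot filled with a NOP
        elif ins.op == "LW" and nxt is not None and ins.dest in nxt.srcs:
            out.append(NOP)           # next instruction reads the register being loaded
    return out


# LW R2, A ; LW R3, 10(R2) ; BEQ R3, R0, ...
prog = [Instr("LW", 2), Instr("LW", 3, (2,)), Instr("BEQ", None, (3, 0))]
print([ins.op for ins in insert_nops(prog)])   # prints ['LW', 'NOP', 'LW', 'NOP', 'BEQ', 'NOP']
```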

The Stall Cycle
Consider the following MIPS code:
LW R2, A         (load R2 with the data at memory location A)
...              (some instruction that does not involve R2)
LW R3, 10(R2)    (load R3 from memory address R2 + 10)
PROBLEM: if the data at memory location A is in the cache, the timing works; if not, the timing fails. A cache access requires 2 cycles, while a memory access requires 4 or 5 cycles.

The MIPS Stall
If the cache cannot deliver the data, stall cycles are added to the pipeline (i.e., the pipeline is frozen) until the data appears. Hence, a stall cycle is used on a miss in the data cache, a miss in the instruction cache, or a busy write buffer (while data is being sent out).
RESULT: the SPARC uses hardware interlocks, which require complex hardware but allow efficient code; the MIPS uses stalling, which requires simple hardware but produces less efficient code.
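A rough cost model for freezing the pipeline, assuming each cache miss stalls every stage for the extra cycles memory needs beyond a cache hit and that there are no other hazards. The latencies are the slide's (cache 2 cycles, memory taken as 5); the instruction and miss counts and the function name are hypothetical.

```python
def total_cycles(n_instr: int, misses: int, cache_cycles: int, memory_cycles: int,
                 depth: int = 5) -> int:
    """Cycles to run n_instr instructions on a pipeline of the given depth when
    each cache miss freezes the whole pipeline until memory responds.

    Assumes one instruction issued per cycle and no other hazards."""
    stall_per_miss = memory_cycles - cache_cycles   # extra cycles beyond a hit
    return n_instr + (depth - 1) + misses * stall_per_miss


ideal = total_cycles(1000, misses=0, cache_cycles=2, memory_cycles=5)
with_misses = total_cycles(1000, misses=50, cache_cycles=2, memory_cycles=5)
print(ideal, with_misses)   # prints 1004 1154
```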

PowerPC
The PowerPC was jointly developed by Apple, IBM, and Motorola, and comes in 32-bit and 64-bit versions.
The 601, the first implementation of the PowerPC family, is a 32-bit microprocessor with 32-bit addresses, 8-, 16-, and 32-bit integer data, and 32- and 64-bit floating-point data. The 602, 603, and 604 are also available.

601 Features
It contains three execution units: the integer unit (IU), the branch processing unit (BPU), and a pipelined floating-point unit (FPU).
Other features: it can execute up to 3 instructions per clock cycle (at most one per execution unit); it has an on-chip 32-KB unified cache (combined instruction and data cache) and an on-chip memory management unit (MMU); and it has a 64-bit data bus and a 32-bit address bus.

601 Architecture
(Block diagram. Labels: real-time clock (RTC, with RTCU and RTCL); instruction unit (instruction queue, issue logic); BPU (CTR, CR, LR); IU (XER, GPR file); FPU (FPR file, FPSCR); MMU; 32-KB cache with tags; 8-word data path; address path.)

RTC (Real-Time Clock)
The 601 has an on-chip real-time clock (usually this is outside the microprocessor). It contains two registers, RTCU and RTCL: RTCU (upper) holds the number of seconds since a point in time specified by software, and RTCL (lower) counts nanoseconds within the current second. Either register may be copied to any 601 general-purpose register. The RTC is used for task switching and for keeping calendar dates.
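Because the clock lives in two registers, software has to combine them and read them carefully. The sketch below assumes RTCL holds nanoseconds within the current second (0 to 999,999,999) and rolls over into RTCU. read_upper and read_lower are hypothetical callables standing in for the instructions that copy each RTC register into a GPR. The re-read loop is a common technique for two-register counters, not something the slide describes.

```python
NS_PER_SECOND = 1_000_000_000


def rtc_to_ns(rtcu: int, rtcl: int) -> int:
    """Combine RTCU (seconds) and RTCL (nanoseconds within that second)."""
    return rtcu * NS_PER_SECOND + rtcl


def read_rtc(read_upper, read_lower) -> tuple[int, int]:
    """Read both halves consistently. If RTCU changes between the two reads,
    RTCL may have wrapped in between, so the pair is re-read."""
    while True:
        upper = read_upper()
        lower = read_lower()
        if read_upper() == upper:
            return upper, lower


def elapsed_ns(start: tuple[int, int], end: tuple[int, int]) -> int:
    """Time between two (RTCU, RTCL) samples, in nanoseconds."""
    return rtc_to_ns(*end) - rtc_to_ns(*start)


print(elapsed_ns((5, 900_000_000), (6, 100_000_000)))   # prints 200000000
```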

Instruction Unit
The instruction unit computes the address of the next instruction to be fetched. It contains an instruction queue that holds 8 instructions and can be filled from the cache in one cycle. It also contains the branch processing unit (BPU), which searches the instruction queue for conditional branch instructions (more later).

Execution Unit
The 601 execution unit includes three on-chip hardware components: the floating-point unit (FPU), the integer unit (IU), and the branch processing unit (BPU). Each unit operates independently and in parallel with the others.

Floating-Point Unit
The FPU includes a single-precision multiply-add array, the floating-point status and control register (FPSCR), and thirty-two 64-bit floating-point registers. The FPU is pipelined, so it can deliver a result on each clock cycle.

Integer Unit (IU)
The integer unit executes all integer instructions, usually in one cycle. It contains an ALU, the integer exception register (XER), and a general-purpose register (GPR) file of thirty-two 32-bit registers. The XER records conditions produced by integer arithmetic, such as carry (CA), overflow (OV), and summary overflow (SO); these are status bits, not interrupts.
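A minimal sketch of the three XER conditions for a 32-bit add, modeled as plain return values rather than bit positions inside the XER. On the real processor only particular instruction forms update each bit; add32 is a hypothetical name. SO is treated as sticky, staying set once any overflow has occurred.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def add32(a: int, b: int, so: int = 0) -> tuple[int, int, int, int]:
    """32-bit add returning (result, ca, ov, so).

    ca: carry out of the most significant bit (unsigned overflow)
    ov: two's-complement overflow (operands share a sign the result lacks)
    so: summary overflow, a sticky bit that stays set once any overflow occurs
    """
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    ca = 1 if full > MASK32 else 0
    ov = 1 if (~(a ^ b) & (a ^ result) & SIGN32) else 0
    return result, ca, ov, so | ov


print(add32(0x7FFFFFFF, 1))   # signed overflow without carry: (2147483648, 0, 1, 1)
```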

Branch Processing Unit (BPU)
The BPU looks ahead in the instruction queue to process branch instructions early, predicting conditional branches when necessary, in order to achieve zero-cycle branches. It contains an adder to compute branch target addresses and three special registers:
link register (LR): saves the return address of a subroutine call;
count register (CTR): holds a loop count that branches can decrement and test, or a branch target address;
condition register (CR): used for testing and branching.
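PowerPC branch instructions can decrement CTR and branch while it is nonzero (the bdnz extended mnemonic), which gives counted loops without a separate compare instruction. The Python below is a behavioral sketch of that loop shape, assuming CTR was loaded beforehand (for example with mtctr). run_counted_loop is a hypothetical name.

```python
def run_counted_loop(count: int, body) -> int:
    """Run body() in a loop closed by 'decrement CTR, branch if CTR != 0'.
    Returns the number of iterations. A count of 0 would wrap CTR to
    0xFFFFFFFF (2**32 iterations), so callers should pass count >= 1."""
    ctr = count & 0xFFFFFFFF
    iterations = 0
    while True:
        body()                             # loop body
        iterations += 1
        ctr = (ctr - 1) & 0xFFFFFFFF       # decrement as a 32-bit register
        if ctr == 0:                       # branch not taken: exit the loop
            return iterations


total = []
run_counted_loop(5, lambda: total.append(len(total)))
print(total)   # prints [0, 1, 2, 3, 4]
```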

Memory Management Unit
The MMU supports up to 4 petabytes (2^52 bytes) of virtual memory and 4 gigabytes (2^32 bytes) of physical memory. Its functions are to translate logical addresses into physical addresses, translate I/O addresses (I/O is memory mapped), and support demand-paged virtual memory.
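The 2^52 figure comes from segmented addressing. A 32-bit effective address uses its top 4 bits to select one of 16 segment registers (the segment registers among the 601's supervisor-level registers). The segment's 24-bit virtual segment ID (VSID) then replaces those 4 bits, giving 24 + 28 = 52 bits. The sketch below assumes this standard 32-bit PowerPC scheme. It stops before the page-table lookup and ignores the segment register's I/O flag. The function name is hypothetical.

```python
def effective_to_virtual(ea: int, segment_regs: list[int]) -> int:
    """Form a 52-bit virtual address from a 32-bit effective address."""
    sr = segment_regs[(ea >> 28) & 0xF]    # top 4 EA bits select 1 of 16 segment registers
    vsid = sr & 0xFFFFFF                   # 24-bit virtual segment ID in the register's low bits
    offset = ea & 0x0FFFFFFF               # 28-bit offset within a 256-MB segment
    return (vsid << 28) | offset


srs = [0] * 16
srs[3] = 0xABCDEF
va = effective_to_virtual(0x30001234, srs)
print(hex(va))        # prints 0xabcdef0001234
```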

Cache Unit
A 32-KB associative cache (lines are located by content, i.e., by matching address tags). Each line is 64 bytes, divided into two 8-word sectors that are read or written independently. Replacement uses the LRU (least recently used) policy, and each cache line is tagged with access and replacement information.
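The tag-only model below shows how an address maps into such a cache and how LRU picks a victim. It assumes 8-way set associativity, as the 601 cache is usually described, and it does not model the independent 8-word sectors. The class and method names are hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tag-only model of a set-associative cache with per-set LRU replacement."""

    def __init__(self, size: int = 32 * 1024, line: int = 64, ways: int = 8) -> None:
        self.line = line
        self.ways = ways
        self.num_sets = size // (line * ways)          # 32 KB / (64 B * 8 ways) = 64 sets
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def split(self, addr: int) -> tuple[int, int, int]:
        """Return (tag, set index, byte offset) for an address."""
        line_number = addr // self.line
        return line_number // self.num_sets, line_number % self.num_sets, addr % self.line

    def access(self, addr: int) -> bool:
        """Look up addr; return True on a hit. On a miss, load the line."""
        tag, index, _ = self.split(addr)
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)                     # now the most recently used
            return True
        if len(lines) == self.ways:
            lines.popitem(last=False)                  # evict the least recently used
        lines[tag] = True
        return False


cache = SetAssociativeCache()
print(cache.num_sets, cache.split(0x12345))   # prints 64 (18, 13, 5)
```

Addresses 4,096 bytes apart (64 sets x 64 bytes) land in the same set, so touching nine such lines in a row evicts the first one.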

Memory Unit
In addition to the MMU, the 601 has an on-chip memory unit (not shown in the diagram). It consists of read and write queues that serve as buffers between the cache and external devices such as memory: the read queue buffers 8 words of data between the external bus and the cache, and the write queue buffers 24 words of data.

Selected PowerPC Features
System interface: 32-bit address bus, 64-bit data bus, 52 control signals.
Byte/bit ordering: both big-endian and little-endian ordering are supported. Big-endian ordering assigns the lowest address to the highest-order byte of multibyte data.
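A quick way to see the two byte orders is Python's int.to_bytes, which lays out a 32-bit value in either order. Index 0 of each result stands for the lowest memory address; the value 0x12345678 is just an illustration.

```python
value = 0x12345678

big = value.to_bytes(4, "big")        # highest-order byte (0x12) at the lowest address
little = value.to_bytes(4, "little")  # lowest-order byte (0x78) at the lowest address

for offset in range(4):
    print(f"address +{offset}: big-endian {big[offset]:02x}, little-endian {little[offset]:02x}")
```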

Register Structure
In addition to the 32 general-purpose registers, the 601 contains a set of supervisor-level registers that can be accessed only by programs executing with supervisor privileges:
Machine State Register (MSR): 32 bits that indicate the state of the processor, such as privilege level and single-step mode.
Segment registers: sixteen 32-bit registers used in translating effective addresses.
Decrementer register (DEC): a 32-bit decrementing counter used for programmable delays.
Others.

DEC Alpha AXP Architecture
The Alpha is a 64-bit RISC microprocessor. The first implementation, the 21064, was built in 0.75-micron CMOS technology with 1.68 million transistors, ran at 150 to 200 MHz, and came in a 431-pin package. The next version, the 21064A, used 0.5-micron technology and 2.5 million transistors and ran at 225 to 275 MHz.

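Alpha Block Structure
(Block diagram. Labels: bus interface unit (BIU) connecting to the external cache and external system over an address bus and a 128-bit data bus; ICache and DCache, each with tag and data arrays; IBox (prefetcher, PC calculation, branch history table, ITB, resource conflict and pipeline control logic); EBox (multiplier, adder, shifter, logic box, integer register file IRF); ABox (address generator, DTB, load silo, write buffer); FBox (multiplier/adder, divider, floating-point register file FRF).)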

Alpha Subsystems
The main subsystems of the Alpha are:
ICache: instruction cache, 8 KB.
DCache: data cache, 8 KB.
ABox: address generation unit; it performs all address calculations, contains a 32-entry data translation buffer, and contains the internal processor registers.

Other Alpha Subsystems
Other Alpha subsystems include:
IBox: issues two instructions at a time, maintains the integer pipeline, and contains the branch prediction logic, interrupt logic, and so on.
EBox: integer execution unit with thirty-two 64-bit registers, a 64-bit adder, and an integer multiplier.
FBox: floating-point unit with thirty-two 64-bit registers, a multiplier, and an adder.

Summary
MIPS architecture; PowerPC architecture; Alpha architecture. (It wasn't so bad after all.)

