EECC550 - Shaaban #1 Lec # 1 Winter 2005 11-29-2005
Computer Organization EECC 550
Topics, in order, across Week 1 through Week 11:
- Introduction: Modern Computer Design Levels, Components, Technology Trends, Register Transfer Notation (RTN). [Chapters 1, 2]
- Instruction Set Architecture (ISA) Characteristics and Classifications: CISC vs. RISC. [Chapter 2]
- MIPS: An Example RISC ISA. Syntax, Instruction Formats, Addressing Modes, Encoding & Examples. [Chapter 2]
- Central Processor Unit (CPU) & Computer System Performance Measures. [Chapter 4]
- CPU Organization: Datapath & Control Unit Design. [Chapter 5]
  – MIPS Single Cycle Datapath & Control Unit Design.
  – MIPS Multicycle Datapath and Finite State Machine Control Unit Design.
- Microprogrammed Control Unit Design. [Chapter 5]
  – Microprogramming Project
- Midterm Review and Midterm Exam
- CPU Pipelining. [Chapter 6]
- The Memory Hierarchy: Cache Design & Performance. [Chapter 7]
- The Memory Hierarchy: Main & Virtual Memory. [Chapter 7]
- Input/Output Organization & System Performance Evaluation. [Chapter 8]
- Computer Arithmetic & ALU Design. [Chapter 3] If time permits.
- Final Exam.
Computing System History/Trends + Instruction Set Architecture (ISA) Fundamentals (Chapters 1, 2)
- Computing Element Choices:
  – Computing Element Programmability
  – Spatial vs. Temporal Computing
  – Main Processor Types/Applications
- General Purpose Processor Generations
- The Von Neumann Computer Model
- CPU Organization (Design)
- Recent Trends in Computer Design/Performance
- Hierarchy of Computer Architecture
- Hardware Description: Register Transfer Notation (RTN)
- Computer Architecture vs. Computer Organization
- Instruction Set Architecture (ISA):
  – Definition and purpose
  – ISA Specification Requirements
  – Main General Types of Instructions
  – ISA Types and characteristics
  – Typical ISA Addressing Modes
  – Instruction Set Encoding
  – Instruction Set Architecture Tradeoffs
  – Complex Instruction Set Computer (CISC)
  – Reduced Instruction Set Computer (RISC)
  – Evolution of Instruction Set Architectures
Computing Element Choices
- General Purpose Processors (GPPs): Intended for general purpose computing (desktops, servers, clusters..)
- Application-Specific Processors (ASPs): Processors with ISAs and architectural features tailored towards specific application domains
  – E.g. Digital Signal Processors (DSPs), Network Processors (NPs), Media Processors, Graphics Processing Units (GPUs), Vector Processors???...
- Co-Processors: A hardware (hardwired) implementation of specific algorithms with a limited programming interface (augment GPPs or ASPs)
- Configurable Hardware:
  – Field Programmable Gate Arrays (FPGAs)
  – Configurable array of simple processing elements
- Application Specific Integrated Circuits (ASICs): A custom VLSI hardware solution for a specific computational task
The choice of one or more depends on a number of factors including:
- Type and complexity of computational algorithm (general purpose vs. specialized)
- Desired level of flexibility/programmability
- Performance requirements
- Development cost/time
- System cost
- Power requirements
- Real-time constraints
The main goal of this course is the study of fundamental design techniques for General Purpose Processors
Computing Element Choices: Performance vs. Flexibility
Figure: spectrum of computing elements (General Purpose Processors (GPPs), Application-Specific Processors (ASPs), Co-Processors, Configurable Hardware, Application Specific Integrated Circuits (ASICs)). One axis is Programmability/Flexibility; the other is Specialization, Development cost/time, and Performance/Chip Area/Watt (Computational Efficiency).
Selection Factors:
- Type and complexity of computational algorithms (general purpose vs. specialized)
- Desired level of flexibility
- Performance
- Development cost
- System cost
- Power requirements
- Real-time constraints
Processor: Programmable computing element that runs programs written using a pre-defined set of instructions
The main goal of this course is the study of fundamental design techniques for General Purpose Processors
Computing Element Choices: Computing Element Programmability
- Fixed Function:
  – Computes one function (e.g. FP-multiply, divider, DCT)
  – Function defined at fabrication time
  – e.g. hardware (ASICs)
- Programmable:
  – Computes “any” computable function (e.g. Processors)
  – Function defined after fabrication
- Parameterizable Hardware: Performs limited “set” of functions, e.g. Co-Processors
Processor = Programmable computing element that runs programs written using pre-defined instructions
Computing Element Choices: Spatial vs. Temporal Computing
- Spatial (using hardware)
- Temporal (using software/program running on a processor)
Figure label: Processor Instructions
Processor = Programmable computing element that runs programs written using a pre-defined set of instructions
Main Processor Types/Applications
- General Purpose Processors (GPPs) - high performance.
  – RISC or CISC: Intel P4, IBM Power4, SPARC, PowerPC, MIPS...
  – Used for general purpose software
  – Heavy weight OS - Windows, UNIX
  – Workstations, Desktops (PCs), Clusters
- Embedded processors and processor cores
  – e.g. Intel XScale, ARM, 486SX, Hitachi SH7000, NEC V800...
  – Often require digital signal processing (DSP) support or other application-specific support (e.g. network, media processing)
  – Single program
  – Lightweight, often realtime OS or no OS
  – Examples: Cellular phones, consumer electronics.. (e.g. CD players)
- Microcontrollers
  – Extremely cost/power sensitive
  – Single program
  – Small word size - 8 bit common
  – Highest volume processors by far
  – Examples: Control systems, automobiles, toasters, thermostats,...
Figure labels: Increasing Cost/Complexity (towards GPPs); Increasing volume (towards microcontrollers); Examples of Application-Specific Processors.
The main goal of this course is the study of fundamental design techniques for General Purpose Processors
The Processor Design Space
Figure: processor cost vs. performance (chip area, power, complexity).
- Microprocessors (GPPs): Performance is everything & software rules
- Embedded processors: Application specific architectures for performance; real-time constraints; specialized applications
- Microcontrollers: Cost is everything; low power/cost constraints
Processor = Programmable computing element that runs programs written using a pre-defined set of instructions
The main goal of this course is the study of fundamental design techniques for General Purpose Processors
General Purpose Processor/Computer System Generations
Classified according to implementation technology:
- The First Generation, 1946-59: Vacuum Tubes, Relays, Mercury Delay Lines:
  – ENIAC (Electronic Numerical Integrator and Computer): First electronic computer, 18000 vacuum tubes, 1500 relays, 5000 additions/sec (completed 1945, unveiled 1946).
  – First stored program computer: EDSAC (Electronic Delay Storage Automatic Calculator), 1949.
- The Second Generation, 1959-64: Discrete Transistors.
  – e.g. IBM mainframes
- The Third Generation, 1964-75: Small and Medium-Scale Integrated (MSI) Circuits.
  – e.g. mainframes (IBM 360), minicomputers (DEC PDP-8, PDP-11).
- The Fourth Generation, 1975-Present: The Microcomputer. VLSI-based Microprocessors (single-chip processor)
  – First microprocessor: Intel’s 4-bit 4004 (2300 transistors), 1971.
  – Personal Computers (PCs), laptops, PDAs, servers, clusters …
  – Reduced Instruction Set Computer (RISC) 1984
Common factor among all generations: All target the Von Neumann Computer Model or paradigm
The Von Neumann Computer Model
Partitioning of the programmable computing engine into components:
- Central Processing Unit (CPU): Control Unit (instruction decode, sequencing of operations), Datapath (registers, arithmetic and logic unit, buses).
- Memory: Instruction (program) and operand (data) storage.
- Input/Output (I/O).
- The stored program concept: Instructions from an instruction set are fetched from a common memory and executed one at a time.
Figure: Computer System = CPU (Control; Datapath: registers, ALU, buses) + Memory (instructions, data) + I/O Devices (Input, Output).
Major CPU Performance Limitation: The Von Neumann computing model implies sequential execution, one instruction at a time
Generic CPU Machine Instruction Processing Steps (Implied by The Von Neumann Computer Model)
1. Instruction Fetch: Obtain instruction from program storage (memory)
2. Instruction Decode: Determine required actions and instruction size
3. Operand Fetch: Locate and obtain operand data
4. Execute: Compute result value or status
5. Result Store: Deposit results in storage for later use
6. Next Instruction: Determine successor or next instruction
Major CPU Performance Limitation: The Von Neumann computing model implies sequential execution, one instruction at a time
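The six generic processing steps can be sketched in Python for a toy one-address (accumulator) machine. This is only an illustration under stated assumptions: the opcodes, the `(opcode, address)` tuple encoding, the symbolic addresses, and the `run` function are hypothetical and do not correspond to any real ISA.

```python
def run(program, memory):
    """Execute a toy accumulator-machine program one instruction at a time.

    program: list of (opcode, address) tuples (a hypothetical encoding).
    memory:  dict mapping symbolic addresses to values.
    Returns the accumulator value when a 'halt' instruction is reached.
    """
    pc = 0    # program counter: index of the next instruction
    acc = 0   # accumulator register
    while True:
        instr = program[pc]                    # 1. instruction fetch
        opcode, addr = instr                   # 2. instruction decode
        operand = None
        if opcode in ("load", "add", "sub"):
            operand = memory[addr]             # 3. operand fetch
        if opcode == "load":                   # 4. execute
            acc = operand
        elif opcode == "add":
            acc = acc + operand
        elif opcode == "sub":
            acc = acc - operand
        elif opcode == "store":
            memory[addr] = acc                 # 5. result store
        elif opcode == "halt":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        pc = pc + 1                            # 6. next instruction (sequential)


memory = {"B": 2, "C": 3, "A": 0}
program = [("load", "B"), ("add", "C"), ("store", "A"), ("halt", None)]
run(program, memory)   # leaves B + C in memory["A"]
```

The loop makes the sequential nature of the model visible: no instruction starts until the previous one has finished all of its steps.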
Hardware Components of Computer Systems
Five classic components of all computers:
1. Control Unit; 2. Datapath (together: the Processor, active)
3. Memory (passive; where programs and data live when running)
4. Input; 5. Output (together: I/O)
Figure: Computer = Processor (Control Unit, Datapath) + Memory + Devices. Input devices: Keyboard, Mouse, etc. Output devices: Display, Printer, etc. Disk serves as both input and output.
CPU Organization
- Datapath Design:
  – Capabilities & performance characteristics of principal Functional Units (FUs) (e.g., Registers, ALU, Shifters, Logic Units,...)
  – Ways in which these components are interconnected (bus connections, multiplexors, etc.).
  – How information flows between components.
- Control Unit Design:
  – Logic and means by which such information flow is controlled.
  – Control and coordination of FU operation to realize the targeted Instruction Set Architecture (can be implemented using either a finite state machine or a microprogram).
- Hardware description with a suitable language, possibly using Register Transfer Notation (RTN).
A Typical Microprocessor Layout: The Intel Pentium Classic (1993 - 1997, 60 MHz - 233 MHz)
Figure: die layout with labeled regions: Control Unit, Datapath, First Level of Memory (Cache).
Computer System Components
- CPU Core: 1 GHz - 3.8 GHz, 4-way superscalar, RISC or RISC-core (x86):
  – Deep instruction pipelines
  – Dynamic scheduling
  – Multiple FP, integer FUs
  – Dynamic branch prediction
  – Hardware speculation
- Caches (all non-blocking):
  – L1: 16-128K, 1-2 way set associative (on chip), separate or unified
  – L2: 256K-2M, 4-32 way set associative (on chip), unified
  – L3: 2-16M, 8-32 way set associative (off or on chip), unified
- Front Side Bus (FSB), examples:
  – Alpha, AMD K7: EV6, 200-400 MHz
  – Intel PII, PIII: GTL+, 133 MHz
  – Intel P4: 800 MHz
- Memory Controller (off or on-chip) and Memory Bus; memory examples:
  – SDRAM PC100/PC133: 100-133 MHz, 64-128 bits wide, 2-way interleaved, ~900 MBytes/sec (64-bit)
  – Double Data Rate (DDR) SDRAM PC3200: 200 MHz DDR, 64-128 bits wide, 4-way interleaved, ~3.2 GBytes/sec (one 64-bit channel), ~6.4 GBytes/sec (two 64-bit channels)
  – Rambus DRAM (RDRAM): 400 MHz DDR, 16 bits wide (32 banks), ~1.6 GBytes/sec
- Chipset: North Bridge, South Bridge
- I/O Buses, example:
  – PCI: 33-66 MHz, 32-64 bits wide, 133-528 MBytes/sec
  – PCI-X: 133 MHz, 64-bit, 1024 MBytes/sec
- I/O Devices (attached through controllers, adapters, and NICs): Disks, Displays, Keyboards, Networks
Other figure labels: Current Standard; I/O Subsystem.
Performance Increase of Workstation-Class Microprocessors 1987-1997
Figure: Integer SPEC92 performance vs. year.
> 100x performance increase in one decade
Microprocessor Transistor Count Growth Rate
Figure: transistors per chip vs. year, with the Moore’s Law trend line starting at 2300 transistors.
- Moore’s Law: 2X transistors/chip every 1.5 years (circa 1970)
- Labeled data points:
  – Alpha 21264: 15 million
  – Alpha 21164: 9.3 million
  – PowerPC 620: 6.9 million
  – Pentium Pro: 5.5 million
  – Sparc Ultra: 5.2 million
- Currently > 1 billion
- ~500,000x transistor density increase in the last 35 years
Increase of Capacity of VLSI Dynamic RAM (DRAM) Chips (Also follows Moore’s Law)
year | size (Megabit)
1980 | 0.0625 (64k bit)
1983 | 0.25 (256k bit)
1986 | 1 (1 M bit)
1989 | 4
1992 | 16 (16 M bit)
1996 | 64
1999 | 256
2000 | 1024 (1024 M bit = 1 G bit)
1.55X/yr, or doubling every 1.6 years
~17,000x DRAM chip capacity increase in 20 years
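As a quick check of the growth figures, the following Python sketch derives the overall capacity ratio, the average annual growth factor, and the doubling time from the first and last rows of the DRAM table. The function name `growth_stats` is an illustrative assumption; the data points are the ones listed on the slide.

```python
import math

# (year, capacity in megabits) pairs as listed in the DRAM capacity table
DRAM_MBIT = [(1980, 0.0625), (1983, 0.25), (1986, 1), (1989, 4),
             (1992, 16), (1996, 64), (1999, 256), (2000, 1024)]


def growth_stats(points):
    """Return (total ratio, average annual growth factor, doubling time in years)
    computed from the first and last data points only."""
    (year0, cap0), (year1, cap1) = points[0], points[-1]
    years = year1 - year0
    ratio = cap1 / cap0
    annual = ratio ** (1 / years)           # compound growth per year
    doubling = years / math.log2(ratio)     # years needed to double
    return ratio, annual, doubling


ratio, annual, doubling = growth_stats(DRAM_MBIT)
print(f"{ratio:.0f}x in 20 years, {annual:.2f}x per year, doubling every {doubling:.2f} years")
```

Using only the endpoints gives a slightly faster rate than the quoted 1.55X/yr, since the quoted figure presumably reflects a fit over a different set of points; either way the doubling time stays close to 1.5 years.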
Computer Technology Trends: Evolutionary but Rapid Change
- Processor:
  – 1.5-1.6X performance improvement every year; over 100X performance in last decade.
- Memory:
  – DRAM capacity: > 2X every 1.5 years; 1000X size in last decade.
  – Cost per bit: Improves about 25% or more per year.
  – Only 15-25% performance improvement per year.
- Disk:
  – Capacity: > 2X in size every 1.5 years.
  – Cost per bit: Improves about 60% per year.
  – 200X size in last decade.
  – Only 10% performance improvement per year, due to mechanical limitations.
- Expected state-of-the-art PC by end of year 2005:
  – Processor clock speed: > 4000 MegaHertz (4 GigaHertz)
  – Memory capacity: > 4000 MegaBytes (4 GigaBytes)
  – Disk capacity: > 500 GigaBytes (0.5 TeraBytes)
A Simplified View of The Software/Hardware Hierarchical Layers
Hierarchy of Computer Architecture
Figure: layered view from software down to hardware, with the Instruction Set Architecture (ISA) at the Software/Hardware Boundary.
- Software: Application, Operating System, Compiler
  – Described by: High-Level Language Programs, Assembly Language Programs, Machine Language Program
- Firmware, e.g. BIOS (Basic Input/Output System)
  – Described by: Microprogram
- Instruction Set Architecture (ISA): Instr. Set Proc., I/O system
- Hardware: Datapath & Control, Digital Design, Circuit Design, Layout
  – Described by: Register Transfer Notation (RTN), Logic Diagrams, Circuit Diagrams, VLSI placement & routing
The ISA forms an abstraction layer that sets the requirements for both compiler and CPU designers
Levels of Program Representation
- High Level Language Program (Software):
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
- Compiler produces the Assembly Language Program (MIPS Assembly Code):
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
- Assembler produces the Machine Language Program:
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
- Machine Interpretation produces the Control Signal Specification (Hardware; Microprogram, Register Transfer Notation (RTN)):
    ALUOP[0:3] <= InstReg[9:11] & MASK
A Hierarchy of Computer Design
Level | Name | Modules | Primitives | Descriptive Media
1 | Electronics | Gates, FFs | Transistors, Resistors, etc. | Circuit Diagrams
2 | Logic | Registers, ALUs ... | Gates, FFs ... | Logic Diagrams
3 | Organization | Processors, Memories | Registers, ALUs ... | Register Transfer Notation (RTN)
4 | Microprogramming | Assembly Language | Microinstructions | Microprogram
5 | Assembly language programming | OS Routines | Assembly language Instructions | Assembly Language Programs
6 | Procedural Programming | Applications, Drivers.. | OS Routines, High-level Languages | High-level Language Programs
7 | Application | Systems | Procedural Constructs | Problem-Oriented Programs
Levels 1-3: Low Level - Hardware. Level 4: Firmware. Levels 5-7: High Level - Software.
Hardware Description
- Hardware visualization:
  – Block diagrams (spatial visualization): Two-dimensional representations of functional units and their interconnections.
  – Timing charts (temporal visualization): Waveforms where events are displayed vs. time.
- Register Transfer Notation (RTN):
  – A way to describe microoperations capable of being performed by the data flow (data registers, data buses, functional units) at the register transfer level of design (RT).
  – Also describes conditional information in the system which causes operations to come about.
  – A “shorthand” notation for microoperations.
- Hardware Description Languages:
  – Examples: VHDL: VHSIC (Very High Speed Integrated Circuits) Hardware Description Language, Verilog.
Register Transfer Notation (RTN)
- Dependent RTN: When RTN is used after the data flow is assumed to be frozen.
  – No data transfer can take place over a path that does not exist.
  – No statement implies a function the data flow hardware is incapable of performing.
- Independent RTN: Describes actions on registers without regard to nonexistence of direct paths or intermediate registers. No predefined data flow (i.e. no datapath design yet).
- The general format of an RTN statement:
    Conditional information: Action1; Action2
- The conditional statement is often an AND of literals (status and control signals) in the system (a p-term). The p-term is said to imply the action.
- Possible actions include transfer of data to/from registers/memory, data shifting, functional unit operations, etc.
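The "p-term implies the action" idea can be sketched in Python as follows. Everything here is an illustrative assumption rather than standard RTN tooling: the `execute_rtn` function, the representation of a statement as a list of signal names plus a list of `(destination, function)` pairs, and in particular the choice to let all actions of one statement read the old register values before any destination is updated.

```python
def execute_rtn(statement, signals, state):
    """Perform the actions of one RTN statement if its p-term is true.

    statement: (p_term, actions)
        p_term:  list of signal names that are ANDed together
        actions: list of (destination, compute) pairs, where compute(state)
                 returns the value to be placed in the destination
    signals:   dict of signal name -> bool (missing signals count as inactive)
    state:     dict of register name -> value, updated in place
    Returns True if the actions were performed.
    """
    p_term, actions = statement
    if not all(signals.get(name, False) for name in p_term):
        return False
    # Assumption: evaluate every right-hand side on the old state first,
    # then commit, so the actions behave like simultaneous transfers.
    new_values = {dest: compute(state) for dest, compute in actions}
    state.update(new_values)
    return True


# "CTL AND T0: A <- B; B <- A" written in the hypothetical encoding above
swap = (["CTL", "T0"], [("A", lambda s: s["B"]), ("B", lambda s: s["A"])])
registers = {"A": 1, "B": 7}
execute_rtn(swap, {"CTL": True, "T0": True}, registers)   # registers become A=7, B=1
```

If either signal in the p-term is inactive, the call returns False and the registers are left untouched.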
RTN Statement Examples
A ← B or R[A] ← R[B], where R[X] means the content of register X
  – A copy of the data in entity B (typically a register) is placed in register A.
  – If the destination register has fewer bits than the source, the destination accepts only the lowest-order bits.
  – If the destination has more bits than the source, the value of the source is sign extended to the left.
CTL • T0: A = B
  – The contents of B are presented to the input of combinational circuit A.
  – The action to the right of “:” takes place when control signal CTL is active and signal T0 is active.
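The two width rules for a register transfer (keep the lowest-order bits when the destination is narrower, sign extend when it is wider) can be sketched in Python. The function name `transfer` and its parameters are hypothetical; register contents are modeled as unsigned bit patterns stored in Python integers.

```python
def transfer(src_value, src_bits, dst_bits):
    """Return the bit pattern held by a dst_bits-wide register after dst <- src.

    src_value is treated as a src_bits-wide bit pattern.
    """
    src_value &= (1 << src_bits) - 1               # confine source to its width
    if dst_bits <= src_bits:
        # Narrower (or equal) destination: accept only the lowest-order bits
        return src_value & ((1 << dst_bits) - 1)
    # Wider destination: replicate the source sign bit to the left
    sign = (src_value >> (src_bits - 1)) & 1
    if sign:
        extension = ((1 << (dst_bits - src_bits)) - 1) << src_bits
        return src_value | extension
    return src_value


print(hex(transfer(0xABCD, 16, 8)))   # 0xcd   (truncated to low byte)
print(hex(transfer(0x80, 8, 16)))     # 0xff80 (negative value sign extended)
print(hex(transfer(0x7F, 8, 16)))     # 0x7f   (positive value zero filled)
```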
RTN Statement Examples
MD ← M[MA] or MD ← Mem[MA]
  – Means the memory data (MD) register receives the contents of the main memory (M or Mem) location addressed by the Memory Address (MA) register.
AC(0), AC(1), AC(2), AC(3)
  – Register fields are indicated by parentheses.
  – The concatenation operation is indicated by a comma.
  – Bit AC(0) is bit 0 of the accumulator AC.
  – The above expression means AC bits 0, 1, 2, 3.
  – More commonly represented by AC(0-3).
E • T3: CLRWRITE
  – The control signal CLRWRITE is activated when the condition E • T3 is active.
Computer Architecture vs. Computer Organization
The term computer architecture is sometimes erroneously restricted to computer instruction set design, with other aspects of computer design called implementation.
More accurate definitions:
- Instruction Set Architecture (ISA): The actual programmer-visible instruction set, which serves as the boundary or interface between the software and hardware.
- Implementation of a machine has two components:
  – Organization (CPU micro-architecture, CPU design): includes the high-level aspects of a computer’s design such as the memory system, the bus structure, and the internal CPU unit, which includes implementations of arithmetic, logic, branching, and data transfer operations.
  – Hardware (hardware design and implementation): Refers to the specifics of the machine such as detailed logic design and packaging technology.
In general, Computer Architecture refers to the above three aspects: 1- Instruction set architecture. 2- Organization. 3- Hardware.
The ISA forms an abstraction layer that sets the requirements for both compiler and CPU designers
Instruction Set Architecture (ISA)
“... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.” – Amdahl, Blaauw, and Brooks, 1964.
The instruction set architecture is concerned with:
- Organization of programmable storage (memory & registers): Includes the amount of addressable memory and number of available registers.
- Data Types & Data Structures: Encodings & representations.
- Instruction Set: What operations are specified.
- Instruction formats and encoding.
- Modes of addressing and accessing data items and instructions.
- Exceptional conditions.
The ISA forms an abstraction layer that sets the requirements for both compiler and CPU designers
Computer Instruction Sets
Regardless of computer type, CPU structure, or hardware organization, every machine instruction must specify the following:
- Opcode (Operation Code): Which operation to perform. Examples: add, load, and branch.
- Where to find the operand or operands, if any: Operands may be contained in CPU registers, main memory, or I/O ports.
- Where to put the result, if there is a result: May be explicitly mentioned or implicit in the opcode.
- Where to find the next instruction: Without any explicit branches, the instruction to execute is the next instruction in the sequence; for jump or branch instructions, it is at a specified address.
Instruction Set Architecture (ISA) Specification Requirements
Processing steps: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction.
- Instruction Format or Encoding:
  – How is it decoded?
- Location of operands and result (addressing modes):
  – Where other than memory?
  – How many explicit operands?
  – How are memory operands located?
  – Which can or cannot be in memory?
- Data type and size.
- Operations:
  – What are supported?
- Successor instruction:
  – Jumps, conditions, branches.
Fetch-decode-execute is implicit.
Main General Types of Instructions
- Data Movement Instructions, possible variations:
  – Memory-to-memory.
  – Memory-to-CPU register.
  – CPU-to-memory.
  – Constant-to-CPU register.
  – CPU-to-output.
  – etc.
- Arithmetic Logic Unit (ALU) Instructions.
- Branch (Control) Instructions:
  – Unconditional jumps.
  – Conditional branches.
Examples of Data Movement Instructions
Instruction | Meaning | Machine
MOV A,B | Move 16-bit data from memory loc. A to loc. B | VAX11
lwz R3,A | Move 32-bit data from memory loc. A to register R3 | PPC601
li $3,455 | Load the 32-bit integer 455 into register $3 | MIPS R3000
MOV AX,BX | Move 16-bit data from register BX into register AX | Intel X86
LEA.L (A0),A2 | Load the address pointed to by A0 into A2 | MC68000
Examples of ALU Instructions
Instruction | Meaning | Machine
MULF A,B,C | Multiply the 32-bit floating point values at mem. locations A and B, and store result in loc. C | VAX11
nabs r3,r1 | Store the negative absolute value of register r1 in r3 | PPC601
ori $2,$1,255 | Store the logical OR of register $1 with 255 into $2 | MIPS R3000
SHL AX,4 | Shift the 16-bit value in register AX left by 4 bits | Intel X86
ADD.L D0,D1 | Add the 32-bit values in registers D0, D1 and store the result in register D1 | MC68000
Examples of Branch Instructions
Instruction | Meaning | Machine
BLBS A, Tgt | Branch to address Tgt if the least significant bit at location A is set. | VAX11
bun r2 | Branch to location in r2 if the previous comparison signaled that one or more values was not a number. | PPC601
beq $2,$1,32 | Branch to location PC+4+32 if contents of $1 and $2 are equal. | MIPS R3000
JCXZ Addr | Jump to Addr if contents of register CX = 0. | Intel X86
BVS next | Branch to next if overflow flag in CC is set. | MC68000
Operation Types in The Instruction Set
Operator Type | Examples
Arithmetic and logical | Integer arithmetic and logical operations: add, or
Data transfer | Loads-stores (move on machines with memory addressing)
Control | Branch, jump, procedure call and return, traps.
System | Operating system call/return, virtual memory management instructions...
Floating point | Floating point operations: add, multiply....
Decimal | Decimal add, decimal multiply, decimal to character conversion
String | String move, string compare, string search
Media | The same operation performed on multiple data (e.g. Intel MMX, SSE)
Instruction Usage Example: Top 10 Intel X86 Instructions
Rank | Instruction | Integer average (percent of total executed)
1 | load | 22%
2 | conditional branch | 20%
3 | compare | 16%
4 | store | 12%
5 | add | 8%
6 | and | 6%
7 | sub | 5%
8 | move register-register | 4%
9 | call | 1%
10 | return | 1%
Total | | 96%
Observation: Simple instructions dominate instruction usage frequency. (CISC to RISC observation)
Types of Instruction Set Architectures According To Operand Addressing Fields
- Memory-To-Memory Machines:
  – Operands obtained from memory and results stored back in memory by any instruction that requires operands.
  – No local CPU registers are used in the CPU datapath.
  – Include: the 4-address machine, the 3-address machine, the 2-address machine.
- The 1-address (Accumulator) Machine:
  – A single local CPU special-purpose register (accumulator) is used as the source of one operand and as the result destination.
- The 0-address or Stack Machine:
  – A push-down stack is used in the CPU.
- General Purpose Register (GPR) Machines:
  – The CPU datapath contains several local general-purpose registers which can be used as operand sources and as result destinations.
  – A large number of possible addressing modes.
  – Load-Store or Register-To-Register Machines: GPR machines where only data movement instructions (loads, stores) can obtain operands from memory and store results to memory.
CISC to RISC observation (load-store simplifies CPU design)
Types of Instruction Set Architectures
Memory-To-Memory Machines: The 4-Address Machine
- No program counter (PC) or other CPU registers are used.
- Instruction encoding has four address fields to specify:
  – Location of first operand.
  – Location of second operand.
  – Place to store the result.
  – Location of next instruction.
Instruction: add Res, Op1, Op2, Nexti
Meaning: Res ← Op1 + Op2
or more precise RTN: M[ResAddr] ← M[Op1Addr] + M[Op2Addr]
Instruction Format (encoding):
  Field | add (opcode: which operation) | ResAddr (where to put result) | Op1Addr, Op2Addr (where to find operands) | NextiAddr (where to find next instruction)
  Bits | 8 | 24 | 24, 24 | 24
Instruction Size: 13 bytes. Can address 2^24 bytes = 16 MBytes.
Figure: memory locations Op1Addr, Op2Addr, ResAddr, NextiAddr holding Op1, Op2, Res, Nexti; the CPU contains only the adder (+).
Types of Instruction Set Architectures
Memory-To-Memory Machines: The 3-Address Machine
- A program counter (PC) is included within the CPU which points to the next instruction.
- No CPU storage (general-purpose registers).
Instruction: add Res, Op1, Op2
Meaning: Res ← Op1 + Op2
or more precise RTN: M[ResAddr] ← M[Op1Addr] + M[Op2Addr]; PC ← PC + 10 (increment PC)
Instruction Format (encoding):
  Field | add (opcode: which operation) | ResAddr (where to put result) | Op1Addr, Op2Addr (where to find operands)
  Bits | 8 | 24 | 24, 24
Instruction Size: 10 bytes. Can address 2^24 bytes = 16 MBytes.
Figure: memory locations Op1Addr, Op2Addr, ResAddr, NextiAddr holding Op1, Op2, Res, Nexti; the CPU contains the adder (+) and a 24-bit Program Counter (PC), which says where to find the next instruction.
Types of Instruction Set Architectures
Memory-To-Memory Machines: The 2-Address Machine
- Result is stored in the memory address of one of the operands.
Instruction: add Op2, Op1
Meaning: Op2 ← Op1 + Op2
or more precise RTN: M[Op2Addr] ← M[Op1Addr] + M[Op2Addr]; PC ← PC + 7 (increment PC)
Instruction Format (encoding):
  Field | add (opcode: which operation) | Op2Addr (operand, and where to put result) | Op1Addr (where to find operand)
  Bits | 8 | 24 | 24
Instruction Size: 7 bytes.
Figure: memory locations Op1Addr, Op2Addr, NextiAddr holding Op1, Op2/Res, Nexti; the CPU contains the adder (+) and a 24-bit Program Counter (PC), which says where to find the next instruction.
Types of Instruction Set Architectures
The 1-address (Accumulator) Machine
- A single accumulator in the CPU is used as the source of one operand and as the result destination.
Instruction: add Op1
Meaning: Acc ← Acc + Op1
or more precise RTN: Acc ← Acc + M[Op1Addr]; PC ← PC + 4 (increment PC)
Instruction Format (encoding):
  Field | add (opcode: which operation) | Op1Addr (where to find operand1)
  Bits | 8 | 24
Instruction Size: 4 bytes.
Figure: memory locations Op1Addr, NextiAddr holding Op1, Nexti; the CPU contains the adder (+), the Accumulator (where to find operand2, and where to put the result), and a 24-bit Program Counter (PC), which says where to find the next instruction.
Types of Instruction Set Architectures
The 0-address (Stack) Machine
- A push-down stack is used in the CPU.
- TOS = Top Entry in Stack; SOS = Second Entry in Stack.
Instruction: push Op1
  Meaning: TOS ← M[Op1Addr]
  Instruction Format: push (opcode, 8 bits) | Op1Addr (where to find operand, 24 bits). Size: 4 bytes.
Instruction: add
  Meaning: TOS ← TOS + SOS
  Instruction Format: add (opcode, 8 bits). Size: 1 byte.
Instruction: pop Res
  Meaning: M[ResAddr] ← TOS
  Instruction Format: pop (opcode, 8 bits) | ResAddr (memory destination, 24 bits). Size: 4 bytes.
Figure: memory locations Op1Addr, Op2Addr, ResAddr, NextiAddr holding Op1, Op2, Res, Nexti; the CPU contains the adder (+), the stack (TOS, SOS, etc.), and a 24-bit Program Counter (PC). push moves Op1 and Op2 onto the stack, add combines them, and pop writes Res back to memory.
Types of Instruction Set Architectures
General Purpose Register (GPR) Machines
- CPU contains several general-purpose registers which can be used as operand sources and result destination.
Instruction: load R8, Op1
  Meaning: R8 ← M[Op1Addr]; PC ← PC + 5
  Instruction Format: load (opcode, 8 bits) | R8 (3 bits) | Op1Addr (where to find operand1, 24 bits)
  Size = 4.375 bytes rounded up to 5 bytes
Instruction: add R2, R4, R6
  Meaning: R2 ← R4 + R6; PC ← PC + 3
  Instruction Format: add (opcode, 8 bits) | R2 (destination, 3 bits) | R4, R6 (operands, 3 bits each)
  Size = 2.125 bytes rounded up to 3 bytes
Instruction: store R2, Op2
  Meaning: M[Op2Addr] ← R2; PC ← PC + 5
  Instruction Format: store (opcode, 8 bits) | R2 (3 bits) | ResAddr (destination, 24 bits)
  Size = 4.375 bytes rounded up to 5 bytes
Here the add instruction has three register specifier fields, while load and store instructions have one register specifier field and one memory address specifier field.
Figure: memory locations Op1Addr, NextiAddr holding Op1, Nexti; the CPU contains the adder (+), registers R1-R8, and a 24-bit Program Counter (PC).
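The instruction sizes quoted for the 4-, 3-, 2-, 1-, 0-address and GPR formats all follow from summing the field widths and rounding up to whole bytes. A minimal Python sketch of that arithmetic, where the `instruction_bytes` helper and the dictionary keys are illustrative names and the field widths are the ones given in the instruction formats (8-bit opcode, 24-bit addresses, 3-bit register specifiers):

```python
import math


def instruction_bytes(*field_bits):
    """Total instruction size in bytes, rounded up to a whole number of bytes."""
    return math.ceil(sum(field_bits) / 8)


# Field widths in bits: opcode first, then address or register fields
FORMATS = {
    "4-address add":  (8, 24, 24, 24, 24),
    "3-address add":  (8, 24, 24, 24),
    "2-address add":  (8, 24, 24),
    "1-address add":  (8, 24),
    "stack add":      (8,),
    "stack push/pop": (8, 24),
    "GPR load/store": (8, 3, 24),
    "GPR add":        (8, 3, 3, 3),
}

for name, fields in FORMATS.items():
    print(f"{name}: {sum(fields)} bits -> {instruction_bytes(*fields)} bytes")
```

The GPR formats are the only ones that need rounding: 35 bits (4.375 bytes) becomes 5 bytes and 17 bits (2.125 bytes) becomes 3 bytes.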
Expression Evaluation Example with 3-, 2-, 1-, 0-Address, and GPR Machines
For the expression A = (B + C) * D - E, where A-E are in memory:
- 3-Address: 3 instructions, code size 30 bytes, 9 memory accesses for data
    add A, B, C
    mul A, A, D
    sub A, A, E
- 2-Address: 4 instructions, code size 28 bytes, 12 memory accesses for data
    load A, B
    add A, C
    mul A, D
    sub A, E
- 1-Address (Accumulator): 5 instructions, code size 20 bytes, 5 memory accesses for data
    load B
    add C
    mul D
    sub E
    store A
- 0-Address (Stack): 8 instructions, code size 23 bytes, 5 memory accesses for data
    push B
    push C
    add
    push D
    mul
    push E
    sub
    pop A
- GPR, Register-Memory: 5 instructions, code size 25 bytes, 5 memory accesses for data
    load R1, B
    add R1, C
    mul R1, D
    sub R1, E
    store A, R1
- GPR, Load-Store: 8 instructions, code size 34 bytes, 5 memory accesses for data
    load R1, B
    load R2, C
    add R3, R1, R2
    load R1, D
    mul R3, R3, R1
    load R1, E
    sub R3, R3, R1
    store A, R3
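The 0-address (stack) column can be checked with a small Python simulation that runs the eight-instruction program and tallies code size and data memory accesses. This is a sketch under stated assumptions: the `run_stack_program` function and the text encoding of instructions are hypothetical, push/pop are charged 4 bytes and ALU operations 1 byte as in the stack machine formats, and `sub` is assumed to compute SOS - TOS (the operand order for subtraction is not spelled out in the slides).

```python
def run_stack_program(program, memory):
    """Run a 0-address (stack machine) program.

    program: list of strings such as "push B", "add", "pop A".
    memory:  dict of variable name -> value, updated in place.
    Returns (code size in bytes, number of data memory accesses).
    """
    stack = []
    code_bytes = 0
    data_accesses = 0
    for instr in program:
        op, *args = instr.split()
        if op == "push":                     # TOS <- M[addr]
            stack.append(memory[args[0]])
            code_bytes += 4
            data_accesses += 1
        elif op == "pop":                    # M[addr] <- TOS
            memory[args[0]] = stack.pop()
            code_bytes += 4
            data_accesses += 1
        elif op in ("add", "sub", "mul"):    # operate on TOS and SOS
            tos = stack.pop()
            sos = stack.pop()
            if op == "add":
                stack.append(sos + tos)
            elif op == "sub":
                stack.append(sos - tos)      # assumed operand order: SOS - TOS
            else:
                stack.append(sos * tos)
            code_bytes += 1
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return code_bytes, data_accesses


program = ["push B", "push C", "add", "push D", "mul", "push E", "sub", "pop A"]
memory = {"A": 0, "B": 2, "C": 3, "D": 4, "E": 5}
size, accesses = run_stack_program(program, memory)
print(memory["A"], size, accesses)   # (2 + 3) * 4 - 5 = 15, 23 bytes, 5 accesses
```

Only push and pop touch data memory, which is why the stack machine needs just 5 data accesses despite having the longest instruction sequence.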
Addressing Modes Usage Example
For 3 programs running on VAX, ignoring direct register mode:
- Displacement: 42% avg, 32% to 55%
- Immediate: 33% avg, 17% to 43%
- Register deferred (indirect): 13% avg, 3% to 24%
- Scaled: 7% avg, 0% to 16%
- Memory indirect: 3% avg, 1% to 6%
- Misc: 2% avg, 0% to 3%
75% displacement & immediate.
88% displacement, immediate & register indirect.
Observation: In addition to register direct, the displacement, immediate, and register indirect addressing modes are important.
CISC to RISC observation (fewer addressing modes simplify CPU design)
Displacement Address Size Example
Figure: distribution of displacement address bits needed; average of 5 SPECint92 programs vs. average of 5 SPECfp92 programs.
- 1% of addresses > 16 bits
- 12 - 16 bits of displacement needed
CISC to RISC observation
Instruction Set Encoding
Considerations affecting instruction set encoding:
–To have as many registers and addressing modes as possible.
–The impact of the size of the register and addressing mode fields on the average instruction size and on the average program size.
–To encode instructions into lengths that will be easy to handle in the implementation. At a minimum, instruction lengths should be a multiple of bytes.
Fixed length encoding: Faster and easiest to implement in hardware, e.g. it simplifies the design of pipelined CPUs. (CISC to RISC observation)
Variable length encoding: Produces smaller instructions.
Hybrid encoding.
Three Examples of Instruction Set Encoding
Variable Length Encoding: VAX (1-53 bytes)
  Operation & no. of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n
Fixed Length Encoding: MIPS, PowerPC, SPARC (all instructions are 4 bytes each)
  Operation | Address field 1 | Address field 2 | Address field 3
Hybrid Encoding: IBM 360/370, Intel 80x86
  Operation | Address specifier | Address field
  Operation | Address specifier 1 | Address specifier 2 | Address field
  Operation | Address specifier | Address field 1 | Address field 2
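One practical consequence of the encoding choice is how the hardware finds instruction boundaries. The Python sketch below contrasts the two cases. The fixed-length splitter uses the 4-byte width quoted for MIPS, PowerPC, and SPARC. The variable-length splitter uses a toy opcode-to-length table that is entirely hypothetical and does not correspond to the VAX or any real ISA.

```python
def split_fixed(code, width=4):
    """Fixed-length: instruction k always starts at byte k * width."""
    if len(code) % width:
        raise ValueError("code length is not a multiple of the instruction width")
    return [code[i:i + width] for i in range(0, len(code), width)]


# Hypothetical toy encoding: the first byte (opcode) determines total length.
TOY_LENGTHS = {0x01: 1, 0x02: 3, 0x03: 5}


def split_variable(code, lengths=TOY_LENGTHS):
    """Variable-length: each instruction must be examined to find the next."""
    instructions, i = [], 0
    while i < len(code):
        n = lengths[code[i]]
        if i + n > len(code):
            raise ValueError("truncated instruction")
        instructions.append(code[i:i + n])
        i += n
    return instructions


fixed = split_fixed(bytes(range(8)))
variable = split_variable(bytes([0x01, 0x02, 0xAA, 0xBB, 0x03, 1, 2, 3, 4]))
print(fixed)
print(variable)
```

With fixed-length encoding the next instruction's address is known before the current one is decoded, which is what makes instruction fetch in a pipeline straightforward. With variable-length encoding the boundary is only known after at least partial decoding.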
Instruction Set Architecture Tradeoffs
(Here, machine = CPU or ISA.)
3-address machine: Shortest code sequence; a large number of bits per instruction; large number of memory accesses.
0-address (stack) machine: Longest code sequence; shortest individual instructions; more complex to program.
General purpose register machine (GPR):
–Operands are addressed by selecting among a small set of registers using a short register address (all new ISAs since 1975).
–Advantages of GPR: Low number of memory accesses. Faster, since register access is currently still much faster than memory access. Registers are easier for compilers to use. Shorter, simpler instructions.
Load-Store Machines: GPR machines where memory addresses are only included in data movement instructions (loads/stores) between memory and registers (all new ISAs designed after 1980).
CISC to RISC observation: load-store simplifies CPU design.
ISA Examples
Machine            General Purpose Registers   Architecture                       Year
EDSAC              1                           accumulator                        1949
IBM 701            1                           accumulator                        1953
CDC 6600           8                           load-store                         1963
IBM 360            16                          register-memory                    1964
DEC PDP-8          1                           accumulator                        1965
DEC PDP-11         8                           register-memory                    1970
Intel 8008         1                           accumulator                        1972
Motorola 6800      1                           accumulator                        1974
DEC VAX            16                          register-memory, memory-memory     1977
Intel 8086         1                           extended accumulator               1978
Motorola 68000     16                          register-memory                    1980
Intel 80386        8                           register-memory                    1985
MIPS               32                          load-store                         1985
HP PA-RISC         32                          load-store                         1986
SPARC              32                          load-store                         1987
PowerPC            32                          load-store                         1992
DEC Alpha          32                          load-store                         1992
HP/Intel IA-64     128                         load-store                         2001
AMD64 (EMT64)      16                          register-memory                    2003
Examples of GPR Machines (ISAs)
For Arithmetic/Logic (ALU) Instructions:
Max. number of memory addresses   Max. number of operands allowed   Examples
0                                 3                                 SPARC, MIPS, PowerPC, ALPHA
1                                 2                                 Intel 80386, Motorola 68000
2 or 3                            2 or 3                            VAX
Complex Instruction Set Computer (CISC) ISAs (circa 1980)
Emphasizes doing more with each instruction.
Motivated by the high cost of memory and hard disk capacity when original CISC architectures were proposed:
–When M6800 was introduced: 16K RAM = $500, 40M hard disk = $55,000
–When MC68000 was introduced: 64K RAM = $200, 10M HD = $5,000
Original CISC architectures evolved with faster, more complex CPU designs, but backward instruction set compatibility had to be maintained.
Wide variety of addressing modes: 14 in MC68000, 25 in MC68020.
A number of instruction modes for the location and number of operands: The VAX has 0- through 3-address instructions.
Variable-length or hybrid instruction encoding is used.
Example CISC ISAs: Motorola 680X0
18 addressing modes:
  Data register direct.
  Address register direct.
  Immediate.
  Absolute short.
  Absolute long.
  Address register indirect.
  Address register indirect with postincrement.
  Address register indirect with predecrement.
  Address register indirect with displacement.
  Address register indirect with index (8-bit).
  Address register indirect with index (base).
  Memory indirect postindexed.
  Memory indirect preindexed.
  Program counter indirect with index (8-bit).
  Program counter indirect with index (base).
  Program counter indirect with displacement.
  Program counter memory indirect postindexed.
  Program counter memory indirect preindexed.
Operand size: Range from 1 to 32 bits, 1, 2, 4, 8, 10, or 16 bytes.
Instruction Encoding: Instructions are stored in 16-bit words. The smallest instruction is 2 bytes (one word). The longest instruction is 5 words (10 bytes) in length.
Example CISC ISA: Intel 80386
12 addressing modes:
  Register.
  Immediate.
  Direct.
  Base.
  Base + Displacement.
  Index + Displacement.
  Scaled Index + Displacement.
  Based Index.
  Based Scaled Index.
  Based Index + Displacement.
  Based Scaled Index + Displacement.
  Relative.
Operand sizes: Can be 8, 16, 32, 48, 64, or 80 bits long. Also supports string operations.
Instruction Encoding: The smallest instruction is one byte. The longest instruction is 12 bytes long. The first bytes generally contain the opcode, mode specifiers, and register fields. The remaining bytes are for address displacement and immediate data.
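Most of the memory addressing modes listed for the 80386 are special cases of a single effective-address formula: base + index × scale + displacement. The sketch below shows this in Python. It ignores segmentation, and the function name and example values are made up for illustration. The scale factors 1, 2, 4, and 8 and the 32-bit wraparound are the standard behaviour for 32-bit addressing rather than something this slide states.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """Effective address for a 32-bit base/index/scale/displacement operand."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF  # wrap to 32 bits


# The slide's modes as special cases (register values are illustrative).
print(hex(effective_address(disp=0x2000)))                    # Direct
print(hex(effective_address(base=0x1000)))                    # Base
print(hex(effective_address(base=0x1000, disp=8)))            # Base + Displacement
print(hex(effective_address(index=3, scale=4, disp=0x2000)))  # Scaled Index + Displacement
print(hex(effective_address(base=0x1000, index=3, scale=4, disp=8)))  # Based Scaled Index + Displacement
```

For example, indexing a hypothetical array of 4-byte elements that starts 8 bytes into a structure at 0x1000 uses base = 0x1000, scale = 4, and disp = 8 in a single instruction, which a load-store RISC would typically spread over several.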
Reduced Instruction Set Computer (RISC) ISAs (~1984)
(Here, machine = CPU or ISA.)
Focuses on reducing the number and complexity of instructions of the machine.
Reduced number of cycles needed per instruction.
–Goal: At least one instruction completed per clock cycle.
Designed with CPU instruction pipelining in mind.
Fixed-length instruction encoding.
Only load and store instructions access memory.
Simplified addressing modes.
–Usually limited to immediate, register indirect, register displacement, indexed.
Delayed loads and branches.
Prefetch and speculative execution.
Examples: MIPS, HP PA-RISC, SPARC, Alpha, PowerPC.
Example RISC ISA: PowerPC
8 addressing modes:
  Register direct.
  Immediate.
  Register indirect.
  Register indirect with immediate index (loads and stores).
  Register indirect with register index (loads and stores).
  Absolute (jumps).
  Link register indirect (calls).
  Count register indirect (branches).
Operand sizes: Four operand sizes: 1, 2, 4 or 8 bytes.
Instruction Encoding: Instruction set has 15 different formats with many minor variations. All are 32 bits in length.
Example RISC ISA: HP Precision Architecture (HP PA-RISC)
7 addressing modes:
  Register.
  Immediate.
  Base with displacement.
  Base with scaled index and displacement.
  Predecrement.
  Postincrement.
  PC-relative.
Operand sizes: Five operand sizes ranging in powers of two from 1 to 16 bytes.
Instruction Encoding: Instruction set has 12 different formats. All are 32 bits in length.
Example RISC ISA: SPARC
5 addressing modes:
  Register indirect with immediate displacement.
  Register indirect indexed by another register.
  Register direct.
  Immediate.
  PC relative.
Operand sizes: Four operand sizes: 1, 2, 4 or 8 bytes.
Instruction Encoding: Instruction set has 3 basic instruction formats with 3 minor variations. All are 32 bits in length.
Example RISC ISA: DEC Alpha AXP
4 addressing modes:
  Register direct.
  Immediate.
  Register indirect with displacement.
  PC-relative.
Operand sizes: Four operand sizes: 1, 2, 4 or 8 bytes.
Instruction Encoding: Instruction set has 7 different formats. All are 32 bits in length.
RISC ISA Example: MIPS R3000 (32-bit)
MIPS is the target ISA for CPU design in this course.
Instruction Categories:
  Load/Store.
  Computational.
  Jump and Branch.
  Floating Point (using coprocessor).
  Memory Management.
  Special.
Registers: R0 - R31, PC, HI, LO.
Instruction Encoding: 3 instruction formats, all 32 bits wide:
  OP | rs | rt | rd | sa | funct
  OP | rs | rt | immediate
  OP | jump target
5 Addressing Modes:
  Register direct (arithmetic).
  Immediate (arithmetic).
  Base register + immediate offset (loads and stores).
  PC relative (branches).
  Pseudodirect (jumps).
Operand Sizes: Memory accesses in any multiple between 1 and 4 bytes.
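The three MIPS instruction formats can be illustrated by packing their fields into 32-bit words. The Python sketch below assumes the standard MIPS field widths (6-bit OP, 5-bit rs/rt/rd/sa, 6-bit funct, 16-bit immediate, 26-bit jump target), which this slide shows only as field names. The helper names are made up for this example, and the register numbers and opcodes used are the standard ones for add, lw, and j.

```python
def encode_r(op, rs, rt, rd, sa, funct):
    """Register format: OP(6) | rs(5) | rt(5) | rd(5) | sa(5) | funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct


def encode_i(op, rs, rt, immediate):
    """Immediate format: OP(6) | rs(5) | rt(5) | immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def encode_j(op, target):
    """Jump format: OP(6) | jump target(26)."""
    return (op << 26) | (target & 0x3FFFFFF)


def jump_address(pc, target):
    """Pseudodirect addressing: upper 4 bits of PC + 4, then target, then 00."""
    return ((pc + 4) & 0xF0000000) | ((target & 0x3FFFFFF) << 2)


# add $8, $17, $18   (OP = 0, funct = 0x20)
add_word = encode_r(0, 17, 18, 8, 0, 0x20)
# lw $8, 32($19)     (OP = 0x23, base register + immediate offset)
lw_word = encode_i(0x23, 19, 8, 32)
# j with a 26-bit word-address target (OP = 2)
j_word = encode_j(2, 0x0100000)

print(f"{add_word:#010x} {lw_word:#010x} {j_word:#010x}")
print(f"{jump_address(0x00400000, 0x0100000):#010x}")
```

Because every format is 32 bits wide and the OP, rs, and rt fields sit in the same bit positions, the hardware can read the register fields before it has finished decoding the opcode.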
Evolution of Instruction Set Architectures
Single Accumulator (EDSAC 1949)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from Implementation:
  High-level Language Based (B5000 1963)
  Concept of an ISA Family (IBM 360 1964)
General Purpose Register (GPR) Machines:
  Complex Instruction Sets (CISC) (VAX, Motorola 68000, Intel x86 1977-80)
  Load/Store Architecture (CDC 6600, Cray 1 1963-76)
    Reduced Instruction Set Computer (RISC) (MIPS, SPARC, HP-PA, PowerPC, ... 1984..)