Asanovic/Devadas Spring 2002 6.823 Microprogramming Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology.

Slides:

Advertisements

Similar presentations

Control Unit Implemntation

Advertisements

Chapter 8: Central Processing Unit

Fall 2012SYSC 5704: Elements of Computer Systems 1 MicroArchitecture Murdocca, Chapter 5 (selected parts) How to read Chapter 5.

CS1104: Computer Organisation School of Computing National University of Singapore.

Topics covered: CPU Architecture CSE 243: Introduction to Computer Architecture and Hardware/Software Interface.

OMSE 510: Computing Foundations 4: The CPU!

1 ITCS 3181 Logic and Computer Systems 2014 B. Wilkinson Slides8.ppt Modification date: Nov 3, 2014 Random Logic Approach The approach described so far.

Microprogramming. S 2/e C D A Computer Systems Design and Architecture Second Edition© 2004 Prentice Hall Microprogramming Main Points/Terminology Difference.

CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture MIPS Review & Pipelining I Steve Ko Computer Sciences and Engineering University at Buffalo.

MIPS Overview (with comparisons to x86)

CS-447– Computer Architecture Lecture 12 Multiple Cycle Datapath

COMP3221: Microprocessors and Embedded Systems Lecture 2: Instruction Set Architecture (ISA) Lecturer: Hui Wu Session.

Chapter 16 Control Unit Operation No HW problems on this chapter. It is important to understand this material on the architecture of computer control units,

January 21, 2010CS152 Spring 2010 CS 152 Computer Architecture and Engineering Lecture 2 - Simple Machine Implementations Krste Asanovic Electrical Engineering.

The Processor Data Path & Control Chapter 5 Part 3 - Microprogrammed Control Unit N. Guydosh 3/1/04+

January 24, 2011CS152 Spring 2011 CS 152 Computer Architecture and Engineering Lecture 2 - Simple Machine Implementations Krste Asanovic Electrical Engineering.

CS 161Computer Architecture Chapter 5 Lecture 12

1 COMP541 Sequencing – III (Sequencing a Computer) Montek Singh April 9, 2007.

CS 152 Computer Architecture and Engineering Lecture 2 - Simple Machine Implementations Krste Asanovic Electrical Engineering and Computer Sciences University.

Chapter 16 Control Unit Implemntation. A Basic Computer Model.

Preparation for Midterm Binary Data Storage (integer, char, float pt) and Operations, Logic, Flip Flops, Switch Debouncing, Timing, Synchronous / Asynchronous.

CPEN Digital System Design Chapter 10 – Instruction SET Architecture (ISA) © Logic and Computer Design Fundamentals, 4 rd Ed., Mano Prentice Hall.

Chapter 15 IA 64 Architecture Review Predication Predication Registers Speculation Control Data Software Pipelining Prolog, Kernel, & Epilog phases Automatic.

S. Barua – CPSC 440 CHAPTER 5 THE PROCESSOR: DATAPATH AND CONTROL Goals – Understand how the various.

January 24, 2012CS152 Spring 2012 CS 152 Computer Architecture and Engineering Lecture 2 - Simple Machine Implementations Krste Asanovic Electrical Engineering.

Cisc Complex Instruction Set Computing By Christopher Wong 1.

January 26, 2012CS152, Spring 2012 CS 152 Computer Architecture and Engineering Lecture 3 - From CISC to RISC Krste Asanovic Electrical Engineering and.

Asanovic/Devadas Spring Simple Instruction Pipelining Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology.

Electrical and Computer Engineering

ECE/CS 552: Microprogramming and Exceptions Instructor: Mikko H Lipasti Fall 2010 University of Wisconsin-Madison Lecture notes based on set created by.

Computers organization & Assembly Language Chapter 0 INTRODUCTION TO COMPUTING Basic Concepts.

Introduction to Computer Organization and Architecture Micro Program ภาษาเครื่อง ไมโครโปรแกรม.

Chapter 8 CPU and Memory: Design, Implementation, and Enhancement The Architecture of Computer Hardware and Systems Software: An Information Technology.

Chapter 16 Micro-programmed Control

1 Computer Architecture Part II-B: CPU Instruction Set.

1 Control Unit Operation and Microprogramming Chap 16 & 17 of CO&A Dr. Farag.

EECS 322: Computer Architecture

5-1 Chapter 5—Processor Design—Advanced Topics Computer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan Chapter.

PART 6: (1/2) Enhancing CPU Performance CHAPTER 16: MICROPROGRAMMED CONTROL 1.

Ted Pedersen – CS 3011 – Chapter 10 1 A brief history of computer architectures CISC – complex instruction set computing –Intel x86, VAX –Evolved from.

CS 152 Computer Architecture and Engineering Lecture 3 - From CISC to RISC Krste Asanovic Electrical Engineering and Computer Sciences University of California.

1/29/2013 CS152, Spring 2013 CS 152 Computer Architecture and Engineering Lecture 3 - From CISC to RISC Krste Asanovic Electrical Engineering and Computer.

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 3 CS252 Graduate Computer Architecture Fall 2015 Lecture 3: CISC versus RISC Krste Asanovic

Krste Asanovic Electrical Engineering and Computer Sciences

Asanovic/Devadas Spring Complex Instruction Set Evolution in the Sixties: Stack and GPR Architectures Krste Asanovic Laboratory for Computer.

Lecture 15 Microarchitecture Level: Level 1. Microarchitecture Level The level above digital logic level. Job: to implement the ISA level above it. The.

Overview of Control Hardware Development

Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.

1  1998 Morgan Kaufmann Publishers Value of control signals is dependent upon: –what instruction is being executed –which step is being performed Use.

Basic Elements of Processor ALU Registers Internal data pahs External data paths Control Unit.

1 CORPORATE INSTITUTE OF SCIENCE & TECHNOLOGY, BHOPAL DEPARTMENT OF ELECTRONICS & COMMUNICATIONS MICRO CODED CONTROLLER - PROF. RAKESH K. JHA.

MICROPROGRAMMED CONTROL

1/27/2016 CS152, Spring 2016 CS 152 Computer Architecture and Engineering Lecture 3 - From CISC to RISC Dr. George Michelogiannakis EECS, University of.

Copyright © 2005 – Curt Hill MicroProgramming Programming at a different level.

1  2004 Morgan Kaufmann Publishers No encoding: –1 bit for each datapath operation –faster, requires more memory (logic) –used for Vax 780 — an astonishing.

CDA 3101 Spring 2016 Introduction to Computer Organization Microprogramming and Exceptions 08 March 2016.

Design a MIPS Processor (II)

CS161 – Design and Architecture of Computer Systems

Krste Asanovic Electrical Engineering and Computer Sciences

Computer Organization & Design Microcode for Control Sec. 5

Dr. George Michelogiannakis EECS, University of California at Berkeley

ECE/CS 552: Microprogramming and Exceptions

Processor Organization and Architecture

CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Lecture 2 - Simple Machine Implementations Krste Asanovic Electrical.

Multi-Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.

Computer Architecture

Presentation transcript:

Asanovic/Devadas Spring Microprogramming Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology

Asanovic/Devadas Spring Instruction Set Architecture (ISA) versus Implementation  ISA is the hardware/software interface Defines set of programmer visible state Defines instruction format (bit encoding) and instruction semantics Examples: DLX, x86, IBM 360, JVM  Many possible implementations of one ISA 360 implementations: model 30 (c. 1964), z900 (c. 2001) x86 implementations: 8086 (c. 1978), 80186, 286, 386, 486,Pentium, Pentium Pro, Pentium-4 (c. 2000), AMD Athlon, Transmeta Crusoe, SoftPC DLX implementations: microcoded, pipelined, superscalar JVM: HotSpot, PicoJava, ARM Jazelle,...

Asanovic/Devadas Spring ISA to Microarchitecture Mapping  ISA often designed for particular microarchitectural style, e.g., CISC ISAs designed for microcoded implementation RISC ISAs designed for hardwired pipelined implementation VLIW ISAs designed for fixed latency in-order pipelines JVM ISA designed for software interpreter  But ISA can be implemented in any microarchitectural style Pentium-4: hardwired pipelined CISC (x86) machine (with somemicrocode support) This lecture: a microcoded RISC (DLX) machine Intel will probably eventually have a dynamically scheduled out- of-order VLIW (IA-64) processor PicoJava: A hardware JVM processor

Asanovic/Devadas Spring Microcoded Microarchitecture Microcode instructions fixed in ROM inside microcontroller Memory (RAM) holds user program written using macrocode instructions (e.g., DLX, x86, etc.) busy? zero? opcode μcontroller Datapath Memory enMem MemWrt Data Addr

Asanovic/Devadas Spring A Bus-based Datapath for DLX Opcode RegSel busy zero IR ExtSel enImm Imm Ext ALU control enALU ALU Bus 32 GPRs + PC bit Reg Memory addr data ldIR OpSel ldA ldB ldMA MA RegWrt enReg MemWrt enMem Microinstruction: register to register transfer (17 control signals) MA ← PC means RegSel = PC; enReg=yes; ldMA= yes B ← Reg[rf2] means RegSel = rf2; enReg=yes; ldB = yes

Asanovic/Devadas Spring Instruction Execution Execution of a DLX instruction involves 1.1instruction fetch 2.decode and register fetch 3.ALU operation 4.memory operation (optional) 5.write back to register file (optional) and the computation of the address of the next instruction

Asanovic/Devadas Spring Microprogram Fragments instr fetch: MA ← PC IR ← Memory A ← PC PC ← A + 4 dispatch on OPcode ALU: ALUi: A ← Reg[rf1] B ← Reg[rf2] Reg[rf3] ← func(A,B) do instruction fetch A ← Reg[rf1] B ← Imm sign extention... Reg[rf2] ← Opcode(A,B) do instruction fetch can be treated as a macro

Asanovic/Devadas Spring Microprogram Fragments (cont.) LW: J: beqz: bz-taken: A ← Reg[rf1] B ← Imm MA ← A + B Reg[rf2] ← Memory do instruction fetch A ← PC B ← Imm PC ← A + B do instruction fetch A ← Reg[rf1] If zero?(A) then go to bz-taken do instruction fetch A ← PC B ← Imm PC ← A + B do instruction fetch

Asanovic/Devadas Spring DLX Microcontroller: first attempt Opcode zero? Busy (memory) latching the inputs may cause a one-cycle delay μPC (state) μProgram ROM 2 (opcode+status+s) words word = (control+s) bits ROM Size ? How big is “s”? 17 Control Signals next state addr data

Asanovic/Devadas Spring Microprogram in the ROM worksheet State Op zero? busy Control points next-state

Asanovic/Devadas Spring Microprogram in the ROM State Op zero? busy Control points next-state

Asanovic/Devadas Spring Microprogram in the ROM Cont. State Op zero? busy Control points next-state

Asanovic/Devadas Spring Size of Control Store status & opcode Control ROM μPC next μPC Control signals size = 2 (w+s) x (c + s) DLX w = 6+2 c = 17 s = ? no. of steps per opcode = 4 to 6 + fetch-sequence no. of states ≈ (4 steps per op-group ) x op-groups + common sequences = 4 x states = 42 states ⇒ s = 6 Control ROM = 2 (8+6) x 23 bits ≈ 48 Kbytes

Asanovic/Devadas Spring Reducing Size of Control Store Control store has to be fast ⇒ expensive  Reduce the ROM height (= address bits) ⇒ reduce inputs by extra external logic each input bit doubles the size of the control store ⇒ reduce states by grouping opcodes find common sequences of actions ⇒ condense input status bits combine all exceptions into one, i.e., exception/no-exception  Reduce the ROM width ⇒ restrict the next-state encoding Next, Dispatch on opcode, Wait for memory,... ⇒ encode control signals (vertical microcode)

Asanovic/Devadas Spring DLX Controller V2 Opcode ext absolute Reduced ROM height by encoding inputs μPC (state) jump logic Control ROM 17 Control Signals μJumpType (next, spin, fetch, dispatch, feqz, fnez ) Reduce ROM width by encoding next-state μPC μPC+1 μPCSrc zero buzy

Asanovic/Devadas Spring Jump Logic μPCSrc = Case μJumpTypes next ⇒ μPC+1 spin ⇒ if (busy) then μPC else μPC+1 fetch ⇒ absolute dispatch ⇒ op-group feqz ⇒ if (zero) then absolute else μPC+1 fnez ⇒ if (zero) then μPC+1 else absolute

Asanovic/Devadas Spring Instruction Fetch & ALU: DLX-Controller-2 State Control points next-state fetch 0 MA ← PC next fetch 1 IR ← Memory spin fetch 2 A ← PC next fetch 3 PC ← A + 4 dispatch... ALU 0 A ← Reg[rf1] next ALU 1 B ← Reg[rf2] next ALU 2 Reg[rf3]← func(A,B) fetch ALUi 0 A ← Reg[rf1] next ALUi 1 B ← sExt16(Imm) next ALUi 2 Reg[rf3]← Op(A,B) fetch

Asanovic/Devadas Spring Load & Store: DLX-Controller-2 State Control points next-state LW 0 A ← Reg[rf1] next LW 1 B ← sExt16(Imm) next LW 2 MA ← A+B next LW 3 Reg[rf2] ← Memory spin LW 4 fetch SW 0 A ← Reg[rf1] next SW 1 B ← sExt 16 (Imm) next SW 2 MA ← A+B next SW 3 Memory ← Reg[rf2] spin SW 4 fetch

Asanovic/Devadas Spring Branches: DLX-Controller-2 State Control points next-state BEQZ 0 A ← Reg[rf1] next BEQZ 1 fnez BEQZ 2 A ← PC next BEQZ 3 B ← sExt16(Imm) next BEQZ 4 PC ← A+B fetch BNEZ 0 A ← Reg[rf1] next BNEZ 1 feqz BNEZ 2 A ← PC next BNEZ 3 B ← sExt16(Imm) next BNEZ 4 PC ← A+B fetch

Asanovic/Devadas Spring Jumps: DLX-Controller-2 State Control points next-state J 0 A ← PC next J 1 B ← sExt26(Imm) next J 2 PC ← A+B fetch JR 0 PC ←Reg[rf1] fetch JAL 0 A ← PC next JAL 1 Reg[31] ← A next JAL 2 B ← sExt26(Imm) next JAL 3 PC ← A+B fetch JALR 0 A ← PC next JALR 1 Reg[31] ← A next JALR 2 PC ←Reg[rf1] fetch

Asanovic/Devadas Spring Microprogramming in IBM 360  Only fastest models (75 and 95) were hardwired M30M40M50M65 Datapath width (bits) μinst width (bits) μcode size (K μinsts) μstore technology CCROSTCROSBCROS μstore cycle (ns) memore cycle (ns) Rental fee ($K/month)

Asanovic/Devadas Spring Horizontal vs Vertical μCode  Horizontal μcode has longer μinstructions Can specify multiple parallel operations per μinstruction Needs fewer steps to complete each macroinstruction Sparser encoding ⇒ more bits  Vertical μcode has more, narrower μinstructions In limit, only single datapath operation per μinstruction μcode branches require separate μinstruction More steps to complete each macroinstruction More compact ⇒ less bits  Nanocoding Tries to combine best of horizontal and vertical μcode Bits per μInstruction μInstructions

Asanovic/Devadas Spring Nanocoding  MC68000 had 17-bit μcode containing either 10-bit μjump or 9-bit nanoinstruction pointer Nanoinstructions were 68 bits wide, decoded to give 196 control signals Exploits recurring control signal patterns in μcode, e.g., ALU0 A ← Reg[rf1]... ALUi0 A ← Reg[rf1]... μPC (state) μcode ROM nanoaddress nanoinstruction ROM μcode next-state μaddress data 17 Control Signals

Asanovic/Devadas Spring Implementing Complex Instructions Opcode RegSel busy zero ExtSel enImm Imm Ext enALU ALU Bus 32 GPRs + PC bit Reg Memory addr data ldIR OpSel ldA ldB ldMA MA RegWrt enReg MemWrt enMem rf3 ← M[(rf1)] op (rf2) Reg- Memory-src ALU op M[(rf3)] ← M[(rf1)] op M[(rf2)] Reg- Memory-dst ALU op M[(rf3)] ← (rf1) op (rf2) Mem- Mem ALU op

Asanovic/Devadas Spring Mem-Mem ALU Instructions: DLX-Controller-2 Mem-Mem ALU op M[(rf3)] ← M[(rf1)] op M[(rf2)] ALUMM0 MA ← Reg[rf1] next ALUMM1 A ← Memory spin ALUMM2 MA ← Reg[rf2] next ALUMM3 B ← Memory spin ALUMM4 MA ← Reg[rf3] next ALUMM5 Memory ← func(A,B) spin ALUMM6 fetch Complex instructions usually do not require datapath modifications in a microprogrammed implementation --only extra space for the control program Implementing these instructions using a hardwired controller is difficult without datapath modifications

Asanovic/Devadas Spring Microcode Emulation  IBM initially miscalculated importance of software compatibility when introducing 360 series  Honeywell started effort to steal IBM 1401 customers by offering translation software (“Liberator”) for Honeywell H200 series machine  IBM retaliates with optional additional microcode for 360 series that can emulate IBM 1401 ISA, later extended for IBM 7000 series one popular program on 1401 was a 650 simulator, so somecustomers ran many 650 programs on emulated 1401s (650->1401->360)

Asanovic/Devadas Spring Microprogramming in the Seventies Thrived because:  Significantly faster ROMs than DRAMs were available  For complex instruction sets, datapath and controller were cheaper and simpler  New instructions, e.g., floating point, could be supported without datapath modifications  Fixing bugs in the controller was easier  ISA compatibility across various models could be achieved easily and cheaply Except for cheapest and fastest machines, all computers were microprogrammed

Asanovic/Devadas Spring Writable Control Store (WCS) Implement control store with SRAM not ROM MOS SRAM memories now almost as fast as control store (core memories/DRAMs were 10x slower) Bug-free microprograms difficult to write User-WCS provided as option on several minicomputers Allowed users to change microcode for each process User-WCS failed Little or no programming tools support Hard to fit software into small space Microcode control tailored to original ISA, less useful for others Large WCS part of processor state -expensive context switches Protection difficult if user can change microcode Virtual memory required restartable microcode

Asanovic/Devadas Spring Performance Issues Microprogrammed control ⇒ multiple cycles per instruction Cycle time ? t C > max(t reg-reg, t ALU, t μROM, t RAM ) Given complex control, t ALU & t RAM can be broken into multiple cycles. However, t μROM cannot be broken down. Hence t C > max(t reg-reg, t μROM ) Suppose 10 * t μROM < t RAM good performance, relative to the single-cycle hardwired implementation, can be achieved even with a CPI of 10

Asanovic/Devadas Spring VLSI & Microprogramming By late seventies technology assumption about ROM & RAM speed became invalid micromachines became more complicated to overcome slower ROM, micromachines were pipelined complex instruction sets led to the need for subroutine and call stacks in μcode. need for fixing bugs in control programs was in conflict with read-only nature of μROM ⇒ WCS (B1700, QMachine, Intel432, …) introduction of caches and buffers, especially for instructions, made multiple-cycle execution of reg-reg instructions unattractive

Asanovic/Devadas Spring Modern Usage Microprogramming is far from extinct Played a crucial role in micros of the Eighties, Motorola 68K series Intel 386 and 486 Microcode is present in most modern CISC micros in an assisting role (e.g. AMD Athlon, Intel Pentium-4) Most instructions are executed directly, i.e., with hard-wired control Infrequently-used and/or complicated instructions invoke the microcode engine Patchable microcode common for post-fabrication bug fixes, e.g. Intel Pentiums load μcode patches at bootup