Chapter 1 Microcomputers and Microprocessors


Chapter 1 Microcomputers and Microprocessors Microprocessor Evolution and Performance

Contents Introduction to microcomputer system Microprocessor evolution the INTEL processor family Microprocessor performance

Introduction to Microcomputer: A microcomputer can be interpreted as a machine with I/O devices for input/output, a microprocessor for processing, memory units for storage, and buses connecting these components. In the 1970s, a microcomputer was normally interpreted as a computer considerably smaller than a minicomputer, possibly using ROM for program storage.

Basic hardware units: Input (e.g. keyboard, mouse); Microprocessor (e.g. 8085, 8086, MC68000); Memory (e.g. RAM, hard disk); Output (e.g. monitor, printer)

Buses Buses: External connections to input/output unit Major Buses: Address bus: address of memory locations containing instructions or data Data bus: contents of memory locations Control Bus: synchronization and handshaking between components

General Architecture Memory Unit Primary memory Secondary memory Microprocessing unit Input unit Output unit

Processor History Vacuum Tubes to IC’s

First Generation Computers: Vacuum tube technology; large, air-conditioned rooms; tube lifetime: 3,000 hours. Useless machine? 1951: first UNIVAC I (UNIVersal Automatic Computer) delivered. 1952: prediction of the presidential election by CBS. 1952: IBM Model 701 Data Processing System.

Second Generation Computers The Transistor Is Born (Solid-State Era) 1948: invention of bipolar transistors 1956: Nobel physics award: Drs. William Shockley, John Bardeen and Walter H. Brattain (Bell Labs) 1954: Bell Labs: all-transistorized computer (TRADIC) 800 transistors Much less heat More reliable and less costly

Second Generation Computers Mainframe Computers 1958: IBM’s 1st transistorized computer 7070/7090 1959: 1401 (business-oriented model) Built on circuit boards mounted into rack panels, or frames Main frame (mainframe): the CPU portion of the computer Popular with business and industry

Third Generation Computers: Invention of the IC, 1958-1959: Jack Kilby (TI) and Dr. Robert Noyce (Fairchild). Kilby: fabricating resistors, capacitors and transistors on a germanium wafer, and connecting these parts with fine gold wires. Noyce: isolating individual components with reverse-biased diodes, and depositing an adherent metal film over the circuit, thus connecting the components. 1st IC: 2-transistor multivibrator. By the mid 1960s: memory chips with 1,000 components are common.

Third Generation Computers 1964: IBM 360 Series (32-bit) The first to use IC technology A family of 6 compatible computers 40 different I/O and auxiliary storage devices Memory capacity: 16K words to over 1MB. 32-bit registers x 16 24-bit address bus 128-bit data bus

Third Generation Computers 1964: IBM 360 Series (32-bit) 375,000 computations per second (<< 150 mips Pentium 100) $5 billion development cost IBM became the leading mainframe company

Minicomputer. 1960s: Space Race between US & USSR; IC industry boom; a tremendous demand by scientists and engineers for an inexpensive computer that they could operate by themselves. 1965: DEC PDP-8 (by Edson de Castro’s group): low-cost (about $18,500) 12-bit minicomputer. Later: 16-bit PDP-11, superminis …

Microprocessors: CPU on a Chip 1968: INTEL (Integrated Electronics) Founded by Robert Noyce and Gordon Moore (Fairchild) Original goals: semiconductor memory market 1969: customized IC’s for Busicom for calculator Ted Hoff and Stan Mazor: proposed 4-bit CPU on a single chip, plus ROM, RAM chips

Microprocessors: CPU on a Chip. 1971: 4000 family, by Federico Faggin. 4001: 2K-bit ROM with 4-bit I/O port; 4002: 320-bit RAM, 4-bit output port; 4003: 10-bit serial-in parallel-out shift register; 4004: 4-bit processor. Processor-on-a-chip: the microprocessor era.

Microprocessors: CPU on a Chip 1972: 8008, 8-bit 1974: 8080, an improved version

Microprocessors: CPU on a Chip. 8-bit CPUs, 16-bit address (64K). MC6800: Motorola. 6502: MOS Technology (designed by former Motorola engineers); Apple II, Apple DOS. Z-80: Zilog (founded by former Intel engineers); Z-80 cards on Apple II, CP/M.

Microprocessors: CPU on a Chip 16-bit CPUs (Late 1970s) 8086, 80186, 80286: Intel PC, PC-DOS, MS-DOS, SCO-Unix MC68000: Motorola 16-bit instructions Hardware multiply and divide 20-bit address buses (1MB) Workstations: Sun3

Microprocessors: CPU on a Chip 32-bit CPUs 80386, 80486: Intel MC68020, 68030: Motorola 64-bit CPUs Pentium, Pentium Pro (64-bit external data bus, 32-bit internal registers, not recognized as 64-bit CPUs in terms of internal register word length)

Microcomputers: Computers Based on Microprocessors 1975: MITS Altair 8800 (Kit) $399, i8080, programmed by depositing 1s/0s via front panel switches Other Computers boom 8080: MITS, … 6800: SWTPC 6800, … Z-80: TRS-80, … 6502: Apple I, 8K, programmed with BASIC Steve Jobs & Steve Wozniak, millionaires from PC COM’s …

Personal Computers: the Open Architecture Era. 1981: IBM PC: a system board (motherboard); Intel 8088 processor; 16K memory; 5 expansion slots; third-party vendors supply various I/O adapter cards. Open architecture: a computer with interchangeable components.

Micro-controllers: Microcomputers on a Chip. Microcontroller: a computer on a chip: microprocessor, plus on-chip memory, plus input/output ports. 1995: microcontrollers outsold microprocessors 10:1. Embedded in various equipment: thermostats, machine tools, communication, automotive, … Evolution: getting greater I/O capabilities. Intel: MCS-51, MCS-96, …

High-Performance Processors Supercomputers Aircraft design, global climate modeling, oil-bearing formation, molecular design of new drugs, financial behavior CDC6600, 7600: Seymour Cray Cray-1: 1976, the first true supercomputer ECL, 128 KW power consumption 130 MFLOPS (Pentium 100: 150 MFLOPS) $5.1 million

High-Performance Processors Parallel Processors Tens of gigaflops Multi-processors wired by a common bus Each is given a portion of the problem to solve Hypercube: early 1980s Cosmic Cube, iPSC (with i860/RISC chips) 2D rectangular Mesh architecture: multiple processor at each node Intel: teraflops computer with 4500 nodes, each powered by 2 Pentium Pro 200.

RISC vs. CISC RISC: Reduced Instruction Set Computer (1980s) A small number of fixed-length instructions Simple addressing modes A large number of registers Instructions executed in one clock cycle Intel i860 (“Cray on a Chip”) 82 instructions, 32-bit long each Four addressing modes 32 general-purpose registers

RISC vs. CISC. CISC: Complex Instruction Set Computer: a large number of variable-length instructions; multiple addressing modes; a small number of registers; multiple clock cycles to execute. Intel 8086: over 3000 instruction forms, 1-6 bytes; 9 addressing modes; 8 general-purpose registers; execution from 2 to 80+ cycles.

RISC vs. CISC RISC Control unit is much simpler (simpler instructions, execution in 1 CLK) Faster execution with less total on-chip logic Chip area: 10% (vs 50% for CISC) More area for register file, data and instruction caches, FPU, and co-processor PowerPC: 32-bit, by IBM, Apple, Motorola Sparc: for SunMicro workstations

Application-Specific Processors: DSP Chips. Mostly for analog signal processing; ADC-DSP-DAC architecture; avoids processing analog signals with discrete circuits involving capacitors and inductors. DSP: performs complex mathematical functions: digital filters, spectrum analysis.

Application-Specific Processors DSP Chip Architecture Different data/program areas: Harvard Architecture Hardware multipliers and adders, optimized to execute on a single cycle Arithmetic pipelining: several instructions operated at once Hardware loop control Multiple IO ports for communication with other processors

Summary of Processor History 1940s: Vacuum tube, large and consuming large power 1950s: Transistor (1948-) 1959: First IC (second industrial revolution) 1960s: IC was popular to build CPU’s. 1971: Intel 4004 microprocessor (2300 transistors) Starts of the microprocessor age Late 1970’s: 8080/85

Summary of Processor History. 1980: RISC (reduced instruction set computer). CISC (complex instruction set computer) vs. RISC. CISC family: Intel 80x86, Pentium; Motorola 68000 series. All others are RISC series.

Evolution of INTEL Processors: 4004 (’71) to Pentium Pro (’95-)

INTEL (Integrated Electronics) Evolution. 1968: founded by Robert Noyce and Gordon Moore. IA: Intel Architecture (e.g., IA-16, IA-32, IA-64), which has become the de facto standard since the 8008 (’72). Evolution of: internal register sizes; external bus widths; Real, Protected, and Virtual 8086 modes.

4-bit Processors. 4004: first microprocessor, became available in 1971. 4-bit microprocessor: 4-bit registers & 4-bit data bus. #transistors: 2300. Min. feature size: 10 microns. Address bus: 10 bits/1K. 0.06 MIPS (@ 0.108 MHz). No internal cache.

8-bit Processors: 8008 (1972), 8080 (1974), and 8085 (1976); 8-bit microprocessors

8086: IA standard Became available in 1978 16-bit data bus 20-bit address bus (was 16-bit for 8080) memory organization: 16 segments of 64KB (1 MB limit) Re-organize CPU into BIU (bus interface unit) and EU (execution unit) Allow fetch and execution simultaneously Internal register expanded to 16-bit Allow access of low/high byte separately

8086: Hardware multiply and divide instructions. External math co-processor. Source-level (assembly-language) compatibility with the 8080/8085 rather than binary compatibility. 8086: defined the 80x86 architecture.

8086 Not quite successful 16-bit data bus: Requires two separate 8-bit memory banks Memory chips were expensive

8088: PC standard. Became available in 1979, almost identical to the 8086. 8-bit external data bus: for hardware compatibility with the 8080. 16-bit internal registers and internal data bus (same as 8086). 20-bit address bus (was 16-bit for 8080). BIU redesigned. Memory organization: 16 segments of 64KB (1 MB limit). Two memory accesses for 16-bit data (less efficient), but lower cost. 8088: used by the IBM PC (1981), 16K-64K, 4.77MHz.

80186, 80188: High Integration CPU. PC system: 8088 CPU + various supporting chips: clock generator; 8251: serial I/O (RS232); 8253: timer/counter; 8255: PPI (programmable peripheral interface); 8257: DMA controller; 8259: interrupt controller. 80186/80188: 8086/8088 + supporting functions. Compatible instruction set (+ 9 new instructions).

80286 Became available in 1982 used in IBM AT computer (1984) 16-bit data bus clock speed 25% faster than 8088, throughput 5 times greater than 8088 24-bit address bus (16 MB) (vs. 20-bit/1M 8086)
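The address-bus figures quoted on these slides (16-bit/64K for the 8080, 20-bit/1M for the 8086, 24-bit/16 MB for the 80286, 32-bit/4 GB for the 80386DX) all follow from 2^n byte addresses for an n-bit address bus. A minimal sketch of that arithmetic; the function names `addressable_bytes` and `human` are illustrative, not from the slides:

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with an n-bit address bus."""
    return 2 ** address_bits


def human(n: int) -> str:
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    for unit, size in (("GB", 2 ** 30), ("MB", 2 ** 20), ("KB", 2 ** 10)):
        if n >= size:
            return f"{n // size} {unit}"
    return f"{n} B"


# Bus widths as quoted in the transcript.
for name, bits in (("8080", 16), ("8086/8088", 20), ("80286", 24), ("80386DX", 32)):
    print(f"{name}: {bits}-bit address bus -> {human(addressable_bytes(bits))}")
```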

80286: Real vs. Protected Modes. Larger address space: 24-bit address bus. Real Mode vs. Protected Mode. Real Mode: power-on default mode; functions like an 8086: uses the 20 least significant address lines (1M); software compatible with the 8086/8088. 16 new instructions (for Protected Mode management). Faster 286: redesigned processor, plus higher clock rate (6-8MHz).

80286: Real vs. Protected Modes Multi-program environment Each program has a predetermined amount of memory Addressed via segment selector (physical addresses invisible): 16M addressable Multiple programs loaded at once (within their respective segments), protected from read/write by each other

80286: Real vs. Protected Modes. Once in Protected Mode, the processor cannot be switched back to Real Mode (short of a reset), to prevent illegal access by switching back and forth between modes. A faster 8086 only? MS-DOS requires that all programs be run in Real Mode.

Clock Speed: Electrical signals cannot change instantaneously (a transition period is required); the system clock provides the timing signal for synchronization. Clock speed alone cannot be used to compare the performance of microprocessors with different designs or instruction sets; e.g., a 66 MHz Pentium is about twice as fast as a 66 MHz 80486.

80386DX (aka. 80386) available in 1985, a major redesign of 86/286 Compatibility commitment through 2000 32-bit data and address buses (4 GB memory) Real Address Mode: 1M visible, 286 real mode Protected Virtual Address Mode: On board MMU Segmented tasks of 1byte to 4G bytes Segment base, limit, attributes defined by a descriptor register Page swapping: 4K pages, up to 64TB virtual memory space Windows, OS/2, Unix/Linux

80386DX (aka. 80386) Virtual 8086 mode (a special Protected mode feature): permitted multiple 8086 virtual machines-multitasking (similar to real mode) Windows (multiple MSDOS’s) Clock rate: max. 40MHz, 2 pulses per R/W bus cycle External memory cache to avoid wait Fast SRAM 93% hit rate with 64K cache Compatible instructions (14 new)

80386SX 80386SX: (for transition to 32-bit) 16-bit data bus/32-bit register 24-bit address bus

80486DX 1989: a polished 386, 6 new OS level instructions virtually identical to 386 in terms of compatibility RISC design concepts fewer clock cycles per operation, a single clock cycle for most frequently used instructions Max 50MHz 5 stage execution pipeline Portions of 5 instructions execute at once

80486DX Highly Integrated: On board 8K memory cache FPP (equivalent to external 80387 co-processor) Twice as fast as 386 at any given clock rate 20Mhz 486 ~= 40Mhz 386

80486SX: NOT a 16-bit version for transition purposes. No coprocessor; the 8K internal cache is retained. For low-end applications. Max. 33 MHz only.

80486DX2/DX4: Overdrive Chips Processor speed increased too fast Redesign of microcomputer for compatibility becomes harder Solution: Separating internal speed with external speed, improve performance independently 80486DX2/DX4 – internal clock twice/three times (NOT four times) the external clock: runs faster internally

80486DX2/DX4: Overdrive Chips System board design is independent of processor upgrade (less expensive components are allowed) Processor operate at maximum speed data rate internally Only slow access to external data operates at system board rate Internal cache offset the speed gap 486DX2 66: 66 internal, 33 external 486DX4 100: 100 internal, 33 external (3x) Overdrive sockets: for upgrading 486dx/sx to 486dx2/dx4 (with overdrive socket pin-outs)

Pentium: Superscalar Processor. Available in 1993; 32-bit architecture; superscalar architecture. Scaling: scaling down the etchable feature size to increase the complexity of an IC (e.g., DRAM), from 10 microns (4004) to 0.13 microns (2001). Superscalar: goes beyond simply scaling down: two instruction pipelines, each with its own ALU, address generation circuitry, and data cache interface; executes two different instructions simultaneously.

Pentium: Superscalar Processor. Onboard cache: separate 8K data and code caches to avoid access conflicts. FPP: instruction pipeline: 8 stage; optimized floating point functions; 5x-10x the FLOPS of the 486. 2x the performance of the 486 at any clock rate.

Pentium: Superscalar Processor. Compatibility with 386/486: internal 32-bit registers and address bus. Data bus expanded to 64 bits for a higher data transfer rate. Compare the 8088 to 386SX transition.

Pentium: Superscalar Processor. Non-clone competition from AMD, Cyrix; development of brand identity by Intel.

Pentium Pro: Two Chips in One. Became available in 1995. Superscalar of degree 3: can execute 3 instructions simultaneously. Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp). Two separate silicon dies in the same package: processor: 0.35 µm, 5.5 million transistors; 256KB (or 512KB) Level 2 cache die, 15.5 million transistors in a smaller area.

Pentium Pro: Two Chips in One On Board Level 2 cache Simplifies system board design Requires less space Gains faster communication with processor Internal (level 1) cache: 8K Pentium Pro 133 ~= 2x Pentium 66 ~= 4x 486DX2 66

Pentium Pro: Dynamic Execution. Dynamic execution: reduces idle processor time by predicting instruction behavior. Multiple branch prediction: looks as far as 30 instructions ahead to anticipate program branches. Data flow analysis: looks at upcoming instructions and determines whether they are available for processing, depending on other instructions; determines optimal execution sequences. Speculative execution: executes instructions in a different order from the one in which they were entered; speculative results are stored until final states can be determined.

Processor Future: What’s More from Moore’s Law?

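Moore's Law: In 1965, Gordon Moore observed that the number of components per integrated circuit had been doubling roughly every year, and forecast that this trend would continue through 1975. In 1975 he revised the rate to a doubling about every two years; the often-quoted "every 18 months" figure is a later popularization.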
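As a rough check of the doubling rate, the two transistor counts quoted elsewhere in this transcript (2300 for the 4004 in 1971, 5.5 million for the Pentium Pro processor die in 1995) can be plugged into an exponential growth model. This is only a sketch; the function names are illustrative, and the pure-exponential model N(t) = N0 * 2^((t - t0)/T) is an assumption:

```python
import math


def doubling_period_years(n0: float, year0: int, n1: float, year1: int) -> float:
    """Doubling period T implied by two (year, count) points, assuming N(t) = N0 * 2**((t - t0) / T)."""
    doublings = math.log2(n1 / n0)
    return (year1 - year0) / doublings


def projected_count(n0: float, year0: int, year: int, period_years: float) -> float:
    """Transistor count projected to `year` under the same exponential model."""
    return n0 * 2 ** ((year - year0) / period_years)


# Figures from the slides: 4004 (1971, 2300 transistors), Pentium Pro die (1995, 5.5 million).
period = doubling_period_years(2300, 1971, 5_500_000, 1995)
print(f"implied doubling period: {period:.2f} years")
```

With these two data points the implied period comes out at a little over two years.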

Moore’s Law

Other Microprocessors. Motorola family: from the 6809 through the 68040 (the Apple II used the MOS Technology 6502). PowerPC: joint venture between Apple, IBM, and Motorola. RISC processors: DEC Alpha, MIPS, Sun SPARC, etc.

CISC vs. RISC CISC (Complex Instruction Set Computer) CISC processors have a large versatile instruction set that supports many complex addressing modes move complexity from software to hardware RISC (Reduced Instruction Set Computer) RISC processors have a small instruction set move complexity from hardware to software

Microprocessor Performance. Two main factors: Response time: the time between the start and completion of a task, also referred to as execution time. Throughput: the total amount of work done in a given time.

MIPS: Million Instructions Per Second. MIPS = (instruction count) / (execution time in seconds × 10^6). It specifies performance inversely to execution time; faster machines have a higher MIPS rating.
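The MIPS definition can be written as a one-line function: instruction count divided by execution time in seconds times 10^6. A small sketch; the function name `mips` is illustrative, and the sample figure is the 375,000 computations per second quoted earlier for the IBM 360, treated here as instructions per second (an assumption):

```python
def mips(instruction_count: int, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time in seconds * 10**6)."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return instruction_count / (execution_time_s * 1e6)


# 375,000 operations in one second is 0.375 MIPS.
print(mips(375_000, 1.0))
# The same program finishing in half the time doubles the rating.
print(mips(375_000, 0.5))
```

Because the rating depends on the instruction count of a particular program, two runs on the same machine can produce different MIPS values.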

Some Problems of MIPS Cannot compare computers with different instruction sets, since the instruction count will certainly differ MIPS varies between programs on the same computer

iCOMP An index provided by Intel for comparison of performance of their 32-bit microprocessors Based on a variety of performance components that represent integer mathematics, graphics, etc. Combine results of a set of software application benchmarks

Chapter 2 Computer Codes, Programming, and Operating Systems Number Systems Computer Codes Programming Operating Systems

Number Systems Decimal: Base 10 Binary: Base 2 Octal: Base 8 Hexadecimal: Base 16

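Base Conversion: 2 ↔ 10. Binary to decimal: $D = \sum_{i=0}^{n-1} b_i \times 2^i$. Decimal to binary, repeated subtraction: with $m$ the largest exponent such that $2^m \le D$, set $b_m = 1$ and $D' = D - 2^m = \sum_{i=0}^{m-1} b_i \times 2^i$; then $D \leftarrow D'$, $m \leftarrow m'$ ($m'$: the largest exponent such that $b_{m'} = 1$), and repeat. Decimal to binary, long division: $D' = \lfloor D/2 \rfloor$ with remainder $b_i$; then $D \leftarrow D'$, and repeat until $D = 0$.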
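The conversions on this slide can be sketched in a few lines: binary to decimal as a weighted sum of bits, and decimal to binary either by repeatedly subtracting the largest power of two or by repeated division by 2 while collecting remainders. The function names are illustrative and only non-negative integers are handled:

```python
def binary_to_decimal(bits: str) -> int:
    """D = sum of b_i * 2**i, where b_0 is the rightmost bit."""
    total = 0
    for i, ch in enumerate(reversed(bits)):
        if ch not in "01":
            raise ValueError(f"not a binary digit: {ch!r}")
        total += int(ch) * 2 ** i
    return total


def decimal_to_binary_subtraction(d: int) -> str:
    """Repeated subtraction: test each power of two from the largest that fits down to 2**0."""
    if d < 0:
        raise ValueError("only non-negative integers are supported")
    if d == 0:
        return "0"
    m = d.bit_length() - 1  # largest m such that 2**m <= d
    out = []
    for i in range(m, -1, -1):
        if d >= 2 ** i:
            out.append("1")
            d -= 2 ** i
        else:
            out.append("0")
    return "".join(out)


def decimal_to_binary_division(d: int) -> str:
    """Long division: divide by 2 repeatedly; the remainders are the bits, LSB first."""
    if d < 0:
        raise ValueError("only non-negative integers are supported")
    if d == 0:
        return "0"
    digits = []
    while d > 0:
        d, remainder = divmod(d, 2)
        digits.append(str(remainder))
    return "".join(reversed(digits))


print(binary_to_decimal("1101"), decimal_to_binary_subtraction(13), decimal_to_binary_division(13))
```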

MCS-51 Program Development (flow diagram): Program → Editor → .ASM → Assembler (X8051) → .OBJ → Linker (Link) → .HEX → Target / ICE; Symbol Converter (CVTSYM) with .SYM and .SDT symbol files.

Chapter 3 80x86 Processor Architecture 8086/88 Segmented Memory 80386 80486 Pentium Pentium Pro

The 8086 and 8088: Processor Model, Programming Model


8086 Processor Model: BIU+EU. BIU: memory & I/O address generation. EU: receives code and data from the BIU; not connected to the system buses; executes instructions; saves results in registers, or passes them to the BIU for memory and I/O.

8086 Processor Model (diagram). EU: AH AL, BH BL, CH CL, DH DL, BP, DI, SI, SP, ALU, Flags. BIU: CS, ES, SS, DS, IP, instruction queue, address generation and bus control.

Fetch and Execution Cycle BIU+EU allows the fetch and execution cycle to overlap 0. System boot, Instruction Queue is empty 1. IP =>BIU=> address bus && IP++ 2. Mem[(IP-1)] => Instruction Queue[tail++] 3a. InstrQ[head] => EU => execution 3b. Mem[IP++] => InstrQ[tail++] Maybe multiple instructions Repeat 3a+3b (overlapped)
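The overlap in steps 3a and 3b can be illustrated with a toy simulation. This is a deliberately simplified sketch, not a model of real 8086 timing: it assumes every instruction is one byte, that a fetch and an execution each take exactly one tick, and that the queue holds 6 bytes (the 8086 figure quoted in this transcript). All names are illustrative:

```python
from collections import deque

QUEUE_SIZE = 6  # 8086 instruction queue length in bytes, as quoted in the transcript


def run(program, queue_size=QUEUE_SIZE):
    """Simulate BIU prefetch overlapped with EU execution; return (executed, ticks).

    Assumptions (hypothetical simplifications): one-byte instructions,
    one tick per fetch, one tick per execution, no jumps.
    """
    queue = deque()
    ip = 0
    executed = []
    ticks = 0
    while len(executed) < len(program):
        ticks += 1
        # EU side (step 3a): execute the instruction at the head of the queue, if any.
        if queue:
            executed.append(queue.popleft())
        # BIU side (steps 1-2 / 3b): in the same tick, fetch the next byte if there is room.
        if ip < len(program) and len(queue) < queue_size:
            queue.append(program[ip])
            ip += 1
    return executed, ticks


program = ["MOV", "ADD", "INC", "DEC", "NOP"]
executed, ticks = run(program)
print(executed, ticks)
```

Under these assumptions, n instructions finish in n + 1 ticks (one initial fetch, then fetch and execute overlapped), compared with 2n ticks if each fetch had to complete before its execution began.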

Waiting Conditions: Memory Access BIU+EU: execute (almost) continuously without waiting Waiting Conditions: Accessing memory locations not in queue BIU suspend instruction fetch Issues external memory address Resumes instruction fetch and execution

Waiting Conditions: Jump Next Jump Instruction Instructions in queue are discarded EU wait for the next instruction after the jump location to be fetched by BIU Resume execution

Waiting Conditions: Long Instructions. A long instruction is being executed; the instruction queue is full and the BIU waits; instruction fetch resumes after the EU pulls one or two bytes from the queue.

BIU: 8088 vs. 8086 BIU is the major difference 8088: data bus: 8-bit (vs. 16-bit/8086) Instruction queue: 4 bytes (vs. 6-byte/8086) Only 30% slower than 8086 If queue is kept full

8086 Programming Model BH BL AH AL DH DL CH CL BP DI SI SP CS ES SS DS IP Flags H Flags L

8086 Programming Model Data Group: AX (AH+AL): Accumulator BX (BH+BL): Base CX (CH+CL): Counter DX (DH+DL): Data

8086 Programming Model Segment Group: Segment Registers: CS: Code Segment DS: Data Segment ES: Extra Segment SS: Stack Segment Segment Registers: Base address to particular segments

8086 Programming Model: Pointer/Index Group. IP: Instruction Pointer (paired with CS); SI: Source Index (DS); DI: Destination Index (ES); SP: Stack Pointer (SS). Index registers: index (offset) or pointer relative to a base address.

8086 Flag Word, Flag L (bits 7-0): SF ZF X AF X PF X CF. CF: Carry Flag (CF = 0: no carry (ADD) or borrow (SUB); CF = 1: carry/borrow out of the high-order bit). PF: (even) Parity Flag (set when the low-order 8 bits of the result contain an even number of 1s). AF: Auxiliary Carry: carry/borrow out of bit 3 (low nibble of AL). ZF: Zero Flag (1: result is zero). SF: Sign Flag (0: positive, 1: negative).

8086 Flag Word Flag H: X X X X OF DF IF TF TF: Trap flag (single-step after next instruction; clear by single-step interrupt) IF: Interrupt-Enable: enable maskable interrupts DF: Direction flag: auto-decrement (1) or increment(0) index on string operations OF: Overflow: signed result cannot be expressed within #bits in destination operand
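The arithmetic flags described on these two slides can be made concrete by computing them for an 8-bit addition. The following is a sketch of the flag definitions only, not of the processor's circuitry; the function name `add8_flags` and the dictionary layout are illustrative:

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and derive CF, PF, AF, ZF, SF and OF from the result."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        # CF: carry out of the high-order bit (bit 7)
        "CF": int(total > 0xFF),
        # PF: set when the low 8 bits of the result contain an even number of 1s
        "PF": int(bin(result).count("1") % 2 == 0),
        # AF: carry out of bit 3 (low nibble)
        "AF": int(((a & 0x0F) + (b & 0x0F)) > 0x0F),
        # ZF: result is zero
        "ZF": int(result == 0),
        # SF: copy of the sign bit (bit 7)
        "SF": result >> 7,
        # OF: both operands have the same sign and the result's sign differs
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }


print(add8_flags(0xFF, 0x01))  # unsigned wrap-around: carry, zero result
print(add8_flags(0x7F, 0x01))  # signed overflow: +127 + 1
```

The two sample calls contrast CF, which reports unsigned overflow, with OF, which reports signed overflow.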

Segmented Memory: Linear vs. Segmented. Linear addressing: the entire memory is regarded as a whole; the entire memory space is available all the time. Segmented addressing: memory is divided into segments; a process is limited to accessing its designated segments at a given time.

8086 Memory Organization: Even and Odd Memory Banks. 16-bit data bus → one two-byte access or two one-byte accesses; allows the processor to work on bytes or on words (16-bit); I/O operations are normally conducted in bytes; can handle odd-length instructions: single-byte instructions, multiple-byte (and very long) instructions.

8086 Memory Organization Memory Space: Memory Banks 20-bit address bus Linearly, 1M bytes directly addressable Memory Banks Can read 16-bit data (512K words) from even and odd-addressed simultaneously need Two memory banks in parallel BHE control line: allows addressing even/odd banks or both

Memory Organization: Alignment. Endianness: one way to model a multi-byte CPU register (AX = AH+AL), but two ways to store operands in memory. Big-endian CPU (IBM 370, M68*, SPARC): high-order-byte-first (HOBF); maps the highest-order byte of the internal register → lowest (1st) memory byte address; operand address = address of MSB. MOV R1, N → N: 1st byte in memory & MSB of register.

Memory Organization: Alignment. Little-endian CPU (DEC, Intel): low-order-byte-first (LOBF); maps the lowest-order byte of the register → 1st memory byte; operand address = address of LSB (1st memory byte). MOV AX, N → N: 1st byte in memory & LSB of register (AL ← [N], AH ← [N+1]). Configurable: some CPUs can switch between big- and little-endian, or provide instructions that convert 16-/32-bit data between the two byte orderings (80486).
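The two byte orderings are easy to see by storing the same 16-bit register value both ways. A short sketch using Python's built-in `int.to_bytes` and `int.from_bytes`; the helper names are illustrative, and `swap32` only mimics the effect of a 32-bit byte-order conversion instruction:

```python
def store_word(value: int, byteorder: str) -> bytes:
    """Return the two memory bytes for a 16-bit value, lowest address first."""
    return (value & 0xFFFF).to_bytes(2, byteorder)


def load_word(memory: bytes, byteorder: str) -> int:
    """Rebuild the 16-bit register value from two memory bytes."""
    return int.from_bytes(memory[:2], byteorder)


def swap32(value: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"), "big")


ax = 0x1234  # AH = 0x12, AL = 0x34
print(store_word(ax, "little").hex())  # little-endian: AL at address N, AH at N+1
print(store_word(ax, "big").hex())     # big-endian: MSB at address N
print(hex(swap32(0x11223344)))
```

Reading bytes back with the wrong ordering silently yields a different value, which is why data exchanged between big- and little-endian machines needs an explicit conversion.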

8086 Memory Organization: Aligned operands: operand aligned at even-byte (word/dword) boundaries; allows a single access to read/write one operand, through an internal shift/swap mechanism if necessary. Mis-aligned words: word operand does not start at an even address; needs 2 read cycles to read/write the word (8086); the processor issues two addresses to access the two even-aligned words containing the operand; slower, but transparent to the programmer.

8086 Memory Organization 8088 always 2 cycles for word operations Aligned or not Because of 8-bit external data bus Single memory bank is sufficient
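The cycle counts above follow from counting how many bus-width-aligned memory units an operand touches. A minimal sketch under that assumption; `bus_cycles` is an illustrative name, with a 2-byte bus standing in for the 8086 and a 1-byte bus for the 8088:

```python
def bus_cycles(address: int, size: int, bus_bytes: int) -> int:
    """Number of bus cycles = number of bus-width-aligned units the operand spans."""
    if size <= 0 or bus_bytes <= 0 or address < 0:
        raise ValueError("address must be >= 0; size and bus width must be positive")
    first_unit = address // bus_bytes
    last_unit = (address + size - 1) // bus_bytes
    return last_unit - first_unit + 1


BUS_8086 = 2  # 16-bit external data bus
BUS_8088 = 1  # 8-bit external data bus

print(bus_cycles(0x1000, 2, BUS_8086))  # aligned word on the 8086
print(bus_cycles(0x1001, 2, BUS_8086))  # mis-aligned word on the 8086
print(bus_cycles(0x1000, 2, BUS_8088))  # any word on the 8088
```

The same function gives one cycle for every byte access, which matches the slide's point that byte operations are unaffected by alignment.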

8086 Memory Map: how memory space is allocated. ROM area: boot, BIOS. RAM: OS/user apps & data. Unused. Reserved: for future hardware/software uses. Dedicated: for specific system interrupt and reset functions, etc.

Segment Registers 64K memory segments x 16 16-bit offset each CS, DS, ES, SS

Logical and Physical Addresses Physical: 20-bit Logical: 16-bit 16-byte segment boundaries Address Translation E.g., CS:IP
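The translation from a logical address such as CS:IP to a 20-bit physical address is a shift and an add: the 16-bit segment value is multiplied by 16 (hence the 16-byte segment boundaries) and the 16-bit offset is added. A short sketch; the function name is illustrative, and the final mask models the wrap-around at 1 MB on a 20-bit address bus:

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode translation: physical = segment * 16 + offset, truncated to 20 bits."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF


# CS = 0x1234, IP = 0x0010
print(hex(physical_address(0x1234, 0x0010)))
# A different segment:offset pair can name the same physical byte.
print(hex(physical_address(0x1235, 0x0000)))
```

Because segments start every 16 bytes but span 64K, they overlap, so many segment:offset pairs map to one physical location.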

80286 First with Protection Mode Review of 286 Protected Mode … Next


80386 Model Refine 286 Protect Mode Expand to 32-bit registers New Virtual 8086 Mode

80386 Review


80386: Real vs. Protected Modes. Larger address space: 32-bit address bus (4G). Real Mode vs. Protected Mode (refined from 286). Real Mode: power-on default mode; functions like an 8086: (1) uses only the 20 least significant address lines (1M); (2) segmented memory retained (64K segments); software compatible with the 286. New Real Mode features: access to the 32-bit register set; two new segment registers: FS, GS.

80386: Real vs. Protected Modes new addressing mechanism vs. real mode supports protection levels segment size: 1 to 4G (not 64K, fixed) segment register: pointer to a descriptor table not base address

80386: Real vs. Protected Modes descriptor table: (8 bytes per entry) 32-bit base address of segment segment size access rights memory address = base address (in table) + offset (in instruction)
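The descriptor lookup can be modelled as a table of records. This is a simplified sketch: the `Descriptor` class, the `translate` function and the single `writable` flag (standing in for the full access-rights field) are illustrative assumptions, not the hardware's actual 8-byte layout.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int       # 32-bit base address of the segment
    size: int       # segment size in bytes
    writable: bool  # stand-in for the access-rights field


def translate(table, index, offset, write=False):
    """Return base + offset after checking segment size and access rights."""
    desc = table[index]
    if offset >= desc.size:
        raise MemoryError("offset is outside the segment")
    if write and not desc.writable:
        raise PermissionError("segment is read-only")
    return (desc.base + offset) & 0xFFFFFFFF


table = [
    Descriptor(base=0x00100000, size=0x2000, writable=False),
    Descriptor(base=0x00400000, size=0x10000, writable=True),
]
print(hex(translate(table, 1, 0x1234, write=True)))  # 0x401234
```

Unlike real mode, the segment register here only selects a table entry; the base address itself never appears in the program.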

80386: Real vs. Protected Modes Paging mechanism: map 32-bit linear address (base+offset) =>physical address & page frame address (4K page frames in system memory) 64TB of virtual memory
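The paging step can be sketched as follows. On the 386 the 32-bit linear address is split into a 10-bit page directory index, a 10-bit page table index and a 12-bit offset within the 4K page. The nested dictionaries standing in for the page directory and page tables, and the function names, are assumptions made for this illustration.

```python
PAGE_SIZE = 4096  # 4K page frames


def split_linear(linear):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def translate(page_directory, linear):
    """Walk a two-level table held in dicts; a missing entry models a page fault."""
    d, t, offset = split_linear(linear)
    try:
        frame_base = page_directory[d][t]
    except KeyError:
        raise LookupError("page fault: page not present") from None
    return frame_base + offset


page_directory = {1: {2: 0x0009A000}}       # one mapped page, frame at 0x9A000
linear = (1 << 22) | (2 << 12) | 0x345
print(split_linear(linear))                 # (1, 2, 837)
print(hex(translate(page_directory, linear)))  # 0x9a345

# The 64 TB figure: 2**14 segments, each up to 2**32 bytes (4 GB).
print(2**14 * 2**32 == 64 * 2**40)          # True
```

A page fault is the point where an operating system would swap the missing 4K page in from disk, which is how the virtual space can exceed physical memory.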

80386: Real vs. Protected Modes Protection mechanism: tasks/data/instructions are assigned a privilege level (PL) tasks running at a lower (less privileged) PL cannot access tasks or data segments at a higher PL running programs are thus protected from one another

80386: Real vs. Protected Modes Two Ways to Run 8086 Programs: Real Mode Virtual 8086 Mode Virtual 8086 Mode: runs multiple 8086+other 386 (protected mode) programs independently each sees 1 MB (mapped via paging to anywhere in 4GB space) running V8086+ Protected mode simultaneously

80386 Processor Model

80386 Processor Model: BIU+CPU+MMU control 32-bit address and data buses keep instruction queue full (16 bytes) Address pipelining address of next memory location is output halfway through current bus cycle more address decode time slower memory chip is OK easier to keep up with faster (2 CLK) bus cycle of 386

80386 Processor Model: BIU dynamic data bus sizing switches between 16-/32-bit data bus on the fly accommodates external 16-bit memory cards or I/O devices adjusts bus timing to use only the least significant 16 bits

80386 Processor Model: BIU External memory 4 memory banks (4x8=32bits) BE0-BE3 for bank selection access byte or word or double word aligned operands: 1 bus cycle mis-aligned (not %4): 2 bus cycles
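The bank-select logic can be sketched by computing which of the four byte banks each bus cycle enables. This is an illustrative model only: `True` means "bank selected" (the real BE0#-BE3# pins are active low), and the sketch assumes a second bus cycle is needed only when the operand crosses a 4-byte boundary.

```python
def bus_cycles(address, size):
    """Return one tuple of bank enables (BE0..BE3) per bus cycle needed."""
    if size not in (1, 2, 4):
        raise ValueError("operand must be a byte, word or double word")
    cycles = []
    remaining = size
    while remaining > 0:
        first_bank = address % 4                 # bank holding the next byte
        count = min(remaining, 4 - first_bank)   # bytes that fit in this cycle
        enables = tuple(first_bank <= bank < first_bank + count
                        for bank in range(4))
        cycles.append(enables)
        address += count
        remaining -= count
    return cycles


print(bus_cycles(0x1000, 4))  # aligned double word: 1 cycle, all banks
print(bus_cycles(0x1002, 4))  # misaligned double word: 2 cycles
print(bus_cycles(0x1003, 1))  # single byte: bank 3 only
```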

80386 Processor Model: CPU CPU=IU (instruction) +EU (execution) IU: fetching & execution overlap IU: retrieves instructions from queue, decodes them, stores them in decoded queue EU: ALU+registers (32-bit) executes decoded instructions

80386 Processor Model: MMU Segmentation Unit Paging Unit Real mode: generates the 20-bit physical address Protected mode: stores base/size/rights in descriptor registers caches descriptors from the tables in RAM for faster operation Paging Unit determines physical addresses associated with active segments (divided into 4K pages) virtual memory support to allow larger programs

80386 Programming Model General Purpose Registers Data & Addresses Groups Status & Control Flags VM, RF, NT, IOPL Segment Group

80386 Programming Model Special purpose Registers

80386 Programming Model Memory Management segment descriptors Paging keep base, size, access rights 3 types of tables: global (GDT), local (LDT), interrupt (IDT) addressing: index (to a table) + RPL base + offset (from instruction) Paging TLB
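The "index (to a table) + RPL" addressing can be illustrated by decoding a 16-bit selector. In the x86 selector layout the upper 13 bits are the table index, bit 2 chooses between the GDT and the LDT, and the low 2 bits are the RPL. The function name `decode_selector` is made up for this sketch.

```python
DESCRIPTOR_BYTES = 8  # each table entry is 8 bytes


def decode_selector(selector):
    """Split a 16-bit segment selector into (index, table name, RPL)."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    index = selector >> 3                      # upper 13 bits
    table = "LDT" if selector & 0x4 else "GDT"  # table indicator bit
    rpl = selector & 0x3                       # requested privilege level
    return index, table, rpl


index, table, rpl = decode_selector(0x002B)
print(index, table, rpl)            # 5 GDT 3
print(index * DESCRIPTOR_BYTES)     # byte offset of the descriptor: 40
```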

80386 Programming Model Protection (PL) Gates task: CPL instruction: RPL data segment: DPL Gates: special descriptors that allow access to higher PL tasks from lower PL tasks
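The interplay of CPL, RPL and DPL for a data-segment access can be sketched as a single comparison. In the x86 encoding, level 0 is the most privileged and level 3 the least, so a numerically larger value means less privilege. The function name is an assumption for illustration, and gates, conforming code segments and stack segments are deliberately left out.

```python
def can_access_data(cpl, rpl, dpl):
    """Data-segment check: the weaker of CPL and RPL must still satisfy DPL.

    All levels are 0 (most privileged) to 3 (least privileged).
    """
    for level in (cpl, rpl, dpl):
        if level not in range(4):
            raise ValueError("privilege levels are 0..3")
    return max(cpl, rpl) <= dpl


print(can_access_data(cpl=3, rpl=3, dpl=0))  # False: user task, kernel data
print(can_access_data(cpl=0, rpl=0, dpl=3))  # True: kernel task, user data
print(can_access_data(cpl=0, rpl=3, dpl=0))  # False: RPL weakens the request
```

The third case shows why the RPL exists: a privileged task acting on behalf of a less privileged caller can mark the request with the caller's level.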

80486 Review …

80486DX 1989: a polished 386, 6 new OS level instructions virtually identical to 386 in terms of compatibility RISC design concepts fewer clock cycles per operation, a single clock cycle for most frequently used instructions Max 50MHz 5 stage execution pipeline Portions of 5 instructions execute at once

80486DX Highly Integrated: On-board 8K memory cache FPP (equivalent to external 80387 co-processor) Twice as fast as 386 at any given clock rate 20MHz 486 ~= 40MHz 386

80486SX 80486SX NOT a 16-bit version for transition purposes No coprocessor No internal cache For low-end applications Max. 33MHz only

80486DX2/DX4: Overdrive Chips Processor speed increased too fast Redesign of microcomputer for compatibility becomes harder Solution: Separate internal speed from external speed, improve performance independently 80486DX2/DX4: internal clock is twice/three times (NOT four times) the external clock: runs faster internally

80486DX2/DX4: Overdrive Chips System board design is independent of processor upgrade (less expensive components are allowed) Processor operates at maximum data rate internally Only the slow access to external data operates at system board rate Internal cache offsets the speed gap 486DX2 66: 66 internal, 33 external 486DX4 100: 100 internal, 33 external (3x) Overdrive sockets: for upgrading 486DX/SX to 486DX2/DX4 (with overdrive socket pin-outs)

486 Processor Features 386 features: Real/Protected Modes Memory Management PL’s registers & bus sizes New features: 6 OS instructions 8K/16K onboard cache (was external on the 386)

486 Processor Features A better 386 5 stage instruction pipeline IF/ID/EX => PF/D1/D2/EX/WB PF: instructions => Q (2*16-bytes) D1: determine opcode D2: determine memory address of operands EX: execute indicated OP WB: update register
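The overlap in the five-stage pipeline can be visualised by listing which instruction occupies each stage on every clock. This is an idealised sketch that assumes no stalls, branches or cache misses; the function name `pipeline_chart` is made up for illustration.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]


def pipeline_chart(n_instructions):
    """Return one dict per clock mapping stage name -> instruction number (1-based)."""
    total_clocks = n_instructions + len(STAGES) - 1
    chart = []
    for clock in range(total_clocks):
        row = {}
        for depth, stage in enumerate(STAGES):
            instr = clock - depth  # instruction that entered PF 'depth' clocks ago
            if 0 <= instr < n_instructions:
                row[stage] = instr + 1
        chart.append(row)
    return chart


for clock, row in enumerate(pipeline_chart(6), start=1):
    print(clock, row)
```

Six instructions finish in 10 clocks instead of 30: once the pipeline is full (clock 5 onward), one instruction completes per clock, which is where the "1 CLK" instruction cycle time comes from.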

486 Processor Features Reduced Instruction Cycle Times 5 stage instruction pipeline (e.g., Fig. 3.18) instruction cycle times: 8086: 4 CLK 80386: 2 CLK 80486: 1 CLK (close to RISC) about 2X faster than 386

486 Processor Model: 386+FPU+Cache 386 units retained: BIU, CPU, MMU new: FPU (80387) + Cache (8K/16K) FPU: 387 onboard 0.8 µm process => #transistors increased (275K => 1+ million) simplified system board design speeds up FP operations

486 Processor Model: Cache Cache (8K/16K (DX4)) Function: bridge the processor-memory bandwidth gap 8088: 4.77MHz 80486: 50MHz Pentium: 100MHz Pentium Pro: 133 MHz Main Memory (DRAM): relatively slow Fast Static RAMs (SRAM) as cache

486 Processor Model: Cache Organization: 8K 4-way set associative 4 direct mapped caches wired in parallel each block maps to a set of 4 lines unified: data & code in the same cache write-through: update cache and memory page on write operations

486 Processor Model: Cache locality (why do caches help?) spatial locality: e.g., array of data temporal locality: e.g., loops in code operations on hit/miss 128-bit cache lines 32-bit x N to catch locality (N=4) 128-bit = 16-byte
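The cache geometry from the slides (8K, 4-way set associative, 16-byte lines) determines how an address is divided into tag, set index and byte offset. The sketch below derives the number of sets and splits an address accordingly; the constant and function names are assumptions for illustration.

```python
CACHE_BYTES = 8 * 1024   # 8K cache
WAYS = 4                 # 4-way set associative
LINE_BYTES = 16          # 128-bit cache lines
SETS = CACHE_BYTES // (WAYS * LINE_BYTES)


def split_address(addr):
    """Return (tag, set index, byte offset) for a 32-bit address."""
    offset = addr % LINE_BYTES
    set_index = (addr // LINE_BYTES) % SETS
    tag = addr // (LINE_BYTES * SETS)
    return tag, set_index, offset


print(SETS)                                         # 128
print([hex(x) for x in split_address(0x12345678)])  # ['0x2468a', '0x67', '0x8']
# Addresses 2K apart fall in the same set and compete for its 4 lines.
print(split_address(0x0000)[1] == split_address(0x0800)[1])  # True
```

With 128 sets and 16-byte lines, the low 4 bits select the byte, the next 7 bits select the set, and the remaining 21 bits are stored in the Tag RAM.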

486 Processor Model: Cache Mapping: memory => many-to-many => cache Data RAM: saves memory data Tag RAM: saves memory address information 3 methods of mapping fully associative: memory block to any cache line direct mapped: memory block to one specific line (risk of thrashing) set associative: memory block to a set of cache lines

486 Processor Model: Cache Replacement policy (LRU) valid bits: all 4 lines in use ? NO => use any unused line YES => find one to replace LRU bits: which is least recently used
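The replacement decision for one set can be simulated with an ordered list. This is a sketch of true LRU for clarity only: real hardware tracks recency with a few LRU bits per set rather than an ordered list, and the class name `CacheSet` is an assumption for illustration.

```python
class CacheSet:
    """One set of a 4-way set associative cache with LRU replacement."""

    def __init__(self, ways=4):
        self.ways = ways
        self.lines = []  # resident tags, least recently used first

    def access(self, tag):
        """Return 'hit' or 'miss' and update the recency order."""
        if tag in self.lines:
            self.lines.remove(tag)
            self.lines.append(tag)      # now the most recently used
            return "hit"
        if len(self.lines) == self.ways:  # all 4 lines valid: must replace
            self.lines.pop(0)             # evict the least recently used
        self.lines.append(tag)            # otherwise an unused line is filled
        return "miss"


cache_set = CacheSet()
results = [cache_set.access(tag) for tag in [1, 2, 3, 4, 1, 5, 2]]
print(results)          # ['miss', 'miss', 'miss', 'miss', 'hit', 'miss', 'miss']
print(cache_set.lines)  # [4, 1, 5, 2]
```

Tag 1 survives the arrival of tag 5 because it was touched recently; tag 2, the least recently used at that point, is the one evicted.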

Pentium Review …

Pentium: Superscalar Processor available in 1993 32-bit architecture Superscalar architecture Scaling: scaling down etchable feature size to increase complexity of IC (e.g., DRAM) 10 microns/4004 to 0.13 microns (2001) Superscalar: goes beyond simply scaling down Two instruction pipelines: each with own ALU, address generation circuitry, data cache interface Executes two different instructions simultaneously

Pentium: Superscalar Processor Onboard cache Separate 8K data and code caches to avoid access conflicts FPP Instruction pipeline: 8 stage Optimized floating point functions 5x-10x FLOPS of 486 2x performance of 486 at any clock rate

Pentium: Superscalar Processor Compatibility with 386/486: Internal 32-bit registers and address bus Data bus expanded to 64 bits for higher data transfer rate Compare 8088 to 386SX transition

Pentium: Superscalar Processor non-clone competition from AMD, Cyrix development of brand identity by Intel

Pentium Pro Review …

Pentium Pro: Two Chips in One Became available in 1995 Superscalar of degree 3 Can execute 3 instructions simultaneously Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp) Two separate silicon dies in the same package Processor: 0.35 µm, 5.5 million transistors 256KB(/512K) Level 2 cache included in the package, 15.5 million transistors in smaller area

Pentium Pro: Two Chips in One On Board Level 2 cache Simplifies system board design Requires less space Gains faster communication with processor Internal (level 1) cache: 8K Pentium Pro 133 ~= 2x Pentium 66 ~= 4x 486DX2 66

Pentium Pro: Dynamic Execution Dynamic execution: reduces idle processor time by predicting instruction behavior Multiple Branch Prediction: looks as far as 30 instructions ahead to anticipate program branches Data Flow Analysis: looks at upcoming instructions and determines whether they are available for processing, depending on other instructions. Determines optimal execution sequences. Speculative Execution: executes instructions in a different order than entered. Speculative results are stored until final states can be determined.
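The data flow analysis idea can be sketched as a greedy scheduler that issues any instruction whose inputs are ready, up to three per cycle. This is a toy model, not the Pentium Pro's actual mechanism: it assumes every instruction takes one cycle, every destination is written only once, and it ignores branch prediction and speculation entirely. The instruction names and the `schedule` function are made up for illustration.

```python
def schedule(program, width=3):
    """Group instructions into cycles by data dependence.

    program: list of (name, destination, [sources]) in program order.
    Returns a list of cycles, each a list of instruction names.
    """
    produced_in_program = {dest for _, dest, _ in program}
    ready_values = set()      # results computed in earlier cycles
    pending = list(program)
    cycles = []
    while pending:
        issued = []
        for instr in pending:
            _, _, sources = instr
            # A source is usable if already computed, or if it is an external input.
            if all(s in ready_values or s not in produced_in_program for s in sources):
                issued.append(instr)
                if len(issued) == width:
                    break
        if not issued:
            raise ValueError("circular dependency between instructions")
        for instr in issued:
            pending.remove(instr)
        ready_values.update(dest for _, dest, _ in issued)
        cycles.append([name for name, _, _ in issued])
    return cycles


program = [
    ("load_a", "a", []),
    ("add_b", "b", ["a"]),
    ("load_c", "c", []),
    ("mul_d", "d", ["c"]),
    ("add_e", "e", ["b", "d"]),
]
print(schedule(program))
# [['load_a', 'load_c'], ['add_b', 'mul_d'], ['add_e']]
```

Executed strictly in order, one per cycle, the five instructions would take five cycles; reordering by data dependence finishes them in three.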