Chapter 1 Microcomputers and Microprocessors

1 Chapter 1 Microcomputers and Microprocessors
Microprocessor Evolution and Performance

2 Contents Introduction to microcomputer system Microprocessor evolution
the INTEL processor family Microprocessor performance

3 Introduction to Microcomputer
A microcomputer can be interpreted as a machine with: I/O devices for input/output; a microprocessor for processing; memory units for storage; buses for connecting the above components. In the 1970s, a microcomputer was normally interpreted as a computer considerably smaller than a minicomputer, possibly using ROM for program storage

4 Basic hardware units
Input: e.g. keyboard, mouse. Microprocessor: e.g. 8085, 8086, MC68000 microprocessors. Memory: e.g. RAM, hard disk. Output: e.g. monitor, printer

5 Buses
Buses: external connections to the input/output units. Major buses: Address bus: address of memory locations containing instructions or data. Data bus: contents of memory locations. Control bus: synchronization and handshaking between components

6 General Architecture
Memory unit (primary memory, secondary memory); microprocessing unit; input unit; output unit

7 Processor History Vacuum Tubes to IC’s

8 First Generation Computers
Vacuum tube technology. Large, air-conditioned room. Tube lifetime: 3,000 hours. Useless machine? 1951: first UNIVAC I (UNIVersal Automatic Computer) delivered. 1952: prediction of the presidential election by CBS. 1952: IBM Model 701 Data Processing System

9 Second Generation Computers
The Transistor Is Born (Solid-State Era) 1948: invention of bipolar transistors 1956: Nobel physics award: Drs. William Shockley, John Bardeen and Walter H. Brattain (Bell Labs) 1954: Bell Labs: all-transistorized computer (TRADIC) 800 transistors Much less heat More reliable and less costly

10 Second Generation Computers
Mainframe Computers 1958: IBM’s 1st transistorized computer 7070/7090 1959: 1401 (business-oriented model) Built on circuit boards mounted into rack panels, or frames Main frame (mainframe): the CPU portion of the computer Popular with business and industry

11 Third Generation Computers
Invention of IC: 1959, Dr. Robert Noyce (Fairchild) and Jack Kilby (TI). Kilby: fabricating resistors, capacitors and transistors on a germanium wafer, and connecting these parts with fine gold wires. Noyce: isolating individual components with reverse-biased diodes, and depositing an adherent metal film over the circuit, thus connecting the components. 1st IC: 2-transistor multivibrator. By the mid-1960s, memory chips with 1,000 components were common

12 Third Generation Computers
1964: IBM 360 Series (32-bit) The first to use IC technology A family of 6 compatible computers 40 different I/O and auxiliary storage devices Memory capacity: 16K words to over 1MB. 32-bit registers x 16 24-bit address bus 128-bit data bus

13 Third Generation Computers
1964: IBM 360 Series (32-bit) 375,000 computations per second (<< 150 mips Pentium 100) $5 billion development cost IBM became the leading mainframe company

14 Minicomputer 1960s: Space Race between US & USSR IC industry boom
A tremendous demand by scientists and engineers for an inexpensive computer that they could operate by themselves 1965: DEC PDP-8 (by Edson de Castro’s group) Low-cost ($25,000) minicomputer 12-bit 16-bit PDP-11 Supermini …

15 Microprocessors: CPU on a Chip
1968: INTEL (Integrated Electronics) Founded by Robert Noyce and Gordon Moore (Fairchild) Original goals: semiconductor memory market 1969: customized IC’s for Busicom for calculator Ted Hoff and Stan Mazor: proposed 4-bit CPU on a single chip, plus ROM, RAM chips

16 Microprocessors: CPU on a Chip
1971: 4000 Family, by Federico Faggin. 4001: 2K-bit ROM with 4-bit I/O port. 4002: 320-bit RAM, 4-bit output port. 4003: 10-bit serial-in parallel-out shift register. 4004: 4-bit processor. Processor-on-a-chip: microprocessor era

17 Microprocessors: CPU on a Chip
1972: 8008, 8-bit 1974: 8080, an improved version

18 Microprocessors: CPU on a Chip
8-bit CPUs, 16-bit address (64K). MC6800: Motorola. 6502: MOS Technology (designed by former Motorola engineers); Apple-II, Apple DOS. Z-80: Zilog (founded by former Intel engineers); Z-80 cards on Apple-II, CP/M

19 Microprocessors: CPU on a Chip
16-bit CPUs (Late 1970s) 8086, 80186, 80286: Intel PC, PC-DOS, MS-DOS, SCO-Unix MC68000: Motorola 16-bit instructions Hardware multiply and divide 20-bit address buses (1MB) Workstations: Sun3

20 Microprocessors: CPU on a Chip
32-bit CPUs 80386, 80486: Intel MC68020, 68030: Motorola 64-bit CPUs Pentium, Pentium Pro (64-bit external data bus, 32-bit internal registers, not recognized as 64-bit CPUs in terms of internal register word length)

21 Microcomputers: Computers Based on Microprocessors
1975: MITS Altair 8800 (Kit) $399, i8080, programmed by depositing 1s/0s via front panel switches Other Computers boom 8080: MITS, … 6800: SWTPC 6800, … Z-80: TRS-80, … 6502: Apple I, 8K, programmed with BASIC Steve Jobs & Steve Wozniak, millionaires from PC COM’s …

22 Personal Computers: the Open Architecture Era
1981: IBM PC. A system board (motherboard); Intel 8088 processor; 16K memory; 5 expansion slots. Third-party vendors supply various I/O adapter cards. Open architecture: a computer with interchangeable components

23 Micro-controllers: Microcomputers on a Chip
Microcontroller: a computer on a chip: microprocessor, plus on-chip memory, plus input/output ports. 1995: microcontrollers outsold microprocessors 10:1. Embedded in various equipment: thermostats, machine tools, communication, automotive, … Evolution: getting greater I/O capabilities. Intel: MCS-51, MCS-96, …

24 High-Performance Processors
Supercomputers Aircraft design, global climate modeling, oil-bearing formation, molecular design of new drugs, financial behavior CDC6600, 7600: Seymour Cray Cray-1: 1976, the first true supercomputer ECL, 128 KW power consumption 130 MFLOPS (Pentium 100: 150 MFLOPS) $5.1 million

25 High-Performance Processors
Parallel Processors: tens of gigaflops. Multi-processors wired by a common bus; each is given a portion of the problem to solve. Hypercube: early 1980s Cosmic Cube, iPSC (with i860/RISC chips). 2D rectangular mesh architecture: multiple processors at each node. Intel: teraflops computer with 4500 nodes, each powered by 2 Pentium Pro 200.

26 RISC vs. CISC RISC: Reduced Instruction Set Computer (1980s)
A small number of fixed-length instructions Simple addressing modes A large number of registers Instructions executed in one clock cycle Intel i860 (“Cray on a Chip”) 82 instructions, 32-bit long each Four addressing modes 32 general-purpose registers

27 RISC vs. CISC CISC: Complex Instruction Set Computer Intel 8086
A large number of variable length instructions Multiple addressing modes A small number of registers Multiple number of clock cycles to execute Intel 8086 Over 3000 instruction forms, 1-6 bytes 9 addressing modes 8 general-purpose registers Execution from 2 to 80+ cycles

28 RISC vs. CISC RISC Control unit is much simpler (simpler instructions, execution in 1 CLK) Faster execution with less total on-chip logic Chip area: 10% (vs 50% for CISC) More area for register file, data and instruction caches, FPU, and co-processor PowerPC: 32-bit, by IBM, Apple, Motorola Sparc: for SunMicro workstations

29 Application-Specific Processors
DSP chips: mostly for analog signal processing. ADC-DSP-DAC architecture. Avoids processing analog signals with discrete circuits involving capacitors and inductors. DSP: performs complex mathematical functions: digital filters, spectrum analysis

30 Application-Specific Processors
DSP Chip Architecture Different data/program areas: Harvard Architecture Hardware multipliers and adders, optimized to execute on a single cycle Arithmetic pipelining: several instructions operated at once Hardware loop control Multiple IO ports for communication with other processors

31 Summary of Processor History
1940s: vacuum tubes, large and power-hungry. 1950s: transistor (1948-). 1959: first IC (second industrial revolution). 1960s: ICs were widely used to build CPUs. 1971: Intel 4004 microprocessor (2300 transistors); start of the microprocessor age. Late 1970s: 8080/85

32 Summary of Processor History
1980: RISC (reduced instruction set computer). CISC (complex instruction set computer) vs. RISC. CISC family: Intel 80x86, Pentium; Motorola series. All others are RISC series.

33 Evolution of INTEL Processors
4004 (’71) to Pentium Pro (’95-)

34 INTEL Integrated Electronics Evolution:
1968: founded by Robert Noyce and Gordon Moore. IA: Intel Architecture (e.g., IA-16, IA-32, IA-64); since the 8008 (’72) it has become the de facto standard. Evolution: internal register sizes; external bus widths; Real, Protected, and Virtual 8086 modes

35 4-bit Processors 4004 first microprocessor became available in 1971
4-bit microprocessor: 4-bit registers & 4-bit data bus. #transistors: 2300. Min. feature size: 10 microns. Address bus: 10 bits/1K. 0.06 MIPS. No internal cache

36 8-bit Processors
8008 (1972), 8080 (1974), 8085 (1976): 8-bit microprocessors

37 8086: IA standard Became available in 1978
16-bit data bus 20-bit address bus (was 16-bit for 8080) memory organization: 16 segments of 64KB (1 MB limit) Re-organize CPU into BIU (bus interface unit) and EU (execution unit) Allow fetch and execution simultaneously Internal register expanded to 16-bit Allow access of low/high byte separately

38 8086 Hardware multiply and divide instructions
External math co-processor. Instruction set upward compatible with the 8080/8085 at the assembly-source level (not binary compatible). 8086: defined the 80x86 architecture

39 8086 Not quite successful 16-bit data bus: Requires two separate 8-bit memory banks Memory chips were expensive

40 8088: PC standard Became available in 1979, almost identical to 8086
8-bit external data bus: for hardware compatibility with 8080. 16-bit internal registers and internal data bus (same as 8086). 20-bit address bus (was 16-bit for 8080). BIU redesigned. Memory organization: 16 segments of 64KB (1 MB limit). Two memory accesses for 16-bit data (less efficient), but lower cost. 8088: used by the IBM PC (1981), 16K-64K, 4.77MHz

41 80186, 80188: High Integration CPU PC system:
8088 CPU + various supporting chips: clock generator; 8251: serial I/O (RS-232); 8253: timer/counter; 8255: PPI (programmable peripheral interface); 8257: DMA controller; 8259: interrupt controller. 80186/80188: 8086/8088 + supporting functions. Compatible instruction set (+ 9 new instructions)

42 80286 Became available in 1982 used in IBM AT computer (1984)
16-bit data bus clock speed 25% faster than 8088, throughput 5 times greater than 8088 24-bit address bus (16 MB) (vs. 20-bit/1M 8086)

43 80286: Real vs. Protected Modes
Larger address space: 24-bit address bus. Real Mode vs. Protected Mode. Real Mode: power-on default mode; functions like an 8086: uses the 20 least significant address lines (1M); software compatible with the 8086/8088. 16 new instructions (for Protected Mode management). Faster 286: redesigned processor, plus higher clock rate (6-8MHz)

44 80286: Real vs. Protected Modes
Multi-program environment Each program has a predetermined amount of memory Addressed via segment selector (physical addresses invisible): 16M addressable Multiple programs loaded at once (within their respective segments), protected from read/write by each other

45 80286: Real vs. Protected Modes
Once in Protected Mode, the 286 cannot be switched back to Real Mode (short of a processor reset), to avoid illegal access by switching back and forth between modes. A faster 8086 only? MS-DOS requires that all programs be run in Real Mode

46 Clock Speed
Electrical signals cannot change instantaneously (a transition period is required). The system clock provides the timing signal for synchronization. Clock speed alone cannot be used to compare the performance of microprocessors with different instruction sets or internal designs; e.g., a 66 MHz Pentium is about twice as fast as a 66 MHz 80486

47 80386DX (aka. 80386) available in 1985, a major redesign of 86/286
Compatibility commitment through 2000 32-bit data and address buses (4 GB memory) Real Address Mode: 1M visible, 286 real mode Protected Virtual Address Mode: On board MMU Segmented tasks of 1byte to 4G bytes Segment base, limit, attributes defined by a descriptor register Page swapping: 4K pages, up to 64TB virtual memory space Windows, OS/2, Unix/Linux

48 80386DX (aka 80386)
Virtual 8086 mode (a special Protected mode feature): permits multiple 8086 virtual machines to multitask (each similar to real mode); Windows (multiple MS-DOS sessions). Clock rate: max. 40MHz, 2 pulses per R/W bus cycle. External memory cache to avoid wait states: fast SRAM, 93% hit rate with 64K cache. Compatible instructions (14 new)

49 80386SX 80386SX: (for transition to 32-bit)
16-bit data bus/32-bit register 24-bit address bus

50 80486DX 1989: a polished 386, 6 new OS level instructions
virtually identical to 386 in terms of compatibility RISC design concepts fewer clock cycles per operation, a single clock cycle for most frequently used instructions Max 50MHz 5 stage execution pipeline Portions of 5 instructions execute at once

51 80486DX
Highly integrated: on-board 8K memory cache; FPP (equivalent to external co-processor). Twice as fast as a 386 at any given clock rate: 20 MHz 486 ~= 40 MHz 386

52 80486SX 80486SX NOT a 16-bit version for transition purpose
No coprocessor; internal 8K cache retained (as on the 486DX); for low-end applications; max. 33 MHz only

53 80486DX2/DX4: Overdrive Chips
Processor speed increased too fast; redesign of the microcomputer for compatibility becomes harder. Solution: separate internal speed from external speed and improve performance independently. 80486DX2/DX4: internal clock twice/three times (NOT four times) the external clock: runs faster internally

54 80486DX2/DX4: Overdrive Chips
System board design is independent of processor upgrade (less expensive components are allowed). The processor operates at maximum speed internally; only the slower access to external data operates at the system board rate; the internal cache offsets the speed gap. 486DX2 66: 66 internal, 33 external. 486DX4 100: 100 internal, 33 external (3x). Overdrive sockets: for upgrading 486DX/SX to 486DX2/DX4 (with overdrive socket pin-outs)

55 Pentium: Superscalar Processor
Available in 1993. 32-bit architecture. Superscalar architecture. Scaling: scaling down etchable feature size to increase complexity of IC (e.g., DRAM); 10 microns/4004 to 0.13 microns (2001). Superscalar: goes beyond simply scaling down. Two instruction pipelines, each with its own ALU, address generation circuitry, and data cache interface. Executes two different instructions simultaneously

56 Pentium: Superscalar Processor
Onboard cache Separate 8K data and code caches to avoid access conflicts FPP Instruction pipeline: 8 stage Optimized floating point functions 5x-10x FLOP’s of 486 2x performance of 486 at any clock rate

57 Pentium: Superscalar Processor
Compatibility with 386/486: Internal 32-bit registers and address bus Data bus expanded to 64-bits for higher data transfer rate Compare 8088 to 386sx transition

58 Pentium: Superscalar Processor
non-clone competition from AMD, Cyrix development of brand identity by Intel

59 Pentium Pro: Two Chips in One
Became available in 1995. Superscalar of degree 3: can execute 3 instructions simultaneously. Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp). Two separate silicon dies in the same package. Processor: 0.35 u, 5.5 million transistors. 256KB (/512K) Level 2 cache included in the package, 15.5 million transistors in a smaller area

60 Pentium Pro: Two Chips in One
On Board Level 2 cache Simplifies system board design Requires less space Gains faster communication with processor Internal (level 1) cache: 8K Pentium Pro 133 ~= 2x Pentium 66 ~= 4x 486DX2 66

61 Pentium Pro:Dynamic Execution
Dynamic execution: reduces idle processor time by predicting instruction behavior. Multiple branch prediction: looks as far as 30 instructions ahead to anticipate program branches. Data flow analysis: looks at upcoming instructions and determines whether they are available for processing, depending on other instructions; determines optimal execution sequences. Speculative execution: executes instructions in a different order from the one in which they were entered; speculative results are stored until final states can be determined.

62 What’s More from Moore’s Law?
Processor Future What’s More from Moore’s Law?

63 Moore's Law In 1965, Gordon Moore predicted that:
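“The number of transistors per integrated circuit would double every year.” He forecast that this trend would continue through 1975. (In 1975 he revised the rate to a doubling every two years; the widely quoted 18-month figure is a later paraphrase.)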

64 Moore’s Law

65 Other Microprocessors
Motorola family from 6809 through 68040. PowerPC: joint venture between Apple, IBM, and Motorola. RISC processors: DEC Alpha, MIPS, Sun SPARC, etc.

66 CISC vs. RISC CISC (Complex Instruction Set Computer)
CISC processors have a large versatile instruction set that supports many complex addressing modes move complexity from software to hardware RISC (Reduced Instruction Set Computer) RISC processors have a small instruction set move complexity from hardware to software

67 Microprocessor Performance
Two main factors: Response time: the time between the start and completion of a task, also referred to as execution time. Throughput: the total amount of work done in a given time

68 MIPS Million Instructions Per Second
MIPS = (Instruction count) / (Execution time in seconds x 10^6). It specifies performance inversely to execution time. Faster machines have a higher MIPS rating
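A minimal sketch of the MIPS rating, taken as the instruction count divided by the execution time in seconds times 10^6. The function name `mips` and the workload figures are made up for illustration.

```python
def mips(instruction_count, execution_time_s):
    """Return the MIPS rating: instructions / (seconds * 10**6)."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return instruction_count / (execution_time_s * 1e6)


# Hypothetical workload: 50 million instructions completed in 2.5 seconds.
print(mips(50_000_000, 2.5))  # 20.0

# Halving the execution time for the same instruction count doubles the rating.
print(mips(50_000_000, 1.25))  # 40.0
```

The second call shows the inverse relation to execution time: same instruction count, half the time, twice the rating.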

69 Some Problems of MIPS Cannot compare computers with different instruction sets, since the instruction count will certainly differ MIPS varies between programs on the same computer

70 iCOMP An index provided by Intel for comparison of performance of their 32-bit microprocessors Based on a variety of performance components that represent integer mathematics, graphics, etc. Combine results of a set of software application benchmarks

71

72 Chapter 2 Computer Codes, Programming, and Operating Systems
Number Systems Computer Codes Programming Operating Systems

73 Number Systems Decimal: Base 10 Binary: Base 2 Octal: Base 8
Hexadecimal: Base 16

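74 Base Conversion: 2 <-> 10
Binary to decimal: $D = \sum_{i=0}^{n-1} b_i \times 2^i$. Decimal to binary by repeated subtraction: with $m$ the largest exponent such that $2^m \le D$, set $b_m = 1$ and $D' = \sum_{i=0}^{m-1} b_i \times 2^i = D - 2^m$; then continue with $D \leftarrow D'$ and $m \leftarrow m'$, where $m'$ is the largest exponent such that $b_{m'} = 1$ in $D'$. Decimal to binary by long division: $D' = \lfloor D/2 \rfloor$ with remainder $b_i$; then $D \leftarrow D'$.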
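The three conversions on this slide can be sketched as follows. The function names are illustrative, and the code assumes non-negative integers.

```python
def binary_to_decimal(bits):
    """Sum b_i * 2**i over the bit string, e.g. '1011' -> 11."""
    value = 0
    for i, b in enumerate(reversed(bits)):
        if b not in "01":
            raise ValueError("not a binary digit: " + b)
        value += int(b) * 2 ** i
    return value


def decimal_to_binary_subtract(d):
    """Repeated subtraction: subtract each power of two that still fits, largest first."""
    if d < 0:
        raise ValueError("non-negative integers only")
    if d == 0:
        return "0"
    m = d.bit_length() - 1  # largest exponent with 2**m <= d
    bits = []
    for exp in range(m, -1, -1):
        if d >= 2 ** exp:
            bits.append("1")
            d -= 2 ** exp
        else:
            bits.append("0")
    return "".join(bits)


def decimal_to_binary_divide(d):
    """Long division by 2: the remainders are the bits, least significant first."""
    if d < 0:
        raise ValueError("non-negative integers only")
    if d == 0:
        return "0"
    bits = []
    while d > 0:
        d, remainder = divmod(d, 2)
        bits.append(str(remainder))
    return "".join(reversed(bits))


print(binary_to_decimal("1011"))       # 11
print(decimal_to_binary_subtract(11))  # 1011
print(decimal_to_binary_divide(11))    # 1011
```

Both decimal-to-binary routines return the same string; they differ only in whether bits are produced from the most significant end (subtraction) or the least significant end (division).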

75

76 MCS-51 Program Development
[Figure: development tool flow. Stages: Editor, Assembler (X8051), Linker (Link), Symbol Converter (CVTSYM), ICE, Target. Files: .ASM, .OBJ, .HEX, .SYM, .SDT.]

77 Chapter 3 80x86 Processor Architecture
8086/88 Segmented Memory 80386 80486 Pentium Pentium Pro

78 Processor Model Programming Model
The 8086 and 8088 Processor Model Programming Model

82 8086 Processor Model: BIU+EU
BIU: memory & I/O address generation. EU: receives code and data from the BIU; not connected to the system buses; executes instructions; saves results in registers, or passes them to the BIU for memory and I/O

83 8086 Processor Model
[Figure: block diagram. EU: AX (AH, AL), BX (BH, BL), CX (CH, CL), DX (DH, DL); BP, DI, SI, SP; ALU; Flags. BIU: CS, ES, SS, DS; IP; instruction queue; address generation and bus control.]

84 Fetch and Execution Cycle
BIU+EU allows the fetch and execution cycle to overlap. 0. System boot: instruction queue is empty. 1. IP => BIU => address bus, and IP++. 2. Mem[IP-1] => InstrQ[tail++]. 3a. InstrQ[head] => EU => execution. 3b. Mem[IP++] => InstrQ[tail++] (possibly multiple instructions). Repeat 3a+3b (overlapped)
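The overlap of steps 3a and 3b can be illustrated with a toy cycle counter. This is a sketch under loud assumptions, not 8086 timing: every instruction is one queue entry, the BIU fetches one entry per cycle, and the EU spends a fixed `exec_cycles` on each instruction. The name `simulate` and all parameters are hypothetical.

```python
from collections import deque


def simulate(program, exec_cycles=2, queue_size=6):
    """Count cycles when fetch (BIU) and execute (EU) run side by side."""
    if exec_cycles < 1:
        raise ValueError("exec_cycles must be at least 1")
    ip = 0            # next program entry the BIU will fetch
    queue = deque()   # instruction queue between BIU and EU
    busy = 0          # cycles left on the instruction in the EU
    current = None
    done = []
    cycle = 0
    while len(done) < len(program):
        cycle += 1
        # EU side (step 3a): take the queue head when idle, then work on it.
        if busy == 0 and queue:
            current = queue.popleft()
            busy = exec_cycles
        if busy > 0:
            busy -= 1
            if busy == 0:
                done.append(current)
        # BIU side (step 3b): prefetch in the same cycle if the queue has room.
        if ip < len(program) and len(queue) < queue_size:
            queue.append(program[ip])
            ip += 1
    return cycle, done


program = ["MOV", "ADD", "JMP"]
cycles, done = simulate(program, exec_cycles=2)
print(cycles)                   # 7: one start-up fetch, then 3 x 2 execute cycles
print(len(program) * (1 + 2))   # 9: fetch-then-execute with no overlap
```

After the first fetch, every later fetch hides behind execution, which is why only the start-up cycle shows in the total.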

85 Waiting Conditions: Memory Access
BIU+EU: execute (almost) continuously without waiting. Waiting condition: the EU accesses a memory location not in the queue; the BIU suspends instruction fetch, issues the external memory address, then resumes instruction fetch and execution

86 Waiting Conditions: Jump
Next instruction is a jump: instructions in the queue are discarded; the EU waits for the next instruction at the jump location to be fetched by the BIU, then resumes execution

87 Waiting Conditions: Long Instructions
A long instruction is being executed: the instruction queue is full, so the BIU waits; it resumes instruction fetch after the EU pulls one or two bytes from the queue

88 BIU: 8088 vs. 8086
The BIU is the major difference. 8088: data bus: 8-bit (vs. 16-bit on the 8086); instruction queue: 4 bytes (vs. 6 bytes on the 8086); only 30% slower than the 8086 if the queue is kept full

89 8086 Programming Model
[Figure: register set. AX (AH, AL), BX (BH, BL), CX (CH, CL), DX (DH, DL); BP, DI, SI, SP; CS, ES, SS, DS; IP; Flags (Flags H, Flags L).]

90 8086 Programming Model
Data Group: AX (AH+AL): Accumulator; BX (BH+BL): Base; CX (CH+CL): Counter; DX (DH+DL): Data

91 8086 Programming Model
Segment Group: CS: Code Segment; DS: Data Segment; ES: Extra Segment; SS: Stack Segment. Segment registers hold the base address of a particular segment

92 8086 Programming Model
Pointer/Index Group: IP: Instruction Pointer (used with CS); SI: Source Index (with DS); DI: Destination Index (with ES); SP: Stack Pointer (with SS). Index registers hold an index (offset) or pointer relative to a base address

93 8086 Flag Word
Flag L (bit 7 to bit 0): SF ZF X AF X PF X CF. CF: Carry Flag (0: no carry (ADD) or borrow (SUB); 1: carry/borrow out of the high-order bit). PF: (even) Parity Flag (set when the low-order 8 bits of the result contain an even number of 1s). AF: Auxiliary Carry (carry/borrow on bit 3, the low nibble of AL). ZF: Zero Flag (1: result is zero). SF: Sign Flag (0: positive, 1: negative)

94 8086 Flag Word
Flag H (bit 15 to bit 8): X X X X OF DF IF TF. TF: Trap flag (single-step after next instruction; cleared by the single-step interrupt). IF: Interrupt-Enable: enables maskable interrupts. DF: Direction flag: auto-decrement (1) or auto-increment (0) of the index on string operations. OF: Overflow: signed result cannot be expressed within the number of bits in the destination operand
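As an illustration of the arithmetic flags described on slides 93 and 94, the sketch below computes CF, PF, AF, ZF, SF and OF for an 8-bit addition. The helper name `add8_flags` is made up, and only ADD on byte operands is modeled.

```python
def add8_flags(a, b):
    """Add two bytes and return (result, flags) using the 8086 flag definitions."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                            # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),         # even number of 1s in low 8 bits
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),          # carry out of bit 3
        "ZF": int(result == 0),                             # result is zero
        "SF": (result >> 7) & 1,                            # copy of the sign bit
        "OF": int((a ^ result) & (b ^ result) & 0x80 != 0)  # signed overflow
    }
    return result, flags


# 0x7F + 0x01: signed overflow (127 + 1 does not fit in a signed byte), no carry.
print(add8_flags(0x7F, 0x01))

# 0xFF + 0x01: unsigned carry and a zero result, but no signed overflow (-1 + 1 = 0).
print(add8_flags(0xFF, 0x01))
```

The two calls separate CF from OF: the first overflows as a signed value without carrying, the second carries without overflowing.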

95 Segmented Memory
Linear vs. segmented addressing. Linear addressing: the entire memory is regarded as a whole; the entire memory space is available all the time. Segmented: memory is divided into segments; a process is limited to accessing designated segments at a given time

96 8086 Memory Organization Even and Odd Memory Banks
16-bit data bus => two-byte / two one-byte access. Allows the processor to work on bytes or on words (16-bit). I/O operations are normally conducted in bytes. Can handle odd-length instructions: single-byte instructions; multiple-byte (and very long) instructions

97 8086 Memory Organization
Memory space: 20-bit address bus; linearly, 1M bytes directly addressable. Memory banks: can read 16-bit data (512K words) from even- and odd-addressed bytes simultaneously; needs two memory banks in parallel. BHE control line: allows addressing the even bank, the odd bank, or both

98 Memory Organization: Alignment
Endianness: one way to model a multi-byte CPU register (AX = AH+AL), two ways to store operands in memory. Big-endian CPU (IBM 370, M68*, SPARC): high-order-byte-first (HOBF); maps the highest-order byte of the internal register to the lowest (1st) memory byte address; operand address = address of MSB. MOV R1, N => N: 1st byte in memory & MSB of register

99 Memory Organization: Alignment
Little-endian CPU (DEC, Intel): low-order-byte-first (LOBF); maps the lowest-order byte of the register to the 1st memory byte; operand address = address of LSB (1st memory byte). MOV AX, N => N: 1st byte in memory & LSB of register; AL <= [N], AH <= [N+1]. Configurable: can switch between big-/little-endian, or provide instructions that convert 16-/32-bit data between the two byte orderings (80486)
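A short sketch of the two byte orderings, using Python's built-in `int.to_bytes` and `int.from_bytes`. The helpers `load_ax` and `bswap32` are illustrative names; `bswap32` mimics the effect of a 32-bit byte-swap instruction such as the 80486 BSWAP.

```python
value = 0x1234

# Little-endian (Intel): low-order byte first in memory.
print(value.to_bytes(2, "little").hex())  # 3412
# Big-endian (e.g. M68*): high-order byte first in memory.
print(value.to_bytes(2, "big").hex())     # 1234


def load_ax(memory, n):
    """Little-endian word load: AL <= memory[n], AH <= memory[n + 1]."""
    al = memory[n]
    ah = memory[n + 1]
    return (ah << 8) | al


def bswap32(x):
    """Reverse the four bytes of a 32-bit value (big <-> little conversion)."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")


memory = bytes([0x34, 0x12])
print(hex(load_ax(memory, 0)))   # 0x1234
print(hex(bswap32(0x12345678)))  # 0x78563412
```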

100 8086 Memory Organization
Aligned operand: operand aligned at even-byte (word/dword) boundaries; allows a single access to read/write one operand (through an internal shift/swap mechanism, if necessary). Misaligned words: word operand does not start at an even address; needs 2 bus cycles to read/write the word (8086); the processor issues two addresses to access the two even-aligned words containing the operand; slower, but transparent to the programmer

101 8086 Memory Organization
8088: always 2 cycles for word operations, aligned or not, because of the 8-bit external data bus. A single memory bank is sufficient
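The cycle counts for aligned and misaligned operands follow from how many bus-width units the operand touches. A minimal sketch, assuming one bus cycle per unit touched; `bus_cycles` is an illustrative helper, not a timing model.

```python
def bus_cycles(address, size_bytes, bus_width_bytes):
    """Number of bus-width-aligned units an operand spans (one bus cycle each)."""
    first_unit = address // bus_width_bytes
    last_unit = (address + size_bytes - 1) // bus_width_bytes
    return last_unit - first_unit + 1


# 8086: 16-bit (2-byte) data bus.
print(bus_cycles(0x1000, 2, 2))  # 1: word at an even address
print(bus_cycles(0x1001, 2, 2))  # 2: word at an odd address
print(bus_cycles(0x1001, 1, 2))  # 1: a single byte always fits in one cycle

# 8088: 8-bit (1-byte) data bus, so every word takes two cycles.
print(bus_cycles(0x1000, 2, 1))  # 2
print(bus_cycles(0x1001, 2, 1))  # 2
```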

102 8086 Memory Map Memory Map: How memory space is allocated
ROM area: boot, BIOS. RAM: OS/user apps & data. Unused. Reserved: for future hardware/software uses. Dedicated: for specific system interrupt and reset functions, etc.

103 Segment Registers
64K-byte memory segments x 16; 16-bit offset within each; segment registers: CS, DS, ES, SS

104 Logical and Physical Addresses
Physical: 20-bit. Logical: 16-bit segment and 16-bit offset. Segments start on 16-byte boundaries. Address translation: physical = segment x 16 + offset. E.g., CS:IP
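Real-mode address translation shifts the 16-bit segment left by four bits (a 16-byte boundary) and adds the 16-bit offset to form the 20-bit physical address. A minimal sketch; the function name and the register values are examples only.

```python
def physical_address(segment, offset):
    """8086 translation: segment * 16 + offset, truncated to 20 address bits."""
    return ((segment << 4) + offset) & 0xFFFFF


# CS:IP = 1234:0010
print(hex(physical_address(0x1234, 0x0010)))  # 0x12350

# Different logical addresses can name the same physical byte.
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350

# With only 20 address lines, FFFF:0010 wraps around to address 0.
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0
```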

105 80286 First with Protected Mode. Review of 286 Protected Mode … Next

110 80386 Model
Refines 286 Protected Mode; expands to 32-bit registers; new Virtual 8086 Mode

111 80386 Review

115 80386: Real vs. Protected Modes
Larger address space: 32-bit address bus (4G). Real Mode vs. Protected Mode (refined from 286). Real Mode: power-on default mode; functions like an 8086: (1) uses only the 20 least significant address lines (1M), (2) segmented memory retained (64K). Software compatible with 286. New Real Mode features: access to the 32-bit register set; two new segment registers: FS, GS

116 80386: Real vs. Protected Modes
Protected Mode: new addressing mechanism vs. real mode; supports protection levels; segment size: 1 byte to 4G (not fixed at 64K); segment register: pointer into a descriptor table, not a base address

117 80386: Real vs. Protected Modes
Descriptor table (8 bytes per entry): 32-bit base address of segment; segment size; access rights. Memory address = base address (in table) + offset (in instruction)
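The "base address (in table) + offset (in instruction)" rule can be sketched with a simplified descriptor. The `Descriptor` class below is hypothetical: it keeps base, limit and a single writable bit as plain fields and does not reproduce the packed 8-byte hardware layout.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int       # 32-bit base address of the segment
    limit: int      # highest valid offset within the segment
    writable: bool  # stand-in for the access-rights field


def linear_address(table, index, offset, write=False):
    """Look up the descriptor, check size and rights, then add base and offset."""
    desc = table[index]
    if offset > desc.limit:
        raise MemoryError("offset beyond segment limit")
    if write and not desc.writable:
        raise PermissionError("segment is read-only")
    return (desc.base + offset) & 0xFFFFFFFF


# One read-only 64K segment starting at 1 MB (example values).
table = [Descriptor(base=0x00100000, limit=0xFFFF, writable=False)]
print(hex(linear_address(table, 0, 0x20)))  # 0x100020
```

Unlike real mode, the program never sees the base address; it supplies only a table index and an offset, and out-of-range or disallowed accesses are rejected before any address is formed.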

118 80386: Real vs. Protected Modes
Paging mechanism: map 32-bit linear address (base+offset) =>physical address & page frame address (4K page frames in system memory) 64TB of virtual memory
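With 4K pages, the 386 splits a 32-bit linear address into a 10-bit page-directory index, a 10-bit page-table index and a 12-bit offset. The sketch below uses nested dictionaries as a hypothetical stand-in for the in-memory page tables. The final line checks the 64TB figure: 16K selectable segments (8K global plus 8K local) of up to 4GB each.

```python
def split_linear(linear):
    """Return (directory index, table index, byte offset) for a 32-bit linear address."""
    directory = (linear >> 22) & 0x3FF
    table = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF
    return directory, table, offset


def translate(linear, page_tables):
    """Map a linear address to a physical one; a missing entry raises KeyError (page fault)."""
    directory, table, offset = split_linear(linear)
    frame = page_tables[directory][table]  # 4K page frame number
    return (frame << 12) | offset


# Hypothetical mapping: directory entry 1, table entry 3 -> page frame 0x1F.
page_tables = {1: {3: 0x1F}}
print(split_linear(0x00403025))                 # (1, 3, 37)
print(hex(translate(0x00403025, page_tables)))  # 0x1f025

# Virtual address space: 2**14 segments x 2**32 bytes each = 64 TB.
print(2**14 * 2**32 == 64 * 2**40)              # True
```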

119 80386: Real vs. Protected Modes
Protection mechanism: tasks/data/instructions are assigned a privilege level (PL, from 0 = most privileged to 3 = least privileged); tasks running at a less privileged level cannot access tasks or data segments at a more privileged level; running programs are protected from one another

120 80386: Real vs. Protected Modes
Two Ways to Run 8086 Programs: Real Mode Virtual 8086 Mode Virtual 8086 Mode: runs multiple 8086+other 386 (protected mode) programs independently each sees 1 MB (mapped via paging to anywhere in 4GB space) running V8086+ Protected mode simultaneously

121 80386 Processor Model

122 80386 Processor Model: BIU+CPU+MMU
BIU: controls the 32-bit address and data buses; keeps the instruction queue full (16 bytes). Address pipelining: the address of the next memory location is output halfway through the current bus cycle => more address decode time; slower memory chips are OK; easier to keep up with the faster (2 CLK) bus cycle of the 386

123 80386 Processor Model: BIU dynamic data bus sizing
switch between 16-/32-bit data bus on the fly accommodate to external 16-bit memory cards or IO devices adjust bus timing to use only the least significant 16 bits

124 80386 Processor Model: BIU External memory 4 memory banks (4x8=32bits)
BE0-BE3 for bank selection access byte or word or double word aligned operands: 1 bus cycle mis-aligned (not %4): 2 bus cycles
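The bank-select lines can be illustrated by listing which byte enables each bus cycle asserts. This sketch assumes one bus cycle per doubleword touched, so an operand needs a second cycle only when it crosses a 4-byte boundary, which is a slightly finer rule than "not %4". The function name is illustrative, and the cycles are listed in address order for readability only.

```python
def bus_cycles_386(address, size):
    """Return [(dword_address, be_mask), ...]; bit i of be_mask means BEi is active."""
    cycles = {}
    for byte_addr in range(address, address + size):
        dword = byte_addr & ~0x3  # doubleword containing this byte
        bank = byte_addr & 0x3    # memory bank 0..3 within the doubleword
        cycles[dword] = cycles.get(dword, 0) | (1 << bank)
    return sorted(cycles.items())


def show(address, size):
    return [(hex(a), format(mask, "04b")) for a, mask in bus_cycles_386(address, size)]


print(show(0x1000, 4))  # [('0x1000', '1111')]: aligned dword, all four banks
print(show(0x1002, 2))  # [('0x1000', '1100')]: word in the upper two banks
print(show(0x1003, 2))  # [('0x1000', '1000'), ('0x1004', '0001')]: crosses a boundary
```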

125 80386 Processor Model: CPU
CPU = IU (instruction unit) + EU (execution unit); fetching & execution overlap. IU: retrieves instructions from the queue, decodes them, and stores them in the decoded-instruction queue. EU: ALU + registers (32-bit); executes decoded instructions

126 80386 Processor Model: MMU
Segmentation unit: Real mode: generates the 20-bit physical address. Protected mode: stores base/size/rights in descriptor registers, caching entries of the descriptor tables held in RAM, for faster operation. Paging unit: determines physical addresses associated with active segments (divided into 4K pages); virtual memory support allows larger programs

127 80386 Programming Model General Purpose Registers
Data & Addresses Groups Status & Control Flags VM, RF, NT, IOPL Segment Group

128 80386 Programming Model Special purpose Registers

129 80386 Programming Model
Memory management: segment descriptors keep base, size, access rights; 3 types of tables: global (GDT), local (LDT), interrupt (IDT); addressing: index (into a table) + RPL, then base + offset (from instruction). Paging: TLB

130 80386 Programming Model
Protection (PL): task: CPL; instruction: RPL; data segment: DPL. Gates: special descriptors that allow access to higher-PL tasks from lower-PL tasks

131 80486 Review …

132 80486DX 1989: a polished 386, 6 new OS level instructions
virtually identical to 386 in terms of compatibility RISC design concepts fewer clock cycles per operation, a single clock cycle for most frequently used instructions Max 50MHz 5 stage execution pipeline Portions of 5 instructions execute at once

133 80486DX Highly Integrated: On board 8K memory cache FPP (equivalent to external co-processor) Twice as fast as 386 at any given clock rate 20Mhz 486 ~= 40Mhz 386

134 80486SX 80486SX NOT a 16-bit version for transition purpose
no coprocessor No internal cache For low-end applications Max. 33Mhz only

135 80486DX2/DX4: Overdrive Chips
Processor speed increased too fast Redesign of microcomputer for compatibility becomes harder Solution: Separating internal speed with external speed, improve performance independently 80486DX2/DX4 – internal clock twice/three times (NOT four times) the external clock: runs faster internally

136 80486DX2/DX4: Overdrive Chips
System board design is independent of processor upgrade (less expensive components are allowed). Processor operates at maximum speed and data rate internally. Only the slow access to external data operates at the system board rate. Internal cache offsets the speed gap. 486DX2 66: 66 internal, 33 external. 486DX4 100: 100 internal, 33 external (3x). OverDrive sockets: for upgrading 486DX/SX to 486DX2/DX4 (with OverDrive socket pin-outs).
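A rough sketch of why the internal cache offsets the speed gap. It assumes, purely for illustration, that a cache hit costs one internal clock and a miss costs one external clock; real bus cycles take longer, so the numbers are only indicative. The function names are hypothetical.

```python
def internal_clock_mhz(external_mhz, multiplier):
    """Internal clock = external (system board) clock x multiplier."""
    return external_mhz * multiplier


def average_access_ns(hit_rate, internal_mhz, external_mhz):
    """Average time per memory access under the one-clock assumption above."""
    internal_ns = 1000.0 / internal_mhz  # one internal clock period
    external_ns = 1000.0 / external_mhz  # one external clock period
    return hit_rate * internal_ns + (1.0 - hit_rate) * external_ns
```

The higher the hit rate, the closer the average access time gets to the internal clock period, even though the system board still runs at the external rate.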

137 486 Processor Features
386 features: Real/Protected modes; memory management; PLs; registers and bus sizes. New features: 6 OS instructions; 8K/16K onboard cache (was external on the 386).

138 486 Processor Features A better 386 5 stage instruction pipeline
IF/ID/EX => PF/D1/D2/EX/WB. PF: prefetch instructions => queue (2 x 16 bytes). D1: determine opcode. D2: determine memory address of operands. EX: execute indicated operation. WB: update registers.

139 486 Processor Features Reduced Instruction Cycle Times
5-stage instruction pipeline (e.g., Fig. 3.18). Instruction cycle times: 8086: 4 CLK; 80386: 2 CLK; 80486: 1 CLK (close to RISC). About 2x faster than the 386.
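A small sketch of why a full 5-stage pipeline approaches 1 CLK per instruction. It models an ideal pipeline with no stalls, which is an assumption; the stage names are the ones from the 486 pipeline, and the function names are hypothetical.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]


def pipeline_clocks(n, stages=len(STAGES)):
    """Clocks needed for n instructions in an ideal pipeline: stages + n - 1."""
    return stages + n - 1 if n > 0 else 0


def pipeline_timeline(n):
    """For each clock, map stage name -> instruction index (None if empty)."""
    timeline = []
    for clock in range(pipeline_clocks(n)):
        row = {}
        for position, name in enumerate(STAGES):
            instr = clock - position  # instruction currently in this stage
            row[name] = instr if 0 <= instr < n else None
        timeline.append(row)
    return timeline
```

Once the pipeline is full, every clock completes one instruction while portions of five instructions are in flight, which is where the 1 CLK figure comes from.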

140 486 Processor Model: 386+FPU+Cache
386 units retained: BIU, CPU, MMU. New: FPU (80387) + cache (8K/16K). FPU: 387 onboard; 0.8 µm process => number of transistors increased (275K => 1+ million); simplified system board design; speeds up FP operations.

141

142 486 Processor Model: Cache
Cache (8K; 16K on DX4). Function: bridge the processor-memory bandwidth gap. Processor clocks: 8088: 4.77 MHz; 80486: 50 MHz; Pentium: 100 MHz; Pentium Pro: 133 MHz. Main memory (DRAM): relatively slow. Fast static RAMs (SRAM) used as cache.

143 486 Processor Model: Cache
Organization: 8K, 4-way set associative: 4 direct-mapped caches wired in parallel; each block maps to a set of 4 lines. Unified: data and code in the same cache. Write-through: update both the cache and the memory page on write operations.

144 486 Processor Model: Cache
Locality (why do caches help?): spatial locality, e.g., arrays of data; temporal locality, e.g., loops in code. Operations on hit/miss. 128-bit cache lines: 32-bit x N to catch locality (N=4); 128 bits = 16 bytes.
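The cache geometry from these slides (8K, 4-way set associative, 16-byte lines) implies 128 sets. The sketch below splits an address into tag, set index, and byte offset under that geometry; the function name is hypothetical.

```python
CACHE_BYTES = 8 * 1024   # 8K cache
WAYS = 4                 # 4-way set associative
LINE_BYTES = 16          # 128-bit line = 16 bytes
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 128 sets


def split_address(addr):
    """Return (tag, set index, byte offset within the line)."""
    offset = addr % LINE_BYTES
    set_index = (addr // LINE_BYTES) % NUM_SETS
    tag = addr // (LINE_BYTES * NUM_SETS)
    return tag, set_index, offset
```

Sixteen consecutive bytes starting on a line boundary share the same tag and set index, so one line fill serves all of them; this is how the line size exploits spatial locality.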

145 486 Processor Model: Cache
Mapping: memory => many-to-many => cache. Data RAM: saves memory data. Tag RAM: saves memory address information. 3 methods of mapping: fully associative: memory block to any cache line; direct mapped: memory block to a specific line (thrashing); set associative: memory block to a set of cache lines.
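The three mapping methods differ only in how many lines a block may occupy, so one small simulator can cover all of them. This is an illustrative sketch: the block-number-modulo-sets placement and the LRU choice within a set are assumptions of the example, and the function name is hypothetical.

```python
from collections import OrderedDict


def count_misses(block_sequence, num_sets, ways):
    """Simulate a cache with num_sets sets of `ways` lines each.

    ways == 1     -> direct mapped
    num_sets == 1 -> fully associative
    otherwise     -> set associative
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in block_sequence:
        lines = sets[block % num_sets]      # set this block maps to
        if block in lines:
            lines.move_to_end(block)        # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used
            lines[block] = True
    return misses
```

Alternating between two blocks that map to the same line makes a direct-mapped cache miss on every access (thrashing), while a set-associative or fully associative cache of the same size misses only on the first touch of each block.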

146 486 Processor Model: Cache
Replacement policy (LRU). Valid bits: all 4 lines in use? NO => use any unused line; YES => find one to replace. LRU bits: which line is least recently used.
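The decision on this slide, sketched for a single 4-line set. This version tracks exact LRU order with a per-line timestamp, which is a simplification chosen for clarity; hardware typically keeps only a few LRU bits per set. The class and attribute names are hypothetical.

```python
class CacheSet:
    """One set of a set-associative cache with valid bits and LRU tracking."""

    def __init__(self, ways=4):
        self.valid = [False] * ways
        self.tags = [None] * ways
        self.last_used = [0] * ways
        self.clock = 0  # counts accesses; used as a timestamp

    def access(self, tag):
        """Return (hit, line index) and update the set."""
        self.clock += 1
        for i in range(len(self.tags)):
            if self.valid[i] and self.tags[i] == tag:
                self.last_used[i] = self.clock
                return True, i
        if False in self.valid:
            victim = self.valid.index(False)  # NO: use any unused line
        else:
            # YES: all lines in use, replace the least recently used one
            victim = self.last_used.index(min(self.last_used))
        self.valid[victim] = True
        self.tags[victim] = tag
        self.last_used[victim] = self.clock
        return False, victim
```

After the set fills with four tags, touching the first one again protects it, so the next miss replaces the second tag instead.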

147

148

149 Pentium Review …

150 Pentium: Superscalar Processor
Available in 1993. 32-bit architecture. Superscalar architecture. Scaling: scaling down the etchable feature size to increase the complexity of ICs (e.g., DRAM); 10 microns (4004) to 0.13 microns (2001). Superscalar: go beyond simply scaling down. Two instruction pipelines, each with its own ALU, address generation circuitry, and data cache interface. Executes two different instructions simultaneously.
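A toy model of dual issue with two pipelines. It pairs two adjacent instructions only when the second neither reads nor writes the register the first one writes; the real pairing rules are more restrictive, so this is a sketch under that simplifying assumption. The instruction format `(dest, sources)` and the function name are hypothetical.

```python
def issue_cycles(instructions):
    """Group instructions into clock cycles, at most two per cycle.

    instructions: list of (dest_register, tuple_of_source_registers).
    Returns a list of cycles, each a list of instruction indices.
    """
    cycles = []
    i = 0
    while i < len(instructions):
        group = [i]
        if i + 1 < len(instructions):
            dest_a, _ = instructions[i]
            dest_b, sources_b = instructions[i + 1]
            # The second instruction must not depend on the first one's result
            if dest_a not in sources_b and dest_a != dest_b:
                group.append(i + 1)
        cycles.append(group)
        i += len(group)
    return cycles
```

Independent neighbours share a cycle, while a chain of dependent instructions falls back to one instruction per cycle.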

151 Pentium: Superscalar Processor
Onboard cache: separate 8K data and code caches to avoid access conflicts. FPP: 8-stage instruction pipeline; optimized floating point functions; 5x-10x the FLOPS of the 486. 2x the performance of the 486 at any clock rate.

152 Pentium: Superscalar Processor
Compatibility with 386/486: internal 32-bit registers and address bus. Data bus expanded to 64 bits for a higher data transfer rate. Compare the 8088 to 386SX transition.

153 Pentium: Superscalar Processor
Non-clone competition from AMD and Cyrix. Development of brand identity by Intel.

154 Pentium Pro Review …

155 Pentium Pro: Two Chips in One
Became available in 1995. Superscalar of degree 3: can execute 3 instructions simultaneously. Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp). Two separate silicon dies in the same package. Processor: 0.35 µm, 5.5 million transistors. 256KB (/512K) Level 2 cache included in the package: 15.5 million transistors in a smaller area.

156 Pentium Pro: Two Chips in One
On-board Level 2 cache: simplifies system board design; requires less space; gains faster communication with the processor. Internal (level 1) cache: 8K code + 8K data. Pentium Pro 133 ~= 2x Pentium 66 ~= 4x 486DX2 66.

157 Pentium Pro: Dynamic Execution
Dynamic execution: reduces idle processor time by predicting instruction behavior. Multiple branch prediction: looks as far as 30 instructions ahead to anticipate program branches. Data flow analysis: looks at upcoming instructions and determines whether they are available for processing, depending on other instructions; determines optimal execution sequences. Speculative execution: executes instructions in a different order than entered; speculative results are stored until final states can be determined.
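A toy illustration of the data flow analysis step only: each cycle, up to three instructions whose inputs are already available are executed, regardless of program order. Branch prediction and the buffering of speculative results are not modeled, every instruction is assumed to take one cycle, and register reuse hazards are ignored; the instruction format and function name are hypothetical.

```python
def dataflow_schedule(instructions, ready_registers, width=3):
    """Schedule instructions by operand availability.

    instructions: list of (dest_register, tuple_of_source_registers).
    ready_registers: registers whose values are available at the start.
    Returns a list of cycles, each a list of instruction indices.
    """
    ready = set(ready_registers)
    pending = list(range(len(instructions)))
    schedule = []
    while pending:
        # Pick, in program order, instructions whose sources are all ready
        issued = [i for i in pending
                  if all(src in ready for src in instructions[i][1])][:width]
        if not issued:
            raise ValueError("no instruction can execute: missing input")
        schedule.append(issued)
        for i in issued:
            ready.add(instructions[i][0])  # result usable from the next cycle
        pending = [i for i in pending if i not in issued]
    return schedule
```

In the usage below, instruction 2 runs in the first cycle ahead of instruction 1, because instruction 1 is still waiting for the result of instruction 0.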

