1
Chapter 1 Microcomputers and Microprocessors
Microprocessor Evolution and Performance
2
Contents Introduction to microcomputer system Microprocessor evolution
the INTEL processor family Microprocessor performance
3
Introduction to Microcomputer
A microcomputer can be interpreted as a machine with: I/O devices for input/output, a microprocessor for processing, memory units for storage, and buses for connecting the above components. In 1970, a microcomputer was normally interpreted as a computer considerably smaller than a minicomputer, possibly using ROM for program storage.
4
Basic hardware units Input Microprocessor Memory Output
Input: e.g. keyboard, mouse. Microprocessor: e.g. 8085, 8086, MC68000. Memory: e.g. RAM, hard disk. Output: e.g. monitor, printer.
5
Buses Buses: External connections to input/output unit Major Buses:
Address bus: address of memory locations containing instructions or data Data bus: contents of memory locations Control Bus: synchronization and handshaking between components
6
General Architecture Memory Unit Primary memory Secondary memory
Microprocessing unit Input unit Output unit
7
Processor History Vacuum Tubes to IC’s
8
First Generation Computers
Vacuum tube technology. Large, air-conditioned rooms. Tube lifetime: 3,000 hours. Useless machine? 1951: first UNIVAC I (UNIVersal Automatic Computer) delivered. 1952: prediction of the presidential election on CBS. 1952: IBM Model 701 Data Processing System.
9
Second Generation Computers
The Transistor Is Born (Solid-State Era) 1948: invention of bipolar transistors 1956: Nobel physics award: Drs. William Shockley, John Bardeen and Walter H. Brattain (Bell Labs) 1954: Bell Labs: all-transistorized computer (TRADIC) 800 transistors Much less heat More reliable and less costly
10
Second Generation Computers
Mainframe Computers 1958: IBM’s 1st transistorized computer 7070/7090 1959: 1401 (business-oriented model) Built on circuit boards mounted into rack panels, or frames Main frame (mainframe): the CPU portion of the computer Popular with business and industry
11
Third Generation Computers
Invention of IC: 1959, Dr. Robert Noyce (Fairchild) and Jack Kilby (TI). Kilby: fabricating resistors, capacitors and transistors on a germanium wafer, and connecting these parts with fine gold wires. Noyce: isolating individual components with reverse-biased diodes, and depositing an adherent metal film over the circuit, thus connecting the components. 1st IC: 2-transistor multivibrator. By mid 1960s: memory chips with 1,000 components are common.
12
Third Generation Computers
1964: IBM 360 Series (32-bit) The first to use IC technology A family of 6 compatible computers 40 different I/O and auxiliary storage devices Memory capacity: 16K words to over 1MB. 32-bit registers x 16 24-bit address bus 128-bit data bus
13
Third Generation Computers
1964: IBM 360 Series (32-bit) 375,000 computations per second (<< 150 mips Pentium 100) $5 billion development cost IBM became the leading mainframe company
14
Minicomputer 1960s: Space Race between US & USSR IC industry boom
A tremendous demand by scientists and engineers for an inexpensive computer that they could operate by themselves 1965: DEC PDP-8 (by Edson de Castro’s group) Low-cost ($25,000) minicomputer 12-bit 16-bit PDP-11 Supermini …
15
Microprocessors: CPU on a Chip
1968: INTEL (Integrated Electronics) Founded by Robert Noyce and Gordon Moore (Fairchild) Original goals: semiconductor memory market 1969: customized IC’s for Busicom for calculator Ted Hoff and Stan Mazor: proposed 4-bit CPU on a single chip, plus ROM, RAM chips
16
Microprocessors: CPU on a Chip
1971: 4000 family, by Federico Faggin. 4001: 2K ROM with 4-bit I/O port. 4002: 320-bit RAM, 4-bit output port. 4003: 10-bit serial-in parallel-out shift register. 4004: 4-bit processor. Processor-on-a-chip: the microprocessor era.
17
Microprocessors: CPU on a Chip
1972: 8008, 8-bit 1974: 8080, an improved version
18
Microprocessors: CPU on a Chip
8-bit CPUs 16-bit address (64K) MC6800: Motorola 6502: MOS Technology (spin-off from Motorola) Apple-II, Apple DOS Z-80: Zilog (spin-off from Intel) Z-80 cards on Apple-II, CP/M
19
Microprocessors: CPU on a Chip
16-bit CPUs (Late 1970s) 8086, 80186, 80286: Intel PC, PC-DOS, MS-DOS, SCO-Unix MC68000: Motorola 16-bit instructions Hardware multiply and divide 20-bit address buses (1MB) Workstations: Sun3
20
Microprocessors: CPU on a Chip
32-bit CPUs 80386, 80486: Intel MC68020, 68030: Motorola 64-bit CPUs Pentium, Pentium Pro (64-bit external data bus, 32-bit internal registers, not recognized as 64-bit CPUs in terms of internal register word length)
21
Microcomputers: Computers Based on Microprocessors
1975: MITS Altair 8800 (Kit) $399, i8080, programmed by depositing 1s/0s via front panel switches Other Computers boom 8080: MITS, … 6800: SWTPC 6800, … Z-80: TRS-80, … 6502: Apple I, 8K, programmed with BASIC Steve Jobs & Steve Wozniak, millionaires from PC COM’s …
22
Personal Computers: the Open Architecture Era
1981: IBM PC. A system board (motherboard); Intel 8088 processor; 16K memory; 5 expansion slots. Third-party vendors supply various I/O adapter cards. Open architecture: a computer with interchangeable components.
23
Micro-controllers: Microcomputers on a Chip
Microcontroller: a computer on a chip. Microprocessor, plus on-chip memory, plus input/output ports. 1995: microcontrollers outsold microprocessors 10:1. Embedded in various equipment: thermostats, machine tools, communication, automotive, … Evolution: getting greater I/O capabilities. Intel: MCS-51, MCS-96, …
24
High-Performance Processors
Supercomputers Aircraft design, global climate modeling, oil-bearing formation, molecular design of new drugs, financial behavior CDC6600, 7600: Seymour Cray Cray-1: 1976, the first true supercomputer ECL, 128 KW power consumption 130 MFLOPS (Pentium 100: 150 MFLOPS) $5.1 million
25
High-Performance Processors
Parallel Processors Tens of gigaflops Multi-processors wired by a common bus Each is given a portion of the problem to solve Hypercube: early 1980s Cosmic Cube, iPSC (with i860/RISC chips) 2D rectangular Mesh architecture: multiple processor at each node Intel: teraflops computer with 4500 nodes, each powered by 2 Pentium Pro 200.
26
RISC vs. CISC RISC: Reduced Instruction Set Computer (1980s)
A small number of fixed-length instructions Simple addressing modes A large number of registers Instructions executed in one clock cycle Intel i860 (“Cray on a Chip”) 82 instructions, 32-bit long each Four addressing modes 32 general-purpose registers
27
RISC vs. CISC CISC: Complex Instruction Set Computer Intel 8086
A large number of variable length instructions Multiple addressing modes A small number of registers Multiple number of clock cycles to execute Intel 8086 Over 3000 instruction forms, 1-6 bytes 9 addressing modes 8 general-purpose registers Execution from 2 to 80+ cycles
28
RISC vs. CISC RISC Control unit is much simpler (simpler instructions, execution in 1 CLK) Faster execution with less total on-chip logic Chip area: 10% (vs 50% for CISC) More area for register file, data and instruction caches, FPU, and co-processor PowerPC: 32-bit, by IBM, Apple, Motorola Sparc: for SunMicro workstations
29
Application-Specific Processors
DSP Chips Mostly for analog signal processing ADC-DSP-DAC architecture Avoid processing analog signals using discrete circuits, involving capacitors and inductance DSP: conduct complex mathematic functions Digital filter, spectrum analysis
30
Application-Specific Processors
DSP Chip Architecture Different data/program areas: Harvard Architecture Hardware multipliers and adders, optimized to execute on a single cycle Arithmetic pipelining: several instructions operated at once Hardware loop control Multiple IO ports for communication with other processors
31
Summary of Processor History
1940s: Vacuum tube, large and consuming large power 1950s: Transistor (1948-) 1959: First IC (second industrial revolution) 1960s: IC was popular to build CPU’s. 1971: Intel 4004 microprocessor (2300 transistors) Starts of the microprocessor age Late 1970’s: 8080/85
32
Summary of Processor History
1980: RISC (reduced instruction set computer). CISC (complex instruction set computer) vs. RISC. CISC family: Intel 80x86, Pentium; Motorola series. All others are RISC series.
33
Evolution of INTEL Processors
4004 (’71) to Pentium Pro (’95-)
34
INTEL Integrated Electronics Evolution:
1968: founded by Robert Noyce and Gordon Moore. IA: Intel Architecture (e.g., IA-16, IA-32, IA-64), which since the 8008 (’72) has become the de facto standard. Evolution: internal register sizes; external bus widths; Real, Protected, and Virtual 8086 modes.
35
4-bit Processors 4004 first microprocessor became available in 1971
4-bit microprocessor: 4-bit registers & 4-bit data bus #transistors: 2250 Min. feature size: 10 microns Address bus: 10 bits/1K 0.06 MIPS MHz) No internal cache
36
8-bit Processors 8008, 8080, 8085 became available in 1974
8-bit microprocessor
37
8086: IA standard Became available in 1978
16-bit data bus 20-bit address bus (was 16-bit for 8080) memory organization: 16 segments of 64KB (1 MB limit) Re-organize CPU into BIU (bus interface unit) and EU (execution unit) Allow fetch and execution simultaneously Internal register expanded to 16-bit Allow access of low/high byte separately
38
8086 Hardware multiply and divide instructions
External math co-processor Instruction set compatible with 8080/8085 8086: defined the 80x86 architecture
39
8086 Not quite successful 16-bit data bus: Requires two separate 8-bit memory banks Memory chips were expensive
40
8088: PC standard Became available in 1979, almost identical to 8086
8-bit data bus: for hardware compatibility with 8080. 16-bit internal registers and internal data bus (same as 8086). 20-bit address bus (was 16-bit for 8080). BIU redesigned. Memory organization: 16 segments of 64KB (1 MB limit). Two memory accesses for 16-bit data (less efficient), but lower cost. 8088: used by IBM PC (1981), 16K-64K, 4.77MHz.
41
80186, 80188: High Integration CPU PC system:
8088 CPU + various supporting chips: clock generator; 8251: serial IO (RS232); 8253: timer/counter; 8255: PPI (programmable peripheral interface); 8257: DMA controller; 8259: interrupt controller. 80186/80188: 8086/8088 + supporting functions. Compatible instruction set (+ 9 new instructions).
42
80286 Became available in 1982 used in IBM AT computer (1984)
16-bit data bus clock speed 25% faster than 8088, throughput 5 times greater than 8088 24-bit address bus (16 MB) (vs. 20-bit/1M 8086)
43
80286: Real vs. Protected Modes
Larger address space: 24-bit address bus Real Mode vs. Protected Mode Real Mode: Power on default mode Function like a 8086: use 20-bit least significant address lines (1M) Software compatible with 286 16 new instructions (for Protected Mode management) Faster 286: redesigned processor, plus higher clock rate (6-8MHz)
44
80286: Real vs. Protected Modes
Multi-program environment Each program has a predetermined amount of memory Addressed via segment selector (physical addresses invisible): 16M addressable Multiple programs loaded at once (within their respective segments), protected from read/write by each other
45
80286: Real vs. Protected Modes
Cannot be switched back to real mode, to avoid illegal access by switching back and forth between modes. A faster 8086 only? MS-DOS requires that all programs be run in Real Mode.
46
Clock Speed Electrical signals cannot change instantaneously (transition period required) System clock provides timing signal for synchronization Cannot be used to compare the performance of microprocessors with different instruction sets e.g., a 66 MHz Pentium is twice as fast as a 66 MHz 80486
47
80386DX (aka. 80386) available in 1985, a major redesign of 86/286
Compatibility commitment through 2000 32-bit data and address buses (4 GB memory) Real Address Mode: 1M visible, 286 real mode Protected Virtual Address Mode: On board MMU Segmented tasks of 1byte to 4G bytes Segment base, limit, attributes defined by a descriptor register Page swapping: 4K pages, up to 64TB virtual memory space Windows, OS/2, Unix/Linux
48
80386DX (aka 80386) Virtual 8086 mode (a special Protected mode feature): permitted multiple 8086 virtual machines, multitasking (similar to real mode). Windows (multiple MS-DOS sessions). Clock rate: max. 40MHz, 2 pulses per R/W bus cycle. External memory cache to avoid wait states: fast SRAM, 93% hit rate with 64K cache. Compatible instructions (14 new).
49
80386SX 80386SX: (for transition to 32-bit)
16-bit data bus/32-bit register 24-bit address bus
50
80486DX 1989: a polished 386, 6 new OS level instructions
virtually identical to 386 in terms of compatibility RISC design concepts fewer clock cycles per operation, a single clock cycle for most frequently used instructions Max 50MHz 5 stage execution pipeline Portions of 5 instructions execute at once
51
80486DX Highly Integrated: On board 8K memory cache FPP (equivalent to external co-processor) Twice as fast as 386 at any given clock rate 20Mhz 486 ~= 40Mhz 386
52
80486SX 80486SX NOT a 16-bit version for transition purpose
no coprocessor No internal cache For low-end applications Max. 33Mhz only
53
80486DX2/DX4: Overdrive Chips
Processor speed increased too fast Redesign of microcomputer for compatibility becomes harder Solution: Separating internal speed with external speed, improve performance independently 80486DX2/DX4 – internal clock twice/three times (NOT four times) the external clock: runs faster internally
54
80486DX2/DX4: Overdrive Chips
System board design is independent of processor upgrade (less expensive components are allowed) Processor operate at maximum speed data rate internally Only slow access to external data operates at system board rate Internal cache offset the speed gap 486DX2 66: 66 internal, 33 external 486DX4 100: 100 internal, 33 external (3x) Overdrive sockets: for upgrading 486dx/sx to 486dx2/dx4 (with overdrive socket pin-outs)
55
Pentium: Superscalar Processor
Available in 1993. 32-bit architecture. Superscalar architecture. Scaling: scaling down etchable feature size to increase complexity of IC (e.g., DRAM), from 10 microns (4004) to 0.13 microns (2001). Superscalar: go beyond simply scaling down. Two instruction pipelines, each with its own ALU, address generation circuitry, and data cache interface. Execute two different instructions simultaneously.
56
Pentium: Superscalar Processor
Onboard cache Separate 8K data and code caches to avoid access conflicts FPP Instruction pipeline: 8 stage Optimized floating point functions 5x-10x FLOP’s of 486 2x performance of 486 at any clock rate
57
Pentium: Superscalar Processor
Compatibility with 386/486: Internal 32-bit registers and address bus Data bus expanded to 64-bits for higher data transfer rate Compare 8088 to 386sx transition
58
Pentium: Superscalar Processor
non-clone competition from AMD, Cyrix development of brand identity by Intel
59
Pentium Pro: Two Chips in One
Became available in 1995. Superscalar of degree 3: can execute 3 instructions simultaneously. Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp). Two separate silicon dies in the same package. Processor: 0.35 µm, 5.5 million transistors. 256KB (or 512KB) Level 2 cache included on chip, 15.5 million transistors in a smaller area.
60
Pentium Pro: Two Chips in One
On Board Level 2 cache Simplifies system board design Requires less space Gains faster communication with processor Internal (level 1) cache: 8K Pentium Pro 133 ~= 2x Pentium 66 ~= 4x 486DX2 66
61
Pentium Pro:Dynamic Execution
Dynamic execution: reduces idle processor time by predicting instruction behavior. Multiple Branch Prediction: looks as far as 30 instructions ahead to anticipate program branches. Data Flow Analysis: looks at upcoming instructions and determines whether they are available for processing, depending on other instructions, then determines the optimal execution sequence. Speculative Execution: executes instructions in a different order than entered. Speculative results are stored until final states can be determined.
62
What’s More from Moore’s Law?
Processor Future What’s More from Moore’s Law?
63
Moore's Law In 1965, Gordon Moore predicted that:
“The number of transistors per integrated circuit would double every year” (later popularized as every 18 months). He forecast that this trend would continue through 1975.
64
Moore’s Law
65
Other Microprocessors
Motorola family from 6809 (Apple II) through 68040 PowerPC joint venture between Apple, IBM, and Motorola RISC Processors DEC Alpha, MIPS, Sun SPARC, etc.
66
CISC vs. RISC CISC (Complex Instruction Set Computer)
CISC processors have a large versatile instruction set that supports many complex addressing modes move complexity from software to hardware RISC (Reduced Instruction Set Computer) RISC processors have a small instruction set move complexity from hardware to software
67
Microprocessor Performance
Two main factors: Response time: the time between the start and completion of a task, also referred to as execution time. Throughput: the total amount of work done in a given time.
68
MIPS Million Instructions Per Second
$\mathrm{MIPS} = \dfrac{\text{instruction count}}{\text{execution time (in seconds)} \times 10^{6}}$. It specifies performance inversely to execution time. Faster machines have a higher MIPS rating.
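As a minimal sketch of the rating defined on this slide, taking execution time in seconds: the function name `mips` and the workload figures are hypothetical, chosen only for illustration.

```python
def mips(instruction_count: int, execution_time_s: float) -> float:
    """Return the MIPS rating: instructions / (seconds * 10**6)."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return instruction_count / (execution_time_s * 1e6)


# Hypothetical workload: 50 million instructions finishing in 2 seconds.
print(mips(50_000_000, 2.0))  # 25.0

# The same program finishing in half the time doubles the rating.
print(mips(50_000_000, 1.0))  # 50.0
```

The rating depends on the instruction count of the particular program, which is why it varies between programs on the same machine.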
69
Some Problems of MIPS Cannot compare computers with different instruction sets, since the instruction count will certainly differ MIPS varies between programs on the same computer
70
iCOMP An index provided by Intel for comparison of performance of their 32-bit microprocessors Based on a variety of performance components that represent integer mathematics, graphics, etc. Combine results of a set of software application benchmarks
72
Chapter 2 Computer Codes, Programming, and Operating Systems
Number Systems Computer Codes Programming Operating Systems
73
Number Systems Decimal: Base 10 Binary: Base 2 Octal: Base 8
Hexadecimal: Base 16
74
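Base Conversion: 2 ↔ 10
Binary to Decimal: $D = \sum_{i=0}^{n-1} b_i \times 2^i$. Decimal to Binary, by repeated subtraction: $D' = \sum_{i=0}^{m-1} b_i \times 2^i = D - 2^m$ (with $b_m = 1$), then $D \leftarrow D'$ and $m \leftarrow m'$, where $m'$ is the largest exponent such that $b_{m'} = 1$. Decimal to Binary, by long division: $D' = \lfloor D/2 \rfloor$ with remainder $b_i$, then $D \leftarrow D'$.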
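The three conversions on this slide can be sketched as follows. The function names are hypothetical, and the code handles non-negative integers only.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum b_i * 2**i over the bit string (leftmost bit is most significant)."""
    total = 0
    for i, bit in enumerate(reversed(bits)):
        if bit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {bit!r}")
        total += int(bit) * 2**i
    return total


def decimal_to_binary_subtract(d: int) -> str:
    """Repeated subtraction: remove the largest remaining power of two each step."""
    if d < 0:
        raise ValueError("non-negative integers only")
    if d == 0:
        return "0"
    m = d.bit_length() - 1  # largest exponent with 2**m <= d
    bits = []
    for exp in range(m, -1, -1):
        if d >= 2**exp:
            bits.append("1")
            d -= 2**exp
        else:
            bits.append("0")
    return "".join(bits)


def decimal_to_binary_divide(d: int) -> str:
    """Long division: collect remainders of division by 2, then reverse them."""
    if d < 0:
        raise ValueError("non-negative integers only")
    if d == 0:
        return "0"
    bits = []
    while d > 0:
        d, remainder = divmod(d, 2)
        bits.append(str(remainder))
    return "".join(reversed(bits))


print(binary_to_decimal("1101"))       # 13
print(decimal_to_binary_subtract(13))  # 1101
print(decimal_to_binary_divide(13))    # 1101
```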
76
MCS-51 Program Development
[Figure: MCS-51 program development tool chain. Labels: Editor (.ASM), Assembler X8051 (.OBJ), Linker Link (.HEX), Symbol Converter CVTSYM (.SYM, .SDT), ICE Program, Target.]
77
Chapter 3 80x86 Processor Architecture
8086/88 Segmented Memory 80386 80486 Pentium Pentium Pro
78
Processor Model Programming Model
The 8086 and 8088 Processor Model Programming Model
79
8086: IA standard Became available in 1978
16-bit data bus 20-bit address bus (was 16-bit for 8080) memory organization: 16 segments of 64KB (1 MB limit) Re-organize CPU into BIU (bus interface unit) and EU (execution unit) Allow fetch and execution simultaneously Internal register expanded to 16-bit Allow access of low/high byte separately
80
8088: PC standard Became available in 1979, almost identical to 8086
8-bit data bus: for hardware compatibility with 8080 16-bit internal registers and data bus (same as 8086) 20-bit address bus (was 16-bit for 8080) BIU re-designed memory organization: 16 segments of 64KB (1 MB limit) Two memory accesses for 16-bit data (less efficient) But less cost 8088: used by IBM PC (1982), 16K-64K, 4.77MHz
81
80186, 80188: High Integration CPU PC system:
8088 CPU + various supporting chips Clock generator 8251: serial IO (RS232) 8253: timer/counter 8255: PPI (programmable periphial interface) 8257: DMA controller 8259: interrupt controller 80186/80188: 8086/ supporting functions Compatible instruction set (+ 9 new instructions)
82
8086 Processor Model: BIU+EU
BIU: memory & IO address generation. EU: receives code and data from the BIU; not connected to system buses; executes instructions; saves results in registers, or passes them to the BIU for memory and IO.
83
8086 Processor Model
[Figure: 8086 block diagram. EU: AH/AL, BH/BL, CH/CL, DH/DL, SP, BP, SI, DI, ALU, Flags. BIU: CS, DS, SS, ES, IP, Instruction Queue, Address Generation and Bus Control.]
84
Fetch and Execution Cycle
BIU+EU allows the fetch and execution cycle to overlap 0. System boot, Instruction Queue is empty 1. IP =>BIU=> address bus && IP++ 2. Mem[(IP-1)] => Instruction Queue[tail++] 3a. InstrQ[head] => EU => execution 3b. Mem[IP++] => InstrQ[tail++] Maybe multiple instructions Repeat 3a+3b (overlapped)
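The overlap of steps 3a and 3b can be sketched with a toy simulation. This is a deliberately simplified model, not the real 8086 timing: it assumes every instruction takes one cycle to fetch and one cycle to execute, and the names `run_overlapped` and `run_sequential` are hypothetical.

```python
from collections import deque


def run_overlapped(program, queue_size=6):
    """Simulate BIU prefetch overlapped with EU execution; return (trace, cycles)."""
    queue = deque()  # instruction queue, empty at system boot (step 0)
    ip = 0           # instruction pointer into `program`
    executed = []
    cycles = 0
    while len(executed) < len(program):
        cycles += 1
        # 3a. EU executes the instruction at the head of the queue, if any.
        if queue:
            executed.append(queue.popleft())
        # 3b. BIU fetches Mem[IP++] into the tail of the queue if there is room.
        if ip < len(program) and len(queue) < queue_size:
            queue.append(program[ip])
            ip += 1
    return executed, cycles


def run_sequential(program):
    """Baseline without overlap: fetch, then execute, one instruction at a time."""
    return list(program), 2 * len(program)


program = ["MOV", "ADD", "INC", "SUB"]
print(run_overlapped(program))  # (['MOV', 'ADD', 'INC', 'SUB'], 5)
print(run_sequential(program))  # (['MOV', 'ADD', 'INC', 'SUB'], 8)
```

Under these assumptions only the first fetch is exposed; every later fetch is hidden behind the execution of the previous instruction.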
85
Waiting Conditions: Memory Access
BIU+EU: execute (almost) continuously without waiting Waiting Conditions: Accessing memory locations not in queue BIU suspend instruction fetch Issues external memory address Resumes instruction fetch and execution
86
Waiting Conditions: Jump
Next Jump Instruction Instructions in queue are discarded EU wait for the next instruction after the jump location to be fetched by BIU Resume execution
87
Waiting Conditions: Long Instructions
A long instruction is being executed. Instruction queue full: BIU waits. Instruction fetch resumes after the EU pulls one or two bytes from the queue.
88
BIU: 8088 vs. 8086 BIU is the major difference 8088:
data bus: 8-bit (vs. 16-bit/8086) Instruction queue: 4 bytes (vs. 6-byte/8086) Only 30% slower than 8086 If queue is kept full
89
8086 Programming Model BH BL AH AL DH DL CH CL BP DI SI SP CS ES SS DS
IP Flags H Flags L
90
8086 Programming Model Data Group: AX (AH+AL): Accumulator
BX (BH+BL): Base CX (CH+CL): Counter DX (DH+DL): Data
91
8086 Programming Model Segment Group: Segment Registers:
CS: Code Segment DS: Data Segment ES: Extra Segment SS: Stack Segment Segment Registers: Base address to particular segments
92
8086 Programming Model Pointer/Index Group: Index Registers:
IP: Instruction Pointer (paired with CS). SI: Source Index (paired with DS). DI: Destination Index (paired with ES). SP: Stack Pointer (paired with SS). Index Registers: index (offset) or pointer to a base address.
93
8086 Flag Word
Flag L: SF ZF X AF X PF X CF. CF: Carry Flag. CF = 0: no carry (ADD) or borrow (SUB); CF = 1: high-order bit carry/borrow. PF: (Even) Parity Flag (even number of 1’s in low-order 8 bits of result). AF: Aux. Carry: carry/borrow on bit 3 (low nibble of AL). ZF: Zero Flag (1: result is zero). SF: Sign Flag (0: positive, 1: negative).
94
8086 Flag Word Flag H: X X X X OF DF IF TF TF: Trap flag (single-step after next instruction; clear by single-step interrupt) IF: Interrupt-Enable: enable maskable interrupts DF: Direction flag: auto-decrement (1) or increment(0) index on string operations OF: Overflow: signed result cannot be expressed within #bits in destination operand
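To make the arithmetic flags concrete, here is a small sketch that adds two 8-bit values and derives CF, PF, AF, ZF, SF and OF as described on the two flag slides. The helper name `add8_flags` is hypothetical, and only byte-sized addition is modeled.

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and report the result plus the arithmetic flags."""
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF
    return {
        "result": result,
        "CF": int(full > 0xFF),                       # carry out of the high-order bit
        "PF": int(bin(result).count("1") % 2 == 0),   # even number of 1 bits
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),    # carry out of bit 3
        "ZF": int(result == 0),                       # result is zero
        "SF": result >> 7,                            # copy of the sign bit
        # Signed overflow: both operands differ in sign from the result.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }


print(add8_flags(0x7F, 0x01))
# {'result': 128, 'CF': 0, 'PF': 0, 'AF': 1, 'ZF': 0, 'SF': 1, 'OF': 1}
print(add8_flags(0xFF, 0x01))
# {'result': 0, 'CF': 1, 'PF': 1, 'AF': 1, 'ZF': 1, 'SF': 0, 'OF': 0}
```

The first case overflows as a signed addition (+127 + 1) without an unsigned carry; the second carries as an unsigned addition (255 + 1) without signed overflow (-1 + 1 = 0).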
95
Segmented Memory Linear vs. Segmented Linear Addressing: Segmented:
Linear addressing: the entire memory is regarded as a whole; the entire memory space is available all the time. Segmented: memory is divided into segments; a process is limited to accessing designated segments at a given time.
96
8086 Memory Organization Even and Odd Memory Banks
16-bit data bus: two-byte / two one-byte access. Allows the processor to work on bytes or on words (16-bit). IO operations are normally conducted in bytes. Can handle odd-length instructions: single-byte instructions; multiple-byte (and very long) instructions.
97
8086 Memory Organization Memory Space: Memory Banks 20-bit address bus
Linearly, 1M bytes directly addressable Memory Banks Can read 16-bit data (512K words) from even and odd-addressed simultaneously need Two memory banks in parallel BHE control line: allows addressing even/odd banks or both
98
Memory Organization: Alignment
Endianness: one way to model a multi-byte CPU register (AX = AH+AL); two ways to store operands in memory. Big-endian CPU (IBM 370, M68*, SPARC): high-order-byte-first (HOBF). Maps the highest-order byte of the internal register to the lowest (1st) memory byte address. Operand address = address of MSB. MOV R1, N: N is the 1st byte in memory and the MSB of the register.
99
Memory Organization: Alignment
Little-endian CPU (DEC, Intel): low-order-byte-first (LOBF). Maps the lowest-order byte of the register to the 1st memory byte. Operand address = address of LSB (1st memory byte). MOV AX, N: N is the 1st byte in memory and the LSB of the register (AL ← N, AH ← N+1). Configurable: can switch between big- and little-endian, or provide instructions that convert 16-/32-bit data between the two byte orderings (80486).
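The two byte orderings can be observed directly with Python's standard `struct` module. The value 0x1234 and the helper name `swap32` are illustrative assumptions; `swap32` only mimics the effect of a 32-bit byte-order conversion instruction.

```python
import struct

value = 0x1234  # a 16-bit operand, e.g. the contents of AX (AH=0x12, AL=0x34)

little = struct.pack("<H", value)  # low-order byte first (Intel)
big = struct.pack(">H", value)     # high-order byte first (IBM 370, M68*, SPARC)

print(little.hex())  # 3412 -> the first memory byte holds AL
print(big.hex())     # 1234 -> the first memory byte holds the MSB


def swap32(x: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")


print(hex(swap32(0x12345678)))  # 0x78563412
```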
100
8086 Memory Organization Aligned operand Mis-aligned words:
Aligned operand: operand aligned at even-byte (word/dword) boundaries. Allows a single access to read/write one operand, through an internal shift/swap mechanism if necessary. Mis-aligned words: word operand does not start at an even address. Needs 2 read cycles to read/write the word (8086): issues two addresses to access the two even-aligned words containing the operand. Slower, but transparent to the programmer.
101
8086 Memory Organization 8088 always 2 cycles for word operations
Aligned or not Because of 8-bit external data bus Single memory bank is sufficient
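The alignment rule for word operands on the 8086 and 8088 can be written as a tiny function. This is a sketch of the rule as stated on these slides, counting bus cycles only; the function name is hypothetical.

```python
def word_access_cycles(address: int, data_bus_bits: int = 16) -> int:
    """Bus cycles needed to read or write one 16-bit word at `address`."""
    if data_bus_bits == 8:
        return 2  # 8088: two byte transfers, aligned or not
    if data_bus_bits == 16:
        # 8086: an even address fits one transfer; an odd one straddles two words.
        return 1 if address % 2 == 0 else 2
    raise ValueError("only 8-bit (8088) and 16-bit (8086) buses are modeled")


print(word_access_cycles(0x1000))      # 1 (8086, aligned)
print(word_access_cycles(0x1001))      # 2 (8086, mis-aligned)
print(word_access_cycles(0x1000, 8))   # 2 (8088)
```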
102
8086 Memory Map Memory Map: How memory space is allocated
ROM area: boot, BIOS. RAM: OS/user apps & data. Unused. Reserved: for future hardware/software uses. Dedicated: for specific system interrupt and reset functions, etc.
103
Segment Registers 64K memory segments x 16 16-bit offset each
CS, DS, ES, SS
104
Logical and Physical Addresses
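Physical address: 20-bit. Logical address: 16-bit segment plus 16-bit offset. Segments start on 16-byte boundaries. Address translation: physical address = segment × 16 + offset, e.g., CS:IP.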
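Because segments begin on 16-byte boundaries, the 20-bit physical address is the 16-bit segment value shifted left by four bits plus the 16-bit offset. A minimal sketch, with a hypothetical function name and example values:

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 translation: segment * 16 + offset, wrapped to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF


# CS:IP = 1234:0010 -> 0x12340 + 0x0010
print(hex(physical_address(0x1234, 0x0010)))  # 0x12350

# Different logical addresses can name the same physical byte.
print(physical_address(0x1235, 0x0000) == physical_address(0x1234, 0x0010))  # True
```

The final mask models the 20 address lines of the 8086: a sum that exceeds 1 MB wraps around to the bottom of memory.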
105
80286 First with Protection Mode Review of 286 Protected Mode … Next
106
80286 Became available in 1982 used in IBM AT computer (1984)
16-bit data bus clock speed 25% faster than 8088, throughput 5 times greater than 8088 24-bit address bus (16 MB) (vs. 20-bit/1M 8086)
107
80286: Real vs. Protected Modes
Larger address space: 24-bit address bus Real Mode vs. Protected Mode Real Mode: Power on default mode Function like a 8086: use 20-bit least significant address lines (1M) Software compatible with 286 16 new instructions (for Protected Mode management) Faster 286: redesigned processor, plus higher clock rate (6-8MHz)
108
80286: Real vs. Protected Modes
Multi-program environment Each program has a predetermined amount of memory Addressed via segment selector (physical addresses invisible): 16M addressable Multiple programs loaded at once (within their respective segments), protected from read/write by each other
109
80286: Real vs. Protected Modes
Cannot be switched back to real mode, to avoid illegal access by switching back and forth between modes. A faster 8086 only? MS-DOS requires that all programs be run in Real Mode.
110
80386 Model Refine 286 Protect Mode Expand to 32-bit registers
New Virtual 8086 Mode
111
80386 Review
112
80386DX (aka. 80386) available in 1985, a major redesign of 86/286
Compatibility commitment through 2000 32-bit data and address buses (4 GB memory) Real Address Mode: 1M visible, 286 real mode Protected Virtual Address Mode: On board MMU Segmented tasks of 1byte to 4G bytes Segment base, limit, attributes defined by a descriptor register Page swapping: 4K pages, up to 64TB virtual memory space Windows, OS/2, Unix/Linux
113
80386DX (aka 80386) Virtual 8086 mode (a special Protected mode feature): permitted multiple 8086 virtual machines, multitasking (similar to real mode). Windows (multiple MS-DOS sessions). Clock rate: max. 40MHz, 2 pulses per R/W bus cycle. External memory cache to avoid wait states: fast SRAM, 93% hit rate with 64K cache. Compatible instructions (14 new).
114
80386SX 80386SX: (for transition to 32-bit)
16-bit data bus/32-bit register 24-bit address bus
115
80386: Real vs. Protected Modes
Larger address space: 32-bit address bus (4G). Real Mode vs. Protected Mode (refined from 286). Real Mode: power-on default mode. Functions like an 8086: (1) uses only the 20 least significant address lines (1M); (2) segmented memory retained (64K). Software compatible with 286. New Real Mode features: access to the 32-bit register set; two new segment registers: FS, GS.
116
80386: Real vs. Protected Modes
new addressing mechanism vs. real mode supports protection levels segment size: 1 to 4G (not 64K, fixed) segment register: pointer to a descriptor table not base address
117
80386: Real vs. Protected Modes
descriptor table: (8 byte per entry) 32-bit base address of segment segment size access rights memory address = base address (in table) + offset (in instruction)
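The table lookup described on this slide can be sketched as below. The `Descriptor` class is a simplified stand-in (a real entry packs these fields into 8 bytes), `limit` here means the highest valid offset, and the single `writable` flag stands in for the access-rights field. All names are hypothetical.

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Raised when an access violates the segment limit or access rights."""


@dataclass
class Descriptor:
    base: int       # 32-bit base address of the segment
    limit: int      # highest valid offset within the segment
    writable: bool  # simplified access rights


def linear_address(table, index, offset, write=False):
    """memory address = base address (from the table) + offset (from the instruction)."""
    descriptor = table[index]
    if offset > descriptor.limit:
        raise ProtectionFault("offset beyond segment limit")
    if write and not descriptor.writable:
        raise ProtectionFault("segment is read-only")
    return (descriptor.base + offset) & 0xFFFFFFFF


table = [
    Descriptor(base=0x00100000, limit=0xFFFF, writable=False),  # code-like segment
    Descriptor(base=0x00200000, limit=0x0FFF, writable=True),   # data-like segment
]
print(hex(linear_address(table, 0, 0x1234)))              # 0x101234
print(hex(linear_address(table, 1, 0x0010, write=True)))  # 0x200010
```

Unlike real mode, the segment register only selects a table entry, so the program never sees or computes the base address itself.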
118
80386: Real vs. Protected Modes
Paging mechanism: map 32-bit linear address (base+offset) =>physical address & page frame address (4K page frames in system memory) 64TB of virtual memory
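A rough sketch of the paging step: with 4K pages, the low 12 bits of the linear address are the offset within the page, and on the 80386 the upper 20 bits split into a 10-bit page-directory index and a 10-bit page-table index. Modeling the directory and tables as Python dictionaries is an assumption made for brevity, as are the function names and the example mapping.

```python
PAGE_SIZE = 4096  # 4K page frames


def split_linear(addr: int):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    return (addr >> 22) & 0x3FF, (addr >> 12) & 0x3FF, addr & 0xFFF


def translate(addr: int, page_directory: dict) -> int:
    """Walk directory -> table -> frame; a KeyError stands in for a page fault."""
    dir_index, table_index, offset = split_linear(addr)
    page_table = page_directory[dir_index]
    frame_base = page_table[table_index]  # physical address of the 4K frame
    return frame_base + offset


# Hypothetical mapping: linear page at directory 1, table 3 -> frame 0x0009A000.
page_directory = {1: {3: 0x0009A000}}
print(split_linear(0x00403025))                    # (1, 3, 37)
print(hex(translate(0x00403025, page_directory)))  # 0x9a025

# 64TB of virtual memory: 2**14 segments of up to 4GB (2**32 bytes) each.
print(2**14 * 2**32 == 64 * 2**40)  # True
```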
119
80386: Real vs. Protected Modes
Protection mechanism: tasks/data/instructions are assigned a privilege level (PL) tasks running at lower PL cannot access tasks or data segments at a higher PL running programs that are protected from the others
120
80386: Real vs. Protected Modes
Two Ways to Run 8086 Programs: Real Mode Virtual 8086 Mode Virtual 8086 Mode: runs multiple 8086+other 386 (protected mode) programs independently each sees 1 MB (mapped via paging to anywhere in 4GB space) running V8086+ Protected mode simultaneously
121
386 80386 Processor Model
122
80386 Processor Model: BIU+CPU+MMU
control 32-bit address and data buses keep instruction queue full (16 bytes) Address pipelining address of next memory location is output halfway through current bus cycle more address decode time slower memory chip is OK easier to keep up with faster (2 CLK) bus cycle of 386
123
80386 Processor Model: BIU dynamic data bus sizing
switch between 16-/32-bit data bus on the fly accommodate to external 16-bit memory cards or IO devices adjust bus timing to use only the least significant 16 bits
124
80386 Processor Model: BIU External memory 4 memory banks (4x8=32bits)
BE0-BE3 for bank selection access byte or word or double word aligned operands: 1 bus cycle mis-aligned (not %4): 2 bus cycles
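The bank-select lines can be illustrated by computing which of BE0 to BE3 are active for a given operand. This sketch assumes a second bus cycle is needed only when the operand crosses a 4-byte boundary; the function name and return format are hypothetical.

```python
def bus_cycles_386(address: int, size: int):
    """Return one (aligned dword address, [BE0, BE1, BE2, BE3]) pair per bus cycle."""
    if size not in (1, 2, 4):
        raise ValueError("operand must be a byte, word, or double word")
    cycles = []
    remaining = size
    while remaining > 0:
        lane = address % 4              # first byte lane used in this cycle
        count = min(remaining, 4 - lane)
        enables = [lane <= bank < lane + count for bank in range(4)]
        cycles.append((address - lane, enables))
        address += count
        remaining -= count
    return cycles


print(bus_cycles_386(0x1000, 4))
# [(4096, [True, True, True, True])]
print(bus_cycles_386(0x1002, 4))
# [(4096, [False, False, True, True]), (4100, [True, True, False, False])]
```

An aligned double word enables all four banks in one cycle; the same operand at address 0x1002 needs the upper two banks in one cycle and the lower two banks of the next double word in a second.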
125
80386 Processor Model: CPU CPU=IU (instruction) +EU (execution) IU:
fetching & execution overlap. IU: retrieves instructions from the queue, decodes them, and stores them in the decoded queue. EU: ALU + registers (32-bit); executes decoded instructions.
126
80386 Processor Model: MMU Segmentation unit Paging Unit
Real mode: generate the 20-bit physical address Protected mode: store base/size/rights in descriptor registers cache descriptor tables in RAM faster operations Paging Unit determines physical addresses associated with active segments (divided into 4K pages) virtual memory support to allow larger programs
127
80386 Programming Model General Purpose Registers
Data & Addresses Groups Status & Control Flags VM, RF, NT, IOPL Segment Group
128
80386 Programming Model Special purpose Registers
129
80386 Programming Model Memory Management segment descriptors Paging
keep base, size, access rights 3 types of tables: global (GDT), local (LDT), interrupt (IDT) addressing: index (to a table) + RPL base + offset (from instruction) Paging TLB
130
80386 Programming Model Protection (PL) Gates task: CPL
instruction: RPL. Data segment: DPL. Gates: special descriptors that allow access to higher PL tasks from lower PL tasks.
131
80486 Review …
132
80486DX 1989: a polished 386, 6 new OS level instructions
virtually identical to 386 in terms of compatibility RISC design concepts fewer clock cycles per operation, a single clock cycle for most frequently used instructions Max 50MHz 5 stage execution pipeline Portions of 5 instructions execute at once
133
80486DX Highly Integrated: On board 8K memory cache FPP (equivalent to external co-processor) Twice as fast as 386 at any given clock rate 20Mhz 486 ~= 40Mhz 386
134
80486SX 80486SX NOT a 16-bit version for transition purpose
no coprocessor No internal cache For low-end applications Max. 33Mhz only
135
80486DX2/DX4: Overdrive Chips
Processor speed increased too fast Redesign of microcomputer for compatibility becomes harder Solution: Separating internal speed with external speed, improve performance independently 80486DX2/DX4 – internal clock twice/three times (NOT four times) the external clock: runs faster internally
136
80486DX2/DX4: Overdrive Chips
System board design is independent of processor upgrade (less expensive components are allowed) Processor operate at maximum speed data rate internally Only slow access to external data operates at system board rate Internal cache offset the speed gap 486DX2 66: 66 internal, 33 external 486DX4 100: 100 internal, 33 external (3x) Overdrive sockets: for upgrading 486dx/sx to 486dx2/dx4 (with overdrive socket pin-outs)
137
486 Processor Features
386 features: Real/Protected Modes; Memory Management; PL's; registers & bus sizes
New features: 6 OS instructions; 8K/16K onboard cache (cache was external on the 386)
138
486 Processor Features: A Better 386
5-stage instruction pipeline: IF/ID/EX => PF/D1/D2/EX/WB
PF: prefetch instructions => queue (2 x 16 bytes)
D1: determine opcode
D2: determine memory address of operands
EX: execute indicated operation
WB: update register
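To see why "portions of 5 instructions execute at once", the stage occupancy can be tabulated cycle by cycle. This is an idealized sketch that assumes one stage per clock and no stalls; real code has stalls, and the function names are hypothetical.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]


def pipeline_schedule(n_instructions):
    """Map each clock cycle to {stage: instruction index} for an ideal pipeline."""
    schedule = {}
    for instr in range(n_instructions):
        for depth, stage in enumerate(STAGES):
            # Instruction `instr` enters PF in cycle instr + 1 and
            # advances one stage per clock.
            schedule.setdefault(instr + depth + 1, {})[stage] = instr
    return schedule


def total_cycles(n_instructions):
    """Cycles to finish n instructions: fill the pipe once, then 1 per clock."""
    return n_instructions + len(STAGES) - 1 if n_instructions else 0


for cycle, stages in sorted(pipeline_schedule(5).items()):
    row = "  ".join(f"{s}:i{stages[s]}" if s in stages else f"{s}:--" for s in STAGES)
    print(f"clk {cycle}: {row}")
```

Once the pipeline is full, one instruction completes per clock even though each instruction still spends five clocks in flight.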
139
486 Processor Features: Reduced Instruction Cycle Times
5-stage instruction pipeline (e.g., Fig. 3.18)
Instruction cycle times: 8086: 4 CLK; 80386: 2 CLK; 80486: 1 CLK (close to RISC)
About 2X faster than 386
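The "2X faster" figure follows directly from the cycle counts. The sketch below treats the per-instruction clock counts from the slide as averages, which is a simplification; the names are illustrative.

```python
# Clocks per instruction as listed on the slide (treated as averages).
CLOCKS_PER_INSTRUCTION = {"8086": 4, "80386": 2, "80486": 1}


def instructions_per_second(cpu, clock_mhz):
    """Rough throughput: clock rate divided by clocks per instruction."""
    return clock_mhz * 1_000_000 / CLOCKS_PER_INSTRUCTION[cpu]


# A 20 MHz 486 and a 40 MHz 386 come out equal under this model.
rate_486 = instructions_per_second("80486", 20)
rate_386 = instructions_per_second("80386", 40)
```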
140
486 Processor Model: 386+FPU+Cache
386 units retained: BIU, CPU, MMU
New: FPU (80387) + Cache (8K/16K)
FPU: 387 onboard
0.8 µm process => number of transistors increased (275K => 1+ million)
Simplified system board design
Speeds up FP operations
142
486 Processor Model: Cache
Cache: 8K (16K on DX4)
Function: bridge the processor-memory bandwidth gap
Processor clocks: 8088: 4.77 MHz; 80486: 50 MHz; Pentium: 100 MHz; Pentium Pro: 133 MHz
Main memory (DRAM): relatively slow
Fast static RAMs (SRAM) as cache
143
486 Processor Model: Cache
Organization: 8K, 4-way set associative
4 direct-mapped caches wired in parallel
Each block maps to a set of 4 lines
Unified: data & code in the same cache
Write-through: update cache and memory page on write operations
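The organization above fixes how an address is split into tag, set index and byte offset. The sketch assumes the 16-byte (128-bit) line size given on the cache-line slide; with 8K and 4 ways that yields 128 sets. Function and constant names are made up for the example.

```python
CACHE_BYTES = 8 * 1024
WAYS = 4
LINE_BYTES = 16  # 128-bit cache line

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 8192 / 64 = 128 sets


def split_address(addr):
    """Split a 32-bit address into (tag, set index, byte offset)."""
    offset = addr % LINE_BYTES                  # low 4 bits
    set_index = (addr // LINE_BYTES) % NUM_SETS  # next 7 bits
    tag = addr // (LINE_BYTES * NUM_SETS)        # remaining 21 bits
    return tag, set_index, offset


tag, set_index, offset = split_address(0x12345678)
```

The set index selects one line in each of the 4 ways; the stored tags of those 4 lines are then compared against the address tag to decide hit or miss.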
144
486 Processor Model: Cache
Locality (why caches help?)
Spatial locality: e.g., array of data
Temporal locality: e.g., loops in code
Operations on hit/miss
128-bit cache lines: 32-bit x N to catch locality (N=4); 128-bit = 16-byte
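Spatial locality with 16-byte lines can be quantified for the array case. The sketch assumes 32-bit elements read in order and, for simplicity, a cache large enough that nothing is evicted; the function name is hypothetical.

```python
LINE_BYTES = 16  # 128-bit line = 4 x 32-bit words


def sequential_hit_rate(n_words, word_bytes=4):
    """Hit rate when reading an array front to back, one word at a time."""
    loaded_lines = set()
    hits = 0
    for i in range(n_words):
        line = (i * word_bytes) // LINE_BYTES
        if line in loaded_lines:
            hits += 1               # neighbour was fetched with an earlier miss
        else:
            loaded_lines.add(line)  # miss: whole 16-byte line is filled
    return hits / n_words


rate = sequential_hit_rate(1024)
```

Each miss brings in four words, so three of every four sequential reads hit.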
145
486 Processor Model: Cache
Mapping: memory => many-to-many => cache
Data RAM: saves memory data
Tag RAM: saves memory address information
3 methods of mapping:
Fully associative: memory block to any cache line
Direct map: memory block to a specific line (risk of thrashing)
Set associative: memory block to a set of cache lines
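The thrashing problem of direct mapping shows up with just two addresses that share a line. The simulator below is a minimal sketch: one function covers all three mapping methods by varying the number of sets and ways, and it uses true LRU replacement as an assumption. All names are illustrative.

```python
from collections import OrderedDict


def count_misses(addresses, num_sets, ways, line_bytes=16):
    """Count misses for an access trace on a cache of num_sets x ways lines."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict least recently used line
            lines[tag] = True
    return misses


# Two addresses 8K apart, accessed alternately 10 times each.
trace = [0x0000, 0x2000] * 10

direct_mapped = count_misses(trace, num_sets=512, ways=1)    # 8K direct map
four_way = count_misses(trace, num_sets=128, ways=4)         # 8K 4-way
fully_associative = count_misses(trace, num_sets=1, ways=512)
```

In the direct-mapped case the two blocks keep evicting each other, so every access misses; with 4 ways both blocks fit in the same set and only the first two accesses miss.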
146
486 Processor Model: Cache
Replacement policy (LRU)
Valid bits: all 4 lines in use?
NO => use any unused line
YES => find one to replace
LRU bits: which line is least recently used
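The decision above can be written as a short function. The sketch keeps a last-used timestamp per line to find the least recently used one, which is an assumption made for clarity; hardware typically approximates this with a few LRU bits per set instead of timestamps.

```python
def choose_line(valid, last_used):
    """Pick which of the lines in a set receives the incoming block.

    valid:     list of booleans, one per line in the set
    last_used: list of access timestamps, one per line (larger = more recent)
    """
    # Any unused line is taken first.
    for way, is_valid in enumerate(valid):
        if not is_valid:
            return way
    # All lines in use: replace the least recently used one.
    return min(range(len(valid)), key=lambda way: last_used[way])


free_line = choose_line([True, False, True, True], [5, 0, 7, 6])
victim = choose_line([True, True, True, True], [5, 9, 3, 8])
```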
149
Pentium Review …
150
Pentium: Superscalar Processor
Available in 1993
32-bit architecture
Superscalar architecture
Scaling: scaling down etchable feature size to increase complexity of IC (e.g., DRAM); 10 microns (4004) to 0.13 microns (2001)
Superscalar: go beyond simply scaling down
Two instruction pipelines: each with its own ALU, address generation circuitry, data cache interface
Execute two different instructions simultaneously
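The benefit of two pipelines depends on whether neighbouring instructions are independent. The sketch below is a deliberately simplified model, not the Pentium's actual pairing rules: it pairs two consecutive instructions only when the second neither reads nor overwrites the register written by the first. The instruction format and names are assumptions.

```python
def independent(first, second):
    """True if `second` can issue in the same clock as `first`."""
    dest1, _ = first
    dest2, sources2 = second
    return dest1 not in sources2 and dest1 != dest2


def issue_cycles(program, pipelines=2):
    """Clocks needed to issue `program`, a list of (dest, sources) tuples."""
    cycles, i = 0, 0
    while i < len(program):
        cycles += 1
        can_pair = (
            pipelines == 2
            and i + 1 < len(program)
            and independent(program[i], program[i + 1])
        )
        i += 2 if can_pair else 1
    return cycles


program = [
    ("a", ("x", "y")),  # a = x op y
    ("b", ("x", "z")),  # b = x op z   (pairs with the line above)
    ("c", ("a", "b")),  # c = a op b
    ("d", ("c",)),      # d = op c     (depends on c, cannot pair)
]
dual = issue_cycles(program)
single = issue_cycles(program, pipelines=1)
```

Dependent code gains little from the second pipeline, which is why instruction ordering matters on a superscalar processor.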
151
Pentium: Superscalar Processor
Onboard cache: separate 8K data and code caches to avoid access conflicts
FPP: 8-stage instruction pipeline, optimized floating point functions
5x-10x the FLOPS of 486
2x performance of 486 at any clock rate
152
Pentium: Superscalar Processor
Compatibility with 386/486: internal 32-bit registers and address bus
Data bus expanded to 64 bits for higher data transfer rate
Compare: 8088 to 386SX transition
153
Pentium: Superscalar Processor
Non-clone competition from AMD, Cyrix
Development of brand identity by Intel
154
Pentium Pro Review …
155
Pentium Pro: Two Chips in One
Became available in 1995
Superscalar of degree 3: can execute 3 instructions simultaneously
Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp)
Two separate silicon dies in the same package
Processor: 0.35 µm, 5.5 million transistors
256KB (/512K) Level 2 cache included in the package: 15.5 million transistors in a smaller area
156
Pentium Pro: Two Chips in One
On-board Level 2 cache: simplifies system board design, requires less space, gains faster communication with processor
Internal (Level 1) cache: 8K
Pentium Pro 133 ~= 2x Pentium 66 ~= 4x 486DX2 66
157
Pentium Pro: Dynamic Execution
Dynamic execution: reduce idle processor time by predicting instruction behavior
Multiple Branch Prediction: looks as far as 30 instructions ahead to anticipate program branches
Data Flow Analysis: looks at upcoming instructions and determines whether they are available for processing, depending on other instructions; determines optimal execution sequences
Speculative Execution: executes instructions in a different order than entered; speculative results are stored until final states can be determined
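Data flow analysis can be illustrated with a toy scheduler. This is a sketch under stated assumptions, not the Pentium Pro's mechanism: every instruction takes one clock, results are usable in the next clock, up to 3 instructions issue per clock (the superscalar degree given earlier), and register reuse hazards are ignored. The instruction format and names are hypothetical.

```python
def dataflow_schedule(program, ready, width=3):
    """Group instructions into clocks by operand availability.

    program: list of (name, dest, sources) tuples in program order
    ready:   values available before the first clock
    Returns a list of clocks, each a list of instruction names.
    """
    ready = set(ready)
    pending = list(program)
    clocks = []
    while pending:
        # Instructions whose operands are all available may run now,
        # regardless of their position in program order.
        issued = [ins for ins in pending if all(s in ready for s in ins[2])][:width]
        if not issued:
            raise ValueError("an operand is never produced")
        for ins in issued:
            pending.remove(ins)
        ready.update(ins[1] for ins in issued)  # results visible next clock
        clocks.append([ins[0] for ins in issued])
    return clocks


program = [
    ("i1", "r1", ("m",)),        # r1 = load m
    ("i2", "r2", ("r1",)),       # r2 = f(r1)   waits for i1
    ("i3", "r3", ("m",)),        # r3 = load m  independent, runs early
    ("i4", "r4", ("r3", "r2")),  # r4 = g(r3, r2)
]
order = dataflow_schedule(program, ready={"m"})
```

Here i3 executes ahead of i2 even though it comes later in the program, which is the reordering that the slide describes.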