Chapter 1 Microcomputers and Microprocessors Microprocessor Evolution and Performance.


1 Chapter 1 Microcomputers and Microprocessors Microprocessor Evolution and Performance

2 Contents zIntroduction to microcomputer system zMicroprocessor evolution ythe INTEL processor family zMicroprocessor performance

3 Introduction to Microcomputer zA microcomputer can be interpreted as a machine with: yI/O devices for input/output, ymicroprocessor for processing, ymemory units for storage, ybuses for connecting the above components zIn the 1970s, a microcomputer was normally interpreted as a computer considerably smaller than a minicomputer, possibly using ROM for program storage

4 Basic hardware units zInput ye.g. keyboard, mouse zMicroprocessor ye.g. 8085, 8086, mc68000 microprocessors zMemory ye.g. RAM, hard disk zOutput ye.g. monitor, printer

5 Buses zBuses: External connections to input/output unit zMajor Buses: yAddress bus: address of memory locations containing instructions or data yData bus: contents of memory locations yControl Bus: synchronization and handshaking between components

6 General Architecture Input unit Microprocessing unit Output unit Secondary memory Primary memory Memory Unit

7 Processor History Vacuum Tubes to IC’s

8 First Generation Computers zVacuum tube technology yLarge room, air-conditioned yTube life-time: 3,000 hours zUseless Machine? y1951: 1st UNIVAC I (UNIVersal Automatic Computer) delivered y1952: Prediction of presidential election by CBS y1952: IBM Model 701 Data Processing System

9 Second Generation Computers zThe Transistor Is Born (Solid-State Era) y1948: invention of bipolar transistors x1956: Nobel physics award: Drs. William Shockley, John Bardeen and Walter H. Brattain (Bell Labs) y1954: Bell Labs: all-transistorized computer (TRADIC) x800 transistors xMuch less heat xMore reliable and less costly

10 Second Generation Computers zMainframe Computers y1958: IBM’s 1st transistorized computer 7070/7090 y1959: 1401 (business-oriented model) yBuilt on circuit boards mounted into rack panels, or frames yMain frame (mainframe): the CPU portion of the computer yPopular with business and industry

11 Third Generation Computers zInvention of IC: 1959 yDr. Robert Noyce (Fairchild) and Jack Kilby (TI) yKilby: fabricating resistors, capacitors and transistors on a germanium wafer, and connecting these parts with fine gold wires yNoyce: isolating individual components with reverse-biased diodes, and depositing an adherent metal film over the circuit, thus connecting the components y1st IC: 2-transistor multivibrator yBy mid 1960s: memory chips with 1,000 components are common

12 Third Generation Computers z1964: IBM 360 Series (32-bit) yThe first to use IC technology xA family of 6 compatible computers y40 different I/O and auxiliary storage devices yMemory capacity: 16K words to over 1MB. y16 registers, 32 bits each y24-bit address bus y128-bit data bus

13 Third Generation Computers z1964: IBM 360 Series (32-bit) y375,000 computations per second x(<< 150 mips Pentium 100) y$5 billion development cost zIBM became the leading mainframe company

14 Minicomputer z1960s: Space Race between US & USSR yIC industry boom yA tremendous demand by scientists and engineers for an inexpensive computer that they could operate by themselves y1965: DEC PDP-8 (by Edson de Castro’s group) xLow-cost ($25,000) minicomputer x12-bit x16-bit PDP-11 ySupermini …

15 Microprocessors: CPU on a Chip z1968: INTEL (Integrated Electronics) y Founded by Robert Noyce and Gordon Moore (Fairchild) yOriginal goals: semiconductor memory market y1969: customized IC’s for Busicom for calculator yTed Hoff and Stan Mazor: proposed 4-bit CPU on a single chip, plus ROM, RAM chips

16 Microprocessors: CPU on a Chip z1971: 4000 Family yBy Federico Faggin y4001: 2K ROM with 4-bit I/O port y4002: 320-bit RAM, 4-bit output port y4003: 10-bit serial-in parallel-out shift register y4004: 4-bit processor zProcessor-on-a-chip: Micro-processor era

17 Microprocessors: CPU on a Chip z1972: 8008, 8-bit z1974: 8080, an improved version

18 Microprocessors: CPU on a Chip z8-bit CPUs z16-bit address (64K) yMC6800: Motorola y6502: MOS Technology (spin-off from Motorola) xApple-II, Apple DOS yZ-80: Zilog (spin-off from Intel) xZ-80 cards on Apple-II, CP/M

19 Microprocessors: CPU on a Chip z16-bit CPUs (Late 1970s) y8086, 80186, 80286: Intel xPC, PC-DOS, MS-DOS, SCO-Unix yMC68000: Motorola x16-bit instructions xHardware multiply and divide x20-bit address buses (1MB) xWorkstations: Sun3

20 Microprocessors: CPU on a Chip z32-bit CPUs y80386, 80486: Intel yMC68020, 68030: Motorola z64-bit CPUs yPentium, Pentium Pro (64-bit external data bus, 32-bit internal registers, not recognized as 64-bit CPUs in terms of internal register word length)

21 Microcomputers: Computers Based on Microprocessors z1975: MITS Altair 8800 (Kit) y$399, i8080, programmed by depositing 1s/0s via front panel switches zOther Computers boom y8080: MITS, … y6800: SWTPC 6800, … yZ-80: TRS-80, … y6502: Apple I, 8K, programmed with BASIC xSteve Jobs & Steve Wozniak, millionaires from PC COM’s …

22 Personal Computers: the Open Architecture Era z1981: IBM PC yA system board (mother board) yIntel 8088 processor y16K memory y5 expansion slots xThird-party vendors to supply various IO adapter cards xOpen architecture xComputer with interchangeable components

23 Micro-controllers: Microcomputers on a Chip zMicrocontroller: a computer on a chip yMicroprocessor, plus yOn-chip memory, plus yInput/output ports z1995: microcontrollers outsold microprocessors 10:1 yembedded in various equipment: xThermostat, machine tools, communication, automotive, … zEvolution: getting greater IO capabilities yIntel: MCS-51, MCS-96, …

24 High-Performance Processors zSupercomputers yAircraft design, global climate modeling, oil-bearing formation, molecular design of new drugs, financial behavior yCDC6600, 7600: Seymour Cray yCray-1: 1976, the first true supercomputer xECL, 128 KW power consumption x130 MFLOPS (Pentium 100: 150 MFLOPS) x$5.1 million

25 High-Performance Processors zParallel Processors yTens of gigaflops yMulti-processors wired by a common bus yEach is given a portion of the problem to solve yHypercube: early 1980s xCosmic Cube, iPSC (with i860/RISC chips) y2D rectangular Mesh architecture: multiple processor at each node xIntel: teraflops computer with 4500 nodes, each powered by 2 Pentium Pro 200.

26 RISC vs. CISC zRISC: Reduced Instruction Set Computer (1980s) yA small number of fixed-length instructions ySimple addressing modes yA large number of registers yInstructions executed in one clock cycle zIntel i860 (“Cray on a Chip”) y82 instructions, 32-bit long each yFour addressing modes y32 general-purpose registers

27 RISC vs. CISC zCISC: Complex Instruction Set Computer yA large number of variable length instructions yMultiple addressing modes yA small number of registers yMultiple number of clock cycles to execute zIntel 8086 yOver 3000 instruction forms, 1-6 bytes y9 addressing modes y8 general-purpose registers yExecution from 2 to 80+ cycles

28 RISC vs. CISC zRISC yControl unit is much simpler (simpler instructions, execution in 1 CLK) yFaster execution with less total on-chip logic yChip area: 10% (vs 50% for CISC) yMore area for register file, data and instruction caches, FPU, and co-processor yPowerPC: 32-bit, by IBM, Apple, Motorola ySparc: for SunMicro workstations

29 Application-Specific Processors zDSP Chips yMostly for analog signal processing yADC-DSP-DAC architecture yAvoid processing analog signals using discrete circuits, involving capacitors and inductors yDSP: conduct complex mathematical functions xDigital filter, spectrum analysis

30 Application-Specific Processors zDSP Chip Architecture yDifferent data/program areas: Harvard Architecture yHardware multipliers and adders, optimized to execute on a single cycle yArithmetic pipelining: several instructions operated at once yHardware loop control yMultiple IO ports for communication with other processors

31 Summary of Processor History • 1940s: Vacuum tubes, large and power-hungry • 1950s: Transistor (1948-) • 1959: First IC (second industrial revolution) • 1960s: ICs were widely used to build CPUs • 1971: Intel 4004 microprocessor (2300 transistors), start of the microprocessor age • Late 1970s: 8080/85

32 Summary of Processor History • 1980: RISC (reduced instruction set computer) • CISC (complex instruction set computer) vs. RISC • CISC family: Intel 80x86, Pentium; Motorola series • Most other families covered here (e.g., PowerPC, SPARC) are RISC.

33 Evolution of INTEL Processors 4004 (’71)-Pentium Pro (’95-)

34 INTEL zIntegrated Electronics y1968: founded by Robert Noyce and Gordon Moore yIA: Intel Architecture (e.g., IA-16, IA-32, IA-64), which since the 8008 (’72) has become the de facto standard zEvolution: xInternal register sizes xExternal bus widths xReal, Protected, and Virtual 8086 modes

35 4-bit Processors z4004 yfirst microprocessor ybecame available in 1971 y4-bit microprocessor: x4-bit registers & 4-bit data bus x#transistors: 2300 xMin. feature size: 10 microns xAddress bus: 10 bits/1K x0.06 MIPS xNo internal cache

36 8-bit Processors z8008, 8080, 8085 ybecame available in 1974 y8-bit microprocessor 8080

37 8086: IA standard zBecame available in 1978 y16-bit data bus y20-bit address bus (was 16-bit for 8080) ymemory organization: 16 segments of 64KB (1 MB limit) zRe-organize CPU into BIU (bus interface unit) and EU (execution unit) yAllow fetch and execution simultaneously zInternal register expanded to 16-bit yAllow access of low/high byte separately

38 8086 zHardware multiply and divide instructions zExternal math co-processor zInstruction set compatible with 8080/8085 z8086: defined the 80x86 architecture

39 8086 zNot quite successful y16-bit data bus: Requires two separate 8-bit memory banks yMemory chips were expensive

40 8088: PC standard zBecame available in 1979, almost identical to 8086 z8-bit external data bus: for hardware compatibility with 8080 z16-bit internal registers and internal data bus (same as 8086) z20-bit address bus (was 16-bit for 8080) yBIU re-designed zmemory organization: 16 segments of 64KB (1 MB limit) yTwo memory accesses for 16-bit data (less efficient) yBut less cost z8088: used by IBM PC (1981), 16K-64K, 4.77MHz

41 80186, 80188: High Integration CPU zPC system: y8088 CPU + various supporting chips xClock generator x8251: serial IO (RS232) x8253: timer/counter x8255: PPI (programmable peripheral interface) x8257: DMA controller x8259: interrupt controller z80186/80188: 8086/8088 + supporting functions yCompatible instruction set (+ 9 new instructions)

42 80286 zBecame available in 1982 zused in IBM AT computer (1984) z16-bit data bus zclock speed 25% faster than 8088, throughput 5 times greater than 8088 z24-bit address bus (16 MB) (vs. 20-bit/1M 8086)
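
The address-bus figures quoted on these slides (16-bit/64K for the 8080, 20-bit/1M for the 8086, 24-bit/16 MB for the 80286, 32-bit/4 GB for the 80386DX) all follow from an n-bit bus selecting 2^n byte addresses. A minimal Python sketch of that arithmetic; the function names `addressable_bytes` and `human` are illustrative and not taken from the slides:

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address bus of this width can select."""
    return 2 ** address_bits


def human(n: int) -> str:
    """Format a power-of-two byte count using binary units (1 KB = 1024 B)."""
    for unit in ("B", "KB", "MB", "GB"):
        if n < 1024:
            return f"{n} {unit}"
        n //= 1024
    return f"{n} TB"


# Bus widths as quoted on the slides.
for cpu, bits in [("8080", 16), ("8086/8088", 20), ("80286", 24), ("80386DX", 32)]:
    print(f"{cpu:10s} {bits:2d}-bit address bus -> {human(addressable_bytes(bits))}")
```

Each extra address line doubles the reachable memory, which is why the four extra lines of the 80286 raise the limit from 1 MB to 16 MB.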

43 80286: Real vs. Protected Modes zLarger address space: 24-bit address bus yReal Mode vs. Protected Mode zReal Mode: yPower on default mode yFunctions like an 8086: uses only the 20 least significant address lines (1M) ySoftware compatible with the 8086/8088 y16 new instructions (for Protected Mode management) yFaster than the 8086: redesigned processor, plus higher clock rate (6-8MHz)

44 80286: Real vs. Protected Modes zProtected Mode: yMulti-program environment yEach program has a predetermined amount of memory yAddressed via segment selector (physical addresses invisible): 16M addressable yMultiple programs loaded at once (within their respective segments), protected from read/write by each other

45 80286: Real vs. Protected Modes zProtected Mode: yCannot be switched back to real mode without resetting the processor, to avoid illegal access by switching back and forth between modes zA faster 8086 only? yMS-DOS requires that all programs be run in Real Mode

46 Clock Speed zElectrical signals cannot change instantaneously (transition period required) zSystem clock provides timing signal for synchronization zClock speed alone cannot be used to compare the performance of microprocessors with different designs or instruction sets ye.g., a 66 MHz Pentium is about twice as fast as a 66 MHz 80486

47 80386DX (aka ) zavailable in 1985, a major redesign of 86/286 yCompatibility commitment through 2000 z32-bit data and address buses (4 GB memory) yReal Address Mode: 1M visible, 286 real mode yProtected Virtual Address Mode: xOn board MMU xSegmented tasks of 1byte to 4G bytes Segment base, limit, attributes defined by a descriptor register xPage swapping: 4K pages, up to 64TB virtual memory space xWindows, OS/2, Unix/Linux

48 80386DX (aka ) zVirtual 8086 mode (a special Protected mode feature): permitted multiple 8086 virtual machines- multitasking (similar to real mode) yWindows (multiple MSDOS’s) zClock rate: ymax. 40MHz, 2 pulses per R/W bus cycle yExternal memory cache to avoid wait xFast SRAM x93% hit rate with 64K cache zCompatible instructions (14 new)

49 80386SX z80386SX: (for transition to 32-bit) y16-bit data bus/32-bit register y24-bit address bus

50 80486DX z1989: a polished 386, 6 new OS level instructions zvirtually identical to 386 in terms of compatibility zRISC design concepts yfewer clock cycles per operation, a single clock cycle for most frequently used instructions yMax 50MHz y5 stage execution pipeline xPortions of 5 instructions execute at once

51 80486DX zHighly Integrated: yOn board 8K memory cache yFPP (equivalent to external co-processor) zTwice as fast as 386 at any given clock rate y20Mhz 486 ~= 40Mhz 386

52 80486SX z80486SX yNOT a 16-bit version for transition purpose yno coprocessor y8K internal cache retained yFor low-end applications yMax. 33Mhz only

53 80486DX2/DX4: Overdrive Chips zProcessor speed increased too fast yRedesign of microcomputer for compatibility becomes harder ySolution: Separating internal speed from external speed, so each can be improved independently z80486DX2/DX4 – internal clock twice/three times (NOT four times) the external clock: runs faster internally

54 80486DX2/DX4: Overdrive Chips zSystem board design is independent of processor upgrade (less expensive components are allowed) zProcessor operates at its maximum data rate internally yOnly the slow accesses to external data run at the system board rate yInternal cache offsets the speed gap z486DX2 66: 66 internal, 33 external z486DX4 100: 100 internal, 33 external (3x) zOverdrive sockets: for upgrading 486dx/sx to 486dx2/dx4 (with overdrive socket pin-outs)

55 Pentium: Superscalar Processor zavailable in 1993 z32-bit architecture zSuperscalar architecture yScaling: scaling down etchable feature size to increase complexity of IC (e.g., DRAM) x10 microns/4004 to 0.13 microns (2001) ySuperscalar: go beyond simply scaling down yTwo instruction pipelines: each with own ALU, address generation circuitry, data cache interface yExecute two different instructions simultaneously

56 Pentium: Superscalar Processor zOnboard cache ySeparate 8K data and code caches to avoid access conflicts zFPP zInstruction pipeline: 8 stage zOptimized floating point functions y5x-10x the FLOPS of 486 y2x performance of 486 at any clock rate

57 Pentium: Superscalar Processor zCompatibility with 386/486: yInternal 32-bit registers and address bus yData bus expanded to 64-bits for higher data transfer rate xCompare 8088 to 386sx transition

58 Pentium: Superscalar Processor znon-clone competition from AMD, Cyrix zdevelopment of brand identity by Intel

59 Pentium Pro: Two Chips in One zBecame available in 1995 zSuperscalar of degree 3 yCan execute 3 instructions simultaneously zOptimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp) zTwo separate silicon dies in the same package yProcessor: 0.35 µm, 5.5 million transistors y256KB(/512K) Level 2 cache included in the package, 15.5 million transistors in smaller area

60 Pentium Pro: Two Chips in One zOn Board Level 2 cache ySimplifies system board design yRequires less space yGains faster communication with processor zInternal (level 1) cache: 8K zPentium Pro 133 ~= 2x Pentium 66 ~= 4x 486DX2 66

61 Pentium Pro: Dynamic Execution zDynamic execution: reduce idle processor time by predicting instruction behaviors yMultiple Branch Prediction: looks as far as 30 instructions ahead to anticipate program branches yData Flow Analysis: looks at upcoming instructions and determines whether they are available for processing, depending on other instructions; determines optimal execution sequences ySpeculative Execution: executes instructions in a different order from the one in which they were entered. Speculative results are stored until final states can be determined.

62 Processor Future What’s More from Moore’s Law?

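63 Moore's Law zIn 1965, Gordon Moore predicted that: z“The number of transistors per integrated circuit would double every year” zHe forecast that this trend would continue through 1975 zIn 1975 he revised the rate to a doubling every two years; the widely quoted 18-month figure is a later popularization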
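
As a rough check of the doubling trend against numbers quoted elsewhere in this presentation, the sketch below derives the doubling time implied by two data points from the slides: about 2300 transistors for the 4004 (1971) and 5.5 million for the Pentium Pro processor die (1995). The function names are illustrative, and fitting an exponential to only two points is an assumption made for the sake of the example:

```python
import math


def implied_doubling_time(n_start: float, n_end: float, years: float) -> float:
    """Years per doubling if growth from n_start to n_end were purely exponential."""
    doublings = math.log2(n_end / n_start)
    return years / doublings


def projected_count(n_start: float, years: float, doubling_time: float) -> float:
    """Transistor count after `years`, doubling once every `doubling_time` years."""
    return n_start * 2 ** (years / doubling_time)


# Data points quoted on the slides: 4004 (1971) and Pentium Pro (1995).
t = implied_doubling_time(2300, 5.5e6, 1995 - 1971)
print(f"implied doubling time: {t:.2f} years")
print(f"projection check: {projected_count(2300, 1995 - 1971, t):,.0f} transistors")
```

With these two points the implied doubling time comes out a little over two years.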

64 Moore’s Law

65 Other Microprocessors zMotorola family yfrom 6809 (Apple II) through zPowerPC yjoint venture between Apple, IBM, and Motorola zRISC Processors yDEC Alpha, MIPS, Sun SPARC, etc.

66 CISC vs. RISC zCISC (Complex Instruction Set Computer) yCISC processors have a large versatile instruction set that supports many complex addressing modes ymove complexity from software to hardware zRISC (Reduced Instruction Set Computer) yRISC processors have a small instruction set ymove complexity from hardware to software

67 Microprocessor Performance zTwo main factors: zResponse time ythe time between the start and completion of a task, also referred to as execution time zThroughput ythe total amount of work done in a given time

68 MIPS zMillion Instructions Per Second y$\text{MIPS} = \dfrac{\text{instruction count}}{\text{execution time in seconds} \times 10^{6}}$ zIt specifies performance inversely to execution time zFaster machines have a higher MIPS rating
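
The MIPS definition can be sketched in a few lines of Python. The function name `mips` is illustrative; the sample figure is the 375,000 computations per second quoted for the IBM 360 earlier in the presentation, with the assumption (not stated on the slides) that one computation counts as one instruction:

```python
def mips(instruction_count: float, execution_time_s: float) -> float:
    """Million instructions per second: count / (time in seconds * 10**6)."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return instruction_count / (execution_time_s * 1e6)


# IBM 360 figure from the slides, treated here as instructions per second.
print(mips(375_000, 1.0))

# Same instruction count in half the time gives twice the rating.
print(mips(375_000, 0.5))
```

Because the rating depends on the instruction count, the same task compiled for two different instruction sets can yield different MIPS values even when the execution times are equal.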

69 Some Problems of MIPS zCannot compare computers with different instruction sets, since the instruction count will certainly differ zMIPS varies between programs on the same computer

70 iCOMP zAn index provided by Intel for comparison of performance of their 32-bit microprocessors zBased on a variety of performance components that represent integer mathematics, graphics, etc. zCombine results of a set of software application benchmarks

71

72 Chapter 2 Computer Codes, Programming, and Operating Systems Number Systems Computer Codes Programming Operating Systems

73 Number Systems zDecimal: Base 10 zBinary: Base 2 zOctal: Base 8 zHexadecimal: Base 16

74 Base Conversion: 2 ↔ 10 zBinary to Decimal y$D = \sum_{i=0}^{n-1} b_i \times 2^i$ zDecimal to Binary yRepeated subtraction x$D' = \sum_{i=0}^{m-1} b_i \times 2^i = D - 2^m$ (with $b_m = 1$) x$D \leftarrow D'$ and $m \leftarrow m'$ ($m'$: max exponent s.t. $b_{m'} = 1$) yLong division x$D' = \lfloor D/2 \rfloor$ with remainder $b_i$, then $D \leftarrow D'$
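
The three procedures on this slide (weighted sum, repeated subtraction of the largest power of two, and repeated division by two) can be sketched as follows. The function names are illustrative, and the sketch handles non-negative integers only:

```python
def binary_to_decimal(bits: str) -> int:
    """D = sum of b_i * 2**i, with b_0 the rightmost bit."""
    total = 0
    for i, b in enumerate(reversed(bits)):
        total += int(b) * 2 ** i
    return total


def decimal_to_binary_subtraction(d: int) -> str:
    """Repeated subtraction: walk the exponents downward, subtracting 2**m when it fits."""
    if d == 0:
        return "0"
    m = d.bit_length() - 1  # largest exponent with 2**m <= d
    bits = []
    for exp in range(m, -1, -1):
        if 2 ** exp <= d:
            bits.append("1")
            d -= 2 ** exp
        else:
            bits.append("0")
    return "".join(bits)


def decimal_to_binary_division(d: int) -> str:
    """Long division: the remainders of successive divisions by 2 are b_0, b_1, ..."""
    if d == 0:
        return "0"
    bits = []
    while d > 0:
        d, remainder = divmod(d, 2)
        bits.append(str(remainder))
    return "".join(reversed(bits))


print(binary_to_decimal("1101"))
print(decimal_to_binary_subtraction(13), decimal_to_binary_division(13))
```

Repeated subtraction produces the bits from most significant to least significant, while long division produces them in the opposite order, which is why the second function reverses its list at the end.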

75

76 MCS-51 Program Development [Figure: tool chain from Editor through Assembler (X8051), Linker (Link) and Symbol Converter (CVTSYM) to the ICE and Target; file types shown: Program.ASM, .OBJ, .HEX, .SYM, .SDT]

77 Chapter 3 80x86 Processor Architecture 8086/88 Segmented Memory Pentium Pentium Pro

78 The 8086 and 8088 Processor Model Programming Model

79 8086: IA standard zBecame available in 1978 y16-bit data bus y20-bit address bus (was 16-bit for 8080) ymemory organization: 16 segments of 64KB (1 MB limit) zRe-organize CPU into BIU (bus interface unit) and EU (execution unit) yAllow fetch and execution simultaneously zInternal register expanded to 16-bit yAllow access of low/high byte separately

80 8088: PC standard zBecame available in 1979, almost identical to 8086 z8-bit external data bus: for hardware compatibility with 8080 z16-bit internal registers and internal data bus (same as 8086) z20-bit address bus (was 16-bit for 8080) yBIU re-designed zmemory organization: 16 segments of 64KB (1 MB limit) yTwo memory accesses for 16-bit data (less efficient) yBut less cost z8088: used by IBM PC (1981), 16K-64K, 4.77MHz

81 80186, 80188: High Integration CPU zPC system: y8088 CPU + various supporting chips xClock generator x8251: serial IO (RS232) x8253: timer/counter x8255: PPI (programmable peripheral interface) x8257: DMA controller x8259: interrupt controller z80186/80188: 8086/8088 + supporting functions yCompatible instruction set (+ 9 new instructions)

82 8086 Processor Model: BIU+EU zBIU yMemory & IO address generation zEU yReceive codes and data from BIU xNot connected to system buses yExecute instructions ySave results in registers, or pass to BIU to memory and IO

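83 Processor Model [Figure: 8086 block diagram. EU: general registers AX (AH/AL), BX (BH/BL), CX (CH/CL), DX (DH/DL); pointer and index registers SP, BP, SI, DI; ALU and Flags. BIU: segment registers CS, DS, SS, ES; IP; address generation and bus control; instruction queue]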

84 Fetch and Execution Cycle zBIU+EU allows the fetch and execution cycle to overlap y0. System boot, Instruction Queue is empty y1. IP =>BIU=> address bus && IP++ y2. Mem[(IP-1)] => Instruction Queue[tail++] y3a. InstrQ[head] => EU => execution y3b. Mem[IP++] => InstrQ[tail++] xMaybe multiple instructions yRepeat 3a+3b (overlapped)

85 Waiting Conditions: Memory Access zBIU+EU: execute (almost) continuously without waiting zWaiting Conditions: Accessing memory locations not in queue yBIU suspend instruction fetch yIssues external memory address yResumes instruction fetch and execution

86 Waiting Conditions: Jump zNext Jump Instruction yInstructions in queue are discarded yEU wait for the next instruction after the jump location to be fetched by BIU yResume execution

87 Waiting Conditions: Long Instructions zLong Instruction is being executed yInstruction queue becomes full yBIU waits yResumes instruction fetch after EU pulls one or two bytes from queue

88 BIU: 8088 vs zBIU is the major difference z8088: ydata bus: 8-bit (vs. 16-bit/8086) yInstruction queue: 4 bytes (vs. 6-byte/8086) zOnly 30% slower than 8086 yIf queue is kept full

89 Programming Model [Figure: 8086 register set. AX (AH/AL), BX (BH/BL), CX (CH/CL), DX (DH/DL); SP, BP, SI, DI; CS, DS, SS, ES; IP; Flags (Flags H / Flags L)]

90 8086 Programming Model zData Group: yAX (AH+AL): Accumulator yBX (BH+BL): Base yCX (CH+CL): Counter yDX (DH+DL): Data

91 8086 Programming Model zSegment Group: yCS: Code Segment yDS: Data Segment yES: Extra Segment ySS: Stack Segment zSegment Registers: yBase address to particular segments

92 8086 Programming Model zPointer/Index Group: yIP: Instruction Pointer (paired with CS) ySI: Source Index (paired with DS) yDI: Destination Index (paired with ES) ySP: Stack Pointer (paired with SS) zIndex Registers: yIndex (offset) or Pointer to a Base address

93 8086 Flag Word Flag L : SF ZF X AF X PF X CF CF: Carry Flag CF= 0 : No Carry (Add) or Borrow (SUB) CF= 1 : high-order bit Carry/Borrow AF: Aux. Carry: Carry/Borrow on bit 3 (Low nibble of AL) SF: Sign Flag: (0: positive, 1: negative) ZF: Zero Flag: (1: result is zero) PF: (Even) Parity Flag (even number of 1’s in low-order 8 bits of result)

94 8086 Flag Word Flag H : X X X X OF DF IF TF TF: Trap flag (single-step after next instruction; clear by single-step interrupt) IF: Interrupt-Enable: enable maskable interrupts DF: Direction flag: auto-decrement (1) or increment(0) index on string operations OF: Overflow: signed result cannot be expressed within #bits in destination operand
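
To make the flag definitions on the two preceding slides concrete, the sketch below computes CF, AF, ZF, SF, PF and OF for an 8-bit addition, following the descriptions given there (carry out of the high-order bit, carry out of bit 3, sign bit, even parity of the low 8 bits, signed overflow). The function name `add8_flags` and the dictionary layout are illustrative choices, not part of the slides:

```python
def add8_flags(a: int, b: int):
    """Add two 8-bit values; return (result, flags) in the style of an 8086 ADD on AL."""
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF
    flags = {
        "CF": int(full > 0xFF),                              # carry out of bit 7
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),           # carry out of bit 3
        "ZF": int(result == 0),                              # result is zero
        "SF": result >> 7,                                   # copy of the sign bit
        "PF": int(bin(result).count("1") % 2 == 0),          # even number of 1 bits
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
    }
    return result, flags


# 0x7F + 0x01: largest positive signed byte plus one overflows into the sign bit.
print(add8_flags(0x7F, 0x01))
# 0xFF + 0x01: unsigned wrap-around to zero sets the carry and zero flags.
print(add8_flags(0xFF, 0x01))
```

The two calls show that CF and OF answer different questions: the first overflows as a signed sum without a carry, the second carries as an unsigned sum without signed overflow.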

95 Segmented Memory zLinear vs. Segmented yLinear Addressing: xThe entire memory is regarded as a whole xthe entire memory space is available all the time ySegmented: xmemory is divided into segments xProcess is limited to access designated segments at a given time

96 8086 Memory Organization zEven and Odd Memory Banks y16-bit data bus: one two-byte access or two one-byte accesses yAllows processor to work on bytes or on words (16-bit) xIO operations are normally conducted in bytes yCan handle odd-length instructions xSingle byte instructions xMultiple byte (and very long) instructions

97 8086 Memory Organization zMemory Space: y20-bit address bus yLinearly, 1M bytes directly addressable zMemory Banks yCan read 16-bit data (512K words) from the even- and odd-addressed banks simultaneously xneeds two memory banks in parallel xBHE control line: allows addressing even/odd banks or both

98 Memory Organization: Alignment zEndianness: yOne way to model multi-byte CPU register xAX = AH+AL yTwo ways to store operands in memory zBig-endian CPU: (IBM370, M68*, Sparc) yHigh-order-byte-first (HOBF) yMaps highest-order byte of internal register to the lowest (1st) memory byte address yOperand address = address of MSB xMOV R1, N: N is the 1st byte in memory & MSB of register

99 Memory Organization: Alignment zLittle-endian CPU: (DEC, Intel) yLow-order-byte-first (LOBF) yMaps lowest-order byte of register to the 1st memory byte yOperand address = address of LSB (1st memory byte) xMOV AX, N: N is the 1st byte in memory & LSB of register xAL ← [N], AH ← [N+1] zConfigurable: yCan switch between Big/Little-endian, or yProvide instructions which convert 16-/32-bit data between two byte ordering (80486)
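
The two byte orderings can be demonstrated with Python's built-in `int.to_bytes` and `int.from_bytes`. The helper names `store` and `bswap32` are illustrative; `bswap32` mimics the effect of a 32-bit byte-swap instruction such as the 80486's BSWAP, and is a sketch of the operation rather than a model of the hardware:

```python
def store(memory: bytearray, address: int, value: int, size: int, byteorder: str) -> None:
    """Write `value` into `memory` at `address` using 'little' or 'big' byte order."""
    memory[address:address + size] = value.to_bytes(size, byteorder)


def bswap32(value: int) -> int:
    """Reverse the four bytes of a 32-bit value (little-endian <-> big-endian)."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


mem = bytearray(4)

# Little-endian (Intel): low-order byte goes to the lowest address.
store(mem, 0, 0x1234, 2, "little")
print([hex(b) for b in mem[:2]])

# Big-endian (M68*, SPARC): high-order byte goes to the lowest address.
store(mem, 2, 0x1234, 2, "big")
print([hex(b) for b in mem[2:]])

print(hex(bswap32(0x12345678)))
```

For `MOV AX, N` on an Intel CPU the first list corresponds to AL being loaded from address N and AH from N+1.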

100 8086 Memory Organization zAligned operand yOperand aligned at even-byte (word/dword) boundaries yAllows single access to read/write one operand xThrough internal shift/swap mechanism, if necessary zMis-aligned words: yWord operand does not start at an even address yNeeds 2 bus cycles to read/write the word (8086) xIssues two addresses to access the two even-aligned words containing the operand xslower but transparent to programmer

101 8086 Memory Organization z8088 yalways 2 cycles for word operations xAligned or not yBecause of 8-bit external data bus xSingle memory bank is sufficient
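
The aligned versus mis-aligned cases on the last two slides reduce to counting how many bus-width units an operand touches. A minimal sketch under that simplifying assumption (it ignores wait states and instruction prefetch); the function name `bus_cycles` and its parameters are illustrative:

```python
def bus_cycles(address: int, operand_bytes: int, bus_bytes: int) -> int:
    """Number of bus transfers needed to access an operand.

    Memory is treated as a sequence of bus-width units; every unit the
    operand overlaps costs one transfer.
    """
    first_unit = address // bus_bytes
    last_unit = (address + operand_bytes - 1) // bus_bytes
    return last_unit - first_unit + 1


# 8086 (16-bit data bus = 2 bytes): word at an even vs. an odd address.
print(bus_cycles(0x1000, 2, 2), bus_cycles(0x1001, 2, 2))

# 8088 (8-bit data bus = 1 byte): a word always needs two transfers.
print(bus_cycles(0x1000, 2, 1), bus_cycles(0x1001, 2, 1))
```

The same counting applies to wider buses: with `bus_bytes=4`, a double word at an address that is not a multiple of 4 spans two units and needs two transfers.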

102 8086 Memory Map zMemory Map: How memory space is allocated yROM Area: boot, BIOS yRAM: OS/User Apps & data yUnused yReserved: for future hardware/software uses yDedicated: for specific system interrupt and reset functions, etc.

103 Segment Registers z64K memory segments x 16 z16-bit offset each zCS, DS, ES, SS

104 Logical and Physical Addresses zPhysical: 20-bit zLogical: 16-bit y16-byte segment boundaries zAddress Translation yE.g., CS:IP
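
The address translation outlined on this slide (a 16-bit segment on a 16-byte boundary plus a 16-bit offset giving a 20-bit physical address) can be sketched as follows. The function name `physical_address` is illustrative, and the sample register values are made up for the example:

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode translation: segment * 16 + offset, kept to 20 bits."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    # Shifting left by 4 places the segment on a 16-byte boundary.
    return ((segment << 4) + offset) & 0xFFFFF


# Hypothetical CS:IP pair.
cs, ip = 0x1234, 0x0010
print(hex(physical_address(cs, ip)))

# Different logical addresses can name the same physical byte.
print(physical_address(0x1234, 0x0010) == physical_address(0x1235, 0x0000))
```

The final mask models the 8086's 20 address lines: a sum that exceeds 1 MB wraps around to the bottom of memory.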

105 80286 zFirst with Protected Mode zReview of 286 Protected Mode … Next

106 80286 zBecame available in 1982 zused in IBM AT computer (1984) z16-bit data bus zclock speed 25% faster than 8088, throughput 5 times greater than 8088 z24-bit address bus (16 MB) (vs. 20-bit/1M 8086)

107 80286: Real vs. Protected Modes zLarger address space: 24-bit address bus yReal Mode vs. Protected Mode zReal Mode: yPower on default mode yFunctions like an 8086: uses only the 20 least significant address lines (1M) ySoftware compatible with the 8086/8088 y16 new instructions (for Protected Mode management) yFaster than the 8086: redesigned processor, plus higher clock rate (6-8MHz)

108 80286: Real vs. Protected Modes zProtected Mode: yMulti-program environment yEach program has a predetermined amount of memory yAddressed via segment selector (physical addresses invisible): 16M addressable yMultiple programs loaded at once (within their respective segments), protected from read/write by each other

109 80286: Real vs. Protected Modes zProtected Mode: yCannot be switched back to real mode without resetting the processor, to avoid illegal access by switching back and forth between modes zA faster 8086 only? yMS-DOS requires that all programs be run in Real Mode

110 80386 Model zRefine 286 Protected Mode zExpand to 32-bit registers zNew Virtual 8086 Mode

111 80386 Review

112 80386DX (aka ) zavailable in 1985, a major redesign of 86/286 yCompatibility commitment through 2000 z32-bit data and address buses (4 GB memory) yReal Address Mode: 1M visible, 286 real mode yProtected Virtual Address Mode: xOn board MMU xSegmented tasks of 1byte to 4G bytes Segment base, limit, attributes defined by a descriptor register xPage swapping: 4K pages, up to 64TB virtual memory space xWindows, OS/2, Unix/Linux

113 80386DX (aka ) zVirtual 8086 mode (a special Protected mode feature): permitted multiple 8086 virtual machines- multitasking (similar to real mode) yWindows (multiple MSDOS’s) zClock rate: ymax. 40MHz, 2 pulses per R/W bus cycle yExternal memory cache to avoid wait xFast SRAM x93% hit rate with 64K cache zCompatible instructions (14 new)

114 80386SX z80386SX: (for transition to 32-bit) y16-bit data bus/32-bit register y24-bit address bus

115 80386: Real vs. Protected Modes zLarger address space: 32-bit address bus (4G) yReal Mode vs. Protected Mode (refined from 286) zReal Mode: yPower on default mode yFunctions like an 8086: (1) uses only the 20 least significant address lines (1M) (2) segmented memory retained (64K) ySoftware compatible with 286 zNew Real Mode Features: yaccess to 32-bit register set ytwo new segment registers: FS, GS

116 80386: Real vs. Protected Modes zProtected Mode: ynew addressing mechanism vs. real mode ysupports protection levels ysegment size: 1 to 4G (not 64K, fixed) ysegment register: pointer to a descriptor table xnot base address

117 80386: Real vs. Protected Modes zProtected Mode: ydescriptor table: (8 byte per entry) x32-bit base address of segment xsegment size xaccess rights ymemory address = base address (in table) + offset (in instruction)
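
The protected-mode rule on this slide (memory address = base address from the descriptor table + offset from the instruction) can be sketched with a toy descriptor table. The `Descriptor` class, its field names and the table contents are hypothetical simplifications; a real 80386 descriptor packs these fields into 8 bytes and carries more attributes:

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int              # 32-bit base address of the segment
    limit: int             # highest valid offset within the segment
    writable: bool = True  # stand-in for the access-rights field


def linear_address(table, index: int, offset: int) -> int:
    """Look up the descriptor selected by `index` and add the offset to its base."""
    descriptor = table[index]
    if offset > descriptor.limit:
        # Stands in for the protection fault raised by the hardware.
        raise ValueError("offset beyond segment limit")
    return (descriptor.base + offset) & 0xFFFFFFFF


# Hypothetical table with a single 64 KB segment starting at 1 MB.
table = [Descriptor(base=0x00100000, limit=0xFFFF)]
print(hex(linear_address(table, 0, 0x20)))
```

Unlike real mode, the segment register here only selects a table entry, so the program never sees or computes the physical base itself.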

118 80386: Real vs. Protected Modes zProtected Mode: yPaging mechanism: xmap 32-bit linear address (base+offset) => physical address & page frame address x(4K page frames in system memory) x64TB of virtual memory
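
With 4K pages, the low 12 bits of a linear address are the offset within the page and the upper 20 bits select the page; on the 80386 those 20 bits are split into a 10-bit page-directory index and a 10-bit page-table index. The sketch below shows the split and a simplified lookup. The flat `page_frames` dictionary is a hypothetical stand-in for the two-level tables, and the 64 TB figure is checked as 2^14 segments of 4 GB each, which is the usual derivation and not spelled out on the slide:

```python
PAGE_SIZE = 4096  # 4K pages, as stated on the slide


def split_linear(address: int):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    directory = (address >> 22) & 0x3FF
    table = (address >> 12) & 0x3FF
    offset = address & 0xFFF
    return directory, table, offset


def translate(address: int, page_frames: dict) -> int:
    """Map a linear address to a physical one via a page-number -> frame-number dict."""
    page, offset = divmod(address, PAGE_SIZE)
    frame = page_frames[page]  # a KeyError here stands in for a page fault
    return frame * PAGE_SIZE + offset


print(split_linear(0x00403004))
print(hex(translate(0x00403004, {0x403: 0x12})))

# 2**14 selectable segments, each up to 4 GB, gives the 64 TB virtual space.
print(2 ** 14 * 2 ** 32 == 64 * 2 ** 40)
```

The offset passes through translation unchanged; only the page number is replaced by a frame number.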

119 80386: Real vs. Protected Modes zProtected Mode: yProtection mechanism: xtasks/data/instructions are assigned a privilege level (PL) xtasks running at lower PL cannot access tasks or data segments at a higher PL xrunning programs that are protected from the others

120 80386: Real vs. Protected Modes zTwo Ways to Run 8086 Programs: yReal Mode yVirtual 8086 Mode zVirtual 8086 Mode: yruns multiple 8086+other 386 (protected mode) programs independently yeach sees 1 MB (mapped via paging to anywhere in 4GB space) yrunning V8086+ Protected mode simultaneously

121 80386 Processor Model 386

122 80386 Processor Model: BIU+CPU+MMU zBIU ycontrol 32-bit address and data buses ykeep instruction queue full (16 bytes) zAddress pipelining yaddress of next memory location is output halfway through current bus cycle ymore address decode time yslower memory chip is OK yeasier to keep up with faster (2 CLK) bus cycle of 386

123 80386 Processor Model: BIU zdynamic data bus sizing yswitch between 16-/32-bit data bus on the fly yaccommodate to external 16-bit memory cards or IO devices yadjust bus timing to use only the least significant 16 bits

124 80386 Processor Model: BIU zExternal memory y4 memory banks (4x8=32bits) yBE0-BE3 for bank selection yaccess byte or word or double word xaligned operands: 1 bus cycle xmis-aligned (not %4): 2 bus cycles

125 80386 Processor Model: CPU zCPU=IU (instruction) +EU (execution) yfetching & execution overlap zIU: yretrieves instructions from queue ydecodes them ystores them in decoded queue zEU: ALU+registers (32-bit) yexecutes decoded instructions

126 80386 Processor Model: MMU zSegmentation unit yReal mode: generate the 20-bit physical address yProtected mode: store base/size/rights in descriptor registers xcache descriptor tables in RAM xfaster operations zPaging Unit ydetermines physical addresses associated with active segments (divided into 4K pages) yvirtual memory support to allow larger programs

127 80386 Programming Model zGeneral Purpose Registers yData & Addresses Groups yStatus & Control Flags xVM, RF, NT, IOPL ySegment Group

128 80386 Programming Model zSpecial purpose Registers

129 80386 Programming Model zMemory Management ysegment descriptors xkeep base, size, access rights x3 types of tables: global (GDT), local (LDT), interrupt (IDT) xaddressing: index (to a table) + RPL base + offset (from instruction) yPaging xTLB

130 80386 Programming Model zProtection (PL) ytask: CPL yinstruction: RPL ydata segment: DPL zGates yspecial descriptors that allow access to higher PL tasks from lower PL tasks

131 80486 Review …

132 80486DX z1989: a polished 386, 6 new OS level instructions zvirtually identical to 386 in terms of compatibility zRISC design concepts yfewer clock cycles per operation, a single clock cycle for most frequently used instructions yMax 50MHz y5 stage execution pipeline xPortions of 5 instructions execute at once

133 80486DX zHighly Integrated: yOn board 8K memory cache yFPP (equivalent to external co-processor) zTwice as fast as 386 at any given clock rate y20Mhz 486 ~= 40Mhz 386

134 80486SX z80486SX yNOT a 16-bit version for transition purpose yno coprocessor ySame 8K internal cache as the DX yFor low-end applications yMax. 33MHz only

135 80486DX2/DX4: Overdrive Chips zProcessor speed increased too fast yRedesign of microcomputer for compatibility becomes harder ySolution: separate internal speed from external speed, so each can be improved independently z80486DX2/DX4 – internal clock twice/three times (NOT four times) the external clock: runs faster internally

136 80486DX2/DX4: Overdrive Chips zSystem board design is independent of processor upgrade (less expensive components are allowed) zProcessor operates at maximum data rate internally yOnly slow access to external data operates at system board rate yInternal cache offsets the speed gap z486DX2 66: 66 internal, 33 external z486DX4 100: 100 internal, 33 external (3x) zOverdrive sockets: for upgrading 486dx/sx to 486dx2/dx4 (with overdrive socket pin-outs)

137 486 Processor Features z386 features: yReal/Protected Modes yMemory Management yPL’s yregisters & bus sizes zNew features y6 OS instructions y8K/16K onboard cache (was external on the 386)

138 486 Processor Features zA better 386 y5 stage instruction pipeline xIF/ID/EX => PF/D1/D2/EX/WB xPF: instructions => Q (2*16-bytes) xD1: determine opcode xD2: determine memory address of operands xEX: execute indicated OP xWB: update register
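
To make the overlap between the five stages concrete, the following sketch prints which instruction occupies each stage on every clock. It assumes an ideal pipeline with no stalls (every stage takes exactly one clock); the function name `pipeline_schedule` is illustrative.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]


def pipeline_schedule(num_instructions):
    """Return one dict per clock, mapping stage name -> instruction index.

    Ideal case only: no stalls, one clock per stage.
    """
    total_clocks = num_instructions + len(STAGES) - 1
    schedule = []
    for clock in range(total_clocks):
        row = {}
        for depth, stage in enumerate(STAGES):
            instr = clock - depth          # instruction that reached this stage
            if 0 <= instr < num_instructions:
                row[stage] = instr
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for clock, row in enumerate(pipeline_schedule(5)):
        print(clock, row)
```

Once the pipeline is full (clock 4 in this run), portions of five instructions are in flight at once and one instruction completes per clock.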

139 486 Processor Features zReduced Instruction Cycle Times y5 stage instruction pipeline (e.g., Fig. 3.18) yinstruction cycle times: x8086: 4 CLK x80386: 2 CLK x80486: 1 CLK (close to RISC) xabout 2X faster than 386

140 486 Processor Model: 386+FPU+Cache z386 units retained: BIU, CPU, MMU znew: FPU (80387) + Cache (8K/16K) zFPU: y387 onboard x0.8 micron => #transistors increased (275K => 1+ million) xsimplified system board design xspeedup FP operations

142 486 Processor Model: Cache zCache (8K/16K (dx4)) yFunction: bridge the processor-memory speed gap x8088: 4.77MHz x80486: 50MHz xPentium: 100MHz xPentium Pro: 133 MHz xMain Memory (DRAM): relatively slow yFast Static RAMs (SRAM) as cache

143 486 Processor Model: Cache zOrganization: y8K y4-way set associative x4 direct mapped caches wired in parallel xeach block maps to a set of 4 lines yunified: data & code in the same cache ywrite-through: update cache and memory page on write operations

144 486 Processor Model: Cache zlocality (why caches help?) yspatial locality: e.g., array of data ytemporal: e.g., loops in codes zoperations on hit/miss z128-bit cache lines y32-bit x N to catch locality (N=4) y128-bit = 16-byte
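
Putting the numbers from the cache slides together (8K total, 4-way set associative, 16-byte lines) gives 128 sets, so an address splits into a 4-bit byte offset, a 7-bit set index and the remaining bits as tag. The sketch below shows that arithmetic; the constant and function names are illustrative.

```python
CACHE_BYTES = 8 * 1024
LINE_BYTES = 16          # 128-bit cache line
WAYS = 4                 # 4-way set associative
SETS = CACHE_BYTES // (LINE_BYTES * WAYS)


def split_address(address):
    """Return (tag, set_index, byte_offset) for a physical address."""
    byte_offset = address % LINE_BYTES
    set_index = (address // LINE_BYTES) % SETS
    tag = address // (LINE_BYTES * SETS)
    return tag, set_index, byte_offset


if __name__ == "__main__":
    print("sets:", SETS)
    tag, set_index, byte_offset = split_address(0x12345678)
    print(hex(tag), set_index, byte_offset)
```

Consecutive addresses within one 16-byte line share the same tag and set index, which is how a single line fill captures spatial locality.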

145 486 Processor Model: Cache zMapping: ymemory => many-to-many => cache yData RAM: save memory data yTag RAM: save memory address information z3 methods of mapping yfully associative: memory block to any cache line ydirect map: memory block to specific line xthrashing yset associative: memory block to a set of cache lines
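
The thrashing problem of direct mapping can be shown with a small miss counter. This is a toy model under stated assumptions: the cache holds 512 lines in every configuration (8K with 16-byte lines), addresses are given as block numbers, and replacement within a set is least recently used. The function name `count_misses` is illustrative.

```python
from collections import OrderedDict


def count_misses(blocks, num_sets, ways):
    """Count misses for a sequence of memory block numbers.

    num_sets * ways is the total number of cache lines. Each set keeps
    its blocks in an OrderedDict ordered from least to most recently used.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in blocks:
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)          # hit: mark as most recent
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)     # evict least recently used
            lines[block] = True
    return misses


if __name__ == "__main__":
    pattern = [0, 512] * 10                   # two blocks that collide when direct mapped
    print("direct mapped:    ", count_misses(pattern, num_sets=512, ways=1))
    print("4-way set assoc.: ", count_misses(pattern, num_sets=128, ways=4))
    print("fully associative:", count_misses(pattern, num_sets=1, ways=512))
```

With direct mapping the two blocks keep evicting each other, so every access misses; with four lines per set both blocks fit and only the first access to each one misses.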

146 486 Processor Model: Cache zReplacement policy (LRU) yvalid bits: all 4 lines in use ? xNO => use any unused line xYES => find one to replace yLRU bits: which is least recently used
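
The decision flow on this slide (check valid bits first, fall back to LRU) can be sketched for a single 4-line set. This is an illustrative model that tracks exact access times; real hardware typically approximates LRU with a few bits per set. The class name `CacheSet` and its fields are assumptions for the example.

```python
class CacheSet:
    """One set of a set-associative cache with valid bits and exact LRU."""

    def __init__(self, ways=4):
        self.tags = [None] * ways
        self.valid = [False] * ways
        self.last_used = [0] * ways
        self.clock = 0

    def access(self, tag):
        """Return (hit, way) for an access to the given tag."""
        self.clock += 1
        for way, stored in enumerate(self.tags):
            if self.valid[way] and stored == tag:
                self.last_used[way] = self.clock
                return True, way
        if False in self.valid:
            way = self.valid.index(False)                       # NO: use any unused line
        else:
            way = self.last_used.index(min(self.last_used))     # YES: replace the LRU line
        self.tags[way] = tag
        self.valid[way] = True
        self.last_used[way] = self.clock
        return False, way


if __name__ == "__main__":
    cache_set = CacheSet()
    for tag in ["A", "B", "C", "D", "A", "E", "B"]:
        print(tag, cache_set.access(tag))
```

In this run the second access to A is a hit, so when E arrives it replaces B, the least recently used line, rather than A.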

149 Pentium Review …

150 Pentium: Superscalar Processor zavailable in 1993 z32-bit architecture zSuperscalar architecture yScaling: scaling down etchable feature size to increase complexity of IC (e.g., DRAM) x10 microns/4004 to 0.13 microns (2001) ySuperscalar: go beyond simply scaling down yTwo instruction pipelines: each with own ALU, address generation circuitry, data cache interface yExecute two different instructions simultaneously
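
A rough sketch of what two pipelines buy: adjacent instructions can be issued together only when the second does not depend on the first. The model below is heavily simplified and is not the actual Pentium pairing logic, which has further restrictions; instructions are represented as hypothetical `(destination, sources)` tuples and each issue slot counts as one clock.

```python
def issue_clocks(program):
    """Count issue clocks for an in-order, two-pipeline machine.

    program is a list of (destination, sources) tuples. Two adjacent
    instructions pair only if the second neither reads nor overwrites
    the first one's destination.
    """
    clocks = 0
    i = 0
    while i < len(program):
        if i + 1 < len(program):
            dest, _ = program[i]
            next_dest, next_sources = program[i + 1]
            if dest not in next_sources and dest != next_dest:
                i += 2            # both pipelines used this clock
                clocks += 1
                continue
        i += 1                    # only one pipeline used this clock
        clocks += 1
    return clocks


if __name__ == "__main__":
    independent = [("a", ("x",)), ("b", ("y",)), ("c", ("x",)), ("d", ("y",))]
    chained = [("a", ("x",)), ("b", ("a",)), ("c", ("b",)), ("d", ("c",))]
    print(issue_clocks(independent), issue_clocks(chained))
```

Four independent instructions issue in two clocks, while a chain in which each instruction needs the previous result gains nothing from the second pipeline.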

151 Pentium: Superscalar Processor zOnboard cache ySeparate 8K data and code caches to avoid access conflicts zFPP zInstruction pipeline: 8 stage (floating point; the integer pipelines are 5 stage) zOptimized floating point functions y5x-10x FLOPS of 486 y2x performance of 486 at any clock rate

152 Pentium: Superscalar Processor zCompatibility with 386/486: yInternal 32-bit registers and address bus yData bus expanded to 64 bits for higher data transfer rate xCompare 8088 to 386sx transition

153 Pentium: Superscalar Processor znon-clone competition from AMD, Cyrix zdevelopment of brand identity by Intel

154 Pentium Pro Review …

155 Pentium Pro: Two Chips in One zBecame available in 1995 zSuperscalar of degree 3 yCan execute 3 instructions simultaneously zOptimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp) zTwo separate silicon dies in the same package yProcessor: 0.35 micron, 5.5 million transistors y256KB(/512K) Level 2 cache included in the package, 15.5 million transistors in smaller area

156 Pentium Pro: Two Chips in One zOn Board Level 2 cache ySimplifies system board design yRequires less space yGains faster communication with processor zInternal (level 1) cache: 8K zPentium Pro 133 ~= 2x Pentium 66 ~= 4x 486DX2 66

157 Pentium Pro: Dynamic Execution zDynamic execution: reduce idle processor time by predicting instruction behaviors yMultiple Branch Prediction: looks as far as 30 instructions ahead to anticipate program branches yData Flow Analysis: looks at upcoming instructions and determines whether they are available for processing, depending on other instructions; determines optimal execution sequences ySpeculative Execution: executes instructions in a different order than entered. Speculative results are stored until final states can be determined.
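
The data flow analysis idea can be sketched as a scheduler that, on each clock, picks up to three instructions whose inputs are already available, regardless of program order. This is a toy model and not the Pentium Pro's actual mechanism: instructions are hypothetical `(destination, sources)` tuples, every result is assumed ready one clock after issue, only read-after-write dependencies are tracked, and branch prediction and speculation are left out.

```python
def dataflow_schedule(program, width=3):
    """Return a list of clocks, each a list of instruction indices issued.

    An instruction is ready once every earlier instruction that produces
    one of its sources has issued in a previous clock.
    """
    latest_producer = {}
    deps = []
    for index, (dest, sources) in enumerate(program):
        deps.append({latest_producer[s] for s in sources if s in latest_producer})
        latest_producer[dest] = index

    done = set()
    pending = list(range(len(program)))
    clocks = []
    while pending:
        ready = [i for i in pending if deps[i] <= done][:width]
        clocks.append(ready)
        done.update(ready)
        pending = [i for i in pending if i not in ready]
    return clocks


if __name__ == "__main__":
    program = [
        ("r1", ()),            # 0
        ("r2", ("r1",)),       # 1: needs 0
        ("r3", ()),            # 2
        ("r4", ("r3",)),       # 3: needs 2
        ("r5", ("r2", "r4")),  # 4: needs 1 and 3
    ]
    print(dataflow_schedule(program))
```

Instruction 2 issues alongside instruction 0 and ahead of instruction 1, which is the kind of reordering the slide describes: five instructions finish issuing in three clocks instead of five.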

