Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Chapter 3 80x86 Processor Architecture 8085 (review) – typical, single segment 8086/88 – pipeline + segments 80286/386 – real(8086)/protected mode 80386.

Similar presentations


Presentation on theme: "1 Chapter 3 80x86 Processor Architecture 8085 (review) – typical, single segment 8086/88 – pipeline + segments 80286/386 – real(8086)/protected mode 80386."— Presentation transcript:

1 1 Chapter 3 80x86 Processor Architecture 8085 (review) – typical, single segment 8086/88 – pipeline + segments 80286/386 – real(8086)/protected mode 80386 – MMU (+paging) 80486 – cache memory Pentium P6 (Pentium Pro, II, Celeron, III, Xeon, …) Pentium 4, Core 2 – 64 bit extension

2 2 8086 and 8088 Processors Processor Model: BIU+EU Programming Model: Data Registers + Segments Memory Banks Issues

3 3 8086 and 8088 Processors Review of 8088/8086 Characteristics (differences with 8085)

4 4 8086: IA standard zBecame available in 1978 y16-bit registers (8-/16-bit operations) +16-bit data bus y20-bit address bus (was 16-bit for 8080, 64K => 1M) ymemory organization: 64KB segments (1 MB limit) xCS (code segment), DS (data), SS (stack), ES (extra segment) zRe-organize CPU into BIU (bus interface unit) and EU (execution unit) [Fig 3.1, p.74, John Uffenbeck, 2ed] yAllow fetch and execution simultaneously zInternal register expanded to 16-bit yAllow access of low/high byte simultaneously or separately yTwo Memory banks for odd/even-byte access

5 5 8088: PC standard zBecame available in 1979, almost identical to 8086 z8-bit data bus: for hardware compatibility with 8080 z16-bit internal registers and data bus (same as 8086) z20-bit address bus (was 16-bit for 8080) yBIU re-designed zmemory organization: 64KB segments (1 MB limit) yTwo memory accesses for 16-bit data (less efficient) yBut less costly z8088: used by IBM PC (1982), 16K-64K, 4.77MHz

6 6 80186, 80188: High Integration CPU zPC system: y8088 CPU + various supporting chips xClock generator x8251: serial IO (RS232) x8253: timer/counter x8255: PPI (programmable peripheral interface) x8257: DMA controller x8259: interrupt controller z80186/80188: 8086/8088 + supporting functions yCompatible instruction set (+ 9 new instructions)

7 7 8086 and 8088 Processors Processor Model: BIU+EU Programming Model: Data Registers + Segments Memory Banks Issues

8 8 8086 Processor Model: BIU+EU zBIU yMemory & IO address generation yRead/Write Instructions/Data from/to Data Bus zEU yReceive codes and data from BIU xNot connected to system buses yExecute instructions ySave results in registers, or pass to BIU to memory and IO

9 9 CPU: i8085 L REG (8) A REG

10 10 8086 Processor Model BHBL AHAL DHDL CHCL BP DI SI SP ALU Flags CS ES SS DS IP  Address Generation and Bus Control Instruction Queue (6 or 4) EU BIU Data BusAddress Bus

11 11 Fetch and Execution Cycle zBIU+EU allows the fetch and execution cycle to overlap y0. System boot, Instruction Queue is empty y1. IP =>BIU=> address bus y2. Mem[(IP++)] => Instruction Queue[tail++] y3a. InstrQ[head] => EU => execution y3b. Mem[IP++] => InstrQ[tail++] xMaybe multiple instructions yRepeat 3a+3b (can be overlapped) IF EXE IF

12 12 Waiting Conditions zBIU+EU: execute (almost) continuously without waiting, except … zWaiting Conditions: yExternal Memory Access yNext Jump Instruction yLong & Slow Instruction

13 13 Waiting Conditions: Memory Access zExternal Memory Access: Accessing external memory locations not in queue (needs A-bus for execution) yBIU suspend instruction fetch yIssues this external memory address yResumes instruction execution (EU) and fetch (BIU)

14 14 Waiting Conditions: Jump zNext Jump Instruction yInstructions in queue are discarded yEU wait while the instruction at the jump address is fetched by BIU yResume execution (empty) Inst-@jmp addr BIU EUJMP destination Inst-6 Inst-5 Inst-4 Inst-3 Inst-2 Inst-1 BIU

15 15 Waiting Conditions: Long & Slow Instructions zLong slow Instruction is being executed yInstruction Q. Full yBIU waits yResume instruction fetch after EU pull one or two bytes from queue yE.g., AAM (ASCII adjust for multiplication): 83 CLKs  Instruction Fetch: 4 CLKs

16 16 BIU: 8088 vs. 8086 zBIU is the major difference z8088: yRegister: 16-bit (same as 8086) yData bus: 8-bit (vs. 16-bit/8086) yInstruction queue: 4 bytes (vs. 6-byte/8086) zOnly 30% slower than 8086, Why? yIf queue is kept full, instructions are executed without interruption yonly slightly affected by the data bus width difference: 16 vs 8-bit

17 17 8086 and 8088 Processors Processor Model: BIU+EU Programming Model: Data Registers + Segments Memory Banks Issues

18 18 8086/8088 Programming Model: Processor Features vs. Model zALU: 16-bit vs. 8-bit data register ySome operations requires 8 bits, some 16 bits x=> Groups of 8-bit registers as 16-bit registers x=> Memory copy: two bytes from two memory banks for 16-bit operations zAddress: 20-bit (much more memory locations) yWith 16-bit address pointers (same as 8085) yMemory management: Segmented Memory xDivide memory into 64KB segments (16-bit addressable within each segment) x16-bit Pointers: for Segment address & Offset within segment

19 19 8086/8088 Programming Model: EU Register Functions BHBL AHAL DHDL CHCL BP DI SI SP CS ES SS DS IP Flags HFlags L Data Registers Status and Control Flags Segment Registers Pointers & Index Registers AX BX CX DX 16-bit registers8-bit registers 8-bit flags x 2 (Memory Access) (Register Operands)

20 20 8086 Programming Model: Data Registers zData Group: (8/16-bit registers) x16-bit registers, byte/word accessible (16=8x2) xData registers: Save temporary results as long as possible For Efficiency: avoid costly (slow) external memory access yAX (= AH+AL): 16-bit Accumulator (Acc.) xAL = Accumulator for 8-bit operations xAcc: default source operand & result for arithmetic/logic operations E.g., ADD AX, BX  AX := AX + BX. yBX (= BH+BL): typically used as a Base register yCX (= CH+CL): typically used as a Counter yDX (= DH+DL): typically used as a general Data register

21 21 8086 Status and Control Flags zFlags: status vs. control yStatus: indication of results xFor conditional flow control: JNZ, JNC, … yControl: set or clear to control subsequent operations z8086: ystatus x 6 [C, P, A, Z, S, O] ycontrol x 3 [TF, IF, DF]

22 22 Flag L : (Same as 8085 status register) SF ZF X AF X PF X CF CF: Carry Flag CF= 0 : No Carry (Add) or Borrow (SUB) CF= 1 : high-order bit Carry/Borrow AF: Aux. Carry: Carry/Borrow on bit 3 (Low nibble of AL) SF: Sign Flag: (0: positive, 1: negative) ZF: Zero Flag: (1: result is zero) PF: (Even) Parity Flag (even number of 1’s in low-order 8 bits of result) 8086 Status and Control Flags

23 23 Flag H : (TF, IF, DF: control bits, others: status) X X X X OF DF IF TF TF: Trap flag (single-step after next instruction; clear by single-step interrupt) IF: Interrupt-Enable: enable maskable interrupts DF: Direction flag: auto-decrement (1) or increment(0) index on string (block move) operations OF: Overflow: signed result cannot be expressed within #bits in destination operand 8086 Status and Control Flags

24 24 8086 Status and Control Flags zTF: software single step yJump to trap address on each execution (if set) zIF: INTR enable yINT, ISR, INTV, IRET zDF: block move (string) operation direction yAuto-increment or auto-decrement

25 25 FIGURE 3-5 8086 flag word, DF, IF, and TF can be set or reset to control the operation of the processor. The remaining flags are status indicators. Bits marked X are undefined. John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved.

26 26 8086 Programming Model: Segment Registers zSegment Group: divide memory into yCS: Code Segment [64KB for each segment] yDS: Data Segment yES: Extra Segment ySS: Stack Segment zSegment Registers: CS, DS, ES, SS ySave Base addresses to particular segments ySEG(16-bit):OFFSET(16-bit) is used by BIU to calculate (20-bit) physical memory address [DS:0] = byte 0 (the 1 -st byte) in Data Segment 0 = offset to begin-of-DS

27 27 FIGURE 3-9 The 8086 divides its 1 MB of memory address space into four segments, the data, code, stack, and extra segments. The four segment registers DS, CS, SS, and ES point to location 0 of the current segment. In this example, the stack and extra segments are partially overlapped. (From J. Uffenbeck, Microcomputers and Microprocessors: The 8080, 8085, and Z-80, Prentice Hall, Englewood Cliffs, NJ, 1985.) Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e

28 28 –think of segments as windows that can be moved over any area of memory to access data or code –a program can have more than four or six segments, but only access four or six segments at a time Figure 2 – 4 A memory system showing the placement of four memory segments.

29 29 –a program placed in memory by DOS is loaded in the TPA at the first available area of memory above drivers and other TPA programs –area is indicated by a free- pointer maintained by DOS –program loading is handled automatically by the program loader within DOS Figure 2 – 5 An application program containing a code, data, and stack segment loaded into a DOS system memory.

30 30 Figure 2 – 3 The 8086 (or real mode) memory-addressing scheme, using a segment address plus an offset. –this shows a memory segment beginning at 10000H, ending at location 1FFFFH 64K bytes in length –also shows how an offset address, called a displacement, of F000H selects location 1F000H in the memory

31 31 8086 Programming Model: Index Registers zPointer/Index Group: (16-bit) as memory pointers yIP: Instruction Pointer  CS (point to CS by default) x(next instruction to be fetched by BIU; physically part of BIU) ySI: Source Index  DS yDI: Destination Index  ES ySP: Stack Pointer  SS zIndex Registers: IP, SI, DI, SP ySave Index (or offset) or Pointer to a Base address xE.g., MOV AH, [SI] ; AH := (*DS:SI), // Mem. Addr. in SI

32 32 Segmented Memory zMemory Organization: Linear vs. Segmented yLinear Addressing: (MC68K, i8085) xThe entire memory is regarded as a whole Specify absolute addresses in instructions xThe entire memory space is available all the time ySegmented Addressing: (ix86) xMemory is divided into segments Specify an address as offset relative to segment base address Programs use offsets as logical address, independent of where segments are located (relocatable) xProcess is limited to access designated segments at a given time

33 33 Segment Registers z8086: 1M, divided into 64K (2^16) memory segments y16-bit offset/logical address (relative to segment base address) z4 active segments, pointed to by yCS (program codes), DS (data for program), ES (extra/shared data), SS (stack or subroutine/ISR return addresses) segment registers z8085: 64K x 1, for program and data yStack contents may overwrite data and code yLimited program code size

34 34 8086 Programming Model: Segment Registers zSegment Group: divide memory into yCS: Code Segment [64KB for each segment] yDS: Data Segment yES: Extra Segment ySS: Stack Segment zSegment Registers: CS, DS, ES, SS ySave Base addresses to particular segments ySEG(16-bit):OFFSET(16-bit) is used by BIU to calculate (20-bit) physical memory address [DS:0] = byte 0 (the 1 -st byte) in Data Segment 0 = offset to begin-of-DS

35 35 Logical and Physical Addresses zPhysical Address: 20-bit zIndex/segment registers: 16-bit yLogical address in index registers: 16-bit yBase address in segment registers: 16-bit+0000 2 x16-byte segment boundaries xCan define a segment at absolute addresses: 16N+0. yAddress Translation: xPhys-address = Base*16+Index (or offset) xE.g., CS:IP, DS:2345H, …

36 36 Default Segment Registers zThe default segment register depends on the instruction being used yMOV [BP], AL ; AL := *(SS:BP) yNext instruction  CS:IP zAlternative segment: yDefault segment can be changed using segment override operator (for some memory reference types)

37 37 Default Segment Registers Type of referenceDefault segment Alternative segment Offset (logical address) Instruction fetchCS-IP Stack operationsSS-SP General dataDSCS,ES,SSEffective address String sourceDSCS,ES,SSSI String destinationES-DI BX used as pointerDSCS,ES,SSEffective address BP used as pointerSSCS,ES,DSEffective address

38 38 Segmented Memory zAdvantages yCS+DS1/DS2/…: different DS’s for one program yCS1 => CS2: xre-allocatable codes for task switching xRun at any location (if no reference to physical address) zDisadvantages yComplex hardware: requires two registers (e.g., DS:SI) yLimited segment size for a program x64K or, if larger, switching between 64K’s x386~: large segment up to 4G (flat mode, disabling segmented memory)

39 39 8086 and 8088 Processors Processor Model: BIU+EU Programming Model: Data Registers + Segments Memory Banks Issues

40 40 8086 Memory Organization: Memory Banks Operation Issues zMemory Space: y20-bit address bus yIf linearly organized, 1M bytes addressable zFeatures of 8086 Memory Organization yMemory Banks (x2): xfor concurrently accessing two bytes ySegmented Memory Address Space xfor flexible memory management xcompatible with 8085 memory management (16-bit addresses)

41 41 FIGURE 3-6 The memory space of the 8086 consists of 1,048,576 bytes or 524,288 16-bit words. John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. Mis-aligned words (started at odd-byte address) Aligned words (started at even-byte address)

42 42 8086 Memory Organization: Memory Banks (1) z2**20 addresses can be arranged as: y2**20 (8-bit) bytes in continuous locations, or y2**19 16-bit aligned words (in parallel banks) xPlus 2**19-1 mis-aligned words yNot a single way, but … zMemory organization & interfacing should reflects the access behavior of the processor in the most natural way. Two types of operations: yByte: I/O, character data (e.g., ASCII) yWords: large 16-bit integers arithmetic operations

43 43 8086 Memory Organization: Memory Banks (2) zEven and Odd Memory Banks y16-bit data bus  two-byte / two one-byte access yAllows processor to work on bytes or on (16-bit) words xIO operations are normally conducted in bytes xArithmetic operations may use large 16-bit words yCan handle odd-length instructions xSingle byte instructions xMultiple byte (and very long) instructions

44 44 FIGURE 3-7 (a) By reading from an even-addressed bank and an odd-addressed bank the 8086 can read two bytes from memory simultaneously. (b) If the 16-bit word begins at an odd address, the 8086 will require two memory read or write cycles. word 0 word 4 Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e

45 45 odd bank (bytes 1, 3, 5, 7) even bank (bytes 0, 2, 4, 6) D0~D7 D8~D15 BLE (A0)BHE A1~A19 (word address) 8086 uses A19~A1 to specify a word address, and uses A0 (also know as BLE) and BHE to select individual bytes in the word, or select both (the word). Word address (with 2 banks) = Byte address div 2 (= discarding the least significant address bit) e.g., bytes xxxx…zzz0 and xxxx…zzz1 belong to word xxxx…zzz (x,z: 0’s or 1’s) Even/Odd Banks, Word Addresses vs. Byte Addresses

46 46 8086 Memory Organization: Memory Banks (3a) zMemory Banks yCan read 16-bit data simultaneously x  One from odd-addressed byte, another from even- addressed byte x  Need TWO memory banks in parallel yByte Access = Word Access + Byte Enable xSelect byte(s) in accessed word xBLE: Byte/Bank Low Enable Labeled as A0 in 8086 Confusing: No A0, A1 in later x86 CPUs xBHE: Byte/Bank High Enable

47 47 8086 Memory Organization: Memory Banks (3b) zAddress Decoding (difference with 8085/88) y8085/8088: (byte-oriented memory access) xA0~A15: issue byte addresses to address individual bytes xA0~An-1 (low order n address bits): connected to A0~An-1 of memory chips with n address pins xAn~A15 (high order address bits): for chip selection (CS) y8086: (word-oriented memory access) xA1~A19: issue word addresses to address words (= 2 parallel bytes) A1~An: Connected to A0~An-1 of two parallel memory banks An+1~A19: for chip selection (CS) xA0(BLE#) and BHE# control line: to select a byte from even/odd bank for byte-based operations and operations involving misaligned operands, or select both bytes for word-based operations.

48 48 FIGURE 7-22 64K x 8 8088(/85) SRAM interface. Only a single memory chip is required. John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. A0-A15 A19-A16A15-A0 Processor (Byte) Address Byte address D0~D7 CS1#

49 49 FIGURE 7-26 128K x 8 8086 SRAM interface. John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. (BLE#) A19-A17A16-A1 Processor (Word) Address BH/BLE Word address LSB MSB (aligned) A1~A16 AL AH A17~A19 (BHE#)

50 50 Memory Organization: Alignment zEndianess: ySingle way to model multi-byte CPU register xAX  AH+AL (high order byte in AH, low order byte in AL) yTwo ways to store operands in memory zBig-endian CPU: (IBM370, M68*, Sparc) yHigh-order-byte-first (HOBF) yMaps highest-order byte of internal register  lowest (1 st ) memory byte address yOperand address == address of 1st memory byte (MSB) xMOV R1, [N]  N: addr. of 1 st byte in memory & MSB of register xN, N+1, N+2, …: addresses of all bytes, MSB at addr [N] Addrdata 0000H1 0001L1 0002H2 0003L2 0004H3 0005L3

51 51 Memory Organization: Alignment zLittle-endian CPU: (DEC, Intel) yLow-order-byte-first (LOBF) yMaps lowest-order byte of register  1 st memory byte yOperand address == address of 1 st memory byte (LSB) xMOV AX, [N]  N: addr of 1 st byte in memory & LSB of register xAL  [N], AH  [N+1] (addr [N], +1, +2, …: low-to-high order) zConfigurable CPU: yCan switch between Big/Little-endian, or yProvide instructions which convert 16-/32-bit data between two byte ordering (80486) Addrdata 0000L1 0001H1 0002L2 0003H2 0004L3 0005H3

52 52 Memory Organization: Alignment b-AddrData 0000L1 0001H1 0002L2 0003H2 0004B3 0005L4 0006H4 0007L5 0008H5 0009… 000a… 000b… b-AddrData 0000L1 0002L2 0004B3 0006H4 0008H5 000a b-AddrData 0001H1 0003H2 0005L4 0007L5 0009… 000b w0 w1 w2 w3 w4 w5 w-Addr w0 w1 w2 w3 w4 w5 Even bankOdd bank Registers AHAL BHBL CHCL DHDL Aligned Mis-Aligned Single memory chip (8-bit data bus) Little endian 16-bit data bus

53 53 8086 Memory Organization zAligned operand: yOperand aligned at even-byte (word/dword) boundaries yAllows single access to read/write one operand zMis-aligned words: yWord operand does not start at even address xThrough internal shift/swap mechanism, if necessary (e.g., load mis-aligned MSB/LSB to AH/AL) yNeed 2 read cycles to read/write the word (8086) xIssues two addresses to access the two even-aligned words containing the operand in order to access the operand xslower but transparent to programmer

54 54 FIGURE 7-26 128K x 8 8086 SRAM interface. John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. AL AH Word address LSB MSB (mis-aligned) A1~A16 (BLE#)(BHE#) A19-A17A16-A1 Processor (Word) Address BH/BLE A17~A19

55 55 8086 Memory Organization z8088: 8-bit data bus (same as 8085) yAlways 2 cycles for word-based operations xAligned or not xSlower word operations ybecause of 8-bit external data bus xUse single memory bank to store multi-byte operands (like i8085)

56 56 8086 Memory Map zMemory Map: How memory space is allocated yROM Area: boot, BIOS yRAM: OS/User Apps & data yUnused yReserved: for future hardware/software uses yDedicated: for specific system interrupt and rest functions, etc.

57 57 386 Processor Review 286/386 Protected Modes Processor Model: BIU+CPU+MMU Programming Model: GPR + SPR (Segments vs. Paging) + PL

58 58 80286 (Review) zFirst with Protection Mode zReview of 286 Protected Mode … Next

59 59 80286 zBecame available in 1982 zused in IBM AT computer (1984) y16-bit data bus y24-bit address bus (16 MB) x(vs. 20-bit/1M 8086) zclock speed 25% faster than 8088, throughput 5 times greater than 8088

60 60 80286: Real vs. Protected Modes zLarger address space: 24-bit address bus yReal Mode vs. Protected Mode zReal Mode: (8086 Mode) yPower on default mode yFunction like a 8086: use 20-bit least significant address lines (1M) ySoftware compatible with 8086, 186 y16 new instructions (for Protected Mode management) yFaster 286: redesigned processor, plus higher clock rate (6-8MHz)

61 61 80286: Real vs. Protected Modes zProtected Mode: yMulti-program environment yEach program has a predetermined amount of memory yAddressed via segment selector (physical addresses invisible): 16M addressable xEasy program switching y“Protected mode”: Multiple programs loaded at the same time (within their respective segments), protected from read/write by each other; a program running in another segment cannot Read/Write other segments

62 62 80286: Real vs. Protected Modes zProtected Mode: yCannot be switch back to real mode to avoid illegal access by switching back and forth between modes zA faster 8086 only? yMS-DOS requires that all programs be run in Real Mode

63 63 80386 Model zRefine 286 Protect Mode yReal & Protected Modes zExpand to 32-bit registers zNew Virtual 8086 Mode zComponents: BIU, CPU, MMU

64 64 80386 Review

65 65 80386DX (aka. 80386) zavailable in 1985, a major redesign of 86/286 yCompatibility commitment through 2000 z32-bit data and address buses (4 GB memory) yReal Address Mode: 1M visible, 286 real mode yProtected Virtual Address Mode: xOn board MMU xSegmented tasks of 1byte to 4G bytes Segment base, limit, attributes defined by a descriptor register xPage swapping: 4K pages, up to 64TB virtual memory space xWindows, OS/2, Unix/Linux

66 66 80386DX (aka. 80386) zVirtual 8086 mode (a special Protected mode feature): permitted multiple 8086 virtual machines- multitasking (similar to real mode) yWindows (multiple MSDOS’s) zClock rate: ymax. 40MHz, 2 pulses per R/W bus cycle yExternal memory cache to avoid wait xFast SRAM x93% hit rate with 64K cache zCompatible instructions (14 new)

67 67 80386 Review zEnd of review …

68 68 80386: Real vs. Protected Modes zLarger address space: 32-bit address bus (4G) yReal Mode vs. Protected Mode (refined from 286) zReal Mode: (8086 Mode) yPower on default mode yFunction like a 8086: (1) use only 20-bit least significant address lines (1M) (2) segmented memory retained (64K) ySoftware compatible with 286 zNew Real Mode Features: yaccess to 32-bit register set ytwo new segments: F, G

69 69 80386: Real vs. Protected Modes zProtected Mode: ynew addressing mechanism x(in contrast to that of real mode) ysupports protection levels (PLs) ySegment size: 1 to 4G (not fixed size 64K) ySegment register (16-bit): as POINTER to a descriptor table xNOT as BASE address of a segment x13-bit index to table, 1-bit local/global flag, 2-bit: RPL

70 70 80386: Real vs. Protected Modes zProtected Mode: (cont.) ydescriptor table: (8 byte per entry) x32-bit base address of segment xsegment size (20-bit): in byte (max=1M, G flag=0) or in 4k-page (max=4G, G=1) xaccess rights (and status & control information) ymemory address x= base address (in table) + offset (in instruction)

71 71 80386: Real vs. Protected Modes zProtected Mode: (cont.) yPaging mechanism for virtual memory: xmap 32-bit linear address (base+offset) => physical address & page frame address x  (4K page frames in system memory) x  up to 64TB of virtual memory yPaging mechanism can be turned off

72 72 MOV AL, [00000FFFh] 00000FFFh 00000FFFh 12345FFFh

73 73 80386: Real vs. Protected Modes zProtected Mode: (cont.) yProtection mechanism: xtasks/data/instructions are assigned a privilege level (PL) xtasks running at lower PL cannot access tasks or data segments at a higher PL yOS runs multiple programs that are protected from the others

74 74 80386: Real vs. Protected Modes zTwo Ways to Run 8086 Programs: yReal Mode yVirtual 8086 Mode zVirtual 8086 Mode: yruns multiple 8086 +other 386 (protected mode) programs independently yeach task sees 1 MB (mapped via paging to anywhere in 4GB space) yrunning V8086+ Protected mode simultaneously y8086 tasks is assigned the lowest privilege level, cannot access programs/data in other segments

75 75 386 Processor Review 286/386 Protected Modes Processor Model: BIU+CPU+MMU Programming Model: GPR + SPR (Segments vs. Paging) + PL

76 76 John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. execution fetching Address generation (paging) Address generation (segment) decode FIGURE 3-11 The processor model for the 80386 microprocessor consists of the bus interface unit (BIU), central processing unit (CPU), and the memory management unit (MMU).

77 77 80386 Processor Model

78 78 80386 Processor Model: BIU+CPU+MMU zBIU ycontrol 32-bit address and data buses ykeep instruction queue full (16 bytes) yNew features: address pipelining & dynamic bus sizing

79 79 80386 Processor Model: BIU+CPU+MMU zAddress Pipelining yaddress of next memory location is output halfway through current bus cycle xGives external memory more address decode time xslower memory chip is OK xeasier to keep up with faster (2 CLK) bus cycle of 386 address data R / W address data R / W

80 80 80386 Processor Model: BIU zDynamic Data Bus Sizing ySwitch between 16  32-bit data bus on the fly xaccommodate to external 16-bit memory cards or IO devices yAdjust bus timing to use only the least significant 16 bits

81 81 80386 Processor Model: BIU zExternal Memory Banks y4 memory banks (4x8=32bits) yA2~A31: issues double word (32bit) addresses yBE0-BE3 for bank selection (there is no A0~A1) xaccess byte or word or double word xaligned operands: 1 bus cycle xmis-aligned (addr not at 4N, i.e., %4 !=0): 2 bus cycles

82 82 80386 Processor Model: BIU zExternal Memory Banks BE0 bank-0 (bytes 0, 4, 8) bank-1 (bytes 1, 5, 9) bank-2 (bytes 2, 6, a) bank-3 (bytes 3, 7, b) BE1BE3BE2D0~D7D8~D15D16~D23D24~D31 A2~A31 (double word address)

83 83 80386 Processor Model: CPU zCPU=IU (instruction) +EU (execution) yfetching & execution can overlap zIU: yretrieval instructions from queue yDecode instruction ystore in decoded queue zEU: ALU + registers (32-bit) yexecute decoded instructions

84 84 80386 Processor Model: MMU=Segmentation+Paging Units zSegmentation unit yReal mode: generate the 20-bit physical address yProtected mode: store base/size/rights in descriptor registers xcache descriptor tables in RAM xfaster switching between tasks zPaging Unit ydetermines physical addresses associated with active segments (divided into 4K pages) yvirtual memory support to allow larger programs

85 85 386 Processor Review 286/386 Protected Modes Processor Model: BIU+CPU+MMU Programming Model: GPR + SPR (Segments vs. Paging) + PL

86 86 80386 Programming Model: GPR zGeneral Purpose Registers (GPR) yData & Addresses Groups yStatus & Control Flags ySegment Group

87 87 80386 Programming Model: GPR zGeneral Purpose Registers (Ia) yData & Addresses Groups xData/Pointer/Index registers x32-bit width (max) yData Group: x8-bit: AL, AH, BL, BH, CL, CH, DL, DH x16-bit: AX, BX, CX, DX x32-bit: EAX, EBX, ECX, EDX yAccumulators: x8-bit: AL, 16-bit: AX, 32-bit: EAX

88 88 80386 Programming Model: GPR zGeneral Purpose Registers (Ib) yData & Addresses Groups xData/Pointer/Index registers x32-bit width (max) yPointer & Index Group: x keep offset (logic address) relative to base address of a segment x 16-bit: SP, BP, SI, DI, IP => x 32-bit: ESP, EBP, ESI, EDI, EIP

89 89 John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. FIGURE 3-12.a Programming model for the 80386. The general-purpose registers (a) are used by applications programmers. The special-purpose registers (b) are intended to be used by the operating system software.

90 90 80386 Programming Model: GPR zGeneral Purpose Registers (II) yStatus & Control Flags: xEFLAGS: 32-bit, 4 new flags xVM: used to switch to V8086 mode xRF: resume from debug mode to normal execution (used with debugging registers) xNT: nested task (current task was called from another task) To determine type of return instruction xIOPL: current I/O privilege level (2-bit, PL 0-3) required to execute I/O instructions OS control over I/O access

91 91 John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. FIGURE 3-13 The 80386 flag word is 32 bits long. Four new flags have been added compared to the 8086: VM, RF, NT, and IOPL. (Courtesy of Intel Corporation.)

92 92 80386 Programming Model: GPR zGeneral Purpose Registers (III) ySegment Group: xCS, SS, DS, ES + FS, GS (new, not as default segment) xRemain 16-bit (NOT 32-bit) xReal mode: as segment base (for 8086 mode operation) xProtected mode: pointer to description table NOT as base address of segment base address is saved in a descriptor table

93 93 80386 Programming Model: SPR zSpecial Purpose Registers (I) (3.11(b)) yFor Protected Mode control & testing yCR0 (32): used to enable paging mechanism, monitor task switching, enable co-processor emulation, select protected mode yCR2 (32) (page fault linear address): a reference to a page/segment that must be loaded into memory yCR3 (32) (page directory base): base address of the page table xTable: starting address of each page frame and access information of that frame

94 94 80386 Programming Model: SPR zSpecial Purpose Registers (II) ySystem address registers x 4 xGDTR/32(+limit/16), IDTR/32(+limit/16), TR/16, LDTR/16(+16 0’s) xDescriptor table information (more …) yDebug registers (32) x 6 xTo set program break points yTest registers (16) x 2 xTo test the RAM in Translation Lookaside Buffer (TLB, for virtual-to-physical address translation) (more …)

95 95 John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. FIGURE 3-12.b Programming model for the 80386. The general-purpose registers (a) are used by applications programmers. The special-purpose registers (b) are intended to be used by the operating system software.

96 96 80386 Programming Model: Segments zMemory Management ySegment descriptors xFunction: keep base address, size, access rights x3 types of tables: global (GDT), local (LDT), interrupt (IDT) xGDT: pointing to segments that may be accessible to all tasks xLDT: pointing to segments associated with a given task Each task may have one LDT xIDT: point to the starting addresses of interrupt service routines Max. 256 ISR’s (processor faults, hardware/software INT’s)

97 97 80386 Programming Model: Segments zMemory Management (cont.) ySegment addressing: xSegment Register = Index + 1-bit Global/Local Flag + RPL (requesting privilege level) xIndex => point to a descriptor table xAddress = base (in table) + offset (from instruction) y13-bit index: 8K descriptors x(8K GDT+8K LDT)x4G = 64T virtual space x=(13+1+32)= 46-bit virtual address (1: GDT/LDT) yBase+limit is stored in GDTR/LDTR xLDTR: 16-bit register, padded with 16 0’s (i.e, 64K descriptor table boundaries) to form a 32-bit base address to LDT

98 98 FIGURE 3-15 80386 Protected Mode addressing. Physical addresses are computed by adding the instruction offset to the segment base address stored in a descriptor table. The upper 13 bits of the segment register are used to point to a specific descriptor. The base address and limit of the descriptor tables are stored in the global and local descriptor table registers (GDTR and LDTR). John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. LDTR (16b): padded with 16 0’s

99 99 E (xecutable) C/ED(confirming/exec utable direction) R/W (read/write) A(accessed) (bytes/pages)

100 100 FIGURE 3-14 In Protected Mode each segment register points to the base of a descriptor table. Entries in these tables, called descriptors, are eight bytes long and specify the starting address of the segment, its size limit, and its attributes. John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. Limit/Size: G=0  2^20 bytes (=1MB) G=1  2^20 pages (4k) (= 4GB)

101 101 John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. FIGURE 3-16 Example showing register LDTR pointing to the base of the local descriptor table (LDT) at address 00050000H. The CS register is pointing to descriptor number two in this table. The eight bytes that make up this descriptor specify a 512 KB memory segment beginning at address 20000000H. LDTR: padded with 16 0’s 0017 = 2 + LDT(1) + 11 Base=20.00.00.00 Size=07.FF.FF

102 102 80386 Programming Model: Paging zMemory Management (cont.) yPaging: (CR0 bit31=enable paging) x13-bit index + Global/Local flag = 8K x 2 descriptors xEach descriptor point to 4G (2**32) xAddressing space for a task: 16K x 4 G = 64T xNeed a paging mechanism to support virtual memory if less than 64T physical memory (only 4G for 386) yA page translation mechanism is added when paging is enabled (Fig. 3.14) xTo Compute physical address within a 4 K page frame & the address of the page frame xvia. Page Directory & Page Table

103 103 80386 Programming Model: Paging zMemory Management (cont.) yPage Fault: requested page is not in real memory yPage Swapping: xSwap out unused and swap in requested pages xnormally by LRU (Least Recently Used) strategy yTLB (Translation Lookaside Buffer): xContains the addresses of the 32 most recently accessed page frames (coverage: 4K x 32 = 128K bytes) xFor fast page look-up xFor reducing page miss: 98% page hit (in TLB)

104 104 Dir(10)+Pag(10)+Offset(12)  4k repageable pages  [VM image]

105 105 Memory Paging z The memory paging mechanism allows any physical memory location to be assigned to any linear address. y Linear address: the address generated by a program. y Physical address: the actual memory location accessed by a program. z With memory paging, the linear address is invisibly translated to any physical address. y Data are not stored where the program think of

106 106 Figure 2 – 13 The paging mechanism in the 80386 through Core2 microprocessors. 4M Phys Mem. 4K Pages. 1024 page tables per dir 32-bit addrs

107 107 Figure 2 – 11 The control register structure of the microprocessor.

108 108 Figure 2 – 12 The format for the linear address (a) and a page directory or page table entry (b).

109 109 FIGURE 3-17 When paging is enabled, linear addresses are translated into physical addresses via the Page Directory and Page Translation tables. John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved.

110 110 FIGURE 3-18 Example showing how linear address 00803FFE is translated into physical address A0000FFE. John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved.

111 111 80386 Programming Model: PL zProtection: yassign PL (Privilege Level) to resources to prevent lower privilege tasks from accessing high privilege resources yPL 0~3, 0: highest privilege ytask: CPL yInstruction/segment register: RPL ydata segment: DPL yRule: EPL > DPL => general protection fault x where EPL=max(RPL,CPL)

112 112 80386 Programming Model: PL zProtection (cont.): Gates ySpecial descriptors that allows access to higher PL tasks from lower PL tasks xBy accessing lower PL gates, which can access higher PL resources: EPL <= DPL(gate) zTypes yCall gates: provide access to high PL codes yTask gates: for task switching yInterrupt gates: to specify ISR’s yTrap gates: to specify trap (error) handling routines

113 113 486 Processor Processor Model: 386 + FPU + Cache Programming Model: =386

114 114 80486 Model … zFactors that make it faster than 386 zOperation of Direct mapped cache z486 cache diagrams yCache RAM, Tag RAM, LRU SRAM yData & control registers within FPU

115 115 80486 Review …

116 116 80486DX z1989: a polished 386, 6 new OS level instructions zvirtually identical to 386 in terms of compatibility zRISC design concepts yfewer clock cycles per operation, a single clock cycle for most frequently used instructions yMax 50MHz y5 stage execution pipeline xPortions of 5 instructions execute at once

117 117 80486DX zHighly Integrated: yOn board 8K memory cache yFPP (equivalent to external 80387 co-processor) zTwice as fast as 386 at any given clock rate y20Mhz 486 ~= 40Mhz 386

118 118 80486SX z80486SX yNOT a 16-bit version for transition purpose yno coprocessor yNo internal cache yFor low-end applications yMax. 33Mhz only

119 119 80486DX2/DX4: Overdrive Chips zProcessor speed increased too fast yRedesign of microcomputer for compatibility becomes harder ySolution: Separating internal speed with external speed, improve performance independently z80486DX2/DX4 – internal clock twice/three times (NOT four times) the external clock: runs faster internally

120 120 80486DX2/DX4: Overdrive Chips zSystem board design is independent of processor upgrade (less expensive components are allowed) zProcessor operate at maximum speed data rate internally yOnly slow access to external data operates at system board rate yInternal cache offset the speed gap z486DX2 66: 66 internal, 33 external z486DX4 100: 100 internal, 33 external (3x) zOverdrive sockets: for upgrading 486dx/sx to 486dx2/dx4 (with overdrive socket pin-outs)

121 121 486 Processor Features z386 features: yReal/Protected Modes yMemory Management yPL’s yregisters & bus sizes zNew features y6 OS instructions y8K/16K onboard cache (was external before 386) yFPU (was external)

122 122 486 Processor Features zA Better 386 y5 stage instruction pipeline xIF/ID/EX => PF/D1/D2/EX/WB xPF: pre-fetch instructions => Q (2*16-bytes) xD1: determine opcode xD2: determine memory address of operands xEX: execute indicated OP xWB: update internal register

123 123 486 Processor Features zReduced Instruction Cycle Times y5 stage instruction pipeline (e.g., Fig. 3.18) yinstruction cycle times: x8086: 4 CLK x80386: 2 CLK x80486: 1 CLK (  close to RISC) xabout 2X faster than 386

124 124 486 Processor Model: 386+FPU+Cache z386 units retained: BIU, CPU, MMU znew: FPU (80387) + Cache (8K/16K) zFPU: y387 onboard x0.8 u => #transistors increased (275K => 1+ millions) xsimplified system board design xspeedup FP operations

125 125 FIGURE 3-11 The processor model for the 80386 microprocessor consists of the bus interface unit (BIU), central processing unit (CPU), and the memory management unit (MMU). John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved.

126 126 John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved. execution fetching Address generation (paging) Address generation (segment) decode FIGURE 3-11 The processor model for the 80386 microprocessor consists of the bus interface unit (BIU), central processing unit (CPU), and the memory management unit (MMU).

127 127 FIGURE 3-24 The processor model for the 80486 microprocessor is the same as that for the 80386 except for the on-board cache and floating-point cache. John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458 All rights reserved.

128 128

129 129 486 Processor Model: Cache zCache (8K/16K (dx4)) yFunction: bridge processor memory bandwidth x8088: 4.77MHz x80486: 50MHz xPentium: 100MHz xPentium Pro: 133 MHz x  Main Memory (DRAM): relatively slow yFast Static RAMs (SRAM) as cache xProcessor  Fast Cache (lines)  Slow Main Memory (blocks) xA running block of main memory is copied to cache line when needed and not in cache Cache miss: An unused cache line is updated and trashed if all cache lines are being used while trying to copy

130 130 486 Processor Model: Cache zOrganization: ySize: 8K = 2K x 4 yMapping: 4-way set associative x4 direct mapped caches wired in parallel xeach block maps to a set of 4 cache lines yUnified: data & code in the same cache yWrite-through update policy: update cache and memory page on write operations

131 131 486 Processor Model: Cache zLocality: Why caches help? yspatial locality: e.g., array of data ytemporal: e.g., loops in codes zOperations on cache hit/miss yHit: memory copy is found in cache, use cached copy yMiss: memory copy not found in cache xLoad memory copy to one of allowed free/unused cache blocks xIf none is free, replace a less recently used (update replaced block) zSize of cache line: 128-bit (16-byte) cache lines y32-bit x N to catch locality (N=4) y128-bit = 16-byte = 4 x 32-bit double-words

132 132 486 Processor Model: Cache zMapping: yMemory => Cache: many-to-many xNeed to remember where the cached data came from To decide if a memory copy is in cache To update cache copy to right memory block (when replaced) yCache = Data RAM + Tag RAM xData RAM: save memory data xTag RAM: save memory address & access status information

133 133 MOV EAX, [addr-32] Memory Block  Cache Line

134 134 486 Processor Model: Cache zmain memory size = 2**(X+W+B) bytes ymain memory divided into blocks of size 2**(W+B) bytes ymain memory address: X+W+B bits yX (the “block number”) is treated differently depending on the cache organization yE.g., 16-byte as a block (  a cache line) xB=2 [32-bit-word=4-byte=2**2] xW=2 [16-byte=2**4=2**2x2**2=2**2 x 32-bit-word] xX=32-4=28 [2*28 blocks, each having 16 bytes]

135 135 MOV EAX, [addr-32]

136 136 486 Processor Model: Cache z3 methods of mapping yFully associative: map any memory block to any cache line yDirect map: map each memory block to specific cache line ySet associative (M-way): map each memory block to a set of M cache lines

137 137 8-byte/block & 7-bit address…

138 138 Cache: Fully Associative Cache  Fully associative: memory block to any cache line ( 有空位就停 : 停車容易, 找車難 ) yFlexible to save memory blocks into cache lines xSmall trashing rate y#tag_bits is largest xSince all X bits for block numbers are used as tags y#comparators=#cache_lines xparallel comparison with all cache lines xlargest number of comparators

139 139 8-byte/block & 7-bit address… Cache index need not be saved as tag #tag_bits is small #Comparators=1

140 140 Cache: Direct Mapped Cache  Direct map: memory block to specific cache line ( 指定車位 : 停車難, 找車容易 ) yEasily mapped to: block_num % number_of_cache_lines y#tag_bits: smallest xsince index to cache line need not be saved as tags y#comparators=1 (the cache-index-selected one) yTrashing: repeatedly access memory that maps to the same cache; repeatedly swapped in/out; increase access time

141 141 Tag RAM (2 bits)

142 142 Cache: M-way Set Associative  Set associative: memory block to a set of M cache lines (A compromise between fully associative and direct mapped organizations) ( 指定 M 車位 : 停找均易 ) y#sets = #cache_lines/M (-way) x e.g., 2-way into 8 lines => 4 sets (addressed by 2-bit set index) y#tag_bits: medium (X – set index bits) xSet index need not be saved as tags A set index => M Cache indexes (e.g., M= lines_per_set = 2) y#comparators = M xParallel comparison with set-index-selected M lines yTrashing_rate: medium

143 143 486 Processor Model: 4-way Set Associative Cache zReplacement policy (LRU) y4 valid bits: all 4 lines in use ? xNO => use any unused line (& save tag bits) xYES => find one to replace yLRU bits: which is least recently used xB0, B1, B2 => L0 ~ L3 xB0=0 => L0/L1, if B1=0 => L0, else L1 xB0=1 => L2/L3, if B2=0 => L2, else L3 yBurst mode to fill cache line: 4 bytes per cycle (Chapter 7)

144 144

145 145 Pentium Processor Processor Model: 486 + u, v Programming Model: =386

146 146 Pentium Review … zBlock diagram: BIU, CPU, FPU, MMU, cache zU, v pipelines

147 147 Pentium: Superscaler Processor zavailable in 1992 z32-bit architecture zSuperscaler architecture yScaling: scaling down etchable feature size to increase complexity of IC (e.g., DRAM) x10 microns/4004 to 0.13 microns (2001) ySuperscaler: go beyond simply scaling down yTwo instruction pipelines: each with own ALU, address generation circuitry, data cache interface yExecute two different instructions simultaneously

148 148 Pentium: Superscaler Processor zOnboard cache ySeparate 8K data and code caches to avoid access conflicts zFPP zInstruction pipeline: 8 stage zOptimized floating point functions y5x-10x FLOP’s of 486 y2x performance of 486 at any clock rate

149 149 Pentium: Superscaler Processor zCompatibility with 386/486: yInternal 32-bit registers and address bus yData bus expanded to 64-bits for higher data transfer rate xCompare 8088 to 386sx transition

150 150 Pentium: Superscaler Processor znon-clone competition from AMD, Cyrix zdevelopment of brand identity by Intel

151 151 Pentium Model … zBlock diagram: (3.23) yBus interface: 64-bit data bus (was 32) xBurst mode of transfer for fast cache fill xBTB: pre-fetch for jumped instructions yCPU, MMU yFPU: 8-stage pipeline yCache units: x32 bytes (was 16) x8K data + 8K code x2-way (not 4-way) set associative xHigher hit rate (32-byte lines) xHigher trashing (2-way, not 4-way)

152 152 Pentium Model … zOperations of u & v pipelines yu: all instructions yv: simple integer instructions yPre-fetcher sort the incoming instructions y2 simple instructions per clock cycle (in parallel) zVersions of processors

153 153 Pentium Pro Model … zProcessing cycles against sequential fetch+execution processors zBlock diagram: yBIU, CPU, MMU, FPU, cache, APIC zComparison in performance

154 154 Pentium Pro Review …

155 155 Pentium Pro: Two Chips in One zBecame available in 1995 zSuperscaler of degree 3 yCan execute 3 instructions simultaneously zOptimized for 32-bit operating systems (e.g., Windows NT, OS2/Warp) zTwo separate silicon die on the same package yProcessor: 0.35 u, 5.5 million transistors y256KB(/512K) Level 2 cache included on chip, 15.5 million transistors in smaller area

156 156 Pentium Pro: Two Chips in One zOn Board Level 2 cache ySimplifies system board design yRequires less space yGains faster communication with processor zInternal (level 1) cache: 8K zPentium Pro 133 ~= 2x Pentium 66 ~= 4x 486DX2 66

157 157 Pentium Pro:Dynamic Execution zDynamic execution: reduce idle processor time by predicting instruction behaviors yMultiple Branch Prediction: look as far as 30 instructions ahead to anticipate program branches yData Flow Analysis: looks at upcoming instructions and determine if they are available for processing, depending on other instructions. Determine optimal execution sequences. ySpeculative Execution: execute instructions in different order as entered. Speculative results are stored until final states can be determined.


Download ppt "1 Chapter 3 80x86 Processor Architecture 8085 (review) – typical, single segment 8086/88 – pipeline + segments 80286/386 – real(8086)/protected mode 80386."

Similar presentations


Ads by Google