Published by Shawn Carter. Modified over 7 years ago.
1
EC6013-ADVANCED MICROPROCESSOR AND MICROCONTROLLER
2
Objectives
Study the fundamentals of microprocessor architecture.
Learn the advanced features of microprocessors and microcontrollers.
Study the architecture of various microcontrollers.
3
Syllabus
UNIT I - HIGH PERFORMANCE CISC ARCHITECTURE – PENTIUM (9)
CPU Architecture – Bus Operations – Pipelining – Branch prediction – Floating point unit – Operating Modes – Paging – Multitasking – Exception and Interrupts – Instruction set – Addressing modes – Programming the Pentium processor.
UNIT II - HIGH PERFORMANCE RISC ARCHITECTURE – ARM (9)
Acorn RISC Machine – Architectural Inheritance – Core & Architectures – Registers – Pipeline – Interrupts – ARM organization – ARM processor family – Co-processors – ARM instruction set – Thumb Instruction set – Instruction cycle timings – The ARM Programmer's model – ARM Development tools – ARM Assembly Language Programming – C programming – Optimizing ARM Assembly Code – Optimized Primitives.
4
Syllabus
UNIT III - ARM APPLICATION DEVELOPMENT (9)
Introduction to DSP on ARM – FIR filter – IIR filter – Discrete Fourier transform – Exception handling – Interrupts – Interrupt handling schemes – Firmware and bootloader – Embedded Operating systems – Integrated Development Environment – STDIO Libraries – Peripheral Interface – Application of ARM Processor – Caches – Memory protection Units – Memory Management units – Future ARM Technologies.
UNIT IV - MOTOROLA 68HC11 MICROCONTROLLERS (9)
Instruction set – Addressing modes – Operating modes – Interrupt system – RTC – Serial Communication Interface – A/D Converter – PWM and UART.
UNIT V - PIC MICROCONTROLLER (9)
CPU Architecture – Instruction set – Interrupts – Timers – I2C Interfacing – UART – A/D Converter – PWM and introduction to C-Compilers.
TOTAL: 45 PERIODS
5
Text Books
[1] Andrew N. Sloss, Dominic Symes and Chris Wright, "ARM System Developer's Guide: Designing and Optimizing System Software", First edition, Morgan Kaufmann Publishers, 2004.
6
References
1. Steve Furber, "ARM System-on-Chip Architecture", Addison Wesley.
2. Daniel Tabak, "Advanced Microprocessors", McGraw Hill Inc.
3. James L. Antonakos, "The Pentium Microprocessor", Pearson Education.
4. Gene H. Miller, "Micro Computer Engineering", Pearson Education.
5. John B. Peatman, "Design with PIC Microcontroller", Prentice Hall.
6. James L. Antonakos, "An Introduction to the Intel Family of Microprocessors", Pearson Education, 1999.
7
UNIT I - HIGH PERFORMANCE CISC ARCHITECTURE – PENTIUM (9)
Objective
Study the architecture of the Pentium processor.
Program the Pentium processor.
8
Overview
Introduction to Pentium
Pentium Architecture
Addressing Modes
Instruction Set
Assembly Language Programming
Bus Operations
Pipelining
Branch Prediction
Exception and Interrupts
Floating point unit
Operating Modes
Paging and Multitasking
9
INTRODUCTION
10
MICROPROCESSOR
A microprocessor is a computer processor that incorporates the functions of a computer's central processing unit (CPU) on a single integrated circuit (IC), or at most a few integrated circuits. A microprocessor might include only an arithmetic logic unit (ALU) and a control logic section. The ALU performs arithmetic operations such as addition and subtraction, and logical operations such as AND and OR.
11
MICROPROCESSOR
12
MICROCONTROLLER
A microcontroller (or MCU, short for microcontroller unit) is a small computer (SoC) on a single integrated circuit containing a processor core, memory, and programmable input/output peripherals. Microcontrollers are used in automatically controlled products and devices.
13
BLOCK DIAGRAM
14
DIFFERENCE BETWEEN μP & μC
A microprocessor contains only a CPU. In contrast, a microcontroller contains several other components apart from the CPU, including RAM, ROM and peripherals such as ports, clock, timers, UART (Universal Asynchronous Receiver Transmitter), ADC (analog-to-digital converter), DAC (digital-to-analog converter), LCD drivers, etc. A microprocessor can be considered just the processor, while a microcontroller can be seen as a small computer embedded on a single IC (e.g. the 8051).
15
So to summarize, we can state the difference between both as: “Microprocessor is present inside a Microcontroller”. This is valid to some extent because: Microcontroller = Microprocessor + Few Extra components
16
WHAT DOES "ADVANCED" MICROPROCESSOR & MICROCONTROLLER MEAN?
17
An advanced microprocessor or microcontroller adds features such as:
High memory capacity
More I/O pins
High performance
More external interfacing options, etc.
Examples of microprocessors: Intel Pentium, Pentium-I, Pentium-II, …, i3, i5, i7, 8085 and 8086
Examples of microcontrollers: 8051, PIC, ARM, Arduino, …
18
MICROPROCESSOR DEVELOPMENT CYCLE
19
INTEL MICROPROCESSOR DEVELOPMENT CYCLE
20
MICROCONTROLLER DEVELOPMENT CYCLE
21
Pentium-branded processors
Pentium is a brand used for a series of x86-compatible microprocessors produced by Intel since 1993. In its current form, Pentium processors are considered entry-level products that Intel rates as "two stars", meaning that they are above the low-end Atom and Celeron series but below the faster Core i3, i5 and i7.
P5 microarchitecture based: Pentium
P6 microarchitecture based: Pentium Pro, Pentium II, Pentium III
NetBurst microarchitecture based: Pentium 4, Pentium D
Pentium M microarchitecture based: Pentium M, Pentium Dual-Core
Core microarchitecture based: Pentium (2009)
22
CPU ARCHITECTURE
24
The two integer pipelines, the U pipeline and the V pipeline, are responsible for executing the 80x86 instructions. The floating point unit is included on the chip to execute mathematical functions. The Pentium communicates with the outside world via a 32-bit address bus and a 64-bit data bus. An 8 KB instruction cache provides quick access to frequently used instructions. When an instruction is not found in the cache, it is read over the external data bus and copied into the instruction cache for future reference. Branch target buffer and prefetch buffers: these work together with the instruction cache to fetch instructions as fast as possible.
25
Prefetch buffers maintain a copy of the next 32 bytes of prefetched instruction code.
Branch prediction: a technique to maintain a steady flow of instructions into the pipeline. To support branch prediction, the branch target buffer maintains a copy of instructions from a different part of the program, located at an address called the branch address. Example: for CALL XYZ, the branch target buffer stores a copy of the memory location that is the target of the call.
A separate 8 KB data cache stores a copy of the most frequently accessed memory data.
26
The Pentium: A CISC Architecture
27
What is CISC? CISC stands for Complex Instruction Set Computer
CISC takes its name from the very large number of instructions (typically hundreds) and addressing modes.
28
History: CISC The first PC microprocessors developed were CISC chips, because all the instructions the processor could execute were built into the chip. Memory was expensive in the early days of PCs, and CISC chips saved memory because their programming could be fed directly into the processor.
29
History: CISC CISC chips were improved mainly by adding more instructions to the processor design. This also meant that programming changed with new CISC designs. CISC designs grew complex and somewhat bulky
30
Examples of CISC Processors
Examples of CISC processors are:
VAX
PDP-11
Motorola family
Intel x86/Pentium CPUs
31
Advantages of CISC
CISC instructions have varying lengths, which reduces wasted space in memory.
CISC has developed a process to manage power, which adjusts clock speed and voltage.
CISC uses fewer instructions than RISC to perform similar tasks.
32
Disadvantages of CISC
CISC chips are relatively slow (compared to RISC chips) per instruction.
CISC chips require many more transistors than comparable RISC designs.
CISC architectures are harder to pipeline.
CISC chips are expensive to produce.
33
RISC vs CISC
RISC puts a greater burden on the software: the software becomes more complex, and developers need to write more lines of code to perform similar tasks. In return, the RISC architecture takes the burden away from the hardware, resulting in an increase in performance (mainly speed).
34
OPERATING MODES
35
Real mode and Protected mode
Real mode: Advanced microprocessors, including the Pentium, simply operate like an 8086 with its associated 1 MB of memory. Real mode is selected automatically at power-up, so the Pentium boots into the DOS operating system in real mode.
Protected mode: The full 4 GB of memory is available to the processor, as are special privileged instructions and architectural features, including multitasking, virtual memory addressing, memory management, and control over the internal data and instruction caches. Writing programs for protected mode requires special knowledge.
36
Software model of Pentium
37
Software model of Pentium
38
Pentium Microprocessor: Registers
Registers are in the CPU and are referred to by specific names.
Data registers
  Hold data for an operation to be performed.
  There are 4 data registers (EAX, EBX, ECX, EDX), all 32 bits wide.
  The lower 16 bits of each are called AX, BX, CX, DX; each of these may be split into two 8-bit halves.
Address registers
  Hold the address of an instruction or data element.
  Segment registers (CS, DS, ES, SS, FS, GS)
  Pointer registers (ESP, EBP, EIP)
  Index registers (ESI, EDI)
Status register
  Keeps the current status of the processor.
  The status register is called the FLAG register.
39
Data Registers: EAX, EBX, ECX, EDX
Instructions execute faster if the data is in a register. (The E prefix stands for Extended.)
Data registers are general-purpose registers, but they also perform special functions.
AX, BX, CX, DX are the 16-bit data registers.
The low and high bytes of the data registers can be accessed separately:
AH, BH, CH, DH are the high bytes.
AL, BL, CL, DL are the low bytes.
40
8086 Architecture (continued…)
AX: Accumulator register
  Used in arithmetic, logic and data transfer instructions
  Used in multiplication and division operations
  Used in I/O operations
BX: Base register
  Also serves as an address register
  Used in array operations
  Used in table lookup operations (XLAT)
CX: Count register
  Used as a loop counter
  Used in shift and rotate operations
DX: Data register
  Used in multiplication and division
  Also used in I/O operations
41
Pointer and Index Registers
Pointer and index registers contain the offset addresses of memory locations. They can also be used in arithmetic and other operations.
SP: Stack pointer
  Used with SS to access the stack segment
BP: Base pointer
  Primarily used to access data on the stack
  Can be used to access data in other segments
42
Pointer and Index Registers
SI: The Source Index register is required for some string operations. SI is associated with DS in string operations.
DI: The Destination Index register is also required for some string operations. DI is associated with ES in string operations.
The SI and DI registers may also be used to access data stored in arrays.
43
Segment Registers - CS, DS, SS and ES
CS: Code Segment: used during instruction fetches.
DS: Data Segment: used when reading or writing data.
SS: Stack Segment: used during stack operations such as subroutine calls and returns.
ES: Extra Segment: used for anything the programmer wishes.
FS and GS: used for anything the programmer wishes.
44
Segment Registers - CS, DS, SS and ES
Segment registers are address registers: they store the memory addresses of instructions and data.
Memory organization
A 20-bit address bus addresses 1 MB of memory.
Each byte in memory has a 20-bit address.
Addresses are expressed as 5 hex digits, from 00000 to FFFFF.
Problem: 20-bit addresses are too big to fit in 16-bit registers.
Solution: memory segments
A segment number is a 16-bit number; segment numbers range from 0000 to FFFF.
A segment is a block of 64K (65,536, i.e. 2^16) consecutive memory bytes.
Within a segment, a particular memory location is specified with an offset.
An offset also ranges from 0000 to FFFF.
45
Segmented Memory Addressing
Absolute address = 16-bit segment value shifted left by four bits, added to a 16-bit offset.
[Figure: 1 MB memory space with segment starting addresses 00000, 10000, 20000, ..., F0000. Segment 5000 runs from 5000:0000 to 5000:FFFF; 5000:0250 is marked as an example SegAddr:Offset location.]
46
Physical Memory Address Generation
The BIU has a dedicated adder for determining physical memory addresses.
[Figure: the adder combines the 16-bit segment register with the 16-bit offset value (effective address) to produce the 20-bit physical address.]
47
Physical Memory Address Generation
A logical address is specified as Segment:Offset.
The physical address is obtained by shifting the segment address 4 bits to the left and adding the offset address.
Thus the physical address of the logical address A4FB:4872 is A9822:
  Segment shifted left 4 bits:  A4FB0 = 1010 0100 1111 1011 0000
  Offset:                     +  4872 =      0100 1000 0111 0010
  Physical address:             A9822 = 1010 1001 1000 0010 0010
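The shift-and-add rule above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `physical_address` is made up for the example, and the 20-bit mask assumes 8086-style wraparound at 1 MB.

```python
def physical_address(segment: int, offset: int) -> int:
    """20-bit real-mode physical address from a 16-bit segment and offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shift the segment left by 4 bits (multiply by 10H), add the offset,
    # and keep 20 bits (assumption: 8086-style wraparound at 1 MB).
    return ((segment << 4) + offset) & 0xFFFFF


# The worked example from the slide: A4FB:4872
print(f"{physical_address(0xA4FB, 0x4872):05X}")  # A9822
```

The same function reproduces the register indirect example used later in the deck (DS = 5000H, BX = 10FFH gives 510FFH).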
48
Advantages of using Segment Registers
Even though the addresses associated with instructions are only 16 bits, segment registers allow the memory capacity to be 1 MB.
They permit a program and/or its data to be placed in different areas of memory each time the program is executed.
49
Flags
[Figure: flag register bit layout, labeling the sign, zero, auxiliary, parity, carry and overflow flags (6 status flags), the trap, interrupt enable and direction flags (3 control flags), and the "priority level of current task" and "current task is nested" fields.]
50
Flags
The Pentium has a 32-bit flag register. Some flag bits are used only in protected mode.
Status (conditional) flags: set according to the results of arithmetic or logic operations. They need not be altered by the user.
Control flags: used to control some operations of the MPU. The user sets these flags to achieve specific purposes.
51
Status or Conditional or Condition Code Flags
CF (carry): contains the carry out of the leftmost bit after arithmetic; also holds the last bit shifted out by a shift or rotate operation.
PF (parity): indicates whether the number of 1 bits in the result is even (1 = even).
AF (auxiliary carry): contains the carry out of bit 3 into bit 4, used for specialized (BCD) arithmetic.
ZF (zero): indicates whether the result of an arithmetic or comparison operation is zero (1 = yes).
SF (sign): contains the sign of the result of an arithmetic operation (1 = negative).
OF (overflow): indicates overflow into the leftmost (sign) bit during signed arithmetic.
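As a rough illustration of the six status flags, the sketch below adds two 16-bit values and derives each flag from the definitions above. The function name `add16_flags` and the dictionary layout are assumptions made for this example; the one detail not stated on the slide is that x86 computes PF over the low byte of the result only.

```python
def add16_flags(a: int, b: int) -> dict:
    """Add two 16-bit values and return the result with the six status flags."""
    a &= 0xFFFF
    b &= 0xFFFF
    total = a + b
    result = total & 0xFFFF
    low_byte = result & 0xFF
    return {
        "result": result,
        "CF": int(total > 0xFFFF),                     # carry out of bit 15
        "PF": int(bin(low_byte).count("1") % 2 == 0),  # even parity, low byte
        "AF": int(((a & 0xF) + (b & 0xF)) > 0xF),      # carry from bit 3 to 4
        "ZF": int(result == 0),
        "SF": result >> 15,                            # copy of the sign bit
        # Overflow: both operands share a sign that the result does not.
        "OF": int(((a ^ result) & (b ^ result) & 0x8000) != 0),
    }


flags = add16_flags(0x7FFF, 0x0001)
print(f"{flags['result']:04X}", flags)
```

Adding 7FFFH and 0001H is a convenient test case: the result 8000H is negative as a signed number, so SF and OF are set while CF stays clear.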
52
Control Flags
DF (direction): selects the direction (left or right) for moving or comparing string data.
IF (interrupt): indicates whether external interrupts are processed or ignored.
TF (trap): permits operation of the processor in single-step mode.
53
32 bit Flag register
54
Flags: Example
Assume that the previous instruction performed an addition. The values of SF, ZF, PF, CF, AF and OF are then determined from the result.
[Figure: worked additions in binary with the resulting SF, ZF, PF, CF, AF and OF values.]
55
Addressing Modes
56
Addressing Modes
The various methods used to access instruction operands are called addressing modes.
General instruction format: OPCODE | Operand
Operands may be contained in registers, memory or I/O ports.
The three basic modes of addressing are immediate, register and memory.
57
Addressing Modes (continued...)
Example: If CS = 24F6h and IP = 634Ah, show:
1- The logical address
2- The offset address
3- The physical address
4- The lower range of the segment
5- The upper range of the segment
Solution:
1- The logical address is the CS:IP content, which is 24F6:634A.
2- The offset address is the content of the IP register, which is 634A.
3- The physical address:
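The remaining parts of this example follow from the shift-and-add rule for real-mode addresses. The sketch below works them out; the helper name `segment_bounds` is hypothetical and the values are computed here, not taken from the slide.

```python
def segment_bounds(segment: int) -> tuple:
    """Lowest and highest physical address reachable through one segment."""
    base = segment << 4          # segment shifted left 4 bits, offset 0000
    return base, base + 0xFFFF   # offset FFFF is the last byte of the segment


cs, ip = 0x24F6, 0x634A
low, high = segment_bounds(cs)
physical = low + ip

print(f"logical  {cs:04X}:{ip:04X}")           # 24F6:634A
print(f"physical {physical:05X}")              # 2B2AA
print(f"range    {low:05X} to {high:05X}")     # 24F60 to 34F5F
```

So the physical address is 24F60H + 634AH = 2B2AAH, and the code segment spans 24F60H to 34F5FH.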
58
Addressing Modes (continued...)
Addressing modes are classified according to the flow of instruction execution:
Sequential flow instructions
  Arithmetic
  Logical
  Data transfer
  Processor control
Control transfer instructions
  INT
  CALL
  RET
  JUMP
59
Addressing Modes (continued...)
Sequential flow instructions
  Implied addressing mode
  Immediate addressing mode
  Direct addressing mode
  Register addressing mode
  Register Indirect addressing mode
  Indexed addressing mode
  Register Relative addressing mode
  Based Indexed addressing mode
  Relative Based Indexed addressing mode
Control transfer instructions
  Intersegment Direct addressing mode
  Intersegment Indirect addressing mode
  Intrasegment Direct addressing mode
  Intrasegment Indirect addressing mode
60
Addressing Modes (continued...)
Sequential Flow Instructions
Implied Addressing – the data value or data address is implicitly associated with the instruction.
Examples: AAA, AAS, AAM, AAD, DAA, DAS, XLAT
61
Sequential Flow Instructions
Immediate Addressing – the data (operand) is part of the instruction.
MOV AX, 25BFH ; AX ← 25BFH (16-bit data)
MOV AL, 8EH ; AL ← 8EH (8-bit data)
Direct Addressing – the data is pointed to by a 16-bit offset value specified in the instruction.
MOV AX, [5000H] ; destination AX, source memory at offset 5000H
Effective Addr = 5000H
PhyAddr = 10H * DS + 5000H
62
Addressing Modes (continued...)
Register Addressing – the data is in the register specified in the instruction.
MOV BX, AX ; no physical address is generated, since the data is in a register
16-bit operand registers: AX, BX, CX, DX, SI, DI, SP, BP
8-bit operand registers: AL, AH, BL, BH, CL, CH, DL, DH
63
Addressing Modes (continued...)
Register Indirect Addressing – the data is pointed to by the offset value in the register specified in the instruction.
MOV AX, [BX]
Default segment: DS or ES
Offset: BX, SI or DI
PhyAddr = 10H * (DS or ES) + (BX or SI or DI)
If DS = 5000H and BX = 10FFH, then EffectiveAddr = 10FFH and PhyAddr = 10H * 5000H + 10FFH = 510FFH.
64
Addressing Modes (continued...)
Indexed Addressing
The data is pointed to by the offset in the index register specified in the instruction. DS is the default segment register for SI and DI.
MOV AX, [SI] ; the data is at the logical address DS:SI
Effective Addr = [SI]
PhyAddr = 10H * DS + (SI or DI)
65
Addressing Modes (continued...)
Register Relative Addressing
The data is pointed to by the sum of an 8-bit or 16-bit displacement specified in the instruction and the offset held in one of the registers BX, BP, SI or DI.
Default segment registers: DS, ES
MOV AX, 50H [BX]
EffectiveAddr = 50H + [BX]
PhyAddr = 10H * (DS or ES) + (BX or BP or SI or DI) + (8-bit or 16-bit displacement)
66
Addressing Modes (continued...)
Based Indexed Addressing
The data is pointed to by the content of the base register specified in the instruction plus the content of the index register specified in the instruction.
Default segment registers: DS, ES
MOV AX, [BX] [SI]
EffectiveAddr = (BX or BP) + (SI or DI)
PhyAddr = 10H * (DS or ES) + (BX or BP) + (SI or DI)
67
Addressing Modes (continued...)
Relative Based Indexed Addressing
The data is pointed to by the sum of an 8-bit or 16-bit displacement specified in the instruction, the offset in a base register (BX or BP) and the offset in an index register (SI or DI).
Default segment registers: DS, ES
EffectiveAddr = (8-bit or 16-bit displacement) + (BX or BP) + (SI or DI)
PhyAddr = 10H * (DS or ES) + (8-bit or 16-bit displacement) + (BX or BP) + (SI or DI)
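The memory addressing modes above differ only in which terms contribute to the effective address. The following sketch makes that explicit; the function names, the register dictionary and the SI value are assumptions chosen for illustration, while DS = 5000H, BX = 10FFH and the 50H displacement come from the slides' own examples.

```python
def effective_address(regs: dict, base=None, index=None, disp: int = 0) -> int:
    """16-bit offset: optional base + optional index + displacement (mod 64 KB)."""
    ea = disp
    if base is not None:
        ea += regs[base]
    if index is not None:
        ea += regs[index]
    return ea & 0xFFFF


def physical(regs: dict, segment: str, ea: int) -> int:
    """PhyAddr = 10H * segment register + effective address (20 bits)."""
    return ((regs[segment] << 4) + ea) & 0xFFFFF


# SI = 0020H is a hypothetical value picked for this example.
regs = {"DS": 0x5000, "BX": 0x10FF, "SI": 0x0020}

modes = {
    "register indirect  [BX]":         effective_address(regs, base="BX"),
    "indexed            [SI]":         effective_address(regs, index="SI"),
    "register relative  50H[BX]":      effective_address(regs, base="BX", disp=0x50),
    "based indexed      [BX][SI]":     effective_address(regs, base="BX", index="SI"),
    "rel. based indexed 50H[BX][SI]":  effective_address(regs, base="BX", index="SI", disp=0x50),
}
for name, ea in modes.items():
    print(f"{name}: EA={ea:04X} PhyAddr={physical(regs, 'DS', ea):05X}")
```

The segment register is passed explicitly rather than inferred, so the sketch takes no position on which default segment applies to each base register.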
69
BUS OPERATION
70
The Pentium processor performs a number of different operations over its address and data buses:
Data transfer, interrupt acknowledgement, inquire cycles for examining the internal code and data caches, and I/O operations.
Decoding a bus cycle: The Pentium bus logic indicates the type of bus cycle currently in progress through its cycle definition signals. The signals are M/IO, D/C, W/R, CACHE and KEN.
72
Special bus cycles require additional decoding and use the byte enable outputs for selection.
73
Bus cycle states: There are six possible states the Pentium bus may be in, depending on what type of cycle is being processed. The states are Ti, T1, T2, T12, T2P and TD.
Ti: The bus idle state. No bus cycles are being run. The processor may or may not be driving the address and status pins.
T1: The first clock of a bus cycle. Valid address and status are driven out.
T2: The second and subsequent clocks of the first outstanding bus cycle. In state T2, data is driven out (if the cycle is a write) or data is expected (if the cycle is a read).
74
T12: There are two outstanding bus cycles, and the processor is starting the second bus cycle at the same time that data is being transferred for the first. In T12, the processor drives the address and status.
T2P: There are two outstanding bus cycles, and both are in their second and subsequent clocks. In T2P, data is being transferred.
TD: There is one outstanding bus cycle whose address and status have already been driven sometime in the past (in state T12). This is the dead clock.
75
Processor bus control state machine:
0: No bus cycle requested.
1: New bus cycle started. ADS is taken low.
2: Second clock cycle of the current bus cycle.
3: Stay in T2 until BRDY is active or a new bus cycle is requested.
4: Go back to T1 if a new request is pending.
5: Bus cycle complete; go back to the idle state.
6: Begin the second bus cycle.
7: Current cycle is finished and no dead clock is needed.
8: A dead clock is needed after the current cycle is finished.
9: Go to T2P to transfer data.
10: Wait in T2P until data is transferred.
11: Current cycle is finished and no dead clock is needed.
12: A dead clock is needed after the current cycle is finished.
13: Begin a pipelined bus cycle if NA is active.
14: No new bus cycle is pending.
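One way to read the numbered transitions is as edges between the six bus states. The sketch below encodes them as a table and checks that a sequence of transitions is consistent. Note the assumption: the slide lists only the transition descriptions, so the source and destination state of each edge is inferred here from those descriptions and the state definitions, and the names `TRANSITIONS` and `walk` are made up for the example.

```python
# (from_state, to_state) for each numbered transition.
# Assumption: endpoints are inferred from the transition descriptions.
TRANSITIONS = {
    0: ("Ti", "Ti"),    1: ("Ti", "T1"),    2: ("T1", "T2"),
    3: ("T2", "T2"),    4: ("T2", "T1"),    5: ("T2", "Ti"),
    6: ("T2", "T12"),   7: ("T12", "T2"),   8: ("T12", "TD"),
    9: ("T12", "T2P"),  10: ("T2P", "T2P"), 11: ("T2P", "T2"),
    12: ("T2P", "TD"),  13: ("TD", "T12"),  14: ("TD", "T2"),
}


def walk(transition_numbers, start="Ti"):
    """Follow a list of transition numbers and return the visited states."""
    states = [start]
    for n in transition_numbers:
        src, dst = TRANSITIONS[n]
        if src != states[-1]:
            raise ValueError(f"transition {n} leaves {src}, not {states[-1]}")
        states.append(dst)
    return states


# A single transfer with one wait state: idle, T1, T2, T2 again, idle.
print(walk([1, 2, 3, 5]))
# Two pipelined cycles with no dead clock.
print(walk([1, 2, 6, 7, 5]))
```

Encoding the diagram as data keeps the checker trivial: an illegal sequence such as taking transition 6 from the idle state raises an error instead of silently producing a wrong trace.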
76
SINGLE TRANSFER CYCLE:
This cycle transfers up to 8 bytes of non-cacheable data between the processor and memory. The cycle begins during clock cycle T1, when ADS goes low. CACHE is taken high to indicate to external circuitry that the data is not going to, or coming from, the internal cache. If BRDY goes low during the T2 clock cycle, the data is transferred and the operation completes during clock cycle Ti. If BRDY is not low during T2, additional T2 clock cycles are generated; these extra clock cycles are called wait cycles.
78
BURST CYCLE: Supports burst reads and writes of 32 bytes. The cache uses burst cycles for line loads and write-backs. During a burst operation, a new eight-byte chunk can be transferred every clock cycle.
LOCKED OPERATION: Many operating system processes depend on what is called atomic access to data stored in memory. An atomic (uninterruptible) operation cannot be broken down into smaller sub-operations. The data accessed during the atomic operation often comes in the form of a semaphore. Example: the XCHG instruction.
79
BOFF: The BOFF input provides a way for other processors in a multiprocessor system to take over the Pentium's buses instantly. Taking BOFF low puts the bus into a high-impedance state, interrupting any bus cycle in progress, and allows the other processor to use the bus. BOFF high allows the Pentium to use the bus.
BUS HOLD: The HOLD input provides a second way for a different bus master to take control of the Pentium's buses. Unlike BOFF, HOLD lets the current bus cycle complete first.
80
INTERRUPT ACKNOWLEDGE:
The processor runs two interrupt acknowledge cycles in response to an INTR request. Both cycles are locked. To maintain hardware compatibility with earlier 80x86 machines, the data is ignored by the processor during the first interrupt acknowledge cycle and accepted during the second.
SHUTDOWN: If the Pentium detects an internal parity error, a shutdown cycle is run. Execution is suspended in shutdown until the processor receives an NMI, INIT or RESET request.
HALT: Similar to shutdown, except that the INTR signal may also be used to resume execution.
81
PIPELINED CYCLE: The processor starts the second bus cycle before the current one is completed. It does so through pipelined read and write logic, in response to a request on the NA input.
INQUIRE CYCLE: Maintains cache coherency in a multiprocessor system. The Pentium processor is able to watch the system bus in a multiprocessor system; this is called BUS SNOOPING. If the Pentium detects a memory read/write operation being performed by another CPU, it runs an internal inquire cycle to determine whether the address on the bus is stored in one of its internal caches. If so, the cache may need to be updated.
82
PIPELINING
83
Integer Pipeline
84
Integer Pipeline The pipelines are called “u” and “v” pipes.
The u-pipe can execute any instruction, while the v-pipe can execute “simple” instructions as defined in the “Instruction Pairing Rules”. When instructions are paired, the instruction issued to the v-pipe is always the next sequential instruction after the one issued to u-pipe.
85
Integer Instruction Pairing Rules
86
Integer Instruction Pairing Rules
To issue two instructions simultaneously, they must satisfy the following conditions:
Both instructions in the pair must be “simple”.
There must be no read-after-write (RAW) or write-after-write (WAW) register dependencies between them.
RAW: i1. R2 ← R1 + R3    i2. R4 ← R2 + R3
WAW: i1. R2 ← R4 + R7    i2. R2 ← R1 + R3
87
The following integer instructions are considered simple and may be paired:
1. mov reg, reg/mem/imm
2. mov mem, reg/imm
3. alu reg, reg/mem/imm
4. alu mem, reg/imm
5. inc reg/mem
6. dec reg/mem
7. push reg/mem
8. pop reg
9. lea reg, mem
10. jmp/call/jcc near
11. nop
12. test reg, reg/mem
13. test acc, imm
88
Instruction Issue Algorithm
Decode the two consecutive instructions I1 and I2.
If the following are all true:
    I1 and I2 are simple instructions
    I1 is not a jump instruction
    Destination of I1 is not a source of I2
    Destination of I1 is not a destination of I2
Then issue I1 to the u pipeline and I2 to the v pipeline.
Else issue I1 to the u pipeline.
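The issue algorithm above maps directly onto a small predicate. The sketch below is a simplified model, not the processor's actual decode logic: the `Instr` class, its fields and the example instructions are assumptions made for illustration, and register dependencies are modeled as plain sets of register names.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    text: str
    simple: bool                      # member of the "simple" instruction list
    is_jump: bool = False
    dests: frozenset = frozenset()    # registers written
    srcs: frozenset = frozenset()     # registers read


def issue(i1: Instr, i2: Instr) -> dict:
    """Return which instruction goes to each pipe; v is None if not paired."""
    can_pair = (
        i1.simple and i2.simple
        and not i1.is_jump
        and not (i1.dests & i2.srcs)    # no read-after-write dependency
        and not (i1.dests & i2.dests)   # no write-after-write dependency
    )
    return {"u": i1, "v": i2 if can_pair else None}


mov = Instr("mov eax, ebx", True, dests=frozenset({"eax"}), srcs=frozenset({"ebx"}))
add = Instr("add ecx, edx", True, dests=frozenset({"ecx"}), srcs=frozenset({"ecx", "edx"}))
raw = Instr("add ecx, eax", True, dests=frozenset({"ecx"}), srcs=frozenset({"ecx", "eax"}))

print(issue(mov, add)["v"] is add)   # independent instructions pair
print(issue(mov, raw)["v"] is None)  # RAW on eax prevents pairing
```

When pairing fails, only I1 is issued to the u pipeline and I2 becomes the first candidate of the next decode cycle.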
89
PIPELINE STAGES:
1. Prefetch. The next instruction to be executed is copied from cache memory to the CPU.
2. Instruction Decode, Part 1.
3. Instruction Decode, Part 2.
4. Execution.
5. Write Back. Registers and memory locations are updated.
90
Integer Pipeline
The integer pipeline stages are as follows:
1. Prefetch (PF): Instructions are prefetched from the on-chip instruction cache or memory.
2. Decode1 (D1): Two parallel decoders attempt to decode and issue the next two sequential instructions. This stage determines whether the current pair of instructions can execute together.
91
Integer Pipeline
3. Decode2 (D2): Decodes the control word. Addresses of memory-resident operands are calculated.
4. Execute (EX): The instruction is executed in the ALU. The data cache is accessed at this stage. Instructions that need both an ALU operation and a data cache access require more than one clock.
5. Writeback (WB): The CPU stores the result and updates the flags.
92
[Figure: pipeline timing diagram over clock cycles C1 to C9.]
93
Pipeline Stalls: When paired instructions reach the EX stage, one or the other may stall and require additional cycles to execute. A pipeline stall lowers performance, since no work is done during the stall. Instructions stall for various reasons, most notably when their operands are not available in the data cache. If the instruction in the U pipeline stalls, the V pipeline stalls as well. If the V pipeline stalls, the instruction in the U pipeline may continue executing. Both instructions must proceed to the WB stage before another pair may enter the EX stage.
94
Branch Prediction Logic
95
Flushing of pipeline problem
The performance gain from pipelining can be reduced by the presence of program transfer instructions (such as JMP, CALL, RET and conditional jumps). They change the instruction sequence, making all the instructions that entered the pipeline after the program transfer instruction invalid.
96
Flushing of pipeline problem
Suppose instruction I3 is a conditional jump to I50 at some other address (the target address). Then the instructions that entered the pipeline after I3 are invalid, and a new sequence beginning with I50 needs to be loaded. This causes bubbles in the pipeline, where no work is done while the pipeline stages are reloaded.
97
Flushing of pipeline problem
To avoid this problem, the Pentium uses a scheme called Dynamic Branch Prediction. In this scheme, a prediction is made concerning the branch instruction currently in pipeline. Prediction will be either taken or not taken. If the prediction turns out to be true, the pipeline will not be flushed and no clock cycles will be lost.
98
Flushing of pipeline problem
If the prediction turns out to be false, the pipeline is flushed and restarted with the correct instruction. This results in a 3-cycle penalty if the branch is executed in the u-pipeline and a 4-cycle penalty if it is executed in the v-pipeline.
99
Dynamic Branch Prediction Mechanism
It is implemented using a 4-way set-associative cache with 256 entries, referred to as the Branch Target Buffer (BTB). The directory entry for each line contains the following information:
Valid bit: indicates whether or not the entry is in use.
History bits: track how often the branch has been taken.
Source memory address: the address from which the branch instruction was fetched (the address of I3).
If the directory entry is valid, the target address of the branch is stored in the corresponding data entry in the BTB.
101
Dynamic Branch Prediction Mechanism
The first time that a branch instruction enters either pipeline, the BTB uses its source memory address to perform a lookup in the cache. Since the instruction has not been seen before, this results in a BTB miss.
102
Dynamic Branch Prediction Mechanism
It means the prediction logic has no history for the instruction. It then predicts that the branch will not be taken, and program flow is not altered. Even unconditional jumps are predicted as not taken the first time they are seen by the BTB.
103
Dynamic Branch Prediction Mechanism
When the instruction reaches the execution stage, the branch will be either taken or not taken. If taken, the next instruction to be executed should be the one fetched from branch target address. If not taken, the next instruction is the next sequential memory address.
104
Dynamic Branch Prediction Mechanism
When the branch is taken for the first time, the execution unit provides feedback to the branch prediction logic. The branch target address is sent back and recorded in BTB. A directory entry is made containing the source memory address and history bits set as strongly taken
105
Dynamic Branch Prediction Mechanism
[Figure: state diagram of the four history states: Strongly Taken, Weakly Taken, Weakly Not Taken, Strongly Not Taken.]
106
Dynamic Branch Prediction Mechanism
History Bits | Resulting Description | Prediction Made | If branch is taken | If branch is not taken
11 | Strongly Taken | Branch Taken | Remains Strongly Taken | Downgrades to Weakly Taken
10 | Weakly Taken | Branch Taken | Upgrades to Strongly Taken | Downgrades to Weakly Not Taken
01 | Weakly Not Taken | Branch Not Taken | Upgrades to Weakly Taken | Downgrades to Strongly Not Taken
00 | Strongly Not Taken | Branch Not Taken | Upgrades to Weakly Not Taken | Remains Strongly Not Taken
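The history-bit table behaves like a 2-bit saturating counter: a taken branch moves the state one step toward 11, a not-taken branch one step toward 00, and the prediction is "taken" whenever the upper bit is set. A minimal sketch, with function names chosen for this example:

```python
STRONGLY_NOT_TAKEN, WEAKLY_NOT_TAKEN, WEAKLY_TAKEN, STRONGLY_TAKEN = 0b00, 0b01, 0b10, 0b11


def predict(history: int) -> bool:
    """Predict taken for history bits 10 and 11, not taken for 00 and 01."""
    return history >= WEAKLY_TAKEN


def update(history: int, taken: bool) -> int:
    """Move one step up if the branch was taken, one step down otherwise."""
    if taken:
        return min(history + 1, STRONGLY_TAKEN)
    return max(history - 1, STRONGLY_NOT_TAKEN)


# A new BTB entry starts as strongly taken (as described for the first taken branch).
history = STRONGLY_TAKEN
predictions = []
for outcome in [True, False, False, True]:
    predictions.append(predict(history))
    history = update(history, outcome)

print(predictions, format(history, "02b"))
```

With this scheme a single mispredicted iteration, such as the final exit from a loop, does not flip the prediction for the next time the loop runs.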
107
FLOATING POINT UNIT(FPU)
108
Floating-Point Pipeline
The floating-point pipeline has 8 stages, as follows:
1. Prefetch (PF): Instructions are prefetched from the on-chip instruction cache.
2. Instruction Decode (D1): Two parallel decoders attempt to decode and issue the next two sequential instructions. The instruction is decoded to generate a control word.
109
Floating-Point Pipeline
3. Address Generate (D2): Decodes the control word. Addresses of memory-resident operands are calculated.
4. Memory and Register Read (Execution Stage) (EX): A register read, memory read or memory write is performed as required by the instruction to access an operand.
5. Floating Point Execution Stage 1 (X1): Information from a register or memory is written into an FP register. Data is converted to floating-point format before being loaded into the floating-point unit.
110
Floating-Point Pipeline
6. Floating Point Execution Stage 2 (X2): The floating point operation is performed within the floating point unit.
7. Write FP Result (WF): Floating point results are rounded and the result is written to the target floating point register.
8. Error Reporting (ER): If an error is detected, an error reporting stage is entered where the error is reported and the FPU status word is updated.
111
Instruction Issue for Floating Point Unit
The rules for how floating-point (FP) instructions get issued on the Pentium processor are:
• FP instructions do not get paired with integer instructions.
• When a pair of FP instructions is issued to the FPU, only the FXCH instruction can be the second instruction of the pair.
• The first instruction of the pair must be one of a set F, where F = [FLD, FADD, FSUB, FMUL, FDIV, FCOM, FUCOM, FTST, FABS, FCHS].
• FP instructions other than FXCH and instructions belonging to set F always get issued singly to the FPU.
• FP instructions that are not directly followed by an FXCH instruction are issued singly to the FPU.
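The issue rules above can be sketched as a small grouping routine. This is an illustrative model only: it looks at mnemonics and ignores operands, and the name `issue_groups` is hypothetical.

```python
# Set F from the slide: the only instructions allowed first in a pair.
SET_F = {"FLD", "FADD", "FSUB", "FMUL", "FDIV",
         "FCOM", "FUCOM", "FTST", "FABS", "FCHS"}


def issue_groups(mnemonics):
    """Split a stream of FP mnemonics into issue groups (pairs or singles).

    A pair is formed only when an instruction from set F is directly
    followed by FXCH; everything else is issued singly.
    """
    groups = []
    i = 0
    while i < len(mnemonics):
        first = mnemonics[i]
        following = mnemonics[i + 1] if i + 1 < len(mnemonics) else None
        if first in SET_F and following == "FXCH":
            groups.append((first, following))
            i += 2
        else:
            groups.append((first,))
            i += 1
    return groups
```

For example, in the stream `FLD, FXCH, FADD, FMUL` only the first two instructions pair up; `FADD` and `FMUL` are issued one at a time because neither is followed by `FXCH`.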
112
Floating-point registers
[Figure: FPU register file. Labels: floating-point registers ST(0) to ST(7), 80 bits wide; Read port 1, Read port 2; Write port 1, Write port 2; Bypass 1, Bypass 2; pipeline stages EX, X1, WF.]
113
PAGING
114
Paging
The Pentium supports translation of virtual (linear) addresses into physical addresses through the use of special tables that map portions of the virtual address into actual physical memory locations. Physical memory is divided into fixed-size page frames of 4KB each.
Paging is controlled by three flags in the processor’s control registers:
• PG (paging) flag in CR0. Paging is enabled by making PG = 1 (required for multitasking in virtual 8086 mode). The Pentium has no mode bit to disable segmentation.
• PSE (page size extensions) flag, bit 4 of CR4. When set, the page size is 2MB or 4MB.
• PAE (physical address extension) flag, bit 5 of CR4.
115
Paging
• Page directory: An array of 32-bit page-directory entries contained in a 4-KByte page. Up to 1024 page-directory entries can be held in a page directory.
• Page table: An array of 32-bit page-table entries contained in a 4-KByte page. Up to 1024 page-table entries can be held in a page table. (Page tables are not used for 2-MByte or 4-MByte pages. These page sizes are mapped directly from one or more page directories.)
• Page: A 4-KByte, 2-MByte, or 4-MByte flat address space.
116
Paging
32-bit virtual (linear) addresses generated by a running task select entries in the system's page directory and page table, which translate the upper 20 bits of the virtual address into the actual physical address where a page frame is located. The lower 12 bits of the virtual address are not translated and point to one of 4,096 byte locations within a page frame.
How is a 32-bit virtual address translated into a physical address?
• The upper 10 bits of the virtual address select one of 1,024 entries in the page directory. The base address of the page directory is stored in the page directory base register (PDBR). Each entry in the page directory is 4 bytes wide and contains the base address of a page table.
• The next 10 bits from the virtual address select one of 1,024 entries in the page table pointed to by the page directory entry. This entry is also 4 bytes wide and contains the base address of the actual physical memory page frame.
• This address is combined with the lower 12 bits of the virtual address to access the desired location in memory.
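The 10/10/12-bit split and the two table lookups described above can be sketched as follows. This is a simplified model under stated assumptions: `memory` is a hypothetical dictionary standing in for physical RAM (address to 32-bit entry), and no present-bit or protection checks are made.

```python
def split_linear(addr):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    directory_index = (addr >> 22) & 0x3FF   # upper 10 bits
    table_index = (addr >> 12) & 0x3FF       # next 10 bits
    offset = addr & 0xFFF                    # lower 12 bits, not translated
    return directory_index, table_index, offset


def translate(addr, pdbr, memory):
    """Walk the page directory and page table to form a physical address.

    pdbr   -- base address of the page directory (contents of the PDBR)
    memory -- hypothetical model of RAM: dict of physical address -> entry
    """
    directory_index, table_index, offset = split_linear(addr)
    pde = memory[pdbr + 4 * directory_index]          # each entry is 4 bytes
    page_table_base = pde & 0xFFFFF000                # bits 31-12 of the PDE
    pte = memory[page_table_base + 4 * table_index]
    frame_base = pte & 0xFFFFF000                     # bits 31-12 of the PTE
    return frame_base | offset
```

With a page directory at 0x1000 whose entry 1 points to a page table at 0x2000, and that table's entry 3 pointing to the frame at 0x5000, the linear address 0x00403ABC translates to 0x5ABC.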
117
Paging
[Figure: linear address translation, showing the displacement (offset) field.]
PDE & PTE format: bits 31-12 (PT address), bits 11-0 (control & status flags)
118
Paging
119
Translation lookaside buffers(TLBs)
To improve performance, the internal instruction and data caches of the Pentium contain small, special caches called TLBs that automatically translate the upper 20 bits of the virtual address into the upper 20 bits of the physical address, so a translation requires only one clock cycle. The TLBs contain only the addresses of the most recently used pages. If the required translation is not available in a TLB, the processor accesses the page directory and page table in RAM and stores the translation in the TLB. Prior to doing this, it may be necessary to invalidate existing contents of the TLB.
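The hit/miss behaviour described above can be sketched with a tiny software model. The capacity of four entries, the least-recently-used replacement policy, and the names `TLB`, `lookup`, and `walk` are assumptions chosen for illustration; they are not taken from the slides.

```python
from collections import OrderedDict


class TLB:
    """Toy translation lookaside buffer: virtual page number -> frame number."""

    def __init__(self, capacity=4):
        self.capacity = capacity          # assumed size, for illustration only
        self.entries = OrderedDict()

    def lookup(self, addr, walk):
        """Translate addr; call walk(vpn) (the slow table walk) only on a miss.

        Returns (physical address, hit flag).
        """
        vpn, offset = addr >> 12, addr & 0xFFF
        if vpn in self.entries:
            self.entries.move_to_end(vpn)           # mark as most recently used
            hit = True
        else:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)    # invalidate the oldest entry
            self.entries[vpn] = walk(vpn)           # fetch translation from the tables
            hit = False
        return (self.entries[vpn] << 12) | offset, hit
```

The first access to a page misses and pays for the table walk; later accesses to the same page hit until the entry is displaced by newer translations.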
120
PDE and PTE formats
PDE: Page table address (31-12) | Avail. | A | PCD | PWT | U | W | P
PTE: Page frame address (31-12) | Avail. | D | A | PCD | PWT | U | W | P
121
• D (Dirty): Set if a write has been performed to the page pointed to by the PTE.
• A (Accessed): Set if a read or write was performed to the page selected by the PTE and PDE.
• PCD (Cache Disable): Determines whether the memory being accessed may be cached.
• PWT (Write-Through): Enables write-through operation between cache and memory.
• U (User): Used when performing protection checks on memory accesses.
• W (Writable): Determines whether the page may be written to; also used in protection checks.
• P (Present): Indicates that the page is actually stored in memory. If a needed page is not present, it must be brought into memory and the TLBs updated.
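A page-table entry can be unpacked into the flags listed above with a few shifts and masks. The bit positions used here (P = 0, W = 1, U = 2, PWT = 3, PCD = 4, A = 5, D = 6, available bits 9-11) are not given in the slides; they follow the standard 32-bit IA-32 entry layout and should be checked against the Intel manual. The name `decode_pte` is illustrative.

```python
# Flag name -> bit position (assumed from the standard IA-32 PTE layout).
PTE_FLAGS = {"P": 0, "W": 1, "U": 2, "PWT": 3, "PCD": 4, "A": 5, "D": 6}


def decode_pte(entry):
    """Return the flags and page frame address held in a 32-bit PTE."""
    fields = {name: bool((entry >> bit) & 1) for name, bit in PTE_FLAGS.items()}
    fields["avail"] = (entry >> 9) & 0x7          # bits 11-9, free for the OS
    fields["frame_address"] = entry & 0xFFFFF000  # bits 31-12
    return fields
```

For instance, `decode_pte(0x00005067)` describes a present, writable, user-accessible page at frame address 0x5000 that has been both accessed and written.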
122
Paging Summary
Page translation allows the physical memory used by a system to be much smaller than the linear addressing space. For instance, the Pentium’s 4GB linear addressing space may be mapped to a physical memory of only 512MB.
The pages used by a program do not need to be stored consecutively. A program’s code and data may be spread out all over physical memory, and even moved around (with help from the hard disk) while the program is executing!
This helps to explain why the linear addresses are also called virtual addresses, since they have no relation to the actual physical memory address used, except for the lower 12 bits.
123
MULTITASKING
124
Multitasking vs. Multithreading
Tasks are like jobs, so multitasking means doing multiple jobs at the same time. Threads run within a process or task, so multithreading means many subtasks being done within a main task. For example, using Microsoft Word and PowerPoint together is multitasking, while typing with the grammar and spell check running means you are running two threads within Microsoft Word.
125
MULTITASKING
Multitasking is the ability to support execution of multiple programs (tasks) simultaneously. In reality only one program is running at any point in time, but the ability to switch from task to task at very high speed gives the impression of simultaneous execution.
126
The processor defines four data structures for handling task-related activities:
• Task state segment (TSS)
• TSS descriptor
• Task register
• Task gate descriptor
127
Each task executes for a period of time called a TIME SLICE.
A TASK SWITCH is used to switch from one task to another. Rapidly switching from task to task gives the impression that all tasks are running at the same time.
1. Task State Segment: During a task switch, the contents of all processor registers, along with other task information, are saved for the task being suspended, and new information is loaded for the next task. This information is not saved on the stack, but in a special memory structure called the TASK STATE SEGMENT (TSS). It contains storage areas for all of the Pentium's registers, segment selectors, and stack pointers.
128
When a task is created, the task’s LDTR, PDBR, privilege-level stack pointers, T-bit, and I/O permission map base are filled in. During a task switch, these items are not altered. Only the register portion, EIP to GS, is modified during task switching.
129
2. TSS descriptor: Defines the characteristics of the segment that holds the TSS. Each TSS is referenced through such a descriptor.
130
• B (Busy): The task is currently running or waiting to run.
• P (Present): Indicates whether the segment is in memory (a task may be suspended if a page fault occurs).
• G (Granularity): Determines how the limit field is interpreted. Clear: segment size from 1 byte to 1MB. Set: segment size from 4KB to 4GB (in chunks of 4KB).
• AVL (Available): A bit available for use by system software.
• DPL: Indicates the privilege level of the segment and is used in protection checks.
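The effect of the G bit on the limit field can be worked through numerically. The sketch below assumes the usual 20-bit limit field, in which a stored value of n means n + 1 units; the function name `segment_size` is illustrative.

```python
def segment_size(limit, g):
    """Return the segment size in bytes for a 20-bit limit field.

    g clear -> the limit counts bytes      (1 byte up to 1MB)
    g set   -> the limit counts 4KB chunks (4KB up to 4GB)
    """
    if not 0 <= limit <= 0xFFFFF:
        raise ValueError("limit must fit in 20 bits")
    units = limit + 1
    return units * 4096 if g else units
```

The two extremes reproduce the ranges quoted above: a limit of 0 gives 1 byte or 4KB, and a limit of 0xFFFFF gives 1MB or 4GB, depending on G.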
131
3. Task Register (TR)
1. The task register holds the 16-bit segment selector, 32-bit base address, 32-bit segment limit, and descriptor attributes for the TSS of the current task.
2. The TSS actually in use is accessed through TR (using the STR and LTR instructions).
A TSS descriptor may only be placed in the GDT (global descriptor table). When multiple TSS descriptors are stored in the GDT, the one currently in use is accessed through TR.
132
The task register may be loaded with a new TSS selector with the LTR (Load Task Register) instruction. LTR requires a 16-bit register or memory operand and may only be executed in protected mode.
4. Task Gate Descriptor
1. A task switch may result in a privilege violation if the new task has a lower priority than the currently executing task. A task gate provides a way to facilitate task switching.
2. A task gate descriptor provides an indirect, protected reference to a task. A task gate descriptor can be placed in the GDT, an LDT, or the IDT.
3. It allows a single busy bit (contained in the TSS descriptor) to be used for a task.
4. In this way it safeguards the processor while facilitating multitasking, using the DPL and the busy bit.
134
TASK SWITCH
The following steps take place during a task switch:
1. The new TSS descriptor or task gate must have sufficient privilege to allow a task switch.
2. The new TSS descriptor must have its present bit set.
3. The state of the current task is saved.
4. The task register is loaded with the selector of the new TSS descriptor.
5. The state of the new task is loaded from its TSS and execution is resumed.
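The task-switch steps above can be walked through with a toy model. Everything here is a simplification for illustration: the GDT is a hypothetical dictionary of selector to descriptor, each descriptor carries its TSS as a plain dictionary, the privilege rule is reduced to "CPL must be numerically no greater than the descriptor's DPL", and busy bits are ignored.

```python
class TaskSwitchError(Exception):
    """Raised when a check in the first two steps fails."""


def task_switch(cpu, new_selector, gdt, cpl):
    """Switch the modelled CPU to the task named by new_selector.

    cpu -- dict with 'tr' (current TSS selector) and 'regs' (register values)
    gdt -- dict: selector -> {'dpl': int, 'present': bool, 'tss': dict}
    cpl -- current privilege level (0 is the most privileged)
    """
    descriptor = gdt[new_selector]
    # Step 1: privilege check (simplified).
    if cpl > descriptor["dpl"]:
        raise TaskSwitchError("insufficient privilege for task switch")
    # Step 2: the new TSS descriptor must be marked present.
    if not descriptor["present"]:
        raise TaskSwitchError("new TSS is not present")
    # Step 3: save the state of the current task into its own TSS.
    gdt[cpu["tr"]]["tss"].update(cpu["regs"])
    # Step 4: load the task register with the new selector.
    cpu["tr"] = new_selector
    # Step 5: load the new task's state from its TSS.
    cpu["regs"] = dict(descriptor["tss"])
    return cpu
```

Note that the outgoing task's registers land in its TSS rather than on the stack, so switching back later simply repeats the same five steps in the other direction.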
135
TASK ADDRESSING SPACE
If paging is not enabled, the linear addresses generated by a task are the same as the physical addresses sent to the memory system. When paging is enabled, it is possible for each task to have its own separate, protected addressing space, through the use of the PDBR (Page Directory Base Register) value stored in the TSS.
136
INTERRUPTS AND EXCEPTIONS
137
Interrupts typically occur at random times during the execution of a program, in response to signals from hardware. They are used to handle events external to the processor, such as requests to service peripheral devices. Software can also generate interrupts by executing the INT n instruction. Exceptions occur when the processor detects an error condition while executing an instruction, such as division by zero. The processor detects a variety of error conditions including protection violations, page faults, and internal machine faults.
138
When an interrupt is received or an exception is detected, the currently running procedure or task is automatically suspended while the processor executes an interrupt or exception handler. When execution of the handler is complete, the processor resumes execution of the interrupted procedure or task. The resumption of the interrupted procedure or task happens without loss of program continuity
139
INTERRUPTS
• Non-maskable interrupts (NMIs). These interrupts are received on the processor’s NMI input pin. The processor does not provide a mechanism to prevent non-maskable interrupts.
• Maskable interrupts. These interrupts are received either at the processor's INTR (interrupt) pin from an external, system-based interrupt controller (8259A) or as a serial message on the LINT[1:0] pins from a system-based I/O APIC. The processor does not act on maskable interrupts unless the IF (interrupt-enable) flag in the EFLAGS register is set.
• Software-generated interrupts. These are generated by the INT n instruction. The processor does not provide a mechanism for masking interrupts generated in this manner.
140
EXCEPTIONS
• Processor-detected exceptions. These are generated when the processor detects program and machine errors. They are further classified as faults, traps, and aborts.
• Software-generated exceptions. The INTO, INT3, BOUND, and INT n instructions generate exceptions. (The INT n instruction generates an exception when it is given an exception vector number as an operand.)
141
The processor associates an identification number, called a vector, with each interrupt and exception. The NMI interrupt and the exceptions are assigned vectors in the range 0 through 31. Not all of these vectors are currently used by the processor. Unassigned vectors in this range are reserved for possible future uses. The vectors in the range 32 to 255 are provided for maskable interrupts, generated either by asserting the INTR pin or by sending interrupt messages over the APIC (Advanced Programmable Interrupt Controller) bus. External interrupt controllers (such as Intel's 8259A Programmable Interrupt Controller) deliver one of these vectors to the processor on the system bus during its interrupt-acknowledge cycle.
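The vector ranges just described can be captured in a short classifier. It is only a sketch of the ranges stated above; the function name `classify_vector` and the returned labels are illustrative.

```python
def classify_vector(vector):
    """Classify an interrupt/exception vector number by the range it falls in."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in the range 0 to 255")
    if vector == 2:
        return "NMI"
    if vector <= 31:
        return "exception or reserved"
    return "maskable interrupt"
```

So `classify_vector(14)` (the page fault) falls in the exception range, while `classify_vector(32)` is the first vector available for maskable interrupts.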
143
INTERRUPT DESCRIPTOR TABLE (IDT)
Real mode uses a 1KB Interrupt Vector Table (IVT) beginning at address H. Each 4-byte entry in the IVT holds the address of an interrupt service routine.
Protected mode relies on an Interrupt Descriptor Table (IDT) to support interrupts and exceptions. The IDT comprises 8-byte gate descriptors for task, trap, or interrupt gates. The IDT has a maximum size of 256 descriptors.
The size of the IDT is controlled by a 16-bit limit value stored in the Interrupt Descriptor Table Register (IDTR). The IDTR is a 48-bit register that contains the 32-bit base address of the IDT and the 16-bit size limit. The IDT can be placed anywhere in physical memory.
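The entry sizes above determine where the entry for a given vector lives. In this sketch the IVT base defaults to 0 as an assumption (the slide's address is incomplete), the IDT limit is treated as the offset of the last valid byte, and both function names are illustrative.

```python
def ivt_entry_address(vector, ivt_base=0):
    """Real mode: each IVT entry is 4 bytes (ivt_base=0 is an assumption)."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in the range 0 to 255")
    return ivt_base + vector * 4


def idt_entry_address(vector, idt_base, idt_limit):
    """Protected mode: each IDT gate descriptor is 8 bytes.

    idt_base and idt_limit model the two fields of the IDTR. A vector whose
    descriptor would extend past the limit is rejected.
    """
    offset = vector * 8
    if offset + 7 > idt_limit:
        raise IndexError("vector lies beyond the IDT limit")
    return idt_base + offset
```

With 256 four-byte entries the IVT fills exactly 1KB (the last entry starts at offset 1020), and a full 256-descriptor IDT needs a limit of 0x7FF.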
145
IDT DESCRIPTORS
The IDT may contain any of three kinds of gate descriptors:
• Task gate descriptor
• Interrupt gate descriptor
• Trap gate descriptor
147
The P-bit in each descriptor stands for present, and indicates whether the segment is present in memory. The DPL field specifies the descriptor privilege level. When fewer interrupts/exceptions are required, the limit field of the IDTR is used to specify the addressable limit within the IDT. The Pentium will enter shutdown mode if the limit is exceeded.
148
Interrupt 0: Divide Error Exception
Indicates that the divisor operand for a DIV or IDIV instruction is 0 or that the result cannot be represented in the number of bits specified for the destination operand.
Interrupt 1: Debug Exception
Indicates that one or more of several debug-exception conditions has been detected. Whether the exception is a fault or a trap depends on the condition. The exception handler can distinguish between traps and faults by examining the contents of the DR6 register and the other debug registers.
149
Interrupt 2: NMI Interrupt
The non-maskable interrupt (NMI) is generated externally by asserting the processor’s NMI pin. This interrupt causes the NMI interrupt handler to be called.
Interrupt 3: Breakpoint Exception
Indicates that a breakpoint instruction (INT3) was executed, causing a breakpoint trap to be generated. Typically, a debugger sets a breakpoint by replacing the first opcode byte of an instruction with the opcode for the INT3 instruction. The breakpoint handler is responsible for restoring the original byte of the modified instruction.
150
Interrupt 4: Overflow Exception
Indicates that an overflow trap occurred when an INTO instruction was executed. If the OF flag is set, an overflow trap is generated.
Interrupt 5: BOUND Range Exceeded Exception
Indicates that a BOUND-range-exceeded fault occurred when a BOUND instruction was executed. It detects an array subscript that is out of range.
Interrupt 6: Invalid Opcode Exception
Indicates an attempt to execute an invalid or reserved opcode.
Interrupt 7: Device Not Available Exception
On earlier 80x86 machines, this exception was used to indicate that there was no external floating point coprocessor interfaced to the CPU.
151
Interrupt 8: Double Fault Exception
Indicates that the processor detected a second exception while calling an exception handler for a prior exception.
Interrupt 9: Coprocessor Segment Overrun
On earlier processors this was used to signal a page or segment violation detected while transferring a coprocessor operand. It is not generated by the Pentium.
Interrupt 10: Invalid TSS Exception
Indicates that a task switch was attempted that referenced an invalid TSS.
Interrupt 11: Segment Not Present
Indicates that the present flag of a segment or gate descriptor is clear, meaning the segment is not present in memory.
152
Interrupt 12: Stack Fault Exception
A limit violation is detected during an operation that refers to the SS register. Operations that can cause a limit violation include stack-oriented instructions.
Interrupt 13: General Protection Exception
Indicates that the processor detected one of a class of protection violations called “general protection violations.” Examples of such violations:
• Exceeding the segment limit when accessing the CS, DS, ES, FS, or GS segments.
• Writing to a code segment or a read-only data segment.
• Reading from an execute-only code segment.
153
Interrupt 14: Page Fault Exception
Occurs when the processor attempts to access a page that is not in memory.
Interrupt 16: Floating-Point Error Exception
Indicates that the FPU has detected a floating-point-error exception.
Interrupt 17: Alignment Check Exception
Indicates that the processor detected an unaligned memory operand when alignment checking was enabled.
Interrupt 18: Machine Check Exception
Indicates that the processor detected an internal machine error.