1 Chapter 2: IA-32 Processor Architecture Assembly Language for Intel-Based Computers, 5th edition Kip R. Irvine 2/21/07.


Slide 1: Chapter 2: IA-32 Processor Architecture. Assembly Language for Intel-Based Computers, 5th edition, Kip R. Irvine. 2/21/07

Slide 2: IA-32 Processor Architecture
• 2.1 General Concepts
• 2.2 IA-32 Processor Architecture
• 2.3 IA-32 Memory Management
• 2.4 Components of an IA-32 Microcomputer
• 2.5 Input-Output System
In the first part of the chapter we will talk about computers in general instead of emphasizing the IA-32 architecture.

Slide 3: 2.1 General Concepts: The major components of a computer
• Central Processing Unit (CPU): Does all the calculations and logic operations (our text). “The CPU is the brains of the computer. Sometimes referred to simply as the processor or central processor, the CPU is where most calculations take place. In terms of computing power, the CPU is the most important element of a computer system.” (www.webopedia.com)
• Memory Storage Unit: Stores data and instructions while programs are running
• I/O Unit: I/O devices such as disks, keyboard, monitor, printer, mouse, …

Slide 4: 2.1 General Concepts: The bus
• Used to connect the various units
• Consists of wires that carry electrical signals
• Three parts:
  – Address bus
  – Data bus
  – Control bus

Slide 5: 2.1 General Concepts: Double bus arrangements
[Diagram: two arrangements connecting the Memory Storage Unit, the CPU, and the I/O Unit]

Slide 6: 2.1 General Concepts: Single bus arrangements
[Diagram: the Memory Storage Unit, the CPU, and the I/O Unit all attached to a shared address bus, data bus, and control bus]

Slide 7: 2.1 General Concepts: Bus arrangements
• The single bus setup is common because:
  – It is less expensive
  – It is easier to add new devices
  – It allows the CPU to move data between the units
  – It allows direct memory access (DMA), where data is moved between the I/O and memory units without active control by the CPU
• Intel computers use a single bus arrangement

Slide 8: 2.1 General Concepts: Parts of the CPU
• Arithmetic and Logic Unit: Does arithmetic and logic
• Control Unit: Controls the action of the CPU in order to execute instructions
• Clock: Synchronizes the various operations in the CPU and other units
• Registers: Very high speed memory storage in the CPU (very costly)

Slide 9: 2.1 General Concepts: Operation of the CPU: Add y to AX
Fetch instruction
• The CPU puts the address of the next instruction on the address bus.
• The CPU sends a signal on the control bus to read memory.
• Memory puts the instruction on the data bus.
• Memory signals that the instruction is ready.
• The CPU reads the instruction.
Decode
• The CPU interprets the instruction.
Fetch operands
• The CPU puts the address of y on the address bus.
• Memory puts the data on the data bus.
• Memory signals that data is ready.
• The CPU reads the data.
Execute
• The CPU uses the ALU to add the contents of AX and the value of y.
Store result
• The CPU stores the result in register AX.
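The twelve steps above can be traced in a small simulation. This is only a sketch, not a model of real IA-32 hardware: the Memory class, the tuple encoding ("ADD", "AX", 100) for the instruction, and the addresses 0 and 100 are all assumptions made up for illustration.

```python
class Memory:
    """Toy memory: maps addresses to contents and logs the bus traffic."""

    def __init__(self, cells):
        self.cells = dict(cells)
        self.bus_log = []

    def read(self, address):
        # Address bus carries the address, control bus carries READ,
        # and the data bus carries the value back to the CPU.
        self.bus_log.append(("address", address))
        self.bus_log.append(("control", "READ"))
        value = self.cells[address]
        self.bus_log.append(("data", value))
        return value


def run_one_instruction(registers, memory, ip):
    """Run the instruction stored at address ip and return the next ip."""
    instruction = memory.read(ip)                    # fetch instruction
    opcode, dest, operand_address = instruction      # decode
    if opcode != "ADD":
        raise ValueError("this sketch only understands ADD")
    operand = memory.read(operand_address)           # fetch operand
    result = (registers[dest] + operand) & 0xFFFF    # execute (AX is 16 bits)
    registers[dest] = result                         # store result
    return ip + 1


# Hypothetical layout: the instruction sits at address 0, y at address 100.
memory = Memory({0: ("ADD", "AX", 100), 100: 7})
registers = {"AX": 5}
next_ip = run_one_instruction(registers, memory, 0)
print(registers["AX"], next_ip)   # 12 1
```

The bus log ends up with two address/control/data triples, one for the instruction fetch and one for the operand fetch, which mirrors the two memory reads in the step list.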

Slide 10: 2.1 General Concepts: Multi-stage pipelines – the problem
• Machine execution cycle, simplified, with the units used at each step:
  – A. Fetch instruction: units 1, 2, 5, 6
  – B. Decode instruction: unit 3
  – C. Fetch operands: units 1, 5, 6
  – D. Execute: unit 4
  – E. Store result: units 1, 5, 6
• Observe that the Code Prefetch Unit, the Decode Unit and the Execution Unit are idle most of the time.
• Parts of the processor:
  – 1. Bus Interface Unit
  – 2. Code Prefetch Unit
  – 3. Instruction Decode Unit
  – 4. Execution Unit
  – 5. Segment Unit
  – 6. Paging Unit

Slide 11: 2.1 General Concepts: Units
• S1: Bus Interface Unit: Accesses memory and I/O.
• S2: Code Prefetch Unit: Gets instructions from the Bus Interface Unit and stores them until needed.
• S3: Instruction Decode Unit: Decodes instructions.
• S4: Execution Unit: Executes instructions.
• S5: Segment Unit: Converts logical addresses to linear addresses.
• S6: Paging Unit: Converts linear addresses to physical addresses.

Slide 12: 2.1 General Concepts: Non-Pipelined Instruction Execution
• Notice that the stages are doing nothing most of the time.
• It is a major waste to build, for example, an Execution Unit (S4) that is doing nothing 5/6 of the time.

Slide 13: 2.1 General Concepts: Pipelined Instruction Execution
• A pipelined processor works like an assembly line. When S1 is finished with I-1, it starts I-2.
• Beginning with cycle 6, it can complete one instruction per cycle.
• For a long run of instructions, the CPU approaches 6 times the non-pipelined speed.
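The speed-up claim can be checked with a little arithmetic. The sketch below assumes an idealized pipeline with k one-cycle stages and no stalls; the function names are made up for this example. Without pipelining, n instructions take n × k cycles; with pipelining, the first instruction takes k cycles and each later one finishes one cycle after its predecessor.

```python
def cycles_non_pipelined(n_instructions, n_stages):
    """Each instruction uses every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages):
    """First instruction fills the pipeline, then one completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def speedup(n_instructions, n_stages):
    return (cycles_non_pipelined(n_instructions, n_stages)
            / cycles_pipelined(n_instructions, n_stages))


for n in (1, 6, 100, 10_000):
    print(n, cycles_non_pipelined(n, 6), cycles_pipelined(n, 6),
          round(speedup(n, 6), 2))
```

For six instructions the pipelined machine needs 11 cycles instead of 36. The ratio gets closer to 6 as the instruction count grows but never quite reaches it.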

Slide 14: 2.1 General Concepts: Problems with Pipelined CPUs
• Branch instructions (e.g. if (x > y)): The CPU doesn’t know which instruction will be processed next. Cycles are skipped until the next instruction is determined.
• It takes longer to do a multiplication than an addition. Possible solutions:
  – Slow the pipeline to match the slowest operation.
  – Use multiple multiplication units. The first multiplication goes to the first multiplier, the second to the next multiplier.

Slide 15: 2.1 General Concepts: Superscalar
• A superscalar architecture executes more than one instruction during a single pipeline stage by pre-fetching multiple instructions and simultaneously dispatching them to redundant functional units on the processor. (http://en.wikipedia.org/wiki/Superscalar)

Slide 16: 2.1 General Concepts: Superscalar
  – Problem: Six stage pipeline, but stage S4 takes twice as long as the other stages.
  – Solution: Use two S4s.
  – Instructions alternate. One instruction uses S4a for two cycles and then the next instruction uses S4b for two cycles.
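The effect of the second S4 unit can be estimated with a small cycle count. This is a sketch under stated assumptions: S1 to S3, S5 and S6 each take one cycle, S4 takes two, instructions issue in order, and they alternate between the available S4 units. The function name total_cycles is made up for this example.

```python
def total_cycles(n_instructions, s4_units):
    """Cycles needed to finish n instructions when S4 takes two cycles."""
    unit_free = [1] * s4_units        # first cycle in which each S4 unit is free
    last_done = 0
    for i in range(n_instructions):
        # Instruction i enters S1 in cycle i + 1, so it can reach S4 in
        # cycle i + 4 at the earliest. Upstream stalls are folded into
        # the wait for a free S4 unit.
        ready = i + 4
        unit = i % s4_units           # alternate between the S4 units
        start = max(ready, unit_free[unit])
        unit_free[unit] = start + 2   # S4 is busy in cycles start and start + 1
        last_done = start + 1 + 2     # then one cycle each for S5 and S6
    return last_done


for n in (1, 4, 100):
    print(n, total_cycles(n, 1), total_cycles(n, 2))
```

Under these assumptions a single S4 finishes one instruction every two cycles (2n + 5 cycles in total), while two S4 units restore one instruction per cycle (n + 6 cycles in total).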

Slide 17: 2.1 General Concepts: Memory
• Very fast Dell gamers computers (1/18/06):

  Quad core CPU speed (GHz):           2.66      3.2 (overclocked)
  Memory speed (MHz):                  667       667
  CPU/Memory speed ratio:              4.0       4.8
  CPU/Memory speed ratio per core:     16.0      19.2
  System cost:                         $4,714    $5,499

Problem: the CPU is many times faster than memory!
Solution: Use cache memory.
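The ratios on this slide can be reproduced with a short calculation. The sketch assumes the ratio is simply the CPU clock divided by the memory clock, and that the "per core" row is that ratio multiplied by the four cores; the slide does not state either formula, so both are inferred from the numbers.

```python
def speed_ratio(cpu_ghz, memory_mhz):
    """CPU clock divided by memory clock, both converted to MHz."""
    return cpu_ghz * 1000 / memory_mhz


for cpu_ghz in (2.66, 3.2):
    ratio = speed_ratio(cpu_ghz, 667)
    print(f"{cpu_ghz} GHz / 667 MHz = {ratio:.1f}, times 4 cores = {ratio * 4:.1f}")
```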

Slide 18: Memory
• Fast Dell business work stations (1/18/06):

                                       Dual core   Quad core
  CPU speed (GHz):                     3.00        2.66
  Memory speed (MHz):                  533         667
  CPU/Memory speed ratio:              4.5         4.0
  CPU/Memory speed ratio per core:     9.0         16.0
  System cost:                         $3,489      $3,899

Problem: the CPU is many times faster than memory!
Solution: Use cache memory.

Slide 19: 2.1 General Concepts: Memory: Speed vs Cost
• Engineers can make fast memory; we just can’t afford to pay for large quantities of it.
[Graph: speed versus cost]

Slide 20: 2.1 General Concepts: Memory (cont.)
• Fact: Faster memory is available; we just can’t afford large quantities of it.
• CPU registers are very fast and very expensive. There are relatively few registers.
• Cache memory typically runs at CPU speed or half of CPU speed. It is expensive. We can only afford “small” amounts of it.
• Main memory is slower but cheaper.

Slide 21: 2.1 General Concepts: Cache Memory
• Cache memory is a small, fast memory located between the large main memory and the CPU.
• Use of cache is entirely controlled by hardware. It “cannot” be controlled by the programmer.
[Diagram: Main memory, Cache memory, CPU]

Slide 22: 2.1 General Concepts: Cache (cont.)
• The CPU needs a value.
• Check to see if it is in cache. If so, use it.
• If not, fetch it and some nearby locations from main memory.
• The next time the value is needed, it will already be in cache.
• Result: Machines with small caches and large main memory run almost as fast as if all of memory were cache.

Slide 23: 2.1 General Concepts: Cache (cont.)
• Cache memory works because if you use a memory location:
  – 1. you will probably use it again.
  – 2. you will probably use the next location.
  – 3. you will probably use nearby locations.
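The three observations above can be seen in a toy simulation. This is a sketch, not how any particular IA-32 cache is organized: the TinyCache class, its least-recently-used replacement policy, and the sizes (4 lines of 16 bytes) are all assumptions chosen to keep the example small.

```python
from collections import OrderedDict


class TinyCache:
    """Toy cache that fetches whole lines and evicts the least recently used."""

    def __init__(self, n_lines, line_size):
        self.n_lines = n_lines
        self.line_size = line_size
        self.lines = OrderedDict()    # line number -> True, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, address):
        line = address // self.line_size        # nearby addresses share a line
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)        # mark as most recently used
        else:
            self.misses += 1                    # fetch the line from main memory
            if len(self.lines) >= self.n_lines:
                self.lines.popitem(last=False)  # evict the least recently used line
            self.lines[line] = True


cache = TinyCache(n_lines=4, line_size=16)
for _ in range(10):                # reuse: loop over the same data ten times
    for address in range(64):      # walk through 64 neighbouring bytes
        cache.access(address)
print(cache.hits, cache.misses)    # 636 4
```

Only the first touch of each 16-byte line goes to main memory. Every other access is served from the cache, because the program keeps using the same and neighbouring locations.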

Slide 24: 2.1 General Concepts: Running programs
• 1. The user requests running a program.
• 2. The computer searches for the program in the disk directory.
• 3. Basic info, including file size, is loaded.
• 4. Sufficient memory space is located and allocated, and the program is loaded.
• 5. The computer branches to the program.
• 6. It runs the program.
• 7. The process ends, releases its resources, and returns control to the operating system.

Slide 25: 2.1 General Concepts: Multitasking
• More or less independent processes are put in separate threads.
• The CPU jumps from one task to another:
  – When the task reaches some time limit
  – When the CPU must wait for I/O
• The scheduler picks the task with the greatest priority.
• Both hardware and software support is needed.

Slide 26: 2.2 IA-32 Architecture: Some History
• Before the mid 1960s: Each kind of computer had its own architecture and machine language.
• The mid 1960s: The IBM 360 introduced a family of computers with the same architecture and machine language.
• 1970s: Early microprocessors typically had 8 bit registers and an 8 bit bus.
• 1978: Intel 8086: 16 bit registers, 16 bit data bus, 20 bit addresses, could address 1 MByte of memory.
• Intel 8088: Like the 8086 but used an 8 bit bus. Used in the original IBM PC.
• 1982: Intel 80286: 24 bit addresses so it can address 16 MByte. Introduced protected mode.
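The link between address width and addressable memory is plain arithmetic: with byte addressing, n address bits can name 2^n distinct bytes. A minimal sketch (the function names are made up for this example):

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses that fit in the given width."""
    return 2 ** address_bits


def in_mbytes(n_bytes):
    return n_bytes / 2 ** 20      # 1 MByte = 2**20 bytes


# The 8086's 20 bit addresses give 1 MByte.
print(addressable_bytes(20), in_mbytes(addressable_bytes(20)))   # 1048576 1.0
```

The same function applies to any other address width mentioned in this chapter.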

Slide 27: 2.2 IA-32 Architecture: Some History* (cont.)
• 1985: Intel 80386: IA-32 architecture (32 bit registers, addresses and bus), virtual memory, can pretend to be an 8086. Can address 4 GBytes of memory.
• 1989: Intel 80486: IA-32 architecture, internal floating point unit (FPU), 5 stage pipeline, 8-KByte cache
• 1993: Pentium: New micro-architecture, 2nd pipeline for superscalar performance, doubled cache size
* Some history information adapted from http://www.intel.com/design/processor/manuals/253665.pdf (see chapter 2) as well as from the textbook.

Slide 28: 2.2 IA-32 Architecture: Some History (cont.)
• 1995-1999: P6: New microarchitecture allowed processing instructions out of order.
• Pentium Pro: 3 way superscalar. Better internal processing. 2 levels of on board cache.
• Pentium II: Added 64 bit MMX (SIMD), allowing multiple data values to be processed at the same time.
• Pentium II Xeon processor: Allows 4 and 8 way scalability.
• Intel Celeron processor: Lower cost version.
• Pentium III: Streaming SIMD Extensions (XMM). Has 128 bit registers so it can process multiple data items at the same time.

Slide 29: 2.2 IA-32 Architecture: Some History (cont.)
• 2000-2006: Pentium 4: New NetBurst micro-architecture with 20 stage pipeline. Some have hyperthreading, some have dual cores. Some have 64 bit technology.
• 2001-2006: Intel Xeon: Most based on the NetBurst micro-architecture. Some have 64 bit technology. Some have dual-core technology. Some have quad core technology.
• 2003- : Pentium M: Low power
• 2005-2007: Pentium Processor Extreme Edition: Dual-core technology
• 2006- : Core Solo and Core Duo processors: 1 or 2 core processors. Low power. Enhanced Pentium M.

Slide 30: 2.2 IA-32 Architecture: Multiple cores
• Hyper-Threading: Allows a single processor to process 2 (or more) code streams using shared resources. It has 2 (or more) logical processors, but they share one connection to the bus.
• Dual Core: 2 processors on 1 chip, each with its own connection to the system bus.
• Quad Core (Core 2 Quad): Combines two dual-core dies in one package (without Hyper-Threading). 4 logical processors on a single chip sharing 2 connections to the system bus(es).

Slide 31: 2.2 IA-32 Architecture: Registers
• Registers: Very high speed memory in the CPU. Faster than cache, much faster than main memory.
• General purpose registers: Used for arithmetic and data storage. (8, 16, or 32 bits)
• Example:
  – EAX: 32 bits (IA-32)
  – AX: 16 bits (8086 & IA-32)
  – AH, AL: 8 bits (8086 & IA-32)

Slide 32: 2.2 IA-32 Architecture: Programming Model
• EAX (AX, AH, AL), EBX (BX, BH, BL), ECX (CX, CH, CL), EDX (DX, DH, DL)
• EBP (BP), ESP (SP), ESI (SI), EDI (DI)
• EFLAGS (FLAGS), EIP (IP)
• CS, DS, SS, ES, FS*, GS*
* Only in 386 and newer. The 32 bit names (EAX, EIP, …) exist on the 386 or newer; the 16 bit and 8 bit names exist on all processors.

Slide 33: 2.2 IA-32 Architecture: Registers (part 1)
• General purpose registers (with some special purposes):
  – EAX (Extended Accumulator): Especially good for calculations, used automatically for multiplication and division.
  – EBX
  – ECX: Often used for counting
  – EDX: Used automatically for multiplication and division
• These 32 bit registers (e.g. EBX) can also be used as 16 bit registers (e.g. BX) or as two 8 bit registers (e.g. BH and BL).
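The overlap between the 32, 16 and 8 bit names can be illustrated with bit masks. A minimal sketch using EAX; the helper names sub_registers and write_al are made up for this example. AX is the low 16 bits of EAX, AL is the low byte of AX, and AH is the high byte of AX.

```python
def sub_registers(eax):
    """Return (AX, AH, AL) for a given 32 bit EAX value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF          # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF      # high byte of AX
    al = ax & 0xFF             # low byte of AX
    return ax, ah, al


def write_al(eax, value):
    """Writing AL changes only the lowest byte of EAX."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


eax = 0x12345678
ax, ah, al = sub_registers(eax)
print(hex(ax), hex(ah), hex(al))     # 0x5678 0x56 0x78
print(hex(write_al(eax, 0xFF)))      # 0x123456ff
```

Note that the upper 16 bits of EAX (0x1234 here) have no name of their own.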

Slide 34: 2.2 IA-32 Architecture: Registers (part 2)
• General purpose registers and their special uses:
  – ESI and EDI (Extended source index and extended destination index): Often used for subscripts
  – ESP (Extended stack pointer): Used to point to the top of the stack
  – EBP (Extended base pointer): Sometimes called the “frame pointer”. Used to point to function parameters and local variables stored on the stack
• These 32 bit registers (e.g. ESI) are also used as 16 bit registers (e.g. SI) in 16 bit programs.

Slide 35: 2.2 IA-32 Architecture: Registers (part 3)
• Instruction pointer: EIP (Extended instruction pointer): Points to the next instruction. (This 32 bit register contains IP, the 16 bit instruction pointer for 16 bit programs.)
• Segment registers: SS, CS, DS, ES, FS, GS. Point to segments. Used by hardware in 32 bit programs, programmable in 16 bit programs.

Slide 36: 2.2 IA-32 Architecture: Registers (part 4)
• EFLAGS (Extended flag register) contains:
  – Control flags: Control CPU operations
  – Status flags: Show the outcome of the previous operation:
    Carry Flag (CF): Last unsigned calculation overflowed a register
    Overflow Flag (OF): Last signed calculation overflowed
    Sign Flag (SF): Signals whether the last result was positive or negative
    Zero Flag (ZF): Signals whether the last operation resulted in 0
    Auxiliary Carry Flag (AF): Used in decimal arithmetic
    Parity Flag (PF): Specifies odd or even parity of the last result
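The status flags can be worked out by hand for a small case. The sketch below models an 8 bit addition (as with ADD AL, BL) and computes the flags the way IA-32 defines them for ADD; the function name add8_flags is made up for this example. PF is set when the low byte of the result has an even number of 1 bits, and AF reflects a carry out of bit 3.

```python
def add8_flags(a, b):
    """Add two 8 bit values and return (result, flags)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": total > 0xFF,                               # unsigned overflow
        "OF": ((a ^ result) & (b ^ result) & 0x80) != 0,  # signed overflow
        "SF": (result & 0x80) != 0,                       # result is negative
        "ZF": result == 0,                                # result is zero
        "AF": ((a & 0x0F) + (b & 0x0F)) > 0x0F,           # carry out of bit 3
        "PF": bin(result).count("1") % 2 == 0,            # even number of 1 bits
    }
    return result, flags


for a, b in ((0xFF, 0x01), (0x7F, 0x01)):
    result, flags = add8_flags(a, b)
    set_flags = [name for name, on in flags.items() if on]
    print(hex(a), "+", hex(b), "=", hex(result), set_flags)
# 0xff + 0x1 = 0x0 ['CF', 'ZF', 'AF', 'PF']
# 0x7f + 0x1 = 0x80 ['OF', 'SF', 'AF']
```

The first sum overflows as an unsigned value (255 + 1) but not as a signed one (-1 + 1 = 0). The second does the opposite: 127 + 1 is fine unsigned but overflows the signed range.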

Slide 37: 2.2 IA-32 Architecture: Other CPU Components
• Floating-Point Unit (FPU): Used for floating point arithmetic. Has its own set of registers. (See section 17.3)
• MMX (Multimedia extension): Registers shared with the FPU. Allows calculating 1, 2, 4 or 8 integer values at a time. (64 bits)
• XMM: Like MMX but has its own registers and allows calculating 2, 4, 8, or 16 values at a time. (128 bits)

Slide 38: 2.2 IA-32 Architecture: RISC
• RISC – Reduced Instruction Set Computer:
  – Simple instructions
  – Takes more instructions to complete a task
  – Processes instructions faster
  – Simpler control unit
  – “Cheaper”
  – Allows using available chip space for things that make the computer faster
  – .exe files are normally longer

Slide 39: 2.2 IA-32 Architecture: CISC
• CISC – Complex Instruction Set Computer:
  – More complicated, powerful instructions
  – Takes fewer instructions to complete a task
  – Processes instructions slower
  – Complex control unit
  – “Expensive”
  – More chip space required for control unit
  – .exe files are normally shorter

Slide 40: 2.2 IA-32 Architecture: CISC versus RISC Code Comparisons

  Java:    int I, J, K;
           I = J + K;

  VAX:     ADDW3 J, K, I      (“Ultimate” CISC)

  IA-32:   MOV EAX, J         (Simpler CISC)
           ADD EAX, K
           MOV I, EAX

  RISC:    LOAD EAX, J        (hypothetical)
           LOAD EBX, K
           ADD  EAX, EBX
           STOR I, EAX

Slide 41: 2.2 IA-32 Architecture: CISC versus RISC comments
• IA-32 architecture is a CISC architecture
• New CISC architectures are rare
• Pentiums claim to use RISC internally

Slide 42: 2.3 IA-32 Memory Management: Modes of Operation
• Protected mode (native mode): Multiple programs can run at the same time, each with its own assigned memory, and the processor prevents each program from accessing memory outside its own area.
• Virtual-8086 mode (sub-mode of protected mode): Allows running 16 bit 8086 programs in protected mode memory.
• Real address mode: The CPU runs like an 8086. No memory protection. Used for old games and boot up.
• System management mode: Special mode for the operating system, for things like power management and system security.

Slide 43: 2.3 IA-32 Memory Management: Your Turn: Address Space
• Protected mode: 32 bit addresses → can address ____ GBytes of memory.
• Virtual 8086 mode: Each 8086 program can use ____ MByte of memory (see below)
• Real mode: 20 bit addresses → can address ____ MByte

Slide 44: 2.3 IA-32 Memory Management: Background about Programs
• The words function, procedure, method, and subroutine are almost interchangeable.
• Computers use a stack to keep track of parameters and local variables.
• Programs use memory in 3 (or 4) ways:
  – Code
  – Data
  – Stack
  – (Heap)

Slide 45: 2.3 IA-32 Memory Management: CISC versus Programs
• The memory associated with code, data, or stack is called a segment.
  – Code: read only
  – Data: read and write
  – Stack: read and write
• When a program loads, the operating system allocates memory for each segment.
• Memory is assigned at run time. Why?

Slide 46: 2.3 IA-32 Memory Management: Memory Usage – A Problem
• You decide to run 5 programs at the same time.
• All 5 programs use location (offset) 200h.
• Solution: The operating system determines the actual storage location at run time.

Slide 47: 2.3 IA-32 Memory Management: Memory Usage: Protected Mode
• 32 bit mode (Windows)
• The operating system sets the segment registers.
• The segment registers point to information in a table:

                              Start    Length   Use
  DS (Data Segment)    →      50000    200      RW
  CS (Code Segment)    →      50200    400      R
  SS (Stack Segment)   →      50600    1000     RW

• Suppose the program tries to read data location 50202.
• Suppose the program tries to modify code location 50235.
• Suppose another program tries to change location 50644.
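The kind of check the processor makes against this table can be sketched in a few lines. This is a deliberately simplified model that follows the slide rather than real descriptor-table formats: the Segment class and check_access function are made up for this example, and the table's numbers are read as hexadecimal, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    start: int
    length: int
    access: str      # "R" (read only) or "RW" (read and write)

    def contains(self, address):
        return self.start <= address < self.start + self.length


# Values from the table on this slide, read as hexadecimal (an assumption).
SEGMENTS = {
    "DS": Segment("data", 0x50000, 0x200, "RW"),
    "CS": Segment("code", 0x50200, 0x400, "R"),
    "SS": Segment("stack", 0x50600, 0x1000, "RW"),
}


def check_access(register, address, operation):
    """Return True if the operation ("R" or "W") is allowed through this segment."""
    segment = SEGMENTS[register]
    return segment.contains(address) and operation in segment.access


print(check_access("DS", 0x50202, "R"))   # False: outside the data segment
print(check_access("CS", 0x50235, "W"))   # False: code is read only
print(check_access("CS", 0x50235, "R"))   # True
```

A second program would have its own table entries, so in this model an address such as 50644 would fail the range check unless it fell inside one of that program's own segments.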

Slide 48: 2.3 IA-32 Memory Management: Real Mode
[Diagram (simplified): EIP = 0000025A. CS selects an entry in the Global Descriptor Table with base address 2140, limit 50 (50K), and access R. Memory is drawn starting at 00000h.]

Slide 49: 2.3 IA-32 Memory Management: Memory Usage: Real Mode
• 16 bit (DOS)
• The operating system sets SS and CS, but the program must set up DS.
• While real mode only runs one program at a time, some memory is reserved for the operating system, BIOS, video and so on.
• Suppose the program requests 200 bytes of memory but tries to use 300. What happens?

Slide 50: 2.3 IA-32 Memory Management: Memory Usage: Real mode
• 16 bit registers
• 16 bit offsets in program
• 20 bit memory address (??? How can this work?)
• Suppose that DS = 89AB. The data segment actually begins at 89AB0.
• Suppose the program refers to a data item with offset 105:

  DS            89AB_    (implied 0)
  offset      +  0105
  Memory loc    89BB5

• Likewise for SS, CS, ES

Slide 51: 2.3 IA-32 Memory Management: Real Mode
[Diagram: memory from 00000h to FFFFFh, with CS = 2140 and IP = 025A pointing at location 2165Ah]

  CS     2140_
  IP   +  025A
         2165A
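The "implied 0" rule amounts to shifting the segment value left by one hexadecimal digit (4 bits) and adding the offset. A minimal sketch, with physical_address as a made-up helper name; the final mask models the 8086's 20 address lines, where a sum past FFFFFh wraps around.

```python
def physical_address(segment, offset):
    """Real mode address: segment * 16 + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


print(f"{physical_address(0x89AB, 0x0105):05X}")   # 89BB5
print(f"{physical_address(0x2140, 0x025A):05X}")   # 2165A
```

Both results match the worked examples on the two real mode slides above.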

Slide 52: Your turn: 8086 memory locations
• Suppose: SS = 3524h, DS = 3B3Ah, CS = 224Eh
• The program says to jump to the instruction at offset 1CE4. What is the actual memory location used for the instruction?

Slide 53: 2.3 IA-32 Memory Management: Virtual 8086 Mode
• Like real mode, except the program is given 1 MByte of protected memory.
• Other programs may be running at the same time in their own memory.
• A virtual 8086 mode program can use memory illegally in its own space but not in space belonging to other programs.

Slide 54: 2.3 IA-32 Memory Management: Paging
• Main memory is fast but expensive. People typically cannot afford as much memory as they would like.
• Disk memory is slow but cheap.
• Solution: Let disk memory substitute for main memory.
• Sometimes referred to as paging or virtual memory.

Slide 55: 2.3 IA-32 Memory Management: Paging (cont.)
• Memory is divided into (4096 byte) pages.
• When the computer runs out of memory, it writes some little used pages to disk.
• Page fault: A program asks for a memory location stored in a page that is on disk. Then:
  – The computer starts working on another program.
  – A page that has not been used lately is written to disk.
  – The requested page is read into memory.
• Ideally this happens infrequently.
• Thrashing: In old systems, the system spends all its time moving pages back and forth, and very little, if anything, gets accomplished.
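Two pieces of this can be sketched in code: how an address maps to a 4096 byte page, and how page faults are counted when memory holds only a few pages. This is an illustration only. The function names, the sample address trace, and the choice of "least recently used" as the meaning of "not been used lately" are all assumptions, not a description of what any particular operating system does.

```python
from collections import OrderedDict

PAGE_SIZE = 4096


def split_address(address):
    """Return (page number, offset within the page)."""
    return address // PAGE_SIZE, address % PAGE_SIZE


def count_page_faults(addresses, n_frames):
    """Count faults when memory holds n_frames pages, evicting the least recently used."""
    resident = OrderedDict()       # page number -> True, oldest first
    faults = 0
    for address in addresses:
        page, _ = split_address(address)
        if page in resident:
            resident.move_to_end(page)          # page was used just now
        else:
            faults += 1                         # page must be read from disk
            if len(resident) >= n_frames:
                resident.popitem(last=False)    # write out the least recently used page
            resident[page] = True
    return faults


print(split_address(0x12345))                   # (18, 837)
trace = [0x0000, 0x1000, 0x2000, 0x0004, 0x3000, 0x1008]
print(count_page_faults(trace, n_frames=3))     # 5
print(count_page_faults(trace, n_frames=4))     # 4
```

With too few frames for the pages a program keeps cycling through, nearly every access becomes a fault, which is the thrashing behaviour described on the slide.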

Slide 56: 2.4 Components of an IA-32 Computer: Motherboard
• Sockets for CPU
• Sockets for external cache (if any)
• Sockets for main memory
• CMOS RAM (with battery)
• BIOS chip
• IDE controller chip for hard disks and CD-ROMs* (newer computers use SATA for hard drives)
• Simple graphics chip. Sound synthesizer*
• Parallel, serial, USB, video, keyboard, and mouse ports*. Network adapter*
• PCI and ISA slots for plug in cards. AGP slot for graphics card
* These may be in plug in cards.
IDE is also called ATA (Advanced Technology Attachment). SATA: Serial ATA

Slide 57: 2.4 Components of an IA-32 Computer: Motherboard (cont.)
• Chipset: A set of chips designed to operate with the CPU to process or control:
  – DMA
  – Interrupts
  – Timing
  – Bridge to the PCI bus and PCI to ISA bridge
  – Memory
  – Keyboard and mouse controller
• Video output:
  – Video memory (RAM or VRAM)
  – Video controller

Slide 58: 2.4 Components of an IA-32 Computer: Bus
• We said that Intel computers are single bus computers, but it is more complicated.
• PCI – Peripheral Component Interconnect bus (faster components)
• ISA – Industry Standard Architecture bus (slower components)
• Backside bus (connects CPU to cache)
• USB – Universal Serial Bus: external bus connecting up to 127 devices

Slide 59: 2.4 Components of an IA-32 Computer: Buses in Intel Computers
[Diagram labels: Cache, CPU, Memory, Backside bus, System or Frontside Bus, PCI Bus (faster I/O), ISA Bus (slower I/O)]
Sources: http://www.pcguide.com/ref/cpu/arch/ext.htm and page 55 of the 4th edition of the textbook

Slide 60: 2.4 Components of an IA-32 Computer: Memory – ROM
• ROM
• PROM
• EPROM
• Flash memory

Slide 61: 2.4 Components of an IA-32 Computer: Memory – RAM
• DRAM – Dynamic RAM
• SDRAM – Synchronous DRAM
• VRAM – Video RAM
• SRAM – Static RAM
• CMOS RAM

Slide 62: 2.4 Components of an IA-32 Computer: Memory – Speed and Cost
• Speed (slower to faster): ROM, DRAM, SRAM
• Cost (cheaper to more expensive): DRAM, SRAM

Slide 63: 2.4 Components of an IA-32 Computer: USB (Universal Serial Bus)
• Intelligent high speed connections between a computer and USB peripherals
• USB 2.0 speeds of up to 480 Mbits/sec
• Allows connecting simple devices, compound devices, and hubs
• Allows connecting devices to a running computer with automatic detection of the device
• See page 45

Slide 64: 2.4 Components of an IA-32 Computer: Other I/O ports and connections
• Parallel
• Serial
• FireWire
• IDE (Intelligent or Integrated Drive Electronics). Also known as ATA
• SATA

Slide 65: 2.5 Input-Output System: Device drivers
• Device drivers: Routines that contain the special code needed to control a particular device. Much like a BIOS routine.

Slide 66: 2.5 Input-Output System: I/O Operations
• Levels:
  – 0. Hardware level
  – 1. BIOS (Basic Input-Output System)
  – 2. Operating system
  – 3. High level language library
  – 4. High level application

Slide 67: 2.5 Input-Output System: I/O Operations (cont.)
• Example: A high level language writes a string to the console screen.
  – 4. Application program
  – 3. Language library
  – 2. Operating system
  – 1. BIOS
  – 0. Hardware
[Diagram: an assembly program shown alongside these levels]
Operating systems may restrict assembly programs without special permissions.

