Presentation on theme: "Microprocessors. Von Neumann architecture Data and instructions in single read/write memory Contents of memory addressable by location, independent of."— Presentation transcript:
ALU – Arithmetic Logic Unit Function units Floating-point unit (FPU) –Stack oriented Communication –Control and status buses –RISC has several –Input – registers, Output – via storage to registers. Registers connected via signal paths.
Control Unit – Function Fetch instruction, put it into IR, increment PC Decode and execute instruction Micro-orders Micro-instructions Micro-program
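The fetch, decode, execute cycle listed above can be sketched as a small loop. This is a minimal illustration only: the three-instruction machine (`LOAD`, `ADD`, `HALT`) and the single accumulator are hypothetical and do not correspond to any real instruction set.

```python
# Toy fetch-decode-execute loop. The instruction set here is hypothetical.

def run(memory):
    pc = 0   # program counter (PC)
    acc = 0  # single accumulator register
    while True:
        ir = memory[pc]        # fetch: copy the instruction into IR
        pc += 1                # increment PC
        opcode, operand = ir   # decode
        if opcode == "LOAD":   # execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")


program = [("LOAD", 2), ("ADD", 3), ("HALT", 0)]
result = run(program)
print(result)
```

With the sample program, the accumulator ends at 5.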
Control Units Microprogrammed –programmed Conventional –Hard-wired RISC vs CISC
Operations and IS Each instruction = 100, 1000, even 1000000 logic operations. 1 instruction triggers cascade of logical operations
CPU Hardwired design Instruction activates circuits –PLUS -> Provides for fast execution –MINUS -> no flexibility Changes in hardware of machine require changes in code which means changes in hardwiring
Microcode Developed by IBM Chip executes program – on nanoprocessor Plus -> easier to make complex processor Minus -> slower operation To compensate, microcode allows very complex instructions to be used; so fewer instructions are necessary
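One way to picture microcode is as a lookup table from each machine instruction to a stored sequence of micro-orders. The sketch below assumes a made-up microprogram store; the instruction names and micro-order names are hypothetical and chosen only for illustration.

```python
# Hypothetical microprogram store: each machine instruction maps to the
# micro-orders the control unit would issue for it.
MICROPROGRAM = {
    "ADD": ["read_operand_a", "read_operand_b", "alu_add", "write_result"],
    "INC": ["read_operand_a", "alu_add_one", "write_result"],
}


def expand(instructions):
    """Return the flat sequence of micro-orders for a list of instructions."""
    micro_orders = []
    for name in instructions:
        micro_orders.extend(MICROPROGRAM[name])
    return micro_orders


trace = expand(["INC", "ADD"])
print(len(trace), trace)
```

Changing what an instruction does then means editing a table entry rather than rewiring circuits, which is the flexibility the slide credits to microcode; the extra lookup step is also where the slower operation comes from.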
CPU-controlled I/O “Write A to Device N” Challenge to keep CPU utilization high –Multiprogrammed Operating Systems –Multi-ported Memory Systems –I/O processors DMA channels, peripheral processing units (PPUs)
Memory-mapped I/O Memory addresses reserved for interface devices Each interface has several port addresses (control ports, status ports, input ports, output ports) Requires no special I/O instructions CPU can’t distinguish I/O from normal address operation
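A toy address space makes the idea concrete: the same store and load operations reach either memory or a device, depending only on the address. The `Bus` class, the `IO_BASE` boundary, and the behaviour of the status port are all assumptions made for this sketch.

```python
class Bus:
    """Toy address space: addresses at or above IO_BASE belong to a device."""

    IO_BASE = 0xF0  # hypothetical start of the reserved I/O address range

    def __init__(self):
        self.ram = [0] * self.IO_BASE
        self.output_port = []  # stands in for a device's output port

    def store(self, address, value):
        if address >= self.IO_BASE:
            self.output_port.append(value)  # the device sees the write
        else:
            self.ram[address] = value

    def load(self, address):
        if address >= self.IO_BASE:
            # toy status port: reports how many values have been written
            return len(self.output_port)
        return self.ram[address]


bus = Bus()
bus.store(0x10, 42)  # ordinary memory write
bus.store(0xF0, 65)  # same operation, but it lands on the device
print(bus.load(0x10), bus.output_port, bus.load(0xF1))
```

The caller uses one `store` call for both cases, which mirrors the point that no special I/O instructions are needed.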
DMA I/O Hardware devices that directly control transfer of data No CPU intervention Interrupts CPU when finished Can control bus during transfer
DMA examples IBM – DMA Channels –Selector channels – multiple devices, one at a time –Multiplexor channels – multiple devices simultaneously Peripheral-processing units (PPUs) – Mainframes – Control Data Corp.
Closer Look – Pentium III and II P3 –L1 cache - operating at speed of CPU –L2 cache 2M (in XEON 2M - 2GB) –28.1 × 10^6 transistors P2 –7.5 × 10^6 transistors –16K L1 instruction cache, and 16K data cache
Closer look continued P3 – 2 ALUs, 2 FPUs –MMX unit
Clocked Logic Instructions are not carried out immediately as code signals reach the pins – there is a wait time Early processors did not execute 1 instruction per clock cycle; many instructions required as many as 100 cycles
Clocked Logic Using current RISC techniques, many instructions take less than 1 clock cycle (multiple ALUs, pipelined ALUs, SIMD, …) Clock multipliers allow CPU to run faster than system clock Basically, clock speed is not a good metric for comparing different processors; it is only a good indicator for identical processors running at different speeds.
Clocked Logic Example: Suppose that processor P1 requires an average of six clock cycles per instruction and the system clock runs at 400 MHz. Processor P2 requires an average of two clock cycles per instruction and the system clock runs at 200 MHz. For P1: 400 MHz / 6 cycles per instruction ≈ 67 million instructions per second. For P2: 200 MHz / 2 cycles per instruction = 100 million instructions per second. Thus (67 - 100)/67 ≈ -50%: P1 delivers only two thirds of P2's throughput (P2 is about 50% faster) even though P1's clock speed is twice that of P2.
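The arithmetic in this example can be checked in a few lines. The function name `mips` is chosen here for illustration; it simply divides clock rate in MHz by average cycles per instruction to get millions of instructions per second.

```python
def mips(clock_mhz, cycles_per_instruction):
    """Millions of instructions per second for a given clock and average CPI."""
    return clock_mhz / cycles_per_instruction


p1 = mips(400, 6)  # about 66.7
p2 = mips(200, 2)  # 100.0

# Difference measured against P1's rate, as in the slide
relative_to_p1 = (p1 - p2) / p1
# The same difference measured against P2's rate
relative_to_p2 = (p1 - p2) / p2

print(round(p1, 1), round(p2, 1))
print(round(relative_to_p1 * 100, 1), round(relative_to_p2 * 100, 1))
```

Note that the percentage depends on the baseline: the gap is about 50% of P1's rate but about 33% of P2's rate.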
Modern CPUs Need faster processing –Reduce number of steps microprocessor must take Make processor complex (so can combine steps) Make instructions simpler Operating on more than 1 instruction at a time (pipelining and superscalar)
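To see why working on more than one instruction at a time helps, compare cycle counts with and without pipelining. This is a rough sketch under idealised assumptions that are not in the slides: every stage takes exactly one cycle and there are no stalls, hazards, or branch penalties.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """Ideal pipeline: fill it once, then one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


before = unpipelined_cycles(100, 5)
after = pipelined_cycles(100, 5)
print(before, after, round(before / after, 2))
```

For 100 instructions on a 5-stage pipeline this gives 500 cycles against 104. Real processors fall short of the ideal figure because of the stalls this model ignores.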
Pentium II L2 cache – 512 K, not part of CPU (MCM) –Speed – ½ of CPU L1 cache doubled from the previous generation to deal with bus bottleneck Bus Interface Unit –Data is duplicated to L1 and L2 While fetch/decode is pulling instructions, Branch Target Buffer looks for already completed instructions. BTB also looks for branches and predicts their outcome with about 90% accuracy
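The slides do not say how the Pentium II arrives at its predictions, so the sketch below swaps in a textbook scheme, the 2-bit saturating counter, purely to show what predicting a branch means. The branch history used here is invented for illustration.

```python
def prediction_accuracy(outcomes, state=2):
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # move toward 3 on a taken branch, toward 0 on a not-taken branch
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


# Hypothetical loop branch: taken 9 times, then falls through, repeated 10 times
history = ([True] * 9 + [False]) * 10
accuracy = prediction_accuracy(history)
print(accuracy)
```

On this particular pattern the counter mispredicts only the loop exit, giving 0.9. That it matches the slide's figure is a property of the chosen history, not evidence about the real hardware.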