(6.1) Central Processing Unit Architecture
• Architecture overview
• Machine organization: von Neumann
• Speeding up CPU operations
  – multiple registers
  – pipelining
  – superscalar and VLIW
• CISC vs. RISC

(6.2) Computer Architecture
• Major components of a computer
  – Central Processing Unit (CPU)
  – memory
  – peripheral devices
• Architecture is concerned with
  – the internal structure of each component
  – interconnections
    » speed and width
  – relative speeds of components
• Want maximum execution speed
  – balance among components is often the critical issue

(6.3) Computer Architecture (continued)
• CPU
  – performs arithmetic and logical operations
  – synchronous operation
  – may consider the instruction set architecture
    » how the machine looks to a programmer
  – or the detailed hardware design

(6.4) Computer Architecture (continued)
• Memory
  – stores programs and data
  – organized as
    » bit
    » byte = 8 bits (smallest addressable location)
    » word = 4 bytes (typically; machine dependent)
  – instructions consist of operation codes and addresses
[Figure: instruction formats with an operation code followed by one, two, or three address fields]

(6.5) Computer Architecture (continued)
• Numeric data representations
  – integer (exact representation)
    » sign-magnitude
    » 2's complement: to negate a value, change each 0 to 1 and each 1 to 0, then add 1 (see the sketch below)
  – floating point (approximate representation)
    » scientific notation: 0.3481 × 10^6
    » inherently imprecise
    » IEEE Standard 754-1985
[Figure: integer layouts (sign bit + magnitude) and floating-point layout (sign bit, exponent, significand)]
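
A minimal sketch (not from the slides) of both representations in C: two's-complement negation by inverting the bits and adding 1, and unpacking the sign, exponent, and significand fields of an IEEE 754 single-precision value.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        /* Two's complement: negate by inverting every bit, then adding 1. */
        int32_t x = 6;
        int32_t neg = ~x + 1;               /* same value as -x */
        printf("negate %d -> %d\n", x, neg);

        /* IEEE 754 single precision: 1 sign bit, 8 exponent bits,
           23 significand bits. */
        float f = 348100.0f;                /* 0.3481 x 10^6 */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);     /* reinterpret the bit pattern */
        printf("sign=%u exp=%u significand=0x%06X\n",
               bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF);
        return 0;
    }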

(6.6) Simple Machine Organization
• Institute for Advanced Study machine (1947)
  – the "von Neumann machine"
    » ALU performs transfers between memory and I/O devices
    » note two instructions per memory word
[Figure: main memory, input-output equipment, arithmetic-logic unit, and program control unit; each 40-bit word holds two instructions, an 8-bit op code plus 12-bit address in bits 0–19 and again in bits 20–39]

(6.7) Simple Machine Organization (continued)
• ALU does arithmetic and logical comparisons
  – AC = accumulator; holds results
  – MQ = multiplier-quotient register; holds the second portion of long results
  – MBR = memory buffer register; holds data while an operation executes

(6.8) Simple Machine Organization (continued)
• Program control determines what the computer does, based on the instruction read from memory
  – MAR = memory address register; holds the address of the memory cell to be read
  – PC = program counter; holds the address of the next instruction to be read
  – IR = instruction register; holds the instruction being executed
  – IBR = instruction buffer register; holds the right half of the instruction word read from memory

(6.9) Simple Machine Organization (continued)
• Machine operates on a fetch-execute cycle (sketched in code below)
• Fetch
  – PC → MAR
  – read M(MAR) into MBR
  – copy the left and right instructions into IR and IBR
• Execute
  – address part of IR → MAR
  – read M(MAR) into MBR
  – execute the opcode
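
A toy fetch-execute loop in C using the PC/MAR/MBR/IR registers named above. The 4-bit opcode / 12-bit address encoding and the opcode set are illustrative assumptions, not the real IAS format (which packs two 20-bit instructions per 40-bit word, hence the IBR).

    #include <stdio.h>
    #include <stdint.h>

    /* Illustrative one-instruction-per-word encoding:
       high 4 bits = opcode, low 12 bits = address. */
    enum { OP_HALT, OP_LOAD, OP_ADD, OP_STORE };

    int main(void) {
        uint16_t mem[16] = {
            (OP_LOAD  << 12) | 10,        /* AC <- M(10)        */
            (OP_ADD   << 12) | 11,        /* AC <- AC + M(11)   */
            (OP_STORE << 12) | 12,        /* M(12) <- AC        */
            (OP_HALT  << 12) | 0
        };
        mem[10] = 5;  mem[11] = 7;

        uint16_t pc = 0, mar, mbr, ir, ac = 0;
        for (;;) {
            mar = pc++;                   /* fetch: PC -> MAR          */
            mbr = mem[mar];               /* read M(MAR) into MBR      */
            ir  = mbr;                    /* instruction into IR       */
            uint16_t op = ir >> 12, addr = (ir & 0x0FFF) & 15;
            if (op == OP_HALT) break;
            mar = addr;                   /* execute: addr of IR -> MAR */
            if (op == OP_LOAD)  { mbr = mem[mar]; ac = mbr;  }
            if (op == OP_ADD)   { mbr = mem[mar]; ac += mbr; }
            if (op == OP_STORE) { mbr = ac; mem[mar] = mbr;  }
        }
        printf("M(12) = %u\n", mem[12]);  /* 5 + 7 = 12 */
        return 0;
    }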

(6.10) Simple Machine Organization (continued)

(6.11) Architecture Families
• Before the mid-60s, every new machine had a different instruction set architecture
  – programs from the previous generation didn't run on the new machine
  – the cost of replacing software became too large
• IBM System/360 created the family concept
  – single instruction set architecture
  – wide range of price and performance with the same software
• Performance improvements based on different detailed implementations
  – memory path width (1 byte to 8 bytes)
  – faster, more complex CPU design
  – greater I/O throughput and overlap
• "Software compatibility" now a major issue
  – partially offset by high-level language (HLL) software

(6.12) Architecture Families

(6.13) Multiple Register Machines
• Initially, machines had only a few registers
  – 2 to 8 or 16 were common
  – registers were more expensive than memory
• Most instructions operated between memory locations
  – results had to start from and end up in memory
  – fewer (although more complex) instructions meant smaller programs and (supposedly) faster execution
    » fewer instructions and data to move between memory and ALU
• But registers are much faster than memory
  – about 30 times faster

(6.14) Multiple Register Machines (continued)
• Also, many operands are reused within a short time
  – reloading an operand each time it is needed wastes time (illustrated below)
• Depending on the mix of instructions and operand use, having many registers may lead to less memory traffic and faster execution
• Most modern machines use a multiple-register architecture
  – maximum around 512; commonly 32 integer and 32 floating-point registers
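
An illustrative C fragment of the reuse argument: 'volatile' forces mem_x to be reloaded from memory on every use, mimicking a machine with no spare registers, while the plain local r behaves like a value held in a register.

    #include <stdio.h>

    volatile int mem_x = 7;   /* compiler must re-read this on every use */

    int main(void) {
        int slow = mem_x * mem_x + mem_x;  /* three memory reads          */
        int r = mem_x;                     /* one memory read ...         */
        int fast = r * r + r;              /* ... then register reuse     */
        printf("%d %d\n", slow, fast);     /* same result, less traffic   */
        return 0;
    }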

(6.15) Pipelining
• One way to speed up the CPU is to increase the clock rate
  – there are limits on how fast a clock can run and still complete an instruction
• Another way is to execute more than one instruction at a time

(6.16) Pipelining
• Pipelining breaks instruction execution down into several stages
  – put registers between stages to "buffer" data and control
  – start executing one instruction
  – as the first instruction starts its second stage, start executing a second instruction, and so on
  – speedup equals the number of stages, as long as the pipe stays full

(6.17) Pipelining (continued)
• Consider an example with 6 stages
  – FI = fetch instruction
  – DI = decode instruction
  – CO = calculate location of operand
  – FO = fetch operand
  – EI = execute instruction
  – WO = write operand (store result)

(6.18) Pipelining Example
• Executes 9 instructions in 14 cycles, rather than the 54 needed for sequential execution
  – with k stages and n instructions, a full pipeline finishes in k + (n − 1) cycles: 6 + 8 = 14, versus k × n = 6 × 9 = 54 (see the sketch below)
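
A minimal sketch, assuming an ideal stall-free pipeline, that reproduces the counts above: instruction i enters FI at cycle i and leaves WO at cycle i + 5.

    #include <stdio.h>

    int main(void) {
        const int k = 6, n = 9;            /* pipeline stages, instructions */
        for (int i = 1; i <= n; i++)       /* instruction i enters FI at cycle i */
            printf("instr %d: cycles %d-%d\n", i, i, i + k - 1);
        printf("total: %d cycles pipelined vs %d sequential\n",
               k + n - 1, k * n);
        return 0;
    }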

(6.19) Pipelining (continued)
• Hazards to pipelining
  – conditional jump
    » e.g., instruction 3 branches to instruction 15
    » pipeline must be flushed and restarted
  – a later instruction needs an operand being calculated by an instruction still in the pipeline
    » pipeline stalls until the result is ready (see the sketch below)
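
A tiny illustration (hypothetical instruction records, not any real pipeline's logic) of detecting the second hazard: a read-after-write dependence that forces a stall.

    #include <stdio.h>
    #include <stdbool.h>

    /* Illustrative record: one destination and two source registers. */
    typedef struct { int dest, src1, src2; } Instr;

    /* The later instruction must stall if it reads a register the earlier
       one has not yet written back (read-after-write hazard). */
    bool needs_stall(Instr earlier, Instr later) {
        return later.src1 == earlier.dest || later.src2 == earlier.dest;
    }

    int main(void) {
        Instr add = { .dest = 1, .src1 = 2, .src2 = 3 };  /* r1 = r2 + r3 */
        Instr sub = { .dest = 4, .src1 = 1, .src2 = 5 };  /* r4 = r1 - r5 */
        printf(needs_stall(add, sub) ? "stall\n" : "no stall\n");
        return 0;
    }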

(6.20) Pipelining Problem Example
• Is this really a problem?

(6.21) Real-life Problem
• Not all instructions execute in one clock cycle
  – floating point takes longer than integer
  – fp divide takes longer than fp multiply, which takes longer than fp add
  – typical values (in cycles)
    » integer add/subtract: 1
    » memory reference: 1
    » fp add: 2 (make 2 stages)
    » fp (or integer) multiply: 6 (make 2 stages)
    » fp (or integer) divide: 15
• Break the floating-point unit into a sub-pipeline
  – execute up to 6 instructions at once

(6.22) Pipelining (continued)
• This is not simple to implement
  – note that all 6 instructions could finish at the same time!

(6.23) More Speedup
• Pipelined machines issue one instruction each clock cycle
  – how can the CPU be sped up even more?
• Issue more than one instruction per clock cycle

(6.24) Superscalar Architectures
• Superscalar machines issue a variable number of instructions each clock cycle, up to some maximum
  – instructions must satisfy some criterion of independence
    » a simple choice is a maximum of one fp and one integer instruction per clock (sketched below)
    » need a separate execution path for each possible simultaneously issued instruction
  – compiled code from a non-superscalar implementation of the same architecture runs unchanged, but more slowly
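
A sketch of the simple issue criterion above: pair two instructions in one clock only if one is integer and the other floating point. The data-dependence checks a real machine also needs are omitted.

    #include <stdio.h>
    #include <stdbool.h>

    typedef enum { INT_OP, FP_OP } Kind;

    /* Dual-issue rule from the slide: issue two instructions together
       only if one is integer and the other floating point. */
    bool can_dual_issue(Kind a, Kind b) {
        return a != b;
    }

    int main(void) {
        printf("int+fp : %s\n",
               can_dual_issue(INT_OP, FP_OP)  ? "issue both" : "issue one");
        printf("int+int: %s\n",
               can_dual_issue(INT_OP, INT_OP) ? "issue both" : "issue one");
        return 0;
    }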

(6.25) Superscalar Example
• Each instruction path may be pipelined
[Figure: superscalar timing diagram over clock cycles 0–8]

(6.26) Superscalar Problem
• Instruction-level parallelism
  – what if two successive instructions can't be executed in parallel?
    » data dependencies, or two instructions of the slow type
• Design the machine to increase opportunities for multiple execution

(6.27) VLIW Architectures
• Very Long Instruction Word (VLIW) architectures store several simple instructions in one long instruction word fetched from memory
  – the number and type are fixed (see the struct sketch below)
    » e.g., 2 memory-reference, 2 floating-point, and 1 integer instruction
  – need one functional unit for each possible instruction
    » 2 fp units, 1 integer unit, 2 MBRs
    » all run synchronized
  – each long instruction is stored in a single word
    » requires wider memory communication paths
    » many slots may be empty, meaning wasted code space
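
A sketch of that fixed five-slot long word as a C struct; the field names and the NOP-for-empty-slot convention are assumptions for illustration.

    #include <stdio.h>

    /* One very long instruction word with fixed slots, per the slide:
       2 memory-reference ops, 2 floating-point ops, 1 integer op.
       A slot holding OP_NOP is empty: wasted code space. */
    enum { OP_NOP = 0, OP_LOAD, OP_STORE, OP_FADD, OP_FMUL, OP_IADD };

    typedef struct {
        unsigned mem_op[2];   /* memory-reference slots */
        unsigned fp_op[2];    /* floating-point slots   */
        unsigned int_op;      /* integer slot           */
    } VLIWWord;

    int main(void) {
        /* Only two of the five slots are used; the rest are NOPs. */
        VLIWWord w = { .mem_op = { OP_LOAD, OP_NOP },
                       .fp_op  = { OP_FADD, OP_NOP },
                       .int_op = OP_NOP };
        int used = (w.mem_op[0] != OP_NOP) + (w.mem_op[1] != OP_NOP)
                 + (w.fp_op[0]  != OP_NOP) + (w.fp_op[1]  != OP_NOP)
                 + (w.int_op    != OP_NOP);
        printf("slots used: %d of 5\n", used);
        return 0;
    }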

(6.28) VLIW Example

(6.29) Instruction Level Parallelism
• Success of superscalar and VLIW machines depends on the number of nearby instructions that can be issued in parallel
  – no dependencies
  – no branches
• Compilers can help create parallelism
• Speculation techniques try to overcome the branch problem (sketched below)
  – assume the branch is taken
  – execute instructions, but don't let them store results until the outcome of the branch is known
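
A toy C rendering of that speculation idea: the work past the branch is computed into a holding variable and committed only once the branch outcome is known; on a misprediction it is simply discarded.

    #include <stdio.h>
    #include <stdbool.h>

    int main(void) {
        int r1 = 0;

        /* Speculate that the branch will be taken: compute the result of
           the post-branch instruction, but hold it instead of storing it. */
        int speculative = 6 * 7;

        bool branch_taken = true;   /* outcome resolved later in the pipe */
        if (branch_taken)
            r1 = speculative;       /* commit: speculation was correct    */
        /* else: discard 'speculative'; nothing was ever stored           */

        printf("r1 = %d\n", r1);
        return 0;
    }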

(6.30) CISC vs. RISC
• CISC = Complex Instruction Set Computer
• RISC = Reduced Instruction Set Computer

(6.31) CISC vs. RISC (continued)
• Historically, machines tend to add features over time
  – instruction opcodes
    » IBM 70X, 70X0 series went from 24 opcodes to 185 in 10 years
    » over the same time, performance increased 30 times
  – addressing modes
  – special-purpose registers
• Motivations are to
  – improve efficiency, since complex instructions can be implemented in hardware and execute faster
  – make life easier for compiler writers
  – support more complex higher-level languages

(6.32) CISC vs. RISC
• Examination of actual code indicated many of these features were not used
• RISC advocates proposed
  – a simple, limited instruction set
  – a large number of general-purpose registers
    » and mostly register-to-register operations
  – an optimized instruction pipeline
• Benefits should include
  – faster execution of commonly used instructions
  – faster design and implementation

(6.33) CISC vs. RISC
• Comparing some architectures

(6.34) CISC vs. RISC
• Which approach is right?
• Typically, RISC takes about 1/5 the design time
  – but CISC designs have adopted RISC techniques

