Part 2. Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc. All rights reserved. 0-13-148521-0 A five-level memory.


1 Part 2

2 A five-level memory hierarchy. Note cost vs. size.

3
1. All instructions are directly executed by hardware.
2. Maximize the rate at which instructions are issued.
3. Instructions should be easy to decode.
4. Only loads and stores should reference memory.
5. Provide many registers.

4 1. All instructions are directly executed by hardware.
- Eliminate the microcode interpreter.

5 2. Maximize the rate at which instructions are issued.
- If you issue 500 million instructions per second, you have a 500-MIPS machine.
- Parallelism

6 3. Instructions should be easy to decode.
- Made possible by regular, fixed-length instructions with a small number of fields.
- Fewer instructions are better.
- Fewer instruction formats are better.

7 4. Only loads and stores should reference memory.
- Memory access takes a long time.
- Most instructions should use registers.
- Separate ops for load and store.
  - These can be done in parallel with other instructions.

8 5. Provide many registers.
- At least 32!
- It is time-consuming to save registers temporarily and reload them later.

9 Ways to increase speed:
a. Increase the clock speed.
b. Parallelism. Types:
   1. processor/core level
   2. instruction level

10
- Fetching instructions from memory is slow.
- So use a prefetch buffer: a set of registers (memory) containing instructions to be executed.
- Fetch and execution can now be done in parallel!

11
- A five-stage pipeline.
- The state of each stage as a function of time. Nine clock cycles are illustrated.
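The figure itself did not survive in this transcript, so here is a minimal sketch of how such a timing table can be generated. It assumes an ideal pipeline with no stalls, in which instruction i enters stage s during cycle i + s - 1. The function names `pipeline_table` and `format_table` are made up for this illustration and are not from the slides.

```python
def pipeline_table(num_stages=5, num_cycles=9):
    """Return one row per stage: (stage name, instruction number per cycle).

    Assumes an ideal pipeline (no stalls): instruction i occupies stage s
    during clock cycle i + s - 1. None means the stage is still empty.
    """
    rows = []
    for stage in range(1, num_stages + 1):
        row = []
        for cycle in range(1, num_cycles + 1):
            instr = cycle - stage + 1
            row.append(instr if instr >= 1 else None)
        rows.append((f"S{stage}", row))
    return rows


def format_table(rows):
    """Render the rows as text, one line per stage, '.' for an empty slot."""
    lines = []
    for name, row in rows:
        cells = " ".join("." if i is None else str(i) for i in row)
        lines.append(f"{name}: {cells}")
    return "\n".join(lines)


print(format_table(pipeline_table()))
```

Under these assumptions, the first instruction leaves stage S5 at the end of cycle 5, and one further instruction completes in every cycle after that.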

12
- Latency = time to execute one instruction from start to finish.
- Bandwidth = number of instructions completed per second, typically stated in MIPS (millions of instructions per second).
- Cycle time = time to move through one stage of the pipeline = one clock cycle = 1 / clock rate.

16 Problem: Let the cycle time = 3 nsec/stage, and let the execution of each instruction require 6 stages (steps).
a. What is the bandwidth in MIPS for a machine without any pipeline (i.e., without any instruction-level parallelism)?
   6 stages/inst x 3 x 10^-9 sec/stage = 18 x 10^-9 sec/inst
   1 inst / (18 x 10^-9 sec) = 55.6 x 10^6 inst/sec, or about 56 MIPS
b. What is the bandwidth in MIPS for a machine with a pipeline?
   Once the pipeline is full, one instruction completes every cycle: 3 x 10^-9 sec/inst
   1 inst / (3 x 10^-9 sec) = 333 MIPS
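The arithmetic in this problem can be checked with a short sketch. The helper names `latency_ns` and `bandwidth_mips` are illustrative, and the pipelined case assumes an ideal, full pipeline that completes one instruction per cycle.

```python
def latency_ns(cycle_time_ns, stages):
    """Time for one instruction to pass through every stage."""
    return cycle_time_ns * stages


def bandwidth_mips(cycle_time_ns, stages, pipelined):
    """Instructions completed per second, in millions.

    Without a pipeline, a new instruction starts only after the previous
    one has finished all stages. With an ideal, full pipeline (assumed
    here), one instruction completes every cycle.
    """
    if pipelined:
        ns_per_instruction = cycle_time_ns
    else:
        ns_per_instruction = latency_ns(cycle_time_ns, stages)
    instructions_per_second = 1e9 / ns_per_instruction
    return instructions_per_second / 1e6


print(f"{bandwidth_mips(3, 6, pipelined=False):.1f} MIPS")  # 55.6 MIPS
print(f"{bandwidth_mips(3, 6, pipelined=True):.1f} MIPS")   # 333.3 MIPS
```

Note that the latency is 18 nsec in both cases; pipelining raises the bandwidth, not the speed of any single instruction.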

17 Dual five-stage pipelines with a common instruction fetch unit, which fetches pairs of instructions.

19 Note: Since two instructions can be executed at the same time (stage S4), they must not conflict over resource usage (e.g., a register), and neither may depend on the result of the other. How can we ensure this? Either (1) the hardware checks for conflicts at run time, or (2) the compiler guarantees that paired instructions are independent.
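As a rough illustration of the kind of check that either the hardware or the compiler has to perform, the sketch below tests whether two adjacent instructions could be issued as a pair. The `Instr` record and the `can_dual_issue` function are hypothetical names, and the rule is a deliberately conservative simplification that looks only at register names; real processors also check functional-unit and memory conflicts.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Simplified instruction: one optional destination register, some sources."""
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def can_dual_issue(first: Instr, second: Instr) -> bool:
    """Return True if `second` may issue in the same cycle as `first`.

    `first` precedes `second` in program order. The pair is rejected when:
      - second reads a register that first writes (depends on its result),
      - both write the same register (conflict over a resource),
      - second writes a register that first reads (treated conservatively
        as a conflict in this sketch).
    """
    reads_result = first.dest is not None and first.dest in second.srcs
    same_dest = first.dest is not None and first.dest == second.dest
    overwrites_source = second.dest is not None and second.dest in first.srcs
    return not (reads_result or same_dest or overwrites_source)


add = Instr("add", "r1", ("r2", "r3"))
sub_dependent = Instr("sub", "r4", ("r1", "r5"))    # needs r1 from add
sub_independent = Instr("sub", "r4", ("r5", "r6"))  # shares no registers

print(can_dual_issue(add, sub_dependent))    # False
print(can_dual_issue(add, sub_independent))  # True
```

A compiler could apply the same test while ordering instructions so that adjacent pairs are independent, whereas hardware would apply it to each fetched pair and issue only one instruction when the test fails.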

20
- 386: no pipeline
- 486: one pipeline
- First-generation Pentium: two 5-stage pipelines:
  1. u pipeline: can execute any instruction
  2. v pipeline: limited; only integer instructions or FXCH
- P4: 20 stages
- “The later "Prescott" and "Cedar Mill" Pentium 4 cores (and their Pentium D derivatives) had a 31-stage pipeline, the longest in mainstream consumer computing.” (http://en.wikipedia.org/wiki/Instruction_pipeline)
- Nehalem (16 pipeline stages), Enhanced Core, and Sandy Bridge microarchitecture (next few slides; see http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf)

21

22

23

24 A superscalar processor with five functional units.
- Stage S3 issues an instruction every clock cycle.
- Stage S4 may require more than one clock cycle.

