
Presentation on theme: "CS520 S99 Introduction, C. Edward Chow: Why study computer architecture?" (presentation transcript)


2. Why study computer architecture? (1/16/99, CS520 S99 Introduction, C. Edward Chow, page 1)
- To learn the principles for designing processors and systems
- To learn the system configuration trade-offs:
  - what size of caches/memory is enough
  - what kind of buses to connect system components
  - what size (speed) of disks to use
- To choose a computer for a set of applications in a project
- To interpret the benchmark figures given by salespersons
- To decide which processor chips to use in a system
- To design the system software (compiler, OS) for a new processor
- To lead a processor design team
- To learn several machines' assembly languages

3. The Basic Structure of a Computer

4. Control and Data Flow in the Processor
A processor is made up of:
- the data operator D (the Arithmetic and Logic Unit, ALU), which consumes and combines information into a new meaning
- the control K, which evokes the operations of the other components

5. Control is often distributed

6. Instruction Execution at Register Transfer Level (RTL)
Consider the detailed execution of the instruction "move &100, %d0" (move the constant 100 into register d0). Assume the instruction was loaded at memory location 1000: the opcode of the move instruction and the register address d0 are encoded in bytes 1000 and 1001, and the constant 100 in bytes 1002 and 1003.

7. RTL Instruction Execution
Mpc is set to 1000, pointing at the instruction in memory.
Step 1: Mmar = Mpc; // put the PC into MAR; prepare to fetch the instruction

8. Update Program Counter
Step 2: Mpc = Mpc + 4; // update the program counter: move the Mpc value to D, let D perform the +4, and move the result back to Mpc

9. Instruction Fetch
Step 3: Mir = Mp[Mmar]; // fetch the instruction: send the Mmar value to Mp; Mp retrieves "move|d0 100" and sends it back to Mir
Steps 3 and 2 can be done in parallel.

10. Instruction Decoding
Step 4: decode the instruction in Mir ("move|d0 100").

11. RTL Instruction Execution
Step 5: Mgeneral[0] = Mir<16:31>; // execute the move of the constant into the general register named d0
The subscript <16:31> denotes bits 16 through 31 of the instruction, which contain the constant 100.

12. Computer Architecture
The term "computer architecture" was coined by IBM in 1964 for use with the IBM 360. Amdahl, Blaauw, and Brooks [1964] used the term to refer to the programmer-visible portion of the instruction set. They believed that a family of machines of the same architecture should be able to run the same software.
Benefits: with a precisely defined architecture, we can have many compatible implementations, and a program written in the same instruction set can run on all of them.

13. Architecture and Implementation
- Single architecture, multiple implementations: a computer family
- Multiple architectures, single implementation: a microcode emulator

14. Computer Architecture Topics
- Instruction Set Architecture
- Pipelining and instruction-level parallelism: pipelining, hazard resolution, superscalar, reordering, prediction, speculation, vector, DSP
- Memory hierarchy: L1 cache, L2 cache, DRAM; addressing, protection, exception handling; interleaving; coherence, bandwidth, latency
- Input/output and storage: disks, WORM, tape; RAID; bus protocols; emerging technologies; VLSI
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.

15. Computer Architecture Topics (continued)
- Networks and interconnections: topologies, routing, bandwidth, latency, reliability; network interfaces
- Multiprocessors (processor-memory-switch organization, with P-M pairs on an interconnection network): shared memory, message passing, data parallelism
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.

16. CS 520 Course Focus
Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st century.
Computer architecture (instruction set design, organization, hardware) sits at the intersection of technology, programming languages, operating systems, history, and applications, and spans interface design (ISA), measurement and evaluation, and parallelism.
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.

17. Functional Requirements Faced by a Computer Designer
Applications:
- General purpose: balanced performance for a range of tasks
- Scientific: high-performance floating point
- Commercial: support for COBOL (decimal arithmetic), database/transaction processing
Level of software compatibility:
- Object code/binary level: no software porting, but more hardware design cost
- Programming language level: avoids the old architecture's burden, but requires software porting

18. Functional Requirements Faced by a Computer Designer (continued)
Operating system requirements:
- Size of address space
- Memory management/protection (e.g., garbage collection vs. real-time scheduling)
- Interrupts/traps
Standards:
- Floating point (IEEE 754)
- I/O bus
- Operating systems
- Networks
- Programming languages

19. 1988 Computer Food Chain
PC, workstation, minicomputer, mainframe, minisupercomputer, supercomputer, massively parallel processors.
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.

20. 1998 Computer Food Chain
PC, workstation, server, mainframe, supercomputer, minisupercomputer, massively parallel processors, minicomputer. Now who is eating whom?
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.

21. Why Such Change in 10 Years?
Performance:
- Technology advances: CMOS VLSI dominates older technologies (TTL, ECL) in cost AND performance
- Computer architecture advances improve the low end: RISC, superscalar, RAID, ...
Price: lower costs due to:
- Simpler development: CMOS VLSI means smaller systems and fewer components
- Higher volumes: CMOS VLSI spreads the same development cost over 10,000,000 units instead of 10,000
- Lower margins by class of computer, due to fewer services
Function:
- Rise of networking/local interconnection technology
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.

22. Technology Trends: Microprocessor Capacity
CMOS improvements (Moore's Law): die size grows 2x every 3 years; line width halves every 7 years.
Transistor counts: Alpha 21264: 15 million; Alpha 21164: 9.3 million; PowerPC 620: 6.9 million; Pentium Pro: 5.5 million; SPARC Ultra: 5.2 million.
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.

23. Memory Capacity (Single-Chip DRAM)
Year   Size (Mb)   Cycle time
1980   0.0625      250 ns
1983   0.25        220 ns
1986   1           190 ns
1989   4           165 ns
1992   16          145 ns
1996   64          120 ns
2000   256         100 ns
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.

24. Technology Trends (Summary)
        Capacity        Speed (latency)
Logic   2x in 3 years   2x in 3 years
DRAM    4x in 3 years   2x in 10 years
Disk    4x in 3 years   2x in 10 years
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.

25. Processor Performance Trends
[Chart: relative performance (0.1 to 1000, log scale) vs. year (1965-2000) for microprocessors, minicomputers, mainframes, and supercomputers.]
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.

26. Processor Performance
[Chart: performance grew roughly 1.35x per year before the mid-1980s and roughly 1.54x per year since.]
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.

27. Performance Trends (Summary)
Workstation performance (measured in SPECmarks) improves roughly 50% per year (2x every 18 months). Improvement in cost-performance is estimated at 70% per year.
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.


31. Computer Engineering Methodology
An iterative loop: track technology trends; evaluate existing systems for bottlenecks (using benchmarks); simulate new designs and organizations (using workloads); implement the next-generation system (managing implementation complexity); repeat.
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.

32. Measurement and Evaluation
Architecture is an iterative process: searching the space of possible designs, at all levels of computer systems. Creativity produces good, mediocre, and bad ideas; cost/performance analysis sorts them out.
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.

33. Measurement Tools
- Benchmarks, traces, mixes
- Hardware: cost, delay, area, power estimation
- Simulation (at many levels): ISA, RT, gate, circuit
- Queueing theory
- Rules of thumb
- Fundamental "laws"/principles

34. Metrics of Computer Architecture
Space is measured in bits of representation; time is measured in bit traffic (memory bandwidth). Many old frequency and benchmark studies focused on:
- dynamic opcode frequencies (a memory-size concern)
- exponent differences of floating-point operands (precision)
- lengths of decimal numbers in business files (memory size)
Trend: space is not much of a concern; speed/time is everything. Here we focus on two performance metrics:
- Response time: the time between the start and finish of an event (execution time, latency)
- Throughput: the total amount of work done in a given time (bandwidth: number of bits or bytes moved per second)

35. Metrics of Performance at Different Levels
- Application: answers per month, operations per second
- Programming language / compiler
- ISA: (millions of) instructions per second (MIPS); (millions of) floating-point operations per second (MFLOP/s)
- Datapath, control, function units: megabytes per second, cycles per second (clock rate)
- Transistors, wires, pins
Adapted from Prof. Patterson's CS252 S98 viewgraph. Copyright 1998 UCB.

36. Quantitative Principles
"Improve" means increase performance, i.e. decrease execution time. "X is n% faster than Y" means ExecutionTime(Y) / ExecutionTime(X) = 1 + n/100.
Quantitative principles:
- Make the common case fast (Amdahl's Law)
- Locality of reference: 90% of execution time is spent in 10% of the code

37. Amdahl's Law
The law of diminishing returns:

Speedup_overall = Time_old / Time_new
                = 1 / ((1 - FractionInEnhancedMode) + FractionInEnhancedMode / SpeedupOfEnhancedMode)

Slide example: with FractionInEnhancedMode = 0.5 (based on the old system) and SpeedupOfEnhancedMode = 2, the enhanced half of a 100-unit Time_old shrinks from 50 to 25, so Time_new = 50 + 25 = 75.

38. Amdahl's Law Results
FractionInEnhancedMode   OverallSpeedup (SpeedupOfEnhancedMode = 2)   OverallSpeedup (SpeedupOfEnhancedMode = infinity)
0.1                      1.05                                         1.11
0.3                      1.18                                         1.43
0.5                      1.33                                         2
0.7                      1.54                                         3.33
0.9                      1.82                                         10
0.99                     1.98                                         100
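The table above follows directly from the formula on the previous slide. A minimal Python sketch of that arithmetic (the function name is mine, not the lecture's):

```python
def overall_speedup(fraction_enhanced, speedup_enhanced):
    """Amdahl's Law: Time_new = Time_old * ((1 - f) + f / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Reproduce the table rows: speedup with S = 2, and the limit 1/(1 - f) as S -> infinity.
for f in (0.1, 0.3, 0.5, 0.7, 0.9, 0.99):
    print(f, round(overall_speedup(f, 2), 2), round(1.0 / (1.0 - f), 2))
```

The right-hand column shows the diminishing-returns ceiling: no matter how fast the enhanced mode gets, the unenhanced fraction bounds the overall speedup.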

39. Applying Amdahl's Law: Example 1
Assume memory access accounts for 90% of the execution time. What is the speedup from replacing a 100 ns memory with a 10 ns memory? How much faster is the new system?
Answer: FractionInEnhancedMode = 90% = 0.9; SpeedupOfEnhancedMode = 100 ns / 10 ns = 10. Overall speedup = 1 / (0.1 + 0.9/10) = 1/0.19 = 5.26, so the new system is 426% faster than the old one. Is it worthwhile if the high-speed memory costs 10 times more?

40. Applying Amdahl's Law: Example 2
Assume 40% of the time is spent on CPU tasks and the rest on I/O, and that we improve the CPU while the I/O speed is unchanged.
a) How much faster must the new CPU be for an overall speedup of 1.5?
b) Is an overall speedup of 2 possible? Why?
Solution:
a) 1.5 = 1 / (0.6 + 0.4/x) gives x = 6, i.e. the CPU must be 500% faster.
b) The maximum overall speedup, even with an infinitely fast CPU, is 1/0.6 = 1.67. Therefore an overall speedup of 2 is not possible.

41. Applying Amdahl's Law: Example 3
Example: recent research on the bottleneck of a 10 Mbps Ethernet network system showed that only 10% of the execution time of a distributed application was spent transmitting messages; 90% of the time was application/protocol software execution on the host computers. If we replace Ethernet with 100 Mbps FDDI, 900% faster than Ethernet, what will be the speedup? What if we instead use 900% faster hosts?
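Example 3 is the same Amdahl arithmetic with the two fractions swapped; a quick sketch (function name mine) shows why speeding up the network barely helps here:

```python
def overall_speedup(f, s):
    # Amdahl's Law: f = fraction of time improved, s = speedup of that fraction.
    return 1.0 / ((1.0 - f) + f / s)

# 900% faster = 10x. FDDI only improves the 10% spent transmitting:
print(round(overall_speedup(0.10, 10), 3))
# 10x faster hosts improve the 90% spent in application/protocol software:
print(round(overall_speedup(0.90, 10), 3))
```

Speeding up the 10% network fraction yields only about a 1.10x overall speedup, while speeding up the 90% host fraction yields about 5.26x: the bottleneck is the hosts, not the wire.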

42. Execution Time
The first performance metric, and the best one. Measure the time it takes to execute the intended application(s) or a typical workload. The Unix time command can measure an application:
vlsia[93]: time ts9
217.1u 27.2s 8:16 49% 0+27552k 6+3io 26pf+0w
Here is an example of how the OS and I/O impact the execution time. For program 1:
Elapsed time = sum(t1..t11) - t6 - t8
System CPU time = t1 + t3 + t5 + t9 + t11
CPU time = t1 + t3 + t4 + t5 + t9 + t10
User CPU time = t4 + t10

43. CPU Time
CPUtime = IC * CPI * ClockCycleTime, where CPI = clock cycles per instruction = (sum over i of CPI_i * I_i) / IC; I_i is the frequency of instruction i in the program; IC is the instruction count; ClockCycleTime = 1/ClockRate.
The CPI figure gives insight into different styles of instruction sets and implementations. The three factors are interdependent:
- Clock rate: hardware technology and organization
- CPI: organization and instruction set architecture
- Instruction count: instruction set architecture and compiler technology
We cannot measure the performance of a computer by any single factor alone.

44. Evaluating Instruction Set Design
Example (page 39): 1/4 of the ALU and load instructions are replaced by a new register-memory (r->m) instruction. Assume the clock cycle time is unchanged. Is this a good idea?

Instruction   Frequency before   Clock cycles   Frequency after   Clock cycles
ALU ops       43%                1              36.1%             1
Loads         21%                2              11.4%             2
Stores        12%                2              13.5%             2
Branches      24%                2              26.9%             3
New r->m      -                  -              12.1%             2

45. Evaluating Instruction Set Design (continued)
CPI_old = 0.43*1 + 0.21*2 + 0.12*2 + 0.24*2 = 1.57
CPUtime_old = InstructionCount_old * 1.57 * ClockCycleTime_old
CPI_new = 0.361*1 + 0.114*2 + 0.135*2 + 0.269*3 + 0.121*2 = 1.908
CPUtime_new = (0.893 * InstructionCount_old) * 1.908 * ClockCycleTime_old
            = 1.703 * InstructionCount_old * ClockCycleTime_old
With these assumptions, it is a bad idea to add register-memory instructions.
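The trade-off arithmetic above is easy to check mechanically. A small sketch (dictionary names are mine; frequencies are the slide's, with the "after" column already renormalized to the new instruction count):

```python
# (frequency, clock cycles) per instruction class, before and after the change.
old_mix = {"alu": (0.43, 1), "load": (0.21, 2), "store": (0.12, 2), "branch": (0.24, 2)}
new_mix = {"alu": (0.361, 1), "load": (0.114, 2), "store": (0.135, 2),
           "branch": (0.269, 3), "reg_mem": (0.121, 2)}

cpi_old = sum(f * c for f, c in old_mix.values())
cpi_new = sum(f * c for f, c in new_mix.values())
ic_ratio = 0.893  # new instruction count relative to the old one

# Relative CPU time (in units of IC_old * ClockCycleTime_old):
print(round(cpi_old, 3), round(cpi_new, 3), round(ic_ratio * cpi_new, 3))
```

Fewer instructions (0.893x) do not compensate for the higher CPI (1.908 vs. 1.57): relative CPU time rises from 1.57 to about 1.70, so the change loses.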

46. Estimate CPU Time by (sum_i CPI_i * IC_i) * ClockCycleTime
Program: f = (a-b)/(c-d*e), compiled for a MIPS R2000 at 25 MHz.
Instructions (op dst, src1, src2):
lw   $14, 20($sp)
lw   $15, 16($sp)
subu $24, $14, $15
lw   $25, 8($sp)
lw   $8, 4($sp)
mul  $9, $25, $8
lw   $10, 12($sp)
subu $11, $10, $9
div  $12, $24, $11
sw   $12, 0($sp)
IC = InstructionCount = 10. CPI = clock cycles per instruction; CPI_i = clock cycles of instruction type i; I_i = number of instructions of type i in the program. ClockCycleTime = 1/ClockRate = 1/(25*10^6) = 40*10^-9 s = 40 ns. CPI_i can be obtained from the processor handbook; here we assume no cache misses.

47. Estimate CPU Time by ClockCycleTime * (sum_i CPI_i * IC_i)
i   Instruction type   IC_i   CPI_i   CPI_i * IC_i
1   lw                 5      2       10
2   subu               2      1       2
3   mul                1      1       1
4   div                1      1       1
5   sw                 1      2       2
                               Total: 16
CPU time = 16 * 40 ns = 640 ns
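The per-type summation in the table can be sketched directly (the counts and CPI values are the slide's; the dictionary layout is mine):

```python
# (instruction count, cycles per instruction) for each type in the program.
counts_and_cpis = {"lw": (5, 2), "subu": (2, 1), "mul": (1, 1), "div": (1, 1), "sw": (1, 2)}

clock_cycle = 1 / 25e6                     # 25 MHz clock -> 40 ns cycle
cycles = sum(ic * cpi for ic, cpi in counts_and_cpis.values())
cpu_time_ns = cycles * clock_cycle * 1e9
print(cycles, round(cpu_time_ns))          # 16 640
```

This matches the table: 16 total cycles at 40 ns each gives 640 ns, assuming no cache misses.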

48. Other Performance Measures
The only reliable measure of performance is the execution time of real programs. Other attempts, such as MIPS, have problems: MIPS depends on the instruction set, is hard to compare across machines, and varies between programs on the same computer. Example 1: the impact of using floating-point hardware on MIPS. Example 2: the impact of an optimizing compiler on MIPS.
What affects performance? The input, the version of the program, the compiler (and its optimization level), the OS, the CPU, and the machine configuration: the amount of cache, main memory, and disk, and the speed of the cache, main memory, disks, and bus.

49. Myth of MIPS
Example: the effect of an optimizing compiler on the MIPS number (page 45). A machine has a 500 MHz clock rate and the clock cycles per instruction shown below. For one program, the instruction counts before and after optimization are:

Instruction type   IC before optimization   CPI_i   IC after optimization
ALU ops            86                       1       43
Loads              42                       2       42
Stores             24                       2       24
Branches           48                       2       48

CPI_unoptimized = 86/200*1 + 42/200*2 + 24/200*2 + 48/200*2 = 1.57
MIPS_unoptimized = 500*10^6 / (1.57*10^6) = 318.5
CPI_optimized = 43/157*1 + 42/157*2 + 24/157*2 + 48/157*2 = 1.73
MIPS_optimized = 500*10^6 / (1.73*10^6) = 289.0
CPUtime_unoptimized = 200 * 1.57 * (2*10^-9) = 6.28*10^-7 s
CPUtime_optimized = 157 * 1.73 * (2*10^-9) = 5.43*10^-7 s
The optimized program has a lower MIPS rating yet runs faster: MIPS is a myth.
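The paradox above can be reproduced with a few lines (structure and names mine; counts and CPIs are the slide's; small differences from the slide's figures come from rounding CPI to two digits there):

```python
clock_rate = 500e6  # 500 MHz -> 2 ns cycle
mixes = {
    "unoptimized": {"alu": (86, 1), "load": (42, 2), "store": (24, 2), "branch": (48, 2)},
    "optimized":   {"alu": (43, 1), "load": (42, 2), "store": (24, 2), "branch": (48, 2)},
}

results = {}
for name, mix in mixes.items():
    ic = sum(count for count, _ in mix.values())             # instruction count
    cpi = sum(count * cyc for count, cyc in mix.values()) / ic
    mips = clock_rate / (cpi * 1e6)                          # native MIPS rating
    cpu_time = ic * cpi / clock_rate                         # seconds
    results[name] = (ic, cpi, mips, cpu_time)
    print(name, ic, round(cpi, 2), round(mips, 1), cpu_time)
```

The optimized version has a higher CPI (it removed cheap 1-cycle ALU ops), hence a lower MIPS rating, yet its CPU time is smaller: which is exactly why MIPS is not a performance metric.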

50. MFLOPS
For scientific computing, MFLOPS is used as a metric: MFLOPS = number of floating-point operations / (execution time * 10^6). It emphasizes operations instead of instructions. Unfortunately, the set of floating-point operations is not consistent across machines, and the rating changes with the mix of integer-floating and floating-floating instructions. The solution is to use a canonical number of floating-point operations per operation type, e.g. 1 for add, sub, compare, and mul; 4 for fdiv and fsqrt; 8 for arctan, sin, and exp.
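A sketch of the canonical-weight normalization described above (the weight table follows the slide; the function name, operation names, and the example operation counts are illustrative assumptions):

```python
# Canonical FP-operation weights from the slide: cheap ops count as 1,
# divide/sqrt as 4, transcendental functions as 8.
CANONICAL_WEIGHT = {"add": 1, "sub": 1, "compare": 1, "mul": 1,
                    "fdiv": 4, "fsqrt": 4, "arctan": 8, "sin": 8, "exp": 8}

def normalized_mflops(op_counts, seconds):
    """Normalized MFLOPS = weighted FP operations / (time * 10^6)."""
    flops = sum(CANONICAL_WEIGHT[op] * n for op, n in op_counts.items())
    return flops / seconds / 1e6

# Hypothetical program: 5M adds and 1M divides in 0.5 s.
print(normalized_mflops({"add": 5_000_000, "fdiv": 1_000_000}, 0.5))  # 18.0
```

Weighting makes ratings comparable across machines whose hardware implements different subsets of FP operations.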

51. Programs to Evaluate Performance
- Real programs: the set of programs to be run forms the workload
- Kernels: key pieces of real programs that isolate features of a machine, e.g. Livermore Loops (weighted ops), Linpack
- Toy benchmarks: 10 to 100 lines of code, e.g. quicksort, Sieve, Puzzle
- Synthetic benchmarks: artificially created to match an average execution profile, e.g. Whetstone, Dhrystone
- SPEC (System Performance Evaluation Cooperative) benchmarks: 89, 92, 95
- Perfect Club benchmarks for parallel computations

52. SPEC: System Performance Evaluation Cooperative Benchmarks
- First round, 1989: 10 programs yielding a single number ("SPECmarks")
- Second round, 1992: SPECint92 (6 integer programs) and SPECfp92 (14 floating-point programs). Compiler flags were unlimited; e.g., March 1993 flags for a DEC 4000 Model 610:
  spice: unix.c:/def=(sysv,has_bcopy,"bcopy(a,b,c)=memcpy(b,a,c)"
  wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
  nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
- Third round, 1995: a new set of programs, SPECint95 (8 integer programs) and SPECfp95 (10 floating-point programs); "benchmarks useful for 3 years"; a single flag setting for all programs: SPECint_base95, SPECfp_base95

53. Comparison of Machine Performance
Single program: execution time.
Collection of n programs:
1. Total execution time
2. Normalized to a reference machine: compute the time ratio of the ith program, TimeRatio_i = Time_i / Time_i(ReferenceMachine), and summarize with:
   - arithmetic mean = (1/n) * sum_i TimeRatio_i
   - geometric mean = (product_i TimeRatio_i)^(1/n)
   - harmonic mean = n / sum_i (1/TimeRatio_i)
The geometric mean is consistent regardless of the reference machine; the harmonic mean decreases the impact of outliers.
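The three summary means can be sketched as one-liners (function names mine; the sample ratios are the per-program MFLOPS figures for computer A from the next slide):

```python
from math import prod

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # nth root of the product; independent of the reference machine for ratios
    return prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    # reciprocal of the mean reciprocal; damps the impact of outliers
    return len(xs) / sum(1 / x for x in xs)

ratios = [10.0, 0.5]
print(arithmetic_mean(ratios), geometric_mean(ratios), harmonic_mean(ratios))
```

Note how differently the means treat the outlier 10.0: the arithmetic mean (5.25) is dominated by it, the geometric mean (about 2.24) less so, and the harmonic mean (about 0.95) least of all.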

54. Summarize Performance Results
Example: execution of two programs on three machines. Assume program 1 has 10M floating-point operations and program 2 has 50M.

                             Computer A       Computer B     Computer C
Program 1 (sec)              1                10             20
Program 2 (sec)              100              50             20
Total time (sec)             101              60             40
Native MFLOPS on program 1   10/1 = 10        10/10 = 1      10/20 = 0.5
Native MFLOPS on program 2   50/100 = 0.5     50/50 = 1      50/20 = 2.5
Arithmetic mean (MFLOPS)     (10+0.5)/2=5.25  (1+1)/2 = 1    (0.5+2.5)/2 = 3
Geometric mean (MFLOPS)      sqrt(5) = 2.24   1              sqrt(1.25) = 1.12

55. Weighted Arithmetic Means
For a set of n programs, each taking Time_i on one machine, the "equal-time" weights on that machine are w_i = (1/Time_i) / sum_j (1/Time_j).
Figure 1.12: W(3) and W(2) are the equal-time weights based on machines A and B, respectively. This is used in Exercise 1.11.

           A       B       C     W(1)   W(2)    W(3)
P1 (sec)   1       10      20    0.50   0.909   0.999
P2 (sec)   1000    100     20    0.50   0.091   0.001
AM:W(1)    500.5   55      20
AM:W(2)    91.82   18.18   20
AM:W(3)    1.998   10.09   20
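The equal-time weighting above is easy to compute: weight each program by the reciprocal of its time on the reference machine, normalized to sum to 1, so every program contributes equal time there. A sketch (function names mine) reproducing the W(3) column and the AM:W(3) entry for machine A:

```python
def equal_time_weights(times):
    """w_i proportional to 1/Time_i on the reference machine, normalized."""
    inv = [1.0 / t for t in times]
    total = sum(inv)
    return [w / total for w in inv]

def weighted_am(weights, times):
    return sum(w * t for w, t in zip(weights, times))

a_times, b_times = [1, 1000], [10, 100]
w3 = equal_time_weights(a_times)                  # weights based on machine A
print([round(w, 3) for w in w3])                  # [0.999, 0.001]
print(round(weighted_am(w3, a_times), 3))         # 1.998, machine A's AM:W(3)
```

On its own reference machine, the weighted arithmetic mean approaches 2x the single-program time (here 1.998), since both programs are forced to contribute equal time.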

56. Hints for Homework #1
Exercise 1.7:
1. Whetstone consists of integer operations in addition to the floating-point operations.
2. When a floating-point processor is not used, all floating-point operations must be emulated by integer operations (e.g., shift, and, add, sub, multiply, div, ...).
3. With different floating-point coprocessors we have the same number of integer ops but different numbers of FP ops.
Exercise 1.11:
a. Use the equal-time weighting formula on page 26.
b. DEC3000 execution time(ora) = VAX-11/780 time(ora) / DEC3000 SPECRatio = 7421/165

57. FP Compilation Results Depend on the Existence of an FP Coprocessor
Exercise 1.7: Whetstone is a benchmark with both integer and floating-point (FP) operations.

58. Compiling a Floating-Point Statement
Here are the assembly instructions generated for a floating-point statement in C on a DEC3100 (with an R2010 floating-point unit) using the command cc -S. Since the R2010 implements only the simple floating-point add, sub, mult, and div operations, sqrt, exp, and alog are translated into subroutine calls using the jal instruction. The floating-point division is translated into div.d and executed by the R2010.

# 7  x = sqrt(exp(alog(x)/t1));
s.d     $f4, 48($sp)    # store x (fp register f4) to 48($sp)
l.d     $f12, 56($sp)   # load t1 into fp register f12
jal     alog            # call subroutine alog
move    $16, $2
mtc1    $16, $f6
cvt.d.w $f8, $f6        # f8 contains alog(x)
l.d     $f10, 48($sp)
div.d   $f12, $f8, $f10
jal     exp
mov.d   $f20, $f0
mov.d   $f12, $f20
jal     sqrt
s.d     $f0, 56($sp)

59. Homework #1
Problems 1.7 and 1.11, plus:
Problem A. The program segment f = (a-b)/(a*b) is compiled into the following MIPS R2000 code.
Instructions (op dst, src1, src2):
lw   $14, 20($sp)   # a is allocated at M[sp+20]
lw   $15, 16($sp)   # b is allocated at M[sp+16]
subu $24, $14, $15
mul  $9, $14, $15
div  $12, $24, $9
sw   $12, 0($sp)    # f is allocated at M[sp+0]

60. Homework #1 (continued)
Assume all the variables are already in the cache (i.e., we do not have to go to main memory for data) and that Table 1 contains the clock cycles for each type of instruction when data is in the cache. What is the execution time (in seconds) of the above segment on an R2000 chip with a 25 MHz clock?
Problem B. Assume CPU operations account for 70% of the time in a system.
a) What is the overall speedup if we improve the CPU speed by 100%?
b) How much faster must the new CPU be for an overall speedup of 1.7?
c) Is an overall speedup of 3 possible by improving only the CPU?

