Presentation on theme: "CS1104: Computer Organisation School of Computing National University of Singapore."— Presentation transcript:
CS1104: Computer Organisation http://www.comp.nus.edu.sg/~cs1104 http://www.comp.nus.edu.sg/~cs1104 School of Computing National University of Singapore
CS1104Performance2 Performance Definition Factors Affecting Performance Measurement Parameters for Performance Co-relation Among Performance Parameters Read Chapter 4 of Patterson’s book Sections 4.1 – 4.3.
CS1104Performance3 Performance Definition (1/2) Purchasing perspective: Given a collection of machines, which has the best performance? least cost? best performance/cost? Design perspective: Faced with design options, which has the best performance improvement? least cost? best performance/cost?
CS1104Performance4 Performance Definition (2/2) Both require basis for comparison metric for evaluation Our goal is to understand performance of machine’s architectural design.
CS1104Performance5 Two Notions of Performance (1/2) Which has higher performance? Time to do ONE task (execution time) Execution time, response time, latency Tasks per day, hour, week, … (performance) Throughput, bandwidth. Response time and throughput might be in opposition. PlaneDC to ParisSpeedPassengers Throughput (pmph) Boeing 7476.5 hours610 mph470286,700 AirBus3 hours1350 mph132178,200
CS1104Performance6 Two Notions of Performance (2/2) Time of AirBus vs. Boeing 747? AirBus is 1350 mph / 610 mph = 2.2 times faster = 6.5 hours / 3 hours Throughput of AirBus vs. Boeing 747? AirBus is 178,200 pmph / 286,700 pmph = 0.62 “times faster” Boeing is 286,700 pmph / 178,200 pmph = 1.6 “times faster” Boeing is 1.6 times faster in terms of throughput. AirBus is 2.2 times faster in terms of flying time.
CS1104Performance7 Definitions Performance is in units of things-per-second Bigger is better If we are primarily concerned with response time “X is n times faster than Y” means the speedup n is:
CS1104Performance8 Computer Performance (1/2) Response Time (latency) Time between start and end of an event (execution time). How long does it take for my job to run? How long does it take to execute a job? How long must I wait for the database query? Throughput Total amount of work (or number of jobs) done in a given time. How many jobs can the machine run at once? What is the average execution rate? How much work is getting done?
CS1104Performance9 Computer Performance (2/2) If we upgrade a machine with a new processor, what do we improve? If we add a new machine to the lab what do we improve? Answer: Response Time. Answer: Throughput.
CS1104Performance10 Execution Time (1/3) Elapsed Time counts everything (disk and memory accesses, I/O, etc.) a useful number, but often not good for comparison purposes. CPU time doesn't count I/O or time spent running other programs. can be broken up into system time, and user time. Our focus: user CPU time time spent executing the lines of code that are "in" our program.
CS1104Performance11 Execution Time (2/3) Instead of reporting execution time in seconds, we often use clock cycles (basic time unit in machine). Cycle time (or cycle/clock period) = time between two consecutive rising edges, measured in seconds. Clock rate (or clock frequency) = 1/cycle-time = number of cycles per second (1 Hz = 1 cycle/second). Example: A 200 MHz clock has cycle time of 1/(200x10 6 ) = 5 x 10 -9 seconds = 5 nanoseconds. Clock Cycle time
CS1104Performance12 Execution Time (3/3) Therefore, to improve performance (everything else being equal), you can do the following: Reduce the number of cycles for a program, or Reduce the clock cycle time, or said in another way, Increase the clock rate.
CS1104Performance13 Cycles in a Program (1/2) Could we assume that number of cycles = number of instructions? Or number of cycles is proportional to number of instructions? 1st instruction 2nd instruction 3rd instruction 4th 5th 6th... Clock
CS1104Performance14 Cycles in a Program (2/2) No, the assumption is incorrect. The tasks of different instructions take different amount of time to finish. For example, a Multiple instruction may take more cycles than an Add instruction. Floating-point operations take longer than integer operations. Accessing memory takes more time than accessing registers. Clock
CS1104Performance15 Example 1: Clock Rate Our favorite program runs in 10 seconds on computer A, which has a 400 MHz clock. We are trying to help a computer designer build a new machine B, that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target at? ANSWER: Let C be the number of clock cycles required for that program. For A: Time = 10 sec. = C * 1/400MHz For B: Time = 6 sec. = (1.2 * C) * 1/clock_rate B Therefore, clock_rate B = 10 * 400 * 1.2 / 6 MHz = 800MHz
CS1104Performance16 Cycles per Instruction (CPI) (1/2) A given program will require some number of instructions (machine instructions) some number of cycles some number of seconds CPI cycle time Recall that different instructions have different number of cycles.
CS1104Performance17 Cycles per Instruction (CPI) (2/2) Invest resources where time is spent! I k = instruction frequency Average cycles per instruction CPI = (CPU Time * Clock Rate) / Instruction Count = Clock Cycles / Instruction Count CPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle CPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle where
CS1104Performance18 Aspects of CPU Performance (1/2) Performance is determined by execution time. Does any of these variables equal performance? number of cycles to execute program? number of instructions in program? number of cycles per second? average number of cycles per instruction? average number of instructions per second? Answer: No to all. Common pitfall: thinking that one of the variables is indicative of performance when it really isn’t.
CS1104Performance19 Aspects of CPU Performance (2/2) CPU performance depends on: Clock cycle time Hardware technology and organization CPI Organization and ISA Instruction Count ISA and compiler Be careful of the following concepts: Machine ISA and hardware organization Machine cycle time ISA + hardware organization number of cycles for any instruction (this is not average CPI) ISA + compiler + program number of instructions executed Therefore, ISA + Compiler + Program + Hardware organization + Cycle time Total CPU time. CPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle CPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle
CS1104Performance20 Example 2: CPI Suppose we have two implementations of the same instruction set architecture (ISA). For some program, Machine A has a clock cycle time of 10 ns and a CPI of 2.0. Machine B has a clock cycle time of 20 ns and a CPI of 1.2. What machine is faster for this program, and by how much? ANSWER: Let X be the number of instructions Machine A: time = X * 2.0 * 10 ns Machine B: time = X * 1.2 * 20 ns Performance(A) / Performance(B) = 1.2 * 20 / (2.0 * 10) = 1.2
CS1104Performance21 Example 3: Number of Instructions A compiler designer is deciding between two codes for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles respectively. First code has 5 instructions: 2 of A, 1 of B, and 2 of C. Second code has 6 instructions: 4 of A, 1 of B, and 1 of C. Which code is faster? By how much? What is the CPI for each code? ANSWER: Let T be cycle time. Time(Code1) = (2 * 1 + 1 * 2 + 2 * 3) * T = 10 T Time(Code2) = (4 * 1 + 1 * 2 + 1 * 3) * T = 9 T Time(Code1) / Time(Code2) = 10/9 = 1.1 and Code2 is faster. CPI(Code1) = (2 * 1 + 1 * 2 + 2 * 3) / (2 + 1 + 2) = 2 CPI(Code2) = (4 * 1 + 1 * 2 + 1 * 3) / (4 + 1 + 1) = 1.5
CS1104Performance22 Example 4 (1/4) You are given two machine designs M1 and M2 for performance benchmarking. Both M1 and M2 have the same ISA, but different hardware implementations and compilers. Assuming that the clock cycle times for M1 & M2 are the same, performance study gives the following measurements for the two designs: For design M1: Instruction ClassCPINo. of Instructions Executed A 13,000,000,000,000,000,000 B 22,000,000,000,000,000,000 C 32,000,000,000,000,000,000 D 41,000,000,000,000,000,000 For design M2: A 22,700,000,000,000,000,000 B 31,800,000,000,000,000,000 C 31,800,000,000,000,000,000 D 2 900,000,000,000,000,000
CS1104Performance23 Example 4 (2/4) a)What is the CPI for each machine? Let Y = 1,000,000,000,000,000,000. CPI(M1) = (1*3Y + 2*2Y + 3*2Y + 4*1Y) / (3Y + 2Y + 2Y + 1Y) = 17Y / 8Y = 2.125 CPI(M2) = [(2*3Y+3*2Y+3*2Y+2*1Y)*0.9] / [(3Y+2Y+2Y+1Y)*0.9] = 20Y / 8Y = 2.5 Let C = clock cycle. Time(M1) = 2.125 * (8Y * C) Time(M2) = 2.5 * (0.9 * 8Y * C) = 2.25 * (8Y * C) M1 is faster than M2 by 2.25 / 2.125 = 1.0588 b)Which machine is faster? By how much?
CS1104Performance24 Example 4 (3/4) c)To further improve the performance of the machines, a new compiler technique is introduced. The compiler can simply eliminate all class D instructions from the benchmark program without any side effects. That is, there is no change to the number of class A, B, and C instructions executed in the two machines. With this new technique, which machine is faster? By how much? Let Y = 1,000,000,000,000,000,000 and C = clock cycle. CPI(M1) = (1 * 3Y + 2 * 2Y + 3 * 2Y) / (3Y + 2Y + 2Y) = 13Y / 7Y = 1.857 CPI(M2) = [(2*3Y + 3*2Y + 3*2Y) * 0.9] / [(3Y + 2Y + 2Y) * 0.9] = 18Y / 7Y = 2.571 Time(M1) = 1.857 * (7Y * C) Time(M2) = 2.571 * (0.9 * 7Y * C) = 2.314 * (7Y * C) M1 is faster than M2 by 2.314 / 1.857 = 1.246
CS1104Performance25 Example 4 (4/4) d)Alternatively, to further improve the performance of the machines, a new hardware technique is introduced. The hardware can simply execute all class D instructions in zero times without any side effects. (There is still execution for class D instructions). With this new technique, which machine is faster? By how much? Let Y = 1,000,000,000,000,000,000 and C = clock cycle. CPI(M1) = (1 * 3Y + 2 * 2Y + 3 * 2Y + 0 * Y) / (3Y + 2Y + 2Y + Y) = 13Y / 8Y = 1.625 CPI(M2) = [(2*3Y+3*2Y+3*2Y +0*Y) * 0.9] / [(3Y+2Y+2Y+Y) * 0.9] = 18Y / 8Y = 2.25 Time(M1) = 1.625 * (8Y * C) Time(M2) = 2.25 * (0.9 * 8Y * C) = 2.025 * (8Y * C) M1 is faster than M2 by 2.025 / 1.625 = 1.2461
CS1104Performance26 Key Concepts (1/2) Performance is specific to a particular program. Total execution time is a consistent summary of performance For a given architecture, performance increase comes from: increase in clock rate (without adverse CPI affects) improvement in processor organisation that lowers CPI compiler enhancement that lowers CPI and/or instruction count Pitfall: expecting improvement in one aspect of a machine’s performance to affect the total performance.
CS1104Performance27 Key Concepts (2/2) Basic formula for execution time. Performance factors: CPI, total number of instructions executed, cycle time. Factors affecting their values Calculation of individual values Note: Change of one factor might change more than one component (eg: reduction of instruction types executed) Different performance matrix: what it can tell and what it might mislead people.
CS1104Performance28 Sample Question (1/12) 1.Which of the following statements will affect the program execution time? Compiler technology Instruction set architecture Integration circuit chip technology Application software a)(i) and (iii) b)(ii) and (iii) c)(i), (ii), and (iii) d)All of the above e)None of the above [Answer]
CS1104Performance29 Sample Question (2/12) 2.Execution time can always be improved by: Combining simple instructions into complex instructions in the given ISA Improving the clock speed of the processor Using a new compiler that produces a shorter sequence of program code a)(ii) only b)(i) and (ii) c)(ii) and (iii) d)All of the above e)None of the above [Answer]
CS1104Performance30 Sample Question (3/12) 3.The overall (average) CPI of a given program is: a)decreased by decreasing the total number of instructions executed in a program, assuming all other hardware and software parameters remain the same. b)the sum of the CPIs of the individual types of instructions executed in the program. c)the average of the CPIs of the individual classes of instructions executed in the program. That is, if instruction class A, B, C, D, with corresponding CPI 1, 2, 3, 4, appear in a program, the overall CPI will be (1+2+3+4)/4. d)decreased by doubling the clock speed of the processor, assuming all other hardware and software parameters remain the same. e)None of the above. [Answer]
CS1104Performance31 Sample Question (4/12) 4.Assuming all other hardware and software parameters remain the same, the total number of instructions executed in a program can be changed by: a)changing the ISA of the processor. b)reducing the number of cycles needed to execute each instruction type in a program. c)increasing the number of cycles needed to execute each instruction type in a program. d)doubling the individual CPI of each instruction class executed in a program. e)None of the above. [Answer]
CS1104Performance32 Sample Question (5/12) 5.Execution time can always be improved by: i.changing the instruction set architecture of the processor from a complex one to a simple one. ii.changing the instruction set architecture of the processor from a simple one to a complex one. iii.changing the compiler so that a lower overall CPI value (and different program code sequence) is obtained. a)(i) and (iii) b)(i) only c)(ii) only d)(iii) only e)None of the above [Answer]
CS1104Performance33 Sample Question (6/12) 6.Which of the following affects the overall CPI of a program on a given ISA? i.The CPIs of individual classes of instructions defined in the ISA. ii.The number of instructions executed in the program. iii.The clock speed of the processor. a)(i) only b)(ii) only c)(iii) only d)(i) and (ii) only e)(i), (ii) and (iii) [Answer]
CS1104Performance34 Sample Question (7/12) 7.A program takes 5 seconds to execute on machine MM1 and 10 seconds to execute on machine MM2. Which of the following conclusions must be true? a)The number of instructions executed on MM1 is definitely fewer than that on MM2. b)The overall CPI on MM1 is definitely smaller than that on MM2. c)The clock speed of MM1 is definitely higher than that of MM2. d)All (a), (b), (c). e)None of the above. [Answer]
CS1104Performance35 Sample Question (8/12) 8.To improve performance of a program, state what (increase/decrease/do nothing) you would do to the following, assuming that everything else being equal: a)The number of required cycles. b)The number of instructions executed. c)The clock cycle time. d)The clock rate. [Decrease] [Increase]
CS1104Performance36 Sample Question (9/12) 9.Given a machine at 500 MHz, the measurement of a program execution is as follows: Instr. ClassCPIFrequency A 1 10% B 2 20% C 3 30% D 4 40% What happens to (i) overall CPI, (ii) execution time, if: a)Instruction distribution is: A: 40%, B: 30%, C:20%, D:10%. b)The CPIs of all instruction classes are doubled. Answer increase/decrease/unchanged. a) CPI decreases, execution time decreases. b) CPI increases, execution time increases.
CS1104Performance37 Sample Question (10/12) 10.State whether these statements are true or false: a)Instruction set architecture is mainly about the digital and circuit design of microprocessor. b)My 400MHz PC is always faster than the SoC’s SUN450 system, which runs at 336 MHz. c)CPU execution time of a program is defined as the elapsed time between the program submission to the generation of the result. a) False.b) False.c) False.
CS1104Performance38 Sample Question (11/12) 11.Given a machine at 500 MHz, the measurement of a program execution is as follows: Instr. ClassCPIFrequency A 1 10% B 2 20% C 3 30% D 4 40% What happens to (i) overall CPI, (ii) execution time, if: a)Equal instruction distribution among 4 classes (25% each) b)Clock rate is increased to 800 MHz. Answer increase/decrease/unchanged. a) CPI decreases, execution time decreases. b) CPI unchanged, execution time decreases.
CS1104Performance39 Sample Question (12/12) 12.Given a machine at 500 MHz, the measurement of a program execution is as follows: Instr. ClassCPIFrequency A 1 10% B 2 20% C 3 30% D 4 40% What happens to (i) overall CPI, (ii) instruction count, (iii) clock speed, if: a)A different compiler is used to produce the executable code of a program. b)A new hardware implement of the given ISA is used. Answer changed/unchanged. a) CPI changed, IC changed, CS unchanged. b) CPI changed, IC unchanged, CS changed.