
2 Computer Architecture CSE 3322
Web Site: crystal.uta.edu/~jpatters/cse3322
Send email to Pramod Kumar, pxk3008@exchange.uta.edu, with the names and emails of your four project team members by Mon Sept 15. If not on a team, send your email address to Pramod.

6 CPI
Invest resources where time is spent!

CPI = Clock Cycles / Instruction Count = (CPU Time × Clock Rate) / Instruction Count ("average clock cycles per instruction")

CPU Time = Instruction Count × CPI / Clock Rate = Instruction Count × CPI × Clock Cycle Time

$$\text{Average CPI} = \frac{\sum_{i=1}^{n} \text{CPI}(i) \cdot I(i)}{\text{Instruction Count}}$$

$$\text{Average CPI} = \sum_{i=1}^{n} \text{CPI}(i) \cdot F(i)$$

where I(i) is the number of instructions of class i and F(i) = I(i) / Instruction Count is the instruction frequency.
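The two forms of the average-CPI formula can be checked against each other with a short sketch. The function names and the sample instruction mix below are illustrative assumptions, not part of the slides.

```python
def average_cpi_from_counts(cpi, counts):
    """Average CPI = sum(CPI(i) * I(i)) / Instruction Count."""
    total = sum(counts)
    return sum(c * n for c, n in zip(cpi, counts)) / total


def average_cpi_from_freqs(cpi, freqs):
    """Average CPI = sum(CPI(i) * F(i)), with F(i) = I(i) / Instruction Count."""
    return sum(c * f for c, f in zip(cpi, freqs))


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU Time = Instruction Count * CPI / Clock Rate, in seconds."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical mix: three instruction classes with made-up counts.
cpi = [1, 2, 3]
counts = [600, 300, 100]
freqs = [n / sum(counts) for n in counts]

avg = average_cpi_from_counts(cpi, counts)
print(avg)                                   # count-weighted form
print(average_cpi_from_freqs(cpi, freqs))    # frequency-weighted form
print(cpu_time(sum(counts), avg, 100e6))     # assumed 100 MHz clock
```

Both forms should agree up to floating-point rounding, since F(i) is just I(i) divided by the instruction count.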

11 CPI Example
Suppose we have two implementations of the same instruction set. For some program,
Machine A has a clock cycle time of 10 ns and a CPI of 2.0: CPU Time = I × 2.0 × 10 ns = I × 20 ns
Machine B has a clock cycle time of 20 ns and a CPI of 1.2: CPU Time = I × 1.2 × 20 ns = I × 24 ns
Which machine is faster for this program, and by how much?
A is 24/20 = 1.2 times faster than B.
Note: CPI is smaller for B.
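The comparison can be reproduced numerically. This is a minimal sketch; the helper name is an assumption, and the instruction count I is left out because it is the same on both machines and cancels in the ratio.

```python
def time_per_instruction_ns(cpi, cycle_time_ns):
    """CPU time per instruction = CPI * clock cycle time, in nanoseconds."""
    return cpi * cycle_time_ns


a = time_per_instruction_ns(2.0, 10)  # Machine A
b = time_per_instruction_ns(1.2, 20)  # Machine B

# A ratio above 1 means A is faster, even though B has the lower CPI.
speedup_a_over_b = b / a
print(a, b, speedup_a_over_b)
```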

15 Example (RISC processor)
Typical Mix, Base Machine (Reg / Reg)

Op      Freq  Cycles  F(i) × CPI(i)  % Time
ALU     50%   1       0.5            23%
Load    20%   5       1.0            45%
Store   10%   3       0.3            14%
Branch  20%   2       0.4            18%
Total                 2.2 = CPI ave

$$\frac{\text{CPU Time}(i)}{\text{CPU Time}} = \frac{\text{Instr Cnt}(i) \cdot \text{CPI}(i) \cdot \text{Clk Cycle Time}}{\text{Inst Cnt} \cdot \text{CPI}_{ave} \cdot \text{Clk Cycle Time}}$$

% Time = F(i) × CPI(i) / CPI ave

21 Example (RISC processor)
How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
Load drops from 5 to 2 cycles, so its contribution drops from 1.0 to 0.4 and CPI ave drops from 2.2 to 1.6. Since CPU Time = Inst Cnt × CPI ave × Clk Cycle Time, the speedup is 2.2/1.6 = 1.375.
How does this compare with using branch prediction to shave a cycle off the branch time?
Branch drops from 2 to 1 cycle, so its contribution drops from 0.4 to 0.2 and CPI ave = 2.0.
What if two ALU instructions could be executed at once?
ALU drops from 1 to 0.5 cycles, so its contribution drops from 0.5 to 0.25 and CPI ave = 1.95.
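The three what-if questions can be explored with a small sketch built on the instruction mix from the table. The dictionary layout and function names are assumptions made for illustration. Each speedup is the ratio of base CPI to new CPI, since instruction count and clock cycle time are unchanged.

```python
def average_cpi(mix):
    """mix maps op name -> (frequency, cycles); returns sum of F(i) * CPI(i)."""
    return sum(freq * cycles for freq, cycles in mix.values())


def time_share(mix):
    """Fraction of CPU time per op: F(i) * CPI(i) / average CPI."""
    avg = average_cpi(mix)
    return {op: freq * cycles / avg for op, (freq, cycles) in mix.items()}


def with_cycles(mix, op, cycles):
    """Return a copy of mix where `op` takes `cycles` cycles instead."""
    changed = dict(mix)
    changed[op] = (mix[op][0], cycles)
    return changed


base = {"ALU": (0.5, 1), "Load": (0.2, 5), "Store": (0.1, 3), "Branch": (0.2, 2)}
base_cpi = average_cpi(base)

scenarios = [
    ("better data cache", "Load", 2),
    ("branch prediction", "Branch", 1),
    ("two ALU ops at once", "ALU", 0.5),
]
for label, op, cycles in scenarios:
    new_cpi = average_cpi(with_cycles(base, op, cycles))
    print(f"{label}: CPI {new_cpi:.2f}, speedup {base_cpi / new_cpi:.3f}")
```

Loads take the largest share of time in the base machine, which is why the cache improvement gives the biggest gain of the three.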

25 A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions:
Class A has 1 cycle
Class B has 2 cycles
Class C has 3 cycles
The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C. Cycles: 2×1 + 1×2 + 2×3 = 10
The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C. Cycles: 4×1 + 1×2 + 1×3 = 9
Which sequence will be faster? How much? The second, by 10/9 = 1.11
What is the CPI for each sequence? 10/5 = 2 for the first and 9/6 = 1.5 for the second
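The cycle counts and CPIs for the two sequences can be computed directly. This is a sketch with assumed names; the class cycle counts and instruction counts come from the slide.

```python
CYCLES = {"A": 1, "B": 2, "C": 3}  # cycles per instruction class


def total_cycles(seq):
    """Total clock cycles for a sequence given as class -> instruction count."""
    return sum(CYCLES[cls] * n for cls, n in seq.items())


def cpi(seq):
    """CPI = total cycles / total instructions."""
    return total_cycles(seq) / sum(seq.values())


seq1 = {"A": 2, "B": 1, "C": 2}  # 5 instructions
seq2 = {"A": 4, "B": 1, "C": 1}  # 6 instructions

print(total_cycles(seq1), cpi(seq1))
print(total_cycles(seq2), cpi(seq2))
print(total_cycles(seq1) / total_cycles(seq2))  # how much faster seq2 is
```

The second sequence executes more instructions but fewer cycles, so instruction count alone does not predict which sequence runs faster.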

29 A popular performance metric is MIPS, the number of millions of instructions per second. For a given program,

$$\text{MIPS} = \frac{\text{Instruction Count}}{\text{Execution Time} \times 10^{6}}$$

1. Cannot compare if instruction set is different
2. Highly dependent on the program
3. Can be inversely proportional to performance

30 MIPS example
Two different compilers are being tested for a 100 MHz machine with three different classes of instructions: Class A has 1 cycle, Class B has 2 cycles, Class C has 3 cycles.

Instruction counts (billions)
Code from    A   B  C
Compiler 1   5   1  1
Compiler 2   10  1  1

Which sequence will be faster according to MIPS?
Which sequence will be faster according to execution time?

37 MIPS example
Class  A  B  C
CPI    1  2  3

Instruction counts (billions)
Code from    A   B  C  Total
Compiler 1   5   1  1  7
Compiler 2   10  1  1  12

Code from    CPU cycles                   Exec Time                 MIPS
Compiler 1   5 + 1×2 + 1×3 = 10 billion   10^10 × 10^-8 = 100 sec   7×10^3 / 100 = 70
Compiler 2   15 billion                   150 sec                   12×10^3 / 150 = 80
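The example can be replayed in a few lines to see both metrics side by side. The names below are illustrative assumptions; the clock rate, class CPIs, and instruction counts are the ones given on the slide.

```python
CLOCK_RATE_HZ = 100e6              # 100 MHz machine
CPI = {"A": 1, "B": 2, "C": 3}     # cycles per instruction class


def evaluate(counts):
    """Return (execution time in seconds, MIPS) for class -> instruction count."""
    instructions = sum(counts.values())
    cycles = sum(CPI[cls] * n for cls, n in counts.items())
    exec_time = cycles / CLOCK_RATE_HZ
    mips = instructions / (exec_time * 1e6)
    return exec_time, mips


compiler1 = {"A": 5e9, "B": 1e9, "C": 1e9}
compiler2 = {"A": 10e9, "B": 1e9, "C": 1e9}

for name, counts in [("Compiler 1", compiler1), ("Compiler 2", compiler2)]:
    exec_time, mips = evaluate(counts)
    print(f"{name}: {exec_time:.0f} sec, {mips:.0f} MIPS")
```

Compiler 2 gets the higher MIPS rating while Compiler 1 finishes sooner, which illustrates how MIPS can move in the opposite direction from performance.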

39 Benchmarks
Performance best determined by running a real application
– Use programs typical of expected workload
– Or, typical of expected class of applications, e.g., compilers/editors, scientific applications, graphics, etc.
Small benchmarks
– nice for architects and designers
– easy to standardize
– can be abused

43 Benchmarks
SPEC (System Performance Evaluation Cooperative)
– companies have agreed on a set of real programs and inputs
– can still be abused
– valuable indicator of performance (and compiler technology)

47 SPEC95
Eighteen application benchmarks (with inputs) reflecting a technical computing workload
Eight integer applications
– go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
Ten floating-point intensive applications
– tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
Must run with standard compiler flags
– eliminate special undocumented incantations that may not even generate working code for real programs

48 SPEC fp

52 Amdahl's Law
Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)
Example: Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?
Improved Time = (100 - 80) + 80/n = 100/4
20 + 80/n = 25
80 = 5n; n = 16

55 Amdahl's Law
How about making it 5 times faster?
Improved Time = (100 - 80) + 80/n = 100/5
20 + 80/n = 20
80/n = 0
Impossible!
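Both Amdahl's Law questions can be answered by solving the same equation for n. This is a minimal sketch with assumed function names; it returns None when no finite improvement can reach the target.

```python
def improved_time(total, affected, n):
    """Execution time after speeding up the affected part by a factor of n."""
    return (total - affected) + affected / n


def required_improvement(total, affected, target_speedup):
    """Solve (total - affected) + affected / n = total / target_speedup for n.

    Returns None when the unaffected time alone already meets or exceeds
    the target time, so no finite n is enough.
    """
    target_time = total / target_speedup
    remaining = target_time - (total - affected)
    if remaining <= 0:
        return None
    return affected / remaining


print(required_improvement(100, 80, 4))  # multiply must get this many times faster
print(required_improvement(100, 80, 5))  # None: 20 s of other work already uses the budget
```

The unaffected 20 seconds set a hard floor on execution time, so the overall speedup for this program can approach 5 but never reach it.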

59 Remember
Performance is specific to a particular program or programs
– Total execution time is a consistent summary of performance
For a given architecture, performance increases come from:
– increases in clock rate (without adverse CPI effects)
– improvements in processor organization that lower CPI
– compiler enhancements that lower CPI and/or instruction count

