# Performance Evaluation of Architectures Vittorio Zaccaria.


Vittorio Zaccaria, Architectures - 2000

Performance Evaluation:

- From the client perspective: response time (or latency), the time to run the task.
- From the server perspective: throughput (or bandwidth), the tasks executed per second.

Speedup: X is n% faster than Y if

$$\mathrm{Speedup}(x, y) = \frac{\mathrm{ExTime}(y)}{\mathrm{ExTime}(x)} = 1 + \frac{n}{100}$$

Performance and Speedup:

$$\mathrm{Performance}(A) = \frac{1}{\mathrm{ExTime}(A)}, \qquad \mathrm{Speedup}(x, y) = \frac{\mathrm{Performance}(x)}{\mathrm{Performance}(y)}$$

Exercise: A executes a task in 10 s. B executes the same task in 15 s. Which statement is true?

1. A is 50% faster than B.
2. A is 33% faster than B.
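The speedup definition can be checked numerically. The sketch below is only an illustration; the function names `speedup` and `percent_faster` are hypothetical helpers, not part of the slides.

```python
def speedup(ex_time_x: float, ex_time_y: float) -> float:
    """Speedup of X over Y, defined as ExTime(y) / ExTime(x)."""
    return ex_time_y / ex_time_x


def percent_faster(ex_time_x: float, ex_time_y: float) -> float:
    """Solve Speedup(x, y) = 1 + n/100 for n."""
    return (speedup(ex_time_x, ex_time_y) - 1.0) * 100.0


# A runs the task in 10 s, B in 15 s.
n = percent_faster(10.0, 15.0)
print(f"A is {n:.0f}% faster than B")
```

Under this definition the 33% figure is a different quantity: it is the fraction of B's execution time that A saves, (15 - 10)/15.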

Exercise (15 min): Linpack and Dhrystone benchmarks on several VAX models:

| Model | Year | Linpack ExTime | Dhrystone ExTime |
|---|---|---|---|
| VAX-11/780 | 1978 | 4.90 | 5.69 |
| VAX-8600 | 1985 | 1.43 | 1.35 |
| VAX-8550 | 1987 | 0.695 | 0.96 |

Exercise: calculate:

- In the Linpack case: the total speedup and the average per-year speedup of the VAX-8600 over the VAX-11/780, and the same for the VAX-8550 over the VAX-8600.
- In the Dhrystone case: the total speedup and the average per-year speedup of the VAX-8600 over the VAX-11/780, and the same for the VAX-8550 over the VAX-8600.

Exercise: speedup and average per-year speedup (slide formulas not preserved in the transcript).
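One possible way to work the VAX exercise is sketched below. The total speedup follows the ExTime ratio defined earlier; the average per-year speedup is assumed here to be the compound (geometric) yearly rate, i.e. the total speedup raised to 1/years, which is an assumption since the slide's own formula did not survive. The names `VAX`, `total_speedup`, `per_year_speedup` and `compare` are illustrative.

```python
# ExTime values from the exercise table: model -> (year, Linpack, Dhrystone)
VAX = {
    "VAX-11/780": (1978, 4.90, 5.69),
    "VAX-8600": (1985, 1.43, 1.35),
    "VAX-8550": (1987, 0.695, 0.96),
}
LINPACK, DHRYSTONE = 1, 2  # column indexes into the tuples above


def total_speedup(old_time: float, new_time: float) -> float:
    """Speedup of the newer machine over the older one."""
    return old_time / new_time


def per_year_speedup(total: float, years: int) -> float:
    """Assumption: compound yearly rate, so rate ** years == total."""
    return total ** (1.0 / years)


def compare(old: str, new: str, column: int) -> tuple[float, float]:
    """Return (total speedup, average per-year speedup) of `new` over `old`."""
    years = VAX[new][0] - VAX[old][0]
    total = total_speedup(VAX[old][column], VAX[new][column])
    return total, per_year_speedup(total, years)


for name, column in (("Linpack", LINPACK), ("Dhrystone", DHRYSTONE)):
    for old, new in (("VAX-11/780", "VAX-8600"), ("VAX-8600", "VAX-8550")):
        total, yearly = compare(old, new, column)
        print(f"{name}: {new} vs {old}: total {total:.2f}, per year {yearly:.3f}")
```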

Amdahl's Law:

$$\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times \left[ (1 - \mathrm{Fraction}_{enhanced}) + \frac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}} \right]$$

$$\mathrm{Speedup}_{overall} = \frac{\mathrm{ExTime}_{old}}{\mathrm{ExTime}_{new}} = \frac{1}{(1 - \mathrm{Fraction}_{enhanced}) + \dfrac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}}}$$

If $\mathrm{Speedup}_{enhanced}$ goes to infinity, $\mathrm{Speedup}_{overall}$ approaches $1/(1 - \mathrm{Fraction}_{enhanced})$.
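Amdahl's law translates directly into a small function. This is a minimal sketch; the name `amdahl_speedup` and the argument checks are illustrative choices, and the sample values are arbitrary.

```python
import math


def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of the execution time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Half of the execution time is enhanced by 2X.
print(amdahl_speedup(0.5, 2.0))

# Limit case: an infinitely fast enhancement is still bounded by 1 / (1 - fraction).
print(amdahl_speedup(0.5, math.inf))
```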

Exercise on Amdahl's Law: floating-point instructions are improved to run 2X faster, but only 10% of the actual instructions are FP. $\mathrm{Speedup}_{overall} = {?}$

Exercise on Amdahl's Law, solution:

$$\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times \left( 0.9 + \frac{0.1}{2} \right) = 0.95 \times \mathrm{ExTime}_{old}$$

$$\mathrm{Speedup}_{overall} = \frac{1}{0.95} = 1.053$$

2nd Exercise on Amdahl's Law: suppose the CPU speed is improved 5X (at 5X the cost). Suppose that the CPU is used 50% of the time and that the base CPU cost is 1/3 of the cost of the entire system. Is it worth upgrading the CPU? Compare speedup and costs!

2nd Exercise on Amdahl's Law, solution:

$$\mathrm{Speedup} = \frac{1}{0.5 + 0.5/5} = 1.67$$

$$\text{Cost increase} = \frac{2}{3} + \frac{1}{3} \times 5 = 2.33$$

The cost grows by more than the performance, so it is not worth upgrading the CPU!
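The speedup versus cost comparison can be sketched as follows. The decision rule used here (upgrade only if the speedup is at least as large as the cost increase) is an assumed reading of "compare speedup and costs", and the function names are hypothetical.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup from Amdahl's law."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def system_cost_ratio(cpu_cost_share: float, cpu_cost_factor: float) -> float:
    """New system cost relative to the old one when only the CPU cost changes."""
    return (1.0 - cpu_cost_share) + cpu_cost_share * cpu_cost_factor


speedup = amdahl_speedup(fraction_enhanced=0.5, speedup_enhanced=5.0)
cost = system_cost_ratio(cpu_cost_share=1.0 / 3.0, cpu_cost_factor=5.0)

# Assumed rule: the upgrade pays off only if performance grows at least as much as cost.
worth_upgrading = speedup >= cost
print(f"speedup {speedup:.2f}, cost increase {cost:.2f}, worth it: {worth_upgrading}")
```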

Performance Indexes:

- Response time: the latency to complete a task, including disk accesses, memory accesses, I/O activity and other tasks running in parallel.
- CPU time: does not include I/O wait time; it is the sum of the CPU user time and the CPU system time (OS).

CPU time:

$$\mathrm{CPUtime}(P) = \frac{\text{clock cycles needed to execute } P}{\text{clock frequency}}$$

Average CPI: the average Clock Cycles per Instruction (CPI) can be defined as:

$$\mathrm{CPI}(P) = \frac{\text{clock cycles needed to execute } P}{\text{number of instructions}}$$

$$\mathrm{CPUtime} = T_{clock} \times \mathrm{CPI} \times N_{inst} = \frac{\mathrm{CPI} \times N_{inst}}{f}$$
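A minimal sketch of the two definitions above. The cycle count, instruction count and clock frequency are hypothetical numbers chosen only for illustration.

```python
def average_cpi(clock_cycles: float, n_instructions: float) -> float:
    """CPI(P) = clock cycles needed to execute P / number of instructions."""
    return clock_cycles / n_instructions


def cpu_time(cpi: float, n_instructions: float, clock_hz: float) -> float:
    """CPUtime = CPI * Ninst / f, in seconds."""
    return cpi * n_instructions / clock_hz


# Hypothetical program: 2.2e9 cycles for 1e9 instructions on a 500 MHz clock.
cycles = 2_200_000_000
n_inst = 1_000_000_000
clock_hz = 500e6

cpi = average_cpi(cycles, n_inst)
print(f"CPI = {cpi:.2f}, CPU time = {cpu_time(cpi, n_inst, clock_hz):.2f} s")
```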

Aspects of CPU performance:

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

Aspects of CPU performance: the CPI can vary among instructions. $\mathrm{CPI}_i$ is the number of clock cycles needed by instruction type $i$, and $\mathrm{IC}_i$ is the number of times that instruction type $i$ is executed.

$$\text{CPU time} = \text{CycleTime} \times \sum_{i=1}^{n} \mathrm{CPI}_i \times \mathrm{IC}_i$$

Overall CPI: the overall CPI can be expressed as (CPU clock cycles)/(instructions):

$$\mathrm{CPI} = \sum_{i=1}^{n} \mathrm{CPI}_i \times \frac{\mathrm{IC}_i}{\text{Instructions}}$$

Invest resources where time is spent!

Exercise: a RISC processor shows the following statistics (base machine, Reg/Reg):

| Op | Freq | Cycles |
|---|---|---|
| ALU | 50% | 1 |
| Load | 20% | 5 |
| Store | 10% | 3 |
| Branch | 20% | 2 |

Calculate the average CPI and the speedup obtained with each of the following:

- The same machine with an improved D\$ (data cache; load cycles = 2).
- The same machine with a branch CPI = 1.
- The same machine with 2 ALUs working in parallel.

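Solution: average CPI: $0.5 \times 1 + 0.2 \times 5 + 0.1 \times 3 + 0.2 \times 2 = 2.2$. Use Amdahl's law to compute the overall speedup, taking as the enhanced fraction each instruction class's share of the execution time (cycles), not its share of the instruction count:

- Cache improved: new CPI = 1.6, speedup = 2.2/1.6 = 1.375.
- Branch improved: new CPI = 2.0, speedup = 2.2/2.0 = 1.10.
- ALU improved: new CPI = 1.95, speedup = 2.2/1.95 ≈ 1.13.

Plugging the instruction frequencies directly into Amdahl's law as the enhanced fraction would instead give 1.13, 1.11 and 1.33, which is not correct because the fraction in Amdahl's law refers to execution time.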
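The CPI exercise can be recomputed with a short script. It is a sketch under two stated assumptions: the two parallel ALUs are modeled as halving the ALU CPI (perfect pairing), and the function names are illustrative. The script computes the speedup as the ratio of old to new average CPI, which is equivalent to Amdahl's law with the fraction measured in execution time, and also shows the figures obtained when the instruction frequency is used as the fraction (1.13, 1.11, 1.33).

```python
# Instruction mix of the base machine: op -> (frequency, cycles)
BASE = {
    "ALU": (0.5, 1.0),
    "Load": (0.2, 5.0),
    "Store": (0.1, 3.0),
    "Branch": (0.2, 2.0),
}


def average_cpi(mix: dict) -> float:
    """Overall CPI = sum over classes of CPI_i * frequency_i."""
    return sum(freq * cycles for freq, cycles in mix.values())


def speedup_from_cpi(mix: dict, op: str, new_cycles: float) -> float:
    """Speedup as old CPI / new CPI (same clock, same instruction count)."""
    changed = dict(mix)
    changed[op] = (mix[op][0], new_cycles)
    return average_cpi(mix) / average_cpi(changed)


def amdahl_by_frequency(mix: dict, op: str, new_cycles: float) -> float:
    """Amdahl's law with the instruction frequency used as the fraction."""
    freq, cycles = mix[op]
    return 1.0 / ((1.0 - freq) + freq / (cycles / new_cycles))


# Assumption: two ALUs in parallel halve the effective ALU CPI.
CASES = {"cache": ("Load", 2.0), "branch": ("Branch", 1.0), "ALU": ("ALU", 0.5)}

print(f"base CPI = {average_cpi(BASE):.2f}")
for name, (op, new_cycles) in CASES.items():
    by_time = speedup_from_cpi(BASE, op, new_cycles)
    by_freq = amdahl_by_frequency(BASE, op, new_cycles)
    print(f"{name}: time-weighted {by_time:.3f}, frequency-weighted {by_freq:.3f}")
```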

Exercise: procedure calls in architecture A are very expensive. Suppose a new architecture B, similar to A, is introduced such that:

- A has a clock 5% faster than B.
- The fraction of loads/stores in A is 30%.
- B executes 30% fewer loads/stores than A.
- Loads/stores require 1 clock cycle.

Compare the CPU times of A and B.

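Solution: number of instructions of B: $N_B = [1 - (0.3 \times 0.3)] \times N_A = 0.91 \times N_A$. Clock period of B: $T_B = 1.05 \times T_A$. With a CPI of 1 on both machines:

$$\mathrm{CPUtime}_A = 1 \times N_A \times T_A$$

$$\mathrm{CPUtime}_B = 1 \times 0.91 \times N_A \times 1.05 \times T_A = 0.9555 \times \mathrm{CPUtime}_A$$

B is therefore about 4.5% faster than A despite its slower clock (rounding $N_B$ to $0.9 \times N_A$ gives $0.945 \times \mathrm{CPUtime}_A$).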
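The comparison between A and B can be reproduced numerically. This sketch assumes, as the solution does, a CPI of 1 for every instruction on both machines; the function name and parameter names are hypothetical. Without rounding the instruction-count ratio, the result is about 0.9555 (it becomes 0.945 if the ratio is rounded to 0.9).

```python
def relative_cpu_time_b(
    ls_fraction_a: float = 0.30,   # share of loads/stores in A's instruction stream
    ls_reduction: float = 0.30,    # B executes this fraction fewer loads/stores
    clock_penalty: float = 0.05,   # B's clock period is this much longer than A's
    cpi: float = 1.0,              # assumed identical on both machines
) -> float:
    """Return CPUtime_B / CPUtime_A using CPUtime = CPI * Ninst * Tclock."""
    n_b_over_n_a = 1.0 - ls_fraction_a * ls_reduction
    t_b_over_t_a = 1.0 + clock_penalty
    cpu_time_a = cpi * 1.0 * 1.0            # normalized: N_A = 1, T_A = 1
    cpu_time_b = cpi * n_b_over_n_a * t_b_over_t_a
    return cpu_time_b / cpu_time_a


ratio = relative_cpu_time_b()
print(f"CPUtime_B = {ratio:.4f} x CPUtime_A")
```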

MIPS: millions of instructions per second.

$$\mathrm{MIPS} = \frac{\text{number of instructions}}{\text{execution time (in s)} \times 10^6} = \frac{\text{clock frequency}}{\mathrm{CPI} \times 10^6}$$

MIPS (cont.): problems:

- It depends heavily on the ISA, so it is difficult to compare different ISAs.
- It depends on the program.
- It can vary inversely with performance! A complex instruction set can have a lower MIPS rating than a simple instruction set and still execute programs in less time.
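The last point can be illustrated with two made-up machines running the same program at the same clock frequency. All numbers below are hypothetical and chosen only so that the machine with the lower MIPS rating finishes first.

```python
def cpu_time(n_instructions: float, cpi: float, clock_hz: float) -> float:
    """CPUtime = Ninst * CPI / f, in seconds."""
    return n_instructions * cpi / clock_hz


def mips(n_instructions: float, exec_time_s: float) -> float:
    """MIPS = number of instructions / (execution time * 10^6)."""
    return n_instructions / (exec_time_s * 1e6)


CLOCK_HZ = 100e6  # hypothetical 100 MHz clock for both machines

# Hypothetical instruction counts and CPIs for the same program.
machines = {
    "simple ISA": {"n": 10_000_000, "cpi": 1.0},
    "complex ISA": {"n": 4_000_000, "cpi": 2.0},
}

results = {}
for name, m in machines.items():
    t = cpu_time(m["n"], m["cpi"], CLOCK_HZ)
    results[name] = (t, mips(m["n"], t))
    print(f"{name}: time = {t:.3f} s, MIPS = {results[name][1]:.1f}")
```

In this made-up case the complex ISA has half the MIPS rating of the simple one, yet it completes the program sooner.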

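Relative MIPS: the relative MIPS of an architecture A is:

$$\mathrm{MIPS}_{relative}(A) = \frac{T_{CPU,\,reference}}{T_{CPU,\,A}} \times \mathrm{MIPS}_{reference}$$

The reference time is in the numerator, so that a machine faster than the reference gets a higher rating. In the 80's the reference architecture was the VAX-11/780.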
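A minimal sketch of a relative MIPS computation, using the usual convention that the reference machine's time is divided by the time of the machine being rated, so that a faster machine gets a higher rating. The times and the reference rating below are hypothetical values for illustration only.

```python
def relative_mips(time_reference_s: float, time_a_s: float, mips_reference: float) -> float:
    """Relative MIPS of A = (T_reference / T_A) * MIPS_reference."""
    return (time_reference_s / time_a_s) * mips_reference


# Hypothetical: the reference machine takes 10 s, machine A takes 2 s,
# and the reference machine is rated at 1 MIPS.
rating = relative_mips(time_reference_s=10.0, time_a_s=2.0, mips_reference=1.0)
print(f"relative MIPS of A = {rating:.1f}")
```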