Execution time
Execution Time (processor-related) = IC x CPI x T

IC = instruction count
CPI = average number of system clock periods needed to execute an instruction
T = clock period
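A minimal Python sketch of the formula, assuming T is given in seconds (T = 1 / clock rate). The function name and the instruction count, CPI, and clock rate used below are hypothetical values chosen only for illustration.

```python
def execution_time(ic: int, cpi: float, clock_period_s: float) -> float:
    """Processor-related execution time: ET = IC x CPI x T, in seconds."""
    return ic * cpi * clock_period_s


# Hypothetical program: 2 million instructions, average CPI of 2.5,
# running on a 100 MHz clock, so T = 1 / (100 x 10^6) = 10 ns.
et = execution_time(2_000_000, 2.5, 10e-9)
print(f"ET = {et:.3f} s")  # prints "ET = 0.050 s"
```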

Review

CS501 Advanced Computer Architecture
Lecture 03, Dr. Noor Muhammad Sheikh

Example: Consider two SRC programs having three types of instructions, with the following counts:
Number of...                  Program 1   Program 2
Data transfer instructions    2           1
Control instructions          2           5
ALSU instructions             2           1
Compare both programs with respect to (a) instruction count and (b) speed of execution.

Example contd.: Instruction count (IC)
IC for program 1 = 2 + 2 + 2 = 6
IC for program 2 = 5 + 1 + 1 = 7
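For execution time, we use the following SRC specifications:
Instruction type   CPI
Control            2
ALSU               3
Data transfer      4
ET = IC x CPI x T, summed over the instruction types:
ET1 = (2x2) + (2x3) + (2x4) = 18T
ET2 = (5x2) + (1x3) + (1x4) = 17T
Note: since both programs execute on the same machine, the clock period T is common to both and can be ignored when comparing ET. Program 1 has the lower instruction count, but program 2 executes faster (17 versus 18 clock cycles).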
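The calculation above weights each instruction type by its CPI. A minimal Python sketch of it follows. The dictionary keys and function names are illustrative choices, and the counts and CPIs are the ones given in the example.

```python
# CPI per instruction type, from the SRC specification in the example.
CPI = {"control": 2, "alsu": 3, "data_transfer": 4}


def instruction_count(mix: dict[str, int]) -> int:
    """IC is simply the total number of instructions of all types."""
    return sum(mix.values())


def cycles(mix: dict[str, int], cpi: dict[str, int] = CPI) -> int:
    """Clock cycles = sum over types of (count x CPI); ET = cycles x T."""
    return sum(count * cpi[kind] for kind, count in mix.items())


program1 = {"control": 2, "alsu": 2, "data_transfer": 2}
program2 = {"control": 5, "alsu": 1, "data_transfer": 1}

for name, mix in [("Program 1", program1), ("Program 2", program2)]:
    print(f"{name}: IC = {instruction_count(mix)}, ET = {cycles(mix)}T")
# Program 1: IC = 6, ET = 18T
# Program 2: IC = 7, ET = 17T
```

Keeping the result in units of T reflects the note above: on the same machine, T cancels out of the comparison.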

Problem: Consider the following two SRC code segments for implementing the operation a = b + 5c. Find which one is more efficient in terms of instruction count and execution time.

Program 1: Multiplication by repeated addition in a loop

            .org 0
    a:      .dw 1
    b:      .dw 1
    c:      .dw 1
            .org 80
            la   r5, 5        ; load loop count (5) in r5
            lar  r6, mpy      ; load address of mpy
            lar  r7, next     ; load address of next
            ld   r2, b        ; load contents of b
            ld   r3, c        ; load contents of c
            la   r4, 0        ; load 0 in r4
    mpy:    brzr r7, r5       ; jump to next after 5 iterations (r5 = 0)
            add  r4, r4, r3   ; r4 contains r4 + c
            addi r5, r5, -1   ; decrement index
            br   r6           ; loop again
    next:   add  r4, r4, r2   ; r4 contains sum of b and 5c
            st   r4, a        ; store at address a
            stop

Program 2: Multiplication using a subroutine call

            .org 0
    a:      .dw 1
    b:      .dw 1
    c:      .dw 1
            .org 80
            lar  r1, mpy      ; load address of mpy in r1
            ld   r2, b        ; load contents of b in r2
            la   r3, 5        ; load index in r3
            ld   r4, c        ; load contents of c in r4
            brl  r5, r1       ; r5 contains PC (return address); branch to mpy
            add  r2, r2, r7   ; r2 contains sum b + 5c
            st   r2, a        ; store at address a
            stop
    mpy:    la   r7, 0        ; r7 contains zero
            lar  r8, again    ; r8 contains address of again
    again:  brzr r5, r3       ; exit loop (return to caller) when index is 0
            add  r7, r7, r4   ; r7 contains r7 + c
            addi r3, r3, -1   ; decrement index
            br   r8           ; loop again

Solution: The instructions in both programs can be divided into three types, with the following counts:
Number of...                  Program 1   Program 2
Data transfer instructions    7           7
Control instructions          3           4
ALSU instructions             3           3
IC for program 1 = 7 + 3 + 3 = 13
IC for program 2 = 7 + 4 + 3 = 14

Solution contd.: For execution time, consider the following SRC specifications:
Instruction type   CPI
Control            2
ALSU               3
Data transfer      4
ET = IC x CPI x T, summed over the instruction types:
ET1 = (7x4) + (3x2) + (3x3) = 43T
ET2 = (7x4) + (4x2) + (3x3) = 45T
Conclusion: program 1 runs faster than program 2, as is evident from their execution times.
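The solution counts each instruction once, as it appears in the listing (the static count). The Python sketch below reproduces those figures. For comparison, it also computes dynamic counts, which are the instructions actually executed when the loop body runs five times. The dynamic counts are my own tally from the listings, not figures from the lecture. Following the solution's classification, `stop` is assumed to be a control instruction with CPI 2.

```python
# CPI per instruction type, from the SRC specification in the problem.
CPI = {"control": 2, "alsu": 3, "data_transfer": 4}


def instruction_count(mix):
    return sum(mix.values())


def cycles(mix):
    """Clock cycles = sum over instruction types of (count x CPI); ET = cycles x T."""
    return sum(count * CPI[kind] for kind, count in mix.items())


# Static counts: each instruction in the listing is counted once, as in the solution.
static1 = {"data_transfer": 7, "control": 3, "alsu": 3}
static2 = {"data_transfer": 7, "control": 4, "alsu": 3}

# Dynamic counts (my tally): the loop body (add, addi, br) runs 5 times and
# the exit test (brzr) runs 6 times; setup and store instructions run once.
dynamic1 = {"data_transfer": 7,
            "control": 6 + 5 + 1,        # brzr, br, stop
            "alsu": 5 + 5 + 1}           # add, addi, final add
dynamic2 = {"data_transfer": 7,
            "control": 1 + 6 + 5 + 1,    # brl, brzr, br, stop
            "alsu": 5 + 5 + 1}           # add, addi, final add

for name, s, d in [("Program 1", static1, dynamic1), ("Program 2", static2, dynamic2)]:
    print(f"{name}: static IC = {instruction_count(s)}, ET = {cycles(s)}T; "
          f"dynamic IC = {instruction_count(d)}, ET = {cycles(d)}T")
# Program 1: static IC = 13, ET = 43T; dynamic IC = 30, ET = 85T
# Program 2: static IC = 14, ET = 45T; dynamic IC = 31, ET = 87T
```

Under either way of counting, program 1 comes out ahead. Program 2 pays for one extra control instruction (`brl`) that the subroutine call requires.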

MIPS: Millions of Instructions Per Second
MIPS = IC / (ET x 10^6)
The capability of individual instructions varies from machine to machine. For example, RISC machines have simpler instructions, so the same job requires more instructions. MIPS was popular when the VAX-11/780 was treated as the reference machine, in the late 1970s and early 1980s.

MIPS as a performance metric
For a given instruction count, MIPS is inversely proportional to execution time: ET = IC / (MIPS x 10^6).
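A minimal Python sketch of the two forms of the MIPS relation. The run of 50 million instructions in 0.5 s is a hypothetical example, and the function names are illustrative.

```python
def mips(ic: float, et_s: float) -> float:
    """MIPS = IC / (ET x 10^6)."""
    return ic / (et_s * 1e6)


def et_from_mips(ic: float, mips_rating: float) -> float:
    """ET = IC / (MIPS x 10^6): the same relation solved for ET."""
    return ic / (mips_rating * 1e6)


# Hypothetical run: 50 million instructions in 0.5 s.
rating = mips(50e6, 0.5)
print(rating, et_from_mips(50e6, rating))  # 100.0 0.5

# For the same IC, doubling the MIPS rating halves the execution time.
print(et_from_mips(50e6, 2 * rating))      # 0.25
```

The inverse relationship holds only while IC stays fixed. When two programs have different instruction counts, a higher MIPS rating does not guarantee a shorter execution time.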

Example: Consider a machine with a 100 MHz clock and three instruction types with the following parameters:
Instruction type   CPI
Control            2
ALSU               3
Data transfer      4
Now suppose that two different compilers generate code for the same program. The instruction count for each is as follows:
IC in millions     Code from compiler 1   Code from compiler 2
Control            5                      10
ALSU               1                      1
Data transfer      1                      1

Compare the two codes according to MIPS and according to execution time.
Solution: First, find the CPI of each code sequence, using CPI = (total clock cycles over all instruction types) / IC:
CPI1 = (5x2 + 1x3 + 1x4) / 7 = 17/7 = 2.43
CPI2 = (10x2 + 1x3 + 1x4) / 12 = 27/12 = 2.25
Since MIPS = clock rate / (CPI x 10^6):
MIPS1 = (100 x 10^6) / (2.43 x 10^6) = 41.15
MIPS2 = (100 x 10^6) / (2.25 x 10^6) = 44.44
Hence the code generated by compiler 2 has the higher MIPS rating.

The MIPS formula used in the solution follows from the definitions. Since MIPS = IC / (ET x 10^6) and ET = IC x CPI / clock rate:
MIPS = (IC x clock rate) / (IC x CPI x 10^6) = clock rate / (CPI x 10^6)

Solution contd.: Since ET = IC / (MIPS x 10^6):
ET1 = (7 x 10^6) / (41.15 x 10^6) = 0.17 seconds
ET2 = (12 x 10^6) / (44.44 x 10^6) = 0.27 seconds
Hence code sequence 1 is much more efficient in terms of execution time, even though its MIPS rating is lower.
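The whole comparison can be reproduced in a short Python sketch. It uses the clock rate, CPIs, and instruction counts from the example; the function and variable names are illustrative. ET is computed here as total cycles divided by clock rate, which is equivalent to IC x CPI x T.

```python
CLOCK_HZ = 100e6                                   # 100 MHz clock
CPI = {"control": 2, "alsu": 3, "data_transfer": 4}

# Instruction counts in millions, as given in the example.
code1 = {"control": 5, "alsu": 1, "data_transfer": 1}
code2 = {"control": 10, "alsu": 1, "data_transfer": 1}


def evaluate(mix_millions):
    """Return (average CPI, MIPS rating, execution time in seconds)."""
    ic = sum(mix_millions.values()) * 1e6
    total_cycles = sum(n * CPI[k] for k, n in mix_millions.items()) * 1e6
    cpi = total_cycles / ic
    mips = CLOCK_HZ / (cpi * 1e6)
    et = total_cycles / CLOCK_HZ                   # same as IC x CPI x T
    return cpi, mips, et


for name, mix in [("Compiler 1", code1), ("Compiler 2", code2)]:
    cpi, mips, et = evaluate(mix)
    print(f"{name}: CPI = {cpi:.2f}, MIPS = {mips:.2f}, ET = {et:.2f} s")
# Compiler 1: CPI = 2.43, MIPS = 41.18, ET = 0.17 s
# Compiler 2: CPI = 2.25, MIPS = 44.44, ET = 0.27 s
```

The sketch prints 41.18 rather than 41.15 for compiler 1 because it does not round CPI to 2.43 before dividing. The conclusion is unchanged: compiler 2 has the higher MIPS rating but the longer execution time.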

MFLOPS: Millions of Floating-point Operations Per Second
Some consider it more meaningful to count floating-point (FP) operations than to count arbitrary instructions, although results vary from one FP operation to another. MFLOPS is considered better than MIPS for two reasons:

1. FP operations are complex, and therefore give a better picture of the capabilities of the hardware on which they run.
2. Overheads (fetching operands, storing results, etc.) are effectively lumped together with the FP operations they support.
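By analogy with the MIPS formula, MFLOPS divides the number of floating-point operations, rather than all instructions, by ET x 10^6. A minimal Python sketch follows. The operation count and run time are hypothetical.

```python
def mflops(fp_ops: float, et_s: float) -> float:
    """MFLOPS = (number of floating-point operations) / (ET x 10^6)."""
    return fp_ops / (et_s * 1e6)


# Hypothetical run: 30 million FP operations completed in 0.2 s.
print(mflops(30e6, 0.2))  # 150.0
```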

Dhrystones
Dhrystone is a general integer-performance benchmark, originally developed by Reinhold Weicker in 1984. It is a small program of fewer than 100 HLL statements and compiles to about 1 to 1.5 KB of code. The name is a play on the word Whetstone.

Disadvantages of using Whetstones and Dhrystones
Both Whetstones and Dhrystones are now considered obsolete for the following reasons: they are small enough to fit in cache, their instruction mix is obsolete, they are prone to compiler tricks, their results are difficult to reproduce, and their source code is uncontrolled.

SPEC
The System Performance Evaluation Cooperative (SPEC) was founded in October 1988 by Apollo, Hewlett-Packard, MIPS Computer Systems and Sun Microsystems. The latest version at the time of this lecture was SPEC CPU2000 (later superseded by SPEC CPU2006 and SPEC CPU2017).

SPEC
The standard SPEC benchmark suite includes a compiler, a Boolean minimization program, a spreadsheet program, and a number of other programs that stress arithmetic processing speed. It uses a simple metric, elapsed time, to measure the performance of competing machines, and machine-independent code is used to ensure a fair comparison.

Advantages
It makes results easy to publish. Each benchmark carries the same weight. The SPECratio is dimensionless. The result is not unduly influenced by long-running programs. It is relatively immune to performance variation on individual benchmarks. It provides a consistent and fair metric.
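Several of these advantages follow from how the score is formed. Each benchmark yields a dimensionless SPECratio (reference machine's elapsed time divided by the tested machine's elapsed time), and the ratios are combined with a geometric mean. The Python sketch below is a minimal illustration of that idea, assuming an unweighted geometric mean of the ratios. It does not model any additional scaling SPEC applies, and the elapsed times are hypothetical.

```python
import math


def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Dimensionless ratio: reference machine's time / tested machine's time."""
    return reference_time_s / measured_time_s


def geometric_mean(values):
    """n-th root of the product, computed via logarithms for numerical safety."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference, measured) elapsed times in seconds for three benchmarks.
times = [(1000.0, 100.0), (2000.0, 400.0), (500.0, 25.0)]
ratios = [spec_ratio(ref, t) for ref, t in times]
print(ratios, round(geometric_mean(ratios), 2))  # [10.0, 5.0, 20.0] 10.0
```

Because every ratio enters the mean with equal weight, a benchmark that takes 2000 s on the reference machine has no more influence on the result than one that takes 500 s.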
