Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Lecture 6 Performance Measurement and Improvement.

Similar presentations


Presentation on theme: "1 Lecture 6 Performance Measurement and Improvement."— Presentation transcript:

1

2 1 Lecture 6 Performance Measurement and Improvement

3 2 How to make the code faster Measurement and Profiling Hot Spots Practical Hints

4 3 Rationale for this unit This lecture is about making programs run fast. Usually speed is not the most important concern while writing a program. The professional programmer is usually most concerned with making a program that is easy to write, debug, and maintain. A programmer is not just coding.

5 4 Reason on simple program (1) A correct program, even if is slow, computes right answers faster than a program that is not. It is often better to use a simple but slow algorithm. A program that is finished computes right answers much faster than a program that is not. Fast programs often take much more time to develop, and they are useless until they are finished. Simple but fast program

6 5 Reason on simple program (2) Computers’ performance is double in speed every 18 months. Computer technology changes so fast that improvements in speed can often be obtained simply by waiting for the next generation of hardware. Speed improvements of less than a factor of two are barely noticeable to users in an interactive setting.

7 6 Procedure of developing a program A slow but correct Program Modify the program to make it faster

8 7 Measurement and Profiling First, how to measure program’s performance What to Measure (execution speed) Timing Mechanisms (use wall clock, such as your watch)

9 8 What to Measure (CPU time and Wall clock) The most common thing to measure is CPU time. CPU time is the time a process spends executing instructions. It does not count any time spent executing other programs or just waiting.

10 9 What to Measure (Wall clock) An alternative is to measure real time or "wall clock time“ This is the time an ordinary clock on the wall or a wrist watch shows. The difference between CPU time and wall time can give some indication of the time spent waiting for I/O. Wall time CPU time I/O time

11 10 CPU time It can be divided between user time, the time spent directly executing your program code, and system time, the time spent by the operating system on behalf of your program

12 11 Timing Mechanisms There are two ways to measure the timing behavior of a program. The most obvious is direct measurement with a timer (wall clock – difference between start and end times.) An alternative to using timers directly is to use statistical sampling. A timer periodically interrupts the program and records the program counter or increments a counter. (profiling)

13 12 High-Resolution on Pentium Systems Typical operating system clocks are not very precise because they rely on hardware to interrupt the processor every clock period. The operating system then increments a counter Intel Pentium processors (among others) have a very high-speed internal 64-bit counter that can be accessed by special instructions.

14 13 Profiling – to show the profile

15 14 System Monitoring - example

16 15 Principles - Performance The 80/20 Rule – It means 80% of the CPU time is spent in 20% of the program. In this case, you can have better performance by looking at this 20%. Amdahl's Law – for parallel processing, the performance is limited by sequential part of the program.

17 16 Explanation Suppose the program really spends 80% of its time in one spot, and suppose you can rewrite this spot to take a negligible mount of time. The program will now execute in 20% of its original time, meaning that it now runs 5 times as fast.

18 17 Example of 80/20: 10% on one module means 2% as a whole A module consists of 5 modules 20 ms 18 ms 20 ms

19 18 Example of 80/20: 10% on one means 5% as a whole A module consists of 5 modules 10 ms 50 ms 10 ms 45 ms 10 ms Conclusion: focus on module with more CPU time

20 19 Example – Before enhancement

21 20 Example – After enhancement Faster From 24222 to 7471

22 21 Example – a simple for loop #include void main() { for (int i = 0; i < 1000; i++) printf("The value is %d \n", i, i^2); }

23 22 Example – Result of a simple for loop – total time is 509 ms, print i, i^i

24 23 Example – Result of a simple for loop – total time is 533 ms, print i, i*i*i – 4.7% difference

25 24 Procedure (1) – setting

26 25 Procedure (2) – enable profiling

27 26 Procedure (3) – rebuild

28 27 Procedure (4) – run with profiling

29 28 Example – a simple while loop #include void main() { int i = 0; while (i < 1000) { printf("The value is %d \n", i, i^2); i++; }

30 29 Example – result in million second

31 30 Example with a sub-routine

32 31 Example with a sub-routine Main() subroutine

33 32 A program that can be used to determine Mega flop // This is matrix multiplication #include void main(){ float a[250][250], b[250][250], c[250][250]; int i, j, k; for (i = 0; i< 250; i++) for (j = 0; j < 250; j++) for (k =0; k <250; k++) c[i][j] += a[i][k] * b[k][j]; // matrix multiplication }

34 33 Performance is 349ms

35 34 Determination of Mega Flop The time it takes for my machine is 349ms. This program involves 250^3 steps including two floating point operations, an add and a multiply 250 x 250 x 250 = 15625000. The performance for this loop is 15625000/349ms = 15.625 x 10^6 /0.349 s = 44 MFLOPs (mega floating point operation). Note that for super computer, the value is about 1000 MFLOPs. You can try your computer at home to determine your machine’s performance.

36 35 Same output but change the program #include // this program uses a temporary location t // to store the value void main(){ float a[250][250], b[250][250], c[250][250]; int i, j, k; float r = 0.0; for (i = 0; i< 250; i++){ for (j = 0; j < 250; j++) { for (k =0; k <250; k++) { r += a[i][k] * b[k][j]; //this is matrix multiplication } c[i][j] = r; } } }

37 36 Same machine – 254ms, why? This is related to the cache memory effect, as the data is stored in cache. This will be explained later.

38 37 Summary It is better to write a simple but fast program. The procedure is to write a program that works, then makes it faster. There is a rule called 80/20 which means 80% of CPU time spends on 20% of program. You should focus on these 20%. To measure the performance – Profiling To determine which causes the delay.


Download ppt "1 Lecture 6 Performance Measurement and Improvement."

Similar presentations


Ads by Google