Performance and Code Tuning Overview


1 Performance and Code Tuning Overview
CPSC 315 – Programming Studio
Spring 2017

2 Is Performance Important?
Performance tends to improve with time
  HW improvements
Other things can be more important
  Accuracy
  Robustness
  Code readability
Worrying about it can cause problems
  “More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason – including blind stupidity.” – William A. Wulf

3 So Why Worry About It?
Sometimes code performance is critical
  Very large-scale problems
  Inefficiencies that make things seem infeasible
The gains in hardware improvement may be coming to an end
  Or, at least, need more tricks to take advantage of
“Performance Engineering” will likely be important in making long-term improvements

4 So, How Can We Improve Performance?
First, there are ways to improve performance that don’t involve code tuning
Before doing code tuning, you need to know what to tune
Finally, you can tune your code
  Carefully, measuring along the way

5 Performance Increases without Code Tuning
Lower your Standards/Requirements (!!)
  Asking for more than is needed leads to trouble
  Example: Return in 1 second
    Always? On average? 99% of the time?
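
The three readings of the "return in 1 second" requirement above can lead to different verdicts on the same data. The following is a minimal sketch, assuming a hypothetical set of measured response times (99 fast responses and one slow outlier); the function name `latency_summary` and the sample values are illustrative only.

```python
import statistics

def latency_summary(samples_s):
    """Summarize response times (in seconds) as mean, 99th percentile, and max."""
    ordered = sorted(samples_s)
    # Nearest-rank 99th percentile: ceil(0.99 * n), done in integer arithmetic.
    rank = (99 * len(ordered) + 99) // 100
    return {
        "mean": statistics.fmean(ordered),
        "p99": ordered[rank - 1],
        "max": ordered[-1],
    }

# Hypothetical measurements: 99 fast responses and one slow outlier.
samples = [0.2] * 99 + [3.0]
summary = latency_summary(samples)
for name, value in summary.items():
    verdict = "meets" if value <= 1.0 else "misses"
    print(f"{name}: {value:.3f} s ({verdict} the 1-second target)")
```

With these made-up samples, the "on average" and "99% of the time" readings are satisfied while the "always" reading is not, so the stricter wording would demand tuning work that the looser wording does not.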

6 Performance Increases without Code Tuning
Lower your Standards/Requirements
High Level Design
  The overall program structure can play a huge role

7 Performance Increases without Code Tuning
Lower your Standards/Requirements
High Level Design
Class/Routine Design
  Algorithms used have real differences
  Can have largest effect, especially asymptotically
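
As an illustration of how algorithm choice matters asymptotically, the sketch below counts the steps taken by a linear scan and by a binary search on the same sorted data. The two search routines are standard textbook versions chosen for this example, not something taken from the slides.

```python
def linear_search_steps(items, target):
    """Return how many elements a linear scan examines before finding target."""
    for steps, value in enumerate(items, start=1):
        if value == target:
            return steps
    return len(items)

def binary_search_steps(items, target):
    """Return how many probes a binary search makes on sorted items."""
    lo, hi, steps = 0, len(items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return steps
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

for n in (1_000, 1_000_000):
    data = list(range(n))
    last = n - 1  # the last element is the worst case for the linear scan
    print(n, linear_search_steps(data, last), binary_search_steps(data, last))
```

Growing the input by a factor of 1,000 multiplies the linear scan's work by 1,000 but adds only about ten probes to the binary search, which is a gap no small-scale tuning of the linear loop could close.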

8 Performance Increases without Code Tuning
Lower your Standards/Requirements
High Level Design
Class/Routine Design
Interactions with Operating System
  Hidden OS calls within libraries – their performance affects overall code

9 Performance Increases without Code Tuning
Lower your Standards/Requirements
High Level Design
Class/Routine Design
Interactions with Operating System
Compiler Optimizations
  “Automatic” optimization
  Getting better and better, but not perfect
  Different compilers work better/worse

10 Performance Increases without Code Tuning
Lower your Standards/Requirements
High Level Design
Class/Routine Design
Interactions with Operating System
Compiler Optimizations
Upgrade Hardware
  Straightforward, if possible…

11 Code Profiling
Determine where code is spending time
  No sense in optimizing where no time is spent
Provide measurement basis
  Determine whether “improvement” really improved anything
  Need to take precise measurements

12 Profiling Techniques
Profiler – compile with profiling options, and run through profiler
  Gets list of functions/routines, and amount of time spent in each
Use system timer
  Less ideal
  Might need test harness for functions
Use system-supported real time
  Only slightly better than wristwatch…
Graph results for understanding
  Multiple profile results: see how profile changes for different input types
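
The first two techniques above might look like this in Python, using the standard-library `cProfile`, `pstats`, and `time.perf_counter`. This is a sketch only: the `workload` function and its two helpers are hypothetical stand-ins for real code, and compiled languages use different tools (for example, a compiler's profiling options).

```python
import cProfile
import io
import pstats
import time

def slow_sum_of_squares(n):
    """Deliberately heavy routine so it stands out in the profile."""
    return sum(i * i for i in range(n))

def fast_constant(n):
    """Trivial routine that should account for almost no time."""
    return n

def workload():
    slow_sum_of_squares(200_000)
    fast_constant(200_000)

# 1) Profiler: per-function call counts and time spent.
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())

# 2) Timer: a single elapsed time for the whole workload, with no breakdown.
start = time.perf_counter()
workload()
elapsed = time.perf_counter() - start
print(f"workload took {elapsed:.4f} s")
```

The profiler report shows which routine dominates, while the timer only says how long the whole run took, which is why the slide calls the timer approach less ideal.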

13 What Is Tuning?
Making small-scale adjustments to already-correct code in order to improve performance
  After code is written and working
  Affects only small-scale: a few lines, or at most one routine
  Examples: adjusting details of loops, expressions
Code tuning can sometimes improve code efficiency tremendously
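
One example of "adjusting details of loops" is moving a computation that never changes out of the loop body. The sketch below is an assumed illustration (the function names and data are made up); it checks that the tuned version returns the same results and then measures both, in keeping with the advice to measure along the way.

```python
import math
import timeit

def scale_untuned(values, angle):
    """Recompute the loop-invariant factor on every iteration."""
    out = []
    for v in values:
        out.append(v * (math.sin(angle) / math.sqrt(len(values))))
    return out

def scale_tuned(values, angle):
    """Hoist the invariant factor out of the loop and compute it once."""
    factor = math.sin(angle) / math.sqrt(len(values))
    return [v * factor for v in values]

data = [float(i) for i in range(10_000)]
# The tuned routine must still be correct before its speed matters.
assert scale_untuned(data, 0.3) == scale_tuned(data, 0.3)

for fn in (scale_untuned, scale_tuned):
    seconds = min(timeit.repeat(lambda: fn(data, 0.3), number=20, repeat=3))
    print(f"{fn.__name__}: {seconds:.4f} s for 20 calls")
```

The change touches only a few lines of one routine, and the printed timings (which will vary by machine) are what decide whether the adjustment was worth keeping.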

14 What Tuning is Not
Reducing lines of code
  Not an indicator of efficient code
A guess at what might improve things
  Know what you’re trying, and measure results
Optimizing as you go
  Wait until finished, then go back to improve
  Optimizing while programming is often a waste
A “first choice” for improvement
  Worry about other details/design first

15 Common Inefficiencies
Unnecessary I/O operations
  File access especially slow
Paging/Memory issues
  Can vary by system
System Calls
Interpreted Languages
  C/C++/VB tend to be “best”
  Java about 1.5 times slower
  PHP/Python about 100 times slower

16 Operation Costs
Different operations take different times
  Integer division longer than other ops
  Transcendental and other math functions (sin, sqrt, etc.) even longer
Knowing this can help when tuning
Vary by language
  In C++, private routine calls take about twice the time of an integer op, and in Java about half the time.
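
Relative operation costs are best measured rather than assumed. The sketch below uses the standard-library `timeit` module to time a few of the operations named above; the operand values and the `measure` helper are arbitrary choices for illustration. In an interpreted language such as Python, interpreter overhead may swamp the hardware-level differences the slide describes, so the ratios printed here need not match those of C++ or Java.

```python
import timeit

# Operands are bound to names so the interpreter cannot fold them into constants.
SETUP = "import math; a = 12345; b = 67; x = 0.5"
STATEMENTS = {
    "integer add": "a + b",
    "integer divide": "a // b",
    "sin": "math.sin(x)",
    "sqrt": "math.sqrt(x)",
}

def measure(number=200_000, repeat=3):
    """Return the best observed cost per operation, in nanoseconds."""
    results = {}
    for label, stmt in STATEMENTS.items():
        best = min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))
        results[label] = best / number * 1e9
    return results

costs = measure()
for label, ns in costs.items():
    print(f"{label:15s} {ns:8.1f} ns per operation")
```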

17 An Example
Stealing an example from Charles Leiserson’s Distinguished Lecture on Performance Engineering, February 8, 2017
  Joint work with Bradley Kuszmaul and Tao Schardl
Performance Engineering a 4K x 4K Matrix Multiplication
Machine: Dual-Socket Intel Xeon E v3 (Haswell)
  18 cores
  2.9 GHz
  60 GB of DRAM

18 Performance Engineering: 4Kx4K Matrix Multiplication
Implementation                                                  Time (seconds)
Straightforward Python Implementation                           ??

19 Performance Engineering: 4Kx4K Matrix Multiplication
Implementation                                                  Time (seconds)
Straightforward Python Implementation                           25,552.48 (>7 hours!)
????
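
The lecture's actual code is not reproduced in these slides. As an assumption, a "straightforward" Python implementation is probably close to the triple nested loop below, which works on plain lists of lists. The sketch times a small 64 x 64 case only; a 4K x 4K product needs 4096^3, roughly 6.9 x 10^10, multiply-add steps.

```python
import random
import time

def matmul_naive(a, b):
    """Multiply matrices stored as lists of rows, using three nested loops."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            total = 0.0
            for k in range(m):
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c

n = 64  # deliberately small; the slides use n = 4096
a = [[random.random() for _ in range(n)] for _ in range(n)]
b = [[random.random() for _ in range(n)] for _ in range(n)]

start = time.perf_counter()
c = matmul_naive(a, b)
elapsed = time.perf_counter() - start
print(f"{n}x{n} multiply took {elapsed:.3f} s")
```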

20 Performance Engineering: 4Kx4K Matrix Multiplication
Implementation                                                  Time (seconds)
Straightforward Python Implementation                           25,552.48
Straightforward Java Implementation                             ?

21 Performance Engineering: 4Kx4K Matrix Multiplication
Implementation                                                  Time (seconds)
Straightforward Python Implementation                           25,552.48
Straightforward Java Implementation                             (almost 40 min)

22 Performance Engineering: 4Kx4K Matrix Multiplication
Implementation                                                  Time (seconds)
Straightforward Python Implementation                           25,552.48
Straightforward Java Implementation
Straightforward C Implementation                                542.67

23 Performance Engineering: 4Kx4K Matrix Multiplication
Implementation                                                  Time (seconds)
Straightforward Python Implementation                           25,552.48
Straightforward Java Implementation
Straightforward C Implementation                                542.67
Parallel Loops                                                  69.80

24 Performance Engineering: 4Kx4K Matrix Multiplication
Implementation                                                  Time (seconds)
Straightforward Python Implementation                           25,552.48
Straightforward Java Implementation
Straightforward C Implementation                                542.67
Parallel Loops                                                  69.80
Parallel Divide and Conquer                                     3.80

25 Performance Engineering: 4Kx4K Matrix Multiplication
Implementation                                                  Time (seconds)
Straightforward Python Implementation                           25,552.48
Straightforward Java Implementation
Straightforward C Implementation                                542.67
Parallel Loops                                                  69.80
Parallel Divide and Conquer                                     3.80
add Vectorization (streaming the parallel operations)           1.10

26 Performance Engineering: 4Kx4K Matrix Multiplication
Implementation                                                  Time (seconds)
Straightforward Python Implementation                           25,552.48
Straightforward Java Implementation
Straightforward C Implementation                                542.67
Parallel Loops                                                  69.80
Parallel Divide and Conquer                                     3.80
add Vectorization (streaming the parallel operations)           1.10
add AVX intrinsics (processor commands for vector operations)   0.41

27 Performance Engineering: 4Kx4K Matrix Multiplication
Implementation                                                  Time (seconds)
Straightforward Python Implementation                           25,552.48
Straightforward Java Implementation
Straightforward C Implementation                                542.67
Parallel Loops                                                  69.80
Parallel Divide and Conquer                                     3.80
add Vectorization (streaming the parallel operations)           1.10
add AVX intrinsics (processor commands for vector operations)   0.41
Strassen’s algorithm                                            0.38

28 The Lesson from this Example
Code “performance engineering” can make incredible improvements in performance.
  67,243 times faster in this case!
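
The 67,243 figure can be checked from the timings in the table: it is the Python time divided by the final Strassen time. The sketch below recomputes it, along with the speedup contributed by each step. The Java row is left out because its time does not appear in this transcript.

```python
# Timings in seconds, copied from the slides (Java omitted: value not given).
timings_s = [
    ("Straightforward Python", 25552.48),
    ("Straightforward C", 542.67),
    ("Parallel Loops", 69.80),
    ("Parallel Divide and Conquer", 3.80),
    ("add Vectorization", 1.10),
    ("add AVX intrinsics", 0.41),
    ("Strassen's algorithm", 0.38),
]

baseline = timings_s[0][1]
previous = baseline
for name, seconds in timings_s:
    step = previous / seconds      # gain over the previous row
    total = baseline / seconds     # gain over the Python baseline
    print(f"{name:<30} {seconds:>10.2f} s  step x{step:6.2f}  total x{total:9.0f}")
    previous = seconds

overall = baseline / timings_s[-1][1]
print(f"overall speedup: about {overall:,.0f} times")
```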

29 Remember
Code readability/maintainability/etc. is usually more important than efficiency
Always start with well-written code, and only tune at the end
Measure!

