Visual C++ 2005 New Optimizations Ayman Shoukry Program Manager Visual C++ Microsoft Corporation.


1 Visual C++ 2005 New Optimizations Ayman Shoukry Program Manager Visual C++ Microsoft Corporation

2 How can your application run faster? ► Maximize optimization for each file. ► Whole Program Optimization (WPO) goes beyond individual files. ► Profile Guided Optimization (PGO) specializes optimizations specifically for your application. ► New Floating Point Model. ► OpenMP ► 64bit Code Generation.

3 Maximum Optimization for Each File ► Compiler optimizes each source code file to get the best runtime performance  The only type of optimization available in Visual C++ 6 ► Visual C++ 2005 has better optimization algorithms  Specialized support for newer processors such as Pentium 4  Improved speed and better precision of floating point operations  New optimization techniques like loop unrolling

4 Whole Program Optimization ► Typically Visual C++ will optimize programs by generating code for object files separately ► Introducing whole program optimization  First introduced with Visual C++ 2002 and has since improved  Compiler and linker set with new options (/GL and /LTCG)  Compiler has freedom to do additional optimizations ► Cross-module inlining ► Custom calling conventions  Visual C++ 2005 supports this on all platforms  Whole program optimization is widely used for Microsoft products.

5 Profile Guided Optimization ► Static analysis leaves many open optimization questions for the compiler, leading to conservative optimizations ► Visual C++ programs can be tuned for expected user scenarios by collecting information from the running application ► Introducing profile guided optimization  Optimizes code based on how customers actually use the program  Runs optimizations at link time, like whole program optimization  Available in Visual Studio 2005  Widely adopted at Microsoft
    if (p != NULL) { /* Perform action with p */ } else { /* Error code */ }
Is it common for p to be NULL? If it is not common for p to be NULL, the error code should be grouped with other infrequently used code.

6 PGO: Instrumentation ► We instrument with “probes” inserted into the code ► Two main types of probes  Value probes ► Used to construct histogram of values  Count (simple/entry) probes ► Used to count number of times a path is taken ► We try to insert the minimum number of probes to get full coverage  Minimizes the cost of instrumentation
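As a rough illustration of the two probe kinds on the slide, here is a minimal hand-written sketch. It is not the instrumentation that Visual C++ actually emits; `CountProbe`, `ValueProbe`, `process`, and the global names are hypothetical. It also shows one way fewer probes can still give full coverage: the non-null path has no probe because its count can be derived from the other two.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical hand-written stand-ins for compiler-inserted probes.

// Count probe: counts the number of times a path is taken.
struct CountProbe {
    std::size_t hits = 0;
    void fire() { ++hits; }
};

// Value probe: builds a histogram of the values seen at one program point.
struct ValueProbe {
    std::map<int, std::size_t> histogram;  // value -> times seen
    void record(int value) { ++histogram[value]; }
};

CountProbe g_entry;      // times process() was entered
CountProbe g_null_path;  // times the p == nullptr branch ran
ValueProbe g_values;     // histogram of *p on the non-null path

// The non-null path carries no count probe: its count is derived.
std::size_t nonnull_count() {
    assert(g_entry.hits >= g_null_path.hits);
    return g_entry.hits - g_null_path.hits;
}

int process(const int* p) {
    g_entry.fire();
    if (p == nullptr) {
        g_null_path.fire();
        return -1;  // error path
    }
    g_values.record(*p);
    return *p * 2;
}
```

After running representative scenarios, the counters answer the earlier question of whether a null `p` is common, and the histogram shows which values dominate.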

7 PGO Optimizations ► Switch expansion ► Better inlining decisions ► Cold code separation ► Virtual call speculation ► Partial inlining

8 Profile Guided Optimization ► Compile the source with /GL and optimizations on (e.g. /O2) to produce object files ► Link the object files with /LTCG:PGI to produce an instrumented image ► Run the instrumented image on scenarios to output profile data ► Link the object files and the profile data with /LTCG:PGO to produce the optimized image

9 PGO: Inlining Sample ► Profile Guided uses call graph path profiling. [Figure: call graph with nodes a, foo, bat, bar, baz]

10 PGO: Inlining Sample (Cont) ► Profile Guided uses call graph path profiling. [Figure: call graph with nodes a, foo, bat, bar, baz; edges annotated with call counts]

11 PGO – Inlining Sample (cont) ► Inlining decisions are made at each call site. [Figure: call graph with nodes a, foo, bat, bar, baz; each call site annotated with its own call count]

12 PGO – Switch Expansion ► Most frequent values are pulled out.
Before:
    // 90% of the time i = 10;
    switch (i) { case 1: … case 2: … case 3: … default: … }
After:
    if (i == 10) goto default;
    switch (i) { case 1: … case 2: … case 3: … default: … }
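The transformation can be sketched by hand as follows. This is only an illustration under assumptions: the case bodies and `handle_default` are hypothetical, and in a real build the linker does this from profile data rather than the programmer doing it in source.

```cpp
#include <cassert>

// Hypothetical default handler, standing in for the elided default case.
int handle_default(int i) { return -i; }

// Before: every value goes through the switch dispatch.
int dispatch(int i) {
    switch (i) {
        case 1:  return 100;
        case 2:  return 200;
        case 3:  return 300;
        default: return handle_default(i);
    }
}

// After: the value the profile found most frequent (assumed to be 10)
// is tested first and jumps straight to the default handling.
int dispatch_expanded(int i) {
    if (i == 10) return handle_default(i);
    switch (i) {
        case 1:  return 100;
        case 2:  return 200;
        case 3:  return 300;
        default: return handle_default(i);
    }
}

// The expansion must not change behavior for any input.
void check_equivalence(int i) {
    assert(dispatch(i) == dispatch_expanded(i));
}
```

Both versions return the same result for every input; only the order of the tests changes, so the hot value pays for one comparison instead of the full dispatch.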

13 PGO – Code Separation ► Basic blocks are ordered so that the most frequent path falls through. [Figure: control-flow graph of basic blocks A, B, C, D with edge counts 100 and 10, shown in default layout and in optimized layout]

14 PGO – Virtual Call Speculation ► According to the profiles, the type of object A in function Func was almost always Foo.
    class Base { … virtual void call(); };
    class Foo : Base { … void call(); };
    class Bar : Base { … void call(); };
Before:
    void Func(Base *A) {
        … while (true) { … A->call(); … }
    }
After (pseudocode):
    void Func(Base *A) {
        … while (true) { …
            if (type(A) == Foo) { /* inline of A->call(); */ }
            else A->call();
        … }
    }
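A compilable sketch of the same idea is shown below. It assumes `typeid` as the type test, which is only a source-level stand-in for whatever check the compiler really generates; the `int` return values and the `iterations` parameter are added here purely so the result can be observed.

```cpp
#include <cassert>
#include <typeinfo>

struct Base {
    virtual ~Base() = default;
    virtual int call() { return 0; }
};
struct Foo : Base { int call() override { return 1; } };
struct Bar : Base { int call() override { return 2; } };

// Before: every iteration performs an indirect virtual call.
int func(Base* a, int iterations) {
    assert(a != nullptr);
    int total = 0;
    for (int n = 0; n < iterations; ++n) total += a->call();
    return total;
}

// After: speculate that *a is a Foo. The qualified call Foo::call() is
// resolved statically, so the compiler is free to inline it.
int func_speculated(Base* a, int iterations) {
    assert(a != nullptr);
    int total = 0;
    for (int n = 0; n < iterations; ++n) {
        if (typeid(*a) == typeid(Foo)) {
            total += static_cast<Foo*>(a)->Foo::call();  // direct call
        } else {
            total += a->call();  // fallback: normal virtual dispatch
        }
    }
    return total;
}
```

The fallback branch keeps the code correct when the guess is wrong; the speculation only pays off when the profile shows one type dominating.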

15 PGO – Partial Inlining [Figure: flow graph of the callee: Basic Block 1 → Cond → Cold Code or Hot Code → More Code]

16 PGO – Partial Inlining (cont) ► Hot path is inlined, but NOT the cold path. [Figure: flow graph of the callee: Basic Block 1 → Cond → Cold Code or Hot Code → More Code]
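The effect of partial inlining can be imitated by hand, as in the sketch below. All function names and the arithmetic are hypothetical; the point is only the shape: the hot path of the callee is copied into the caller, while the cold case still goes through an out-of-line call.

```cpp
#include <cassert>

// Cold code: rarely executed and comparatively large.
int cold_path(int x) {
    assert(x >= 10);  // only reached when the hot condition fails
    int r = 0;
    for (int i = 0; i < x; ++i) r += i % 7;
    return r;
}

// Original callee: Basic Block 1 -> Cond -> (hot | cold) -> more code.
int callee(int x) {
    int y = x + 1;                            // basic block 1
    int r = (x < 10) ? y * 2 : cold_path(x);  // cond: hot code or cold code
    return r + 3;                             // more code
}

// Caller before the optimization: always calls out of line.
int caller(int x) { return callee(x) * 2; }

// Caller after hand-written partial inlining: the hot path is expanded
// in place, the cold case still calls the original callee.
int caller_partial(int x) {
    if (x < 10) {
        int y = x + 1;           // basic block 1
        return (y * 2 + 3) * 2;  // hot code + more code, inlined
    }
    return callee(x) * 2;        // cold case stays out of line
}
```

This keeps the caller small, since the cold code is not duplicated into it, while the common case avoids the call overhead.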

17 Demo Optimizing applications with VC++ 2005

18 New Floating Point Model ► /Op made your code run slow  No intermediate switch ► New Floating Point Model  /fp:fast  /fp:precise (default)  /fp:strict  /fp:except

19 /fp:precise ► The default floating point switch ► Performance and Precision ► IEEE Conformant ► Round to the appropriate precision  At assignments, casts and function calls

20 /fp:fast ► When performance matters most ► You know your application does simple floating point operations ► What can /fp:fast do?  Association  Distribution  Factoring inverse  Scalar reduction  Copy propagation  And others …

21 /fp:except ► Reliable floating point exceptions ► Thrown and not thrown when expected  Faults and traps, when reliable, should occur at the line that causes the exception  FWAITs on x86 might be added ► Cannot be used with /fp:fast or in managed code

22 /fp:strict ► The strictest FP option  Turns off contractions  Assumes floating point control word can change or that the user will examine flags ► /fp:except is implied ► Low double digit percent slowdown versus /fp:fast

23 What is the output?
    #include <stdio.h>
    int main() {
        double x, y, z;
        double sum;
        x = 1e20; y = -1e20; z = 10.0;
        sum = x + y + z;
        printf("sum=%f\n", sum);
    }
/fp:fast /O2 = 0.000
/fp:strict /O2 = 10.0
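The two results correspond to two evaluation orders of the same sum. The sketch below writes both orders out explicitly; the function names are hypothetical, and the `volatile` temporaries are an assumption added here to force each intermediate to be rounded to `double`. Which order a given compiler picks under /fp:fast is up to the compiler; this only shows why reassociation can change the answer.

```cpp
#include <cassert>
#include <cstdio>

// Source order: (x + y) + z
double sum_left_to_right(double x, double y, double z) {
    volatile double t = x + y;  // 1e20 + -1e20 is exactly 0
    return t + z;               // 0 + 10 = 10
}

// Reassociated order: x + (y + z)
double sum_reassociated(double x, double y, double z) {
    volatile double t = y + z;  // 10 is lost: -1e20 + 10 rounds to -1e20
    return x + t;               // 1e20 + -1e20 = 0
}

void demo() {
    double a = sum_left_to_right(1e20, -1e20, 10.0);
    double b = sum_reassociated(1e20, -1e20, 10.0);
    assert(a == 10.0);
    assert(b == 0.0);
    std::printf("left-to-right=%f reassociated=%f\n", a, b);
}
```

Near 1e20 adjacent `double` values are thousands apart, so adding 10 to -1e20 has no effect, which is why the reassociated order prints zero.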

24 OpenMP  A specification for writing multithreaded programs  It consists of a set of simple #pragmas and runtime routines  Makes it very easy to parallelize loop-based code  Helps with load balancing, synchronization, etc…  In Visual Studio, only available in C++

25 OpenMP Parallelization ► Can parallelize loops and straight-line code ► Includes synchronization constructs
    void test(int first, int last) {
        #pragma omp parallel for
        for (int i = first; i <= last; ++i) {
            a[i] = b[i] + c[i];
        }
    }
[Figure: with first = 1 and last = 1000, the iterations are split into four ranges: 1 ≤ i ≤ 250, 251 ≤ i ≤ 500, 501 ≤ i ≤ 750, 751 ≤ i ≤ 1000]
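The slide's loop relies on arrays a, b, and c that are declared elsewhere. A self-contained variant might look like the sketch below, assuming `std::vector<int>` storage and a hypothetical function name `add_range`. If OpenMP support is not enabled in the compiler, the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Computes a[i] = b[i] + c[i] for first <= i <= last.
// Each iteration writes a distinct element, so iterations are independent
// and can safely be divided among threads.
void add_range(std::vector<int>& a,
               const std::vector<int>& b,
               const std::vector<int>& c,
               int first, int last) {
    assert(first >= 0 && last < static_cast<int>(a.size()));
    assert(b.size() == a.size() && c.size() == a.size());

    #pragma omp parallel for
    for (int i = first; i <= last; ++i) {
        a[i] = b[i] + c[i];
    }
}
```

How the iterations are divided among threads depends on the runtime and the number of threads, so an even four-way split is one possible outcome rather than a guarantee.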

26 64bit Compiler in VC2005 ► 64bit Compiler Cross Tools  Compiler is 32bit but resulting image is 64bit ► 64bit Compiler Native Tools  Compiler and resulting image are 64bit binaries. ► All previous optimizations apply for 64bit as well.

27 Resources ► Visual C++ Dev Center  http://msdn.microsoft.com/visualc  This is the place to go for all our news and whitepapers  Also VC2005 specific forums at http://forums.microsoft.com ► Myself  http://blogs.msdn.com/aymans

