Intel Compilers 9.x on the Intel® Core Duo™ Processor Windows version Intel Software College.


1 Intel Compilers 9.x on the Intel® Core Duo™ Processor Windows version Intel Software College

2 Objectives
At the successful completion of this module, you will be able to:
  Use key compiler optimization switches
  Optimize software for the Architecture
  Enhance performance with vectorization and other techniques
Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

3 Agenda
  Introduction
  Compiler Switches
  Dual Core
  Vectorization

4 Key to optimizing: Intel® Core™ Duo
Exploiting Architectural Power requires Sophisticated Compilers
Optimal use of:
  Registers & functional units
  Dual-Core/Multi-processor
  SSE instructions
  Cache architecture

5 Agenda
  Introduction
  Compiler Switches
    Intel® C++ compiler
  Dual Core
  Vectorization

6 Activity 1 - raytrace2: Initial Compilation
Set up environment and compile with both Microsoft* Visual C++.NET (MSVC*) and Intel® C++ Compiler (icl)

7 General Optimizations
Mac*/Linux*   Windows*   Description
-O0           /Od        Disables optimizations
-g            /Zi        Creates symbols
-O1           /O1        Optimize for Binary Size: Server Code
-O2           /O2        Optimizes for speed (default)
-O3           /O3        Optimize for Data Cache: Loopy Floating Point Code

8 Activity 2 - raytrace2: O3 Compilation
Use Intel compiler’s High Level Optimizer (-O3) for loop centric codes

9 Multi-pass Optimization: Interprocedural Optimizations (IPO)
ip: Enables interprocedural optimizations for single file compilation
ipo: Enables interprocedural optimizations across files
  Can inline functions in separate files
  Enhances optimization when used in combination with other compiler features
Mac*/Linux*   Windows*
-ip           /Qip
-ipo          /Qipo

10 Multi-pass Optimization - IPO Usage: Two-Step Process
Pass 1: Compiling (produces virtual .o files)
  Mac*/Linux*   icc -c -ipo main.c func1.c func2.c
  Windows*      icl -c /Qipo main.c func1.c func2.c
Pass 2: Linking (produces the executable)
  Mac*/Linux*   icc -ipo main.o func1.o func2.o
  Windows*      icl /Qipo main.o func1.o func2.o

11 Activity 3 - raytrace2: IPO Compilation
Use Intel compiler’s Inter-procedural Optimization (-Qipo)

12 Profile Guided Optimizations (PGO)
Use execution-time feedback to guide many other compiler optimizations
Helps I-cache, paging, branch-prediction
Enabled optimizations:
  Basic block ordering
  Better register allocation
  Better decision of functions to inline
  Function ordering
  Switch-statement optimization
  Better vectorization decisions

13 Multi-pass Optimization PGO: Three-Step Process
Step 1: Instrumented Compilation (produces the instrumented executable)
  Mac*/Linux*   icc -prof_gen[x] prog.c
  Windows*      icl -Qprof_gen[x] prog.c
Step 2: Instrumented Execution (produces a DYN file containing dynamic info: .dyn)
  Run program on a typical dataset
Step 3: Feedback Compilation (uses the merged DYN summary file: .dpi)
  Mac*/Linux*   icc -prof_use prog.c
  Windows*      icl -Qprof_use prog.c
  Delete old dyn files if you do not want the info included

14 Activity 4 - raytrace2: PGO Compilation
Use Intel compiler’s Profile-guided Optimization

15 Agenda
  Introduction
  Compiler Switches
  Dual Core
    Auto Parallelization
    OpenMP Threading
    Diagnostics
  Vectorization

16 Auto-parallelization
Auto-parallelization: Automatic threading of loops without having to manually insert OpenMP* directives.
Compiler can identify “easy” candidates for parallelization, but large applications are difficult to analyze.
Mac*/Linux*       Windows*
-parallel         /Qparallel
-par_report[n]    /Qpar_report[n]

17 OpenMP* Threading Technology
Pragma based approach to parallelism
Usage:
  OpenMP switches: -openmp : /Qopenmp
  OpenMP reports: -openmp-report : /Qopenmp-report

    #pragma omp parallel for
    for (i=0;i<MAX;i++)
        A[i]= c*A[i] + B[i];
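
As a minimal sketch of the pragma on this slide, the loop can be wrapped in a function. The name `scale_add` and its parameters are assumptions for illustration and do not come from the course material. If the OpenMP switch is not given, the pragma is ignored and the loop runs serially with the same result.

```c
/* Hypothetical helper: A[i] = c*A[i] + B[i] for n elements.
   Each iteration touches only its own A[i] and B[i], so the
   iterations are independent and can be divided among threads. */
void scale_add(double *A, const double *B, double c, int n)
{
    int i;
#pragma omp parallel for
    for (i = 0; i < n; i++)
        A[i] = c * A[i] + B[i];
}
```

Calling `scale_add(A, B, 2.0, 4)` on small arrays is an easy way to check that the threaded and serial builds agree.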

18 OpenMP: Workqueuing Extension Example
Intel Compiler’s Workqueuing extension
Create a queue of tasks. Works on:
  Recursive functions
  Linked lists, etc.

    #pragma intel omp parallel taskq shared(p)
    {
        while (p != NULL) {
            #pragma intel omp task captureprivate(p)
            do_work1(p);
            p = p->next;
        }
    }
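
The same linked-list pattern can also be sketched with the standard OpenMP `task` construct (OpenMP 3.0 and later). This is a swapped-in technique, not the Intel workqueuing extension shown on the slide, and it needs a compiler that supports OpenMP tasks. The `struct node` layout, the body of `do_work1`, and the name `process_list` are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical list node; not from the course material. */
struct node {
    int value;
    struct node *next;
};

/* Hypothetical per-node work: doubles the stored value. */
static void do_work1(struct node *p)
{
    p->value *= 2;
}

/* One thread walks the list and queues one task per node; the other
   threads in the team execute the queued tasks. Without OpenMP the
   pragmas are ignored and the list is processed serially. */
void process_list(struct node *head)
{
#pragma omp parallel
    {
#pragma omp single
        {
            struct node *p;
            for (p = head; p != NULL; p = p->next) {
#pragma omp task firstprivate(p)
                do_work1(p);
            }
        }
    }
}
```

Each task gets its own copy of `p` through `firstprivate`, which plays the role that `captureprivate(p)` plays in the slide's version.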

19 Parallel Diagnostics
Source Instrumentation for Intel Thread Checker
Allows thread checker to diagnose threading correctness bugs
To use tcheck/Qtcheck you must have Intel Thread Checker installed
See thread checker documentation: http://www.intel.com/support/performancetools/sb/CS-009681.htm
Linux* (no support for Mac*)   Windows*
-tcheck                        /Qtcheck

20 Agenda
  Introduction
  Compiler Switches
  Dual Core
  Vectorization
    SSE & Vectorization
    Vectorization Reports
    Explanations of a few specific vectorization inhibitors

21 SIMD – SSE, SSE2, SSE3 Support
[Figure: packed data types: 16x bytes, 8x words, 4x dwords, 2x qwords, 1x dqword, 4x floats, 2x doubles; labeled MMX*, SSE, SSE2, SSE3]
* MMX actually used the x87 Floating Point Registers - SSE, SSE2, and SSE3 use the new SSE registers

22 SSE3 Instructions
Use                          Instructions
SIMD FP using AOS format*    HADDPD, HSUBPD, HADDPS, HSUBPS
Thread Synchronization       MONITOR, MWAIT
Video encoding               LDDQU
Complex arithmetic           ADDSUBPD, ADDSUBPS, MOVDDUP, MOVSHDUP, MOVSLDUP
FP to integer conversions    FISTTP
* Also benefits Complex and Vectorization

23 Using SSE3 - Your Task: Convert This…
    for (i=0;i<=MAX;i++)
        c[i]=a[i]+b[i];
[Figure: 128-bit registers with one element each: A[0] + B[0] = C[0], A[1] + B[1] = C[1]; the rest of each register is not used]

24 … Into This …
    for (i=0;i<=MAX;i++)
        c[i]=a[i]+b[i];
[Figure: 128-bit registers with two elements each: (A[1], A[0]) + (B[1], B[0]) = (C[1], C[0]) and (A[3], A[2]) + (B[3], B[2]) = (C[3], C[2])]
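
What the vectorizer does for this loop can be sketched by hand, assuming the arrays hold doubles, as the two-elements-per-register diagram suggests. The sketch uses SSE2 intrinsics from `<emmintrin.h>` (a packed double add is an SSE2 operation), with a scalar fallback for targets without SSE2. The name `add_arrays`, the `HAVE_SSE2` macro, and the explicit length parameter `n` are assumptions for illustration.

```c
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Hypothetical helper: c[i] = a[i] + b[i] for n doubles. */
void add_arrays(const double *a, const double *b, double *c, int n)
{
    int i = 0;
#ifdef HAVE_SSE2
    /* Two doubles fill one 128-bit register, so each pass handles
       elements i and i+1 with a single packed add. */
    for (; i + 1 < n; i += 2) {
        __m128d va = _mm_loadu_pd(&a[i]);
        __m128d vb = _mm_loadu_pd(&b[i]);
        _mm_storeu_pd(&c[i], _mm_add_pd(va, vb));
    }
#endif
    /* Scalar loop: handles a leftover element, or the whole array
       when SSE2 is not available. */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

The unaligned load and store forms are used so the sketch makes no assumption about array alignment; a compiler that knows the alignment may choose the aligned forms.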

25 Compiler Based Vectorization: Processor Specific
Use W: Generate instructions and optimize for Intel® Pentium® 4 compatible processors including MMX, SSE and SSE2.
  Mac*       Does not apply
  Linux*     -xW
  Windows*   /QxW
Use P: Generate instructions and optimize for Intel® processors with SSE3 capability including Core Duo. These processors support SSE3 as well as MMX, SSE and SSE2.
  Mac*       Vectorization occurs by default
  Linux*     -xP, -axP
  Windows*   /QxP, /QaxP

26 Compiler Based Vectorization: Automatic Processor Dispatch – ax[?]
Single executable
  Optimized for Intel® Core Duo processors and generic code that runs on all IA32 processors.
For each target processor it uses:
  Processor-specific instructions
  Vectorization
Low overhead
  Some increase in code size

27 Activity 5 – raytrace2: Vectorization
Use Intel compiler’s Vectorization optimization (-QxP)

28 Why Loops Don’t Vectorize
Independence: Loop iterations generally must be independent
Some relevant qualifiers:
  Some dependent loops can be vectorized.
  Most function calls cannot be vectorized.
  Some conditional branches prevent vectorization.
  Loops must be countable.
  Outer loop of nest cannot be vectorized.
  Mixed data types cannot be vectorized.

29 Why Didn’t My Loop Vectorize?
Macintosh*/Linux*   Windows*
-vec_reportn        -Qvec_reportn
Set diagnostic level dumped to stdout
  n=0: No diagnostic information
  n=1: (Default) Loops successfully vectorized
  n=2: Loops not vectorized – and the reason why not
  n=3: Adds dependency information
  n=4: Reports only non-vectorized loops
  n=5: Reports only non-vectorized loops and adds dependency info

30 Why Loops Don’t Vectorize
  “Existence of vector dependence”
  “Nonunit stride used”
  “Mixed Data Types”
  “Unsupported Loop Structure”
  “Contains unvectorizable statement at line XX”
There are more reasons loops don’t vectorize, but we will discuss the reasons above

31 “Existence of Vector Dependency”
Usually indicates a real dependency between iterations of the loop, as shown here:
    for (i = 0; i < 100; i++)
        x[i] = A * x[i + 1];
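
To make the idea of a dependency between iterations concrete, here is a minimal sketch that contrasts a loop-carried flow dependence with an independent loop. It is an illustration under assumed names (`recurrence`, `scale`), not code from the course, and it uses `x[i - 1]` so that each iteration consumes the value written by the one before it.

```c
/* Loop-carried flow dependence: iteration i reads x[i-1], which
   iteration i-1 has just written. Processing several iterations at
   once would read values that have not been updated yet. */
void recurrence(double *x, double A, int n)
{
    int i;
    for (i = 1; i < n; i++)
        x[i] = A * x[i - 1];
}

/* No dependence between iterations (assuming x and y do not overlap):
   each y[i] is computed only from x[i], so iterations can run in any
   order or several at a time. */
void scale(const double *x, double *y, double A, int n)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] = A * x[i];
}
```

With `A = 2` and `x = {1, 1, 1, 1}`, `recurrence` produces `{1, 2, 4, 8}`: every element depends on the whole chain before it, which is what blocks a straightforward SIMD version.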

32 Defining Loop Independence
Iteration Y of a loop is independent of when (or whether) iteration X occurs.
    int a[MAX], b[MAX];
    for (j=0;j<MAX;j++) {
        a[j] = b[j];
    }

33 “Nonunit stride used”
    for (I=0;I<=MAX;I++)
        for (J=0;J<=MAX;J++) {
            c[I][J]+=1;                // Unit Stride
            c[J][I]+=1;                // Non-Unit
            A[J*J]+=1;                 // Non-unit
            A[B[J]]+=1;                // Non-Unit
            if (A[MAX-J]==1) last1=J;  // Non-Unit
        }
End Result: Loading Vector may take more cycles than executing operation sequentially.
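
One common remedy for a non-unit stride in a loop nest is to interchange the loops so the inner loop walks along a row. The sketch below is an illustration under assumed names (`N`, `add_one_strided`, `add_one_unit`); it relies only on C storing two-dimensional arrays row by row.

```c
#define N 4  /* hypothetical size, kept small for illustration */

/* Column-first traversal: the inner loop visits c[0][j], c[1][j], ...
   which lie N elements apart in memory (non-unit stride). */
void add_one_strided(int c[N][N])
{
    int i, j;
    for (j = 0; j < N; j++)
        for (i = 0; i < N; i++)
            c[i][j] += 1;
}

/* Same work with the loops interchanged: the inner loop visits
   c[i][0], c[i][1], ... which are adjacent in memory (unit stride). */
void add_one_unit(int c[N][N])
{
    int i, j;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            c[i][j] += 1;
}
```

Both functions produce the same result; only the memory access order differs. Interchange is valid here because each element is updated independently of the others.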

34 “Mixed Data Types”
An example:
    int howmany_close(double *x, double *y)
    {
        int withinborder=0;
        double dist;
        for(int i=0;i<MAX;i++) {
            dist=sqrtf(x[i]*x[i] + y[i]*y[i]);
            if (dist<5) withinborder++;
        }
        return withinborder;
    }
Mixed data types are possible – but complicate things, i.e.: 2 doubles vs 4 ints per SIMD register
Some operations with specific data types won’t work

35 “Unsupported Loop Structure”
Example:
    struct _xx {
        int data;
        int bound;
    };

    void doit1(int *a, struct _xx *x)
    {
        for (int i=0; i<x->bound; i++)
            a[i] = 0;
    }
An unsupported loop structure means the loop is not countable, or the compiler for whatever reason can’t construct a run-time expression for the trip count.
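
A common workaround, offered here as an assumption rather than something the slide states, is to copy the bound into a local variable before the loop. Because `a` is an `int *`, the compiler may have to assume that a store to `a[i]` could modify `x->bound`; a local copy cannot be changed by those stores. The name `doit1_counted` is hypothetical.

```c
struct _xx {
    int data;
    int bound;
};

/* Reading x->bound once gives the loop a trip count that the stores
   to a[i] cannot change, so the loop is countable. */
void doit1_counted(int *a, struct _xx *x)
{
    int i;
    int n = x->bound;   /* read once; fixed for the whole loop */
    for (i = 0; i < n; i++)
        a[i] = 0;
}
```

This matches the original loop only when `a` does not actually overlap `x->bound`, which is the usual intent of such code.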

36 “Contains unvectorizable statement”
    for (i=1;i<nx;i++) {
        B[i] = func(A[i]);
    }
[Figure: 128-bit registers holding (A[1], A[0]) and (A[3], A[2]); func is applied to produce (B[1], B[0]) and (B[3], B[2])]
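
When the called function is small and visible to the compiler, inlining can turn the loop body into plain arithmetic, which a vectorizer may then be able to handle; whether it does depends on the compiler and switches used. The sketch below assumes a hypothetical body for `func` and a hypothetical wrapper `apply_func`; neither comes from the course material.

```c
/* Hypothetical stand-in for func(): small, free of side effects, and
   defined in the same file, so the compiler can inline it. */
static float func(float v)
{
    return v * v + 1.0f;
}

/* After inlining, the loop body is B[i] = A[i]*A[i] + 1.0f, with no
   call left inside the loop. */
void apply_func(const float *A, float *B, int nx)
{
    int i;
    for (i = 1; i < nx; i++)   /* starts at 1, as on the slide */
        B[i] = func(A[i]);
}
```

For functions defined in other source files, the IPO switches described earlier in the deck are the route to the same effect.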

37 Activity 6 - raytrace2: Putting it all together
Use all previous optimizations in tandem (-O3, -QxP, IPO and PGO)

38 Reference
Web-based and classroom training
  www.intel.com/software/college
White papers and technical notes
  www.intel.com/ids
  www.intel.com/software/products
Product support resources
  www.intel.com/software/products/support
