Lessons From Building Spiral: The C Of My Dreams. Franz Franchetti, Carnegie Mellon University.

Presentation transcript:

1 Carnegie Mellon Lessons From Building Spiral: The C Of My Dreams. Franz Franchetti, Carnegie Mellon University

2 C Compilers Got Pretty Good… Compared implementations: Numerical Recipes textbook FFT implementation (ANSI C + auto-vectorization); Spiral-generated FFT, plain ANSI C; Spiral-generated FFT, auto-vectorized, using C99 + #pragma; Spiral-generated FFT using intrinsics. Gap: 30% - 50%
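To make the "C99 + #pragma" style on this slide concrete, here is a minimal sketch of a kernel written for the auto-vectorizer. It is not Spiral output. The function name `scale_by_twiddles`, the split real/imaginary layout, and the `VEC_HINT` macro are assumptions made for illustration. The macro also shows that each compiler spells the vectorization hint differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macro: each compiler has its own spelling for a
 * "please vectorize this loop" hint, so it is hidden behind one name. */
#if defined(__INTEL_COMPILER)
#  define VEC_HINT _Pragma("ivdep")
#elif defined(__clang__)
#  define VEC_HINT _Pragma("clang loop vectorize(enable)")
#elif defined(__GNUC__)
#  define VEC_HINT _Pragma("GCC ivdep")
#else
#  define VEC_HINT /* no hint available */
#endif

/* Pointwise complex multiply y[i] = x[i] * w[i], with real and imaginary
 * parts kept in separate arrays. C99 `restrict` promises the compiler that
 * the arrays do not overlap, which is what makes vectorization legal. */
void scale_by_twiddles(size_t n,
                       const float *restrict xr, const float *restrict xi,
                       const float *restrict wr, const float *restrict wi,
                       float *restrict yr, float *restrict yi)
{
    assert(xr && xi && wr && wi && yr && yi);
    VEC_HINT
    for (size_t i = 0; i < n; ++i) {
        yr[i] = xr[i] * wr[i] - xi[i] * wi[i];
        yi[i] = xr[i] * wi[i] + xi[i] * wr[i];
    }
}
```

Whether the loop is actually vectorized still depends on the compiler and its flags, which is the kind of unpredictability the talk goes on to discuss.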

3 Spiral vs. C Compiler. Diagram: Algorithm Generation, Algorithm Optimization, Implementation, Code Optimization; problem specification, algorithm, C code, fast executable; Search; C compiler; Spiral. Spiral does all high-level optimization: algorithm choice, program transformations, parallelization, vectorization, memory layout. C compiler = “glorified assembler”: very simple (pre-digested) code, access to machine details, should behave predictably, must produce fast code. We are after the fastest possible code

4 Cross-Platform Portability in Spiral. SIMD vector extensions: SSE through SSE 4.2, AVX, LRBni, AltiVec, VMX, Cell, BlueGene/L, CUDA warps,… Threading and messaging interfaces: Pthreads, OpenMP, Windows threads, CUDA, MPI, Cell DMA. Compilers: Intel C/C++, Intel Fortran, IBM XL C, Gnu C, PGI, MS Visual C, Vector C,… Languages: K&R C, ANSI C, C99, C++, Intel/GNU/IBM/MS extensions, Fortran 77, Fortran 90, Java, CUDA, Verilog, x86 assembly. Hardware: FPGA, ASIC, CPU + instruction in FPGA. Caveat: Retargeting the unparser is the easy part. The hard part is in the higher abstraction levels.

5 Proposal: Influence C Standard. C is extensible enough for our needs: intrinsic functions, pragmas, preprocessor, bit-fields, pointers, struct/union, vector data types, inline assembly, inline opcodes. C compiler available on any machine: the OS and the C compiler are built with it… High quality compilers available: Intel C, IBM XL C, Gnu C, PGI C, MS Visual C. Works for us (somehow): most library generators target C and find ways to co-opt the compilers. Only standards get widely adopted: it will take some time. If we try to have our own language and compiler, we will fail

6 Everybody Extends C (At Will). Industrial C compilers: Intel C/C++, IBM XL C, MS Visual C, PGI C, Vector C,… Gnu C, LLVM C, Open64 C: fall-back for everybody without their own C compiler. OpenMP, CUDA, UPC: provide parallelism through #pragma or language extensions. Hardware vendors: map ISAs etc. into intrinsic functions and data types. Provide a standard on how to extend C “nicely” for us. Specify what a C compiler should do
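As a small illustration of parallelism supplied through `#pragma`, the sketch below annotates an ordinary C loop with an OpenMP directive. The function name `axpy` is chosen for illustration only. The `_OPENMP` macro is defined by compilers when OpenMP support is enabled, so the same source still builds as sequential C otherwise.

```c
#include <assert.h>

/* y[i] += a * x[i]; runs in parallel only when the compiler is invoked
 * with OpenMP enabled (e.g. -fopenmp for GCC), sequentially otherwise. */
void axpy(int n, double a, const double *x, double *y)
{
    int i;
    assert(n >= 0);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

The loop body is unchanged C; everything about threading lives in the directive, which is why this style of extension coexists with so many different compilers.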

7 Let's Get To Work. Survey what we do to get performance: threading commands, SIMD intrinsics, memory attributes,… Survey what we want the compilers to do (but they don’t): translate our wish list into pragmas and attributes etc. Collect horror stories and black belt programming tricks: what are the problems and how do we fight the compiler? How do we want C interpreted? “register” keyword, SSA order, array writes = spills. What about assembly tricks that can’t be expressed in C? Side effects of instructions, software pipelining, IA32 memory operands. Goal: a small C extension to be included in C1X/C2X
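The "SSA order" item refers to straight-line code in which every temporary is assigned exactly once. The following hand-written 4-point DFT is a minimal sketch of that style, not actual Spiral output. The function name `dft4` and the interleaved real/imaginary layout are assumptions for illustration.

```c
#include <assert.h>

/* Forward 4-point DFT on interleaved complex data (re, im, re, im, ...).
 * Every temporary is a const scalar assigned once (SSA style); the only
 * array accesses are the initial loads and the final stores. */
void dft4(const double *x, double *y)
{
    assert(x && y && x != y);

    /* loads */
    const double x0r = x[0], x0i = x[1];
    const double x1r = x[2], x1i = x[3];
    const double x2r = x[4], x2i = x[5];
    const double x3r = x[6], x3i = x[7];

    /* first butterfly stage */
    const double t0r = x0r + x2r, t0i = x0i + x2i;
    const double t1r = x0r - x2r, t1i = x0i - x2i;
    const double t2r = x1r + x3r, t2i = x1i + x3i;
    const double t3r = x1r - x3r, t3i = x1i - x3i;

    /* second stage; multiplying by -i maps (a, b) to (b, -a) */
    const double y0r = t0r + t2r, y0i = t0i + t2i;
    const double y1r = t1r + t3i, y1i = t1i - t3r;
    const double y2r = t0r - t2r, y2i = t0i - t2i;
    const double y3r = t1r - t3i, y3i = t1i + t3r;

    /* stores */
    y[0] = y0r; y[1] = y0i;
    y[2] = y1r; y[3] = y1i;
    y[4] = y2r; y[5] = y2i;
    y[6] = y3r; y[7] = y3i;
}
```

A generator can pick the statement order deliberately, for example to limit register pressure. The wish expressed in the talk is that the compiler keeps that order rather than rescheduling it.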

8 C for Autotuning: Some Ideas. Standardize intrinsic interface and attributes: clean up SSE vs. VMX, vector constants, Intel/Gnu/XL syntax, alignment, memory placement, register, function call ABI,… Expose compiler optimizations and algorithms: #pragma for Belady register allocation, array scalarization, DAG ordering… Autotuning subset/extension of OpenMP: thread pinning, fast synchronization, worker threads, SMT/SIMT. Manage multiple address spaces, caches, local stores: messaging, DMA, scratchpad loads/stores, non-temporal loads/stores, cache lines, memory layout, address translation. Strict adherence to semantics, attributes and pragmas: register means register, SSA code defines order,…
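The alignment item is a good example of the syntax clean-up being asked for. The sketch below shows the kind of wrapper that portable code tends to carry today, since each compiler family has its own spelling for a 16-byte alignment request. The names `ALIGNED16`, `twiddles`, `aligned_twiddles`, and `is_aligned16` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One alignment request, three spellings. */
#if defined(_MSC_VER)
#  define ALIGNED16 __declspec(align(16))
#elif defined(__GNUC__)
#  define ALIGNED16 __attribute__((aligned(16)))
#else
#  define ALIGNED16 _Alignas(16) /* C11 standard spelling */
#endif

/* A buffer that SIMD code could access with aligned loads and stores. */
static ALIGNED16 float twiddles[8];

float *aligned_twiddles(void)
{
    return twiddles;
}

/* Returns 1 if p sits on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *p)
{
    assert(p != 0);
    return ((uintptr_t)p & 15u) == 0;
}
```

C11 later standardized `_Alignas` for this one case; the slide's broader list (vector constants, memory placement, calling conventions) has no comparably uniform syntax.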

9 Summary. Make C for Autotuning part of the next C standard: small, well developed C extension and C semantics clarifications. Don’t build our own compiler: impossible to keep up with industry and hardware changes. Collect and formalize community knowledge: autotuning and program generation community effort. Convince hardware and compiler vendors to join: need broad support to make it happen. Convince the C standard committee: hard but crucial to influence C standard

