OpenMP* Support in Clang/LLVM: Status Update and Future Directions
2014 LLVM Developers' Meeting
Alexey Bataev, Zinovy Nis, Intel

Copyright © 2014, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice.

Agenda
1. Intro
2. Status
3. Runtime Library
4. Coprocessor/Accelerator Support
5. SIMD Features & Status

What is OpenMP*?
- Industry-wide standard for shared-memory multiprocessing programming in C/C++ and Fortran
- Vendor-neutral, platform-neutral, portable, pragma-based, managed by an independent consortium
- Very important in High Performance Computing (HPC)
- OpenMP Version 3.1 (July 2011): implemented in GCC*, ICC, Oracle Solaris Studio*, PGI*, and others
- OpenMP Version 4.0 (July 2013): adds support for coprocessors/accelerators, SIMD, error handling, thread affinity, and user-defined reductions; implemented in GCC*, ICC, and others
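A minimal OpenMP example shows the pragma-based style the slide describes. The pragma is only a hint: a compiler built without OpenMP support (or invoked without the OpenMP flag) simply ignores it and runs the loop serially, producing the same result.

```c
/* Sum the integers 0..n-1, letting OpenMP parallelize the loop.
 * The reduction clause gives each thread a private partial sum
 * and combines the partials at the end of the loop. */
double omp_range_sum(int n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += (double)i;
    return sum;
}
```

Because the values are small integers, the result is exact regardless of the order in which the partial sums are combined.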

"OpenMP* in Clang/LLVM" Team: Driving Collaboration and Implementation
- Implementation of OpenMP in Clang/LLVM
- Coprocessor/accelerator support
- Test coverage
- Code reviews
Current participants: AMD*, Argonne National Laboratory, IBM*, Intel, Micron*, Texas Instruments*, University of Houston. Talk to us if you want to be involved!

OpenMP* Support in Clang (under development)
- Uses "early" outlining: parallel and task regions are outlined as functions; all other regions are emitted as-is
- Pragmas are regular AST nodes with an associated structured code block: executable directives are Stmts, declarative directives are Decls
- CodeGen follows the regular rules for Stmt and Decl nodes
- Uses the libiomp5 runtime library and targets the Intel OpenMP* API
- Calls to runtime functions are generated in the frontend; no LLVM IR extensions are required
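The "early outlining" scheme can be illustrated with a hand-written sketch. The struct capture, the `outlined_region` function, and `fake_fork_call` below are simplified stand-ins for what the frontend actually emits (the real fork entry point belongs to the Intel OpenMP API in libiomp5 and runs the function on a team of threads):

```c
/* Variables referenced inside the parallel region are captured in a struct. */
struct captured {
    int *data;
    int n;
};

/* The region body, outlined into its own function by the frontend. */
static void outlined_region(void *ctx) {
    struct captured *c = (struct captured *)ctx;
    for (int i = 0; i < c->n; ++i)
        c->data[i] *= 2;
}

/* Stand-in for the runtime's fork call; here it just invokes fn once. */
static void fake_fork_call(void (*fn)(void *), void *ctx) {
    fn(ctx);
}

/* What the user wrote as:
 *   #pragma omp parallel
 *   { ... data[i] *= 2; ... }
 * conceptually becomes: */
void doubled(int *data, int n) {
    struct captured c = { data, n };
    fake_fork_call(outlined_region, &c);
}
```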

OpenMP Constructs Representation
(Diagram: the AST for `#pragma omp parallel if (...)` is an OMPParallelDirective, derived from OMPExecutableDirective, itself a Stmt. It holds a CapturedStmt for the structured block and a list of OMPClause nodes; the if clause becomes an OMPIfClause wrapping its condition Expr.)

Current Status
- Clang/LLVM 3.5 Release:
  - Parsing and semantic analysis for OpenMP* 3.1 (except the 'ordered' and 'atomic' directives)
  - CodeGen for the 'parallel' and 'simd' directives and the 'safelen' clause
  - First release with the libiomp5 OpenMP runtime library!
- Clang/LLVM Trunk:
  - Complete parsing and semantic analysis for OpenMP 3.1; partial for OpenMP 4.0
  - CodeGen for the 'critical' directive and the 'if', 'num_threads', 'firstprivate', 'private', 'aligned', and 'collapse' clauses
  - No support in the driver yet; use -Xclang -fopenmp=libiomp5
- Planning full OpenMP 3.1 support in the Clang/LLVM 3.6 release and most of OpenMP 4.0 in the Clang/LLVM 3.7 release (depends on the speed of code review!)

Current Status: the clang-omp Repository (you can try it right now!)
- Full support for the host-side portion of OpenMP* 4.0; coprocessor/accelerator support is under development
- Passed 119 of 123 tests in the OpenMP Validation Suite from the OpenUH* test suite
- Supported on Linux* and Mac OS X*: x86, x86-64, PowerPC*, ARM*
- Based on the Clang/LLVM 3.5 release; also includes a trunk-based version (maintained by Hal Finkel)
- Use the -fopenmp driver option

Current Status: Summary

Feature                              | Parsing/Sema (clang-omp) | CodeGen (clang-omp) | Parsing/Sema (Trunk) | CodeGen (Trunk)
Parallel                             | Yes                      | Yes                 | Yes                  | Partially
Worksharing                          | Yes                      | Yes                 | Yes                  | No
Tasking                              | Yes                      | Yes                 | Yes                  | No
Other (OpenMP* 3.1)                  | Yes                      | Yes                 | Yes                  | No
SIMD (OpenMP 4.0)                    | Yes                      | Yes                 | Yes                  | Partially
Coprocessor/Accelerator (OpenMP 4.0) | Yes                      | Partially           | No                   | No
Other (OpenMP 4.0)                   | Yes                      | Yes                 | Partially            | No

OpenMP* Runtime Library: libiomp5
- Supports OpenMP 4.0 (though not all features are in Clang yet)
- Supported on Windows*, Linux*, and Mac OS X*: x86, x86-64, PowerPC*, ARM*
- Can be built with Clang, GCC*, or ICC
- Two build bots are configured: libiomp5-clang-x86_64-linux-debian and libiomp5-gcc-x86_64-linux-debian

OpenMP* Coprocessor/Accelerator Support Library
- Required to support OpenMP 4.0 target constructs
- Under development by Intel, IBM*, TI*, and others in the clang-omp repository
- Plan to support x86, x86-64, PowerPC*, and ARM* as hosts, and multiple targets (Intel® Xeon Phi™ coprocessor, GPUs, FPGAs, ...)

(Diagram: a fat binary combines LLVM-generated host code with Xeon Phi and GPU device code; on the host, libomptarget.so dispatches data and code to a per-device offload RTL for each target.)

OpenMP*: Levels of Parallelism
- Coprocessors/Accelerators: the brand-new #pragma omp target
- Threads: the good old #pragma omp parallel for
- SIMD: OpenMP with a SIMD flavor, #pragma omp simd
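All three levels can be stacked on a single loop. The sketch below assumes a compiler with OpenMP 4.0 offload support; a compiler that ignores the pragmas (or has no attached device) still runs the loop correctly on the host, since the pragmas only describe how to parallelize work the plain C code already expresses.

```c
/* Elementwise vector add with all three levels of OpenMP parallelism. */
void vec_add(const float *a, const float *b, float *c, int n) {
    /* Level 1: offload the region to a coprocessor/accelerator,
       copying a and b to the device and c back from it. */
    #pragma omp target map(to: a[0:n], b[0:n]) map(from: c[0:n])
    /* Levels 2 and 3: thread-level parallelism plus SIMD vectorization. */
    #pragma omp parallel for simd
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```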

SIMD: What Is It?
How to run faster?
- Run instructions simultaneously: thread-level parallelism, #pragma omp parallel for
- Handle the data simultaneously: data-level parallelism (SIMD, Single Instruction Multiple Data, vector instructions), #pragma omp simd
- Do both: #pragma omp parallel for simd
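Data-level parallelism on its own looks like this: the saxpy loop (y = a*x + y elementwise) carries a `#pragma omp simd` hint asking the compiler to vectorize it. With the pragma ignored, the loop runs scalar and produces identical results.

```c
/* saxpy: y[i] = a * x[i] + y[i], with an explicit SIMD hint. */
void saxpy(int n, float a, const float *x, float *y) {
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```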

SIMD: How to Use

Auto (rely on the autovectorizer):

    float *A, *B, *C;
    ...
    for (int i ...)
        C[i] = A[i] + B[i];

    $> clang -O2 ...

  High and transparent portability; high and transparent scalability; no development cost; unpredictable performance.

Explicit (OpenMP SIMD pragma):

    float *A, *B, *C;
    ...
    #pragma omp simd
    for (int i ...)
        C[i] = A[i] + B[i];

    $> clang -O2 -Xclang -fopenmp=...

  Expresses the desired vectorization; high and transparent portability; high and transparent scalability; low development cost; predictable performance.

Manual (vector intrinsics):

    #include "xmmintrin.h"
    float *A, *B, *C;
    __m128 a, b, c;
    ...
    for (int i ...) {
        a = _mm_load_ps(A + i*4);
        b = _mm_load_ps(B + i*4);
        c = _mm_add_ps(a, b);
        _mm_store_ps(C + i*4, c);
    }

    $> clang ...

  Maximum performance; low portability; high development cost; low scalability.

SIMD: Autovectorization vs. Pragmas

Autovectorization in LLVM fails if:
- the cost heuristic says "no"
- there are non-trivial data dependencies
- the overhead of run-time checks is too high
- the loop is not innermost

OpenMP* SIMD pragmas say: vectorize, don't check!

    // RT pointer aliasing checks: float*
    void doThings(float *a, float *b, float *c, float *d,
                  float *e, unsigned N, unsigned K) {
      for (int i = 0; i < N; ++i)      // not innermost
        for (int j = 0; j < N; ++j)    // non-trivial data-deps due to unknown K
          a[i + j] = b[i + K] + c[i + K] + d[i + K] + e[i + K];
    }

    $> clang -mllvm -debug-only=loop-vectorize ...
    LV: Checking a loop in "doThings" from test.c:4:3
    ...
    LV: Found a runtime check ptr: %arrayidx18.us = ...
    LV: We need to do 4 pointer comparisons
    LV: We can't vectorize because we can't find the array bounds
    LV: Can't vectorize due to memory conflicts

With the pragma (the only change):

    #pragma omp simd collapse(2)
    for (int i = 0; i < N; ++i)
      for (int j = 0; j < N; ++j)
        a[i + j] = ...

    $> clang -mllvm -debug-only=loop-vectorize -Xclang -fopenmp=...
    LV: Checking a loop in "doThings" from test.c:4:3
    LV: Loop hints: force=enabled width=0 unroll=0
    LV: A loop annotated parallel, ignore memory dependency checks.
    LV: We can vectorize this loop!

OpenMP* SIMD: Status in Clang
- OpenMP SIMD requires support from both the frontend and the backend
- The frontend parses OpenMP SIMD pragmas and creates LLVM IR metadata hints, which are then used by the vectorizer
- For the collapse clause, the frontend creates a new loop, collapsing the existing loop nest into a single loop

    #pragma omp simd safelen(4)
    for (int i = 0; i < N; ++i)
      ...

    !42 = metadata !{metadata !"llvm.loop.vectorize.enable", i1 true}
    !43 = metadata !{metadata !"llvm.loop.vectorize.width", i32 4}

    LV: Checking a loop in ...
    LV: Loop hints: force=enabled width=4 unroll=0

OpenMP* SIMD: Status in the LLVM Backend
- The LLVM loop vectorizer recognizes loop-vectorizing metadata
- The collapse, safelen, and aligned clauses are ready to use
- Other clauses and constructs are under development
- TODO:
  - Vectorizing metadata must be propagated through all the passes that run before the vectorizer
  - Compile-time memory checks must be disabled in the presence of OpenMP SIMD pragmas

Questions?

OpenMP* Support in Clang/LLVM: Status Update and Future Directions
Alexey Bataev, Zinovy Nis

Legal Disclaimer & Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Copyright © 2014, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #

Backup

OpenMP Constructs Representation: Continued

The OMPParallelDirective (with its OMPIfClause Expr and CapturedStmt) lowers to LLVM IR of roughly this shape (callee names were elided on the original slide):

    br i1 ..., label %omp.if.then, label %omp.if.else

    omp.if.then:
      %3 = bitcast %struct.anon* %agg.captured to i8*
      call ... (i32 1, void (i32*, i32*, ...)* bitcast (... to void (i32*, i32*, ...)*), i8* ...)
      br label %omp.if.end

    omp.if.else:
      br label %omp.if.end

    omp.if.end:
      ...

    define internal ... (i32*, i8*) #0 {
      %.gtid. = load i32* %0
      call ... (i32 %.gtid.)
    }