C++11 and three compilers: performance considerations CERN openlab Summer Students Lightning Talks Sessions Supervised by Pawel Szostek Stephen Wang ›

Slides:



Advertisements
Similar presentations
Chapter 17 vector and Free Store Bjarne Stroustrup
Advertisements

Yaser Zhian Dead Mage IGDI, Workshop 10, May 30 th -31 st, 2013.
Chapter 4 Ch 1 – Introduction to Computers and Java Flow of Control Loops 1.
Overview of programming in C C is a fast, efficient, flexible programming language Paradigm: C is procedural (like Fortran, Pascal), not object oriented.
Jacobi solver status Lucian Anton, Saif Mulla, Stef Salvini CCP_ASEARCH meeting October 8, 2013 Daresbury 1.
A Process Splitting Transformation for Kahn Process Networks Sjoerd Meijer.
Chapter 17 vector and Free Store John Keyser’s Modifications of Slides By Bjarne Stroustrup
Software & Services Group, Developer Products Division Copyright© 2010, Intel Corporation. All rights reserved. *Other brands and names are the property.
Intel Parallel Advisor Workflow David Valentine Computer Science Slippery Rock University.
Advanced microprocessor optimization Kampala August, 2007 Agner Fog
Fundamentals of Python: From First Programs Through Data Structures
LECTURE 1 CMSC 201. Overview Goal: Problem solving and algorithm development. Learn to program in Python. Algorithm - a set of unambiguous and ordered.
The Path to Multi-core Tools Paul Petersen. Multi-coreToolsThePathTo 2 Outline Motivation Where are we now What is easy to do next What is missing.
CS 4800 By Brandon Andrews.  Specifications  Goals  Applications  Design Steps  Testing.
A Tool to Support Ontology Creation Based on Incremental Mini- Ontology Merging Zonghui Lian Data Extraction Research Group Supported by Spring Conference.
Complexity Analysis (Part I)
Computer Programming 1 Repetition. Computer Programming 2 Objectives Repetition structures Study while and do loops Examine for loops A practical example.
Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL Emmanuel OSERET Performance Analysis Team, University.
Trilinos Coding and Documentation Guidelines Roscoe A. Bartlett Trilinos Software Engineering Technologies and Integration Lead Computer Science and Mathematics.
Lecture 4 Loops.
Tuning Tophat2 Belinda Giardine. Tophat2 Aligns reads from RNA to the genome Ribonucleic acid (RNA) is a ubiquitous family of large biological molecules.
Algorithmic Problem Solving CMSC 201 Adapted from slides by Marie desJardins (Spring 2015 Prof Chang version)
Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.
Aggregating Labels in Crowdsourcing Data CERN Openlab Summer Students Lightning Talks Sessions Maria Priisalu › 19/08/2015.
GNU Compiler Collection (GCC) and GNU C compiler (gcc) tools used to compile programs in Linux.
FFT: Accelerator Project Rohit Prakash Anand Silodia.
InCoB August 30, HKUST “Speedup Bioinformatics Applications on Multicore- based Processor using Vectorizing & Multithreading Strategies” King.
Algorithms and Algorithm Analysis The “fun” stuff.
Profiling and Tuning OpenACC Code. Profiling Tools (PGI) Use time option to learn where time is being spent -ta=nvidia,time NVIDIA Visual Profiler 3 rd.
Performance of mathematical software Agner Fog Technical University of Denmark
AES Encryption Code Generator Undergraduate Research Project by Paul Magrath. Supervised by Dr David Gregg.
CSE 303 Concepts and Tools for Software Development Richard C. Davis UW CSE – 12/6/2006 Lecture 24 – Profilers.
Structure Layout Optimizations in the Open64 Compiler: Design, Implementation and Measurements Gautam Chakrabarti and Fred Chow PathScale, LLC.
Parallelization of likelihood functions for data analysis Alfio Lazzaro CERN openlab Forum on Concurrent Programming Models and Frameworks.
Chapter 7 Object Code Generation. Chapter 7 -- Object Code Generation2  Statements in 3AC are simple enough that it is usually no great problem to map.
Cint : C++ interpreter Frequently Asked Questions Digest 14 Oct 2002 at CERN Masaharu Goto.
Zenodo Information Architecture and Usability CERN openlab Summer Students Lightning Talks Sessions Megan Potter › 19/08/2015.
Auto-Vectorization Jim Hogg Program Manager Visual C++ Compiler Microsoft Corporation.
Programming Languages
Arrays. The array data structure Array is a collection of elements, that have the same data type Integers (int) Floating point numbers (float, double)
1 Performance Issues CIS*2450 Advanced Programming Concepts.
3/12/2013Computer Engg, IIT(BHU)1 MPI-1. MESSAGE PASSING INTERFACE A message passing library specification Extended message-passing model Not a language.
Oracle TimesTen in-memory database integration CERN openlab Summer Student Lightning Talks Sessions Jakub Žitný Supervisor: Miroslav Potocký.
Flow Control in Imperative Languages. Activity 1 What does the word: ‘Imperative’ mean? 5mins …having CONTROL and ORDER!
FFT Accelerator Project Rohit Prakash(2003CS10186) Anand Silodia(2003CS50210) Date : February 23,2007.
 Prepared by: Eng. Maryam Adel Abdel-Hady
Computer Programming 12 Lesson 6 – Loop structure By: Dan Lunney.
Program Design & Development EE 201 C7-1 Spring
July 10, 2016ISA's, Compilers, and Assembly1 CS232 roadmap In the first 3 quarters of the class, we have covered 1.Understanding the relationship between.
The “Understanding Performance!” team in CERN IT
An Iterative FFT We rewrite the loop to calculate nkyk[1] once
Welcome: Intel Multicore Research Conference
Big Data Analytics: HW#3
STL Common tools for C++.
Compiling Dynamic Data Structures in Python to Enable the Use of Multi-core and Many-core Libraries Bin Ren, Gagan Agrawal 9/18/2018.
Engineering Innovation Center
7 Best Programming Languages Based as per Earnings & Opportunities
AUTOMATED SESSION PLANNING. In the present world, everything has become automated. By, a click everything is being processed. But the preparation of the.
Anne Pratoomtong ECE734, Spring2002
IXPUG Abstract Submission Instructions
Learn about MATLAB Engineers – not sales!
Performance Optimization for Embedded Software
Compiler Back End Panel
1 مفهوم ارتباطات ارتباطات معادل واژه communications ) ميباشد(. ارتباطات يك فرايند اجتماعي و دو طرفه است كه در آن اطلاعات مبادله شده و نوعي تفاهم بين طرفهاي.
Compiler Back End Panel
Analyzing an Algorithm Computing the Order of Magnitude Big O Notation
STL: Traversing a Vector
Multi-Core Programming Assignment
Implementation of a De-blocking Filter and Optimization in PLX
Makefiles, GDB, Valgrind
Presentation transcript:

C++11 and three compilers: performance considerations CERN openlab Summer Students Lightning Talks Sessions Supervised by Pawel Szostek Stephen Wang › 19/08/2014

Motivation › “Surprisingly, C++11 feels like a new language” by Stroustrup. › Improvements: Language usability, Mulithreading and other stuff. › How about performance? › Four questions: Time measurement methods, For-loop efficiency, std::async, STL algorithms in parallel mode › GCC 4.9.0, ICC , Clang+LLVM 3.4 › -O2, -O3, -Ofast 19/08/2014Stephen Wang2

Contribution › Make four micro-benchmarks for each feature. › Code: › Automatize the process, make results repeatable › Python and bash script are used. › Compile -> Run -> Table (3 compiler * 3 options * containers * algorithms …) › Use profiling tools to check performance such as vectorization and threads. › Perf, Intel Vtune, Likwid, even PMU › Answer basic questions. 19/08/2014Stephen Wang3

Conclusions auto start = std::chrono::steady_clock::no w(); auto end = std::chrono::steady_clock::no w(); int elapsed = std::chrono::duration_cast (end - start).count(); overhead.push_back(elapsed); 19/08/2014Stephen Wang4 RDTSCP std::chrono

Conclusions › std::chrono (C++11) is a reliable time measurement. 19/08/2014Stephen Wang5 auto start = std::chrono::steady_clock::no w(); microseconds_sleep(index); auto end = std::chrono::steady_clock::no w(); int elapsed = std::chrono::duration_cast (end - start).count();

Conclusions › Performance of for-loop varies with iteration method and container. › Which containers are vectorized? 19/08/2014Stephen Wang6 arraystd::arraystd::liststd::setstd::vector GCC ICC Clang Table range-based for loop

Conclusions › std::async spawns threads when called and the computation is done immediately. ( stdlibc++ from GCC) 19/08/2014Stephen Wang7 double sum = 0; auto handle1 = std::async(std::launch::async, fsum, v.begin(), v.begin()+distance); Auto handle2 = std::async(…); … sum = handle1.get()+handle2.get()+ha ndle3.get()+handle4.get();

Conclusions › GNU libstdc++ parallel speeds up part of STL algorithms, whose performance varies with containers. 19/08/2014Stephen Wang8

› Thanks 19/08/2014Stephen Wang9