Presentation is loading. Please wait.

Presentation is loading. Please wait.

C++11 and three compilers: performance considerations CERN openlab Summer Students Lightning Talks Sessions Supervised by Pawel Szostek Stephen Wang ›

Similar presentations


Presentation on theme: "C++11 and three compilers: performance considerations CERN openlab Summer Students Lightning Talks Sessions Supervised by Pawel Szostek Stephen Wang ›"— Presentation transcript:

1 C++11 and three compilers: performance considerations CERN openlab Summer Students Lightning Talks Sessions Supervised by Pawel Szostek Stephen Wang › 19/08/2014

2 Motivation › “Surprisingly, C++11 feels like a new language” by Stroustrup. › Improvements: Language usability, Mulithreading and other stuff. › How about performance? › Four questions: Time measurement methods, For-loop efficiency, std::async, STL algorithms in parallel mode › GCC 4.9.0, ICC 15.0.0, Clang+LLVM 3.4 › -O2, -O3, -Ofast 19/08/2014Stephen Wang2

3 Contribution › Make four micro-benchmarks for each feature. › Code: https://github.com/wangyichao/CPP11_Benchmarks › Automatize the process, make results repeatable › Python and bash script are used. › Compile -> Run -> Table (3 compiler * 3 options * containers * algorithms …) › Use profiling tools to check performance such as vectorization and threads. › Perf, Intel Vtune, Likwid, even PMU › Answer basic questions. 19/08/2014Stephen Wang3

4 Conclusions auto start = std::chrono::steady_clock::no w(); auto end = std::chrono::steady_clock::no w(); int elapsed = std::chrono::duration_cast (end - start).count(); overhead.push_back(elapsed); 19/08/2014Stephen Wang4 RDTSCP std::chrono

5 Conclusions › std::chrono (C++11) is a reliable time measurement. 19/08/2014Stephen Wang5 auto start = std::chrono::steady_clock::no w(); microseconds_sleep(index); auto end = std::chrono::steady_clock::no w(); int elapsed = std::chrono::duration_cast (end - start).count();

6 Conclusions › Performance of for-loop varies with iteration method and container. › Which containers are vectorized? 19/08/2014Stephen Wang6 arraystd::arraystd::liststd::setstd::vector GCC ICC Clang Table range-based for loop

7 Conclusions › std::async spawns threads when called and the computation is done immediately. ( stdlibc++ from GCC) 19/08/2014Stephen Wang7 double sum = 0; auto handle1 = std::async(std::launch::async, fsum, v.begin(), v.begin()+distance); Auto handle2 = std::async(…); … sum = handle1.get()+handle2.get()+ha ndle3.get()+handle4.get();

8 Conclusions › GNU libstdc++ parallel speeds up part of STL algorithms, whose performance varies with containers. 19/08/2014Stephen Wang8

9 › Thanks 19/08/2014Stephen Wang9


Download ppt "C++11 and three compilers: performance considerations CERN openlab Summer Students Lightning Talks Sessions Supervised by Pawel Szostek Stephen Wang ›"

Similar presentations


Ads by Google