Dilemma of Parallel Programming Xinhua Lin ( 林新华 ) HPC Lab of 17 th Oct 2011.


Presentation on theme: "Dilemma of Parallel Programming Xinhua Lin ( 林新华 ) HPC Lab of 17 th Oct 2011."— Presentation transcript:

1 Dilemma of Parallel Programming Xinhua Lin ( 林新华 ) HPC Lab of SJTU @XJTU, 17 th Oct 2011

2 Disclaimers I am not funded by CRAY. Slides marked with the Chapel logo are taken from Brad Chamberlain’s talk ‘The Mother of All Chapel Talks’, with his permission. Funny pictures are from the Internet.

3 About me and HPC Lab in SJTU Directing HPC Lab Co-translator of PPP Co-founder of HMPP CoC for AP&Japan As MS HPC Invitation institutes @SH Support For HPC Center of SJTU Hold SJTU HPC Seminar monthly http://itis.grid.sjtu.edu.cn/blog

4 Three Challenges for ParaProg in multi/many core era Revolution V.S. Evolution Low level V.S. High level – Performance V.S. Programmable Performance V.S. Performance Portability For more detail: Paper Version: Special issue for HPC and Cloud, Sep 2011 Online Version: http://itis.grid.sjtu.edu.cn/blog

5 Outline Right Level to expose Parallel ParaProg languages Reviews Multiresolution and Chapel

6 Right Level to Expose Parallel

7 Can we stop water/parallel ? Hardware ISA OS Library Language

8 Performance V.S. Programmable Target Machine MPI OpenMP pthreads Expose Implementing Mechanisms “Why is everything so tedious?” Target Machine ZPL HPF Higher-Level Abstractions “Why don’t I have more control?” Low Level High Level

9 ParaProg Education Tired of teaching yet another specific lang. – MPI for Cluster – OpenMP for SMP then Multi-core CPU – CUDA for GPU, and now OpenCL – More on the way… Had to explain concepts by different tools – Single lang. to explain them all? Similar in OS education – Production OS: Linux, Unix and Windows – OS only for education: Minix

10 ParaProg languages Reviews

11 Hybrid Programming Model MPI is insufficient in the multi/many core era – OpenMP for multi-core – CUDA/OpenCL for many-core* So-called Hybrid Programming was invented as a temporary solution, workable but ugly – MPI+OpenMP for Multi-core cluster – MPI+CUDA/OpenCL for GPU cluster like Tianhe-1A Similar idea used in CUDA for thread and thread-block, OpenCL for work-item and work-group * We will wait and see how OpenMP works on Intel MIC

12 ParaProg from different ways Low Level (expose implementation mechanism) – MPI, CUDA and OpenCL – OpenMP High Level – PGAS: CAF, UPC and Titanium – Global View: NESL, ZPL – APGAS: Chapel, X10 Directive Based – HMPP, PGI, CRAY-directive

13 Multiresolution and Chapel

14 What is Multiresolution? Structure the language in a layered manner, permitting it to be used at multiple levels as required/desired – support high-level features and automation for convenience – provide the ability to drop down to lower, more manual levels – use appropriate separation of concerns to keep these layers clean Distributions Data parallelism Task Parallelism Locality Control Target Machine Base Language language concepts

15 Where Chapel was born: HPCS HPCS: High Productivity Computing Systems (DARPA et al.) – Goal: Raise productivity of high-end computing users by 10× – Productivity = Performance + Programmability + Portability + Robustness Phase II: Cray, IBM, Sun (July 2003 – June 2006) – Evaluated the entire system architecture’s impact on productivity… processors, memory, network, I/O, OS, runtime, compilers, tools, … …and new languages: Cray: Chapel IBM: X10 Sun: Fortress Phase III: Cray, IBM (July 2006 – 2010) – Implement the systems and technologies resulting from phase II – (Sun also continues work on Fortress, without HPCS funding)

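16 Global-view V.S. Fragmented Problem: “Apply 3-pt stencil to vector” (figure: the global-view version applies the stencil to the whole vector in one step; the fragmented version applies it separately to each per-processor piece of the vector)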

17 Global-view V.S. SPMD Code
Global-View
    def main() {
      var n: int = 1000;
      var a, b: [1..n] real;
      forall i in 2..n-1 {
        b(i) = (a(i-1) + a(i+1))/2;
      }
    }
SPMD
    def main() {
      var n: int = 1000;
      var locN: int = n/numProcs;
      var a, b: [0..locN+1] real;
      if (iHaveRightNeighbor) {
        send(right, a(locN));
        recv(right, a(locN+1));
      }
      if (iHaveLeftNeighbor) {
        send(left, a(1));
        recv(left, a(0));
      }
      forall i in 1..locN {
        b(i) = (a(i-1) + a(i+1))/2;
      }
    }
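
To make the contrast concrete, here is a minimal Python sketch (not the Chapel code from the slide) that computes the same 3-point stencil both ways and can be used to check that they agree. The function names `stencil_global` and `stencil_fragmented` are hypothetical, the "processes" are simulated by a sequential loop, and the send/recv pairs are replaced by direct reads from the neighboring chunk. Unlike the slide's SPMD version, the sketch explicitly skips the two global endpoints so that both versions produce identical output.

```python
def stencil_global(a):
    """Global-view: one loop over the whole vector; endpoints stay 0.0."""
    n = len(a)
    b = [0.0] * n
    for i in range(1, n - 1):
        b[i] = (a[i - 1] + a[i + 1]) / 2
    return b


def stencil_fragmented(a, num_procs):
    """Fragmented (SPMD-style): each simulated process owns one chunk
    of the vector plus two ghost cells that hold neighbor values."""
    n = len(a)
    assert n % num_procs == 0, "sketch assumes n divides evenly"
    loc_n = n // num_procs
    b = []
    for p in range(num_procs):  # stands in for num_procs parallel processes
        lo = p * loc_n
        # Local array: ghost cell at index 0, owned data at 1..loc_n,
        # ghost cell at index loc_n + 1.
        local = [0.0] + a[lo:lo + loc_n] + [0.0]
        if p > 0:
            local[0] = a[lo - 1]              # stands in for recv(left, ...)
        if p < num_procs - 1:
            local[loc_n + 1] = a[lo + loc_n]  # stands in for recv(right, ...)
        local_b = [0.0] * loc_n
        for i in range(1, loc_n + 1):
            global_i = lo + i - 1
            if global_i == 0 or global_i == n - 1:
                continue  # global endpoints lack one neighbor
            local_b[i - 1] = (local[i - 1] + local[i + 1]) / 2
        b.extend(local_b)
    return b
```

Even in this toy form, the fragmented version needs index translation, ghost cells, and neighbor tests that the global-view version never mentions, which is the point the slide is making.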

18 Chapel Overview A design principle for HPC – “Support the general case, optimize for the common case” Data Parallel (ZPL) + Task Parallel (CRAY MTA) + Script Lang. Latest version 1.3.0 is available as OSS: http://sourceforge.net/projects/chapel Distributions Data parallelism Task Parallelism Locality Control Target Machine Base Language language concepts

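19 Chapel example: Heat Transfer (figure: an n × n array A with one boundary held at 1.0; each element becomes the sum of its four neighbors divided by 4; repeat until max change < ε)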
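
The iteration described on this slide can be sketched in a few lines of sequential Python. This is an illustrative stand-in, not the Chapel code from the talk: the function name `jacobi` is hypothetical, and which boundary is held at 1.0 (here the bottom row) is an assumption, since the transcript does not say.

```python
def jacobi(n, epsilon):
    """Jacobi iteration on an n x n interior surrounded by a fixed boundary.

    Returns the final (n + 2) x (n + 2) grid and the number of sweeps.
    """
    # Interior cells are indices 1..n; indices 0 and n + 1 are boundary.
    a = [[0.0] * (n + 2) for _ in range(n + 2)]
    for j in range(1, n + 1):
        a[n + 1][j] = 1.0  # assumption: bottom boundary row held at 1.0

    iterations = 0
    while True:
        new = [row[:] for row in a]  # boundary values are copied unchanged
        delta = 0.0
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                # Each interior cell becomes the average of its 4 neighbors.
                new[i][j] = (a[i - 1][j] + a[i + 1][j]
                             + a[i][j - 1] + a[i][j + 1]) / 4
                delta = max(delta, abs(new[i][j] - a[i][j]))
        a = new
        iterations += 1
        if delta < epsilon:  # repeat until max change < epsilon
            return a, iterations
```

Every cell update in one sweep reads only the previous grid, so the two inner loops are independent across cells; that independence is what a data-parallel `forall` would exploit.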

20 Chapel Code For Heat Transfer

21 Chapel as Minix in ParaProg If I were to offer a ParaProg class, I’d want to teach about: – data parallelism – task parallelism – concurrency – synchronization – locality/affinity – deadlock, livelock, and other pitfalls – performance tuning – …

22 Conclusion: Major Points Programmability versus performance is always the dilemma of ParaProg. Multiresolution sounds perfect in theory but is not mature enough for production. However, Chapel could be used as the Minix of ParaProg.

23 Q&A

