Applications for K42 Initial Brainstorming Paul Hargrove and Kathy Yelick with input from Lenny Oliker, Parry Husbands and Mike Welcome.


1 Applications for K42 Initial Brainstorming Paul Hargrove and Kathy Yelick with input from Lenny Oliker, Parry Husbands and Mike Welcome

2 Criteria for Selecting Apps
- Should be OS/runtime “intensive”
- Would highlight features/benefits of K42
  - I/O performance is important
  - Uses multiple threads per node (show off fast threads)
  - Uses small/asynchronous/active messages
- Pragmatics
  - Should be easily ported to K42
  - Part of an existing/funded effort
  - Has a collaborator who:
    » Understands the application in detail (e.g., an author)
    » Is interested in OS/runtime issues

3 MADCAP: I/O Bound
- For the astrophysicists in the audience:
  - Microwave Anisotropy Dataset Computational Analysis Package
  - Optimal general algorithm for extracting key cosmological data from Cosmic Microwave Background Radiation (CMB)
  - Anisotropies in the CMB contain the early history of the Universe
  - Calculates maximum likelihood two-point angular correlation function
- Recasts problem in dense linear algebra: ScaLAPACK
  - Steps include: mat-mat, matrix-inv, mat-vec, Cholesky decomp, data redistribution
- Portability:
  - Depends on ScaLAPACK, which is portable
  - Has been tuned and run on vector, ccNUMA, and cluster systems
- Developed at NERSC/LBNL by Julian Borrill
- Part of application evaluation suite led by Leonid Oliker at LBNL
[Figure: Temperature anisotropies in the CMB measured by Boomerang]
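To make the "recasts problem in dense linear algebra" step concrete, here is a minimal sketch of how a Cholesky factorization yields a Gaussian log-likelihood. It is not MADCAP code: the zero-mean Gaussian form, the function names, and the tiny 2x2 covariance are assumptions chosen for illustration, and a real run would use distributed ScaLAPACK routines rather than pure Python.

```python
import math

def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def forward_solve(l, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i, row in enumerate(l):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y

def gaussian_log_likelihood(cov, data):
    """Assumed form: log L = -1/2 (d^T C^-1 d + log det C + n log 2*pi)."""
    l = cholesky(cov)
    y = forward_solve(l, data)  # y = L^-1 d, so y.y equals d^T C^-1 d
    quad = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(l[i][i]) for i in range(len(l)))
    return -0.5 * (quad + log_det + len(data) * math.log(2.0 * math.pi))

# Hypothetical 2-pixel example; the numbers carry no physical meaning.
example = gaussian_log_likelihood([[4.0, 2.0], [2.0, 3.0]], [1.0, 2.0])
```

The factorization is reused for both the quadratic form and the determinant, which is why the Cholesky step dominates the cost and why the computation maps onto BLAS3-style kernels at scale.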

4 MADCAP: Performance
- Computation dominated by BLAS3: efficiency should be very high
- But all systems sustain a relatively low % of peak
- Reason: I/O is a major challenge:
  - Code only partially ported because it requires a global file system
  - I/O performance is a limiting factor in the above
- Further work is required to: reduce I/O, remove system calls, and remove global file system requirements
- Detailed analysis presented at HiPC 2004 by Carter, Borrill, and Oliker
  - http://crd.lbl.gov/~oliker/papers/HIPC04_final.pdf

  P  | Power3           | Power4           | ES               | X1
     | Gflops/P  %peak  | Gflops/P  %peak  | Gflops/P  %peak  | Gflops/P  %peak
  16 | 0.62      41%    | 1.5       29%    | 4.1       32%    | 2.2       27%
  64 | 0.54      36%    | 0.81      16%    | 1.9       23%    | 2.0       16%

5 MADCAP: MADbench
- MADbench is a “lightweight version of MADCAP”
  - Retains operational complexity of full MADCAP
  - Global file system requirement relaxed (to run on ESS)
- MADbench is a proxy for full MADCAP
  - Authors hope to reduce I/O costs in MADbench and then apply their changes to the full MADCAP

6 UPC HPL: Multi-Threading
- High Performance Linpack is also dominated by BLAS3
  - Despite being the Top500 benchmark, surprisingly hard to tune
  - Performance very sensitive to block size, total problem size, etc.
- Recently written in UPC by Parry Husbands at LBNL
  - UPC is a parallel extension of C that has several compilers
  - Portable Berkeley compiler uses lightweight communication
- UPC HPL written in an event-driven style
  - User-level threads for tasks: factorization, matrix multiply, pivoting…
  - Has been run with PTH, POSIX threads, and hand-rolled threads
- Performs well, e.g., on the X1 MSP at 64 processors:
  - MPI HPL: 521.6 Gflop/s (n=160,000)
  - UPC HPL: 562.1 Gflop/s (n=128,000)
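The event-driven, user-level-thread style can be illustrated with a minimal sketch, using Python generators as stand-ins for cooperative threads. This is not the UPC HPL code: the scheduler, the task function, and the step counts are hypothetical, and the task names are borrowed from the slide only as labels.

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: records one unit of work, then yields to the scheduler."""
    for step in range(steps):
        log.append((name, step))
        yield  # hand control back, as a user-level thread would

def run_round_robin(tasks):
    """Run generator tasks in round-robin order until all have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
            ready.append(current)  # still has work, so requeue it
        except StopIteration:
            pass  # task finished, so drop it

log = []
run_round_robin([
    task("factorization", 2, log),
    task("matrix multiply", 3, log),
    task("pivoting", 1, log),
])
```

Because the scheduler alone decides which ready task runs next, overall behavior depends heavily on the scheduling policy, which is one way to read the slide's later concern about sensitivity to the thread scheduler.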

7 Adaptive Mesh Refinement Comm Time
- AMR is notoriously hard to scale due to communication cost
- Mike Welcome at LBNL plans to build a one-sided comm version
  - Will use overlapped/asynchronous communication on GASNet
  - May use dynamic load balancing (remote task scheduling)
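A minimal sketch of the overlap idea: start fetching neighbor boundary data, update the interior cells that need no remote data, and only then wait for the fetch. It does not use GASNet or any AMR library. The thread pool stands in for asynchronous one-sided gets, and every function name and the three-point smoothing stencil are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_ghost_cells(neighbor_patch):
    """Stand-in for a one-sided 'get' of a neighbor's first and last cells."""
    return neighbor_patch[0], neighbor_patch[-1]

def smooth_interior(patch):
    """Three-point average over cells that need no remote data."""
    return [(patch[i - 1] + patch[i] + patch[i + 1]) / 3.0
            for i in range(1, len(patch) - 1)]

def smooth_with_overlap(patch, left_neighbor, right_neighbor):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Issue the "communication" first so it can proceed in the background.
        left = pool.submit(fetch_ghost_cells, left_neighbor)
        right = pool.submit(fetch_ghost_cells, right_neighbor)
        interior = smooth_interior(patch)  # local work overlaps the fetches
        left_ghost = left.result()[1]      # last cell of the left neighbor
        right_ghost = right.result()[0]    # first cell of the right neighbor
    # Boundary cells are updated only after the remote data has arrived.
    first = (left_ghost + patch[0] + patch[1]) / 3.0
    last = (patch[-2] + patch[-1] + right_ghost) / 3.0
    return [first] + interior + [last]

result = smooth_with_overlap([3.0, 6.0, 9.0], [0.0, 0.0], [12.0, 12.0])
```

In this toy the fetch is instantaneous, so nothing is actually hidden. The point is the ordering: communication is issued before the interior update and completed after it, so network latency can be hidden behind local work.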

8 General Thoughts
- This is just an initial set of suggestions
- Choice influenced by existing collaborations at LBNL
  - Please add your own ideas!
- Tradeoffs on the particular choices
  - Detailed performance info would be very useful in all of these
  - The MADCAP application is the most complete/real application
  - Data for performance comparisons exist for MADCAP and HPL
  - UPC HPL is complete, but performance sensitivity to thread scheduler needs evaluation: seems to be an issue so far
  - Too much BLAS3 with MADCAP and HPL
  - AMR is more challenging from a performance standpoint, but it doesn’t yet exist

