
1 Extreme Scalability Working Group (XS-WG): Status Update
Nick Nystrom, Director, Strategic Applications
Pittsburgh Supercomputing Center
October 21, 2010

2 Extreme Scalability Working Group (XS-WG): Purpose
Meet the challenges and opportunities of deploying extreme-scale resources into the TeraGrid, maximizing both scientific output and user productivity.
– Aggregate, develop, and share wisdom
– Identify and address needs that are common to multiple sites and projects
– May require assembling teams and obtaining support for sustained effort
XS-WG benefits from the active involvement of all Track 2 sites, Blue Waters, tool developers, and users.
The XS-WG leverages and combines RPs' interests to deliver greater value to the computational science community.

3 XS-WG Participants
Nick Nystrom – PSC, XS-WG lead
Jay Alameda – NCSA
Martin Berzins – Univ. of Utah (U)
Paul Brown – IU
Lonnie Crosby – NICS, IO/Workflows lead
Tim Dudek – GIG EOT
Victor Eijkhout – TACC
Jeff Gardner – U. Washington (U)
Chris Hempel – TACC
Ken Jansen – RPI (U)
Shantenu Jha – LONI
Nick Karonis – NIU (G)
Dan Katz – U. of Chicago
Ricky Kendall – ORNL
Byoung-Do Kim – TACC
Scott Lathrop – GIG, EOT AD
Vickie Lynch – ORNL
Amit Majumdar – SDSC, TG AUS AD
Mahin Mahmoodi – PSC, Tools lead
Allen Malony – Univ. of Oregon (P)
David O’Neal – PSC
Dmitry Pekurovsky – SDSC
Wayne Pfeiffer – SDSC
Raghu Reddy – PSC, Scalability lead
Sergiu Sanielevici – PSC
Sameer Shende – Univ. of Oregon (P)
Ray Sheppard – IU
Alan Snavely – SDSC
Henry Tufo – NCAR
George Turner – IU
John Urbanic – PSC
Joel Welling – PSC
Nick Wright – NERSC (P)
S. Levent Yilmaz – CSM, U. Pittsburgh (P)
Key: U: user; P: performance tool developer; G: grid infrastructure developer; *: joined XS-WG since last TG-ARCH update

4 Technical Challenge Area #1: Scalability and Architecture
Algorithms, numerical methods, multicore performance, etc.
– Robust, scalable infrastructure (libraries, frameworks, languages) for supporting applications that scale to O(10^4–10^6) cores
– Numerical stability and convergence issues that emerge at scale
– Exploiting systems' architectural strengths
– Fault tolerance and resilience
Contributors
– POC: Raghu Reddy (PSC)
Recent and ongoing activities: hybrid performance (see the sketch after this list)
– TG10 paper: Pablo Mininni, Duane Rosenberg, Raghu Reddy, and Annick Pouquet, "Investigation of Performance of a Hybrid MPI-OpenMP Turbulence Code"
– Synergy with AUS; work by Wayne Pfeiffer and Dmitry Pekurovsky
– Ongoing task: document and disseminate work done by Reddy, Pfeiffer, Pekurovsky, Jana, and Koesterke to explore hybrid performance tradeoffs
Recent and ongoing activities: DCL architectures
– Multiple RPs (NICS, PSC, SDSC, TACC) are busy standing up and preparing new DCL systems; XS-WG is beginning to address these.
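The hybrid model referenced above combines MPI between nodes with OpenMP threads within a node. The following is a minimal, hypothetical sketch of that pattern, not the Mininni/Rosenberg/Reddy/Pouquet turbulence code; the array, loop, and reduction are placeholders chosen only to show the structure.

```c
/* Minimal hybrid MPI+OpenMP sketch: MPI ranks across nodes/sockets,
 * an OpenMP thread team filling the cores within each rank.
 * Build with an MPI compiler wrapper plus the OpenMP flag. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv)
{
    int provided, rank, nranks;
    static double x[N];            /* placeholder work array */

    /* FUNNELED: only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local = 0.0, global = 0.0;

    /* OpenMP handles intra-node parallelism ... */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < N; i++) {
        x[i] = (double)(rank + i);
        local += x[i] * x[i];
    }

    /* ... while MPI handles inter-node communication: one collective
     * per rank instead of one per core. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d sum=%g\n",
               nranks, omp_get_max_threads(), global);

    MPI_Finalize();
    return 0;
}
```

The tradeoff such studies explore is how many MPI ranks to place per node versus how many OpenMP threads to give each rank, e.g. one rank per socket with OMP_NUM_THREADS set to the socket's core count.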

5 PSC's Blacklight (SGI Altix® UV 1000): The World's Largest Hardware-Coherent Shared Memory System
2×16 TB of cache-coherent shared memory
– hardware coherency unit: 1 cache line (64 B)
– 16 TB exploits the processor's full 44-bit physical address space
– ideal for fine-grained shared memory applications, e.g. graph algorithms and sparse matrices (see the sketch after this slide)
32 TB addressable with PGAS languages (e.g. SGI UPC)
– low latency and a high injection rate support one-sided messaging
– also ideal for fine-grained shared memory applications
NUMAlink® 5 interconnect
– fat tree topology spanning the full UV system; low latency, high bisection bandwidth
– hardware acceleration for PGAS, MPI, gather/scatter, remote atomic memory operations, etc.
Intel Nehalem-EX processors: 4096 cores (2048 cores per SSI)
– 8 cores per socket, 2 hardware threads per core, 4 flops/clock, 24 MB L3, Turbo Boost, QPI
– 4 memory channels per socket → strong memory bandwidth
– x86 instruction set with SSE 4.2 → excellent portability and ease of use
SUSE Linux operating system
– supports OpenMP, pthreads, MPI, and PGAS models → high programmer productivity
– supports a huge number of ISV applications → high end-user productivity
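To make "fine-grained shared memory applications" concrete, here is a minimal, generic OpenMP sketch of a sparse matrix-vector product in CSR format; it is not Blacklight- or SGI-specific code, and the function name and arguments are illustrative. The irregular indexed reads of x are the kind of access pattern that benefits from a single large cache-coherent address space rather than explicit message passing.

```c
/* Minimal OpenMP sketch of a CSR sparse matrix-vector product, y = A*x:
 * fine-grained, irregular shared-memory access over potentially huge arrays. */
#include <omp.h>

/* CSR storage: row_ptr has nrows+1 entries; col_idx and val hold nnz entries. */
void spmv_csr(int nrows,
              const int *row_ptr, const int *col_idx, const double *val,
              const double *x, double *y)
{
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];   /* irregular, indexed reads of x */
        y[i] = sum;
    }
}
```

Dynamic scheduling is used because rows can hold very different numbers of nonzeros, which would otherwise cause load imbalance across threads.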

6 Technical Challenge Area #1: Scalability and Architecture // Graph 500
Motivation: create impartial benchmarks that will rank the hardware/software technologies best suited to running graph-based algorithms and applications
– low computational intensity, poor locality, high computational complexity, spatiotemporal load imbalances, exponential growth of intermediate results
– latency is critical; many algorithmic and scaling challenges
– one aspect of data-intensive computing
Reference implementations now available (sequential, OpenMP, XMT, MPI)
– graph generator; problem sizes range from 10^10 to 10^15 bytes
SC10 BoF: Unveiling the First Graph 500 List
– Wednesday, November 17, 2010, 5:30–7:30 pm; room 394
– http://sc10.supercomputing.org/schedule/event_detail.php?evid=bof170
http://www.graph500.org/
Implementing and tuning now on Blacklight… (see the BFS sketch below)
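The Graph 500 kernel is a breadth-first search over a large synthetic graph. The sketch below is a minimal level-synchronous BFS over a CSR-style adjacency structure, shown only to illustrate the low computational intensity and irregular memory access noted above; it is not the reference implementation, and it assumes GCC's __sync_bool_compare_and_swap builtin plus OpenMP 3.1 atomic capture.

```c
/* Minimal level-synchronous BFS sketch over a CSR adjacency structure.
 * Illustrative only: the Graph 500 reference codes add graph generation,
 * validation, and timing/TEPS reporting around a kernel like this. */
#include <omp.h>
#include <stdlib.h>

/* xadj has nverts+1 entries; adj holds the concatenated neighbor lists. */
void bfs(int nverts, const int *xadj, const int *adj, int root, int *parent)
{
    int *frontier = malloc(nverts * sizeof(int));
    int *next     = malloc(nverts * sizeof(int));
    int nf = 0, nn = 0;

    for (int v = 0; v < nverts; v++) parent[v] = -1;
    parent[root] = root;
    frontier[nf++] = root;

    while (nf > 0) {
        nn = 0;
        /* Expand the current frontier; claim unvisited vertices atomically. */
        #pragma omp parallel for schedule(dynamic, 64)
        for (int i = 0; i < nf; i++) {
            int u = frontier[i];
            for (int k = xadj[u]; k < xadj[u + 1]; k++) {
                int v = adj[k];
                if (parent[v] == -1 &&
                    __sync_bool_compare_and_swap(&parent[v], -1, u)) {
                    int slot;
                    #pragma omp atomic capture
                    slot = nn++;
                    next[slot] = v;      /* enqueue for the next level */
                }
            }
        }
        /* Swap frontiers and proceed to the next level. */
        int *tmp = frontier; frontier = next; next = tmp;
        nf = nn;
    }
    free(frontier);
    free(next);
}
```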

7 Technical Challenge Area #2: Tools
Performance tools, debuggers, compilers, etc.
– Evaluate strengths and interactions; ensure adequate installations
– Analyze and address gaps in programming environment infrastructure
– Provide advanced guidance to RP consultants
Contributors
– POC: Mahin Mahmoodi (PSC)
Recent and ongoing activities: reliable tool installations
– Ongoing application of performance tools at scale to complex applications to ensure their correct functionality; identify and remove problems
– Sameer Shende and Mahin Mahmoodi are looking into the potential of using very large memory to enable performance analysis of extreme-scale runs

8 Performance Profiling of Million-core Runs
Sameer Shende (ParaTools and University of Oregon)
[Figures: TAU ParaProf Manager window showing metadata for the one-million-core profile dataset; execution time breakdown of LS3DF subroutines over all MPI ranks; LS3DF routine profiling data on rank 1,048,575; histogram of MPI_Barrier showing the distribution of its calls over the execution time.]
~500 GB of shared memory successfully applied to the visual analysis of very large-scale performance profiles using TAU.
Profile data: a synthetic million-core dataset assembled from 32k-core LS3DF runs on ANL's BG/P.
PSC Blacklight: EARLY illumination

9 Technical Challenge Area #3: Workflows, Data Transport, Analysis, Visualization, and Storage
Coordinating massive simulations, analysis, and visualization
– Data movement between RPs involved in complex simulation workflows; staging data from HSM systems across the TeraGrid
– Technologies and techniques for in situ visualization and analysis
Contributors
– POC: Lonnie Crosby (NICS)
Current activities
– identifying new areas on which to focus

10 Questions?

