1 Petascale
– LLNL Appro AMD: 9K processors [today]
– TJ Watson Blue Gene/L: 40K processors [today]
– NY Blue Gene/L: 32K processors
– ORNL Cray XT3/4: 44K processors [Jan 2008]
– TACC Sun: 55K processors [Jan 2008]
– ANL Blue Gene/P: 160K processors [Jan 2008]

2 CCSM and Component Models
– POP (Ocean)
– CICE (Sea Ice)
– CLM (Land Model)
– CPL (Coupler)
– CAM (Atmosphere)
– CCSM
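The slide lists the CCSM components but not how they fit together: CPL, the coupler, sits between CAM, POP, CICE, and CLM and mediates their field exchanges once per coupling interval. Below is a minimal, purely illustrative C sketch of that coupled-loop pattern; every function and field name in it (run_atmosphere, sst, and so on) is a placeholder, not the real CCSM/CPL interface.

```c
/* Illustrative coupler-centric time loop, in the spirit of CCSM's CPL
 * sitting between CAM, POP, CICE, and CLM.  All names below are
 * placeholders, not the actual CCSM interfaces. */
#include <stdio.h>

typedef struct { double sst, ice_frac, runoff, wind_stress; } fields_t;

/* hypothetical component stubs: each advances one coupling interval */
static void run_atmosphere(fields_t *f) { f->wind_stress = 0.1; }            /* CAM  */
static void run_land(fields_t *f)       { f->runoff = 1.0e-6; }              /* CLM  */
static void run_ocean(fields_t *f)      { f->sst = 288.0 + f->wind_stress; } /* POP  */
static void run_sea_ice(fields_t *f)    { f->ice_frac = f->sst < 271.35 ? 1.0 : 0.0; } /* CICE */

int main(void) {
    fields_t f = {0};
    for (int step = 0; step < 4; ++step) {  /* coupling intervals */
        run_atmosphere(&f);
        run_land(&f);
        run_ocean(&f);
        run_sea_ice(&f);
        /* a real coupler regrids and merges fields between the calls
         * above; here the shared struct stands in for that exchange */
        printf("step %d: sst=%.2f K, ice_frac=%.1f\n", step, f.sst, f.ice_frac);
    }
    return 0;
}
```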

3 Status of POP (John Dennis)
– 17K Cray XT4 processors [12.5 simulated years/day]
– 29K IBM Blue Gene/L processors [8.5 simulated years/day] (BG ready in Expedition Mode)
– Parallel I/O [underway]
– Land causes load imbalance at 0.1-degree resolution
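The load-imbalance point can be made concrete: at 0.1 degree, land points do no ocean work, so the task whose block contains the most ocean sets the pace for everyone. Below is a hedged MPI sketch of measuring that imbalance ratio; the point counts are invented and the block decomposition is simplified, so this is an illustration, not POP's actual partitioning.

```c
/* Sketch: estimate land/ocean load imbalance across MPI tasks.
 * Each task counts its active (ocean) points; the ratio of the busiest
 * task to the average shows what land-induced imbalance costs.
 * The counts below are invented for illustration. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* pretend land mask: even-ranked blocks are 90% ocean, odd 40% */
    long active = (rank % 2 == 0) ? 90000 : 40000;

    long max_active, sum_active;
    MPI_Allreduce(&active, &max_active, 1, MPI_LONG, MPI_MAX, MPI_COMM_WORLD);
    MPI_Allreduce(&active, &sum_active, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) {
        double mean = (double)sum_active / nprocs;
        /* 1.0 means perfectly balanced; the busiest task sets the pace */
        printf("imbalance (max/mean active points): %.2f\n", max_active / mean);
    }
    MPI_Finalize();
    return 0;
}
```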

4 Status of CAM (John Dennis)
– CAM HOMME in Expedition Mode
– Standard CAM "may be" run at 1-degree resolution or slightly higher on BG

5 Simulation rate for HOMME: Held-Suarez test at 1/2°, 1/3°, and 1/4° resolution [chart not included in transcript]

6 CAM & CCSM
– BG/L Expedition not from climate scientists
– Parallel I/O is the biggest bottleneck
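Since parallel I/O is flagged as the biggest bottleneck, here is a minimal sketch of the kind of collective write that avoids funneling all output through one task. It uses plain MPI-IO only as an illustration (the file name, sizes, and layout are made up); it is not the I/O layer actually being developed for CAM/CCSM.

```c
/* Sketch: each task writes its own contiguous slice of a field with one
 * collective MPI-IO call instead of gathering everything to task 0.
 * File name and sizes are illustrative only. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n_local = 1024;               /* points owned by this task */
    double *field = malloc(n_local * sizeof(double));
    for (int i = 0; i < n_local; ++i) field[i] = rank + 0.001 * i;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "history.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* this task's slice starts at rank * n_local doubles */
    MPI_Offset offset = (MPI_Offset)rank * n_local * (MPI_Offset)sizeof(double);
    MPI_File_write_at_all(fh, offset, field, n_local, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(field);
    MPI_Finalize();
    return 0;
}
```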

7 Cloud Resolving Models / LES
– Active Tracer High-resolution Atmospheric Model (ATHAM):
  – modularized
  – parallel-ready (MPI)
– Goddard Cloud Ensemble Model (GCE):
  – well-established (1970s to present)
  – parallel-ready (MPI)
  – scales linearly (99% up to 256 tasks; see the efficiency sketch after this slide)
  – comprehensive
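The "99% up to 256 tasks" figure reads as a parallel-efficiency claim, i.e. speedup stays within 99% of linear. A small sketch of the usual definition follows; the wall-clock times in it are invented purely to show the arithmetic.

```c
/* Parallel efficiency: E(p) = T(1) / (p * T(p)).  "Scales linearly,
 * 99% up to 256 tasks" would mean E(256) is roughly 0.99.
 * The timings below are invented for illustration. */
#include <stdio.h>

int main(void) {
    const double t1 = 25600.0;   /* wall time on 1 task, seconds (made up)    */
    const int    p  = 256;
    const double tp = 101.0;     /* wall time on 256 tasks, seconds (made up) */

    double speedup    = t1 / tp;        /* ~253x  */
    double efficiency = speedup / p;    /* ~0.99  */
    printf("speedup %.1fx, efficiency %.1f%%\n", speedup, 100.0 * efficiency);
    return 0;
}
```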

8 Implementations
– Done (NERSC IBM SP, GSFC):
  – ATHAM: 2D & 3D bulk cloud physics
  – GCE: 3D bulk cloud physics; 2D size-bin cloud physics
– Being done & to be done (Blue Gene):
  – GCE (ATHAM): 3D size-bin cloud physics, larger domain, longer simulation period, finer resolution, …

9 From: John Michalakes, NCAR

10 Parallelism in WRF: Multi-level Decomposition (slide courtesy: NCAR)
– Model domains are decomposed for parallelism on two levels:
  – Patch: section of the model domain allocated to a distributed-memory node
  – Tile: section of a patch allocated to a shared-memory processor within a node; this is also the scope of a model-layer subroutine
– Distributed-memory parallelism is over patches; shared-memory parallelism is over tiles within patches
– Single version of code for efficient execution on:
  – Distributed memory
  – Shared memory
  – Clusters of SMPs
  – Vector and microprocessors
[Figure: logical domain divided into patches; one patch divided into multiple tiles; inter-processor communication between patches]
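A minimal hybrid MPI+OpenMP sketch of the two-level idea described above: each MPI rank owns a patch, OpenMP threads within the rank sweep tiles of that patch, and neighboring patches exchange a halo. This only illustrates the decomposition; the domain sizes, the 1-D split, and the unit "physics" update are placeholders, not WRF code.

```c
/* Sketch of a WRF-style two-level decomposition: MPI ranks own patches
 * (distributed memory); OpenMP threads work on tiles within a patch
 * (shared memory).  Sizes and the update kernel are placeholders. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define GLOBAL_NY 256   /* illustrative global row count */
#define NX        256   /* illustrative row length       */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* patch level: a 1-D band of rows per rank (assumes nranks <= GLOBAL_NY) */
    int patch_ny = GLOBAL_NY / nranks;
    if (patch_ny < 1) patch_ny = 1;
    double *patch = calloc((size_t)patch_ny * NX, sizeof(double));

    /* tile level: threads sweep row blocks of the patch; a tile is the
     * scope handed to a model-layer subroutine */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        int tile_ny = (patch_ny + nthreads - 1) / nthreads;
        int j0 = tid * tile_ny;
        int j1 = (j0 + tile_ny < patch_ny) ? j0 + tile_ny : patch_ny;
        for (int j = j0; j < j1; ++j)
            for (int i = 0; i < NX; ++i)
                patch[(size_t)j * NX + i] += 1.0;   /* stand-in for physics */
    }

    /* inter-patch communication: exchange one halo row with the neighbor
     * above; MPI_PROC_NULL quietly handles the domain edges */
    int up   = (rank + 1 < nranks) ? rank + 1 : MPI_PROC_NULL;
    int down = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    double *halo = calloc(NX, sizeof(double));
    MPI_Sendrecv(&patch[(size_t)(patch_ny - 1) * NX], NX, MPI_DOUBLE, up,   0,
                 halo,                                 NX, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    if (rank == 0) printf("rank 0 updated a %d x %d patch\n", patch_ny, NX);
    free(patch);
    free(halo);
    MPI_Finalize();
    return 0;
}
```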

11

12 NCAR WRF Issues with Blue Gene/L (from John Michalakes)
– Relatively slow I/O
– Limited memory per node
– Relatively poor processor performance
– "Lots of little gotchas, mostly related to immaturity, especially in the programming environment."

