COMPUTATIONAL RESEARCH DIVISION
Scaling of the Community Atmospheric Model to ultrahigh resolution
Michael F. Wehner, Lawrence Berkeley National Laboratory

1 Scaling of the Community Atmospheric Model to ultrahigh resolution
Michael F. Wehner, Lawrence Berkeley National Laboratory
with Pat Worley (ORNL), Art Mirin (LLNL), Lenny Oliker (LBNL), John Shalf (LBNL)

2 Motivations
- First meeting of the WCRP Modeling Panel (WMP)
  - Convened at the UK Met Office in October 2005 by Shukla
  - Discussion focused on the benefits and costs of climate and weather models approaching 1 km in horizontal resolution
  - Eventual white paper by Shukla and Shapiro for the WMO JSC
- "Counting the Clouds", a presentation by Dave Randall (CSU) to DOE SciDAC (June 2005)
  - Dave presents a compelling argument for global atmospheric models that resolve cloud systems rather than parameterize them.
  - Presentation is on the web at

3 fvCAM
- NCAR Community Atmospheric Model version 3.1
  - Finite Volume hydrostatic dynamics (Lin-Rood)
  - Parameterized physics is the same as in the spectral version
- Our previous studies focused on the performance of fvCAM with a 0.5° x 0.625° x 28L mesh on a wide variety of platforms (see Pat Worley's talk this afternoon)
- In the present discussion, we consider the scaling behavior of this model over a range of existing mesh configurations and extrapolate to ultra-high horizontal resolution.

4 Operations count
- Exploit three existing horizontal resolutions to establish the scaling behavior of the number of operations per fixed simulation period.
- Existing resolutions (all with 28 vertical levels)
  - "B" 2° x 2.5°
  - "C" 1° x 1.25°
  - "D" 0.5° x 0.625°
- Define:
  - m = number of longitudes, n = number of latitudes
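As a rough illustration of the definitions on this slide, the sketch below derives m and n from the grid spacing of the three meshes. The helper name `mesh_dims` and the pole-inclusive latitude count (n = 180/dlat + 1) are assumptions made for this example; the slide itself only gives the spacings.

```python
def mesh_dims(dlat_deg, dlon_deg, include_poles=True):
    """Return (m, n): number of longitudes and latitudes of a regular lat-lon mesh.

    Assumption: latitudes include both poles, so n = 180/dlat + 1.
    """
    m = round(360.0 / dlon_deg)
    n = round(180.0 / dlat_deg) + (1 if include_poles else 0)
    return m, n


# Mesh names and spacings (latitude, longitude) as listed on the slide.
MESHES = {"B": (2.0, 2.5), "C": (1.0, 1.25), "D": (0.5, 0.625)}

for name, (dlat, dlon) in MESHES.items():
    m, n = mesh_dims(dlat, dlon)
    print(f"{name}: m={m} longitudes, n={n} latitudes, {m * n} columns")
```

Each halving of the grid spacing doubles m and roughly doubles n, which is the starting point for the scaling estimates that follow.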

5 Operations Count (Scaling)
- Parameterized physics
  - Time step can remain constant
  - Ops = m * n
- Dynamics
  - Time step determined by the Courant condition
  - Ops = m * n * n
- Filtering
  - Allows violation of an overly restrictive Courant condition near the poles
  - Ops = m * log(m) * n * n
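To make the three scaling laws concrete, here is a minimal sketch that evaluates how each cost grows when the horizontal resolution is doubled. The function names are hypothetical and the expressions are the proportionalities from the slide with the constant factors dropped, so only ratios between meshes are meaningful.

```python
import math


def physics_ops(m, n):
    # Constant time step: cost follows the number of columns.
    return m * n


def dynamics_ops(m, n):
    # Courant-limited time step: the extra factor of n counts the added steps.
    return m * n * n


def filter_ops(m, n):
    # Polar filtering: an additional log(m) factor along each latitude circle.
    return m * math.log(m) * n * n


def growth(ops, m, n, factor=2):
    """Cost ratio when both m and n are multiplied by `factor`."""
    return ops(factor * m, factor * n) / ops(m, n)


m, n = 576, 361  # illustrative mesh size, an assumption for this example
for label, ops in [("physics", physics_ops),
                   ("dynamics", dynamics_ops),
                   ("filtering", filter_ops)]:
    print(f"{label}: x{growth(ops, m, n):.2f} per doubling of resolution")
```

Under these proportionalities, physics grows by a factor of 4 per doubling, dynamics by 8, and filtering by slightly more than 8 because of the logarithmic term.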

6 Operations Count (Physics)

7 Operations Count (Dynamics)

8 Operations Count (Filters)

9 Sustained computation rate requirements
- A reasonable metric in climate modeling is that the model must run 1000 times faster than real time.
  - Millennium-scale control runs complete in a year.
  - Century-scale transient runs complete in a month.
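The arithmetic behind the 1000X metric can be checked with a few lines. This is only a sketch; `wall_clock_days` and the 365.25-day year are assumptions introduced for the example.

```python
DAYS_PER_YEAR = 365.25  # assumed average year length


def wall_clock_days(simulated_years, speedup=1000.0):
    """Wall-clock days needed to simulate `simulated_years` at `speedup` x real time."""
    return simulated_years * DAYS_PER_YEAR / speedup


print(f"1000-year control run: {wall_clock_days(1000):.1f} days")
print(f"100-year transient run: {wall_clock_days(100):.1f} days")
```

A millennium takes about one wall-clock year and a century a little over five weeks, consistent with the "year" and "month" figures quoted on the slide.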

10 Can this code scale to these speeds?
- Domain decomposition strategies
  - Np = number of subdomains, Ng = number of grid points
  - Existing strategy is 1D in the horizontal
  - A better strategy is 2D in the horizontal
- Note: fvCAM also uses a vertical decomposition as well as OpenMP parallelism to increase utilization of processors.

11 Processor scaling
- The performance data from fvCAM fits the first model well but tells us little about future technologies.
- A practical constraint is that the number of subdomains is limited to be less than or equal to the number of horizontal cells.
  - At three cells across per subdomain, complete communication of the model's data is required.
- This constraint can provide an estimate of the maximum number of subdomains (i.e., processors) as well as the minimum processor performance required to achieve the 1000X real time metric (in the absence of communication costs).
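One way to turn this constraint into numbers is sketched below. It assumes that three cells across is the smallest usable subdomain width, which is an interpretation of the slide rather than a formula taken from it; the function names and the sample mesh size are likewise hypothetical, and the values are not the ones from the talk's own table.

```python
def max_subdomains_1d(n, min_cells_across=3):
    """Latitude-band (1D) decomposition: each band keeps at least `min_cells_across` latitudes."""
    return n // min_cells_across


def max_subdomains_2d(m, n, min_cells_across=3):
    """Latitude-longitude (2D) decomposition: each tile is at least `min_cells_across` cells wide both ways."""
    return (m // min_cells_across) * (n // min_cells_across)


m, n = 576, 361  # illustrative mesh size, an assumption for this example
print("1D limit:", max_subdomains_1d(n))
print("2D limit:", max_subdomains_2d(m, n))
```

The gap between the two limits (a few hundred subdomains versus tens of thousands on the same mesh) illustrates why a 2D horizontal decomposition leaves far more room for parallelism.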

12 Maximum number of horizontal subdomains ,123,366

13 Minimum processor speed to achieve 1000X real time
Assume no vertical decomposition and no OpenMP

14 Total memory requirements

15 Memory scales more slowly than processor speed because of the Courant condition.
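A short worked relation may help explain why. It assumes that memory is proportional to the number of grid points at a fixed number of vertical levels (an assumption made for this note, not stated on the slide) and uses the dynamics operation count Ops = m * n * n given earlier.

```latex
\begin{aligned}
\text{Mem} &\propto m\,n, \\
\text{Ops}_{\text{dyn}} &\propto m\,n^{2}
  \quad \text{(one extra factor of } n \text{ from the Courant-limited time step)}, \\
\frac{\text{Ops}_{\text{dyn}}}{\text{Mem}} &\propto n .
\end{aligned}
```

Refining the mesh by a factor r in each horizontal direction therefore multiplies memory by r^2 but the dynamics work by r^3, so the required compute rate outgrows the memory footprint by a factor of r.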

16 Strawman 1km climate computer
- "I" mesh at 1000X real time
  - 0.015° x 0.02° x 100L
  - ~10 Petaflops sustained
  - ~100 Terabytes total memory
  - ~2 million horizontal subdomains
  - ~10 vertical domains
  - ~20 million processors at 500 Mflops each, sustained, including communications costs
    - 5 MB memory per processor
    - ~20,000 nearest-neighbor send-receive pairs per subdomain per simulated hour, of ~10 KB each
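The strawman figures can be cross-checked against one another with simple arithmetic. The sketch below uses only the numbers on this slide; the variable names, the decimal unit prefixes, and the reading of "~10KB each" as 10 KB moved per send-receive pair are assumptions.

```python
PETA, TERA = 1e15, 1e12  # decimal prefixes assumed

sustained_flops = 10 * PETA        # ~10 Petaflops sustained
total_memory_bytes = 100 * TERA    # ~100 Terabytes total memory
horizontal_subdomains = 2_000_000  # ~2 million horizontal subdomains
vertical_domains = 10              # ~10 vertical domains

processors = horizontal_subdomains * vertical_domains
flops_per_processor = sustained_flops / processors
memory_per_processor = total_memory_bytes / processors

# Nearest-neighbor traffic per subdomain, assuming 10 KB per send-receive pair.
pairs_per_sim_hour = 20_000
bytes_per_pair = 10e3
wall_seconds_per_sim_hour = 3600 / 1000  # 1000X real time
bandwidth_per_subdomain = pairs_per_sim_hour * bytes_per_pair / wall_seconds_per_sim_hour

print(f"processors: {processors:,}")
print(f"per-processor rate: {flops_per_processor / 1e6:.0f} Mflops")
print(f"per-processor memory: {memory_per_processor / 1e6:.0f} MB")
print(f"per-subdomain traffic: {bandwidth_per_subdomain / 1e6:.1f} MB/s")
```

The first three results reproduce the slide's 20 million processors, 500 Mflops, and 5 MB. The traffic estimate is a derived figure under the stated assumptions, not a number from the talk.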

17 Conclusions
- fvCAM could probably be scaled up to a 1.5 km mesh
  - Dynamics would have to be changed to fully non-hydrostatic
- The scaling of the operations count is superlinear with horizontal resolution because of the Courant condition.
  - Surprisingly, filtering does not dominate the calculation. Physics cost is negligible.
- A one-dimensional horizontal domain decomposition strategy will likely not work. Limits on processor number and performance are too severe.
- A two-dimensional horizontal domain decomposition strategy would be favorable but requires a code rewrite.
- It's not as crazy as it sounds.

