COMPUTATIONAL RESEARCH DIVISION
Scaling of the Community Atmospheric Model to ultrahigh resolution
Michael F. Wehner, Lawrence Berkeley National Laboratory
with Pat Worley (ORNL), Art Mirin (LLNL), Lenny Oliker (LBNL), John Shalf (LBNL)
Motivations
First meeting of the WCRP Modeling Panel (WMP)
  Convened at the UK Met Office, October 2005, by Shukla
  Discussion focused on the benefits and costs of climate and weather models approaching 1 km in horizontal resolution
  Eventual white paper by Shukla and Shapiro for the WMO JSC
“Counting the Clouds”, a presentation by Dave Randall (CSU) to DOE SciDAC (June 2005)
  Dave presents a compelling argument for global atmospheric models that resolve cloud systems rather than parameterize them.
  Presentation is on the web at
fvCAM
NCAR Community Atmospheric Model version 3.1
Finite Volume hydrostatic dynamics (Lin-Rood)
Parameterized physics is the same as the spectral version
Our previous studies focus on the performance of fvCAM with a 0.5° × 0.625° × 28L mesh on a wide variety of platforms (see Pat Worley’s talk this afternoon)
In the present discussion, we consider the scaling behavior of this model over a range of existing mesh configurations and extrapolate to ultra-high horizontal resolution.
Operations count
Exploit three existing horizontal resolutions to establish the scaling behavior of the number of operations per fixed simulation period.
Existing resolutions (all 28 vertical levels)
  “B” 2° × 2.5°
  “C” 1° × 1.25°
  “D” 0.5° × 0.625°
Define: m = # of longitudes, n = # of latitudes
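As a rough illustration of the definitions of m and n, the sketch below derives them from the mesh spacings listed for the B, C and D meshes. The function name `mesh_dimensions` and the `include_poles` flag are hypothetical; counting one extra latitude row so that both poles are grid points is an assumption about the mesh layout, not something stated on the slide.

```python
def mesh_dimensions(dlat_deg, dlon_deg, include_poles=True):
    """Return (m, n) = (# of longitudes, # of latitudes) for a regular mesh.

    include_poles=True adds one latitude row so that both poles are
    grid points; this is an assumption made for the sketch.
    """
    m = round(360.0 / dlon_deg)
    n = round(180.0 / dlat_deg) + (1 if include_poles else 0)
    return m, n


# Mesh spacings (latitude, longitude) in degrees, as listed on the slide.
MESHES = {"B": (2.0, 2.5), "C": (1.0, 1.25), "D": (0.5, 0.625)}

for name, (dlat, dlon) in MESHES.items():
    m, n = mesh_dimensions(dlat, dlon)
    print(f"{name}: m = {m}, n = {n}, horizontal cells = {m * n}")
```

Each halving of the spacing doubles m and roughly doubles n, so the number of horizontal cells grows by about a factor of four from one mesh to the next.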
Operations Count (Scaling)
Parameterized physics
  Time step can remain constant
  Ops ∝ m * n
Dynamics
  Time step determined by the Courant condition
  Ops ∝ m * n * n
Filtering
  Allows violation of an overly restrictive Courant condition near the poles
  Ops ∝ m * log(m) * n * n
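The three scaling rules can be compared with a short sketch. Only ratios relative to a reference mesh are computed, because the constants of proportionality are not given in the text. The function name `relative_ops` and the reference mesh sizes in the example are hypothetical choices for illustration.

```python
import math


def relative_ops(m, n, m0, n0):
    """Operation counts on an (m, n) mesh relative to a reference (m0, n0) mesh.

    Uses the scaling rules from the slide:
      physics   ~ m * n              (time step stays constant)
      dynamics  ~ m * n * n          (Courant condition adds a factor of n)
      filtering ~ m * log(m) * n * n
    """
    return {
        "physics": (m * n) / (m0 * n0),
        "dynamics": (m * n * n) / (m0 * n0 * n0),
        "filtering": (m * math.log(m) * n * n)
        / (m0 * math.log(m0) * n0 * n0),
    }


# Example: doubling the horizontal resolution of a hypothetical 144 x 90 mesh.
ratios = relative_ops(288, 180, 144, 90)
for component, ratio in ratios.items():
    print(f"{component}: {ratio:.2f}x")
```

Under these rules, doubling the resolution multiplies the physics cost by 4 and the dynamics cost by 8, while the filtering cost grows slightly faster than the dynamics because of the log(m) factor.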
Operations Count (Physics)
Operations Count (Dynamics)
Operations Count (Filters)
Sustained computation rate requirements
A reasonable metric in climate modeling is that the model must run 1000 times faster than real time.
  Millennium-scale control runs complete in a year.
  Century-scale transient runs complete in a month.
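The arithmetic behind the 1000X metric can be checked with a minimal sketch. The function name `wallclock_days` and the 365-day year are assumptions made for illustration.

```python
def wallclock_days(simulated_years, speedup=1000.0, days_per_year=365.0):
    """Wall-clock days needed to simulate a period at a given speedup over real time."""
    return simulated_years * days_per_year / speedup


# Millennium-scale control run and century-scale transient run at 1000X real time.
print(f"1000 simulated years: {wallclock_days(1000):.1f} wall-clock days")
print(f" 100 simulated years: {wallclock_days(100):.1f} wall-clock days")
```

At 1000X real time, a 1000-year run takes 365 days (a year) and a 100-year run takes 36.5 days (roughly a month).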
Can this code scale to these speeds?
Domain decomposition strategies
  Np = number of subdomains, Ng = number of grid points
  Existing strategy is 1D in the horizontal
  A better strategy is 2D in the horizontal
Note: fvCAM also uses a vertical decomposition as well as OpenMP parallelism to increase utilization of processors.
Processor scaling
The performance data from fvCAM fit the first model well but tell us little about future technologies.
A practical constraint is that the number of subdomains is limited to be less than or equal to the number of horizontal cells.
  At three cells across per subdomain, complete communication of the model’s data is required.
This constraint can provide an estimate of the maximum number of subdomains (processors) as well as the minimum processor performance required to achieve the 1000X real time metric (in the absence of communication costs).
Maximum number of horizontal subdomains
[Figure not reproduced; only a partial data label, ",123,366", survives.]
Minimum processor speed to achieve 1000X real time
Assume no vertical decomposition and no OpenMP
Total memory requirements
Memory scales more slowly than processor speed because of the Courant condition.
Strawman 1km climate computer
“I” mesh at 1000X real time
  0.015° × 0.02° × 100L
  ~10 Petaflops sustained
  ~100 Terabytes total memory
  ~2 million horizontal subdomains
  ~10 vertical domains
  ~20 million processors at 500 Mflops each sustained, including communications costs
  5 MB memory per processor
  ~20,000 nearest neighbor send-receive pairs per subdomain per simulated hour of ~10 KB each
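The strawman figures are mutually consistent, as the following sketch of the arithmetic suggests. The function name `strawman_budget` is hypothetical, decimal units (1 Petaflop = 1e15 flops, 1 Terabyte = 1e12 bytes, 1 KB = 1e3 bytes) are assumed, and each send-receive pair is counted as a single ~10 KB message, which is also an assumption.

```python
def strawman_budget(
    sustained_flops=10e15,           # ~10 Petaflops sustained
    total_memory_bytes=100e12,       # ~100 Terabytes total memory
    horizontal_subdomains=2_000_000,
    vertical_domains=10,
    pairs_per_sim_hour=20_000,       # send-receive pairs per subdomain per simulated hour
    bytes_per_message=10e3,          # ~10 KB each
    speedup=1000.0,                  # 1000X real time
):
    """Per-processor budget implied by the strawman machine totals."""
    processors = horizontal_subdomains * vertical_domains
    # One simulated hour takes 3600 / speedup wall-clock seconds.
    wall_seconds_per_sim_hour = 3600.0 / speedup
    traffic_per_sim_hour = pairs_per_sim_hour * bytes_per_message
    return {
        "processors": processors,
        "flops_per_processor": sustained_flops / processors,
        "memory_per_processor_bytes": total_memory_bytes / processors,
        "traffic_bytes_per_subdomain_per_wall_second":
            traffic_per_sim_hour / wall_seconds_per_sim_hour,
    }


budget = strawman_budget()
print(f"processors:           {budget['processors']:,}")
print(f"flops per processor:  {budget['flops_per_processor'] / 1e6:.0f} Mflops")
print(f"memory per processor: {budget['memory_per_processor_bytes'] / 1e6:.0f} MB")
```

With these inputs the totals divide out to 20 million processors, 500 Mflops each and 5 MB each, matching the figures listed above. The last dictionary entry gives a rough nearest-neighbor traffic rate per subdomain under the stated message-counting assumption.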
Conclusions
fvCAM could probably be scaled up to a 1.5 km mesh
  Dynamics would have to be changed to fully non-hydrostatic
The scaling of the operations count is superlinear with horizontal resolution because of the Courant condition.
  Surprisingly, filtering does not dominate the calculation.
  Physics cost is negligible.
A one-dimensional horizontal domain decomposition strategy will likely not work.
  Limits on processor number and performance are too severe.
A two-dimensional horizontal domain decomposition strategy would be favorable but requires a code rewrite.
It’s not as crazy as it sounds.