Presentation is loading. Please wait.

Presentation is loading. Please wait.

Gtb 1 Titanium Titanium: Language and Compiler Support for Scientific Computing Gregory T. Balls University of California - Berkeley Alex Aiken, Dan Bonachea,

Similar presentations


Presentation on theme: "Gtb 1 Titanium Titanium: Language and Compiler Support for Scientific Computing Gregory T. Balls University of California - Berkeley Alex Aiken, Dan Bonachea,"— Presentation transcript:

1 gtb 1 Titanium Titanium: Language and Compiler Support for Scientific Computing Gregory T. Balls University of California - Berkeley Alex Aiken, Dan Bonachea, Phillip Colella, David Gay, Susan Graham, Paul Hilfinger, Arvind Krishnamurthy, Ben Liblit, Chang Sun Lin, Peter McCorquodale, Carleton Miyamoto, Geoff Pike, Kar Ming Tang, Siu Man Yau, Katherine Yelick

2 gtb 2 Titanium Target Problems Many modeling problems in astrophysics, biology, material science, and other areas require –Enormous range of spatial and temporal scales To solve interesting problems, one needs: –Adaptive methods –Large scale parallel machines Titanium is designed for methods with –Stuctured grids –Locally-structured grids (AMR)

3 gtb 3 Titanium Common Requirements Algorithms for numerical PDE computations are (compared to linear algebra) –communication intensive –memory intensive AMR makes these harder –more small messages –more complex data structures –most of the programming effort is debugging the boundary cases –locality and load balance trade-off is hard

4 gtb 4 Titanium Titanium for Scientific Computing The Language –Java dialect compiled to C –Extensions for serial programming –Extensions for parallel programming The Compiler –Uniprocessor optimizations –Parallel optimizations –Available architectures The Results

5 gtb 5 Titanium Java for Scientific Computing Computational scientists work on increasingly complex models –Popularized C++ features: classes, overloading, pointer-based data structures But C++ is very complicated –easy to lose performance and readability Java is a better C++ –Safe: strongly typed, garbage collected –Much simpler to implement (research vehicle) –Industrial interest as well: IBM HP Java

6 gtb 6 Titanium Data Types Primitive scalar types: boolean, double, int, etc. –implementations store these in place –access is fast -- comparable to other languages Objects: user-defined and library –passed by pointer value –has level of indirection (pointer to) implicit –simple model, but inefficient for small objects Fast Objects (immutable classes) –similar to structs in C

7 gtb 7 Titanium Titanium Object Example immutable class Complex { private double real; private double imag; public Complex(double r, double i) { real = r; imag = i; } public Complex operator+(Complex c) { return new Complex(c.real + real, c.imag + imag); } public double getReal {return real;} public double getImag {return imag;} } Complex c = new Complex(7.1, 4.3); c = c + c;

8 gtb 8 Titanium Arrays in Java Arrays in Java are objects Only 1D arrays are directly supported Multidimensional arrays are slow 2d array Subarrays are important in AMR (e.g., interior of a grid) –Even C and C++ don’t support these well –Hand-coding (array libraries) can confuse optimizer

9 gtb 9 Titanium Multidimensional Arrays in Titanium New multidimensional array added to Java –One array may be a subarray of another »e.g., a is interior of b, or a is all even elements of b –Indexed by Points (tuples of ints) –Constructed over a set of Points, called Rectangular Domains (RectDomains) –Points, Domains and RectDomains are built-in immutable classes Support for AMR and other grid computations –domain operations: intersection, shrink, border

10 gtb 10 Titanium Unordered iteration Memory hierarchy optimizations are essential Compilers can sometimes do these, but hard in general Titanium adds unordered iteration on rectangular domains foreach (p in r) {... } –p is a Point –r is a RectDomain or Domain Foreach simplifies bounds checking as well Additional operations on domains and arrays to subset and transform

11 gtb 11 Titanium Titanium for Scientific Computing The Language –Java dialect compiled to C –Extensions for serial programming –Extensions for parallel programming The Compiler –Uniprocessor optimizations –Parallel optimizations –Available architectures The Results

12 gtb 12 Titanium SPMD Model All processors start together and execute same code, but not in lock-step Basic control done using –Ti.numProcs() total number of processors –Ti.thisProc() number of executing processor Bulk-synchronous style read all particles and compute forces on mine Ti.barrier(); write to my particles using new forces Ti.barrier(); This is neither message passing nor data-parallel

13 gtb 13 Titanium Global Address Space References (pointers) may be remote –useful in building adaptive meshes –easy to port shared-memory programs –uniform programming model across machines Global pointers are more expensive than local –True even when data is on the same processor »space (processor number + memory address) »dereference time (check to see if local) –Use local declarations in critical sections

14 gtb 14 Titanium Example: A Distributed Data Structure Proc 0 Proc 1 local_grids Data can be accessed across processor boundaries all_grids

15 gtb 15 Titanium Example: Setting Boundary Conditions foreach (l in local_grids.domain()) { foreach (a in all_grids.domain()) { local_grids[l].copy(all_grids[a]); }

16 gtb 16 Titanium Titanium for Scientific Computing The Language –Java dialect compiled to C –Extensions for serial programming –Extensions for parallel programming The Compiler –Uniprocessor optimizations –Communication optimizations –Available architectures The Results

17 gtb 17 Titanium Sequential Optimizations Current optimizations –foreach loops »within 20% of FORTRAN on many loop-intensive codes Optimizations in development –Cache blocking –Inlining

18 gtb 18 Titanium Parallel Optimizations Titanium compiler performs parallel optimizations –communication overlap and aggregation –fast parallel bulk I/O New analyses: –synchronization analysis: the parallel analog to control flow analysis for serial code [Gay & Aiken] –shared variable analysis: the parallel analog to dependence analysis [Krishnamurthy & Yelick] –local qualification inference: automatically inserts local qualifiers [Liblit & Aiken]

19 gtb 19 Titanium Architectures Titanium runs on many platforms –SP machines, T3Es, Networks of Workstations Titanium on Blue Horizon specifics –Uses LAPI (not MPI) –Allows user to specify threads (procs) per node –Performs conservative distributed garbage collection

20 gtb 20 Titanium Titanium for Scientific Computing The Language –Java dialect compiled to C –Extensions for serial programming –Extensions for parallel programming The Compiler –Uniprocessor optimizations –Communication optimizations –Available architectures The Results

21 gtb 21 Titanium AMR Gas Dynamics Hyperbolic Solver [McCorquodale & Colella] –Implementation of Berger-Colella algorithm –Mesh generation algorithm included 2D Example (3D supported) –Mach-10 shock on solid surface at oblique angle

22 gtb 22 Titanium FD-MLC for Poisson Problem Finite Difference based Method of Local Corrections [Balls & Colella] Example run on 16 processors –1 large high- wavenumber charge –2 smaller star-shaped charges -6.47x10 -9 0 1.31x10 -9

23 gtb 23 Titanium Parallel Performance Speedup on Ultrasparc SMP EM3D is small kernel –relaxation on unstructured mesh –shows high parallel efficiency of Titanium system AMR speedup limited by –small fixed mesh –2-levels, 9 patches

24 gtb 24 Titanium FD-MLC Parallel Performance Communication requirement is low (< 5%) Scaled speedup experiments are nearly ideal (flat) IBM SP2 at SDSCCray T3E at NERSC

25 gtb 25 Titanium Future Work Titanium language and compiler developments –Templates –Further optimization of serial performance Algorithm Development in Titanium –Self-gravitating gas dynamics –Immersed boundary methods Comparison to library approach –Performance –Code size and readability

26 gtb 26 Titanium Summary Language support –Arrays, Immutable, Overloading, … Compiler optimizations –Uniprocessor optimizations –Parallel analyses Architectures –Ported to several different platforms Results –Several algorithms implemented –Good parallel performance


Download ppt "Gtb 1 Titanium Titanium: Language and Compiler Support for Scientific Computing Gregory T. Balls University of California - Berkeley Alex Aiken, Dan Bonachea,"

Similar presentations


Ads by Google