A High-Performance Java Dialect Kathy Yelick, Luigi Semenzato, Geoff Pike, Carleton Miyamoto, Ben Liblit, Arvind Krishnamurthy, Paul Hilfinger, Susan Graham,

1 A High-Performance Java Dialect
Kathy Yelick, Luigi Semenzato, Geoff Pike, Carleton Miyamoto, Ben Liblit, Arvind Krishnamurthy, Paul Hilfinger, Susan Graham, David Gay, Phil Colella, and Alex Aiken
Computer Science Division, University of California at Berkeley, and Lawrence Berkeley National Laboratory

2 What is Titanium?
A practical language and system for high-performance parallel scientific computing
– both shared- and distributed-memory architectures
– based on Java
A platform for compiler and language experiments
– parallel and cache optimizations
– domain-specific language extensions
Future directions for Java?

3 Practical Language Design
Leverage existing culture
– C-like languages
– FORTRAN arrays
Leverage existing design
Small language, small compiler
– no interpreter
– compile into C
No heroism
– rely on well-understood techniques
– treat advanced optimizations as a convenience rather than a necessity
(Diagram labels: Java, Titanium, other high-performance languages)

4 Priorities
Performance
– consistently close to C/FORTRAN + MPI: currently 10%-80% slower; aiming for 10%-20%
Safety
– as safe as Java, for ease of programming and better optimizations
Expressiveness
– add a small set of essential features
Compatibility, interoperability, etc.
– no gratuitous departures from the Java standard

5 New Language Features
Scalable parallelism
– SPMD model of execution with a global address space
Multidimensional arrays
– also: points and index sets as first-class values
– multidimensional iterators
Memory management
– semi-automated zone-based allocation
Other
– immutable classes
– operator overloading

6 Model of Parallelism
Single Program, Multiple Data
– fixed number of processes
– each process has its own local data
– global synchronization (barrier)
(Diagram: n processes run from start to end, synchronizing at barriers along the way)
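
The SPMD model can be approximated in plain Java. In this sketch, threads stand in for Titanium processes and `java.util.concurrent.CyclicBarrier` stands in for `barrier()`. A fixed number of workers each compute on their own local data, then all of them meet at a barrier before reading each other's results. The class and method names are hypothetical, and this is only a shared-memory analogue, not how Titanium runs on distributed memory.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical SPMD sketch: Java threads play the role of Titanium processes.
final class SpmdSketch {
    // Runs nProcs copies of the same program; returns each process's phase-2 result.
    static long[] run(int nProcs) throws InterruptedException {
        CyclicBarrier barrier = new CyclicBarrier(nProcs); // stands in for barrier()
        long[] partial = new long[nProcs]; // phase-1 results, one slot per process
        long[] result = new long[nProcs];  // phase-2 results
        Thread[] procs = new Thread[nProcs];
        for (int id = 0; id < nProcs; id++) {
            final int me = id; // plays the role of thisProc()
            procs[id] = new Thread(() -> {
                try {
                    // Phase 1: each process works only on its own local data.
                    long local = 0;
                    for (int k = 1; k <= 100; k++) local += (long) k * (me + 1);
                    partial[me] = local;
                    barrier.await(); // nobody proceeds until every process has written partial[]
                    // Phase 2: now it is safe to read every process's phase-1 result.
                    long sum = 0;
                    for (long v : partial) sum += v;
                    result[me] = sum;
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
            });
            procs[id].start();
        }
        for (Thread t : procs) t.join();
        return result;
    }
}
```

With four processes, every process ends phase 2 with the same total. `CyclicBarrier` guarantees that writes made before `await()` are visible to the other threads once they return from it, which is the property the barrier provides here.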

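7 Global Synchronization Analysis
In Titanium, processes must synchronize at the same textual instances of barrier().
    doThis();
    barrier();
    boolean x = someCondition();
    if (x) {
        doThat();
        barrier();
    }
    doSomeMore();
    barrier();
If x is true on some processes and false on others, the processes that take the branch wait at the barrier inside the if while the rest wait at the final barrier, so they no longer synchronize at the same textual barrier.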

8 Global Synchronization Analysis
In Titanium, processes must synchronize at the same textual instances of barrier().
Singleness analysis statically guarantees correctness by restricting the values of variables that control program flow.
    doThis();
    barrier();
    boolean single x = someCondition();
    if (x) {
        doThat();
        barrier();
    }
    doSomeMore();
    barrier();
Because x is declared single, it has the same value on every process, so either all processes execute the barrier inside the if or none do.
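
The reason singleness matters can be modeled by counting how many barriers each process executes in the slide's code. That count is two unconditional barriers, plus one more inside the if when that process's x is true. This is a hypothetical model of the rule, not Titanium's singleness analysis, and the helper names are made up for illustration.

```java
// Hypothetical model of the slide's control flow, not Titanium's singleness analysis.
final class BarrierAlignment {
    // Barriers executed by one process: the one after doThis(), the one inside
    // if (x) when x is true, and the final one after doSomeMore().
    static int barriersExecuted(boolean x) {
        return x ? 3 : 2;
    }

    // True when every process executes the same number of barriers, which is a
    // necessary condition for all of them to meet at the same textual barriers.
    static boolean aligned(boolean[] xPerProcess) {
        int expected = barriersExecuted(xPerProcess[0]);
        for (boolean x : xPerProcess) {
            if (barriersExecuted(x) != expected) return false;
        }
        return true;
    }
}
```

A single x is the case where every entry of `xPerProcess` is equal, so `aligned` always holds. A mixed array such as `{true, false, true}` is exactly the situation singleness analysis rules out.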

9 Global Address Space
Each process has its own heap
References can span process boundaries
    class T { … }
    T gv;
    T lv = null;
    if (thisProc() == 0) {
        lv = new T();           // allocate locally
    }
    gv = broadcast lv from 0;   // distribute
    … gv.field …
(Diagram: process 0 and the other processes each have their own lv, gv, and local heap; after the broadcast, every gv refers to the object in process 0's heap)
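
Here is a rough Java analogue of `gv = broadcast lv from 0;`, assuming shared-memory threads in place of processes. Process 0 allocates the object and publishes the reference through a shared slot. A barrier ensures that no process reads the slot before it has been written. The names `BroadcastSketch`, `T`, and `owner` are hypothetical, and the barrier is this model's way of ordering the steps, not a claim about how Titanium implements broadcast.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical analogue of Titanium's "broadcast lv from 0" using Java threads.
final class BroadcastSketch {
    static final class T {
        final int owner; // process that allocated this object
        int field;
        T(int owner) { this.owner = owner; }
    }

    // Returns the reference each process holds in gv after the broadcast.
    static T[] run(int nProcs) throws InterruptedException {
        CyclicBarrier barrier = new CyclicBarrier(nProcs);
        AtomicReference<T> slot = new AtomicReference<>(); // shared broadcast slot
        T[] gv = new T[nProcs];
        Thread[] procs = new Thread[nProcs];
        for (int id = 0; id < nProcs; id++) {
            final int me = id;
            procs[id] = new Thread(() -> {
                try {
                    T lv = null;
                    if (me == 0) {
                        lv = new T(me);  // allocate locally on process 0
                        lv.field = 42;
                        slot.set(lv);    // publish the reference
                    }
                    barrier.await();     // wait until process 0 has published
                    gv[me] = slot.get(); // every process now refers to the same object
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
            });
            procs[id].start();
        }
        for (Thread t : procs) t.join();
        return gv;
    }
}
```

After `run` returns, every entry of `gv` is the same object, owned by process 0. This matches the picture of every gv pointing into process 0's heap.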

10 Global vs. Local References
Global references may be slow
– distributed memory: a few instructions of overhead when a global reference is used to access a local object
– shared memory: no performance implications
Solution: use the local qualifier
– statically restricts references to local objects
– example: T local lv = null;
– use only in performance-critical sections
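
One way to picture the overhead is to represent a global reference as an (owner process, object) pair. This representation is an assumption for illustration. Under it, every access through the global reference first checks whether the owner is the current process. A local reference is just the object, so it skips that check. Titanium's `local` qualifier enforces locality statically; the run-time check in the hypothetical `GlobalRef.toLocal` below only stands in for it.

```java
// Hypothetical representation of a global reference: owning process + object.
final class GlobalRef<T> {
    final int owner; // process whose heap holds the object
    final T obj;     // in a real distributed system this would be a remote address

    GlobalRef(int owner, T obj) {
        this.owner = owner;
        this.obj = obj;
    }

    // Each access through a global reference pays for this ownership test.
    boolean isLocalTo(int myProc) {
        return owner == myProc;
    }

    // Analogue of narrowing to a "local" reference: checked once here, after which
    // the caller holds a plain reference and later accesses need no ownership test.
    T toLocal(int myProc) {
        if (!isLocalTo(myProc)) {
            throw new IllegalStateException("object lives on process " + owner
                    + ", not on process " + myProc);
        }
        return obj;
    }
}
```

The design point is that the ownership test moves out of the inner loop. Code that narrows once and then works through the plain reference avoids repeating the test on every access.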

11 Arrays, Points, Domains
Fast, expressive arrays
– multidimensional
– lower bound, upper bound, stride
– concise indexing: A[p] instead of A(i, j, k)
Points
– tuple of integers as a primitive type
Domains
– sets of points: rectangular (bounds and stride) or general (arbitrary set)
Multidimensional iterators
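
The "lower bound, upper bound, stride" description can be made concrete with a small Java class that stores a 2-D rectangular domain in a flat, row-major array. The sketch assumes inclusive upper bounds, matching the (0, 0) to (6, 4) picture on the red-black slide. It uses a two-element `int[]` in place of a Point. All names are hypothetical; this is not Titanium's array layout.

```java
// Hypothetical 2-D array over a strided rectangular domain [lb : ub : stride],
// with inclusive upper bounds, stored flat in row-major order.
final class StridedArray2D {
    private final int[] lb, ub, stride;
    private final int rows, cols; // number of points along each dimension
    private final double[] data;

    StridedArray2D(int[] lb, int[] ub, int[] stride) {
        this.lb = lb.clone();
        this.ub = ub.clone();
        this.stride = stride.clone();
        this.rows = (ub[0] - lb[0]) / stride[0] + 1;
        this.cols = (ub[1] - lb[1]) / stride[1] + 1;
        this.data = new double[rows * cols];
    }

    int size() { return data.length; }

    // Maps a point p = [i, j] of the domain to its position in data (bounds-checked).
    int offset(int[] p) {
        for (int d = 0; d < 2; d++) {
            if (p[d] < lb[d] || p[d] > ub[d] || (p[d] - lb[d]) % stride[d] != 0) {
                throw new IndexOutOfBoundsException("point not in domain");
            }
        }
        return ((p[0] - lb[0]) / stride[0]) * cols + (p[1] - lb[1]) / stride[1];
    }

    double get(int[] p) { return data[offset(p)]; }

    void set(int[] p, double v) { data[offset(p)] = v; }
}
```

For the domain [1, 1] : [10, 20] : [2, 2], this gives 5 × 10 = 50 elements. Point [1, 1] maps to offset 0 and [9, 19] to offset 49. A point such as [2, 1], which is off the stride, is rejected.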

12 Example: Point, RectDomain, Array
    Point lb = [1, 1];
    Point ub = [10, 20];
    RectDomain R = [lb : ub : [2, 2]];
    double [2d] A = new double[R];   // (no distributed arrays)
    …
    foreach (p in A.domain()) {
        A[p] = B[2 * p];
    }
Standard optimizations:
– strength reduction
– common subexpression elimination
– invariant code motion
– removing bounds checks from the body
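
Two of the listed optimizations can be illustrated on a hand-lowered version of the loop. The sketch assumes R = [1, 1] : [10, 20] : [2, 2] with inclusive upper bounds. It also assumes A and B are stored as flat row-major Java arrays, with B having `bCols` columns and large enough to contain every 2 * p. The first method multiplies on every iteration to find B's index. The second hoists the loop-invariant row base out of the inner loop and replaces the products with running additions (invariant code motion and strength reduction). This is a hypothetical sketch of the kind of code a compiler could produce, not Titanium's actual output.

```java
// Hypothetical hand-lowering of: foreach (p in R) { A[p] = B[2 * p]; }
// where R = [lb0, lb1] : [ub0, ub1] : [s, s], with inclusive upper bounds.
final class LoopLowering {
    // Direct translation: every iteration multiplies to find both flat indices.
    static double[] naive(double[] b, int bCols, int lb0, int ub0, int lb1, int ub1, int s) {
        int rows = (ub0 - lb0) / s + 1, cols = (ub1 - lb1) / s + 1;
        double[] a = new double[rows * cols];
        for (int i = lb0; i <= ub0; i += s) {
            for (int j = lb1; j <= ub1; j += s) {
                int ai = ((i - lb0) / s) * cols + (j - lb1) / s;
                a[ai] = b[(2 * i) * bCols + 2 * j];
            }
        }
        return a;
    }

    // Optimized: invariant row base hoisted, products replaced by increments.
    static double[] reduced(double[] b, int bCols, int lb0, int ub0, int lb1, int ub1, int s) {
        int rows = (ub0 - lb0) / s + 1, cols = (ub1 - lb1) / s + 1;
        double[] a = new double[rows * cols];
        int ai = 0;                              // A is filled in iteration order
        int rowBase = 2 * lb0 * bCols + 2 * lb1; // B index of 2 * [lb0, lb1]
        int rowStep = 2 * s * bCols;             // B distance between successive rows of R
        int colStep = 2 * s;                     // B distance between successive columns of R
        for (int r = 0; r < rows; r++, rowBase += rowStep) {
            int bi = rowBase;
            for (int c = 0; c < cols; c++, bi += colStep) {
                a[ai++] = b[bi];
            }
        }
        return a;
    }
}
```

Both methods visit the points of R in the same row-major order and should produce identical arrays. The second does only additions inside the loops.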

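13 Example: Domain
    Point lb = [0, 0];
    Point ub = [6, 4];
    RectDomain R = [lb : ub : [2, 2]];
    …
    Domain red = R + (R + [1, 1]);
    foreach (p in red) {
        …
    }
R + [1, 1] translates R by one in each dimension, and adding two domains takes their union, so red contains the points whose coordinates sum to an even number.
(Diagram: R spans (0, 0) to (6, 4); R + [1, 1] spans (1, 1) to (7, 5); red spans (0, 0) to (7, 5))
Gauss-Seidel relaxation with red-black ordering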
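
The red-black construction can be checked by brute force in Java. The check assumes three things: inclusive upper bounds, that adding a Point to a domain translates it, and that adding two domains takes their union. Each domain is represented as a set of coordinate pairs. The class and method names are hypothetical and do not reflect Titanium's domain implementation.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical brute-force model of RectDomain/Domain arithmetic on 2-D points.
final class RedBlack {
    // All points of [lb : ub : stride], with inclusive upper bounds.
    static Set<List<Integer>> rect(int[] lb, int[] ub, int[] stride) {
        Set<List<Integer>> pts = new HashSet<>();
        for (int i = lb[0]; i <= ub[0]; i += stride[0])
            for (int j = lb[1]; j <= ub[1]; j += stride[1])
                pts.add(List.of(i, j));
        return pts;
    }

    // Domain + Point: translate every point of the domain.
    static Set<List<Integer>> shift(Set<List<Integer>> d, int di, int dj) {
        Set<List<Integer>> out = new HashSet<>();
        for (List<Integer> p : d) out.add(List.of(p.get(0) + di, p.get(1) + dj));
        return out;
    }

    // Domain + Domain: set union.
    static Set<List<Integer>> union(Set<List<Integer>> a, Set<List<Integer>> b) {
        Set<List<Integer>> out = new HashSet<>(a);
        out.addAll(b);
        return out;
    }
}
```

R has 4 × 3 = 12 points and R + [1, 1] another 12, with no overlap. Their union therefore has 24 points, exactly half of the 8 × 6 bounding box, and every one has an even coordinate sum: the red squares of a checkerboard.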

14 Memory Management
Distributed GC
– too unpredictable
Zone-based memory management
– extends the existing model
– good performance
– safe
– easy to use

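15 Zone-Based Memory Management
    Zone Z1 = new Zone();
    Zone Z2 = new Zone();
    T x = new(Z1) T();
    T y = new(Z2) T();
    x.field = y;
    x = y;
    delete Z1;
    delete Z2;   // error
– Allocate objects in zones
– Release zones manually
delete Z1 succeeds because, after x = y, nothing outside Z1 refers to an object in it; delete Z2 is an error because x and y still refer to the object allocated in Z2.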

16 Zone-Based Memory Management
    Zone Z1 = new Zone();
    Zone Z2 = new Zone();
    C x = new(Z1) C();
    C y = new(Z2) C();
    x.field = y;
    x = y;
    delete Z1;
    delete Z2;   // error
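
The error on `delete Z2` can be reproduced with a small reference-counting model. The model assumes a zone may be deleted only when no reference from outside it points into it, counting both local variables and fields of objects in other zones. This is how reference-counted region schemes behave; whether Titanium counts exactly this way is an assumption here. The classes are hypothetical. `ZRef` stands for a reference variable, and a `ZRef` with no owning zone plays the part of a local variable such as x or y.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference-counting model of zone-based memory management.
final class Zone {
    int externalRefs = 0; // references into this zone from outside it
    boolean deleted = false;
    final List<ZObject> objects = new ArrayList<>();

    // Analogue of new(Z) T(): allocate an object inside this zone.
    ZObject alloc() {
        if (deleted) throw new IllegalStateException("zone already deleted");
        ZObject o = new ZObject(this);
        objects.add(o);
        return o;
    }

    // Analogue of delete Z: refused while anything outside still points in.
    void delete() {
        if (externalRefs > 0) {
            throw new IllegalStateException("zone still referenced " + externalRefs + " time(s)");
        }
        for (ZObject o : objects) o.field.set(null); // drop this zone's outgoing references
        deleted = true;
    }
}

// An object with one reference-valued field, like field on the slide.
final class ZObject {
    final Zone zone;
    final ZRef field;

    ZObject(Zone zone) {
        this.zone = zone;
        this.field = new ZRef(zone);
    }
}

// A reference variable: owner == null means a local variable such as x or y;
// otherwise it is a field of an object living in zone owner.
final class ZRef {
    private final Zone owner;
    private ZObject target;

    ZRef(Zone owner) { this.owner = owner; }

    ZObject get() { return target; }

    // Keeps each zone's externalRefs count in sync on every assignment.
    void set(ZObject newTarget) {
        if (target != null && target.zone != owner) target.zone.externalRefs--;
        if (newTarget != null && newTarget.zone != owner) newTarget.zone.externalRefs++;
        target = newTarget;
    }
}
```

Here is the slide's sequence traced through the model. After `x = y`, Z1's count drops to 0, so deleting Z1 succeeds and releases the reference from the Z1 object's field into Z2. Z2 is still referenced by the two locals x and y, so its deletion is refused.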

17 Immutable Classes
User-definable “primitive” type
– same motivation as Java's primitive types: performance
No inheritance
– does not inherit from Object
– final
– all (non-static) fields are final
Example: complex numbers
Used internally for Point
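
Java has no user-defined value types. A final class whose fields are all final does capture the "no mutation, no subclassing" part of the idea. Unlike a Titanium immutable class, though, it still inherits from Object and is handled by reference. The `Complex` class below is a hypothetical sketch of the slide's complex-number example. Since plain Java lacks operator overloading, arithmetic is written as method calls.

```java
// Hypothetical sketch: a final class with final fields approximates an immutable class.
final class Complex {
    final double re, im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // Every operation returns a new value instead of modifying this one.
    Complex plus(Complex o) { return new Complex(re + o.re, im + o.im); }

    Complex times(Complex o) {
        return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
    }

    // Value semantics: two Complex objects are equal when their parts are equal.
    @Override public boolean equals(Object other) {
        if (!(other instanceof Complex)) return false;
        Complex c = (Complex) other;
        return Double.compare(re, c.re) == 0 && Double.compare(im, c.im) == 0;
    }

    @Override public int hashCode() { return 31 * Double.hashCode(re) + Double.hashCode(im); }

    @Override public String toString() { return "(" + re + ", " + im + ")"; }
}
```

For example, `new Complex(0, 1).times(new Complex(0, 1))` equals `new Complex(-1, 0)`. Adding to a value leaves the original unchanged.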

18 Other Features
Operator overloading
– useful to scientific programmers
Parameterized types
– will conform to the standard

19 Implementation
Strategy
– compile Titanium into C (currently C++)
– POSIX threads for SMPs (currently Solaris threads)
– Libsplit-c for communication (Active Messages)
Status
– runs on a Sun Enterprise 8-way SMP
– runs on the Berkeley NOW
– trivial ports to half a dozen other architectures
– tuning for sequential performance

20 Applications
Three-D AMR Poisson Solver (AMR3D)
– block-structured grids
– 2000-line program
– algorithm not yet fully implemented in other languages
– tests performance and effectiveness of language features
Three-D Electromagnetic Waves (EM3D)
– unstructured grids
Several smaller benchmarks

21 Current Performance
Sequential performance
                  C/C++/FORTRAN   Java Arrays   Titanium Arrays   Overhead
DAXPY             1.4s            6.8s          1.5s              7%
3D multigrid      12s                           22s               83%
2D multigrid      5.4s                          6.2s              15%
EM3D              0.7s            1.8s          1.0s              42%
Parallel performance
(Charts: speedup vs. number of processors, 1, 2, 4, 8, for EM3D and AMR3D)

22 Conclusions
Java is a good base language
– easily extended
– compilation reasonably simple
High performance is possible
– explicit parallelism
– advanced array features
– rely on simple, well-understood optimizations
Essence of Java is preserved
– small
– safe

23 Sorry, I Clicked Too Far... there is nothing here

24 Incompatibilities
Threads
– no threads for the time being: coexisting threads and processes are difficult to design
Exceptions
– run-time errors such as out-of-bounds indexing halt the program instead of throwing an exception, because throwing exceptions prevents optimizations that reorder code

