Type Systems For Distributed Data Structures

Type Systems For Distributed Data Structures
Mention Titanium, right off the top. Ben Liblit & Alexander Aiken University of California, Berkeley

Underlying Memory Model
Multiple machines, each with local memory Global memory is union of local memories Distinguish two types of pointers: Local points to local memory only Global points anywhere: machine, address Different representations & operations There are many different definitions for “distributed” computing. Here’s what we mean by that term. Intentionally vague about what a “machine” really is: could be one workstation in a networked cluster; could be one CPU in a Cray T3E supercomputer. This application domain (high performance scientific computing) requires that we exploit every static property we can to minimize dynamic decisions/dispatches.

Language Design Options
Make everything global? Conservatively sound Easy to use Hides program structure Needlessly slow Needlessly slow because in practice, a global pointer to local memory is slower than an equivalent local pointer. Maintenance problems become most apparent when dealing with legacy code. Titanium incorporates 16,000 lines of Java source code from Sun.

Language Design Options
Expose local/global to programmer? Explicit cost model Faster execution Naïve designs are unsound (as we will show) Code becomes difficult to write and maintain Conversion of sequential code is problematic By “unsound”, we mean that it is very easy to end up with local pointers from someone else’s address space. This is even more of a debugging nightmare than in sequential uniprocessor C code. Maintenance problems become most apparent when dealing with legacy code. Titanium incorporates 16,000 lines of Java source code from Sun.

A (Possibly) Gratuitous Global A (Potentially) Unsound Local
Vertical dashed line represents boundary between local address spaces. Green angled arrows represent global pointers. Note that they can cross local address spaces to point to remote data. Blue rounded arrows represent local pointers. Note that they cannot cross local address spaces. Global is only gratuitous if it never points to remote data. Remember that we are dealing with languages that have destructive assignment. Local is unsound if right-side machine can see left-side’s local pointer. 7 8 5

Understand It First, Then Fix It
Study local/global in a simpler context Design a sound type system for a tiny language Move from type checking to type inference Programmers see as much detail as they want Apply findings to design of real languages Type system detects & forbids “bad things” Local qualification inference as optimization

Type Grammar Boxed and unboxed values Integers, pointers, and pairs
Coercion, but no subsumption: local  global Just like int  float Pairs are not automatically boxed. Boxed and unboxed values Integers, pointers, and pairs Pairs are not assumed boxed References to boxes are either local or global

Global Dereferencing: Standard Approach Unsound
Although this seems simple, Split-C gets it wrong! x = 5 x =

Global Dereferencing: Sound With Type Expansion
5 x =

Global Dereferencing: Tuple Expansion
Type expansion for tuple components? No: would change representation of tuple 8 x = 8 5 x =

Global Dereferencing: Tuple Expansion
Solution: Invalidate local pointers in tuples Other components remain valid, usable This is a design decision, rather than something strictly compelled by underlying system. Other languages might opt for a different approach. We use this approach because it keeps named structures (e.g. C structures, Java classes) sane. 8 x = 8 5 x =

Global Tuple Selection
Starting at x, can we reach 5? Yes, with a proper selection operator Grey box highlights referent of pointer. 8 x = 5

Selection offsets pointer within tuple 8 x = @2 x = 5

Selection offsets pointer within tuple Global-to-local pointer works just as before 8 x = @2 x = 5 x =

Extended Type Grammar Allow subtyping on validity qualifiers
Subtyping extends down into compound tuples, but not across pointers. Why? So that assignment works, which leads us to the next slide… Allow subtyping on validity qualifiers

Global Assignment x = 3 5 y =

Type Qualifier Inference
Efficiently infer qualifiers in two passes: Maximize number of “invalid” qualifiers Maximize number of “local” qualifiers Allows for a range of language designs Complete inference Allow explicit declarations as needed On large codes, does better than humans! Our type system has gotten rather complex. Inference to the rescue! Inference is based on simple “a = b” and “a  b” term constraints. Assume that we already have a complete (but unqualified) static typing.

Titanium Implementation
Titanium = Java + SPMD parallelism Focus is on scientific codes Global is assumed; local is explicit E.g., “Object local” or “double [] local [] local” Local qualification inference in compiler Conservative for valid/invalid qualifiers Monomorphic “SPMD” is Single Program, Multiple Data. Java always boxes objects, so every object declaration is a potential site for a “local” qualifier. Why no “local” qualifier on “double”? Because Java boxes arrays, but does not box doubles. How do we handle casts, arrays, inheritance, native methods, …? See paper. Polymorphism would require code duplication. How much, and would it we worth it? Unknown.

Titanium Benchmarks: Speed
LU 422 lines; LU factorization for dense matrixes. Cannon 192 lines; Cannon’s algorithm for dense matrix multiplication. Sample 321 lines; a distributed sorting algorithm. GSRB 1,090 lines; Gauss-Seidel Red Black algorithm for solving elliptic partial differential equations. PPS 3,775 lines; parallel solver for the Poisson equation with infinite domain boundary conditions. Manual+LQI is better than auto+LQI, because programmers can exploit deeper properties of program that static analysis cannot reveal. This confirms our hypothesis that the best design combines selective manual annotation with aggressive, sound inference.

Titanium Benchmarks: Code Size

Summary and Conclusions
For top performance, local/global must be dealt with Soundness issues are subtle, but tractable Analysis core is surprisingly simple Type qualifier inference is a double win: Programming is easier Optimized code is faster, smaller Ease-of-use and performance are often seen as conflicting goals. Here is an example where offloading work into the compiler wins on both fronts.

Type Systems For Distributed Data Structures

Similar presentations

Presentation on theme: "Type Systems For Distributed Data Structures"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Type Systems For Distributed Data Structures

Similar presentations

Presentation on theme: "Type Systems For Distributed Data Structures"— Presentation transcript:

Similar presentations

About project

Feedback