Wide-Area Parallel Computing in Java
Henri Bal, Vrije Universiteit Amsterdam, Faculty of Sciences

1 Wide-Area Parallel Computing in Java
Henri Bal, Vrije Universiteit Amsterdam, Faculty of Sciences

2 Introduction
Distributed supercomputing
- Parallel applications on geographically distributed computing system (computational grid)
- Examples: SETI@home, RSA-155
Programming support
- Language-neutral systems: Legion, Globus
- Language-centric: Java
Goal: study wide-area parallel computing in Java
- Programming model: Remote Method Invocation

3 Outline
Wide-area parallel computing
Java Remote Method Invocation (RMI)
Performance of JDK RMI
The Manta high-performance Java system
Wide-area parallel Java applications using RMI
Application performance

4 Wide-area parallel computing
Challenge
- Tolerating poor latency and bandwidth of WANs
Basic assumption: wide-area system is hierarchical
- Connect clusters, not individual workstations
- Most links are fast
General approach
- Optimize applications to exploit hierarchical structure -> most communication is local

5 Distributed ASCI Supercomputer
Figure: clusters VU (128), UvA (24), Leiden (24), Delft (24); 6 Mb/s ATM
Node configuration
- 200 MHz Pentium Pro
- 64-128 MB memory
- 2.5 GB local disks
- Myrinet LAN
- Redhat Linux 2.0.36

6 Java
Growing interest in Java for parallel applications
- Java Grande forum
Parallel programming support in Java
- Shared memory: multithreading
- Distributed memory: Remote Method Invocation
Study suitability of Java RMI for (wide-area) parallel programming
- Optimizing performance of local RMI [PPoPP’99]
- Wide-area parallel programming using RMI [JavaGrande’99]

7 RMI (1)
Flexible object-oriented RPC-like primitive
- Easy interoperability between Java Virtual Machines
- Polymorphism -> dynamic bytecode loading
void species(Animal x) throws … { System.out.println("Species " + x.name()); }
o.species(new Orca()); -> "Species orca"
o.species(new Panda()); -> "Species panda"
o.species(new Manta()); -> "Species manta"
Figure: class hierarchy with Animal and its subclasses Orca, Panda, Manta
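The polymorphic call on this slide can be approximated without a network. The sketch below is an assumption-laden stand-in, not the slide's code: it omits the RMI registry and stubs, and it imitates pass-by-value by serializing and deserializing the argument, which is how RMI passes non-remote objects. The names SpeciesDemo and passByValue are hypothetical, and species returns its message instead of printing it so that the result can be checked.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical hierarchy modeled on the slide's Animal/Orca/Panda/Manta figure.
abstract class Animal implements Serializable {
    abstract String name();
}

class Orca extends Animal { String name() { return "orca"; } }
class Panda extends Animal { String name() { return "panda"; } }
class Manta extends Animal { String name() { return "manta"; } }

class SpeciesDemo {
    // Stand-in for the remote method; a real RMI method would be declared in a
    // java.rmi.Remote interface and would throw java.rmi.RemoteException.
    static String species(Animal x) {
        return "Species " + x.name();
    }

    // Copies the argument through Java serialization, as RMI does for
    // parameters that are not remote objects.
    static Animal passByValue(Animal a) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(a);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Animal) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (Animal a : new Animal[] { new Orca(), new Panda(), new Manta() }) {
            System.out.println(species(passByValue(a)));
        }
    }
}
```

The receiver only knows the static type Animal; the dynamic type travels with the serialized object. In real RMI the receiver may additionally have to load the subclass bytecode, which this single-JVM sketch does not exercise.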

8 RMI (2)
Designed for client-server applications
Automatic serialization (marshalling)
Normally used in a high latency environment
- E.g. Internet
Is RMI fast enough for parallel programming?

9 JDK RMI Performance (200 MHz Pentium Pro, JDK 1.1.4)

10 Why is JDK RMI slow?
Serialization uses run-time type inspection
Protocol overhead (class information)
Thread creation for incoming calls
TCP/IP
Most code is written in Java

11 The Manta system
Designed for high-performance computing
Native (static) compilation
- Source -> executable
New fast RMI protocol between Manta nodes
Support (polymorphic) RMIs with JVMs
Implemented on wide-area DAS system

12 JDK versus Manta (200 MHz Pentium Pro, Myrinet, JDK 1.1.4 interpreter, 1 object as parameter)

13 Manta serialization
Java source:
class Test implements Serializable { int i; double d; Object o; }
Serializer (figure labels: Manta, JDK):
void PackageClass__Test(…) { WRITE_INT( type_id ); WRITE_INT( i ); WRITE_DOUBLE( d ); WRITE_OBJECT( o ); }
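The contrast between run-time type inspection and a class-specific serializer can be illustrated in plain Java. This is only a sketch under stated assumptions: Manta generates native code, whereas this example stays in Java; TestObject is a hypothetical, trimmed version of the slide's Test class (the Object field is left out), and TEST_TYPE_ID is a made-up type id.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Field;

// Hypothetical, trimmed version of the slide's Test class.
class TestObject {
    int i = 7;
    double d = 2.5;
}

class SerializerSketch {
    static final int TEST_TYPE_ID = 1; // made-up type id for illustration

    // Class-specific serializer: field order and types are fixed at compile time.
    static byte[] writeGenerated(TestObject t) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(TEST_TYPE_ID);
            out.writeInt(t.i);
            out.writeDouble(t.d);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Generic serializer: discovers fields and their types by reflection on every call.
    static byte[] writeReflective(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(TEST_TYPE_ID);
            for (Field f : obj.getClass().getDeclaredFields()) {
                f.setAccessible(true);
                if (f.getType() == int.class) {
                    out.writeInt(f.getInt(obj));
                } else if (f.getType() == double.class) {
                    out.writeDouble(f.getDouble(obj));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }
}
```

Both methods write the same amount of data (a 4-byte id, a 4-byte int, an 8-byte double); the difference is that the reflective version pays for field lookup and type tests at run time, while the class-specific version does that work once, ahead of time.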

14 RMI protocol
Light-weight RMI protocol
- Send minimal type information
Avoid thread creation
- Simple nonblocking methods executed directly
Avoid interrupts
- Poll network when processor is idle
Everything is written in C
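The "avoid thread creation" point can be sketched as a dispatcher that runs simple calls in the receiving thread and creates a thread only for calls that might block. This is an illustration in Java, not Manta's implementation (which the slide says is written in C); the class UpcallSketch and the mayBlock flag are hypothetical, and how a real system decides whether a method can block is outside this sketch.

```java
class UpcallSketch {
    // Runs an incoming call and returns the thread that executed (or will
    // execute) it. 'mayBlock' is a hypothetical flag supplied by the caller.
    static Thread handleIncoming(Runnable method, boolean mayBlock) {
        if (!mayBlock) {
            method.run();                   // executed directly by the receiving thread
            return Thread.currentThread();
        }
        Thread worker = new Thread(method); // only potentially blocking calls pay for a thread
        worker.start();
        return worker;
    }
}
```

A call such as `UpcallSketch.handleIncoming(() -> counter.incrementAndGet(), false)` completes before handleIncoming returns, so the common case costs no thread creation or context switch.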

15 Communication software
Panda user space RPC protocol
LFC Myrinet control program
- Similar to active messages
- Implemented partly on Myrinet network interfaces
- Myrinet network interfaces mapped in user space
Figure labels (protocol stack): Manta RMI, Panda RPC, LFC, UDP, Ethernet, Myrinet, TCP, ATM

16 Interoperability with JVMs
Manta RMI protocol incompatible with JDK
- Use fast RMI between Manta nodes
- Use JDK-compliant protocol with JVMs
Polymorphic RMI requires exchanging bytecodes
- Also generate bytecodes when compiling a program
- Dynamically compile and link bytecodes into running program

17 Null-RMI latency

18 RMI Throughput

19 Outline
Wide-area parallel computing
Java Remote Method Invocation (RMI)
Performance of JDK RMI
The Manta high-performance Java system
Wide-area parallel Java applications using RMI
Application performance

20 Manta on wide-area DAS
2 orders of magnitude between intra-cluster (LAN) and inter-cluster (WAN) communication performance
Manta exposes hierarchical structure to application
- Applications are optimized to reduce WAN-overhead

21 Wide-area programming
Problem: how to tolerate difference between LAN and WAN performance
Wide-area system is structured hierarchically
- Most links are fast
Approach: application-level optimizations that exploit the hierarchical structure
- Reduce wide-area communication

22 Application experience
Parallel applications
- Successive overrelaxation (SOR)
- All-pairs shortest paths problem (ASP)
- Traveling salesperson problem (TSP)
- Iterative Deepening A* (IDA*)
Measurements on wide-area DAS
- 1-4 clusters with 16 nodes
- Comparison with single 64-node cluster

23 Successive Overrelaxation
Red/black SOR
- Neighbor communication, using RMI
Problem: nodes at cluster-boundaries
- Overlap wide-area communication with computation
- RMI is synchronous -> use multithreading
Figure: Cluster 1 (CPU 1, CPU 2, CPU 3) and Cluster 2 (CPU 4, CPU 5, CPU 6), with latencies labeled 40 µs and 5600 µs
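Overlapping a synchronous remote call with computation can be sketched with a second thread. The example below is an assumption, not the SOR code behind the slide: fetchRemoteBoundary is a hypothetical stand-in for a blocking RMI to a neighbor in another cluster (simulated with an arbitrary sleep), and summing arrays stands in for the grid update.

```java
import java.util.concurrent.CompletableFuture;

class OverlapSketch {
    // Hypothetical stand-in for a synchronous RMI to a neighbor in another
    // cluster: it blocks for an arbitrary delay, then returns a boundary row.
    static double[] fetchRemoteBoundary() {
        try {
            Thread.sleep(6);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new double[] { 1.0, 1.0, 1.0 };
    }

    // Placeholder for the numerical work done on a row of the grid.
    static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    static double iteration(double[] interior) {
        // Start the blocking call on another thread ...
        CompletableFuture<double[]> boundary =
                CompletableFuture.supplyAsync(OverlapSketch::fetchRemoteBoundary);
        // ... work on data that does not depend on the remote row ...
        double local = sum(interior);
        // ... and wait for the remote row only when it is actually needed.
        double[] remote = boundary.join();
        return local + sum(remote);
    }
}
```

Because the RMI itself blocks its caller, the overlap has to come from an extra thread; here the common fork-join pool supplies it.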

24 All-pairs shortest paths
Broadcast at beginning of each iteration
Problem: broadcasting over wide-area links
- Lack of broadcast in Java -> use spanning tree
- Use coordinator node per cluster
- Do asynchronous send to all remote coordinators
- Implemented using threads
Figure: clusters 1, 2, 3
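The coordinator-based broadcast can be sketched as a message plan: one wide-area message per remote cluster, followed by local forwarding. The code below only computes which messages would be sent; it does not send anything and leaves out the asynchronous sender threads the slide mentions. The Hop and BroadcastSketch classes are hypothetical, and choosing the first node of each cluster as coordinator is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// One message in the broadcast plan; 'wideArea' marks an inter-cluster hop.
final class Hop {
    final int from;
    final int to;
    final boolean wideArea;

    Hop(int from, int to, boolean wideArea) {
        this.from = from;
        this.to = to;
        this.wideArea = wideArea;
    }
}

class BroadcastSketch {
    // clusters[c] lists the node ids of cluster c; clusters[c][0] is assumed
    // to be that cluster's coordinator.
    static List<Hop> plan(int[][] clusters, int senderCluster, int sender) {
        List<Hop> hops = new ArrayList<>();
        for (int c = 0; c < clusters.length; c++) {
            int source = sender;
            if (c != senderCluster) {
                source = clusters[c][0];                 // remote coordinator
                hops.add(new Hop(sender, source, true)); // the only WAN hop for this cluster
            }
            for (int node : clusters[c]) {
                if (node != source) {
                    hops.add(new Hop(source, node, false)); // local forwarding
                }
            }
        }
        return hops;
    }

    static int countWideArea(List<Hop> hops) {
        int count = 0;
        for (Hop h : hops) {
            if (h.wideArea) {
                count++;
            }
        }
        return count;
    }
}
```

For three clusters of three nodes, the plan contains eight messages, of which only two cross a wide-area link; sending to every remote node directly would need six wide-area messages.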

25 Traveling salesperson problem
Replicated-worker style parallel search algorithm
Problem: work distribution
- Central job-queue has high overhead
- Statically distribute jobs over clusters
- Use centralized job-queue per cluster
- Easy to express using RMI
Figure labels: 1, 2, 3
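A per-cluster job queue with static distribution might look like the following sketch. It runs in a single JVM, so the queues are ordinary objects where the talk would use RMI; the class ClusterQueues is hypothetical, jobs are plain integers, and the round-robin assignment is an assumption (the slide only says the distribution is static).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class ClusterQueues {
    // Statically assigns jobs to one queue per cluster, round-robin.
    static List<Queue<Integer>> distribute(List<Integer> jobs, int clusters) {
        List<Queue<Integer>> queues = new ArrayList<>();
        for (int c = 0; c < clusters; c++) {
            queues.add(new ConcurrentLinkedQueue<>());
        }
        for (int j = 0; j < jobs.size(); j++) {
            queues.get(j % clusters).add(jobs.get(j));
        }
        return queues;
    }

    // A worker fetches work only from its own cluster's queue, so fetching a
    // job never crosses a wide-area link; null means the queue is empty.
    static Integer nextJob(List<Queue<Integer>> queues, int myCluster) {
        return queues.get(myCluster).poll();
    }
}
```

The trade-off is load balance: once a cluster's queue is empty its workers idle, even if other clusters still have jobs.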

26 Iterative Deepening A*
Parallel search algorithm using work stealing
Problem: inter-cluster work stealing
Optimization: first look for work in local cluster
- Easy to express using RMI
Figure: clusters 1, 2
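The "look in the local cluster first" optimization amounts to ordering the victims of a steal attempt. The sketch below is single-threaded and in-process, so it shows only the search order, not the RMI calls or the synchronization a real implementation needs; StealSketch and its methods are hypothetical names, and jobs are plain integers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class StealSketch {
    // Builds empty job queues indexed as queues.get(cluster).get(node).
    static List<List<Deque<Integer>>> emptyQueues(int clusters, int nodesPerCluster) {
        List<List<Deque<Integer>>> queues = new ArrayList<>();
        for (int c = 0; c < clusters; c++) {
            List<Deque<Integer>> cluster = new ArrayList<>();
            for (int n = 0; n < nodesPerCluster; n++) {
                cluster.add(new ArrayDeque<>());
            }
            queues.add(cluster);
        }
        return queues;
    }

    // Tries victims in the thief's own cluster before any remote cluster.
    // Returns the stolen job, or null if no node has work.
    static Integer steal(List<List<Deque<Integer>>> queues, int myCluster, int myNode) {
        List<Deque<Integer>> local = queues.get(myCluster);
        for (int n = 0; n < local.size(); n++) {
            if (n != myNode) {
                Integer job = local.get(n).pollLast();
                if (job != null) {
                    return job;             // found over a fast local link
                }
            }
        }
        for (int c = 0; c < queues.size(); c++) {
            if (c != myCluster) {
                for (Deque<Integer> victim : queues.get(c)) {
                    Integer job = victim.pollLast();
                    if (job != null) {
                        return job;         // last resort: a wide-area steal
                    }
                }
            }
        }
        return null;
    }
}
```

A wide-area steal happens only after every node in the thief's own cluster has been found idle.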

27 Performance
Wide-area DAS system: 4 clusters of 16 CPUs
Comparison with single 16-node and 64-node cluster

28 Conclusions
Fast RMI possible through
- Compiler-generated serialization, light-weight communication & RMI protocols
Optimized wide-area applications are efficient
- Reduce wide-area communication, or hide its latency
Java RMI is easy to use, but some optimizations are awkward to express
- No asynchronous communication, collective comm.
Programming systems should take hierarchical structure of wide-area systems into account
http://www.cs.vu.nl/manta

29 Performance breakdown Manta (Fast Ethernet)

