Distributed Components for Integrating Large-Scale High Performance Computing Applications
Nanbor Wang, Roopa Pundaleeka and Johan Carlsson


1 Distributed Components for Integrating Large-Scale High Performance Computing Applications
Nanbor Wang, Roopa Pundaleeka and Johan Carlsson
{nanbor,roopa,johan}@txcorp.com
Tech-X Corporation, Boulder, CO
CCA Meeting, October 11, 2007
Funded by DOE OASCR SBIR Grant #DE-FG02-04ER84099

2 Outline
Motivation
Distributed and Parallel High-Performance Computing (DPHPC)
Exploring Diverse Distributed Technologies
  – Distributed Proxy Components
  – New Transport Mechanism for Babel RMI
      Babel RMI
      Babel RMI over CORBA IIOP
  – Performance Comparisons
Future Work

3 Motivations for Distributed and Parallel Component-Based Software Engineering
Existing component standards and frameworks were designed with enterprise applications in mind
  – They lack features that are important for HPC scientific applications: interoperability with scientific programming languages (FORTRAN) and with parallel computing infrastructure (MPI)
We need to address the needs of HPC scientific applications: combustion modeling, global climate modeling, fusion plasma simulations
Motivating scenarios for Distributed and Parallel HPC (DPHPC):
  – Integrate separately developed, established codes
  – Provide a different paradigm for partitioning problems, e.g., multi-physics simulations
  – Provide ways to better utilize hardware with high CPU counts
  – Combine the computing resources of multiple clusters or computing centers
  – Enable parallel data streaming between a computing task and a post-processing task

4 Distributed Proxy CCA Components
Connect distributed parallel components by composing remote-capable proxy components into applications
Hide the distributed aspect from the localized parallel CCA framework
Provide low-cost mechanisms for connecting incompatible CCA infrastructures, e.g., Ccaffeine, Dune, Ccain, and SCIRun
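The proxy idea on this slide can be illustrated with a minimal Python sketch. It is not CCA or Babel code: the names `RemoteIntegrator`, `IntegratorProxy` and `loopback_transport` are hypothetical, and the "transport" is an in-process function call standing in for a real network hop. The point is only that the caller sees the same method signature whether the component is local or remote.

```python
class RemoteIntegrator:
    """Hypothetical component that would live in another framework or cluster."""

    def integrate(self, lo, hi):
        # Integral of f(x) = x over [lo, hi].
        return (hi * hi - lo * lo) / 2.0


def loopback_transport(target):
    """Stand-in for a real transport: forwards (method, args) to the target."""

    def send(method, args):
        return getattr(target, method)(*args)

    return send


class IntegratorProxy:
    """Local proxy exposing the same interface as the remote component."""

    def __init__(self, send):
        self._send = send  # any callable taking (method_name, args)

    def integrate(self, lo, hi):
        # The caller never sees that the call leaves the local framework.
        return self._send("integrate", (lo, hi))


proxy = IntegratorProxy(loopback_transport(RemoteIntegrator()))
area = proxy.integrate(0.0, 2.0)
```

Swapping `loopback_transport` for a different `send` callable would change the wire mechanism without touching the code that uses the proxy.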

5 New Transport Mechanism for Babel RMI
[Diagram: Babel RMI client and Babel RMI server connected through the Babel RMI interface over Simple Protocol]
Babel generates the mapping for remote invocations and uses Simple Protocol
Babel lets users take advantage of various remoting technologies through third-party RMI libraries
We are developing a CORBA protocol library for Babel RMI using TAO (version 1.5.1 or later)
  – TAO is a C++-based CORBA middleware framework
  – This protocol is essentially a bridge between Babel and TAO
[Diagram: Babel RMI client and Babel RMI server connected through the Babel RMI interface over TAOIIOP and TAO]

6 Adding CORBA protocol for Babel RMI
Goal
  – Use the CORBA wire protocol for Babel RMI communication between Babel clients and servants
  – Allow interoperability between existing CORBA and Babel objects (e.g., with SCIRun CORBA support)
  – Maintain the performance of the CORBA IIOP protocol
Direct mapping approach
  – Requires support for certain Babel types: complex numbers, multidimensional arrays and exceptions
  – Exchange messages in CORBA format
  – Allow development of new SIDL-compatible CORBA objects

7 Client-side Operation Invocations
CORBA uses Common Data Representation (CDR), a binary serialization format, to transfer messages
Data are packed directly into the CDR
No more CORBA-to-Babel transformations of data
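As a rough illustration of what "packing directly" into a CDR-style buffer means, the Python sketch below writes values into one byte buffer, padding so that each primitive starts at a multiple of its own size, as CDR requires. `CdrWriter` is a toy class written for this note, not TAO's output CDR class, and encoding a complex number as two consecutive doubles is an assumption made for the example.

```python
import struct


class CdrWriter:
    """Toy CDR-style output buffer (little-endian). Not TAO's implementation."""

    def __init__(self):
        self.buf = bytearray()

    def _align(self, size):
        # CDR aligns each primitive on a multiple of its size,
        # measured from the start of the stream.
        pad = (-len(self.buf)) % size
        self.buf.extend(b"\x00" * pad)

    def write_long(self, value):
        self._align(4)
        self.buf += struct.pack("<i", value)

    def write_double(self, value):
        self._align(8)
        self.buf += struct.pack("<d", value)

    def write_dcomplex(self, z):
        # Assumption: a complex value travels as (real, imag) doubles.
        self.write_double(z.real)
        self.write_double(z.imag)


w = CdrWriter()
w.write_long(7)                        # bytes 0..3
w.write_dcomplex(complex(1.5, -2.0))   # 4 pad bytes, then bytes 8..23
```

Because the complex value is written straight into the buffer, no intermediate CORBA-side object is built for it, which is the kind of transformation the slide says is avoided.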

8 Server-side Request Handling
A default TAO servant handles all Babel invocations
Requests are dispatched to target Babel objects based on the instance/object ID
Need to extend TAO’s PortableServer class to expose the Input CDR (for reading input parameters) and the Output CDR (for sending the results)
  – SIDL Call and Response objects get a reference to the Input and Output CDRs respectively

9 Server-side Request Handling in TAOIIOP
1. The default TAO object (TaoIIOPObject) extends TAO PortableServer::ServantBase and implements the ‘dispatch’ method, which receives the Input and Output CDRs in the ServerReq object
2. The dispatch method creates the sidl::rmi::Response, which stores the CDR
3. It gets a reference to the target SIDL object from the InstanceRegistry
4. It executes the target method
   1. Pack methods are called on the response object for return, inout and out parameters
   2. The results are packed directly into the CDR
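The dispatch steps listed on this slide can be sketched in a few lines of Python. This is only an analogy for the control flow: `instance_registry`, `Response`, `Adder` and `dispatch` are hypothetical stand-ins, not the TAO or Babel classes named on the slide, and the wire format is simply two little-endian doubles in and one out.

```python
import struct


class Adder:
    """Hypothetical target object registered under an instance ID."""

    def add(self, a, b):
        return a + b


# Stand-in for the instance registry: instance ID -> live object.
instance_registry = {"adder-1": Adder()}


class Response:
    """Stand-in for the response object; it owns the output buffer."""

    def __init__(self):
        self.out = bytearray()

    def pack_double(self, value):
        # Results go straight into the output buffer.
        self.out += struct.pack("<d", value)


def dispatch(instance_id, method, in_cdr):
    response = Response()                          # step 2: create the response
    target = instance_registry[instance_id]        # step 3: look up the target
    a, b = struct.unpack_from("<dd", in_cdr, 0)    # read the input parameters
    result = getattr(target, method)(a, b)         # step 4: run the method
    response.pack_double(result)                   # step 4.1 / 4.2: pack result
    return bytes(response.out)


reply = dispatch("adder-1", "add", struct.pack("<dd", 2.0, 3.5))
```

A single `dispatch` entry point serving every registered object mirrors the slide's "default servant" design: the servant itself knows nothing about individual interfaces.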

10 Features Implemented in TAOIIOP
All Babel types except opaque
Exception handling
One-way method invocation
Non-blocking / asynchronous method invocation
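The difference between the last two invocation styles can be shown with a small Python analogy using a thread pool. It does not use Babel or TAO: `remote_square` and `remote_log` are hypothetical functions standing in for remote methods. A non-blocking call hands back a handle that is queried later, while a one-way call returns nothing to the caller at all.

```python
from concurrent.futures import ThreadPoolExecutor


def remote_square(x):
    """Stands in for a remote method that returns a value."""
    return x * x


log = []


def remote_log(message):
    """Stands in for a remote method with no return value."""
    log.append(message)


with ThreadPoolExecutor(max_workers=2) as pool:
    # Non-blocking invocation: the caller gets a future and keeps working.
    future = pool.submit(remote_square, 12)

    # One-way invocation: fire and forget, no reply is ever read.
    pool.submit(remote_log, "time step done")

    # Collect the non-blocking result only when it is actually needed.
    result = future.result()
```

Leaving the `with` block waits for outstanding work, so `log` is populated by the time the pool is closed.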

11 TAOIIOP V2.0 Optimizations
The initial implementation is a proof of concept, but it performs many extra memory allocations, copies and conversions
Added support for writing Babel types directly into the CORBA CDR
No conversions between Babel types and CORBA types are needed to reconcile their differences
Aggregated memory allocations
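The effect of aggregating allocations can be sketched in Python by marshaling the same array of doubles two ways. Neither function is taken from TAOIIOP; `pack_per_element` and `pack_bulk` are hypothetical names, and the example only shows that one pre-sized buffer can replace many small, growing ones while producing identical bytes.

```python
import struct

values = [0.5 * i for i in range(1000)]


def pack_per_element(vals):
    """Fine-grained approach: one small pack and one buffer growth per value."""
    out = b""
    for v in vals:
        out += struct.pack("<d", v)
    return out


def pack_bulk(vals):
    """Aggregated approach: size the buffer once, then fill it in one call."""
    out = bytearray(8 * len(vals))
    struct.pack_into("<%dd" % len(vals), out, 0, *vals)
    return bytes(out)


per_element = pack_per_element(values)
bulk = pack_bulk(values)
```

Both functions produce the same wire bytes, so the choice between them is purely a matter of how much allocation and copying happens along the way.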

12 Performance Comparison 1

13 Performance Comparison 2

14 Performance Comparison 3

15 Performance Comparison 4

16 Performance Analysis
TaoIIOP V1.0 consistently takes a performance hit
  – It performs extra conversions between CORBA and Babel for arrays and complex number types
  – It makes multiple fine-grained memory allocations
  – It does not take advantage of TAO’s key optimization mechanisms
Distributed proxy components also suffer somewhat because of data marshalling
TaoIIOP V2.0 gains 10% in performance for double and 30% for complex numbers compared to TaoIIOP V1.0
  – Optimizations: made the CORBA-Babel mapping types native in TAO by implementing optimized, zero-copy marshaling and demarshaling support

17 Application of DPHPC
We have developed an example application to demonstrate the use of DPHPC
  – Separates out post-simulation data processing (after each time step)
  – Based on Vorpal, a C++ plasma and beam simulation code
  – Implemented by Fang (Cherry) Liu (Indiana Univ) during a summer internship
Visible speedup using DPHPC
  – The actual trend of the speedups is counter-intuitive
  – We are exploring different RMI approaches (TAOIIOP, oneway) and examining ways to optimize the use case

18 Summary
Implemented the distributed proxy components and the TaoIIOP Babel RMI protocol for connecting distributed CCA applications into large-scale systems
Conducted performance benchmarking on the preliminary prototype implementation (version 1.0) to identify the key optimizations needed
Implemented the optimizations to minimize the overhead (version 2.0)
Developed a preliminary example application that connects a remote high-performance parallel application with local clusters for data analysis and/or visualization
  – Work performed by summer intern Fang Liu from Indiana University

