A Hybrid Decomposition Scheme for Building Scientific Workflows Wei Lu Indiana University.


1 A Hybrid Decomposition Scheme for Building Scientific Workflows Wei Lu Indiana University

2 Application Decomposition
Large scientific applications require
– Decomposing the problem into manageable units
– Units that are
    Self-described
    Self-encapsulated
    Independently developed and deployed
    Composable
Two decomposition dimensions
– Functional decomposition (a.k.a. spatial decomposition)
    C/C++, Java
    Component
– Temporal decomposition
    Unix pipe
    Workflow
– However, most PSEs (problem-solving environments) provide only one approach to the exclusion of the other
[Figure label: Our work]

3 Common Component Architecture (CCA)
Scientific computing imposes special requirements
– Support for legacy software
– Performance is crucial
– Languages and data types
    Fortran, C/C++, Python, Java, etc.
    Complex numbers and arrays (as first-class objects)
– Support for the various parallel run-time platforms
CCA
– Component framework specification
– Designed for scientific high-performance computing
– Aims to improve the reuse of scientific software

4 CCA Component
Each component describes
– What functionality it fulfills
    Provide port
– What functionality it needs to fulfill its task
    Use port
Use-Provide pattern
– Plug-and-play
The port is described in SIDL
– Scientific Interface Definition Language
– Partially derived from CORBA IDL
– With constructs to describe complex numbers, arrays, etc.
– Babel: language interoperability tool
[Diagram: components MidpointIntegrator (IntegratorPort), NonlinearFunction (FunctionPort), and LinearFunction (FunctionPort); language labels C, Fortran, Python]
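The use-provide pattern on this slide can be sketched in a few lines. This is plain Python for illustration only, not the CCA or Babel API: the `Framework` class, its `connect` method, the `Evaluator` component, and the `provides`/`uses` dictionaries are all hypothetical names. The point is that the using component never names a concrete provider, so providers can be swapped (plug-and-play).

```python
from typing import Callable, Dict

Port = Callable[[float], float]


class LinearFunction:
    """Hypothetical component that provides a FunctionPort: f(x) = 2x + 1."""

    def __init__(self) -> None:
        self.provides: Dict[str, Port] = {"FunctionPort": lambda x: 2.0 * x + 1.0}
        self.uses: Dict[str, Port] = {}


class NonlinearFunction:
    """Hypothetical component that provides a FunctionPort: f(x) = x^2."""

    def __init__(self) -> None:
        self.provides: Dict[str, Port] = {"FunctionPort": lambda x: x * x}
        self.uses: Dict[str, Port] = {}


class Evaluator:
    """Hypothetical component that uses a FunctionPort without knowing the provider."""

    def __init__(self) -> None:
        self.provides: Dict[str, Port] = {}
        self.uses: Dict[str, Port] = {}

    def evaluate(self, x: float) -> float:
        # Call through whatever provider the framework wired in.
        return self.uses["FunctionPort"](x)


class Framework:
    """Toy stand-in for a component framework: it only wires ports by name."""

    def connect(self, user, provider, port_name: str) -> None:
        if port_name not in provider.provides:
            raise KeyError(f"provider does not offer port {port_name!r}")
        user.uses[port_name] = provider.provides[port_name]
```

Connecting the same `Evaluator` first to `LinearFunction` and then to `NonlinearFunction` changes the result of `evaluate` without any change to `Evaluator` itself.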

5 Example of the CCA Composition
interface IntegratorPort extends gov.cca.Port {
    double integrate(in double lowBound, in double upBound, in int count);
}
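As a rough illustration of what a component behind `IntegratorPort` might compute, here is a midpoint-rule sketch in Python. It assumes the integrand arrives as a plain callable; in the slides it would come through a `FunctionPort`, so the `func` parameter is an assumption made for self-containment, and the snake_case argument names simply mirror `lowBound`, `upBound`, and `count`.

```python
from typing import Callable


def integrate(func: Callable[[float], float],
              low_bound: float, up_bound: float, count: int) -> float:
    """Approximate the integral of func over [low_bound, up_bound]
    with the midpoint rule using `count` equal-width intervals."""
    if count <= 0:
        raise ValueError("count must be positive")
    width = (up_bound - low_bound) / count
    total = 0.0
    for i in range(count):
        # Sample the integrand at the centre of the i-th interval.
        midpoint = low_bound + (i + 0.5) * width
        total += func(midpoint)
    return total * width
```

The midpoint rule is exact for linear integrands, so `integrate(lambda x: 2 * x + 1, 0.0, 1.0, 4)` reproduces the exact value 2 up to rounding, while a quadratic integrand is only approximated.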

6 Ccaffeine
Parallel implementation of the CCA framework
SCMD (Single Component Multiple Data)
– Inter-component communication
    Virtual function call in the same address space
– Intra-component communication
    Could be MPI, PVM, etc.

7 Kepler
Scientific workflow environment
– Data-flow oriented
Basic unit: Actor
– Input, Output
– Typed dataflow structure
– Many domain-specific actors supporting biology, ecology, astronomy
– General facility actors
    Grid service actor
    Web service actor
Wire the actors by piping
[Diagram labels: GridFtp, Classifier, localFilePath, URL, Credential]
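The idea of typed actors wired by piping can be sketched as follows. This is not Kepler's API (Kepler is a Java system); the `Actor` dataclass, its fields, and the `pipe` helper are hypothetical and only illustrate that each connection runs from a producer's typed output to a consumer's typed input.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Actor:
    """Hypothetical actor: one function with a typed input and a typed output."""
    name: str
    input_type: type
    output_type: type
    func: Callable[[Any], Any]

    def fire(self, token: Any) -> Any:
        if not isinstance(token, self.input_type):
            raise TypeError(f"{self.name}: expected {self.input_type.__name__}")
        result = self.func(token)
        if not isinstance(result, self.output_type):
            raise TypeError(f"{self.name}: produced wrong output type")
        return result


def pipe(actors: List[Actor], token: Any) -> Any:
    """Check that adjacent actors agree on types, then push one token through."""
    for upstream, downstream in zip(actors, actors[1:]):
        if upstream.output_type is not downstream.input_type:
            raise TypeError(f"cannot wire {upstream.name} to {downstream.name}")
    for actor in actors:
        token = actor.fire(token)
    return token


# Two illustrative actors: parse a comma-separated string, then average it.
parse = Actor("parse", str, list, lambda s: [float(v) for v in s.split(",")])
mean = Actor("mean", list, float, lambda xs: sum(xs) / len(xs))
```

Calling `pipe([parse, mean], "1,2,3")` runs the token through both actors in order; wiring them in the reverse order is rejected before anything executes because the port types do not match.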

8 Compare Side by Side
Actor
– Stands for one function
Port
– Input/Output
– A data-structure definition
Connection
– Producer to Consumer
Composition defines “How”
Advantages
– Loosely coupled
– Supports distributed resource sharing

Component
– Stands for one class
Port
– Provide/Use
– An interface signature
Connection
– Caller to Callee
Composition defines “What”
Advantages
– Good performance
– Supports parallel programming model

9 A Hybrid Solution
Typical scientific applications
– Involve multiple distributed data-processing phases
– Among those phases there are a number of computationally intensive cores, often classical numerical algorithms, that need a high-performance execution environment
The hybrid scheme
– Use the workflow scheme to decompose the application based on the distribution of resources
– Then use the component scheme to further decompose the computationally intensive sub-problems to form the parallel solution
Benefit from both schemes

10 Service over Components
Building a web service over the CCA
– Web service = good interoperability
– Kepler supports a web service as an actor
– More resources and protocols (e.g., WS-BPEL)
Façade pattern
– External view through the coarse-grained web service
– Internal functionality through the fine-grained components
Factory pattern
– The workflow needs a task-specific service rather than a meta-level service
– The task-specific service should be created dynamically and on demand
– But a service is not instantiable!
[Diagram: a service creates a task-specific service]

11 Architecture
Job
– A specific task performed by a group of wired components
Two-phase execution
– Compose the job
– Run the job
Two explicitly separated web services (CCA-Services)
– Factory Service
– Job Proxy
[Diagram labels: Factory Service, Ccaffeine Framework, IPC, Job Proxy, Composer, User, Invocation, Job description]

12 Job Factory Service
A Façade for the Ccaffeine framework
– Connects to the Ccaffeine muxer via a socket
– Maintains the job table and the job lifecycle
Create
– Parameters
    Gateway port: the task-specific interface
    Composition description: how the components are wired to support the Gateway port
– Convert SIDL to WSDL
    Gateway port definition to the equivalent WSDL
– Forward the composition commands to the Ccaffeine muxer
    Will be executed in parallel
– Maintain job records internally
– Create the Job Proxy service and return its WSDL URL
Modify
– Change the composition without impacting the service interface
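The Create and Modify operations above can be sketched as a small job table. Everything here is a hypothetical stand-in rather than the actual service: `JobFactory`, `JobRecord`, the `send_command` callback (which takes the place of the socket to the Ccaffeine muxer), the URL layout, and the command strings, which are placeholders and not verified Ccaffeine syntax. The SIDL-to-WSDL conversion step is omitted.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class JobRecord:
    """One row of the job table (hypothetical layout)."""
    job_id: int
    gateway_port: str
    composition: List[str]
    state: str = "created"


class JobFactory:
    """Toy factory: records jobs and forwards composition commands."""

    def __init__(self, base_url: str, send_command: Callable[[str], None]) -> None:
        self.jobs: Dict[int, JobRecord] = {}
        self._ids = itertools.count(1)
        self._base_url = base_url
        self._send = send_command  # stand-in for the muxer socket

    def create(self, gateway_port: str, composition: List[str]) -> str:
        """Forward the composition, record the job, return the proxy's WSDL URL."""
        job_id = next(self._ids)
        for command in composition:
            self._send(command)
        self.jobs[job_id] = JobRecord(job_id, gateway_port, list(composition))
        return f"{self._base_url}/jobs/{job_id}?wsdl"

    def modify(self, job_id: int, composition: List[str]) -> None:
        """Re-wire the components; the gateway port (service interface) is untouched."""
        record = self.jobs[job_id]
        for command in composition:
            self._send(command)
        record.composition = list(composition)
        record.state = "modified"
```

Keeping `gateway_port` fixed in `modify` reflects the slide's point that the composition can change without affecting the service interface seen by the workflow.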

13 Job Proxy Service
Façade for the wired components
With a task-specific WSDL interface
When getting the SOAP message
– Extract the arguments from the message
– Pass the arguments to Ccaffeine
– Invoke Ccaffeine
– Get the result from the Driver and send the SOAP response
[Diagram labels: Job Proxy, User, SOAP request, Arguments, Driver]
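The four steps the proxy performs on a SOAP message can be sketched with Python's standard XML library. This is an illustration under assumptions, not the actual Job Proxy: the `driver` callback stands in for passing arguments to Ccaffeine and invoking it, and the response layout (`<name>Response>` containing a `return` element) is a hypothetical choice. Only the SOAP 1.1 envelope namespace is taken from the standard.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_NS)


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.split("}")[-1]


def handle_request(soap_xml: str,
                   driver: Callable[[str, Dict[str, str]], object]) -> str:
    """Extract the arguments, call the driver stand-in, wrap the result."""
    envelope = ET.fromstring(soap_xml)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("SOAP Body with one operation element expected")
    operation = body[0]
    name = local_name(operation.tag)
    # Arguments arrive as text; type conversion is left to the driver.
    args = {local_name(child.tag): (child.text or "") for child in operation}
    result = driver(name, args)

    response = ET.Element(f"{{{SOAP_NS}}}Envelope")
    resp_body = ET.SubElement(response, f"{{{SOAP_NS}}}Body")
    op_response = ET.SubElement(resp_body, f"{name}Response")
    ET.SubElement(op_response, "return").text = str(result)
    return ET.tostring(response, encoding="unicode")
```

A real proxy would also need fault handling and typed decoding driven by the WSDL; both are left out to keep the flow of the four steps visible.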

14 Example
[Diagram labels: Factory Service, socket, Composer, Gateway port, composition, Job Proxy, Go, Gateway port, User, SOAP, Job WSDL, Job table]

15 Convert SIDL to WSDL
SIDL
Port interface (methods)
Object oriented
– A port interface is
    A virtual interface
    Subject to inheritance and polymorphism
    Usable as a function parameter type
– No data structure so far
WSDL
PortType (operations)
Wire-format description
– A PortType is
    A group of message exchanges
    Without inheritance or polymorphism
    Not usable as a method parameter type
– Any type is essentially a data structure (defined by XML Schema)
Challenge
– There is no way to derive the structural information from a SIDL port interface!
Current workaround
– Only allow methods with primitive argument types
– Introducing structures into SIDL would alleviate the problem reasonably

16 Example
interface IntegratorPort extends gov.cca.Port {
    double integrate(in double lowBound, in double upBound, in int count);
}
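For a method like `integrate`, whose arguments are all primitive, the conversion can be sketched as below. The slides do not give the actual mapping, so the `SIDL_TO_XSD` table, the `Request`/`Response` message naming, and the `return` part name are assumptions. Only `in` arguments are handled, and any type outside the table is rejected, which mirrors the workaround of allowing primitive argument types only.

```python
import re
import xml.etree.ElementTree as ET
from typing import Tuple

# Assumed mapping from SIDL primitive types to XML Schema built-in types.
SIDL_TO_XSD = {
    "bool": "xsd:boolean",
    "int": "xsd:int",
    "long": "xsd:long",
    "float": "xsd:float",
    "double": "xsd:double",
    "string": "xsd:string",
}

METHOD_RE = re.compile(r"(\w+)\s+(\w+)\s*\((.*?)\)\s*;", re.S)
ARG_RE = re.compile(r"(in|out|inout)\s+(\w+)\s+(\w+)")


def to_xsd(sidl_type: str) -> str:
    """Map a SIDL primitive type, refusing anything that needs structure."""
    if sidl_type not in SIDL_TO_XSD:
        raise ValueError(f"non-primitive SIDL type not supported: {sidl_type}")
    return SIDL_TO_XSD[sidl_type]


def sidl_method_to_messages(sidl_text: str) -> Tuple[ET.Element, ET.Element]:
    """Build WSDL 1.1-style request and response <message> elements
    for the first method signature found in sidl_text."""
    match = METHOD_RE.search(sidl_text)
    if match is None:
        raise ValueError("no method signature found")
    return_type, name, arg_text = match.groups()

    request = ET.Element("message", name=f"{name}Request")
    for mode, sidl_type, arg_name in ARG_RE.findall(arg_text):
        if mode != "in":
            raise ValueError("this sketch handles 'in' arguments only")
        ET.SubElement(request, "part", name=arg_name, type=to_xsd(sidl_type))

    response = ET.Element("message", name=f"{name}Response")
    ET.SubElement(response, "part", name="return", type=to_xsd(return_type))
    return request, response
```

A signature that uses, say, `dcomplex` or an array type raises `ValueError` here, which is the point where structural information missing from the SIDL port interface would be needed.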

17 Kepler Web Service Actor
Kepler provides a general web service actor
For a method defined in the WSDL
– The actor dynamically adjusts its input/output settings

18 Kepler CCA-Service Actor
For the CCA-Service
– Recall that we have two explicit steps
– The JobProxy service is dynamically created
– We need to hide the procedure of creating the JobProxy service from the user
CCA-Service Actor
– Extended from the web service actor
– First calls the JobFactory service to create the JobProxy service
– With the WSDL of the JobProxy, it does the same thing as a general web service actor does
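The way the CCA-Service actor hides the two-step procedure can be sketched with two small classes. These are hypothetical Python stand-ins, not Kepler's Java actor classes: `factory_create` stands for the call to the JobFactory service and `call` stands for the SOAP transport, both injected so the sketch stays self-contained.

```python
from typing import Any, Callable, Dict, List


class WebServiceActor:
    """Stand-in for a general web service actor bound to one WSDL URL."""

    def __init__(self, wsdl_url: str,
                 call: Callable[[str, str, Dict[str, Any]], Any]) -> None:
        self.wsdl_url = wsdl_url
        self._call = call  # transport stub: (wsdl_url, operation, inputs) -> result

    def fire(self, operation: str, **inputs: Any) -> Any:
        return self._call(self.wsdl_url, operation, inputs)


class CCAServiceActor(WebServiceActor):
    """Creates the JobProxy through the factory first, then acts as a
    plain web service actor against the WSDL URL the factory returned."""

    def __init__(self,
                 factory_create: Callable[[str, List[str]], str],
                 gateway_port: str,
                 composition: List[str],
                 call: Callable[[str, str, Dict[str, Any]], Any]) -> None:
        # Step 1 (hidden from the user): create the task-specific service.
        wsdl_url = factory_create(gateway_port, composition)
        # Step 2: from here on, behave like the general web service actor.
        super().__init__(wsdl_url, call)
```

From the workflow's point of view only `fire` is visible, so the dynamically created JobProxy looks like any other web service.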


21 Change the GUI from socket-stream based to SOAP-message based.

23 Conclusion
A hybrid decomposition scheme for scientific applications
– The workflow scheme is used first, based on the resource distribution
– The component scheme is used to further decompose the core parts
The web service interface is the key to the integration
CCA integrates into Kepler as a special actor, with a GUI supporting a unified visual environment
Converting SIDL to WSDL is inherently challenging
– Structures are useful for distributed systems, so we need to introduce structures into SIDL

24 Thanks
Thanks to the reviewers for their valuable comments

