Managing VO data and process flows. Matthew J. Graham, CACR/Caltech. The US National Virtual Observatory. NVO Summer School 2005, 10 Sep 2005.


1 Managing VO data and process flows
Matthew J. Graham, CACR/Caltech
The US National Virtual Observatory

2 Overview
Astronomical data
VOStore/VOSpace
Workflows
Astrogrid workflow
CEA

3 The importance of data
Data is the raison d'être of the VO
LSST is the data source nonpareil
  – data rates of 540 MB/s, ~16 TB in 8 hrs
  – final archive > 3 PB of data
[Figure: VO Wheel]
Well-established ways of handling distributed data:
  – SRB
  – PVFS
  – OGSA-DAI
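
The nightly volume quoted on this slide follows directly from the data rate. A minimal check, assuming decimal units (1 TB = 1,000,000 MB) and a rate sustained for the full eight hours; the function name is illustrative only:

```python
def nightly_volume_tb(rate_mb_per_s: float, hours: float) -> float:
    """Return the volume in TB produced by a sustained rate over some hours."""
    seconds = hours * 3600
    return rate_mb_per_s * seconds / 1_000_000  # MB -> TB, decimal units (assumption)


volume = nightly_volume_tb(540, 8)
print(f"{volume:.2f} TB")  # 15.55 TB, which the slide rounds to ~16 TB
```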

4 Data use cases
Client has data:
  – stored locally: transfers it to service
  – stored locally: service retrieves it
  – stored elsewhere: service retrieves it
Service generates data:
  – stores it locally: notifies client of location
  – transfers it to the client's local store
  – transfers it to a client-designated store

5 VOStore
Provides a uniform interface to existing or new data storage locations (Facade pattern)
Structured/unstructured data both first level
Methods:
  – get
  – put
  – list / listAll
  – importInit
  – importData (sync/async)
  – exportInit
  – exportData (sync/async)
  – delete
  – rename
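
The Facade idea can be sketched as one abstract interface with interchangeable backends. This is an illustration only, not the VOStore specification: the class names, the method signatures, and the `list_all` spelling are assumptions, and only a subset of the methods listed on the slide is shown.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class Store(ABC):
    """Hypothetical uniform storage interface (assumed signatures)."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...

    @abstractmethod
    def list_all(self) -> list[str]: ...

    @abstractmethod
    def rename(self, old: str, new: str) -> None: ...

    @abstractmethod
    def delete(self, name: str) -> None: ...


class MemoryStore(Store):
    """Backend that keeps objects in a dictionary."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, name, data):
        self._objects[name] = data

    def get(self, name):
        return self._objects[name]

    def list_all(self):
        return sorted(self._objects)

    def rename(self, old, new):
        self._objects[new] = self._objects.pop(old)

    def delete(self, name):
        del self._objects[name]


class DirectoryStore(Store):
    """Backend that keeps objects as files in one directory."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def put(self, name, data):
        (self._root / name).write_bytes(data)

    def get(self, name):
        return (self._root / name).read_bytes()

    def list_all(self):
        return sorted(p.name for p in self._root.iterdir() if p.is_file())

    def rename(self, old, new):
        (self._root / old).rename(self._root / new)

    def delete(self, name):
        (self._root / name).unlink()


def exercise(store: Store) -> list[str]:
    """Client code sees only the Store interface, never the backend."""
    store.put("image.fits", b"raw pixels")
    store.rename("image.fits", "image_v1.fits")
    store.put("scratch.tmp", b"temporary")
    store.delete("scratch.tmp")
    return store.list_all()


with tempfile.TemporaryDirectory() as tmp:
    results = [exercise(MemoryStore()), exercise(DirectoryStore(Path(tmp)))]
print(results)
```

Both backends produce the same listing, which is the point of the pattern: the client never needs to know where the bytes live.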

6 VOSpace
Orchestrates VOStores:
  – data collections: directories, user-defined
  – authorisation: user groups
  – processing efficiency: where is the nearest copy?
move, copy, identifiers
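
The "nearest copy" question can be illustrated with a toy orchestration layer that tracks which store holds which identifier. Everything here is hypothetical: the class names, the numeric `cost` field (standing in for whatever distance measure a real VOSpace would use), and the identifier string.

```python
from dataclasses import dataclass, field


@dataclass
class StoreEntry:
    name: str
    cost: float  # assumed transfer cost from the client, e.g. latency in ms
    identifiers: set = field(default_factory=set)


class Space:
    """Toy orchestration layer over several stores."""

    def __init__(self, stores):
        self.stores = {s.name: s for s in stores}

    def holders(self, identifier):
        return [s for s in self.stores.values() if identifier in s.identifiers]

    def nearest(self, identifier):
        holders = self.holders(identifier)
        if not holders:
            raise KeyError(f"no store holds {identifier!r}")
        return min(holders, key=lambda s: s.cost)

    def copy(self, identifier, target):
        self.nearest(identifier)  # raises KeyError if there is no source copy
        self.stores[target].identifiers.add(identifier)

    def move(self, identifier, source, target):
        self.stores[source].identifiers.remove(identifier)
        self.stores[target].identifiers.add(identifier)


space = Space([
    StoreEntry("local", cost=1),
    StoreEntry("regional", cost=20, identifiers={"catalogue.vot"}),
    StoreEntry("remote", cost=150, identifiers={"catalogue.vot"}),
])

before = space.nearest("catalogue.vot").name
space.copy("catalogue.vot", "local")
after = space.nearest("catalogue.vot").name
print(before, after)  # regional local
```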

7 A virtual super-peer data network?

8 How to manage the flows?
Way of describing a flow:
  – processes/steps, inputs/outputs, serial/parallel execution, control logic, variables, inline scripting
  – preferably XML (verbose but rigorous)
Way of controlling a flow: engine
e-Science vs. e-Business:
  – open-ended vs. closed
  – verification and publication
  – static vs. dynamic workflows
  – volume and type of data
  – meta-transactions
  – customer, manager and user vs. scientist
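
The split between describing a flow and controlling it can be sketched with a declarative description and a very small engine. This is an assumption-laden illustration: the description is a Python dictionary rather than XML, the step names are invented, and the engine runs steps serially in dependency order using the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical flow description: each step lists the steps it needs
# and a function that maps the outputs of those steps to its own output.
flow = {
    "query": {"needs": [], "run": lambda inputs: [3, 1, 2]},
    "sort": {"needs": ["query"], "run": lambda inputs: sorted(inputs["query"])},
    "count": {"needs": ["query"], "run": lambda inputs: len(inputs["query"])},
    "report": {
        "needs": ["sort", "count"],
        "run": lambda inputs: f"{inputs['count']} rows, first={inputs['sort'][0]}",
    },
}


def run_flow(description):
    """Minimal engine: execute every step once its dependencies are done."""
    graph = {name: step["needs"] for name, step in description.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        step = description[name]
        inputs = {dep: outputs[dep] for dep in step["needs"]}
        outputs[name] = step["run"](inputs)
    return outputs


outputs = run_flow(flow)
print(outputs["report"])  # 3 rows, first=1
```

The description carries no execution logic of its own, so the same flow could be handed to a different engine, for example one that runs independent steps in parallel.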

9 Workflow patterns
Sequence
AND: Parallel split, Synchronisation
XOR: Exclusive choice, Simple Merge
Multi: Multi choice, Multi Merge
Multi + Synchronizing Merge
Multi + Discriminator
Deferred choice
Multiple Instances with/without Synch
Implicit termination
Interleaved Parallel Routing
Milestone
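
Four of these patterns (sequence, parallel split, synchronisation, exclusive choice) can be shown in a few lines. This is a sketch using Python's standard `concurrent.futures` module; the step functions and band names are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(band):
    return f"image_{band}"


def combine(images):
    return "+".join(images)


def reduce_image(name):
    return f"reduced({name})"


def reduce_table(name):
    return f"filtered({name})"


def run(bands, kind):
    with ThreadPoolExecutor() as pool:
        # Parallel split (AND-split): one branch per band.
        futures = [pool.submit(fetch, band) for band in bands]
        # Synchronisation (AND-join): wait for every branch to finish.
        images = [future.result() for future in futures]
    # Sequence: combine runs only after all fetches are complete.
    stacked = combine(images)
    # Exclusive choice (XOR-split): exactly one of the two branches runs.
    step = reduce_image if kind == "image" else reduce_table
    return step(stacked)


result = run(["g", "r", "i"], kind="image")
print(result)  # reduced(image_g+image_r+image_i)
```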

10 Workflow kerfuffle
Workflow languages: BPEL (BPEL4WS, WSBPEL, WSFL, XLANG), BPML, WS-CDL (WSCL, WSCI), XPDL, BPSS, PSL, AGWL, DGL, DPML, GJobDL, GSFL, GFDL, GWorkflowDL, MoML, SWFL, YAWL, SCUFL/Xscufl, WPDL, PIF, PSL, OWL-S, xWFL, XPL, INCA
Workflow engines: Taverna, Kepler, Pegasus, DiscoveryNet, Triana, SPA, Geodise, ICENI, Askalon, GridNexus, BioPipe, BizTalk, BPWS4J, DAGMan, GridAnt, GJH, GRMS, GWFE, GWES, ITIEE, JIGSA, Karajan, ScyFLOW, SDSC Matrix, SHOP2, wftk, YAWL Engine, WFEE

11 Astrogrid workflow components
JES (Job Execution System)
  – Astrogrid workflow engine
  – Manages control flow
  – Runs steps in a controlled asynchronous fashion
CEC (Common Execution Controller)
  – Manages step execution
  – Manages data flow
CEA (Common Execution Architecture) apps
  – datacenters: support complex queries against archives
  – processing: consume data files and reduce them

12 Astrogrid workflow schematic
[Schematic. Components: Portal, Registry, MySpace, Command Line, CEA Datacenter, CEA, JES, Client library, CEC. Labelled interactions: Save/load workflow, Save/load data, Resolve application, Application list, Submit workflow.]

13 Astrogrid workflow language
[XML example; the markup did not survive in this transcript. Remaining text: description of the workflow, 21, ${dec}, ftp://aServer/myResults, … …]

14 CEA
Create a uniform interface and model for an application and its parameters
Provides higher level description than WSDL:
  – Restrict how interfaces can be expressed
  – Provide specific semantics for astronomical quantities
  – Extra information, such as default values, GUI labels
VOResource extensions for a general application
Provide asynchronous operation:
  – callback, polling and job identification
Allow separate data and control flows
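
The three ingredients of asynchronous operation named on this slide (job identification, polling, callback) can be sketched with a thread-based runner. This is not the CEA interface: the class, its method names, and the status strings are assumptions made for illustration, and error handling is omitted.

```python
import threading
import time
import uuid


class JobRunner:
    """Hypothetical asynchronous executor with job IDs, polling and callbacks."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = {}
        self._results = {}

    def submit(self, func, *args, on_done=None):
        job_id = uuid.uuid4().hex  # job identification
        with self._lock:
            self._status[job_id] = "RUNNING"

        def work():
            result = func(*args)
            with self._lock:
                self._results[job_id] = result
                self._status[job_id] = "COMPLETED"
            if on_done is not None:
                on_done(job_id, result)  # callback to the listener

        threading.Thread(target=work).start()
        return job_id

    def status(self, job_id):
        with self._lock:
            return self._status[job_id]

    def result(self, job_id):
        with self._lock:
            return self._results[job_id]


def slow_sum(values):
    time.sleep(0.05)  # stand-in for a long-running application
    return sum(values)


runner = JobRunner()
notified = threading.Event()
callbacks = []


def listener(job_id, result):
    callbacks.append((job_id, result))
    notified.set()


job = runner.submit(slow_sum, [1, 2, 3], on_done=listener)

# Polling: the client asks for the status until the job has finished.
while runner.status(job) != "COMPLETED":
    time.sleep(0.01)

notified.wait(timeout=5)  # make sure the callback has also fired
print(runner.result(job))  # 6
```

The call to `submit` returns immediately with only an identifier, so control flow (submit, poll, notify) stays separate from the data the job produces.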

15 Minimum CEA compliance
Must implement CommonExecutionConnector interface
Must send a message to services implementing ResultsListener interface
Should send messages to services implementing JobMonitor interface
Should perform basic type checking on all parameter types during init phase
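
The last requirement, basic type checking during the init phase, might look something like the following. This is a sketch under stated assumptions: the type names (`integer`, `real`, `text`), the parameter names, and the function are all hypothetical and are not taken from the CEA schema.

```python
# Assumed mapping from declared parameter types to Python parsers.
PARSERS = {"integer": int, "real": float, "text": str}


def check_parameters(definitions, values):
    """Return a list of problems found before the job is allowed to start."""
    errors = []
    for name, type_name in definitions.items():
        if name not in values:
            errors.append(f"{name}: missing")
            continue
        try:
            PARSERS[type_name](values[name])
        except ValueError:
            errors.append(f"{name}: expected {type_name}, got {values[name]!r}")
    return errors


definitions = {"radius": "real", "limit": "integer", "target": "text"}
values = {"radius": "0.5", "limit": "ten"}

errors = check_parameters(definitions, values)
print(errors)  # ["limit: expected integer, got 'ten'", 'target: missing']
```

Rejecting a job at init time is cheaper than letting it fail part-way through execution, which is presumably why the check is placed in that phase.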

