Kepler/SPA Extensions for Scientific Workflows – Now and Upcoming. Ilkay Altintas, SWAT lead, San Diego Supercomputer Center; Bertram Ludäscher.

1 Kepler/SPA Extensions for Scientific Workflows – Now and Upcoming
Ilkay Altintas, SWAT lead, San Diego Supercomputer Center, altintas@sdsc.edu
Bertram Ludäscher, Dept. of Computer Science & Genome Center, University of California, Davis, ludaesch@ucdavis.edu
+ many other SDM/SPA & Kepler contributors!

2 KEPLER/CSP: Contributors, Sponsors, Projects
Ilkay Altintas: SDM, NLADR, Resurgence, EOL, …
Kim Baldridge: Resurgence, NMI
Chad Berkley: SEEK
Shawn Bowers: SEEK
Terence Critchlow: SDM
Tobin Fricke: ROADNet
Jeffrey Grethe: BIRN
Christopher H. Brooks: Ptolemy II
Zhengang Cheng: SDM
Dan Higgins: SEEK
Efrat Jaeger: GEON
Matt Jones: SEEK
Werner Krebs: EOL
Edward A. Lee: Ptolemy II
Kai Lin: GEON
Bertram Ludaescher: SDM, SEEK, GEON, BIRN, ROADNet
Mark Miller: EOL
Steve Mock: NMI
Steve Neuendorffer: Ptolemy II
Jing Tao: SEEK
Mladen Vouk: SDM
Xiaowen Xin: SDM
Yang Zhao: Ptolemy II
Bing Zhu: SEEK
www.kepler-project.org
LLNL, NCSU, SDSC, UCB, UCD, UCSB, UCSD, …, Zurich
Collab. tools: IRC, cvs, skype, Wiki: hotTopics, FAQs, …

3 GEON Dataset Generation & Registration (a co-development in KEPLER)
[Figure labels: Xiaowen (SDM/SPA); Edward et al. (Ptolemy); Yang (Ptolemy); Efrat (GEON); Ilkay (SDM/SPA); SQL database access (JDBC); Matt, Chad, Dan et al. (SEEK); % Makefile, $> ant run]

4 Update: endo-SPA (exo-Kepler), endo-Kepler (exo-SPA), … w/o counting peas…
No/minor changes:
– XSLT, email, …
Web service actor (SDM)
– Updated: dynamic operation display, error reporting
Command line actor (SDM)
– Updated: improved interface and error handling
SSH2 actor (SDM)
– New: implements ssh2 protocol for remote execution (no plain password sent over the wire)
Timestamp actor (SDM)
– New: for logging
BrowserUIv2.0 (SDM)
– reimplemented, improved interface
– v3.0 planned (“catching” http-get/post via localhost)
Execution logger (SDM)
– New: workflow “black box” for keeping track of runs
Documentation framework (SDM)
– Autogenerated actor documentation (new doclets and taglets)
Ontology-based actor and dataset classification (SEEK)
– Finding relevant components: actors and datasets, suggesting possible connections, …
Kepler/SRB toolkit (GEON, SDM, SEEK, …)
– improved interfaces, new functions …
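The timestamp actor and the execution logger ("black box") above can be pictured with a small sketch. This is only an illustration of the idea of recording timestamped events per run; the class `RunLogger`, its methods, and the record fields are hypothetical and are not the Kepler/SPA implementation.

```python
import json
from datetime import datetime, timezone


class RunLogger:
    """Hypothetical 'black box' recorder for one workflow run (not the Kepler API)."""

    def __init__(self, workflow_name, clock=None):
        self.workflow_name = workflow_name
        # The clock is injectable so that tests can supply fixed timestamps.
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self.records = []

    def log(self, actor, event, detail=""):
        """Append one timestamped record, as a timestamp actor might emit."""
        record = {
            "time": self._clock().isoformat(),
            "workflow": self.workflow_name,
            "actor": actor,
            "event": event,
            "detail": detail,
        }
        self.records.append(record)
        return record

    def dump(self):
        """Serialize the run history as JSON lines, one record per line."""
        return "\n".join(json.dumps(r, sort_keys=True) for r in self.records)
```

A run would call `log` whenever an actor starts, finishes, or fails, and write `dump()` to a file once the run ends, so that earlier runs can be inspected later.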

5 Application Pull vs Technology Push
Use case driven (application pull)
– PIW, TSI-1, TSI-2, …
– Solve technology issues along the way
(+) solve the particular scientists’ problem
(-) one-of-a-kind solutions, little generic & reusable technology
Example:
– TSI-1 and TSI-2 are conceptually almost identical scientific (“Grid/HPC/HTC”) workflows
– but implemented very differently → limited reuse, e.g., evolving/customizing one into the other is hard/impossible…

6 Application Pull vs Technology Push
Technology driven (technology push)
– Generic application integration mechanisms: web service actor, harvester, command-line actors, ssh2 actor, BrowserUI, …
– Specialized interfaces to HPC/HTC systems:
  Large-scale data management:
  – SDSC SRB toolkit (set of SRB actors),
  – SRM?, PVFS2?, MPI-IO?, …
  Interfacing with generic job schedulers:
  – NIMROD, Condor, APST, …
– Interfacing with scientific packages:
  – Statistics toolkit (R, …), GIS (Grass, ArcIMS, Mapserver…)
  – GAMESS toolkit, APBS (visualization)…
(+) developing a reusable technology / toolkits
(!) still need guidance by domain scientists’ problems, but need to lift one-off solutions into a general SWF engineering methodology

7 Increasing number of Kepler actors…

8 … creating prototype workflows and test cases (for automated tests) …

9 … putting them together in generic, reusable packages, e.g. Kepler/SRB toolkit
SRB holdings @ SDSC only: 404 TB in 59 million files across 5167 users (12/16/’04, Reagan Moore)

10 KEPLER/R Toolkit (under development)
Source: Dan Higgins, Kepler/SEEK

12 New Developments & Directions

13 Ontology-based Actor & Dataset Discovery
Ontology based actor (service) and dataset search
Result Display
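One way to read "ontology based search" is as a subsumption query: return every actor whose semantic type is the query concept or one of its subconcepts. The sketch below assumes a toy is-a hierarchy and a made-up actor registry; the concept names, actor names, and functions are hypothetical and do not reflect the SEEK ontologies or the Kepler search interface.

```python
# Toy concept hierarchy: child -> parent. All names are made up for illustration.
IS_A = {
    "SpeciesOccurrence": "BiodiversityData",
    "ClimateGrid": "EnvironmentalData",
    "BiodiversityData": "Data",
    "EnvironmentalData": "Data",
}

# Hypothetical registry mapping actor names to the semantic type of their output.
ACTORS = {
    "OccurrenceReader": "SpeciesOccurrence",
    "ClimateLayerReader": "ClimateGrid",
    "GenericFileReader": "Data",
}


def is_subconcept(concept, ancestor):
    """Return True if `concept` equals `ancestor` or lies below it in IS_A."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def find_actors(query_concept):
    """Return names of actors whose output type is subsumed by the query concept."""
    return sorted(
        name for name, sem_type in ACTORS.items()
        if is_subconcept(sem_type, query_concept)
    )
```

Searching for a general concept such as `"Data"` returns every registered actor, while a narrower concept such as `"BiodiversityData"` returns only the actors annotated at or below it. The same test could be applied to datasets.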

16 Example: GAMESS Quantum-mechanics cheminformatics workflow
Job management infrastructure in place
Results database: under development
Goal: 1000’s of GAMESS jobs (quantum mechanics)

17 Towards a Framework for “Grid/HPC/HTC” WFs & Job Management

18 Technology-oriented meeting: May 12th Ptolemy/Kepler Miniconference in Berkeley

19 What’s needed, what’s next
Build generic toolkits / packages
Don’t reinvent – Reuse!
– Improved R coupling, SCIRun coupling, …
SWF Framework that lets scientists choose…
– SRB (Sput, Sget,…), SRM, MPI-IO, GlobusTK (GridFTP,…), Sabul, …, pNetCDF, parallel-R, … packages
– Condor, Nimrod, … schedulers
– GRASS, …
General purpose SWF system/PSE that scientists can use themselves

20 Towards a KEPLER School of Expression (Flow-based Design Patterns)
Generality vs specialization of actors
– also loosely coupled vs tightly coupled
Data transformation pipelines
– alternate compute and data transformation steps
Stage-execute-fetch pattern (Grid/HPC/HTC-WFs)
Loops, higher-order functions (map, foldr, …)
– cf. Taverna’s automatic loop insertion based on data types
[Figure: map f applied to a producer of [x1, x2, …, xn] yields [f(x1), …, f(xn)]; F-map over a producer of methods/functions [f1, f2, …, fn]; connect pattern over actors A, B, C passing JDBC/SRB connection tokens, proxies, certificates]
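The higher-order patterns named on this slide can be sketched in a few lines. The example below treats an actor as a plain one-argument function, which is a deliberate simplification. `foldr`, `compose`, and `stage_execute_fetch` are illustrative helpers and the three job phases are hypothetical stand-ins for remote operations, not Kepler actors.

```python
from functools import reduce


def foldr(f, initial, items):
    """Right fold: foldr(f, z, [x1, x2, x3]) == f(x1, f(x2, f(x3, z)))."""
    return reduce(lambda acc, x: f(x, acc), reversed(items), initial)


def compose(steps):
    """Chain single-input steps into one pipeline; steps run left to right."""
    return foldr(lambda step, rest: (lambda x: rest(step(x))), lambda x: x, steps)


def stage_execute_fetch(stage, execute, fetch):
    """Build one step from the three phases of the stage-execute-fetch pattern."""
    return compose([stage, execute, fetch])


# Hypothetical stand-ins for staging input, running a job, and fetching output.
stage = lambda job: {**job, "staged": True}
execute = lambda job: {**job, "result": job["n"] ** 2}
fetch = lambda job: job["result"]

run_job = stage_execute_fetch(stage, execute, fetch)

# map applies the same composite step to every element of the input list.
results = list(map(run_job, [{"n": 1}, {"n": 2}, {"n": 3}]))
```

Expressing the pipeline as a fold over a list of steps keeps the individual steps generic. The loop over inputs is supplied by `map` rather than written into any single actor.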

21 Blurring Design (ToDo) and Execution

22 Kepler@UC Davis Genome Center: Scientific Workflows to Support the Complete (Wet-lab) Experiment Lifecycle
Try to capture and (semi-)automate the Experiment Lifecycle:
– Discover similar experiments, …
– reuse, customize,
– execute, monitor,
– manage results,
– Register back to an experiment repository
→ Support Experiment Design, Execution, & Reuse
→ Scientific workflows and semantic extensions (ontologies, metadata++)

23 Summary: What we could/should do
Push technology:
– Distributed Kepler & “detached” execution
– Making Kepler more X-aware, where …
  … X=Data plumbing (SRB toolkit, GridTK, others, …)
  … X=Grid & Scheduling (need a “Grid director”? Condor director?),
  … X=Parameter-sweep (“Nimrod/APST”… director?)
  … X=Statistics & other specialized packages (R, parallel-R?, …, Grass, … )
  … X=Visualization (SciRUN, …)
– Semantic extensions
  Actors and datasets have “semantic types” to support resource discovery, WF design, …
Create “Packages” or “Rolls”
– … targeting certain scientific user groups & communities
SWF Life-cycle support:
– Design, execution, monitoring, archival, re-use/re-run
– Design patterns, “Kepler School of Expression”
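As a rough illustration of what parameter-sweep awareness would involve, the sketch below enumerates every combination of parameter values and runs one job per combination. The functions `sweep` and `run_sweep` are hypothetical. Nothing here reflects how Nimrod, APST, or a Kepler director actually schedules jobs, and a real sweep would submit each job to a scheduler instead of calling it inline.

```python
from itertools import product


def sweep(parameters):
    """Yield one dict per point in the Cartesian product of the parameter values."""
    names = sorted(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


def run_sweep(job, parameters):
    """Run `job` once per parameter combination and collect (params, result) pairs."""
    return [(params, job(**params)) for params in sweep(parameters)]
```

With two parameters of two values each, `sweep` yields four parameter sets. Parameter names are iterated in sorted order, so the enumeration is reproducible across runs.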
