1 Kepler/SPA Extensions for Scientific Workflows – Now and Upcoming Ilkay Altintas SWAT lead San Diego Supercomputer Center Bertram Ludäscher.

Presentation transcript:

1 Kepler/SPA Extensions for Scientific Workflows – Now and Upcoming
Ilkay Altintas, SWAT lead, San Diego Supercomputer Center
Bertram Ludäscher, Dept. of Computer Science & Genome Center, University of California, Davis
+ many other SDM/SPA & Kepler contributors!

2 KEPLER/CSP: Contributors, Sponsors, Projects
Ilkay Altintas: SDM, NLADR, Resurgence, EOL, …
Kim Baldridge: Resurgence, NMI
Chad Berkley: SEEK
Shawn Bowers: SEEK
Terence Critchlow: SDM
Tobin Fricke: ROADNet
Jeffrey Grethe: BIRN
Christopher H. Brooks: Ptolemy II
Zhengang Cheng: SDM
Dan Higgins: SEEK
Efrat Jaeger: GEON
Matt Jones: SEEK
Werner Krebs: EOL
Edward A. Lee: Ptolemy II
Kai Lin: GEON
Bertram Ludäscher: SDM, SEEK, GEON, BIRN, ROADNet
Mark Miller: EOL
Steve Mock: NMI
Steve Neuendorffer: Ptolemy II
Jing Tao: SEEK
Mladen Vouk: SDM
Xiaowen Xin: SDM
Yang Zhao: Ptolemy II
Bing Zhu: SEEK
Institutions: LLNL, NCSU, SDSC, UCB, UCD, UCSB, UCSD, …, Zurich
Collab. tools: IRC, cvs, skype, Wiki: hotTopics, FAQs, …

3 GEON Dataset Generation & Registration (a co-development in KEPLER)
[Figure: workflow screenshot labeled with contributors: Xiaowen (SDM/SPA); Edward et al. (Ptolemy); Yang (Ptolemy); Efrat (GEON); Ilkay (SDM/SPA); SQL database access (JDBC); Matt, Chad, Dan et al. (SEEK)]
% Makefile $> ant run

4 Update: endo-SPA (exo-Kepler), endo-Kepler (exo-SPA), … w/o counting peas…
No/minor changes:
– XSLT, …
Web service actor (SDM)
– Updated: dynamic operation display, error reporting
Command line actor (SDM)
– Updated: improved interface and error handling
SSH2 actor (SDM)
– New: implements ssh2 protocol for remote execution (no plain password sent over the wire)
Timestamp actor (SDM)
– New: for logging
BrowserUI v2.0 (SDM)
– reimplemented, improved interface
– v3.0 planned (“catching” http-get/post via localhost)
Execution logger (SDM)
– New: workflow “black box” for keeping track of runs
Documentation framework (SDM)
– Autogenerated actor documentation (new doclets and taglets)
Ontology-based actor and dataset classification (SEEK)
– Finding relevant components: actors and datasets, suggesting possible connections, …
Kepler/SRB toolkit (GEON, SDM, SEEK, …)
– improved interfaces, new functions …
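The command line actor, timestamp actor, and execution logger described on this slide can be pictured with a small sketch. This is not Kepler code: `run_command` and the record fields are hypothetical names chosen for illustration, and the sketch only assumes that such an actor runs an external program, reports errors, and leaves a timestamped record of each run.

```python
import subprocess
import sys
from datetime import datetime, timezone


def run_command(argv, log):
    """Run an external command and append a timestamped record to `log`.

    Hypothetical stand-in for a command-line actor plus an execution
    logger ("black box"); it is not the Kepler implementation.
    """
    started = datetime.now(timezone.utc).isoformat()
    proc = subprocess.run(argv, capture_output=True, text=True)
    log.append({
        "started": started,            # timestamp-actor role
        "argv": list(argv),
        "returncode": proc.returncode,
        "stderr": proc.stderr,
    })
    if proc.returncode != 0:
        # Error handling: surface the failure instead of passing bad data on.
        raise RuntimeError(f"command failed with exit code {proc.returncode}")
    return proc.stdout


if __name__ == "__main__":
    run_log = []
    print(run_command([sys.executable, "-c", "print('hello')"], run_log))
    print(run_log[0]["returncode"])
```

Recording the run before raising means failed executions also show up in the log, which is the point of a "black box".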

5 Application Pull vs Technology Push
Use case driven (application pull)
– PIW, TSI-1, TSI-2, …
– Solve technology issues along the way
(+) solve the particular scientists’ problem
(-) one-of-a-kind solutions, little generic & reusable technology
Example:
– TSI-1 and TSI-2 are conceptually almost identical scientific (“Grid/HPC/HTC”) workflows
– but implemented very differently → limited reuse, e.g., evolving/customizing one into the other is hard/impossible…

6 Application Pull vs Technology Push
Technology driven (technology push)
– Generic application integration mechanisms: web service actor, harvester, command-line actors, ssh2 actor, BrowserUI, …
– Specialized interfaces to HPC/HTC systems:
Large-scale data management:
– SDSC SRB toolkit (set of SRB actors),
– SRM?, PVFS2?, MPI-IO?, …
Interfacing with generic job schedulers:
– NIMROD, Condor, APST, …
– Interfacing with scientific packages:
– Statistics toolkit (R, …), GIS (Grass, ArcIMS, Mapserver…)
– GAMESS toolkit, APBS (visualization)…
(+) developing reusable technology / toolkits
(!) still need guidance from domain scientists’ problems, but need to lift one-off solutions into a general SWF engineering methodology

7 Increasing number of Kepler actors…

8 … creating prototype workflows and test cases (for automated tests) …

9 … putting them together in generic, reusable packages, e.g. Kepler/SRB toolkit SRB SDSC only: 404 TB in 59 million files across 5167 users (12/16/’04, Reagan Moore)

10 KEPLER/R Toolkit (under development) Source: Dan Higgins, Kepler/SEEK

11

12 New Developments & Directions

13 Ontology-based Actor & Dataset Discovery Ontology based actor (service) and dataset search Result Display
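One way to picture ontology-based search is as a subsumption check: a component matches a query concept if its semantic annotation is that concept or one of its descendants. The sketch below assumes this reading; the concept names, component names, and the functions `is_a` and `find_components` are all hypothetical and do not come from SEEK or Kepler.

```python
# Hypothetical is-a hierarchy: child concept -> parent concept.
ONTOLOGY = {
    "SpeciesOccurrence": "EcologicalData",
    "ClimateLayer": "EcologicalData",
    "EcologicalData": "Thing",
    "NicheModel": "Model",
    "Model": "Thing",
}


def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or descends from it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY.get(concept)  # None once we pass the root
    return False


def find_components(annotations, query):
    """List components (actors or datasets) whose annotation matches `query`."""
    return [name for name, concept in annotations.items() if is_a(concept, query)]


if __name__ == "__main__":
    # Hypothetical annotations of actors and datasets.
    annotations = {
        "occurrence_reader": "SpeciesOccurrence",
        "climate_dataset": "ClimateLayer",
        "niche_model_actor": "NicheModel",
    }
    print(find_components(annotations, "EcologicalData"))
```

Searching for the broader concept returns both the occurrence reader and the climate dataset, which a plain keyword match on their names would miss.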

14

15

16 Example: GAMESS
Quantum-mechanics cheminformatics workflow
Job management infrastructure in place
Results database: under development
Goal: 1000’s of GAMESS jobs (quantum mechanics)

17 Towards a Framework for “Grid/HPC/HTC” WFs & Job Management

18 Technology-oriented meeting: May 12th Ptolemy/Kepler Miniconference in Berkeley

19 What’s needed, what’s next
Build generic toolkits / packages
Don’t reinvent – Reuse!
– Improved R coupling, SCIRun coupling, …
SWF Framework that lets scientists choose…
– SRB (Sput, Sget,…), SRM, MPI-IO, GlobusTK (GridFTP,…), Sabul, …, pNetCDF, parallel-R, … packages
– Condor, Nimrod, … schedulers
– GRASS, …
General purpose SWF system/PSE that scientists can use themselves

20 Towards a KEPLER School of Expression (Flow-based Design Patterns)
Generality vs specialization of actors
– also loosely coupled vs tightly coupled
Data transformation pipelines
– alternate compute and data transformation steps
Stage-execute-fetch pattern (Grid/HPC/HTC-WFs)
Loops, higher-order functions (map, foldr, …)
– cf. Taverna’s automatic loop insertion based on data types
[Figure labels: F-map: producer of methods/functions [f1, f2, …, fn]; map f: producer of [x1, x2, …, xn] yields [f(x1), …, f(xn)]; nodes X, A, B, C; connect: JDBC/SRB connection tokens, proxies, certificates]
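The higher-order patterns named on this slide can be sketched outside any workflow engine. The code below is a plain-Python illustration, not Kepler or Ptolemy II API: `map_actor`, `foldr`, and `stage_execute_fetch` are hypothetical names, and the three step functions passed to the last one stand in for whatever transfer and scheduling tools a real workflow would use.

```python
from functools import reduce
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def map_actor(f: Callable[[A], B], tokens: Iterable[A]) -> List[B]:
    """map f [x1, ..., xn] -> [f(x1), ..., f(xn)]."""
    return [f(x) for x in tokens]


def foldr(f: Callable[[A, B], B], init: B, tokens: List[A]) -> B:
    """Right fold: f(x1, f(x2, ... f(xn, init)))."""
    return reduce(lambda acc, x: f(x, acc), reversed(tokens), init)


def stage_execute_fetch(stage, execute, fetch, job):
    """Stage inputs, run the job, then fetch results, in that order."""
    handle = stage(job)       # e.g. copy input files to the remote site
    run_id = execute(handle)  # e.g. submit to a scheduler and wait
    return fetch(run_id)      # e.g. copy result files back


if __name__ == "__main__":
    print(map_actor(lambda x: x * x, [1, 2, 3]))
    print(foldr(lambda x, acc: x - acc, 0, [1, 2, 3]))
    # An in-memory "remote site" stands in for real storage.
    remote = {}
    result = stage_execute_fetch(
        stage=lambda job: remote.setdefault("input", job) and "input",
        execute=lambda key: remote.setdefault("output", remote[key].upper()) and "output",
        fetch=lambda key: remote[key],
        job="payload",
    )
    print(result)
```

Writing stage-execute-fetch as a function of its three steps is one way to express the slide's concern with generality versus specialization: the pattern stays fixed while the steps are swapped per site.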

21 Blurring Design (ToDo) and Execution

22 Davis Genome Center: Scientific Workflows to Support the Complete (Wet-lab) Experiment Lifecycle
Try to capture and (semi-)automate the Experiment Lifecycle:
– Discover similar experiments, …
– reuse, customize,
– execute, monitor,
– manage results,
– Register back to an experiment repository
→ Support Experiment Design, Execution, & Reuse
→ Scientific workflows and semantic extensions (ontologies, metadata++)

23 Summary: What we could/should do
Push technology:
– Distributed Kepler & “detached” execution
– Making Kepler more X-aware, where …
… X=Data plumbing (SRB toolkit, GridTK, others, …)
… X=Grid & Scheduling (need a “Grid director”? Condor director?),
… X=Parameter-sweep (“Nimrod/APST”… director?)
… X=Statistics & other specialized packages (R, parallel-R?, …, Grass, … )
… X=Visualization (SCIRun, …)
– Semantic extensions
Actors and datasets have “semantic types” to support resource discovery, WF design, …
Create “Packages” or “Rolls”
– … targeting certain scientific user groups & communities
SWF Life-cycle support:
– Design, execution, monitoring, archival, re-use/re-run
– Design patterns, “Kepler School of Expression”
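The parameter-sweep item can be illustrated with a minimal sketch of what such a director would have to enumerate: one job per combination of parameter values. The function `sweep` and the parameter names below are hypothetical and are not taken from Nimrod, APST, or Kepler.

```python
from itertools import product


def sweep(parameters):
    """Return one job description per combination of parameter values."""
    names = list(parameters)
    return [dict(zip(names, values)) for values in product(*parameters.values())]


if __name__ == "__main__":
    # Hypothetical sweep: 2 molecules x 3 basis sets = 6 jobs.
    jobs = sweep({
        "molecule": ["water", "methane"],
        "basis_set": ["small", "medium", "large"],
    })
    for job in jobs:
        print(job)
```

Each resulting dictionary could then be handed to a scheduler-facing step; how that hand-off works is exactly the open question the slide raises.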