KEPLER: Overview and Project Status Bertram Ludäscher San Diego Supercomputer Center Associate Professor Dept. of Computer Science.

Slides:



Advertisements
Similar presentations
Overview of the Science Environment for Ecological Knowledge (SEEK) Ricardo Scachetti Pereira.
Advertisements

ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
UCSD SAN DIEGO SUPERCOMPUTER CENTER Ilkay Altintas Scientific Workflow Automation Technologies Provenance Collection Support in the Kepler Scientific Workflow.
The KEPLER Scientific Workflow System Bertram Ludäscher Ilkay Altintas … & the Kepler Team San Diego Supercomputer Center University of California, San.
Semantic Extensions for Scientific Workflows on the Grid Bertram Ludäscher San Diego Supercomputer Center Associate Professor Dept.
Introduction to Scientific Workflows and the KEPLER System Instructors: Bertram Ludaescher Ilkay Altintas Instructors: Bertram Ludaescher Ilkay Altintas.
Chad Berkley National Center for Ecological Analysis and Synthesis (NCEAS), University of California, Santa Barbara February.
Workflow Exchange and Archival: The KSW File and the Kepler Object Manager Shawn Bowers (For Chad Berkley & Matt Jones) University of California, Davis.
6th Biennial Ptolemy Miniconference Berkeley, CA May 12, 2005 Distributed Computing in Kepler Ilkay Altintas Lead, Scientific Workflow Automation Technologies.
February 11, 2010 Center for Hybrid and Embedded Software Systems Ptolemy II - Heterogeneous Concurrent Modeling and Design.
Ngu, Texas StatePtolemy Miniconference, February 13, 2007 Flexible Scientific Workflows Using Dynamic Embedding Anne H.H. Ngu, Nicholas Haasch Terence.
KEPLER: Overview and Project Status Bertram Ludäscher San Diego Supercomputer Center Associate Professor Dept. of Computer Science.
Leveraging semantic metadata for ecological data discovery and integration for analysis and modeling Matthew B. Jones Mark P. Schildhauer with contributions.
Department of Electrical Engineering and Computer Sciences University of California at Berkeley The Ptolemy II Framework for Visual Languages Xiaojun Liu.
Composing Models of Computation in Kepler/Ptolemy II Summary. A model of computation (MoC) is a formal abstraction of execution in a computer. There is.
Kepler: Towards a Grid-Enabled System for Scientific Workflows Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher*, Steve Mock.
 Scientific workflow management system based on Ptolemy II  Allows scientists to visually design and execute scientific workflows  Actor-oriented.
1 Ilkay ALTINTAS - October, 2007 Ilkay ALTINTAS Lab Director, Scientific Workflow Automation Technologies San Diego Supercomputer Center, UCSD Kepler Scientific.
Biology.sdsc.edu CIPRes in Kepler: An integrative workflow package for streamlining phylogenetic data analyses Zhijie Guan 1, Alex Borchers 1, Timothy.
January, 23, 2006 Ilkay Altintas
SDM Center A Quick Update on the TSI and PIW workflows SDM All Hands March 2-3, Terence Critchlow, Xiaowen Xin, Bertram.
Bertram Ludäscher Managing Scientific Data: From Data Integration to Scientific Workflows Bertram Ludäscher UC.
Scientific Workflow reusing and long term big data preservation Salima Benbernou Université Paris Descartes Project.
Composing Models of Computation in Kepler/Ptolemy II
Introduction for BEAM Ecological Niche Modeling Working Meeting Deana Pennington University of New Mexico December 14, 2004.
Data R&D Issues for GTL Data and Knowledge Systems San Diego Supercomputer Center University of California, San Diego Bertram Ludäscher
Scientific Data & Workflow Engineering Preliminary Notes from the Cyberinfrastructure Trenches Bertram Ludäscher San Diego Supercomputer Center Associate.
Workflow Project Luciano Piccoli Illinois Institute of Technology.
Pipelines and Scientific Workflows with Ptolemy II Deana Pennington University of New Mexico LTER Network Office Shawn Bowers UCSD San Diego Supercomputer.
Privacy issues in integrating R environment in scientific workflows Dr. Zhiming Zhao University of Amsterdam Virtual Laboratory for e-Science Privacy issues.
1 Kepler/SPA Extensions for Scientific Workflows – Now and Upcoming Ilkay Altintas SWAT lead San Diego Supercomputer Center Bertram Ludäscher.
Kepler/pPOD: Scientific Workflow and Provenance Support for Assembling the Tree of Life UC DAVIS Department of Computer Science The Kepler/pPOD Team Shawn.
Workflow Topics for the Next- Generation SDM-Center Ilkay Altintas Bertram Ludäscher San Diego Supercomputer Center.
Science Environment for Ecological Knowledge Bertram Ludäscher San Diego Supercomputer Center University of California, San Diego
Science Environment for Ecological Knowledge: EcoGrid Matthew B. Jones National Center for.
Semantic Mediation in SEEK/Kepler: Exploiting Semantic Annotation for Discovery, Analysis, and Integration of Scientific Data and Workflows Bertram Ludäscher.
Accelerating Scientific Exploration Using Workflow Automation Systems Terence Critchlow (LLNL) Ilkay Altintas (SDSC) Scott Klasky(ORNL) Mladen Vouk (NCSU)
SEEK EcoGrid l Integrate diverse data networks from ecology, biodiversity, and environmental sciences l Metacat, DiGIR, SRB, Xanthoria,... l EML is the.
Chad Berkley NCEAS National Center for Ecological Analysis and Synthesis (NCEAS), University of California Santa Barbara Long Term Ecological Research.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
1 Ilkay ALTINTAS - July 24th, 2007 Ilkay ALTINTAS Director, Scientific Workflow Automation Technologies Laboratory San Diego Supercomputer Center, UCSD.
Research Design for Collaborative Computational Approaches and Scientific Workflows Deana Pennington January 8, 2007.
Grid Technologies Arcot Rajasekar (SEEK) Paul Watson (North East eScience Centre)
Rule-Based Programming for VORBs Bertram Ludaescher Arcot Rajasekar Data and Knowledge Systems San Diego Supercomputer Center U.C. San Diego.
ICCS WSES BOF Discussion. Possible Topics Scientific workflows and Grid infrastructure Utilization of computing resources in scientific workflows; Virtual.
Kepler includes contributors from GEON, SEEK, SDM Center and Ptolemy II, supported by NSF ITRs (SEEK), EAR (GEON), DOE DE-FC02-01ER25486.
GRIDS Center Middleware Overview Sandra Redman Information Technology and Systems Center and Information Technology Research Center National Space Science.
SDM center Supporting Heterogeneous Data Access in Genomics Terence Critchlow Center for Applied Scientific Computing Lawrence Livermore National Laboratory.
EScience Workshop on Scientific Workflows Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara.
Scientific Workflow systems: Summary and Opportunities for SEEK and e-Science.
SDM center Supporting Heterogeneous Data Access in Genomics Terence Critchlow Ling Liu, Calton Pu GT Reagan Moore, Bertam Ludaescher, SDSC Amarnath Gupta.
Toward interactive visualization in a distributed workflow Steven G. Parker Oscar Barney Ayla Khan Thiago Ize Steven G. Parker Oscar Barney Ayla Khan Thiago.
Towards Self-Describing Workflows for Climate Models Kathy Saint – UCAR Ufuk Utku Turuncoglu – ITU Sylvia Murphy – NCAR Cecelia DeLuca – NCAR.
SEEK Science Environment for Ecological Knowledge l EcoGrid l Ecological, biodiversity and environmental data l Computational access l Standardized, open.
2005 GRIDS Community Workshop1 Learning From Cyberinfrastructure Initiatives Grid Research Integration Development & Support
OOI Cyberinfrastructure and Semantics OOI CI Architecture & Design Team UCSD/Calit2 Ocean Observing Systems Semantic Interoperability Workshop, November.
GEONSearch: From Searching to Recommending GeoInformatics 2006 May 10-12, Reston, Virginia Ullas Nambiar, Bertram Ludaescher Dept. of Computer Science.
Satisfying Requirements BPF for DRA shall address: –DAQ Environment (Eclipse RCP): Gumtree ISEE workbench integration; –Design Composing and Configurability,
Ocean Observatories Initiative OOI Cyberinfrastructure Life Cycle Objectives Review January 8-9, 2013 Scientific Workflows for OOI Ilkay Altintas Charles.
Workflow-Driven Science using Kepler Ilkay Altintas, PhD San Diego Supercomputer Center, UCSD words.sdsc.edu.
Staging of the Ecological Niche Modeling Mammal Prototype Project Deana Pennington University of New Mexico December 14, 2004.
GEON IT Solutions: Products and Demos Chaitan Baru San Diego Supercomputer Center.
Efrat Jaeger – SDSC Bertram Ludäscher – UC DAVIS Krishna Sinha – Virginia Tech Ashraf Memon – SDSC Ghulam Memon – SDSC Ilkay Altintas – SDSC Kai Lin –
EcoGrid in SEEK A Data Grid System for Ecology Bertram Ludaescher University of California, Davis Arcot Rajasekar San Diego Supercomputer Center, University.
Ptolemy II - Heterogeneous Concurrent Modeling and Design in Java
Data R&D Issues for GTL Bertram Ludäscher Data and Knowledge Systems
Ptolemy II - Heterogeneous Concurrent Modeling and Design in Java
A Semantic Type System and Propagation
KEPLER: Overview and Project Status
GGF10 Workflow Workshop Summary
Presentation transcript:

KEPLER: Overview and Project Status Bertram Ludäscher San Diego Supercomputer Center Associate Professor Dept. of Computer Science & Genome Center University of California, Davis Fellow San Diego Supercomputer Center University of California, San Diego UC DAVIS Department of Computer Science 6 th Biennial Ptolemy Miniconference Featuring the Kepler Project May 12 th, 2005, Berkeley, CA 6 th Biennial Ptolemy Miniconference Featuring the Kepler Project May 12 th, 2005, Berkeley, CA

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Outline Scientific Workflows (SWFs) –Cyberinfrastructure, from bioinformatics to astrophysics Some Kepler History –… or why Ptolemy II rules Current and Emerging Kepler Features –from SWF plumbing/hacking to SWF design Outlook

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Scientific Workflows: Pre-Cyberinfrastructure Data Federation & Grid “Plumbing”: –access, move, replicate, query … data (Data-Grid) authenticate … SRB Sget/Sput … OPeNDAP, … Antelope/ORBs –schedule, launch, monitor jobs (Compute-Grid) Globus, Condor, Nimrod, APST, … Data Integration: –Conceptual querying & integration, structure & semantics, e.g. mediation w/ SQL, XQuery + OWL (Semantics-enabled Mediator) Data Analysis, Mining, Knowledge Discovery: –manual/textbook (e.g. ternary diagrams), Excel, R, simulations, … Visualization: –3-D (volume), 4-D (spatio-temporal), n-D (conceptual views) …  one-of-a-kind custom apps., detached (island) solutions  workflows are hard to reproduce, maintain  no/little workflow design, automation, reuse, documentation  need for an integrated scientific workflow environment

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher What is a Scientific Workflow (SWF)? Model the way scientists work with their data and tools –Mentally coordinate data export, import, analysis via software systems Scientific workflows emphasize data flow ( ≠ business workflows) Metadata (incl. provenance info, semantic types etc.) is crucial for automated data ingestion, data analysis, … Goals: –SWF automation, –SWF & component reuse, –SWF design & documentation –making scientists ’ data analysis and management easier!

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Some Scientific Workflow Features Typical requirements/characteristics: –data-intensive and/or compute-intensive –plumbing-intensive –dataflow-oriented –distribution (data, processing) –user-interaction “in the middle”, … –… vs. (C-z; bg; fg)-ing (“detach” and reconnect) –advanced programming constructs (map(f), zip, takewhile, …) –logging, provenance, “registering back” (intermediate) products –… … easy to recognize a SWF when you see one!

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Promoter Identification Workflow (Napkin Drawing) Source: Matt Coleman (LLNL)

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Ecology: Analysis Pipeline for Invasive Species Prediction (Napkin Drawing) Training sample (d) GARP rule set (e) Test sample (d) Integrated layers (native range) (c) Species presence & absence points (native range) (a) EcoGrid Query EcoGrid Query Layer Integration Layer Integration Sample Data + A3 + A2 + A1 Data Calculation Map Generation Validation User Validation Map Generation Integrated layers (invasion area) (c) Species presence &absence points (invasion area) (a) Native range predictio n map (f) Model quality parameter (g) Environmental layers (native range) (b) Generate Metadata Archive To Ecogrid Registered Ecogrid Database Registered Ecogrid Database Registered Ecogrid Database Registered Ecogrid Database Environmental layers (invasion area) (b) Invasion area prediction map (f) Model quality parameter (g) Selected predictio n maps (h) Source: NSF SEEK (Deana Pennington et. al, UNM)

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Promoter Identification Workflow in Kepler

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Ecological Niche Modeling in Kepler (200 to 500 runs per species x 2000 mammal species x 3 minutes/run) = 833 to 2083 days

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher GEON Analysis Workflow in KEPLER

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Commercial & Open Source Scientific Workflow and (Dataflow) Systems & Problem Solving Environments Kensington Discovery Edition from InforSense Taverna Triana SciRUN II

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher see!see! try!try! read!read! Source: Edward Lee et al. Our Starting Point: Ptolemy II

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Why Ptolemy II ? Ptolemy II Objective: –“The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation.” Dataflow Process Networks w/ natural support for abstraction, pipelining (streaming) actor-orientation, actor reuse User-Orientation –Workflow design & exec console (Vergil GUI) –“Application/Glue-Ware” excellent modeling and design support run-time support, monitoring, … not a middle-/underware (we use someone else’s, e.g. Globus, SRB, …) but middle-/underware is conveniently accessible through actors! PRAGMATICS –Ptolemy II is mature, continuously extended & improved, well-documented (500+pp) –open source system –many research results –Ptolemy II participation in Kepler

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher KEPLER/CSP: Contributors, Sponsors, Projects Ilkay Altintas SDM, NLADR, Resurgence, EOL, … Kim Baldridge Resurgence, NMI Chad Berkley SEEK Shawn Bowers SEEK Terence Critchlow SDM Tobin Fricke ROADNet Jeffrey Grethe BIRN Christopher H. Brooks Ptolemy II Zhengang Cheng SDM Dan Higgins SEEK Efrat Jaeger GEON Matt Jones SEEK Werner Krebs, EOL Edward A. Lee Ptolemy II Kai Lin GEON Bertram Ludaescher SDM, SEEK, GEON, BIRN, ROADNet Mark Miller EOL Steve Mock NMI Steve Neuendorffer Ptolemy II Jing Tao SEEK Mladen Vouk SDM Xiaowen Xin SDM Yang Zhao Ptolemy II Bing Zhu SEEK Ptolemy II LLNL, NCSU, SDSC, UCB, UCD, UCSB, UCSD, U Man… Utah,…, UTEP, …, Zurich Collab. tools: IRC, cvs, skype, Wiki: hotTopics, FAQs,.. Ptolemy II SPA

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher GEON Dataset Generation & Registration (and co-development in KEPLER) Xiaowen (SDM) Edward et al.(Ptolemy) Yang (Ptolemy) Efrat (GEON) Ilkay (SDM) SQL database access (JDBC) Matt et al. (SEEK) % Makefile $> ant run % Makefile $> ant run

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Some KEPLER Components (Actors)

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher A User’s Wish List Usability Closing the “lid” (cf. vnc) Dynamic plug-in of actors (cf. actor & data registries/repositories) Distributed WF execution Grid awareness Semantics awareness WF Deployment (as a web site, as a web service, …) “Power apps” (SciRUN II) …

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Separation of Concerns A shining example: –Ptolemy Directors – “factoring out” the concern of workflow “orchestration” (MoC) –common aspects of overall execution not left to the actors Similarly: –The “Black Box” (“flight recorder”) a kind of “recording central” to avoid wiring 100’s of components to recording-actor(s) –The “Red Box” (error handling, fault tolerance) ……… –The “Yellow Box” (type checking) ……… –The “Blue Box” (shipping-and-handling) central handling of data transport (by value, by reference, by scp, SRB, GridFTP, …) SDF/PN/DE/… Recorder Static Analysis On Error

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Separation of Concerns: Port Types Token consumption (& production) type –a director’s concern Token transport type –by value, reference (which one), protocol (SOAP, scp, GridFTP, scp, SRB, …) –a SHA concern Structural and semantic types –SAT (static analysis & typing) concern –built after static unit type system… (should become a special case!?)

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Hybrid Types (Structure + Semantics) Services can be semantically compatible, but structurally incompatible Source Actor Source Actor Target Actor Target Actor PsPs PtPt Semantic Type P s Semantic Type P t Structural Type P t Structural Type P s Desired Connection Incompatible Compatible (⋠)(⋠) (⊑)(⊑) (Ps)(Ps) (Ps)(Ps)  (≺)(≺) Ontologies (OWL) Source: [Bowers-Ludaescher, DILS’04]

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Scientific Workflow Design Support SWF design & reuse, via: –Structural data types –Semantic types –Associations (=constraints) between them –Type checking, inference, propagation  Separation of concerns: –structure, semantics, WF orchestration, etc.

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Breaking into the Parallel (e.g. MPI) and Stream Processing Worlds!? Clean functional semantics facilitates algebraic workflow (program) transformations (Bird-Meertens); e.g. mapS f mapS g  mapS (f g) Source: Real-Time Signal Processing: Dataflow, Visual, and Functional Programming, Hideki John Reekie, University of Technology, Sydney

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher KEPLER Today Support for SWF life cycle –Design, share, prototype, run, monitor, deploy, … Coarse-grained scientific workflows, e.g., –web service actors, grid actors, command-line actors, … Fine grained workflows and simulations, e.g., –Database access, XSLT transformations, … Kepler Extensions –support for data- and compute-intensive workflows (SDM/SPA, SEEK) –real-time data streaming (ROADNet) –other special and generic extensions (e.g. GEON, SEEK) Status –first release (alpha) was in May 2004 –nightly builds w/ version tests –“Link-Up Sister Project” w/ other SWF systems (myGrid/Taverna, Triana, …), SciRUN II (DOE SciDAC/SDM) –Participation in various workshops and conferences (GGF10, SSDBMs, eScience WF workshop, …)

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher KEPLER Tomorrow Application-driven extensions (here: SDM): –access to/integration with other IDMAF components PnetCDF?, PVFS(2)?, MPI-IO?, parallel-R?, ASPECT?, FastBit, … –support for execution of new SWF domains Astrophysics, Fusion, …. Further generic extensions: –addtl. support for data-intensive and compute-intensive workflows (all SRB Scommands, CCA support, …) –semantics-intensive workflows –(C-z; bg; fg)-ing (“detach” and reconnect) –workflow deployment models –distributed execution Additional “domain awareness” (esp. via new directors) –time series, parameter sweeps, job scheduling (CONDOR, Globus, …) –hybrid type system with semantic types (“Sparrow” extensions) Consolidation –More installers, regular releases, improved usability, documentation, …

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher Job Management (here: NIMROD) Job management infrastructure in place Results database: under development Goal: 1000’s of GAMESS jobs (quantum mechanics)

6 th Biennial Ptolemy Miniconf., May 12 th, 2005, BerkeleyKepler Overview, B. Ludäscher ORB