The KEPLER Scientific Workflow System
Bertram Ludäscher, Ilkay Altintas, … & the Kepler Team
San Diego Supercomputer Center, University of California, San Diego
SDM Center AHM, LBL, August 3-5, 2004

Kepler, B. Ludäscher, SDSC 2
Outline
Project Overview
  – from Ptolemy II to Kepler
Workflow Modeling Issues
  – from Dataflow to Control-flow (CCA et al.)
Current Kepler Features
  – from plumbing to distributed execution
Example Workflows
  – from bioinformatics to geoinformatics
Future Plans
  – from today to tomorrow ;-)

What is a Scientific Workflow (SWF)?
Goals:
  – automate a scientist’s repetitive steps (data analysis, data transformation, computational steps, …)
  – can encompass data generation, aggregation, analysis, and visualization (WF granularity)
  – design, test, share, deploy, execute, reuse, … SWFs
Typical requirements/characteristics:
  – data-intensive and/or compute-intensive
  – plumbing-intensive
  – dataflow-oriented
  – distribution (data, processing)
  – user interaction “in the middle”, …
  – … vs. (C-z; bg; fg)-ing (“detach” and reconnect)
  – advanced programming constructs (map(f), zip, takewhile, …)
  – logging, provenance, “registering back” (intermediate) products, …
… easy to recognize a SWF when you see one!
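
The “advanced programming constructs” listed above are ordinary higher-order operations on token streams. As a minimal sketch (plain Python, not Kepler's actor library), a workflow channel can be modeled as a lazy iterator; the reading values, station names, and the cutoff of 10 are hypothetical.

```python
from itertools import count, takewhile

# Hypothetical source streams; count() makes them unbounded, as a
# streaming workflow channel may be.
readings = count(0.0, 1.5)                     # 0.0, 1.5, 3.0, 4.5, ...
station_ids = (f"st{i}" for i in count())      # "st0", "st1", ...

# map(f): transform each token lazily as it arrives
scaled = map(lambda x: x * 2, readings)
# zip: pair tokens from two streams in lockstep
paired = zip(station_ids, scaled)
# takewhile: stop consuming the infinite stream once the condition fails
bounded = takewhile(lambda pair: pair[1] < 10, paired)

result = list(bounded)
print(result)  # [('st0', 0.0), ('st1', 3.0), ('st2', 6.0), ('st3', 9.0)]
```

Because every stage is lazy, nothing is computed until the final `list` pulls tokens through, which is roughly how a demand-driven pipeline of actors behaves.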

Promoter Identification Workflow
Source: Matt Coleman (LLNL)

Source: NIH BIRN (Jeffrey Grethe, UCSD)

Ecology: GARP Analysis Pipeline for Invasive Species Prediction
[Figure: pipeline diagram. Steps: EcoGrid Query, Layer Integration, Sample Data (+ A1, A2, A3), Data Calculation, Map Generation, Validation, User Validation, Generate Metadata, Archive To EcoGrid (Registered EcoGrid Database). Data products: (a) species presence & absence points, (b) environmental layers, and (c) integrated layers, each for the native range and the invasion area; (d) training and test samples; (e) GARP rule set; (f) native range and invasion area prediction maps; (g) model quality parameter; (h) selected prediction maps.]
Source: NSF SEEK (Deana Pennington et al., UNM)

Starting Point for SDM-Center/SPA + SEEK: Ptolemy II
see! try! read!
Source: Edward Lee et al.

An Early Example: Promoter Identification
SSDBM, AD 2003
  – Scientist models the application as a “workflow” of connected components (“actors”)
  – If all components exist, the workflow can be automated/executed
  – Different directors can be used to pick an appropriate execution model (often “pipelined” execution: PN director)
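
The PN director's “pipelined” execution can be approximated as follows. This is a minimal Python sketch of the Kahn process-network idea, not Kepler's PN director: each actor runs in its own thread, channels are unbounded FIFO queues with blocking reads, and a sentinel token (an assumption of this sketch) marks the end of a stream. The actors and the `str.upper` stage are hypothetical stand-ins for real workflow steps.

```python
import queue
import threading

END = object()  # sentinel token marking the end of a stream (sketch convention)

def source(out_q, items):
    """Source actor: emits each item as a token, then the end marker."""
    for item in items:
        out_q.put(item)
    out_q.put(END)

def transform(in_q, out_q, f):
    """Transformer actor: blocks on an empty input, applies f, forwards the result."""
    while True:
        token = in_q.get()          # blocking read, as in Kahn process networks
        if token is END:
            out_q.put(END)
            return
        out_q.put(f(token))

def sink(in_q, results):
    """Sink actor: collects tokens until the end marker arrives."""
    while True:
        token = in_q.get()
        if token is END:
            return
        results.append(token)

def run_pipeline(items):
    q1, q2 = queue.Queue(), queue.Queue()   # unbounded FIFO channels
    results = []
    threads = [
        threading.Thread(target=source, args=(q1, items)),
        threading.Thread(target=transform, args=(q1, q2, str.upper)),
        threading.Thread(target=sink, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(run_pipeline(["acgt", "ttag"]))  # ['ACGT', 'TTAG']
```

All three stages run concurrently, so the sink can consume the first result while the source is still emitting; that overlap is the “pipelining” the slide refers to.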

Why Ptolemy II (and thus KEPLER)?
Ptolemy II Objective:
  – “The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation.”
Dataflow Process Networks w/ natural pipelining/streaming support
User-Orientation
  – Workflow design & exec console (Vergil GUI)
  – “Application/Glue-Ware”
      excellent modeling and design support
      run-time support, monitoring, …
      not a middle-/underware (we use someone else’s, e.g. Globus, SRB, …)
      but middle-/underware is conveniently accessible through actors!
PRAGMATICS
  – Ptolemy II is mature, continuously extended & improved, well-documented (500+pp)
  – open source system
  – Ptolemy II folks actively participate in KEPLER

KEPLER: An Open Collaboration
“Founding projects”:
  – DOE SDM/SPA and NSF SEEK
Open Source (BSD-style license)
Intensive Communications:
  – Web-archived mailing lists
  – IRC (!)
Co-development:
  – via shared CVS repository
  – joining as a new co-developer (currently):
      get a CVS account (read-only)
      local development + contribution via existing KEPLER member
      be voted “in” as a member/co-developer
Software & social engineering
  – How to better accommodate new groups/communities?
  – How to better accommodate different usage/contribution models (core dev … special purpose extender … user)?

KEPLER/CSP: Contributors, Sponsors, Projects (or loosely coupled Communicating Sequential Persons ;-)
Ilkay Altintas: SDM, Resurgence
Kim Baldridge: Resurgence, NMI
Chad Berkley: SEEK
Shawn Bowers: SEEK
Terence Critchlow: SDM
Tobin Fricke: ROADNet
Jeffrey Grethe: BIRN
Christopher H. Brooks: Ptolemy II
Zhengang Cheng: SDM
Dan Higgins: SEEK
Efrat Jaeger: GEON
Matt Jones: SEEK
Werner Krebs: EOL
Edward A. Lee: Ptolemy II
Kai Lin: GEON
Bertram Ludaescher: BIRN, SDM, SEEK, GEON
Mark Miller: EOL
Steve Mock: NMI
Steve Neuendorffer: Ptolemy II
Jing Tao: SEEK
Mladen Vouk: SDM
Xiaowen Xin: SDM
Yang Zhao: Ptolemy II
Bing Zhu: SEEK
Ptolemy II

History
Gabriel
  – Written in Lisp
  – Aimed at signal processing
  – Synchronous dataflow (SDF) block diagrams
  – Parallel schedulers
  – Code generators for DSPs
  – Hardware/software co-simulators
Ptolemy Classic
  – Written in C++
  – Multiple models of computation
  – Hierarchical heterogeneity
  – Dataflow variants: BDF, DDF, PN
  – C/VHDL/DSP code generators
  – Optimizing SDF schedulers
  – Higher-order components
Ptolemy II
  – Written in Java
  – Domain polymorphism
  – Multithreaded
  – Network integrated
  – Modal models
  – Sophisticated type system
  – CT, HDF, CI, GR, etc.
PtPlot (1997-??)
  – Java plotting package
Tycho
  – Itcl/Tk GUI framework
Diva
  – Java GUI framework
Copernicus (code generator)
KEPLER
  – scientific workflow extensions
Source (Ptolemy): Edward Lee et al.
Ptolemy II: A laboratory for investigating design
KEPLER: A problem-solving environment for Scientific Workflows
KEPLER = “Ptolemy II + X” for Scientific Workflows

KEPLER then …

… and KEPLER today…
What is HPC?
… so, you see, scientific workflows need domain and data-polymorphic actors & must scale to HPC!
What’s a scientific workflow?
What’s a polymorphic actor?
BTW: Kepler is NOT a GUI (Vergil is)

The KEPLER/Ptolemy II GUI (Vergil)
  – “Directors” define the component interaction & execution semantics
  – Large, polymorphic component (“Actors”) and Directors libraries (drag & drop)

Actor-Oriented Design
Object orientation:
  – class: name, data, methods; call/return
  – What flows through an object is sequential control (cf. CCA, MPI)
Actor/Dataflow orientation:
  – actor: name, data (state), parameters, ports (input data, output data)
  – What flows through an object is a stream of data tokens (in SWFs/KEPLER also references!!)
Source: Edward Lee et al.
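
The actor side of this contrast (parameters, persistent state, and data entering and leaving only through ports) can be sketched in a few lines. `Port` and `RunningSum` are hypothetical Python names; Ptolemy II's actual actors are Java classes with a richer firing protocol.

```python
from collections import deque

class Port:
    """A port holding a FIFO of tokens (a simplification of a port plus its receiver)."""
    def __init__(self):
        self.tokens = deque()

    def has_token(self):
        return bool(self.tokens)

    def get(self):
        return self.tokens.popleft()

    def put(self, token):
        self.tokens.append(token)

class RunningSum:
    """Hypothetical actor: a parameter configures it, state persists between
    firings, and data enters and leaves only through ports."""
    def __init__(self, name, scale=1):
        self.name = name
        self.scale = scale      # parameter
        self.total = 0          # state
        self.input = Port()     # input data port
        self.output = Port()    # output data port

    def fire(self):
        """Consume one input token and emit the scaled running sum."""
        token = self.input.get()
        self.total += token * self.scale
        self.output.put(self.total)

acc = RunningSum("acc", scale=10)
for x in [1, 2, 3]:
    acc.input.put(x)
while acc.input.has_token():
    acc.fire()
print(list(acc.output.tokens))  # [10, 30, 60]
```

Nothing outside the actor calls a method in a protocol-specific order; whoever drives it only needs to know that a firing consumes one input token and produces one output token.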

Object-Oriented vs. Actor-Oriented Interfaces
Actor/Dataflow oriented: the AO interface definition says “Give me text and I’ll give you speech”.
Object oriented: the OO interface (TextToSpeech: initialize(): void, notify(): void, isReady(): boolean, getSpeech(): double[]) gives procedures that have to be invoked in an order not specified as part of the interface definition.
Source: Edward Lee et al.
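
The text-to-speech contrast can be made concrete. In this hypothetical Python sketch, `TextToSpeechOO` loosely mirrors the slide's `initialize()`/`notify()`/`isReady()`/`getSpeech()` interface (with `notify` taking the text as an argument, which is an assumption), and it works only if the caller follows an unstated call order. The actor version's whole contract is “text tokens in, speech tokens out”. `fake_synthesize` is a placeholder, not real speech synthesis.

```python
class TextToSpeechOO:
    """Object-oriented style: correctness depends on an unstated call order
    (initialize, then notify, then poll is_ready, then get_speech)."""
    def __init__(self):
        self._initialized = False
        self._pending = None

    def initialize(self):
        self._initialized = True

    def notify(self, text):
        if not self._initialized:
            raise RuntimeError("initialize() must be called first")
        self._pending = text

    def is_ready(self):
        return self._pending is not None

    def get_speech(self):
        if not self.is_ready():
            raise RuntimeError("no text has been submitted")
        samples = fake_synthesize(self._pending)
        self._pending = None
        return samples

def fake_synthesize(text):
    """Placeholder 'synthesis': one float sample per character (not real audio)."""
    return [ord(c) / 255.0 for c in text]

def text_to_speech_actor(text_tokens):
    """Actor-oriented style: the whole contract is text tokens in, speech tokens out."""
    for text in text_tokens:
        yield fake_synthesize(text)

oo = TextToSpeechOO()
oo.initialize()
oo.notify("hi")
oo_result = oo.get_speech() if oo.is_ready() else None

ao_result = next(text_to_speech_actor(["hi"]))
print(oo_result == ao_result)  # True
```

Both styles compute the same samples; the difference is that the OO caller must know the protocol, while the actor version can be wired to any upstream producer of text tokens.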

Ptolemy II: Actor-Oriented Modeling
  – Component (“actor”) interaction semantics are not hard-wired inside components, but “factored out” into a “director”
  – Different directors for different modeling and execution needs (… can even be combined!)
⇒ Better abstraction, modeling, component reuse, …
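
Factoring the interaction semantics out of the actors means the same actor graph can be run by different directors. A minimal Python sketch under stated assumptions: `Actor`, `StaticDirector` (a fixed, SDF-like firing order) and `DataDrivenDirector` (fire whatever has input) are hypothetical names, not Ptolemy II's Java API.

```python
from collections import deque

class Actor:
    """Hypothetical actor: reads one token from each input channel, writes one result."""
    def __init__(self, name, fn, inputs, output):
        self.name, self.fn = name, fn
        self.inputs, self.output = inputs, output  # channel names

    def can_fire(self, channels):
        return all(channels[c] for c in self.inputs)

    def fire(self, channels):
        args = [channels[c].popleft() for c in self.inputs]
        channels[self.output].append(self.fn(*args))

class StaticDirector:
    """Fires the actors in their given (topological) order, a fixed number of times."""
    def __init__(self, iterations):
        self.iterations = iterations

    def run(self, actors, channels):
        for _ in range(self.iterations):
            for actor in actors:
                actor.fire(channels)

class DataDrivenDirector:
    """Fires any actor whose inputs all hold tokens, until no actor can fire."""
    def run(self, actors, channels):
        progress = True
        while progress:
            progress = False
            for actor in actors:
                if actor.can_fire(channels):
                    actor.fire(channels)
                    progress = True

def build():
    """The same two-actor workflow, built fresh for each director."""
    channels = {"a": deque([1, 2, 3]), "b": deque([10, 20, 30]),
                "sum": deque(), "out": deque()}
    add = Actor("add", lambda x, y: x + y, ["a", "b"], "sum")
    double = Actor("double", lambda x: 2 * x, ["sum"], "out")
    return [add, double], channels

actors, ch_static = build()
StaticDirector(iterations=3).run(actors, ch_static)
actors, ch_dynamic = build()
DataDrivenDirector().run(actors, ch_dynamic)
print(list(ch_static["out"]), list(ch_dynamic["out"]))  # [22, 44, 66] [22, 44, 66]
```

The actors are untouched when the director changes; for this deterministic graph both scheduling policies yield the same output stream.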

Behavioral Polymorphism in Ptolemy
These polymorphic methods implement the communication semantics of a domain in Ptolemy II. The receiver instance used in communication is supplied by the director, not by the component. (cf. CCA, WS-??, [G]BPL4??, … !)
[Figure labels: producer actor, consumer actor, IOPort, Receiver, Director]
Behavioral polymorphism is the idea that components can be defined to operate with multiple models of computation and multiple middleware frameworks.
Source: Edward Lee et al.
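
Receiver selection by the director can be illustrated with two interchangeable receiver classes. In this hypothetical Python sketch, loosely modeled on the fact that a Ptolemy II director creates receivers (its Java `Director` class has a `newReceiver()` method), the producer and consumer code never changes; only the receiver the director hands out does. `FIFOReceiver`, `LatestValueReceiver`, and `connect_and_run` are illustrative names, not Ptolemy II classes.

```python
from collections import deque

class FIFOReceiver:
    """Queues every token (dataflow-style communication)."""
    def __init__(self):
        self._q = deque()
    def put(self, token):
        self._q.append(token)
    def has_token(self):
        return bool(self._q)
    def get(self):
        return self._q.popleft()

class LatestValueReceiver:
    """Holds at most one token; a newer token overwrites an unread one
    (a simplified 'latest value' semantics, e.g. for sampled signals)."""
    def __init__(self):
        self._token, self._full = None, False
    def put(self, token):
        self._token, self._full = token, True
    def has_token(self):
        return self._full
    def get(self):
        self._full = False
        return self._token

class Director:
    """The director decides which kind of receiver every connection gets."""
    def __init__(self, receiver_class):
        self.receiver_class = receiver_class
    def new_receiver(self):
        return self.receiver_class()

def producer(port, values):
    """Actor code only calls put(); it never sees the receiver type."""
    for v in values:
        port.put(v)

def consumer(port):
    out = []
    while port.has_token():
        out.append(port.get())
    return out

def connect_and_run(director, values):
    channel = director.new_receiver()   # supplied by the director, not the actor
    producer(channel, values)
    return consumer(channel)

print(connect_and_run(Director(FIFOReceiver), [1, 2, 3]))         # [1, 2, 3]
print(connect_and_run(Director(LatestValueReceiver), [1, 2, 3]))  # [3]
```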

Component Composition & Interaction
  – Components linked via ports
  – Dataflow (and msg/ctl-flow)
  – Where is the component interaction semantics defined??
      each component is its own director!
  – But still useful for special applications, e.g. parallel programs (MPI, …)
[Figure labels: DIR1, DIR2, DIR3, DIR4, ???]
Source: GRIST/SC4DEVO workshop, July 2004, Caltech

Data/Control-Flow Spectrum
Data (tokens) flow
  – (almost) no other side effects
  – WYSIWYG (usually)
References flow
  – token reference type may be “http-get”, “ftp-get”, “hsi put”, …
  – generic handling still possible
Application-specific tokens flow
  – e.g. current Nimrod job management in Resurgence
  – “invisible contract” between components
  – Director is unaware of what’s going on … (sounds familiar? ;-)
Specific message-passing protocols (e.g., CSP, MPI)
  – for systems of tightly coupled components
[Figure: spectrum running from “clean” data(=ctl)-flow, through special tokens flow, to message passing, control flow]
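
Generic handling of reference tokens can be sketched as a registry keyed by the reference kind. Everything here is hypothetical: `DataToken`, `ReferenceToken`, the resolver registry, and the in-memory `FAKE_REMOTE` store that stands in for an HTTP server. A real system would call actual HTTP, FTP, or HSI clients.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class DataToken:
    """The data itself flows through the channel."""
    value: object

@dataclass(frozen=True)
class ReferenceToken:
    """Only a handle flows; 'kind' names the access protocol (e.g. 'http-get')."""
    kind: str
    location: str

# Hypothetical resolver registry: reference kind -> function fetching the data.
RESOLVERS: Dict[str, Callable[[str], object]] = {}

def resolver(kind):
    """Decorator registering a resolver for one reference kind."""
    def register(fn):
        RESOLVERS[kind] = fn
        return fn
    return register

FAKE_REMOTE = {"http://example.org/a.csv": "1,2,3"}  # stand-in for remote storage

@resolver("http-get")
def _http_get(location):
    return FAKE_REMOTE[location]

def materialize(token):
    """Generic handling: any actor can turn either token form into a value."""
    if isinstance(token, DataToken):
        return token.value
    if token.kind not in RESOLVERS:
        raise ValueError(f"no resolver for reference kind {token.kind!r}")
    return RESOLVERS[token.kind](token.location)

stream = [DataToken("4,5"), ReferenceToken("http-get", "http://example.org/a.csv")]
print([materialize(t) for t in stream])  # ['4,5', '1,2,3']
```

Supporting another kind, such as “ftp-get”, only means registering one more resolver; the actors that call `materialize` stay unchanged.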

CCA via special (“look the other way”) Director(s)?
CCA!?
Dataflow in CCA: a CCA “convention” can be used to accommodate actor-oriented/dataflow modeling
CCA/Message Passing in KEPLER: Kepler/Ptolemy can be extended to accommodate message-passing semantics (CSP is already in Ptolemy II)

Domains and Directors: Semantics for Component Interaction
CI – Push/pull component interaction
CSP – concurrent threads with rendezvous
CT – continuous-time modeling
DE – discrete-event systems
DDE – distributed discrete events
FSM – finite state machines
DT – discrete time (cycle driven)
Giotto – synchronous periodic
GR – 2-D and 3-D graphics
PN – process networks
SDF – synchronous dataflow
SR – synchronous/reactive
TM – timed multitasking
Source: Edward Lee et al.
For (coarse-grained) Scientific Workflows! For (finer-grained) concurrent jobs!?
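
For the SDF domain, a static schedule exists only when the actors' fixed per-firing production and consumption rates are consistent. Below is a minimal sketch of the standard balance-equation check (q[src] × produced = q[dst] × consumed on every edge), assuming a connected graph and Python 3.9+ for `math.lcm`. It illustrates the textbook technique and is not Ptolemy II's SDF scheduler code; the actor names and rates in the example are hypothetical.

```python
from fractions import Fraction
from math import lcm

def repetitions(actors, edges):
    """Solve the SDF balance equations q[src] * produced == q[dst] * consumed.

    edges: list of (src, produced_per_firing, dst, consumed_per_firing).
    Returns the smallest positive integer firing counts per schedule period,
    or raises ValueError if the rates are inconsistent.
    Assumes the graph is connected (every actor is reachable via some edge).
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:                      # propagate relative rates along edges
        changed = False
        for src, p, dst, c in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * p / c
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * c / p
                changed = True
    for src, p, dst, c in edges:        # every edge must balance
        if rate[src] * p != rate[dst] * c:
            raise ValueError(f"inconsistent rates on edge {src}->{dst}")
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}

# A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
print(repetitions(["A", "B", "C"], [("A", 2, "B", 3), ("B", 1, "C", 2)]))
# {'A': 3, 'B': 2, 'C': 1}
```

Firing A three times, B twice, and C once returns every channel to its initial token count, so that sequence can be repeated forever with bounded buffers.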

Polymorphic Actor Components Working Across Data Types and Domains
Actor Data Polymorphism:
  – Add numbers (int, float, double, Complex)
  – Add strings (concatenation)
  – Add complex types (arrays, records, matrices)
  – Add user-defined types
Actor Behavioral Polymorphism:
  – In dataflow, add when all connected inputs have data
  – In a time-triggered model, add when the clock ticks
  – In discrete-event, add when any connected input has data, and add in zero time
  – In process networks, execute an infinite loop in a thread that blocks when reading empty inputs
  – In CSP, execute an infinite loop that performs rendezvous on input or output
  – In push/pull, ports are push or pull (declared or inferred) and behave accordingly
  – In real-time CORBA, priorities are associated with ports and a dispatcher determines when to add
By not choosing among these when defining the component, we get a huge increment in component reusability. But how do we ensure that the component will work in all these circumstances?
Source: Edward Lee et al.
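
Both kinds of polymorphism can be sketched in Python, whose `+` already covers numbers and string concatenation. The elementwise rule for arrays (lists) and the fieldwise rule for records (dicts) are assumptions of this sketch, as are the function names. The two firing functions mirror the dataflow and discrete-event rules listed above, with `None` meaning “did not fire”.

```python
from collections import deque

def add_tokens(a, b):
    """Data-polymorphic addition: numbers and strings use '+', arrays (lists)
    add elementwise, records (dicts) add field by field."""
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            raise ValueError("array lengths differ")
        return [add_tokens(x, y) for x, y in zip(a, b)]
    if isinstance(a, dict) and isinstance(b, dict):
        if a.keys() != b.keys():
            raise ValueError("record fields differ")
        return {k: add_tokens(a[k], b[k]) for k in a}
    return a + b  # int, float, complex, str (concatenation), ...

def fire_dataflow(inputs):
    """Dataflow rule: fire only when every connected input has a token."""
    if not all(inputs):
        return None
    result = inputs[0].popleft()
    for q in inputs[1:]:
        result = add_tokens(result, q.popleft())
    return result

def fire_discrete_event(inputs):
    """Discrete-event rule: fire when any input has a token; add those present."""
    present = [q.popleft() for q in inputs if q]
    if not present:
        return None
    result = present[0]
    for tok in present[1:]:
        result = add_tokens(result, tok)
    return result

print(add_tokens([1, 2], [10, 20]))                     # [11, 22]
print(add_tokens({"x": 1, "y": "a"}, {"x": 2, "y": "b"}))  # {'x': 3, 'y': 'ab'}
print(fire_dataflow([deque([1]), deque()]))             # None
print(fire_discrete_event([deque([1]), deque()]))       # 1
```

The same `add_tokens` body serves every data type and both firing rules, which is the reuse the slide is after; it does not answer the slide's closing question of how to guarantee correctness in all settings.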

Directors and Combining Different Component Interaction Semantics
Source: Edward Lee et al.
Possible applications in SWFs: time-series aware … parameter-sweep aware … XY aware … execution models

A Few Specific Kepler Features and Example Workflows

Web Services → Actors (WS Harvester)
  – “Minute-made” (MM) WS-based application integration
  – Similarly: MM workflow design & sharing w/o implemented components

Recent Actor Additions

Digression: Who are the clients?
Domain scientists
  – C/Perl/Python/Java/WS/DB-enabled ones
  – others (the rest of us?)
Goal: make life better for both categories!
  – Workflow automation
  – Plumbing support
  – Execution monitoring, steering, runtime revision (pause-inspect-modify-resume cycle)

GEON Mineral Classification Workflow

… inside the Classifier
BrowserUI actor w/ SVG client display

GEON Dataset Generation & Registration (and co-development in KEPLER)
[Figure labels: Xiaowen (SDM); Edward et al. (Ptolemy); Yang (Ptolemy); Efrat (GEON); Ilkay (SDM); Matt et al. (SEEK); SQL database access (JDBC); % Makefile $> ant run]

GEON Data Registration UI

GEON Data Registration in KEPLER

Registered Resources show up in Vergil (joint SEEK, SPA, GEON, … Registry!?)

Data Analysis: Biodiversity Indices

Traffic info for a list of highways
Uses the iterate (higher-order “map”) actor to access a highway info web service repeatedly, sending out one per highway.
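
The iterate (higher-order “map”) actor amounts to applying one service call per list element. A minimal Python sketch; `iterate_actor`, `highway_info_service`, its canned answers, and the highway names are hypothetical stand-ins for the real highway-information web service.

```python
from typing import Callable, Iterable, List

def iterate_actor(service: Callable[[str], str], items: Iterable[str]) -> List[str]:
    """Higher-order 'map' actor: invoke the wrapped service once per input item
    and emit one result token per item, preserving input order."""
    return [service(item) for item in items]

# Hypothetical stand-in for the highway-information web service.
FAKE_TRAFFIC = {"I-5": "moderate", "I-8": "light", "SR-52": "heavy"}

def highway_info_service(highway: str) -> str:
    return FAKE_TRAFFIC.get(highway, "unknown")

print(iterate_actor(highway_info_service, ["I-5", "SR-52", "I-99"]))
# ['moderate', 'heavy', 'unknown']
```

The same shape fits the `map(GenbankWS)` example, with accession numbers as the list items. A parallel variant could replace the list comprehension with `concurrent.futures.ThreadPoolExecutor.map`, which also returns results in input order.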

Re-engineered PIW w/ Iteration Constructs
AD 2004
map(GenbankWS)
Input: {“NM_001924”, “NM020375”}
Output: {“CAGT…AATATGAC”, “GGGGA…CAAAGA”}

Streaming Real-time Data
Straightforward Example: Seismic Waveforms
[Figure labels: Laser Strainmeter Channels in; Scientific Workflow; Earth-tide signal out]

ORB

Job Management (here: NIMROD)
  – Job management infrastructure in place
  – Results database: under development
  – Goal: 1000’s of GAMESS jobs (quantum mechanics), Fall/Winter ’04

KEPLER Today
Support for SWF life cycle
  – Design, share, prototype, run, monitor, deploy, …
Coarse-grained scientific workflows, e.g.,
  – web service actors, grid actors, command-line actors, …
Fine-grained workflows and simulations, e.g.,
  – Database access, XSLT transformations, …
Kepler Extensions
  – SDM Center/SPA: support for data- and compute-intensive workflows!
  – real-time data streaming (ROADNet)
  – other special and generic extensions (e.g. GEON, SEEK)
Status
  – first release (alpha) was in May 2004
  – nightly builds w/ version tests
  – “Link-Up Sister Project” w/ other SWF systems (UK Taverna, Triana, …)
  – Participation in various workshops and conferences (GGF10, SSDBMs, eScience WF workshop, …)

KEPLER Tomorrow
Application-driven extensions:
  – access to/integration with other IDMAF components: SciRUN?, PnetCDF?, PVFS(2)?, MPI-IO?, parallel-R?, ASPECT?, FastBit, …
  – support for execution of new SWF domains:
      Astrophysics: TSI/Blondin (SPA/NCSU)
      Nuclear Physics: Swesty (SPA/LLNL)
      …
Generic extensions:
  – additional support for data-intensive and compute-intensive workflows (all SRB Scommands, CCA support, …)
  – (C-z; bg; fg)-ing (“detach” and reconnect)
  – workflow deployment models
Additional “domain awareness” (e.g. via new directors)
  – time series, parameter sweeps, job scheduling, …
  – hybrid type system with semantic types
Consolidation
  – More installers, regular releases, improved documentation, …

KEPLER & SPA
First alpha releases since May 2004

Breaking into the Parallel (MPI) and Stream Computing Worlds!?
Clean functional semantics facilitates algebraic workflow (program) transformations (Bird-Meertens), e.g. $\mathit{mapS}\ f \circ \mathit{mapS}\ g \equiv \mathit{mapS}\ (f \circ g)$
Source: Real-Time Signal Processing: Dataflow, Visual, and Functional Programming, Hideki John Reekie, University of Technology, Sydney
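
The fusion law can be checked on a finite stream. In this Python sketch, `map_s` and `compose` are hypothetical helpers and `f` and `g` are arbitrary example stages. Two chained map stages and one fused stage produce the same tokens, which is what licenses merging two actors into one.

```python
def map_s(f):
    """Lift a per-token function f to a stream transformer (mapS on the slide)."""
    return lambda stream: map(f, stream)

def compose(f, g):
    """Function composition: compose(f, g)(x) == f(g(x))."""
    return lambda x: f(g(x))

def f(x):
    return x + 1   # hypothetical downstream stage

def g(x):
    return x * 3   # hypothetical upstream stage

data = range(5)
two_actors = list(map_s(f)(map_s(g)(data)))    # mapS f . mapS g: two chained stages
one_actor = list(map_s(compose(f, g))(data))   # mapS (f . g): one fused stage
print(two_actors, one_actor)  # [1, 4, 7, 10, 13] [1, 4, 7, 10, 13]
```

The law holds because each stage is a pure per-token function; a stage with side effects or internal state would not fuse this simply.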

Hybrid Types (Structure + Semantics)
Services can be semantically compatible but structurally incompatible.
[Figure: a desired connection from a Source Service (output port P_s) to a Target Service (input port P_t). Semantic types, defined via ontologies (OWL): Semantic Type P_s ⊑ Semantic Type P_t (compatible). Structural types: Structural Type P_s ⋠ Structural Type P_t (incompatible).]
Source: [Bowers-Ludaescher, DILS’04]
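
The two checks can be sketched separately. In this hypothetical Python example, a tiny parent-pointer ontology stands in for an OWL ontology, semantic compatibility is ancestor lookup (⊑), and structural compatibility is a simple record-subtyping test. The concept names, fields, and types are invented for illustration.

```python
# Hypothetical mini-ontology: concept -> parent concept (single inheritance).
ONTOLOGY = {
    "SpeciesAbundance": "Measurement",
    "Biomass": "Measurement",
    "Measurement": None,
}

def subsumed_by(concept, ancestor):
    """Semantic check: concept ⊑ ancestor in the ontology."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY[concept]
    return False

def structurally_compatible(source, target):
    """Structural check (record subtyping): the source must supply every
    field the target requires, with the same type."""
    return all(source.get(field) == typ for field, typ in target.items())

# Source port: semantically SpeciesAbundance, structurally a record with 'count'.
src_semantic, src_struct = "SpeciesAbundance", {"site": str, "count": int}
# Target port: expects a Measurement record with a 'value' field.
tgt_semantic, tgt_struct = "Measurement", {"site": str, "value": float}

print(subsumed_by(src_semantic, tgt_semantic))          # True  (semantically compatible)
print(structurally_compatible(src_struct, tgt_struct))  # False (structurally incompatible)
```

The result reproduces the slide's situation: the connection is semantically sound but would need a structural transformation (here, something like mapping `count` to a float `value`), which this sketch does not derive.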