NeSCR Dec-3-2003 Bertram Ludaescher: Scientific Workflows Based on Dataflow Process Networks (or from Ptolemy to Kepler) (or Workflow Considered Harmful …)


Scientific Workflows Based on Dataflow Process Networks (or from Ptolemy to Kepler) (or Workflow Considered Harmful …)
Bertram Ludäscher, San Diego Supercomputer Center

Overview
1. Scientific Workflow (SWF) Examples
2. SWF Requirements & Characteristics
3. Workflow standards considered harmful for SWF!?
4. Dataflow Process Networks (Ptolemy II)
5. Scientific Workflows (Kepler = Ptolemy II + X)

Acknowledgements I
NSF, NIH, DOE
GEOsciences Network (NSF)
Biomedical Informatics Research Network (NIH)
Science Environment for Ecological Knowledge (NSF)
  – seek.ecoinformatics.org
Scientific Data Management Center (DOE)
  – sdm.lbl.gov/sdmcenter/

Acknowledgements II
Ilkay Altintas, SDM
Chad Berkley, SEEK
Shawn Bowers, SEEK
Jeffrey Grethe, BIRN
Christopher H. Brooks, Ptolemy II
Zhengang Cheng, SDM
Efrat Jaeger, GEON
Matt Jones, SEEK
Edward A. Lee, Ptolemy II
Kai Lin, GEON
Bertram Ludaescher, BIRN, GEON, SDM, SEEK
Stephen Neuendorffer, Ptolemy II
Mladen Vouk, SDM
Yang Zhao, Ptolemy II
…
Coming soon!?:
  – ROADNet, myGrid, GriPhyN, ...

Promoter Identification Workflow (PIW)
Source: Matt Coleman (LLNL)

Promoter Identification Workflow in Ptolemy-II (SSDBM03)
Execution Semantics

GARP Invasive Species Pipeline
[Figure: pipeline diagram. Steps: EcoGrid Query, Layer Integration, Sample Data, Data Calculation, Map Generation, Validation, User Validation, Generate Metadata, Archive to EcoGrid. Data items, for both the native range and the invasion area: (a) species presence & absence points, (b) environmental layers, (c) integrated layers, (d) training sample and test sample, (e) GARP rule set, (f) prediction maps, (g) model quality parameter, (h) selected prediction maps. Inputs come from registered EcoGrid databases.]
Source: NSF SEEK (Deana Pennington et al., UNM)

Rock & Mineral Classification Workflow

A Look Inside Classification
[Figure callouts:]
  – Diagrams information and transitions between them.
  – Extracted from the mineral composition and this level's diagram coordinates.
  – SVG to polygons.
  – Classifier: locates the point's region.
  – Finer granularity.
  – Displays the point in the diagram for this level.

Source: NIH BIRN (Jeffrey Grethe, UCSD)

SWF Requirements & Characteristics
Scientist-friendly "problem solving environment"
  – WF design
  – WF execution
  – WF steering and UI: pause; revise; resume; rollback (cf. SCIRun)
  – repositories of reusable components
  – data and WF provenance (virtual data concept): logging, cache reuse/partial re-derive, reports, …
  – Conceptual modeling support: complex data (semantics) support; wiring support (cf. web service composition); planning support

SWF Requirements & Characteristics
"Modeling" support
  – Abstraction, hierarchical modeling
  – Models of Computation (MoC)
  – component interaction; combination of MoCs (cf. CCA)
  – WF multi-grain/granola: powder to boulders (and back): Boolean (N)AND, (N)OR, … vs. chaining together Grid apps
  – Rich data structures and type systems
End user "programming" support
  – high-level programming constructs, e.g. map/3 for iteration, filter, select, branch, merge, ...
  – data transformations
  – legacy tool integration (plug-ins)
  – data streaming: how to tame it (e.g., starve a dataflow, then resume)? The Zauberlehrling's (sorcerer's apprentice) problem

SWF Requirements & Characteristics
Grid-enabling SWFs
  – transparent use of (remote) resources
  – big data
  – big computation requirements
  – early/late binding of logical to physical resources, …
  – planning, scheduling, … (cf. Chimera, Pegasus, DAGman, Condor(-G))

Scientific Workflows: Some Findings
More dataflow than (business) workflow
  – but some branching, looping, merging, …
  – not: documents/objects undergoing modifications
  – instead often: dataset-out = analysis(dataset-in)
Need for programming extensions
  – Iterations over lists (foreach); filtering; functional composition; generic & higher-order operations (zip, map(f), …)
Need for abstraction and nested workflows
Need for data transformations (compute/transform alternations)
Need for rich user interaction & workflow steering:
  – pause / revise / resume
  – select & branch; e.g., web browser capability at specific steps as part of a coordinated SWF
Need for high-throughput transfers (grid-enabling, streaming)
Need for persistence of intermediate products
  – data provenance (virtual data concept)

Scientific WF vs Business WF
Scientific Workflows
  – Dataflow and data transformations
  – Data problems: volume, complexity, heterogeneity
  – Grid aspects: distributed computation, distributed data
  – User interactions/WF steering
  – Data, tool, and analysis integration
  Dataflow and control-flow are married!
Business Workflows
  – Process composition
  – Tasks, documents, etc. undergo modifications (e.g., flight reservation from reserved to ticketed), but modified WF objects still identifiable throughout
  – Complex control flow, task-oriented: travel reservations; credit approval
  Dataflow and control-flow are divorced!

A ZOO of Workflow Standards and Systems
Source: W.M.P. van der Aalst et al.

Business Workflows
  – show their office automation ancestry
  – documents and work-tasks are passed
  – no data streaming, no data-intensive pipelines
  – lots of standards to choose from: WfMC, WSFL, BPML, BPEL4WS, …, XPDL, …
  – but often no clear execution semantics for constructs as simple as this: [figure]
Source: Expressiveness and Suitability of Languages for Control Flow Modelling in Workflows, PhD thesis, Bartosz Kiepuszewski, 2002

On Workflow Standards…

Workflow Standards Debunked
Source: "Don't go with the flow: Web services composition standards exposed", W.M.P. van der Aalst, Trends & Controversies ("Web Services - Been there done that?"), Jan/Feb 2003 issue of IEEE Intelligent Systems


But never mind the standards discussion: Many Scientific Workflows are Dataflows! (Check YOUR examples …)

Commercial Workflow/Dataflow Systems

SCIRun: Component-Based Problem Solving Environments for Large-Scale Scientific Computing
SCIRun: problem solving environment for interactive construction, debugging, and steering of large-scale scientific computations
Component model, based on generalized dataflow programming
Contact: Steve Parker (cs.utah.edu); SciDAC/SDM collaboration

Workflow and distributed computation grid created with Kensington Discovery Edition from InforSense.

Dataflow Process Networks: Putting Computation Models first!
Synchronous Dataflow Network (SDF)
  – Statically schedulable single-threaded dataflow (can execute multi-threaded, but the firing sequence is known in advance)
  – Maximally well-behaved, but also limited expressiveness
Process Network (PN)
  – Multi-threaded dynamically scheduled dataflow
  – More expressive than SDF (dynamic token rate prevents static scheduling)
  – Natural streaming model
Other Execution Models (Domains)
  – Implemented through different Directors
[Figure labels: actor; typed i/o ports; FIFO; advanced push/pull]
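To make "statically schedulable" concrete, here is a minimal sketch for the smallest possible SDF graph: one edge from actor A to actor B with fixed token rates. The function names (`repetitions`, `schedule`) are illustrative and not part of Ptolemy II. The firing counts come from the balance equation produce * rA = consume * rB, which can be solved before execution because the rates never change.

```haskell
-- Repetition counts (rA, rB) for a single SDF edge A -> B.
-- produce: tokens A emits per firing; consume: tokens B reads per firing.
-- Both rates are assumed to be positive.
repetitions :: Int -> Int -> (Int, Int)
repetitions produce consume = (consume `div` g, produce `div` g)
  where
    g = gcd produce consume

-- One simple static schedule: fire A rA times, then B rB times.
-- After one pass the FIFO between A and B is empty again.
schedule :: Int -> Int -> String
schedule produce consume = replicate rA 'A' ++ replicate rB 'B'
  where
    (rA, rB) = repetitions produce consume
```

In GHCi, `repetitions 2 3` gives the firing counts for an actor that emits 2 tokens per firing feeding one that reads 3. A PN director cannot do this precomputation, because token rates may vary at run time.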

Dataflow Process Networks and Ptolemy-II
see! try! read!
Source: Edward Lee et al.

Why Ptolemy-II?
PTII Objective:
  – The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation.
Data & Process oriented:
  – Dataflow process networks
Natural Data Streaming Support
End user WF console (Vergil GUI)
PRAGMATICS
  – mature, actively maintained, well-documented
  – open source system
  – leverage sister projects' activities (e.g. SEEK, SDM, BIRN, …)

Source: Edward Lee et al.

Marrying & Divorcing Control- & Dataflow
Source: Edward Lee et al.

Another Goodie: Ptolemy-II Type System

Support for Multiple Workflow Granularities
Abstraction: Sand to Rocks
[Figure labels: Boulders; Sand; Powder; Plumbing]

Scientific Workflows = Dataflow Process Networks + X
X = …
  – Database plug-ins
  – Legacy application plug-ins (via command line, as web services, …)
  – Grid extensions:
      Actors as web/grid services
      3rd-party data transfer, high-throughput data streaming
      Dealing with thousands of files (cf. astrophysics, astronomy, HEP, … examples)
      Data and service repositories, discovery
      Extended type system (structural & semantic extensions)
  – Programming extensions (declarative/FP)
  – Rich user interactions/workflow steering
  – Rich data transformations (compute/transform alternations)
  – Data provenance: (semi-)automatic meta-data creation
Kepler = Ptolemy-II + X

Status update / specific tasks for Kepler
Legend: $ DONE, % ONGOING, * NEW
User interaction, workflow steering
  – $ Pause/revise/resume
  – $ BrowserUI actor (browser as a 0-learning display and selection tool)
Distributed execution
  – $ Dynamically port-specializing WSDL actor
  – * Dynamically specializing Grid service actor
Port & actor type extensions (SEEK leverage)
  – * Structural types (XML Schema)
  – * Semantic types (OWL) incl. unit types w/ automatic conversion
Programming extensions
  – % Data transformation actors (XSLT, XQuery, Python, Perl, …)
  – * map, zip, zipWith, …, loop, switch patterns
Specialized Data Sources
  – $ EML (SEEK)
  – % MS Access (GEON), * JDBC
  – * XML, * NetCDF, …

Some specific tasks for Kepler (all NEW)
Design & develop transparent, Grid-enabled PNs:
  – Communication protocol details
  – Grid-actor extensions and/or
  – Grid-Process Network director (G-PN)
  – Host/source location becomes an actor parameter: add active-inline parameter display for grid-actors, channels, source-actors
Activity Monitoring
  – Add activity status display (green, yellow, red) to replace PtII animation (needed for concurrently executing PN!)
Registration & Deployment mechanisms
  – Actor/Data/Workflow repository (= composite actors)
  – Shows up as (configurable) actor library
  – OGSA Service Registry approach? (SEEK leverage; UDDI complex & limited, says MattJ)
Extensions to deal with failures (fault tolerance)

Example: Database actors for Ptolemy II (Kepler-GEON; Efrat Jaeger)

Database Actors
Database Connection actor:
Database Query actor:

Database Actors Example

Example: Web service-enabling Ptolemy II (Kepler-SDM; Ilkay Altintas)

A Generic Web Service Actor
[Figure callouts: Configure – select WSDL URL from repository; Configure – select service operation]

Set Parameters and Commit Specialized Actor
[Figure callout: Set parameters and commit]

Web Service Actor after Instantiation

Composing Third-Party Web Services
[Figure callouts: Output of previous web service; User interaction & Transformations; Input of next web service]

Results of the Execution
User I/O via standard browser!
Run Window / WF Deployment

Composing Legacy Applications (here: Phylogeny): Shell / Command-Line Actors

Example: Grid-enabling Ptolemy II
(Kepler-SEEK, Chad Berkley; Kepler-SDM, Ilkay Altintas; … myGrid?, … GriPhyN?, … OGS{I|A}-[DAI] ...)

Transparently Grid-Enabling PTII: Handles
[Figure labels: A, B, GA, GB; PTII space; Grid space]
1. A → GA: get_handle
2. GA → A: return &X
3. A → B: send &X
4. B → GB: request &X
5. GB → GA: request &X
6. GA → GB: send *X
7. GB → B: send done(&X)
Example: &X = GA.17, *X = …
Logical token transfer (3) requires get_handle (1, 2); then exec_handle (4, 5, 6, 7) for completion.
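The seven-step exchange can be modelled as a toy simulation. This is a sketch under stated assumptions, not Kepler code: each grid node is an in-memory association list, and `getHandle`, `execHandle`, and `transfer` are hypothetical names chosen to mirror the slide's get_handle and exec_handle phases.

```haskell
type Handle = String               -- e.g. "GA.17", as in the slide's example
type Store  = [(Handle, String)]   -- data held by one grid node, keyed by handle

-- Steps 1-2: A asks GA for a handle; GA answers only if it holds the data.
getHandle :: Store -> Handle -> Maybe Handle
getHandle ga h = h <$ lookup h ga

-- Steps 4-7: B's node GB requests the data behind the handle from GA and
-- keeps a copy, so the bulk data never passes through actors A or B.
execHandle :: Store -> Store -> Handle -> Maybe Store
execHandle ga gb h = do
  x <- lookup h ga
  pure ((h, x) : gb)

-- Step 3 is the only actor-to-actor message: the handle itself.
-- The result is GB's store after the transfer, or Nothing on failure.
transfer :: Store -> Store -> Handle -> Maybe Store
transfer ga gb h = getHandle ga h >>= execHandle ga gb
```

The point of the design is visible in the types: actors exchange only a `Handle`, while the payload moves directly between the two `Store` values.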

Transparently Grid-Enabling PTII
Different phases
  – Register designed WF (could include external validation service)
  – Find suitable grid service hosts for actors
  – Pre-stage execution
  – Execute (w/ provenance): interactively steer (pause; revise; resume); batch process; re-run parts later
  – Register/store data products and execution logs
Kepler implementation choices:
  – Grid-actors (no change of Director necessary!?) and/or
  – Grid-(PN)-director (also need to change actors!?)
  – Add grid service host id as actor parameter:
  – Similar for data:

C-z ; bg & – Detach your WF execution!
Currently in PTII
  – tight coupling of WF execution and PTII Java client (also Vergil GUI)
To-do for Kepler:
  – detaching WF console (Vergil) from a Grid-aware execution engine: Grid-PN Director!
[Figure labels: Transport protocol parameter; Data location parameter; Host location parameter]

Semantic Type-enabling Ptolemy II (OWL – here we go… ;-) (Kepler-SEEK; Shawn Bowers)

Semantic Type Extensions
Take concepts and relationships from an ontology to semantically type the data-in/out ports
Application: e.g., design support:
  – smart/semi-automatic wiring, generation of massaging actors
[Figure: actor m1 (normalize) with ports p3 and p4. Takes Abundance Count Measurements for Life Stages; returns Mortality Rate Derived Measurements for Life Stages.]


Semantic Types
The semantic type signature
  – Type expressions over the (OWL) ontology
[Figure: actor m1 (normalize) with ports p3 and p4]
SemType m1 :: Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty
           -> DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty

Extended Type System (here: OWL Semantic Types)
SemType m1 :: Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty
           -> DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty
Substructure association: XML raw-data =(X)Query=> object model =link=> OWL ontology
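One way semantic types could support semi-automatic wiring is a compatibility check between an output port and an input port. The sketch below is a heavy simplification and not the SEEK implementation: a semantic type is a plain list of conjuncts, property restrictions such as itemMeasured.MortalityRate are treated as opaque names, and the single subclass edge in `subClassOf` is a hypothetical stand-in for what a real system would read from the OWL ontology.

```haskell
type Concept = String
type SemType = [Concept]   -- a conjunction: C1 & C2 & ...

-- Hypothetical (child, parent) subclass edges; assumed to be acyclic.
subClassOf :: [(Concept, Concept)]
subClassOf = [("DerivedObservation", "Observation")]

-- True if c equals d or is a (transitive) subclass of d.
isA :: Concept -> Concept -> Bool
isA c d = c == d || any (\(child, parent) -> child == c && isA parent d) subClassOf

-- An output can feed an input if every conjunct the input requires
-- is matched by some conjunct the output offers.
canWire :: SemType -> SemType -> Bool
canWire out inp = all (\need -> any (`isA` need) out) inp

-- The two sides of the m1 signature from the slide.
m1In, m1Out :: SemType
m1In  = ["Observation", "itemMeasured.AbundanceCount", "hasContext.appliesTo.LifeStageProperty"]
m1Out = ["DerivedObservation", "itemMeasured.MortalityRate", "hasContext.appliesTo.LifeStageProperty"]
```

Under these assumptions, m1's output could be wired to a port that only asks for an Observation about a LifeStageProperty, but not back into m1's own input, which requires an abundance count.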

Programming Extensions (some lessons from SciDAC/SSDBM demo)

Promoter Identification Workflow in Ptolemy-II (SSDBM03)
[Figure callouts:]
  – hand-crafted control solution; also: forces sequential execution!
  – designed to fit hand-crafted Web-service actor
  – Complex backward control-flow
  – No data transformations available

Promoter Identification Workflow in FP
genBankG       :: GeneId -> GeneSeq
genBankP       :: PromoterId -> PromoterSeq
blast          :: GeneSeq -> [PromoterId]
promoterRegion :: PromoterSeq -> PromoterRegion
transfac       :: PromoterRegion -> [TFBS]
gpr2str        :: (PromoterId, PromoterRegion) -> String
d0 = Gid "7"                  -- start with some gene-id
d1 = genBankG d0              -- get its gene sequence from GenBank
d2 = blast d1                 -- BLAST to get a list of potential promoters
d3 = map genBankP d2          -- get list of promoter sequences
d4 = map promoterRegion d3    -- compute list of promoter regions and ...
d5 = map transfac d4          -- ... get transcription factor binding sites
d6 = zip d2 d4                -- create list of pairs promoter-id/region
d7 = map gpr2str d6           -- pretty print into a list of strings
d8 = concat d7                -- concat into a single "file"
d9 = putStr d8                -- output that file

Cleaned up Process Network PIW
Back to purely functional dataflow process network (= also a data streaming model!)
Re-introducing map(f) to Ptolemy-II (was there in PT Classic)
  – no control-flow spaghetti
  – data-intensive apps
  – free concurrent execution
  – free type checking
  – automatic support to go from piw(GeneId) to PIW := map(piw) over [GeneId]
[Figure callouts: map(f)-style iterators; powerful type checking; generic, declarative programming constructs; generic data transformation actors; forward-only, abstractable sub-workflow piw(GeneId)]
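The step from piw(GeneId) to PIW := map(piw) over [GeneId] can be sketched as follows. The service functions here are stubs that only build placeholder strings; they are assumptions standing in for the real GenBank and BLAST calls, and the type synonyms are likewise illustrative.

```haskell
newtype GeneId = Gid String deriving (Show, Eq)

type GeneSeq        = String
type PromoterId     = String
type PromoterSeq    = String
type PromoterRegion = String

-- Stubs (hypothetical): stand-ins for the remote services.
genBankG :: GeneId -> GeneSeq
genBankG (Gid g) = "seq-of-" ++ g

blast :: GeneSeq -> [PromoterId]
blast s = [s ++ "/p1", s ++ "/p2"]

genBankP :: PromoterId -> PromoterSeq
genBankP p = "pseq(" ++ p ++ ")"

promoterRegion :: PromoterSeq -> PromoterRegion
promoterRegion s = "region(" ++ s ++ ")"

gpr2str :: (PromoterId, PromoterRegion) -> String
gpr2str (p, r) = p ++ " -> " ++ r ++ "\n"

-- Forward-only sub-workflow for a single gene: no backward control flow.
piw :: GeneId -> String
piw g = concatMap gpr2str (zip promoters regions)
  where
    promoters = blast (genBankG g)
    regions   = map (promoterRegion . genBankP) promoters

-- Lifting the sub-workflow to a list of genes is a single map.
piwAll :: [GeneId] -> [String]
piwAll = map piw
```

Because `piw` has no side effects, each element of the input list can in principle be processed independently, which is where the slide's "free concurrent execution" comes from.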

Optimization by Declarative Rewriting I
PIW as a declarative, referentially transparent functional process
  – optimization via functional rewriting possible, e.g. map(f o g) = map(f) o map(g)
Details:
  – Technical report & PIW specification in Haskell
[Figure callouts: map(f o g) instead of map(f) o map(g); combination of map and zip]
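The rewrite rule map(f o g) = map(f) o map(g) can be checked on a small example. The two stages `scale` and `shift` below are hypothetical stand-ins for workflow actors; the names `twoPass` and `onePass` are illustrative only.

```haskell
-- Two passes over the list: an intermediate list is built between stages.
twoPass :: (b -> c) -> (a -> b) -> [a] -> [c]
twoPass f g = map f . map g

-- One fused pass: each element goes through both stages at once.
onePass :: (b -> c) -> (a -> b) -> [a] -> [c]
onePass f g = map (f . g)

-- Hypothetical stages, standing in for two workflow actors.
scale :: Int -> Int
scale = (* 2)

shift :: Int -> Int
shift = (+ 1)

-- Spot check that both forms agree on a sample input.
agree :: Bool
agree = twoPass scale shift [1, 2, 3] == onePass scale shift [1, 2, 3]
```

The equality holds for any pure `f` and `g`, which is why a workflow engine may apply the rewrite without changing results; with side-effecting actors that guarantee would be lost.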

Optimizing II: Streams & Pipelines
Clean functional semantics facilitates algebraic workflow (program) transformations (Bird-Meertens); e.g. mapS f o mapS g => mapS (f o g)
Source: Real-Time Signal Processing: Dataflow, Visual, and Functional Programming, Hideki John Reekie, University of Technology, Sydney

Summary
Many (most of ours anyways) scientific workflows are dataflows
  – lots of workflow standards (messy and not focused on SWF problems)
  – should we start a new wave of dataflow standards??
Importance of clear semantics for
  – different MoCs (models of computation: PN, SDF, DE, CT, …)
  – component composition across MoCs
  – component interaction – Ptolemy II directors
Kepler:
  – Based on extensible Ptolemy II system
  – Cross-project activity (SEEK, SDM, Ptolemy II, GEON, BIRN, and counting)
  – Plug-in / interface with your SWF planner, execution engine, grid-WF tool!

Your Projects & Icons

A Note on the Style of these Slides
Due to lack of time, most of the following slides are by reference only ;-)
  – …Each speaker was given four minutes to present his paper, as there were so many scheduled from 64 different countries. To help expedite the proceedings, all reports had to be distributed and studied beforehand, while the lecturer would speak only in numerals, calling attention in this fashion to the salient paragraphs of his work.... Stan Hazelton of the U.S. delegation immediately threw the hall into a flurry by emphatically repeating: 4, 6, 11, and therefore 22; 5, 9, hence 22; 3, 7, 2, 11, from which it followed that 22 and only 22!! Someone jumped up, saying yes but 5, and what about 6, 18, or 4 for that matter; Hazelton countered this objection with the crushing retort that, either way, 22. I turned to the number key in his paper and discovered that 22 meant the end of the world…
[The Futurological Congress, Stanislaw Lem, translated from the Polish by Michael Kandel, Futura 1977]

F I N: Words to/from the Wise
FYI: Flow-based programming has been re-discovered/re-invented several times by different communities. Here is an IBM practitioner's view:
  – Flow-based Programming, …
In "Flow-Based Programming" (FBP), applications are defined as networks of "black box" processes, which exchange data across predefined connections. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. It is thus naturally component-oriented. To describe this capability, the distinguished IBM engineer, Nate Edwards, coined the term "configurable modularity", which he calls the basis of all true engineered systems.
When using FBP, the application developer works with flows of data, being processed asynchronously, rather than the conventional single hierarchy of sequential, procedural code. It is thus a good fit with multiprocessor computers, and also with modern embedded software. In many ways, an FBP application resembles more closely a real-life factory, where items travel from station to station, undergoing various transformations. Think of a soft drink bottling factory, where bottles are filled at one station, capped at the next and labelled at yet another one. FBP is therefore highly visual: it is quite hard to work with an FBP application without having the picture laid out on one's desk, or up on a screen! For an example, see Sample DrawFlow Diagram.
Strangely though, in spite of being at the leading edge of application development, it is also simple enough that trainee programmers can pick it up, and it is a much better match with the primitives of data processing than the conventional primitives of procedural languages.
The key, of course (and perhaps the reason why it hasn't caught on more widely), is that it involves a significant paradigm shift that changes the way you look at programming, and once you have made this transition, you find you can never go back! FBP seems to dovetail neatly with a concept that I call "smart data". There is a section on this in stuff about the author. A new web page on this topic has just been uploaded - see "Smart Data" and Business Data Types - and we will be publishing more as it develops. …