
1 The Kepler Project: Overview, Status, and Future Directions
Matthew B. Jones, on behalf of the Kepler Project team
National Center for Ecological Analysis and Synthesis, University of California, Santa Barbara

2 SWDB, Aug 29, 2004
The Kepler Project
Goals
- Produce an open-source scientific workflow system
  - enable scientists to design scientific workflows and execute them
- Support scientists in a variety of disciplines
  - e.g., biology, ecology, astronomy
- Important features
  - access to scientific data
  - flexible means for executing complex analyses
  - enable use of Grid-based approaches to distributed computation
  - semantic models of scientific tasks
  - effective UI for workflow design

3 Kepler Collaboration
- Open-source
- Builds on Ptolemy II
- Collaborators
  - SEEK Project
  - SciDAC SDM Center
  - Ptolemy Project
  - GEON Project
  - ROADNet Project
  - Resurgence Project
- Goals
  - Create powerful analytical tools that are useful across disciplines
  - Ecology, Biology, Engineering, Geology, Physics, Chemistry, Astronomy, …

4 Usage statistics
- Source code access
  - 154 people accessed source code
  - 30 members have write permission
- Projects using Kepler:
  - SEEK (ecology)
  - SciDAC (molecular bio, ...)
  - CPES (plasma simulation)
  - GEON (geosciences)
  - CiPRes (phylogenetics)
  - CalIT2
  - ROADnet (real-time data)
  - LOOKING (oceanography)
  - CAMERA (metagenomics)
  - Resurgence (computational chemistry)
  - NORIA (ocean observing CI)
  - NEON (ecology observing CI)
  - ChIP-chip (genomics)
  - COMET (environmental science)
  - Cheshire Digital Library (archival)
  - Digital preservation (DIGARCH)
  - Cell Biology (Scripps)
  - DART (X-ray crystallography)
  - Ocean Life
  - Assembling the Tree of Life project
  - Processing Phylodata (pPOD)
  - FermiLab (particle physics)
- Kepler downloads
  - Total = 9204
  - Beta = 6675
  - [Chart legend: red = Windows, blue = Macintosh]

5 Kepler advances
- Data and Actor search
- EarthGrid data access system
- Kepler Component Library
- Kepler Archive (KAR) format
- Integrated support for LSID identifiers for all objects
- Object Manager and cache
- Web service execution
- RExpression & MatlabExpression actors
- Redesigned user interface
- Authentication subsystem
- Null-value handling

6 More advances
- Documentation
- Collection-oriented workflows (COMAD)
- Domain-specific actors for case studies
  - e.g., GARP, phylogenetics actors
- Provenance system
- Grid computing support
  - NIMROD, Globus, ssh, ...
- Semantics support
  - annotation, search, workflow validation, integration

7 Distributed execution
- Opportunities for parallel execution
  - Fine-grained parallelism
  - Coarse-grained parallelism
  - Few or no cycles
  - Limited dependencies among components
  - 'Trivially parallel'
  - Many science problems fit this mold: parameter sweep, iteration of stochastic models
- Current 'plumbing' approaches to distributed execution
  - workflow acts as a controller
  - stages data resources
  - writes job description files
  - controls execution of jobs on nodes
  - requires expert understanding of the Grid system
- Scientists need to focus on just the computations
  - try to avoid plumbing as much as possible
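The 'trivially parallel' case on this slide can be illustrated with a small parameter sweep. This is only a sketch and not Kepler code: run_model is a hypothetical toy stochastic model, and a local thread pool stands in for the remote nodes a real distributed workflow would use. The point is that each run depends only on its own parameter, so the runs can be dispatched independently.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def run_model(growth_rate, seed=0, steps=10):
    """Hypothetical stochastic model: noisy logistic growth from a fixed start."""
    rng = random.Random(seed)  # fixed seed keeps each run reproducible
    population = 0.1
    for _ in range(steps):
        noise = rng.uniform(-0.01, 0.01)
        population += growth_rate * population * (1.0 - population) + noise
        population = min(max(population, 0.0), 1.0)  # clamp to [0, 1]
    return population


def parameter_sweep(rates, max_workers=4):
    """Run one independent model per parameter value; no run depends on another."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_model, rates))  # map preserves input order
    return dict(zip(rates, results))


sweep = parameter_sweep([0.1, 0.5, 1.0, 1.5])
for rate, final_population in sweep.items():
    print(f"growth_rate={rate}: final population {final_population:.3f}")
```

Because the runs share no state, swapping the thread pool for a process pool or a remote job scheduler would not change the calling code, which is the kind of separation between computation and plumbing the slide argues for.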

8 Distributed Kepler
- Higher-order component for executing a model on one or more remote nodes
- Master and slave controllers handle setup and communication among nodes, and establish data channels
- Extremely easy for scientist to utilize
  - requires no knowledge of grid computing systems
[Figure labels: IN, OUT, Master, Slave, Controller]

9 Data Management
- Need for integrated management of external data
  - EarthGrid access is partial and needs refactoring
  - Include other data sources, such as JDBC, OpeNDAP, etc.
  - Data needs to be a first-class object in Kepler, not just represented as an actor
  - Need support for data versioning to support provenance
- Need to pass data by reference
  - workflows contain large data tokens (hundreds of megabytes)
  - intelligent handling of unique identifiers (e.g., LSID)
[Figure labels: actors A and B; Token {1,5,2}; Token ref-276 {1,5,2}]
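The pass-by-reference idea on this slide can be sketched as follows. This is an illustration under stated assumptions, not Kepler's token API: DataStore, RefToken, actor_a, and actor_b are hypothetical names, and an in-memory dictionary stands in for an external data store or cache. Only the small reference travels between actors; the data is fetched where it is needed.

```python
from dataclasses import dataclass


class DataStore:
    """Hypothetical in-memory stand-in for an external data store or cache."""

    def __init__(self):
        self._objects = {}
        self._next_id = 0

    def put(self, data):
        """Store the data once and return a short identifier for it."""
        self._next_id += 1
        ref_id = f"ref-{self._next_id}"
        self._objects[ref_id] = data
        return ref_id

    def get(self, ref_id):
        """Resolve an identifier back to the stored data."""
        return self._objects[ref_id]


@dataclass(frozen=True)
class RefToken:
    """Token that carries only an identifier, not the data itself."""
    ref_id: str


def actor_a(store, data):
    """Upstream actor: registers its output and emits a reference token."""
    return RefToken(store.put(data))


def actor_b(store, token):
    """Downstream actor: dereferences the token only when it needs the data."""
    return sum(store.get(token.ref_id))


store = DataStore()
token = actor_a(store, [1, 5, 2])
total = actor_b(store, token)
print(token, total)
```

In a real system the identifier could be a globally unique one such as an LSID, which would also give provenance records a stable name for each data version.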

10 New projects: REAP
Management and Analysis of Environmental Observatory Data using the Kepler Scientific Workflow System
- Extend Kepler to:
  - Manage and monitor sensor networks
  - Consume data from sensors
  - Integrate sensor data handling with data archive handling
- Terrestrial ecology and oceanography use cases
- PIs: Jones, Altintas, Estrin, Seabloom, Gallagher, Cornillon, Hosseini, Ludäscher, Schildhauer, Reichman, Baru, Potter, Borer
- Institutions: UCSB, UCD, UCSD, UCLA, OSU, OpeNDAP

11 REAP breakdown

12 New projects: ChIP-chip
A Collaborative Scientific Workflow Environment for Accelerating Genome-Scale Biological Research
CS/IT: Ludaescher, Bowers, McPhillips
Bio: Peggy Farnham, Mark Bieda
- Integrate a web-based "experiment workspace" environment with a flexible scientific workflow system
- Support rapid prototyping and easy addition of new "methods"
  - templates: details of key steps are left unspecified until runtime, then one or more specific algorithms or data are late-bound
  - which is the best motif-finding algorithm?
  - parts of a workflow have a similar set of steps
  - need to compare results from parallel analyses
- Support client-server (i.e., enterprise) deployments for group/lab-wide collaboration
  - different people have different roles [Software dev (Tim), Bioinformatics specialist (Mark), Biologist (Peggy)]
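The template idea on this slide, with one step left open until runtime, might look like the sketch below. Everything here is hypothetical and for illustration only: the two toy "motif finders" are not MEME, MDscan, or any real algorithm, and run_template is not a Kepler construct. The template fixes the surrounding steps and accepts the motif finder as a parameter, so several candidates can be bound at runtime and their results compared side by side.

```python
from collections import Counter


def most_common_kmer(sequences, k=3):
    """Toy motif finder: the k-mer that occurs most often across all sequences."""
    counts = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    return counts.most_common(1)[0][0]


def first_kmer(sequences, k=3):
    """Toy motif finder: simply the first k letters of the first sequence."""
    return sequences[0][:k]


def run_template(sequences, motif_finder):
    """Workflow template: fixed steps around one late-bound step."""
    cleaned = [s.strip().upper() for s in sequences]  # fixed step: normalize input
    motif = motif_finder(cleaned)                     # late-bound step
    hits = sum(motif in s for s in cleaned)           # fixed step: score the motif
    return {"motif": motif, "sequences_with_motif": hits}


# Bind each candidate algorithm at runtime and compare the results.
candidates = {"most_common_kmer": most_common_kmer, "first_kmer": first_kmer}
raw_sequences = ["ttacga", "ACGTACG "]
results = {
    name: run_template(raw_sequences, finder) for name, finder in candidates.items()
}
for name, result in results.items():
    print(name, result)
```

Keeping the candidates in a dictionary makes adding a new "method" a one-line change, and the per-algorithm results dictionary is the comparison across parallel analyses that the slide asks for.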

13 [Figure from Bowers and McPhillips: ChIP-chip workflow environment built on the Kepler Workflow Engine]
- Experiment Workspace (setup, run, and manage): Setup "protocol"; Import/Export Data; Data Display, Visualization; Run Experiment
- Workflow Automation (configuration and execution support): Configuration Management; Execution Management; Monitoring Support
- Provenance Tracking
- Workflow Specification (workflow design and template creation)
- Component Specification (wrapping, integration, and creation of components)
- Repositories: component repository, design repository, provenance repository, experiment repository
- External components and services: ChIP-chip Data Analysis (ChIPOTle, HMM, …); Motif Finding Algorithms (MEME, MDscan, …); Visualization Packages and Statistics Tools; Public Databases & Services (GenBank, David, TransFac, …)
- Steps: (1) select design template and configure; (2) generate optimized executable workflow
- Roles: Peggy "biologist", Mark "bioinformatics specialist", Tim "software developer"

14 Kepler C.O.R.E. proposal
Development of Kepler CORE -- A Comprehensive, Open, Robust, and Extensible Scientific Workflow Infrastructure
Ludäscher, Altintas, Bowers, Jones, McPhillips
Goals
- Reliable
  - refactored build
  - more modular design
  - improved engineering practices
- Independently extensible
- Open architecture, open project
  - improved governance

15 Kepler C.O.R.E. -- Extensibility

16 Kepler C.O.R.E. -- Governance

17 Kepler C.O.R.E. -- Sustainability
How does Kepler persist?
- Now, via research grants
  - unsustainable for production purposes
- Future
  - new models for financial support
  - support contracts?
  - extension contracts?
  - new science domains?
  - continued research dollars?
  - foundations?
  - exploring a 501(c)(3) organization that can sustain Kepler and similar open-source initiatives

18 Acknowledgements
- Funding
  - The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, 0225676, and 0619060.
  - The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.
  - The Andrew W. Mellon Foundation
  - The Department of Energy
- Collaborators
  - NCEAS (UC Santa Barbara), University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas, University of Vermont, University of North Carolina, Napier University, Arizona State University, UC Davis
- Kepler contributors
  - SEEK, Ptolemy II, SDM/SciDAC, GEON, RoadNet, EOL, Resurgence

