1 Ilkay ALTINTAS - October, 2007 Ilkay ALTINTAS Lab Director, Scientific Workflow Automation Technologies San Diego Supercomputer Center, UCSD Kepler Scientific.

1 Kepler Scientific Workflows: Current and Future Development
Ilkay ALTINTAS, Lab Director, Scientific Workflow Automation Technologies
San Diego Supercomputer Center, UCSD
October, 2007

2 Scientific Workflow Systems
Combination of
– data integration, analysis, and visualization steps
– automated “scientific process”
Mission of scientific workflow systems
– Promote “scientific discovery” by providing tools and methods to generate scientific workflows
– Create an extensible and customizable graphical user interface for scientists from different scientific domains
– Support computational experiment creation, execution, sharing, reuse and provenance
– Design frameworks which define efficient ways to connect to the existing data and integrate heterogeneous data from multiple resources
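The idea of a workflow as a chain of integration, analysis, and visualization steps with provenance can be sketched in a few lines of Python. This is only an illustration, not Kepler code: the `Step` and `Workflow` classes, the three step functions, and the sample readings are all hypothetical names and values invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Step:
    """One named workflow step wrapping a plain function (hypothetical type)."""
    name: str
    func: Callable[[Any], Any]


@dataclass
class Workflow:
    """A linear chain of steps that logs provenance as it runs (hypothetical type)."""
    steps: List[Step]
    provenance: List[dict] = field(default_factory=list)

    def run(self, data: Any) -> Any:
        for step in self.steps:
            result = step.func(data)
            # Record what each step consumed and what it produced.
            self.provenance.append(
                {"step": step.name, "input": data, "output": result}
            )
            data = result
        return data


def integrate(sources):
    """Data integration: merge readings from several sources into one list."""
    return [value for source in sources for value in source]


def analyze(values):
    """Analysis: compute the mean of the merged readings."""
    return sum(values) / len(values)


def visualize(mean):
    """Visualization stand-in: render the result as a text label."""
    return f"mean = {mean:.2f}"


workflow = Workflow([
    Step("integrate", integrate),
    Step("analyze", analyze),
    Step("visualize", visualize),
])
# Two made-up data sources with two readings each.
report = workflow.run([[1.0, 2.0], [3.0, 6.0]])
```

After the run, `workflow.provenance` holds one record per step, which is the kind of information a provenance recorder would need in order to support sharing and reuse of a computational experiment.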

3 Kepler is a Scientific Workflow System
… and a cross-project collaboration
3rd Beta release (Jan 8, 2007)
www.kepler-project.org
Builds upon the open-source Ptolemy II framework
Ptolemy II: A laboratory for investigating design
KEPLER: A problem-solving environment for Scientific Workflow
KEPLER = “Ptolemy II + X” for Scientific Workflows

4 Kepler use cases represent many science domains!
Ecology
– SEEK: Ecological Niche Modeling and climate change
– REAP: Modeling parasite invasions in grasslands using sensor networks
– NEON: Ecological sensor networks
– COMET: Environmental science
Geosciences
– GEON: LiDAR data processing, Geological data integration
– NEESit: Earthquake engineering
Molecular biology
– SDM: Gene promoter identification and ScalaBLAST
– ChIP-chip: Genome-scale research
– CAMERA: Metagenomics
Oceanography
– REAP: SST data processing
– LOOKING/OOI CI: ocean observing CI
– ROADNet: real-time data modeling and analysis
– Ocean Life project
Phylogenetics
– ATOL: Processing Phylodata
– CiPRES: Phylogenetic tools
Chemistry
– Resurgence: Computational chemistry
– DART/ARCHER: X-Ray crystallography
Library science
– DIGARCH: Digital preservation
– UK Text Mining Center: Cheshire feature and archival
Conservation biology
– SanParks: Thresholds of Potential Concerns
Physics
– SDM: astrophysics TSI-1 and TSI-2
– CPES: Plasma fusion simulation
– ITER-EU: ITM fusion workflows

5 Kepler today is a research prototype and a production workflow tool!
Some of the current R&D
– Distributed execution of workflow parts (peer to peer)
– Efficient data transfer
– Provenance tracking of data and processes
– Tracking workflow evolution
– Streaming data analysis
– Easy-to-deploy batch interfaces
– Intuitive workflow design
– Customizable semantic typing
– Interoperability with other workflow and analytical environments (at exec level)
Production workflow examples:
– GEON LiDAR workflow (GLW)
  116 registered, 106 active users
  2076 submitted jobs to date
– Center for Plasma Edge Simulation Code-Coupling Workflow (CPES-CCW)
  2000 actors, 5 levels of model hierarchy
  Longest run duration: 3 hours
– PtII AirForce Lab Model
  12920 actors, 65331 attributes
  Longest run duration: 10 minutes
– Longest running real-time simple monitoring model in PtII: months at a time
All generated using the GUI and executed in batch mode…
– No coding and text manipulation

6 REAP: Realtime Environment for Analytical Processing
Management and Analysis of Observatory Data using Kepler Scientific Workflows
Funded 2006-2009
– NSF CEO:P
– Jones (PI), Altintas, Baru, Ludaescher, Schildhauer
– Partners: NCEAS/UCSB (Lead), SDSC/UCSD, UCDavis, CENS/UCLA, OpenDAP, OSU
The vision:
– An integrated environment for analyzing data from observatories
Two scientific use cases:
– Terrestrial ecology
– Oceanography
reap.ecoinformatics.org

7 REAP Views
For data-grid engineers
– monitoring and management capabilities of underlying sensor networks
For outside users
– access to observatory data and results of models, approachable to non-scientists
For scientists
– capabilities for designing and executing complex analytical models over near real-time and archived data sources

8 REAP: Terrestrial Ecology Use Case
Workflows to develop and test models exploring the impacts of abiotic factors (real-time light, temperature, and rainfall measurements) on the dynamics of plant host populations and their susceptibility to viral pathogens.

9 REAP: RBNB Streaming Data Actor
Example data from the Terrestrial Use Case
Hardware: a Campbell Scientific CR800 datalogger with eight attached sensors, operating on a workbench.

10 REAP: Oceanographic Use Case
Facilitate the quantitative evaluation of SST data sets.

11 Kepler/C.O.R.E.
SDCI NMI Improvement: Development of Kepler/CORE – A Comprehensive, Open, Reliable, and Extensible Scientific Workflow Infrastructure
Funded 2007-2010
– NSF SDCI
– Ludaescher (PI), Altintas, Bowers, Jones, McPhillips, Schildhauer
– Partners: Genome Center/UCDavis (Lead), SDSC/UCSD, NCEAS/UCSB
The vision:
– Coordinate development of a comprehensive, open, reliable and extensible Kepler scientific workflow infrastructure
– Builds on community participation as a driving force for Kepler
kepler-project.org

12 Kepler/C.O.R.E.
Comprehensive
– First-class support for technical features
Open
– Well-designed and clearly articulated mechanisms and interfaces provided to facilitate developing extensions
Reliable
– Both as a development platform and as a run-time environment for the user
Extensible
– Independently extensible by groups not directly collaborating with the team

13 Directors in Kepler
Means to execute networks of components under multiple execution models
– Dataflow (SDF, PN, DDF) vs. time-based (CT) vs. event-based (DE) vs. all combined
Makes use of separation of concerns principle
– e.g., component execution, workflow execution and provenance tracking
The manager acts like a “common execution environment”
– governing different concerns related to execution of the network and services
Ptolemy and Kepler are unique in combining different execution models in heterogeneous models!
Execution models listed: Process Networks, Rendezvous, Publish and Subscribe, Continuous Time, Finite State Machines, Dataflow, Time Triggered, Synchronous/reactive model, Discrete Event, Wireless
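The separation between components and the director that executes them can be illustrated with a toy Python sketch. It does not use the Ptolemy II or Kepler API: the `Scale` and `Collect` actors and the two director classes are hypothetical, and each director is a heavily simplified stand-in for a dataflow-style and an event-based execution model respectively. The point is only that the same actors run unchanged under either one.

```python
import heapq


class Scale:
    """Actor: multiplies each input token by a constant factor."""
    def __init__(self, factor):
        self.factor = factor

    def fire(self, token):
        return token * self.factor


class Collect:
    """Actor: stores every token it receives and passes it through."""
    def __init__(self):
        self.seen = []

    def fire(self, token):
        self.seen.append(token)
        return token


class DataflowDirector:
    """Dataflow-style stand-in: fire the chain once per token, in arrival order."""
    def run(self, chain, tokens):
        for token in tokens:
            for actor in chain:
                token = actor.fire(token)


class DiscreteEventDirector:
    """Event-based stand-in: process (time, token) events in timestamp order."""
    def run(self, chain, events):
        queue = list(events)
        heapq.heapify(queue)  # earliest timestamp is popped first
        while queue:
            _, token = heapq.heappop(queue)
            for actor in chain:
                token = actor.fire(token)


# Same actor types, two different directors.
sink_dataflow = Collect()
DataflowDirector().run([Scale(2), sink_dataflow], [1, 2, 3])

sink_events = Collect()
DiscreteEventDirector().run(
    [Scale(2), sink_events],
    [(3.0, 30), (1.0, 10), (2.0, 20)],  # deliberately out of time order
)
```

The actors know nothing about scheduling; only the director decides when they fire. In the second run the events are supplied out of order and the director reorders them by timestamp before firing the chain.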

14 Credits
Kepler community and colleagues
On REAP and Kepler/CORE:
– Shawn Bowers, Bertram Ludaescher, Timothy McPhillips, Genome Center, UCD
– Matt Jones, Derik Barseghian, Mark Schildhauer, NCEAS, UCSB
– Eric Seabloom, OSU
– Peter Cornillon, OpenDAP

15 Questions…
Ilkay Altintas
altintas@sdsc.edu
+1 (858) 822-5453
http://www.sdsc.edu

