Applications and Requirements for Scientific Workflow. May 1, 2006, NSF. Geoffrey Fox, Indiana University.

Team Members
Geoffrey Fox, Indiana University (lead)
Mark Ellisman, UCSD
Constantinos Evangelinos, Massachusetts Institute of Technology
Alexander Gray, Georgia Tech
Walt Scacchi, University of California, Irvine
Ashish Sharma, Ohio State University
Alex Szalay, Johns Hopkins University

The Application Drivers
Workflow is the underlying support for a new model of science:
– A distributed, interdisciplinary, data-deluged scientific methodology, treated as an end-to-end process from instrument and conjecture to paper and Nobel prize, is a transformative approach.
– Provide CS support for this scientific revolution.
This emerging model for science:
– Spans all NSF directorates
– Astronomy (multi-wavelength VO), Biology (Genomics/Proteomics), Chemistry (Drug Discovery), Environmental Science (multi-sensor monitors as in NEON), Engineering (NEES, multi-disciplinary design), Geoscience (Ocean/Weather/Earth(quake) data assimilation), Medicine (multi-modal/instrument imaging), Physics (LHC, Material design), Social science (Critical Infrastructure simulations for DHS), etc.

What has changed?
– Exponential growth in Compute (18), Sensors (18?), Data storage (12), and Networks (8), where the numbers are doubling times in months; performance is variable in practice (e.g., the last mile for networks). See the sketch after this list.
– The data deluge (largely ignored in the Grand Challenges and HPCC era)
– Comparable additional improvements from algorithms (simulation, data analysis)
– Science is becoming intrinsically interdisciplinary
– Distributed scientists and distributed shared data (not uniform across all fields)
– Together these establish a distributed, data-deluged scientific methodology
– We recommend computer science workflow research to enable transformative interdisciplinary science and fully realize this promise
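As a rough illustration of what those doubling times imply, here is a minimal sketch (the 18-, 12-, and 8-month figures are from the slide; the ten-year horizon is an assumed example):

```python
# Growth factor after a fixed horizon for each doubling time cited above:
# a doubling time of T months gives a factor of 2**(months / T).

doubling_months = {"Compute": 18, "Sensors": 18, "Data storage": 12, "Network": 8}
horizon_months = 10 * 12  # assumed ten-year horizon, for illustration only

for name, t in doubling_months.items():
    factor = 2 ** (horizon_months / t)
    print(f"{name:12s} doubles every {t:2d} months -> x{factor:>8,.0f} in 10 years")
```

Storage doubling faster than compute (12 vs. 18 months) is one way to see why the data deluge dominates: data accumulates faster than the capacity to process it.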

Application Requirements I
Reproducibility is core to the scientific method and requires rich provenance and interoperable persistent repositories, linking open data and publications as well as distributed simulations, data analysis, and new algorithms.
– The Distributed Science Methodology publishes all steps in a new electronic logbook capturing the scientific process (data analysis) as a rich cloud of resources including e-mails, PPT, and Wikis as well as databases, compiler options, and build-time/runtime configuration (a minimal sketch of such a logbook record appears after this list).
– We need to separate wheat from chaff in the implicit electronic record (logbook), keeping only what is required to make the process reproducible, and to be able to reference steps in the process electronically. Traditional workflow, including BPEL/Kepler/Pegasus/Taverna, describes only a part of this.
– An abstract model of the logbook becomes a high-level executable meta-workflow.
– Multiple collaborative, heterogeneous, interdisciplinary approaches to all aspects of the distributed science methodology are inevitable; research is needed on integrating this diversity.
– Need to maximize innovation (diversity) while preserving reproducibility.
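One way to picture the electronic-logbook requirement is sketched below. This is a hypothetical illustration, not any particular system; the class and field names are invented. Each step records enough provenance (inputs, configuration, outputs) to be referenced electronically and replayed.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class LogbookStep:
    """One referenceable step in an electronic logbook: enough provenance
    (inputs, configuration, outputs) to make the step reproducible."""
    tool: str                      # simulation or analysis code and version
    inputs: list[str]              # URIs of input data or earlier steps
    config: dict                   # compiler options, runtime parameters, ...
    outputs: list[str] = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def step_id(self) -> str:
        # A content hash gives a stable identifier, so later steps,
        # papers, and e-mails can cite this step electronically.
        blob = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()[:12]

# Hypothetical usage: record one analysis step and cite it.
step = LogbookStep(
    tool="wavelet-filter v2.1",
    inputs=["data:survey/scan-0042"],
    config={"threshold": 0.8, "build_flags": "-O2"},
    outputs=["data:survey/scan-0042-filtered"])
print("cite as step", step.step_id())
```

A meta-workflow in the slide's sense would then be a graph over such step identifiers, executable at a high level while each node keeps its own detailed provenance.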

Application Requirements II
– Interdisciplinary science requires that we federate ontologies and metadata standards, coping with their inevitable inconsistencies and even absence (see the sketch after this list).
– Support for curation, data validation, and "scrubbing" in algorithms and provenance; QoS, reputation, and trust systems for data providers.
– Multiple "ibilities": security, reliability, usability, scalability.
– As we scale the size and richness of data and algorithms, we need a scalable methodology that hides complexity (compatible with the number of scientists increasing only slowly); it must be simple and validatable.
– Automate efficient provisioning, deployment, and provenance generation for complex simulations and data analysis; support deployment and interoperable specification of the user's abstract workflow; support the interactive user.
– Support automated and innovative individual contributions to the core "black boxes" (produced by the "marine corps" for the "common case") and to general user actions such as choice and annotation.
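A toy illustration of the metadata-federation point follows; the field names and mappings are invented for this sketch (real federations rest on ontology alignments, not one-line dictionaries). Two communities describe the same observation with different, partially absent fields, and a thin mapping layer reconciles them while flagging gaps for curation.

```python
# Hypothetical sketch: map two community metadata schemas onto a shared
# vocabulary, tolerating fields that are inconsistent or simply absent.

ASTRO_TO_COMMON = {"ra": "longitude", "dec": "latitude", "obs_time": "time"}
GEO_TO_COMMON = {"lon": "longitude", "lat": "latitude", "timestamp": "time"}

def federate(record: dict, mapping: dict) -> dict:
    common = {}
    for local_key, common_key in mapping.items():
        if local_key in record:          # tolerate inconsistent naming
            common[common_key] = record[local_key]
        else:
            common[common_key] = None    # flag the absent field for curation
    return common

astro = federate({"ra": 180.0, "dec": -30.0}, ASTRO_TO_COMMON)  # no obs_time
geo = federate({"lon": 180.0, "lat": -30.0, "timestamp": "2006-05-01"},
               GEO_TO_COMMON)
print(astro)  # {'longitude': 180.0, 'latitude': -30.0, 'time': None}
print(geo)    # {'longitude': 180.0, 'latitude': -30.0, 'time': '2006-05-01'}
```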