SCEC5 Planning for Scientific Computing and NSF Data Management Plan


SCEC5 Planning for Scientific Computing and NSF Data Management Plan
SCEC Leadership Retreat, 2 June 2015
I've learned that proposals represent planning opportunities. SCEC should use these opportunities to

SCEC5 Data Management Planning

USGS View of Data Management

NSF Data Management Plan Requirement
The National Science Foundation (NSF) has published a revised version of its Proposal and Award Policies and Procedures Guide (PAPPG) requiring that all proposals submitted or due on or after January 18, 2011 include a supplementary document of no more than two pages describing a Data Management Plan for the proposed research. As a supplementary document, the Data Management Plan does not count against the 15-page limit for the proposal body. FastLane will not permit submission of a proposal that is missing the Data Management Plan.

Contents of an NSF Data Management Plan
- Products of the Research: the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project.
- Data Formats: the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies).
- Access to Data and Data Sharing Practices and Policies: policies for access and sharing, including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements.
- Policies for Re-Use, Re-Distribution, and Production of Derivatives.
- Archiving of Data: plans for archiving data, samples, and other research products, and for preserving access to them.

Division of Earth Sciences (GEO/EAR) Specific Requirements
Preservation of all data, samples, physical collections, and other supporting materials needed for long-term earth science research and education is required of all EAR-supported researchers. Data archives must include easily accessible information about the data holdings, including quality assessments, supporting ancillary information, and guidance and aids for locating and obtaining data. It is the responsibility of researchers and organizations to make results, data, derived data products, and collections available to the research community in a timely manner and at a reasonable cost. In the interest of full and open access, data should be provided at the lowest possible cost to researchers and educators; this cost should, as a first principle, be no more than the marginal cost of filling a specific user request. Data may be made available for secondary use through submission to a national data center, publication in a widely available scientific journal, book, or website, through the institutional archives that are standard for a particular discipline (e.g., IRIS for seismological data, UNAVCO for GPS data), or through other EAR-specified repositories.

Information on Creating NSF Data Management Plans
1. NSF Data Management Plan Requirements: https://www.nsf.gov/eng/general/dmp.jsp
2. Directorate for Geosciences--Data Policies: http://www.nsf.gov/geo/geo-data-policies/index.jsp
3. DataONE (NSF Geo Data Management Project): https://www.dataone.org/data-management-planning
4. UCLA Library Data Management Plan Information: http://www.library.ucla.edu/support/publishing-data-management/scholarly-communication-services/data-management-curation-services/data-management-plans
5. University of Michigan Library Systems: http://www.lib.umich.edu/research-data-services/nsf-data-management-plans

f. Biographical Sketch(es): (c) Products
A list of: (i) up to five products most closely related to the proposed project; and (ii) up to five other significant products, whether or not related to the proposed project. Acceptable products must be citable and accessible, including but not limited to publications, data sets, software, patents, and copyrights. Unacceptable products are unpublished documents not yet submitted for publication, invited lectures, and additional lists of products. Only the list of 10 will be used in the review of the proposal. Each product must include full citation information including (where applicable and practicable) names of all authors, date of publication or release, title, title of enclosing work such as journal or book, volume, issue, pages, website and Uniform Resource Locator (URL) or other Persistent Identifier. If only publications are included, the heading "Publications" may be used for this section of the Biographical Sketch.

SCEC5 Data Management Strategies

Data Citation: Citation of Data Using Digital Object Identifiers (DOIs)
Use Digital Object Identifiers (DOIs) to signify datasets that are complete, in a useable format, stable (changes are implemented by publication of new versions), have valid metadata, have passed the quality control checks within the domain of expertise of the data centre, and have long-term stewardship guaranteed by that data centre, underwritten by the ICSU World Data System. This provides the basis for a dataset to be cited as if it were a research paper, putting it on a par with other scientific outputs. [Reference: The International Journal of Digital Curation, Volume 7, Issue 1, 2012]
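To make the DOI-based citation idea concrete, the following is a minimal Python sketch, not SCEC, ICSU, or DataCite code; the DOI, author, and title values are hypothetical placeholders, and the only external behavior assumed is that https://doi.org/<DOI> redirects to a dataset landing page for a registered DOI.

import urllib.request

def resolve_doi(doi):
    # Follow the doi.org redirect to find the landing page for a minted DOI.
    with urllib.request.urlopen("https://doi.org/" + doi) as response:
        return response.geturl()

def format_dataset_citation(authors, year, title, publisher, doi):
    # Format a dataset citation so it can be listed alongside research papers.
    return "{} ({}). {}. {}. https://doi.org/{}".format(authors, year, title, publisher, doi)

# Hypothetical example values; a real citation uses the DOI minted by the data center.
print(format_dataset_citation("SCEC Contributors", 2015,
                              "Example Community Velocity Model Dataset",
                              "Southern California Earthquake Center",
                              "10.1234/example-dataset"))
# resolve_doi("10.1234/example-dataset") would only succeed for a real, registered DOI.

A citation built around a persistent identifier like this is what allows a dataset to be cited on a par with a paper, as the IJDC reference above describes.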

Data Citation
1. Creating a Culture of Data Citation: http://www.icpsr.umich.edu/icpsrweb/ICPSR/curation/citations.jsp
2. Directorate for Geosciences--Data Policies: http://www.nsf.gov/geo/geo-data-policies/index.jsp
3. DataONE (NSF Geo Data Management Project): https://www.dataone.org/data-management-planning
4. UCLA Library Data Management Plan Information: http://www.library.ucla.edu/support/publishing-data-management/scholarly-communication-services/data-management-curation-services/data-management-plans
5. University of Michigan Library Systems: http://www.lib.umich.edu/research-data-services/nsf-data-management-plans

SCEC5 Data Exchange as a Focus of External Collaborations
With adequate personnel support, SCEC5 could establish itself as a valuable contributor to, and collaborator with, external NSF and other projects including IRIS, CIG, EarthScope, EarthCube, NEHRP, USGS, CGS, DOE, etc. Collaborations might work as follows (a metadata sketch follows this list):
1. SCEC5 expresses interest and willingness to collaborate with external projects.
2. SCEC5 meets with each group and discusses what data products they produce that might be of interest to SCEC.
3. Describe SCEC products that might be of interest to the external group.
4. Agree on the data to be exchanged.
5. Agree on an exchange format.
6. Agree on metadata content.
7. Implement data formatting of the selected products.
8. Implement an access mechanism.
9. Release a prototype data exchange.
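Before any exchange format is negotiated, it can help to see what an agreed metadata record might look like. The following is a minimal sketch only, assuming a JSON-style record; every field name and value is a hypothetical placeholder, not an agreed SCEC or partner format.

import json

# Fields the two projects might agree to require for every exchanged product (hypothetical).
REQUIRED_FIELDS = ["product_name", "producer", "version", "format",
                   "spatial_coverage", "contact", "doi_or_url", "license"]

def missing_fields(record):
    # Return the agreed-upon metadata fields that are absent from a record.
    return [field for field in REQUIRED_FIELDS if field not in record]

example_record = {
    "product_name": "CVM velocity mesh extract",          # placeholder product
    "producer": "SCEC",
    "version": "1.0",
    "format": "HDF5",
    "spatial_coverage": "Southern California",
    "contact": "data-contact@example.org",                # placeholder address
    "doi_or_url": "https://doi.org/10.1234/placeholder",  # placeholder identifier
    "license": "CC-BY-4.0",
}

print("missing:", missing_fields(example_record))   # an empty list means the record is complete
print(json.dumps(example_record, indent=2))          # serialized form suitable for exchange

A simple completeness check like this could be the first quality gate in the access mechanism, but the real fields and formats would come out of the agreements listed above.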

Discussion: SCEC5 Computational Research

SCEC5 Scientific Computing
Scientific computing software development is a valuable capability within core SCEC and within Special Projects. SCEC5 planning should include scientific computing for several reasons, including:
- Scientific computing is expensive.
- Scientific computing could lead to SCEC5 growth.
In this session, I'll present issues and suggestions for SCEC5 scientific computing for discussion.

SCEC5 Scientific Computing
SCEC computing activities go under several names, including:
- Scientific Computing
- Research Computing
- High Performance Computing
- Big Data Processing
- Computer Science
- Community Modeling Environment
- Information Technology
- Computational Science
SCEC5 scientific computing includes, but is not limited to, High Performance Computing.

SCEC5 Scientific Software Capabilities
SCEC's core computing skill is scientific software. Both core SCEC and Special Projects have this capability:
- Core SCEC researchers develop new scientific codes, often to do individual research.
- Special projects often develop software to perform large-scale community calculations.

SCEC5 Scientific Computing
SCEC5 should focus on developing scientific software and using the software to perform research. SCEC5 should avoid spending significant resources building and operating large-scale computer hardware.

Scientific Computing: Core SCEC
Core SCEC researchers should continue to create, evaluate, and improve research software. Collaborative computational research activities are very valuable:
- Source Inversion
- Site Response
- Dynamic Rupture Comparison
- Utilization of Ground Motion Simulations
Core SCEC would benefit from a software developer available to the community. However, even if funding existed, finding the right person and setting appropriate priorities would be a challenge.

CME Software Eco-System
SCEC Community Modeling Environment (CME) software refers to the computing that supports the computational pathways designed to improve ground motion forecasting. CME software represents an inter-related set of computational tools, from CVMs, to UCERF3, to CyberShake, to Full 3D Tomography. SCEC CME software is a collection of scientific codes that together provide a full range of seismic hazard analysis tools, including the SCEC velocity models, UCVM, dynamic rupture codes, 1D Broadband, 3D AWP-ODC, 3D Hercules, OpenSHA, CyberShake, and full 3D tomography. In NSF OCI terminology, these programs form a software "eco-system" of inter-related and inter-dependent modeling tools that can be used to calculate physics-based probabilistic seismic hazard models.
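To illustrate the eco-system idea, here is a sketch that assumes nothing about the real tools' interfaces: the comments mention actual SCEC codes (UCVM, UCERF3, AWP-ODC, Hercules, OpenSHA, CyberShake), but the Python functions are placeholders showing only how pathway stages depend on one another, not how those codes are invoked.

def build_velocity_model(region):
    return "velocity mesh for " + region                  # e.g., via UCVM (placeholder)

def build_rupture_forecast(region):
    return "rupture forecast for " + region               # e.g., UCERF3 (placeholder)

def simulate_ground_motions(velocity_model, ruptures):
    # Ground motion simulation consumes both the structural model and the rupture set.
    return "ground motions from " + ruptures + " through " + velocity_model   # e.g., AWP-ODC or Hercules

def compute_hazard_curves(ground_motions):
    return "hazard curves from " + ground_motions         # e.g., OpenSHA / CyberShake (placeholder)

def run_pathway(region):
    # Each stage's product feeds the next; removing any stage breaks the pathway.
    velocity_model = build_velocity_model(region)
    ruptures = build_rupture_forecast(region)
    motions = simulate_ground_motions(velocity_model, ruptures)
    return compute_hazard_curves(motions)

print(run_pathway("southern California"))

The point of the sketch is the dependency structure: the value of any one code in the eco-system comes from the other codes it feeds and is fed by.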

SCEC Scientific Computing Successes SCEC’s most productive scientific computing collaborations are organized around an important seismic hazard data product or calculation that can be improved using advanced computational techniques. The scientific challenge defines the computational goal, and computing techniques are introduced as needed to reach the goal. SCEC scientific computing projects are integrative, bringing together inter-related SCEC structural and computational models. SCEC5 should continue to organize and focus integrative, science-driven, broad-impact, scientific computing projects.

SCEC Scientific Computing in Special Projects
Within Special Projects, the most successful SCEC scientific computing projects have been multi-disciplinary collaborations that include scientists, engineers, computer scientists, and software developers. Examples include:
- OpenSHA
- Broadband Platform
- CyberShake
- OEF
- CSEP
Having software developers work with scientists and engineers is our key strategy for avoiding wasted software developer time and for avoiding software nobody uses.

SCEC Scientific Computing Successes
SCEC special projects are often a mechanism for extending the computational capabilities of individual research codes into community-based, practical computational data products. Special project calculations often represent a community calculation rather than an individual researcher's calculation. Core SCEC5 computational science should play an increased role in identifying the best available codes to use in special project calculations.

SCEC Scientific Computing Successes
Due to the importance of scientific software, SCEC5 should initiate efforts to improve software development capabilities within both the scientific and software staff. SCEC should train scientific staff in software basics, such as the material covered in "Software Carpentry" and other software engineering overviews (e.g., by the end of SCEC5, SCEC researchers should use version control for their research software). Because the software field changes rapidly, SCEC software staff should be required to perform annual training to keep skills current. SCEC computer training likely needs to be increased; increasing interactions between SCEC computational science and CEO might enable CEO to support SCEC computer training.

Project Sizes
SCEC has had the most success coordinating the efforts of small software teams working on well-focused research activities. We recommend that SCEC5 special software projects be organized around project teams of approximately six people or fewer. If SCEC software project groups grow larger, SCEC will need to rethink how those groups are organized and managed.

Software Staff-related Issues
To maintain a software staff, SCEC5 management must recognize that most software staff members are not academics. Often, a software developer's goal is to produce working software that is used by a community or used to produce an important result, rather than to publish papers. Also, in the fast-paced software field, forward career motion is important to software people. To retain talented software staff, SCEC5 will need to provide a non-academic software career path through which staff members can reasonably progress. The staff software developer career path should define positions with gradually increasing responsibilities, and each SCEC position should be linked to an appropriate official USC staff position. The career path should enable staff to progress into either advanced technical or management roles.

Finding Good Software Staff
Special projects' best source of staff software developers has been the UseIT intern program. UseIT has been highly effective as a way to attract student interest in SCEC research and to evaluate students' readiness to contribute to SCEC software projects. The SCEC intern programs work as a farm team for SCEC's wider computational science program. If SCEC5 must maintain a significant software staff, operating a UseIT-type intern program could be very valuable for recruitment.

Obtaining HPC Time
Both core SCEC and SCEC special projects need HPC time, but special projects need more. If special projects are funded, including Keck and Central California, the importance of computing time will increase. To avoid a shortfall, SCEC will need to dedicate personnel to obtaining, managing, and reporting on supercomputer hours. At the large proposed scales, research staff will not be able both to raise the computing hours and to have time to perform the research. The cost of a person dedicated to raising computing hours will be less than the cost of purchasing the computing time directly.

Obtaining HPC Time
An important SCEC5 strategy for obtaining large-scale computing will be to stay qualified on the largest systems in order to meet the needs of HPC research. Staying qualified on a new system often requires a new or re-written version of a high-performance code. SCEC wave propagation codes, which are being pushed to higher and higher frequencies, are good candidates for codes that SCEC can develop to stay qualified on the newest and largest HPC systems. Participating with HPC centers in developing next-generation supercomputers (the co-design concept) is advanced HPC; it would require several more SCEC people, including senior computer scientists. This path is high-risk and high-reward, with the greatest danger to SCEC being that no research computing gets done, only system-testing software.

Sustainability Strategy
SCEC5 can benefit from a computational science group, and avoid wasting software development time, by doing the following:
- Integrate the best available core SCEC scientific software into important broad-impact data products such as CSEP, UCERF, Broadband, CyberShake, High-F, and Full 3D Tomography.
- Evaluate all USGS seismological data products, including EEW, ShakeMap, UCERF, Hazard Maps, and OEF, and identify ways core SCEC research can improve them. Where clear improvements are possible, form a multi-disciplinary group to implement the improvements.

Additional Topics (in HPC White Paper)
- Key Software Needed for SCEC5
- Computational Science Contributing to SCEC Visibility
- Additional Software Sustainability Strategies

End