Successful Data Curation for Large Data Archives

1 Successful Data Curation for Large Data Archives
Joseph L. “Joey” Comeaux and Steven J. Worley (NCAR), and Cliff Jacobs (NSF) 9/15/2019 AMS-2011

2 NCAR’s Research Data Archive
- Managed by 8 members of the Data Support Section
- Within the Computational and Information Systems Laboratory (CISL)
- Started in ~1965
- “RDA Overview” presented later by Doug Schuster

4 CISL Data Life-Cycle Model
[Diagram: data life-cycle model. Labels, in extraction order: US; International; Data; Assistance; Feedback; Management; Supervision; Guidance; Preservation; Integrity; Archiving; Cataloging; Access; Ensure Integrity; Curation; Stewardship; Users; Requests and Needs]

6 Six Requirements for Data Curation
- Stable Funding
- Knowledgeable / Consistent Staffing
- Robust Storage
- Backups
- Partnerships
- Formats

7 Sustainable Data Curation
Stable Funding
- Focused on data management
- Allows flexibility
- Data management can evolve to meet science support expectations
- Holistic approach, not project specific
- Necessary to keep curated collection viable

8 Sustainable Data Curation
The problem of paying for long-term stewardship of research data is difficult…. Requirements for improved data management practices should not be imposed as unfunded mandates
Source: “Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age,” The National Academy of Sciences, National Academy of Engineering and Institute of Medicine, 2004

9 Sustainable Data Curation
[Diagram build: Stable Funding, Enriched Staff]

10 Sustainable Data Curation
Enriched Staff
- Knowledgeable and educated in the specific discipline
  - Important for checking integrity of data
  - Choosing organization of data
  - Creating adequate metadata
  - Designing access system and assisting users
  - Staff are the link to researchers
- Consistent staffing levels
  - Dedicated to best practices in archiving and stewardship
  - Great deal of knowledge held by staff, regardless of documentation
  - Value of human-based knowledge should not be underestimated
  - We find 5-10 years of experience yields an expert
  - RDA support group: 128 years of cumulative experience

11 Sustainable Data Curation
Recommendation 6: NSF and the Community should act to ensure that there are a sufficient number of high-quality data scientists
Source: “Long Lived Data Collections: Enabling Digital Research and Education in the 21st Century”, NSF, Working Papers of the National Science Board, 2006

12 Sustainable Data Curation
Robust Storage Facilities
- Capability to meet growth and technology changes
- NCAR tape-based archive
  - Size more than doubles every 2.5 years; over 10 PB now
  - Technology evolution, tape capacity: 20 GB -> 60 GB -> 200 GB -> 1000 GB
  - Full RDA about 600 TB (600 tapes)
  - Changes ‘transparent’ for RDA staff and users
- Enhance user access with disk
  - Fast access
  - Available for RDA: 1.5 TB -> 12 TB -> 70 TB -> 300 TB
  - Careful deliberation needed
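The growth and tape figures on this slide lend themselves to a quick back-of-the-envelope projection. The sketch below is illustrative only: it assumes exactly one doubling every 2.5 years from a 10 PB starting point (the slide says "more than" on both counts), decimal units, and hypothetical function names that are not part of any NCAR tooling.

```python
import math


def projected_size_pb(years_ahead, start_pb=10.0, doubling_years=2.5):
    """Archive size in PB after `years_ahead` years, assuming steady doubling."""
    return start_pb * 2 ** (years_ahead / doubling_years)


def tapes_needed(size_tb, tape_capacity_gb=1000):
    """Whole tapes required for `size_tb` terabytes (1 TB = 1000 GB assumed)."""
    return math.ceil(size_tb * 1000 / tape_capacity_gb)


if __name__ == "__main__":
    for years in (0, 2.5, 5, 10):
        print(f"{years:>4} years: {projected_size_pb(years):.0f} PB")
    # The 600 TB RDA on each tape generation listed on the slide.
    for capacity_gb in (20, 60, 200, 1000):
        print(f"{capacity_gb:>4} GB tapes: {tapes_needed(600, capacity_gb)}")
```

With 1000 GB tapes the 600 TB collection comes out at 600 tapes, matching the slide; on the original 20 GB tapes the same holdings would have needed 30,000.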

13 Sustainable Data Curation
Backups
- Loss of data attributed to 2 general causes
  - Physical equipment or facility failures
    - Environmental -> fire, flood, earthquake, ...
    - Equipment -> drive failures, tape deterioration
- Resolution
  - Store copies of irreplaceable data at separate physical locations
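Keeping copies at separate locations only helps if the copies are known to be identical. The talk does not say how the RDA verifies its copies; the sketch below shows one common approach, comparing SHA-256 checksums, and the function names and file paths are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(primary, replicas):
    """Map each replica path to True if it exists and matches the primary copy."""
    expected = sha256_of(primary)
    return {
        str(replica): Path(replica).is_file() and sha256_of(replica) == expected
        for replica in replicas
    }
```

A periodic job could call `copies_match` for each irreplaceable file and flag any replica that reports `False`, whether because it is missing or because its contents have drifted.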

14 Sustainable Data Curation
Backups
- Loss of data attributed to 2 general causes
  - Poor curation/stewardship, e.g. breaks in best practice
    - Loss of metadata
    - Accidental data overwrites and deletions
    - Lack of background knowledge
    - Misjudged importance of particular data
- Resolution
  - Treat metadata and data equally
  - Implement safeguards to minimize risk of overwrites and accidental deletions
  - Knowledgeable staff
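As one illustration of a safeguard against accidental overwrites, the sketch below writes a data file and its metadata together and refuses to replace either if it already exists. This is an assumed approach for illustration, not the RDA's actual mechanism; the function name and the `.meta.json` sidecar convention are hypothetical.

```python
import json
import os
import stat


def archive_write_once(path, data, metadata):
    """Write data plus a metadata sidecar; never replace an existing file."""
    meta_path = path + ".meta.json"
    # Mode "x" creates the file exclusively and raises FileExistsError
    # if something is already stored under that name.
    with open(path, "xb") as handle:
        handle.write(data)
    with open(meta_path, "x", encoding="utf-8") as handle:
        json.dump(metadata, handle, sort_keys=True)
    # Drop write permission so ordinary in-place edits are rejected too.
    read_only = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
    for written in (path, meta_path):
        os.chmod(written, read_only)
    return meta_path
```

Storing the metadata in the same operation as the data reflects the slide's advice to treat the two equally: a file cannot enter the archive without its description.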

15 Growth of RDA User Data

16 Growth of RDA

17 Growth of RDA

18 Growth of RDA
RDA: 40% backups

19 Sustainable Data Curation
[Diagram build: Stable Funding, Enriched Staff, Robust Storage, Backups]

20 Sustainable Data Curation
Data Format
- Ensure data access for the long term
- Fully documented to the byte level
- Use standard formats
  - Non-proprietary
  - Not dependent on OS, hardware or applications
- Choices can affect storage requirements
Metadata Format
- Utilize a standard
- Multiple levels are becoming more important
  - Discovery
  - File attribute
  - File content
- Backup of metadata is important
- RDA practices presented by Bob Dattore, IIPS Session 5
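To make "documented to the byte level" and "not dependent on OS or hardware" concrete, here is a minimal sketch of a fixed-width binary record with an explicit byte order. The record layout (station ID, timestamp, temperature) is invented for illustration and is not an RDA format.

```python
import struct

# Hypothetical 16-byte record, big-endian, no padding:
#   bytes 0-3   station id   signed 32-bit integer
#   bytes 4-11  timestamp    signed 64-bit integer, seconds since 1970-01-01 UTC
#   bytes 12-15 temperature  IEEE 754 32-bit float, degrees Celsius
RECORD = struct.Struct(">iqf")


def encode_record(station_id, timestamp, temperature_c):
    """Pack one observation into its 16-byte on-disk form."""
    return RECORD.pack(station_id, timestamp, temperature_c)


def decode_record(raw):
    """Unpack 16 bytes back into (station_id, timestamp, temperature_c)."""
    return RECORD.unpack(raw)
```

Because the `>` prefix fixes both byte order and field sizes, the same bytes decode identically on any platform, and the comment block doubles as the byte-level documentation a future reader would need.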

21 Sustainable Data Curation
[Diagram build: Stable Funding, Enriched Staff, Robust Storage, Backups, Formats]

22 Sustainable Data Curation
Partnerships
- World-wide sharing and unrestricted open access provide greater research opportunities than any one center can provide
- National and international
- No single institute can “do it all”
- Most users “need/want it all”
- Good way to share some costs
- Reanalysis projects are examples

23 Sustainable Data Curation
Archive Pertinent Data
[Diagram build: Stable Funding, Enriched Staff, Robust Storage, Backups, Formats, Partnerships]

24 Only by sharing research data … can new knowledge be transformed into socially beneficial goods and services
Source: “Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age”, The National Academy of Sciences, National Academy of Engineering and Institute of Medicine, 2006

25 SUMMARY: Six Requirements for Data Curation
- Stable Funding
- Enriched Staff
- Robust Storage
- Backups
- Formats
- Partnerships

