Data Integrity Issues: How to Proceed?
Engineering Node, Elizabeth Rye, August 3, 2006

1 Data Integrity Issues: How to Proceed?
Engineering Node, Elizabeth Rye, August 3, 2006
http://pds.nasa.gov

2 PDS Requirements for Data Integrity (3 August 2006, Data Integrity Issues, slide 2 of 19)
The PDS has made a commitment to ensure the integrity of its data archives. This commitment is spelled out primarily in Level 3 requirement 4.1.2:
– “PDS will develop and implement procedures for periodically ensuring the integrity of the data.”
Several other Level 3 requirements carry further implications for data integrity assurance.

3 PDS Requirements for Data Integrity
The PDS is responsible for assisting data providers in determining how to validate the data they provide:
– “PDS will provide criteria for validating archival products” (1.3.3)
The PDS is responsible for ascertaining that the data we deliver to the NSSDC are valid:
– “PDS will meet U.S. federal regulations for the preservation and management of data.” (2.8.3)
– “PDS will meet U.S. federal regulations for preservation and management of the data through its Memorandum of Understanding (MOU) with the National Space Science Data Center (NSSDC)” (4.1.5)

4 PDS Requirements for Data Integrity
The PDS is responsible for enabling our users to verify the integrity of the data they receive from us:
– “PDS will develop and maintain online mechanisms allowing users to download portions of the archive” (3.2.1)
– “PDS will develop and maintain a mechanism for offline delivery of portions of the archive to users” (3.2.2)
– “PDS will provide mechanisms to ensure that data have been transferred intact” (3.2.3)
The PDS needs to ensure that data integrity is maintained through the media refreshing process:
– “PDS will develop and implement procedures for periodically refreshing the data by updating the underlying storage technology” (4.1.3)

5 PDS Requirements for Data Integrity
The PDS has a stated goal of using standardized procedures in areas that affect inter-node data transfers:
– “PDS will provide standard protocols for accessing data, metadata and computing resources across the distributed archive” (2.7.3)

6 PDS Requirements for Data Integrity
From the above requirements, we can derive several areas of concern for data integrity:
– Verifying the integrity of data stored on physical media
– Detecting errors introduced during transfer of data to newer media
– Detecting errors that occur during transmission of data:
    From data providers to the PDS
    Between PDS nodes
    From the PDS to the NSSDC
    From the PDS to end users

7 PDS Requirements for Data Integrity
There are two additional areas, not derivable from existing PDS requirements, in which data integrity issues arise:
– The re-delivery of non-archived data during the operations phase of a mission
– The potential updating of data to newer formats long after it has been archived

8 Mitch Gordon Survey
For each numbered item, do you think that it is an important issue for us to address?
Section A: It is critical that the PDS be able to ascertain the integrity of its archive. This includes (but is not limited to):
1. detecting errors that occur during the transmission of data from providers to the PDS,
2. detecting errors that occur during the transmission of data between PDS nodes,
3. detecting errors that occur during the transmission of data from the PDS to end users,
4. detecting errors that occur during the transmission of data from the PDS to the NSSDC,
5. verifying the integrity of data stored on various types of external physical media (all of which have finite life spans),
6. detecting errors introduced during transfer of data to newer media.

9 Mitch Gordon Survey
Respondents, in column order: Lyle, Steve, Ed, Steve, Todd, David, Anne, Dick, Mitch
Nodes represented: ATM, EN, GEO, PPI, SBN-Az, SBN-Md, RS, Rings
A Archive Integrity
A1 Providers to PDS: Y Y Y Y N Y Y N Y
A2 PDS to PDS: Y Y Y Y N Y Y N Y
A3 PDS to users: Y Y M Y N Y Y N Y
A4 PDS to NSSDC: Y Y M Y N Y Y Y Y
A5 Integrity on media: Y Y Y Y Y Y Y Y Y
A6 Upgrading to new media: Y Y Y Y Y Y Y Y Y
B Identify tool
B1 Use MD5: M Y N M N Y N M Y
C Policies for use of tool: Y Y N Y Y N M Y

10 Possible Solutions to the Problem
– Checksums are widely accepted in the broader community as a means of ensuring data integrity
– MD5 checksums, in particular, are well suited to this purpose
– No node has suggested any mechanism besides checksums for detecting changes in data
– There is no consensus within the PDS as to whether we should limit ourselves to the MD5 checksum algorithm
– There is little consensus within the PDS as to whether we should adopt a standardized approach to using checksums to verify data integrity
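To illustrate why an MD5 checksum suits the job of detecting accidental changes, here is a minimal Python sketch using only the standard library's `hashlib`. It streams a file-like object in fixed-size chunks, so the size of the product does not matter, and shows that flipping a single bit changes the 128-bit digest. The function name `md5_stream` and the sample bytes are hypothetical stand-ins, not anything the PDS has defined.

```python
import hashlib
import io


def md5_stream(stream: io.BufferedIOBase, chunk_size: int = 64 * 1024) -> str:
    """Return the hex MD5 digest of a binary stream, read in fixed-size chunks."""
    digest = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Illustrative bytes standing in for an archived data product (hypothetical).
original = b"The quick brown fox jumps over the lazy dog"
corrupted = bytearray(original)
corrupted[4] ^= 0x01  # flip one bit ("q" becomes "p"), as a transmission error might

good = md5_stream(io.BytesIO(original))
bad = md5_stream(io.BytesIO(bytes(corrupted)))

print(good)         # 9e107d9d372bb6826bd81d3542a419d6
print(good == bad)  # False: a one-bit change yields a different digest
print(len(good))    # 32 hex characters (128 bits) regardless of input size
```

Reading in chunks means the digest of a very large product can be computed without loading it into memory, and the result is the same whatever chunk size is used.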

11 Mitch Gordon Survey
Section B: Identify a tool that can help (though it need not be sufficient on its own) with any, or hopefully all, of the above.
– Use a single tool, MD5, for generating and validating checksums
Section C: Establish policies for the use of the tool in a variety of situations.

12 Mitch Gordon Survey (results table identical to the one on slide 9)

13 Issues to Be Addressed

14 Standardization Issue
Should we have a standardized approach across the PDS for storing and accessing checksums, or should each node be permitted to use whatever mechanism it chooses?
– Some flexibility is needed to deal with the variety of ways in which data providers deliver data to the PDS
– Standardization permits the development of tools for generating, accessing, and periodically validating against checksums
– Standardization permits checksum tools to be added to existing interfaces (like PDS-D and the NSSDC delivery mechanism) so that those interfaces can use and validate against checksums
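One payoff of a standardized approach is that a single validator could serve every node. The minimal Python sketch below assumes a hypothetical manifest format of one `<md5 digest>  <relative path>` line per file (the text-mode layout produced by GNU `md5sum`, not necessarily what any PDS node uses). `validate_manifest` re-hashes each listed file and reports whether it is intact, altered, or missing; all function and file names here are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, streamed in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_manifest(volume_root: Path, manifest_text: str) -> dict[str, str]:
    """Re-hash every file listed as '<md5>  <relative path>'; report ok/mismatch/missing."""
    results = {}
    for line in manifest_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(maxsplit=1)
        target = volume_root / rel_path
        if not target.is_file():
            results[rel_path] = "missing"
        elif md5_of(target) != expected.lower():
            results[rel_path] = "mismatch"
        else:
            results[rel_path] = "ok"
    return results


def demo_periodic_check() -> dict[str, str]:
    """Build a throwaway 'volume', damage it, then validate it against its manifest."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "DATA").mkdir()
        (root / "DATA" / "A.TAB").write_bytes(b"abc")
        (root / "DATA" / "B.TAB").write_bytes(b"unchanged")
        # Digests as they might have been recorded at archive time (hypothetical files).
        manifest = "\n".join([
            "900150983cd24fb0d6963f7d28e17f72  DATA/A.TAB",
            f"{hashlib.md5(b'unchanged').hexdigest()}  DATA/B.TAB",
            "d41d8cd98f00b204e9800998ecf8427e  DATA/C.TAB",
        ])
        (root / "DATA" / "A.TAB").write_bytes(b"abd")  # simulate later corruption
        return validate_manifest(root, manifest)


print(demo_periodic_check())
# {'DATA/A.TAB': 'mismatch', 'DATA/B.TAB': 'ok', 'DATA/C.TAB': 'missing'}
```

Because the manifest stores relative paths, the same check would work whether a volume sits on its original medium, on a refreshed copy, or at another node.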

15 Urgency
– The volume of data returned from missions is growing exponentially, with large increases every couple of years
– Going back and calculating checksums for every file already in the PDS holdings is feasible now, but will become significantly more difficult with each passing year

16 Policy Questions to Be Answered
– At what level of detail should checksums be required?
– For what parts of the archiving process should checksums be required?
– To what degree should standardization among nodes be insisted upon?
– When should we begin requiring checksums?

17 Current Proposal (SCR 3-1034, V9)
– Mandates generation of a checksum for every file on every archive volume
– Mandates a standardized format and location for storing checksums
– Is insufficient to solve all data integrity problems, but is a necessary part of the solution
– Would be required for all missions archiving to version 3.8 or higher of the Standards Reference (roughly, missions starting the archiving process late this year)
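As a rough illustration of "a checksum for every file on every archive volume, stored in a standardized format and location", the Python sketch below walks a volume directory, computes an MD5 digest for each file, and writes one line per file to a single manifest at the volume root. The manifest name `CHECKSUMS.MD5`, its root-level location, and the `<digest>  <relative path>` line format are assumptions for illustration only; the SCR's actual format and location are not given in these slides.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical manifest name and location (volume root); not taken from the SCR.
MANIFEST_NAME = "CHECKSUMS.MD5"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, streamed in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_volume_manifest(volume_root: Path) -> Path:
    """Checksum every file under volume_root and store all digests in one manifest."""
    manifest_path = volume_root / MANIFEST_NAME
    lines = []
    # Sorted traversal makes the manifest reproducible from run to run.
    for path in sorted(p for p in volume_root.rglob("*") if p.is_file()):
        if path == manifest_path:
            continue  # the manifest cannot contain its own checksum
        rel = path.relative_to(volume_root).as_posix()
        lines.append(f"{md5_of(path)}  {rel}")
    manifest_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest_path


def demo_volume() -> str:
    """Create a tiny hypothetical volume and return its manifest text."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "DATA").mkdir()
        (root / "DATA" / "A.TAB").write_bytes(b"abc")
        (root / "INDEX.TAB").write_bytes(b"")
        write_volume_manifest(root)
        # A second run finds the old manifest on disk and skips it.
        return write_volume_manifest(root).read_text(encoding="utf-8")


print(demo_volume(), end="")
# 900150983cd24fb0d6963f7d28e17f72  DATA/A.TAB
# d41d8cd98f00b204e9800998ecf8427e  INDEX.TAB
```

Sorting the paths keeps the manifest byte-for-byte reproducible, so two runs over an unchanged volume produce identical manifests that can themselves be compared.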

18 Most Recent Votes on Checksum SCR
Node: Version 5 (MC) / Version 9 (Tech)
Science Nodes
ATM: no / reject
GEO: no / reject
IMG: yes / recommend
PPI: yes / recommend
RINGS: yes / recommend
SBN: no / recommend
Support Nodes
EN: yes / recommend
NAIF: yes / recommend
RS: no / reject

19 Options for Next Step
– Proceed with MC vote on version 9 of SCR
– Form new working group to come up with a new proposal
– MC draft policy on data integrity to provide further guidance to Tech group
– Drop the issue (fails to meet our requirements)
– Other?

