Science Archives Workshop - April 25, 2007 - Page 1
Archive Policies and Implementation: A Personal View from a NASA Heliophysics Data Policy Perspective
D. Aaron Roberts, NASA GSFC, 25 April 2007
Page 2 - Define: Archive (some Google results)
- A site containing a large number of files, possibly acquired over time, and often publicly accessible. (100 Best Web Hosting)
- A function permitting users to copy one or more files to a long-term storage device. Archive copies can:
  - Accompany descriptive information;
  - Imply data compression software usage;
  - Be retrieved by archive date, file name, or description (Tivoli Storage Manager)
- Archive is a London-based Trip-hop group. (Wikipedia)
Page 3 - Science Data Archive Definition
- Easily accessible, scientifically useable, well-documented, secure data = a good archive.
- Requires:
  - Open data policy
  - Independently useable data
  - Science input (data preparation and serving)
  - Proper registration and backup
Page 4 - Archiving Homilies
- Archiving is a journey, not a destination
- “Archive early, archive often” as a natural extension of serving data
- “Central” archiving is more about knowledge than acquisition
- Knowledge must be easily available: presentation matters
- The customer is always right
- Standards are only as good as the community that supports them, but they are essential: “It’s the metadata, stupid”
- Consider the legacy
Page 5 - Archiving is a journey
Properly described, well-documented, accessible data should easily move from one archiving stage to the next:
- NASA missions produce Active Archives (nothing is “ingested”)
- Products, delivery, and initial long-term data plans in Project Data Management Plan
- Virtual Observatories provide uniform descriptions and access to many such archives
- The archive continues to develop in the extended mission
- A Mission Archive Plan provides updates to the Senior Reviews on status, plans, and actions for post-mission products and service
- After the mission, a Resident Archive can continue to serve data
- Active upgrades of data products to be funded by other means
- NSSDC manages the RAs
- “Permanent” archiving may just be moving the data and documentation to a more generic Resident Archive (e.g., SDAC, SPDF) for continued access
- At all stages, backups and registries maintain safety and knowledge of the data products
Page 6 - “Central” archiving
- More about knowledge than acquisition: What exists? Where is it? Is it well documented? Is it safe?
- New focus for NSSDC role (at least for HP): knowledge of data environment; management of RAs.
- (Harvested) VO registries augmented as needed can provide a complete set of resources.
- Information about the above should be available in ways that provide easy overviews as well as details.
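The four questions on this slide (What exists? Where is it? Is it well documented? Is it safe?) can be read as checks run against a registry. The following is only a minimal sketch of that idea: the `RegistryEntry` record, its field names, and the example identifiers are hypothetical and are not taken from SPASE, NSSDC, or any VO registry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistryEntry:
    """Hypothetical registry record; field names are illustrative only."""
    resource_id: str
    access_url: Optional[str] = None   # "Where is it?"
    has_documentation: bool = False    # "Is it well documented?"
    has_backup: bool = False           # "Is it safe?"


def audit(entries):
    """Detailed view: answer the four questions for each registered resource."""
    return {
        e.resource_id: {
            "exists": True,            # being registered answers "What exists?"
            "where": e.access_url,
            "documented": e.has_documentation,
            "safe": e.has_backup,
        }
        for e in entries
    }


def overview(report):
    """Easy overview: count how many resources pass each check."""
    total = len(report)
    located = sum(1 for r in report.values() if r["where"])
    documented = sum(1 for r in report.values() if r["documented"])
    safe = sum(1 for r in report.values() if r["safe"])
    return (f"{total} resources: {located} located, "
            f"{documented} documented, {safe} backed up")


if __name__ == "__main__":
    # Made-up entries, for illustration only.
    entries = [
        RegistryEntry("example/mission-a/mag", "https://example.org/mag", True, True),
        RegistryEntry("example/mission-b/plasma"),
    ]
    print(overview(audit(entries)))
```

Keeping `audit` (details) separate from `overview` (summary) mirrors the slide's last point, that the information should be available both as easy overviews and as details.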
Page 7 - The customer is always right
The community determines directions:
- Peer review of VOs, RAs, Data Centers, Missions: What is working? What could be improved? What can go?
- HP Data and Computing Working Group provides feedback on HQ directions
- “Top-down vision, bottom-up implementation”
- “Market-driven” including what we want from archives
Page 8 - It’s the metadata, stupid
- Standards that work: Value of sharing data
- SPASE data model provides a uniform description of data products
- SPASE description + data = “SIP”, “AIP”, and “DIP”
- Preserved data should be in common, open, supported formats (e.g., FITS, HDF, CDF, documented ASCII, …)
- Communication and other standards TBD
- Important to decide the level of description
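One way to read “SPASE description + data = SIP, AIP, and DIP” is that a single, well-described bundle can serve unchanged as the submission, archival, and dissemination package (the three OAIS information-package types). The sketch below illustrates that reading only. The `InformationPackage` class, the `resource_id` key, and the file name are hypothetical stand-ins, not the SPASE schema; the format list is the one given on the slide.

```python
from dataclasses import dataclass
from enum import Enum

# Formats named on the slide; "ASCII" stands for documented ASCII.
OPEN_FORMATS = {"FITS", "HDF", "CDF", "ASCII"}


class PackageRole(Enum):
    SIP = "Submission Information Package"
    AIP = "Archival Information Package"
    DIP = "Dissemination Information Package"


@dataclass(frozen=True)
class InformationPackage:
    """Description plus data. The description dict is a stand-in for a
    SPASE description; its keys are hypothetical."""
    description: dict
    data_files: tuple
    data_format: str

    def problems(self):
        """List reasons this package is not independently usable."""
        issues = []
        if not self.description.get("resource_id"):
            issues.append("missing description")
        if not self.data_files:
            issues.append("no data files")
        if self.data_format.upper() not in OPEN_FORMATS:
            issues.append(f"format {self.data_format!r} is not on the open list")
        return issues


def as_role(package, role):
    """Return the same package tagged with a role; nothing is repackaged."""
    issues = package.problems()
    if issues:
        raise ValueError("; ".join(issues))
    return (role, package)


if __name__ == "__main__":
    # Made-up package, for illustration only.
    pkg = InformationPackage({"resource_id": "example/mission-a/mag"},
                             ("mag_2007.cdf",), "CDF")
    for role in PackageRole:
        tagged_role, same_pkg = as_role(pkg, role)
        print(tagged_role.name, same_pkg is pkg)
```

The design choice is that `as_role` never copies or transforms the package: if the description and format are adequate at production time, no separate ingest step is needed later.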
Page 9 - Consider the legacy
Preserving and serving what matters for the long term:
- What is most useful? (If “all” is not possible)
- What works now, and what will last (and how)?
- Calibrated, best-effort products should accompany level-zero plus software/algorithms
Page 10 - A model Heliophysics never quite implemented
Main problems:
(1) “Planning” is a mission function (in collaboration with VOs and others)
(2) “Ingest” is replaced by “production” and “transfer”
(3) “Access” is a distributed function, as are the archives in general
Page 11 - The New Heliophysics Mission Data Lifecycle and Framework
Page 12 - Summary
- Easily accessible, scientifically useable, well-documented, secure data = a good archive.
- Archiving is a journey, not a destination
- “Central” archiving is more about knowledge than acquisition
- Knowledge must be easily available: presentation matters
- The customer is always right
- Standards are only as good as the community that supports them, but they are essential: “It’s the metadata, stupid”
- Consider the legacy
Page 13 - Backup Slides (HP Data Policy)
Page 14 - The HP Data Environment
- Data from the Heliophysics Great Observatory reside in a distributed environment and are served from multiple sources.
- Multimission Data Centers
  - Solar Data Analysis Center
  - Space Physics Data Facility (CDAWeb, OMNIWeb, etc.)
  - National Space Science Data Center
- Mission-level active archives: e.g., ACE, TIMED, TRACE, Cluster, etc.
- Much of our data is served from individual instrument sites.
- We are moving into a new data environment of
  - Virtual Observatories for convenient search and access of the distributed data, and
  - Resident Archives to retain the distributed data sources even after mission termination.
- We have a Data and Computing Working Group to help us move ahead.
Page 15 - Goals of the HP Science Data Management Policy
- Improve management of and access to HP mission data.
- Clarify the architecture and associated data lifecycle milestones of the data environment.
- Provide guidelines for proposals, Project Data Management Plans, NRAs, peer reviews, and other activities related to the HP data environment.
Page 16 - Basic Philosophy
- Evolve the existing HP data environment:
  - take advantage of new computer and Internet technologies to
  - respond to our evolving mission set and community research needs (enable the HP Great Observatory)
- Blend ‘bottom-up’, ‘market-driven’ implementation approaches with a ‘top-down’ vision for an integrated data environment.
- Assure that the HP science community participates in all levels of data management.
Page 17 - Guiding Principles
- All data produced by the HP missions will be open and made available as soon as is practical.
  - Gurman's “Right Amount of Glue” from the Fall 2002 AGU meeting sets the philosophy [see …], a key component of which is a standard of behavior - share one’s data with everyone.
- Data will be independently scientifically usable.
  - adequate documentation including uniform SPASE descriptions
  - sustainable and open data formats
  - easy electronic access
  - provision of appropriate analysis tools.
Page 18 - Architecture
- The environment will be distributed
  - Many archives with different internal workings
- Data integration capabilities provided by discipline-based virtual observatories (“VxOs”; VSO first for x = “Solar” and now 5 others)
  - linked by a central dictionary (“SPASE Data model”) and machine-to-machine communication routines.
  - Easily permits the inclusion of essential data sets from non-NASA sources.
  - Provides a context for services and advanced analysis tools developed under, e.g., AISRP, LWS TR&T, and the VxOs.
Page 19 - Policy Recommendations, Etc.
- The Policy includes:
  - Roles of data environment components,
  - “Rules of the Road” for data use,
  - Recommendations for Project Data Management Plans and Mission Archive Plans,
  - A timeline of the HP mission data lifecycle
Page 20 - Implementation
- Use peer-review processes to assist in managing the elements of the environment.
  - NRAs for: (a) VxOs, (b) Data quality and access improvement, (c) Resident Archives, and (d) Value-added services.
  - Mission and Data Center Senior Reviews; RA reviews.
- Success will be determined by community use and feedback. The process is “market-driven.”
Page 21 - Current Activities
- Finalizing the Data Policy with community input.
  - Our goal is to have this ready for the MIDEX AO
- Implementing a second round of VxOs and processing the next round of proposals for VxOs and related services.
- Coordinating these efforts through frequent interactions and work with the SPASE group.
- Implementing Resident Archives and the processes to manage these archives.
- Working with new missions to incorporate the Data Policy from the start, and “retrofitting” older missions through VxOs and other means.
- Working on collaboration with other NASA science divisions, other US agencies, and international partners.
- Maintaining a web site for latest news about our data environment: