Managing the Impacts of Programmatic Scale and Enhancing Incentives for Data Archiving A Presentation for “International Workshop on Strategies for Preservation.


1 Managing the Impacts of Programmatic Scale and Enhancing Incentives for Data Archiving A Presentation for “International Workshop on Strategies for Preservation of and Open Access to Scientific Data” June 22, 2004 Beijing, China Raymond McCord Oak Ridge National Laboratory* Oak Ridge, Tennessee, USA *Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725

2 Credits
Concepts presented here are derived from 25+ years of managing data for environmental projects.
Variations of the concepts have been observed from these disciplines:
  plant community research
  impact assessment in marine systems
  acid rain surveys
  environmental monitoring and cleanup projects at DOE facilities
  land use assessment
  climate change research (atmospheric research)
These concepts are believed to extend to other scientific disciplines.

3 Presentation Strategy
Archiving and science
Making connections
Enhancing incentives for archiving
Impacts of scale
Volume (files and bytes)
Diversity
Timing
Longevity

4 Source: American Scientist, Vol 886 p 525.
“You can’t keep running in here and demanding data every two years”
Challenge: engage scientists in the process of archiving their data and provide the mechanism for archiving.

5 Quotes from Raymond
“Storing data is easy. Finding and using data later is not.”
“Systematically and consistently organized data does not occur without cost. Consider the results from previous science projects with no extra effort for data archiving.”
“The natural tendency over time for data and information is chaos. Effort must be exerted to overcome this.”
“Successfully managed data by projects may not be ready to be archived.”

6 Archive Functions
Store data
Submitted by others
Build a catalog and structure
Maintain storage across technology generations
Review new data (QA, metadata)
“Advertise” contents
Find data for users
Query and browse logic
Distribute data
Provide access to data
References to documentation

7 Presumptions about Archiving
Information sharing is important.
Multidisciplinary data access will foster more robust scientific discoveries.
Archiving can be improved.
The “neurons” of archives are metadata.
The limited number of permanent data archives will increase.
An expectation from “the Internet”

8 Why Archive?? “I am doing Science. Trust me.”

9 Cycles of Research “An Information View”
[Diagram labels: Problem Definition (Research Objectives), Planning, Measurement, Collection, Automation and review, Information review, Analysis and modeling, Selection and extraction, Archive of Data, Publications, Original Observations, Secondary Observations; time spans: 200 yrs, 25 yrs]

10 “Why Don’t I Archive My Data?”
No incentives - What’s in it for me?
No acknowledgment - Does a dataset = a paper?
Give up publication rights - Will somebody scoop me?
Poor planning - It was not in “the Plan”.
No resources - Who’s going to pay for it?
No future - Who will support this later?
Lack of training - What do I do first?
Unsure about metadata content - How much is enough?

11 “Why Should I Archive My Data?” (management hints!)
Career advancement (give them credit)
Scientists need to get some recognition for archiving.
Consider scientific journals that also provide companion “data publications”.
“It may help me do science with a broader view.”
Good scientific practice (create peer pressure)
Professional development (give them training)
Provide daily interactions between scientific and information specialists.
Allow a reasonable time for initial discovery.
Provide support for long-term “stewardship”. (Who will answer the questions after the project is completed?)

12 “Why Should I Archive My Data?” (more management hints!!)
Institutional incentives (Have plans AND expectations)
Archiving should be required by the sponsor.
Data archiving is “in the plan” and resources are available to support it.
Interweave archiving with the planning and publication processes.
Technological advances (Give them hardware and software)
It is technically easier now and there are more options.
Consistent “self-discipline” is still challenging.

13 “Why Should I Archive My Data?” (still more management hints!!!)
“Change” will be managed. (Have standards AND flexibility!!??)
Change is inherent in research.
Managing change without prior planning can become consumptive.
Changes may cause confusion and diminish data usefulness.
A BIG issue - more details during tomorrow’s panel discussion on “Management and Technical Issues”

14 Archiving Supports Better Science
The metadata required for archiving will improve data quality.
Archiving extends data usefulness.
Archived data increases your information base for doing research:
  More data volume and diversity
Proper archives permit the replication of results.
A KEY concept of Science

15 The Effects of Project Scale on Archives “Metadata are archive neurons??”

16 Metadata Depends on Your “World View”
Investigator
  Doesn’t need extensive formal metadata
Project
  Metadata needed for project integration and modeling activities may be limited
  Project data manager may help write metadata
Data archive
  More detailed metadata (e.g., spatial coordinates)
  More standardization (e.g., keywords) to communicate clearly with future users
Who writes the metadata?

17 An Initial View of Data…
[Diagram: Measurement]

18 Single Experiment View
[Diagram: Measurement with date, sample ID, parameter name, location]

19 Research Project View
[Diagram: Measurement with QA flag, media, date, sample ID, parameter name, location]

20 Long-term or Multidisciplinary View
[Diagram: Measurement with QA flag, media, generator, method, date, sample ID, parameter name, location, records, units]

21 Integrated System & Archive View
[Diagram: Measurement with QA flag, media, generator, method, date, sample ID, parameter name, location, records, units, linked to supporting definitions (Sample def., Method def., Parameter def., QA def., Units def., Record system, GIS) whose field labels include type, date, location, generator, lab, field, words, units, method, org. type, name, custodian, address, coord., elev., depth]
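
The way the recorded fields grow from slide 18 to slide 20 can be sketched in code. This is only an illustrative sketch, not the data model used at Oak Ridge: the class name `Measurement`, the `value` field, the view names, and the helper `missing_for_view` are all assumptions introduced here; only the field labels come from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    # "value" is an assumed field; the slides only show the box "Measurement".
    value: float
    # Single experiment view (slide 18)
    date: str
    sample_id: str
    parameter_name: str
    location: str
    # Added in the research project view (slide 19)
    qa_flag: Optional[str] = None
    media: Optional[str] = None
    # Added in the long-term or multidisciplinary view (slide 20)
    generator: Optional[str] = None
    method: Optional[str] = None
    records: Optional[str] = None
    units: Optional[str] = None


# Hypothetical view names; each view requires everything the previous one did.
SINGLE_EXPERIMENT = ["date", "sample_id", "parameter_name", "location"]
RESEARCH_PROJECT = SINGLE_EXPERIMENT + ["qa_flag", "media"]
LONG_TERM = RESEARCH_PROJECT + ["generator", "method", "records", "units"]

VIEW_FIELDS = {
    "single_experiment": SINGLE_EXPERIMENT,
    "research_project": RESEARCH_PROJECT,
    "long_term": LONG_TERM,
}


def missing_for_view(measurement: Measurement, view: str) -> list:
    """Return the names of fields that the given view needs but are unset."""
    return [
        name
        for name in VIEW_FIELDS[view]
        if getattr(measurement, name) is None
    ]


m = Measurement(7.2, "2004-06-22", "S-001", "pH", "site A")
print(missing_for_view(m, "single_experiment"))
print(missing_for_view(m, "research_project"))
```

A record that is complete for the investigator can still be missing fields for the project or the archive, which is one way to read the point that metadata needs depend on the user's scope.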

22 Another View of Scale

23 Program Project Scale and Recorded Metadata
[Diagram labels: PI, Metadata, Group, Archive; axis: Increasing User Scope; fields: Units, Method, QA flag, Media, Parameter name, Measurement, Date, Sample ID, Location, Generator, Records]

24 Data Maturation and Scale
Individual Investigators
  collect data, quality assure, document, analyze, publish
Groups or Science Teams
  collate data, enhance, synthesize, model, publish
Project Information System
  collate data, review completeness, maintain data for project
Data Distribution and Archive Center
  long-term archive, distribute freely to users
Master Data Directory
  searchable index with pointers to data
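
The last level, a searchable index with pointers to data, might look roughly like the sketch below. It assumes a simple in-memory keyword index; the class `MasterDataDirectory`, its methods, the dataset IDs, and the `archive.example.org` URLs are hypothetical and do not describe any real directory service.

```python
from collections import defaultdict


class MasterDataDirectory:
    """Keyword index that stores pointers to datasets, not the data itself."""

    def __init__(self):
        self._pointers = {}                   # dataset ID -> pointer (e.g. a URL)
        self._by_keyword = defaultdict(set)   # keyword -> set of dataset IDs

    def register(self, dataset_id, pointer, keywords):
        """Record where a dataset lives and which keywords describe it."""
        self._pointers[dataset_id] = pointer
        for keyword in keywords:
            self._by_keyword[keyword.strip().lower()].add(dataset_id)

    def search(self, *keywords):
        """Return (dataset ID, pointer) pairs that match every keyword."""
        if not keywords:
            return []
        matches = [
            self._by_keyword.get(keyword.strip().lower(), set())
            for keyword in keywords
        ]
        common = set.intersection(*matches)
        return [(dataset_id, self._pointers[dataset_id]) for dataset_id in sorted(common)]


directory = MasterDataDirectory()
directory.register(
    "acid-rain-survey",
    "https://archive.example.org/acid-rain-survey",  # hypothetical pointer
    ["acid rain", "survey"],
)
directory.register(
    "marine-impact",
    "https://archive.example.org/marine-impact",  # hypothetical pointer
    ["marine", "impact assessment", "survey"],
)
print(directory.search("survey", "marine"))
```

Normalizing keywords on the way in is a small instance of the standardization that the data archive view calls for.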

25 Preparing for Archiving I will not wait. I will not …

26 Generic Environmental Data Model (Which Piece Is First…?)
[Diagram: same Measurement data model as slide 21]

27 Sequence of Information Birth
[Diagram: same Measurement data model as slide 21]

28 Research ~ Publishing ~ Metadata
Metadata design can be a “checklist” for research planning.
Metadata preparation can be integrated with publication process.
Metadata are an investment in current and future science.

29 Summary Points
Incentives to archive data are a “management responsibility”.
“Management” should understand the “Big Picture”
The impacts of scale on archiving.
Archives need structure and standards.
Solutions include more than additional technology.
New behavior is also VERY important.
Metadata are the “neurons” of Archives.
Early metadata are better than later.
The planning and decisions about archiving need to be intentional and not accidental.

30 Future Thoughts
Will we be able to know “Where are we?” as the capacity of information technology continues to expand?
How many 30 KB files are on a 100 GB tape cartridge?
The future limits will not be technology
But our minds…
We need to plan NOW about how to best leverage the future.
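
The tape cartridge question can be answered with simple arithmetic. The sketch below assumes the cartridge holds nothing but file contents (no file system, tape mark, or compression overhead) and shows both the decimal and the binary reading of KB and GB; the function name is introduced here for illustration.

```python
def files_per_cartridge(file_bytes: int, cartridge_bytes: int) -> int:
    """Whole files that fit, ignoring any storage overhead (an assumption)."""
    return cartridge_bytes // file_bytes


# Decimal units: 1 KB = 10**3 bytes, 1 GB = 10**9 bytes
decimal_count = files_per_cartridge(30 * 10**3, 100 * 10**9)

# Binary units: 1 KB = 2**10 bytes, 1 GB = 2**30 bytes
binary_count = files_per_cartridge(30 * 2**10, 100 * 2**30)

print(decimal_count)  # 3333333
print(binary_count)   # 3495253
```

Either way the answer is over three million files on a single cartridge, which is the scale at which finding a file depends on metadata rather than on storage capacity.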

31 Looking Forward to a Future With Archives!!

