Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.

Similar presentations


Presentation on theme: "Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory."— Presentation transcript:

1 Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory

2 Data Management 101 for Earth Scientists Roadmap Introduction Metadata Best Practices for Data Management

3 Data Management 101 for Earth Scientists Benefits of Good Data Management Practices Short-term Spend less time doing data management and more time doing research. Easier to prepare and use data for yourself. Collaborators can readily understand and use data files. Long-term (data publication) Scientists outside your project can find, understand, and use your data to address broad questions. You get credit for archived data products and their use in other papers. Sponsors protect their investment. 3

4 Data Management 101 for Earth Scientists Metadata Information to let you find, understand, and use the data descriptors documentation 4

5 Data Management 101 for Earth Scientists Metadata needed to Understand Data The details of the data …. Measurement date Sample ID Parameter name location 5

6 Data Management 101 for Earth Scientists Metadata Needed to Understand Data Measurement QA flag media generator method date sample ID parameter name location records Units Sample def. type date location generator lab field Method def. units method Parameter def. org.type name custodian address, etc. coord. elev. type depth Record system date words, words. QA def. Units def. GIS 6

7 Data Management 101 for Earth Scientists The 20-Year Rule ( NRC 1991) The metadata accompanying a data set should be written for a user 20 years into the future-- what does that investigator need to know to use the data? Prepare the data and metadata / documentation for a user who is unfamiliar with the details of your project, methods, and observations 7

8 Data Management 101 for Earth Scientists Proper Curation Enables Data Reuse Time Information Content Planning Collection Assure Metadata and Documentation Archive Sufficient for Sharing and Reuse 8

9 Data Management 101 for Earth Scientists Proper Curation Enables Data Reuse Time Information Content Planning Collection Assure Metadata and Documentation Archive Sufficient for Sharing and Reuse 9

10 Data Management 101 for Earth Scientists Fundamental Data Practices 1.Define the contents of your data files 2.Use consistent data organization 3.Use stable file formats – Curt Tilmes 4.Assign descriptive file names 5.Perform basic quality assurance 6.Provide documentation 7.Protect your data 10

11 Data Management 101 for Earth Scientists 1. Define the contents of your data files Content flows from science plan (hypotheses) and is informed from requirements of final archive. Keep a set of similar measurements together in one file same investigator, methods, time basis, and instrument No hard and fast rules about contents of each files. 11

12 Data Management 101 for Earth Scientists 1. Define the Contents of Your Data Files Define the parameters 12 Scholes (2005) Use commonly accepted parameter names and units(SI Units) Be consistent Explicitly state units Use ISO formats Parameter Table

13 Data Management 101 for Earth Scientists 2. Use consistent data organization (one good approach) StationDateTempPrecip Units YYYYMMDDCmm HOGI19961001120 HOGI19961002143 HOGI1996100319-9999 Note: -9999 is a missing value code for the data set 13 Each row in a file represents a complete record, and the columns represent all the parameters that make up the record.

14 Data Management 101 for Earth Scientists 2. Use consistent data organization (a 2 nd good approach) StationDateParameterValueUnit HOGI19961001Temp12C HOGI19961002Temp14C HOGI19961001Precip0mm HOGI19961002Precip3mm 14 Parameter name, value, and units are placed in individual columns. This approach is used in relational databases.

15 Data Management 101 for Earth Scientists Use descriptive file names Unique Reflect contents ASCII characters only Avoid spaces Bad: Mydata.xls 2001_data.csv best version.txt Better:bigfoot_agro_2000_gpp.tiff Site name Year What was measured (abundance) Project Name File Format 4. Assign descriptive file names 15

16 Data Management 101 for Earth Scientists 16 Source : PhD Comics

17 Data Management 101 for Earth Scientists 5. Perform basic quality assurance Perform and review statistical summaries Plot data and assess errors No better QA than to analyze data 17

18 Data Management 101 for Earth Scientists 5. Perform basic quality assurance (con’t) Plot information to examine outliers 18 Model X uses UTC time, all others use Eastern Time Data from the North American Carbon Program Interim Synthesis (Courtesy of Yaxing Wei, ORNL) Model-Observation Intercomparison

19 Data Management 101 for Earth Scientists 5. Perform basic quality assurance (con’t) Plot information to examine outliers 19 Data from the North American Carbon Program Interim Synthesis (Courtesy of Yaxing Wei, ORNL) Model-Observation Intercomparison

20 Data Management 101 for Earth Scientists 6. Provide Documentation / Metadata What does the data set describe? Why was the data set created? Who produced the data set and Who prepared the metadata? How was each parameter measured? What assumptions were used to create the data set? When and how frequently were the data collected? Where were the data collected and with what spatial resolution? (include coordinate reference system) How reliable are the data?; what is the uncertainty, measurement accuracy?; what problems remain in the data set? What is the use and distribution policy of the data set? How can someone get a copy of the data set? Provide any references to use of data in publication(s) 20

21 Data Management 101 for Earth Scientists 7. Protect data Create back-up copies often Ideally three copies original, one on-site (external), and one off-site Frequency based on need / risk Know that you can recover from a data loss Periodically test your ability to restore information 21

22 Data Management 101 for Earth Scientists DataONE Resources dataone.org Data Management Plans Links to DMP Tool Best Practices Tools Database Everything you wanted to know about data management, but were afraid to ask. Carly Strasser, et al 22

23 Data Management 101 for Earth Scientists Best Practices: Conclusions Data management is important in today’s science Well organized data: enables researchers to work more efficiently can be shared easily by collaborators can potentially be re-used in ways not imagined when originally collected 23

24 Data Management 101 for Earth Scientists Questions? 24

25 Data Management 101 for Earth Scientists References and Resources (continued) Ball, C. A., G. Sherlock, and A. Brazma. 2004. Funding high-throughput data sharing. Nature Biotechnology 22:1179-1183. doi:10.1038/nbt0904-1179. Borer, ET., EW. Seabloom, M.B. Jones, and M. Schildhauer. 2009. Some Simple Guidelines for Effective Data Management,Bulletin of the Ecological Society of America. 90(2): 205- 214. Christensen, S. W. and L. A. Hook. 2007. NARSTO Data and Metadata Reporting Guidance. Provides reference tables of chemical, physical, and metadata variable names for atmospheric measurements. Available on-line at: http://cdiac.ornl.gov/programs/NARSTO/ http://cdiac.ornl.gov/programs/NARSTO/ Cook, Robert B, Richard J. Olson, Paul Kanciruk, and Leslie A. Hook. 2001. Best Practices for Preparing Ecological Data Sets to Share and Archive. Bulletin of the Ecological Society of America, Vol. 82, No. 2, April 2001. Hook, L. A., T. W. Beaty, S. Santhana-Vannan, L. Baskaran, and R. B. Cook. June 2007. Best Practices for Preparing Environmental Data Sets to Share and Archive. http://daac.ornl.gov/PI/pi_info.shtml http://daac.ornl.gov/PI/pi_info.shtml Kanciruk, P., R.J. Olson, and R.A. McCord. 1986. Quality Control in Research Databases: The US Environmental Protection Agency National Surface Water Survey Experience. In: W.K. Michener (ed.). Research Data Management in the Ecological Sciences. The Belle W. Baruch Library in Marine Science, No. 16, 193-207. 25

26 Data Management 101 for Earth Scientists References and Resources Management in the Ecological Sciences. The Belle W. Baruch Library in Marine Science, No. 16, 193-207. National Research Council (NRC). 1991. Solving the Global Change Puzzle: A U.S. Strategy for Managing Data and Information. Report by the Committee on Geophysical Data of the National Research Council Commission on Geosciences, Environment and Resources. National Academy Press, Washington, D.C.Michener, W K. 2006. Meta- information concepts for ecological data management. Ecological Informatics. 1:3-7. Michener, W.K. and J.W. Brunt (ed.). 2000. Ecological Data: Design, Management and Processing, Methods in Ecology, Blackwell Science. 180p. Michener, W. K., J. W. Brunt, J. Helly, T. B. Kirchner, and S. G. Stafford. 1997. Non- Geospatial Metadata for Ecology. Ecological Applications. 7:330-342. U.S. EPA. 2007. Environmental Protection Agency Substance Registry System (SRS). SRS provides information on substances and organisms and how they are represented in the EPA information systems. Available on-line at: http://www.epa.gov/srs/http://www.epa.gov/srs/ USGS. 2000. Metadata in plain language. Available on-line at: http://geology.usgs.gov/tools/metadata/tools/doc/ctc/ http://geology.usgs.gov/tools/metadata/tools/doc/ctc/ 26


Download ppt "Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory."

Similar presentations


Ads by Google