
1 Local Data Management: Building understandable spreadsheets
Jeff Arnfield, National Climatic Data Center
Version 1.0, Review Date

2 Data Formats: Building understandable spreadsheets; Version 1.0, Reviewed --/--/----
Overview
- Spreadsheets are amazingly flexible, and are commonly used for data collection, analysis and management
- Spreadsheets are seldom self-documenting, and seldom well-documented
- Subtle (and not so subtle) errors are easily introduced during entry, manipulation and analysis
- Spreadsheet conventions, which are often ad hoc and evolutionary, may change or be applied inconsistently
- Spreadsheet file formats are proprietary and thus generally unacceptable for long-term archival purposes

3 Prior planning prevents pitfalls
- Clearly document all parameters, conventions, algorithms and assumptions
- Use descriptive, consistent names
- Use optimal and consistent layouts and formats
- Preserve raw data and minimize duplication
- Validate data and formulae
- Keep formulae as simple and comprehensible as possible
- Use some method of version control to manage change
- Develop a strategy for archiving the content and logic

4 Planning and documenting
Form follows function and content
- Data sources, volume, diversity of content and planned transformations may require multiple files or directories
- Your data will often suggest natural naming conventions and layouts
- Document your conventions as they're established
- Revise documentation contemporaneously, not "after the fact"
- This work is also the basis for end-user or reviewer documentation
What should you document? Everything!
- Data import, manipulation, QC procedures, special flags and encoding
- Naming conventions, layouts, headings, units and abbreviations
  - Does TEMP mean "temporary," "air temperature at time of observation," or something else?
- Formulae and constants
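One way to keep a legend tab honest is to treat the column definitions as a small data dictionary and check every sheet header against it, so an ambiguous label such as TEMP is caught before it spreads. The sketch below is a minimal Python illustration under assumed names: `ColumnDef`, `DATA_DICTIONARY`, `undocumented_columns` and all column names are hypothetical, not part of any spreadsheet product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDef:
    """One entry in a hypothetical data dictionary (legend) for a sheet."""
    name: str
    description: str
    units: str


# Hypothetical legend: every column is defined once, with explicit units.
DATA_DICTIONARY = {
    "station_id": ColumnDef("station_id", "Station identifier (text, keeps leading zeros)", "none"),
    "obs_date": ColumnDef("obs_date", "Observation date, yyyymmdd", "none"),
    "air_temp_obs": ColumnDef("air_temp_obs", "Air temperature at time of observation", "degC"),
    "qc_flag": ColumnDef("qc_flag", "Quality flag: blank = passed, S = suspect", "none"),
}


def undocumented_columns(headers, dictionary=DATA_DICTIONARY):
    """Return the headers that have no entry in the data dictionary."""
    return [h for h in headers if h not in dictionary]


print(undocumented_columns(["station_id", "obs_date", "TEMP"]))  # ['TEMP']
```

Keeping the dictionary in one place means the same definitions can feed both the legend tab and any end-user documentation.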

5 What's in a name?
Define naming conventions for:
- Directory hierarchies (if the scope of your project requires them)
- Files, including different versions if necessary
- Column and row labels, and individual tabs on multi-tab spreadsheets
Names should clearly and uniquely describe content
- Sheet1, Sheet2 and Sheet3 do not convey much useful information!
- Stable, consistent names can be referenced and processed
If dates form part of a name, use a sortable format
- yyyymmdd (20110825) is easier to process and validate automatically than 25Aug2011, or the unthinkable 25Aug11
Consider standard abbreviations or keywords, if they exist
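To see why yyyymmdd is preferred, the Python sketch below builds dated file names, shows that a plain text sort of them is also chronological, and parses the stamp back to validate it. The `rain_gauge` stem and the helper names `dated_name` and `stamp_date` are hypothetical examples, not a prescribed convention.

```python
from datetime import date, datetime


def dated_name(stem: str, d: date, ext: str = "csv") -> str:
    """Build a file name carrying a sortable yyyymmdd date stamp."""
    return f"{stem}_{d:%Y%m%d}.{ext}"


def stamp_date(name: str) -> date:
    """Recover the date stamp from a name; raises ValueError if it is malformed."""
    stamp = name.rsplit("_", 1)[1].split(".", 1)[0]
    return datetime.strptime(stamp, "%Y%m%d").date()


days = [date(2011, 8, 25), date(2011, 9, 3), date(2010, 12, 1)]

# Text sort order of yyyymmdd names matches chronological order.
names = sorted(dated_name("rain_gauge", d) for d in days)
print(names)
# ['rain_gauge_20101201.csv', 'rain_gauge_20110825.csv', 'rain_gauge_20110903.csv']

# Day-first stamps such as 25Aug2011 do not: 03Sep2011 sorts before 25Aug2011.
day_first = sorted(d.strftime("%d%b%Y") for d in days)
```

Because `stamp_date` fails loudly on anything that is not a valid eight-digit date, the same convention doubles as a cheap automated check on file names.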

6 Layout and formatting
Use appropriate data types
- The date/time data type permits sorting and calculations
- If numeric identifiers may have leading 0s, format them as text
- Format numeric values with the appropriate number of decimal places
Use separate columns to denote special cases rather than relying solely on color coding, however well documented
- Color-coded conditions cannot readily be sorted and aggregated
- Color coding is lost entirely if data are exported as ASCII text
Include enough detail to make the data self-describing
- Item identifier, date, time, parameter name, units, value, quality flag
Make the spreadsheet self-contained by adding a tab with legend details, column definitions and formula descriptions
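The layout advice can be checked by exporting to ASCII and reading the file back. The Python sketch below assumes a hypothetical one-value-per-row layout with the columns listed above, writes it as CSV, and shows that text identifiers keep their leading zeros and that a quality flag column (unlike cell color) survives the export and can be filtered. The station, values and the `S` flag code are invented for illustration.

```python
import csv
import io

# Hypothetical observations: one value per row, with an explicit quality
# flag column instead of cell color.
rows = [
    {"station_id": "00123", "date": "20110825", "time": "0600",
     "parameter": "air_temp", "units": "degC", "value": "21.4", "qc_flag": ""},
    {"station_id": "00123", "date": "20110825", "time": "1200",
     "parameter": "air_temp", "units": "degC", "value": "48.9", "qc_flag": "S"},
]

# Export as delimited ASCII text.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
text = buf.getvalue()

# Read the export back: identifiers are text, so leading zeros survive,
# and suspect rows can be selected by filtering on the flag column.
read_back = list(csv.DictReader(io.StringIO(text)))
suspect = [r for r in read_back if r["qc_flag"] == "S"]
print(read_back[0]["station_id"], len(suspect))  # 00123 1
```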

7 Data handling and formulae
- Don't combine multiple data layouts in a single sheet
- If sheets have common content, identical layouts and headings increase clarity and simplify linking for analysis
- Simplify updates by separating data and analyses
  - If data are propagated across sheets or manually manipulated, refreshing data can be a daunting and error-prone task
- Freeze heading rows and columns to provide context as you work deep in the sheet
- Placing totals, counts and other summary statistics above the column headers ensures they are always visible
- Headers, footers and repeating headings improve printouts
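Separating data from analysis means the summaries are always derived from the raw rows rather than copied or edited by hand, so refreshing the data is just a matter of recomputing. The Python sketch below shows the idea with a hypothetical `RAW` table and a `summarize` function standing in for a separate analysis tab.

```python
from statistics import mean

# Raw data, kept as entered; the analysis step never modifies it.
RAW = (
    ("00123", "20110825", 21.4),
    ("00123", "20110826", 19.8),
    ("00456", "20110825", 23.1),
)


def summarize(raw_rows):
    """Derive per-station count, mean and max without touching the raw rows."""
    by_station = {}
    for station, _date, value in raw_rows:
        by_station.setdefault(station, []).append(value)
    return {
        station: {"count": len(vals), "mean": round(mean(vals), 2), "max": max(vals)}
        for station, vals in sorted(by_station.items())
    }


summary = summarize(RAW)
print(summary)
```

When new observations arrive, they are appended to the raw table and `summarize` is simply run again; nothing downstream has to be patched by hand.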

8 Best practices
Hard-code nothing!
- Use a separate "values and assumptions" tab or area for constants and conversion factors
- Use named ranges and cells rather than row/column references
  - C2*CubicFtToGallonConvertFactor is clearer than C2*Assumptions!$B$3
- When sorting rows or copying formulae, be sure cell references do not change unintentionally
Most spreadsheets have data validation tools. Use them!
- If using a spreadsheet for data entry, build entry validation rules
- Counts, averages, max/mins, standard deviations, value lookups and custom formulae provide more sophisticated QC checks
- Pivot tables are useful for QC and summarization
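The same principles carry over outside the spreadsheet. The Python sketch below mirrors a "values and assumptions" area with one named constant, an entry-validation range, and a simple standard-deviation QC check. `ASSUMPTIONS`, `VALIDATION_RULES`, the `volume_cubic_ft` column, its range and the k = 2 threshold are all hypothetical choices for illustration; 7.48052 is the approximate number of US gallons per cubic foot.

```python
from statistics import mean, stdev

# Hypothetical "values and assumptions" area: each constant is named and
# documented once, instead of being typed into formulas.
ASSUMPTIONS = {
    "CubicFtToGallonConvertFactor": 7.48052,  # US gallons per cubic foot (approx.)
}

# Hypothetical entry-validation rules: column name -> (min, max) allowed.
VALIDATION_RULES = {
    "volume_cubic_ft": (0.0, 10_000.0),
}


def to_gallons(volume_cubic_ft: float) -> float:
    """Named-constant equivalent of C2*CubicFtToGallonConvertFactor."""
    return volume_cubic_ft * ASSUMPTIONS["CubicFtToGallonConvertFactor"]


def validate(column: str, value: float) -> bool:
    """Return True if value lies inside the allowed range for column."""
    low, high = VALIDATION_RULES[column]
    return low <= value <= high


def flag_outliers(values, k: float = 2.0):
    """QC check: indexes of values more than k standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) > k * s]


print(flag_outliers([10, 11, 9, 10, 10, 11, 9, 10, 50]))  # [8]
```

Note that a single extreme value inflates the standard deviation itself, so a check like this is a screening aid, not a substitute for documented QC procedures.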

9 Version control and archiving
Versioning and change management
- Periodic, dated backups are essential
- Simple version control is possible via a naming convention for saving incremental copies, along with a log of significant changes
- Version control systems, like Subversion, are powerful but complex
How will you archive for long-term availability?
- Gear your approach to your data, and to your archive's requirements
- Save content as delimited ASCII
  - Each tab must be saved separately
  - Document formulae separately, since results rather than formulae are saved
  - Easy cheat: in Excel, the Ctrl+` shortcut (on the ~ key) reveals formulae, which can then be exported intact. However, references to other sheets will not be readily resolved
- Convert to XML
- Printing as PDF preserves appearance, but complicates future reuse
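A naming convention for incremental copies, a change log, and per-tab delimited export can all be scripted. The Python sketch below is one possible arrangement, not a recommended standard: the `stem_yyyymmdd_vNN.csv` pattern, the `CHANGES.txt` log and the helper names are hypothetical, and `workbook` is a plain dict standing in for a real spreadsheet (tab name mapped to rows of values). As the slide notes, only values are exported, so formulae still need separate documentation.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path


def versioned_name(stem: str, d: date, version: int, ext: str = "csv") -> str:
    """Hypothetical convention for incremental copies: stem_yyyymmdd_vNN.ext"""
    return f"{stem}_{d:%Y%m%d}_v{version:02d}.{ext}"


def export_tabs(workbook, out_dir: Path, stem: str, d: date, version: int):
    """Write each tab to its own comma-delimited text file (values only)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for tab, rows in workbook.items():
        path = out_dir / versioned_name(f"{stem}_{tab}", d, version)
        with path.open("w", newline="", encoding="utf-8") as fh:
            csv.writer(fh).writerows(rows)
        written.append(path)
    return written


def log_change(log_path: Path, file_name: str, note: str) -> None:
    """Append one tab-separated line to a plain-text log of significant changes."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(f"{file_name}\t{note}\n")


# Usage with hypothetical content, written to a throwaway directory.
workbook = {
    "data": [["station_id", "date", "value"], ["00123", "20110825", 21.4]],
    "legend": [["column", "definition"], ["value", "air temperature, degC"]],
}
with tempfile.TemporaryDirectory() as tmp:
    paths = export_tabs(workbook, Path(tmp), "gauge", date(2011, 8, 25), 3)
    log_change(Path(tmp) / "CHANGES.txt", paths[0].name, "Added 20110825 observations")
    demo_names = [p.name for p in paths]
    demo_data = paths[0].read_bytes()
    demo_log = (Path(tmp) / "CHANGES.txt").read_text(encoding="utf-8")
print(demo_names)
```

Exporting the legend tab alongside the data keeps the archived text files self-describing even after the original workbook format can no longer be opened.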

10 References and Resources
- Cook, R. B., R. J. Olson, P. Kanciruk, and L. A. Hook. 2001. "Best Practices for Preparing Ecological Data Sets to Share and Archive." Bulletin of the Ecological Society of America 82(2):138-141. http://www.jstor.org/stable/20168543
- Leong, K. "Seven deadly sins of spreadsheet use in business: Excel best practices." http://production-scheduling.com/seven-deadly-spreadsheet-sins/
- GCMD science and associated directory keywords. http://gcmd.nasa.gov/Resources/valids/archives/keyword_list.html
- CF Metadata Convention: CF Standard Names. http://cf-pcmdi.llnl.gov/documents/cf-standard-names/

11 Other Relevant Modules
- Avoiding proprietary formats
- File naming conventions
- Maintaining contemporaneous documentation
- Planning for longer term preservation
- Work with your archive early and often

