U.S. Department of the Interior, U.S. Geological Survey. Information Technology Exchange Meeting, May 24 – 28, 2010: A New Decade in Support of Science

1 U.S. Department of the Interior, U.S. Geological Survey. Information Technology Exchange Meeting, May 24 – 28, 2010: A New Decade in Support of Science. Thinking differently about data management: “Adventures in support of ‘project’ data”. Steve Tessler, NJ WSC; Brian Reece, TX WSC

2 Save a Process, not Intermediate Data. Many datasets need to be converted from format ‘A’ to format ‘B’. Loading applications are one example of a conversion or transformation tool, but researchers are often faced with converting ‘structured’ data to one or more separate formats for specific uses, and they frequently save all intermediate steps as separate files or tables, creating versioning nightmares and archival backaches… you know the drill… Data management is like tending a garden: care today means a good harvest tomorrow.

3 The Staging-Area approach to Data Handling. Here’s a simple technique for transforming data from ‘format A’ to ‘format B’ that semi-skilled data handlers can learn and use on their own. I use MS Access for this, but the approach can be implemented using other DBMSs or toolsets. At a minimum, all it takes is an ActionSet table, one function, some Source data, and a Target structure. Let me illustrate with a more complex example… Repeating groups? We don’t need no stinkin’ repeating groups!

6 The Staging-Area approach to Data Handling

7 The Staging-Area approach to Data Handling: Benefits. No tools to buy or new languages to learn (e.g., XSLT). Source data are never changed and intermediates are never saved, and target tables meet the ‘format’ needs of the researcher (or replace ‘bad’ initial structures). The approach creates a logical ‘one button’ solution based on sequenced ‘action queries’ that users understand and can modify on their own. No more ‘numbered’ query names that need to be renumbered. It is very easy to halt the sequence to check intermediate steps. The process is self-documenting (descriptions in the ActionSet table). Prevent ‘mystery data’ by giving all tables and fields good definitions.
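The slides do not show the ActionSet table or the function themselves, so the following is only a minimal sketch of the idea, written in Python with SQLite instead of MS Access. The column names (set_name, seq, description, sql_text), the function run_action_set, and the sample Source and Target tables are all assumptions made for illustration, not the presenters' actual design.

```python
import sqlite3


def run_action_set(conn, set_name, stop_after=None):
    """Run the SQL steps of one ActionSet in sequence order.

    stop_after: optional sequence number at which to halt, so that
    intermediate results can be inspected.
    Returns the descriptions of the steps that ran.
    """
    rows = conn.execute(
        "SELECT seq, description, sql_text FROM ActionSet "
        "WHERE set_name = ? ORDER BY seq",
        (set_name,),
    ).fetchall()
    done = []
    for seq, description, sql_text in rows:
        conn.execute(sql_text)  # each step is an 'action query'
        done.append(description)
        if stop_after is not None and seq >= stop_after:
            break
    conn.commit()
    return done


# Hypothetical demo data: a Source table with a repeating group
# (one column per parameter) and a Target with one row per measurement.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Source (site TEXT, temp_c REAL, ph REAL);
    INSERT INTO Source VALUES ('A1', 12.5, 7.1), ('B2', 14.0, 6.8);
    CREATE TABLE Target (site TEXT, parameter TEXT, value REAL);
    CREATE TABLE ActionSet (set_name TEXT, seq INTEGER,
                            description TEXT, sql_text TEXT);
""")
steps = [
    ("wide_to_long", 10, "Empty the target table",
     "DELETE FROM Target"),
    ("wide_to_long", 20, "Append temperature rows",
     "INSERT INTO Target SELECT site, 'temp_c', temp_c FROM Source"),
    ("wide_to_long", 30, "Append pH rows",
     "INSERT INTO Target SELECT site, 'ph', ph FROM Source"),
]
conn.executemany("INSERT INTO ActionSet VALUES (?, ?, ?, ?)", steps)

ran = run_action_set(conn, "wide_to_long")
target_rows = conn.execute(
    "SELECT site, parameter, value FROM Target ORDER BY site, parameter"
).fetchall()
```

In this sketch the sequence numbers live in a column and leave gaps, so a step can be added between two others without renaming any query. Passing stop_after=20 halts the run after the temperature step so the partly filled Target can be checked. The Source table is only ever read.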

8 Minimum Documentation for Project Datasets. We’ve talked this week about metadata as a requirement for documenting and sharing project-level datasets. Metadata can take many forms, but all are designed to inform a potential user about the nature and content of the dataset. Here is a list of the other things I wish I got along with every dataset I was given to work with… Data mismanagement is like tending a garden: dirt everywhere!

9 Minimum Documentation for Project Datasets. A basic Data Dictionary (table and field definitions). Descriptions of Reference/Domain table items. An Entity-Relationship diagram. A ‘Loading Order’ report. A Data-Mapping spreadsheet (if mapped or moved from A to B). Always start with a good data model.
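The transcript does not define the ‘Loading Order’ report. One plausible reading, assumed here, is a list of tables ordered so that each table is loaded only after the tables it references through foreign keys. Under that assumption, a minimal Python sketch (3.9 or later, for the standard-library graphlib module) could look like this; the table names and the loading_order function are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical data model: each table maps to the set of tables it
# references through foreign keys (its 'parents').
references = {
    "Site": set(),
    "Parameter": set(),
    "Sample": {"Site"},
    "Result": {"Sample", "Parameter"},
}


def loading_order(references):
    """Return table names so that every table follows the tables it references."""
    return list(TopologicalSorter(references).static_order())


order = loading_order(references)
for position, table in enumerate(order, start=1):
    print(position, table)
```

Reference and domain tables with no parents come out first, and a circular set of references raises graphlib.CycleError, which is itself a useful finding about the data model.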

13 Just kidding, sort of. Uninstall Excel? An Excel surcharge? Security features prevent people from running macros, and the login screen reminds us that this is a government computer, so why not a reminder that Excel isn’t a data management tool? An annual Excel rules-of-behavior test?

14 A clue you need a data management plan 5/9/2009 8:51:11 File Name/Path too Long - Could not write file: \\igskiacwgsnas\PeerSync\Houston\Logger_Data\MATT -D-DRIVE\Matt\USGS\BACK UP\Inflow Pieces\Feb - June 08\3 17 08\Inflow\Other\Inflow bkup\Inflow\Inflow\data\all other\misc\histdata\EFork\EFork SPLUS\ESTREND\TrenTstRes\Cen Const\NH3 OrgN wf\dif dtval p sl.xls

15 Another clue: This_is_a_really long filename_created_on_May 25, 2010_by_bdreece.pdf

16 Think abstraction. You are generating digital objects, and eventually you run out of filenames and directories.
