
# 1 METADATA: A LEGACY FOR OUR GRANDCHILDREN
N. Scott Urquhart, STARMAP Program Director, Department of Statistics, Colorado State University



# 2 DISCLAIMERS
- The work reported here today was developed under the STAR Research Assistance Agreement CR awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of the author and of STARMAP, the program he represents. EPA does not endorse any products or commercial services mentioned in this presentation.
- The people of CEER-GOM have heard parts of this presentation before. Sorry. That presentation, at Ocean Springs, MS (3/26/02), led to the invitation for this talk.

# 3 CONTEXT FOR COMMENTS
- SPACE-TIME AQUATIC RESOURCES MODELING AND ANALYSIS PROGRAM = STARMAP
- STARMAP IS FUNDED BY EPA'S STAR PROGRAM, AS ARE ALL OF THE EaGLes PROGRAMS (==> "SIBLING" PROGRAMS)
- STARMAP IS TO USE EMAP AS A DATA SOURCE AND CONTEXT
- NSU (N. SCOTT URQUHART):
  - STARMAP PROGRAM
  - CSU
  - 10 YEARS OF COLLABORATION WITH EMAP
  - 40+ YEARS AS A STATISTICIAN WORKING WITH ECOLOGISTS

# 4 AN IMPORTANT LESSON
- YOU DO NOT KNOW WHAT YOUR DATA WILL BE USED FOR 20 YEARS FROM NOW
- BY THE TIME THE VARIOUS EaGLes PROGRAMS ARE COMPLETE, WE, AS TAXPAYERS, WILL HAVE INVESTED > $40M IN THE VARIOUS STUDIES
- THE RESULTING DATA NEED TO BE RESPONSIBLY AND READILY AVAILABLE TO FUTURE GENERATIONS

# 5 YOU DO NOT KNOW WHAT YOUR DATA WILL BE USED FOR 20 YEARS FROM NOW
- POPULAR PERSPECTIVE: WE "KNOW" LOTS ABOUT THE "ENVIRONMENT"
- REALITY: GOOD AQUATIC DATA ARE SCARCE
  - SPATIALLY EXTENSIVE
  - OVER A REASONABLE TIME SPAN
  - WELL-DOCUMENTED PROCEDURES
  - WELL-TRAINED CREWS
  - CAREFULLY EXECUTED STUDIES
  - DATA PUBLICLY AVAILABLE

# 6 THE VALUE OF "METADATA"
- DATA WITHOUT CONTEXT ARE JUST NUMBERS, NEARLY WORTHLESS TO OTHERS
  - How many file cabinets full of data are in your park offices?
- DATA WITH CONTEXT ARE INFORMATION, WHICH CAN BE VALUABLE TO OTHERS
- THAT CONTEXT IS CALLED METADATA

# 7 A VERY DISCOURAGING EXPERIENCE WITH HISTORIC DATA
- THREE HISTORIC DATA SETS:
  - NUTRIENTS IN NORTHEAST LAKES: Larsen, D. P., N. S. Urquhart and D. Kugler (1995). Regional scale trend monitoring of indicators of trophic condition of lakes. Water Resources Bulletin 31:
  - E. COLI IN A RIVER BASIN IN OREGON
  - NUTRIENTS IN LAKES & STREAMS IN EPA REGION 10
- EMAP SURFACE WATERS: I THOUGHT THIS WAS WELL DOCUMENTED!

# 8 SO WHAT IS METADATA?
- THE BEST DEFINITION SEEMS TO BE ORGANIZED "DATA ABOUT DATA"
- VERY DIVERSE VIEWS ABOUT WHAT IT SHOULD CONTAIN:
  - LIBRARIANS
  - W3 GROUP: DEFINING FEATURES OF THE WORLD WIDE WEB { title, description, publication date and author }
  - CENSUS BUREAU TYPES
  - WORLDWIDE GEOGRAPHIC DATA STANDARDS
  - EPA's STORET

# 9 WHAT IS METADATA GOOD FOR?
- A librarian probably would answer:
  - Discovery
  - Managing the resource (ownership & responsibility)
  - ARCHIVING
  - AUTHENTICATING (QA/QC)
  - UNCHANGING OR GROWING
- This statistician answers:
  - For correctly analyzing data in the future
  - Not discovery, but correct utilization
  - Paths to related documents based on the same dataset

# 10 METADATA COMPONENTS IMPORTANT TO A PERSON ANALYZING THE DATA
- NAME OF DATASET
- DEFINITION OF RESPONSES EVALUATED
- MOTIVATING FACTORS
- INTERNAL FEATURES OF DATASET
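One way to make these four components concrete is to treat them as required fields of a record that travels with the data. The sketch below is a hypothetical illustration, not a STARMAP or EMAP format: the class name `MetadataRecord`, its field names, and the "empty means missing" rule are all assumptions chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical container for the four components an analyst needs."""
    dataset_name: str  # permanent, consistently used name
    # response -> reference to the protocol or lab method that defines it
    response_definitions: dict[str, str] = field(default_factory=dict)
    # study objectives and how/why sites were selected
    motivating_factors: str = ""
    # sub-dataset name -> why it was constructed
    internal_features: dict[str, str] = field(default_factory=dict)

    def missing_components(self) -> list[str]:
        """Return the names of components that are still empty."""
        missing = []
        if not self.dataset_name.strip():
            missing.append("dataset_name")
        if not self.response_definitions:
            missing.append("response_definitions")
        if not self.motivating_factors.strip():
            missing.append("motivating_factors")
        if not self.internal_features:
            missing.append("internal_features")
        return missing


# Usage with placeholder values (not real dataset or method names).
example = MetadataRecord(
    dataset_name="EXAMPLE_STREAM_SURVEY",
    response_definitions={"total_phosphorus": "lab method reference"},
)
print(example.missing_components())  # ['motivating_factors', 'internal_features']
```

A check like this cannot judge whether the metadata are *good*, only whether a component was skipped entirely.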

# 11 IMPORTANT METADATA COMPONENT: DATASET NAME
- IS THIS REALLY IMPORTANT? YES!
- IMPORTANT FINDINGS FROM A DATASET WILL BE PUBLISHED. WE NEED TO ADOPT A CONVENTION THAT THE DATASET NAME IS A KEYWORD.
  - The name needs to be permanent and consistently used.
- THEN FUTURE INVESTIGATORS CAN USE STANDARD SEARCH TOOLS TO FIND INFORMATION EXTRACTED FROM EACH DATASET.
- MUCH LONGER LIVED THAN WEB LINKS

# 12 IMPORTANT METADATA COMPONENT: DATASET NAME (continued)
- Filtering criteria for the data on which a publication is based:
  - Name of an existing named subset
  - Geographic/temporal subset
  - Response subset
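Recording the filtering criteria explicitly means a later investigator can rebuild exactly the subset a publication used. Below is a minimal sketch, assuming rows are plain dictionaries with hypothetical `site_id`, `region`, and `visit_date` keys; the class `SubsetCriteria` and its fields are illustrative, not an established standard.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SubsetCriteria:
    """Hypothetical record of how a published analysis filtered a named dataset."""
    dataset_name: str            # permanent name of the parent dataset
    region: Optional[str]        # geographic subset (None = all regions)
    start: Optional[date]        # temporal subset, inclusive (None = no lower bound)
    end: Optional[date]          # temporal subset, inclusive (None = no upper bound)
    responses: tuple[str, ...]   # response subset

    def apply(self, rows: list[dict]) -> list[dict]:
        """Keep rows that pass the geographic/temporal filters, and only the
        identifying columns plus the listed responses."""
        kept = []
        for row in rows:
            if self.region is not None and row["region"] != self.region:
                continue
            if self.start is not None and row["visit_date"] < self.start:
                continue
            if self.end is not None and row["visit_date"] > self.end:
                continue
            kept.append({k: row[k] for k in ("site_id", "visit_date", *self.responses)})
        return kept
```

Because the object is frozen and names its parent dataset, it can be stored next to the publication as a reproducible description of the subset rather than a prose paraphrase of it.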

# 13 IMPORTANT METADATA COMPONENT: DEFINITION OF RESPONSES EVALUATED
- USE IT TO DOCUMENT:
  - SITE SELECTION AND LOCATION
  - FIELD PROTOCOLS FOR GATHERING DATA & MATERIAL, e.g., Peck DV, Lazorchak JM, Klemm DJ, editors. EMAP Surface Waters: Western Pilot Study field operations manual for wadeable streams. Corvallis (OR): U.S. Environmental Protection Agency, Office of Research and Development. 275 p.
  - LABORATORY METHODS
  - QUALITY ASSURANCE/QUALITY CONTROL

# 14 IMPORTANT METADATA COMPONENT: MOTIVATING FACTORS
- WHAT WERE THE STUDY OBJECTIVES?
  - Scale: one page (perhaps a lot more in this context)
  - Specific objectives
  - Narrative on their origin
- WHY & HOW WERE THE SITES SELECTED?
  - From some population of sites (with restrictions)
  - Purposefully
- Good idea: make the whole study plan accessible

# 15 IMPORTANT METADATA COMPONENT: INTERNAL FEATURES OF DATASET
- LARGE DATASETS OFTEN CONSIST OF MANY SUB-DATASETS
  - E.G., THE EMAP MAHA DATA COLLECTION CONSISTS OF 42 SAS DATASETS
  - UNIQUE SITE IDENTIFICATION: TOGETHER WITH THE DATE OF THE SITE VISIT, IT UNIQUELY IDENTIFIES THE DATA
- Why was this subset of the data constructed?
- Who knows more about it?
- Which responses are in which datasets? Be careful that values are the same in each dataset.
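The two checks implied by this slide, that (site ID, visit date) really is a unique key and that a response carried in more than one sub-dataset has the same value in each, are easy to automate. A minimal sketch, assuming each sub-dataset is a list of dictionaries with hypothetical `site_id` and `visit_date` keys (dates as ISO strings here for simplicity):

```python
from __future__ import annotations

from collections import Counter


def duplicate_keys(rows: list[dict]) -> list[tuple]:
    """Return (site_id, visit_date) keys that occur more than once in one sub-dataset."""
    counts = Counter((r["site_id"], r["visit_date"]) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def inconsistent_values(a: list[dict], b: list[dict], response: str) -> list[tuple]:
    """Return keys present in both sub-datasets whose values for `response` differ."""
    lookup_b = {(r["site_id"], r["visit_date"]): r[response] for r in b if response in r}
    mismatches = []
    for r in a:
        key = (r["site_id"], r["visit_date"])
        if response in r and key in lookup_b and r[response] != lookup_b[key]:
            mismatches.append(key)
    return sorted(mismatches)


# Two hypothetical sub-datasets that both carry an "elevation" response.
chemistry = [
    {"site_id": "S1", "visit_date": "2000-07-01", "elevation": 120},
    {"site_id": "S2", "visit_date": "2000-07-03", "elevation": 85},
]
habitat = [
    {"site_id": "S1", "visit_date": "2000-07-01", "elevation": 120},
    {"site_id": "S2", "visit_date": "2000-07-03", "elevation": 58},
]
print(inconsistent_values(chemistry, habitat, "elevation"))  # [('S2', '2000-07-03')]
```

Exact equality is used here; floating-point responses would need a tolerance, which is a choice worth recording in the metadata itself.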

# 16 IMPORTANT METADATA COMPONENT: INTERNAL FEATURES OF DATASET (continued)
- Data dictionary
- Usable paths to definitions of variables
- METHODS USED TO DEAL WITH NONDETECTS, MISSING OR LOST DATA, ETC.
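A data-dictionary entry can carry both the path to a variable's definition and the information needed to handle nondetects. The sketch below shows one possible convention, assumed for illustration only: a result below the detection limit is stored as the detection limit with a `"nondetect"` flag, and a missing result is stored as `None` with a `"missing"` flag, so the two never collapse into the same value. The names `VariableDefinition`, `Observation`, `encode_result`, and the flag codes are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariableDefinition:
    """Hypothetical data-dictionary entry for one response variable."""
    name: str
    units: str
    method: str                            # path to the protocol/lab method defining it
    detection_limit: Optional[float] = None


@dataclass(frozen=True)
class Observation:
    value: Optional[float]                 # None when missing or lost
    flag: str                              # "ok", "nondetect", or "missing" (hypothetical codes)


def encode_result(defn: VariableDefinition, raw: Optional[float]) -> Observation:
    """Encode a raw lab result so nondetects and missing values stay distinguishable."""
    if raw is None:
        return Observation(None, "missing")
    if defn.detection_limit is not None and raw < defn.detection_limit:
        # Keep the detection limit (not zero) and flag it; how to treat
        # censored values is left to the analyst, but the information survives.
        return Observation(defn.detection_limit, "nondetect")
    return Observation(raw, "ok")
```

Whatever convention a study adopts, the point of the slide is that it be written down in the metadata, so a future analyst does not mistake a flagged detection limit for a measured value.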

# 17 THANK YOU FOR YOUR ATTENTION
Acknowledgement: Nancy Chaffin, Metadata Librarian, Morgan Library, Colorado State University
QUESTIONS AND/OR COMMENTS ARE WELCOME

