Science Metadata Viv Hutchison US Geological Survey

Presentation on theme: "Science Metadata Viv Hutchison US Geological Survey"— Presentation transcript:

1 Science Metadata Viv Hutchison US Geological Survey
Core Science Analytics Synthesis & Libraries Denver, CO 5th NACP Principal Investigator’s Meeting Washington, DC January 25, 2015

2 Presenter: Viv Hutchison
US Geological Survey Core Science Analytics Synthesis & Libraries program Branch Chief, Science Data Management Lead a team that works on application of the science data lifecycle for USGS scientists through best practices, tools, training Background as a scientist, not as a data manager. My perspective is that of someone who is compiling data to use and share. ORNL, Oak Ridge, TN

3 Topics Why metadata? Examples of metadata standards and how to choose one to use Tips on how to write quality metadata records Publishing metadata In this segment of the course we will cover: What is metadata? What are examples of metadata in our daily lives? And what information needs to be included in a metadata record? CC image by Alec Couros on Flickr

4 The Data Life Cycle
Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze

5 Data Collection
Data collection in the field is recorded in a wide variety of ways, including field notebooks, streaming data from satellites, data created from models, etc. CC images by Justin See, CIMMYT, SEDAC, acordova, ISAS, and kukkurovaca on Flickr

6 From Field Notes to Datasets
Average Temperature of Observation for Each Species
Species | Average Temperature | Temperature Standard Deviation | Number of Observations | Minimum Temperature | Maximum Temperature
Northern Red-legged Frog 4.4 --- 1
Tailed Frog 7.0 3.0 3 4 10
Arizona Toad 10.0
Strecker's Chorus Frog 10.5 2.0 11 9 16
Oregon Spotted Frog 11.0 15.5 2 22
New Jersey Chorus Frog 11.5 4.5 17
Wood Frog 12.5 5.5 897 28.8
Spring Peeper 13.2 5.6 569 -1 32
13.3 5.9 27
After returning from the field, scientists will transfer field notes into spreadsheets and other types of databases in preparation for their data analysis. Displayed here is a partial copy of a data set taken from the website “Frog Watch”. Notice there is no indication of Celsius or Fahrenheit in the “temperature” column. This is a simple example of how difficult it is to understand a dataset without all of the information.
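One way to catch the missing-unit problem described above is to keep a small data dictionary alongside the dataset and check it before sharing. The sketch below is only an illustration: the `Attribute` class, its fields, and the `attributes_missing_units` helper are hypothetical names, not part of any metadata standard or of the Frog Watch dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attribute:
    """One column of a dataset, described for a data dictionary (hypothetical structure)."""
    name: str
    definition: str
    unit: Optional[str] = None   # e.g. "degrees Celsius"; left as None when not yet documented
    unitless: bool = False       # True for columns such as names or categories


def attributes_missing_units(attributes: List[Attribute]) -> List[str]:
    """Return the names of columns that should carry a unit but do not."""
    return [a.name for a in attributes if not a.unitless and not a.unit]


# Column names come from the table above; the definitions are illustrative.
columns = [
    Attribute("Species", "Common name of the observed species", unitless=True),
    Attribute("Average Temperature", "Mean temperature at time of observation"),
    Attribute("Number of Observations", "Count of observations", unit="count"),
]

print(attributes_missing_units(columns))
```

Run against the columns above, the check flags "Average Temperature", which is exactly the gap a reader of the table cannot resolve alone.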

7 From Datasets to Published Papers
CC image by Heather Kennedy on Flickr Once scientists have collected and analyzed data, they publish their conclusions in appropriate science journals.

8 Metadata is a critical part of the data picture
CC image by I like on Flickr This is an example of a metadata record using the Federal Geographic Data Committee (FGDC) standard.

9 Why Care About Metadata?
Fourth Paradigm: scientific breakthroughs will increasingly be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets. “Metadata must be preserved when scientific data is generated…” -- Jim Gray, The Fourth Paradigm. The greater the distance in time and space between the data producer and re-use, the more detailed the metadata that is required.

10 Metadata: Why Care? “Please forgive my paranoia about protocols, standards, and data review. I'm in the latter stages of a long career with USGS (30 years, and counting), and have experienced much. Experience is the knowledge you get just after you needed it. Several times, I've seen colleagues called into court in order to testify about conditions they have observed. Without a strong tradition of constant review and approval of basic data, they would've been in deep trouble under cross-examination. Instead, they were able to produce field notes, data approval records, and the like to back up their testimony. It's one thing to be questioned by a college student who is working on a project for school. It's another entirely to be grilled by an attorney under oath with the media present.” Nelson Williams Eastern Region USGS Water

11 Metadata: Why Care? Senior climatologists were accused of manipulating important global temperature data. The climate scientists at the centre of a media storm over leaked emails were yesterday cleared of accusations that they fudged their results and silenced critics, but a review found they had failed to be open enough about their work. Investigations emphasized the need for data to be more open to ensure credibility and avoid future misguided controversy. Metadata aids in open science.

12 “Planet hidden in Hubble archives” Science News (Feb. 27, 2009)
Metadata: Why Care? A new image processing technique reveals something not before seen in this Hubble Space Telescope image taken 11 years ago: A faint planet (arrows), the outermost of three discovered with ground-based telescopes last year around the young star HR 8799. (D. Lafrenière et al., Astrophysical Journal Letters) “The first thing it tells you is how valuable maintaining long-term archives can be. Here is a major discovery that’s been lurking in the data for about 10 years!” comments Matt Mountain, director of the Space Telescope Science Institute in Baltimore, which operates Hubble. …Metadata is critical in maintaining data in archives – for understanding data you discover

13 The Value of Metadata
Metadata helps: data developers, data users, organizations. Metadata is useful to Data Users, Data Developers, and Organizations. In this era of data sharing, collaboration, and need for information organization, metadata can serve multiple purposes.

14 What is the Value to Data Developers?
Metadata allows data developers to: avoid data duplication; share reliable information; publicize efforts (promote the work of a scientist and his/her contributions to a field of study); reduce workload. CC image by US Embassy Guyana on Flickr. What value does metadata have to Data Developers? Metadata records will help avoid data duplication because researchers can determine if data already exists. Scientists are able to share reliable information about a dataset by creating metadata and passing it along with the dataset. Scientists wishing to reuse a dataset can be confident of its origins, data quality, and other valuable information about the data. Metadata also allows data creators to publicize the valuable data they have collected by making the metadata available on clearinghouses and other publicly available venues. Metadata can be used in citation practices, thus increasing the visibility of the data.

15 What is the Value to Data Users?
Metadata gives a user the ability to: search, retrieve, and evaluate data set information from both inside and outside an organization; find data (determine what data exists for a geographic location and/or topic); determine applicability (decide if a data set meets a particular need); discover how to acquire the dataset you identified, then process and use it. CC image by ASEE on Flickr. Metadata allows the user to search for and access data from a variety of sources. A search for metadata can be restricted to a geographic boundary, thus showing the user what data has been collected in a particular region. Metadata records help users determine whether the data will be applicable for use in a particular study. Finally, metadata records are of value to data users because they explain how a dataset can be acquired and whether there are any restrictions on how the data can be used.

16 What is the Value to Organizations?
Metadata helps ensure an organization’s investment in data: documentation of data processing steps, quality control, definitions, data uses, and restrictions; ability to use data after its initial intended purpose. Transcends people and time: offers data permanence; creates institutional memory. Advertises an organization’s research: creates possible new partnerships and collaborations through data sharing. CC image by mambol on Flickr. An organization that keeps current metadata can benefit in many ways. Metadata records help ensure the organization’s investment in the data by retaining information about how the data was collected, processed, and quality controlled. This creates a permanent record of the dataset, which is critical institutional memory. When researchers leave or retire, metadata allows the dataset to “live on” for the organization. The data may be reused in another research project in the future, and future researchers in the organization will need to know how the dataset was created. Finally, metadata advertises an organization’s research, creating new potential partnerships and collaboration through data sharing.

17 When data isn’t well managed…
[Figure: Information Content versus Time, with labels: Time of publication, Specific details, General details, Retirement or career change, Accident, Death (Michener et al. 1997)] This graph illustrates the phenomenon of “information entropy” associated with research. At the time of the research project, a scientist’s memory is fresh. Details about the development of the dataset are easily recalled, and it is a good time to document information about the process. Over time, memory of the details begins to fade. A variety of circumstances can intervene, and eventually detailed knowledge about the dataset fades. Without a metadata record, this data might be unusable. A dataset is not considered complete without a metadata record to accompany it.

18 Why? Memory Check 50% change in global average
i checked my archives, and here is what i found out: it appears that the current 3rd generation algorithm was implemented into operations around Oct-Nov 2002 time frame. cannot say more precisely, as all correspondence i am looking at, talks about this indirectly. (maybe it's what's refered to as the Phase II algorithm.) At the same time, we had implemented quite a few other changes fixing data bugs and formats: view angle problem, increased digitization in all channel's reflectances and AODs, etc. The jump is deemed due to introducing 3rd generation algorithm, which replaced the 2nd generation. The new numbers (~0.08) look more realistic than the previous ones (~0.05 or so). The changes seen in the data is close to the expected effect of this change. The 3rd gen alg takes into account the exact spectral response, whereas the 2nd gen is generic ("one size fits all"). hopefully this settles the issue..

19 Information Entropy
Sound information management, including metadata development, can arrest the loss of dataset detail. [Figure: data details versus time] Sound data management is best achieved by making metadata creation a part of the workflow. Not only can it keep the individual scientist organized, but the data also has a much better chance of being re-used by future scientists.

20 Still…There are Occasional Concerns About Creating Metadata
Even if the value of data documentation is recognized, concerns remain as to the effort required to create metadata that effectively describes the data. CC image by waterlilysage on Flickr

21 Let’s Address these Concerns…
Concern | Solution
workload required to capture accurate robust metadata | incorporate metadata creation into data development process; distribute the effort
time and resources to create, manage, and maintain metadata | include in grant budget and schedule
readability / usability of metadata | use a standardized metadata format
discipline specific information and ontologies | use ‘profile’ standard to require specific information and use specific values
Metadata does require time and effort to create. The workload, however, is reduced when metadata creation is incorporated into the data development process and the effort is distributed among data contributors. Metadata creation and management should be treated as a standard data development procedure and resources for staff and time should be included in project and proposal work plans and budgets. The use of a standardized metadata format and the development of discipline specific ‘profiles’ of metadata can enable data users to quickly find needed information and address data developer concerns about metadata use and comprehension.

22 Selecting a Standard

23 Choosing a Metadata Standard
Many standards collect similar information. Factors to consider: Your data type: Are you working mainly with GIS data? Raster/vector or point data? Do you have biological or shoreline information in your dataset? If so, consider the FGDC Content Standard for Digital Geospatial Metadata with one of its profiles: the Biological Data Profile or the Shoreline Data Profile. Are you working with data retrieved from instruments such as monitoring stations or satellites? Are you using geospatial data services such as web-mapping applications or data modeling? If so, then consider using the ISO standard. Are you mainly working with ecological data? Consider Ecological Metadata Language (EML).

24 Choosing a Metadata Standard
More factors to consider: Your organization’s policies: do they state which standard to use? What tools are available to create metadata? Examples of tools: FGDC CSDGM: Mermaid (NOAA), Metavist (Forest Service), Online Metadata Editor (USGS). EML: Morpho (KNB). ISO: XML Spy or Oxygen, CatMD. Other factors: availability of human support; instructional materials; use of controlled vocabularies; output formats.

25 Writing Quality Metadata

26 Steps to Create Quality Metadata
Organize your information. Did you write a project abstract to obtain funding for your proposal? Re-use it in your metadata! Did you use a lab notebook or other notes during the data development process that define measurements and other parameters? Do you have the contact information for colleagues you worked with? What about citations for other data sources you used in your project? Steps in creating quality metadata include the following: Organize your information. Before you begin, gather your resources, particularly anything you may have already written about the dataset for another purpose. For example, a grant proposal that has a well-written abstract and purpose for the research is a great resource. Write your metadata. Review the record for accuracy and completeness. Ask someone else to read your record. Revise your information based on comments from your reviewer, then review it once more before you publish it. CC image by on Google Images

27 Steps to Create Quality Metadata
Write your metadata using a metadata tool. Submitting to the DAAC? A metadata creation process is in place for you.

28 Steps to Create Quality Metadata
Review for accuracy and completeness. Have someone else read your record. Revise the record, based on comments from your reviewer. Review once more before you publish. CC images by Shelly Munkberg and mujalifah on Flickr

29 Tips for Writing Quality Metadata
Do not use jargon; define technical terms and acronyms: CA, LA, GPS, GIS: what do these mean? Clearly state data limitations, e.g., data set omissions, completeness of data. Express considerations for appropriate re-use of the data. Use “none” or “unknown” meaningfully: None usually means that you knew about data and nothing existed (e.g., a “0” cubic feet per second discharge value). Unknown means that you don’t know whether that data existed or not (e.g., a null value). CC image by kruuscht on Flickr. Think about the long-term effects of writing good metadata. Avoid using jargon and take the time to define all technical terms and acronyms. Clearly state data limitations; this may include any omissions to the data, or how complete the dataset is based on the data collection parameters. Define the use of none or unknown: None usually means that you knew about data but nothing existed. Unknown means you don’t know whether that data existed or not.
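The distinction between “none” and “unknown” maps naturally onto the difference between a measured zero and a null value. The following is a minimal sketch of that idea, assuming discharge values are held as numbers with `None` standing in for a null; the function name `describe_discharge` and the wording of its output are hypothetical.

```python
from typing import Optional


def describe_discharge(value: Optional[float]) -> str:
    """Describe a discharge measurement in words suitable for a metadata record."""
    if value is None:
        # Unknown: there is no way to tell whether a measurement ever existed.
        return "unknown"
    if value == 0:
        # None: the measurement was taken and nothing was there to measure.
        return "none (0 cubic feet per second measured)"
    return f"{value} cubic feet per second"


for reading in (None, 0, 12.5):
    print(repr(reading), "->", describe_discharge(reading))
```

Keeping the two cases separate in the data itself makes it much easier to state them accurately in the metadata later.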

30 Tips for Writing Quality Metadata
Titles, Titles, Titles… Titles are critical in helping readers find your data. While individuals are searching for the most appropriate data sets, they are most likely going to use the title as the first criterion to determine if a dataset meets their needs. Treat the title as the opportunity to sell your dataset. A complete title includes: What, Where, When, Who, and Scale. An informative title includes: topic, timeliness of the data, specific information about place and geography. Titles are critical in helping researchers find data. While searching for appropriate datasets to include in their research, a researcher is most likely to use the title as the first criterion in determining if a dataset meets their needs. This enables you to treat the title as an opportunity to sell your dataset! A complete title includes the What, Where, When, Who, and Scale about the data. A more informative title will also include topic, timeliness of the data, specific information about place and geography.

31 Tips for Writing Quality Metadata
A Clear Choice: Which title is better? “Rivers” OR “Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps ( )”. Breakdown: Greater Yellowstone (where), Rivers (what), from 1:126,700 (scale), U.S. Forest Service (who), Visitor Maps ( ) (when). CC image by dolfi on Flickr. This example illustrates the importance of descriptive titles in metadata records. The title “Greater Yellowstone Rivers from 1:126,720 Forest Visitor Maps ( )” gives enough detail for a reader to discern whether they might like more information about your data from your metadata record.
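The What, Where, When, Who, and Scale checklist can be applied mechanically before a title is finalized. The sketch below assumes the title's ingredients are collected in a plain dictionary; the key names and the `missing_title_elements` helper are hypothetical and only illustrate the checklist.

```python
from typing import Dict, List, Optional

# The five elements of a complete title, as listed on the slide.
TITLE_ELEMENTS = ("what", "where", "when", "who", "scale")


def missing_title_elements(parts: Dict[str, Optional[str]]) -> List[str]:
    """Return the checklist elements that are absent or blank in `parts`."""
    missing = []
    for name in TITLE_ELEMENTS:
        value = parts.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# The title "Rivers" supplies only the "what".
print(missing_title_elements({"what": "Rivers"}))
```

For the one-word title, every element except "what" is reported as missing, which is why a reader cannot judge the dataset from it.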

32 Tips for Writing Quality Metadata
Be specific and quantify when you can! The goal of a metadata record is to give the user enough information to know if they can use the data without contacting the dataset owner. Vague: We checked our work and it looks complete. Specific: We checked our work using a random sample of 5 monitoring sites reviewed by 2 different people. We determined our work to be 95% complete based on these visual inspections. CC image by PNASH on Flickr. One goal of a metadata record is to give a reader enough information about your data that s/he could re-use it without contacting you, the dataset owner. For example, “We checked our work and it looks complete” is too vague for a reader to assess quality control on the dataset. More specific language gives the reader more information about how the data was collected and analyzed.

33 Tips for Writing Quality Metadata
Use descriptive and clear writing Fully qualify geographic locations Select keywords wisely - use thesauri for keywords whenever possible Example: USGS Biocomplexity Thesaurus (over 9,500 terms) CC image by Marco Arment on Flickr Select your keywords wisely. Think about the many ways someone might search for your data. Use descriptive and clear writing. Fully document geographic locations. Use thesauri whenever possible for keywords. Keywords are essential for locating records in clearinghouses quickly and efficiently. Use of standard thesauri, such as the USGS Biocomplexity Thesaurus, makes selecting keywords easier, and helps keep records consistent in content.

34 Tips for Writing Quality Metadata
Remember: a computer will read your metadata Do not use symbols that could be misinterpreted: Examples: # % { } | / \ < > ~ Do not use tabs, indents, or line feeds/carriage returns When copying and pasting from other sources, use a text editor (e.g., Notepad) to eliminate hidden characters When creating your metadata, keep in mind that a computer will read your metadata record. Therefore, it is important not to use tabs, indents, and special characters because they can be misunderstood by a computer. If you are copying and pasting content from other sources into your metadata record it is prudent to use a text editor as a middle step to prevent applications from adding in unnecessary characters in the background of your text. CC image by Ben on Google Images
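The clean-up step described above can also be scripted when many fields need checking. The following is a rough sketch using only the Python standard library; the function names are hypothetical, and the set of risky symbols is simply the list of examples from the slide, not a rule from any metadata standard.

```python
import re
import unicodedata
from typing import List

# Symbols the slide lists as open to misinterpretation.
RISKY_SYMBOLS = set("#%{}|/\\<>~")


def clean_metadata_text(text: str) -> str:
    """Replace tabs and line breaks with spaces and drop hidden characters."""
    # Tabs, line feeds, carriage returns and other whitespace become one space.
    text = re.sub(r"\s+", " ", text)
    # Remove remaining control (Cc) and invisible format (Cf) characters,
    # such as the zero-width space that word processors sometimes insert.
    text = "".join(ch for ch in text if unicodedata.category(ch) not in ("Cc", "Cf"))
    # Collapse any double spaces left behind by the removal, then trim.
    return re.sub(r" {2,}", " ", text).strip()


def find_risky_symbols(text: str) -> List[str]:
    """Return the risky symbols present in `text`, for a human to review."""
    return sorted(set(text) & RISKY_SYMBOLS)


print(clean_metadata_text("Stream\tflow\r\n data\u200b "))
print(find_risky_symbols("50% < x"))
```

The symbols are only reported, not removed, because whether a given `%` or `/` is a problem depends on the field and the tool that will read it.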

35 Tips for Writing Quality Metadata
Fully define entities, attributes, units of measure. Ignore the temptation to only fill in mandatory fields in the standard: sections of a metadata standard labeled “mandatory if applicable” or “optional” are often critical portions of the standard. Example: Seven Major Metadata Sections: Section 1 - Identification Information*; Section 2 - Data Quality Information; Section 3 - Spatial Data Information; Section 4 - Spatial Reference Information; Section 5 - Entity and Attribute Information; Section 6 - Distribution Information; Section 7 - Metadata Information*. Three Supporting Sections: Section 8 - Citation Information*; Section 9 - Time Period Information*; Section 10 - Contact Information*. (* Minimum required metadata)
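As a rough illustration of the completeness point above, a record can be checked against the ten sections listed on the slide, separating the starred minimum from the rest. The sketch assumes a record is held as a plain dictionary keyed by section name; that layout and the `check_record` helper are hypothetical and do not reflect the actual XML element names of any standard.

```python
from typing import Dict, List

# Starred on the slide: the minimum required metadata.
REQUIRED_SECTIONS = (
    "Identification Information",
    "Metadata Information",
    "Citation Information",
    "Time Period Information",
    "Contact Information",
)

# Not starred, but often critical for anyone re-using the data.
OTHER_SECTIONS = (
    "Data Quality Information",
    "Spatial Data Information",
    "Spatial Reference Information",
    "Entity and Attribute Information",
    "Distribution Information",
)


def check_record(record: Dict[str, object]) -> Dict[str, List[str]]:
    """Report which sections are missing or empty in a record."""
    present = {name for name, content in record.items() if content}
    return {
        "missing_required": [s for s in REQUIRED_SECTIONS if s not in present],
        "missing_other": [s for s in OTHER_SECTIONS if s not in present],
    }


draft = {
    "Identification Information": {"title": "placeholder title"},
    "Contact Information": {},  # present but empty, so it counts as missing
}
report = check_record(draft)
print(report["missing_required"])
print(report["missing_other"])
```

Reporting the non-required sections separately keeps them visible, so that filling in only the starred sections is a deliberate choice and not an oversight.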

36 Share Your Metadata: Distribution
Share your metadata with other researchers. Examples of metadata search portals: DAAC: Distributed Active Archive for Biogeochemical Dynamics; Data.gov: Federal e-gov geospatial data portal; Metacat: Repository for data and metadata; DataONE: NSF-funded data infrastructure. Geospatial data portals are plentiful, and contain easily accessible metadata collections from a variety of institutions.

37 DAAC Search

38 Summary Metadata is documentation of data
A metadata record captures critical information about the content of a dataset. Metadata allows data to be discovered, accessed, and re-used. A metadata standard provides structure and consistency to data documentation. Standards and tools vary; select according to defined criteria such as data type, organizational guidance, and available resources. Metadata is of critical importance to data developers, data users, and organizations. Writing quality metadata is important because records are expected to last with the data over decades. Metadata completes a dataset. Creating robust metadata is in your OWN best interest!

