“Scientists seeking data should be able to efficiently and reliably locate LTER datasets through searching, browsing …” • Get feedback on general direction.



2
“Scientists seeking data should be able to efficiently and reliably locate LTER datasets through searching, browsing …”
• Get feedback on general direction of working group activities
• Resolve some specific issues
• Decide on “Next Steps”
• Products
  - Comments to be acted on
  - White paper concerning specific issues and “next steps”

3
Time       Activity
9:00 AM    Introductions, Review of Agenda
9:15 AM    Introduction to the LTER Controlled Vocabulary – Past and Future
10:00 AM   Break
10:15 AM   Discussion: Locating LTER Data – around-the-room experiences
             - What are your experiences with finding LTER data?
             - What would be most helpful in finding data in the future?
             - Review of “use cases”
11:15 AM   Tour of draft LTER Controlled Vocabulary
12 noon    Lunch
1:30 PM    Feedback to entire group on things in the controlled vocabulary that need improvement
             - Things to be removed
             - Things to be added
             - Things to be reorganized
2:30 PM    Break
2:45 PM    Discussion of specific issues
             - Core areas
             - Are related terms needed, or is a hierarchy sufficient?
             - Management of the vocabulary – role of researchers
3:00 PM    Next Steps
             - How do we engage the larger LTER community?
             - How much, and what sort of engagement is needed?
4:00 PM    Adjourn

4
• Eclectic use of terms for discovering LTER data makes it difficult to perform reliable or efficient searches
  - Often several terms for one concept
    - One site uses CO2, another Carbon Dioxide, another Carbon-dioxide
    - Carbon to Nitrogen Ratio, C:N, C:N Ratio, Carbon-to-nitrogen Ratio
  - No way to relate broader terms with narrower terms
    - Searching on “Landscape Change” doesn’t find datasets related to “desertification”, even though desertification is a kind of landscape change
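A synonym ring is the simplest structure that addresses the first problem on this slide. The sketch below is a hypothetical illustration, not an existing LTER tool: it assumes each group of variants has already been assigned one preferred term (the preferred forms chosen here are arbitrary) and matches variants case-insensitively.

```python
def build_synonym_index(rings: dict[str, list[str]]) -> dict[str, str]:
    """Map the case-folded form of each preferred term and each variant to the preferred term."""
    index: dict[str, str] = {}
    for preferred, variants in rings.items():
        index[preferred.casefold()] = preferred
        for variant in variants:
            index[variant.casefold()] = preferred
    return index


def normalize_keyword(keyword: str, index: dict[str, str]) -> str:
    """Return the preferred form of a keyword, or the keyword unchanged if it is in no ring."""
    return index.get(keyword.strip().casefold(), keyword)


# Hypothetical rings built from the variants listed on this slide.
# Which variant counts as "preferred" is an assumption for illustration only.
RINGS = {
    "Carbon Dioxide": ["CO2", "Carbon-dioxide"],
    "Carbon to Nitrogen Ratio": ["C:N", "C:N Ratio", "Carbon-to-nitrogen Ratio"],
}
INDEX = build_synonym_index(RINGS)

if __name__ == "__main__":
    for kw in ["CO2", "carbon dioxide", "C:N Ratio", "Desertification"]:
        print(f"{kw!r} -> {normalize_keyword(kw, INDEX)!r}")
```

A synonym ring alone does not solve the second problem: “Desertification” passes through unchanged, because relating it to “Landscape Change” requires a broader/narrower hierarchy.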

5
Source                 Number of terms   Number used at 5 or more sites   Most frequently used
EML Keywords*          2,711             86                               LTER (1002), Temperature (701)
EML Titles             2,480             921                              And (768), Data (394), LTER (350)
DTOC Keywords*         2,774             103                              ARC (1645), Temperature (732)
Bibliography Titles    13,538            1,855                            Of (12,611), Forest (2,050)
* Allows multi-word terms
Only 3.2% of EML keywords (86 of 2,711) are used at 5 or more sites!
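The 3.2% figure is simply 86 / 2,711. The sketch below reproduces the percentage for each row of the table and adds a hypothetical helper showing how the “used at 5 or more sites” count could be derived from raw data, assuming a mapping from each term to the set of sites that use it.

```python
# Counts transcribed from the table above: (total terms, terms used at 5+ sites).
COUNTS = {
    "EML Keywords": (2711, 86),
    "EML Titles": (2480, 921),
    "DTOC Keywords": (2774, 103),
    "Bibliography Titles": (13538, 1855),
}


def shared_fraction(total: int, shared: int) -> float:
    """Percentage of distinct terms that are used at 5 or more sites."""
    return 100.0 * shared / total


def terms_used_at_min_sites(term_sites: dict[str, set[str]], min_sites: int = 5) -> list[str]:
    """Hypothetical helper: terms whose set of using sites has at least min_sites members."""
    return sorted(term for term, sites in term_sites.items() if len(sites) >= min_sites)


if __name__ == "__main__":
    for source, (total, shared) in COUNTS.items():
        print(f"{source}: {shared_fraction(total, shared):.1f}%")
```

For the four rows this gives roughly 3.2%, 37.1%, 3.7% and 13.7%; the high share for titles comes from common words such as “And” and “Of”, not from shared vocabulary.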

6
• We started off by surveying what terms were already being used in a variety of LTER documents
• Our goal was to see if there were any existing lexical resources that we could simply adopt

7
58% of LTER terms were not found in the NBII Thesaurus. These results suggested that we needed to develop our own resource.
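A coverage check of this kind can be sketched as a set comparison. This is an illustration under stated assumptions, not the procedure the group used: it treats a term as “found” only on an exact match after case-folding, and the term lists are hypothetical stand-ins for LTER keywords and an external thesaurus such as NBII.

```python
def missing_from_thesaurus(site_terms: list[str], thesaurus_terms: list[str]) -> tuple[float, list[str]]:
    """Return (fraction of distinct site terms not in the thesaurus, sorted missing terms).

    Matching is exact after case-folding (an assumption; a real survey would
    likely also need human judgment for near-matches).
    """
    known = {t.casefold() for t in thesaurus_terms}
    distinct = {t.casefold(): t for t in site_terms}  # one representative per case-folded term
    if not distinct:
        raise ValueError("no site terms supplied")
    missing = sorted(original for key, original in distinct.items() if key not in known)
    return len(missing) / len(distinct), missing


if __name__ == "__main__":
    # Hypothetical example data.
    site = ["Temperature", "Soil Moisture", "ANPP", "temperature"]
    thesaurus = ["temperature", "soil moisture"]
    fraction, missing = missing_from_thesaurus(site, thesaurus)
    print(f"{fraction:.0%} missing: {missing}")
```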

8
• Identify a list of preferred terms that would be used by sites in creating metadata documents
• Focus on LTER-wide searches
  - Want to facilitate cross-site synthesis
  - People searching LTER Metacat rather than individual sites are interested in relevant data from multiple sites
• Want to hit the “sweet spot” for the number of terms
  - Too many terms make keywording documents difficult and result in searches that return too few datasets
  - Too few terms make it hard to narrow results to a usably small number of datasets

9
• Assembled list of words already in LTER metadata (EML documents)
• Selected using criteria:
  - Keywords shared with GCMD and NBII, or
  - Keywords used at more than one LTER site
• Reviewed by Information Managers
  - Removals and additions were suggested
• Edited based on voting
• Created a draft set of taxonomies
  - Included some additions and deletions
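The two selection criteria can be expressed as a simple filter. This is a sketch with hypothetical inputs: it reads “shared with GCMD and NBII” as appearing in both vocabularies (the slide’s wording could also mean either one), matches case-insensitively, and assumes a mapping from each keyword to the sites that use it.

```python
def select_candidates(
    keyword_sites: dict[str, set[str]],
    gcmd_terms: list[str],
    nbii_terms: list[str],
) -> list[str]:
    """Keep keywords found in both GCMD and NBII, or used at more than one LTER site."""
    gcmd = {t.casefold() for t in gcmd_terms}
    nbii = {t.casefold() for t in nbii_terms}
    selected = []
    for keyword, sites in keyword_sites.items():
        key = keyword.casefold()
        in_both = key in gcmd and key in nbii  # interpretation of "shared with GCMD and NBII"
        multi_site = len(sites) > 1
        if in_both or multi_site:
            selected.append(keyword)
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical keywords and site labels.
    usage = {
        "Temperature": {"SITE_A", "SITE_B"},
        "Ozone": {"SITE_C"},
        "Aboveground": {"SITE_D"},
    }
    print(select_candidates(usage, gcmd_terms=["ozone"], nbii_terms=["Ozone"]))
```

The output of a filter like this is only the starting list; the review and voting steps above still decide what stays.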

10
• Goal: Improve searching & browsing
  - Reliability (of all the suitable target documents, what percentage did you find?)
  - Efficiency (of the documents your search returned, what percentage were suitable?)
• A list alone is not sufficient to support browsing and sophisticated searching of data – more structure is needed
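In information-retrieval terms, “reliability” as defined here is recall and “efficiency” is precision. A minimal sketch, assuming search results and the set of truly suitable datasets are both available as sets of (hypothetical) dataset identifiers:

```python
def reliability(returned: set[str], suitable: set[str]) -> float:
    """Recall: share of all suitable documents that the search found."""
    if not suitable:
        raise ValueError("reliability is undefined when no suitable documents exist")
    return len(returned & suitable) / len(suitable)


def efficiency(returned: set[str], suitable: set[str]) -> float:
    """Precision: share of returned documents that were suitable."""
    if not returned:
        raise ValueError("efficiency is undefined when the search returned nothing")
    return len(returned & suitable) / len(returned)


if __name__ == "__main__":
    # Hypothetical dataset IDs.
    suitable = {"ds1", "ds2", "ds3", "ds4"}
    returned = {"ds1", "ds2", "ds9"}
    print(f"reliability={reliability(returned, suitable):.2f}, efficiency={efficiency(returned, suitable):.2f}")
```

Missing a synonym or a narrower term (the “desertification” case on slide 4) lowers reliability; overly broad terms that match unrelated datasets lower efficiency.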

11
[Figure: List, Synonym Ring, Taxonomy, Thesaurus, Ontology, in order of increasing complexity]
Multiple taxonomies are a polytaxonomy.

12
• Relationships should be independent of context
  - Must pass the “some-not-all” test
• Each taxonomy should include only one type of entity (listed in Z39.19 section 6.3.2):
  - Things and their physical parts (birds, trees, leaves)
  - Materials (wood, nitrogen, sand)
  - Activities or processes (acidification, production)
  - Events or occurrences (germination, death)
  - Properties or states of persons, things, materials or actions (age, speed, nitrogen content)
  - Disciplines or subject fields (ecology, ornithology)
  - Units of measurement (m, km, miles)
  - Unique entities (LTER, HJ Andrews Forest)
• You can get into trouble if you start “mixing and matching” things within a single taxonomy!

13
Good:
  Forests
    Boreal Forest
    Hardwood Forest
  Grassland
    Tallgrass Prairie
  Tundra
  OK – these are all the same type of entity; all are THINGS.
Bad:
  Forests
  Fire
  Ecology
  Mixing THINGS and PROCESSES and DISCIPLINES.
Good:
  Rodents
    Mice
    Rats
  OK – not dependent on context. Mice and rats are ALWAYS rodents.
Bad:
  Desert Plants
    Cacti
    Grasses
  Problem: context dependent. Not all cacti or grasses are desert plants; some occur in other systems. Fails the “some-not-all” test.
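Both rules can be partly checked mechanically. The sketch below is illustrative only: the entity-type labels and the instance sets are hypothetical, and the “some-not-all” check can only expose a failure when a known counterexample (here, a cactus that is not a desert plant) is in the data; it cannot prove a relationship is context-free, which remains a human judgment.

```python
def is_single_type(terms: list[str], type_of: dict[str, str]) -> bool:
    """True when all terms in a taxonomy share one entity type (slide 12 rule)."""
    return len({type_of[t] for t in terms}) == 1


def passes_some_not_all(broader_members: set[str], narrower_members: set[str]) -> bool:
    """True when every known instance of the narrower term is also an instance of the broader term."""
    return narrower_members <= broader_members


# Hypothetical entity-type labels for the terms on this slide.
TYPE_OF = {
    "Forests": "thing", "Grassland": "thing", "Tundra": "thing",
    "Fire": "process", "Ecology": "discipline",
}

# Hypothetical instance sets; placeholder names, not real species lists.
RODENTS = {"mouse_1", "rat_1", "squirrel_1"}
MICE_AND_RATS = {"mouse_1", "rat_1"}
DESERT_PLANTS = {"cactus_A", "grass_A"}
CACTI = {"cactus_A", "cactus_B"}  # cactus_B stands for a cactus growing outside deserts

if __name__ == "__main__":
    print(is_single_type(["Forests", "Grassland", "Tundra"], TYPE_OF))  # all THINGS
    print(is_single_type(["Forests", "Fire", "Ecology"], TYPE_OF))      # mixed types
    print(passes_some_not_all(RODENTS, MICE_AND_RATS))
    print(passes_some_not_all(DESERT_PLANTS, CACTI))
```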

14
• The VOCAB Working Group has created a draft set of 10 taxonomies containing 713 terms
  - Includes additional “broader” terms needed for grouping
  - Includes synonyms (non-preferred terms)
• Some terms originally in the list have been removed because they were perceived to be too ambiguous or context-sensitive to be useful for searching or browsing
  - E.g., “Aboveground”
• Some “related” terms have also been identified

15
• In 2010 a request for information was forwarded to the LTER Executive Board:
  - “The Information Management Committee has studied how keywords are used at LTER sites, how LTER keywords relate to external lexographical resources, and compiled a draft keyword [list]. We request guidance from the LTER Executive Board on how a controlled vocabulary might be implemented within the context of LTER to improve the reliability of data searches.”
• The EB generally endorsed the idea of an LTER Controlled Vocabulary and agreed to help have scientists participate in vetting the list and deciding on next steps (THIS WORKSHOP)

16
• Permit use of a browse interface
• Make searches more sophisticated
  - See “Use Case” for searching
  - Search includes synonyms plus narrower terms and/or related terms
• Develop tools to help in adding keywords to LTER metadata documents
  - Prototype versions of a couple are already available
  - See keywording “Use Case”
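Query expansion of the kind described here (“synonyms plus narrower terms and/or related terms”) can be sketched over a small thesaurus-like structure. The `Entry` class, the vocabulary contents, and the choice to follow narrower terms transitively but related terms only one step are all assumptions for illustration, not the design of any existing LTER search tool.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical thesaurus entry for one preferred term."""
    synonyms: set[str] = field(default_factory=set)  # non-preferred variants
    narrower: set[str] = field(default_factory=set)  # preferred terms one level down
    related: set[str] = field(default_factory=set)   # associative links


def expand_query(term: str, vocab: dict[str, Entry], include_related: bool = False) -> set[str]:
    """Return the term, its synonyms, all transitively narrower terms (with their
    synonyms) and, optionally, directly related terms with their synonyms."""
    expanded: set[str] = set()
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:  # guards against cycles in a malformed hierarchy
            continue
        seen.add(current)
        expanded.add(current)
        entry = vocab.get(current)
        if entry is None:
            continue
        expanded |= entry.synonyms
        stack.extend(entry.narrower)
    if include_related and term in vocab:
        for rel in vocab[term].related:
            expanded.add(rel)
            if rel in vocab:
                expanded |= vocab[rel].synonyms
    return expanded


# Hypothetical vocabulary using the slide 4 example.
VOCAB = {
    "Landscape Change": Entry(narrower={"Desertification"}, related={"Land Use Change"}),
    "Desertification": Entry(),
    "Land Use Change": Entry(synonyms={"Land-use Change"}),
}

if __name__ == "__main__":
    print(sorted(expand_query("Landscape Change", VOCAB)))
    print(sorted(expand_query("Landscape Change", VOCAB, include_related=True)))
```

With this expansion, a search on “Landscape Change” also matches datasets keyworded only with “Desertification”.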

17
• What are your experiences with finding LTER data?
• What would be most helpful in finding data in the future?
• Review of “Use Cases”

18
• Evaluate the utility of the draft polytaxonomy
  - Is it better than the existing LTER Metacat interfaces?
• Are there large changes that need to be made?
  - Elimination of specific taxonomies?
  - Creation of new taxonomies?
  - Addition of related terms to make a thesaurus?
• Are there small changes needed?
  - Removal or replacement of terms

19
• Improvement of existing documents
  - Review existing keywords and change them to preferred forms
    - Note: even without doing this, the synonym ring will help improve searching and browsing
• Use preferred terms for new documents
  - Ideally at least one term from each of the relevant taxonomies
• Note: each new term added to the list should trigger a review of all existing documents to see whether it applies to them – so term additions should be rare
• Changes in taxonomies and term relationships do not require re-keywording of existing documents

20
(Agenda repeated; identical to slide 3)

21
• Todd & Margaret
  - Focus on INTERFACE
    - Ways to present the data
    - Allow “query within result set”
    - Intersect query sets
    - Group options – by site, by time
    - Side-by-side comparisons
    - Be able to find where different types of data intersect
      - Can be very difficult due to missing data, etc.
      - Problem extends beyond query interface
  - Interface needs to be a higher priority – sooner rather than later
    - Recommendation to IMC/NISAC/EB
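The interface operations suggested here (query within a result set, intersecting query sets, grouping by site) reduce to ordinary set and grouping operations once results are available as dataset identifiers. This sketch assumes hypothetical dataset IDs and a hypothetical dataset-to-site mapping; it does not describe any Metacat API.

```python
from collections import defaultdict


def intersect(*result_sets: set[str]) -> set[str]:
    """Datasets present in every result set (equivalent to querying within a result set)."""
    if not result_sets:
        return set()
    common = set(result_sets[0])
    for results in result_sets[1:]:
        common &= results
    return common


def group_by_site(dataset_ids: set[str], site_of: dict[str, str]) -> dict[str, list[str]]:
    """Group dataset IDs by the site that produced them, sorted for stable display."""
    groups: dict[str, list[str]] = defaultdict(list)
    for ds in sorted(dataset_ids):
        groups[site_of[ds]].append(ds)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical results of two keyword searches.
    temperature = {"ds1", "ds2", "ds3"}
    precipitation = {"ds2", "ds3", "ds4"}
    site_of = {"ds1": "SITE_A", "ds2": "SITE_A", "ds3": "SITE_B", "ds4": "SITE_C"}
    print(group_by_site(intersect(temperature, precipitation), site_of))
```

As the notes point out, the hard part is not the set logic but finding where data types genuinely overlap despite gaps and missing data.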

22
• Rodger and Kristin
  - Highest level of hierarchy
  - Found some things to change or add: “root production”, “belowground productivity”
  - Were generally happy with overall organization
  - Need system for adding new keywords – this is just a start
  - Intrigued by theory and where we go from here
    - How does it matter what is in one place or another?
    - Want to make sure things are well organized…
    - Data vs. research question
    - Does not matter where it is when adding to keyword list
  - Need to have “best practices” for adding keywords
    - How will that affect sites?
    - How many datasets have no preferred terms?

23
• At least one word from list
• At least one from at least 5 of the 10 taxonomies
• Signature datasets should be flagged with “signature dataset” tag
• Should include core area(s)
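The proposed keywording practices can be turned into a simple checker. This is a hypothetical sketch: it assumes keywords have already been normalized to preferred forms, that each preferred term belongs to exactly one taxonomy, and it uses made-up taxonomy names; the example core areas are two of the LTER core areas.

```python
def check_keywords(
    keywords: list[str],
    taxonomy_of: dict[str, str],
    core_areas: set[str],
    is_signature: bool,
    min_taxonomies: int = 5,
) -> list[str]:
    """Return a list of problems with a dataset's keywords (empty list means it passes)."""
    problems = []
    preferred = [k for k in keywords if k in taxonomy_of]
    if not preferred:
        problems.append("no preferred term from the vocabulary")
    covered = {taxonomy_of[k] for k in preferred}
    if len(covered) < min_taxonomies:
        problems.append(f"terms from only {len(covered)} taxonomies (want at least {min_taxonomies})")
    if is_signature and "signature dataset" not in keywords:
        problems.append('signature dataset lacks the "signature dataset" tag')
    if not any(k in core_areas for k in keywords):
        problems.append("no core area keyword")
    return problems


# Hypothetical term-to-taxonomy assignments.
TAXONOMY_OF = {
    "temperature": "properties",
    "soil": "substances",
    "net primary production": "processes",
    "forest": "ecosystems",
    "sensors": "methods",
}
CORE_AREAS = {"primary production", "disturbance patterns"}

if __name__ == "__main__":
    good = ["temperature", "soil", "net primary production", "forest", "sensors", "primary production"]
    print(check_keywords(good, TAXONOMY_OF, CORE_AREAS, is_signature=False))
    print(check_keywords(["temperature"], TAXONOMY_OF, CORE_AREAS, is_signature=True))
```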

24
• Core area – problems with definitions
  - Some datasets fall into no core area, or into all of them
    - Weather data
  - Change entities to core areas?
    - People will want to look for this
    - Would not have hierarchy?
      - That would be OK – can have related terms
    - Could link to signature datasets
  - Need “signature dataset” keyword – used to weight
    - Or prioritize signature datasets for adding preferred terms
  - Treat as unique:
    - Primary Production (core area)
  - Data can be applied to MANY core areas – won’t map
    - E.g., Climate
  - Try adding a core area taxonomy and then add core areas and related terms?
  - May not be needed or appropriate – we are asking the data catalog to do too much; need a catalog of research topics

25
• Want to search for signature datasets at top level of the hierarchy
  - Needs to be one click away

26
• Julia and Don
  - Would be interesting to tally the number of hits for each keyword for each site
  - Tally of number of datasets for each site
  - GIS should be preferred term
    - Can mean Geographical Information Science
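Both tallies suggested here can be sketched with `collections.Counter`. This assumes “hits” means the number of datasets at a site tagged with a keyword (a keyword repeated within one dataset counts once), and the records are hypothetical (site, keywords) pairs rather than output from any LTER system.

```python
from collections import Counter


def tally(records: list[tuple[str, list[str]]]) -> tuple[Counter, Counter]:
    """Count datasets per (keyword, site) pair and datasets per site."""
    per_keyword_site: Counter = Counter()
    per_site: Counter = Counter()
    for site, keywords in records:
        per_site[site] += 1
        for kw in set(keywords):  # count each keyword once per dataset
            per_keyword_site[(kw, site)] += 1
    return per_keyword_site, per_site


if __name__ == "__main__":
    # Hypothetical dataset records.
    records = [
        ("SITE_A", ["Temperature", "Soil"]),
        ("SITE_A", ["Temperature"]),
        ("SITE_B", ["Temperature", "Temperature"]),
    ]
    by_keyword_site, by_site = tally(records)
    print(by_keyword_site.most_common())
    print(by_site)
```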

27
• Atmospheric processes cross-listed under hydrologic properties
• Evapotranspiration should be above transpiration and evaporation
• Snow not under precipitation
• Geographical Properties -> Spatial Properties
  - Move imagery under that, with satellite and photos under imagery; deprecate Landsat
• Methods – field, spatial, lab, analytical subcategories
  - Also cores, dendrometers, etc.; tools could go under this
• Entities
  - For detailed ones, tried to find other homes
  - Diseases to disease and move under bio processes
  - Levels of organization for communities, populations, species
    - Are these useful terms? How often are they used?
  - Biomes instead of Ecosystems

28  Core areas  Do we need a special taxonomy for core areas?  Are related-terms needed, or is a polytaxonmy (hierarchy) sufficient?  Management of the vocabulary – role of researchers?  Preferred terms – are all really preferred?  E.g., Permanent forest plots

29
• How do we engage the larger LTER community?
  - How much, and what sort of engagement is needed?
• Requests we should make to the EB or IMC?
  - Managing the controlled vocabulary
  - What technology development is needed, and who should pursue it?

30
• Anyone can propose adding, editing, deleting or moving terms within the hierarchy, with justification.
• Proposals would be evaluated by the Controlled Vocabulary Working Group according to the following criteria:
  - The proposed terms should provide clear utility for searching and browsing, and not introduce ambiguity
  - The proposed terms should be suitable for inclusion (e.g., not locations or specific taxonomic identifiers)
  - Proposed terms should not be redundant with existing term(s) already in the vocabulary
  - Terms and their proposed places in taxonomies or thesauri should conform in form with NISO Z39.19-2005 and successor documents (e.g., sections 6.5.1, 8.3)
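Of these criteria, only part of the redundancy check is mechanical; utility, ambiguity and suitability need reviewer judgment. A minimal sketch of a pre-screen, assuming a hypothetical normalization rule (case-folding and collapsing hyphens, underscores and whitespace) applied to both preferred and non-preferred terms:

```python
import re


def _key(term: str) -> str:
    """Normalization for near-duplicate detection: case-fold, then collapse
    runs of hyphens, underscores and whitespace into single spaces."""
    return re.sub(r"[\s_-]+", " ", term.casefold()).strip()


def redundant_with(proposed: str, existing_terms: list[str]) -> list[str]:
    """Existing preferred or non-preferred terms that normalize to the same key."""
    key = _key(proposed)
    return sorted(t for t in existing_terms if _key(t) == key)


if __name__ == "__main__":
    # Hypothetical existing vocabulary entries.
    existing = ["Carbon Dioxide", "CO2", "Snow"]
    print(redundant_with("carbon-dioxide", existing))
    print(redundant_with("Snow Depth", existing))
```

This only catches spelling variants; true synonyms such as “CO2” versus “Carbon Dioxide” are caught only if they are already recorded in the synonym ring.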

31
• Best practices for adding keywords
  - Preferred terms (and preferred preferred terms)
• Presentation to PIs
  - Statistics on numbers of hits
• Add workshop participants to VOCAB
• Put in supplement proposal for development of search interface
  - Write it up now – Shovel Ready!
  - Like MALS – need to have all sites sign up with letters of endorsement

