Controlled Vocabulary Working Group Activities 2005-2006.


1 Controlled Vocabulary Working Group Activities 2005-2006

2 The Problem
► Inconsistent, disjunct and sparse keywords negatively impact data discovery
    ► 72.2% of all keywords are used at only a single LTER site
    ► 90% of all keywords are used at 4 or fewer LTER sites

3 The Problem
► Good “Browse” interfaces require some organization of keywords
► E.g. BIOSPHERE
    ► PLANTS
        ► VASCULAR PLANTS
            ► OAK
        ► NON-VASCULAR PLANTS
    ► ANIMALS
        ► VERTEBRATES
        ► INVERTEBRATES
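A minimal sketch of how such a keyword hierarchy could back a "Browse" interface. The nested-dict layout and the function names `browse_paths` and `narrower_terms` are assumptions made for illustration; the slides do not specify a data structure.

```python
# Hypothetical nested-dict form of the example hierarchy on the slide:
# each key is a term, each value holds that term's narrower terms.
BROWSE_TREE = {
    "BIOSPHERE": {
        "PLANTS": {
            "VASCULAR PLANTS": {"OAK": {}},
            "NON-VASCULAR PLANTS": {},
        },
        "ANIMALS": {
            "VERTEBRATES": {},
            "INVERTEBRATES": {},
        },
    }
}


def browse_paths(tree, prefix=()):
    """Yield every term as a tuple path from the root, depth-first."""
    for term, children in tree.items():
        path = prefix + (term,)
        yield path
        yield from browse_paths(children, path)


def narrower_terms(tree, term):
    """Return the direct children of `term`, or None if the term is absent."""
    for name, children in tree.items():
        if name == term:
            return list(children)
        found = narrower_terms(children, term)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    # Print the tree as breadcrumb-style paths, e.g. for a browse page.
    for path in browse_paths(BROWSE_TREE):
        print(" > ".join(path))
```

A flat keyword list cannot answer "what sits under PLANTS?", which is the question a browse interface asks at every click; the tree makes that a single lookup.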

4 Possible Solutions
1. Create an LTER Controlled Vocabulary or Thesaurus or Ontology
    ► Advantages:
        ► Absolute control on contents
        ► Ability to customize to meet LTER needs
    ► Disadvantages:
        ► Development will be expensive in time and resources
        ► Such development can be a highly technical field requiring specialists

5 Possible Solutions
2. Adopt an existing controlled vocabulary, thesaurus or ontology
    ► Advantages:
        ► Minimal cost to LTER
        ► Aids in linking LTER to a larger world of data systems
    ► Disadvantages:
        ► Lack of control
        ► Existing systems may not be suitable for LTER use
            ► They may lack desirable terms

6 2005 LTER IM Meeting
► At the 2005 IM meeting we decided that the best option to explore was Option 2 (use an existing resource)
    ► Rationale:
        ► Could potentially save lots of time, trouble and money!
        ► Helps forge links with other groups
        ► Could make LTER systems interact better with other similar systems

7 Plan of Action

8 General Steps
► Identify existing resources that LTER could use
    ► NBII Thesaurus
    ► GEMET (GEneral Multilingual Environmental Thesaurus)
    ► Global Change Master Directory (GCMD)
    ► SEEK Ontology
► Evaluate the usability of existing systems
► Develop tools and relationships needed to exploit and improve the system(s) of choice

9 Assembling Resources
► Assemble list of existing keywords
    ► EML
        ► Keywords
        ► Title words
        ► Attribute definition words
        ► Taxonomy keywords
            ► ITIS SPIRE web service from UMD.BaltCo....
    ► DTOC
    ► Publication titles, keywords and abstracts
    ► Site keyword lists - e.g., AND-LTER
    ► Need to count word and site frequency and number of keywords per document
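The counting step could be sketched roughly as follows. The input shape (one `(site, keywords)` pair per document), the lower-casing rule, and the names `keyword_statistics` and `share_used_at_most` are assumptions for illustration, not the working group's actual tooling.

```python
from collections import Counter, defaultdict


def keyword_statistics(documents):
    """Count uses per keyword, sites per keyword, and keywords per document.

    `documents` is an iterable of (site, keywords) pairs, one per document.
    Keywords are trimmed and lower-cased before counting (an assumption).
    """
    uses = Counter()            # keyword -> total number of uses
    sites = defaultdict(set)    # keyword -> set of sites using it
    per_document = []           # number of keywords in each document
    for site, keywords in documents:
        normalized = [k.strip().lower() for k in keywords if k.strip()]
        per_document.append(len(normalized))
        for keyword in normalized:
            uses[keyword] += 1
            sites[keyword].add(site)
    site_counts = {k: len(s) for k, s in sites.items()}
    return uses, site_counts, per_document


def share_used_at_most(site_counts, max_sites):
    """Fraction of keywords that are used at `max_sites` or fewer sites."""
    if not site_counts:
        return 0.0
    hits = sum(1 for n in site_counts.values() if n <= max_sites)
    return hits / len(site_counts)


if __name__ == "__main__":
    # Toy input with made-up site codes; not real LTER data.
    docs = [
        ("SITE_A", ["Temperature", "Carbon"]),
        ("SITE_B", ["temperature"]),
        ("SITE_A", ["Temperature "]),
    ]
    uses, site_counts, per_document = keyword_statistics(docs)
    print(uses.most_common(), site_counts, per_document)
    print(share_used_at_most(site_counts, 1))
```

Tracking sites as sets rather than counters keeps site frequency separate from use frequency, so a keyword repeated many times at one site still counts as a single-site keyword.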

10 Some Statistics
Source                Number of Terms   Number used at 5 or more sites   Most Frequently used
EML Keywords          2,711             86                               LTER (1,002), Temperature (701)
EML Titles            2,480             921                              And (768), Data (394), LTER (350)
EML Attributes        6,318             436                              The (4,207), Data (1,621), Carbon (328)
DTOC Keywords         2,774             103                              ARC (1,645), Temperature (732)
Bibliography Titles   13,538            1,855                            Of (12,611), Forest (2,050)

11 Consolidated List
► The consolidated list includes 21,153 words or terms along with
    ► Number of “lists” on which it appeared (max 5)
    ► Number of sites and uses from each list
    ► Max and Min number of sites using (0-26)
    ► Max and Min number of uses (0-12,611)
    ► Is it a multi-word term?
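One way the per-source lists might be merged into such a consolidated record is sketched below. The input layout, the field names, and the choice to count a term as 0 sites and 0 uses on any list where it does not appear (suggested by the 0 lower bounds above) are all assumptions.

```python
def consolidate(lists):
    """Merge per-source term lists into one record per term.

    `lists` maps a list name to {term: (n_sites, n_uses)}.
    A term missing from a list is counted as (0, 0) there (an assumption).
    """
    terms = set().union(*(entries.keys() for entries in lists.values()))
    consolidated = {}
    for term in terms:
        per_list = [entries.get(term, (0, 0)) for entries in lists.values()]
        site_counts = [n_sites for n_sites, _ in per_list]
        use_counts = [n_uses for _, n_uses in per_list]
        consolidated[term] = {
            "n_lists": sum(1 for entries in lists.values() if term in entries),
            "max_sites": max(site_counts),
            "min_sites": min(site_counts),
            "max_uses": max(use_counts),
            "min_uses": min(use_counts),
            "multi_word": len(term.split()) > 1,
        }
    return consolidated


if __name__ == "__main__":
    # Toy numbers for illustration only; not taken from the LTER lists.
    toy_lists = {
        "eml_keywords": {"temperature": (4, 30), "soil moisture": (2, 5)},
        "dtoc_keywords": {"temperature": (6, 45)},
    }
    for term, record in sorted(consolidate(toy_lists).items()):
        print(term, record)
```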

12 Ranking/Rating Words
► Terms were sorted by:
    ► Number of Lists
    ► Max. number of sites on any single list
    ► Min. number of sites on any single list
    ► Number of uses
► The top 1010 terms were then rated as “useful” (U), “marginal/not sure” (M) or “not useful” (N) by volunteers
    ► Needed for abbreviations (e.g., CO2) and words that are too general (e.g., “Above”, “Total”)
    ► The resulting list was then additionally sorted by a term score T=((U*1)+(M*0)+(N*-1))/(U+M+N)
    ► Always “Useful”=1.00, Always “Not Useful”= -1.00
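The term score can be written out directly from the formula on the slide. The function name `term_score`, the error raised for unrated terms, and the sample vote counts are illustrative assumptions.

```python
def term_score(useful, marginal, not_useful):
    """Term score T = ((U*1) + (M*0) + (N*-1)) / (U + M + N).

    Ranges from 1.0 (every rater said "useful") to -1.0 (every rater
    said "not useful"). Raising on zero ratings is an assumption; the
    slide does not say how unrated terms were handled.
    """
    total = useful + marginal + not_useful
    if total == 0:
        raise ValueError("term has no ratings")
    return (useful * 1 + marginal * 0 + not_useful * -1) / total


if __name__ == "__main__":
    # Made-up (U, M, N) vote counts, for illustration only.
    ratings = {
        "temperature": (5, 0, 0),
        "above": (0, 1, 4),
        "co2": (3, 2, 0),
    }
    ranked = sorted(ratings, key=lambda t: term_score(*ratings[t]), reverse=True)
    for term in ranked:
        print(f"{term}: {term_score(*ratings[term]):+.2f}")
```

Because "marginal" votes add to the denominator but not the numerator, they pull a score toward zero without changing its sign.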

13 Top of the list

14 Bottom of the list

15 Preliminary Evaluation
► Volunteers have used highly ranked words from the “list of 1000” to test retrieval from various thesauri
    ► So far NBII seems to be preferred, but we need additional testers
► Inigo San Gil has been working on automated queries of the NBII Thesaurus

16 Tasks for this meeting
► Once we have a controlled vocabulary, how shall we use it? What tools do we need to develop?
► What additional testing/evaluation is required (bring in PIs?)? What institutional relationships need to be pursued? What actions do we need to take to improve the usability of resources for LTER use?

