EcoTerm IV NBII/EioNet Demo of Federated KOS Search Mike Frame Vienna, Austria April 2007.

1 EcoTerm IV NBII/EioNet Demo of Federated KOS Search Mike Frame Vienna, Austria April 2007

2 Discussion Topics: Project Background; NBII Thesaurus; GEMET Thesaurus; Prototype Client; Sample Query Results (including no, one, or both thesauri); Overall Findings

3 Biocomplexity Thesaurus http://thesaurus.nbii.gov

4 EIONET GEMET Thesaurus http://www.eionet.europa.eu/gemet/webservices?langcode=en

5 NBII/EIONET Thesaurus Web-service 1: Background – collaboration through Ecoinformatics TWG; Primary Goal – access distributed multi-lingual thesauri; Results – SKOS web-service & client
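A minimal sketch of how a client might read broader/narrower/related links out of a SKOS response. It assumes the service returns SKOS as RDF/XML, which the slides do not state; the sample document and its example.org URIs are placeholders, not actual NBII or GEMET output. Only the two namespace URIs are standard.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for SKOS and RDF.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical response body; the example.org URIs are placeholders.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/concept/endangered-species">
    <skos:prefLabel xml:lang="en">endangered species</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/concept/species"/>
    <skos:related rdf:resource="http://example.org/concept/rare-species"/>
  </skos:Concept>
</rdf:RDF>"""


def parse_concepts(rdf_xml):
    """Return {concept URI: {"label", "broader", "narrower", "related"}}."""
    root = ET.fromstring(rdf_xml)
    concepts = {}
    for node in root.findall(f"{{{SKOS}}}Concept"):
        uri = node.get(f"{{{RDF}}}about")
        label = node.find(f"{{{SKOS}}}prefLabel")
        entry = {"label": label.text if label is not None else None}
        # Collect the target URIs for each SKOS relationship type.
        for relation in ("broader", "narrower", "related"):
            entry[relation] = [
                link.get(f"{{{RDF}}}resource")
                for link in node.findall(f"{{{SKOS}}}{relation}")
            ]
        concepts[uri] = entry
    return concepts


concepts = parse_concepts(SAMPLE)
```

The result is keyed by concept URI, so responses from several thesauri can be kept apart by source before any merging is attempted.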

6 Latest Client & Service Capabilities: access to both NBII and GEMET; single-language capability; results are provided by source; all documentation is completed (http://thesaurus.nbii.gov)

7 Demo Client

8 Initial Challenges Identified: Thesaurus scope, intent, purpose, and coverage differ. NBII = sub-discipline of environment (Endangered species, Broader Terms: Species, Special status species, Taxa). EIONET = broad environment (Broader Terms: environmental protection).

9 Current State. Users: most aren’t aware of the underlying vocabulary; vocabularies are often unique to an organization and serve “categorization” more than retrieval. Goal: include all vocabularies and let the search engine handle the results.

10 Demonstration Search Retrieval: Created two demonstration datasets. NBII Cataloged Resources: ~30,000 web sites, publications, images, maps, etc.; XML-structured data, controlled subject. NBII FGDC Metadata: ~22,000 resources on research studies; 150-200 elements; semi-structured with no controlled vocabulary.

11 NBII Catalog Records: based on the Dublin Core + 18 elements, of which 10 are mandatory; in place since 2002; used by distributed content managers

12 NBII Metadata CH

13 Process: Added thesaurus capabilities to the development search engine for the NBII Thesaurus and the EIONET GEMET Thesaurus. Used BT, RT, and NT (broader, related, and narrower term) relationships and weighting. Performed sample queries within the test repositories with: no thesaurus; GEMET-only aided searching; NBII-only aided searching; GEMET+NBII aided searching (X).
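The expansion step described on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not the search engine's actual logic: the per-relationship weights are hypothetical, and the toy thesauri contain only the broader terms quoted on slide 8 plus a few placeholder entries.

```python
# Hypothetical weights per relationship type; the slides mention weighting
# but do not give the values used by the search engine.
RELATION_WEIGHTS = {"BT": 0.5, "NT": 0.8, "RT": 0.3}

# Toy stand-ins for the two thesauri. The BT entries follow the slide 8
# example; the NT and RT entries are placeholders for illustration.
NBII_TOY = {
    "endangered species": {
        "BT": ["species", "special status species", "taxa"],
        "RT": ["rare species"],
    },
}
GEMET_TOY = {
    "endangered species": {
        "BT": ["environmental protection"],
        "NT": ["endangered animal species", "endangered plant species"],
    },
}


def expand_query(term, thesaurus, weights=RELATION_WEIGHTS):
    """Return {search term: weight}; the original term always weighs 1.0."""
    key = term.lower()
    expanded = {key: 1.0}
    for relation, related_terms in thesaurus.get(key, {}).items():
        weight = weights.get(relation, 0.0)
        for related in related_terms:
            name = related.lower()
            # Keep the strongest weight if a term is reached more than once.
            expanded[name] = max(expanded.get(name, 0.0), weight)
    return expanded


def expand_with_all(term, thesauri):
    """Combine the expansions from several thesauri (the GEMET+NBII case)."""
    combined = {}
    for thesaurus in thesauri:
        for name, weight in expand_query(term, thesaurus).items():
            combined[name] = max(combined.get(name, 0.0), weight)
    return combined


no_thesaurus = {"endangered species": 1.0}
nbii_only = expand_query("endangered species", NBII_TOY)
gemet_only = expand_query("endangered species", GEMET_TOY)
both = expand_with_all("endangered species", [NBII_TOY, GEMET_TOY])
```

A term that a thesaurus does not contain comes back unexpanded, so that thesaurus adds nothing to the query for that term.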

14 Test Repository 1: NBII Resource Catalog (Dublin Core)

15 No Thesauri – “invasive species”

16 NBII Thesaurus – “invasive species”

17 GEMET Thesaurus – “invasive species”

18 No Thesauri – “Endangered Species”

19 NBII Thesaurus – “endangered species”

20 GEMET Only – “endangered species”

21 No Thesaurus – “rare species”

22 NBII Thesaurus – “rare species”

23 GEMET Thesaurus – “rare species”

24 GEMET Thesaurus – “rare species” (expanded degrees of relevance)

25 No Thesauri – “protected species”

26 NBII Thesaurus – “protected species”

27 GEMET Thesaurus – “protected species”

28 Results – NBII Catalog Resources
term                                         None   NBII    GEMET
“invasive species”                           2487   10802   2487
“endangered species”                         1612   3532    1619
“rare species”, “rare species” (expanded)    2497186290 5847
“protected species”                          20323451664

29 Results – NBII Resource Catalog

30 Test Repository 2: NBII FGDC Metadata

31 Sample Queries – No vocabularies Metadata CH “invasive species”

32 Sample Queries – NBII only Metadata CH “invasive species”

33 Sample Queries – GEMET only Metadata CH “invasive species”

34 Sample Queries – No vocabularies Metadata CH “endangered species”

35 Sample Queries – NBII only Metadata CH “endangered species”

36 Sample Queries – GEMET only Metadata CH “endangered species”

37 No Thesauri – Metadata CH “rare species”

38 NBII Thesaurus – Metadata CH “rare species”

39 GEMET Thesaurus – Metadata CH “rare species”

40 Sample Queries – No vocabularies Metadata CH “protected species”

41 Sample Queries – NBII only Metadata CH “protected species”

42 Sample Queries – GEMET only Metadata CH “protected species”

43 Results – FGDC Metadata
term                    None   NBII   GEMET
“invasive species”      302    7884   302
“endangered species”    1008   2690   1019
“rare species”          59425964
“protected species”     1121521011

44 Results – NBII Resource Catalog

45 Overall Results, General Findings: The assumption that a thesaurus improves the “number” of results is valid. The degree varies by term and mappings. Since users search from a number of perspectives, backgrounds, and levels of expertise, multiple thesauri do improve the number of results.

46 Overall Results, Using Only GEMET Terminology: Terms that were in GEMET but not in the NBII thesaurus improved search results. GEMET’s strength of broad coverage aided searches. In general, for the Metadata repository, results varied somewhat but often had the same top 10 results.

47 Overall Results, General Findings: The “No thesaurus” tests produced poorer #1 results. Thesaurus use changed the ordering of the results list more for the structured set than for the unstructured set (Metadata).

48 Issues: “Integrating” thesauri of differing scope and purpose presents challenges: can’t turn the effort into a thesaurus project; degrees of relevance of terms are an issue; concept matching or different intent; differing classification (RT vs. NT) across thesauri; differing “weighting” algorithms.

49 Further Study Options: 1.) Take multiple thesauri “as is” 2.) Do some “attempted” concept matching, e.g. “endangered animal species” – “endangered animal” 3.) If no match is present, add the term and relationship as is 4.) Obtain terms from XMDR
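Options 2 and 3 could be prototyped along these lines. The slides do not say how concept matching would be done; token-overlap (Jaccard) similarity and the 0.6 threshold are assumptions chosen only so that the slide's own example pair matches, and the function names are hypothetical.

```python
def tokens(term):
    """Lower-case a term and split it into a set of words."""
    return frozenset(term.lower().split())


def find_match(term, vocabulary, threshold=0.6):
    """Return the closest term in vocabulary by Jaccard overlap, or None."""
    wanted = tokens(term)
    best, best_score = None, 0.0
    for candidate in vocabulary:
        have = tokens(candidate)
        union = wanted | have
        if not union:
            continue  # skip the degenerate case of two empty terms
        score = len(wanted & have) / len(union)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


def merge_vocabularies(base, incoming, threshold=0.6):
    """Map each incoming term onto a base term; add unmatched terms as is.

    Both arguments are {term: [related terms]}. Returns the merged
    vocabulary and a {incoming term: merged term} mapping.
    """
    merged = {term: list(related) for term, related in base.items()}
    mapping = {}
    for term, related in incoming.items():
        match = find_match(term, base, threshold)
        if match is None:
            merged[term] = list(related)  # option 3: keep term and relationships
            mapping[term] = term
        else:
            mapping[term] = match  # option 2: attempted concept match
    return merged, mapping


base = {"endangered animal": ["endangered species"], "rare species": []}
incoming = {"endangered animal species": ["endangered species"], "wetland": []}
merged, mapping = merge_vocabularies(base, incoming)
```

Word overlap says nothing about intent, so a match found this way is still exposed to the "concept matching or different intent" issue raised on slide 48.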

50 Further Study Options – cont.: follow up with additional repositories; repeat with other query terms; re-look at weighting algorithms; do queries with a subset of terms; repeat with a completely integrated thesaurus, as compared to repeating queries with machine integration; complete by June.

51 Questions, Comments,

52 GEMET Control file
endangered species,category of endangered species[.2],endangered animal species[0.8],endangered plant species[0.8]
protected species,category of endangered species[0.2],endangered species [0.2]
rare species,category of endangered species[0.2],extinct species[0.2],vanished species[0.2]
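A control file in this shape can be read with a few lines of code. The sketch below assumes one entry per line, a leading query term, and comma-separated expansions each followed by a bracketed weight. That layout is inferred from the slide rather than documented, and reading the bracketed number as a relevance weight is also an assumption.

```python
import re

# One "term[weight]" item. Accepts ".2" as well as "0.2" and tolerates a
# space before the bracket, since both appear on the slide.
ITEM = re.compile(r"^\s*(?P<term>.+?)\s*\[(?P<weight>\d*\.?\d+)\]\s*$")

SAMPLE = """\
endangered species,category of endangered species[.2],endangered animal species[0.8],endangered plant species[0.8]
protected species,category of endangered species[0.2],endangered species [0.2]
rare species,category of endangered species[0.2],extinct species[0.2],vanished species[0.2]
"""


def parse_control_file(text):
    """Return {query term: [(expansion term, weight), ...]}."""
    expansions = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        head, *items = line.split(",")
        weighted = []
        for item in items:
            match = ITEM.match(item)
            if match is None:
                raise ValueError(f"unparseable item: {item!r}")
            weighted.append((match.group("term"), float(match.group("weight"))))
        expansions[head.strip()] = weighted
    return expansions


control = parse_control_file(SAMPLE)
```

Raising an error on an unparseable item, instead of skipping it, keeps a malformed entry from silently dropping an expansion term.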

