EcoTerm IV NBII/EioNet Demo of Federated KOS Search Mike Frame Vienna, Austria April 2007.

Discussion Topics: project background; NBII Thesaurus; GEMET Thesaurus; prototype client; sample query results (with no thesaurus, one thesaurus, or both thesauri); overall findings.

Biocomplexity Thesaurus

EIONET GEMET Thesaurus

NBII/EIONET Thesaurus Web Service. Background: collaboration through the Ecoinformatics Technical Working Group (TWG). Primary goal: access distributed, multilingual thesauri. Results: a SKOS web service and client.
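SKOS expresses the thesaurus BT, NT, and RT links as `skos:broader`, `skos:narrower`, and `skos:related`. As a rough sketch of what a client of a SKOS web service might do with a response, the Python below uses only the standard library to parse an RDF/XML fragment and collect each concept's label and relations. The response fragment and its example.org URIs are hypothetical, not output from the actual NBII/EIONET service; only the RDF and SKOS namespace URIs are real.

```python
import xml.etree.ElementTree as ET

# Real W3C namespace URIs for RDF and SKOS Core.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hypothetical service response: example.org URIs are placeholders; the
# broader terms are the NBII ones named on the "Initial Challenges" slide.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/nbii/endangered-species">
    <skos:prefLabel xml:lang="en">endangered species</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/nbii/species"/>
    <skos:broader rdf:resource="http://example.org/nbii/special-status-species"/>
    <skos:broader rdf:resource="http://example.org/nbii/taxa"/>
  </skos:Concept>
</rdf:RDF>"""


def parse_concepts(xml_text):
    """Map each skos:Concept URI to its prefLabel and BT/NT/RT target URIs."""
    root = ET.fromstring(xml_text)
    concepts = {}
    for concept in root.findall("skos:Concept", NS):
        label = concept.find("skos:prefLabel", NS)
        entry = {"prefLabel": label.text if label is not None else None}
        # SKOS broader/narrower/related correspond to thesaurus BT/NT/RT.
        for relation in ("broader", "narrower", "related"):
            entry[relation] = [
                el.get(RDF_RESOURCE)
                for el in concept.findall("skos:" + relation, NS)
            ]
        concepts[concept.get(RDF_ABOUT)] = entry
    return concepts


if __name__ == "__main__":
    for uri, entry in parse_concepts(SAMPLE).items():
        print(uri, entry)
```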

Latest Client and Service Capabilities: access to both NBII and GEMET; single-language capability; results are reported by source; all documentation is complete.
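A minimal sketch of the fan-out behind reporting results by source, assuming each thesaurus can be queried through a function that returns related terms. The `federated_lookup` function and the in-memory stubs are hypothetical stand-ins for the NBII and GEMET services; the stub terms are the broader terms listed on the "Initial Challenges Identified" slide.

```python
from typing import Callable, Dict, List

# A source takes a query term and returns the terms it considers related.
# In the real client these would be calls to the NBII and GEMET services;
# here they are hypothetical in-memory stand-ins.
Lookup = Callable[[str], List[str]]


def federated_lookup(term: str, sources: Dict[str, Lookup]) -> Dict[str, List[str]]:
    """Send one query to every source and keep the results separated by source."""
    results: Dict[str, List[str]] = {}
    for name, lookup in sources.items():
        try:
            results[name] = lookup(term)
        except Exception:
            # Assumed policy: an unavailable thesaurus yields an empty list
            # instead of failing the whole federated query.
            results[name] = []
    return results


# Broader terms taken from the "Initial Challenges Identified" slide.
NBII_STUB = {"endangered species": ["species", "special status species", "taxa"]}
GEMET_STUB = {"endangered species": ["environmental protection"]}

SOURCES: Dict[str, Lookup] = {
    "NBII": lambda t: NBII_STUB.get(t, []),
    "GEMET": lambda t: GEMET_STUB.get(t, []),
}

if __name__ == "__main__":
    for source, terms in federated_lookup("endangered species", SOURCES).items():
        print(f"{source}: {', '.join(terms) or '(no match)'}")
```

Keeping the results keyed by source, rather than merging them, lets the client show which thesaurus contributed each term.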

Demo Client

Initial Challenges Identified: the thesauri differ in scope, intent, purpose, and coverage. NBII covers a sub-discipline of the environment; for “endangered species”, its broader terms are Species, Special status species, and Taxa. EIONET (GEMET) covers the environment broadly; for “endangered species”, its broader term is environmental protection.

Current State: most users are not aware of the underlying vocabulary. Vocabularies are often unique to an organization and are designed more for “categorization” than for retrieval. Goal: include all vocabularies and let the search engine handle the results.
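One way to read "include all vocabularies and let the search engine handle the results" is to merge every vocabulary's expansion of a query into one weighted query and leave ranking to the engine. The sketch below assumes that reading. The max-weight merge policy, the weights other than 0.8, and the Lucene-style `"term"^boost` output are illustrative choices; the slides do not say which engine or query syntax the prototype used.

```python
def merge_expansions(expansions):
    """Combine per-vocabulary {term: weight} maps into a single weighted query.

    If several vocabularies propose the same term, keep the highest weight.
    This max policy is an assumption; summing or averaging would also work.
    """
    merged = {}
    for vocab_terms in expansions.values():
        for term, weight in vocab_terms.items():
            merged[term] = max(weight, merged.get(term, 0.0))
    return merged


def to_boosted_query(weights):
    """Render terms as a Lucene-style boosted OR query, heaviest terms first."""
    ordered = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return " OR ".join(f'"{term}"^{weight:g}' for term, weight in ordered)


# Hypothetical per-vocabulary expansions of "endangered species"; 0.8 mirrors
# the GEMET control file, the other weights are invented for illustration.
EXPANSIONS = {
    "NBII": {"endangered species": 1.0, "special status species": 0.5},
    "GEMET": {"endangered species": 1.0, "endangered animal species": 0.8},
}

if __name__ == "__main__":
    print(to_boosted_query(merge_expansions(EXPANSIONS)))
```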

Demonstration Search Retrieval: created two demonstration datasets. NBII Cataloged Resources: ~30,000 web sites, publications, images, maps, etc.; XML-structured data with controlled subject terms. NBII FGDC Metadata: ~22,000 resources on research studies; semi-structured elements with no controlled vocabulary.

NBII Catalog Records: based on Dublin Core + 18 elements, of which 10 are mandatory; in place since 2002; used by distributed content managers.

NBII Metadata Clearinghouse (CH)

Process: added thesaurus capabilities for the NBII Thesaurus and the EIONET GEMET Thesaurus to the development search engine, using broader-term (BT), related-term (RT), and narrower-term (NT) relationships with weighting. Performed sample queries within the test repositories with: no thesaurus; GEMET-only aided searching; NBII-only aided searching; GEMET+NBII aided searching (X).
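The process can be sketched as weighted query expansion over BT, NT, and RT links, run once per configuration (no thesaurus, GEMET only, NBII only, GEMET+NBII). Everything here is a toy under stated assumptions: the relation weights, thesaurus entries, documents, and substring-based scoring are hypothetical and are not the development search engine's actual algorithm.

```python
# Hypothetical relation weights; the slides say BT, RT, and NT were weighted
# but do not give the values.
RELATION_WEIGHTS = {"NT": 0.7, "BT": 0.5, "RT": 0.3}


def expand(term, thesauri):
    """Expand a query term through BT/NT/RT links in zero or more thesauri."""
    weights = {term: 1.0}
    for thesaurus in thesauri:
        for relation, target in thesaurus.get(term, []):
            weight = RELATION_WEIGHTS[relation]
            weights[target] = max(weight, weights.get(target, 0.0))
    return weights


def search(docs, weights):
    """Score documents by the summed weights of the terms they contain."""
    hits = []
    for doc_id, text in docs.items():
        lowered = text.lower()
        score = sum(w for t, w in weights.items() if t in lowered)
        if score > 0:
            hits.append((doc_id, score))
    # Best score first; ties broken by document id for a stable order.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


# Toy thesauri and documents (hypothetical; the BTs follow the slides).
NBII = {"endangered species": [("BT", "special status species"), ("BT", "taxa")]}
GEMET = {"endangered species": [("BT", "environmental protection"),
                                ("NT", "endangered animal species")]}
DOCS = {
    "d1": "Recovery plan for an endangered species of trout",
    "d2": "Special status species inventory",
    "d3": "Environmental protection policy review",
    "d4": "Endangered animal species of the Great Plains",
    "d5": "Forest fire history",
}
CONFIGS = {"none": [], "GEMET only": [GEMET], "NBII only": [NBII],
           "GEMET+NBII": [GEMET, NBII]}

if __name__ == "__main__":
    for name, thesauri in CONFIGS.items():
        hits = search(DOCS, expand("endangered species", thesauri))
        print(f"{name:11} {len(hits)} hits: {hits}")
```

On this toy data the hit count grows from one document with no thesaurus to four with both thesauri, the same direction as the reported finding.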

Test Repository 1: NBII Resource Catalog (Dublin Core)

No thesaurus – “invasive species”

NBII Thesaurus – “invasive species”

GEMET Thesaurus – “invasive species”

No thesaurus – “endangered species”

NBII Thesaurus – “endangered species”

GEMET Only – “endangered species”

No Thesaurus – “rare species”

NBII Thesaurus – “rare species”

GEMET Thesaurus – “rare species”

GEMET Thesaurus – “rare species” (expanded degrees of relevance)

No thesaurus – “protected species”

NBII Thesaurus – “protected species”

GEMET Thesaurus – “protected species”

Results – NBII Catalog Resources: table comparing the None, NBII, and GEMET configurations for “invasive species”, “endangered species”, “rare species”, “rare species” (expanded), and “protected species”.

Results – NBII Resource Catalog

Test Repository 2: NBII FGDC Metadata

Sample Queries – No vocabularies Metadata CH “invasive species”

Sample Queries – NBII only Metadata CH “invasive species”

Sample Queries – GEMET only Metadata CH “invasive species”

Sample Queries – No vocabularies Metadata CH “endangered species”

Sample Queries – NBII only Metadata CH “endangered species”

Sample Queries – GEMET only Metadata CH “endangered species”

No thesaurus – Metadata CH “rare species”

NBII Thesaurus – Metadata CH “rare species”

GEMET Thesaurus – Metadata CH “rare species”

Sample Queries – No vocabularies Metadata CH “protected species”

Sample Queries – NBII only Metadata CH “protected species”

Sample Queries – GEMET only Metadata CH “protected species”

Results – FGDC Metadata: table comparing the None, NBII, and GEMET configurations for “invasive species”, “endangered species”, “rare species”, and “protected species”.

Results – NBII Resource Catalog

Overall Results – General Findings: the assumption that a thesaurus increases the “number” of results is valid, although the degree varies by term and by mapping. Because users search from a number of perspectives, backgrounds, and levels of expertise, multiple thesauri do increase the number of results.

Overall Results – Using Only GEMET Terminology: terms that were in GEMET but not in the NBII thesaurus improved search results, and GEMET's strength of broad coverage aided searches. In general, for the metadata repository, results varied somewhat, but the top 10 results were often the same.

Overall Results – General Findings: the “no thesaurus” tests produced poorer #1 results. For the structured set, thesaurus-aided searching reordered the results list more than it did for the unstructured set (metadata).
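Findings such as "often the same top 10 results" and "poorer #1 results" can be made measurable by comparing ranked result lists. The sketch below uses overlap at k (the share of top-k documents two runs have in common) and a #1-match check; these metrics are one reasonable choice, not necessarily how the study judged its results, and the sample result lists are hypothetical.

```python
def overlap_at_k(ranked_a, ranked_b, k=10):
    """Share of the top-k results two ranked lists have in common (0.0 to 1.0)."""
    top_a, top_b = set(ranked_a[:k]), set(ranked_b[:k])
    denominator = max(len(top_a), len(top_b))
    if denominator == 0:
        return 1.0  # two empty result lists are trivially identical
    return len(top_a & top_b) / denominator


def same_first_result(ranked_a, ranked_b):
    """True when both runs return the same #1 result."""
    return bool(ranked_a) and bool(ranked_b) and ranked_a[0] == ranked_b[0]


# Hypothetical runs: a "no thesaurus" ranking and a thesaurus-aided ranking
# that swaps the top two results and replaces r10 with r11.
NO_THESAURUS = [f"r{i}" for i in range(1, 11)]
WITH_THESAURUS = ["r2", "r1"] + NO_THESAURUS[2:9] + ["r11"]

if __name__ == "__main__":
    print("overlap@10:", overlap_at_k(NO_THESAURUS, WITH_THESAURUS))
    print("same #1:", same_first_result(NO_THESAURUS, WITH_THESAURUS))
```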

Issues: “integrating” thesauri of differing scope and purpose presents challenges: the effort can't be turned into a thesaurus project; degrees of relevance between terms are an issue; concept matching versus differing intent; differing classification (RT vs. NT) across thesauri; differing “weighting” algorithms.

Further Study Options: 1) take multiple thesauri “as is”; 2) attempt some concept matching, e.g., “endangered animal species” – “endangered animal”; 3) if no match is present, add the term and relationship as is; 4) obtain terms from XMDR.
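Options 2 and 3 can be sketched as a merge that first tries to match an incoming term to an existing concept and otherwise adds it as is. The token-containment rule below is a deliberately crude, hypothetical heuristic (it misses synonyms entirely, for example), and apart from the “endangered animal species” – “endangered animal” pair from the slide, the sample terms and relations are invented for illustration.

```python
def tokens(term):
    return frozenset(term.lower().split())


def find_match(term, candidates, min_shared=2):
    """Find an existing term that names the same concept, by token containment.

    Crude heuristic: an exact match wins; otherwise a candidate matches when
    one term's words contain the other's and they share at least `min_shared`
    words, so "endangered animal" matches "endangered animal species".
    """
    if term in candidates:
        return term
    t = tokens(term)
    for cand in sorted(candidates):
        c = tokens(cand)
        if (t <= c or c <= t) and len(t & c) >= min_shared:
            return cand
    return None


def integrate(base, incoming):
    """Merge incoming {term: [(relation, target), ...]} into a copy of base."""
    merged = {term: list(rels) for term, rels in base.items()}
    for term, rels in incoming.items():
        match = find_match(term, merged)
        if match is None:
            # Option 3: no match, so add the term and its relationships as is.
            merged[term] = list(rels)
        else:
            # Option 2: attach new relationships to the matched concept.
            for rel in rels:
                if rel not in merged[match]:
                    merged[match].append(rel)
    return merged


# Hypothetical thesaurus fragments.
BASE = {"endangered animal species": [("BT", "endangered species")]}
INCOMING = {
    "endangered animal": [("RT", "wildlife")],
    "invasive species": [("RT", "introduced species")],
}

if __name__ == "__main__":
    for term, rels in integrate(BASE, INCOMING).items():
        print(term, rels)
```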

Further Study Options (cont.): follow up with additional repositories; repeat with other query terms; re-examine the weighting algorithms; run queries with a subset of terms; repeat with a completely integrated thesaurus and compare against repeated queries using machine integration; complete by June.

Questions, Comments?

GEMET Control File:
endangered species,category of endangered species[.2],endangered animal species[0.8],endangered plant species[0.8]
protected species,category of endangered species[0.2],endangered species [0.2]
rare species,category of endangered species[0.2],extinct species[0.2],vanished species[0.2]
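The control file appears to pair each query term with GEMET terms to add to the search, each carrying a weight in brackets. Assuming that reading (the slides do not define the format or how the weights were applied), a parser could look like the sketch below. It tolerates both the `[.2]` and `endangered species [0.2]` spellings that occur in the file; a term containing a comma would break the simple split.

```python
import re

# Each entry after the head term looks like "term[weight]"; the weight may omit
# the leading zero (".2") and may follow a space ("endangered species [0.2]").
ENTRY = re.compile(r"^(?P<term>.+?)\s*\[(?P<weight>\d*\.?\d+)\]$")

CONTROL_FILE = """\
endangered species,category of endangered species[.2],endangered animal species[0.8],endangered plant species[0.8]
protected species,category of endangered species[0.2],endangered species [0.2]
rare species,category of endangered species[0.2],extinct species[0.2],vanished species[0.2]
"""


def parse_control_file(text):
    """Parse 'head term,expansion[weight],...' lines into {head: {term: weight}}."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        head, *entries = [part.strip() for part in line.split(",")]
        expansions = {}
        for entry in entries:
            match = ENTRY.match(entry)
            if match is None:
                raise ValueError(f"malformed entry: {entry!r}")
            expansions[match.group("term")] = float(match.group("weight"))
        table[head] = expansions
    return table


if __name__ == "__main__":
    for head, expansions in parse_control_file(CONTROL_FILE).items():
        print(head, "->", expansions)
```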