Controlled Vocabulary Working Group Activities 2005-2007.

Controlled Vocabulary Working Group Activities

The Problem
► Inconsistent, disjunct and sparse keywords negatively impact data discovery
  • 72.2% of all keywords are used at only a single LTER site
  • 90% of all keywords are used at 4 or fewer LTER sites
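The percentages on this slide describe how many sites share each keyword. A minimal sketch of how such figures could be computed is shown here, assuming keyword usage is available as (site, keyword) pairs. The function names and the sample pairs are hypothetical illustrations, not the working group's actual data or tooling.

```python
from collections import defaultdict


def site_coverage(keyword_uses):
    """Map each keyword (case-insensitive) to the set of sites that use it."""
    sites_by_keyword = defaultdict(set)
    for site, keyword in keyword_uses:
        sites_by_keyword[keyword.strip().lower()].add(site)
    return sites_by_keyword


def share_used_at_most(sites_by_keyword, max_sites):
    """Percentage of keywords that appear at max_sites or fewer sites."""
    if not sites_by_keyword:
        return 0.0
    count = sum(1 for sites in sites_by_keyword.values() if len(sites) <= max_sites)
    return 100.0 * count / len(sites_by_keyword)


# Invented sample data: (site code, keyword) pairs.
uses = [
    ("AND", "temperature"),
    ("VCR", "Temperature"),
    ("NTL", "lakes"),
    ("SGS", "grassland"),
    ("VCR", "marsh"),
]
coverage = site_coverage(uses)
single_site_pct = share_used_at_most(coverage, 1)
four_or_fewer_pct = share_used_at_most(coverage, 4)
```

Applied to a full keyword harvest, `share_used_at_most(coverage, 1)` and `share_used_at_most(coverage, 4)` would correspond to the two statistics quoted above.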

A Minimal Goal
► Improve “Search” interface capabilities through consistent application of a set of key terms
► Challenge: we need
  • A list of terms to be applied
  • A way to apply those terms to existing datasets

A Modest Goal
► Provide a good “Browse” interface for data discovery
► Challenge: Good “Browse” interfaces require some organization of keywords
► E.g.
  BIOSPHERE
    • PLANTS
      ► VASCULAR PLANTS
        • OAK = QUERCUS
      ► NON-VASCULAR PLANTS
    • ANIMALS
      ► VERTEBRATES
      ► INVERTEBRATES
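One way to picture the keyword organization a “Browse” interface needs is as a small tree with a synonym table. The sketch below uses only the example terms from this slide. The nested-dictionary layout, the `SYNONYMS` table and the `find_path` helper are assumptions made for illustration, not a structure the working group adopted.

```python
# Browse hierarchy from the slide, as nested dictionaries (an assumed layout).
BROWSE_TREE = {
    "BIOSPHERE": {
        "PLANTS": {
            "VASCULAR PLANTS": {"OAK": {}},
            "NON-VASCULAR PLANTS": {},
        },
        "ANIMALS": {
            "VERTEBRATES": {},
            "INVERTEBRATES": {},
        },
    }
}

# "OAK = QUERCUS" on the slide: treat QUERCUS as a synonym of OAK.
SYNONYMS = {"QUERCUS": "OAK"}


def find_path(tree, term, trail=()):
    """Return the browse path from the root to term, or None if absent."""
    term = SYNONYMS.get(term.upper(), term.upper())
    for name, children in tree.items():
        here = trail + (name,)
        if name == term:
            return list(here)
        found = find_path(children, term, here)
        if found:
            return found
    return None


oak_path = find_path(BROWSE_TREE, "Quercus")
missing_path = find_path(BROWSE_TREE, "fungi")
```

A search for either “oak” or “Quercus” resolves to the same node, which is the behavior the synonym relationship on the slide implies.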

An Ambitious Goal
► An ontology-based controlled vocabulary could be used to facilitate semantic mediation of data discovery and data integration activities
► Challenge: A much larger and more stringent set of relationships needs to be defined

Possible Solutions
1. Create an LTER controlled vocabulary, thesaurus or ontology
  • Advantages:
    ► Absolute control over contents
    ► Ability to customize to meet LTER needs
  • Disadvantages:
    ► Development will be expensive in time and resources
    ► Such development can be highly technical work requiring specialists

Possible Solutions
2. Adopt an existing controlled vocabulary, thesaurus or ontology
  • Advantages:
    ► Minimal cost to LTER
    ► Aids in linking LTER to a larger world of data systems
  • Disadvantages:
    ► Lack of control
    ► Existing systems may not be suitable for LTER use
      • They may lack desirable terms

Strategy for Evaluation
► Identify a list of keywords that are “important” for describing LTER data
► See whether those words are found in existing lexical resources (e.g., NBII, GEMET and GCMD)
► See how “rich” the context provided by the lexical resources is
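The second step of this strategy amounts to a set comparison between the “important” keyword list and each lexical resource. A rough sketch follows, assuming each resource has already been exported to a plain set of terms. The `coverage_by_resource` function and the tiny term sets are invented for illustration and do not reflect the actual contents of NBII or GCMD.

```python
def coverage_by_resource(keywords, resources):
    """Report which keywords each resource contains (case-insensitive)."""
    wanted = {k.strip().lower() for k in keywords}
    report = {}
    for name, terms in resources.items():
        present = wanted & {t.strip().lower() for t in terms}
        percent = 100.0 * len(present) / len(wanted) if wanted else 0.0
        report[name] = {
            "found": sorted(present),
            "missing": sorted(wanted - present),
            "percent": percent,
        }
    return report


# Hypothetical inputs: a short keyword list and made-up resource excerpts.
important = ["Temperature", "Carbon", "Forest", "Marsh"]
sample_resources = {
    "NBII": {"temperature", "forest", "carbon"},
    "GCMD": {"Temperature", "Carbon"},
}
report = coverage_by_resource(important, sample_resources)
```

The `missing` lists show where a resource would need supplementing. Judging how “rich” the surrounding context is (broader, narrower and related terms) would need the resource's full structure, which this sketch does not model.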

Assembling Resources
► Assemble list of existing keywords
  • EML
    ► Keywords (keywords) ✓
    ► Title words (tokens or words) ✓
    ► Attribute definition words (tokens) ✓
    ► Taxonomy keywords
  • ITIS SPIRE web service from UMD.BaltCo....
  • DTOC (keywords) ✓
  • Publication titles, keywords and abstracts (tokens) ✓
  • Site keyword lists, e.g., AND-LTER (keywords) ✓

Some Statistics
Source | Number of Terms | Number used at 5 or more sites | Most frequently used
EML Keywords | 2,711 | 86 | LTER (1002), Temperature (701)
EML Titles | 2,… | … | And (768), Data (394), LTER (350)
EML Attributes | 6,… | … | The (4,207), Data (1,621), Carbon (328)
DTOC Keywords | 2,… | … | ARC (1645), Temperature (732)
Bibliography Titles | 13,538 | 1,855 | Of (12,611), Forest (2,050)
Legend: tokens or words; keywords (may be multiple words)

Ranking/Rating Words
► Keywords were sorted by:
  • Number of lists (max 5 for tokens, 2 for multi-word keywords)
  • Max. number of sites on any single list
  • Min. number of sites on any single list
  • Number of uses
► The top 1010 words or tokens were then rated as “useful” (U), “marginal/not sure” (M) or “not useful” (N) by volunteers
  • Rating was needed because of abbreviations (e.g., CO2) and words that are too general (e.g., “Above”, “Total”)
  • The resulting list was then additionally sorted by a term score T = ((U*1) + (M*0) + (N*-1)) / (U + M + N), where U, M and N are the numbers of ratings of each kind
  • Always “Useful” = 1.00, always “Not Useful” = -1.00
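The sort keys and the term score T described on this slide can be sketched as follows. The `Candidate` class, its field names and the sample values are hypothetical. The slide also does not state the sort direction, so descending order on every key is an assumption here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    term: str
    n_lists: int     # number of source lists containing the term
    max_sites: int   # max. number of sites on any single list
    min_sites: int   # min. number of sites on any single list
    n_uses: int      # total number of uses


def rank(candidates):
    """Sort by the four slide criteria, highest first (assumed direction)."""
    return sorted(
        candidates,
        key=lambda c: (c.n_lists, c.max_sites, c.min_sites, c.n_uses),
        reverse=True,
    )


def term_score(useful, marginal, not_useful):
    """T = ((U*1) + (M*0) + (N*-1)) / (U + M + N), from the slide."""
    total = useful + marginal + not_useful
    if total == 0:
        raise ValueError("term has no ratings")
    return (useful * 1 + marginal * 0 + not_useful * -1) / total


# Invented examples, not values from the working group's lists.
ranked = rank([
    Candidate("above", 3, 10, 2, 150),
    Candidate("temperature", 5, 20, 8, 900),
    Candidate("carbon", 5, 18, 9, 400),
])
always_useful = term_score(5, 0, 0)
always_not_useful = term_score(0, 0, 4)
mixed = term_score(3, 1, 1)
```

Because “marginal” votes contribute zero to the numerator but still count in the denominator, they pull a term's score toward 0 rather than toward either extreme.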

Top of the list

2006 ASM
► The group called for a rethinking of the solutions to the challenge, and three groups working simultaneously each defined a plan for working on a controlled vocabulary

Common Elements of Group Reports
► Groups were complementary
► Effort worthwhile
► Complex: start simple, look at other efforts
  • How many words are useful?
► Need to involve scientists in the process
  • What are they searching on?
  • Need to be involved at the evaluation stage
► Will evolve; not a one-shot thing
  • Need to be aware of the work this will require
► Top down vs. bottom up
► How broad do we want to go?

What do we want to do?
► Enable auditing on Metacat to track requests
► Educational activities
► Take a closer look at what NBII, GCMD, etc. are doing or have done
► Everybody compiles a list of attributes (already done) and categorizes them
  • Work with SEEK KR group to represent attributes in ontology template
► Develop site-specific controlled vocabularies (?????)
► Would really like to see a best-practices document to inform keywords, attribute names and attribute definitions
► Take 1000 words and compare with KNB browse categories
  • Role of core areas? Alternative conceptual categories
► Create definitions for keywords/synonymization

2007 Activities
► Inigo San Gil developed a harvest tool for the NBII Thesaurus for testing keywords
► Duane Costa updated raw word lists for EML titles and keywords
► Started a process aimed at auditing Metacat queries

This Meeting
► Try to reach consensus on goals
► Recraft Action Plan to meet those goals