Federal Controlled Vocabularies
Data Architecture Sub-Committee (DAS)
April 8, 2010
Brand K. Niemann
Federal Controlled Vocabularies
What Are They
Examples
Discussion
Why a Controlled Vocabulary?
Improve effectiveness of information storage and retrieval systems
– Knowledge workers spend 25-35% of their time searching for information, with 50% success (1)
The need for vocabulary control arises from two basic features of natural language, namely:
– Two or more words or terms can be used to represent a single concept
  Example: salinity/saltiness, VHF/Very High Frequency
– Two or more words that have the same spelling can represent different concepts
  Example: Mercury (planet), Mercury (metal), Mercury (automobile), Mercury (mythical being)
(1) Tutorial Working Council of CIOs, Business Wire, Feb
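The two features of natural language named above can be sketched in a few lines of Python. This is only an illustration, not a method from the deck: the table names PREFERRED and HOMOGRAPHS and the function control are hypothetical, and the choice of "salinity" and "VHF" as the preferred forms is an assumption (the slide lists the pairs without saying which term is preferred).

```python
# Hypothetical lookup tables; the entries come from the slide's examples.
# Which member of each synonym pair is "preferred" is an assumption.
PREFERRED = {
    "salinity": "salinity",
    "saltiness": "salinity",
    "vhf": "VHF",
    "very high frequency": "VHF",
}

# Homographs: one spelling, several concepts, each told apart by a qualifier.
HOMOGRAPHS = {
    "mercury": [
        "Mercury (planet)",
        "Mercury (metal)",
        "Mercury (automobile)",
        "Mercury (mythical being)",
    ],
}


def control(term):
    """Return the controlled terms that a free-text term could stand for."""
    key = term.strip().lower()
    if key in HOMOGRAPHS:
        # Ambiguous spelling: return every qualified candidate.
        return list(HOMOGRAPHS[key])
    if key in PREFERRED:
        # Synonym: collapse to the single preferred term.
        return [PREFERRED[key]]
    # Term is not in the vocabulary.
    return []
```

A synonym collapses to one term, while a homograph returns several qualified candidates that a person or a later rule still has to choose between.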
Controlled Vocabulary
Why and when to use: Dimension and Context
Increasing structural and semantic complexity:
List: Set of terms arranged in logical way
Synonym Ring: + Words with same meaning in a given context
Authority File: + Preferred Terms (USE)
Taxonomy: + Broader (BT) and Narrower Terms (NT) {BT, NT, USE}
Thesaurus: + Related Terms (RT)
Controlled Vocabulary: Dimension and Context
Dimension and Context (not a definitive list)
Organization: human resources, marketing, accounting, etc.
Function Type: employment, staffing, training, etc.
Subject: water pollution, soil pollution, air pollution, etc.
Identify a document or database for a data catalog (data.gov, data.gov.uk, etc.)
– Consistent vocabulary for describing database or document: dcat and related, Dublin Core, SKOS, FOAF 1
Identify a data Item: Vehicle Identification Number (VIN), Uniform Resource Identifier (URI)
Identify a data Element: Patient Person First Name, ISO/IEC /UDEF
Relate a Resource
Relate a Vocabulary 1
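One way to picture dimensions in use is as separate controlled lists that a catalog record is checked against. The sketch below is a minimal illustration under assumptions: the FACETS table and validate_tags function are hypothetical names, the facet keys are lower-cased forms of the dimension names on the slide, and each list holds only the sample terms shown there (the slide itself says the list is not definitive).

```python
# Hypothetical facet table built from the sample terms on the slide.
FACETS = {
    "organization": {"human resources", "marketing", "accounting"},
    "function": {"employment", "staffing", "training"},
    "subject": {"water pollution", "soil pollution", "air pollution"},
}


def validate_tags(tags):
    """Return the (facet, value) pairs that are not in the vocabulary.

    `tags` maps a facet name to the single value assigned to a record.
    An empty result means every tag is a controlled term.
    """
    problems = []
    for facet, value in tags.items():
        allowed = FACETS.get(facet)
        # Reject unknown facets as well as uncontrolled values.
        if allowed is None or value not in allowed:
            problems.append((facet, value))
    return problems
```

For example, a record tagged {"organization": "marketing", "subject": "air pollution"} passes, while {"subject": "noise pollution"} is reported because that term is not in the sample subject list.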
Controlled Vocabulary Examples
Agency -- Context -- Dimension
DOD - Center for Army Lessons Learned. Intended Purpose: Organization of equipment supporting the business -- Function (Also by Type)
NASA - NASA Thesaurus. Intended Purpose: Organization of equipment supporting the business -- Type
EPA - Data Classes and Areas. Intended Purpose: Organization of subject areas supporting the business -- Subject
IRS - IRS Tax Map. Intended Purpose: Organization of topics for answering questions -- Subject
Synonyms and Word Equivalent
  Radio - Radio Detection and Ranging
  Telescope - scope
  Manned Lunar Space Vehicle - Apollo 11 Mission
  Waste - Run-off
  Amended Tax Return X
Authority File
  Radio Detection Finding - (USE) Radio
  Scope (USE) Telescope
  Run-off (USE) Waste
  Employment Income (USE) Wages and Salary
Taxonomy + Broader (BT) and Narrower Terms (NT) {BT, NT, USE, UF}
  (BT) Radar (by function)
    (NT) aircraft radars
    (NT) airport radar systems
    (NT) Ground Based Radar
    (NT) imaging radar
    (NT) meteorological radar
    (NT) missile site radar
    (NT) search radar
    (NT) terrain analysis radar
  (BT) Instruments
    (NT) Accelerometers
    (NT) Acoustic Sensors
    (NT) etc.
    (NT) Telescopes
    (NT) Optical telescopes
    (NT) Radio telescopes
  (BT) Substances
    (NT) Chemicals
    (NT) Biological
    (NT) Contaminants
    (NT) Wastes
    (NT) Radiation
    (NT) Commercial Products
  (BT) Tax Topics
    (NT) IRS Help
    (NT) IRS Procedures
    (NT) Collection
    (NT) Alternative Filing Methods
    (NT) General Information
    (NT) Which Forms to Use
Thesaurus + Related Terms (RT)
  (BT) Radar
    (RT) AN/MPQ-65
    (RT) AN/MPQ-65 Radar set
    (RT) navigation
    (RT) instruments
    (RT) noise (radar)
    (RT) radar scattering
  (BT) Radio Telescopes
    (RT) Microwaves (High Energy Radio Telescope)
  (BT) Wastes
    (RT) Garbage
    (RT) Refuse
    (RT) Biosolids
    (RT) Pollution Control Facilities
  (BT) Itemized Deductions
    (NT) Should I Itemize?
    (ET) Publication 501
    (RT) Tax Topic 551 Publication
    (ET) Exemptions, Standard Deduction, and Filing Information
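The USE, NT, and RT relations in the examples above can be sketched as plain lookup tables and used to expand a search term. This is a minimal illustration, not how any of the agency vocabularies is implemented. The names USE, NARROWER, RELATED, and expand_query are hypothetical, and two simplifications are assumptions: "Optical telescopes" and "Radio telescopes" are nested under "Telescopes", and the singular, plural, and capitalization variants on the slide are unified to one spelling.

```python
# Authority-file entries: non-preferred term -> preferred term (USE).
# Assumption: "Telescope" on the slide is unified with "Telescopes".
USE = {"Scope": "Telescopes"}

# Taxonomy: broader term (BT) -> narrower terms (NT).
# Assumption: the two telescope kinds sit under "Telescopes".
NARROWER = {
    "Instruments": ["Accelerometers", "Acoustic Sensors", "Telescopes"],
    "Telescopes": ["Optical telescopes", "Radio telescopes"],
}

# Thesaurus: term -> related terms (RT).
RELATED = {"Radio telescopes": ["Microwaves"]}


def preferred(term):
    """Follow a USE reference if there is one, else keep the term."""
    return USE.get(term, term)


def narrower_closure(term):
    """Return the term followed by all of its narrower terms, depth first."""
    result = [term]
    for child in NARROWER.get(term, []):
        result.extend(narrower_closure(child))
    return result


def expand_query(term):
    """Return (hierarchy terms, related terms) for a search term."""
    terms = narrower_closure(preferred(term))
    related = [r for t in terms for r in RELATED.get(t, [])]
    return terms, related
```

Searching on "Scope" would then also cover "Telescopes" and its narrower terms, with "Microwaves" offered separately as a related term rather than merged into the hierarchy.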
Discussion Topics and General Considerations
1. Sources for Federal Controlled Vocabularies considerations
2. Relate vocabularies across domain considerations
– Move from levels of concreteness to abstractness
– Understand similarity between domains and differences between domains
– Require consistency
3. Your input
Language Universals and Linguistic Typology, Comrie, 1989 (Survey of World languages for comparison and classification)
Resources
Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies
Related Efforts
Federal CV Efforts
Display Types
Automated Example
Ontology Spectrum
Sample Tools
Guidelines
ANSI/NISO Z Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies
– =7cc9b583cb5a62e8c15d3099e0bb46bbae9cf38a
Related Efforts
Universal Data Element Framework (UDEF): Controlled vocabulary for naming data elements based on ISO/IEC
Digital Enterprise Research Institute (DERI) Data catalog (dcat) vocabulary: RDF Vocabulary for exchange of data catalogs, such as data.gov and data.gov.uk (early draft)
Universal Core (UCORE): Agreed upon representations for most commonly shared and understood elements.
NIEM IEPD: Agreed upon exchange for area of shared interest.
etc.
Federal CV Efforts
USAF Vocabulary OneSource
CENDI September 11, 2008 Workshop: New Dimensions in Knowledge Organization Systems
SKOS for the DoD Metadata Taxonomy
Tuesday VoCampDC May
etc.
Display Types More Types:
Automated Example
Controlled Vocabulary Courtesy of Leo Obrst, Mitre Corporation
Sample Tools