A Registry for controlled vocabularies at the Library of Congress

A Registry for controlled vocabularies at the Library of Congress Rebecca Guenther Network Development & MARC Standards Office, Library of Congress October 29, 2008

Outline of presentation
- Types of controlled vocabularies
- Vocabularies maintained at LC
- An introduction to SKOS
- Establishing concept databases at LC
- Examples of concept schemes: ISO 639-2 and PREMIS event type
- Providing the registry as a web service
ASIST 2008 Oct. 29, 2008

Why establish controlled vocabularies?
- Control values that occur in metadata
- Document and publish for reuse
- Reduce ambiguity
- Control synonyms
- Establish formal relationships among terms (where appropriate)
- Test and validate terms

Many metadata schemes allow for content from other sources. Some data elements may be more useful if a controlled vocabulary is used. Some vocabularies are published formally; others are developed and used locally. Formal controlled vocabularies may be used for testing and validation of terms. This is often done in integrated library systems, where bibliographic records may be validated against authority records. Work is under way on establishing metadata registries for both documentation and machine validation of controlled vocabularies and metadata elements/terms. This could be particularly useful for controlled vocabularies, since their usefulness depends on consistency.

Types of Controlled Vocabularies used in metadata standards
- Lists of enumerated values
- Code lists (e.g. language, country)
- Taxonomies
- Formal thesauri
- Locally controlled enumerated lists

NISO has a standard for constructing thesauri (free download; listed in the bibliography): ANSI/NISO Z39.19-2005, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies.

Enumerated lists
- Simple list of terms used in a pull-down menu or Web site pick list
- Values enumerated in an XML schema
- Little additional information or structure about each value
Examples:
- Code and value from a MARC 21 fixed field, e.g. code “e” in Leader/06 is “cartographic material”
- Enumerated value “MD5” for METS CHECKSUMTYPE
- Enumerated value “born digital” in MODS digitalOrigin

Code lists
- Some are established as ISO standards and used worldwide in many communities for many purposes
- The standard standardizes the code, not a particular name for it
- Codes are used as identifiers
Examples (maintained by LC):
- ISO 639-2 (language codes)
- MARC relator codes
- MARC country codes
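The point that the code, not the name, is the identifier can be illustrated with a small sketch. The dictionary layout and the function names `names_for` and `is_valid_code` are assumptions made for illustration; only the code "por" and its English and French names come from the ISO 639-2 example later in this presentation.

```python
# Hypothetical in-memory code list: the code is the identifier,
# and each code may carry several names (one or more per language).
CODE_LIST = {
    "por": {"en": ["Portuguese"], "fr": ["portugais"]},
}


def names_for(code, lang=None):
    """Return the names recorded for a code, optionally for one language."""
    entry = CODE_LIST[code]  # KeyError signals a code outside the list
    if lang is not None:
        return list(entry.get(lang, []))
    return [name for names in entry.values() for name in names]


def is_valid_code(code):
    """A value is valid only if it is a known code; names are not identifiers."""
    return code in CODE_LIST
```

With this layout, a record that stores "Portuguese" instead of "por" fails validation, which is the behaviour a code list is meant to enforce.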

Thesauri
A thesaurus is a controlled vocabulary with multiple types of relationships.
Example:
Rice
  UF paddy
  BT Cereals
  BT Plant products
  NT Brown rice
  RT Rice straw
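As a rough sketch, the Rice entry can be read into a simple data structure keyed by relationship type. The tag expansions (UF = used for, BT = broader term, NT = narrower term, RT = related term) follow common thesaurus practice; the function name `parse_entry` and the text layout it expects are assumptions for illustration.

```python
# Relationship tags as used in the Rice example.
TAGS = {
    "UF": "used for",
    "BT": "broader term",
    "NT": "narrower term",
    "RT": "related term",
}


def parse_entry(text):
    """Parse a term followed by tagged lines into (term, {tag: [targets]})."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    term, relations = lines[0], {}
    for line in lines[1:]:
        tag, target = line.split(maxsplit=1)
        if tag not in TAGS:
            raise ValueError(f"unknown relationship tag: {tag}")
        relations.setdefault(tag, []).append(target)
    return term, relations


ENTRY = """
Rice
  UF paddy
  BT Cereals
  BT Plant products
  NT Brown rice
  RT Rice straw
"""

term, relations = parse_entry(ENTRY)
```

Keeping a list per tag allows a term to have several broader terms, as Rice does here.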

Standards maintained at LC that use controlled vocabularies
- MARC (including code lists)
- MODS
- METS
- MIX (XML schema for Z39.87 Technical metadata for digital still images)
- PREMIS
- ISO 639-2 (language codes)
- Thesaurus of Graphic Materials
- LCSH
- … and some others

SKOS: What is it?
Simple Knowledge Organisation System(s)
SKOS is …
- for declaring and publishing taxonomies, thesauri or classification schemes, for use in a distributed, decentralised information system (i.e. a semantic web)
- for describing Concepts and creating relationships between Concepts and Terms
- a practical application of RDF
- a formal language for representing controlled, structured vocabularies

The SKOS data model
…views a knowledge organization system as a concept scheme comprising a set of conceptual resources (concepts). These concept schemes and conceptual resources are identified by URIs. The model is multilingual and extensible.

Concepts can be…
labeled with any number of strings. One label, in any given language, can be indicated as the "preferred" label for that language, and others as "alternate" labels or "hidden" labels; a concept can also be given a notation:
- skos:prefLabel
- skos:altLabel
- skos:hiddenLabel
- skos:notation

Concepts can be…
linked to other concepts within the same concept scheme.
Hierarchical links:
- skos:broader and skos:narrower
- skos:broaderTransitive and skos:narrowerTransitive
Associative links:
- skos:related
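The difference between the direct and the transitive hierarchical links can be sketched in a few lines. This is only an illustration: plain labels from the Rice thesaurus example stand in for concept URIs, and the names `BROADER`, `narrower_of` and `broader_transitive` are hypothetical.

```python
# Direct skos:broader links, taken from the Rice thesaurus example;
# plain labels stand in for concept URIs here.
BROADER = {
    "Brown rice": {"Rice"},
    "Rice": {"Cereals", "Plant products"},
}


def narrower_of(concept):
    """skos:narrower is the inverse of skos:broader."""
    return {c for c, parents in BROADER.items() if concept in parents}


def broader_transitive(concept):
    """All ancestors reachable by following skos:broader links."""
    seen, stack = set(), [concept]
    while stack:
        for parent in BROADER.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Here "Brown rice" has one direct broader concept (Rice), while its transitive ancestors also include Cereals and Plant products.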

Concepts can be…
grouped into collections, which can be labeled and/or ordered. A concept can be in one or more collections.
- skos:Collection
- skos:OrderedCollection
- skos:member
- skos:memberList

Concepts can be…
mapped to other concepts in different concept schemes.
Hierarchical mapping:
- skos:broadMatch
- skos:narrowMatch
Associative mapping:
- skos:relatedMatch
- skos:closeMatch
- skos:exactMatch

Advantages to using SKOS
- SKOS has a defined element set which is particularly relevant for controlled vocabularies
- Relationships between entries in a thesaurus can be expressed (broader, narrower, etc.)
- Relationships between entries in different thesauri can be expressed (exactMatch, relatedMatch)
- Having a dereferenceable URI for concepts and their concept schemes enhances the ability to provide web services for consumers of these standards

Controlled vocabularies registry at LC
- Library of Congress is establishing databases with controlled vocabulary values for standards that it maintains
- Controlled lists are represented using SKOS as well as alternative syntaxes
- Lists currently in progress:
  - ISO 639-2 and MARC language code list
  - MARC geographic area codes
  - MARC country code list
  - MARC relators
  - PREMIS controlled value lists
  - Thesaurus of Graphic Materials
- Other possibilities:
  - Enumerated values in MODS schema
  - Coded and uncoded value lists in MARC

Reasons for developing a registry
- Facilitate development and maintenance process
- Make controlled lists openly available
- Develop a web service where comprehensive information about controlled terms is available
- Experiment with semantic web technologies
- Expose vocabularies to wider communities

http://www.loc.gov:8081/standards/registry/lists.html

Example: ISO 639-2 vocabulary
- One in the family of ISO 639 language coding standards
- Has a close relationship with other language coding standards (ISO 639-1 and -3, MARC)
- LC is maintenance agency
- The standard is the CODE, not the language name; multiple names are given

ISO 639-2 language code example

<rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
  <rdf:type rdf:resource="http://www.w3.org/2008/05/skos#Concept"/>
  <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
  <skos:altLabel xml:lang="en-Latn">Portuguese</skos:altLabel>
  <skos:altLabel xml:lang="fr-Latn">portugais</skos:altLabel>
  <skos:notation rdf:datatype="xs:string">por</skos:notation>
  <skos:definition xml:lang="en-Latn">This Concept has not yet been defined.</skos:definition>
  <skos:inScheme rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-2"/>
  <vs:term_status>stable</vs:term_status>
  <skos:historyNote rdf:datatype="xs:dateTime">2006-07-19T08:41:54.000-05:00</skos:historyNote>
  <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt"/>
  <skos:changeNote rdf:datatype="xs:dateTime">2008-07-09T13:49:05.321-04:00</skos:changeNote>
</rdf:Description>

This is one type of SKOS expression, namely RDF/XML: tags defined by SKOS are wrapped in an RDF wrapper. skos:prefLabel uses the language code as its value so as not to give preference to a term in a particular language. skos:altLabel is used here for the various language names. skos:inScheme tells you which concept scheme this entry is included in; there can be more than one. skos:exactMatch gives a URI for another code that is exactly equivalent; in this case it is the 2-character code “pt” (ISO 639-1).
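A consumer of such a record might read the code, the language names and the equivalent ISO 639-1 code as sketched below, using only the Python standard library. This is a minimal sketch under stated assumptions: the record is abridged, the enclosing rdf:RDF element and its xmlns declarations are added so the fragment parses on its own, the SKOS namespace is the one shown in the record's rdf:type, and `read_concept` is a hypothetical helper rather than part of the registry.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2008/05/skos#"  # namespace used in the record's rdf:type
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Abridged copy of the record; the rdf:RDF wrapper and xmlns
# declarations are added here (assumed) so the fragment parses on its own.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
    <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
    <skos:altLabel xml:lang="en-Latn">Portuguese</skos:altLabel>
    <skos:altLabel xml:lang="fr-Latn">portugais</skos:altLabel>
    <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt"/>
  </rdf:Description>
</rdf:RDF>"""


def read_concept(xml_text):
    """Extract the code, language names and exact matches of the first concept."""
    node = ET.fromstring(xml_text).find(f"{{{RDF}}}Description")
    return {
        "uri": node.get(f"{{{RDF}}}about"),
        "code": node.findtext(f"{{{SKOS}}}prefLabel"),
        "names": {e.get(XML_LANG): e.text for e in node.findall(f"{{{SKOS}}}altLabel")},
        "exact_match": [
            e.get(f"{{{RDF}}}resource") for e in node.findall(f"{{{SKOS}}}exactMatch")
        ],
    }


concept = read_concept(DOC)
```

A full RDF toolkit would be the more robust choice for real data; plain XML parsing is used here only to keep the sketch self-contained.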

PREMIS controlled lists
- PREMIS Data Dictionary for Preservation Metadata
- Some semantic units call for controlled vocabularies and have suggested lists
- A central registry could document and make them available
- Users could submit their own terms
- PREMIS schema could be enhanced with enumerated values for validation generated dynamically

PREMIS event type example

<rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/preservationEvents/creation">
  <rdf:type rdf:resource="http://www.w3.org/2008/05/skos#Concept"/>
  <skos:prefLabel xml:lang="en-latn">creation</skos:prefLabel>
  <skos:narrower rdf:resource="http://www.loc.gov/standards/registry/vocabulary/preservationEvents/migration"/>
  <skos:narrower rdf:resource="http://www.loc.gov/standards/registry/vocabulary/preservationEvents/normalization"/>
  <skos:definition xml:lang="en-latn">the act of creating a new object</skos:definition>
  <skos:inScheme rdf:resource="http://www.loc.gov/standards/registry/vocabulary/preservationEvents"/>
</rdf:Description>

This example is from a concept scheme called “preservation events”. The controlled value described is “creation”, which appears in the PREMIS data dictionary as a suggested value under eventType. It is a broader term for two others on the value list: migration and normalization.
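The idea of generating enumerated values for validation from the registry could look roughly like the following. It is a sketch only: it assumes that the permitted eventType value is the last segment of the concept URI (as it is for "creation" in the record above), and the names `enumerated_values` and `validate_event_type` are hypothetical.

```python
SCHEME = "http://www.loc.gov/standards/registry/vocabulary/preservationEvents"

# Concept URIs that appear in the record: the concept itself
# and the two skos:narrower targets.
CONCEPT_URIS = [
    f"{SCHEME}/creation",
    f"{SCHEME}/migration",
    f"{SCHEME}/normalization",
]


def enumerated_values(uris, scheme=SCHEME):
    """Derive the list of permitted values from the concept URIs of one scheme."""
    prefix = scheme + "/"
    return sorted(u[len(prefix):] for u in uris if u.startswith(prefix))


def validate_event_type(value, allowed):
    """Return True when an eventType value is on the controlled list."""
    return value in allowed


ALLOWED = enumerated_values(CONCEPT_URIS)
```

Because the list is derived from the concept URIs, adding a concept to the scheme would extend the set of valid values without editing the validation code.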

Registry Web service (diagram)
- User: sends HTTP request
- XML Database using XQuery (eXist): interprets URI, formulates SPARQL query
- RDF Triple Store (Sesame): runs query, gets results, sends back to database and then to user
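The "interprets URI, formulates SPARQL query" step might be sketched as follows. The slides do not show the registry's actual queries, so the query shape, the `REGISTRY_BASE` constant and the function name `formulate_query` are all assumptions made for illustration.

```python
REGISTRY_BASE = "http://www.loc.gov/standards/registry/vocabulary/"


def formulate_query(request_uri):
    """Turn a requested concept URI into a SPARQL query string (illustrative)."""
    if not request_uri.startswith(REGISTRY_BASE):
        raise ValueError("URI is outside the registry namespace")
    # Reject characters that may not appear inside a SPARQL IRI reference.
    if any(ch in request_uri for ch in '<>"{}|^`\\ \n'):
        raise ValueError("URI contains characters not allowed in a SPARQL IRI")
    return f"SELECT ?property ?value WHERE {{ <{request_uri}> ?property ?value }}"


query = formulate_query(REGISTRY_BASE + "iso639-2/por")
```

The resulting string would then be passed to the triple store, and the bindings returned could be serialized as SKOS RDF/XML or another syntax before being sent back to the user.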

Further development
- Consider programming changes to improve speed
- Develop mechanisms to output all public documentation from database
- Include additional coding about relationships to other concept schemes and controlled vocabularies (facilitating crosswalks)
- Encourage experimentation

Questions?
Contacts:
- Rebecca Guenther: rgue@loc.gov
- Clay Redding: cred@loc.gov