Toward an International Sharing and Use of Subject Authority Data

1 Toward an International Sharing and Use of Subject Authority Data
Marcia Lei Zeng, Athena Salaba, Kent State University. FRBR Workshop, OCLC, 2005

2 Outline
- Background information
- Current State
- Authority Data
- Sharing Authority Data

3 1. Background 1.1 Subject access
- Seeking information on a topic is still the predominant user task
- Subject access includes:
  - Subject searching
  - Keyword searching
  - Subject browsing
- It is still very problematic for the majority of searchers

4 1.2 Functions of a catalog regarding subject access (1)
Cutter (1897)
- To find a book if the subject is known
- To show what a library has on a given subject (collocate)
- To assist in the choice as to its character (identify)

5 1.2 Functions of a catalog regarding subject access (2)
FRBR (1998)
- To find entities of Group 1 that have entities from Groups 1, 2, or 3 as their subject
- To identify
- To select
- To obtain

6 1.3 What is a subject?
FRBR: Functional Requirements for Bibliographic Records
- Group 1: Work, Expression, Manifestation, Item
- Group 2: Persons, Families, Corporate bodies
- Group 3: Concepts, Objects, Place, Event

7 Revisiting Group 3?
- Time
- Process
- Event is a combination of place and time
- Concrete vs. abstract concept
- Ranganathan: Personality, Matter, Energy, Space

8 2. Current State Subject Authority Data
2.1 Structure (heterogeneous)
2.2 Existing Knowledge Organization Systems/Structures/Schemas (KOS)
2.3 Rules and guidelines
2.4 Communication/Encoding

9 2.1 Structures
[Figure: KOS structures arranged from weakly-structured to strongly-structured, and from natural language to controlled language]
- Term Lists: Pick lists, Gazetteers, Glossaries/Dictionaries, Authority Files, Synonym Rings
- Classification & Categorization: Subject Headings, Categorization schemes, Taxonomies, Classification schemes
- Relationship Groups: Thesauri, Semantic networks, Ontologies

10 Structures: Coordination
[Figure: a spectrum running from pre-coordination to post-coordination]
- Pre-coordination: e.g. subject headings (LCSH)
- Post-coordination: e.g. thesauri (AAT, INSPEC)
- Also shown on the spectrum: MeSH, FAST, UMLS

11 2.2 Existing KOS (1)
- Library of Congress Subject Headings (LCSH)
- Medical Subject Headings (MeSH)
- ERIC Thesaurus (ERIC)
- Inspec Thesaurus
- Inspec Classification
- Dewey Decimal Classification (DDC)
- Library of Congress Classification (LCC)
- Universal Decimal Classification (UDC)
- HEREIN Thesaurus
- Alexandria Digital Library (ADL) Gazetteer and Thesaurus
- Schlagwortnormdatei (SWD)
- Regensburger Verbundklassifikation (RVK)
- RAMEAU: Répertoire d'autorité-matière encyclopédique et alphabétique unifié
- Art and Architecture Thesaurus (AAT)
- National Agricultural Library Subject Headings
- …

12 2.2 Existing KOS (2)
[Figure: KOS in the global environment, with axes Language and Structure; regions Verbal based (V), Code based (C), Combined]
- Verbal based: AAT, LCSH, RAMEAU, INSPEC Thesaurus, MeSH
- Code based: UDC, RVK, DDC, LCC
- Integrated: INSPEC, MeSH Hierarchy

13 2.3 Rules of KOS Construction
- Different rules and guidelines
  - AACR2, Z39.19, RAK (Regeln für die alphabetische Katalogisierung), ISO 5964, ISO 2788, IFLA Principles Underlying Subject Heading Languages (SHLs) …
- No rules
- Indirect/Inherent use of rules (by example)

14 2.4 Communication/Encoding for authority data
- MARC
  - MARC21 (1xx, 2xx, etc.)
  - UNIMARC (1xx, 2xx, etc., with different definitions)
  - etc.
- Guidelines for Authority Records and References (GARR) (>, <, >>, <<)
- NISO Z39.19 (BT, NT, RT, etc.)
- XML-based: OWL Web Ontology Language, RDF Schema, Voc-ML, etc.
  - Vocabulary ML: Metacode strawman DTD

15 3. Authority Data 3.1 Use of authority data
- Direct use of authority data
  - Index
  - Identify/Verify
  - Search & Browse the authority data
- Indirect use of authority data
  - Searching bibliographic file
  - Browsing bibliographic file
- Users
  - Information professionals
  - Searcher/end-user

16 3.2 Common Authority Data
- Authorized/established term
- Variations
- Related terms
- Notes
- Linked/Parallel terms
- Numbering, International numbering?
- Other: language, rules, links to external resources, roles, etc.

17 Do we need one authorized term?
- Keep USER in mind!
  - Preference, language, script
- Trends: all are preferred
- Synonym rings (included in NISO Z39.19 now)
  - Sets of terms that are considered equivalent for the purpose of retrieval.
  - Usually found as sets of lists that allow users to access all content containing any of the terms.
  - Used in conjunction with search engines.
[Screenshot: B and T World Seeds, "SEARCH and LOAD" synonym list]
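A synonym ring can be sketched in a few lines. The example below is only an illustration under stated assumptions: the ring, the documents, and the function names are invented for this sketch and are not taken from NISO Z39.19 or from any real catalog. Every member of a ring is treated as equally preferred, and a query on any one member retrieves content containing any of the others.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical ring and documents, invented for illustration only.
RINGS = [["maize", "corn", "Zea mays"]]
DOCUMENTS = {
    1: "Growing corn in dry climates",
    2: "Zea mays seed catalogue",
    3: "Wheat varieties",
}


def build_ring_index(rings: Iterable[Iterable[str]]) -> Dict[str, FrozenSet[str]]:
    """Map every term to the full ring it belongs to (no term is 'preferred')."""
    index: Dict[str, FrozenSet[str]] = {}
    for ring in rings:
        members = frozenset(term.lower() for term in ring)
        for term in members:
            index[term] = members
    return index


def expand_query(term: str, index: Dict[str, FrozenSet[str]]) -> FrozenSet[str]:
    """Return all equivalent terms; a term outside any ring stands alone."""
    key = term.lower()
    return index.get(key, frozenset({key}))


def search(term: str, index: Dict[str, FrozenSet[str]], documents: Dict[int, str]) -> List[int]:
    """Return ids of documents containing any member of the query term's ring."""
    terms = expand_query(term, index)
    return [doc_id for doc_id, text in documents.items()
            if any(t in text.lower() for t in terms)]


INDEX = build_ring_index(RINGS)
print(search("maize", INDEX, DOCUMENTS))
```

Here a search on "maize" reaches the documents that say "corn" or "Zea mays", even though neither contains the word the searcher typed.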

18 3.3 Common Semantic Relationships in Authority Data
- Broad categories
  - Equivalence (Use, Used For, UF, See)
  - Hierarchical (BT, NT, see also)
  - Associative (RT, see also)
- More specific relationships, such as:
  - Is part of
  - Is instance of
  - Agent/process
  - Process/product
- Need for other types of relationships?
  - ADL, such as: Overlap; administrativePartOf; SubFeatureOf
  - UMLS, such as: Like; Parent; Child; Sibling
  - WordNet, such as: Familiarity; derivationally related
  - ADL footprints: inherent spatial relationships (contains, overlaps, is-contained-by, adjacent), versus explicit statements of relationships
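The three broad relationship categories can be held in a small data structure. The sketch below is a minimal illustration, assuming the usual reciprocal tags (BT/NT, RT/RT, USE/UF); the class name, method names, and sample terms are hypothetical and do not come from any of the vocabularies named above.

```python
from collections import defaultdict, deque

# Reciprocal tags: recording one direction implies the other.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    """Tiny in-memory store of equivalence, hierarchical, and associative links."""

    def __init__(self):
        self._links = defaultdict(lambda: defaultdict(set))

    def relate(self, source, relation, target):
        """Record source -relation-> target together with its reciprocal link."""
        self._links[source][relation].add(target)
        self._links[target][RECIPROCAL[relation]].add(source)

    def related(self, term, relation):
        """Return the terms linked to `term` by `relation`, sorted alphabetically."""
        return sorted(self._links.get(term, {}).get(relation, ()))

    def ancestors(self, term):
        """Walk BT links upward, nearest broader terms first."""
        found, queue = [], deque([term])
        while queue:
            for parent in self.related(queue.popleft(), "BT"):
                if parent not in found and parent != term:
                    found.append(parent)
                    queue.append(parent)
        return found


# Sample terms invented for illustration only.
THESAURUS = Thesaurus()
THESAURUS.relate("Cats", "BT", "Mammals")      # hierarchical
THESAURUS.relate("Mammals", "BT", "Animals")   # hierarchical
THESAURUS.relate("Cats", "RT", "Pets")         # associative
THESAURUS.relate("Felines", "USE", "Cats")     # equivalence
print(THESAURUS.ancestors("Cats"))
```

More specific relationships such as "is part of" or "is instance of" could be added as further keys in the reciprocal table, each paired with its inverse.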

19 Unanswered Question
What authority data currently exist in an authority record?
or
What authority data should be included in an authority record?

20 4. Sharing Authority Data in a Global Environment
[Figure: global environment diagram with axes Language and Structure; labels Combined, V, C]
4.1. Challenges
- Structures
- Languages and scripts
- Rules
- Encoding

21 4.2. Projects Specifically for Subject Authority Data Sharing
- Construction (not to be discussed here)
- Implementation
- Projects based on different types of structures
- Projects involving multiple languages

22 Marcia Lei Zeng and Lois Mai Chan (2004). Trends and issues in establishing interoperability among knowledge organization systems. Journal of the American Society for Information Science and Technology (JASIST) 55(5): 377-395.

23 [Diagram: KOS vocabularies, bibliographic files, authority files]

24 [Diagram: two systems, each with KOS vocabularies, bibliographic files, and authority files]

25 Sharing at Vocabulary Level
[Diagram: KOS vocabularies linked to one another] adaptation, extension, extraction, translation, etc.

26 Sharing at Vocabulary Level
1. Direct mapping
For the purpose of indexing complexes, buildings and built works described in the national database "Mérimée" about the French Heritage, the Thesaurus of Architecture (Le thésaurus de l'architecture) was created and mapped to the Art and Architecture Thesaurus (AAT, published by The J. Paul Getty Trust) and the English Heritage Thesaurus (published by The National Monuments Record, NMR). When mapping from Mérimée's Thesaurus of Architecture to the AAT and NMR, “AND” and “OR” post-coordination are used to indicate the equivalence, in addition to the conventional equivalence types (exact and partial). (See statistics reported in Doerr, 2001.)
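The way "AND" and "OR" post-coordination extend exact and partial equivalence can be sketched as follows. This is an illustration only: the mapping entries are invented and are not actual correspondences between the Thesaurus of Architecture, the AAT, or the NMR thesaurus, and the `Mapping` type and `matches` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Mapping:
    kind: str                  # "exact", "partial", "and", or "or"
    targets: Tuple[str, ...]   # terms in the target vocabulary


# Invented entries for illustration; not real thesaurus mappings.
MAPPINGS = {
    "moulin": Mapping("exact", ("mills",)),
    "chapelle castrale": Mapping("and", ("chapels", "castles")),
    "manoir": Mapping("or", ("manors", "manor houses")),
}


def matches(mapping: Mapping, record_terms: Set[str]) -> bool:
    """Decide whether a record indexed with target-vocabulary terms satisfies the mapping."""
    if mapping.kind == "and":
        # The source concept is expressed only by the combination of all targets.
        return all(term in record_terms for term in mapping.targets)
    # "exact", "partial", and "or": any one listed target is enough.
    return any(term in record_terms for term in mapping.targets)


print(matches(MAPPINGS["chapelle castrale"], {"chapels", "castles", "towers"}))
```

A fuller model would probably also keep the distinction between exact and partial matches at retrieval time, for example by ranking partial matches lower.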

27 Sharing at Vocabulary Level
2. Using a switching system
Renardus is an EU project (coordinated by the National Library of the Netherlands with partners from Denmark, Finland, Germany, the Netherlands, Sweden, and the UK) with the purpose of producing “a cross-browsing feature based on the DDC and improved subject searching across distributed and heterogeneous European subject gateways.” The initial investigation included the use of classification systems by Renardus partners’ gateways, general mapping approaches and issues, the definition of mapping relationships, and information on technical solutions and the mapping tool. The approach adopted by the project is a harmonization process that maps local class schemes to a common scheme, thereby enabling users to browse a single subject hierarchy. DDC was chosen as the switching language and common browsing structure. Each DDC class in Renardus presents links to "related collections", which allow users to jump to the mapped classes in the participating local gateways and to continue browsing there in the local classification structure. In addition, a virtual browsing feature allows the merging of all local related records from all mapped classes into one common Renardus result set. (Koch, Neuroth, and Day, 2001)
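The switching-language idea can be sketched as a lookup that always passes through the common scheme. The example below is not the Renardus implementation; the gateway names and local class labels are invented, and only the DDC numbers 004 (computer science), 510 (mathematics), and 530 (physics) are real.

```python
from collections import defaultdict

# Hypothetical gateways and local classes, each mapped to a DDC class.
LOCAL_TO_DDC = {
    "gateway_a": {"MATH": "510", "PHYS": "530"},
    "gateway_b": {"Mathematik": "510", "Informatik": "004"},
    "gateway_c": {"Wiskunde": "510", "Natuurkunde": "530"},
}


def build_switch(local_to_ddc):
    """Invert the local mappings: DDC class -> gateway -> local classes."""
    switch = defaultdict(lambda: defaultdict(list))
    for gateway, classes in local_to_ddc.items():
        for local_class, ddc in classes.items():
            switch[ddc][gateway].append(local_class)
    return switch


def related_collections(switch, ddc):
    """List the local classes in every gateway that are mapped to one DDC class."""
    return {gateway: list(classes) for gateway, classes in switch.get(ddc, {}).items()}


def cross_browse(local_to_ddc, switch, gateway, local_class):
    """Jump from a local class to the equivalent classes in the other gateways via DDC."""
    ddc = local_to_ddc[gateway][local_class]
    return {gw: classes for gw, classes in related_collections(switch, ddc).items()
            if gw != gateway}


SWITCH = build_switch(LOCAL_TO_DDC)
print(cross_browse(LOCAL_TO_DDC, SWITCH, "gateway_a", "MATH"))
```

The design point is that each local scheme is mapped once, to the switching language, instead of being mapped separately to every other local scheme.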

28 Sharing at Vocabulary Level
3. Creating a superstructure
UMLS® Metathesaurus®: over 1,000,000 concepts and 4.3 million concept names from more than 100 controlled vocabularies, some in multiple languages

29 Sharing at Vocabulary Level
4. Creating a superstructure (an index)
The project "Mapping Entry Vocabulary to Unfamiliar Metadata Vocabularies" has been conducted at the University of California, Berkeley in recent years. As stated on the project website, the researchers plan to develop Entry Vocabulary Indexes that accept topical statements in the searcher's terms ("Query vocabularies") and respond with a ranked list of terms in the system's vocabulary ("Entry vocabularies"). The prototype Entry Vocabulary Modules included English language indexes to BIOSIS Concept Codes, INSPEC Thesaurus, U.S. Patent and Trademark Office Patent Classification, and the Standard Industrial Classification (SIC) codes, and a multilingual index (supporting queries in English, French, German, Russian, or Spanish) to the physical sciences sections of the Library of Congress Classification (LCC). When the Entry Vocabulary Module leads users to a promising term in the target metadata vocabulary, a search can be executed using the newly-found metadata against a remote database (Buckland et al., 1999).
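The behaviour described above (searcher's words in, ranked controlled terms out) can be sketched with a simple word-to-heading association table. This is a deliberately simplified stand-in, not the Berkeley project's method: it ranks by raw co-occurrence counts, and the training records, headings, and function names are invented for illustration.

```python
from collections import Counter, defaultdict

# Invented training records: (free-text title, controlled term assigned by an indexer).
TRAINING = [
    ("solar cell efficiency measurements", "Photovoltaic cells"),
    ("thin film solar panels", "Photovoltaic cells"),
    ("solar wind observations", "Solar wind"),
    ("laser cooling of atoms", "Laser cooling"),
]


def build_index(records):
    """Associate each title word with the controlled terms it co-occurs with."""
    index = defaultdict(Counter)
    for text, heading in records:
        for word in set(text.lower().split()):
            index[word][heading] += 1
    return index


def suggest(query, index, limit=3):
    """Return controlled terms ranked by summed association with the query words."""
    scores = Counter()
    for word in query.lower().split():
        scores.update(index.get(word, {}))
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [heading for heading, _ in ranked[:limit]]


INDEX = build_index(TRAINING)
print(suggest("solar panels", INDEX))
```

The top suggestion could then be used as the search term against the remote database, which is the final step the notes describe.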

30 Sharing at Vocabulary Level
5. Creating a superstructure (a virtual index)
CAMed cross-thesaurus searching: terms are linked in a temporary union list generated by the software in response to a query.

31 Sharing at Vocabulary Level
6. Linking through a thesaurus server protocol
UCSB Alexandria Digital Library: the Thesaurus Protocol is based on the ANSI/NISO (1993, R2003) Z39.19 thesaurus model and supports downloading, querying, and navigating thesauri.
ADL-2

32 Sharing at Subject Authority File Level
Direct Mapping
[Diagram: two systems, each with KOS vocabularies, bibliographic files, and authority files]

33 Direct Mapping -- MACS (Multilingual Access to Subjects)
MACS maps between three subject heading languages: SWD/RSWK (Schlagwortnormdatei / Regeln für den Schlagwortkatalog) for German, RAMEAU (Répertoire d'autorité-matière encyclopédique et alphabétique unifié) for French, and LCSH for English. The Swiss National Library (SNL) is the project leader, and its partners are the Bibliothèque nationale de France (BnF), The British Library (BL), and Die Deutsche Bibliothek (DDB).


35 Co-occurrence mapping
Works at the application level, i.e., in metadata records, where the group of subject terms can actually result in loosely-mapped terms.
[Diagram: a metadata record carrying terms from thesaurus 1 (S1) and terms from thesaurus 2 (S2); KOS vocabularies, bibliographic files, authority files]
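Co-occurrence mapping can be sketched by counting how often a term from thesaurus 1 appears in the same metadata record as a term from thesaurus 2. The example below is a minimal illustration under assumptions: the records and terms are invented, and the 50% share threshold is an arbitrary choice for this sketch, not a value from any project mentioned here.

```python
from collections import Counter, defaultdict

# Invented records: (terms from thesaurus 1, terms from thesaurus 2) on the same item.
RECORDS = [
    ({"Heart attack"}, {"Myocardial infarction", "Hospital care"}),
    ({"Heart attack"}, {"Myocardial infarction"}),
    ({"Heart attack", "Exercise"}, {"Physical fitness"}),
    ({"Exercise"}, {"Physical fitness"}),
]


def cooccurrence_map(records, min_share=0.5):
    """Loosely map each S1 term to the S2 terms found in at least `min_share` of its records."""
    pair_counts = defaultdict(Counter)
    term_counts = Counter()
    for s1_terms, s2_terms in records:
        for s1 in s1_terms:
            term_counts[s1] += 1
            for s2 in s2_terms:
                pair_counts[s1][s2] += 1
    return {
        s1: sorted(s2 for s2, n in counts.items() if n / term_counts[s1] >= min_share)
        for s1, counts in pair_counts.items()
    }


MAPPING = cooccurrence_map(RECORDS)
print(MAPPING)
```

The resulting links are loose by construction: they reflect indexing practice in the records at hand, not an intellectual judgment that two terms are equivalent.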

36 So far, Functional Requirements for Authority Records (FRAR) covers:
- Names for persons, families, corporate bodies (Group 2)
- Titles (Group 1)
Projects for authority data sharing focus mainly on names:
- ONE Shared Authority Control (ONESAC)
- Virtual International Authority File (VIAF): a joint project with the Library of Congress and Die Deutsche Bibliothek, VIAF explores virtually combining the name authority files of both institutions into a single name authority service.
- Linking and Exploring Authority Files (LEAF): develops a model architecture for a system that uploads distributed authorities (persons and corporate bodies) to a central system and automatically links those authorities that belong to the same entity.
- Hong Kong Chinese Authority (Name) (HKCAN): in January 1999, a group of academic libraries in Hong Kong agreed to set up among themselves the HKCAN Workgroup for establishing a union database that would reflect the unique characteristics of Chinese author and organizational names. This project was spearheaded by both Lingnan and Chinese University Libraries.

37 FRSAR: Functional Requirements for Subject Authority Records
- Scope: focus on FRBR’s Group 3 entities
- FRSAR Working Group contact: Marcia Zeng, Maja Zumer, Athena Salaba


39 FRSAR terms of reference
- build a conceptual model of Group 3 entities within the FRBR framework (Entities in Group 1 and Group 2 can be used as the subjects of works; but further inclusion of them will depend on the outcomes of the work of the FRANAR Working Group);
- provide a clearly defined, structured frame of reference for relating the data that are recorded in subject authority records to the needs of the users of those records; and
- assist in an assessment of the potential for international sharing and use of subject authority data both within the library sector and beyond.

