Not just numbers on shelves: using the DDC for information retrieval
Gordon Dunsire
Presented at the Symposium “Bridging the class(ification) divide: the new DDC languages and retrieval possibilities”, 27 April 2010, Bibliotheca Alexandrina, Alexandria, Egypt
Overview
- “Traditional” uses of the DDC
- Machine-readability opens up possibilities for subject-based information retrieval:
  - Hierarchical and linear browse
  - Keyword search
  - Terminology services (hub-spoke)
  - Multilingual retrieval
  - Semantic Web
- EDUG IT survey
Traditional use of the DDC
- Shelfmarking
  - Shelf location in a linear sequence
  - Notation can be fitted to a (book) spine
- Subject grouping
  - Notation brings similar topics together and keeps separate topics apart
- Collection analysis by subject or discipline
- Management information by subject
  - Loans, acquisitions, etc.
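Collection analysis and management information by subject usually amount to grouping records by a truncated notation. The sketch below assumes each loan record carries one DDC notation and counts loans by main class (first digit) or division (first two digits). The loan data and the function name summarise are hypothetical.

```python
from collections import Counter


def summarise(notations, digits):
    """Count items by their first `digits` digits, padded back to a 3-digit class."""
    counts = Counter()
    for notation in notations:
        # "324.6" -> "3246" -> first digit "3" -> "300" (main class)
        key = notation.replace(".", "")[:digits].ljust(3, "0")
        counts[key] += 1
    return counts


# Hypothetical loan records, one DDC notation per loan.
loans = ["025.26", "020", "324.6", "320.9", "641.5", "025.04"]

print(summarise(loans, 1))  # grouped by main class (000, 300, 600, ...)
print(summarise(loans, 2))  # grouped by division (020, 320, 640, ...)
```

The same function works for any depth, so the level of detail in a report is a parameter rather than a fixed property of the shelfmark.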
Digital environment
- Notation <> captions
  - Notation in a catalogue record can be (automatically) matched to human-friendly caption(s)
  - The opposite of the classification process, where a caption is matched to a notation
    - Sometimes via the Relative Index
- Length of caption is not a limiting factor
- Length of notation is also not limiting
  - No need to truncate notation
- Notation/caption changes (legacy) are more easily managed
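Matching a record's notation to a caption can be sketched as a lookup that falls back to the nearest broader notation when the full (possibly long, built) number has no caption of its own. The caption table below is a small hypothetical extract: the 025.26 assignment for “Acquisition through exchange, gift, deposit” and the 025 wording are assumptions for illustration, and caption_for is a hypothetical name.

```python
# Hypothetical, abridged caption table; real captions come from the DDC data.
CAPTIONS = {
    "000": "Computer science, information & general works",
    "020": "Library & information sciences",
    "025": "Operations of libraries & archives",  # illustrative wording
    "025.26": "Acquisition through exchange, gift, deposit",  # assumed notation
}


def caption_for(notation):
    """Return (matched notation, caption), trimming digits after the point until a match."""
    code = notation
    while code not in CAPTIONS and "." in code:
        # "025.263" -> "025.26"; "025.2" -> "025." -> "025"
        code = code[:-1].rstrip(".")
    return code, CAPTIONS.get(code)


print(caption_for("025.263"))
print(caption_for("025.2"))
```

Because the full notation is kept in the record and only the display falls back, nothing has to be truncated at cataloguing time.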
Information retrieval
- Notation hierarchy can be used to display the caption hierarchy
- Built notation (i.e. with added subdivisions) can be parsed to identify facet captions
  - E.g. place, time
- Keywords can be found inside captions
- Notation can be linked to caption variants
  - Translations of the DDC
  - “Captions” or subject headings outside the schedules
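Displaying the caption hierarchy follows directly from the notation: each leading digit adds one level, with the first three digits padded to a three-digit class. A minimal sketch, assuming a hypothetical caption table (the wording of the entries is illustrative, not quoted from a specific edition):

```python
# Hypothetical caption extract for illustration only.
CAPTIONS = {
    "300": "Social sciences",
    "320": "Political science",
    "324": "The political process",
    "324.6": "Election systems and procedures; suffrage",
}


def hierarchy(notation):
    """Return the chain of broader notations, e.g. 324.63 -> 300, 320, 324, 324.6, 324.63."""
    digits = notation.replace(".", "")
    chain = []
    for i in range(1, len(digits) + 1):
        prefix = digits[:i]
        # Levels 1-3 are padded to three digits; deeper levels re-insert the point.
        code = prefix.ljust(3, "0") if i <= 3 else prefix[:3] + "." + prefix[3:]
        if code not in chain:  # "300" would otherwise appear three times
            chain.append(code)
    return chain


def display(notation):
    """Indent each caption by its depth in the hierarchy."""
    lines = []
    for depth, code in enumerate(hierarchy(notation)):
        caption = CAPTIONS.get(code, "(caption not available)")
        lines.append("  " * depth + f"{code} {caption}")
    return "\n".join(lines)


print(display("324.63"))
```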
Linear browse
- Captions listed in alphabetical order
  - With or without the Relative Index
    - The Relative Index is already in alphabetical order
- Possibility of keyword-in-context (KWIC) or keyword-out-of-context (KWOC) indexes
  - Each significant word in a caption is rotated to the front (KWIC) or extracted (KWOC) and interfiled in alphabetical order
- Possibility of integration with subject headings
  - Or substitute for headings
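The two rotated-index styles differ only in what is filed under each keyword: KWIC files the caption rotated so the keyword comes first, while KWOC files the keyword with the caption left intact. The sketch below assumes a small hypothetical stopword list and two sample captions.

```python
import re

# Hypothetical stopword list; a real index would use a fuller one.
STOPWORDS = {"and", "of", "the", "through", "in", "for"}

CAPTIONS = {
    "025.26": "Acquisition through exchange, gift, deposit",
    "020": "Library & information sciences",
}


def significant(caption):
    """Yield (position, keyword) for each significant word in a caption."""
    for i, word in enumerate(caption.split()):
        key = re.sub(r"[^\w]", "", word).lower()  # drop punctuation such as "," or "&"
        if key and key not in STOPWORDS:
            yield i, key


def kwoc(captions):
    """Keyword-out-of-context: keyword extracted, full caption kept as-is."""
    return sorted((key, caption, n) for n, caption in captions.items()
                  for _, key in significant(caption))


def kwic(captions):
    """Keyword-in-context: caption rotated so the keyword comes first."""
    entries = []
    for n, caption in captions.items():
        words = caption.split()
        for i, key in significant(caption):
            lead, tail = " ".join(words[i:]), " ".join(words[:i])
            entries.append((key, f"{lead} / {tail}" if tail else lead, n))
    return sorted(entries)  # interfile all entries alphabetically by keyword


for key, text, notation in kwic(CAPTIONS):
    print(f"{text:<50} {notation}")
```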
Hierarchical browse
- Captions and/or notations exposed at one “level” only
- Controlled by numeric notation
  - First digit = level 1; first 2 digits = level 2, etc.
  - Decimal notation, so a maximum of 10 topics at each level
- User drills down in hierarchical order from the top (broadest topic)
  - Or drills up from specific to general
- Levels can be expressed as tag clouds
  - Topics weighted by notation (3xx, 32x, ...)
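Drilling down can be sketched as grouping records by one more leading digit than the current level, which is why each level has at most ten topics. Here the count under each child is assumed to serve as its tag-cloud weight; the holdings list and function names are hypothetical.

```python
from collections import Counter


def drill_down(prefix, notations):
    """Count records under each child of `prefix` (a string of leading digits, no point).

    Each level adds exactly one digit, so there are at most 10 children (0-9).
    The counts can be used as tag-cloud weights.
    """
    counts = Counter()
    for notation in notations:
        digits = notation.replace(".", "")
        if digits.startswith(prefix) and len(digits) > len(prefix):
            counts[digits[: len(prefix) + 1]] += 1
    return dict(sorted(counts.items()))


def drill_up(prefix):
    """Move from a specific level to the next broader one."""
    return prefix[:-1]


# Hypothetical holdings, one notation per record.
holdings = ["324.6", "324.63", "320.9", "330", "025.26", "020"]

print(drill_down("", holdings))    # level 1 (main classes)
print(drill_down("3", holdings))   # level 2 under 3xx
print(drill_down("32", holdings))  # level 3 under 32x
```

This is a simplification: a notation such as 330 is counted under child "330" of prefix "33", although in the schedules 330 is the division itself.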
Keyword retrieval
- Captions included in:
  - DDC keyword index
  - Subject keyword index
    - E.g. with subject headings
  - General keyword index
    - E.g. with titles, notes, etc.
- DDC caption terminology is distinct from that of other major subject heading schemes
  - Alternative terms (and spellings)
  - DDC caption: “Acquisition through exchange, gift, deposit”
  - LCSH: “Book donations”
  - [Neither term is in the Relative Index]
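One way to realise the three indexes is an inverted index per index type, each fed by a different set of record fields. The records and field names below are hypothetical, not a real MARC mapping. The point illustrated is that “gift” is found only via the DDC caption, and “donations” only via the heading or title.

```python
import re
from collections import defaultdict

# Hypothetical records; field names are illustrative only.
RECORDS = {
    "r1": {
        "title": "Gifts and exchanges in academic libraries",
        "ddc_caption": "Acquisition through exchange, gift, deposit",
        "lcsh": "Book donations",
    },
    "r2": {
        "title": "Book donations from private collectors",
        "notes": "Includes case studies",
    },
}

# Which fields feed each of the three keyword indexes.
INDEX_FIELDS = {
    "ddc": ["ddc_caption"],
    "subject": ["ddc_caption", "lcsh"],
    "general": ["ddc_caption", "lcsh", "title", "notes"],
}


def tokens(text):
    """Lower-case word tokens."""
    return re.findall(r"\w+", text.lower())


def build_indexes(records):
    """Build one inverted index (keyword -> set of record ids) per index type."""
    indexes = {name: defaultdict(set) for name in INDEX_FIELDS}
    for rid, record in records.items():
        for name, fields in INDEX_FIELDS.items():
            for field in fields:
                for token in tokens(record.get(field, "")):
                    indexes[name][token].add(rid)
    return indexes


def search(indexes, index_name, word):
    return sorted(indexes[index_name].get(word.lower(), set()))


indexes = build_indexes(RECORDS)
print(search(indexes, "ddc", "gift"))
print(search(indexes, "subject", "donations"))
print(search(indexes, "general", "donations"))
```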
Terminology services (1)
- Captions, headings and terms from any scheme can be “classified” by the DDC
  - i.e. assigned a DDC notation
- The notation becomes a bridge or link between headings from different schemes
  - Hub-and-spoke, with the DDC as the hub and each different scheme as a spoke
- More efficient than one-to-one mappings between headings
  - Combinatorial explosion: 3 schemes need 3 mappings, 4 schemes need 6 mappings...
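The combinatorial explosion can be made concrete. Direct one-to-one mapping needs one mapping per pair of schemes, n(n-1)/2, whereas hub-and-spoke needs one mapping per scheme. This sketch assumes the DDC hub is counted separately from the n schemes; if the DDC is itself one of the n, the hub needs n-1 mappings.

```python
from math import comb


def pairwise_mappings(n_schemes):
    """Direct one-to-one mappings needed between every pair of schemes: n(n-1)/2."""
    return comb(n_schemes, 2)


def hub_mappings(n_schemes):
    """Mappings needed when each scheme is mapped once to the DDC hub."""
    return n_schemes


for n in range(3, 8):
    print(f"{n} schemes: {pairwise_mappings(n)} pairwise vs {hub_mappings(n)} via hub")
```

The pairwise count reproduces the slide's figures (3 and 6) and grows quadratically, while the hub count grows linearly.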
Terminology services (2)
- The hub (i.e. the DDC notation) is transparent to the user
  - Term A > DDC notation < Term B
  - Term A <> Term B
- Approach used by the High-Level Thesaurus (HILT) project
  - Successful, but scalability is an issue
  - Even though it is more efficient than the term-to-term approach
- Scalability might be more achievable in a distributed environment
  - i.e. the Semantic Web
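The transparent hub can be sketched as two lookups: term to notation, then notation to terms in the target scheme. The user supplies Term A and receives Term B without seeing the notation. The spoke mappings, the 025.26 assignment and the “Local” scheme are assumptions for illustration, not HILT data.

```python
from collections import defaultdict

# Hypothetical spoke mappings: (scheme, term) -> DDC notation.
SPOKES = {
    ("DDC", "Acquisition through exchange, gift, deposit"): "025.26",
    ("LCSH", "Book donations"): "025.26",
    ("Local", "Gifts to the library"): "025.26",
}

# Reverse index: notation (the hub) -> terms from every scheme mapped to it.
HUB = defaultdict(list)
for (scheme, term), notation in SPOKES.items():
    HUB[notation].append((scheme, term))


def switch(scheme, term, target_scheme):
    """Term A > DDC notation < Term B; the notation itself is never returned."""
    notation = SPOKES.get((scheme, term))
    if notation is None:
        return []
    return [t for s, t in HUB[notation] if s == target_scheme]


print(switch("LCSH", "Book donations", "Local"))
print(switch("LCSH", "Book donations", "DDC"))
```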
Translations
- Caption-to-caption translation
  - English caption <> Arabic caption
  - But the notation is common, and language-free
- A non-English translation is similar to a non-DDC topic/subject heading scheme
  - Intrinsic hub-spoke architecture
  - Arabic caption <> English caption (= notation) <> German caption
  - Arabic caption <> German caption
- Translations can be switched automatically
  - The “instance” notation remains the same
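Because the notation is language-free, switching between translations reuses the hub pattern: caption to notation in one language, notation to caption in another. The record keeps the same notation throughout. The non-English strings below are placeholders, not real DDC translations, and the function names are hypothetical.

```python
# Hypothetical multilingual captions keyed by notation (the language-free hub).
# Non-English strings are placeholders, not real DDC translations.
CAPTIONS = {
    "320": {"en": "Political science",
            "ar": "[Arabic caption for 320]",
            "de": "[German caption for 320]"},
    "324": {"en": "The political process",
            "ar": "[Arabic caption for 324]",
            "de": "[German caption for 324]"},
}


def notation_for(caption, lang):
    """Find the notation whose caption in `lang` matches."""
    for notation, labels in CAPTIONS.items():
        if labels.get(lang) == caption:
            return notation
    return None


def translate(caption, source_lang, target_lang):
    """Arabic caption <> notation <> German caption: no direct Arabic-German table needed."""
    notation = notation_for(caption, source_lang)
    if notation is None:
        return None
    return CAPTIONS[notation].get(target_lang)


def display(record_notation, lang):
    """The notation stored in the record stays the same; only the displayed caption changes."""
    return f"{record_notation} {CAPTIONS[record_notation][lang]}"


print(translate("[Arabic caption for 320]", "ar", "de"))
print(display("324", "en"))
```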
DDC and the Semantic Web
- OCLC is developing a representation of the DDC in Resource Description Framework (RDF)
  - The basis of the Semantic Web
  - Includes notations, captions, notes, and legacy (audited changes)
- Only the DDC Summaries are available so far
  - 11 languages, including English
- Can be added to the linked-data “soup”
  - Distributed processing, development and services
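As a rough picture of what a DDC class looks like as linked data, the sketch below emits N-Triples for one class using SKOS properties (skos:Concept, skos:notation, skos:prefLabel with language tags, skos:broader). The choice of SKOS and the base URI http://example.org/ddc/class/ are assumptions for illustration; they do not reproduce OCLC's actual URIs or model.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/ddc/class/"  # hypothetical base URI


def literal(text, lang=None):
    """N-Triples literal with backslash, quote and newline escaped, optional language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def concept_triples(notation, labels, broader=None):
    """Triples for one class: type, notation, one prefLabel per language, broader link."""
    s = f"<{BASE}{notation}>"
    triples = [
        f"{s} <{RDF_TYPE}> <{SKOS}Concept> .",
        f"{s} <{SKOS}notation> {literal(notation)} .",
    ]
    for lang, label in sorted(labels.items()):
        triples.append(f"{s} <{SKOS}prefLabel> {literal(label, lang)} .")
    if broader:
        triples.append(f"{s} <{SKOS}broader> <{BASE}{broader}> .")
    return triples


print("\n".join(concept_triples("320", {"en": "Political science"}, broader="300")))
```

Adding a caption in another language is one more prefLabel triple on the same subject URI, which is how translations sharing one notation fit naturally into RDF.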
Survey results to 20 Apr 2010

Application                             Current  Planned  Possible  No
Shelfmarking                                 27        0         1   5
Shelf signing and guiding                    23        1         1   8
Identifier for online materials               8        3         7  15
Statistics and management information        17        3         6   7
Notation browse (linear)
Notation browse (hierarchical)                8        2         4  19
Caption browse (linear)                       3        3         5  22
Caption browse (hierarchical)                 2        3         4  24
Caption keyword search                        4        2         6  21
Caption tag cloud                             1        1         7  24
Thank you
- EDUG IT (links to applications)
- Dewey.info (DDC in RDF)