Presentation on theme: "Stella G Dextre Clarke Project Leader, ISO NP 25964"— Presentation transcript:
1 ISO the new standard for thesauri and interoperability with other vocabularies
Stella G Dextre Clarke
Project Leader, ISO NP 25964
2 Overview
What is ISO 25964?
Outline of Part 1
Outline of Part 2
More detail on some of the issues dealt with in the standard
Comment on the need for a standard
3 What is ISO 25964?
ISO 25964: Thesauri and interoperability with other vocabularies
  Part 1: Thesauri for information retrieval
  Part 2: Interoperability with other vocabularies
It updates ISO 2788 and ISO 5964, with some input from BS 8723
Information retrieval (indexing/searching) is the overall context
Part 1 covers monolingual and multilingual thesauri (= ISO ISO 5964)
Part 2 covers mapping between thesauri and other types of vocabulary
4 What distinguishes ISO 25964-1 from ISO 2788/5964?
Clearer differentiation between terms and concepts
Clearer guidance on applying facet analysis to thesauri
Some changes to the ‘rules’ for compound terms
More guidance on managing thesaurus development and maintenance
Requirements for software to manage thesauri
Data model and XML schema for data exchange
General overhaul in all areas, e.g. sweeping update of multilingual examples
6 Is there a need for ISO ?
“The thesaurus is dead. Long live Google!”
  But look how many thesauri we see today – alive and growing
“Nobody has time to do indexing nowadays”
Did anyone ever follow ISO 2788 rigorously?
  Look at the lack of standardization in today’s thesauri. The ideal thesaurus responds to the special needs of its own users.
Consider the demand for networked applications which draw upon multiple heterogeneous resources
Consider the diversity and evolution of languages/terminology in today’s full text
Don’t forget the challenge of searching for images without text
Successful automated networking depends on standards, or at least predictability in the tools and resources
ISO compliance should enhance predictability in search tools
And ISO ?
7 Content of ISO 25964-2 “Interoperability with other vocabularies”
No normative statements about building vocabularies other than thesauri
However, comparisons are made and key features described
Emphasis is on interoperability, especially mapping between different vocabularies
  Structural models for mapping
  Recommended mapping types
  How to handle pre-coordination
  Practical aspects of mapping
8 Which “other vocabularies”?
Classification schemes
Business classification schemes for records management (aka file plans)
Taxonomies
Subject heading schemes
Ontologies
Terminologies/Term banks
Name authority lists
Synonym rings
9 Structural models for mapping across vocabularies
[Figure: structural models for mapping across vocabularies; diagram labels H, E, G, P, Q, R, S]
10 The dangers of chain mapping
buses → coaches
coaches → trainers
trainers → training shoes

job vacancies → jobs
jobs → posts
posts → post
post → mail

timber → wood
wood → woods
woods → forests

firewood → logs
logs → records
records → archives

Any one of the mappings could be OK in one context, but not when chained.
Most howlers can be avoided, but only if you check carefully.
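The chaining hazard is easy to reproduce. The following is a minimal sketch, not anything prescribed by the standard: it stores the slide's first chain as a plain dictionary and follows the links naively, which is exactly the behaviour to avoid. The names `MAPPINGS` and `follow_chain` are hypothetical.

```python
# Hypothetical one-step mappings, copied from the first chain on the slide.
MAPPINGS = {
    "buses": "coaches",
    "coaches": "trainers",
    "trainers": "training shoes",
}


def follow_chain(term, mappings):
    """Follow mappings naively until no further mapping exists.

    Returns the whole path rather than just the end point, so that a
    human reviewer can inspect every hop.
    """
    path = [term]
    seen = {term}
    while path[-1] in mappings:
        next_term = mappings[path[-1]]
        if next_term in seen:  # guard against circular mappings
            break
        path.append(next_term)
        seen.add(next_term)
    return path


path = follow_chain("buses", MAPPINGS)
print(" -> ".join(path))  # buses -> coaches -> trainers -> training shoes
```

Each hop is defensible on its own, yet the end-to-end result equates buses with training shoes. Returning the full path keeps every intermediate step visible for the careful checking the slide calls for.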
11 The dangers of two-way mappings
[Figure: Vocabulary 1, Vocabulary 2, Vocabulary 3; concept labels: Poultry, Parrots, Chickens, Canaries, Birds, Ducks, Budgies, Geese]
12 ISO 25964-2 mapping types
Basic mapping types:
  Equivalence
  Hierarchical
  Associative
Equivalence mappings can also be marked as “Exact” or “Inexact”
14 Subdivisions of ISO 25964-2 mapping types
Basic mapping types:
  Equivalence
    Simple
    Compound
      Intersecting compound equivalence
      Cumulative compound equivalence
  Hierarchical
    Broader
    Narrower
  Associative
“Exact” or “Inexact” applies to simple but not compound equivalence
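One way to hold these mapping types in software is sketched below. This is not the data model or XML schema from the standard; the class names, field names and validation rules are assumptions made for illustration. The only rule taken from the slide is that the “Exact”/“Inexact” marker belongs to simple equivalence. The requirement that a compound mapping has at least two targets is an added assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class MappingType(Enum):
    SIMPLE_EQUIVALENCE = "equivalence: simple"
    INTERSECTING_COMPOUND = "equivalence: intersecting compound"
    CUMULATIVE_COMPOUND = "equivalence: cumulative compound"
    BROADER = "hierarchical: broader"
    NARROWER = "hierarchical: narrower"
    ASSOCIATIVE = "associative"


COMPOUND_TYPES = (
    MappingType.INTERSECTING_COMPOUND,
    MappingType.CUMULATIVE_COMPOUND,
)


@dataclass(frozen=True)
class Mapping:
    """A hypothetical record for one mapping between two vocabularies."""

    source: str
    targets: Tuple[str, ...]
    mapping_type: MappingType
    exact: Optional[bool] = None  # None means "not marked"

    def __post_init__(self):
        # From the slide: Exact/Inexact applies to simple equivalence only.
        if self.exact is not None and (
            self.mapping_type is not MappingType.SIMPLE_EQUIVALENCE
        ):
            raise ValueError("exact/inexact applies to simple equivalence only")
        # Assumption: a compound mapping combines two or more target concepts.
        if self.mapping_type in COMPOUND_TYPES and len(self.targets) < 2:
            raise ValueError("compound equivalence needs at least two targets")


compound = Mapping(
    source="Women executives",
    targets=("Women", "Executives"),
    mapping_type=MappingType.INTERSECTING_COMPOUND,
)
print(compound.mapping_type.value)  # equivalence: intersecting compound
```

Rejecting an invalid combination when the record is created means a bad mapping is caught at data entry rather than at search time.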
16 Intersecting versus cumulative equivalence
Women executives EQ Women + Executives
Inland waterways EQ rivers | canals
[Figure: two diagrams; labels: executives, women, women executives; canals, inland waterways, rivers]
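Read as search logic, the two examples behave differently: the “+” mapping needs both target concepts together, while the “|” mapping accepts either. The sketch below assumes a search system with Boolean AND/OR operators; the function name `to_query` and the query syntax are hypothetical, not part of the standard.

```python
def to_query(targets, kind):
    """Render a compound equivalence mapping as a Boolean search expression.

    kind is "intersecting" (the "+" notation on the slide) or
    "cumulative" (the "|" notation on the slide).
    """
    operators = {"intersecting": " AND ", "cumulative": " OR "}
    if kind not in operators:
        raise ValueError(f"unknown compound equivalence kind: {kind!r}")
    quoted = [f'"{target}"' for target in targets]
    return "(" + operators[kind].join(quoted) + ")"


# Women executives EQ Women + Executives
print(to_query(["women", "executives"], "intersecting"))
# ("women" AND "executives")

# Inland waterways EQ rivers | canals
print(to_query(["rivers", "canals"], "cumulative"))
# ("rivers" OR "canals")
```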
17 Pre-coordination adds complexity
If only we could ignore classification schemes and subject heading schemes!
For example:
  The UDC class :51 (mathematics curriculum in primary schools)
  The LCSH heading Automobiles--Air conditioning--Maintenance and repair--Periodicals
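To see why a single pre-coordinated heading corresponds to several concepts in a post-coordinate vocabulary, the LCSH example can be broken at its subdivision separator. This is only a rough first step, assuming that “--” always marks a subdivision boundary; the function name `split_heading` is hypothetical, and mapping each component to a thesaurus concept would still need human judgement.

```python
def split_heading(heading, separator="--"):
    """Split a pre-coordinated subject heading into its components."""
    return [part.strip() for part in heading.split(separator) if part.strip()]


heading = "Automobiles--Air conditioning--Maintenance and repair--Periodicals"
components = split_heading(heading)
print(components)
# ['Automobiles', 'Air conditioning', 'Maintenance and repair', 'Periodicals']
```

The order of the components carries meaning in the original string, and a flat list of separate concepts no longer records it. That loss is part of the complexity the slide refers to.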
18 Example: “academic library labor unions in Germany” (from Marcia Lei Zeng/FRSAD report)
DDC: " “
  – labor unions in industries and occupations other than extractive, manufacturing, construction
  – academic libraries
  -0943 – Germany
LCSH:
  "Library employees--Labor unions--Germany"
  "Universities and colleges--Employees--Labor unions--Germany"
  "Collective bargaining--Academic librarians--Germany"
  "Libraries and labor unions--Germany"
UNESCO Thesaurus:
  “Trade unions” “Academic libraries” “Germany”
ILO Thesaurus:
  “Trade union” “library” “educational institution” “Germany”
19 How to map to and from pre-coordinated classes and synthesized notations?
For vocabularies using post-coordination (esp. thesauri), mappings between them look feasible
Mapping from a pre-coordinated or synthesized class to a thesaurus looks feasible
Mapping to a pre-coordinated class looks more problematic!
The same applies to mapping from a synthesized class in one scheme to a differently synthesized class in another scheme
Pre-coordination works in slightly different ways in subject heading schemes and in classification schemes. Can we find common solutions?
In any case, should the aim be to map between schemes, or between the indexes of collections indexed/catalogued with the schemes?
20 In the real world, mapping perfection is elusive…
Mapping projects are labour intensive, and often under-resourced
Exact equivalence is all too rare
Even when exact equivalence seems likely, it is often hard to be sure
Some managers assume that mappings can be found by computers without human guidance
Often the vocabularies to be mapped are poorly constructed
Compound equivalence is needed commonly, but often unavailable
Inclusion of pre-coordinate schemes makes it much harder
Some systems allow only one mapping per concept
While preparing mappings, you can’t make assumptions about capabilities of the search software
21 Is there a need for ISO ?
Consider the demand for networked applications which draw upon multiple heterogeneous resources
Finding equivalent concepts cannot rely on comparison of text words alone
Bear in mind the challenges listed above
Practical experience of mapping is not widespread
ISO provides guidance on good practice, mostly on the intellectual processes but also on the potential for automation
22 Want a copy of ISO ?
A draft is due to appear in early 2011, “ISO DIS ”, with the hope of attracting comments from potential users
The official way to get it is through your national standards body (e.g. BSI, DIN)
Distribution policies vary from one country to another; last time round we found a way to make the draft available online free of charge and free of passwords, on the BSI site
Send me an email and I’ll alert you when the DIS is released
23 Want to get involved?
Contact your national standards body, specifically the committee corresponding to ISO TC 46/SC 9/WG8
17 countries already participate: Belgium, Bulgaria, Canada, China, Denmark, France, Germany, Finland, Korea, New Zealand, Russia, South Africa, Spain, Sweden, UK, Ukraine, USA
While Part 1 of the standard will be published in 2011, Part 2 is still in draft. There is time for you to contribute ideas on interoperability!