ISO 25964: the new standard for thesauri and interoperability with other vocabularies
Stella G Dextre Clarke, Project Leader, ISO NP 25964
Overview
- What is ISO 25964?
- Outline of Part 1
- Outline of Part 2
- More detail on some of the issues dealt with in the standard
- Comment on the need for a standard
What is ISO 25964?
- ISO 25964: Thesauri and interoperability with other vocabularies
  - Part 1: Thesauri for information retrieval
  - Part 2: Interoperability with other vocabularies
- It updates ISO 2788 and ISO 5964, with some input from BS 8723
- Information retrieval (indexing/searching) is the overall context
- Part 1 covers monolingual and multilingual thesauri (= ISO 2788 + ISO 5964)
- Part 2 covers mapping between thesauri and other types of vocabulary
What distinguishes ISO 25964-1 from ISO 2788/5964?
- Clearer differentiation between terms and concepts
- Clearer guidance on applying facet analysis to thesauri
- Some changes to the rules for compound terms
- More guidance on managing thesaurus development and maintenance
- Requirements for software to manage thesauri
- Data model and XML schema for data exchange
- General overhaul in all areas, e.g. a sweeping update of the multilingual examples
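One practical consequence of separating terms from concepts is that every label, preferred or not and in any language, points to a concept identifier rather than to another term. The Python sketch below illustrates only that idea; the class names, the `c1` identifier and the French label are hypothetical, and this is not the ISO 25964 data model or its XML schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A unit of thought, identified independently of the words used for it."""
    concept_id: str
    preferred: dict[str, str]  # language code -> preferred term
    non_preferred: dict[str, list[str]] = field(default_factory=dict)  # language code -> entry terms


class Thesaurus:
    """Hypothetical container: every term, in every language, resolves to a concept."""

    def __init__(self) -> None:
        self._concepts: dict[str, Concept] = {}
        self._term_index: dict[tuple[str, str], str] = {}  # (language, lowercased term) -> concept_id

    def add(self, concept: Concept) -> None:
        self._concepts[concept.concept_id] = concept
        for lang, term in concept.preferred.items():
            self._term_index[(lang, term.lower())] = concept.concept_id
        for lang, terms in concept.non_preferred.items():
            for term in terms:
                self._term_index[(lang, term.lower())] = concept.concept_id

    def preferred_term(self, term: str, lang: str, target_lang: str | None = None) -> str | None:
        """Resolve any term to its concept, then return that concept's preferred term."""
        concept_id = self._term_index.get((lang, term.lower()))
        if concept_id is None:
            return None
        return self._concepts[concept_id].preferred.get(target_lang or lang)
```

For example, after adding `Concept("c1", {"en": "Laptop computers", "fr": "Ordinateurs portables"}, {"en": ["Notebook computers"]})`, looking up "notebook computers" in English gives "Laptop computers", and asking for French gives "Ordinateurs portables", because both lookups pass through the concept rather than from term to term.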
Is there a need for ISO 25964-1?
- The thesaurus is dead. Long live Google! But look how many thesauri we see today, alive and growing
- Nobody has time to do indexing nowadays
- Did anyone ever follow ISO 2788 rigorously? Look at the lack of standardization in today's thesauri
- The ideal thesaurus responds to the special needs of its own users
- Consider the demand for networked applications that draw upon multiple heterogeneous resources
- Consider the diversity and evolution of language and terminology in today's full text
- Don't forget the challenge of searching for images without text
- Successful automated networking depends on standards, or at least on predictability in the tools and resources
- ISO 25964-1 compliance should enhance predictability in search tools
- And ISO 25964-2?
Content of ISO 25964-2: Interoperability with other vocabularies
- No normative statements about building vocabularies other than thesauri
- However, comparisons are made and key features described
- Emphasis is on interoperability, especially mapping between different vocabularies:
  - Structural models for mapping
  - Recommended mapping types
  - How to handle pre-coordination
  - Practical aspects of mapping
Which other vocabularies?
- Classification schemes
- Business classification schemes for records management (aka file plans)
- Taxonomies
- Subject heading schemes
- Ontologies
- Terminologies/Term banks
- Name authority lists
- Synonym rings
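Structural models for mapping across vocabularies
[Figure: diagram of structural models for mapping between vocabularies, with boxes labelled E, F, G, H, AB, CD and PQRS; details not recoverable from the transcript]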
The dangers of chain mapping
- buses → coaches → trainers → training shoes
- job vacancies → jobs → posts → post → mail
- timber → wood → woods → forests
- firewood → logs → records → archives
- Any one of the mappings could be OK in one context, but not when chained
- Most howlers can be avoided, but only if you check carefully
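To see how a chain goes wrong, treat each vocabulary-to-vocabulary mapping as a one-way lookup and compose them. In this Python sketch the four dictionaries stand for hypothetical mappings between successive vocabularies, filled with the pairs from the slide (putting all four chains through the same vocabularies is an assumption made for brevity). `follow_chain` keeps the full path so that a person can check each hop, while `collapse` shows what a system would hold if it kept only the end points; both names are hypothetical.

```python
# Hypothetical one-way mappings between successive vocabularies (V1 -> V2, V2 -> V3, ...).
# Each pair comes from the slide and is defensible in some context on its own;
# grouping all four chains into the same vocabularies is an assumption of this sketch.
CHAIN = [
    {"buses": "coaches", "job vacancies": "jobs", "timber": "wood", "firewood": "logs"},
    {"coaches": "trainers", "jobs": "posts", "wood": "woods", "logs": "records"},
    {"trainers": "training shoes", "posts": "post", "woods": "forests", "records": "archives"},
    {"post": "mail"},
]


def follow_chain(term: str, mappings: list[dict[str, str]]) -> list[str]:
    """Apply each mapping in turn, recording every intermediate term."""
    path = [term]
    for mapping in mappings:
        next_term = mapping.get(path[-1])
        if next_term is None:
            break  # no onward mapping from this vocabulary, so the chain ends here
        path.append(next_term)
    return path


def collapse(mappings: list[dict[str, str]]) -> dict[str, str]:
    """Keep only source -> final target, discarding the intermediate hops."""
    return {source: follow_chain(source, mappings)[-1] for source in mappings[0]}


if __name__ == "__main__":
    for source in CHAIN[0]:
        print(" -> ".join(follow_chain(source, CHAIN)))
```

Collapsing the chain yields buses to training shoes and job vacancies to mail: exactly the howlers that inspecting each recorded hop would catch.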
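The dangers of two-way mappings
[Figure: concepts Parrots, Canaries, Budgies, Birds, Poultry, Chickens, Ducks and Geese spread across Vocabulary 1, Vocabulary 2 and Vocabulary 3 and linked by mappings; the exact arrangement is not recoverable from the transcript]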
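The diagram itself is lost, but the underlying hazard can be illustrated. The Python sketch below assumes one possible arrangement: cage birds in Vocabulary 1 and farm birds in Vocabulary 3 are each mapped to a broader concept, Birds, in Vocabulary 2. Followed one way, Parrots leads only to Birds; if the same links are treated as two-way, Parrots also reaches Chickens and Geese. The `V1:`/`V2:`/`V3:` prefixes and the function names are hypothetical.

```python
from collections import defaultdict

# Assumed arrangement: each link means "source concept maps to a broader target concept".
LINKS = [
    ("V1:Parrots", "V2:Birds"),
    ("V1:Canaries", "V2:Birds"),
    ("V1:Budgies", "V2:Birds"),
    ("V3:Chickens", "V2:Birds"),
    ("V3:Ducks", "V2:Birds"),
    ("V3:Geese", "V2:Birds"),
]


def reachable(start: str, links: list[tuple[str, str]], two_way: bool) -> set[str]:
    """Return every concept reachable from start by following mapping links."""
    graph: dict[str, set[str]] = defaultdict(set)
    for source, target in links:
        graph[source].add(target)
        if two_way:
            graph[target].add(source)  # wrongly treat the link as valid in both directions
    seen, stack = {start}, [start]
    while stack:
        for neighbour in graph[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen - {start}


def invert_broader(links: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Reverse each broader link correctly: the reverse is 'narrower', not a symmetric link."""
    return [(target, "narrower", source) for source, target in links]
```

Recording the reverse of a broader mapping as a narrower mapping, rather than as a plain two-way link, stops Birds from becoming a bridge that carries a search for parrots over to poultry.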
ISO 25964-2 mapping types
- Basic mapping types:
  - Equivalence
  - Hierarchical
  - Associative
- Equivalence mappings can also be marked as Exact or Inexact
Subdivisions of ISO 25964-2 mapping types
- Basic mapping types:
  - Equivalence
    - Simple
    - Compound
      - Intersecting compound equivalence
      - Cumulative compound equivalence
  - Hierarchical
    - Broader
    - Narrower
  - Associative
- Exact or Inexact applies to simple but not compound equivalence
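The subdivisions and the rule about exactness can be encoded as a small validated data type. The Python sketch below is illustrative only: the `MappingType` and `Mapping` names are hypothetical, and the check that compound equivalence needs at least two target concepts is an assumption of the sketch rather than wording taken from the standard.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class MappingType(Enum):
    SIMPLE_EQUIVALENCE = "simple equivalence"
    INTERSECTING_EQUIVALENCE = "intersecting compound equivalence"
    CUMULATIVE_EQUIVALENCE = "cumulative compound equivalence"
    BROADER = "broader"
    NARROWER = "narrower"
    ASSOCIATIVE = "associative"


COMPOUND = {MappingType.INTERSECTING_EQUIVALENCE, MappingType.CUMULATIVE_EQUIVALENCE}


@dataclass(frozen=True)
class Mapping:
    source: str
    kind: MappingType
    targets: tuple[str, ...]
    exact: bool | None = None  # None = not stated; only allowed for simple equivalence

    def __post_init__(self) -> None:
        # Assumption of this sketch: a compound mapping combines two or more concepts.
        if self.kind in COMPOUND and len(self.targets) < 2:
            raise ValueError("compound equivalence needs two or more target concepts")
        if self.kind not in COMPOUND and len(self.targets) != 1:
            raise ValueError("this mapping type links to exactly one target concept")
        # Exact/Inexact is meaningful only for simple equivalence.
        if self.exact is not None and self.kind is not MappingType.SIMPLE_EQUIVALENCE:
            raise ValueError("Exact/Inexact applies only to simple equivalence")
```

With this model, `Mapping("Laptop computers", MappingType.SIMPLE_EQUIVALENCE, ("Notebook computers",), exact=True)` is accepted, while marking the intersecting mapping for Women executives as exact raises an error.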
Equivalence subdivisions with examples
- Simple: Laptop computers EQ Notebook computers
- Compound:
  - Intersecting compound equivalence: Women executives EQ Women + Executives
  - Cumulative compound equivalence: Inland waterways EQ rivers | canals
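When a query is carried across a compound equivalence, the target side has to become a Boolean combination. The Python sketch below reads the slide's notation, taking `+` to mean that all the concepts apply together (AND) and `|` to mean that any of them applies (OR). The function name and the quoted-phrase query syntax are assumptions, and expressions mixing both operators are deliberately rejected.

```python
def target_query(expression: str) -> str:
    """Translate the target side of an equivalence mapping into a Boolean query string.

    '+' marks intersecting compound equivalence (AND); '|' marks cumulative
    compound equivalence (OR). Terms that themselves contain '+' or '|' are not
    handled in this sketch.
    """
    if "+" in expression and "|" in expression:
        raise ValueError("mixed '+' and '|' is not supported in this sketch")
    if "+" in expression:
        parts, operator = expression.split("+"), "AND"
    elif "|" in expression:
        parts, operator = expression.split("|"), "OR"
    else:
        parts, operator = [expression], "OR"  # single concept: the operator is never used
    return f" {operator} ".join(f'"{part.strip()}"' for part in parts)


if __name__ == "__main__":
    for source, target in [
        ("Laptop computers", "Notebook computers"),
        ("Women executives", "Women + Executives"),
        ("Inland waterways", "rivers | canals"),
    ]:
        print(f"{source} -> {target_query(target)}")
```

Applied to the three examples, a search for Women executives becomes `"Women" AND "Executives"` in the target vocabulary, and Inland waterways becomes `"rivers" OR "canals"`.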
Intersecting versus cumulative equivalence
- Women executives EQ Women + Executives
- Inland waterways EQ rivers | canals
[Figure: Venn diagrams. "women executives" is the overlap of "women" and "executives"; "inland waterways" covers both "rivers" and "canals"]
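The same distinction shows up directly in retrieval. In this Python sketch the postings (which documents are indexed with which concept) are invented for illustration: an intersecting equivalence retrieves only documents indexed with every target concept, while a cumulative one retrieves documents indexed with any of them. The function names are hypothetical.

```python
# Hypothetical postings: document numbers indexed with each concept in the target vocabulary.
POSTINGS: dict[str, set[int]] = {
    "Women": {1, 2, 3, 4},
    "Executives": {3, 4, 5, 6},
    "rivers": {10, 11},
    "canals": {11, 12},
}


def intersecting(postings: dict[str, set[int]], *concepts: str) -> set[int]:
    """Women executives = Women + Executives: documents indexed with all the concepts."""
    return set.intersection(*(postings[c] for c in concepts))


def cumulative(postings: dict[str, set[int]], *concepts: str) -> set[int]:
    """Inland waterways = rivers | canals: documents indexed with any of the concepts."""
    return set.union(*(postings[c] for c in concepts))
```

Confusing the two would be a serious error: treating Women executives as cumulative would retrieve documents 1 to 6 instead of 3 and 4, and most of those are about women or executives but not both.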
Pre-coordination adds complexity
- If only we could ignore classification schemes and subject heading schemes!
- For example:
  - The UDC class 373.3.016:51 (mathematics curriculum in primary schools)
  - The LCSH heading Automobiles--Air conditioning--Maintenance and repair--Periodicals
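A first step when mapping pre-coordinated strings is often to break them into their component parts, which can then be matched individually. The Python sketch below does this only at the level of punctuation: it splits an LCSH string at `--` and a UDC notation at the colon (the UDC sign for a relation between two classes). The function names are hypothetical, and the split deliberately says nothing about the role each part plays, which is where the real difficulty lies.

```python
def split_lcsh(heading: str) -> list[str]:
    """Split an LCSH string into its main heading and subdivisions."""
    return [part.strip() for part in heading.split("--") if part.strip()]


def split_udc_colon(notation: str) -> list[str]:
    """Split a UDC notation at colon signs; other auxiliaries are left untouched."""
    return [part.strip() for part in notation.split(":") if part.strip()]


if __name__ == "__main__":
    print(split_lcsh("Automobiles--Air conditioning--Maintenance and repair--Periodicals"))
    print(split_udc_colon("373.3.016:51"))
```

Here "Periodicals" is a form subdivision and "51" (mathematics) is the subject being taught, yet both come out as plain strings; mapping them sensibly still needs knowledge of each scheme's rules.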
Example: academic library labor unions in Germany (from Marcia Lei Zeng/FRSAD report)
- DDC: 331.881102770943
  - 331.8811: labor unions in industries and occupations other than extractive, manufacturing, construction
  - -027.7: academic libraries
  - -0943: Germany
- LCSH:
  - "Library employees--Labor unions--Germany"
  - "Universities and colleges--Employees--Labor unions--Germany"
  - "Collective bargaining--Academic librarians--Germany"
  - "Libraries and labor unions--Germany"
- UNESCO Thesaurus: Trade unions; Academic libraries; Germany
- ILO Thesaurus: Trade union; library; educational institution; Germany
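The DDC number in this example is built by appending digits to a base number. The Python sketch below reproduces that arithmetic mechanically: it drops dots and leading hyphens from each added notation, concatenates the digits, and reinserts the decimal point after the third digit. It is an illustration under stated assumptions only; actual number building follows the add instructions in the DDC schedules and tables, which this sketch does not model, and `build_ddc` is a hypothetical name.

```python
def build_ddc(base: str, *additions: str) -> str:
    """Concatenate digits from added notations onto a base DDC number.

    Dots and leading hyphens are stripped before appending; the decimal point is
    then placed after the third digit, as in all DDC numbers. The add instructions
    that govern real number building are not modelled here.
    """
    digits = base.replace(".", "")
    for notation in additions:
        digits += notation.lstrip("-").replace(".", "")
    return f"{digits[:3]}.{digits[3:]}" if len(digits) > 3 else digits


if __name__ == "__main__":
    print(build_ddc("331.8811", "-027.7", "-0943"))
```

Run on the example components, it returns 331.881102770943. Reading the number back into its parts is harder, because nothing in the digits marks where one component ends and the next begins.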
How to map to and from pre-coordinated classes and synthesized notations?
- For vocabularies using post-coordination (especially thesauri), mappings between them look feasible
- Mapping from a pre-coordinated or synthesized class to a thesaurus looks feasible
- Mapping to a pre-coordinated class looks more problematic!
- The same applies to mapping from a synthesized class in one scheme to a differently synthesized class in another scheme
- Comparing subject headings with classification schemes, pre-coordination works in slightly different ways. Can we find common solutions?
- In any case, should the aim be to map between schemes, or between the indexes of collections indexed/catalogued with the schemes?
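The asymmetry can be illustrated with the academic library labor unions example. In the Python sketch below, mapping the components of the DDC class to UNESCO Thesaurus terms yields a single intersecting compound equivalence, whereas going from that set of terms back to LCSH leaves several candidate headings and no rule for choosing between them. Both lookup tables and both function names are hypothetical, filled in by hand from the example on the previous slide.

```python
# Hypothetical lookups, filled in by hand from the academic library labor unions example.
DDC_COMPONENT_TO_UNESCO = {
    "331.8811": "Trade unions",
    "-027.7": "Academic libraries",
    "-0943": "Germany",
}

# The four LCSH strings the example lists for the same subject.
LCSH_CANDIDATES = {
    frozenset({"Trade unions", "Academic libraries", "Germany"}): [
        "Library employees--Labor unions--Germany",
        "Universities and colleges--Employees--Labor unions--Germany",
        "Collective bargaining--Academic librarians--Germany",
        "Libraries and labor unions--Germany",
    ],
}


def ddc_to_unesco(components: list[str]) -> frozenset[str]:
    """Map each facet of the synthesized class; the resulting terms are combined with AND."""
    return frozenset(DDC_COMPONENT_TO_UNESCO[c] for c in components)


def unesco_to_lcsh(terms: frozenset[str]) -> list[str]:
    """Return every pre-coordinated heading recorded for this combination of terms."""
    return LCSH_CANDIDATES.get(terms, [])
```

The forward direction needs only one lookup per component. The reverse returns four strings for this combination (or none at all for a combination nobody has recorded), so a person, or a rule about citation order, must decide which heading is meant.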
In the real world, mapping perfection is elusive…
- Mapping projects are labour-intensive, and often under-resourced
- Exact equivalence is all too rare
- Even when exact equivalence seems likely, it is often hard to be sure
- Some managers assume that mappings can be found by computers without human guidance
- Often the vocabularies to be mapped are poorly constructed
- Compound equivalence is commonly needed, but often unavailable
- Inclusion of pre-coordinate schemes makes it much harder
- Some systems allow only one mapping per concept
- While preparing mappings, you can't make assumptions about the capabilities of the search software
Is there a need for ISO 25964-2?
- Consider the demand for networked applications which draw upon multiple heterogeneous resources
- Finding equivalent concepts cannot rely on comparison of text words alone
- Bear in mind the challenges listed above
- Practical experience of mapping is not widespread
- ISO 25964-2 provides guidance on good practice, mostly on the intellectual processes but also on the potential for automation
Want a copy of ISO 25964-2?
- A draft, ISO DIS 25964-2, is due to appear in early 2011, with the hope of attracting comments from potential users
- The official way to get it is through your national standards body (e.g. BSI, DIN)
- Distribution policies vary from one country to another; last time round we found a way to make the draft available online free of charge and free of passwords, on the BSI site
- Send me an email and I'll alert you when the DIS is released: email@example.com
Want to get involved?
- Contact your national standards body, specifically the committee corresponding to ISO TC 46/SC 9/WG8
- 17 countries already participate: Belgium, Bulgaria, Canada, China, Denmark, France, Germany, Finland, Korea, New Zealand, Russia, South Africa, Spain, Sweden, UK, Ukraine, USA
- While Part 1 of the standard will be published in 2011, Part 2 is still in draft. There is time for you to contribute ideas on interoperability!