Taxonomies: Insuring compatibility and crosswalks Marjorie M. K. Hlava Access Innovations / Data Harmony

1 Taxonomies: Insuring compatibility and crosswalks Marjorie M. K. Hlava Access Innovations / Data Harmony mhlava@accessinn.com

2 Background  "Underlying the information architecture for web sites and search are taxonomies. The standards for thesauri, taxonomies, ontologies, the semantic web and topic maps are converging.  Where do they differ and where are they the same?  This one-hour talk will cover the ISO, ANSI/NISO and W3C terminology and controlled vocabulary standards, as well as the differences in the new standards compared to the previous editions.  Finally it will talk about the crosswalks and registries underway between these development communities."

3 What we will cover today  Background  Overview of standards  Specifics on 3 things: NISO Z39.19, BSI 8723, IFLA  Thoughts on a registry

4 Why are taxonomies hot?  Search doesn’t work without tagged data  Websites need them to display information  To tag navigation back to content

5 What’s happening to the business?  Carpetbaggers  Differences of opinion  Want to build on existing taxonomies  Need for standards  Need for crosswalks  Need for international communication  Need for general registries of taxonomies

6 The Problem – KEEPING UP  Many players, known and unknown  Several controlled vocabulary standards: ISO 2788 and 5964, BSI 8723  Groups developing guidelines and standards: W3C with SKOS and OWL; governments worldwide developing and mandating taxonomies  Communities increasing reuse, mapping and interoperability between controlled vocabularies

7 Traditional Standards  ISO: TC 46 / SC 9  ANSI: NISO Z39.19  BSI: BS 8723  W3C: OWL, SKOS  US Government: Office of Management and Budget  European Union

8 Thesaurus related  NISO Z39.19 2006 www.niso.org  BSI (BS 8723) - the next revised ISO  ISO 2788 - Monolingual (1986)  ISO 5964 - Multilingual (1985) www.iso.ch/iso/en/ISOOnline.frontpage  ISO 5127 - Information and documentation - Vocabulary  OWL from W3C  SKOS - the W3C thesaurus standard

9 Thesaurus and Indexing Standards – ANSI/NISO  ANSI/NISO Z39.19 - 2003 Guidelines for the Construction, Format, and Management of Monolingual Thesauri  NISO Z39.19-200x Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies  NISO TR02-1997 Guidelines for Indexes and Related Information Retrieval Devices by James D. Anderson

10 The standards  NISO Z39.19 2006 www.niso.org  BSI (BS 8723) - the next revised ISO  ISO 2788 - Monolingual (1986)  ISO 5964 - Multilingual (1985) www.iso.ch/iso/en/ISOOnline.frontpage  ISO 5127 - Information and documentation - Vocabulary  OWL from W3C  SKOS - the W3C thesaurus standard

11 Z39.19 - What’s new?
The old standard:  Coverage: documents  Types of vocabularies: thesauri  Single BT  Post-coordinated  Printed formats  Monolingual vocabularies
The revised standard:  Coverage: content objects  Types of vocabularies: lists, synonym rings, taxonomy  Pre-coordinated  Web format  Multilingual vocabularies (general)  Polyhierarchical  Interoperability  Facet analysis

12 British Standards - BS 8723  Structured vocabularies for information retrieval – Guide  Part 1: General  Part 2: Thesauri  Part 3: Vocabularies other than thesauri  Part 4: Interoperability between vocabularies  Part 5: Interoperability with applications

13 ISO TC 37 Scope of ISO TC 37: Standardization of principles, methods and applications relating to terminology and other language resources.  TC 37/SC 1 - Principles and methods  TC 37/SC 2 - Terminography and lexicography  TC 37/SC 3 - Computer applications for terminology  TC 37/SC 4 - Language resource management

14 Other ISO standards: Concept-oriented terminology ISO 704:2000 Terminology work - Principles and methods ISO 860:1996 Terminology work - Harmonization of concepts and terms ISO 1087-1:2000 Terminology work - Vocabulary - Part 1: Theory and application ISO 1087-2:2000 Terminology work - Vocabulary - Part 2: Computer applications ISO 10241:1992 Preparation and layout of international terminology standards

15 Sample ISO - Data Categories  ISO 12200:1999 Computer applications in terminology - Machine-readable terminology interchange format (MARTIF) - Negotiated interchange ISO 12616:2002 Translation-oriented terminography ISO/TR 12618:1994 Computer aids in terminology - Creation and use of terminological databases and text corpora ISO 12620:1999 Computer applications in terminology - Data categories  used to create glossaries

16 ISO Thesaurus and Indexing Standards  ISO 2788:1986 Documentation - Guidelines for the establishment and development of monolingual thesauri  ISO 5964:1985 Documentation - Guidelines for the establishment and development of multilingual thesauri  ISO 5963:1985 Documentation - Methods for examining documents, determining their subjects, and selecting indexing terms  ISO 999:1996 Information and documentation - Guidelines for the content, organization and presentation of indexes

17 ISO TC 46/SC 9  Information and Documentation - Identification and Description  TC 46 is ISO's Technical Committee (TC) for information and documentation standards.  SC 9 is the TC 46 Subcommittee (SC) that develops and maintains ISO standards on the identification and description of information resources.

18 ANSI/NISO Thesaurus and Indexing Standards  ANSI/NISO Z39.19 - 2005 Guidelines for the Construction, Format, and Management of Monolingual Thesauri  NISO Z39.19-200x Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies  NISO TR02-1997 Guidelines for Indexes and Related Information Retrieval Devices by James D. Anderson

19 Reports to use  Report on the Workshop on Electronic Thesauri, November 4-5, 1999 http://www.niso.org/news/events_workshops/thes99rprt.html  Final Report to the ALCTS/CCS Subject Analysis Committee: Subcommittee on Subject Relationships/Reference Structures, June 1997 http://archive.ala.org/alcts/organization/ccs/sac/rpt97rev.html

20 Other links  http://esw.w3.org/topic/SkosDev/ThesaurusLinks/XmlFormats  MARC-21 XMLSchema  Zthes Z39.50 profile for thesaurus navigation (2001)  TML thesaurus markup language (1999)  ADL Thesaurus Protocol XML formats (2002)  MeSH XML format (2001)  GEMET XML format (2003)  APAIS XML thesaurus format, an extension of Zthes (2000)  Open University thesaurus schemas (2002)  Soergel XML thesaurus specification (2001)

21 W3C  OWL – Web Ontology Language  RDF – Resource Description Framework  Topic Maps  SKOS - Simple Knowledge Organization Systems  Which community to serve?  Build on the current standard  Might make this link next

22 Other things to watch  Other W3C and ISO areas  Support groups, blogs, communities of practice  SIMILE  Web 2.0 activities  WSDL – Web Services Description Language

23 Other Relevant ISO & W3C Standards For translation, terminology and applied linguists go to: http://appling.kent.edu/ResourcePages/LTStandards/Chart/standards.chart.htm#Ontology  Markup Languages; Metadata Resources; Character Coding; Access Protocols and Interoperability; Content Creation, Manipulation, and Maintenance; Authoring Standards; Text and Content Markup; Translation Standards; Terminology and Lexicography Standards; ISO TC 37 Standards; Terminology Interchange Standards; Controlled Language Standards; Taxonomy and Ontology Standards; Corpus Management Standards; Locale-Related Standards

24 SIMILE  Semantic Interoperability of Metadata and Information in unLike Environments  Forming a data reference for open source taxonomies

25 Revised Standards for Controlled Vocabularies  U.S. Standard (NISO Z39.19 - 2005)  British Standard (BS 8723 - 2005)  IFLA Guidelines - 2005

26 U.S. Standard for Controlled Vocabularies – NISO Z39.19  NISO Z39.19-200x Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies  Some of the slides are based on Emily Fayen’s June 2004 SLA presentation, Margie Hlava’s talk at the 2005 Data Harmony User Group meeting, and Marcia Zeng’s presentation at the NKOS Meeting in Denver

27 A little bit of history…  ANSI/NISO Z39.19, Guidelines for the Construction, Format, and Management of Monolingual Thesauri – 1993  The most frequently requested NISO Standard  In spite of its age the Standard is still relevant  1999: NISO Workshop on Electronic Thesauri http://www.niso.org/news/events_workshop/thes99rpt.html  2002: NISO initiates revision of Z39.19  2004: 1993 edition reaffirmed  2005: new standard published

28 Scope  Expand beyond thesaurus  Make more user-friendly  Explain important concepts  Explain principles of vocabulary control  Include electronic information environment  Include additional user search methods: browse, navigate, keyword searching  Expand beyond A & I services  Include Web applications

29 The Team: Vivian Bliss – Microsoft; Carol Brent – ProQuest; John Dickert – DTIC; Lynn El-Hoshy – Library of Congress; Marjorie Hlava – Access Innovations; Stephen Hearn – ALA; Sabine Kuhn – Chemical Abstracts Service; Pat Kuhr – H.W. Wilson Company; Diane McKerlie – DMA Consulting; Peter Morville – Semantic Studios; Stuart Nelson – National Library of Medicine; Allan Savage – National Library of Medicine; Diane Vizine-Goetz – OCLC; Marcia Lei Zeng – Special Libraries Association

30 Z39.19 Chapters 1. Introduction 2. Scope 3. Referenced Standards 4. Definitions, Abbreviations, and Acronyms 5. Controlled Vocabularies – Purpose, Concepts, Principles, and Structure 6. Term Choice, Scope, and Form 7. Compound Terms 8. Relationships 9. Displaying Controlled Vocabularies 10. Interoperability 11. Construction, Testing, Maintenance, and Management Systems

31 Z39.19 - What’s new?
The old standard:  Coverage: documents  Types of vocabularies: thesauri  Single BT  Post-coordinated  Printed formats  Monolingual vocabularies
The revised standard:  Coverage: content objects  Types of vocabularies: lists, synonym rings, taxonomy  Pre-coordinated  Web format  Multilingual vocabularies (general)  Polyhierarchical  Interoperability  Facet analysis

32 Principles of Controlled Vocabularies  Four important principles of vocabulary control guide the design and development of controlled vocabularies: eliminating ambiguity; controlling synonyms; establishing relationships among terms where appropriate; testing and validation of terms

33 Type of vocabulary control

34 Lists A list is a simple group of terms. Example: Alabama, Alaska, Arkansas, California, Colorado... Frequently used in Web site pick lists and pull-down menus.

35 Synonym Rings A synonym ring is a list of synonyms or near synonyms that are used interchangeably for retrieval purposes

36 Synonym Rings -- Examples Synonym rings are usually found as sets of lists that allow users to access all content containing any of the terms, e.g., for cholesterol: Cholesterol, Blood Cholesterol, Serum Cholesterol, Good Cholesterol, Bad Cholesterol, LDL. Frequently used in systems where the content is not indexed or the indexing vocabulary is not controlled.

37 An example from International SEMATECH; a search for Silicon would look like this: Your search was submitted as “SILICON” or “SI”

38 Synonym Rings are used--  To expand queries for content objects: any one of the terms retrieves content containing any of the terms in the cluster  With unstructured natural language formats, where the interface draws together similar terms  With search engines, to help control the diversity of the language
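
The query expansion described on the synonym ring slides can be sketched roughly as follows. This is a minimal illustration, not how any particular search engine does it: the ring contents are taken from the cholesterol and silicon examples above, while `SYNONYM_RINGS`, `expand_query` and `to_boolean_query` are hypothetical names invented for this sketch.

```python
# Hypothetical synonym rings; the member terms come from the slides above.
SYNONYM_RINGS = [
    {"cholesterol", "blood cholesterol", "serum cholesterol",
     "good cholesterol", "bad cholesterol", "ldl"},
    {"silicon", "si"},
]


def expand_query(term):
    """Return every term in the ring that contains `term`, or just the term."""
    key = term.strip().lower()
    for ring in SYNONYM_RINGS:
        if key in ring:
            return sorted(ring)
    return [key]  # no ring found: search the term as entered


def to_boolean_query(term):
    """Render the expanded terms as an OR query string."""
    return " OR ".join('"{}"'.format(t.upper()) for t in expand_query(term))


print(to_boolean_query("Silicon"))
```

Because a ring has no preferred term, every member is treated the same way; that is the main difference from the Use/Used For relationship in a thesaurus.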

39 Taxonomies A taxonomy is a set of preferred terms, all connected by a hierarchy or polyhierarchy. Example:
Chemistry
  Organic chemistry
    Polymer chemistry
      Nylon
Frequently used in web navigation systems.
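
One way to picture a polyhierarchy is as a map from each term to its set of broader terms, so that a term may sit under more than one parent. The sketch below assumes that representation; the Chemistry branch is from the slide, while the second parent for Nylon ("Synthetic fibers" under "Materials") is a made-up addition purely to show a term with two parents.

```python
# child -> set of broader terms; more than one parent means polyhierarchy.
BROADER = {
    "Organic chemistry": {"Chemistry"},
    "Polymer chemistry": {"Organic chemistry"},
    # "Synthetic fibers" and "Materials" are hypothetical, not from the slides.
    "Nylon": {"Polymer chemistry", "Synthetic fibers"},
    "Synthetic fibers": {"Materials"},
}


def paths_to_root(term):
    """List every navigation path from a top term down to `term`."""
    parents = BROADER.get(term)
    if not parents:
        return [[term]]  # a top term is its own one-step path
    paths = []
    for parent in sorted(parents):
        for path in paths_to_root(parent):
            paths.append(path + [term])
    return paths


for path in paths_to_root("Nylon"):
    print(" > ".join(path))
```

In a web navigation system each path would correspond to one breadcrumb trail leading to the same term.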

40 Thesauri A thesaurus is a controlled vocabulary with multiple types of relationships. Example:
Rice
  UF paddy
  BT Cereals
  BT Plant products
  NT Brown rice
  RT Rice straw

41 Thesauri (cont.) Relationship types:  Equivalence (Use/Used For) – indicates preferred term in a synonym relationship  Hierarchy – indicates broader and narrower terms  Associative – indicates related terms; almost unlimited types of relationships may be used  The thesaurus is the most complex format for controlled vocabularies and is widely used.
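
The three relationship types can be made concrete with a small sketch that stores the Rice record from the previous slide and fills in the reciprocal of each relationship (BT/NT, UF/USE, RT/RT). The data structure and the names `RECIPROCAL`, `build_thesaurus` and `preferred` are assumptions made for illustration, not something prescribed by Z39.19.

```python
from collections import defaultdict

# Each relationship tag paired with its reciprocal.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}

# The Rice record from the slide, as (term, relationship, target) statements.
RICE = [
    ("Rice", "UF", "paddy"),
    ("Rice", "BT", "Cereals"),
    ("Rice", "BT", "Plant products"),
    ("Rice", "NT", "Brown rice"),
    ("Rice", "RT", "Rice straw"),
]


def build_thesaurus(statements):
    """Store every statement together with its reciprocal."""
    thesaurus = defaultdict(lambda: defaultdict(set))
    for term, rel, target in statements:
        thesaurus[term][rel].add(target)
        thesaurus[target][RECIPROCAL[rel]].add(term)
    return thesaurus


def preferred(thesaurus, term):
    """Follow a USE reference to the preferred term, if there is one."""
    use = thesaurus[term].get("USE")
    return next(iter(use)) if use else term


t = build_thesaurus(RICE)
print(preferred(t, "paddy"))
```

Generating reciprocals automatically is one common way to keep a thesaurus consistent, since a BT without its matching NT is a typical validation error.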

42 Interoperability  One of the most important issues from the 1999 workshop  Question: How to compare indexes, perform searches, and merge databases that have been developed using different controlled vocabularies?

43 Interoperability (CONT.)  Factors Affecting Interoperability  Multilingual Controlled Vocabularies  Searching  Indexing  Merging Databases  Merging Controlled Vocabularies  Achieving Interoperability  Storage and Maintenance of Relationships among Terms in Multiple Controlled Vocabularies

44 II. The British Standard BS 8723: Structured Vocabularies for Information Retrieval – Guide  Slides based on the presentation by Stella G Dextre Clarke, Alan Gilchrist and Leonard Will at ISKO 2004, London

45 Existing BSI/ISO thesaurus standards  ISO 2788-1986 Guidelines for the establishment and development of monolingual thesauri = BS 5723:1987  ISO 5964-1985 Guidelines for the establishment and development of multilingual thesauri = BS 6723:1985

46 What needs updating?  Printed versus electronic application  Guidance on management software  Interoperability: mapping between thesauri and other types of vocabulary; formats/protocols for data exchange with downstream applications  Applicability to end-user applications, not just those for information professionals

47 Outline of new standard BS 8723: Structured vocabularies for information retrieval – Guide  Part 1 - Definitions, symbols and abbreviations  Part 2 - Thesauri  Part 3 - Vocabularies other than thesauri  Part 4 - Interoperability between vocabularies  Part 5 - Interoperation between vocabularies and other components of information storage and retrieval systems

48 Part 3 chapters  Classification schemes  Subject heading lists  Taxonomies  Ontologies  Semantic nets (?)  Search thesauri

49 Issues for Part 3  How much guidance is needed on how to build other sorts of vocabulary?  Should we describe the idiosyncrasies of existing schemes, even where we judge there is a ‘better’ way?  Pick out the characteristics of different vocabulary types that govern when and how you can map them.  But some of the observable characteristics might not be what we’d recommend.

50 Part 4: Interoperability between vocabularies  Huge demand for accessing information indexed with another language and/or vocabulary: ‘mapping’. The Semantic Web is just one application.  Includes multilingual thesauri as a special case of mapping between vocabularies.  Applies where more than one language or vocabulary is in use and access to all resources is through one vocabulary

51 Part 4: Interoperability between vocabularies (cont.)  BS 8723 Part 4 has a wider scope than BS 6723, which dealt only with multilingual thesauri.  BS 8723 extends the scope to: thesauri in different dialects of one language; different thesauri in a single language; situations where a thesaurus interoperates with one or more different types of structured vocabulary, such as classification schemes; situations where not all the interoperating vocabularies have the same status and/or function.
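
A mapping (crosswalk) between two vocabularies can be sketched as a list of typed links from a term in the searcher's vocabulary to a term in the indexing vocabulary. Everything here is illustrative: the `Mapping` class, the mapping-type labels and the target terms are hypothetical and are not drawn from BS 8723 itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str  # term in the vocabulary the user searches with
    target: str  # term in the vocabulary the content was indexed with
    kind: str    # "exact" or "broader" (hypothetical labels)


# Hypothetical crosswalk between two vocabularies.
CROSSWALK = [
    Mapping("Rice", "Oryza sativa", "exact"),
    Mapping("Rice", "Cereals", "broader"),
    Mapping("Nylon", "Polyamides", "broader"),
]


def translate(term, kinds=("exact",)):
    """Return target-vocabulary terms for `term`, limited to the given kinds."""
    return [m.target for m in CROSSWALK
            if m.source == term and m.kind in kinds]


print(translate("Rice"))
print(translate("Rice", kinds=("exact", "broader")))
```

Recording the type of each mapping lets a search interface decide whether to accept only exact equivalents or to fall back to broader terms when no exact match exists.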

52 Part 5: Interoperability with applications  Vocabularies must work with: search software, content management systems, Web publishing software, etc.

53 Build on existing formats and protocols for data exchange  Z39.50 and Zthes  XML schema  DTD  MARC  SKOS Core Schema  Topic Map  ADL gazetteer protocol  W3C crosswalks  OMB - Section 207 of the E-Government Act

54 Review and Comments  Request a copy for Parts 1, 2, 3 and 4: Parts 1 and 2 numbered 04/30086620 DC and 04/30094113 DC. The documents may be ordered from BSI Customer Services  tel +44(0)208-996-9001 or  email orders@bsi-global.com  Part 5 is out for comment

55 III. IFLA Guidelines for Multilingual Thesauri  IFLA Classification and Indexing Section  April 2005: released for comments  Published 2005

56 World-Wide Review of IFLA Guidelines for Multilingual Thesauri  URL: http://www.ifla.org/VII/s29/pubs/Draft-multilingualthesauri.pdf  Adds to ISO 5964 for multilingual thesauri

57 IFLA Classification and Indexing Section WG on Guidelines for Multilingual Thesauri  Chair: Gerhard J.A. Riesthuis (Netherlands)  Members:  Lois Mai Chan (USA),  Patrice Landry (Switzerland),  Pia Leth (Sweden),  Ia McIlwaine (United Kingdom),  Martin Kunz (Germany),  Dorothy McGarry (USA),  Max Naudi (France),  Marcia Lei Zeng (USA)

58 Three approaches in the development of multilingual thesauri:
1. Building a new thesaurus from the bottom up
  starting with one language and adding another language or languages
  starting with more than one language simultaneously
2. Combining existing thesauri
  merging two or more existing thesauri into one new (multilingual) information retrieval language to be used in indexing and retrieval
  linking existing thesauri and subject heading languages to each other; using the existing thesauri and/or subject heading languages both in indexing and retrieval
3. Translating a thesaurus into one or more other languages

59 Semantic problems Semantic problems pertain to equivalence relations between terms used as preferred and non-preferred terms in information retrieval languages. Equivalence relations exist not only within each separate language involved, but also between the languages (intra-language equivalence and inter-language equivalence). Intra-language homonymy and inter-language homonymy are also considered semantic questions. Additional problems pertaining to semantics involve the scope, form and choice of thesaurus terms.

60 Structural problems  Structural problems involve hierarchical and associative relations between the terms.  An important question in this respect is whether the structure should be the same or different for each language. In most if not all cases of linking, the structure will most probably not be the same in all the information retrieval languages involved. In the other approaches mentioned it is possible in principle to apply the same structure to all languages.

61 Contents covered by the guidelines  Building multilingual thesauri starting from scratch: Structure; Morphology and Semantics  Starting from existing thesauri: Merging; Linking  Glossary  Appendix: An example of a non-symmetrical thesaurus

62 Examples are in multiple languages  That cranes is a homograph in English does not necessarily mean that equivalent terms in other languages are also homographs. The Dutch term kranen is a homograph too, but with the meanings cranes (lifting equipment) and taps.
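
The cranes/kranen example can be modelled by attaching each term to the concepts it may denote and comparing the concept sets across languages. This is only a sketch: the concept identifiers are invented, and the bird sense of the English term is an assumption added here (the slide says only that cranes is a homograph in English).

```python
# (language, term) -> concepts the term may denote.
# Concept identifiers are hypothetical; "crane-bird" is an assumed sense.
SENSES = {
    ("en", "cranes"): ["crane-bird", "crane-lifting"],
    ("nl", "kranen"): ["crane-lifting", "tap"],
    ("en", "taps"): ["tap"],
}


def is_homograph(entry):
    """A term is a homograph when it denotes more than one concept."""
    return len(SENSES[entry]) > 1


def shared_concepts(a, b):
    """Concepts on which two terms are inter-language equivalents."""
    return sorted(set(SENSES[a]) & set(SENSES[b]))


print(shared_concepts(("en", "cranes"), ("nl", "kranen")))
```

Both terms are homographs, yet they overlap on only one concept, which is why a multilingual thesaurus usually needs qualifiers such as "(lifting equipment)" before equivalences can be stated.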

63 What is a taxonomist to do?  Watch the standards  Participate in development  Exceed the guidelines  Comply with all standards – internationally  Promote standards participation  And we do – so far!

64 Controlled vocabularies of all stripes need a place to call home  Open contribution  Thesaurus metadata contributions  Comments on the contributions  Examples of implementation  A clearing house to keep track of all the initiatives and suggested standards, a means to allow input from and to those initiatives, and publishing of best practices or lessons learned from implementations  Perhaps a WikiKOS

65 The Solutions  Registry? NKOS KOS of KOS  SKOS participants  KOS typology - Tudhope  Tesauro.com – Spanish - Salama  Kent.edu site – Marcia Zeng  Taxonomy Warehouse – Factiva - Clarke  UMLS - Unified Medical Language System

66 More Solutions  Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) – open source  UK HILT - Dennis Nicholson

67 Good starts  Link to each other  Include: thesauri, taxonomies, semantic webs, classification systems, subject headings, SKOS, OWL and ontologies, other KOS

68 What about?  Authority Files  Other pick lists  Roget's and other synonym rings  Dictionaries  Gazetteers  Glossaries  Etc.

69 Discussion?? Thank you for your attention! Marjorie M. K. Hlava Access Innovations / Data Harmony mhlava@accessinn.com

