Controlled Vocabularies Ilia State University, July 2010 Elisabeth Jijavadze, Natia Gabrichidze 1.

1 Controlled Vocabularies Ilia State University, July 2010 Elisabeth Jijavadze, Natia Gabrichidze 1

2 ROSENFELD, L., & MORVILLE, P. (2006). Information architecture for the World Wide Web. Farnham, O'Reilly. (Hereafter ROSENFELD & MORVILLE, 2006)

3 What will we speak about?
1. Metadata
2. Controlled Vocabularies
   o Synonym Rings
   o Authority Files
   o Classification Schemes
   o Thesauri
   o Semantic Relationships
   o Preferred Terms
   o Polyhierarchy
   o Faceted Classification

4 What is metadata?

6 Metadata In data processing, metadata is definitional data that provides information about or documentation of other data managed within an application or environment. For example, metadata would document data about data elements or attributes (name, size, data type...); records or data structures (length, fields, columns...); data (where it is located, how it is associated, ownership...). Metadata may include descriptive information about the context, quality and condition, or characteristics of the data.
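A minimal Python sketch of "data about data", assuming nothing beyond the attributes the slide lists (name, size, data type). The `FieldMetadata` class, the `describe` helper, and the sample record are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FieldMetadata:
    """Metadata about one data element (hypothetical structure)."""
    name: str
    data_type: str
    size: int  # length of the value's string form, in characters


def describe(record: dict) -> list[FieldMetadata]:
    """Derive simple metadata for each field of a record."""
    return [
        FieldMetadata(name=key, data_type=type(value).__name__, size=len(str(value)))
        for key, value in record.items()
    ]


# Sample record, made up for this example.
book = {"title": "Information Architecture", "year": 2006}
for field_meta in describe(book):
    print(field_meta)
```

The record holds the data; the `FieldMetadata` objects hold only descriptions of it, which is the distinction the slide draws.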

7 Metadata Metadata tags are used to describe documents, pages, images, software, video and audio files, and other content objects for the purposes of improved navigation and retrieval. The HTML keyword meta tag used by many web sites provides a simple example. Authors can freely enter words and phrases that describe the content. These keywords are not displayed in the interface but are available for use by search engines.
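As a rough sketch of how a program might read the keyword meta tag, the following uses Python's standard `html.parser` module. The `KeywordMetaParser` class, the `extract_keywords` helper, and the sample page are assumptions made for this example, not part of the presentation.

```python
from html.parser import HTMLParser


class KeywordMetaParser(HTMLParser):
    """Collects the content of <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            # Keywords are conventionally separated by commas.
            self.keywords.extend(
                word.strip() for word in content.split(",") if word.strip()
            )


def extract_keywords(html: str) -> list[str]:
    """Return the keywords declared in the page's meta tags."""
    parser = KeywordMetaParser()
    parser.feed(html)
    return parser.keywords


# Sample page, invented for this example.
page = """
<html><head>
<title>Handheld computers</title>
<meta name="keywords" content="pocket PC, handheld, PDA">
</head><body>...</body></html>
"""
print(extract_keywords(page))  # ['pocket PC', 'handheld', 'PDA']
```

Because authors enter these keywords freely, nothing here guarantees that two pages describe the same thing with the same words, which is the gap controlled vocabularies address.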

8 Metadata Standards wikipedia.org

9 Why Controlled Vocabularies?

10 Controlled Vocabularies Vocabulary control comes in many shapes and sizes. At its most vague, a controlled vocabulary is any defined subset of natural language. At its simplest, a controlled vocabulary is a list of equivalent terms in the form of a synonym ring, or a list of preferred terms in the form of an authority file. Define hierarchical relationships between terms (e.g., broader, narrower) and you've got a classification scheme. Model associative relationships between concepts (e.g., see also, see related) and you're working on a thesaurus. Figure 9-1 (ROSENFELD & MORVILLE, 2006) illustrates the relationships between different types of controlled vocabularies.

11 Controlled vocabularies
o Synonym Rings
o Authority Files
o Classification Schemes
o Thesauri
o Faceted Classification

12 Types of Controlled Vocabularies

13 Synonym Rings A synonym ring connects a set of words that are defined as equivalent for the purposes of retrieval. In practice, these words are often not true synonyms. Synonym rings are a simple, useful form of vocabulary control. There is really no excuse for the conspicuous absence of this basic capability on many of today's largest web sites.

14 Example 1 computershopper.com Query “pocketpc” results – 0 Query “pocket PC” results – 100 (ROSENFELD & MORVILLE, 2006)

15 Example 2 europa.eu
o Query “Kingdom of Netherlands” matched 7758 of 7758 documents.
o Query “Netherlands, Kingdom of” matched 7758 of 7758 documents.
o Query “Kingdom of Netherland” matched 18 of 18 documents.
o Query “Nederland” matched 3748 of 411044 documents.
o Query “Netherland” matched 410 of 411044 documents.
o Query “Koninkrijk der Nederlanden” matched 24 of 24 documents.
o Query “Nederlanden, Koninkrijk der” matched 24 of 24 documents.
o Query “Kingdome of Nederland” matched 24 of 24 documents.

16 Synonym Ring
o Netherland
o Nederland
o Koninkrijk der Nederlanden
o Netherlands, Kingdom of
o Kingdom of Netherlands
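One way to picture a synonym ring in code is as a set of terms, with query expansion as a set union. This is only a sketch: the ring terms come from the slide, while `expand_query`, `search`, the substring matching, and the sample documents are assumptions made for illustration.

```python
# One synonym ring: terms treated as equivalent for retrieval.
NETHERLANDS_RING = {
    "netherland",
    "nederland",
    "koninkrijk der nederlanden",
    "netherlands, kingdom of",
    "kingdom of netherlands",
}
SYNONYM_RINGS = [NETHERLANDS_RING]


def expand_query(query: str) -> set[str]:
    """Return the query plus every term that shares a ring with it."""
    normalized = query.strip().lower()
    expanded = {normalized}
    for ring in SYNONYM_RINGS:
        if normalized in ring:
            expanded |= ring
    return expanded


def search(query: str, documents: list[str]) -> list[str]:
    """Return documents containing any term of the expanded query."""
    terms = expand_query(query)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


# Made-up documents for this example.
docs = [
    "Trade statistics for the Kingdom of Netherlands",
    "Handel in Nederland",
    "Report on Belgium",
]
print(search("Nederland", docs))
```

Without the ring, the query "Nederland" would match only the second document; with it, the first document is retrieved as well.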

17 Problems with Synonym Rings If the query term expansion operates behind the scenes, users can be confused by results that don't actually include their keywords. In addition, the use of synonym rings may result in less relevant results. This brings us back to the subject of precision and recall.

18 Precision and Recall Ratio
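The slide names the two ratios without defining them. Under the standard information-retrieval definitions (precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that were retrieved), a small sketch looks like this. The function name and the document identifiers are hypothetical.

```python
def precision_and_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Compute (precision, recall) for one query."""
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up result set: 4 documents retrieved, 3 documents actually relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}
precision, recall = precision_and_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Expanding a query through a synonym ring tends to raise recall (more relevant documents found) while risking lower precision (more irrelevant documents included).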

19 Solution
o Good interface design and an understanding of user goals can help strike the right balance. For example, you might use synonym rings by default but order the exact keyword matches at the top of the search results list.
o Or, you might ignore synonym rings for initial searches but provide the option to "expand your search to include related terms" if there were few or no results.
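Both options can be sketched in a few lines of Python. The ring, the function names, the `min_results` threshold, and the sample documents are hypothetical; a real search engine would rank and match far more carefully.

```python
# Hypothetical synonym ring, for illustration only.
RING = {"pocketpc", "pocket pc"}


def expand(query: str) -> set[str]:
    """Return the query plus its ring members, if it belongs to the ring."""
    q = query.strip().lower()
    if q in RING:
        return RING | {q}
    return {q}


def search_exact_first(query: str, documents: list[str]) -> list[str]:
    """Option 1: expand by default, but list exact keyword matches first."""
    q = query.strip().lower()
    terms = expand(query)
    exact = [d for d in documents if q in d.lower()]
    others = [
        d for d in documents
        if d not in exact and any(t in d.lower() for t in terms)
    ]
    return exact + others


def search_literal(query: str, documents: list[str], min_results: int = 1):
    """Option 2: ignore the ring, but report whether to offer expansion."""
    q = query.strip().lower()
    results = [d for d in documents if q in d.lower()]
    offer_expansion = len(results) < min_results and q in RING
    return results, offer_expansion


docs = ["PocketPC case", "Pocket PC review", "Laptop bag"]
print(search_exact_first("pocket pc", docs))
print(search_literal("pocketpc", ["Pocket PC review"]))
```

In the second option, the boolean tells the interface when to show an "expand your search to include related terms" link instead of expanding silently.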

20 Synonym Rings In summary, synonym rings are a simple, useful form of vocabulary control. There is really no excuse for the conspicuous absence of this basic capability on many of today's largest web sites.

21 Authority Files Strictly defined, an authority file is a list of preferred terms or acceptable values. It does not include variants or synonyms. Authority files have traditionally been used largely by libraries and government agencies to define the proper names for a set of entities within a limited domain. In practice, authority files are commonly inclusive of both preferred and variant terms. In other words, authority files are synonym rings in which one term has been defined as the preferred term or acceptable value.
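The "synonym ring with one preferred term" idea maps naturally onto a lookup table from variants to the preferred term. In this sketch the choice of "Kingdom of Netherlands" as the preferred term is an assumption, as are the names `AUTHORITY_FILE` and `preferred_term`.

```python
from typing import Optional

# Assumption: this term is chosen as the preferred one for the example.
PREFERRED = "Kingdom of Netherlands"
VARIANTS = [
    "Netherland",
    "Nederland",
    "Koninkrijk der Nederlanden",
    "Netherlands, Kingdom of",
]

# Every variant, and the preferred term itself, points to the preferred term.
AUTHORITY_FILE = {variant.lower(): PREFERRED for variant in VARIANTS}
AUTHORITY_FILE[PREFERRED.lower()] = PREFERRED


def preferred_term(term: str) -> Optional[str]:
    """Return the preferred term for a variant, or None if it is unknown."""
    return AUTHORITY_FILE.get(term.strip().lower())


print(preferred_term("Nederland"))
print(preferred_term("Belgium"))
```

An indexer would tag documents with the value `preferred_term` returns, so that all variants end up under one heading.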

22 What is the relationship between synonym rings and authority files?

23 Classification Schemes We use classification scheme to mean a hierarchical arrangement of preferred terms. These days, many people prefer to use taxonomy instead. Either way, it's important to recognize that these hierarchies can take different shapes and serve multiple purposes, including:
o A frontend, browsable Yahoo-like hierarchy that's a visible, integral part of the user interface
o A backend tool used by information architects, authors, and indexers for organizing and tagging documents

24 Thesauri A thesaurus is a controlled vocabulary in which equivalence, hierarchical, and associative relationships are identified for purposes of improved retrieval. A thesaurus integrated within a web site or intranet to improve navigation and retrieval shares a common heritage with the familiar reference text but has a different form and function. Like the reference book, it is a semantic network of concepts, connecting words to their synonyms, homonyms, antonyms, broader and narrower terms, and related terms.

26 Types of thesauri

28 Thesaurus Standards
o ISO 2788 (1974, 1985, 1986, International)
o BS 5723 (1987, British)
o AFNOR NFZ 47-100 (1981, French)
o DIN 1463 (1987, 1993, German)
o ANSI/NISO Z39.19 (1994, 1998, 2005, United States)

29 Thesaurus Standards www.webmaps.ashcomp.net

30 Semantic Relationships

31 Equivalence

32 Equivalence The equivalence relationship is employed to connect preferred terms and their variants. Although this is loosely called "synonym management", equivalence is a broader term than synonymy: the goal is to group terms that are "equivalent for the purposes of retrieval." This may include:
o synonyms
o near-synonyms
o acronyms
o abbreviations
o lexical variants
o common misspellings

33 Equivalence – Example car = auto = automobile = machine = motorcar

34 Equivalence The goal is to create a rich entry vocabulary that serves as a funnel, connecting users with the products, services, and content that they're looking for and that you want them to find.

35 Hierarchical

36 Hierarchical The hierarchical relationship divides up the information space into categories and subcategories, relating broader and narrower concepts through the familiar parent-child relationship. There are many different ways to hierarchically organize any given information space (e.g., by subject, by product category, or by geography). A faceted thesaurus supports the common need for multiple hierarchies. You also need to deal with the tricky issues of granularity, defining how many layers of hierarchy to develop.

37 Hierarchical Example
Car
BT (Broader Term): motor vehicle, automotive vehicle
NT (Narrower Term): Sports car, Police car, Race car

38 Associative

39 Associative
Car
RT (Related Term): Accelerator, Truck, Bumper, Buffer
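The three semantic relationships for "car" can be gathered into one thesaurus entry. The sketch below uses the terms from the example slides; the `ThesaurusEntry` class, the `format_entry` helper, and the use of the UF ("used for") label for the equivalent terms are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One preferred term with its three kinds of relationships."""
    preferred: str
    used_for: list[str] = field(default_factory=list)  # equivalence (UF)
    broader: list[str] = field(default_factory=list)   # hierarchical (BT)
    narrower: list[str] = field(default_factory=list)  # hierarchical (NT)
    related: list[str] = field(default_factory=list)   # associative (RT)


CAR = ThesaurusEntry(
    preferred="car",
    used_for=["auto", "automobile", "machine", "motorcar"],
    broader=["motor vehicle", "automotive vehicle"],
    narrower=["sports car", "police car", "race car"],
    related=["accelerator", "truck", "bumper", "buffer"],
)


def format_entry(entry: ThesaurusEntry) -> str:
    """Render an entry in the usual label-plus-term layout."""
    lines = [entry.preferred]
    groups = (
        ("UF", entry.used_for),
        ("BT", entry.broader),
        ("NT", entry.narrower),
        ("RT", entry.related),
    )
    for label, terms in groups:
        for term in terms:
            lines.append(f"  {label} {term}")
    return "\n".join(lines)


print(format_entry(CAR))
```

A full thesaurus would also store the reciprocal links (for example, "sports car BT car"), which this sketch leaves out.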

40 Preferred Terms
o Term form
  – Grammatical form
  – Spelling
  – Singular and plural form
  – Abbreviations and acronyms
o Term Selection
o Term Definition
o Term Specificity

41 Polyhierarchy

42 Faceted Classification

43 What is the difference between traditional classification systems and faceted classification?

44 The old model asks the question, "Where do I put this?" It's more closely tied to our experience in the physical world, with the idea of one place for each item. In contrast, the faceted approach asks the question, "How can I describe this?"
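The "How can I describe this?" approach can be sketched as items carrying several independent facets, any combination of which can be used to filter. The catalogue, the facet names, and the `filter_by_facets` helper are hypothetical.

```python
# Hypothetical catalogue: each item is described by several facets
# instead of being filed in a single place.
ITEMS = [
    {"name": "Roadster", "type": "sports car", "color": "red", "fuel": "petrol"},
    {"name": "Patrol", "type": "police car", "color": "white", "fuel": "diesel"},
    {"name": "Sprinter", "type": "sports car", "color": "white", "fuel": "petrol"},
]


def filter_by_facets(items: list[dict], **facets: str) -> list[str]:
    """Return names of items whose facets match every requested value."""
    return [
        item["name"]
        for item in items
        if all(item.get(facet) == value for facet, value in facets.items())
    ]


print(filter_by_facets(ITEMS, type="sports car"))
print(filter_by_facets(ITEMS, color="white", fuel="petrol"))
```

The same item appears under every facet that describes it, whereas a single-hierarchy scheme would force a choice of one location.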

47 marinemetadata.org/guides/vocabs/voctypes/vocca

48 Flat Vocabularies
Authority File * Glossary * Dictionary * Gazetteer * Code List
All flat vocabularies contain a label and a value. These lists match the acceptable values with the appropriate metadata label. No relationships are established, no hierarchies are set up, and no complicated matrices are necessary. Flat vocabularies are sets of three or four pieces of information: a label, a value, and possibly a definition and/or additional information.

49 Multi-Level Vocabularies
Taxonomy * Subject Heading
A multi-level vocabulary is essentially a way to group terms into classes. A classification tells more about the terms by placing them into well-thought-out categories. In a multi-level vocabulary, you can examine in which category a term belongs, and you can examine the relationships between categories as well. In some multi-level vocabularies, the only connection between categories is a narrower-than / broader-than comparison (taxonomy). In others, you can compare like categories across broader categories (subject heading).

50 Relationship List
Thesaurus * Semantic Network * Ontology
Relational vocabularies contain a mechanism to connect terms. The relations are prescribed by various standards and protocols. There are a variety of relationships described for thesauri in the ANSI/NISO Z39.19-2005 standard, including broader than, narrower than, used for, and related. In a relationship list, there are critical connections that are made in a standard way.

52 Thank you!

