Controlling values for information organization 384C – Organizing Information Spring 2016 Karen Wickett School of Information University of Texas at Austin.

1 Controlling values for information organization 384C – Organizing Information Spring 2016 Karen Wickett School of Information University of Texas at Austin

2 The vocabulary problem What is this?

3 Synonymy Restroom, bathroom, toilet, ladies' room, men's room, the facilities, WC, ... Synonymy: Using different words to identify the same concept

4 Another vocabulary problem What is mercury? What is bank? What is python? What is java?

5 Polysemy Polysemy: Using the same word* to identify different concepts – *morphologically speaking Java: – Island in Indonesia – variety of coffee bean – generic term for coffee – object-oriented programming language

6 More vocabulary problems The White House has been lobbying Congress to support the proposed budget... Freedom of the press is an important value in the United States... I'm tired of taking the bus; I need some new wheels...

7 Metonymy and synecdoche Metonymy: Using a related concept to stand for another concept. Synecdoche: Using the word for part of something to stand for the entire thing.

8 Do people apply terms consistently? No. Furnas and colleagues asked people (including subject experts) to label a variety of items – recipes, text editing operations, content objects. There was little agreement on the names submitted by participants. Conclusion: "The idea of an 'obvious,' 'self-evident,' or 'natural' term is a myth! Since even the best possible name is not very useful, it follows that there can exist no rules, guidelines or procedures for choosing a good name, in the sense of 'accessible to the unfamiliar user.'"

9 What, then, shall we do? Furnas and friends suggest that interface designers: – Implement unlimited aliasing – Disambiguate terms that can be used in multiple senses by presenting the possibilities to users and asking them to select the appropriate one

10 Limitations of the Furnas study Participants were asked to label objects, not to show how they would search for them. The study assumes a search interface, not a browsing (or menu-driven) interface – In a search interface, users must recall or guess an object's name – In a browsing interface, users merely need to recognize the appropriate term

11 Vocabulary problems and information systems Designers of information organization systems have long grappled with the ambiguities of language. Synonymy, polysemy, etc. all complicate the goal of collocating (bringing together) similar items in an information system

12 Vocabulary control In LIS, vocabulary control is similar to Furnas's idea of aliasing – concepts are associated with their synonyms. One term is designated as preferred: – this is the term used in a display – other labels associated with the concept are used in searching – e.g. a search on Nordstrom.com for "frock" returns results for "dress"

13 Example of a controlled term Preferred term: bathroom Equivalent terms: restroom, loo, toilet, tiolet, WC, ladies' room, ladys room, lady's room,...
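
The controlled term above can be sketched as a simple lookup table. This is only a minimal illustration in Python, assuming a dictionary-based vocabulary; the names PREFERRED, EQUIVALENTS, LOOKUP, and to_preferred are hypothetical, and the terms (including the deliberate misspellings) are the ones listed on the slide.

```python
# Hypothetical controlled-vocabulary entry built from the terms on the slide.
PREFERRED = "bathroom"
EQUIVALENTS = [
    "restroom", "loo", "toilet", "tiolet", "WC",
    "ladies' room", "ladys room", "lady's room",
]

# Map every label (preferred or equivalent) to the preferred term.
# casefold() makes the match case-insensitive.
LOOKUP = {label.casefold(): PREFERRED for label in [PREFERRED, *EQUIVALENTS]}


def to_preferred(term):
    """Return the preferred term for a label, or None if the label is not in the vocabulary."""
    return LOOKUP.get(term.strip().casefold())


print(to_preferred("WC"))
print(to_preferred("Tiolet"))
print(to_preferred("kitchen"))
```

A search box could call to_preferred on the user's input and display results under the preferred term, which is the "frock" to "dress" behavior described on the previous slide.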

14 Equivalence can be relative Similar concepts may be treated as equivalent – this is a design decision by the vocabulary creator Example – Vocabulary includes the preferred term: Beer – Terms designated as equivalent: ale, porter, stout, pilsner, bock, IPA, malt liquor, barley wine

15 Disambiguation in vocabularies Polysemous terms are often identified by added qualifying terms in parentheses – Mercury (chemical element) – Mercury (god in Roman mythology) Search engines may ask users to select the sense they want
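
One way to picture the "select the sense you want" step is an index from the bare term to all of its qualified headings. The sketch below assumes headings follow the "Term (qualifier)" pattern; the function names and the specific qualifiers for Java are hypothetical, loosely based on the senses listed on the polysemy slide.

```python
import re
from collections import defaultdict

# Hypothetical qualified headings in the "Term (qualifier)" pattern.
HEADINGS = [
    "Mercury (chemical element)",
    "Java (island)",
    "Java (coffee)",
    "Java (programming language)",
]

# Split "Term (qualifier)" into its two parts.
HEADING_RE = re.compile(r"^(?P<term>.+?)\s*\((?P<qualifier>[^()]+)\)$")


def build_sense_index(headings):
    """Group qualified headings under their unqualified, case-folded term."""
    index = defaultdict(list)
    for heading in headings:
        match = HEADING_RE.match(heading)
        term = match.group("term") if match else heading
        index[term.casefold()].append(heading)
    return index


def senses(index, query):
    """Return every qualified heading a user could mean by the query."""
    return list(index.get(query.strip().casefold(), []))


INDEX = build_sense_index(HEADINGS)
for option in senses(INDEX, "java"):
    print(option)
```

When senses returns more than one heading, an interface would present the list and ask the user to pick; when it returns exactly one, no disambiguation step is needed.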

16 Library access and control Library catalogs have three traditional access points: author, title, subject In the card catalog system, these were the three ways that users could search Each of these access points has associated vocabulary control

17 Access points From Bearman's Power of the Principle of Provenance: – a characteristic which can be used in conjunction with other characteristics to identify a set of objects for examination. An essential guiding question in the design and implementation of an information organization system: – which characteristics will prove most discriminating and most useful to searchers?

18 Control of names In library cataloging, controlled vocabularies for authors, titles, and subjects are called authority files. Authority files – disambiguate names that identify multiple people or items – and group variations for the same person or item – i.e. they manage (control) both polysemy and synonymy

19 Authority file examples In the UT author authority file, headings for Patricia Williams: – Names are disambiguated by using middle initials and dates of birth – Cross references are used for some authors – There may still be two headings for one person

20 Pseudonyms in the catalog The current catalog maintains pseudonymous identities (in older catalogs, everything went under the author's real name). For example, – "Carolyn Keene" is maintained as an author entity in the authority file – https://en.wikipedia.org/wiki/Carolyn_Keene

21 Thesauri Thesauri are a type of controlled vocabulary that includes relationships between terms – equivalence, hierarchical, associative. Thesauri can also be faceted – i.e. represent multiple aspects of a concept. Thesauri are often developed to deal with subjects of documents – we will dive into this in depth in a few weeks

22 Example thesaurus entry
Dark chocolate
BT Chocolate
RT Single-origin chocolate
UF Semisweet chocolate
   Baker's chocolate
   Sweet chocolate
SN Chocolate without milk solids and with less than 70 percent chocolate mass

23 Thesaurus abbreviations BT: Broader Term, one level up in a hierarchy. RT: Related Term, in another facet or hierarchical branch. UF: Used For; synonyms, or non-preferred terms. SN: Scope Note; definitions or usage guidelines.
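
The four abbreviations map naturally onto the fields of a record. The following is a rough Python sketch, not a standard thesaurus format: the ThesaurusEntry class and format_entry function are hypothetical, and the example assumes that "Baker's chocolate" and "Sweet chocolate" belong under UF along with "Semisweet chocolate", as the layout of the dark chocolate entry suggests.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One preferred term with its thesaurus relationships (hypothetical structure)."""
    term: str
    broader: list = field(default_factory=list)   # BT: one level up in the hierarchy
    related: list = field(default_factory=list)   # RT: another facet or branch
    used_for: list = field(default_factory=list)  # UF: non-preferred synonyms
    scope_note: str = ""                          # SN: definition or usage guideline


dark_chocolate = ThesaurusEntry(
    term="Dark chocolate",
    broader=["Chocolate"],
    related=["Single-origin chocolate"],
    # Assumption: all three terms are UF entries for "Dark chocolate".
    used_for=["Semisweet chocolate", "Baker's chocolate", "Sweet chocolate"],
    scope_note="Chocolate without milk solids and with less than 70 percent chocolate mass",
)


def format_entry(entry):
    """Render an entry in the BT/RT/UF/SN display style."""
    lines = [entry.term]
    lines += [f"  BT {t}" for t in entry.broader]
    lines += [f"  RT {t}" for t in entry.related]
    lines += [f"  UF {t}" for t in entry.used_for]
    if entry.scope_note:
        lines.append(f"  SN {entry.scope_note}")
    return "\n".join(lines)


print(format_entry(dark_chocolate))
```

Keeping each relationship type in its own field makes it straightforward to follow BT links for browsing or to use the UF list for search-time aliasing.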

24 Example: MeSH and PubMed The Medical Subject Headings (MeSH) are used to index journal articles for the PubMed database. Keyword searches in PubMed are automatically expanded with MeSH. – Searches can also be explicitly limited to MeSH terms, which can increase precision. Before next week: experiment with searches in PubMed with MeSH, and compare to a search interface like Google Scholar.
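
The difference between a plain keyword search, a vocabulary-expanded search, and a search limited to subject headings can be shown on a toy collection. This sketch does not use or imitate the PubMed API, and PubMed's real expansion is more involved; the vocabulary, the three documents, and all function names below are made up for illustration.

```python
# Toy vocabulary: preferred heading -> entry terms (illustrative, not pulled from MeSH).
VOCAB = {"myocardial infarction": ["heart attack"]}

# Hypothetical documents: free text plus the subject headings an indexer assigned.
DOCS = {
    1: {"text": "Outcomes after heart attack in older adults",
        "headings": ["myocardial infarction"]},
    2: {"text": "Myocardial infarction and statin use",
        "headings": ["myocardial infarction"]},
    3: {"text": "Heart attack on the box office: a film review",
        "headings": []},
}


def resolve(query):
    """Return the preferred heading for a query, or None if it is not controlled."""
    q = query.strip().casefold()
    for heading, entry_terms in VOCAB.items():
        if q == heading or q in (t.casefold() for t in entry_terms):
            return heading
    return None


def keyword_search(query):
    """Match the literal query string against document text only."""
    q = query.strip().casefold()
    return sorted(i for i, d in DOCS.items() if q in d["text"].casefold())


def expanded_search(query):
    """Match any label of the resolved concept in the text, or the heading itself."""
    heading = resolve(query)
    if heading is None:
        return keyword_search(query)
    labels = [heading, *(t.casefold() for t in VOCAB[heading])]
    return sorted(
        i for i, d in DOCS.items()
        if heading in d["headings"] or any(l in d["text"].casefold() for l in labels)
    )


def heading_search(query):
    """Return only documents indexed with the resolved heading."""
    heading = resolve(query)
    return sorted(i for i, d in DOCS.items() if heading in d["headings"])


print(keyword_search("heart attack"))
print(expanded_search("heart attack"))
print(heading_search("heart attack"))
```

In this toy collection the keyword search misses document 2, the expanded search retrieves all three documents (higher recall), and the heading-limited search drops the off-topic film review (higher precision).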

25 More examples Entry for Austin, TX in TGN (Getty Thesaurus of Geographic Names); AAT (Art & Architecture Thesaurus)

26 Summing up Controlled vocabularies increase precision and recall in searching by identifying equivalent terms. Authority files are types of controlled vocabularies. Thesauri are subject-based controlled vocabularies that include hierarchical and associative relationships in addition to equivalence relationships – Thesauri can also be used as browsing interfaces

