Module 7b: Extracting/Controlling Terms and Semantic Relationships IMT530: Organization of Information Resources Winter 2007 Michael Crandall.

2 Steps in Constructing Controlled Vocabularies (CVs)
Define your domain
Gather concepts
  – From user interviews, search logs, content analysis, and preexisting vocabularies
Select your approach
Extract terminology
Control your terms
Organize your terms
Maintain, maintain, maintain

3 Elements of Building CVs
Select your approach
  – Pre- or post-coordinated (sixteenth century lute music, or sixteenth century AND lutes AND music)
  – Open or closed (whether indexers can add terms or not)
  – Enumeration vs. synthesis (facets)
Extract terms
  – Warrant (from users, the domain, or both)
Control terms
  – Specificity (cats or Siamese cats?)
  – Control of homographs (qualifiers)
  – Term consistency and word form (plurals, etc.)
  – Multiword/phrase sequence and form (inverted or normal form?)
  – Term definitions (scope notes)
  – Syntax (citation order)
  – Semantic factoring
Organize terms
  – Semantic relationships

4 Extracting Terminology

5 Sources and Origins of Terminology
Where do you get terms for a controlled vocabulary?
Sources and origins of terminology may come from explicit statements of warrant.
Making a conscious decision about warrant shows that, as a CV designer, you are aware of the different possibilities and have made considered choices.

6 Warrant
Warrant is “the authority that is used to justify decisions about what is included in a system” (Clare Beghtol).
Types of warrant:
  – Literary warrant
  – User warrant
  – Scholarly warrant
  – Cultural warrant
(Beghtol, 2002)

7 Literary & User Warrant
Literary warrant
  – Terms or organization reflect, or are taken directly from, the resources themselves; this includes dictionaries, encyclopedias, etc. on a topic
User (aka Use, Enquiry) warrant
  – Terms or organization reflect use; user terminology may (or may not) be taken directly from logs of system use or from personal interactions with users
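
User warrant can be gathered directly from search logs. The sketch below is a minimal illustration, not a prescribed method: it assumes the log is just a list of raw query strings, and the sample queries, the function name `candidate_terms`, and the `min_count` threshold are all hypothetical. It counts normalized queries and keeps only those frequent enough to suggest a candidate term.

```python
from collections import Counter


def candidate_terms(queries, min_count=2):
    """Return (query, count) pairs seen at least min_count times, most frequent first.

    Normalization is deliberately crude (lowercase, collapse whitespace);
    spelling variants and synonyms still have to be reconciled by hand.
    """
    normalized = (" ".join(q.lower().split()) for q in queries)
    counts = Counter(q for q in normalized if q)
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


# Hypothetical sample log; min_count is an arbitrary cut-off
log = ["Creativity", "creative ability", "creativity ", "originality",
       "Creative  Ability", "lute music"]
print(candidate_terms(log))
```

Frequency alone does not make a term preferred; it only shows that users ask for a concept, which the CV designer still has to control.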

8 Scholarly & Cultural Warrant
Scholarly warrant
  – Terms or organization reflect the opinions of a panel of human experts
Cultural warrant
  – Terms or organization are derived from cultural practice or understanding; for example, Dewey and LCSH reflect an American/Western cultural bias, while Colon Classification reflects an Indian/Eastern cultural bias (this can also be partly a function of literary warrant…)

9 Term Control

10 Term Control
  – Specificity (cats or Siamese cats?)
  – Control of homographs (qualifiers)
  – Term consistency and word form (plurals, etc.)
  – Multiword/phrase sequence and form (inverted or normal form?)
  – Term definitions (scope notes)
  – Syntax (citation order)
  – Semantic factoring

11 Specificity
Depends on user needs and the time available
Should be consistent throughout the CV to avoid confusing users
May be influenced by the choice of approach:
  – If faceted, some facets may be more specific than others
  – If hierarchical, specificity should be consistent throughout

12 Homographs
Sometimes a single word or phrase has multiple meanings: e.g., “power”, “drum”, “Java”, “Jupiter”
Controlled vocabularies “disambiguate” these terms so that each term has a single meaning
  – In thesauri and subject heading lists, parenthetical qualifiers are added, e.g., these LCSH terms: “Power (Mechanics)”; “Power (Christian theology)”; “Power (Social Sciences)”; “Power (Philosophy)”
  – In taxonomies and classifications, the meaning of a homograph is contextualized by its placement in a particular hierarchy. Following the example above, Power would appear in the Philosophy, Christianity, Social Sciences, and Mechanics hierarchies, and each occurrence is disambiguated by virtue of its location (and therefore its different notation)
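
When a searcher types a bare homograph, a system can offer every qualified sense. A minimal sketch, using the four LCSH headings quoted above as data; the helper names `split_heading` and `senses` are hypothetical, and the pattern assumes the qualifier is a single parenthesized phrase at the end of the heading.

```python
import re

# Qualified headings quoted on the slide (LCSH examples)
HEADINGS = [
    "Power (Mechanics)",
    "Power (Christian theology)",
    "Power (Social Sciences)",
    "Power (Philosophy)",
]

# "Base (Qualifier)": the qualifier is the final parenthesized phrase
QUALIFIED = re.compile(r"^(?P<base>.+?) \((?P<qualifier>[^()]+)\)$")


def split_heading(heading):
    """Return (base, qualifier); the qualifier is None for an unqualified heading."""
    match = QUALIFIED.match(heading)
    if match is None:
        return heading, None
    return match.group("base"), match.group("qualifier")


def senses(word, headings=HEADINGS):
    """List every qualified heading whose base matches word, ignoring case."""
    return [h for h in headings if split_heading(h)[0].casefold() == word.casefold()]


for heading in senses("power"):
    print(heading)
```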

13 Word Form
Single-word forms should be consistent:
  – Choose verbs or nouns
  – Singular or plural
  – Standard form
Phrases should use a standard form:
  – Either direct (Constitutional government)
  – Or inverted (Government, constitutional)
Inverted form allows closer grouping of like terms in an alphabetical display, but it is not used much anymore
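
The grouping effect of inverted form is easy to see by sorting. In this minimal sketch only “Constitutional government” comes from the slide; the other phrases are illustrative, and the hypothetical `invert` helper assumes the last word of a phrase is its head noun.

```python
def invert(phrase):
    """Turn a direct-order phrase such as 'Constitutional government' into
    'Government, constitutional'.

    Assumes the last word is the head noun; longer or irregular phrases
    still need an editor's judgment. Single words are returned unchanged.
    """
    words = phrase.split()
    if len(words) < 2:
        return phrase
    head, modifiers = words[-1], " ".join(words[:-1])
    return f"{head.capitalize()}, {modifiers.lower()}"


# Only the first phrase comes from the slide; the rest are illustrative
direct = ["Constitutional government", "Local government", "Government",
          "Municipal finance", "Parliamentary government"]

print(sorted(direct))                     # direct form scatters the government terms
print(sorted(invert(p) for p in direct))  # inverted form files them together
```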

14 Scope Notes
Scope notes are term definitions in a thesaurus or controlled vocabulary
Scope notes help indexers know the precise meaning of a term, and help users know whether they are searching on the correct term

15 Syntax
Syntax describes how terms are built (especially how multiple concepts may be combined) and the citation order (order of facets)
  – Syntax is an issue when concepts are pre-coordinated in an indexing term (whether or not the syntax is consistent)
  – Syntax is an issue for CVs that use synthesis with facets, because the rules for synthesis (called citation order in classification schemes) determine term syntax
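
A faceted CV can enforce its syntax mechanically. This minimal sketch reuses the pre-coordinated example from slide 3 (sixteenth century lute music); the facet names, their citation order, and the `build_term` function are assumptions chosen for illustration, not rules from any published scheme.

```python
# Hypothetical citation order for a small music vocabulary: period, then
# instrument, then form (the facet names are illustrative, not a standard)
CITATION_ORDER = ("period", "instrument", "form")


def build_term(facets, order=CITATION_ORDER):
    """Combine facet values into one pre-coordinated term in citation order.

    Facets absent from `facets` are skipped; facets the order does not know
    about raise an error instead of being silently dropped.
    """
    unknown = set(facets) - set(order)
    if unknown:
        raise ValueError(f"facets not covered by the citation order: {sorted(unknown)}")
    return " ".join(facets[name] for name in order if name in facets)


# The same concepts, supplied in any order, always yield the same term
a = build_term({"form": "music", "period": "sixteenth century", "instrument": "lute"})
b = build_term({"instrument": "lute", "form": "music", "period": "sixteenth century"})
print(a)
```

Fixing the order in one place is what keeps every pre-coordinated term consistent, regardless of the order in which an indexer thinks of the concepts.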

16 Semantic Factoring
“The process of analyzing some or all of the categories of an ontology into a collection of primitives”
Sowa, J. F. (2003). Ontology. Glossary. http://www.jfsowa.com/ontology/gloss.htm
Essentially, you are trying to decompose terms into their elemental concepts, to minimize duplication and maximize reuse
  – For example: ship = vehicle + water transport
  – Not always possible, especially with non-concrete concepts
“Creating a thesaurus without doing semantic factoring is like trying to put together furniture from Ikea without following the instructions. You will get interesting configurations, but you will not save time.”
Ezzo, J. (2005). Bella and Yakov and Tillie's Panties: What I Learned in “Construction and Maintenance of Indexing Languages and Thesauri.” Bulletin of the American Society for Information Science and Technology, 31(4), April/May 2005. http://www.asis.org/Bulletin/Apr-05/ezzo.html
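
Representing each term as a set of primitives makes both goals checkable. In this minimal sketch every factoring except ship is hypothetical and deliberately simplified, as are the helper names; the point is that shared primitives show reuse, and identical factorings flag possible duplication.

```python
# Hypothetical factorings; only "ship = vehicle + water transport" is from the slide
FACTORS = {
    "ship": {"vehicle", "water transport"},
    "boat": {"vehicle", "water transport"},
    "aircraft": {"vehicle", "air transport"},
    "harbor": {"facility", "water transport"},
}


def terms_using(primitive, factors=FACTORS):
    """Terms whose factoring reuses the given primitive."""
    return sorted(term for term, prims in factors.items() if primitive in prims)


def identical_factorings(factors=FACTORS):
    """Groups of terms that decompose into exactly the same primitives.

    Each group either names one concept twice (merge the terms) or is missing
    a distinguishing primitive (factor further).
    """
    groups = {}
    for term, prims in factors.items():
        groups.setdefault(frozenset(prims), []).append(term)
    return [sorted(terms) for terms in groups.values() if len(terms) > 1]


print(terms_using("water transport"))
print(identical_factorings())
```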

17 Relationships in CVs

18 Relationships in Controlled Vocabularies
There are three major types of relationships between subject concepts:
  – Equivalence relationships
  – Hierarchical relationships
  – Associative relationships

19 Equivalence Relationships
In natural language, one word or phrase can refer to more than one concept, and multiple terms can refer to a single concept
In other words, there is no one-to-one correspondence between words/phrases and concepts

20 Preferred Terms and Cross-References (Synonyms)
Controlled vocabularies create a one-to-one correspondence between terms and concepts by grouping synonyms (multiple words or phrases that share a similar meaning) under a single term
To do this we:
  – Select a preferred term (descriptor, subject heading)
  – Create cross-references from the non-preferred terms (entry vocabulary, lead-in terms)

21 Example Equivalence Display
Sample display for the descriptor (preferred term) “Creativity” from the ERIC Thesaurus:
Creativity
  UF Creative ability
     Originality
If you searched on “Originality” or “Creative ability” in the ERIC database, you would see these references:
  – “Creative ability” see “Creativity”
  OR
  – “Originality” use “Creativity”
In other words, you would be led from the unused (lead-in) terms to the used (preferred) term.
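
Because UF and USE are reciprocal, only one direction needs to be stored. A minimal sketch, assuming an exact-match lookup (no case folding or stemming) and using hypothetical function names, stores the USE references from the ERIC example and derives the UF display from them.

```python
# USE references from the ERIC example: lead-in term -> preferred term
USE = {
    "Creative ability": "Creativity",
    "Originality": "Creativity",
}


def resolve(term, use=USE):
    """Follow a USE reference; a term with no reference is returned unchanged."""
    return use.get(term, term)


def used_for(preferred, use=USE):
    """Derive the reciprocal UF list for a preferred term from the USE references."""
    return sorted(lead for lead, target in use.items() if target == preferred)


def display(preferred, use=USE):
    """Render a thesaurus-style entry with aligned UF lines."""
    lines = [preferred]
    for i, lead in enumerate(used_for(preferred, use)):
        lines.append(("  UF " if i == 0 else "     ") + lead)
    return "\n".join(lines)


print(resolve("Originality"))
print(display("Creativity"))
```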

22 Equivalence Relationships: Summary
Exist between words or phrases that share the same (or a similar) meaning
Equivalent terms are treated as synonymous (whether or not they actually are)
When controlling vocabulary, one equivalent term is selected as the preferred term (e.g., descriptor); the other equivalent terms are treated as lead-in terms or cross-references
References used in the CV to show equivalence relationships include “UF” (used for) at the preferred term, and “Use”, “See”, or “Search under” at the non-preferred terms

23 Hierarchical Relationships
Hierarchical relationships may be strictly defined as:
  – Genus-species (also called class inclusion or “is-a”) relationships
  – Whole-part relationships (sometimes these are treated as associative relationships)

24 Hierarchical Relationships
Hierarchical relationships may be illustrated with set notation:
Set G (green) is a subset of Set B (blue)
  – All Gs are also Bs (in other words, a G is a B)
  – Using a real-world analogy, if Gs are gorillas and Bs are animals, all gorillas are animals
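
The set-notation view can be written out using the standard definitions of subset and proper subset. In a genus-species relationship the narrower set is normally a proper subset: every gorilla is an animal, but some animals are not gorillas, which is why the relationship has a direction (BT versus NT).

```latex
G \subseteq B \iff \forall x\,(x \in G \rightarrow x \in B)
\\
G \subsetneq B \iff G \subseteq B \,\land\, \exists x\,(x \in B \land x \notin G)
```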

25 Ideal CV Hierarchical Relationships
Ideally, all hierarchical relationships indicated in a controlled vocabulary are also controlled and defined as genus-species (and sometimes also whole-part) relationships
ALL other relationships between terms are associative relationships
In real-life CVs, this is not always the case!

26 References for Hierarchical Relationships
Hierarchically related terms are shown by BT (broader term), NT (narrower term), and sometimes See also/Search also references.
Examples of two entries in the ERIC thesaurus:
Creativity
  BT Psychological characteristics
Psychological characteristics
  NT Creativity
     Intelligence
     Cognitive style

27 BTs & NTs
In the previous slide, both Creativity and Psychological characteristics are preferred terms
Each has its own display; the display for Creativity (as a preferred term) shows the reference to the broader preferred term “Psychological characteristics”
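
Each display shows only one side of the relationship, so a thesaurus tool can store one direction and generate the other. A minimal sketch under that assumption (the function name `narrower_terms` is hypothetical); it lists NTs alphabetically, which need not match the order in the ERIC display.

```python
from collections import defaultdict

# BT references, narrower term -> broader terms. Creativity's BT is on the slide;
# the other two follow from the NT list under Psychological characteristics,
# since BT and NT are reciprocal.
BT = {
    "Creativity": ["Psychological characteristics"],
    "Intelligence": ["Psychological characteristics"],
    "Cognitive style": ["Psychological characteristics"],
}


def narrower_terms(bt=BT):
    """Invert BT references into NT lists, one per broader term."""
    nt = defaultdict(list)
    for narrower, broaders in bt.items():
        for broader in broaders:
            nt[broader].append(narrower)
    return {broader: sorted(terms) for broader, terms in nt.items()}


print(narrower_terms()["Psychological characteristics"])
```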

28 Testing for Hierarchical Relationships
To test for a hierarchical relationship between terms, use the “is-a” test.
What is the relationship between “robin” and “bird”? (A robin is a (type of) bird, so the relationship is hierarchical; Bird is the broader term and Robin is the narrower term)
What is the relationship between Water and Hydronomy? (Water is not a hydronomy or a type of hydronomy, and hydronomy is not a water or a type of water, so the relationship here is associative)

29 Examples of Hierarchical Relationships
What is the relationship between these sets of terms?
  – books and library materials
  – water and floods
  – buildings and chimneys
  – painting and acrylic paints
  – water and groundwater

30 Answers
Books and library materials (hierarchical)
Water and floods (associative, because a flood is not the same type of thing as water; one way you can tell is that one is a count noun and the other is not. Hierarchical may still be acceptable depending on context)
Buildings and chimneys (hierarchical if you include whole-part relationships; associative if you don’t)
Painting and acrylic paints (associative)
Water and groundwater (hierarchical)

31 More on Hierarchical Relationships
A characteristic of terms that are strictly hierarchically related (genus-species only, not whole-part) is hierarchical force
When a narrower term is hierarchically related to a broader term, the narrower term (NT) inherits all of the characteristics of the terms above it in the hierarchy
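
The is-a test and hierarchical force can be modelled together. A minimal sketch, assuming strict genus-species links with a single broader term per term; the robin/bird hierarchy echoes slide 28, but the characteristics and helper names are invented for illustration. The inheritance step would not be valid for whole-part links (a chimney does not have every characteristic of a building).

```python
# Hypothetical genus-species hierarchy (one BT per term) and characteristics
BROADER = {"robin": "bird", "bird": "animal"}
CHARACTERISTICS = {
    "animal": {"is alive"},
    "bird": {"has feathers", "lays eggs"},
    "robin": {"has a red breast"},
}


def ancestors(term, broader=BROADER):
    """Walk BT links upward, nearest broader term first; reject cyclic data."""
    chain = []
    while term in broader:
        term = broader[term]
        if term in chain:
            raise ValueError(f"cycle in hierarchy at {term!r}")
        chain.append(term)
    return chain


def is_a(narrower, broader_term, broader=BROADER):
    """The 'is-a' test, applied transitively along the hierarchy."""
    return broader_term in ancestors(narrower, broader)


def inherited(term, broader=BROADER, traits=CHARACTERISTICS):
    """Hierarchical force: a term has its own characteristics plus all its ancestors'."""
    result = set(traits.get(term, set()))
    for ancestor in ancestors(term, broader):
        result |= traits.get(ancestor, set())
    return result


print(ancestors("robin"))
print(is_a("robin", "animal"), is_a("bird", "robin"))
print(sorted(inherited("robin")))
```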

32 Associative Relationships
Include all relationships not covered by equivalence and hierarchical relationships
In controlled vocabularies, these relationships are shown by the following references:
  – Related term (RT), see also (SA)
Examples of types of associative relationships (there are many of these!):
  – Thing and property (rubber, elasticity)
  – Complementary activities (teaching, learning)
  – Agent and activity (artist, painting)

33 Associative Relationships
Many of these are semantic relationships
Some of these are syntactic relationships too:
  – Children, see related term Games
Problem: when to stop? How close in meaning or syntactic relation do two terms have to be to show them in a CV?
Note: associative relationships are rarely shown in classifications and taxonomies

34 Example Associative Relationship Display
From the ERIC thesaurus:
Comprehension
  RT Concept formation
     Misconceptions
     Scientific literacy
     Thinking skills
Again, remember that Comprehension and all of the RTs are preferred terms; this is the display for the preferred term Comprehension
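
Unlike BT/NT, the RT relationship has no direction. A minimal sketch, assuming (as is usual in thesauri, though the slide shows only one display) that each RT reference is reciprocal, so storing a pair once yields an entry under both terms; the function name is hypothetical.

```python
from collections import defaultdict

# RT references from the ERIC Comprehension display
RT_PAIRS = [
    ("Comprehension", "Concept formation"),
    ("Comprehension", "Misconceptions"),
    ("Comprehension", "Scientific literacy"),
    ("Comprehension", "Thinking skills"),
]


def related_terms(pairs=RT_PAIRS):
    """Build RT displays, treating each RT reference as symmetric."""
    rt = defaultdict(set)
    for a, b in pairs:
        rt[a].add(b)
        rt[b].add(a)
    return {term: sorted(others) for term, others in rt.items()}


displays = related_terms()
print(displays["Comprehension"])
print(displays["Misconceptions"])
```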

35 Some Guidelines
Does the taxonomy cover the domain appropriately? Is it within scope?
Do draft definitions for concepts express them clearly?
Are duplicate concepts removed?
Are basic-level concepts represented? Does the extracted terminology express them?
Is the structure useful and sensible?
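
Most of these questions need human judgment, but part of “Are duplicate concepts removed?” can be screened automatically. A minimal sketch with hypothetical checks, function name, and data: it flags terms that are both preferred and lead-in, USE references that point to a non-preferred term, and preferred terms that differ only in case. It cannot detect true synonyms that are spelled differently.

```python
def check_vocabulary(preferred, use):
    """Return descriptions of basic problems in a draft vocabulary.

    preferred: set of preferred terms
    use: dict mapping each lead-in term to its preferred term
    """
    problems = []
    for lead, target in sorted(use.items()):
        if lead in preferred:
            problems.append(f"{lead!r} is both a preferred term and a lead-in term")
        if target not in preferred:
            problems.append(f"{lead!r} USE {target!r}, but {target!r} is not a preferred term")
    # Preferred terms that differ only in case probably name the same concept
    seen = {}
    for term in sorted(preferred):
        key = term.casefold()
        if key in seen:
            problems.append(f"{seen[key]!r} and {term!r} differ only in case")
        else:
            seen[key] = term
    return problems


# Hypothetical draft with three deliberate mistakes
preferred = {"Creativity", "creativity", "Originality", "Psychological characteristics"}
use = {"Originality": "Creativity", "Creative ability": "Creative abilities"}
for problem in check_vocabulary(preferred, use):
    print(problem)
```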

36 Questions?
If not, take a break!!!

37 Exercise 7b
Take your term lists from last week and use them in Exercise 7b to begin building a controlled vocabulary
Turn in your initial controlled vocabularies via email before Tuesday

