Multiple Classification of Content with TopicMaps: Improving Navigation Accuracy and Achieving Serendipity While Retaining Classification Accuracy through XTM


1 Multiple Classification of Content with TopicMaps Improving Navigation Accuracy and Achieving Serendipity While Retaining Classification Accuracy through XTM Steve Carton November 17, 2004 Retrieval Systems Corporation

2 “classification” The act of forming into a class or classes; A distribution into groups, as classes, orders, families, etc., according to some common relations or affinities. Webster's Revised Unabridged Dictionary, © 1996, 1998 MICRA, Inc.

3 Content Classification Presentation
Topical indexes are very common.
Can include multiple levels.
Inferior levels qualify superior terms.
The entire tree is required to properly classify content.
Used in print or electronic content.
Navigation includes browsing and searching techniques.

4 Some Problems
What online and print methods add in completeness (and even retrieval speed), they tend to lose in serendipity.
You usually can’t see how a common concept is used in several hierarchies.
How do I get from “Tchaikovsky” to “Tzaikovsky”?
It is harder to “stumble” into the content you want, especially as the content and the classification scheme grow.
It isn’t easy to see that “breast cancer” appears under “alternative medicine>cancer” and also under “women's health” in an index.

5 Consider These Topical Index Hierarchies Let’s not pick too much on the EPA!

6 EPA Topical Index

7 Human Health > Exposure

8 Human Health > Exposure > Route

9 Radiation And Radioactivity > Exposure

10 Radiation And Radioactivity > Exposure > Pathways

11 Multiple Parents Multiple Children
We can see that “Exposure” is a child in two index hierarchies.
–Human Health
–Radiation and Radioactivity
And we can see that there are children of exposure.
–Exposure Pathways
–Exposure Route
And these are (arguably) the same concept.

12 Serendipity “Luck, or good fortune, in finding something good accidentally.” Webster’s New World College Dictionary, Fourth Edition. New York: MacMillan,

13 Consider These Topical Index Hierarchies
Human Health
Human Health > Exposure
Human Health > Exposure > Exposure Route
Radiation And Radioactivity
Radiation And Radioactivity > Exposure
Radiation And Radioactivity > Exposure > Exposure Pathways

14 “Can’t get thaya from hea” Suppose the user only knows the concept “exposure.” If we started down the path of “Human Health”, we might never stumble into “Radiation and Radioactivity.”
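A minimal Python sketch of why a user who only knows "Exposure" benefits from seeing every hierarchy the term belongs to. It stores the two example hierarchies as plain tuples and collects the parents and children of one term across all of them. The names `HIERARCHIES` and `neighbours` are illustrative assumptions, not part of the presentation.

```python
# The two example hierarchies from the slides, each stored top to leaf.
HIERARCHIES = [
    ("Human Health", "Exposure", "Exposure Route"),
    ("Radiation And Radioactivity", "Exposure", "Exposure Pathways"),
]


def neighbours(term):
    """Return (parents, children) of `term` across every hierarchy."""
    parents, children = set(), set()
    for path in HIERARCHIES:
        for i, phrase in enumerate(path):
            if phrase != term:
                continue
            if i > 0:
                parents.add(path[i - 1])
            if i + 1 < len(path):
                children.add(path[i + 1])
    return sorted(parents), sorted(children)


parents, children = neighbours("Exposure")
print("Superior phrases:", parents)
print("Inferior phrases:", children)
```

Starting from "Exposure", both "Human Health" and "Radiation And Radioactivity" become visible, which a strict top-down walk from one root would not show.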

15 A More Complex Example
Consider some more complex topical hierarchies:
alternative medicine>cancer>breast cancer>therapies
women's health>breast cancer>treatments
radiation treatments>breast cancer>different types
chemotherapy>breast cancer>different types
breast cancer>treatments>chemotherapy
breast cancer>treatments>radiation
breast cancer>treatment>surgery
breast cancer>treatment>medication
It seems clear that “browsing” through any of these hierarchies should meaningfully allow access to any of the others.

16 Other Ways In fact, EPA has a “Keyword Index” and a “Keyword Search” to the Topical Index solving the problem for simple cases – like “Exposure.”

17 EPA Keyword Index

18 EPA Keyword Search

19 But there are problems…
The keyword index and keyword search fail when a computer cannot recognize two terms as forms of the same concept.
A search for “therapies” won’t find “treatments”.
Topic Maps provide an answer through variant base names and associations between topics.
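The idea of variant names can be sketched in a few lines of Python. This is a plain in-memory model, not XTM markup: each topic carries a base name plus a set of variants, and a lookup matches any of them. The `Topic` class, the topic ids, and the choice of which term is the base name are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: str                      # hypothetical identifier
    base_name: str
    variants: set = field(default_factory=set)

    def names(self):
        """All names of this topic, lower-cased for matching."""
        return {self.base_name.lower()} | {v.lower() for v in self.variants}


TOPICS = [
    Topic("t-treatments", "treatments", {"therapies"}),
    Topic("t-tchaikovsky", "Tchaikovsky", {"Tzaikovsky"}),
]


def find_topics(query):
    """Return the ids of topics whose base name or variant equals the query."""
    q = query.strip().lower()
    return [t.topic_id for t in TOPICS if q in t.names()]


print(find_topics("therapies"))
print(find_topics("Tzaikovsky"))
```

Because the match runs against the topic rather than a single string, a search for either spelling or either synonym reaches the same topic.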

20 Let’s take a look at the content…
We’ve identified several topical hierarchies:
1. Human Health > Exposure
2. Human Health > Exposure > Exposure Route
3. Radiation And Radioactivity > Exposure
4. Radiation And Radioactivity > Exposure > Exposure Pathways
There is no content indexed by the first level topical terms.

21 1. Indexed Content

22 2. Indexed Content

23 3. Indexed Content

24 4. Indexed Content

25 Content Types
We can glean four content “types”:
1. Guides: information overview pages
2. Databases: databases at the EPA website
3. Fact Sheets: factual summaries
4. Reports: published reports
Very simple, single-level classifications.
Only one content type per content item.

26 Content Types, Applied
Fact Sheets
–Health Effects Notebook for Hazardous Air Pollutants
–Chemicals in the Environment: OPPT Chemical Fact Sheets
Guides
–Understanding Radiation: Exposure Pathways
–Effects of Radiation Type and Exposure Pathway
–Emergency Response Program: Exposure Pathways
Databases
–IRIS: Integrated Risk Information System
–Human Exposure Database System (HEDS)
Reports
–Draft Report on the Environment: Human Health: Environmental Pollution and Disease

27 So, What Meta Data Do We Have?
IRIS: Integrated Risk Information System
A database of human health effects that may result from exposure to various substances found in the environment.
Database
Radiation And Radioactivity > Exposure
Human Health > Exposure

28 An Important Note
We would like to navigate within the hierarchy by individual terms or phrases.
But the relationship between a content item and its classification is not just between the item and the leaf of the hierarchy; it is between the item and the entire hierarchy.
In our example content, it isn’t enough to know that “Understanding Radiation: Exposure Pathways” is classified under “Exposure Pathways.” That isn’t completely accurate. It is classified under “Radiation And Radioactivity > Exposure > Exposure Pathways”.
This is important in preserving the intellectual work applied in classifying that content item.
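A small Python sketch of the point made here: if each classification is stored as the whole path rather than only its last phrase, a lookup by leaf can still report the full classification. The `CLASSIFICATIONS` structure and the function name are assumptions for illustration; the titles and paths are taken from the example content.

```python
# Each content item maps to the complete paths it is classified under.
CLASSIFICATIONS = {
    "Understanding Radiation: Exposure Pathways": [
        ("Radiation And Radioactivity", "Exposure"),
        ("Radiation And Radioactivity", "Exposure", "Exposure Pathways"),
    ],
    "Emergency Response Program: Exposure Pathways": [
        ("Human Health", "Exposure", "Exposure Route"),
    ],
}


def items_under_leaf(leaf):
    """Map each item classified under `leaf` to its full paths ending there."""
    result = {}
    for title, paths in CLASSIFICATIONS.items():
        matching = [" > ".join(p) for p in paths if p[-1] == leaf]
        if matching:
            result[title] = matching
    return result


print(items_under_leaf("Exposure Pathways"))
```

Storing only the string "Exposure Pathways" would answer the same lookup but would lose the "Radiation And Radioactivity > Exposure" context that the indexer supplied.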

29 Content, Summarized

1. IRIS: Integrated Risk Information System
A database of human health effects that may result from exposure to various substances found in the environment.
TYPE: Database
TOPICS: Radiation And Radioactivity > Exposure; Human Health > Exposure

2. Health Effects Notebook for Hazardous Air Pollutants
The fact sheets available on this Web page describe the effects on human health of substances that are defined as hazardous.
TYPE: Fact Sheet
TOPICS: Radiation And Radioactivity > Exposure; Human Health > Exposure

3. Understanding Radiation: Exposure Pathways
This page provides information about different exposure pathways to radiation.
TYPE: Guide
TOPICS: Radiation And Radioactivity > Exposure; Radiation And Radioactivity > Exposure > Exposure Pathways

4. Draft Report on the Environment: Human Health: Environmental Pollution and Disease
There is an association between environmental exposure and certain diseases.
TYPE: Report
TOPICS: Radiation And Radioactivity > Exposure; Human Health > Exposure

5. Effects of Radiation Type and Exposure Pathway
The type of radiation to which a person is exposed and its exposure pathway influence health effects.
TYPE: Guide
TOPICS: Radiation And Radioactivity > Exposure > Exposure Pathways

6. Chemicals in the Environment: OPPT Chemical Fact Sheets
These fact sheets provide a brief summary of information on selected chemicals.
TYPE: Fact Sheet
TOPICS: Human Health > Exposure

7. Human Exposure Database System (HEDS)
An integrated database system that contains chemical measurements, questionnaire responses, documents, and other information related to EPA research studies of the exposure of people to environmental contaminants.
TYPE: Database
TOPICS: Human Health > Exposure

8. Emergency Response Program: Exposure Pathways
An exposure pathway refers to the way in which a person may come into contact with a hazardous substance.
TYPE: Guide
TOPICS: Human Health > Exposure > Exposure Route

30 Constructing a Topic Map
Requirements. Our Topic Map must:
–Allow free navigation between all elements of the hierarchy.
–Navigate from a “leaf” to a content item.
–Retain intellectual classification of content items.
–Permit hierarchy navigation to be limited by a particular content type.
TM experts, please note the lack of PSIs (published subject indicators); they are left out for brevity. A real-world application would have PSIs for each topic class.

31 Three Topic Types (and a Scope)
Topical Index Phrases: representing the individual terms or phrases making up the topical index hierarchy.
Content Types: representing the unique types of content.
Contents: representing individual content items.
Content Description: specifying the scoped base name for the description metadata.

32 XTM Topic Classes Topical Index Phrases Content Types Contents Content Description

33 Topical Index Topic Examples Human Health Exposure Exposure Route

34 Topic Type Topic Examples Guide Database

35 A Content Topic Example
IRIS: Integrated Risk Information System
A database of human health effects that may result from exposure to various substances found in the environment.
Remember our metadata? Note the “TITLE”, “DESCRIPTION” and “OCCURRENCE”.
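The XTM markup shown on this slide did not survive in the transcript. As a rough stand-in, the Python sketch below builds an XTM 1.0 style topic with the standard library: an unscoped base name for the title, a base name scoped by the description topic, and an occurrence pointing at the resource. The ids `C1`, `Content` and `ContentDescription` and the `example.org` URL are assumptions, not values from the original topic map.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM)
ET.register_namespace("xlink", XLINK)


def el(parent, tag, text=None):
    """Append an XTM-namespaced child element, optionally with text."""
    node = ET.SubElement(parent, f"{{{XTM}}}{tag}")
    if text is not None:
        node.text = text
    return node


def ref(parent, tag, href):
    """Append a child element that carries an xlink:href reference."""
    node = el(parent, tag)
    node.set(f"{{{XLINK}}}href", href)
    return node


def content_topic(topic_id, title, description, url):
    topic = ET.Element(f"{{{XTM}}}topic", {"id": topic_id})
    ref(el(topic, "instanceOf"), "topicRef", "#Content")   # topic class (assumed id)
    el(el(topic, "baseName"), "baseNameString", title)     # TITLE
    scoped = el(topic, "baseName")                         # DESCRIPTION
    ref(el(scoped, "scope"), "topicRef", "#ContentDescription")
    el(scoped, "baseNameString", description)
    ref(el(topic, "occurrence"), "resourceRef", url)       # OCCURRENCE
    return topic


topic = content_topic(
    "C1",  # hypothetical id
    "IRIS: Integrated Risk Information System",
    "A database of human health effects that may result from exposure "
    "to various substances found in the environment.",
    "http://example.org/iris",  # placeholder, not the real address
)
xml_text = ET.tostring(topic, encoding="unicode")
print(xml_text)
```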

36 What about Associations?
Three Association Types:
1. Topical Index: tying the topics of the topical index together.
2. Content Type to Content: mapping the content type to the content item.
3. Topical Index to Content: binding the leaf of the topical index to the content item.

37 Roles…
Two Member Types to Create the Topical Index:
–Superior Phrase: a parent index item
–Inferior Phrase: a child index item

38 Topical Index Association Example
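The association markup for this slide is not in the transcript. The following Python sketch shows one way such an association could be generated in XTM 1.0 form, with one member per role. The topic ids `TopicalIndex`, `SuperiorPhrase`, `InferiorPhrase` and `I1` are assumptions; `I2` is the id the presentation uses for "Exposure".

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM)
ET.register_namespace("xlink", XLINK)


def _child(parent, tag, href=None):
    """Append an XTM-namespaced child, optionally with an xlink:href."""
    node = ET.SubElement(parent, f"{{{XTM}}}{tag}")
    if href is not None:
        node.set(f"{{{XLINK}}}href", href)
    return node


def association(assoc_type, members):
    """Build an <association> from (role_id, player_id) pairs."""
    assoc = ET.Element(f"{{{XTM}}}association")
    _child(_child(assoc, "instanceOf"), "topicRef", f"#{assoc_type}")
    for role, player in members:
        member = _child(assoc, "member")
        _child(_child(member, "roleSpec"), "topicRef", f"#{role}")
        _child(member, "topicRef", f"#{player}")
    return assoc


# "Human Health" (assumed id I1) is the superior phrase of "Exposure" (I2).
assoc = association(
    "TopicalIndex",
    [("SuperiorPhrase", "I1"), ("InferiorPhrase", "I2")],
)
xml_text = ET.tostring(assoc, encoding="unicode")
print(xml_text)
```

The same helper would cover the other two association types by passing a different type id and member list.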

39 More Roles…
Two Member Types to Create the Content Type to Content Associations:
–Content Types: the content type topic being referenced.
–Content Item: a content item being referenced.

40 Content Type to Content Association Example Note that the “ContentType” and “Content” classes are being reused here.

41 And Yet More Roles…
Four Member Types to Create the “Topical Index to Content” Associations:
–Topical Index Leaf: a leaf node, referencing content items.
–Content Item: a content item being referenced.
–Top Level Topical Index Phrase: used within a “Topical Index to Content” association to indicate the top of the classification hierarchy for this content item.
–Second Level Topical Index Phrase: used within a “Topical Index to Content” association to indicate the second level of the classification hierarchy for this content item.

42 Huh?
Here’s the rub: we want to make a link from a leaf node to any content referenced under that node.
So, “Exposure” will point us to six content items.
But, three are classified under “Radiation And Radioactivity > Exposure”.
The other three are classified under “Human Health > Exposure”.
So, in addition to creating members for the leaf role and the content item role, we also create members for each level of the hierarchy above this leaf node.

43 Topical Index to Content Association Example This says that there is a relationship between the leaf term “I2” (Exposure) and the content item “C2” (Health Effects Notebook for Hazardous Air Pollutants), and that this content item is fully indexed as “I4” > “I2” (Radiation And Radioactivity > Exposure).
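To make the role structure concrete, here is a plain-Python sketch that treats each "Topical Index to Content" association as a record with the four roles and rebuilds the full classification for every item reachable from a leaf. The ids `I2`, `I4` and `C2` come from the slide; `I1`, `I5`, `C3` and the record layout are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical id-to-name table for the topical index phrases.
NAMES = {
    "I1": "Human Health",
    "I2": "Exposure",
    "I4": "Radiation And Radioactivity",
    "I5": "Exposure Pathways",
}

# One record per association; "second" is None for two-level paths.
ASSOCIATIONS = [
    {"leaf": "I2", "content": "C2", "top": "I4", "second": None},
    {"leaf": "I2", "content": "C2", "top": "I1", "second": None},
    {"leaf": "I5", "content": "C3", "top": "I4", "second": "I2"},
]


def full_paths(leaf):
    """Group the content ids reachable from `leaf` by their full index path."""
    grouped = defaultdict(list)
    for a in ASSOCIATIONS:
        if a["leaf"] != leaf:
            continue
        ids = [a["top"], a["second"], a["leaf"]]
        path = " > ".join(NAMES[i] for i in ids if i is not None)
        grouped[path].append(a["content"])
    return dict(grouped)


print(full_paths("I2"))
print(full_paths("I5"))
```

Because the upper levels travel with the association, the leaf "Exposure" can list its content while still distinguishing the two hierarchies it belongs to.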

44 Building an Interface
Interface Requirements:
–A browsing tool for the Topical Index Hierarchy
–A searching tool
–Ability to show Topical Index terms with superior and inferior topics navigable.
–Ability to limit the display of the Topical Index Hierarchy by specific content types.
–Ability to link to content items.

45 Tools Ontopia Knowledge Suite (OKS) Framework Retrieval Systems Tractare CMS

46 Application Framework

47 Topical Index Browsing Wow! Maybe Radiation is more on point… At this point, we’ve turned the traditional hierarchy into a navigable network – serendipity is improved!

48 Content Type Access Note Original Topical Index Classifications

49 Associated Content Note Original Topical Index Classifications

50 Scoped Associated Content Content under “Exposure”, limited to “Fact Sheets”
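The screenshot for this slide is not available, but the filtering it describes can be sketched in Python: list the content under a leaf term, optionally limited to one content type. The `CONTENT` list is a subset of the example content; the data layout and function name are assumptions and say nothing about how the OKS-based application implements the feature.

```python
# A subset of the example content, with type and full classification paths.
CONTENT = [
    {"title": "IRIS: Integrated Risk Information System",
     "type": "Database",
     "topics": [("Radiation And Radioactivity", "Exposure"),
                ("Human Health", "Exposure")]},
    {"title": "Health Effects Notebook for Hazardous Air Pollutants",
     "type": "Fact Sheet",
     "topics": [("Radiation And Radioactivity", "Exposure"),
                ("Human Health", "Exposure")]},
    {"title": "Chemicals in the Environment: OPPT Chemical Fact Sheets",
     "type": "Fact Sheet",
     "topics": [("Human Health", "Exposure")]},
    {"title": "Emergency Response Program: Exposure Pathways",
     "type": "Guide",
     "topics": [("Human Health", "Exposure", "Exposure Route")]},
]


def content_under(leaf, content_type=None):
    """Titles classified under `leaf`, optionally limited to one content type."""
    titles = []
    for item in CONTENT:
        if content_type is not None and item["type"] != content_type:
            continue
        if any(path[-1] == leaf for path in item["topics"]):
            titles.append(item["title"])
    return titles


print(content_under("Exposure", "Fact Sheet"))
```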

51 In Conclusion
We’ve shown how to construct a topic map to support these requirements:
–Allow free navigation between all elements of the hierarchy.
–Navigate from a “leaf” to a content item.
–Retain intellectual classification of content items.
–Permit hierarchy navigation to be limited by a particular content type.
We’ve built a demonstration application to deliver these requirements:
–A browsing tool for the Topical Index Hierarchy
–A searching tool
–Ability to show Topical Index terms with superior and inferior topics navigable.
–Ability to limit the display of the Topical Index Hierarchy by specific content types.
–Ability to link to content items.

52 In Short We can achieve serendipity while retaining classification accuracy by using topic maps to represent traditional topical indices. Thank you for your time and attention.

