
Semantics-Empowered Text Exploration for Knowledge Discovery Delroy Cameron, Pablo N. Mendes, Amit P. Sheth Knowledge Enabled Information and Services.


2 Semantics-Empowered Text Exploration for Knowledge Discovery
Delroy Cameron, Pablo N. Mendes, Amit P. Sheth
Knowledge Enabled Information and Services Science Center (Kno.e.sis), Department of Computer Science and Engineering, Wright State University, Dayton, OH
Victor Chan
Division of Biosciences and Performance, Human Effectiveness Directorate, Air Force Research Lab (AFRL), Wright-Patterson Air Force Base, Dayton, OH
48th ACM Southeast Conference (ACMSE 2010), Oxford, Mississippi, April 15-17, 2010

3 OUTLINE
- Background
- Paradigm Shift
- Demo
- Architecture
- Experimental Results
- Future Work
- Conclusion

4 BACKGROUND
- IR systems: interaction paradigm
  - Users manually seek information
  - Hyperlinked documents
  - Document-centric model
- Basis of the interaction paradigm
  - Keyword search
  - Document browsing

5 BACKGROUND
- Interaction sequence
  1. Assemble keywords and search
  2. Document selection
  3. Document inspection
  4. Aggregation/organization
- Example information need: "What is the role of magnesium in relation to migraine?"
  - Keyword query: magnesium migraine

6 LIMITATIONS
- Query reformulations
  - Impatient users
  - Recognition over recall
- Constrained navigation
  - Dependent on hyperlinks defined a priori
- Fuzzy user interests
  - e.g., Haiti earthquake: recovery, relief, political climate, crime
- Ineffective for exploratory search
  - Search-and-sift
  - Query: "Father of the Web"; answer: Sir Tim Berners-Lee
Amit P. Sheth, Cartic Ramakrishnan: Relationship Web: Blazing Semantic Trails between Web Resources. IEEE Internet Computing 11(4): 77-81 (2007)

7 MOTIVATION
- Users are information seekers
- Information is embedded in documents
- Navigation depends on hyperlinks defined a priori
- Semantic Web standards enable:
  - Entity identification (semantic annotations)
  - Relationship and triple identification
  - Exploring documents/information via relationships

8 PARADIGM SHIFT
- Search hit → annotated hit
  - A hit is a bag of annotated words/phrases
  - Each annotated phrase is a known entity
  - Each entity is the subject/object of a triple
- Navigation driven by relationships
  - Entity[Document] → Relationship → Entity[Document]
- Contextual navigation (relationships as context)
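The relationship-driven navigation step above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the prototype's implementation: triples are held in memory and indexed by subject, a plain dictionary stands in for the annotated corpus (entity → IDs of documents annotated with it), and all names (Triple, build_index, relationships_from, navigate), predicates, and document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def build_index(triples: Iterable[Triple]) -> Dict[str, List[Triple]]:
    """Group triples by subject so an entity's outgoing relationships are one lookup away."""
    index: Dict[str, List[Triple]] = defaultdict(list)
    for t in triples:
        index[t.subject].append(t)
    return index


def relationships_from(index: Dict[str, List[Triple]], entity: str) -> List[str]:
    """Relationships offered as navigation context for an annotated entity."""
    return sorted({t.predicate for t in index.get(entity, [])})


def navigate(
    index: Dict[str, List[Triple]],
    entity_docs: Dict[str, Set[str]],
    entity: str,
    predicate: str,
) -> List[Tuple[str, List[str]]]:
    """One Entity[Document] -> Relationship -> Entity[Document] step.

    Returns each target entity together with the sorted IDs of documents annotated with it.
    """
    return [
        (t.obj, sorted(entity_docs.get(t.obj, set())))
        for t in index.get(entity, [])
        if t.predicate == predicate
    ]


# Hypothetical example data (predicates and document IDs are invented for illustration).
triples = [
    Triple("Magnesium", "affects", "Migraine"),
    Triple("Magnesium", "is_a", "Mineral"),
]
entity_docs = {"Migraine": {"doc2", "doc1"}}
index = build_index(triples)
print(relationships_from(index, "Magnesium"))  # ['affects', 'is_a']
print(navigate(index, entity_docs, "Magnesium", "affects"))  # [('Migraine', ['doc1', 'doc2'])]
```

Indexing by subject keeps each navigation step a single dictionary lookup; a real system backed by a triple store would issue a query per step instead.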

9 CONTRIBUTIONS
1. Novel information exploration paradigm: a data-centric model
2. Demonstrated use of background knowledge: named entities, relationships
3. Prototype implementation: semantic annotations for navigation
4. Aggregation utilities: saving, bookmarking, publishing, etc.

10 DEMO

11 ARCHITECTURE
Figure 1: System Components and Architecture
- Controlled vocabulary: 992,281 DBpedia terms; 15,742 HPCO terms; 5,232 UMLS terms
- Spotter module: trie-based spotter for named entity identification, used ultimately for document annotation
- Document corpus
  - Medline (19 million abstracts); articles stored in a Lucene index, indexed as of Aug. 2009
  - Yahoo (indexed documents accessed as a Web service using Yahoo Search BOSS)
- Background knowledge: HPCO ontology, UMLS, Linked Open Data
- Semantic Browser
  - Search Workbench (SERP): annotated entities provide anchors that serve as entry points to navigation
  - Save/Publish/Organize: utilities for promoting, bookmarking, and saving search results
  - Semantic Trail Log: sequential record of each triple navigated by a user
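The Semantic Trail Log is described only as a sequential record of each triple a user navigates. The following is a minimal Python sketch of such a log, assuming an in-memory, per-user list of timestamped (subject, predicate, object) tuples. The class and method names, the contiguity check, and the example triples are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class SemanticTrailLog:
    """Hypothetical append-only log of the triples one user navigated, in order."""

    user: str
    steps: List[Tuple[datetime, Triple]] = field(default_factory=list)

    def record(self, subject: str, predicate: str, obj: str) -> None:
        # Timestamp each step (UTC) so a session can later be replayed or analyzed.
        self.steps.append((datetime.now(timezone.utc), (subject, predicate, obj)))

    def trail(self) -> List[Triple]:
        """The navigated triples in the order they were recorded."""
        return [triple for _, triple in self.steps]

    def is_contiguous(self) -> bool:
        """True if every step starts from the entity the previous step ended on."""
        t = self.trail()
        return all(prev[2] == cur[0] for prev, cur in zip(t, t[1:]))


# Hypothetical session (triples invented for illustration).
log = SemanticTrailLog(user="user-1")
log.record("Magnesium", "affects", "Migraine")
log.record("Migraine", "associated_with", "Stress")
print(log.trail())
print(log.is_contiguous())  # True
```

Keeping the log append-only preserves navigation order, so a trail can be replayed step by step exactly as the user followed it.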

12 IMPLEMENTATION: Spotter Module
- Example text: "Dietary restriction with hypomagnesia is normally associated with diminished urinary excretion."
- Spotted entity: magnesium, matched against the UMLS controlled vocabulary:

  Entity label           Entity ID (UMLS CUI)
  Magnesium Deficiency   C0024473
  Magnesium              C0024467

- This process is called spotting and uses a trie data structure.
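The slide names the technique (a trie built over the controlled vocabulary) but not its details. Below is a minimal Python sketch of trie-based spotting, assuming a token-level trie, case-insensitive matching, and longest-match-first, non-overlapping annotation. The function names and the example sentence are hypothetical; only the two vocabulary entries and their IDs come from the slide.

```python
import re
from typing import Dict, List, Tuple

_END = object()  # sentinel key marking a node where a complete vocabulary label ends


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; used for both labels and text so they match consistently."""
    return re.findall(r"\w+", text.lower())


def build_trie(vocabulary: Dict[str, str]) -> dict:
    """Build a token-level trie from {entity label: entity ID}."""
    root: dict = {}
    for label, entity_id in vocabulary.items():
        node = root
        for token in tokenize(label):
            node = node.setdefault(token, {})
        node[_END] = entity_id
    return root


def spot(text: str, trie: dict) -> List[Tuple[str, str]]:
    """Scan text left to right, keeping the longest vocabulary match at each position."""
    tokens = tokenize(text)
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        node, best = trie, None
        for j in range(i, len(tokens)):
            node = node.get(tokens[j])
            if node is None:
                break
            if _END in node:
                best = (j + 1, node[_END])  # (end position, entity ID) of longest match so far
        if best is None:
            i += 1
        else:
            end, entity_id = best
            matches.append((" ".join(tokens[i:end]), entity_id))
            i = end  # skip past the matched phrase (non-overlapping matches)
    return matches


vocabulary = {"Magnesium": "C0024467", "Magnesium Deficiency": "C0024473"}
trie = build_trie(vocabulary)
print(spot("Low magnesium levels may indicate magnesium deficiency.", trie))
# [('magnesium', 'C0024467'), ('magnesium deficiency', 'C0024473')]
```

A trie lets the spotter test every multi-word label that starts at a position in one walk, instead of looking up each candidate phrase separately, which matters with a vocabulary of roughly a million DBpedia terms.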

13 ARCHITECTURE
- Document corpus
  - Medline Lucene index: 19 million abstracts (Aug 2009)
  - REST endpoint: http://knoesis1.wright.edu/IndexWrapper
  - XML response (or JSON)
  - Keyword queries, document IDs
- Background knowledge
  - UMLS (Unified Medical Language System): 5,232 entities and 16,540 triples
  - HPCO (Human Performance & Cognition Ontology): 15,742 entities and 22,298 triples

14 EVALUATION
Each feature was ranked on a 1-5 scale; the table reports normalized relative aggregated scores (the highest score for each metric is 1.00).

Evaluation metric              Semantic Browser (Medline + UMLS)  PubMed  Yahoo
Interface design               0.93                               0.88    1.00
Useful features                1.00                               0.67    0.65
Motivation to explore          1.00                               0.58    0.65
Information novelty            1.00                               0.76    0.79
Effectiveness of task outcome  1.00                               0.65    0.80
Required cognitive load        1.00                               0.60    0.64
Overall satisfaction           1.00                               0.62    0.78
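The slide does not define how the normalized relative scores were computed. One reading consistent with the table, where the best interface on every metric scores 1.00, is to average each interface's 1-5 ratings and then divide by the highest average for that metric. A Python sketch under that assumption follows. The function name and the ratings are hypothetical, not the study's data.

```python
from statistics import mean
from typing import Dict, List


def normalized_relative_scores(ratings: Dict[str, List[int]]) -> Dict[str, float]:
    """Aggregate 1-5 ratings per interface (mean) and divide by the best aggregate.

    Assumption: "normalized relative" means relative to the best interface, so it scores 1.00.
    """
    for name, values in ratings.items():
        if not values or any(not 1 <= v <= 5 for v in values):
            raise ValueError(f"{name}: ratings must be a non-empty list of values in 1..5")
    aggregated = {name: mean(values) for name, values in ratings.items()}
    best = max(aggregated.values())
    return {name: round(score / best, 2) for name, score in aggregated.items()}


# Hypothetical ratings for a single metric (NOT the study's data).
print(normalized_relative_scores({"A": [5, 4, 5], "B": [3, 4, 4], "C": [4, 4, 4]}))
# {'A': 1.0, 'B': 0.79, 'C': 0.86}
```

If every interface was rated by the same number of participants, using sums instead of means gives identical normalized scores, since the participant count cancels in the ratio.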

15 CONCLUSION
- Novel information exploration paradigm
- Semantic Browser support
  - Contextual navigation
  - Identification of named entities and relationships
  - Semantic annotations
  - Utilities for aggregation
- Semantic trails to knowledge discovery

16 FUTURE WORK
- Formal model for the paradigm shift
- Improved spotter: additional vocabularies, context, rule-based
- Relationship ranking
- Document re-ranking
- Trail log analysis

17 ACKNOWLEDGEMENTS
- People: Cartic Ramakrishnan; Bilal Gonen, Aditya Dhoke; Wesley Workman, Rodrigo Gama, Guilherme de Napoli
- Air Force Research Lab, Human Effectiveness Directorate, Wright-Patterson Air Force Base
- National Science Foundation Award "SemDis: Discovering Complex Relationships in the Semantic Web": No. 071441, Wright State University; No. IIS-0325464 to University of Georgia

18 QUESTIONS

19 TERMINOLOGY
- Semantic Web: an extension of the current web in which data is expressed in a common vocabulary, so that the data becomes machine processable.
- Ontology: a specification of concepts and the relationships between them.
- Triple: a ternary relation consisting of an entity pair and the relationship that links them, i.e., subject-predicate-object.
- Entity/Concept: an instance of a thing.
- URI: a unique identifier for any resource/entity/thing on the web.
- LOD (Linked Open Data): a Semantic Web initiative to provide a repository of semantically connected datasets.
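To make the Triple and URI definitions concrete: an RDF triple whose subject, predicate, and object are all URIs can be written as one line in the W3C N-Triples format, <subject> <predicate> <object> . followed by a newline. Below is a small Python sketch that handles only that all-URI case (literal objects need quoting rules not covered here). The function name and the example.org predicate URI are hypothetical. The two DBpedia resource URIs follow DBpedia's http://dbpedia.org/resource/ naming pattern.

```python
def to_ntriple(subject_uri: str, predicate_uri: str, object_uri: str) -> str:
    """Serialize a subject-predicate-object triple whose parts are all URIs as one N-Triples line."""
    for uri in (subject_uri, predicate_uri, object_uri):
        # Basic guard: N-Triples IRIs may not contain spaces, '<', '>' or '"' (among others).
        if not uri or any(ch in uri for ch in ' <>"'):
            raise ValueError(f"cannot serialize as an N-Triples IRI: {uri!r}")
    return f"<{subject_uri}> <{predicate_uri}> <{object_uri}> ."


print(to_ntriple(
    "http://dbpedia.org/resource/Tim_Berners-Lee",
    "http://example.org/hypothetical/invented",  # hypothetical predicate URI
    "http://dbpedia.org/resource/World_Wide_Web",
))
```

Because every part is a URI, the same entity mentioned in different datasets can be linked simply by reusing its identifier, which is what Linked Open Data relies on.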

