Published by Estefany Curley. Modified over 9 years ago.
1
Knowledge Enabled Information and Services Science
Knowledge Acquisition on the Web
Growing the amount of available knowledge from within
Christopher Thomas
2
Overview
Knowledge Representation
–GlycO – Complex Carbohydrates domain ontology
Information Extraction
–Taxonomy creation (Doozer/Taxonom.com)
–Fact Extraction (Doozer++)
Validation
3
Circle of knowledge on the Web
5
Circle of knowledge on the Web
What is knowledge?
How do we turn propositions/beliefs into knowledge?
How do we acquire knowledge?
6
Background Knowledge
[15] Christopher Thomas and Amit Sheth, “On the Expressiveness of the Languages for the Semantic Web – Making a Case for ‘A Little More,’” in Fuzzy Logic and the Semantic Web, Eli Sanchez (Ed.), Elsevier, 2006.
[11] Amit Sheth, Cartic Ramakrishnan, and Christopher Thomas, “Semantics for The Semantic Web: the Implicit, the Formal and the Powerful,” International Journal on Semantic Web & Information Systems, 1 (no. 1), 2005, pp. 1–18.
7
Different Angles
Social construction
–Large scale creation of knowledge vs.
–Small communities define their domains
Normative vs. Descriptive = Top-Down vs. Bottom-Up
Formal vs. Informal = Machine-readable vs. human-readable
8
Community-created knowledge
Descriptive
Bottom-up
Formally less rigid
May contain false information
If a statement in the world is in conflict with the Ontology, both may be wrong or both may be right
Good for broad, shallow domains
Good for human processing and IR tasks
9
Wikipedia and Linked Open Data
Created by large communities
Constantly growing
Domains within the linked data are not always easily discernible
Contain few axioms and restrictions
–Little value to evaluation using logics
10
Formal - Modeling deep domains
Prescriptive / Normative
Top-down
Contains “true knowledge”
If a statement in the world is in conflict with the Ontology, the statement is false
Good for scientific domains
Good for computational reasoning/inference
Usually created by small communities of experts
Usually static, little change is expected
11
Example: GlycO
Created in collaboration with the Complex Carbohydrate Research Center at the University of Georgia on an NCRR grant.
Deep modeling of glycan structures and metabolic pathways
[6] Christopher Thomas, Amit P. Sheth, and William S. York, “Modular Ontology Design Using Canonical Building Blocks in the Biochemistry Domain,” in Formal Ontology in Information Systems (FOIS 2006).
[5] Satya S. Sahoo, Christopher Thomas, Amit P. Sheth, William York, and Samir Tartir, “Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies,” 15th International World Wide Web Conference (WWW2006).
12
GlycO
13
N-Glycosylation metabolic pathway
GNT-I attaches GlcNAc at position 2
UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 → UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2
GNT-V attaches GlcNAc at position 6
UDP-N-acetyl-D-glucosamine + G00020 → UDP + G00021
Diagram labels: N-acetyl-glucosaminyl_transferase_V, N-glycan_beta_GlcNAc_9, N-glycan_alpha_man_4
14
Glycan Structures for the ontology
Import structures from heterogeneous databases
Possible connections modeled in the form of GlycoTree
Match structures to archetypes
b-D-Manp-(1-6)+
               |
               b-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-D-GlcNAc
               |
b-D-Manp-(1-3)+
N. Takahashi and K. Kato, Trends in Glycosciences and Glycotechnology, 15, 2003: 235-251
15
Interplay of extraction and evaluation
Errors in the source databases are propagated through various new databases
–Comparing multiple sources therefore fails as a means of error correction
Less than 2% of incorrect information makes a database useless for automatic validation of hypotheses
The ontology contains rules on how carbohydrate structures are known to be composed
By mapping information in databases to the ontology and analyzing how successful the mapping was, we can identify possible errors.
16
Database Verification using GlycO
b-D-Manp-(1-6)+
               |
               a-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-D-GlcNAc
               |
b-D-Manp-(1-3)+
a-D-Manp-(1-4) is not part of the identified canonical structure for N-Glycans, hence it is likely that the database entry is incorrect
N. Takahashi and K. Kato, Trends in Glycosciences and Glycotechnology, 15, 2003: 235-251
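The verification idea on this slide can be sketched as a lookup against a set of allowed building blocks. This is only a toy illustration: the `CANONICAL_CORE` set and the `(residue, linkage)` tuples are hypothetical stand-ins copied from the slide text, not GlycO's actual representation or the GlycoTree model.

```python
# Hypothetical stand-in for the canonical N-glycan core: each entry is a
# (residue, linkage) pair as written on the slides, not GlycO's real model.
CANONICAL_CORE = {
    ("b-D-Manp", "1-6"),
    ("b-D-Manp", "1-3"),
    ("b-D-Manp", "1-4"),
    ("b-D-GlcpNAc", "1-4"),
}

def find_suspect_residues(entry):
    """Return the (residue, linkage) pairs that the canonical core does not allow."""
    return [pair for pair in entry if pair not in CANONICAL_CORE]

# Database entry from the slide, with a-D-Manp-(1-4) in place of b-D-Manp-(1-4).
database_entry = [
    ("b-D-Manp", "1-6"),
    ("a-D-Manp", "1-4"),
    ("b-D-GlcpNAc", "1-4"),
    ("b-D-Manp", "1-3"),
]

suspects = find_suspect_residues(database_entry)
print(suspects)
```

An entry with a non-empty suspect list would be flagged for review rather than rejected outright, since the slide only claims the entry is "likely" incorrect.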
17
Pathway Steps - Reaction
Evidence for this reaction from three experiments
Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia
19
Summary - GlycO
The accuracy and level of detail found in ontologies such as GlycO could most likely not be acquired automatically
Only a small community of experts has the depth of knowledge to model such scientific ontologies
20
Summary - GlycO
However, the automatic population shows that a highly restrictive, expert-created rule set allows for automation or involvement of larger communities.
Frame-based population of knowledge
The formal knowledge encoded in the ontology serves to acquire new knowledge
The circle is completed
21
Summary Background Knowledge
Large amounts of information and knowledge are available
–Some machine readable by default
–Others need specific algorithms to extract information
The more available information we can use, the better the extraction of new information will be.
22
Circle of knowledge on the Web
What is knowledge?
How do we turn propositions into knowledge?
Part 2
How do we acquire knowledge?
23
Knowledge Acquisition through Model Creation
[3] Christopher Thomas, Pankaj Mehra, Roger Brooks and Amit Sheth, Growing Fields of Interest - Using an Expand and Reduce Strategy for Domain Model Extraction, Web Intelligence 2008, pp. 496-502.
[2] Christopher Thomas, Wenbo Wang, Delroy Cameron, Pablo Mendes, Pankaj Mehra and Amit Sheth, What Goes Around Comes Around - Improving Linked Open Data through On-Demand Model Creation, WebScience 2010.
[1] Christopher Thomas, Pankaj Mehra, Wenbo Wang, Amit Sheth, Gerhard Weikum and Victor Chana, Automatic Domain Model Creation Using Pattern-Based Fact Extraction, Knoesis Center technical report.
24
First create a domain hierarchy
Example: a hierarchy for the domain of Human Performance and Cognition
25
Connect with learned facts
26
Example: strongly connected component
27
Excerpt: strongly connected component
28
Expert evaluation of facts in the ontology
1-2: Information that is overall incorrect
3-4: Information that is somewhat correct
5-6: Correct general information
7-9: Correct information not commonly known
29
Technical Details
30
Step 1: Domain hierarchy creation
Input terms, e.g. related to Human Performance and Cognition
Hierarchy is automatically carved from articles and categories on Wikipedia
31
Overview - conceptual
Expand and Reduce approach
–Start with “high recall” methods: Exploration (full text search), Exploitation (node similarity method), Category growth
–End with “high precision” methods: apply restrictions on the concepts found, remove unwanted terms and categories
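A minimal sketch of the expand-then-reduce idea, under loud assumptions: the scoring rule (share of seed terms mentioned in an article), the category filter, and all function and variable names are hypothetical illustrations. The slides do not specify how confidence scores or restrictions are computed, and the node similarity and category growth steps are not modeled here.

```python
def expand_candidates(seed_terms, articles):
    """High-recall step: score each article by the share of seed terms in its text."""
    scores = {}
    for title, text in articles.items():
        lowered = text.lower()
        hits = sum(1 for term in seed_terms if term.lower() in lowered)
        if hits:
            scores[title] = hits / len(seed_terms)
    return scores

def reduce_candidates(scores, categories, unwanted, min_score):
    """High-precision step: drop low scores and articles in unwanted categories."""
    return sorted(
        title
        for title, score in scores.items()
        if score >= min_score and not (categories.get(title, set()) & unwanted)
    )

# Toy corpus; titles, texts and categories are invented for illustration.
articles = {
    "Working memory": "Working memory supports attention and cognition.",
    "Attention": "Attention is a core topic in cognition research.",
    "Memory (band)": "A band whose album Attention explores memory.",
    "Granite": "Granite is an igneous rock.",
}
categories = {
    "Working memory": {"Cognitive psychology"},
    "Attention": {"Cognitive psychology"},
    "Memory (band)": {"Musical groups"},
}

seeds = ["attention", "cognition", "memory"]
scores = expand_candidates(seeds, articles)
domain = reduce_candidates(scores, categories, unwanted={"Musical groups"}, min_score=0.5)
print(domain)
```

In this toy run the band article survives the expansion because it mentions two seed terms, and is only removed by the category restriction, which is the division of labor the slide describes.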
32
Expand - conceptually
Graph-based expansion
Full text search on Article texts
Delete results with low confidence score
33
Collecting Instances
34
Creating a Hierarchy
36
Extracting from Plain text or hypertext
Informal, human-readable presentation of information
Vast amounts of information available
–Web
–Scientific publications
–Encyclopediae
Need sophisticated algorithms to extract information
37
Pattern-based Fact Extraction
Learn textual patterns that express known relationship types
Search the text corpus for occurrences of known entities (e.g. from domain hierarchy)
Semi-open
–Types are known and limited
–Types are automatically expanded when LOD grows
Vector-Space Model
Probabilistic representation
38
Training
Relationship data in the UMLS Metathesaurus or the Wikipedia Infobox data provide a large set of facts in RDF triple format
–Limited set of relationships that can be arranged in a schema
–Semi-open: types are known and limited; types are automatically expanded when LOD grows
39
Training procedure
Iterate through all facts (S->P->O triples)
Find evidence for the fact in a corpus
–Wikipedia, WWW, PubMed or any other collection
–If triple subject and triple object occur in close proximity in text, add the pattern in-between to the learned patterns
Combined evidence from many different patterns increases the certainty of a relationship between the entities
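The training loop above can be sketched as follows. This is an illustrative simplification, not the Doozer++ implementation: `learn_patterns`, the `max_gap` proximity limit, and the toy sentences are all assumptions, and entity matching here is plain string matching with no handling of surface variants such as "Australian".

```python
import re
from collections import Counter

def learn_patterns(facts, sentences, max_gap=60):
    """Count the text found between subject and object, grouped by predicate."""
    patterns = {}
    for subject, predicate, obj in facts:
        # Non-greedy capture of at most max_gap characters between the entities.
        regex = re.compile(
            re.escape(subject) + r"(.{1,%d}?)" % max_gap + re.escape(obj)
        )
        for sentence in sentences:
            match = regex.search(sentence)
            if match:
                pattern = "X" + match.group(1) + "Y"
                patterns.setdefault(predicate, Counter())[pattern] += 1
    return patterns

# One known fact (S, P, O) and a toy corpus; the sentences are invented.
facts = [("Canberra", "capital_of", "Australia")]
sentences = [
    "Canberra, capital of the Commonwealth of Australia, is a planned city.",
    "Canberra, capital of Australia, lies inland.",
    "Sydney is the largest city in Australia.",
]

learned = learn_patterns(facts, sentences)
for predicate, counter in learned.items():
    print(predicate, dict(counter))
```

The third sentence contributes nothing because the subject does not occur in it, which mirrors the proximity requirement on the slide.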
40
Overview – initial computations
[Diagram labels: Fact Collection, Text Corpus, Entropy, SVD/LSI, Pertinence, CP2P, CP2P mod, R2P, R2P mod, Modifications, Matrix Computations]
41
Training procedure cont’d
Fact: Canberra::Australia
Evidence in text:
–Canberra, the Australian capital city
–Canberra, capital of the Commonwealth of Australia
–Canberra, the Australian capital
Text between the entities (count):
–“, the capital city” (1)
–“, capital of the Commonwealth of” (1)
–“, the capital” (1)
42
Relationship Patterns
Learned patterns (Capital_of counts):
–X, the Y capital city: 1
–X, capital of the Commonwealth of Y: 1
–X, the Y capital: 1
After applying extracted synonyms (Capital_of counts):
–X, the Y capital city: 1
–X, capital of Y: 1
–X, the Y capital: 1
After generalizing (Capital_of counts):
–X, the Y capital *: 2
–X, capital of Y: 1
43
Relationship Patterns
Patterns (columns): X, the Y capital * | X, capital of Y | X, * * Y | X, predecessor of Y
Pattern counts:
Capital_of: 2 | 2 | 2 | 0
predecessor: 0 | 0 | 2 | 2
Normalized weights:
Capital_of: 1.0 | 1.0 | 0.5 | 0
predecessor: 0 | 0 | 0.5 | 1.0
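The step from counts to weights on this slide is consistent with dividing each count by its column (pattern) total. The slides do not state the formula, so the sketch below is an assumption that happens to reproduce the numbers shown; `normalize_columns` is a hypothetical helper name.

```python
def normalize_columns(counts):
    """Divide every count by its column total so each pattern's weights sum to 1."""
    relations = list(counts)
    n_patterns = len(next(iter(counts.values())))
    totals = [sum(counts[r][j] for r in relations) for j in range(n_patterns)]
    return {
        r: [counts[r][j] / totals[j] if totals[j] else 0.0 for j in range(n_patterns)]
        for r in relations
    }

# Columns: "X, the Y capital *", "X, capital of Y", "X, * * Y", "X, predecessor of Y"
counts = {
    "Capital_of": [2, 2, 2, 0],
    "predecessor": [0, 0, 2, 2],
}

r2p = normalize_columns(counts)
for relation, row in r2p.items():
    print(relation, row)
```

Under this reading, a pattern seen with only one relationship gets weight 1.0 for it, while the generic pattern "X, * * Y" is split evenly between both.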
44
Resolve Relationships
Patterns (columns): X, the Y capital * | X, capital of Y | X, * * Y | X, predecessor of Y
Capital_of: 1.0 | 1.0 | 0.5 | 0
predecessor: 0 | 0 | 0.5 | 1.0
Pattern vector (multiplied with the matrix):
–0.5: X, the Y capital *
–0.25: X, capital of Y
–0.25: X, * * Y
–0: X, predecessor of Y
45
Resolve Relationships
Patterns (columns): X, the Y capital * | X, capital of Y | X, * * Y | X, predecessor of Y
Capital_of: 1.0 | 1.0 | 0.5 | 0
predecessor: 0 | 0 | 0.5 | 1.0
Pattern vector (multiplied with the matrix):
–0.5: X, the Y capital *
–0.25: X, capital of Y
–0.25: X, * * Y
–0: X, predecessor of Y
Result:
Capital_of: 0.875
predecessor: 0.125
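The result on this slide is a matrix-vector product: each relationship's score is the sum of the pattern weights multiplied by that relationship's row. A minimal sketch using the slide's numbers follows; the function name `resolve` and the dictionary layout are illustrative choices, not taken from the original system.

```python
def resolve(pattern_weights, r2p):
    """Score each relationship as the dot product of its row with the pattern vector."""
    return {
        relation: sum(w * p for w, p in zip(pattern_weights, row))
        for relation, row in r2p.items()
    }

# Columns: "X, the Y capital *", "X, capital of Y", "X, * * Y", "X, predecessor of Y"
r2p = {
    "Capital_of": [1.0, 1.0, 0.5, 0.0],
    "predecessor": [0.0, 0.0, 0.5, 1.0],
}
pattern_vector = [0.5, 0.25, 0.25, 0.0]

relation_scores = resolve(pattern_vector, r2p)
print(relation_scores)  # Capital_of: 0.875, predecessor: 0.125
```

For example, Capital_of receives 0.5 x 1.0 + 0.25 x 1.0 + 0.25 x 0.5 = 0.875, matching the slide.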
46
Advanced Computations
[Diagram labels: Fact Collection, Text Corpus, Entropy, SVD/LSI, Pertinence, CP2P, CP2P mod, R2P, R2P mod, Modifications, Matrix Computations]
47
Advanced Computations
LSI to determine relationship similarities
–Reduces sparsity in the matrix and makes relationship rows more comparable
–Allows better use of pertinence computation
Entropy
–Increase weights for more unique patterns
Pertinence
–Smoothing of pattern occurrence frequencies
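To illustrate the entropy step, the sketch below uses a common log-entropy style weighting: a pattern that occurs with a single relationship gets weight 1, and a pattern spread evenly over all relationships gets weight 0. The slides do not give the formula actually used, so this specific weighting and the name `entropy_weights` are assumptions.

```python
import math

def entropy_weights(counts):
    """Weight each pattern by 1 - H(p) / log(n), where p is its spread over n relations."""
    relations = list(counts)
    n_patterns = len(next(iter(counts.values())))
    weights = []
    for j in range(n_patterns):
        column = [counts[r][j] for r in relations]
        total = sum(column)
        if total == 0 or len(relations) < 2:
            # No observations, or nothing to discriminate between.
            weights.append(0.0)
            continue
        probs = [c / total for c in column if c > 0]
        entropy = -sum(p * math.log(p) for p in probs)
        weights.append(1.0 - entropy / math.log(len(relations)))
    return weights

# Columns: "X, the Y capital *", "X, capital of Y", "X, * * Y", "X, predecessor of Y"
counts = {
    "Capital_of": [2, 2, 2, 0],
    "predecessor": [0, 0, 2, 2],
}

weights = entropy_weights(counts)
print(weights)
```

With these counts the generic pattern "X, * * Y" is weighted close to zero, while the three relationship-specific patterns keep a weight close to one.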
48
Example Output (DBPedia)
49
Pertinence for Relations
Looking at fact extraction as a classification of concept pairs into classes of relations
Class boundaries are not clear cut
–E.g. has_physical_part and has_part
Don’t punish the occurrence of the same pattern with relationship types that are similar
50
Relationship Patterns
Patterns (columns): X, the Y capital * | X, capital of Y | X, * * Y | X, located in Y
Pattern counts:
Capital_of: 2 | 2 | 2 | 2
Located_in: 0 | 0 | 2 | 4
Weights with pertinence:
Capital_of: 1.0 | 1.0 | 0.2 | 0.5
Located_in: 0 | 0 | 0.2 | 0.9
51
Resolve Relationships
Patterns (columns): X, the Y capital * | X, capital of Y | X, * * Y | X, located in Y
Capital_of: 1.0 | 1.0 | 0.2 | 0.5
Located_in: 0 | 0 | 0.2 | 0.9
Pattern vector (multiplied with the matrix):
–0.4: X, the Y capital *
–0.1: X, capital of Y
–0.3: X, * * Y
–0.2: X, located in Y
Result:
Capital_of: 0.66
Located_in: 0.24
52
Evaluation of the fact extraction - DBPedia
[Plot: precision and recall against confidence threshold]
Strict evaluation: only the 1st-ranked extracted relation is compared to the gold standard. Averaged over relation types.
60% training set, 40% testing; DBPedia Infobox fact corpus, Wikipedia text corpus
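The strict evaluation can be sketched as precision and recall over the top-ranked relation per concept pair at a given confidence threshold. This is a simplified illustration: the toy data and the name `precision_recall` are invented, and the per-relation-type averaging mentioned on the slide is omitted.

```python
def precision_recall(predictions, gold, threshold):
    """Strict evaluation: only the top-ranked relation per pair is compared to gold.

    predictions maps a concept pair to (top_relation, confidence).
    gold maps a concept pair to its gold-standard relation.
    """
    accepted = {
        pair: relation
        for pair, (relation, confidence) in predictions.items()
        if confidence >= threshold
    }
    correct = sum(1 for pair, relation in accepted.items() if gold.get(pair) == relation)
    precision = correct / len(accepted) if accepted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Toy gold standard and extractor output, invented for illustration.
gold = {
    ("Canberra", "Australia"): "capital_of",
    ("Paris", "France"): "capital_of",
    ("Lyon", "France"): "located_in",
    ("Perth", "Australia"): "located_in",
}
predictions = {
    ("Canberra", "Australia"): ("capital_of", 0.9),
    ("Paris", "France"): ("located_in", 0.6),
    ("Lyon", "France"): ("located_in", 0.4),
}

strict = precision_recall(predictions, gold, threshold=0.5)
lenient = precision_recall(predictions, gold, threshold=0.3)
print(strict, lenient)
```

Note that the Paris prediction counts as an error even though it is arguably true, which is what makes the evaluation "strict": only an exact match with the gold-standard relation is credited.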
53
Evaluation of the fact extraction - UMLS
[Plot: precision and recall against confidence threshold]
Strict evaluation: only the 1st-ranked extracted relation is compared to the gold standard. Averaged over relation types.
60% training set, 40% testing; UMLS fact corpus, MedLine text corpus
54
Manual Evaluation strategy (DBPedia)
55
Manual Evaluation strategy (UMLS)
poisoning, fluoride::teeth [finding_site_of] finding_site_of 1
polyneuritis, endemic::vitamin b 1 [associated_with] has_form 0
polyp of cervix nos (disorder)::768 polyps [associated_with] associated_with 1
polyp of cervix nos (disorder)::neck of uterus [location_of] finding_site_of 1
polyp of colon::benign neoplasms [related_to] associated_with 0.5
brain::brain contusion [has_location] associated_morphology_of 0.25
brain::brain ischemia [has_finding_site] location_of 0.5
polyp of colon::gastrointestinal tract, nos [is_primary_anatomic_site_of_disease] location_of 0.5
polyvesicular vitelline tumor::gamete structure (cell structure) [is_normal_cell_origin_of_disease] is_normal_cell_origin_of_disease 1
proptosis::apert syndrome [has_manifestation] has_manifestation 1
56
Manually evaluated precision for different confidence values
57
Manually evaluated precision, confidence > 0.5 (on UMLS – MedLine corpus)
58
Summary Model Creation
Using background knowledge in the form of a fact corpus and a text corpus, we can suggest new facts/propositions
It is possible to try all combinations of known concepts (e.g. Read-the-Web project), but this creates a huge validation backlog
Letting users drive the model creation focuses the creation on the parts that are of common interest
–Willingness to help validate facts
59
Circle of knowledge on the Web
What is knowledge?
How do we turn propositions/beliefs into knowledge?
How do we acquire knowledge?
Part 3
60
Evaluation and Use
Current Work
Christopher Thomas, Wenbo Wang, Delroy Cameron, Pablo Mendes, Pankaj Mehra and Amit Sheth, What Goes Around Comes Around - Improving Linked Open Data through On-Demand Model Creation, to appear in WebScience 2010
61
Explicit evaluation
“Evaluate for evaluation’s sake”
–Domain-experts rank the value of a proposition
–Committees of experts and/or laymen vote on the correctness of propositions
62
Explicit evaluation in the Semantic Browser
The user can vote on facts
Some facts are presented randomly
Most facts are presented after the user (by browsing) showed interest in
–The full triple
–Subject/Object of the triple
63
Implicit evaluation
Evaluation that does not explicitly involve a vote on the extracted information
Use the Wisdom of the Crowds
Users show support for a proposition by performing an action
Every action taken on a piece of information is recorded and analyzed
The cumulative behavior of the users gives an indication of which propositions are correct or interesting
64
Implicit evaluation in the Semantic Browser
The user simply searches and browses
The search history and the click-stream provide information about whether a page transition using an extracted triple was successful
Assumption: on average, a successful trail-browsing session includes valid triples
Problem: requires extensive use
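One way to picture the implicit evaluation is to aggregate, per triple, how often it appeared in sessions judged successful. The sketch below is hypothetical throughout: the slides do not say how a session is judged successful or how the evidence is combined, so the boolean session flag and the simple success ratio are assumptions.

```python
from collections import defaultdict

def triple_support(sessions):
    """Return, per triple, the share of sessions using it that were successful.

    sessions is a list of (triples_used, was_successful) pairs.
    """
    used = defaultdict(int)
    succeeded = defaultdict(int)
    for triples, successful in sessions:
        for triple in set(triples):  # count each triple once per session
            used[triple] += 1
            if successful:
                succeeded[triple] += 1
    return {triple: succeeded[triple] / used[triple] for triple in used}

# Toy click-stream data; the triples and outcomes are invented.
t1 = ("Canberra", "capital_of", "Australia")
t2 = ("Canberra", "located_in", "Australia")
t3 = ("Canberra", "predecessor", "Australia")
sessions = [
    ([t1, t2], True),
    ([t1], True),
    ([t3], False),
    ([t2, t3], False),
]

support = triple_support(sessions)
print(support)
```

With so few sessions the ratios are unreliable, which reflects the problem noted on the slide: the approach requires extensive use before the aggregate behavior becomes informative.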
65
Implicit evaluation in the Semantic Browser
[Diagram labels: 1st triple, 2nd triple, Triples]
66
Conclusion
Creating domain models gives a way of selectively adding knowledge to a system
We showed that it is possible to automatically create such models with high accuracy
The models immediately impact users
–Willingness to help evaluate
Evaluation becomes an integral part of the knowledge lifecycle
67
?
68
Thank you