Published by Arlene White. Modified over 9 years ago.
1
Building Ontologies Automatically: Theory and Demonstration
Dan Moldovan
Human Language Technology Research Institute
University of Texas at Dallas
2
ABBYY - 2012
Outline
- Introduction to Ontologies
- Automatic Ontology Building
- Applications
- OWL/RDF Representation
- Jaguar-Jager Demo
- CHiPS Demo
3
Ontology
- An ontology is an organization of concepts and semantic relations within a given domain.
- Ontologies explicitly represent knowledge about domains of interest, i.e., which concepts are important and how they relate to each other.
- Ontologies serve as the backbone of semantic technologies and applications.
- Ontologies can help users achieve a unified understanding of concepts.
- Ontologies facilitate dealing with acronyms.
- Ontologies can be used as interchange formats to enable common access to data.
4
Ontology (continued)
- Ontologies facilitate the exchange of knowledge between machines, and between people and machines.
- Ontologies make documents easier to visualize, i.e., which concepts are important and how semantically distant they are from one another.
- Once an ontology is created, it can be used to tag new texts to enable better retrieval and further processing (this is the idea behind the Semantic Web).
- Ontologies help browsing, searching and question answering: they make it possible to understand questions and to provide semantic connections between question concepts and text words.
5
Ontologies for Question Answering
- QP (question processing): determine the expected answer type and select the keywords used to retrieve relevant passages
  - Question classification
  - Answer type detection
- PR (passage retrieval): retrieve and rank the passages that are relevant to the input question
  - Query formulation
  - Keyword expansion
- AP (answer processing): extract an exact answer by evaluating all answer candidates
  - Answer surface form
  - Answer redundancy
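One place an ontology plugs into this pipeline is keyword expansion during passage retrieval. The sketch below is a minimal illustration, assuming a hypothetical ontology fragment stored as a dict of synonyms and ISA children (hyponyms); the names `ONTOLOGY` and `expand_keywords` are illustrative, not part of the system described here.

```python
# Hypothetical ontology fragment: concept -> synonyms and ISA children (hyponyms).
ONTOLOGY = {
    "biological weapon": {"synonyms": ["bioweapon"], "hyponyms": ["anthrax"]},
    "anthrax": {"synonyms": [], "hyponyms": []},
}


def expand_keywords(keywords, ontology, include_hyponyms=True):
    """Return the keywords plus their ontology synonyms (and, optionally, ISA
    children), keeping the original order and dropping duplicates."""
    expanded = []
    for kw in keywords:
        entry = ontology.get(kw, {"synonyms": [], "hyponyms": []})
        candidates = [kw] + entry["synonyms"]
        if include_hyponyms:
            candidates += entry["hyponyms"]
        for term in candidates:
            if term not in expanded:
                expanded.append(term)
    return expanded


print(expand_keywords(["biological weapon", "attack"], ONTOLOGY))
# → ['biological weapon', 'bioweapon', 'anthrax', 'attack']
```

Adding hyponyms raises recall for questions phrased at a general level; a real system would likely weight expanded terms lower than the original keywords.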
6
How to Create an Ontology?
- Manual ontology creation
  - Time consuming
  - Error prone
  - Requires subject matter experts
  - The end product is difficult to maintain
  - Hard to cope with the vast and rapidly changing amount of information available for a domain
- Automatic/semi-automatic ontology generation
  - Leverage existing domain models to seed the process of extracting semantically rich ontologies from unstructured text
  - Automatically update the ontology when new documents become available or the domain model changes
  - Communicate ontology content across multiple applications, using OWL/RDF as the common interchange format
  - Allow the user to easily review, update, and maintain the ontology
  - Customize ontology relations using semantic calculus and/or user-defined rules
7
Ontologies for Question Answering
[Figure: QA system integrated with an automatic ontology building system]
8
Outline
- Introduction to Ontologies
- Ontologies for Question Answering
- Automatic Ontology Building
- Applications
- OWL/RDF Representation
- Jaguar-Jager Demo
- CHiPS Demo
9
Knowledge Acquisition from Text
- KAT automatically builds ontologies and knowledge bases (KBs) from the concepts and semantic relationships found in text
- Constituents of an ontology/KB
  - Concepts/vocabulary: key domain concepts (often missing from general-purpose machine-readable dictionaries, e.g., WordNet), such as "weapon", "WMD", "launcher"
  - Relations between ontological concepts, e.g., "anthrax" ISA "biological weapon", "anthrax" CAUSE "death"
  - Organization of relations
    - Hierarchical: universally true, transitive relations (e.g., ISA, PART-WHOLE)
    - Contextual: relations conveyed by the text and identified by a semantic parser
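The split between hierarchical and contextual relations can be pictured as two stores inside one KB. The following is a minimal sketch under the assumption that relations arrive as `(relation, x, y)` tuples; the `KnowledgeBase` class and its methods are hypothetical and do not describe KAT's actual storage (which, per a later slide, uses XML or a relational database).

```python
from collections import defaultdict

# Relation types treated as hierarchical (universally true, transitive), as on the slide;
# all other relation types are kept as contextual knowledge.
HIERARCHICAL = {"ISA", "PART-WHOLE"}


class KnowledgeBase:
    """Toy KB that separates hierarchical relations from contextual ones."""

    def __init__(self):
        self.concepts = set()
        self.hierarchy = defaultdict(set)   # (relation, concept) -> direct parents
        self.contextual = []                # (relation, x, y) triples

    def add(self, relation, x, y):
        self.concepts.update([x, y])
        if relation in HIERARCHICAL:
            self.hierarchy[(relation, x)].add(y)
        else:
            self.contextual.append((relation, x, y))

    def ancestors(self, relation, concept):
        """Everything reachable upward through a transitive hierarchical relation."""
        seen, stack = set(), [concept]
        while stack:
            for parent in self.hierarchy.get((relation, stack.pop()), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


kb = KnowledgeBase()
kb.add("ISA", "anthrax", "biological weapon")
kb.add("ISA", "biological weapon", "weapon")   # hypothetical extra link
kb.add("CAUSE", "anthrax", "death")
print(sorted(kb.ancestors("ISA", "anthrax")))  # → ['biological weapon', 'weapon']
print(kb.contextual)                           # → [('CAUSE', 'anthrax', 'death')]
```

Because only hierarchical relations are assumed transitive, only they get an `ancestors` traversal; contextual relations are kept as plain triples.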
10
Types of Knowledge
- Universal (or ontological)
  - Represented in hierarchies
  - Simple binary relations between concepts, e.g., "Chemical weapons such as nerve gas, …"
- Contextual
  - Represented in individual (semantic) contexts
  - Groups of relations centered on a common concept, e.g., "The forces launched a full-scale attack on Monday"
11
Knowledge Base Constituents
12
Knowledge Acquisition from Text
Functionality
1. Produce ontologies
2. Link concepts and relations to text
3. Visualize ontology
4. Edit ontology
5. Enhance an existing ontology
6. Merge two ontologies into a consistent ontology
13
Automatically Building Ontologies
- Ontology/KB creation
  - Knowledge extraction from text: pattern recognition; semantic parsing
  - Knowledge representation and storage: contextual vs. universal; XML; relational database
- Knowledge base maintenance
  - Conflict resolution
  - Ontology mapping; ontology merging
  - User interaction; ontology modification
14
KAT Modules – Text Processing
Input: documents, seeds
1. Extract "concepts" of interest
2. Extract binary relations (universal)
3. Use a semantic parser to obtain contextual knowledge
Output: concepts, contexts, binary relations
Example: "The rebels had access to chemical weapons, such as nerve gas and other poisonous gases."
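Step 2 (extracting universal binary relations) can be illustrated on the slide's example sentence with a single "such as" lexico-syntactic pattern. This is a simplified sketch, not KAT's extractor: it assumes a plain regular expression over raw text instead of parsed noun phrases, and it limits the hypernym to the one or two words before "such as".

```python
import re

# Hypernym: one or two words right before "such as" (a simplifying assumption);
# hyponyms: the comma/and/or-separated list that follows, up to punctuation.
SUCH_AS = re.compile(r"\b(\w+(?: \w+)?),? such as ([\w ,]+)")


def such_as_isa(sentence):
    """Return (hyponym, hypernym) pairs for the pattern 'NP such as NP, NP and NP'."""
    pairs = []
    for hypernym, hyponym_list in SUCH_AS.findall(sentence):
        for item in re.split(r",| and | or ", hyponym_list):
            item = item.strip()
            if item.startswith("other "):
                item = item[len("other "):]      # "other poisonous gases" -> "poisonous gases"
            if item:
                pairs.append((item, hypernym))
    return pairs


sentence = ("The rebels had access to chemical weapons, "
            "such as nerve gas and other poisonous gases.")
print(such_as_isa(sentence))
# → [('nerve gas', 'chemical weapons'), ('poisonous gases', 'chemical weapons')]
```

Each pair reads as ISA(hyponym, hypernym). A regex like this overshoots when the list runs into the rest of the clause, which is why the slides pair pattern recognition with semantic parsing.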
15
Text Processing
1. Candidate concepts: NPs that contain seed concepts (e.g., …) and NPs semantically linked to seed concepts.
2. Concept selection: discard candidates that match certain criteria (e.g., …).
3. Seed enrichment: add the domain concepts selected in Step 2 to the current set of seeds and return to Step 1.
4. Relation selection: collect all semantic relations that link domain concepts with other concepts (inside or outside the domain). The relations between domain concepts become part of the ontology.
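Steps 1 to 3 form a bootstrapping loop that runs until no new domain concepts appear. The sketch below assumes the noun phrases and the "semantically linked" NP pairs are already available as inputs, and uses a hypothetical `keep` predicate in place of the unspecified Step 2 criteria; Step 4 is omitted.

```python
def grow_concepts(noun_phrases, links, seeds, keep, max_rounds=10):
    """Bootstrap domain concepts from seeds (Steps 1-3, simplified).

    noun_phrases: candidate NPs already chunked from the documents (assumed input).
    links: (np1, np2) pairs judged semantically linked (assumed input).
    keep: hypothetical predicate standing in for the Step 2 discard criteria.
    """
    seeds, concepts = set(seeds), set()
    for _ in range(max_rounds):
        # Step 1: NPs that contain a seed as whole words, plus NPs linked to a seed.
        candidates = {np for np in noun_phrases
                      if any(f" {s} " in f" {np} " for s in seeds)}
        candidates |= {b for a, b in links if a in seeds}
        candidates |= {a for a, b in links if b in seeds}
        # Step 2: concept selection.
        new = {c for c in candidates if keep(c)} - concepts
        if not new:
            break                      # no new domain concepts: converged
        concepts |= new
        seeds |= new                   # Step 3: seed enrichment, then repeat
    return concepts


nps = ["biological weapon", "anthrax", "anthrax spores", "the weapon", "yesterday"]
links = [("anthrax", "biological weapon")]
keep = lambda np: not np.startswith("the ")   # hypothetical: drop determiner-initial fragments
print(sorted(grow_concepts(nps, links, {"weapon"}, keep)))
# → ['anthrax', 'anthrax spores', 'biological weapon']
```

Note how "anthrax spores" is found only in the third round, after "anthrax" itself has been promoted to a seed; that is the effect of the seed-enrichment step.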
16
Semantic Relations Stored in KB
([X Y] or [Y X] marks the order in which the two arguments appear in the example.)
Relation (Code) | Definition | Example
Agent (AGT) | X is the agent for Y; X is prototypically a person | [X Y] [John] [eats] eggs and ham.
Cause (CAU) | X causes Y | [X Y] [Drinking] causes [accidents].
Influence (IFL) | X caused something to happen to Y | [X Y] [The war] had an impact on [the economy].
Instrument (INS) | X is an instrument in Y | [Y X] John [broke] the window with [a hammer]. [Y X] John [played] the Brandenburg Concerto on [the harmonica].
ISA | X is a (kind of) Y | [X Y] [John] is a [person].
Location/Direction/Source/Path (LOC) | X is the location of Y, or where Y takes place | [Y X] There is [a cat] on [the roof]. [Y X] The hurricane [passes] through [Galveston].
Make-Produce (MAK) | X makes Y | [X Y] [GM] manufactures [cars].
Manner (MNR) | X is the manner in which Y happens | [Y X] John [read] [carefully]; [ran] [quickly]; [spoke] [hastily].
Part-Whole (PW) | X is a part of Y | [Y X] [faculty] [professor]; [X Y] [door] of the [car]
Property Type (PRO) | X is a property type of Y | [X Y] [The color] of [the car] is blue.
Attribute/Value (VAL) | X is an attribute/value of Y | [Y X] [The car] is [blue]. [Y X] [The color] of the car is [blue].
17
Semantic Relations Stored in KB (continued)
Relation (Code) | Definition | Example
Purpose (PRP) | X is the purpose for Y; the agent of Y did it because they wanted X | [Y X] John [swims] for [fun]. Mary [works] part-time [to earn some extra money].
Quantification/Extent (QNT) | X is a quantification of Y; Y can be an entity or an event | [X Y] John saw [three] [hurricanes]. [Y X] The budget [increased] by [10%].
Synonymy/Name (SYN) | X is a synonym/name/equal for/to Y | [X Y] [FBI] ([Federal Bureau of Investigation]). [Y X] [This car] is called ["Johann"].
Temporal (TMP) | X is the time of Y (when Y takes place) | [Y X] John [woke up] at [noon].
Theme/Patient/Result/Consumed (THM) | X is the theme/patient/result/consumed in/from/of Y | [Y X] John [painted] [his truck]. [Y X] John [baked] [a cake].
18
Examples of Semantic Relations in Text
Semantic relations are the interconnections between words or concepts that define the meaning of a text. They are used as the elements of knowledge bases.
Example: John went to the park yesterday because he saw hot air balloons taking off from there.
- Agent(John, went)
- At-Location(went, to the park)
- At-Time(went, yesterday)
- Cause(saw, went)
- Experiencer(he, saw)
- Stimulus(hot air balloons taking off from there, saw)
- Value(hot, air)
- Part-Whole(hot air, balloons)
- Is-A(hot air balloons, balloons)
- Experiencer(hot air balloons, taking off)
- At-Location(taking off, from there)
[Figure: the same relations drawn as labeled links between the constituents of the sentence]
19
Semantic Parser
- Handles various syntactic patterns: verb-argument, complex nominals, genitives, adjectival phrases/clauses, etc.
- Semantic restrictions on relation arguments R(x, y)
  - Domain and range restrictions defined using an ontology of sorts, e.g., KINSHIP: domain [AnimateConcreteObject], range [AnimateConcreteObject]
  - Filter out relations that cannot exist between certain arguments
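Domain and range filtering amounts to checking each candidate argument's sort against the relation's signature, walking up the sort hierarchy. A minimal sketch follows; the sort hierarchy, the `SIGNATURES` table, and the `sort_of` mapping are hypothetical stand-ins, and only the KINSHIP signature comes from the slide.

```python
# Hypothetical "ontology of sorts": each sort points to its parent sort.
SORT_PARENT = {
    "AnimateConcreteObject": "ConcreteObject",
    "InanimateConcreteObject": "ConcreteObject",
    "ConcreteObject": "Entity",
}

# Domain and range per relation; KINSHIP follows the slide, the table is otherwise assumed.
SIGNATURES = {"KINSHIP": ("AnimateConcreteObject", "AnimateConcreteObject")}


def has_sort(sort, required):
    """True if `sort` equals `required` or lies below it in the sort hierarchy."""
    while sort is not None:
        if sort == required:
            return True
        sort = SORT_PARENT.get(sort)
    return False


def filter_candidates(relation, pairs, sort_of):
    """Keep only (x, y) pairs whose sorts satisfy the relation's domain and range."""
    domain, range_ = SIGNATURES[relation]
    return [(x, y) for x, y in pairs
            if has_sort(sort_of[x], domain) and has_sort(sort_of[y], range_)]


sort_of = {"John": "AnimateConcreteObject", "Mary": "AnimateConcreteObject",
           "car": "InanimateConcreteObject"}
print(filter_candidates("KINSHIP", [("John", "Mary"), ("John", "car")], sort_of))
# → [('John', 'Mary')]
```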
20
Semantic Parser
- Bracketer: determines the semantic dependencies inside noun compounds of three or more nouns, e.g., sugar industry analyst ([[sugar industry] analyst]) vs. female industry analyst ([female [industry analyst]])
- Argument detection: identifies argument pairs likely to encode a semantic relation, based on lexico-syntactic patterns
- Domain and range filtering: filters candidate arguments based on their semantic classes and the relation definitions
- Feature extraction: extracts the features corresponding to each pattern, e.g., the semantic class of the modifier noun, the syntactic path, voice
- Machine learning classifiers: per-relation and per-pattern approaches (support vector machines, decision trees, Naïve Bayes, Semantic Scattering)
- Conflict resolution: resolves relation conflicts between classifiers
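The slide does not say how the Bracketer decides between the two structures. One standard heuristic is the adjacency model: compare how strongly n1 associates with n2 against how strongly n2 associates with n3. The sketch below uses that model with hypothetical co-occurrence counts; both the counts and the choice of model are assumptions.

```python
# Hypothetical corpus co-occurrence counts for adjacent noun pairs.
PAIR_COUNTS = {
    ("sugar", "industry"): 120,
    ("industry", "analyst"): 95,
    ("female", "industry"): 3,
}


def bracket(n1, n2, n3, counts=PAIR_COUNTS):
    """Adjacency-model bracketing of a three-noun compound n1 n2 n3."""
    left = counts.get((n1, n2), 0)     # evidence for [[n1 n2] n3]
    right = counts.get((n2, n3), 0)    # evidence for [n1 [n2 n3]]
    if left >= right:
        return f"[[{n1} {n2}] {n3}]"
    return f"[{n1} [{n2} {n3}]]"


print(bracket("sugar", "industry", "analyst"))    # → [[sugar industry] analyst]
print(bracket("female", "industry", "analyst"))   # → [female [industry analyst]]
```

Ties default to left bracketing here, a common choice because left-branching compounds are more frequent in English; a real bracketer would normalize the counts into association scores.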
21
KAT Modules – Classification/Hierarchy Creation
Input: concepts, binary relations
- Classify each concept against every other one using the defined procedures, obtaining a set of ISA relations
- Add all ISA and other binary relations to the hierarchy, using conflict resolution
Output: hierarchy of relations, e.g.,
- "Scud missile" ISA "missile"
- "Iraqi standing_army" ISA "Asian army"
- "weapons inspection team" ISA "inspection team"
22
Subsumption Used for Knowledge Classification
Proposition. Let
$$C = A_1 \sqcap \cdots \sqcap A_m \sqcap \forall R_1.C_1 \sqcap \cdots \sqcap \forall R_n.C_n$$
be the normal form of the concept description C, and
$$D = B_1 \sqcap \cdots \sqcap B_k \sqcap \forall S_1.D_1 \sqcap \cdots \sqcap \forall S_l.D_l$$
be the normal form of the concept description D. Then $C \sqsubseteq D$ iff both of the following conditions hold:
(1) for all $i$, $1 \le i \le k$, there exists $j$, $1 \le j \le m$, such that $B_i = A_j$;
(2) for all $i$, $1 \le i \le l$, there exists $j$, $1 \le j \le n$, such that $S_i = R_j$ and $C_j \sqsubseteq D_i$.
This formulation of subsumption is
- sound (the "if" part holds), and
- complete (the "only if" part holds).
The resulting algorithm has polynomial complexity.
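Conditions (1) and (2) translate directly into a recursive structural comparison. The sketch below assumes concepts are already in the normal form above, with each role occurring at most once and no trivial ∀R.⊤ conjuncts; the `Concept` class and the example concepts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Normal form A1 ⊓ ... ⊓ Am ⊓ ∀R1.C1 ⊓ ... ⊓ ∀Rn.Cn.

    atoms: the atomic concepts Aj; restrictions: role Rj -> filler concept Cj
    (a dict, so each role occurs at most once, as the normal form assumes).
    """
    atoms: set = field(default_factory=set)
    restrictions: dict = field(default_factory=dict)


def subsumed_by(c, d):
    """Structural test for C ⊑ D following conditions (1) and (2) of the proposition."""
    # (1) every atom Bi of D equals some atom Aj of C
    if not d.atoms <= c.atoms:
        return False
    # (2) every ∀Si.Di of D has a matching ∀Rj.Cj in C with Si = Rj and Cj ⊑ Di
    for role, d_filler in d.restrictions.items():
        c_filler = c.restrictions.get(role)
        if c_filler is None or not subsumed_by(c_filler, d_filler):
            return False
    return True


# Hypothetical: Person ⊓ Female ⊓ ∀hasChild.(Person ⊓ Doctor)  vs.  Person ⊓ ∀hasChild.Person
c = Concept({"Person", "Female"}, {"hasChild": Concept({"Person", "Doctor"})})
d = Concept({"Person"}, {"hasChild": Concept({"Person"})})
print(subsumed_by(c, d), subsumed_by(d, c))   # → True False
```

Each recursive call compares two set-valued conjunctions and descends one role level, so the running time stays polynomial in the size of the two descriptions, in line with the complexity claim above.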
23
Classification/Hierarchy Creation
Classification procedures: for domain concepts modifier_1 head_1 and modifier_2 head_2, create
- If ISA(modifier_1, modifier_2) and ISA(head_1, head_2), then ISA(modifier_1 head_1, modifier_2 head_2), e.g., Japan discount rate ISA Asian country interest rate
- If ISA(modifier_1, modifier_2) and SYNONYMY(head_1, head_2), then ISA(modifier_1 head_1, modifier_2 head_2), e.g., Japan discount rate ISA Asian country discount rate
- If SYNONYMY(modifier_1, modifier_2) and ISA(head_1, head_2), then ISA(modifier_1 head_1, modifier_2 head_2), e.g., Japan discount rate ISA Japan interest rate
- If SYNONYMY(modifier_1, modifier_2) and SYNONYMY(head_1, head_2), then SYNONYMY(modifier_1 head_1, modifier_2 head_2)
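The four rules collapse into one decision: if both the modifiers and the heads are related by ISA or SYNONYMY, the compounds are SYNONYMY when both links are synonymy and ISA otherwise. A minimal sketch, assuming hypothetical base relation sets and treating identical terms as synonyms (which is what the slide's examples imply):

```python
# Hypothetical base relations between simple concepts.
ISA = {("Japan", "Asian country"), ("discount rate", "interest rate")}
SYN = set()


def related(a, b):
    """Link between two simple concepts: 'SYN', 'ISA', or None.
    Identical terms count as synonyms (an assumption matching the examples)."""
    if a == b or (a, b) in SYN or (b, a) in SYN:
        return "SYN"
    if (a, b) in ISA:
        return "ISA"
    return None


def classify_compound(c1, c2):
    """Apply the four modifier/head rules to compounds c1 = (modifier_1, head_1)
    and c2 = (modifier_2, head_2); return 'ISA', 'SYNONYMY', or None."""
    m = related(c1[0], c2[0])
    h = related(c1[1], c2[1])
    if m is None or h is None:
        return None
    return "SYNONYMY" if m == h == "SYN" else "ISA"


jdr = ("Japan", "discount rate")
print(classify_compound(jdr, ("Asian country", "interest rate")))   # → ISA
print(classify_compound(jdr, ("Japan", "interest rate")))           # → ISA
```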
24
Classification/Hierarchy Creation
Classification procedures
- For domain concepts "modifier head" and "head", create the relation ISA(modifier head, head), e.g., nontaxable dividends ISA dividends
- For a domain concept modifier_1 modifier_2 head:
  - if modifier_1 head exists, create ISA(modifier_1 modifier_2 head, modifier_1 head), e.g., nuclear weapon testing ISA nuclear testing
  - if modifier_2 head exists, create ISA(modifier_1 modifier_2 head, modifier_2 head), e.g., nuclear weapon testing ISA weapon testing
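These head-based rules need only the set of existing domain concepts. The sketch below is a simplification that assumes each concept is a tuple of single-token words ending in the head; `head_isa_links` is a hypothetical name.

```python
def head_isa_links(concept, known):
    """ISA links for a multiword domain concept, following the rules on this slide.

    concept: tuple of tokens (modifier..., head); known: set of existing domain
    concepts, also as token tuples.
    """
    *modifiers, head = concept
    links = []
    if len(modifiers) == 1 and (head,) in known:
        links.append((concept, (head,)))              # modifier head ISA head
    elif len(modifiers) == 2:
        for modifier in modifiers:
            if (modifier, head) in known:             # modifier_i head exists
                links.append((concept, (modifier, head)))
    return links


known = {("dividends",), ("nuclear", "testing"), ("weapon", "testing")}
print(head_isa_links(("nontaxable", "dividends"), known))
# → [(('nontaxable', 'dividends'), ('dividends',))]
print(head_isa_links(("nuclear", "weapon", "testing"), known))
# → [(('nuclear', 'weapon', 'testing'), ('nuclear', 'testing')), (('nuclear', 'weapon', 'testing'), ('weapon', 'testing'))]
```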
25
Classification/Hierarchy Creation
Textual entailment for concept subsumption: monetary policy ? fiscal policy ISA economic policy ISA policy (WordNet hierarchy)
26
Domain Ontology/KB Creation - Example
27
Domain Ontology/KB Creation - Example
28
“Our Balancing Act”
- Quantity: making sure that the available information is actually extracted
- Beauty: making sure that the ontology concepts are real concepts, not just sentence fragments
- Relevance: not including every concept mentioned in a sentence
29
“Striking the Balance”
- Tuning the aggressiveness of text exploration
- Pruning sentence phrases down to the “real concept”
- Filtering out “ugly” sentence fragments
- Handling conjunctions, e.g., “Tom and Bill” went to “Dallas and Fort Worth”; “Hank or Susan” went to “Chicago or New York”
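For the conjunction examples, one plausible treatment is to distribute a relation over "and"-coordinated arguments while leaving "or"-coordinated arguments whole. This is an assumption about how such cases could be handled, not the slide's stated method; `expand_coordination` is a hypothetical helper.

```python
def expand_coordination(relation, x, y):
    """Split 'and'-coordinated arguments into one relation per conjunct.

    'or'-coordinated arguments are left whole, since the text does not assert
    the relation for any single disjunct (a design assumption).
    """
    def conjuncts(arg):
        return [part.strip() for part in arg.split(" and ")]
    return [(relation, a, b) for a in conjuncts(x) for b in conjuncts(y)]


print(expand_coordination("AGENT", "Tom and Bill", "went"))
# → [('AGENT', 'Tom', 'went'), ('AGENT', 'Bill', 'went')]
print(expand_coordination("AGENT", "Hank or Susan", "went"))
# → [('AGENT', 'Hank or Susan', 'went')]
```

Naive splitting on " and " would also break names such as company names containing "and", which is one reason conjunction handling is listed as a balancing act rather than a solved step.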
30
Ontology - Example
International Economics Ontology
- Document collection: International Economics book, 2.8 MB of plain text
- Seed ontology: economics reference taxonomy
  - 558 seed concepts, e.g., aggregate demand, ATC curve, budget deficit, commodity money
  - 791 semantic relations
- Output: 5,678 ontological concepts; 13,878 semantic relations (AGENT, CAUSE, INFLUENCE, INSTRUMENT, ISA, AT-LOCATION, MAKE-PRODUCE, MANNER, PROPERTY, PURPOSE, PART-WHOLE, QUANTITY, SYNONYMY, THEME, AT-TIME, VALUE)
31
KAT Modules – Knowledge Base Maintenance
- Knowledge base merging
- Visualization
- Knowledge base editing
  - User interaction
  - Modifications
32
Knowledge Base Maintenance
- New concept integration: concepts and relations extracted from incoming documents are added to the existing ontology
  - Establish a mapping between the new set of concepts/relations and the existing ontology
  - Add the non-mapped concepts and relations to the ontology
- Ontology mapping: identify a set of rules that link concepts from one ontology to analogous concepts in another ontology
  - Calculate the semantic similarity of concepts
    - Similarity between the semantic models of the concepts
    - Degree of textual entailment between the concepts’ glosses
    - Concept label-based similarity
  - Calculate the semantic similarity of relations, as a function of the similarity of their arguments
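Of the similarity signals listed, label-based similarity and relation similarity are the easiest to make concrete. The sketch below assumes token-overlap (Jaccard) similarity for labels and, for relations, the mean of the argument similarities with zero for mismatched relation types; both choices are illustrative assumptions, since the slide only says relation similarity is "a function of" argument similarity.

```python
def label_similarity(a, b):
    """Token-overlap (Jaccard) similarity between two concept labels."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def relation_similarity(r1, r2, concept_sim=label_similarity):
    """Similarity of two relations (type, arg1, arg2) as a function of their
    arguments' similarity: here the mean, and 0 if the types differ (assumption)."""
    if r1[0] != r2[0]:
        return 0.0
    return (concept_sim(r1[1], r2[1]) + concept_sim(r1[2], r2[2])) / 2


print(round(label_similarity("interest rate", "discount rate"), 3))   # → 0.333
print(round(relation_similarity(("ISA", "discount rate", "interest rate"),
                                ("ISA", "Japan discount rate", "interest rate")), 3))  # → 0.833
```

Passing a different `concept_sim` (for example one based on glosses or semantic models) leaves the relation-level function unchanged.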
33
Knowledge Base Maintenance
- Ontology merging: create a new ontology by combining information from two or more ontologies
  - Map the ontologies (two at a time)
  - Combine domain concepts (use a single copy for mapped concepts)
  - Merge the relation sets of mapped concepts, using the conflict resolution algorithm
  - Re-classify the new set of ontological concepts, using the classification/hierarchy creation procedures
34
Conflict Resolution
- Approach used: prevention
  - Start from an empty hierarchy and an input relation set
  - Add a relation from the input set to the hierarchy if
    - it does not form a cycle, and
    - it is not redundant (does not duplicate a path)
  - Remove jump links
- Properties of hierarchical relations
  - Transitive: if R(A,B) and R(B,C), then R(A,C); e.g., ISA(cat, mammal) and ISA(mammal, animal) imply ISA(cat, animal)
  - Strictly non-symmetric (asymmetric): if R(A,B), then NOT R(B,A); e.g., ISA(cat, mammal) implies ¬ISA(mammal, cat)
35
Types of Conflict
- Inconsistencies: simple loops; cycles
- Redundancies: duplicate relations; jump links
36
Jump Links
- Multiple paths from one node to another are acceptable, as long as no single link duplicates a path
- Jump link removal: when it is safe to add R(A,B), remove the links from the direct descendants of B to B if those descendants have a path to A
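The prevention approach (reject cycles, reject redundant links, remove jump links) fits in one small class. This sketch assumes a single transitive relation stored as child-to-parent edges; the `Hierarchy` class and its return strings are hypothetical.

```python
from collections import defaultdict


class Hierarchy:
    """One transitive relation R built by prevention; R(a, b) is stored as a -> b."""

    def __init__(self):
        self.parents = defaultdict(set)   # child -> direct parents

    def reaches(self, a, b):
        """True if a path a -> ... -> b already exists."""
        stack, seen = [a], set()
        while stack:
            node = stack.pop()
            if node == b:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents.get(node, ()))
        return False

    def add(self, a, b):
        """Try to add R(a, b); reject cycles and redundant links, remove jump links."""
        if a == b or self.reaches(b, a):
            return "rejected: cycle"
        if self.reaches(a, b):
            return "rejected: redundant"
        # Jump links: a direct child x of b that already reaches a would
        # duplicate the new path x -> ... -> a -> b, so drop x -> b.
        for x, ps in self.parents.items():
            if b in ps and x != a and self.reaches(x, a):
                ps.discard(b)
        self.parents[a].add(b)
        return "added"


h = Hierarchy()
print(h.add("cat", "animal"), h.add("cat", "mammal"), h.add("mammal", "animal"))
print(h.parents["cat"])          # → {'mammal'}  (cat -> animal was a jump link)
print(h.add("cat", "animal"))    # → rejected: redundant
print(h.add("animal", "cat"))    # → rejected: cycle
```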
37
Do Fewer Links Mean Less Knowledge?
- Number of links: 4. Assertions: 1. a → b, 2. a → c, 3. b → d, 4. c → d, 5. a → d
- Number of links: 3. Assertions: 1. a → b, 2. b → c, 3. c → d, 4. a → c, 5. b → d, 6. a → d
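The assertion counts follow from transitivity: the knowledge in a hierarchy is its transitive closure, not its link count. A short check with Warshall's algorithm (a standard closure method, used here for illustration) reproduces both counts.

```python
from itertools import product


def closure(links):
    """All assertions implied by a transitive relation (Warshall's algorithm)."""
    nodes = sorted({n for link in links for n in link})
    implied = set(links)
    for k in nodes:
        for i, j in product(nodes, repeat=2):
            if (i, k) in implied and (k, j) in implied:
                implied.add((i, j))
    return implied


diamond = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]   # 4 links
chain = [("a", "b"), ("b", "c"), ("c", "d")]                 # 3 links
print(len(closure(diamond)), len(closure(chain)))            # → 5 6
```

So the chain with fewer links asserts more, which is why removing a jump link loses nothing: the link it removes is always recoverable from the closure.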
38
Ontology Merging - Example
39
Domain Ontology/KB Evaluation
Compare KAT’s automatically generated ontologies against gold annotations.
- Evaluation focuses on
  - the lexical, vocabulary, or data layer level
  - the other semantic relations level
- Viewing an ontology as a set of semantic relations between two concepts, the human annotators:
  - labeled an entry correct if the concepts and the semantic relation were correctly detected by the system, and incorrect otherwise
  - labeled a correct entry irrelevant if any of the concepts or the semantic relation was irrelevant to the domain
  - added new entries for concepts and semantic relations (from the input documents) that KAT omitted
40
Ontology/KB Evaluation - Metrics
N_K(*) gives the counts from KAT’s output; N_G(*) gives the corresponding counts from the gold annotations.
41
Domain Ontology/KB Evaluation - Results
42
Jager™: Ontology Visualization and Editing
- Web application for scalable, multi-user visualization and editing of KAT’s ontologies/KBs
- Based on the Django framework and written in a mix of Python, HTML and JavaScript
- Jager (pronounced "yeager") is a corruption of the German word Jäger (hunter)
- Capabilities
  - Jager admin tool: import/export/delete/trim an ontology; compare two ontologies; edit an ontology's name
  - For a given ontology: edit/delete/insert concepts and semantic relations
43
Jager™: Ontology Visualization and Editing
44
Outline
- Introduction to Ontologies
- Ontologies for Question Answering
- Automatic Ontology Building
- Applications
- OWL/RDF Representation
- Jaguar-Jager Demo
- CHiPS Demo
45
Collaborative High Precision Search
- CHiPS™: ontology-guided search
  - More powerful than keyword search
  - Searches from the perspective of a given ontology
- Document matching
  - Semantic profiles are generated for documents based on a given ontology
  - Ontology concepts are identified in the text
  - Each identified concept is assigned a weight
- Semantic profile matching
  - Semantic profiles for each document in a repository are generated in advance
  - The semantic profile for the input search text is generated on the fly
  - The search algorithm finds the repository documents whose profiles most closely match the profile of the input search text
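Profile generation and matching can be sketched as weighted concept vectors compared with a similarity measure. The slides do not specify the weighting or the matching function, so this sketch assumes raw concept counts as weights and cosine similarity for matching; the function names and sample data are hypothetical.

```python
import math
import re
from collections import Counter


def semantic_profile(text, ontology_concepts):
    """Map each ontology concept found in the text to a weight (raw count here;
    the actual weighting scheme is an assumption)."""
    lowered = text.lower()
    profile = Counter()
    for concept in ontology_concepts:
        hits = len(re.findall(r"\b" + re.escape(concept.lower()) + r"\b", lowered))
        if hits:
            profile[concept] = hits
    return profile


def cosine(p, q):
    dot = sum(w * q.get(c, 0) for c, w in p.items())
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def search(query, repository, ontology_concepts):
    """Rank repository documents by how closely their profiles match the query's."""
    profiles = {doc_id: semantic_profile(text, ontology_concepts)   # precomputed in CHiPS
                for doc_id, text in repository.items()}
    q = semantic_profile(query, ontology_concepts)                  # generated on the fly
    return sorted(profiles, key=lambda d: cosine(q, profiles[d]), reverse=True)


concepts = ["eye pain", "surgical procedure", "neuralgia", "steroid"]
repo = {"r1": "Steroid injection after surgical procedure; eye pain persisted.",
        "r2": "History of neuralgia."}
print(search("eye pain after a surgical procedure", repo, concepts))   # → ['r1', 'r2']
```

Because matching happens in concept space, documents that use ontology synonyms or hyponyms could also be matched once the profile step maps them to shared concepts, which plain keyword search cannot do.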
46
CHiPS™ Architecture
47
Document Similarity
- Possible applications in the medical domain
  - Diagnosis: patient data vs. medical knowledge
  - Research: text snippet vs. Medline
  - Matching decision rules to a KB
  - Others
- Approaches
  - Statistical approaches: Latent Dirichlet Allocation, Pachinko Allocation, others
  - Semantic approaches: event-based; ontology-based (outlined here); others
48
Sample Search
Search: The patient’s eye pain was associated with the surgical procedure and poly-L-lactic acid
Result: She describes this area as looking like a "bug bite" & was located "on top of" (above) gortex implant, near the lateral canthus. Its shape is round about one-fourth inch in diameter w/a rise w/a peak "maybe" one-eighth of an inch in height total. She said her phys has treated the "bug bite" area w/an unknown type of steroid injection, w/o effect. He now wants to remove this surgically, however, she is not certain if she wants this done. She noted that she did not massage for first week, as had no instruction to do so; she also had lid lift surgery at the time (of the face lift,) & surgeon did not want any pressure on surgical site. She reported her concomitant medications as estradiol, gabapentin (neurontin), for trigeminal neuralgia & facial non-specific neuralgia; also a multivitamin. Add'l medical history included trigeminal neuralgia & facial non-specific neuralgia both following the accident. No further medical info reported. Add'l info for sculptra from ptc report case (b)(4) dated (b)(6)2008, received by (b)(6) on 25mar08: b/c no lot # is available, an investigation has been performed on the documentation of all potentially involved manufactured batches. The review of the device history reports & of the analytical results of these batches did not show any anomaly that could be related to the event which occurred.
Repository: Manufacturer and User Facility Device Experience (MAUDE)
49
Sample Search – Supporting Ontologies
- Medical Subject Headings (MeSH) controlled vocabulary
- Encyclopedic knowledge
50
CHiPS™ Demo
- Hybrid MeSH-MedDRA ontology
  - NIH Medical Subject Headings (MeSH) taxonomy: http://www.nlm.nih.gov/mesh/
  - Medical Dictionary for Regulatory Activities (MedDRA): http://www.meddramsso.com/
  - 29,302 concepts
  - 38,828 semantic relations (ISA)
- Document repositories
  - FDA MAUDE (Manufacturer And User facility Device Experience) document repository: database of adverse medical events, http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfmaude/search.cfm
  - NIH MEDLINE document repository: journal citations and abstracts for biomedical literature from around the world, http://www.nlm.nih.gov/bsd/pmresources.html
51
Outline
- Introduction to Ontologies
- Ontologies for Question Answering
- Automatic Ontology Building
- Applications
- OWL/RDF Representation
- Jaguar Demo
- CHiPS Demo
52
Conversion to OWL/RDF
- World Wide Web Consortium (W3C) standard formats
  - Resource Description Framework (RDF), serialized as XML or N-Triples: http://www.w3.org/TR/rdf-syntax-grammar
    - Subject-predicate-object expressions (triples) represent information, e.g., "The sky is blue" becomes the triple (sky, hasColor, blue)
  - Web Ontology Language (OWL): http://www.w3.org/TR/owl-features
    - Designed to represent ontologies; creates RDF/XML-compatible semantic models
- Goal: define a schema that encodes the semantic markup without creating an intractable number of RDF and OWL relations
  - Increase interoperability
  - Facilitate the integration of KAT’s ontologies into application systems
53
Ontology to OWL Translation
Definition of domain concepts and of concept properties (lexeme, sense number)
54
Ontology to OWL Translation
Definition of the concept part of speech: noun, verb, adjective, adverb
55
Ontology to OWL Translation
Definition of the PART-WHOLE semantic relation
56
Ontology to OWL - Example
ISA(F-16, fighter_aircraft)
- fighter aircraft: noun, sense 1
- F-16: noun, sense 1
57
Converting Relations into RDF
- The ontology is transformed into RDF triples
- Semantic relations from text are transformed into RDF triples
Example: "Millions of Americans went to the polls on Tuesday to elect a president."
- MEASURE(Millions, American)
- AGENT(American, go)
- LOCATION(go, poll)
- TEMPORAL(go, Tuesday)
- PURPOSE(go, elect)
- THEME(elect, president)
- AGENT(American, elect)
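Turning R(x, y) relations into triples is essentially mapping x to the subject, R to the predicate, and y to the object. The sketch below serializes the example relations as N-Triples; the base URI `http://example.org/kat/` and the URI layout are hypothetical, since the slides do not give KAT's actual naming scheme.

```python
# Hypothetical namespace; KAT's actual URI scheme is not given on the slides.
BASE = "http://example.org/kat/"


def to_ntriples(relations):
    """Serialize (RELATION, x, y) tuples as N-Triples lines: x is the subject,
    the relation name the predicate, and y the object."""
    lines = []
    for rel, subj, obj in relations:
        s = f"<{BASE}concept/{subj.replace(' ', '_')}>"
        p = f"<{BASE}relation/{rel}>"
        o = f"<{BASE}concept/{obj.replace(' ', '_')}>"
        lines.append(f"{s} {p} {o} .")
    return "\n".join(lines)


relations = [("MEASURE", "Millions", "American"), ("AGENT", "American", "go"),
             ("LOCATION", "go", "poll"), ("TEMPORAL", "go", "Tuesday"),
             ("PURPOSE", "go", "elect"), ("THEME", "elect", "president"),
             ("AGENT", "American", "elect")]
print(to_ntriples(relations))
```

A production converter would also escape characters that are not legal in IRIs and would likely reuse the OWL definitions from the earlier slides as the predicate vocabulary.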
58
Conclusions
- We presented a generalized and improved procedure to automatically extract deep semantic information from text resources: a methodology to rapidly create semantically rich domain ontologies while keeping manual intervention to a minimum.
- We defined evaluation metrics to assess the quality of the ontologies and presented evaluation results for a subset of the intelligence and financial ontology libraries, which were semi-automatically created from freely available textual resources on the Web.
- The results show that a decent amount of knowledge can be accurately extracted while keeping manual intervention in the process to a minimum.
59
Thank You!
Discussion