
1 Ontology Learning For the Semantic Web

2 The Paper Itself Based around two products, OntoEdit and Text-to-Onto. A rather foundational approach to the problems surrounding information extraction. Occasionally, some really weak sentence structure. Problems with inconsistent example use. A frustrating exercise in presenting questions and not the answers.

3 The Web (our semantic battleground) The Web was created as a free-form information space. Made for human comprehension, not machine understanding. From experience with the web there seems to be some inherent aversion to correct speeling or gRamar.

4 Machine Semantics "Computers, while originally designed to understand a series of electrical pulses, have had that same vocabulary expanded to be able to also evaluate letters, booleans, and numbers." –Brian Goodrich

5 Ontologies Ontologies are metadata schemas. –Controlled vocabulary of concepts –Machine understandable semantics –Define shared domain conceptualizations (e.g. website to website, people to machines, etc.)

6 The Assumption "If every internet webpage had an associated perfect ontology that was just as accessible as the selfsame webpage, the creation of the semantic web would be only as far away as the creation of a browser that can find and interpret those ontologies and extract information based upon those models." –Brian Goodrich

7 The Knowledge Bottleneck "…manual acquisition of ontologies still remains a tedious cumbersome task resulting in a knowledge acquisition bottleneck." –Steffen Staab

8 The Challenge In overcoming this knowledge acquisition bottleneck the authors took a three-fold approach: –Time (Can you develop an ontology fast?) –Difficulty (Is it difficult to build an ontology?) –Confidence (How do you know that you've got the ontology right?)

9 OntoEdit OntoEdit supports the development and maintenance of ontologies using graphical means. It supports RDF-Schema, DAML-ONT, OIL and F-Logic. Has many of the same features as our Ontology Editor: –Cardinality Restrictions –Keyword Associations –Value Phrase Restrictions
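
The slides never show what such an ontology looks like in serialized form. Purely as a point of reference (not from the paper or the slides), here is a minimal sketch of a small RDF-Schema concept hierarchy of the kind an editor like OntoEdit manipulates, written with the Python rdflib library; the tourism namespace, class names, and property are invented for illustration.

```python
# Minimal sketch (not from the paper): a tiny RDF-Schema fragment of the kind
# an ontology editor such as OntoEdit works with. Names are illustrative.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/tourism#")  # invented namespace

g = Graph()
g.bind("ex", EX)

# Concepts (rdfs:Class) and a small taxonomy (rdfs:subClassOf).
for cls in (EX.Accommodation, EX.Hotel, EX.Youth_Hostel, EX.City):
    g.add((cls, RDF.type, RDFS.Class))
g.add((EX.Hotel, RDFS.subClassOf, EX.Accommodation))
g.add((EX.Youth_Hostel, RDFS.subClassOf, EX.Accommodation))

# A non-taxonomic relation with domain and range restrictions.
g.add((EX.located_in, RDF.type, RDF.Property))
g.add((EX.located_in, RDFS.domain, EX.Accommodation))
g.add((EX.located_in, RDFS.range, EX.City))

print(g.serialize(format="turtle"))
```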

10 Assaulting the walls of Jericho Multi-disciplinary approach –Machine learning (human assisted) Five-phase approach: –Import and re-use existing ontologies. –Data extraction uses machine learning to sculpt major sections of the target ontology. –The target ontology is "pruned". –Refinement(?) (automatically and incrementally maintained by evaluating the "quality" of proposals) [Hahn & Schnattinger]. –Validation using the prime target application as a measure for success of the ontology.

11 Another Wonderful Graph [Diagram: the Text-to-Onto ontology-learning cycle, Import/Reuse → Extract → Prune → Refine → Validate, fed by legacy data (a reference to archaic "databasing" techniques).]

12 Components for Learning Ontologies by Staab and Maedche –Management Component –Resource Processing Component –Algorithm Library –GUI for Manual Engineering

13 Ontology Primitives –A set of strings L that describe lexical entries for concepts and relations. –A set of concepts C. –A taxonomy of concepts with multiple inheritance (heterarchy) HC. –A set of non-taxonomic relations R described by their domain and range restrictions. –A heterarchy of relations, i.e. a set of taxonomic relations HR. –Relations F and G that relate concepts and relations with their lexical entries, respectively. –A set of axioms A that describe additional constraints on the ontology and allow implicit facts to be made explicit.
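
To make the primitive structure concrete, here is a small Python sketch (not from the paper) of the tuple (L, C, HC, R, HR, F, G, A) as plain data; the field names and the toy tourism example are illustrative only.

```python
# Minimal sketch of the ontology structure (L, C, HC, R, HR, F, G, A)
# listed on this slide. Field names and the toy example are invented.
from dataclasses import dataclass, field


@dataclass
class Ontology:
    lexicon: set[str] = field(default_factory=set)                          # L: lexical entries
    concepts: set[str] = field(default_factory=set)                         # C
    concept_heterarchy: set[tuple[str, str]] = field(default_factory=set)   # HC: (sub, super)
    relations: dict[str, tuple[str, str]] = field(default_factory=dict)     # R: name -> (domain, range)
    relation_heterarchy: set[tuple[str, str]] = field(default_factory=set)  # HR
    concept_lexicon_map: set[tuple[str, str]] = field(default_factory=set)  # F: (lexical entry, concept)
    relation_lexicon_map: set[tuple[str, str]] = field(default_factory=set) # G: (lexical entry, relation)
    axioms: list[str] = field(default_factory=list)                         # A: extra constraints


# Toy example: two concepts linked by a taxonomic and a non-taxonomic relation.
onto = Ontology()
onto.lexicon |= {"hotel", "city"}
onto.concepts |= {"Hotel", "Accommodation", "City"}
onto.concept_heterarchy.add(("Hotel", "Accommodation"))
onto.relations["located_in"] = ("Accommodation", "City")
onto.concept_lexicon_map |= {("hotel", "Hotel"), ("city", "City")}
print(onto.concepts)
```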

14 Management Component The OntoEngineer uses it to select desired XML/HTML pages, document type definitions, databases, or pre-existing ontologies. Selects methods for the Resource Processing Component and algorithms for the Library Component. Also includes a crawler that can find legacy data on the web relevant to creation of the ontology (used for training data).

15 Resource Processing Component HTML documents may be indexed and reduced to free text. Semi-structured documents, like dictionaries, may be transformed into a predefined relational structure. Semi-structured and structured schema data (like DTDs, structured database schemata, and existing ontologies) are handled following different strategies that may (or may not) be discussed later. For processing free natural text the system accesses the natural language processing system SMES (Saarbrücken Message Extraction System), a shallow text processor for German. SMES comprises a tokenizer based on regular expressions, a lexical analysis component including various word lexicons, a morphological analysis module, a named entity recognizer, a part-of-speech tagger and a chunk parser.
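
SMES itself is a substantial German-language system. Purely as an illustration of some of the listed stages (tokenizer, lexicon lookup, tagging, chunking), here is a toy Python pipeline with an invented mini-lexicon; it is not SMES and it skips morphology and named-entity recognition entirely.

```python
# Toy illustration of a shallow-processing pipeline with stages analogous to
# some of those listed for SMES. This is NOT the SMES system; every rule and
# the mini-lexicon below are made up.
import re

LEXICON = {"the": "DET", "hotel": "NN", "is": "VB", "in": "IN", "berlin": "NNP"}

def tokenize(text: str) -> list[str]:
    """Regular-expression tokenizer: words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Lexicon-based part-of-speech tagging with a naive fallback."""
    return [(t, LEXICON.get(t.lower(), "NN" if t.isalpha() else "PUNCT"))
            for t in tokens]

def chunk_noun_phrases(tagged):
    """Very small chunker: maximal runs of determiners and nouns."""
    chunks, current = [], []
    for token, pos in tagged:
        if pos in ("DET", "NN", "NNP"):
            current.append(token)
        else:
            if current:
                chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

tagged = tag(tokenize("The hotel is in Berlin."))
print(chunk_noun_phrases(tagged))   # ['The hotel', 'Berlin']
```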

16 Algorithm Library Component This is the actual ontology builder, and where we revisit our previous model: –Import/ReUse –Extraction –Pruning –Refining And then it almost introduces some actual algorithms that these phases use.

17 Import/Reuse Recovering Conceptualizations: –First, schema structures are identified and imported separately. This may be done manually or using reverse engineering tools. –Second, merging and aligning. This is a HUGE body of research that is largely ignored by this document: "While the general research issue concerning merging and aligning is still an open problem, recent proposals (e.g., [8]) have shown how to improve the manual process of merging/aligning."
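
As a rough illustration only (the alignment work cited as [8] is far more sophisticated), here is a naive Python sketch that proposes candidate matches between two ontologies' concept labels by string similarity; the labels and threshold are invented.

```python
# Naive sketch of one small part of merging/aligning: proposing candidate
# matches between concept labels of two ontologies by string similarity.
# Real alignment approaches (e.g. those cited as [8]) are far richer.
from difflib import SequenceMatcher

def candidate_alignments(concepts_a, concepts_b, threshold=0.8):
    """Return (a, b, score) triples whose label similarity exceeds threshold."""
    pairs = []
    for a in concepts_a:
        for b in concepts_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: -p[2])

print(candidate_alignments({"Hotel", "Accommodation"},
                           {"hotel", "Lodging", "Accomodation"}))
```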

18 Extraction –Lexical Entry & Concept Extraction –Hierarchical Concept Clustering –Dictionary Parsing –Association Rules

19 Lexical Entry & Concept Extraction Uses a statistical technique (N-grams), similar to the product from Cui's presentation on the BioMedicine data extractor, to group multi-word nouns together and associate them with their corresponding verbs. Every time a new lexical entry is introduced to L, the OntoEngineer must decide whether to include the entry in an existing concept domain or to introduce a new one.
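
The paper's actual N-gram statistics are not reproduced on the slides. The following toy Python sketch only shows the general idea of promoting frequent adjacent word pairs to candidate multi-word lexical entries, run over an invented mini-corpus.

```python
# Toy sketch of frequency-based multi-word term extraction: bigrams of
# adjacent words that co-occur often become candidate lexical entries.
# The paper's real n-gram statistics and corpus are not reproduced here.
from collections import Counter
import re

def candidate_terms(corpus: list[str], min_count: int = 2) -> list[tuple[str, int]]:
    bigrams = Counter()
    for doc in corpus:
        tokens = re.findall(r"[a-zäöüß]+", doc.lower())
        bigrams.update(zip(tokens, tokens[1:]))
    return [(" ".join(bg), n) for bg, n in bigrams.most_common() if n >= min_count]

corpus = [
    "The youth hostel near the station is cheap.",
    "A youth hostel offers shared rooms.",
    "The station hotel is expensive.",
]
print(candidate_terms(corpus))   # [('youth hostel', 2), ...]
```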

20 Hierarchical Concept Clustering A useful way of creating a taxonomic classification of concepts. Done automatically: Text-to-Onto clusters concepts by adjacency of terms and syntactical relationships. Done by a cooperative machine learning system, ASIUM, presented by Faure & Nedellec, which uses the verb-to-noun and noun-to-verb association method. "Thus, they cooperatively extend the lexicon, the set of concepts, and the concept heterarchy (L, C, HC)."
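
The sketch below is a rough, invented illustration of the underlying idea of grouping nouns by the verbs they co-occur with; it is not the ASIUM algorithm of Faure & Nedellec, and the verb/noun contexts are made up.

```python
# Toy sketch in the spirit of verb/noun-context clustering: nouns that occur
# as arguments of similar verbs are grouped together. This is only a rough
# illustration, not the cooperative algorithm of Faure & Nedellec.

# Assumed input: noun -> set of verbs it was observed with (made-up data).
contexts = {
    "hotel":  {"book", "stay_in", "rate"},
    "hostel": {"book", "stay_in"},
    "train":  {"board", "take"},
    "bus":    {"board", "take", "rate"},
}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

def cluster(contexts: dict, threshold: float = 0.4) -> list[set]:
    """Greedy single-pass clustering: merge a noun into the first cluster
    whose accumulated verb context is similar enough to the noun's own."""
    clusters: list[tuple[set, set]] = []   # (members, union of their verbs)
    for noun, verbs in contexts.items():
        for members, cluster_verbs in clusters:
            if jaccard(verbs, cluster_verbs) >= threshold:
                members.add(noun)
                cluster_verbs |= verbs
                break
        else:
            clusters.append(({noun}, set(verbs)))
    return [members for members, _ in clusters]

print(cluster(contexts))   # e.g. [{'hotel', 'hostel'}, {'train', 'bus'}]
```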

21 Dictionary Parsing This is really only one step further than what we are doing with the lexicons in our own Ontology Editor. The identified dictionary words are used with the concept-clustered verb and noun associations to infer relationships between lexical entries.
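
As a hypothetical illustration of the idea (not the parser or dictionary used with Text-to-Onto), the following Python sketch pulls is-a relations out of dictionary-style definition sentences with a single hand-written pattern.

```python
# Toy sketch of dictionary parsing: pull taxonomic (is-a) relations out of
# definition-style sentences with one pattern. The actual dictionary and
# rules used with Text-to-Onto are not reproduced here.
import re

DEFINITION = re.compile(
    r"^(?:an?|the)?\s*([\w ]+?) is (?:an?|the) ([\w ]+?)(?: that| which|\.|$)",
    re.IGNORECASE)

def parse_definitions(entries: list[str]) -> list[tuple[str, str]]:
    relations = []
    for entry in entries:
        m = DEFINITION.search(entry)
        if m:
            relations.append((m.group(1).strip().lower(), m.group(2).strip().lower()))
    return relations

entries = [
    "A youth hostel is an inexpensive accommodation that offers shared rooms.",
    "A hotel is an establishment which provides lodging.",
]
print(parse_definitions(entries))
# [('youth hostel', 'inexpensive accommodation'), ('hotel', 'establishment')]
```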

22 Association Rules These algorithms are usually used for data mining. Works by using the taxonomy heterarchy to generalize the lexical entries and thereby draw conclusions about their use. –"Snacks are purchased together with drinks" instead of "Lay's chips are purchased with Sprite."
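
The following toy Python sketch illustrates only the generalization step behind the slide's example: transaction items are lifted to their parent concepts via the taxonomy before co-occurrence support is counted. The data, taxonomy, and numbers are invented, and this is not the mining algorithm used in Text-to-Onto.

```python
# Toy sketch of taxonomy-aware association counting, mirroring the slide's
# example: specific products are generalized to parent concepts before
# co-occurrence support is computed. All data here are made up.
from itertools import combinations
from collections import Counter

taxonomy = {          # child item -> parent concept
    "lays_chips": "snack", "pretzels": "snack",
    "sprite": "drink", "cola": "drink",
}

transactions = [
    {"lays_chips", "sprite"},
    {"pretzels", "cola"},
    {"lays_chips", "cola"},
    {"pretzels"},
]

def generalized_pair_support(transactions, taxonomy):
    """Support of concept pairs after lifting every item to its parent."""
    counts = Counter()
    for items in transactions:
        concepts = {taxonomy.get(item, item) for item in items}
        counts.update(frozenset(p) for p in combinations(sorted(concepts), 2))
    n = len(transactions)
    return {tuple(sorted(pair)): round(c / n, 2) for pair, c in counts.items()}

print(generalized_pair_support(transactions, taxonomy))
# {('drink', 'snack'): 0.75}  -- "snacks are purchased together with drinks"
```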

23 Example Output from Text-to-Onto

24 Completeness vs. Scarcity (Pruning) Pruning the Ontology: "It is a widely held belief that targeting completeness for the domain model on the one hand appears to be practically unmanageable and computationally intractable, and targeting the scarcest model on the other hand is overly limiting with regard to expressiveness. Hence, what we strive for is the balance between these two, which is really working." –Staab and Maedche

25 Import and ReUse, as well as the different Extraction methods we've discussed, all tend to introduce unfocused elements into the ontology, as more general rules satisfy the conditional statements much more often. Pruning is the art of diminishing the ontology to more specific rules. –First, evaluate how removal of an item from C (the set of concepts) will affect the rest of the ontology (Petersen [9]: no dangling or broken links). –Second, based on absolute or relative counts of frequency, determine which ontology items are to be kept or pruned (Kietz [13]).
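
As a rough sketch of the second criterion (frequency-based pruning) combined with the first (no dangling links), here is a toy Python pruner that drops rarely mentioned concepts and re-attaches their children to the nearest surviving ancestor; the counts and threshold are invented, and this is not the measure from [9] or [13].

```python
# Toy sketch of frequency-based pruning: concepts mentioned too rarely in the
# domain corpus are removed, and their children are re-attached to the nearest
# surviving ancestor so the taxonomy keeps no dangling links. Counts and the
# threshold are illustrative only.

def prune(parent_of: dict, frequency: dict, root: str, min_freq: int) -> dict:
    """Return a new child->parent map without concepts below min_freq."""
    pruned = {}
    def surviving_ancestor(c: str) -> str:
        while c != root and frequency.get(c, 0) < min_freq:
            c = parent_of[c]
        return c
    for child, parent in parent_of.items():
        if child == root or frequency.get(child, 0) >= min_freq:
            pruned[child] = surviving_ancestor(parent)
    return pruned

parent_of = {"accommodation": "root", "hotel": "accommodation",
             "capsule_hotel": "hotel", "city": "root"}
frequency = {"accommodation": 40, "hotel": 25, "capsule_hotel": 1, "city": 30}

print(prune(parent_of, frequency, root="root", min_freq=5))
# {'accommodation': 'root', 'hotel': 'accommodation', 'city': 'root'}
```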

26 Refine Hahn and Schnattinger: an incremental approach to updating an ontology "centered around linguistic and conceptual "quality" of various forms of evidence [i.e. conflicting and analogous semantic structures] underlying the generation and refinement of concept hypotheses."

27 Conclusions Ontology learning gives significant leverage to the Semantic Web. –Propels the propagation of ontologies. A multi-disciplinary approach to the problem.

28 Further Challenges in Learning XML namespace mechanisms will turn the web into an "amoeba-like" structure, with ontologies supporting and referring to each other (ReUse and Import). It is not yet clear what the semantic result of this evolution will be. This examination has been restricted almost entirely to RDF-Schema; additional layers of RDF (a future OIL or DAML-ONT) will require new means for improved ontology engineering.

29 Questions?

