Ontology Learning For the Semantic Web

Presentation transcript:

Ontology Learning For the Semantic Web

The Paper Itself
Based around two products, OntoEdit and Text-to-Onto.
A rather foundational approach to the problems surrounding information extraction.
Occasionally, some really weak sentence structure.
Problems with inconsistent example use.
A frustrating exercise in presenting questions and not the answers.

The Web (our semantic battleground)
The Web was created as a free-form information space.
Made for human comprehension, not machine understanding.
From experience with the web there seems to be some inherent aversion to correct speeling or gRamar.

Machine Semantics
“Computers, while originally designed to understand a series of electrical pulses, have had that same vocabulary expanded to be able to also evaluate letters, booleans, and numbers.”
–Brian Goodrich

Ontologies
Ontologies are metadata schemas.
– Controlled vocabulary of concepts
– Machine understandable semantics
– Define shared domain conceptualizations (e.g. website to website, people to machines, etc.)

The Assumption
“If every internet webpage had an associated perfect ontology that was just as accessible as the selfsame webpage, the creation of the semantic web would be only as far away as the creation of a browser that can find and interpret those ontologies and extract information based upon those models.”
–Brian Goodrich

The Knowledge Bottleneck
“…manual acquisition of ontologies still remains a tedious cumbersome task resulting in a knowledge acquisition bottleneck.”
–Steffen Staab

The Challenge
In overcoming this knowledge acquisition bottleneck the authors took a three-fold approach:
– Time (Can you develop an ontology fast?)
– Difficulty (Is it difficult to build an ontology?)
– Confidence (How do you know that you’ve got the ontology right?)

OntoEdit
OntoEdit supports the development and maintenance of ontologies using graphical means. It supports RDF-Schema, DAML-ONT, OIL and F-Logic.
Has many of the same features as our Ontology Editor.
– Cardinality Restrictions
– Keyword Associations
– Value Phrase Restrictions

Assaulting the walls of Jericho
Multi-disciplinary approach
– Machine learning (human assisted)
Five-phase approach:
– Import and re-use existing ontologies.
– Data Extraction uses machine learning to sculpt major sections of the target ontology.
– The target ontology is “pruned”.
– Refinement(?): automatically and incrementally maintained by evaluating the “quality” of proposals [Hahn & Schnattinger].
– Validation uses the prime target application as a measure of the ontology’s success.

Another Wonderful Graph
[Figure: Text-to-Onto learning cycle with the stages Import/ReUse, Extract, Prune, Refine, Validate]
Legacy data: reference to archaic “databasing” techniques

Components for Learning Ontologies by Staab and Maedche
– Management Component
– Resource Processing Component
– Algorithm Library
– GUI for Manual Engineering

Ontology Primitives
– a set of strings that describe lexical entries L for concepts and relations;
– a set of concepts C;
– a taxonomy of concepts with multiple inheritance (heterarchy) HC;
– a set of non-taxonomic relations R, described by their domain and range restrictions;
– a heterarchy of relations, i.e. a set of taxonomic relations HR;
– relations F and G that relate concepts and relations with their lexical entries, respectively;
– a set of axioms A that describe additional constraints on the ontology and allow implicit facts to be made explicit.
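One way to picture the primitives listed above is as a small data structure. The following is only a sketch: the class name `Ontology`, the field names, and the `ancestors` helper are illustrative assumptions, not the data model of OntoEdit or Text-to-Onto.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Illustrative container for the primitives; all names are assumptions."""

    lexicon: set[str] = field(default_factory=set)                       # L
    concepts: set[str] = field(default_factory=set)                      # C
    concept_parents: dict[str, set[str]] = field(default_factory=dict)   # HC
    relations: dict[str, tuple[str, str]] = field(default_factory=dict)  # R: name -> (domain, range)
    relation_parents: dict[str, set[str]] = field(default_factory=dict)  # HR
    concept_lexicon: dict[str, set[str]] = field(default_factory=dict)   # F: lexical entry -> concepts
    relation_lexicon: dict[str, set[str]] = field(default_factory=dict)  # G: lexical entry -> relations
    axioms: list[str] = field(default_factory=list)                      # A

    def ancestors(self, concept: str) -> set[str]:
        """Return every superconcept reachable through HC (multiple inheritance allowed)."""
        seen: set[str] = set()
        stack = list(self.concept_parents.get(concept, ()))
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(self.concept_parents.get(parent, ()))
        return seen
```

Storing HC as a map from a concept to a set of parents, rather than a single parent, is what makes it a heterarchy instead of a tree.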

Management Component
The OntoEngineer uses it to select the desired XML/HTML pages, document type definitions, databases, or pre-existing ontologies.
Selects methods for the Resource Processing Component and algorithms from the Algorithm Library Component.
Also includes a crawler that can find legacy data on the web relevant to the creation of the ontology (used as training data).

Resource Processing Component
HTML documents may be indexed and reduced to free text.
Semi-structured documents, like dictionaries, may be transformed into a predefined relational structure.
Semi-structured and structured schema data (like DTDs, structured database schemata, and existing ontologies) are handled following different strategies that may (or may not) be discussed later.
For processing free natural text our system accesses the natural language processing system SMES (Saarbrücken Message Extraction System), a shallow text processor for German.
SMES comprises a tokenizer based on regular expressions, a lexical analysis component including various word lexicons, a morphological analysis module, a named entity recognizer, a part-of-speech tagger and a chunk parser.
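The first and last steps above (reducing HTML to free text, then tokenizing with regular expressions) can be sketched with the Python standard library. This is not SMES, and the names `TextExtractor`, `html_to_text`, `tokenize`, and the token pattern are assumptions made for illustration only.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Reduce an HTML document to whitespace-normalized free text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Assumed token pattern: runs of word characters, optionally joined by hyphens.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*")


def tokenize(text):
    """Split free text into word tokens with a regular expression."""
    return TOKEN_RE.findall(text)
```

A real shallow processor would follow tokenization with lexical lookup, morphology, tagging and chunking; none of that is attempted here.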

Algorithm Library Component
This is the actual ontology builder and where we revisit our previous model:
– Import/ReUse
– Extraction
– Pruning
– Refining
And then almost introduce some actual algorithms these phases use.

Import/Reuse
Recovering Conceptualizations:
– First, schema structures are identified and imported separately. This may be done manually or using reverse engineering tools.
– Second, merging and aligning. This is a HUGE body of research that is largely ignored by this document: “While the general research issue concerning merging and aligning is still an open problem, recent proposals (e.g., [8]) have shown how to improve the manual process of merging/aligning.”

Extraction
– Lexical Entry & Concept Extraction
– Hierarchical Concept Clustering
– Dictionary Parsing
– Association Rules

Lexical Entry & Concept Extraction
Uses a statistical technique (N-grams), similar to the product from Cui’s presentation on the BioMedicine data extractor, to group multi-word nouns together and associate them with their corresponding verbs.
Every time a new lexical entry is introduced to L, the OntoEngineer must decide whether to include the entry in an existing concept domain or to introduce a new one.
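The N-gram idea can be sketched as plain frequency counting over token sequences. This covers only the grouping of multi-word candidates, not the verb association, and it is not the statistic used by Text-to-Onto; the function names and the `min_count` threshold are assumptions.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def candidate_terms(sentences, n=2, min_count=2):
    """Propose multi-word lexical entries: n-grams seen at least min_count times."""
    counts = Counter()
    for tokens in sentences:
        counts.update(ngram_counts([t.lower() for t in tokens], n))
    return [(" ".join(gram), c) for gram, c in counts.most_common() if c >= min_count]
```

Each proposed candidate would still go to the OntoEngineer, who decides whether it belongs to an existing concept or needs a new one.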

Hierarchical Concept Clustering
A useful way of creating a taxonomic classification of concepts.
Done automatically: Text-to-Onto clusters concepts by adjacency of terms and syntactical relationships.
Done by a cooperative machine learning system, ASIUM, presented by Faure & Nedellec. Uses the verb-to-noun and noun-to-verb association method.
“Thus, they cooperatively extend the lexicon, the set of concepts, and the concept heterarchy.” (L, C, HC)
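As a rough illustration of clustering nouns by the verbs they occur with, here is a small bottom-up (agglomerative) sketch using Jaccard overlap. It is not ASIUM's algorithm; the similarity measure, the `threshold`, and all names are assumptions chosen to keep the example short.

```python
def jaccard(a, b):
    """Overlap of two sets: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_concepts(contexts, threshold=0.5):
    """Greedily merge the two most similar clusters until none reaches the threshold.

    contexts maps each noun to the set of verbs it was observed with.
    Returns the merge history and the final clusters.
    """
    clusters = {frozenset([noun]): set(verbs) for noun, verbs in contexts.items()}
    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters, key=sorted)  # deterministic pair order
        score, a, b = max(
            ((jaccard(clusters[x], clusters[y]), x, y)
             for i, x in enumerate(keys) for y in keys[i + 1:]),
            key=lambda item: item[0],
        )
        if score < threshold:
            break
        # The merged cluster inherits the verb contexts of both members.
        clusters[a | b] = clusters.pop(a) | clusters.pop(b)
        merges.append((sorted(a), sorted(b), score))
    return merges, sorted(sorted(c) for c in clusters)
```

Each merge is a candidate superconcept; in a cooperative setting a human would accept, reject, or name it before it enters the heterarchy.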

Dictionary Parsing
This is really only one step further than what we are doing with the lexicons in our own Ontology Editor.
The identified dictionary words are used with the concept-clustered verb and noun associations to infer relationships between lexical entries.

Association Rules
These algorithms are usually used for data mining.
Works by using the taxonomy heterarchy to generalize the lexical entries and thereby draw conclusions about their use.
– “Snacks are purchased together with drinks” instead of “Lay’s chips are purchased with Sprite.”
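The snacks-and-drinks example can be worked through in a few lines: items are first lifted to their parent concept, then ordinary support and confidence are computed on the lifted data. This is a minimal sketch under assumed names and thresholds (`generalize`, `rules`, `min_support`, `min_confidence`), with a one-level taxonomy, not the algorithm used in the paper.

```python
from collections import Counter
from itertools import combinations


def generalize(transaction, parent):
    """Replace every item by its parent concept; items without a parent stay as-is."""
    return {parent.get(item, item) for item in transaction}


def rules(transactions, parent, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for pairs of generalized items."""
    lifted = [generalize(t, parent) for t in transactions]
    n = len(lifted)
    item_count = Counter(item for t in lifted for item in t)
    pair_count = Counter(pair for t in lifted for pair in combinations(sorted(t), 2))
    found = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                found.append((lhs, rhs, support, confidence))
    return sorted(found)
```

With an empty `parent` map the same function works at the level of individual products, where sparse data often leaves no pair above the support threshold; lifting to concepts is what lets a rule emerge.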

Example Output from Text-to-Onto

Completeness vs. Scarcity (Pruning)
Pruning the Ontology
– “It is a widely held belief that targeting completeness for the domain model on the one hand appears to be practically unmanageable and computationally intractable, and targeting the scarcest model on the other hand is overly limiting with regard to expressiveness. Hence, what we strive for is the balance between these two, which is really working.”
–Staab and Maedche

Import and ReUse, as well as the different Extraction methods we’ve discussed, all tend to introduce unfocused elements into the ontology, as more general rules satisfy the conditional statements much more often.
Pruning is the art of diminishing the ontology to more specific rules.
– First, evaluate how removing an item from C (the set of concepts) will affect the rest of the ontology. (Petersen [9]; no dangling or broken links)
– Second, based on absolute or relative frequency counts, determine which ontology items are to be kept and which pruned. (Kietz [13])
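The two pruning steps above can be sketched together: drop concepts whose absolute frequency in the domain corpus is too low, and re-attach the children of a dropped concept to its nearest surviving ancestors so that no link is left dangling. The function name, the `min_count` threshold, and the input layout are assumptions; the sketch also assumes the heterarchy is acyclic and that every concept appears as a key.

```python
def prune(concept_parents, domain_freq, min_count=2):
    """Keep concepts seen at least min_count times and repair their parent links.

    concept_parents maps every concept to the set of its direct parents
    (an empty set for roots). domain_freq maps concepts to corpus counts.
    """
    keep = {c for c in concept_parents if domain_freq.get(c, 0) >= min_count}

    def surviving_ancestors(concept):
        # Walk upward past pruned parents until a kept concept is reached.
        result = set()
        for parent in concept_parents.get(concept, ()):
            if parent in keep:
                result.add(parent)
            else:
                result |= surviving_ancestors(parent)
        return result

    return {c: surviving_ancestors(c) for c in sorted(keep)}
```

A relative variant would compare each concept's frequency in the domain corpus against its frequency in a generic corpus instead of using a fixed count.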

Refine
Hahn and Schnattinger
Incremental approach to updating an ontology “centered around linguistic and conceptual ‘quality’ of various forms of evidence [i.e. conflicting and analogous semantic structures] underlying the generation and refinement of concept hypothesis.”

Conclusions
Ontology learning provides significant leverage for the Semantic Web.
– Propels propagation of ontologies
Multi-disciplinary approach to the problem

Further Challenges in Learning
XML namespace mechanisms will turn the web into an “amoeba-like” structure, with ontologies supporting and referring to each other (ReUse and Import).
– Not clear yet what the semantic result of this evolution will be.
This examination has been restricted almost entirely to RDF-Schema. Additional layers on top of RDF (future OIL or DAML-ONT) will require new means for improved ontology engineering.

Questions?