Annotating Documents for the Semantic Web Using Data-Extraction Ontologies Dissertation Proposal Yihong Ding.

1 Annotating Documents for the Semantic Web Using Data-Extraction Ontologies Dissertation Proposal Yihong Ding

2 2 Motivation The representation of web content limits its usability A machine understandable web –Shared, explicit, formal conceptualizations (ontologies) –The semantic web

3 3 A Problem How can the current web be transformed into the semantic web?

4 4 A Solution: Semantic Annotation Add explicit, formal, and unambiguous metadata to web documents Explicit: publicly accessible Formal: publicly agreeable Unambiguous: publicly identifiable

5 5 Annotation Representation Explicit Annotation Implicit Annotation

6 6 Semantic Annotation Current Research Status Manual annotation through friendly interfaces [Annotea, etc.] Automatic annotation with ontology generation [SCORE] Automatic annotation using automated IE tool based on pre-defined ontologies [SemTag, MnM, etc.]

7 7 Current Automatic Annotator a typical paradigm Domain Ontology Non-ontology-based IE Wrapper Rules and extracting categories Document (1) Extraction (2) Alignment (3) Annotation

8 8 Current Automatic Annotator Problems Domain Ontology Document (1) Problem of data recognition (2) Problem of concept disambiguation (3) Problem of Annotation formatting, storing, indexing, sharing (4) Problem of Assembling ontologies Non-ontology-based IE Wrapper Rules and extracting categories

9 9 “Main Drawback of Using Automated IE” [Kiryakov04] “none of these approaches expects an input or produces output with respect to ontologies” “a set of heuristics for post-processing and mapping of the IE results to an ontology … not sufficient for large-scale, domain-independent semantic annotation.” “IE and wrapper induction techniques need to use the ontology more directly during the process of extraction.”

10 10 Ontology-driven Paradigm (Data-Extraction Ontology) for Semantic Annotation Document Non-ontology-based IE Wrapper Ontology-based IE Wrapper Document

11 11 Ontology-driven Paradigm for Semantic Annotation Some Arguments Resiliency w.r.t. web page layouts (helps scale to large sets of web pages) Adaptiveness w.r.t. domain specifications (helps scale to large domains) Creation of ontologies: still a problem but no longer a drawback Speed of execution: still a drawback (but we are going to propose a solution next)

12 12 Two-Layer Annotation Model Conceptual Annotator using an ontology-based IE tool Document Structural Annotator Sample Annotation Process Similar Documents Massive Annotation Process

13 13 Structural Annotator Major components –HTML hierarchical path that leads to concept locations –Local context around locations –Dependencies among multiple semantic categories Significance –Identify both categories and their semantic meanings
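The first component on this slide, the HTML hierarchical path that leads to a concept location, can be illustrated with a minimal sketch. The proposal does not say how paths are represented, so the slash-separated tag path, the `PathFinder` class, the `paths_to` helper, and the sample page below are all assumptions made for illustration only.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

class PathFinder(HTMLParser):
    """Record the tag path to every text node that contains `target`."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.stack = []   # currently open tags, outermost first
        self.paths = []   # one path string per matching text node

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the nearest matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.target in data:
            self.paths.append("/".join(self.stack))

def paths_to(html, target):
    finder = PathFinder(target)
    finder.feed(html)
    return finder.paths

# Hypothetical page fragment; the value was located by the conceptual annotator.
page = "<html><body><table><tr><td>Price</td><td>$8,500</td></tr></table></body></html>"
paths = paths_to(page, "$8,500")
```

A path like this could then be reused on similar, machine-generated pages without rerunning the ontology-based extraction, which is the role the two-layer model gives to the structural annotator. A practical version would likely also need sibling positions and the local context mentioned on the slide.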

14 14 Ontology Factors in Semantic Annotation Tasks Knowledge specification –Semantic web community –Web Ontology Language (OWL) Knowledge instantiation –IE and database community –Object-oriented System Model in XML (OSMX)

15 15 Ontology Conversion Similarities (OWL vs. OSMX) –Class vs. object set –ObjectProperty vs. relationship set –Cardinality restriction vs. participation constraint –subClassOf vs. is-a relationship Unique features –OWL: subPropertyOf, symmetric and transitive properties, namespace declaration, ontology importing –OSMX: arbitrary n-ary relationship sets, data frames, general constraints
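The correspondences on this slide can be written down as a small lookup table. This is only a sketch of the slide's content, not the converter itself: the dictionary, the two sets, and the `osmx_counterpart` and `unconvertible` helpers are hypothetical names introduced here.

```python
# Constructs the slide lists as similar (OWL name -> OSMX name).
OWL_TO_OSMX = {
    "Class": "object set",
    "ObjectProperty": "relationship set",
    "cardinality restriction": "participation constraint",
    "subClassOf": "is-a relationship",
}

# Features the slide lists as unique to one side.
OWL_ONLY = {
    "subPropertyOf",
    "symmetric property",
    "transitive property",
    "namespace declaration",
    "ontology importing",
}
OSMX_ONLY = {
    "n-ary relationship set",
    "data frame",
    "general constraint",
}

def osmx_counterpart(owl_construct):
    """Return the OSMX counterpart, or None when the slide lists none."""
    return OWL_TO_OSMX.get(owl_construct)

def unconvertible(owl_constructs):
    """Return the OWL constructs that have no OSMX counterpart on the slide."""
    return sorted(c for c in owl_constructs if c in OWL_ONLY)
```

For example, `unconvertible(["Class", "subPropertyOf"])` returns `["subPropertyOf"]`, which flags the kind of construct a converter would have to handle specially or drop.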

16 16 Ontology Construction An Unavoidable Problem Semantic annotation tasks require ontologies. The ontology for a specific semantic annotation task is not guaranteed to be available.

17 17 Ontology Construction General and Special Generally speaking –Until now, the mainstream approach has been manual construction –Automatic and semi-automatic ontology generation: many research papers, few or none practical, a very hard problem Special to semantic annotation purpose –Very dynamic and variant domains –Much overlapping information –Limited scope for one web page –Flat structure

18 18 Ontology Construction Knowledge Reusing “What has been will be again, what has been done will be done again; there is nothing new under the sun.” (The Holy Bible, Ecclesiastes, 1:9, NIV translation) A “new” ontology is a new assembly with unions and projections of several pre-existing ontologies.

19 19 Architecture on Dynamically Assembling Domain of Interest [Diagram labels: Web Page; Collection of Knowledge; (1) Knowledge-component selection; Selected Knowledge Components; (2) Ontology assembly; Assembled Ontology]
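The two steps named in this diagram, knowledge-component selection and ontology assembly, can be sketched in a few lines. Everything concrete here is an assumption: the proposal does not define how a knowledge component is stored, so the sketch treats each one as a hypothetical mapping from concept name to a recognizer regular expression, keeps only the concepts that fire on the page (a projection), and merges the results (a union).

```python
import re

def project(component, concepts):
    """Keep only the named concepts of one knowledge component."""
    return {name: pattern for name, pattern in component.items() if name in concepts}

def select_and_assemble(components, page_text):
    """Step (1): select concepts that match the page; step (2): union them."""
    assembled = {}
    for component in components:
        matched = {
            name for name, pattern in component.items()
            if re.search(pattern, page_text)
        }
        assembled.update(project(component, matched))
    return assembled

# Hypothetical knowledge components and page text, for illustration only.
vehicle = {"Year": r"\b(?:19|20)\d{2}\b", "Mileage": r"\b\d[\d,]* miles\b"}
commerce = {"Price": r"\$\d[\d,]*", "Phone": r"\b\d{3}-\d{4}\b"}
ontology = select_and_assemble([vehicle, commerce], "2003 Honda Civic, $8,500")
```

The assembled result contains only `Year` and `Price`, drawn from two different components. A real assembler would also have to carry over relationships and constraints between concepts, which this flat sketch ignores.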

20 20 Thesis Statement Propose a new solution to perform semantic annotation on normal HTML web pages, specifically 1. apply ontology-based automatic IE techniques 2. augment OWL with knowledge recognition extension 3. combine conceptual annotator and layout-based annotator 4. assemble a new domain ontology for an annotation task dynamically

21 21 Standard Evaluation Annotation performance –Precision –Recall –Speed of execution Test bed –5 ~ 10 different domains, with over 10 lexical concepts in each domain ontology –20 ~ 50 web pages on each domain
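The precision and recall measures listed on this slide can be computed as in the minimal sketch below. It assumes annotations are compared as exact (concept, value) pairs against a hand-labelled gold set; the function name and the sample annotations are hypothetical and do not come from the proposal.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of annotations."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    # Precision: share of produced annotations that are correct.
    precision = correct / len(predicted) if predicted else 0.0
    # Recall: share of gold annotations that were found.
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical annotations for a single page.
gold = {("Make", "Honda"), ("Year", "2003"), ("Price", "$8,500")}
predicted = {
    ("Make", "Honda"),
    ("Year", "2003"),
    ("Price", "$850"),    # wrong value
    ("Color", "red"),     # not in the gold set
}
p, r = precision_recall(predicted, gold)
```

Here two of the four produced annotations are correct, so precision is 0.5, and two of the three gold annotations are found, so recall is 2/3.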

22 22 Ontology Converter Test A complete and sound check is costly and difficult to implement. Our simple test –Start with an OSMX ontology A –Convert it to OWL and then transform it back into an OSMX ontology B –Process both A and B to annotate the same set of web pages (say 30 – 50 web pages) –Annotation results should be identical
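The round-trip test described on this slide can be sketched as a small harness. The converters and the annotator are passed in as callables because the proposal does not give their interfaces; the toy stand-ins below (a dictionary of regular expressions as the "ontology") are assumptions used only to make the sketch runnable.

```python
import re

def round_trip_agrees(ontology_a, pages, to_owl, to_osmx, annotate):
    """Convert A to OWL and back to B, then compare annotations page by page."""
    ontology_b = to_osmx(to_owl(ontology_a))
    return all(
        annotate(ontology_a, page) == annotate(ontology_b, page)
        for page in pages
    )

# Toy stand-ins (hypothetical): an ontology is a dict of concept -> regex.
def toy_to_owl(osmx):
    return sorted(osmx.items())

def toy_to_osmx(owl):
    return dict(owl)

def toy_annotate(ontology, page):
    return {concept: re.findall(pattern, page) for concept, pattern in ontology.items()}

ontology_a = {"Year": r"\b(?:19|20)\d{2}\b", "Price": r"\$\d[\d,]*"}
pages = ["2003 Honda Civic, $8,500", "1998 Ford, asking $2,100 or best offer"]
ok = round_trip_agrees(ontology_a, pages, toy_to_owl, toy_to_osmx, toy_annotate)
```

As the slide notes, identical annotation results are evidence that the conversion preserved what the annotator uses. They do not prove that the converter is complete and sound.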

23 23 Two-Layer Annotation Model Evaluation Standard evaluation In addition –About five large web sites with machine-generated web pages, each of which contains at least dozens of web pages

24 24 Dynamic Ontology Assembler Evaluation Regular precision and recall study according to selected knowledge components A pilot study on when the ontology assembler works better than manual ontology construction –Record the time to use a tool to create an ontology from scratch –Record the time to assemble the same ontology –Compare their differences and the special conditions for each case –Make empirical suggestions about how to build a knowledge base that favors ontology assembly

25 25 Delimitations Automatic ontology creation from scratch Annotation storing, indexing, and sharing mechanisms Semantic annotation for multimedia content Parallel or distributed computing to further scale the semantic annotation system to a large number of web pages

26 26 Contributions Converting current web pages into machine-understandable semantic web pages Producing a pure ontology-driven semantic annotator using an ontology-based IE wrapper Proposing a novel two-layer annotation model to do fast, accurate, and resilient annotation Studying a dynamic ontology assembler that helps maximize the reuse of existing knowledge and minimize the load of manual ontology creation Implementing an ontology converter so that this work is useful to the rest of the semantic web community.
