
1 XML on Semantic Web

2 Outline: The Semantic Web, Ontology, XML, Probabilistic DTD, References

3 The Semantic Web (1/4)
– The first-generation Web
– The second-generation Web: the current Web
– The third-generation Web: the Semantic Web, i.e. the conceptual structuring of the Web in an explicit, machine-readable way
Requirements: universal expressive power, support for syntactic interoperability, support for semantic interoperability

4 The Semantic Web (2/4)
Syntactic interoperability is about parsing the data; semantic interoperability means defining mappings between unknown terms and known terms in the data.
Semantic interoperability requires standards for both the syntactic form of documents and their semantic content.
A further representation and inference layer, an ontology, is needed on top of the currently available layers of the WWW.
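A minimal sketch of what "defining mappings between unknown terms and known terms" can look like in code. The vocabularies `KNOWN_TERMS` and `TERM_MAPPING` and the function `translate_terms` are hypothetical illustrations; real systems would express such mappings in an ontology language rather than a hard-coded dictionary.

```python
# Hypothetical vocabularies: the term names and the mapping table are
# illustrative assumptions, not taken from any standard ontology.
KNOWN_TERMS = {"name", "worksFor"}

# Mapping from terms used by an unfamiliar data source to known terms.
TERM_MAPPING = {"fullName": "name", "employedBy": "worksFor"}


def translate_terms(record: dict) -> dict:
    """Rewrite a record's keys from unknown terms into known terms.

    Known keys pass through unchanged; a key with no mapping raises
    KeyError, because its meaning cannot be recovered.
    """
    translated = {}
    for term, value in record.items():
        if term in KNOWN_TERMS:
            translated[term] = value
        elif term in TERM_MAPPING:
            translated[TERM_MAPPING[term]] = value
        else:
            raise KeyError(f"no semantic mapping for term {term!r}")
    return translated


print(translate_terms({"fullName": "Alice", "employedBy": "ACME"}))
# {'name': 'Alice', 'worksFor': 'ACME'}
```

Parsing the record (syntax) is trivial here; all the semantic work sits in the mapping table.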

5 The Semantic Web (3/4)

6 The Semantic Web (4/4)

7 Ontology (1/5)
An ontology is an explicit, machine-readable specification of a shared conceptualization.
Crucial roles:
– representing a shared conceptualization of a particular domain
– being reusable
– finding pages that contain syntactically different but semantically similar words
Constructs: concepts (usually organized into taxonomies), relations, functions, axioms, instances
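To make the list of constructs concrete, here is a small in-memory container, assuming a deliberately simple representation. The class `Ontology` and its fields are hypothetical and do not reproduce the data model of any ontology language discussed in these slides; functions and axioms are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Container for some of the ontology constructs named on the slide.

    The field layout is an illustrative assumption, not the data model of
    XOL, RDF(S), OIL or any other ontology language.
    """

    concepts: set = field(default_factory=set)
    # Taxonomy: concept -> its direct parent concepts (several parents allowed).
    taxonomy: dict = field(default_factory=dict)
    # Binary relations: relation name -> set of (subject, object) pairs.
    relations: dict = field(default_factory=dict)
    # Instances: individual -> the concept it is attached to.
    instances: dict = field(default_factory=dict)

    def add_concept(self, name: str, *parents: str) -> None:
        unknown = [p for p in parents if p not in self.concepts]
        if unknown:
            raise ValueError(f"unknown parent concepts: {unknown}")
        self.concepts.add(name)
        self.taxonomy.setdefault(name, set()).update(parents)

    def add_instance(self, individual: str, concept: str) -> None:
        if concept not in self.concepts:
            raise ValueError(f"unknown concept {concept!r}")
        self.instances[individual] = concept

    def relate(self, relation: str, subject: str, obj: str) -> None:
        self.relations.setdefault(relation, set()).add((subject, obj))


# Hypothetical example domain.
onto = Ontology()
onto.add_concept("Person")
onto.add_concept("Student", "Person")
onto.add_instance("alice", "Student")
onto.add_instance("bob", "Person")
onto.relate("knows", "alice", "bob")
```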

8 Ontology (2/5)

9 Ontology (3/5)
Concepts:
– Anything about which something is said
– Also known as classes (XOL, RDF(S), OIL, DAML+OIL), objects (OML), or categories (SHOE)
Taxonomies:
– Organize ontological knowledge through generalization and specialization relationships, to which single and multiple inheritance can be applied
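A sketch of how a taxonomy with multiple inheritance can be queried, assuming it is stored as a mapping from each concept to its direct parents. The concept names, the attribute table and the helpers `ancestors` and `inherited_attributes` are illustrative only.

```python
def ancestors(taxonomy: dict, concept: str) -> set:
    """Every concept that `concept` specializes, directly or transitively."""
    seen = set()
    stack = list(taxonomy.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(taxonomy.get(parent, ()))
    return seen


def inherited_attributes(taxonomy: dict, attributes: dict, concept: str) -> set:
    """Attributes declared on the concept itself or on any of its ancestors."""
    result = set(attributes.get(concept, ()))
    for parent in ancestors(taxonomy, concept):
        result |= set(attributes.get(parent, ()))
    return result


# Hypothetical taxonomy: concept -> direct parents.
TAXONOMY = {
    "Person": set(),
    "Employee": {"Person"},
    "Student": {"Person"},
    "TeachingAssistant": {"Employee", "Student"},  # multiple inheritance
}
ATTRIBUTES = {"Person": {"name"}, "Employee": {"salary"}, "Student": {"studentId"}}
```

Because `TeachingAssistant` specializes both `Employee` and `Student`, it inherits attributes along both paths, while `Person` is visited only once.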

10 Ontology (4/5)
Relations and functions:
– Represent interactions between concepts of the domain, and attributes
– Called relations in SHOE and OML, and roles in OIL
– Functions are a special kind of relation in which the n-th element is uniquely determined by the preceding n-1 elements
Axioms:
– Used to constrain information, verify correctness, and deduce new information
– Also known as assertions (OML), rules, or logic
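The following sketch illustrates two of these ideas under simple assumptions: a relation stored as a set of tuples counts as a function when the last element of each tuple is determined by the preceding ones, and an axiom is modeled as a transitivity rule that deduces new facts. The fact sets `WORKS_FOR` and `PART_OF` are hypothetical examples.

```python
def is_function(tuples: set) -> bool:
    """True if the last element of every tuple is fixed by the preceding ones."""
    value_for = {}
    for t in tuples:
        key, value = t[:-1], t[-1]
        if value_for.setdefault(key, value) != value:
            return False
    return True


def apply_transitivity(pairs: set) -> set:
    """Deduce new facts with the axiom R(x, y) and R(y, z) implies R(x, z)."""
    closure = set(pairs)
    while True:
        derived = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        if derived <= closure:
            return closure
        closure |= derived


# Hypothetical facts: worksFor behaves as a function, partOf is transitive.
WORKS_FOR = {("alice", "ACME"), ("bob", "ACME")}
PART_OF = {("wheel", "car"), ("car", "fleet")}
```

Adding `("alice", "Initech")` to `WORKS_FOR` would make it a plain relation rather than a function, and `apply_transitivity(PART_OF)` adds the deduced fact `("wheel", "fleet")`.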

11 Ontology (5/5)
Instances:
– Represent elements of the domain attached to a specific concept
Languages compared by expressiveness:
– XOL, RDF(S), SHOE, OML, OIL, DAML+OIL

12 XML (1/7)
Uses of XML:
– As a serialization syntax for other markup languages, e.g. SMIL, XOL, SHOE
– As semantic markup of Web pages
– As a uniform data-exchange format
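A small illustration of XML as a serialization syntax and exchange format, using Python's standard `xml.etree.ElementTree`. The `<ontology>`, `<concept>` and `<subClassOf>` element names are invented for this example and are not the syntax of XOL, SHOE or any other language named above; `serialize_taxonomy` and `parse_taxonomy` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def serialize_taxonomy(taxonomy: dict) -> str:
    """Serialize concept -> parent concepts as XML text.

    The element vocabulary is hypothetical; it is not the element set of
    XOL, SHOE or any other real language.
    """
    root = ET.Element("ontology")
    for name, parents in taxonomy.items():
        concept = ET.SubElement(root, "concept", name=name)
        for parent in parents:
            ET.SubElement(concept, "subClassOf", ref=parent)
    return ET.tostring(root, encoding="unicode")


def parse_taxonomy(xml_text: str) -> dict:
    """Read the same format back, as a receiving application would."""
    root = ET.fromstring(xml_text)
    return {
        c.get("name"): [p.get("ref") for p in c.findall("subClassOf")]
        for c in root.findall("concept")
    }


doc = serialize_taxonomy({"Person": [], "Student": ["Person"]})
print(doc)
```

The round trip works only because sender and receiver agree on the element names in advance.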

13 XML (2/7)
– Universal expressive power: anything can be encoded in XML if a grammar can be defined for it
– Syntactic interoperability: an XML parser can parse any XML data and is usually a reusable component
– Semantic interoperability: there is no way to recognize a semantic unit from a particular domain of interest (a limitation not yet widely recognized)
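A sketch of the syntactic/semantic split using `xml.etree.ElementTree`: one generic parser, wrapped here in a hypothetical `to_tree` helper, accepts both example documents, but nothing in its output tells a program that the two describe the same fact.

```python
import xml.etree.ElementTree as ET


def to_tree(xml_text: str) -> tuple:
    """Parse any well-formed XML into nested (tag, attributes, text, children).

    One reusable parser handles every vocabulary (syntactic interoperability),
    but the result carries no information about what a tag means.
    """

    def convert(elem):
        text = elem.text.strip() if elem.text and elem.text.strip() else None
        return (elem.tag, dict(elem.attrib), text, [convert(c) for c in elem])

    return convert(ET.fromstring(xml_text))


# Two hypothetical documents about the same fact, in different vocabularies.
print(to_tree('<employee name="Alice"><employer>ACME</employer></employee>'))
print(to_tree("<person><name>Alice</name><worksFor>ACME</worksFor></person>"))
```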

14 XML (3/7)

15 XML (4/7)
Data exchange:
– Build a model of the domain of interest
– From the domain model, construct a DTD or an XML Schema
Advantage: the parsing software components can be reused
However, there are multiple ways to encode a given domain model into a DTD, so the direct connection from the DTD to the domain model is lost and cannot easily be reconstructed
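To illustrate the "multiple ways to encode" point, the sketch below writes one hypothetical domain-model fact in two equally plausible XML encodings. The element names and both encoder functions are assumptions made for the example; the DTDs they would correspond to are not shown.

```python
import xml.etree.ElementTree as ET

# A hypothetical domain-model fact: a person and the organization they work for.
MODEL = {"person": "Alice", "organization": "ACME"}


def encode_as_attributes(model: dict) -> str:
    """Encoding A: one element, properties as attributes."""
    elem = ET.Element("employee", name=model["person"], employer=model["organization"])
    return ET.tostring(elem, encoding="unicode")


def encode_as_elements(model: dict) -> str:
    """Encoding B: nested elements, with the organization as its own element."""
    person = ET.Element("person")
    ET.SubElement(person, "name").text = model["person"]
    works_for = ET.SubElement(person, "worksFor")
    ET.SubElement(works_for, "company").text = model["organization"]
    return ET.tostring(person, encoding="unicode")


def tag_names(xml_text: str) -> set:
    """Every element name used in a document."""
    return {e.tag for e in ET.fromstring(xml_text).iter()}


doc_a, doc_b = encode_as_attributes(MODEL), encode_as_elements(MODEL)
# The two encodings share no element names, so nothing in the markup itself
# links them back to the common domain model.
print(tag_names(doc_a) & tag_names(doc_b))  # set()
```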

16 XML (5/7)

17 XML (6/7)
A direct mapping based on the different DTDs is not possible, so we have to define mappings between the different domain models and then between the different DTDs:
– Re-engineering the original domain model from the DTD or XML Schema
– Establishing mappings between the entities in the domain models
– Defining translation procedures for XML documents
Using a more suitable formalism than pure XML can save much of this additional effort.
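A compact sketch of the three steps for one pair of encodings, assuming encoding A stores properties as attributes of `<employee>` and encoding B nests them under `<person>`. The functions, the entity names and `ENTITY_MAPPING` are all hypothetical; in practice step 1 is the hard, largely manual part.

```python
import xml.etree.ElementTree as ET


# Step 1: re-engineer domain model A from a document in encoding A.
def model_from_a(xml_text: str) -> dict:
    elem = ET.fromstring(xml_text)
    return {"employee": elem.get("name"), "employer": elem.get("employer")}


# Step 2: hypothetical mapping between entities of domain model A and model B.
ENTITY_MAPPING = {"employee": "person", "employer": "organization"}


def map_model(model_a: dict) -> dict:
    return {ENTITY_MAPPING[entity]: value for entity, value in model_a.items()}


# Step 3: translation procedure that writes model B in encoding B.
def model_to_b(model_b: dict) -> str:
    person = ET.Element("person")
    ET.SubElement(person, "name").text = model_b["person"]
    works_for = ET.SubElement(person, "worksFor")
    ET.SubElement(works_for, "company").text = model_b["organization"]
    return ET.tostring(person, encoding="unicode")


def translate_a_to_b(xml_text: str) -> str:
    return model_to_b(map_model(model_from_a(xml_text)))


print(translate_a_to_b('<employee name="Alice" employer="ACME"/>'))
# <person><name>Alice</name><worksFor><company>ACME</company></worksFor></person>
```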

18 XML (7/7)

19 Probabilistic DTD (1/11)
A probabilistic DTD describes the most likely orderings of XML tags and contains statistical properties for each tag.
It is built with association rule discovery algorithms and sequence mining techniques.
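One simple way to obtain ordering rules of this kind is to count, over flat tag sequences, how often tag X immediately precedes tag Y. This is an assumed formulation for illustration, since the slides do not give the exact measure; documents are represented as lists of tag names, and `precedence_confidence` is a hypothetical helper.

```python
from collections import Counter


def precedence_confidence(docs: list) -> dict:
    """Confidence of each rule "Y present => X immediately before Y".

    Each document is given as its flat sequence of tags. The confidence of
    (X, Y) is the number of documents in which X directly precedes Y,
    divided by the number of documents that contain Y.
    """
    docs_with_tag = Counter()
    docs_with_pair = Counter()
    for tags in docs:
        docs_with_tag.update(set(tags))
        docs_with_pair.update(set(zip(tags, tags[1:])))
    return {(x, y): n / docs_with_tag[y] for (x, y), n in docs_with_pair.items()}


# Hypothetical tag sequences of three tagged documents.
DOCS = [
    ["title", "author", "body"],
    ["title", "body"],
    ["title", "author", "body"],
]
print(precedence_confidence(DOCS))
```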

20 Probabilistic DTD (2/11)
Objectives: tag all text documents and derive an appropriate preliminary flat XML DTD
– A knowledge discovery in textual databases (KDT) process builds clusters of semantically similar text units, after which new documents can be converted into XML documents

21 Probabilistic DTD (3/11)
UML schema: initially conceived by experts, it serves as a reference for the DTD, but there is no guarantee that the final DTD will be contained in, or contain, this schema
KDT process:
– Tagging the initial text documents
– Domain knowledge, such as a thesaurus and a preliminary UML schema, is input to the process
– Pre-processing
– Iterative clustering
– Post-processing
– Establishing a probabilistic DTD

22 Probabilistic DTD (4/11)

23 Probabilistic DTD (5/11)
Pre-processing:
– Setting the level of granularity
– NLP processing such as tokenization, normalization, and word stemming
– Building text unit descriptors, a reduced feature space (currently chosen by the engineer)
– Mapping all text units into Boolean vectors over this feature space
– Extracting named entities
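The pre-processing steps can be sketched as follows, assuming sentence-level text units, an engineer-supplied descriptor list, and a deliberately crude suffix-stripping stemmer in place of a real one such as the Porter stemmer. Named-entity extraction is left out, and all names are hypothetical.

```python
import re

# Hypothetical descriptors chosen by the engineer (the reduced feature space).
DESCRIPTOR_WORDS = ["patient", "treatment", "dose", "surgery"]


def split_text_units(document: str) -> list:
    """Granularity assumption: one text unit per sentence."""
    return [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]


def tokenize(text: str) -> list:
    """Tokenization plus normalization (lower-casing, letters only)."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token: str) -> str:
    """Crude suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# Descriptors are stemmed with the same function so that they match tokens.
DESCRIPTORS = [stem(w) for w in DESCRIPTOR_WORDS]


def to_boolean_vector(text_unit: str) -> list:
    """1 if the text unit contains the descriptor (after stemming), else 0."""
    stems = {stem(t) for t in tokenize(text_unit)}
    return [int(d in stems) for d in DESCRIPTORS]


doc = "The patients received two doses. Surgery was not needed."
print([to_boolean_vector(u) for u in split_text_units(doc)])
# [[1, 0, 1, 0], [0, 0, 0, 1]]
```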

24 Probabilistic DTD (6/11)
Clustering:
– Performed in multiple iterations; each iteration outputs a set of clusters
– All text unit vectors are clustered
– Clusters are partitioned into “acceptable” and “unacceptable” according to quality criteria
– Members of “unacceptable” clusters are the input data for the next iteration
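The slides do not name the clustering algorithm or the quality criteria, so the sketch below substitutes a simple leader clustering over Jaccard similarity and treats "at least `min_size` members" as the acceptability criterion. It only illustrates the iteration scheme: acceptable clusters are kept, and members of unacceptable ones are re-clustered with a looser threshold. All names and thresholds are assumptions.

```python
def jaccard(a: tuple, b: tuple) -> float:
    """Similarity of two Boolean vectors: shared 1s over positions with any 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def leader_clustering(vectors: list, threshold: float) -> list:
    """Single pass: add each vector to the first cluster whose first member
    (its leader) is similar enough, otherwise open a new cluster."""
    clusters = []
    for v in vectors:
        for cluster in clusters:
            if jaccard(cluster[0], v) >= threshold:
                cluster.append(v)
                break
        else:
            clusters.append([v])
    return clusters


def iterative_clustering(vectors: list, thresholds=(0.8, 0.5), min_size=2):
    """Each iteration clusters the remaining vectors; clusters with at least
    `min_size` members are acceptable, and members of unacceptable clusters
    are passed on to the next, looser iteration."""
    accepted, remaining = [], list(vectors)
    for threshold in thresholds:
        if not remaining:
            break
        unacceptable = []
        for cluster in leader_clustering(remaining, threshold):
            if len(cluster) >= min_size:
                accepted.append(cluster)
            else:
                unacceptable.append(cluster)
        remaining = [v for cluster in unacceptable for v in cluster]
    return accepted, remaining


VECTORS = [(1, 1, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (0, 0, 1, 0), (1, 0, 0, 0)]
accepted, leftover = iterative_clustering(VECTORS)
```

With these example vectors, the first pass accepts only the two identical vectors, the second pass groups `(0, 0, 1, 1)` with `(0, 0, 1, 0)`, and `(1, 0, 0, 0)` is left unassigned.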

25 Probabilistic DTD (7/11)
Post-processing:
– “Acceptable” clusters are semi-automatically assigned a label
– Ultimately, cluster labels are determined by the engineer
– All default cluster labels are derived from the text unit descriptors
– An XML DTD is derived automatically from the XML tags
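A sketch of the post-processing step, assuming clusters are lists of Boolean vectors over a known descriptor list. A default label is taken from the most frequent descriptor, the engineer's overrides take precedence, and a flat DTD is emitted in which the root element may contain any mix of the labelled elements. `default_label`, `flat_dtd`, the override table and the example labels are hypothetical.

```python
from collections import Counter


def default_label(cluster: list, descriptors: list) -> str:
    """Default label: the descriptor set most often among the cluster's vectors."""
    counts = Counter()
    for vector in cluster:
        counts.update(d for d, bit in zip(descriptors, vector) if bit)
    return counts.most_common(1)[0][0] if counts else "unlabeled"


def flat_dtd(root: str, labels: list) -> str:
    """A flat DTD: the root holds any mix of labelled elements, each with text."""
    lines = [f"<!ELEMENT {root} ({' | '.join(labels)})*>"]
    lines += [f"<!ELEMENT {label} (#PCDATA)>" for label in labels]
    return "\n".join(lines)


DESCRIPTORS = ["patient", "treatment", "dose", "surgery"]
CLUSTERS = [[(1, 0, 1, 0), (1, 0, 0, 0)], [(0, 0, 0, 1), (0, 1, 0, 1)]]
defaults = [default_label(c, DESCRIPTORS) for c in CLUSTERS]
# Hypothetical engineer decision: the engineer has the final word on labels.
ENGINEER_OVERRIDES = {"patient": "anamnesis"}
final_labels = [ENGINEER_OVERRIDES.get(label, label) for label in defaults]
print(flat_dtd("report", final_labels))
```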

26 Probabilistic DTD (8/11)

27 Probabilistic DTD (9/11)
Establishing a probabilistic DTD:
– Deriving the most likely ordering of the tags
– Computing the statistical properties of each tag inside the document type definition
Deriving the ordering of the tags:
– Backward construction of DTD sequences: builds “maximal” sequences
– Forward sequence construction
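For the statistical properties, a minimal sketch might compute, per tag, the fraction of documents containing it and its minimum and maximum number of occurrences, then map those figures to DTD occurrence indicators (`?`, `*`, `+`). Reading TagSupport as "fraction of documents containing the tag" is an assumption, as are the helper names and the example data.

```python
from collections import Counter


def tag_statistics(docs: list) -> dict:
    """Per-tag statistics over documents given as flat tag sequences.

    tag_support: fraction of documents containing the tag (an assumed
    definition of TagSupport); min/max: occurrences per document, counted
    only over documents that contain the tag.
    """
    per_doc = [Counter(tags) for tags in docs]
    stats = {}
    for tag in sorted({t for tags in docs for t in tags}):
        counts = [c[tag] for c in per_doc if tag in c]
        stats[tag] = {
            "tag_support": len(counts) / len(docs),
            "min": min(counts),
            "max": max(counts),
        }
    return stats


def cardinality(tag_stats: dict) -> str:
    """DTD occurrence indicator suggested by the statistics."""
    always_present = tag_stats["tag_support"] == 1.0
    repeats = tag_stats["max"] > 1
    if always_present:
        return "+" if repeats else ""
    return "*" if repeats else "?"


DOCS = [
    ["title", "author", "author", "body"],
    ["title", "body"],
    ["title", "author", "body"],
]
STATS = tag_statistics(DOCS)
print({tag: cardinality(s) for tag, s in STATS.items()})
# {'author': '*', 'body': '', 'title': ''}
```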

28 Probabilistic DTD (10/11)
Backward construction of DTD sequences:
– Start with an arbitrary tag τ and identify the tag most likely to appear before it
– If no such tag exists, move on to the next sequence; if there is one, start the next iteration; if there are k such tags, duplicate the incomplete sequence into k sequences, one per tag
– Each tag X_i leads to τ with a confidence C_i
– If one C_i is larger than all the others, then X_i is the predecessor of τ in the sequence
– If C_0, the confidence that τ has no predecessor, is the largest, then τ is the first element
– The confidence is the tag's TagSupport multiplied by the accuracy
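The backward construction can be sketched in Python as below. Several details are assumptions, not taken from the slides. Documents are flat tag sequences, and the predecessor of a tag is read at its first occurrence. Accuracy is the share of documents containing the tag in which a given predecessor appears directly before it, and the confidence multiplies that accuracy by the current tag's TagSupport. Tags already in a sequence are skipped so the construction cannot loop; function names and data are hypothetical.

```python
from collections import Counter


def predecessor_confidences(docs: list, tag: str) -> dict:
    """Confidence C_X that tag X directly precedes `tag`; key None holds C_0.

    Assumed: C_X = TagSupport(tag) * accuracy(X), where accuracy(X) is the
    share of documents containing `tag` in which X comes right before it.
    """
    counts = Counter()
    for tags in docs:
        if tag in tags:
            i = tags.index(tag)  # first occurrence of the tag
            counts[tags[i - 1] if i > 0 else None] += 1
    docs_with_tag = sum(counts.values())
    if docs_with_tag == 0:
        return {}
    tag_support = docs_with_tag / len(docs)
    return {x: tag_support * n / docs_with_tag for x, n in counts.items()}


def backward_sequences(docs: list, start: str) -> list:
    """Grow sequences leftwards from `start` until C_0 wins or no new
    predecessor is left; tied best predecessors each get their own copy."""
    finished, pending = [], [[start]]
    while pending:
        seq = pending.pop()
        conf = {x: c for x, c in predecessor_confidences(docs, seq[0]).items()
                if x not in seq}  # skip tags already used, avoiding cycles
        if not conf:
            finished.append(seq)
            continue
        best = max(conf.values())
        for x, c in conf.items():
            if c != best:
                continue
            if x is None:
                finished.append(seq)  # C_0 is largest: seq[0] comes first
            else:
                pending.append([x] + seq)  # one incomplete sequence per tied tag
    return finished


DOCS = [
    ["title", "author", "body"],
    ["title", "author", "body"],
    ["title", "body"],
]
print(backward_sequences(DOCS, "body"))  # [['title', 'author', 'body']]
```

Storing the no-predecessor confidence under the key `None` lets the C_0 case and the tie case fall out of the same loop.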

29 Probabilistic DTD (11/11)

30 References
– Stefan Decker, Frank van Harmelen, Jeen Broekstra, Michael Erdmann, Dieter Fensel, Ian Horrocks, Michel Klein, Sergey Melnik. The Semantic Web: On the Respective Roles of XML and RDF.
– Weihua Li. Intelligent Information Agent with Ontology on the Semantic Web.
– Asuncion Gomez-Perez, Oscar Corcho. Ontology Languages for the Semantic Web.
– Karsten Winkler, Myra Spiliopoulou. Extraction of Semantic XML DTDs from Texts Using Data Mining Techniques.

