XML for Beginners
Sridevi, 2/25/2016
1. Basic XML Concepts
2. Defining XML Data Formats
3. Querying XML Data

XML is not…
– A replacement for HTML (but HTML can be generated from XML)
– A presentation format (but XML can be converted into one)
– A programming language (but it can be used with almost any language)
– A network transfer protocol (but XML may be transferred over a network)
– A database (but XML may be stored in a database)

But then, what is it?
– XML is a meta markup language for text documents / textual data.
– XML allows you to define languages ("applications") to represent text documents / textual data.

XML by Example
<article>
  <author>Gerhard Weikum</author>
  <title>The Web in 10 Years</title>
</article>
– Easy to understand for human users
– Very expressive (semantics along with the data)
– Well structured, easy to read and write from programs
This looks nice, but…

XML by Example (2)
… this is XML, too: the same data (Gerhard Weikum, The Web in 10 Years), marked up with tags that carry no meaning.
– Hard to understand for human users
– Not expressive (no semantics along with the data)
– Well structured, easy to read and write from programs

XML by Example (3)
… and what about this XML document:
ch37fhgks73j5mv9d63h5mgfkds8d984lgnsmcns983
– Impossible to understand for human users
– Not expressive (no semantics along with the data)
– Unstructured, read and write only with special programs
The actual benefit of using XML depends heavily on the design of the application.

Possible Advantages of Using XML
– Truly portable data
– Easily readable by human users
– Very expressive (semantics near data)
– Very flexible and customizable (no finite tag set)
– Easy to use from programs (libraries available)
– Easy to convert into other representations (XML transformation languages)
– Many additional standards and tools
– Widely used and supported

Application Scenario 1: Content Management
(Diagram: a database of XML documents; converters such as XML2HTML, XML2WML, and XML2PDF turn the documents into formats for different clients.)

Application Scenario 2: Data Exchange
(Diagram: a supplier and a buyer exchange an order; legacy systems (e.g., SAP R/2, Cobol) are connected through XML adapters, and the order travels as XML (BMECat, ebXML, RosettaNet, BizTalk, …).)

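Application Scenario 3: XML for Metadata
(Example: an RDF metadata record, <rdf:RDF …>, describing a document, with values such as "A Framework for…", "Ralf Schenkel", "While there are...", "Saarland University", "XML Indexing", "Copyright...", "Electronic Document", "text/pdf", and "en".)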

Application Scenario 4: Document Markup
(Example text, markup not preserved: "This article is about XML. Weikum shows the following theorem (see Section …) For any XML document x,... Weikum")

Application Scenario 4: Document Markup (continued)
Document markup adds structural and semantic information to documents, e.g.
– Sections, subsections, theorems, …
– Cross references
– Literature citations
– Index entries
– Named entities
This allows queries like
– Which articles cite Weikum's XML paper from 2001?
– Which articles talk about (the named entity) "Weikum"?
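The two queries above can be sketched with Python's standard xml.etree.ElementTree module, which supports a small subset of XPath. This is a minimal sketch, not part of the original slides: the corpus, its titles, and its element and attribute names (articles, person, cite with author, topic, and year) are hypothetical markup chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up corpus; all element and attribute names are assumptions.
CORPUS = """<articles>
  <article id="a1">
    <title>Indexing XML</title>
    <text>As <person>Weikum</person> argues, ...
      <cite author="Weikum" topic="XML" year="2001"/></text>
  </article>
  <article id="a2">
    <title>Query Languages</title>
    <text>See the survey <cite author="Smith" topic="SQL" year="1999"/>.</text>
  </article>
</articles>"""

root = ET.fromstring(CORPUS)

# Which articles cite Weikum's XML paper from 2001?
CITE_PATH = ".//cite[@author='Weikum'][@topic='XML'][@year='2001']"
citing = [a.get("id") for a in root.findall("article")
          if a.find(CITE_PATH) is not None]

# Which articles talk about the named entity "Weikum"?
mentioning = [a.get("id") for a in root.findall("article")
              if any(p.text == "Weikum" for p in a.iter("person"))]

print(citing)      # ['a1']
print(mentioning)  # ['a1']
```

Without the person and cite markup, both questions would require guessing from plain text; with it, each becomes a simple path expression.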

XML for Beginners, Part 2: Basic XML Concepts
2.1 XML Standards by the W3C
2.2 XML Documents
2.3 Namespaces

XML Standards – an Overview
XML Core Working Group:
– XML 1.0 (Feb 1998), XML 1.1 (Feb 2004)
– XML Namespaces (Jan 1999)
– XML Inclusions, XInclude (Dec 2004)
XSLT Working Group:
– XSL Transformations 1.0 (Nov 1999), 2.0 (Jan 2007)
– XPath 1.0 (Nov 1999), 2.0 (Jan 2007)
– eXtensible Stylesheet Language XSL(-FO) 1.0 (Oct 2001)
XML Linking Working Group:
– XLink 1.0 (Jun 2001)
– XPointer 1.0 (March 2003, 3 substandards)
XQuery 1.0 (Jan 2007) plus many substandards
XML Schema 1.0 (May 2001)
…

XML Documents
What's in an XML document?
– Elements
– Attributes
– plus some other details (see the lecture if you want to know more)

A Simple XML Document
<article>
  <author>Gerhard Weikum</author>
  <title>The Web in Ten Years</title>
  <text>
    <abstract>In order to evolve...</abstract>
    <section number="1" title="...">
      The <index>Web</index> provides the universal...
    </section>
  </text>
</article>
– Freely definable tags
– Start tag (e.g. <article>) and end tag (e.g. </article>)
– Element: start tag, content of the element (subelements and/or text), end tag
– Attributes with name and value

Elements in XML Documents
– (Freely definable) tags: article, title, author
  – with start tag: <article>, etc.
  – and end tag: </article>, etc.
– Elements: <article> ... </article>
– Elements have a name (article) and a content (...).
– Elements may be nested.
– Elements may be empty: <element_name/>
– Element content is typically parsed character data (PCDATA), i.e., strings with special characters, and/or nested elements (mixed content if both).
– Each XML document has exactly one root element and forms a tree.
– Elements with a common parent are ordered.
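A minimal sketch with Python's standard xml.etree.ElementTree module illustrating the rules above: one root element, ordered children, and empty elements. The markup of the example article is reconstructed from the slide text; the section's title value is not given on the slides, so it is kept as "...".

```python
import xml.etree.ElementTree as ET

DOC = """<article>
  <author>Gerhard Weikum</author>
  <title>The Web in Ten Years</title>
  <text>
    <abstract>In order to evolve...</abstract>
    <section number="1" title="...">
      The <index>Web</index> provides the universal...
    </section>
  </text>
</article>"""

root = ET.fromstring(DOC)                 # the single root element
child_tags = [child.tag for child in root]
print(root.tag, child_tags)               # article ['author', 'title', 'text']

# An empty element can be written as a single tag ending in "/>".
empty = ET.fromstring("<author/>")
print(len(empty), empty.text)             # 0 None

# Two top-level elements do not form a tree, so the parser rejects them.
try:
    ET.fromstring("<author>A</author><author>B</author>")
    two_roots_rejected = False
except ET.ParseError:
    two_roots_rejected = True
print(two_roots_rejected)                 # True
```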

Elements vs. Attributes
Elements may have attributes (in the start tag) that have a name and a value, e.g. <section number="1">.
What is the difference between elements and attributes?
– Only one attribute with a given name per element (but an arbitrary number of subelements)
– Attributes have no structure, they are simply strings (while elements can have subelements)
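Both differences can be observed with Python's standard xml.etree.ElementTree parser. A minimal sketch; the sample elements are illustrative only.

```python
import xml.etree.ElementTree as ET

# Attribute values are plain strings, even if they look like numbers.
section = ET.fromstring('<section number="1" title="...">text</section>')
number = section.get("number")
print(repr(number), type(number).__name__)   # '1' str

# Subelements with the same name may repeat ...
article = ET.fromstring("<article><author>A</author><author>B</author></article>")
author_count = len(article.findall("author"))
print(author_count)                           # 2

# ... but repeating an attribute name makes the document not well-formed.
try:
    ET.fromstring('<section number="1" number="2"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print(duplicate_rejected)                     # True
```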

Elements vs. Attributes (continued)
As a rule of thumb:
– Content into elements
– Metadata into attributes
Example: Alan Turing proved that…

XML Documents as Ordered Trees
(Figure: the example document drawn as an ordered tree)
article
  author: "Gerhard Weikum"
  title: "The Web in 10 years"
  text
    abstract: "In order …"
    section (number="1", title="…")
      "The"
      index: "Web"
      "provides …"
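The tree view can be produced by a short pre-order traversal. This is a sketch using Python's standard xml.etree.ElementTree; the helper tree_lines is hypothetical, and the document markup is reconstructed from the slide text. In ElementTree, text after a child element is stored as that child's tail, so it is printed as the next node under the parent.

```python
import xml.etree.ElementTree as ET

DOC = ('<article><author>Gerhard Weikum</author><title>The Web in 10 Years</title>'
       '<text><abstract>In order ...</abstract>'
       '<section number="1" title="..."> The <index>Web</index> provides ...</section>'
       '</text></article>')

def tree_lines(elem, depth=0):
    """Render an element and its descendants in document (pre-)order."""
    indent = "  " * depth
    attrs = " ".join(f'{name}="{value}"' for name, value in elem.attrib.items())
    lines = [indent + elem.tag + (f" ({attrs})" if attrs else "")]
    # Text before the first child belongs to elem itself.
    if elem.text and elem.text.strip():
        lines.append(indent + "  " + repr(elem.text.strip()))
    for child in elem:
        lines.extend(tree_lines(child, depth + 1))
        # Text after a child (its "tail") is the next sibling node under elem.
        if child.tail and child.tail.strip():
            lines.append(indent + "  " + repr(child.tail.strip()))
    return lines

lines = tree_lines(ET.fromstring(DOC))
print("\n".join(lines))
```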

More on XML Syntax
Some special characters must be escaped using entities (they are converted back when the XML document is read):
– < → &lt;
– & → &amp;
Some other characters may be escaped, too:
– > → &gt;
– " → &quot;
– ' → &apos;
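A minimal sketch of escaping and the round trip, using escape from Python's standard xml.sax.saxutils (which handles &, < and > by default and accepts extra replacements) together with xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if a < b & b > c then "done"'

# escape() replaces &, < and > by default; extra entities can be passed in.
escaped = escape(raw)
print(escaped)   # if a &lt; b &amp; b &gt; c then "done"
escaped_all = escape(raw, {'"': "&quot;", "'": "&apos;"})

# The parser converts entities back when reading the document.
round_trip = ET.fromstring(f"<p>{escaped_all}</p>").text
print(round_trip == raw)   # True

# An unescaped < in character data is not well-formed.
try:
    ET.fromstring(f"<p>{raw}</p>")
    unescaped_rejected = False
except ET.ParseError:
    unescaped_rejected = True
print(unescaped_rejected)  # True
```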

Well-Formed XML Documents
A well-formed document must adhere to, among others, the following rules:
– Every start tag has a matching end tag.
– Elements may nest, but must not overlap.
– There must be exactly one root element.
– Attribute values must be quoted.
– An element may not have two attributes with the same name.
– Comments and processing instructions may not appear inside tags.
– No unescaped < or & signs may occur inside character data.
Only well-formed documents can be processed by XML parsers.
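A minimal sketch of a well-formedness check, assuming Python's standard xml.etree.ElementTree (a non-validating parser that raises ParseError for documents that are not well-formed). The helper is_well_formed and the sample strings are hypothetical, one per rule above.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the (non-validating) parser accepts text as a document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# One sample per rule; all but the first break a rule.
CASES = {
    "well-formed": '<article><title>XML</title></article>',
    "missing end tag": '<article><title>XML</article>',
    "overlapping elements": '<b><i>text</b></i>',
    "two root elements": '<a/><b/>',
    "unquoted attribute": '<section number=1/>',
    "duplicate attribute": '<section number="1" number="2"/>',
    "comment inside tag": '<p <!-- note -->>text</p>',
    "unescaped &": '<p>Tom & Jerry</p>',
}

for name, text in CASES.items():
    print(f"{name:22} {is_well_formed(text)}")
```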

XML Namespaces
Example content (markup not preserved): "Library of the CS Department"; "Principles of Data Mining"; "Short introduction to data mining, useful for the IRDM course".
– Semantics of the description element is ambiguous
– Content may be defined differently
– Renaming may be impossible (standards!)
⇒ Disambiguation of separate XML applications using unique prefixes

Namespace Syntax
<prefix:element xmlns:prefix="URI">
– xmlns: signals that a namespace definition happens
– prefix: abbreviation of the URI
– URI: unique URI to identify the namespace

Namespace Example
(example markup not preserved)
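A sketch of how namespaces disambiguate two description elements, using Python's standard xml.etree.ElementTree. The namespace URIs (http://example.org/library, http://example.org/book) and the document structure are hypothetical; the text content is taken from the namespace slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; any URIs that are unique would do.
DOC = """<lib:library xmlns:lib="http://example.org/library"
             xmlns:bk="http://example.org/book">
  <lib:description>Library of the CS Department</lib:description>
  <bk:book>
    <bk:title>Principles of Data Mining</bk:title>
    <bk:description>Short introduction to data mining, useful for the IRDM course</bk:description>
  </bk:book>
</lib:library>"""

root = ET.fromstring(DOC)
# ElementTree expands prefixed names to "{namespace URI}local name".
print(root.tag)   # {http://example.org/library}library

# Only the URIs matter: the query may use different prefixes than the document.
ns = {"l": "http://example.org/library", "b": "http://example.org/book"}
library_desc = root.find("l:description", ns).text
book_desc = root.find("b:book/b:description", ns).text
print(library_desc)
print(book_desc)
```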

Default Namespace
– A default namespace may be set for an element and its content (but not its attributes): <element xmlns="URI">...</element>
– It can be overridden in the elements by specifying the namespace there (using a prefix or a default namespace).
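A minimal sketch showing that a default namespace applies to child elements, can be overridden, and does not apply to unprefixed attributes. It uses Python's standard xml.etree.ElementTree; the URIs and the id attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs and attribute name.
DOC = """<book xmlns="http://example.org/book" id="b1">
  <title>Principles of Data Mining</title>
  <review xmlns="http://example.org/review">Useful for the IRDM course</review>
</book>"""

root = ET.fromstring(DOC)
title, review = root[0], root[1]
print(root.tag)     # {http://example.org/book}book
print(title.tag)    # {http://example.org/book}title  (inherited default)
print(review.tag)   # {http://example.org/review}review  (overridden)
print(root.attrib)  # {'id': 'b1'}  (unprefixed attribute: no namespace)
```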

XML for Beginners, Part 3: Defining XML Data Formats
3.1 Document Type Definitions
3.2 XML Schema (very short)

Document Type Definitions
Sometimes XML is too flexible:
– Most programs can only process a subset of all possible XML applications.
– For exchanging data, the format (i.e., elements, attributes, and their semantics) must be fixed.

Document Type Definitions (DTD)
– Document Type Definitions (DTD) for establishing the vocabulary for one XML application
– (in some sense comparable to schemas in databases)
A document is valid with respect to a DTD if it conforms to the rules specified in that DTD. Most XML parsers can be configured to validate.

DTD Example: Elements
– <!ELEMENT title (#PCDATA)>: content of the title element is parsed character data.
– <!ELEMENT article (title,author+,text)>: content of the article element is a title element, followed by one or more author elements, followed by a text element.
– In the declaration of the text element, section* means that zero or more section elements may appear in this position.

Element Declarations in DTDs
One element declaration for each element type:
<!ELEMENT element_name content_specification>
where content_specification can be
– (#PCDATA): parsed character data
– (child): one child element
– (c1,…,cn): a sequence of child elements c1…cn
– (c1|…|cn): one of the elements c1…cn
For each component c, possible counts can be specified:
– c: exactly one such element
– c+: one or more
– c*: zero or more
– c?: zero or one
Plus arbitrary combinations using parentheses.
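Element-only content models are regular expressions over child element names: "," is concatenation, "|" is choice, and "*", "+", "?" are the usual counts. The sketch below translates such a model into a Python regular expression and checks a sequence of child names against it. The helpers content_model_to_regex and children_match are hypothetical; this is not a DTD validator, and it ignores #PCDATA, mixed content, EMPTY, and ANY (Python's standard library parsers do not validate against DTDs).

```python
import re

def content_model_to_regex(spec: str) -> re.Pattern:
    """Translate an element-only DTD content model into a regex over child names.

    Each child name is matched as "name " (name plus one space), so a child
    sequence like ["title", "author"] is encoded as "title author ".
    """
    parts = []
    for tok in re.findall(r"[^\s(),|*+?]+|[(),|*+?]", spec):
        if tok == "(":
            parts.append("(?:")
        elif tok == ")":
            parts.append(")")
        elif tok == ",":
            continue                      # sequence = concatenation
        elif tok in {"|", "*", "+", "?"}:
            parts.append(tok)             # choice and counts mean the same in regexes
        else:
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))

def children_match(spec: str, children: list[str]) -> bool:
    """True if the sequence of child element names fits the content model."""
    encoded = "".join(name + " " for name in children)
    return content_model_to_regex(spec).fullmatch(encoded) is not None

ARTICLE = "(title,author+,text)"
print(children_match(ARTICLE, ["title", "author", "author", "text"]))  # True
print(children_match(ARTICLE, ["title", "text"]))                      # False
print(children_match(ARTICLE, ["author", "title", "text"]))            # False
print(children_match("((a|b)*,c?)", []))                               # True
```

The trailing space after each name keeps names from matching prefixes of longer names (author vs. authors). Because a DTD does not allow "," and "|" to be mixed in one group, the regex precedence of "|" never causes surprises.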

More on Element Declarations
– Elements with mixed content: <!ELEMENT element_name (#PCDATA|c1|…|cn)*>
– Elements with empty content: <!ELEMENT element_name EMPTY>
– Elements with arbitrary content (not suitable for production-level DTDs): <!ELEMENT element_name ANY>

Attribute Declarations in DTDs
Attributes are declared per element:
<!ATTLIST section number CDATA #REQUIRED
                  title  CDATA #REQUIRED>
declares two required attributes for element section. Each declaration names the element (section), then for each attribute its name (number, title), its type (CDATA), and its default (#REQUIRED).
Possible attribute defaults:
– #REQUIRED: the attribute is required in each element instance
– #IMPLIED: the attribute is optional
– #FIXED "value": the attribute always has this default value
– "value": the attribute has this default value if it is omitted from the element instance

Attribute Types in DTDs
– CDATA: string data
– (A1|…|An): enumeration of all possible values of the attribute (each is an XML name)
– ID: unique XML name to identify the element
– IDREF: refers to the ID attribute of some other element ("intra-document link")
– IDREFS: list of IDREFs, separated by white space
– plus some more

Attribute Examples
<!ATTLIST publication type  (journal|inproceedings) #REQUIRED
                      pubid ID #REQUIRED>
<!ATTLIST citation ref IDREF #IMPLIED
                   cid ID #REQUIRED>
Example document text (markup not preserved): Gerhard Weikum ... In the Web of 2010, XML ... XML, the extended Markup Language,...
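Python's standard xml.etree.ElementTree does not validate, so the ID/IDREF rules implied by these declarations have to be checked by hand. A minimal sketch: the helper check_ids, the wrapping publications element, and the ID values are hypothetical; the attribute names come from the ATTLISTs above.

```python
import xml.etree.ElementTree as ET

# ID and IDREF attributes as declared in the ATTLISTs above.
ID_ATTRS = {"publication": "pubid", "citation": "cid"}
IDREF_ATTRS = {"citation": "ref"}

# Hypothetical document; the ID values are made up for illustration.
DOC = """<publications>
  <publication type="journal" pubid="weikum2001">
    <author>Gerhard Weikum</author>
    <text>In the Web of 2010, XML <citation cid="c1" ref="xml1998"/> ...</text>
  </publication>
  <publication type="inproceedings" pubid="xml1998">
    <text>XML, the extended Markup Language,...</text>
  </publication>
</publications>"""

def check_ids(root):
    """List missing or duplicate IDs and IDREFs that point to no ID."""
    problems, seen = [], set()
    for elem in root.iter():
        attr = ID_ATTRS.get(elem.tag)
        if attr is None:
            continue
        value = elem.get(attr)
        if value is None:
            problems.append(f"missing {attr} on <{elem.tag}>")   # declared #REQUIRED
        elif value in seen:
            problems.append(f"duplicate ID {value!r}")           # IDs are document-wide
        else:
            seen.add(value)
    for elem in root.iter():
        attr = IDREF_ATTRS.get(elem.tag)
        ref = elem.get(attr) if attr else None                   # ref is #IMPLIED
        if ref is not None and ref not in seen:
            problems.append(f"dangling IDREF {ref!r}")
    return problems

print(check_ids(ET.fromstring(DOC)))                                          # []
print(check_ids(ET.fromstring(DOC.replace('ref="xml1998"', 'ref="nope"'))))   # ["dangling IDREF 'nope'"]
```

Note that the check cannot tell whether ref points to a publication or to some other element with an ID: an IDREF does not constrain the element type of its target.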

Linking DTD and XML Docs
Document Type Declaration in the XML document:
<!DOCTYPE article SYSTEM "article.dtd">
– keywords: <!DOCTYPE and SYSTEM
– root element: article
– URI for the DTD: "article.dtd"

Linking DTD and XML Docs (continued)
Internal DTD: <!DOCTYPE article [ ... ]> ...
Both ways can be mixed; the internal DTD overrides external entity information:
<!DOCTYPE article SYSTEM "article.dtd" [
  <!ENTITY % pub_content "(title+,author*,text)">
]>
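A minimal sketch of an internal DTD with Python's standard xml.etree.ElementTree, which is built on the non-validating expat parser. The entity name uds and the affiliation element are hypothetical. Declarations in the internal subset are read (the general entity is expanded), but the element declaration is not enforced.

```python
import xml.etree.ElementTree as ET

# An internal DTD subset with a (hypothetical) general entity.
DOC = """<!DOCTYPE article [
  <!ELEMENT article (title)>
  <!ENTITY uds "Saarland University">
]>
<article><affiliation>&uds;</affiliation></article>"""

root = ET.fromstring(DOC)
# The entity declared in the internal subset is expanded while parsing ...
print(root.find("affiliation").text)   # Saarland University
# ... but the content model (title) is not enforced: the expat-based
# parser behind ElementTree checks well-formedness only, not validity.
print([child.tag for child in root])   # ['affiliation']
```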

Flaws of DTDs
– No support for basic data types like integers, doubles, dates, times, …
– No structured, self-definable data types
– No type derivation
– ID/IDREF links are quite loose (target is not specified)
⇒ XML Schema

3.2 XML Schema Basics
XML Schema is an XML-based alternative to DTDs. An XML schema describes the structure of an XML document. The XML Schema language is also referred to as XML Schema Definition (XSD).

What is an XML Schema?
The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD. An XML Schema:
– defines elements that can appear in a document
– defines attributes that can appear in a document
– defines which elements are child elements
– defines the order of child elements
– defines the number of child elements
– defines whether an element is empty or can include text
– defines data types for elements and attributes
– defines default and fixed values for elements and attributes

XML Schemas are the Successors of DTDs
We think that very soon XML Schemas will be used in most Web applications as a replacement for DTDs. Here are some reasons:
– XML Schemas are extensible to future additions
– XML Schemas are richer and more powerful than DTDs
– XML Schemas are written in XML
– XML Schemas support data types
– XML Schemas support namespaces

XML Schema
– XML Schema is an XML application
– Provides simple types (string, integer, dateTime, duration, language, …)
– Allows defining possible values for elements
– Allows defining types derived from existing types
– Allows defining complex types
– Allows posing constraints on the occurrence of elements
– Allows enforcing uniqueness and foreign keys
Way too complex to cover in an introductory talk

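Simplified XML Schema Example
<xs:element name="section" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
(minOccurs and maxOccurs are only allowed on local element declarations, i.e., inside xs:sequence, xs:choice, or xs:all, not on top-level ones.)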
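Python's standard library cannot validate against an XML Schema (third-party libraries such as lxml provide an XMLSchema class for that). The sketch below only reads minOccurs and maxOccurs from the declaration above to show what they mean: both default to 1, and "unbounded" means no upper limit. The enclosing text element and the helpers occurrence_bounds and count_allowed are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The declaration from the slide, placed inside a (hypothetical) complex type
# for a text element, where minOccurs/maxOccurs are allowed.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="text">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="section" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def occurrence_bounds(decl):
    """(min, max) of a local element declaration; max None means unbounded.
    Both attributes default to 1 when omitted."""
    low = int(decl.get("minOccurs", "1"))
    high = decl.get("maxOccurs", "1")
    return low, (None if high == "unbounded" else int(high))

decl = ET.fromstring(SCHEMA).find(f".//{XS}sequence/{XS}element[@name='section']")
low, high = occurrence_bounds(decl)

def count_allowed(n):
    """True if n occurrences satisfy the declared bounds."""
    return n >= low and (high is None or n <= high)

doc = ET.fromstring("<text><section>a</section><section>b</section></text>")
print(low, high, count_allowed(len(doc.findall("section"))))   # 0 None True
```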

Thank you