Lecture 20 XML

2 Objectives What semistructured data is. Concepts of the Object Exchange Model (OEM), a model for semistructured data. Basics of Lore, a semistructured DBMS, and its query language, Lorel. Main language elements of XML. Difference between well-formed and valid XML documents. How Document Type Definitions (DTDs) can be used to define valid syntax of an XML document.

3 Objectives How Document Object Model (DOM) compares with OEM. About other related XML technologies. Limitations of DTDs and how XML Schema overcomes these limitations. How RDF and RDF Schema provide a foundation for processing metadata. W3C XQuery Language. How to map XML to databases. SQL:2003 support for XML.

4 Introduction In 1998 XML 1.0 was formally ratified by W3C. Since then, XML has impacted every aspect of programming, including graphical interfaces, embedded systems, distributed systems, and database management. It is already becoming the de facto standard for data communication within the software industry, and is quickly replacing EDI systems as the primary medium for data interchange among businesses. Some analysts believe it will become the language in which most documents are created and stored, both on and off the Internet.

5 Semistructured Data Data that may be irregular or incomplete and have a structure that may change rapidly or unpredictably. Semistructured data is data that has some structure, but structure may not be rigid, regular, or complete. Generally, data does not conform to fixed schema (sometimes use terms schema-less or self-describing).

6 Semistructured Data Information normally associated with schema is contained within data itself. Some forms of semistructured data have no separate schema, in others it exists but only places loose constraints on data. Unfortunately, relational, object-oriented, and object-relational DBMSs do not handle data of this nature particularly well.

7 Semistructured Data Has gained importance recently for various reasons: may be desirable to treat Web sources like a database, but cannot constrain these sources with a schema; may be desirable to have a flexible format for data exchange between disparate databases; emergence of XML as standard for data representation and exchange on the Web, and similarity between XML documents and semistructured data.

8 Example 31.1

9 Note, data is not regular: for John White, hold first and last names, but for Ann Beech store single name and also store a salary; for property at 2 Manor Rd, store a monthly rent whereas for property at 18 Dale Rd, store an annual rent; for property at 2 Manor Rd, store property type (flat) as a string, whereas for property at 18 Dale Rd, store type (house) as an integer value.
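
The irregularity described on this slide can be sketched with plain Python dictionaries. This is only an illustration: the field names, the salary, the rent amounts, and the integer type code are made-up placeholders, not values taken from Example 31.1.

```python
# Illustrative records only: field names, salary, rents and the integer
# type code are placeholders invented for this sketch.
staff = [
    {"name": {"fName": "John", "lName": "White"}},  # first and last name held separately
    {"name": "Ann Beech", "salary": 12000},         # single name string, plus a salary
]
properties = [
    {"address": "2 Manor Rd", "type": "flat", "monthlyRent": 375},  # type as a string
    {"address": "18 Dale Rd", "type": 1, "annualRent": 7200},       # type as an integer
]


def field_names(records):
    """Return, for each record, the sorted list of fields it actually carries."""
    return [sorted(record) for record in records]


print(field_names(staff))
print(field_names(properties))
print([type(p["type"]).__name__ for p in properties])
```

No two records share the same set of fields or field types, which is the situation a fixed relational schema handles poorly.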

10 Example 31.1

11 XML (eXtensible Markup Language) A meta-language (a language for describing other languages) that enables designers to create their own customized tags to provide functionality not available with HTML. Most documents on Web currently stored and transmitted in HTML. One strength of HTML is its simplicity. Simplicity may also be one of its weaknesses, with users wanting tags to simplify some tasks and make HTML documents more attractive and dynamic.

12 XML To satisfy this demand, vendors introduced some browser-specific HTML tags, making it difficult to develop sophisticated, widely viewable Web documents. W3C has produced XML, which could preserve general application independence that makes HTML portable and powerful.

13 XML XML is a restricted version of SGML, designed especially for Web documents. SGML allows document to be logically separated into two: one that defines the structure of the document (DTD), other containing the text itself. By giving documents a separately defined structure, and by giving authors ability to define custom structures, SGML provides extremely powerful document management system. However, SGML has not been widely adopted due to its inherent complexity.

14 XML XML attempts to provide a similar function to SGML, but is less complex and, at same time, network-aware. XML retains key SGML advantages of extensibility, structure, and validation. Since XML is a restricted form of SGML, any fully compliant SGML system will be able to read XML documents (although the opposite is not true). XML is not intended as a replacement for SGML or HTML.

15 Advantages of XML Simplicity; open standard and platform/vendor-independent; extensibility; reuse; separation of content and presentation; improved load balancing.

16 Advantages of XML Support for integration of data from multiple sources; ability to describe data from a wide variety of applications; more advanced search engines; new opportunities.

17 XML

18 XML - Elements Elements, delimited by tags, are the most common form of markup. First element must be a root element, which can contain other (sub)elements. XML document must have exactly one root element (for example, <STAFFLIST>). Element begins with a start-tag (for example, <STAFF>) and ends with the matching end-tag (</STAFF>). XML elements are case sensitive. An element can be empty, in which case it can be abbreviated to a single empty-element tag of the form <NAME/>. Elements must be properly nested.
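
A minimal sketch of these rules, using Python's standard xml.etree.ElementTree parser. The document and its element names (NAME, REMARKS) are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one root element, nested subelements,
# and one empty element written in abbreviated form.
doc = """<STAFFLIST>
  <STAFF>
    <NAME>John White</NAME>
    <REMARKS/>
  </STAFF>
</STAFFLIST>"""

root = ET.fromstring(doc)
staff = root.find("STAFF")

root_tag = root.tag                           # the single root element
child_tags = [child.tag for child in staff]   # subelements of STAFF
empty_text = staff.find("REMARKS").text       # an empty element carries no text
lowercase_match = root.find("staff")          # names are case sensitive

print(root_tag, child_tags, empty_text, lowercase_match)
```

Looking up "staff" finds nothing because the element was declared as STAFF; XML treats the two names as different.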

19 XML - Attributes Attributes are name-value pairs that contain descriptive information about an element. Attribute is placed inside start-tag after corresponding element name, with the attribute value enclosed in quotes. The branch could equally have been represented as a subelement of STAFF. A given attribute may only occur once within a tag, while subelements with the same tag may be repeated.
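
The contrast between attributes and repeated subelements can be checked with the standard-library parser. This is a sketch only; the attribute name branchNo, its value, and the PHONE elements are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical STAFF element: one attribute, one repeated subelement.
staff = ET.fromstring(
    '<STAFF branchNo="B005">'
    "<STAFFNO>SL21</STAFFNO>"
    "<PHONE>555-0100</PHONE>"
    "<PHONE>555-0101</PHONE>"
    "</STAFF>"
)

branch = staff.get("branchNo")           # attribute value as a string
phones = [p.text for p in staff.findall("PHONE")]


def parses(text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


duplicate_attribute_ok = parses('<STAFF branchNo="B005" branchNo="B003"/>')
unquoted_value_ok = parses("<STAFF branchNo=B005/>")

print(branch, phones, duplicate_attribute_ok, unquoted_value_ok)
```

The parser rejects both the repeated attribute and the unquoted value, while the two PHONE subelements are accepted without complaint.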

20 XML – Other Sections XML declaration: optional at start of XML document. Entity references: serve various purposes, such as shortcuts to often repeated text or to distinguish reserved characters from content. Comments: enclosed in <!-- and --> delimiters. CDATA sections: instruct XML processor to ignore markup characters and pass enclosed text directly to application. Processing instructions: can also be used to provide information to application.
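
The following sketch puts each of these constructs into one small document and parses it with xml.etree.ElementTree. The element names, the processing-instruction target app-hint, and the text content are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document containing an XML declaration, a comment,
# a processing instruction, entity references and a CDATA section.
doc = b"""<?xml version="1.0" encoding="UTF-8"?>
<!-- a comment: dropped by the default tree builder -->
<?app-hint layout="compact"?>
<NOTES>
  <RULE>salary &lt; 30000 &amp; rent &gt; 0</RULE>
  <RAW><![CDATA[<b>not markup</b> & not an entity]]></RAW>
</NOTES>"""

root = ET.fromstring(doc)

rule_text = root.find("RULE").text   # entity references are expanded
raw_text = root.find("RAW").text     # CDATA content is passed through verbatim
child_count = len(root)              # comment and PI do not appear as children

print(rule_text)
print(raw_text)
print(child_count)
```

The application sees the same kind of plain text in both cases; entity references and CDATA sections are two ways of getting reserved characters past the parser.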

21 XML – Ordering Semistructured data model described earlier assumes collections are unordered. In XML, elements are ordered. In contrast, in XML attributes are unordered.
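
A short sketch of the difference, assuming two hypothetical STAFF elements that differ only in the order of their attributes and subelements.

```python
import xml.etree.ElementTree as ET

# Same attributes and subelements, written in a different order.
a = ET.fromstring(
    '<STAFF staffNo="SL21" branchNo="B005">'
    "<FNAME>John</FNAME><LNAME>White</LNAME></STAFF>"
)
b = ET.fromstring(
    '<STAFF branchNo="B005" staffNo="SL21">'
    "<LNAME>White</LNAME><FNAME>John</FNAME></STAFF>"
)

same_attributes = a.attrib == b.attrib   # attribute order carries no meaning
a_order = [child.tag for child in a]     # element order is preserved
b_order = [child.tag for child in b]

print(same_attributes, a_order, b_order)
```

The two attribute sets compare equal, but the child sequences do not, so an application that depends on element order would treat the two documents differently.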

22 Document Type Definitions (DTDs) Defines the valid syntax of an XML document. Lists element names that can occur in document, which elements can appear in combination with which other ones, how elements can be nested, what attributes are available for each element type, and so on. Term vocabulary sometimes used to refer to the elements used in a particular application. Grammar specified using EBNF, not XML. Although optional, DTD is recommended for document conformity.

23 Document Type Definitions (DTDs)

24 DTDs – Element Type Declarations Identify the rules for elements that can occur in the XML document. Options for repetition are: * indicates zero or more occurrences for an element; + indicates one or more occurrences for an element; ? indicates either zero occurrences or exactly one occurrence for an element. Name with no qualifying punctuation must occur exactly once. Commas between element names indicate they must occur in succession, in the order listed; a vertical bar (|) between names indicates a choice of one of them. XML DTDs provide no connector meaning "in any order".
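
Python's standard library does not validate against DTDs, so the sketch below only imitates the occurrence indicators by translating a comma-separated content model into a regular expression over the child element names. The declaration and its element names are hypothetical, and this is an analogy for how the indicators behave, not a DTD validator.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical declaration, as it might appear in a DTD:
#   <!ELEMENT STAFF (NAME, DOB?, PHONE*, EMAIL+)>
CONTENT_MODEL = [("NAME", ""), ("DOB", "?"), ("PHONE", "*"), ("EMAIL", "+")]


def model_to_regex(model):
    """Translate a sequence of (name, indicator) pairs into a compiled regex."""
    parts = [f"(?:{re.escape(name)} ){indicator}" for name, indicator in model]
    return re.compile("".join(parts))


def children_match(element, model):
    """Check the element's child tags, in document order, against the model."""
    child_sequence = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(child_sequence) is not None


ok = ET.fromstring("<STAFF><NAME/><PHONE/><PHONE/><EMAIL/></STAFF>")
no_email = ET.fromstring("<STAFF><NAME/></STAFF>")
wrong_order = ET.fromstring("<STAFF><EMAIL/><NAME/></STAFF>")
two_dobs = ET.fromstring("<STAFF><NAME/><DOB/><DOB/><EMAIL/></STAFF>")

for element in (ok, no_email, wrong_order, two_dobs):
    print(children_match(element, CONTENT_MODEL))
```

The DTD indicators *, + and ? happen to mean the same thing in regular expressions, which is why the translation is a direct one.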

25 DTDs – Attribute List Declarations Identify which elements may have attributes, what attributes they may have, what values attributes may hold, plus optional defaults. Some types: CDATA: character data, containing any text. ID: used to identify individual elements in document (an ID value must be a valid XML name and unique within the document). IDREF/IDREFS: must correspond to value of ID attribute(s) for some element in document. List of names: values that attribute can hold (enumerated type).

26 DTDs – Element Identity, IDs, IDREFs ID allows unique key to be associated with an element. IDREF allows an element to refer to another element with the designated key, and attribute type IDREFS allows an element to refer to multiple elements. To loosely model relationship Branch Has Staff:
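
The example markup for this slide did not survive, so what follows is a hedged sketch rather than the original. It assumes a hypothetical DTD fragment in which STAFF carries an ID attribute staffNo and BRANCH carries an IDREFS attribute staff, and it checks by hand the two properties a validating parser would enforce: IDs are unique and every reference resolves. The root element name and the values SL22 and B005 are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD fragment that the check below stands in for:
#   <!ATTLIST STAFF  staffNo ID     #REQUIRED>
#   <!ATTLIST BRANCH staff   IDREFS #IMPLIED>
doc = """<COMPANY>
  <STAFF staffNo="SL21"/>
  <STAFF staffNo="SL22"/>
  <BRANCH branchNo="B005" staff="SL21 SL22"/>
</COMPANY>"""


def check_references(root, id_attr="staffNo", idrefs_attr="staff"):
    """Return (duplicate IDs, dangling references) for the given attributes."""
    ids = [e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    known = set(ids)
    dangling = sorted(
        ref
        for e in root.iter()
        for ref in (e.get(idrefs_attr) or "").split()  # IDREFS is space-separated
        if ref not in known
    )
    return duplicates, dangling


print(check_references(ET.fromstring(doc)))
```

A real validating parser checks uniqueness across every ID-typed attribute in the document, not just one attribute name as this simplified check does.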

27 DTDs – Document Validity Two levels of document processing: well-formed and valid. Non-validating processor ensures an XML document is well-formed before passing information on to application. XML document that conforms to structural and notational rules of XML is considered well-formed; e.g.: if an XML declaration (<?xml version="1.0"?>) is present, it must come first in the document; all elements must be within one root element; elements must be nested in a tree structure without any overlap;

28 DTDs – Document Validity Validating processor will not only check that an XML document is well-formed but that it also conforms to a DTD, in which case XML document is considered valid.
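
Python's standard-library XML parsers are non-validating, so they illustrate the first level of processing only. The sketch below reports whether a few hypothetical fragments are well-formed; checking validity against a DTD would need a validating parser, which is not shown here.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(text):
    """Return None if text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Hypothetical fragments, one per well-formedness rule.
samples = {
    "nested": "<A><B></B></A>",
    "overlapping": "<A><B></A></B>",
    "two roots": "<A/><B/>",
    "case mismatch": "<A></a>",
}

results = {
    label: well_formedness_error(text) is None
    for label, text in samples.items()
}
print(results)
```

Only the properly nested fragment is accepted; the other three break the nesting, single-root, and case-sensitivity rules respectively.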

29 DOM and SAX XML APIs generally fall into two categories: tree-based and event-based. DOM (Document Object Model) is tree-based API that provides object-oriented view of data. API was created by W3C and describes a set of platform- and language-neutral interfaces that can represent any well-formed XML/HTML document. Builds in-memory representation of document and provides classes and methods to allow an application to navigate and process the tree.
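
As a rough illustration of the tree-based style, the sketch below uses xml.dom.minidom, a lightweight DOM implementation in Python's standard library. The document, the attribute branchNo, and the added POSITION element with its value are hypothetical.

```python
from xml.dom import minidom

# Hypothetical document, parsed into an in-memory tree.
doc = minidom.parseString(
    '<STAFFLIST><STAFF branchNo="B005"><STAFFNO>SL21</STAFFNO></STAFF></STAFFLIST>'
)

root = doc.documentElement
staff = root.getElementsByTagName("STAFF")[0]
staffno = staff.getElementsByTagName("STAFFNO")[0]

root_name = root.tagName
branch = staff.getAttribute("branchNo")
staff_number = staffno.firstChild.data     # the text node under STAFFNO
parent_name = staffno.parentNode.tagName   # navigate back up the tree

# Because the whole tree is in memory, it can also be modified in place.
position = doc.createElement("POSITION")
position.appendChild(doc.createTextNode("Manager"))
staff.appendChild(position)
serialized = staff.toxml()

print(root_name, branch, staff_number, parent_name)
print(serialized)
```

Navigation in either direction and in-place updates are what the in-memory tree buys, at the cost of holding the entire document in memory.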

30 Representation of Document as Tree-Structure

31 SAX (Simple API for XML) An event-based, serial-access API that uses callbacks to report parsing events to application. For example, there are events for start and end elements. Application handles these events through customized event handlers. Unlike tree-based APIs, event-based APIs do not build an in-memory tree representation of the XML document. API is product of collaboration on XML-DEV mailing list, rather than product of W3C.
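
A minimal sketch of the callback style, using Python's standard xml.sax module. The handler class name EventRecorder and the input document are invented for this example.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Record parsing events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Skip whitespace-only text; a parser may deliver text in several chunks.
        if content.strip():
            self.events.append(("text", content.strip()))


handler = EventRecorder()
xml.sax.parseString(b"<STAFF><STAFFNO>SL21</STAFFNO></STAFF>", handler)

for event in handler.events:
    print(event)
```

The handler sees each element exactly once as the parser streams past it, so any state the application needs has to be kept by the handler itself.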

32 Namespaces Allows element names and relationships in XML documents to be qualified to avoid name collisions for elements that have same name but defined in different vocabularies. Allows tags from multiple namespaces to be mixed - essential if data comes from multiple sources. For uniqueness, elements and attributes given globally unique names using URI reference.

33 Namespaces <STAFFLIST xmlns="…" xmlns:hq="…"> … SL21 … 30000 …
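
The sketch below shows how a default namespace and a prefixed namespace qualify element names when parsed with xml.etree.ElementTree. Both URIs are placeholders invented for this example, and the element layout (STAFF, STAFFNO, hq:SALARY) is an assumption rather than a reconstruction of the slide.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are placeholders, not the ones used on the slide.
BRANCH_NS = "http://example.com/branch"
HQ_NS = "http://example.com/hq"

doc = f"""<STAFFLIST xmlns="{BRANCH_NS}" xmlns:hq="{HQ_NS}">
  <STAFF>
    <STAFFNO>SL21</STAFFNO>
    <hq:SALARY>30000</hq:SALARY>
  </STAFF>
</STAFFLIST>"""

root = ET.fromstring(doc)

# Prefixes used in queries are local to this mapping; only the URIs matter.
ns = {"b": BRANCH_NS, "hq": HQ_NS}

root_tag = root.tag                               # name qualified by the default namespace
staffno = root.find("b:STAFF/b:STAFFNO", ns).text
salary = root.find("b:STAFF/hq:SALARY", ns).text
unqualified = root.find("STAFF")                  # no match without the namespace

print(root_tag, staffno, salary, unqualified)
```

Unprefixed elements inherit the default namespace, so STAFFNO and hq:SALARY end up with different qualified names even though they sit side by side.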