XML and QUERY. Shilpi Ahuja, CSE 591 - Data Mining, 4th April 2002.


What is XML? (Extensible Markup Language)
A markup language for structured documentation. A structural and semantic language, not a formatting language. Not just for Web pages.

HTML vs. XML
The same address in three forms. External presentation: Xaver Roe, Wikingerufer 7, 10555 Berlin. XML: Xaver Roe, Wikingerufer 7, Berlin. HTML: Xaver Roe, Wikingerufer 7, Berlin.
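To make the contrast concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. The slide's actual XML and HTML markup is not preserved, so the tag names below (address, name, street, postcode, city) are hypothetical. The point is that the XML version lets a program ask for the city by what it is, while the HTML version only offers layout tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these tag names are illustrative assumptions.
XML_VERSION = """
<address>
  <name>Xaver Roe</name>
  <street>Wikingerufer 7</street>
  <postcode>10555</postcode>
  <city>Berlin</city>
</address>
"""

# An HTML rendering of the same address only says how it should look.
HTML_VERSION = "<p><b>Xaver Roe</b><br/>Wikingerufer 7<br/>10555 Berlin</p>"

address = ET.fromstring(XML_VERSION)
# XML: a program can ask for a field by its meaning.
city = address.findtext("city")

paragraph = ET.fromstring(HTML_VERSION)
# HTML: the only tags available describe layout (bold, line break, paragraph).
layout_tags = sorted({el.tag for el in paragraph.iter()})

print(city)         # Berlin
print(layout_tags)  # ['b', 'br', 'p']
```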

Why "Extensible Markup Language"?
Language: it has a grammar, it has a vocabulary (sort of), and it can be parsed by machines.
Markup language: a mechanism to identify structures in a document. It says what things are, not what they do. It is not a programming language, and it is not compiled.
Extensible: you can add words to the language.

XML describes structure and semantics, not formatting
XML documents form a tree. Element and attribute names reflect the kind of content an element holds. Formatting can be added with a style sheet.
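Because an XML document is a tree, a program can walk it recursively. A small sketch with Python's ElementTree, assuming a hypothetical two-level newspaper document (the element names are chosen for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names are illustrative only.
doc = ET.fromstring(
    "<NEWSPAPER><ARTICLE>"
    "<HEADLINE>XML 1.0 Recommendation Released</HEADLINE>"
    "<BYLINE>John E. Doe, Reporter</BYLINE>"
    "</ARTICLE></NEWSPAPER>"
)


def outline(elem, depth=0):
    """Pre-order walk: return (depth, tag) pairs in document order."""
    pairs = [(depth, elem.tag)]
    for child in elem:
        pairs.extend(outline(child, depth + 1))
    return pairs


for depth, tag in outline(doc):
    print("  " * depth + tag)
```

Each printed line is indented by its depth in the tree; nothing about fonts or layout is involved.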

Discussion question: So is XML just like HTML?

Answer: No
In HTML, both the tag semantics and the tag set are fixed. XML specifies neither semantics nor a tag set; it lets you define your own tags. HTML describes layout, whereas XML describes the structure of a document. XML separates content from presentation.

So is XML just like SGML?
No. Well, yes, sort of! XML is a much-restricted form of SGML: it is defined as an application profile of SGML. SGML itself is not well suited to serving documents over the web.

So why XML?
XML was created so that richly structured documents could be used over the web. HTML is bound to a fixed set of semantics and allows no arbitrary structure. SGML provides arbitrary structure, but it is too difficult to implement just for a web browser.

Discussion question: What is the advantage of using XML?

A simple XML document
Article 1: "Extensible Markup Language Proposed", Jane Doe, Staff Writer. The newly proposed XML Specification has been making a splash in the community. The newly proposed XML draft stands to revolutionize the exchange of easily. Notes: No Notes.
Article 2: "XML 1.0 Recommendation Released", John E. Doe, Reporter. The W3C today released the final recommendation for XML. XML developers are already using the released recommendation. Notes: See www.w3c.org for more information.
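One way the sample might look as markup, parsed with Python's xml.etree.ElementTree. This is a hypothetical reconstruction: the article text comes from the slide, but the element names are assumptions borrowed from the Newspaper DTD slides, and the Body and Notes content is left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction; element names are assumed, not original.
NEWSPAPER_XML = """<?xml version="1.0"?>
<NEWSPAPER>
  <ARTICLE>
    <HEADLINE>Extensible Markup Language Proposed</HEADLINE>
    <BYLINE>Jane Doe, Staff Writer</BYLINE>
    <LEAD>The newly proposed XML Specification has been making a splash in the community.</LEAD>
  </ARTICLE>
  <ARTICLE>
    <HEADLINE>XML 1.0 Recommendation Released</HEADLINE>
    <BYLINE>John E. Doe, Reporter</BYLINE>
    <LEAD>The W3C today released the final recommendation for XML</LEAD>
  </ARTICLE>
</NEWSPAPER>"""

root = ET.fromstring(NEWSPAPER_XML)

# Collect the headline of every ARTICLE child of the root element.
headlines = [article.findtext("HEADLINE") for article in root.findall("ARTICLE")]

for article in root.findall("ARTICLE"):
    print(article.findtext("HEADLINE"), "/", article.findtext("BYLINE"))
```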

Characteristics
The document should begin with an XML declaration, which looks like a processing instruction: <?xml ...?>. Open and close all tags. Empty tags end with />. There is a unique root element. Elements may not overlap. Attribute values are quoted. The characters < and & are used only to start tags and entity references.
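Most of these rules can be checked by any non-validating parser. A minimal sketch, assuming Python's standard-library ElementTree parser (built on expat); the sample strings are invented for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the non-validating stdlib parser accepts `text`."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


SAMPLES = {
    "ok": '<?xml version="1.0"?><doc><item id="1"/><item id="2">x</item></doc>',
    "unclosed tag": "<doc><item></doc>",
    "overlapping": "<doc><b><i>text</b></i></doc>",
    "two roots": "<doc/><doc/>",
    "unquoted attribute": "<doc id=1/>",
    "bare ampersand": "<doc>Fish & Chips</doc>",
}

for name, text in SAMPLES.items():
    print(f"{name}: {is_well_formed(text)}")
```

This checks well-formedness only; validating against a DTD is a separate step.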

Elements
Elements are the most common form of markup; Article, Headline and Byline are all elements. Their tags are delimited by angle brackets, and most elements identify the nature of the content they surround. Some elements may be empty, i.e. they have no content. A non-empty element always begins with a start-tag, <name>, and ends with the matching end-tag, </name>.

Attributes
Attributes are name-value pairs that occur inside start-tags after the element name. For example, an Article start-tag can carry the attribute Editor with the value "Ernie Pyle". In XML, all attribute values must be quoted.
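In ElementTree, reading attributes from a parsed element is a dictionary-style lookup. A sketch assuming a hypothetical Article start-tag: the Editor value follows the slide, while the EDITION attribute is invented for illustration. Either single or double quotes may delimit a value.

```python
import xml.etree.ElementTree as ET

# Hypothetical empty ARTICLE element; EDITION is an invented attribute.
article = ET.fromstring("""<ARTICLE EDITOR="Ernie Pyle" EDITION='Morning'/>""")

print(article.get("EDITOR"))           # Ernie Pyle
print(article.get("AUTHOR"))           # None: not present on this element
print(sorted(article.attrib.items()))  # [('EDITION', 'Morning'), ('EDITOR', 'Ernie Pyle')]
```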

Entity references
Entities are used to represent special characters such as the left angle bracket, "<". They are also used to refer to often repeated or varying text and to include the content of external files. Every entity must have a unique name. Entity references begin with an ampersand and end with a semicolon.
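The five predefined entities (&lt;, &gt;, &amp;, &quot; and &apos;) can be used without any declaration. A short sketch with Python's ElementTree, using illustrative text, showing the parser replacing references with characters and the serializer escaping them again:

```python
import xml.etree.ElementTree as ET

# Predefined entity references are replaced by the characters they stand for.
para = ET.fromstring(
    "<P>a &lt; b &amp;&amp; b &gt; c, &quot;quoted&quot; and &apos;too&apos;</P>"
)
print(para.text)  # a < b && b > c, "quoted" and 'too'

# Serializing escapes the special characters in text content again.
out = ET.Element("P")
out.text = "a < b & c"
print(ET.tostring(out, encoding="unicode"))  # <P>a &lt; b &amp; c</P>
```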

Declaring and referencing entities
Using &NEWSPAPER; anywhere in the document inserts "Vervet Logic Times" at that location. Internal entities allow you to define shortcuts for frequently typed text or for text that is expected to change, such as the revision status of a document.
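A minimal sketch of declaring and referencing an internal entity, parsed with Python's ElementTree, whose expat-based parser expands internal entities declared in the document's internal DTD subset. The declaration and the MASTHEAD element are assumptions consistent with the description of &NEWSPAPER;, not the slide's original markup:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: entity declaration in the internal subset,
# entity reference in the content.
DOC = """<!DOCTYPE NEWSPAPER [
  <!ENTITY NEWSPAPER "Vervet Logic Times">
]>
<NEWSPAPER><MASTHEAD>&NEWSPAPER;</MASTHEAD></NEWSPAPER>"""

root = ET.fromstring(DOC)
masthead = root.findtext("MASTHEAD")
print(masthead)  # Vervet Logic Times
```

Changing the one declaration would change every place the entity is referenced.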

Comments
Comments begin with "<!--" and end with "-->". They can contain any data except the literal string "--". Comments are not part of the textual content of an XML document, and an XML processor is not required to pass them along to an application.
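A short sketch of these points with Python's ElementTree, using invented sample text. By default the parser drops comments. A comment containing "--" is rejected. With Python 3.8 or later, a TreeBuilder created with insert_comments=True can be asked to keep them.

```python
import xml.etree.ElementTree as ET

TEXT = "<NOTE>first<!-- reviewer: check this -->second</NOTE>"

# Default parser: the comment is discarded and is not part of the text content.
root = ET.fromstring(TEXT)
print("".join(root.itertext()))  # firstsecond

# "--" inside a comment is not well-formed.
try:
    ET.fromstring("<NOTE><!-- a -- b --></NOTE>")
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)  # True

# Opting in to keep comments (Python 3.8+).
keep = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.fromstring(TEXT, parser=keep)
print(kept[0].tag is ET.Comment, repr(kept[0].text))
```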

DTD (Document Type Definition)
A DTD formally identifies the relationships between the various elements that form a document. It can express constraints on the sequence and nesting of tags, and on attribute values, their types and their defaults. It can also declare the names of external files that may be referenced, the formats of some external (non-XML) data that may be included, and entities that may be encountered.

The Newspaper DTD
<!ATTLIST ARTICLE AUTHOR  CDATA #REQUIRED
                  EDITOR  CDATA #IMPLIED
                  DATE    CDATA #IMPLIED
                  EDITION CDATA #IMPLIED>
Element symbols: * zero or more times (as many times as needed); + at least once; ? once or not at all; , in the listed order; | either one or the other.
Attribute options: #REQUIRED, the attribute must be present; #IMPLIED, the attribute may be omitted.
Attribute data types: CDATA, character data; enumerated, a list of allowed values; ID, a unique identifier; IDREF and IDREFS, references to IDs; ENTITY and ENTITIES, names of unparsed (e.g. binary) entities; NMTOKEN, NMTOKENS and NOTATION.
Element data type: #PCDATA, parsed character data (text).

Types of declarations in XML: element declarations, attribute-list declarations, entity declarations and notation declarations.

Element declarations
An element declaration identifies the name of an element and the nature of its content. Example: an Article must contain Headline, Byline, Lead and Body, and may contain Notes.
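DTD content models behave like regular expressions over the sequence of child element names, using the occurrence symbols listed with the Newspaper DTD (",", "|", "*", "+", "?"). The sketch below translates a content model into a Python regex. It is an illustration rather than a DTD validator: it ignores #PCDATA, EMPTY and ANY, and the Article model string is a hypothetical rendering of "must contain Headline, Byline, Lead and Body, and may contain Notes".

```python
import re

# Hypothetical rendering of the Article content model described on the slide.
ARTICLE_MODEL = "(HEADLINE, BYLINE, LEAD, BODY, NOTES?)"
OPERATORS = {"(", ")", "|", "*", "+", "?"}


def content_model_to_regex(model):
    """Translate an element-only DTD content model into a compiled regex."""
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[(),|*+?]", model):
        if token == ",":
            continue                       # ',' = sequence: just concatenate
        if token in OPERATORS:
            parts.append(token)            # same meaning in regex syntax
        else:
            parts.append(f"(?:{token} )")  # one child name, followed by a space
    return re.compile("".join(parts))


def children_match(model, child_tags):
    """True if the ordered child element names satisfy the content model."""
    sequence = "".join(f"{tag} " for tag in child_tags)
    return content_model_to_regex(model).fullmatch(sequence) is not None


print(children_match(ARTICLE_MODEL, ["HEADLINE", "BYLINE", "LEAD", "BODY"]))  # True
print(children_match(ARTICLE_MODEL, ["BYLINE", "HEADLINE", "LEAD", "BODY"]))  # False
```

Each name is matched together with a trailing space, so a short name such as "B" cannot accidentally match the start of "BODY".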

Element data type (#PCDATA)
#PCDATA stands for parsed character data. In a content model such as the one for Byline, the vertical bar indicates an "or" relationship, and the asterisk indicates that the content is optional and may occur zero or more times. A Byline may therefore contain any mix of characters and quote elements.
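In a parsed tree, mixed content such as a Byline with an embedded quote shows up as the element's text, its child elements, and each child's tail. A sketch with Python's ElementTree, assuming a hypothetical BYLINE containing one QUOTE element:

```python
import xml.etree.ElementTree as ET

# Hypothetical BYLINE with mixed content: characters plus one QUOTE element.
byline = ET.fromstring(
    "<BYLINE>By Jane Doe, who says <QUOTE>XML is here</QUOTE> to stay</BYLINE>"
)

print(repr(byline.text))                 # 'By Jane Doe, who says '
print([quote.text for quote in byline])  # ['XML is here']
print(repr(byline[0].tail))              # ' to stay'
print("".join(byline.itertext()))        # By Jane Doe, who says XML is here to stay
```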

Attribute declarations
Attribute declarations identify which elements may have attributes, what attributes they may have, what values the attributes may hold and what default value each attribute has.

Attributes: example
<!ATTLIST ARTICLE AUTHOR ID #REQUIRED
                  EDITOR CDATA #IMPLIED
                  STATUS (funny | notfunny) 'funny'>
This declares three attributes: Author, which is an ID and is required; Editor, which is a string and is not required; and Status, which must be either funny or notfunny and defaults to funny if not specified.

Types of attributes: CDATA; ID; IDREF or IDREFS; ENTITY or ENTITIES; NMTOKEN or NMTOKENS; a list of names (an enumeration).
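ID and IDREF/IDREFS add a document-wide constraint: every ID value must be unique, and every reference must match some ID. ElementTree does not read the DTD, so the sketch below checks both by hand. AUTHOR is treated as the ID attribute, as in the ATTLIST example, and SEEALSO is a hypothetical IDREFS attribute invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; SEEALSO is an invented IDREFS-style attribute.
DOC = """<NEWSPAPER>
  <ARTICLE AUTHOR="jdoe"/>
  <ARTICLE AUTHOR="asmith" SEEALSO="jdoe"/>
  <ARTICLE AUTHOR="bbrown" SEEALSO="jdoe nobody"/>
</NEWSPAPER>"""


def check_ids(root, id_attr, idrefs_attr):
    """Return (duplicate IDs, dangling references) for one ID/IDREFS pair.

    A validating parser learns which attributes are IDs from the DTD;
    here they are named by hand.
    """
    seen, duplicates = set(), []
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is not None:
            if value in seen:
                duplicates.append(value)
            seen.add(value)
    # IDREFS is a whitespace-separated list; a single IDREF is the one-item case.
    dangling = [ref for elem in root.iter()
                for ref in (elem.get(idrefs_attr) or "").split()
                if ref not in seen]
    return duplicates, dangling


print(check_ids(ET.fromstring(DOC), "AUTHOR", "SEEALSO"))  # ([], ['nobody'])
```

All IDs are collected before references are checked, so a reference may point forward to an element that appears later in the document.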

Types of default values: #REQUIRED; #IMPLIED; "value"; #FIXED "value".
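What a validating parser does with an attribute-list declaration can be sketched in a few lines of Python. The dictionary below is a hand-written, hypothetical encoding of the ARTICLE declaration from the attributes example (an ID, a CDATA string, and an enumeration defaulting to 'funny'). The function also handles #FIXED, which that example does not use.

```python
# Hypothetical encoding of
#   <!ATTLIST ARTICLE AUTHOR ID #REQUIRED
#                     EDITOR CDATA #IMPLIED
#                     STATUS (funny | notfunny) 'funny'>
# as name -> (type, default kind, default value); an enumeration is a tuple.
ARTICLE_ATTLIST = {
    "AUTHOR": ("ID", "#REQUIRED", None),
    "EDITOR": ("CDATA", "#IMPLIED", None),
    "STATUS": (("funny", "notfunny"), "value", "funny"),
}


def apply_attlist(attrs, attlist):
    """Check attrs against an attribute-list declaration and fill in defaults.

    Default kinds: "#REQUIRED", "#IMPLIED", "#FIXED" or "value" (a plain
    default). Returns a new dict; raises ValueError on the first violation.
    """
    undeclared = set(attrs) - set(attlist)
    if undeclared:
        raise ValueError(f"undeclared attribute(s): {sorted(undeclared)}")
    result = dict(attrs)
    for name, (atype, kind, default) in attlist.items():
        if name not in result:
            if kind == "#REQUIRED":
                raise ValueError(f"missing required attribute {name}")
            if kind in ("#FIXED", "value"):
                result[name] = default  # supplied as if it had been written
            continue  # #IMPLIED: simply absent
        if kind == "#FIXED" and result[name] != default:
            raise ValueError(f"{name} is fixed to {default!r}")
        if isinstance(atype, tuple) and result[name] not in atype:
            raise ValueError(f"{name}={result[name]!r} is not one of {atype}")
    return result


print(apply_attlist({"AUTHOR": "jdoe"}, ARTICLE_ATTLIST))
# {'AUTHOR': 'jdoe', 'STATUS': 'funny'}
```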

Notation declarations
Notation declarations identify specific types of external binary data. This information is passed to the processing application, which may make whatever use of it it wishes. A notation declaration gives the notation a name and an external (SYSTEM or PUBLIC) identifier.
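Python's low-level expat binding exposes notation declarations through the NotationDeclHandler callback, which is one concrete form of "passed to the processing application". A sketch with a hypothetical GIF notation; the name and system identifier are illustrative, not taken from the slides.

```python
import xml.parsers.expat

# Hypothetical document with one notation declaration in its internal subset.
DOC = """<!DOCTYPE DOC [
  <!NOTATION GIF SYSTEM "image/gif">
  <!ELEMENT DOC EMPTY>
]>
<DOC/>"""

notations = []


def on_notation(name, base, system_id, public_id):
    # Expat reports the declaration; what to do with it is the application's choice.
    notations.append((name, system_id, public_id))


parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.Parse(DOC, True)
print(notations)  # [('GIF', 'image/gif', None)]
```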

XML-QL: A Query Language for XML
XML-QL was designed at AT&T Labs. It has a WHERE-CONSTRUCT construct, analogous to SQL's SELECT-WHERE. It borrows features of query languages recently developed by the database research community for semi-structured data. XML-QL can express queries that extract pieces of data from XML documents.

Features of XML-QL
Declarative: like SQL.
Relationally complete: it can express joins.
Easy to implement.
Data extraction: XML-QL can extract data from existing XML documents and construct new XML documents.
Views: it supports both ordered and unordered views of an XML document.
Availability: XML-QL is implemented as a prototype and is freely available in a Java version.

Features of XML-QL (continued)
Path expressions: supports partially specified path expressions.
Building new elements: supports the creation of new elements.
Combining data sources: supports querying several data sources at the same time.
Negation: not supported.
Aggregation: aggregate functions such as min, max, sum, count and avg are not supported.
Update language: no support for inserting, deleting or updating elements.

Queries in XML-QL
Query 1: Produce all editors of the articles whose author is John Doe.
Features exploited: selection, projection and data extraction on element values.

Query
function query() {
  CONSTRUCT {
    WHERE "John Doe" $b IN "newspaper.xml"
    CONSTRUCT $b
  }
}

Query output: Ernie Pyle

Explanation
This query matches every article in the XML document newspaper.xml that has at least one author element and an editor element, where the author's name is "John Doe". For each such match, it binds the variable $b to the editor. The result is the list of editors bound to $b.
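For comparison, the same query can be written by hand in Python with ElementTree. This is a rough analogue, not a translation of XML-QL semantics. The contents of newspaper.xml and its element names (ARTICLE, AUTHOR, EDITOR) are assumptions, with the editor value taken from the query output.

```python
import xml.etree.ElementTree as ET

# Hypothetical newspaper.xml content; element names are assumptions.
NEWSPAPER_XML = """<NEWSPAPER>
  <ARTICLE><AUTHOR>John Doe</AUTHOR><EDITOR>Ernie Pyle</EDITOR></ARTICLE>
  <ARTICLE><AUTHOR>Jane Doe</AUTHOR><EDITOR>Ernie Pyle</EDITOR></ARTICLE>
  <ARTICLE><AUTHOR>John Doe</AUTHOR></ARTICLE>
</NEWSPAPER>"""


def editors_of(root, author):
    """Editors of every ARTICLE that has an AUTHOR with the given text.

    Roughly mirrors the WHERE pattern: an article without an EDITOR yields
    nothing, and each EDITOR found yields one result (the CONSTRUCT $b part).
    """
    editors = []
    for article in root.iter("ARTICLE"):
        if any(a.text == author for a in article.findall("AUTHOR")):
            editors.extend(e.text for e in article.findall("EDITOR"))
    return editors


print(editors_of(ET.fromstring(NEWSPAPER_XML), "John Doe"))  # ['Ernie Pyle']
```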

Discussion question: Can XML be used for things besides the Internet?

