Enterprise Database Systems: XML (Extensible Markup Language). Dr. Georgia Garani, Dr. Theodoros Mitakos. Technological Educational Institution of Larissa, in collaboration with Staffordshire University. Larissa, 2006

Agenda: Structured, semistructured, and unstructured data; XML data model; XML documents, DTD, XML Schema; XML and databases

Introduction: Internet architectures (two-tier, three-tier). Monolithic: presentation logic, business logic, and data processing in one place. Two-tier: the client handles presentation logic and business logic; the server handles data processing. Three-tier: a thin client handles presentation logic; an application server handles business logic; a data server handles data processing.

Hyperlink documents - Web languages - Tag languages. HTML (Hypertext Markup Language): formatting and structuring web documents. XML: structuring and exchanging data over the Web (structure and meaning). Formatting aspects are defined separately by XSL (Extensible Stylesheet Language).

Structured data: Data that have a strict format, e.g. data stored in a relational database table (the same format for all records in a table). We design the schema, and the DBMS checks that all data follow the structures and constraints specified in the schema.

Semistructured data: In some applications data is collected before it is known how it will be stored and managed. This data may have a structure, but not all the information collected will have an identical structure. E.g. some attributes may be shared among the various entities, but other attributes may exist in only a few entities. Moreover, additional attributes can be introduced in some of the newer data items at any time, and there is no predefined schema. This type of data is known as semistructured data.

Difference between structured and semistructured data: In semistructured data, the schema information is mixed in with the data values, since each data object can have different attributes that are not known in advance. This type of data is called self-describing data.

Semistructured data as a directed graph. [Figure: a directed graph rooted at "projects". Edge labels: project, name, number, location, worker (twice), ssn, name, hours. Leaf values: Product X, 1, Bellaire, 123, John, Mary, 25.]

The schema information in the semistructured model is intermixed with the objects and their data values in the same data structure. In the semistructured model there is no requirement for a predefined schema to which the data objects must conform.
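
A minimal Python sketch of this idea, assuming a few invented records: each object carries its own attribute names, so a program can only learn the structure by inspecting the objects themselves. The records, field names, and the helper `discover_attributes` are hypothetical and not taken from the slides.

```python
# Each record is self-describing: attribute names are stored next to the values.
# These records are hypothetical illustrations, not data from the slides.
projects = [
    {"name": "Product X", "number": 1, "location": "Bellaire"},
    {"name": "Product Y", "number": 2},                    # no location recorded
    {"name": "Product Z", "number": 3, "budget": 50000},   # attribute added later
]

def discover_attributes(records):
    """Return a mapping from attribute name to how many records carry it."""
    counts = {}
    for record in records:
        for attribute in record:
            counts[attribute] = counts.get(attribute, 0) + 1
    return counts

attribute_counts = discover_attributes(projects)

# Attributes shared by every record versus those present in only some of them.
shared = sorted(a for a, n in attribute_counts.items() if n == len(projects))
optional = sorted(a for a, n in attribute_counts.items() if n < len(projects))

print("shared:", shared)
print("optional:", optional)
```

With no predefined schema, nothing rejects the third record for introducing a new attribute; the structure is simply whatever the objects say it is.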

Unstructured data: In this category of data there is a very limited indication of the type of the data. E.g. a text document that contains information embedded within it.

<td width=197 valign=top style='width:147.6pt;border:solid windowtext 1.0pt; border-top:none;mso-border-top-alt:solid windowtext .5pt;mso-border-alt:solid windowtext .5pt; padding:0cm 5.4pt 0cm 5.4pt'> 3 <td width=197 valign=top style='width:147.6pt;border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; mso-border-top-alt:solid windowtext .5pt;mso-border-left-alt:solid windowtext .5pt; mso-border-alt:solid windowtext .5pt;padding:0cm 5.4pt 0cm 5.4pt'> Kate </table> SEMESTER A </div></body></html>

HTML: Web pages written in HTML are considered unstructured data. Text that appears between angled brackets is an HTML tag. A tag with a slash (/) is an end tag, which ends the effect of the matching start tag. The tags mark up the document in order to instruct an HTML processor how to display the text between a start tag and its matching end tag. HTML has a very large number of tags, but HTML documents are very difficult for computer programs to interpret automatically because they do not include schema information about the type of data in the documents.

Example tags: … … <body>…</body>. Attributes describe additional properties of a tag. <tr>

Example: <projects><project> Product x 1 bellaire john mary </project><project>...</project>...</projects>

XML tree data model: elements and attributes. As in HTML, elements are identified in a document by their start tag and end tag. The tag names are enclosed between angled brackets, and end tags are identified by a slash (/). Complex elements are constructed from other elements hierarchically, whereas simple elements contain data values. A major difference between XML and HTML is that XML tag names are defined to describe the meaning of the data elements in the document rather than to describe how the text is to be displayed. An XML document can be represented as a tree structure. XML attributes are used to describe properties and characteristics of the elements within which they appear.
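
As a rough illustration of the tree model, the sketch below parses a small document with Python's standard `xml.etree.ElementTree` and walks it, labelling each element as complex or simple. The element names, the `number` attribute, and the values are assumptions chosen to echo the projects example; they are not the tags from the original slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are assumptions.
XML_TEXT = """
<projects>
  <project number="1">
    <name>Product X</name>
    <location>Bellaire</location>
    <worker><name>John</name><hours>25</hours></worker>
    <worker><name>Mary</name><hours>10</hours></worker>
  </project>
</projects>
"""

def classify(element):
    """Complex elements have child elements; simple elements hold a data value."""
    return "complex" if len(element) > 0 else "simple"

def walk(element, depth=0):
    """Yield (depth, tag, kind, attributes, text) for each node in document order."""
    text = (element.text or "").strip()
    yield depth, element.tag, classify(element), dict(element.attrib), text
    for child in element:
        yield from walk(child, depth + 1)

root = ET.fromstring(XML_TEXT)
nodes = list(walk(root))

# Print the tree with indentation that reflects the nesting depth.
for depth, tag, kind, attrs, text in nodes:
    print("  " * depth + f"{tag} [{kind}] {attrs or ''} {text}".rstrip())
```

The indentation of the printed output mirrors the tree: `projects` is the single root, `project` and `worker` are complex elements, and `name`, `location`, and `hours` are simple elements holding data values.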

Types of XML documents. Data-centric XML documents: these documents have many small data items that follow a specific structure and hence may be extracted from a structured database. They are formatted as XML documents in order to exchange them or display them over the web. Document-centric XML documents: these are documents with large amounts of text, such as news articles or books. There are few or no structured data elements in these documents. Hybrid XML documents: these may have parts that contain structured data and other parts that are predominantly textual or unstructured.

An XML DTD <!DOCTYPE projects [ ]>

DTD: If an XML document conforms to a predefined XML schema or DTD, the document can be considered structured data. XML documents that do not conform to any schema are considered semistructured data; these are called schemaless XML documents.

Well-formed XML documents: A well-formed document must be syntactically correct and must follow the syntactic guidelines of the tree model. There must be a single root element, and every element must include a matching pair of start and end tags within the start and end tags of the parent element. A standard set of API functions called DOM (Document Object Model) allows programs to manipulate the resulting tree representation corresponding to a well-formed XML document. The whole document must be parsed beforehand when using DOM. Another API called SAX allows processing of XML documents on the fly by notifying the processing program whenever a start or end tag is encountered.
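
The contrast between the two APIs can be sketched with Python's standard library, which ships a DOM implementation (`xml.dom.minidom`) and a SAX implementation (`xml.sax`). The sample document, the `TagLogger` class, and the `is_well_formed` helper are illustrative assumptions.

```python
import xml.dom.minidom
import xml.sax
from xml.parsers.expat import ExpatError

XML_TEXT = "<projects><project><name>Product X</name></project></projects>"

# DOM: the whole document is parsed into a tree first, then navigated.
dom = xml.dom.minidom.parseString(XML_TEXT)
dom_names = [n.firstChild.data for n in dom.getElementsByTagName("name")]

# SAX: the parser calls back on every start and end tag while it reads.
class TagLogger(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

logger = TagLogger()
xml.sax.parseString(XML_TEXT.encode("utf-8"), logger)

def is_well_formed(text):
    """Return True if the text parses as XML, False on a syntax error."""
    try:
        xml.dom.minidom.parseString(text)
        return True
    except ExpatError:
        return False

print(dom_names)
print(logger.events)
print(is_well_formed("<a><b></a></b>"))  # end tags are improperly nested
```

The DOM route gives random access to the finished tree at the cost of holding it all in memory; the SAX route sees each tag once, in order, which suits large documents.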

Notation: A * following the element name means that the element can be repeated zero or more times in the document. A + following the element name means that the element can be repeated one or more times in the document. A ? following the element name means that the element can appear zero or one times. An element appearing without any of the preceding three symbols must appear exactly once in the document. The type of the element is specified via parentheses following the element. If the parentheses include names of other elements, these latter elements are the children of the element in the tree structure. If the parentheses include the keyword #PCDATA or one of the other data types available in XML DTD, the element is a leaf node. A bar symbol (e1 | e2) specifies that either e1 or e2 can appear in the document.
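
The occurrence symbols behave like regular-expression operators, which suggests a small sketch: translate a content model into a regular expression over the sequence of child tag names. This is only an illustration of the notation, not a DTD validator; it ignores #PCDATA and attributes, and the content model `PROJECT_MODEL` and both helper functions are hypothetical.

```python
import re

def content_model_to_regex(model):
    """Translate a DTD element content model into a regex over child tag names.

    Each child tag is written as its name followed by one space, so the child
    sequence name, worker, worker becomes the string "name worker worker ".
    """
    body = re.sub(r"\s+", "", model)  # drop whitespace
    # Wrap every element name in a group that matches "name ".
    body = re.sub(r"[A-Za-z_][\w.\-]*",
                  lambda m: "(?:" + re.escape(m.group(0)) + " )", body)
    body = body.replace(",", "")      # a comma means sequence: concatenate
    return re.compile("^" + body + "$")

def children_match(model, child_tags):
    """Check whether a list of child tag names satisfies the content model."""
    sequence = "".join(tag + " " for tag in child_tags)
    return content_model_to_regex(model).match(sequence) is not None

# Hypothetical content model: name and number exactly once, location optional,
# then one or more children that are each either a worker or a contractor.
PROJECT_MODEL = "(name, number, location?, (worker | contractor)+)"

print(children_match(PROJECT_MODEL, ["name", "number", "worker"]))
print(children_match(PROJECT_MODEL, ["name", "number"]))
```

The `*`, `+`, `?`, and `|` symbols pass through unchanged because they mean the same thing in both notations; only the comma needs translating.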

DTD limitations: The data types in DTD are not very general. DTD has its own syntax and thus requires specialized processors. All DTD elements are forced to follow the specified ordering of the document, so unordered elements are not permitted.

XML Schema: The XML schema language is a standard for specifying the structure of XML documents. It uses the same syntax rules as regular XML documents, so that the same processors can be used for both. Terminology: "XML instance document" or "XML document" refers to a regular XML document; "XML schema document" refers to a document that specifies an XML schema.
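
Because a schema document is itself XML, an ordinary XML parser can read it. The sketch below loads a small, hypothetical schema with `xml.etree.ElementTree` and lists the elements it declares. It only reads the schema; it does not validate instance documents (the Python standard library has no XML Schema validator). The element names in the schema are assumptions.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"  # W3C XML Schema namespace

# Hypothetical schema document; the declared element names are illustrative.
SCHEMA_TEXT = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="project">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="number" type="xsd:integer"/>
        <xsd:element name="worker" type="xsd:string"
                     minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""

# The same parser used for instance documents also parses the schema.
schema_root = ET.fromstring(SCHEMA_TEXT)

# Collect (name, type, minOccurs, maxOccurs); both occurrence bounds default to 1.
declared = [
    (el.get("name"), el.get("type"), el.get("minOccurs", "1"), el.get("maxOccurs", "1"))
    for el in schema_root.iter("{%s}element" % XSD_NS)
]

for name, type_name, low, high in declared:
    print(name, type_name, low, high)
```

Unlike the DTD `*`, `+`, and `?` symbols, occurrence constraints here are ordinary attributes (`minOccurs`, `maxOccurs`) that any XML tool can read.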

Definitions. Schema descriptions and XML namespaces: it is necessary to identify the specific set of XML schema language elements being used by specifying a file stored at a web location, e.g. “http//…”. Each such definition is called an XML namespace. Annotations, documentation, and language used: the tags xsd:annotation and xsd:documentation are used for providing comments and other descriptions in the XML document. The xml:lang attribute specifies the language being used.

Storing XML documents. Using a DBMS to store the documents as text: a relational or object DBMS can be used to store whole XML documents as text fields within the DBMS records or objects. This approach can be used if the DBMS has a special module for document processing, and would work for storing schemaless and document-centric XML documents. Using a DBMS to store the document contents as data elements: this approach would work for storing a collection of documents that follow a specific XML DTD or XML schema. Because all the documents have the same structure, one can design a relational or object database to store the leaf-level data elements within the XML documents. Designing a specialized system for storing native XML data: a new type of database system based on a tree model could be designed and implemented. Creating or publishing customized XML documents from preexisting relational databases: because there are enormous amounts of data already stored in relational databases, parts of this data may need to be formatted as documents for exchanging or displaying over the web. A separate middleware software layer handles the conversions needed between the XML documents and the relational database.
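
The first two approaches can be compared in a short sketch using an in-memory SQLite database. The table names, column names, and sample documents are assumptions made for illustration; a production system would likely use the DBMS's own XML facilities instead.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical documents that all follow the same structure.
DOCS = [
    "<project><name>Product X</name><number>1</number><location>Bellaire</location></project>",
    "<project><name>Product Y</name><number>2</number><location>Houston</location></project>",
]

conn = sqlite3.connect(":memory:")

# Approach 1: keep each whole document as a text field.
conn.execute("CREATE TABLE project_doc (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO project_doc (body) VALUES (?)", [(d,) for d in DOCS])

# Approach 2: store the leaf-level data elements as ordinary columns.
conn.execute("CREATE TABLE project (number INTEGER PRIMARY KEY, name TEXT, location TEXT)")
for doc in DOCS:
    root = ET.fromstring(doc)
    conn.execute(
        "INSERT INTO project (number, name, location) VALUES (?, ?, ?)",
        (int(root.findtext("number")), root.findtext("name"), root.findtext("location")),
    )

# With approach 2 the DBMS can filter on element values directly.
in_bellaire = conn.execute(
    "SELECT name FROM project WHERE location = ?", ("Bellaire",)
).fetchall()

# With approach 1 the application gets the text back and must re-parse it.
stored_bodies = [row[0] for row in conn.execute("SELECT body FROM project_doc ORDER BY id")]

print(in_bellaire)
print(stored_bodies[0])
```

Approach 1 preserves the document exactly and tolerates any structure; approach 2 only works because every document shares one structure, but in exchange the data becomes queryable with plain SQL.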

Extracting XML documents from databases: 1. Create the appropriate XML hierarchy and the corresponding XML schema document. 2. Create the correct query in SQL to extract the desired information for the XML document. 3. Once the query is executed, its result must be restructured from the flat relational form to the XML tree structure. 4. The query can be customized to select either a single object or multiple objects into the document.
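
The steps above can be sketched with SQLite and `xml.etree.ElementTree`. The tables, columns, and rows are hypothetical; the target hierarchy chosen for step 1 is projects, then project, then worker.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical relational data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (number INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE works_on (project_number INTEGER, worker_name TEXT, hours REAL);
    INSERT INTO project VALUES (1, 'Product X');
    INSERT INTO works_on VALUES (1, 'John', 25);
    INSERT INTO works_on VALUES (1, 'Mary', 10);
""")

# Step 2: a flat join that yields one row per (project, worker) pair.
# Step 4 would add a WHERE clause here to select a single project.
rows = conn.execute("""
    SELECT p.number, p.name, w.worker_name, w.hours
    FROM project AS p JOIN works_on AS w ON w.project_number = p.number
    ORDER BY p.number, w.worker_name
""").fetchall()

# Step 3: regroup the flat rows into the XML hierarchy chosen in step 1.
root = ET.Element("projects")
project_elements = {}
for number, name, worker_name, hours in rows:
    project = project_elements.get(number)
    if project is None:
        # First row for this project: create its element once.
        project = ET.SubElement(root, "project", number=str(number))
        ET.SubElement(project, "name").text = name
        project_elements[number] = project
    worker = ET.SubElement(project, "worker")
    ET.SubElement(worker, "name").text = worker_name
    ET.SubElement(worker, "hours").text = str(hours)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The join repeats the project name on every row; the regrouping loop is what removes that repetition and restores the nesting.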

XML QUERYING - XPATH: An XPath expression returns a collection of element nodes that satisfy certain patterns specified in the expression. The names in the XPath expression are node names in the XML document tree that are either tag (element) names or attribute names, possibly with additional qualifier conditions to further restrict the nodes that satisfy the pattern. Two main separators are used when specifying a path: single slash (/) and double slash (//). A single slash before a tag specifies that the tag must appear as a direct child of the previous (parent) tag. A double slash (//) specifies that the tag can appear as a descendant of the previous tag at any level.

Examples: 1. /company 2. /company/department 3. //employee[employeeSalary gt 1000]/employeeName 4. /company/employee[employeeSalary gt 1000]/employeeName 5. /company/project/projectworker[hours ge 20.0]
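
The difference between the single and double slash can be tried out with `xml.etree.ElementTree`, which supports a limited subset of XPath. That subset has no numeric comparison such as `gt`, so in this sketch the salary condition is applied in plain Python. The document, the `division` element, and the helper `well_paid_names` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical company document with employees at two nesting levels.
COMPANY_XML = """
<company>
  <department><departmentName>Research</departmentName></department>
  <employee>
    <employeeName><firstName>John</firstName><lastName>Smith</lastName></employeeName>
    <employeeSalary>1500</employeeSalary>
  </employee>
  <division>
    <employee>
      <employeeName><firstName>Mary</firstName><lastName>Jones</lastName></employeeName>
      <employeeSalary>900</employeeSalary>
    </employee>
    <employee>
      <employeeName><firstName>Ann</firstName><lastName>Lee</lastName></employeeName>
      <employeeSalary>2000</employeeSalary>
    </employee>
  </division>
</company>
"""
company = ET.fromstring(COMPANY_XML)

def well_paid_names(employees, minimum=1000):
    """Apply the [employeeSalary gt 1000]/employeeName step in plain Python."""
    return [
        e.find("employeeName").findtext("firstName")
        for e in employees
        if float(e.findtext("employeeSalary")) > minimum
    ]

# Like /company/department: direct children of the root element only.
departments = company.findall("./department")
# Like /company/employee[...]: only employees directly under the root.
direct = well_paid_names(company.findall("./employee"))
# Like //employee[...]: employees at any depth below the root.
anywhere = well_paid_names(company.findall(".//employee"))

print(direct)
print(anywhere)
```

The single-slash query misses the employees nested inside `division`, while the double-slash query finds them at any level.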

XML QUERYING - XQuery: XQuery permits the specification of more general queries on one or more XML documents. The typical form of a query in XQuery is known as a FLWR expression, which stands for the four main clauses of XQuery and has the following form: FOR <variable bindings to individual nodes (elements)> LET … WHERE … RETURN …

Examples: 1. FOR $x IN Doc(…)//employee[employeeSalary gt 1000]/employeeName RETURN $x/firstName, $x/lastName 2. FOR $x IN Doc(…) WHERE $x/employeeSalary gt 1000 RETURN $x/employeeName/firstName, $x/employeeName/lastName
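
For readers without an XQuery processor at hand, the shape of a FLWR expression can be imitated in Python. This is an analogy only, not XQuery: the loop plays the role of FOR, a local variable stands in for LET, the `if` for WHERE, and the appended tuple for RETURN. The document and the function `flwr` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two employees.
doc = ET.fromstring(
    "<company>"
    "<employee><employeeName><firstName>John</firstName><lastName>Smith</lastName>"
    "</employeeName><employeeSalary>1500</employeeSalary></employee>"
    "<employee><employeeName><firstName>Mary</firstName><lastName>Jones</lastName>"
    "</employeeName><employeeSalary>900</employeeSalary></employee>"
    "</company>"
)

def flwr(document, minimum_salary):
    """Python analogue of FOR / LET / WHERE / RETURN over employee elements."""
    results = []
    for x in document.iter("employee"):               # FOR: bind x to each employee
        salary = float(x.findtext("employeeSalary"))  # LET: bind a helper value
        if salary > minimum_salary:                   # WHERE: filter the bindings
            name = x.find("employeeName")
            results.append(                           # RETURN: build one result item
                (name.findtext("firstName"), name.findtext("lastName"))
            )
    return results

well_paid = flwr(doc, 1000)
print(well_paid)
```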

Example - DTD: <!ATTLIST product product_id CDATA #REQUIRED Product_desc CDATA #REQUIRED> <!ATTLIST item gender CDATA #REQUIRED>
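
One way to inspect such declarations is Python's `xml.parsers.expat`, which reports each attribute declaration found in a document's internal DTD subset. Expat is a non-validating parser, so this sketch reads the declarations but does not enforce them. The `<!ELEMENT …>` declarations, the sample document, and the attribute values are assumptions added so that the example is complete.

```python
import xml.parsers.expat

# The two ATTLIST declarations come from the slide; the ELEMENT declarations
# and the instance document below are assumptions made for this sketch.
DOC = """<!DOCTYPE catalog [
  <!ELEMENT catalog (product*)>
  <!ELEMENT product (item*)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST product product_id CDATA #REQUIRED Product_desc CDATA #REQUIRED>
  <!ATTLIST item gender CDATA #REQUIRED>
]>
<catalog>
  <product product_id="SO1111" Product_desc="sample">
    <item gender="M">sample item</item>
  </product>
</catalog>"""

attribute_decls = []

def on_attlist_decl(element, attribute, attr_type, default, required):
    """Record one attribute declaration as (element, attribute, type, required)."""
    attribute_decls.append((element, attribute, attr_type, bool(required)))

parser = xml.parsers.expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(DOC, True)

for decl in attribute_decls:
    print(decl)
```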

Example XSL - XML. XSL: <rule> </rule> XML: <catalog> SO1111 SO </item></product></catalog>

Executing queries. URL syntax: http://iiserver/virtualroot{?sql=string|?template=XMLtemplate}[{&param=value}...] Example: http://ntb11901/sample?sql=SELECT+’<ROOT>’;SELECT+emp_no,emp_lname+FROM+employee+FOR+XML+RAW;SELECT+’</ROOT>’ Result: <ROOT> … </ROOT>

SELECT emp_no,emp_lname FROM employee WHERE emp_no = … FOR XML AUTO </sql:query></ROOT>