SEMISTRUCTURED DATA AND XML

2 DISCUSSION QUESTION
Think about your personal iTunes library. Should it be maintained in a database system?

3 DATA FILES ON THE WEB
HTML documents:
- often generated by applications
- consumed by humans only
- easy access: across platforms, across organizations
- only layout, no semantic information
No application interoperability: HTML is not understood by applications.
Database technology: client-server, vendor-specific data files.

4 XML: DATA EXCHANGE FORMAT
A standard from the W3C (World Wide Web Consortium). The mission of the W3C: "... developing common protocols that promote its evolution and ensure its interoperability ...".
Basic ideas:
- XML = data
- XML generated by applications
- XML consumed by applications
- Easy access: across platforms, across organizations
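A minimal Python sketch of "generated by applications, consumed by applications", using the standard library's xml.etree.ElementTree. The produce/consume functions and the book, title and year tags are hypothetical; the only thing the two sides share is the agreed-upon tag names.

```python
import xml.etree.ElementTree as ET


def produce(title: str, year: int) -> str:
    """Hypothetical producer application: turns a record into XML text."""
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    ET.SubElement(book, "year").text = str(year)
    return ET.tostring(book, encoding="unicode")


def consume(xml_text: str) -> dict:
    """Hypothetical consumer application: relies only on the tag names."""
    book = ET.fromstring(xml_text)
    return {"title": book.findtext("title"), "year": int(book.findtext("year"))}


wire = produce("The English Teacher", 1980)
print(wire)           # <book><title>The English Teacher</title><year>1980</year></book>
print(consume(wire))  # {'title': 'The English Teacher', 'year': 1980}
```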

5 PARADIGM SHIFT ON THE WEB
For web search engines:
- From documents (HTML) to data (XML)
- From document management to document understanding (e.g., question answering)
- From information retrieval to data management
For database systems:
- From the relational (structured) model to semistructured data
- From storage to transport

THE SEMISTRUCTURED DATA MODEL

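7 OBJECT EXCHANGE MODEL (OEM)
[Figure: an OEM graph for a bibliography. The root complex object &o1 (Bib) has children &o12, &o24 and &o29 reached by paper, book and paper arcs; further arcs labeled author, title, year, http, publisher, page, references, firstname, lastname, first and last lead to complex objects such as &o43 and to atomic objects (&96, &243, &206, &25) with values such as "Serge", "Abiteboul", 1997, "Victor" and "Vianu". Legend: complex object; atomic object with object ID.]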

8 THE SEMISTRUCTURED DATA MODEL
Data is self-describing: the data description is integrated with the data itself rather than kept in a separate schema.
The database is a collection of nodes and arcs (a directed graph).
- Leaf nodes represent attribute data of some atomic type (atomic objects, such as numbers or strings).
- Interior nodes represent complex objects, entities, or elements. A complex object consists of components (child nodes) connected to it by arcs.
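The node-and-arc model can be sketched in a few lines of Python. The Obj class, its add method and the object IDs below are hypothetical illustrations, not part of OEM itself: a node with a value is an atomic object, a node with outgoing labeled arcs is a complex object, and every value travels together with the label that describes it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union

Atomic = Union[str, int, float]


@dataclass
class Obj:
    """One node of a semistructured (OEM-style) database."""
    oid: str                                  # object ID, e.g. "&o1"
    value: Optional[Atomic] = None            # set only for atomic (leaf) objects
    arcs: list = field(default_factory=list)  # outgoing (label, child Obj) pairs

    def is_atomic(self) -> bool:
        return self.value is not None

    def add(self, label: str, child: Obj) -> Obj:
        """Add the labeled arc self --label--> child and return child."""
        self.arcs.append((label, child))
        return child


# Hypothetical fragment: object IDs and structure are made up for illustration.
bib = Obj("&o1")
paper = bib.add("paper", Obj("&o2"))
author = paper.add("author", Obj("&o3"))
author.add("firstname", Obj("&o4", "Serge"))
author.add("lastname", Obj("&o5", "Abiteboul"))
paper.add("year", Obj("&o6", 1997))

# Self-describing: every value arrives together with the label of its arc.
print([(label, child.value) for label, child in author.arcs])
# [('firstname', 'Serge'), ('lastname', 'Abiteboul')]
```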

9 THE SEMISTRUCTURED DATA MODEL
Arc labels indicate the relationship between the two nodes they connect.
The root node is the only interior node without in-arcs; it represents the entire database. All database objects are descendants of the root node.
The graph need not be a tree, but it is usually acyclic.
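The root and acyclicity properties can be checked mechanically. This is a sketch over a hypothetical list of labeled arcs (source, label, target); &o5 is shared by two papers, so the graph is acyclic but not a tree. Kahn's topological-sort algorithm is used for the cycle test as one standard choice; the slides do not prescribe a method.

```python
from collections import defaultdict

# Hypothetical database as labeled arcs (source, label, target).
ARCS = [
    ("&o1", "paper", "&o2"),
    ("&o1", "paper", "&o3"),
    ("&o2", "author", "&o5"),
    ("&o3", "author", "&o5"),  # shared object: a DAG, not a tree
    ("&o5", "lastname", "&o6"),
]


def find_root(arcs):
    """The root is the only node that has out-arcs but no in-arcs."""
    roots = {s for s, _, _ in arcs} - {t for _, _, t in arcs}
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    return roots.pop()


def descendants(arcs, root):
    """All objects reachable from root (iterative depth-first search)."""
    children = defaultdict(list)
    for s, _, t in arcs:
        children[s].append(t)
    seen, stack = set(), [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def is_acyclic(arcs):
    """Kahn's algorithm: acyclic iff every node can be removed in topological order."""
    nodes = {s for s, _, _ in arcs} | {t for _, _, t in arcs}
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for s, _, t in arcs:
        children[s].append(t)
        indegree[t] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return removed == len(nodes)


root = find_root(ARCS)
print(root, sorted(descendants(ARCS, root)), is_acyclic(ARCS))
# &o1 ['&o2', '&o3', '&o5', '&o6'] True
```

Adding an arc such as ("&o6", "ref", "&o2") would close the cycle &o2 → &o5 → &o6 → &o2, and is_acyclic would then return False.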

XML

11 XML – THE EXTENSIBLE MARKUP LANGUAGE
- Language: a way of communicating information. Part of the Semantic Web.
- Markup: notes or meta-data that describe your data or language.
- Extensible: limitless ability to define new languages or data sets.
Sophisticated query languages for XML are available: XPath, XQuery.

12 XML: AN EXAMPLE
- Richard Feynman, The Character of Physical Law, 1980
- R.K. Narayan, Waiting for the Mahatma, 1981
- R.K. Narayan, The English Teacher, 1980
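One way the three books could be marked up, parsed with Python's xml.etree.ElementTree. The tag names BOOKLIST, BOOK, AUTHOR, TITLE and PUBLISHED are assumptions, since only the text content of the example survives; the query at the end uses ElementTree's [tag='text'] predicate.

```python
import xml.etree.ElementTree as ET

# Tag names are assumptions; only the text content comes from the example.
DOC = """<BOOKLIST>
  <BOOK><AUTHOR>Richard Feynman</AUTHOR><TITLE>The Character of Physical Law</TITLE><PUBLISHED>1980</PUBLISHED></BOOK>
  <BOOK><AUTHOR>R.K. Narayan</AUTHOR><TITLE>Waiting for the Mahatma</TITLE><PUBLISHED>1981</PUBLISHED></BOOK>
  <BOOK><AUTHOR>R.K. Narayan</AUTHOR><TITLE>The English Teacher</TITLE><PUBLISHED>1980</PUBLISHED></BOOK>
</BOOKLIST>"""

root = ET.fromstring(DOC)
for book in root.findall("BOOK"):
    print(book.findtext("AUTHOR"), "|", book.findtext("TITLE"), "|", book.findtext("PUBLISHED"))

# Titles of all books whose AUTHOR child equals "R.K. Narayan".
print([t.text for t in root.findall("BOOK[AUTHOR='R.K. Narayan']/TITLE")])
# ['Waiting for the Mahatma', 'The English Teacher']
```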

13 XML – WHAT'S THE POINT?
You can include your data and a description of what the data represents. This is useful for defining your own language or protocol.
Example: Chemical Markup Language …
XML design goals:
- XML should be compatible with SGML.
- It should be easy to write XML processors.
- The design should be formal and precise.

14 XML – STRUCTURE
XML looks like HTML.
XML is a hierarchy of user-defined tags, called elements, with attributes and data.
Data is described by elements; elements are described by attributes.
[Figure: an example element (…) annotated with its open tag, element name, attribute, attribute value, data, and closing tag.]
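The parts named in the annotated element map directly onto what a parser exposes. A small sketch with Python's xml.etree.ElementTree; the student, name and major tags and the id attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: open tag <student id="100">, element name "student",
# attribute id with value "100", nested data elements, closing tag </student>.
student = ET.fromstring('<student id="100"><name>Joe</name><major>CS</major></student>')

print(student.tag)     # student
print(student.attrib)  # {'id': '100'}
for child in student:  # the elements nested inside student, in document order
    print(child.tag, "=", child.text)
# name = Joe
# major = CS
```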

15 XML – ELEMENTS
XML is case and space sensitive.
Element opening and closing tag names must be identical.
Opening tags have the form "<element_name>"; closing tags have the form "</element_name>".
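These rules are enforced by any XML parser. A sketch using Python's xml.etree.ElementTree; is_well_formed is a hypothetical helper that only tests well-formedness, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the parser accepts text (a well-formedness check only, no DTD)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<title>XML</title>"))  # True
print(is_well_formed("<title>XML</Title>"))  # False: names are case sensitive
print(is_well_formed("<title>XML</titel>"))  # False: closing name must match

# <title> and <Title> are two different element names.
doc = ET.fromstring("<r><title>a</title><Title>b</Title></r>")
print([e.text for e in doc.findall("title")])  # ['a']

# White space inside data is kept as written.
print(repr(ET.fromstring("<t>  XML  </t>").text))  # '  XML  '
```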

16 XML – ATTRIBUTES
Attributes provide additional information for element tags.
An element can have zero or more attributes; each one has the form attribute_name="attribute_value" (or attribute_name='attribute_value').
- By convention there is no space between the name and the "="; XML itself also permits white space around "=".
- Attribute values must be surrounded by " or ' characters.
- Multiple attributes are separated by white space (one or more spaces or tabs).
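A sketch of the attribute rules as Python's xml.etree.ElementTree (an XML 1.0 parser) applies them. The book element and its id, lang and format attributes are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Double or single quotes; attributes separated by one or more spaces.
book = ET.fromstring("""<book id="b100" lang='en'   format="hardcover"/>""")
print(book.get("id"), book.get("lang"), book.get("format"))  # b100 en hardcover

print(is_well_formed('<book id = "b100"/>'))  # True: spaces around "=" are allowed
print(is_well_formed("<book id=b100/>"))      # False: the value must be quoted
print(is_well_formed("<book/>"))              # True: zero attributes is fine
```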

17 XML – DATA AND COMMENTS
XML data is any information between an opening and a closing tag.
XML data must not contain a literal '<' or '&' character; write these as the entity references &lt; and &amp; (&gt; is available for '>').
Comments are written as <!-- comment text -->.
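Escaping in practice, sketched with Python's standard library: xml.sax.saxutils.escape replaces &, < and > with entity references, and xml.etree.ElementTree turns them back into characters (and skips comments by default) when parsing. The expr element is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


raw = "if a < b && b > c"
safe = escape(raw)  # replaces &, < and > by &amp;, &lt; and &gt;
print(safe)         # if a &lt; b &amp;&amp; b &gt; c

# The comment is not part of the data; entity references become characters again.
elem = ET.fromstring(f"<expr><!-- not part of the data -->{safe}</expr>")
print(elem.text)    # if a < b && b > c

print(is_well_formed("<expr>a < b</expr>"))   # False: raw '<'
print(is_well_formed("<expr>a && b</expr>"))  # False: raw '&'
print(is_well_formed("<expr>b > c</expr>"))   # True: a raw '>' is allowed
```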

18 XML – NESTING & HIERARCHY
XML tags can be nested in a hierarchy (think tree).
An XML document can have only one root tag.
Between an opening and a closing tag you can insert:
1. Data
2. Elements
3. A combination of data and elements
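The three kinds of content and the single-root rule, sketched with Python's xml.etree.ElementTree on a hypothetical document. ElementTree represents mixed content with two fields: text (before the first child) and tail (after each child).

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one root whose children show the three kinds of content.
DOC = ("<root>"
       "<a>only data</a>"                     # 1. data
       "<b><c>x</c><d>y</d></b>"              # 2. elements
       "<e>some text <f>nested</f> more</e>"  # 3. data and elements mixed
       "</root>")
root = ET.fromstring(DOC)

e = root.find("e")
print(repr(e.text), [(c.tag, c.text, c.tail) for c in e])
# 'some text ' [('f', 'nested', ' more')]
print("".join(e.itertext()))  # some text nested more

# Two top-level elements means no single root: the parser rejects it.
try:
    ET.fromstring("<a/><b/>")
except ET.ParseError as err:
    print("not well-formed:", err)
```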

19 GRAPHICAL DATA MODEL FOR XML
[Figure: the example document drawn as a tree of nodes, each labeled with its node type, name and value:
- Node Type: Element_Node, Name: Element, Value: Root
- Node Type: Element_Node, Name: Element, Value: tag1
- Node Type: Text_Node, Name: Text, Value: More
- Node Type: Element_Node, Name: Element, Value: tag2
- Node Type: Text_Node, Name: Text, Value: Some Text]
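The W3C DOM exposes the same typed-node view, and Python ships a DOM implementation in xml.dom.minidom. A sketch, assuming hypothetical markup for the slide's example since the original tags were lost. Note one difference in convention: DOM stores the tag in nodeName and leaves nodeValue as None for elements, whereas the diagram lists the tag as the value.

```python
from xml.dom import Node, minidom

# Hypothetical markup for the slide's example.
doc = minidom.parseString("<Root><tag1>Some Text</tag1><tag2>More</tag2></Root>")

TYPE_NAMES = {Node.ELEMENT_NODE: "Element_Node", Node.TEXT_NODE: "Text_Node"}


def walk(node, depth=0):
    """Yield (depth, node type, DOM nodeName, DOM nodeValue) for every node below node."""
    for child in node.childNodes:
        kind = TYPE_NAMES.get(child.nodeType, str(child.nodeType))
        yield depth, kind, child.nodeName, child.nodeValue
        yield from walk(child, depth + 1)


for depth, kind, name, value in walk(doc):
    print("  " * depth + f"{kind}: name={name!r}, value={value!r}")
# Element_Node: name='Root', value=None
#   Element_Node: name='tag1', value=None
#     Text_Node: name='#text', value='Some Text'
#   Element_Node: name='tag2', value=None
#     Text_Node: name='#text', value='More'
```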

20 XML VS. SEMISTRUCTURED DATA
Both are best described by a graph.
Both are schema-less and self-describing (XML without a DTD or XML Schema).
XML is ordered; semistructured data is not.
XML can mix text and elements, e.g. the text "Making Java easier to type and easier to type" followed by a nested element containing "Phil Wadler".
XML has lots of other features: attributes, entities, processing instructions, comments.
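The ordering difference can be made concrete. In this simplified sketch (the two views are hypothetical helpers, not a standard API), the XML view of an element is the ordered list of its children, while the semistructured view is an unordered multiset of labeled arcs.

```python
import xml.etree.ElementTree as ET
from collections import Counter

a = ET.fromstring("<paper><author>Abiteboul</author><author>Vianu</author></paper>")
b = ET.fromstring("<paper><author>Vianu</author><author>Abiteboul</author></paper>")


def as_xml_sequence(elem):
    """XML view: the ordered list of (tag, text) children."""
    return [(child.tag, child.text) for child in elem]


def as_semistructured(elem):
    """Semistructured view: an unordered multiset of labeled arcs."""
    return Counter(as_xml_sequence(elem))


print(as_xml_sequence(a) == as_xml_sequence(b))      # False: order matters in XML
print(as_semistructured(a) == as_semistructured(b))  # True: same arcs, order ignored
```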

21 XML VS. RELATIONAL DATABASES

                Relational                 XML
Structure       Tables                     Hierarchical (graph, tree)
Schema          Fixed before adding data   Flexible, self-describing
Queries         Simple, nice               Less so
Ordering        None                       Implied
Implementation  Native                     Add-on

Source: Jennifer Widom.

22 DTD – DOCUMENT TYPE DEFINITION
A DTD is a schema for XML data.
XML protocols and languages can be standardized with DTD files.
A DTD says which elements and attributes are required or optional, and defines the formal structure of the language.
A more advanced version is XML Schema (not on exam).

23 XML ISSUES
Database issues:
- How are we going to model XML? (graphs)
- How are we going to query XML? (XPath, XQuery)
- How are we going to store XML? (in a relational database? object-oriented? native?)
- How are we going to process XML efficiently? (many interesting research questions!)
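One possible answer to "store XML in a relational database?" is to shred the tree into a single table with one row per parent-child arc, sometimes called an edge table. The sketch below uses Python's sqlite3 and xml.etree.ElementTree; the table layout and the sample document are assumptions for illustration. Each step of a path expression then becomes one self-join, which hints at why XML queries are "less so" simple in SQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = ("<bib>"
       "<paper><title>XML and Databases</title><author>Ullmann</author></paper>"
       "<paper><title>XML Query Optimization</title><author>Ng</author></paper>"
       "</bib>")


def shred(root: ET.Element, conn: sqlite3.Connection) -> None:
    """Store each element as one edge row: (parent id, position, tag, own id, text)."""
    conn.execute("CREATE TABLE edge (source INTEGER, ordinal INTEGER,"
                 " label TEXT, target INTEGER, value TEXT)")
    counter = 0

    def visit(elem: ET.Element, parent_id, position: int) -> None:
        nonlocal counter
        counter += 1
        my_id = counter
        text = (elem.text or "").strip() or None
        conn.execute("INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
                     (parent_id, position, elem.tag, my_id, text))
        for i, child in enumerate(elem, start=1):
            visit(child, my_id, i)

    visit(root, None, 1)


conn = sqlite3.connect(":memory:")
shred(ET.fromstring(DOC), conn)

# The path bib/paper/author becomes one self-join per step.
rows = conn.execute("""
    SELECT a.value
    FROM edge AS b
    JOIN edge AS p ON p.source = b.target AND p.label = 'paper'
    JOIN edge AS a ON a.source = p.target AND a.label = 'author'
    WHERE b.source IS NULL AND b.label = 'bib'
    ORDER BY p.ordinal, a.ordinal
""").fetchall()
print([value for (value,) in rows])  # ['Ullmann', 'Ng']
```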

XML-PATH = XPATH
XML-QUERY = XQUERY

25 QUERY LANGUAGES FOR XML
XPath is a simple query language based on describing similar paths in XML documents.
XQuery extends XPath in a style similar to SQL, introducing iterations, subqueries, etc. (not on exam)
XPath and XQuery expressions are applied to an XML document and return a sequence of qualifying items. Items can be primitive values or nodes (elements, attributes, documents).

26 XPATH
A path expression returns the sequence of all qualifying items that are reachable from the input item by following the specified path.
A path expression is a sequence of tags or attributes combined with special characters such as slashes ("/").
Absolute path expressions (starting with "/") are applied to an XML document and return all elements that are reachable from the document's root element by following the specified path.
Relative path expressions are applied to an arbitrary node.
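Python's xml.etree.ElementTree supports a limited subset of XPath that is enough to show the absolute/relative distinction. A sketch on a hypothetical two-entry document: ElementTree paths are always written relative to an element, so an absolute path such as /bib/paper/title is expressed relative to the root element bib.

```python
import xml.etree.ElementTree as ET

# Hypothetical document.
DOC = ("<bib><book><title>Foundations…</title><author>Abiteboul</author></book>"
       "<paper><title>XML and Databases</title><author>Ullmann</author></paper></bib>")
root = ET.fromstring(DOC)  # the document's root element: bib

# Absolute path /bib/paper/title: start at the root element, so "/bib" becomes ".".
print([t.text for t in root.findall("./paper/title")])  # ['XML and Databases']

# Relative path author: evaluated from an arbitrary context node (here the book).
book = root.find("book")
print([a.text for a in book.findall("author")])  # ['Abiteboul']
```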

27 PATH EXPRESSIONS
[Figure: the OEM bibliography graph DB, rooted at &o1 (Bib), with object IDs &o44 to &o52, &o70 and &o71 on inner objects and atomic objects such as &96, &243, &206 and &25.]
Examples:
- Bib/paper = {&o12, &o29}
- Bib/book/publisher = {&o51}
- Bib/paper/author/lastname = {&o71, &206}
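On an OEM-style graph a path expression can be evaluated step by step, keeping the set of objects reached so far. A sketch over a hypothetical graph (the object IDs below are made up and are not the figure's objects); the path is given without the leading root name Bib, which corresponds to the start object. Because the result is a set, an object reached along two routes (here &a1) appears once.

```python
# Hypothetical OEM-style graph: object ID -> list of (label, child ID).
GRAPH = {
    "&r": [("paper", "&p1"), ("book", "&b1"), ("paper", "&p2")],
    "&p1": [("author", "&a1"), ("title", "&t1")],
    "&p2": [("author", "&a2"), ("author", "&a1")],
    "&a1": [("lastname", "&l1")],
    "&a2": [("lastname", "&l2")],
    "&b1": [("publisher", "&pub")],
}


def eval_path(graph, start, path):
    """Follow the labels of path ('paper/author/lastname') from start; return the reached IDs."""
    current = {start}
    for label in path.split("/"):
        current = {child
                   for oid in current
                   for lab, child in graph.get(oid, [])
                   if lab == label}
    return current


print(sorted(eval_path(GRAPH, "&r", "paper")))                  # ['&p1', '&p2']
print(sorted(eval_path(GRAPH, "&r", "book/publisher")))         # ['&pub']
print(sorted(eval_path(GRAPH, "&r", "paper/author/lastname")))  # ['&l1', '&l2']
```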

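28 XPATH EXAMPLE DOCUMENT
A bib element containing:
- a book: title Foundations…, authors Abiteboul, Hull, Vianu, publisher Addison Wesley, 1995
- a paper: title XML and Databases, author Ullmann
- a paper: title XML Query Optimization, author Ng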

29 PATH EXPRESSIONS
Examples:
- bib/paper returns the two paper elements: XML and Databases (Ullmann) and XML Query Optimization (Ng).
- bib/book/publisher returns Addison Wesley.
Given an XML document, the value of a path expression p is a set of objects (elements or attribute values).
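The two examples, sketched with Python's xml.etree.ElementTree on a reconstruction of the example document. The year tag and the id attribute name are assumptions, since the transcript kept only the text.

```python
import xml.etree.ElementTree as ET

# Reconstruction of the example document; the year tag and the id attribute are assumptions.
DOC = """<bib>
  <book id="b100"><title>Foundations…</title><author>Abiteboul</author><author>Hull</author><author>Vianu</author><publisher>Addison Wesley</publisher><year>1995</year></book>
  <paper><title>XML and Databases</title><author>Ullmann</author></paper>
  <paper><title>XML Query Optimization</title><author>Ng</author></paper>
</bib>"""
root = ET.fromstring(DOC)  # the root element: bib

# bib/paper (ElementTree paths start at the root element, so "bib/" is dropped)
papers = root.findall("paper")
print([(p.findtext("title"), p.findtext("author")) for p in papers])
# [('XML and Databases', 'Ullmann'), ('XML Query Optimization', 'Ng')]

# bib/book/publisher
print([e.text for e in root.findall("book/publisher")])  # ['Addison Wesley']
```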

30 ATTRIBUTES
If we do not want to return the qualifying elements but the value of one of their attributes, we end the path expression with @ followed by the attribute name.
For the book in the example document (Foundations…, Abiteboul, Hull, Vianu, Addison Wesley, 1995), such an XPath expression returns the sequence "b100" ...
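Python's xml.etree.ElementTree implements only a subset of XPath and has no trailing @attribute step, so this sketch selects the elements and reads the attribute with .get(). The attribute name id is an assumption; the transcript only shows its value "b100".

```python
import xml.etree.ElementTree as ET

# Reconstructed example document; the year tag and the id attribute are assumptions.
DOC = """<bib>
  <book id="b100"><title>Foundations…</title><author>Abiteboul</author><author>Hull</author><author>Vianu</author><publisher>Addison Wesley</publisher><year>1995</year></book>
  <paper><title>XML and Databases</title><author>Ullmann</author></paper>
  <paper><title>XML Query Optimization</title><author>Ng</author></paper>
</bib>"""
root = ET.fromstring(DOC)

# Roughly /bib/book/@id: select the book elements, then read the attribute value.
print([b.get("id") for b in root.findall("book")])  # ['b100']

# The [@id] predicate is supported: children of bib that carry an id attribute.
print([e.tag for e in root.findall("*[@id]")])  # ['book']
```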

31 WILDCARDS
We can use wildcards instead of actual tags and attributes: * means any tag, @* means any attribute.
// selects descendants at any depth.
Examples:
- /bib/*/author returns the sequence Abiteboul, Hull, Vianu, Ullmann, Ng.
- /bib//author returns the same in this case.
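Both wildcard examples run as-is in Python's xml.etree.ElementTree, written relative to the root element bib; @* is outside ElementTree's XPath subset, so the sketch reads each element's attrib dictionary instead. The document is the same reconstruction, with the year tag and id attribute assumed.

```python
import xml.etree.ElementTree as ET

# Reconstructed example document; the year tag and the id attribute are assumptions.
DOC = """<bib>
  <book id="b100"><title>Foundations…</title><author>Abiteboul</author><author>Hull</author><author>Vianu</author><publisher>Addison Wesley</publisher><year>1995</year></book>
  <paper><title>XML and Databases</title><author>Ullmann</author></paper>
  <paper><title>XML Query Optimization</title><author>Ng</author></paper>
</bib>"""
root = ET.fromstring(DOC)

print([a.text for a in root.findall("*/author")])   # /bib/*/author
print([a.text for a in root.findall(".//author")])  # /bib//author
# both: ['Abiteboul', 'Hull', 'Vianu', 'Ullmann', 'Ng']

# Values of any attribute on any element (roughly //@*).
print([v for e in root.iter() for v in e.attrib.values()])  # ['b100']
```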

32 CONDITIONS AND OTHER CONSTRUCTS
- /bib/paper[2]/author[1]: choose the second paper, first author: Ng.
- /bib/paper[author = "Ng"]: find all papers such that there exists an author Ng: the paper XML Query Optimization (author Ng).
- /bib/(paper|book)/title: find the titles of each element that is a paper or a book: Foundations…, XML and Databases, XML Query Optimization. (A parenthesized union inside a step is XPath 2.0 syntax; in XPath 1.0 write /bib/paper/title | /bib/book/title.)
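The three constructs, sketched with Python's xml.etree.ElementTree on the reconstructed document (year tag and id attribute assumed). Positional predicates and [tag='text'] are part of ElementTree's XPath subset; the union operator is not, so the last query filters the children of bib by tag in document order.

```python
import xml.etree.ElementTree as ET

# Reconstructed example document; the year tag and the id attribute are assumptions.
DOC = """<bib>
  <book id="b100"><title>Foundations…</title><author>Abiteboul</author><author>Hull</author><author>Vianu</author><publisher>Addison Wesley</publisher><year>1995</year></book>
  <paper><title>XML and Databases</title><author>Ullmann</author></paper>
  <paper><title>XML Query Optimization</title><author>Ng</author></paper>
</bib>"""
root = ET.fromstring(DOC)

# /bib/paper[2]/author[1]: positions count from 1 among same-named siblings.
print([a.text for a in root.findall("paper[2]/author[1]")])  # ['Ng']

# /bib/paper[author = "Ng"]
print([p.findtext("title") for p in root.findall("paper[author='Ng']")])
# ['XML Query Optimization']

# /bib/(paper|book)/title: no union in ElementTree, so filter children by tag.
print([e.findtext("title") for e in root if e.tag in ("paper", "book")])
# ['Foundations…', 'XML and Databases', 'XML Query Optimization']
```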

33 EXERCISE
Evaluate on the XPath example document:
1. /bib/*/title
2. /bib//title
3. /bib/*[publisher = "McGraw"]

34 SUMMARY
The Object Exchange Data Model:
- The world consists of objects.
- Complex objects comprise other objects as parts.
XML: a mark-up language that supports semistructured data.
- Nested tags, consistent with nested objects.
- XML data: attribute values + free-form text.
XPath: a query language for XML.