Presentation is loading. Please wait.

Presentation is loading. Please wait.

CS5604( Final Presentation) – 12/08/2010 Virginia Polytechnic Institute and State University Presented by: Team 4 (Sony Vijay, Sarosh Malyattil, Sherif)

Similar presentations


Presentation on theme: "CS5604( Final Presentation) – 12/08/2010 Virginia Polytechnic Institute and State University Presented by: Team 4 (Sony Vijay, Sarosh Malyattil, Sherif)"— Presentation transcript:

1 CS5604( Final Presentation) – 12/08/2010 Virginia Polytechnic Institute and State University Presented by: Team 4 (Sony Vijay, Sarosh Malyattil, Sherif)

2 OUTLINE  XML introduction  XML Database introduction  XQuery introduction  Sedna package introduction  5Ss of Sedna  Sedna Architecture Outline  Sedna installation  Demo  WikiXMLDB: Query Wikipedia with Xquery  Example application areas of XML  Concept Map  Relationship with IR  Resources

3 What is XML? XML stands for extensible Markup Language. A markup language is used to provide information about a document. Tags are added to the document to provide the extra information. HTML tags tell a browser how to display the document. XML tags give a reader some idea what some of the data means.

4 Difference between XML and HTML XML is not a replacement for HTML. XML was designed to describe data and to focus on what data is. HTML was designed to display data and to focus on how data looks. HTML is about displaying information, XML is about describing information.

5 Why Is XML Important? Plain Text – Easy to edit – Useful for storing small amounts of data – Possible to efficiently store large amounts of XML data through an XML front end to a database Data Identification – Tell you what kind of data you have – Can be used in different ways by different applications Inline Reusability – Can be composed from separate entities – Modularize your documents without resorting to links Hierarchical – Faster to access – Easier to rearrange

6 Why XML evolved? Infrastructure for the Internet 1986 SGML (Standard Generalized Markup Language) for defining and representing structured documents 1991 WWW and HTML introduced for the Internet 1991 Business adopts the WWW technology; huge expansion in the use of the Internet 1995 New kinds of businesses evolve, based on the connectivity of people all over the world and connectivity of applications built by various software providers (B2C, B2B) Urgent need for a new, common data format for the Internet

7 XML Building blocks Element Delimited by angle brackets Identify the nature of the content they surround General format: … Empty element: Attribute Name-value pairs that occur inside start-tags after element name, like:

8 Example of an XML Document John

9 XML Trees An XML document has a single root node. The tree is a general ordered tree. – A parent node may have any number of children. – Child nodes are ordered, and may have siblings. Preorder traversals are usually used for getting information out of the tree. address name phonebirthday firstlastyearmonthday

10 Document Type Definitions A DTD describes the tree structure of a document and something about its data. There are two data types, PCDATA and CDATA. – PCDATA is parsed character data. – CDATA is character data, not usually parsed. A DTD determines how many times a node may appear, and how child nodes are ordered.

11 DTD for address Example

12 XML Database An XML database is a data persistence software system that allows data to be stored in XML format. This data can then be queried, exported and serialized into the desired format. Two major classes of XML database exist: – XML-enabled: these map all XML to a traditional database (such as a relational database[1]), accepting XML as input and rendering XML as output. This term implies that the database does the conversion itself (as opposed to relying on middleware). – Native XML (NXD): the internal model of such databases depends on XML and uses XML documents as the fundamental unit of storage, which are, however, not necessarily stored in the form of text files.

13 What is XQuery? XQuery is designed to be a small, easily implemental language in which the queries are easily understood. It is also flexible enough to query a broad spectrum of XML information sources, including both databases and documents. It is a human-readable query syntax and an XML based query syntax.

14 XQuery Example FOR $b IN document(“bib.xml") WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998" RETURN $b/title

15 ABOUT SEDNA Sedna is a free native XML database which provides a full range of core database services. It provides persistent storage, ACID transactions, security, indices, hot backup. Provides support for fine-grained XML triggers Provides sql connection from XQuery. Provides XQuery external functions implemented in C. Provides database security (users, roles and privileges).

16 5S 16 SsExamplesObjectives Streams XML Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data Structures Collection, Entities, Attributes, hypertext, DTD, metadata, Specifies organizational aspects of the DL content Spaces Xquery operations Defines logical and presentational views of several DL components Scenarios Searching, Querying, updating, inserting, deleting Details the behavior of DL services Societies Database administrators, database users, web users Defines managers, responsible for running DL services; actors, that use those services; and relationships among them

17 Sedna Architecture Overview

18 Sedna Architecture: interactions between processes

19 Sedna API overview APIs developed by our team: – C – Java – Scheme – OmniMark (Stilo’s streaming programming language used for content engineering tasks) APIs contributed by Sedna open source users: – Python – PHP –.Net – XML:DB API (standard API for XML databases, supported by other products also)

20 Basic Sedna C API C API – lightweight set of functions for: Managing sessions: – SEconnect, SEclose; Managing transactions: – SEbegin, SEcommit, SErollback; Executing queries: – SEexecute, SEgetData, SEnext; – Query result can be presented as a DOM tree, or processed as a stream of SAX events Load data: – SEloadData more… DDL statements are incorporated into Sedna language: for example: CREATE COLLECTION, LOAD “xmark.xml” “xmark”, CREATE INDEX …, CREATE TRIGGER … and more.

21 Sedna Native XML Database Client/Server Protocol Sedna XML Database server uses a message-based protocol for communication with clients through the TCP/IP sockets. The common message structure is as follows: the first four bytes (int) is instruction, the next four bytes (int) is the length of a body in bytes; the next ’length’ bytes is the body. To begin a session, a client creates a connection to the server and sends a startup message. The server launches a new process that is associated with the session Queries are executed via different subprotocols depending on the type of the query and the query length. There are three types of queries: query, update, bulk load. Termination can be initiated by the client (for example when it closed the session) or by the server (for example in case of an administrator- commanded database shutdown or some failure).

22 Sedna Query Parser & Optimizing Rewriter Physical Plan Generator Optimizing Rewriter Static Analyzer Parser XML update statement Data Definition Language statement XQuery query query execution plan (QEP): to Sedna executor operation tree

23 Query Processing Steps Parser: 3 types of queries/statements:XQuery queries, XML update statements,Data Definition Language statements (e.g. create document statement). Static analyzer: All namespace prefixes, function names and variable names are resolved. XQuery static errors are reported. Optimizing Rewriter: Removing unnecessary ordering operations, removing unnecessary node copies for constructed content. Query physical plan generation: Query execution plan (QEP) for the query operation tree is constructed

24 INSTALLING THE PACKAGE The installation guide can be found at The following steps need to be followed. –Download the self extracting script from –Make the script executable by changing its permission using: chmod +x sedna bin-linux-x86.sh –Execute the script by running the following command on the terminal in the directory in which the script was downloaded:./sedna bin-linux-x86.sh –The operating system user that is going to run Sedna must have r-w-x permissions for the following Sedna directories: $SEDNA_INSTALL/data $SEDNA_INSTALL/cfg (here $SEDNA refers to the directory in which Sedna was installed)

25 INSTALLING THE PACKAGE –To grant the necessary permissions on linux execute the following command on the terminal. chown cfg data –Sedna actually needs to run 50 sessions (default maximum) simultaneously. To do this some of the kernel parameter settings have to be extended. This can be done as under. Log on as a user with root authority Open up /etc/sysctl.conf in a text editor and add entries: kernel.sem = " "

26 WORKING WITH SEDNA Go to INSTALL_DIR/bin ( Here INSTALL_DIR refers to the directory in which Sedna was installed) To start Sedna server run:./se_gov ( If Sedna is started successfully it prints "GOVERNOR has been started in the background mode“) To shutdown Sedna server run:./ se_stop To create a database named testdb:./se_cdb testdb ( If the database is created successfully it prints "The database 'auction' has been created successfully“) To run the testdb database:./se_sm testdb ( If the database is started successfully it prints "SM has been started in the background mode".) To shutdown the testdb database:./se_smsd testdb

27 WORKING WITH SEDNA(RUN QUERIES) The directory INSTALL_DIR/examples/commandline has an xml file named auction.xml and some sample xml queries in files named sample01.xquery to sample10.xquery. The directory also contains a script which uploads an xml file into the data base named load_data.xquery. Load the sample XML document into the auction database by typing the command in the directory:INSTALL_DIR/bin :./se_term -file INSTALL_DIR/examples/commandline load_data.xquery testdb Now you can execute the sample queries by typing the command:./se_term -file INSTALL_DIR/examples/commandline /.xquery testdb (where.xquery is the name of a file with the sample query). For instance, to execute sample01.xquery you should type:./se_term -file INSTALL_DIR/examples/commandline /sample01.xquery testdb

28 DEMO Log in to the cloud instance using the key provided and username idcuser. Sedna has been installed in the xml directory present in /home/idcuser. Installation folder

29 DEMO Start Sedna Create a database called auction Start data base

30 DEMO Load the XML document into the auction database execute the sample queries in sample02.xquery

31 DEMO Load the XML document into the auction database execute the sample queries in sample02.xquery

32 DEMO:C API Library files are located at the SEDNA_DIR/driver/c folder, where SEDNA_DIR is the directory Sedna is installed in. Examples discussed in this document can be found in SEDNA_DIR/examples/api/c. To build examples use: gcc -ISEDNA_DIR/driver/c -osample sample.c SEDNA_DIR/driver/c/libsedna.a

33 DEMO:C API

34

35 How it Works? XML File …… … …..

36 How it Works? Xquery (: Return the initial increases of all open auctions. :) for $b in doc("auction")/site/open_aucti ons/open_auction return { $b/bidder[position()=1]/increas e/text() } Result

37 Drawbacks of Wikipedia Not easy to use Wikipedia off-the-shelf in applications. Downloadable in XML format but individual articles are formatted in a proprietary wiki format. Most interesting uses of Wikipedia in applications are locked behind the access troubles.

38 Sedna’s Solution - WikiXMLDB: Query Wikipedia with XQuery WikiXMLDB provides a way of querying Wikipedia with Xquery. Implementation: Entire English Wikipedia content is parsed into XML representation (total size – about 36 GB). Loaded into Sedna XML database. A query interface is provided.

39 Deploying WikiXMLDB WikiXMLDB demo is deployed on Amazon EC2 and runs on the virtual computer with restricted resources. Can be run on local computer to achieve better performance and unlimited customization.

40 Uses of WikiXMLDb User can dissect individual articles, rip out abstracts, sections, links, infoboxes and other components of Wikipedia. User can combine pieces of existing documents into new XML documents and convert them to web pages with XSLT for example. User can enrich his content with data from Wikipedia and unlock its power for his applications. User can perform all these actions using W3C Xquery Language.

41 WikiXMLDB Demo XML representation of Wikipedia articles: WikiXMLDB Xquery interface:

42 Example application areas of XML Presentation Oriented Publishing (POP) - Same data in a web browser, mobile phone and a PDA. - Applications that require the web client to present different views of the same data to different users. Message Oriented Middleware (MOM) – B2B. XML used for exchanging database contents. -Applications that require the web client to mediate between two or more heterogeneous databases. Applications that attempt to distribute a significant proportion of the processing load from the web server to the web client. Intranet applications.

43 Presentation Oriented Publishing(POP) Data stored in database is used to generate XML documents Transform XML documents Using XSL (XML Style Sheet language) and scripting languages HTML WML

44 Message Oriented Middleware (MOM) Orders product Delivery Agent Sends standard message Updates itself Sends acknowledgment

45 Exchanging database contents Marketing Department Oracle Database MS Access database XML allows all the tables to be totally described by using custom tags.

46 Concept Map

47 Relationship with Information Retrieval Chapter 10 - XML Retrieval

48 Resources pageorder/5/ pageorder/5/ advantages.html advantages.html a.htm a.htm

49 Questions? Suggestions! Comments! Thank You!


Download ppt "CS5604( Final Presentation) – 12/08/2010 Virginia Polytechnic Institute and State University Presented by: Team 4 (Sony Vijay, Sarosh Malyattil, Sherif)"

Similar presentations


Ads by Google