Efficient XML Interchange What is it? Why is it? How does it fit in?
Presentation transcript:

Efficient XML Interchange

XML
Why is XML good? It is a widely accepted standard for data representation, a fairly simple format, and flexible. It’s not used by everyone, but it’s used by enough people to make for a rich tools environment. It’s flexible enough to be used in lots of contexts. It’s text based and human readable, which makes it a good archival format.

XML
The W3C’s “XML in 10 points” includes (3) “XML is text, but isn’t meant to be read” and (4) “XML is verbose by design”. XML can be read by humans (though it should not have to be), and it is not very compact.

XML
These design principles also make it very difficult to use XML in some environments. Wireless military links: low bandwidth. Mobile devices: battery life limitations. Processing efficiency: it can take many CPU cycles to parse XML. Data binding: text values must be converted to typed values.

Limitations
Lots of ships have 64 Kbit/sec at best, and it is problematic to ship XML across these links. CPUs are on the Moore’s law curve, but battery power is limited by the state of chemistry, so we can’t assume that faster processors will save us. There are lots of applications for handheld devices with limited battery power (cell phones, etc.). Cell phones don’t necessarily have strong CPUs, so parsing XML can be expensive relative to other tasks.

Data Binding
This is a more subtle problem. How do you convert a value in an XML document to an object? You need to parse the string “1.0”, then convert it to a binary representation. It’s the difference between the declarations string x; and float x;

Data Binding
Typically something comes in from the wire, and you have to do the Java equivalent of Float.parseFloat("1.0"); This is expensive when working with numeric-heavy documents. It is much more efficient to keep the value x in a binary representation in the document, then simply read it on the receiving side.
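
The difference between the two paths can be sketched in Java as follows. This is a minimal illustration, not the encoding EXI itself specifies: the class and method names are made up for this example, and the binary form is assumed to be a plain 4-byte IEEE 754 float.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: class and method names are illustrative only.
final class DataBindingSketch {

    // Text path: the value arrives as characters and must be parsed.
    static float fromText(byte[] wire) {
        String s = new String(wire, StandardCharsets.UTF_8);
        return Float.parseFloat(s);
    }

    // Binary path (assumed 4-byte IEEE 754): the value is read directly.
    static float fromBinary(byte[] wire) {
        return ByteBuffer.wrap(wire).getFloat();
    }

    public static void main(String[] args) {
        byte[] text = "1.0".getBytes(StandardCharsets.UTF_8);
        byte[] binary = ByteBuffer.allocate(4).putFloat(1.0f).array();
        System.out.println(fromText(text) + " " + fromBinary(binary));
    }
}
```

Both methods return the same float; the text path does character decoding and number parsing on every value, while the binary path is a fixed-size read.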

Efficient XML Interchange
EXI relaxes some of the requirements of XML in order to be more compact, faster to parse, and have better data binding characteristics. It relaxes the “human readable” requirement and allows binary data. What you get is an alternate encoding of the XML infoset that is more compact, faster to parse, and deployable in environments where XML previously could not be used.

EXI
EXI is being developed by a W3C working group and is on a standards track. The hope is that this will become a W3C-blessed encoding of the XML infoset. The working group draft is now working its way to approval. That requires multiple implementations, the blessing of the W3C Technical Architecture Group, and approval by other W3C working groups (encryption, processors, etc.).

EXI
EXI represents the same data as an XML document, only in a more efficient encoding. It has minimal impact on other XML technologies, such as encryption. It is more efficient to parse and has better data binding performance.

EXI
The EXI work includes a file format specification, a primer on EXI, and best practices. Note that one thing that is NOT specified is an API for accessing the data. This is an important and significant omission: the lack of a standardized typed API means we still have to go through string representations.

Typed API
What is meant by a typed API? DOM and SAX return string values:
    Attr anAttribute;
    …
    // DOM returns a String attribute value here
    String val = anAttribute.getValue();
And then we need to convert val into a float via
    Float aFloat = Float.parseFloat(val);

Typed API
But what we often want is the value with the type specified in the schema:
    Float aFloat = anAttribute.getFloat();
There are proposals for a generalized typed API, but it is not part of this standard.
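
A runnable sketch of the two styles, using the standard Java DOM classes. The getFloat helper is hypothetical: it is not part of DOM, SAX, or the EXI standard, and here it still parses a string internally, which is exactly the cost a real typed API over a binary encoding would avoid. The element name "point" and the sample document are also assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;

// Hypothetical sketch: only Attr.getValue() is a real DOM call here.
final class TypedAttrSketch {

    // Stand-in for a typed accessor; NOT a DOM or EXI method.
    static float getFloat(Attr attr) {
        return Float.parseFloat(attr.getValue());
    }

    // Parses a small document and returns one attribute of its root element.
    static Attr rootAttr(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getAttributeNode(name);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        Attr x = rootAttr("<point x=\"1.0\"/>", "x");
        String val = x.getValue();               // what DOM gives us today
        float viaString = Float.parseFloat(val); // conversion done by the caller
        float viaTyped = getFloat(x);            // what a typed API would look like
        System.out.println(viaString + " " + viaTyped);
    }
}
```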

EXI
EXI has several options to handle different situations: you have an XML document and a schema; you have an XML document but no schema; or you have an XML document and a schema that almost, but not quite, matches the document.

Element and Attribute Names
Tag names take up a lot of space, and can be somewhat expensive to parse. For an example element whose text content is “Virginia”, count up the characters used for markup: 31 of 55, so roughly 50-60% of the file size is markup tags. If we replace the character tags with numeric stand-ins we can get a much more compact encoding, and it will be faster to parse.
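
The counting exercise can be sketched in Java. The sample document below is hypothetical and does not reproduce the slide's 31/55 figure. The "one byte per tag" estimate is a simplifying assumption for illustration, not how EXI actually sizes its codes, and the scanner ignores comments, CDATA, and angle brackets inside attribute values.

```java
// Hypothetical sketch: names and the one-byte-per-tag estimate are assumptions.
final class MarkupOverheadSketch {

    // Counts characters that fall inside <...>, including the angle brackets.
    static int markupChars(String xml) {
        int count = 0;
        boolean inTag = false;
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (c == '<') inTag = true;
            if (inTag) count++;
            if (c == '>') inTag = false;
        }
        return count;
    }

    // Counts opening and closing tags by counting '<' characters.
    static int tagCount(String xml) {
        int n = 0;
        for (int i = 0; i < xml.length(); i++) {
            if (xml.charAt(i) == '<') n++;
        }
        return n;
    }

    // Assumption: each tag shrinks to one byte when replaced by a numeric stand-in.
    static int estimatedCompactSize(String xml) {
        return (xml.length() - markupChars(xml)) + tagCount(xml);
    }

    public static void main(String[] args) {
        String xml = "<state>Virginia</state>"; // hypothetical example document
        System.out.println(markupChars(xml) + " of " + xml.length()
                + " characters are markup; estimated compact size "
                + estimatedCompactSize(xml));
    }
}
```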

Schema-Informed
If you have a schema, that gives you type information about the XML document. You know that x is a float value rather than a string, because the schema tells you that. That means you can store the “1.0” value in a binary format, which is generally more compact and has the potential for better data binding with a typed API.

Schemaless
What if you don’t have a schema? This means you can’t exploit type information. But EXI should support this situation, because it should be a general solution. EXI handles this by replacing repeating strings with a compact identifier.

Schemaless
Strings such as “Monterey” and the zip code are likely to be repeated many times in an XML document. We can create a table of these values, and then use the table ID rather than the whole string. The table has two columns, String and ID, with one row per distinct string (for example Monterey and San Jose).
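
A minimal Java sketch of such a string table. The class name, method names, and the choice of sequential integer IDs starting at 0 are assumptions for illustration; the actual EXI string table rules are more detailed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a string table; not the EXI-specified structure.
final class StringTableSketch {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    // Returns the existing ID for a string, or assigns the next free one.
    int idFor(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            id = values.size();
            ids.put(s, id);
            values.add(s);
        }
        return id;
    }

    // Looks a string back up from its ID (the decoder's side of the table).
    String valueOf(int id) {
        return values.get(id);
    }

    public static void main(String[] args) {
        StringTableSketch table = new StringTableSketch();
        String[] cities = {"Monterey", "San Jose", "Monterey", "Monterey"};
        for (String city : cities) {
            System.out.println(city + " -> " + table.idFor(city));
        }
    }
}
```

The first occurrence of a string still has to be sent in full; only the repeats are replaced by the short ID.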

“Almost” Schemas
If you have a document that doesn’t quite match the schema, EXI can take a forgiving attitude. It uses the schema to encode the types it knows about, and uses strings and string table identifiers to handle the ones not described by the schema.
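
The fallback idea can be sketched in Java. Everything here is a hypothetical illustration: the one-byte 'F'/'S' markers, the Type enum, and the 4-byte float layout are assumptions rather than the EXI wire format, and for brevity the fallback writes the plain string instead of a string table identifier.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of schema-informed encoding with a string fallback.
final class AlmostSchemaSketch {
    enum Type { FLOAT }

    static byte[] encode(Map<String, Type> schema, String name, String text) {
        if (schema.get(name) == Type.FLOAT) {
            try {
                float f = Float.parseFloat(text);
                // Schema knows this name and the value fits: marker + 4 bytes.
                return ByteBuffer.allocate(5).put((byte) 'F').putFloat(f).array();
            } catch (NumberFormatException e) {
                // Value does not match the schema type; fall through to string.
            }
        }
        // Name not in the schema, or value did not match: marker + UTF-8 text.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + utf8.length).put((byte) 'S').put(utf8).array();
    }

    public static void main(String[] args) {
        Map<String, Type> schema = Collections.singletonMap("x", Type.FLOAT);
        System.out.println(encode(schema, "x", "1.0").length);         // typed path
        System.out.println(encode(schema, "city", "Monterey").length); // not in schema
        System.out.println(encode(schema, "x", "n/a").length);         // mismatch
    }
}
```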

Implementations
As of now there is one implementation of the draft spec, Efficient XML from Agile Delta. Other open source projects are underway, along with some commercial projects. The standards process requires that multiple independent implementations be available before the standard is approved.

Results
Example: Distributed Interactive Simulation (DIS) is an IEEE standard for modeling and simulation. It is a binary standard that contains position (x,y,z), velocity, acceleration, and other numeric-heavy data. We did an XML representation of the binary DIS standard.

Results
Format                1 PDU    PDUs
DIS Binary (bytes)             464,480
DIS XML                        3,924,680
EXI                            365,564

Results
EXI gives a somewhat better size than the original binary format. The exact size varies somewhat depending on the numeric data, while the original binary format is always the same size. EXI seems to be consistently better, though, AND it is marked up in a way that makes it equivalent to an XML file. This means we can easily access all the tools of the XML ecosystem by simply converting it to a text XML representation.
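
To put the sizes in context, the arithmetic below combines the three multi-PDU byte counts from the results with the 64 Kbit/sec ship link mentioned earlier. It is a rough sketch: it assumes 64 Kbit/sec means 64,000 bits per second and ignores protocol overhead and latency. The class and method names are illustrative.

```java
// Hypothetical sketch: link speed interpretation is an assumption.
final class ResultsArithmeticSketch {
    static final long DIS_BINARY_BYTES = 464_480L;
    static final long DIS_XML_BYTES = 3_924_680L;
    static final long EXI_BYTES = 365_564L;
    static final double LINK_BITS_PER_SECOND = 64_000.0; // assumed meaning of 64 Kbit/sec

    // Seconds needed to push the given number of bytes across the link.
    static double transferSeconds(long bytes) {
        return bytes * 8 / LINK_BITS_PER_SECOND;
    }

    // Size of one encoding relative to another.
    static double ratio(long numerator, long denominator) {
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        System.out.printf("EXI / XML size:    %.3f%n", ratio(EXI_BYTES, DIS_XML_BYTES));
        System.out.printf("EXI / binary size: %.3f%n", ratio(EXI_BYTES, DIS_BINARY_BYTES));
        System.out.printf("XML transfer:      %.1f s%n", transferSeconds(DIS_XML_BYTES));
        System.out.printf("Binary transfer:   %.1f s%n", transferSeconds(DIS_BINARY_BYTES));
        System.out.printf("EXI transfer:      %.1f s%n", transferSeconds(EXI_BYTES));
    }
}
```

Under these assumptions the text XML takes a little over eight minutes to send, while the EXI and binary forms each take under a minute.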

Conclusions
Replace all text XML with EXI? No! EXI is intended to expand the use of XML into use cases that XML could not service. XML mostly does fine in its existing environment. EXI can be used to XML-ify existing binary protocols and get slightly better performance with greatly increased interoperability (no one knows DIS binary, everyone knows XML). Next great frontier: typed XML APIs.