A scalable approach to processing large XML data volumes Dr. Peter Fankhauser Fraunhofer IPSI Darmstadt Dr. Tim Weitzel Institute.


1 A scalable approach to processing large XML data volumes. Dr. Peter Fankhauser, Fraunhofer IPSI, Darmstadt (fankhauser@ipsi.fhg.de); Dr. Tim Weitzel, Institute of IS, Frankfurt University (tim@xml-network.de); Dr. Thomas Tesch, Infonyte GmbH, Darmstadt (tesch@infonyte.de)

2 we scale your XML http://www.infonyte.de "One half of the world uses XML... the other half has to." Increasing XML penetration and data volumes: document management and content management; data and process integration; deregulated electricity markets; straight-through processing in stock trading ("garage clearing"). Challenge: develop scalable XML tools. IETD (3.5 GB of XML manuals); trading platform integration with 40,000 transactions every hour; 1 MB SWIFT = 10 MB swiftML = 100 MB RAM consumption → main memory as the bottleneck.

3 XML and main memory. Scalability is challenging even on huge systems and is often not a relative problem: try editing the 3.5 GB XML manual of a Boeing airplane with XML Spy. Reason: DOM implementations represent the entire DOM tree in main memory. Depending on the XML document and the DOM implementation, textual XML becomes up to 20 times as big in a main-memory DOM. Analogous for XSLT: a 20 MB XML document requires 200-400 MB. EDI example: SWIFT → swiftML. Scalability problem: main memory restrictions, mobile devices, embedded systems. Many architectures don't require permanent XML storage but rather import data into an "XML warehouse" (complementary to relational systems) for subsequent processing (XSLT, XPath, XML Schema validation → aggregation, synchronization, retrieval → filter, format, transform).
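To make the memory point concrete, here is a minimal sketch of the opposite extreme: processing a document as a stream instead of building the whole tree. It uses Python's standard `xml.etree.ElementTree.iterparse`; this is a common general technique, not the persistent-DOM approach the slides describe, and the sample document and the `count_elements_streaming` helper are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def count_elements_streaming(source, tag):
    """Count <tag> elements while reading the document incrementally."""
    count = 0
    # iterparse yields each element once its end tag has been read
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # release the element's children and text; the now-empty
            # element itself stays attached to its parent
            elem.clear()
    return count

# Hypothetical miniature document standing in for a multi-gigabyte manual
sample = (b"<manual><chapter><para>a</para></chapter>"
          b"<chapter><para>b</para></chapter></manual>")
print(count_elements_streaming(io.BytesIO(sample), "chapter"))  # prints 2
```

Streaming keeps memory low but gives up random access to the tree, which is the trade-off a persistent, indexed DOM is meant to avoid.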

4 XML processing

5 IDB – Infonyte Data Base. IDB uses Persistent DOM (PDOM), the result of more than 10 person-years of OO/XML database research at Germany's main think tank: a compact, binary, indexed XML format for representing DOM (directly processing well-formed XML). Basic elements of IDB: PDOM; persistent XSLT processor (PXSLT); query engines for XPath and XQL; document collection support; XML workbench.

6 PDOM. PDOM stores and accesses XML documents according to the W3C DOM API: a binary representation of XML instances, accessed using the DOM Level 2 interface. Also: structural indices for reconstructing document sequence and increasing query performance; a PDOM engine for optimizing the allocation of XML documents between main and secondary memory. PDOM can store up to 2^30 XML nodes or 1 terabyte of XML.
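Because access goes through the W3C DOM API, application code does not need to know how the nodes are stored. The sketch below illustrates that idea with Python's standard `xml.dom.minidom`, which is a purely in-memory DOM and only a stand-in here; PDOM itself is not shown, and the `cd`/`title` element names are assumptions rather than the actual FreeDB XML layout.

```python
from xml.dom import minidom

# Hypothetical miniature CD collection (element names are assumptions)
doc = minidom.parseString(
    '<cds>'
    '<cd id="a1"><title>Low</title></cd>'
    '<cd id="b2"><title>Heroes</title></cd>'
    '</cds>'
)

def titles_by_id(document):
    """Map each cd's id attribute to its title using only DOM calls."""
    result = {}
    for cd in document.getElementsByTagName("cd"):
        title = cd.getElementsByTagName("title")[0]
        # firstChild is the text node inside <title>
        result[cd.getAttribute("id")] = title.firstChild.data
    return result

print(titles_by_id(doc))  # prints {'a1': 'Low', 'b2': 'Heroes'}
```

The function only calls `getElementsByTagName`, `getAttribute` and `firstChild`, so in principle any DOM implementation offering those calls could sit behind it.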

7 Architecture. Modular (e.g. use parts of IDB as a highly scalable XML backend for the J2EE-conforming IBM WebSphere Application Server). PDOM/IDB components: 400-800 KB code size, require 16 MB RAM. Access the system via command line, web server or Java interfaces. Can use schema-less XML: all index and storage structures are derived from the XML instance → no need to define mappings onto physical data models (as in relational systems and some XML databases).

8 IDB component architecture

9 Performance. Test using an XML-ified version of the freely available FreeDB CD database (FreeDB 2002). FreeDB consists of about 500,000 CD descriptions; the XML version is about 500 MB. On a standard PC (1.8 GHz, 512 MB RAM): parsing and PDOM creation (32 million XML nodes, 400 MB) including all structural indices takes about 4 minutes (~2 MB/s); generating a user-defined index for all CD keys (548,000 nodes or 1.7% of the entire database) takes about 88 seconds; generating a full-text index (28 million nodes, 89% of the entire database) takes 17 minutes, resulting in an index size of 90 MB; XSLT processing (generating HTML) reaches a throughput of up to 10 MB per second; when searching for CDs with particular titles or tracks using the full-text index, first results are delivered within 5-10 milliseconds, and analogously for subsequent hits.
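Two of the figures above can be rechecked with simple arithmetic. The sketch below only uses numbers from the slide; the helper names are hypothetical.

```python
def throughput_mb_per_s(size_mb, seconds):
    """Average throughput for processing size_mb in the given time."""
    return size_mb / seconds

def share_percent(part, whole):
    """Share of `part` in `whole`, in percent."""
    return part / whole * 100

# About 500 MB of XML parsed into a PDOM in about 4 minutes
build_rate = throughput_mb_per_s(500, 4 * 60)
# 548,000 CD-key nodes out of 32 million nodes in total
key_share = share_percent(548_000, 32_000_000)

print(f"{build_rate:.1f} MB/s")  # prints 2.1 MB/s
print(f"{key_share:.1f} %")      # prints 1.7 %
```

Both results agree with the rounded values quoted on the slide (~2 MB/s and 1.7%).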

10 Search results for “bowie” on “bbc”

11 Search results for “bowie” on “bbc”

12 Scalability

13 Applications I: XML warehouse. Business process integration: congregating data from different information systems into one common XML representation; all data is then reformatted, e.g. for publishing on a web server, using XSLT or XQL/XPath commands. Example: a huge US-based financial information and service provider. Based on IDB, an application was developed for individualized messaging and for feeding a web portal that lets customers get their individual transaction data in real time. The Infonyte system receives 10 GB of raw XML data every day, indexes it and makes it available for ten days. Significant savings result from straightforwardly processing these large amounts of data, together with access times in the millisecond range.

14 Applications II: Interactive Electronic Technical Documentation (IETD). The aviation industry has a long SGML history; many systems are now browser-based XML applications. Main challenge: designing a distributed authoring environment with a centralized data repository and an efficient production process for compiling and formatting electronic manuals for different user groups. Sikorsky Aircraft Corporation: the XML-IETD system based on Infonyte IDB is used for the production process as well as for providing the documents via a web server. Production: the Infonyte XSLT processor is the key element for demand-driven compilation of large XML data volumes. For subsequent use of the technical manuals in a reading environment, Infonyte is used as a client-side tool that enables XML query languages to retrieve relevant document fragments. The architecture helped Sikorsky realize substantial cost and service improvements.

15 Applications III: Mobile information management. Challenge: low memory consumption. Platform independence through Java and the compact PDOM format make Infonyte well suited as an XML-based mobile application kernel. Mobile sales force automation: US-based Vaultus (http://www.vaultus.com) used Infonyte technology as the foundation of their mobile information platform. In addition to data management, the system offers offline capabilities, secure transactions, network independence, and remote maintenance services.

16 Performance: IDB on mobile devices. A mobile demo scenario was developed using the full FreeDB. With a limited version of IDB consisting only of the data server, the PDOM, and the index and collection APIs (about 300 KB in all), the full FreeDB demo runs on a Pocket PC (iPAQ Pocket PC H3800 with 64 MB RAM, 32 MB ROM, 206 MHz ARM processor, 1 GB IBM Microdrive, Personal Java 1.2 Insignia Jeode). Using the indices, the response time for a Boolean search on this limited platform is 1-2 seconds; searching for a single criterion is even faster.
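The slides do not say how the full-text index is organized internally. As a generic illustration of why an index makes Boolean search cheap, here is a minimal inverted-index sketch; the structure, the helper names and the three sample records are all assumptions, not Infonyte's implementation.

```python
from collections import defaultdict

def build_index(records):
    """Map each lower-cased word to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in text.lower().split():
            index[word].add(rec_id)
    return index

def search_and(index, *terms):
    """Boolean AND: ids of records that contain every term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()

# Hypothetical CD titles
records = {
    1: "David Bowie at the BBC",
    2: "Bowie Heroes",
    3: "BBC Sessions",
}
index = build_index(records)
print(sorted(search_and(index, "bowie", "bbc")))  # prints [1]
```

A query only touches the posting sets of its terms, so the cost depends on how many records match each term rather than on the size of the whole collection.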


18 Performance: an EDI example [Figure: IDB architecture and EDI process chain. Architecture labels: XML Application; Servlet, Java API, Command Line; XSLT; XQuery, XPath, XQL; Algebraic Query Optimizer; Index Manager; W3C DOM API, Collection API; Persistent DOM (PDOM); Dataserver; I/O Manager; PDOM File, RDBMS, Paged I/O, Main Memory. Process chain stages: Source, Production, Destination. Operation labels: Import, Checkin, Checkout, Replace, Reuse, Search, Assembly, Validate, Formatting, Filtering, Transformation, Aggregation. Format and output labels: SWIFT, FIX, SWIFT ML, FpML, EDI, Web, PDF + Print, XML Message, PDOM, CD-ROM.]

19 SWIFT2XML: processing SWIFT messages with XML. SWIFT to XML: developed a parser; fully XML-ified (i.e. no information loss), generic XML → multi-step optimization of the process chain, trading off bandwidth against document construction time (multiple calculations like PDOM creation and full-text index). XML processing: processing of well-formed XML; storage as PDOM; access using full-text indices and data indices; visualization using XSLT; integration with web server.

 | SWIFT | XML | PDOM | full-text index
data volume | 100 MB | 430 MB | 200 MB | 40 MB
compression | 92% (8 MB) | 97% (12.9 MB) | 73% (54 MB) | 69% (12.4 MB)
transfer and parsing (10 MB/s) | ~12 min (+7 min) | ~7 min (+7 min) | 6 sec + ~2 sec
transfer and parsing (2 MB/s) | 4 s + ~12 min (+7 min) | 6 s + ~7 min (+7 min) | 33 sec + ~7 sec
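The compression and transfer figures can be rechecked with a short sketch. The sizes come from the slide; the helper names are hypothetical. It is an assumption, not stated on the slide, that the leading "4 s" and "6 s" are transfer times of the compressed data and that "33 sec" is the combined transfer time of the compressed PDOM and full-text index.

```python
def compression_pct(raw_mb, packed_mb):
    """Size reduction in percent after compression."""
    return (1 - packed_mb / raw_mb) * 100

def transfer_seconds(size_mb, rate_mb_per_s):
    """Time to transfer size_mb at the given bandwidth."""
    return size_mb / rate_mb_per_s

# (raw size, compressed size) in MB, as given on the slide
sizes = {
    "SWIFT": (100, 8),
    "XML": (430, 12.9),
    "PDOM": (200, 54),
    "full-text index": (40, 12.4),
}

for name, (raw, packed) in sizes.items():
    pct = round(compression_pct(raw, packed))
    slow = transfer_seconds(packed, 2)
    print(f"{name}: {pct}% smaller, {slow:.2f} s at 2 MB/s")

# Assumed reading of "33 sec": PDOM plus full-text index at 2 MB/s
combined = transfer_seconds(54 + 12.4, 2)
print(f"PDOM + index: {combined:.1f} s at 2 MB/s")
```

Under these assumptions the compression percentages match the slide exactly, and the computed transfer times (4 s, about 6.5 s, and about 33 s) are close to the values given there.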

20 Contact: tim@xml-network.de. Download IDB, FreeDB etc.: www.infonyte.com. Papers etc.: http://tim.weitzel.com

