1 Benchmarking XML Processors for Applications in Grid Web Services
Michael R. Head*, Madhusudhan Govindaraju*, Robert van Engelen**, Wei Zhang**
*Grid Computing Research Laboratory, Binghamton University (SUNY)
**Florida State University

2 Outline
2006-11-16 GCRL Binghamton University
● Motivation
● XML Performance Obstacles
● Benchmark Suite
● Results for a Variety of XML Processors
● Recommendations and Conclusions
● Future Work

3 XML Defined
● Text based (usually UTF-8 encoded)
● Tree structured
● Language independent
● Generalized data format

4 Motivation from SOAP
● Generalized RPC mechanism
● Broad industrial support
● Web Services on the Grid
  – OGSA: Open Grid Services Architecture
  – WSRF: Web Services Resource Framework
● At bottom, SOAP depends on XML

5 XML Exclusive of SOAP
● General structured data format
● Becoming standard for many scientific datasets
  – HapMap: mapping genes
  – Protein Sequencing
  – NASA astronomical data
  – Many more instances

6 Benchmark Motivation
● Grid applications place a wide range of requirements on the communication substrate and data formats.
● Simple and straightforward implementations can have a severe performance impact.

7 XML Performance Limitations
● Compared to “legacy” formats
  – Text-based
    ● Lacks any “header blocks” (e.g., TCP headers), so a parser must scan every character to tokenize
    ● Numeric types take more space and conversion time
  – Lacks indexing
    ● Unable to quickly skip over fixed-length records
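
The indexing point above can be illustrated with a minimal sketch. The record layout (a 4-byte integer id plus an 8-byte double) and the element names `rows`, `r`, `id`, and `v` are assumptions made up for this example and do not come from the slides. With fixed-length binary records, record k sits at a computable offset; with the XML text, reaching record k means scanning past every earlier record and then converting text to numbers.

```python
import struct

# Hypothetical fixed-length binary record: a 4-byte int id and an 8-byte double.
RECORD = struct.Struct("<id")  # little-endian, no padding: 12 bytes per record


def pack_records(rows):
    """Serialize (id, value) pairs as back-to-back fixed-length records."""
    return b"".join(RECORD.pack(i, v) for i, v in rows)


def read_record(buf, index):
    """Jump straight to record `index` by offset arithmetic; no scanning."""
    return RECORD.unpack_from(buf, index * RECORD.size)


def to_xml(rows):
    """Serialize the same rows as text; element names are illustrative."""
    items = "".join(f"<r><id>{i}</id><v>{v!r}</v></r>" for i, v in rows)
    return f"<rows>{items}</rows>"


def read_xml_record(text, index):
    """Without an index, reaching record `index` means scanning past earlier ones."""
    pos = 0
    for _ in range(index + 1):
        start = text.index("<r>", pos)
        pos = text.index("</r>", start) + len("</r>")
    body = text[start + len("<r>"):pos - len("</r>")]
    id_text = body[body.index("<id>") + 4:body.index("</id>")]
    v_text = body[body.index("<v>") + 3:body.index("</v>")]
    return int(id_text), float(v_text)  # text-to-number conversion on every read


rows = [(n, n * 0.5) for n in range(1000)]
binary = pack_records(rows)
xml_text = to_xml(rows)
```

Both `read_record(binary, 500)` and `read_xml_record(xml_text, 500)` return the same pair, but the first does one offset computation while the second walks the text from the start.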

8 Array size: SOAP vs. Binary
5 times difference in size
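
A rough way to see where a size gap of this kind comes from is to encode the same array both ways and compare byte counts. The sketch below is not the measurement behind the slide: the element name `item`, the wrapper `array`, and the random test data are assumptions, and the ratio it prints depends on tag names and on how many digits each value needs.

```python
import random
import struct


def soap_array(values):
    """Encode doubles as a SOAP-style array; element names are illustrative."""
    items = "".join(f'<item xsi:type="xsd:double">{v!r}</item>' for v in values)
    return f"<array>{items}</array>".encode("utf-8")


def binary_array(values):
    """Encode the same doubles as packed 8-byte IEEE 754 values."""
    return struct.pack(f"<{len(values)}d", *values)


random.seed(0)  # fixed seed so repeated runs produce the same sizes
values = [random.random() for _ in range(10_000)]
xml_size = len(soap_array(values))
bin_size = len(binary_array(values))
print(f"XML: {xml_size} bytes, binary: {bin_size} bytes, "
      f"ratio: {xml_size / bin_size:.1f}x")
```

The binary form always costs 8 bytes per double, while the text form pays for the tags, the type attribute, and the decimal digits of every element.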

9 CPU Usage when parsing doubles
90% of CPU time is being spent in floating-point conversions

10 Parsing Optimizations in Use
● Look-aside buffers/String caching [gSOAP, XPP]
● Trie data structure with schema-specific parser
● One-pass table-driven recursive descent parser [TDX]
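
To make the trie idea concrete, here is a minimal sketch of matching element names against a set known from a schema. It is an illustration only and does not reproduce how gSOAP, XPP, or TDX are implemented; the class name `TagTrie` and the sample element names are hypothetical.

```python
class TagTrie:
    """Trie over the element names a schema allows (names here are hypothetical)."""

    def __init__(self, names):
        self.root = {}
        for token_id, name in enumerate(names):
            node = self.root
            for ch in name:
                node = node.setdefault(ch, {})
            node[None] = token_id  # None key marks the end of a complete name

    def match(self, text, pos):
        """Match a name starting at text[pos]; return (token_id, end) or None.

        Walks one character at a time, so the name is recognized during the
        scan itself and no substring is allocated or compared afterwards.
        """
        node = self.root
        while pos < len(text) and text[pos] in node:
            node = node[text[pos]]
            pos += 1
        # Accept only a complete name followed by a delimiter or end of input.
        if None in node and (pos == len(text) or text[pos] in " \t\r\n/>"):
            return node[None], pos
        return None


trie = TagTrie(["Envelope", "Body", "item", "items"])
print(trie.match("<items>", 1))  # token id and end position for "items"
print(trie.match("<itemx>", 1))  # None: "itemx" is not in the schema
```

Because the set of names is fixed in advance, the parser can return a small integer token instead of a string, which is the kind of saving a schema-specific parser aims for.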

11 Benchmark Suite
1) A chosen set of XML documents
  – Low level probes
  – Application-based benchmarks
2) A driver application for each XML processor
  – Runs the parser on the input, but does not act on the data
    ● Eliminates application-level performance differences
    ● One for each interface style (SAX/DOM)
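
The driver idea can be sketched with Python's standard-library parsers standing in for the C and Java processors the suite actually targets. The function names, the sample document, and the repeat count below are assumptions for illustration. Each driver runs the parser over the input and discards the result, so the timing reflects parsing alone.

```python
import time
import xml.dom.minidom
import xml.sax


class NullHandler(xml.sax.ContentHandler):
    """SAX handler that receives every event and ignores it."""


def run_sax(data):
    """SAX-style driver: stream events through a handler that does nothing."""
    xml.sax.parseString(data, NullHandler())


def run_dom(data):
    """DOM-style driver: build the tree, then discard it."""
    xml.dom.minidom.parseString(data).unlink()


def time_driver(driver, data, repeats=100):
    """Return the mean wall-clock seconds per parse over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        driver(data)
    return (time.perf_counter() - start) / repeats


DRIVERS = {"sax": run_sax, "dom": run_dom}
sample = b'<?xml version="1.0"?><a><b x="1">text</b></a>'
results = {name: time_driver(fn, sample) for name, fn in DRIVERS.items()}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e6:.1f} microseconds per parse")
```

Keeping one driver per interface style means a SAX parser is never charged for building a tree it would not normally build, and a DOM parser is not credited for work it skipped.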

12 Benchmark Probes
● Overhead test
  – Minimal XML document
    ● (header plus one self-closing element)
● Buffering
  – Repeated use of xsi:type attributes
● Namespace management
  – Gratuitous use of xmlns attributes
● SOAP payloads
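
Documents of the kind these probes describe could be generated along the following lines. This is a sketch under assumptions: the actual files in the benchmark suite are not shown in the slides, and the element names, prefixes, and `urn:example:` namespace URIs here are invented for illustration.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"
HEADER = '<?xml version="1.0" encoding="UTF-8"?>'


def overhead_probe():
    """Header plus one self-closing element."""
    return f"{HEADER}<a/>"


def buffering_probe(n):
    """Repeat the same xsi:type attribute on n elements."""
    items = "".join(f'<i xsi:type="xsd:int">{k}</i>' for k in range(n))
    return f'{HEADER}<a xmlns:xsi="{XSI}" xmlns:xsd="{XSD}">{items}</a>'


def namespace_probe(n):
    """Nest n (n >= 1) elements, each declaring its own namespace prefix."""
    opens = "".join(f'<p{k}:e xmlns:p{k}="urn:example:ns{k}">' for k in range(n))
    closes = "".join(f"</p{k}:e>" for k in reversed(range(n)))
    return f"{HEADER}{opens}{closes}"


# Sanity check: every probe must at least be well-formed XML.
for doc in (overhead_probe(), buffering_probe(3), namespace_probe(3)):
    ET.fromstring(doc.encode("utf-8"))  # raises ParseError if malformed
```

Scaling `n` lets the same probe stress one feature at a time, for example how a parser's cost grows with the number of namespace declarations in scope.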

13 Application Benchmarks
● Ptolemy Workflow documents (which Kepler uses)
● Genetic data files
  – (Large) files from the International HapMap Project
● Molecular data
● Mesh interface objects, event streams (WSMG)
● WS-Security documents
● Eager for more

14 Results – Latency Overhead

15 C Parsers: SOAP Payloads

16 C Parsers: Application-level tests

17 Java Parsers: SOAP Payloads

18 Java Parsers: Application-level tests

19 TDX Performance
SOAP payload of array of strings

20 Recommendations
● When disparate XML formats call for different parsers, consider a pluggable XML handling mechanism
● Schema-specific parsing techniques (TDX, for example) are very promising when schemas are known in advance
● When considering designs for multi-core architectures, using TDX may be far faster than attempting to parallelize the other existing processors

21 Community Relations
● Publicly available benchmark suite
● Encourage vendors, users, developers to contribute additional XML parsers and sample files as necessary
  – head@acm.org
  – mgovinda@cs.binghamton.edu
  – engelen@cs.fsu.edu
  – http://grid.cs.binghamton.edu/projects/xmlbench

22 Future Work
● Various techniques to parallelize XML processing
● Add new XML parser tests to the suite
  – Add more tests for existing parsers
● Include more sample files
● Update web site with current performance snapshots

23 Questions

