Benchmarking XML Processors for Applications in Grid Web Services

Benchmarking XML Processors for Applications in Grid Web Services
Michael R. Head*, Madhusudhan Govindaraju*, Robert van Engelen**, Wei Zhang**
*Grid Computing Research Laboratory, Binghamton University (SUNY)
**Florida State University

GCRL Binghamton University 2 Outline
● Motivation
● XML Performance Obstacles
● Benchmark Suite
● Results for a Variety of XML Processors
● Recommendations and Conclusions
● Future Work

3 XML Defined
● Text based (usually UTF-8 encoded)
● Tree structured
● Language independent
● Generalized data format

4 Motivation from SOAP
● Generalized RPC mechanism
● Broad industrial support
● Web Services on the Grid
  – OGSA: Open Grid Services Architecture
  – WSRF: Web Services Resource Framework
● At bottom, SOAP depends on XML

5 XML Exclusive of SOAP
● General structured data format
● Becoming standard for many scientific datasets
  – HapMap: mapping genes
  – Protein Sequencing
  – NASA astronomical data
  – Many more instances

6 Benchmark Motivation
● Grid applications place a wide range of requirements on the communication substrate and data formats.
● Simple and straightforward implementations can have a severe performance impact.

7 XML Performance Limitations
● Compared to “legacy” formats
  – Text-based
    ● Lacks any “header blocks” (e.g., TCP headers), so a parser must scan every character to tokenize
    ● Numeric types take more space and conversion time
  – Lacks indexing
    ● Unable to quickly skip over fixed-length records
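
The indexing limitation above can be illustrated with a minimal sketch. It is not taken from the benchmark suite; the names `binary_item` and `xml_item` and the tiny sample array are assumptions made for illustration. With fixed-length binary records, item i sits at a computable byte offset, while the XML form has to be tokenized from the start before item i can be reached.

```python
import struct
import xml.etree.ElementTree as ET

values = [1.5, 2.25, 3.125, 4.0]

# Binary form: fixed-length 8-byte records (little-endian IEEE doubles),
# so item i starts at byte offset 8 * i.
binary = struct.pack("<4d", *values)

def binary_item(buf, i):
    """Read item i directly, without touching the other records."""
    return struct.unpack_from("<d", buf, 8 * i)[0]

# XML form: variable-length text with no index.
xml_text = "<a>" + "".join(f"<i>{v!r}</i>" for v in values) + "</a>"

def xml_item(text, i):
    """Parse the whole document, then convert the text of item i."""
    root = ET.fromstring(text)
    return float(root[i].text)
```

Both functions return the same value for a given index; the difference is how much input each must read to get there.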

8 Array size: SOAP vs. Binary
5 times difference in size
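
A rough way to reproduce this kind of size comparison is sketched below. The element names (`array`, `item`) and the use of `xsi:type="xsd:double"` on every item are assumptions chosen to resemble a SOAP-encoded array; they are not the documents used for the slide, and the resulting ratio depends on tag names and on how many digits each value needs.

```python
import random
import struct

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"

def soap_like_array(values, tag="item"):
    """Encode doubles as typed XML elements (hypothetical layout)."""
    body = "".join(f'<{tag} xsi:type="xsd:double">{v!r}</{tag}>' for v in values)
    doc = f'<array xmlns:xsi="{XSI}" xmlns:xsd="{XSD}">{body}</array>'
    return doc.encode("utf-8")

def binary_array(values):
    """Encode doubles as packed 8-byte little-endian IEEE values."""
    return struct.pack(f"<{len(values)}d", *values)

rng = random.Random(0)  # fixed seed so runs are repeatable
values = [rng.random() for _ in range(1000)]

text_size = len(soap_like_array(values))
bin_size = len(binary_array(values))
ratio = text_size / bin_size  # how many times larger the text encoding is
```

The binary size is always 8 bytes per value; the text size grows with both the markup around each value and its decimal representation.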

9 CPU Usage when parsing doubles
90% of CPU time is being spent in floating point conversions
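
The split between tokenizing and numeric conversion can be measured with a sketch like the one below. It is a toy measurement in Python, not the profiling method behind the slide: the function name `conversion_share`, the `<d>` element name, and the string-splitting tokenizer are all assumptions, and the share it reports will differ from what a C parser shows.

```python
import time

def conversion_share(n=20000):
    """Return (numbers, fraction of elapsed time spent in float conversion)."""
    tokens = [repr(i / 7.0) for i in range(n)]
    xml_text = "".join(f"<d>{t}</d>" for t in tokens)

    t0 = time.perf_counter()
    # Tokenize only: pull out the character data without converting it.
    fields = [part.split("<", 1)[0] for part in xml_text.split("<d>")[1:]]
    t1 = time.perf_counter()
    # Convert only: decimal text to IEEE double.
    numbers = [float(f) for f in fields]
    t2 = time.perf_counter()

    total = t2 - t0
    share = (t2 - t1) / total if total > 0 else 0.0
    return numbers, share
```

Separating the two phases this way makes it possible to see how much a faster text-to-double routine could save, independent of the tokenizer.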

10 Parsing Optimizations in Use
● Look-aside buffers/String caching [gSOAP, XPP]
● Trie data structure with schema-specific parser
● One pass table-driven recursive descent parser [TDX]
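
The trie idea from the list above can be sketched as follows. This is an illustrative structure only, not the implementation used by any of the parsers named on the slide; the class name `TagTrie` and the sample element names are assumptions. When the schema fixes the set of element names in advance, a trie lets the parser recognize a tag character by character and map it straight to an integer, with no string allocation or comparison against each candidate name.

```python
class TagTrie:
    """Maps a fixed set of element names to integer indices."""

    NAME_END = " \t\r\n/>"  # characters that may follow an element name

    def __init__(self, names):
        self.root = {}
        for index, name in enumerate(names):
            node = self.root
            for ch in name:
                node = node.setdefault(ch, {})
            node[""] = index  # the "" key marks the end of a known name

    def match(self, text, start):
        """Return (tag_index, end) for a known name at text[start], else None."""
        node = self.root
        pos = start
        while pos < len(text) and text[pos] in node:
            node = node[text[pos]]
            pos += 1
        at_name_end = pos == len(text) or text[pos] in self.NAME_END
        if "" in node and at_name_end:
            return node[""], pos
        return None
```

For example, with `TagTrie(["item", "items", "array"])`, calling `match("<items>", 1)` walks five characters and returns the index of `items` together with the position of the closing `>`.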

11 Benchmark Suite
1) A chosen set of XML documents
  – Low level probes
  – Application-based benchmarks
2) A driver application for each XML processor
  – Runs the parser on the input, but does not act on the data
    ● Eliminates application-level performance differences
    ● One for each interface style (SAX/DOM)
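
A driver of the kind described above might look like the following sketch. It uses Python's standard `xml.sax` and `xml.dom.minidom` modules as stand-ins for the processors under test; the suite's actual drivers are not shown in the slides, and the names `NullHandler`, `run_sax`, `run_dom`, and `time_driver` are assumptions. Each driver runs the parser over the input and discards the result, so only parsing cost is timed.

```python
import time
import xml.dom.minidom
import xml.sax

class NullHandler(xml.sax.ContentHandler):
    """Receives SAX events and ignores them (base-class methods are no-ops)."""

def run_sax(data):
    """SAX-style driver: push events into a handler that does nothing."""
    xml.sax.parseString(data, NullHandler())

def run_dom(data):
    """DOM-style driver: build the tree, then release it without using it."""
    xml.dom.minidom.parseString(data).unlink()

def time_driver(driver, data, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        driver(data)
        best = min(best, time.perf_counter() - start)
    return best
```

Taking the best of several runs is one common way to reduce noise from the operating system; other summaries, such as the median, are equally reasonable.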

12 Benchmark Probes
● Overhead test
  – Minimal XML document
    ● (header plus one self-closing element)
● Buffering
  – Repeated use of xsi:type attributes
● Namespace management
  – Gratuitous use of xmlns attributes
● SOAP payloads
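
Probe documents of the first three kinds can be generated with a few lines of code. The sketch below is a guess at their general shape, not the files shipped with the suite: the element names, the `urn:example:` namespace URIs, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"
HEADER = '<?xml version="1.0" encoding="UTF-8"?>'

def overhead_probe():
    """Minimal document: XML header plus one self-closing element."""
    return (HEADER + "<a/>").encode("utf-8")

def buffering_probe(n):
    """Same xsi:type attribute repeated on n elements."""
    items = "".join(f'<i xsi:type="xsd:int">{k}</i>' for k in range(n))
    doc = f'{HEADER}<a xmlns:xsi="{XSI}" xmlns:xsd="{XSD}">{items}</a>'
    return doc.encode("utf-8")

def namespace_probe(n):
    """A fresh xmlns declaration on each of n elements."""
    items = "".join(f'<p{k}:i xmlns:p{k}="urn:example:ns{k}"/>' for k in range(n))
    return f"{HEADER}<a>{items}</a>".encode("utf-8")

def child_count(doc):
    """Parse a probe and count the root's children (a well-formedness check)."""
    return len(ET.fromstring(doc))
```

Generating probes programmatically makes it easy to scale `n` and observe how a parser's time grows with repeated attributes or namespace declarations.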

13 Application Benchmarks
● Ptolemy Workflow documents (which Kepler uses)
● Genetic data files
  – (Large) files from the International HapMap Project
● Molecular data
● Mesh interface objects, event streams (WSMG)
● WS-Security documents
● Eager for more

14 Results – Latency Overhead

15 C Parsers: SOAP Payloads

16 C Parsers: Application-level tests

17 Java Parsers: SOAP Payloads

18 Java Parsers: Application-level tests

19 TDX Performance
SOAP payload of array of strings

20 Recommendations
● When handling disparate XML formats with different parsers, consider a pluggable XML handling mechanism
● Schema-specific parsing techniques (TDX, for example) are very promising when schemas are known in advance
● When considering designs for multi-core architectures, using TDX may be far faster than attempting to parallelize the other existing processors
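
One possible shape for a pluggable XML handling mechanism is a small registry that maps a parser name to a function with a common signature. The sketch below is an assumption about how such a layer could be built, using two standard-library parsers as the plug-ins; the names `PARSERS`, `register`, and `root_tag` are hypothetical, and the slides do not describe a specific design.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

PARSERS = {}  # parser name -> function taking bytes and returning the root tag

def register(name):
    """Decorator that adds a parser backend to the registry."""
    def decorator(func):
        PARSERS[name] = func
        return func
    return decorator

@register("etree")
def parse_etree(data):
    return ET.fromstring(data).tag

@register("minidom")
def parse_minidom(data):
    doc = xml.dom.minidom.parseString(data)
    try:
        return doc.documentElement.tagName
    finally:
        doc.unlink()  # release the DOM tree once the result is extracted

def root_tag(data, parser="etree"):
    """Application-facing call; the backend is chosen by name at run time."""
    return PARSERS[parser](data)
```

With this layout the application calls `root_tag(data, parser="minidom")` or `root_tag(data, parser="etree")`, and the backend can be selected per document format based on benchmark results without touching the calling code.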

21 Community Relations
● Publicly available benchmark suite
● Encourage vendors, users, developers to contribute additional XML parsers and sample files as necessary

22 Future Work
● Various techniques to parallelize XML processing
● Add new XML parser tests to the suite
  – Add more tests for existing parsers
● Include more sample files
● Update web site with current performance snapshots

23 Questions