E-Science Data Information and Knowledge Transformation The BinX Language.

Presentation transcript:

e-Science Data Information and Knowledge Transformation The BinX Language

What is BinX?
• Binary in XML
  – Use XML to mark up binary data
  – Mark up data types
  – Mark up sequences
  – Mark up arrays
  – Complex structures

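Primitive Data Types
• Mark up data types
[Example bytes shown on the slide: FF 7F 7F FF FF FF C C; the accompanying XML markup is not preserved in the transcript.]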
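Why the data type has to be marked up can be seen by decoding the same two bytes in different ways. The following is a minimal Python sketch, not taken from the slides; it only assumes that FF 7F from the hex dump above is read as a 16-bit integer, and uses the standard `struct` module rather than any BinX API.

```python
import struct

# Two bytes taken from the slide's hex dump.
raw = bytes([0xFF, 0x7F])

# "<" means little-endian, ">" means big-endian.
# "h" is a signed 16-bit integer, "H" an unsigned one.
little_signed = struct.unpack("<h", raw)[0]
big_signed = struct.unpack(">h", raw)[0]
big_unsigned = struct.unpack(">H", raw)[0]

print(little_signed, big_signed, big_unsigned)
```

The three results differ, so a reader of the binary file needs the width, signedness and byte order to be stated somewhere outside the bytes themselves.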

Abstract “struct” types
• Mark up a sequence
Screen descriptor in GIF:
  – Screen width: unsigned short
  – Screen height: unsigned short
  – Packed field: byte
  – Background colour index: byte
  – Pixel aspect ratio: byte
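The same field sequence can be read with Python's standard `struct` module. This is an illustrative sketch rather than BinX code: the names `ScreenDescriptor` and `parse_screen_descriptor` are hypothetical, and the sample bytes are made up for the demonstration. GIF stores multi-byte integers in little-endian order, and the descriptor follows the 6-byte signature.

```python
import struct
from collections import namedtuple

# Hypothetical record type mirroring the five fields listed on the slide.
ScreenDescriptor = namedtuple(
    "ScreenDescriptor",
    "width height packed background_index pixel_aspect_ratio",
)

# Little-endian: two unsigned shorts followed by three unsigned bytes.
SCREEN_DESCRIPTOR = struct.Struct("<HHBBB")

def parse_screen_descriptor(data, offset=6):
    """Decode the descriptor; offset=6 skips the "GIF87a"/"GIF89a" signature."""
    return ScreenDescriptor(*SCREEN_DESCRIPTOR.unpack_from(data, offset))

# Made-up sample: signature plus a 320x200 screen descriptor.
sample = b"GIF89a" + struct.pack("<HHBBB", 320, 200, 0xF7, 0, 0)
desc = parse_screen_descriptor(sample)
print(desc)
```

The format string plays the role that the struct markup plays in a BinX description: it lists the member types in order, so the bytes can be decoded without hand-written offsets.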

Abstract “array” types
• Mark up an array
A 2-dimensional array containing 10-by-100 32-bit integers
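A sketch of reading such an array in Python follows. The slide does not state the byte order or the storage order, so little-endian and row-major layout are assumptions here, and `read_int32_grid` is a hypothetical helper name.

```python
import struct

ROWS, COLS = 10, 100  # dimensions given on the slide

def read_int32_grid(data, byte_order="<"):
    """Decode ROWS*COLS signed 32-bit integers, assuming row-major storage."""
    flat = struct.unpack(f"{byte_order}{ROWS * COLS}i", data)
    return [list(flat[r * COLS:(r + 1) * COLS]) for r in range(ROWS)]

# Made-up payload: the integers 0..999 in little-endian order.
payload = struct.pack(f"<{ROWS * COLS}i", *range(ROWS * COLS))
grid = read_int32_grid(payload)
print(len(payload), grid[3][7])
```

The element type and the two dimension sizes are all that is needed to locate any element, which is the information an array markup has to carry.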

Embedded abstract types
• Complex structures

User-defined metadata
• Label the data types and structures

Reusable type definitions
• Define macros for reuse

Linking to binary data
• Reference the binary data file
… …

A BinX document
[XML example not preserved in the transcript. Callouts on the slide: root element, data class section, data instance section, abstract data type.]

DataBinX
DataBinX = BinX with Data

The BinX Library

BinX Components
• The library has core functionality to support generic utilities and applications
[Layer diagram:
  – Applications: domain-specific
  – Utilities: generic tools (DataBinX pack/unpack, extractor, viewer, BinX editor)
  – BinX library core: BinX core functionality (parse/generate BinX documents, read/write binary data, parse/generate DataBinX)]

BinX application models
• Data catalogue model
• Data manipulation model
• Data query model
• Data service model
• Data transportation model

Data catalogue model
[Diagram labels: primary storage, binary data files (BINARY); METADATA ranging from detailed to abstract: syntactic annotation, semantic annotation, classification, domain specific, cross-reference (XLink); BinX documents labelled BinX 1, BinX 1.1 and BinX 1.2.]

Data manipulation model
• Extraction
  – Subset of a dataset
• Combination
  – Merge several datasets
• Transformation
  – Conversion of data types
  – Change of sequence order
  – Transposition of array dimensions
• Transparency
  – Automatic change of byte order
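Three of these operations can be sketched in a few lines of plain Python. This is not the BinX library's interface; the function names are hypothetical and the data is restricted to signed 32-bit integers to keep the example short.

```python
import struct

def swap_byte_order_int32(data):
    """Byte order change: re-encode big-endian int32 values as little-endian."""
    count = len(data) // 4
    values = struct.unpack(f">{count}i", data)
    return struct.pack(f"<{count}i", *values)

def transpose(grid):
    """Transposition of array dimensions for a 2-D list of rows."""
    return [list(column) for column in zip(*grid)]

def extract_rows(grid, start, stop):
    """Extraction: return the subset of rows in the range [start, stop)."""
    return grid[start:stop]

big_endian = struct.pack(">3i", 1, 2, 3)
little_endian = swap_byte_order_int32(big_endian)
print(little_endian.hex())
print(transpose([[1, 2, 3], [4, 5, 6]]))
```

In the model described on the slide, the byte-order step is meant to happen automatically, driven by the description of the data, rather than being called explicitly as it is here.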

Data query model
• In-dataset query
  – XPath against virtual XML
• Cross-dataset query
  – Link into multiple datasets
• Defining result format
  – XQuery-based return fragment
• Output interface
  – SAX events
[Diagram labels: utility, BinX library, BinX data source, DataBinX, SAX events, VOTable, APP, custom XQuery, XPath, XLink, transform.]
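The idea of an in-dataset query can be illustrated with Python's `xml.etree.ElementTree`, which supports a limited subset of XPath. The XML below is a hypothetical stand-in for the virtual XML view of a dataset; its element and attribute names are invented for this sketch and are not the DataBinX schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML view of a binary dataset (names are illustrative only).
VIRTUAL_XML = """
<dataset>
  <record>
    <value name="width">320</value>
    <value name="height">200</value>
  </record>
  <record>
    <value name="width">640</value>
    <value name="height">480</value>
  </record>
</dataset>
"""

root = ET.fromstring(VIRTUAL_XML)

# Path query: every "width" value in every record.
widths = [int(node.text) for node in root.findall("./record/value[@name='width']")]
print(widths)
```

Here the whole XML document is held in memory. The slide's "virtual XML" suggests the query would instead be resolved against the binary data through its description, without materialising the XML first.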

Data service model
• Publishing logical datasets in BinX
  – Dataset from one binary file
  – Dataset from several binary files
  – Dataset from multiple data sources
[Diagram labels: DB, client, BinX, Grid.]

Data transportation model
DataBinX as interlingua
[Diagram, mirrored for Send and Receive: XML document, XSLT, DataBinX, BinX utility, BinX schema + binary, ZIP tool, ZIP (MIME).]
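The packaging step at either end of this pipeline can be sketched with Python's standard `zipfile` module. Only the ZIP stage is shown; the XSLT and BinX utility stages are omitted. The archive member names and the `pack`/`unpack` helpers are assumptions made for this example, and the document string is a placeholder rather than real BinX markup.

```python
import io
import zipfile

# Hypothetical member names inside the package.
DESCRIPTION_NAME = "dataset.binx.xml"
BINARY_NAME = "dataset.bin"

def pack(description: str, binary_payload: bytes) -> bytes:
    """Sender side: bundle the description and the binary data into one ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(DESCRIPTION_NAME, description)
        archive.writestr(BINARY_NAME, binary_payload)
    return buffer.getvalue()

def unpack(package: bytes):
    """Receiver side: recover the description text and the binary data."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        description = archive.read(DESCRIPTION_NAME).decode("utf-8")
        return description, archive.read(BINARY_NAME)

package = pack("<placeholder/>", b"\x00\x01\x02\x03")
description, payload = unpack(package)
print(description, payload)
```

Keeping the description and the binary data in one archive means the receiver always gets a matching pair, which is the property the transportation model relies on.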

Application in Astronomy
Case Study 1: Data Conversion Between FITS and VOTable

Application in astronomy
• FITS and VOTable conversion
[Diagram: a FITS file (SIMPLE = T … END) and a VOTable document (<?xml version= …), linked by the DataBinX utility on top of the BinX library core.]

FITS file
Primary HDU
  Header:
    SIMPLE = T / file does conform to FITS standard
    BITPIX = 8 / number of bits per data pixel
    NAXIS = 1 / number of data axes
    …
    END
  Data:
    3D 4A 14 0F 1C FE … …
Extension
  Header:
    XTENSION= 'BINTABLE' / binary table extension
    BITPIX = 8 / 8-bit bytes
    NAXIS = 2 / 2-dimensional binary table
    …
    END
  Data:
    7B 3E 40 2C E7 6F … …
[Column ruler on the slide: 0 to 79]
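A FITS header is a sequence of 80-character ASCII cards, with the keyword in the first eight characters and a value indicator "= " in the next two, ending at the END card. The following is a simplified Python sketch of reading such a header; `parse_header` and `make_card` are hypothetical helpers, string values containing doubled quotes are not handled, and the sample cards are padded copies of the ones shown above.

```python
CARD_LENGTH = 80  # each FITS header card is 80 ASCII characters

def parse_header(header_bytes):
    """Return keyword -> raw value text for cards up to (not including) END."""
    cards = {}
    for start in range(0, len(header_bytes), CARD_LENGTH):
        card = header_bytes[start:start + CARD_LENGTH].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # commentary or blank card: no value to record
        field = card[10:].lstrip()
        if field.startswith("'"):
            # String value: keep the text between the first pair of quotes.
            text = field[1:]
            value = text[:text.index("'")].rstrip()
        else:
            # Other values end at the optional "/" comment separator.
            value = field.split("/", 1)[0].strip()
        cards[keyword] = value
    return cards

def make_card(text):
    """Pad a card to the fixed 80-character length."""
    return text.ljust(CARD_LENGTH).encode("ascii")

sample = b"".join(make_card(card) for card in [
    "SIMPLE  =                    T / file does conform to FITS standard",
    "BITPIX  =                    8 / number of bits per data pixel",
    "NAXIS   =                    1 / number of data axes",
    "XTENSION= 'BINTABLE'           / binary table extension",
    "END",
])
header = parse_header(sample)
print(header)
```

A real file would have one header per HDU, each followed by its data block; the sample above mixes cards from both headers only to exercise the numeric and string branches.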

VOTable
[XML example not preserved in the transcript; the only surviving text is the value "Procyon".]

FITS → DataBinX → VOTable
• FITS to VOTable conversion
[Pipeline diagram labels: FITS, preprocessor, BinX schema, DataBinX utility, DataBinX, XSLT transformer, VOTable.]

VOTable → DataBinX → FITS
• VOTable to FITS conversion
[Pipeline diagram labels: VOTable, XSLT, XSLT transformer, preprocessor, DataBinX, BinX schema, DataBinX utility, binary data, post processor, FITS header, FITS.]

FITS-VOTable experiment
• Sample FITS file
  – A data table of 82 rows × 20 fields
  – File size: 37 KB
• Generated DataBinX by DataBinX utility
  – Time spent: 268 ms
  – DataBinX document size: 1.2 MB
• VOTable transformed by MSXML
  – Time spent: about 1 second
  – VOTable document size: 51 KB

Application in Astronomy
Case Study 2: Data Transportation by Pipelining BinX and VOTable

The Problem
• Three kinds of VOTable data sources
  – Pure XML VOTable (large)
  – VOTable + FITS (small)
  – VOTable + Binary (smaller)
• Difficulties
  – Additional parser for VOTable + Binary
  – Limited binary format
  – Byte order and data types

The Solution: VOTable + BinX
• No coding necessary
• Smaller data files
• Easy to separate and restore
• Pipelined to work in the background
• Platform independent

Approaches
1. Embedded BinX
2. BinX document linking
Perhaps another method?

Embedded BinX
• Example:

BinX Document Linking
• Example:

Comparison of the two approaches
• Embedded BinX
  – Advantages:
    • One annotation file
    • Consistency with VOTable definitions
  – Disadvantages:
    • Spoils the VOTable document
    • Difficult to parse
• BinX document linking
  – Advantages:
    • Keeps VOTable clean
    • Easy to parse
  – Disadvantages:
    • Needs a separate BinX document
    • Difficult to keep consistent

BinX Software Today and the Future

Future releases
• Utilities (GUI BinX editor)
• XPath-based data query
• DFDL support
• Text file support
• Output through SAX events
• Output as XQuery return
• Database interfacing
• Java wrapper for utilities

Support
• Information and software download:
  – (coming soon)
• Questions:
• Requirements and suggestions: