E-Science Data Information and Knowledge Transformation The BinX Language.

1 e-Science Data Information and Knowledge Transformation: The BinX Language

2 www.edikt.org What is BinX?
• Binary in XML
  – Use XML to mark up binary data
  – Mark up data types
  – Mark up sequences
  – Mark up arrays
  – Complex structures

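3 Primitive Data Types
• Mark up data types
  Example values and their byte sequences:
  1. 32767 → FF 7F (16-bit integer, little-endian)
  2. 2147483647 → 7F FF FF FF (32-bit integer, big-endian)
  3. 100.0 → 00 00 C8 42 (32-bit float, little-endian)
  4. 100.0 → 42 C8 00 00 (32-bit float, big-endian)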
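The four byte sequences on this slide can be checked with Python's standard struct module. This is a sketch, not part of BinX: the format codes below are Python's own, chosen to match the bytes shown, and they stand in for the size, signedness, and byte order that the BinX markup declares for each value.

```python
import struct

# (hex bytes from the slide, Python struct format that decodes them)
# "<" = little-endian, ">" = big-endian,
# "h" = signed 16-bit int, "i" = signed 32-bit int, "f" = 32-bit IEEE 754 float
SAMPLES = [
    ("FF 7F", "<h"),
    ("7F FF FF FF", ">i"),
    ("00 00 C8 42", "<f"),
    ("42 C8 00 00", ">f"),
]


def decode(hex_bytes: str, fmt: str):
    """Decode a single primitive value from space-separated hex bytes."""
    (value,) = struct.unpack(fmt, bytes.fromhex(hex_bytes))
    return value


if __name__ == "__main__":
    for hex_bytes, fmt in SAMPLES:
        print(f"{hex_bytes:<12} {fmt}  ->  {decode(hex_bytes, fmt)}")
    # The same two bytes read with the wrong byte order give a different number.
    print(decode("FF 7F", ">h"))  # -129
```

The last call shows why the byte order belongs in the description: the same two bytes read as big-endian give -129 instead of 32767.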

4 Abstract “struct” types
• Mark up a sequence
  Example: the screen descriptor in GIF
  – Screen width: unsigned short
  – Screen height: unsigned short
  – Packed field: byte
  – Background colour index: byte
  – Pixel aspect ratio: byte
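The GIF screen descriptor maps directly onto a fixed sequence of fields. The sketch below decodes it with Python's struct module rather than BinX markup; the ScreenDescriptor field names are our own, and the 6-byte offset reflects the GIF signature ("GIF87a" or "GIF89a") that precedes the descriptor in a GIF file.

```python
import struct
from typing import NamedTuple


class ScreenDescriptor(NamedTuple):
    width: int                    # unsigned short
    height: int                   # unsigned short
    packed: int                   # byte of bit fields (colour table flags etc.)
    background_colour_index: int  # byte
    pixel_aspect_ratio: int       # byte


# GIF stores multi-byte integers little-endian. "<" also disables padding,
# so the record is exactly 2 + 2 + 1 + 1 + 1 = 7 bytes.
SCREEN_DESCRIPTOR = struct.Struct("<HHBBB")
SIGNATURE_LENGTH = 6  # "GIF87a" or "GIF89a" precedes the descriptor


def read_screen_descriptor(data: bytes) -> ScreenDescriptor:
    """Decode the screen descriptor that follows the GIF signature."""
    return ScreenDescriptor(*SCREEN_DESCRIPTOR.unpack_from(data, SIGNATURE_LENGTH))


if __name__ == "__main__":
    sample = b"GIF89a" + SCREEN_DESCRIPTOR.pack(640, 480, 0xF7, 0, 0)
    print(read_screen_descriptor(sample))
```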

5 Abstract “array” types
• Mark up an array
  Example: a two-dimensional 10 × 100 array of 32-bit integers
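For a fixed-size array, the markup only needs the item type and the dimensions; the position of any element then follows by arithmetic. A minimal sketch, assuming row-major storage and big-endian items (the slide states neither), reads a single element of the 10 × 100 array without decoding the rest.

```python
import struct

ROWS, COLS = 10, 100
ITEM = struct.Struct(">i")  # assumed: signed 32-bit, big-endian


def element_offset(i: int, j: int) -> int:
    """Byte offset of element [i][j], assuming row-major storage."""
    if not (0 <= i < ROWS and 0 <= j < COLS):
        raise IndexError(f"index ({i}, {j}) outside {ROWS} x {COLS}")
    return (i * COLS + j) * ITEM.size


def read_element(data: bytes, i: int, j: int) -> int:
    """Read one element without decoding the rest of the array."""
    return ITEM.unpack_from(data, element_offset(i, j))[0]


if __name__ == "__main__":
    # Test data: element [i][j] holds i * 1000 + j; 10 * 100 * 4 = 4000 bytes.
    data = b"".join(ITEM.pack(i * 1000 + j) for i in range(ROWS) for j in range(COLS))
    print(len(data), read_element(data, 3, 42))  # 4000 3042
```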

6 Embedded abstract types
• Complex structures
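Nesting works because a reader can treat structs and arrays as types in their own right and recurse into them. The descriptor format below (plain strings and tuples) is hypothetical and only illustrates the idea; it is not BinX syntax.

```python
import struct
from typing import Any, Tuple, Union

# Hypothetical type descriptors (an illustration, not BinX syntax):
#   "<f"                         a primitive, given as a struct format code
#   ("struct", [(name, type)])   named fields stored one after another
#   ("array", count, type)       count consecutive items of one type
TypeDesc = Union[str, tuple]


def read(desc: TypeDesc, data: bytes, offset: int = 0) -> Tuple[Any, int]:
    """Decode desc at offset; return (value, offset just past the value)."""
    if isinstance(desc, str):
        (value,) = struct.unpack_from(desc, data, offset)
        return value, offset + struct.calcsize(desc)
    if desc[0] == "struct":
        record = {}
        for name, field_type in desc[1]:
            record[name], offset = read(field_type, data, offset)
        return record, offset
    if desc[0] == "array":
        _, count, item_type = desc
        items = []
        for _ in range(count):
            item, offset = read(item_type, data, offset)
            items.append(item)
        return items, offset
    raise ValueError(f"unknown type descriptor: {desc!r}")


# A struct that embeds an array: an id followed by three coordinates.
POINT = ("struct", [("id", "<H"), ("coords", ("array", 3, "<f"))])

if __name__ == "__main__":
    data = struct.pack("<H3f", 7, 1.0, 2.0, 3.5)
    print(read(POINT, data))  # ({'id': 7, 'coords': [1.0, 2.0, 3.5]}, 14)
```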

7 User-defined metadata
• Label the data types and structures

8 Reusable type definitions
• Define macros for reuse
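Reusable definitions amount to naming a type once and expanding references to it wherever it is used. In this sketch the DEFINITIONS table, the ("use", name) form, and the type names Coordinate and Star are all hypothetical; expansion also guards against a definition that refers to itself.

```python
import struct

# Hypothetical named definitions: a type is defined once and referred to by
# name with ("use", name). The names and layout are illustrative only.
DEFINITIONS = {
    "Coordinate": ["<d", "<d"],                   # two 64-bit floats
    "Star": ["<i", ("use", "Coordinate"), "<f"],  # id, position, magnitude
}


def expand(items, definitions, seen=()):
    """Flatten a definition into primitive format codes, resolving uses."""
    codes = []
    for item in items:
        if isinstance(item, tuple) and item[0] == "use":
            name = item[1]
            if name in seen:
                raise ValueError(f"recursive definition: {name}")
            codes.extend(expand(definitions[name], definitions, seen + (name,)))
        else:
            codes.append(item)
    return codes


def struct_format(type_name, definitions=DEFINITIONS):
    """Build one little-endian struct format for a named type.

    Assumes every primitive code in the definitions is little-endian ("<").
    """
    codes = expand([("use", type_name)], definitions)
    return "<" + "".join(code.lstrip("<") for code in codes)


if __name__ == "__main__":
    fmt = struct_format("Star")
    print(fmt, struct.calcsize(fmt))  # <iddf 24
```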

9 Linking to binary data
• Reference the binary data file … …

10 A BinX document
[Annotated example document; labelled parts: root element, data class section, data instance section, abstract data type]
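Slides 9 and 10 together describe a document with a root element, a data class section holding type definitions, and a data instance section that links to the binary file. The sketch below mirrors that layout with invented element and attribute names (description, definitions, type, dataset, src, byteOrder, count); it is not the BinX schema, only an illustration of how a reader might follow such a document to its data.

```python
import struct
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical markup: every element and attribute name here is illustrative,
# not taken from the BinX schema.
DOC = """<description>
  <definitions>
    <type name="Reading" format="if"/>
  </definitions>
  <dataset src="readings.bin" byteOrder="big" type="Reading" count="3"/>
</description>"""


def load(doc_text: str, base_dir: Path) -> list:
    root = ET.fromstring(doc_text)                         # root element
    types = {t.get("name"): t.get("format")                # data class section
             for t in root.iterfind("definitions/type")}
    dataset = root.find("dataset")                         # data instance section
    prefix = ">" if dataset.get("byteOrder") == "big" else "<"
    record = struct.Struct(prefix + types[dataset.get("type")])
    raw = (base_dir / dataset.get("src")).read_bytes()     # follow the file link
    count = int(dataset.get("count"))
    return [record.unpack_from(raw, k * record.size) for k in range(count)]


def demo() -> list:
    """Write a small binary file next to the description, then load it."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        payload = b"".join(struct.pack(">if", k, k * 0.5) for k in range(3))
        (base / "readings.bin").write_bytes(payload)
        return load(DOC, base)


if __name__ == "__main__":
    print(demo())  # [(0, 0.0), (1, 0.5), (2, 1.0)]
```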

11 DataBinX
• DataBinX = BinX with data
[Example data values: 100, 1000, 5.257, 1, 2]
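Going from BinX plus binary to DataBinX means unpacking the values and writing them into the XML itself; the reverse packs them back into bytes. A round-trip sketch, using the example values 100, 1000 and 5.257 from the slide as input; the record layout and element names are hypothetical, and this is not the BinX library's DataBinX utility.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record layout and element names, for illustration only.
FIELDS = [("first", "<i"), ("second", "<i"), ("third", "<d")]


def unpack_to_xml(data: bytes) -> str:
    """Unpack one binary record and write its values inline as XML text."""
    record = ET.Element("record")
    offset = 0
    for name, fmt in FIELDS:
        (value,) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
        ET.SubElement(record, name).text = str(value)
    return ET.tostring(record, encoding="unicode")


def pack_from_xml(xml_text: str) -> bytes:
    """Reverse step: read the inline values and rebuild the binary record."""
    record = ET.fromstring(xml_text)
    parts = []
    for name, fmt in FIELDS:
        text = record.findtext(name)
        value = float(text) if fmt.endswith("d") else int(text)
        parts.append(struct.pack(fmt, value))
    return b"".join(parts)


if __name__ == "__main__":
    binary = struct.pack("<iid", 100, 1000, 5.257)
    xml_text = unpack_to_xml(binary)
    print(xml_text)
    print(pack_from_xml(xml_text) == binary)  # True
```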

12 The BinX Library

13 BinX Components
• The library has core functionality to support generic utilities and applications
[Diagram of layers:
  – BinX library core (BinX core functionality): parse/generate BinX documents; read/write binary data; parse/generate DataBinX
  – Utilities (generic tools): DataBinX pack/unpack; extractor, viewer; BinX editor
  – Applications: domain-specific]

14 BinX application models
• Data catalogue model
• Data manipulation model
• Data query model
• Data service model
• Data transportation model

15 Data catalogue model
• Primary storage
  – Binary data files
• Metadata
  – Syntactic annotation
  – Semantic annotation
  – Classification
  – Domain specific
  – Cross-reference (XLink)
[Diagram: binary data files annotated by a hierarchy of BinX documents (BinX 1; BinX 1.1 and 1.2; BinX 1.2.1, 1.2.2 and 1.2.3), ranging from abstract to detailed metadata]

16 Data manipulation model
• Extraction
  – Subset of a dataset
• Combination
  – Merge several datasets
• Transformation
  – Conversion of data types
  – Change of sequence order
  – Transposition of array dimensions
• Transparency
  – Automatic change of byte order
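Two of the transformations listed for this model, transposing array dimensions and changing byte order, can be done directly on the raw bytes once the item size and dimensions are known from the description. The function names are hypothetical and this is not the BinX library's API; to_native swaps only when the declared order differs from the machine's (sys.byteorder).

```python
import struct
import sys


def transpose(data: bytes, rows: int, cols: int, item_size: int) -> bytes:
    """Transpose a row-major rows x cols array of fixed-size items."""
    if len(data) != rows * cols * item_size:
        raise ValueError("buffer size does not match the declared dimensions")
    out = bytearray(len(data))
    for i in range(rows):
        for j in range(cols):
            src = (i * cols + j) * item_size
            dst = (j * rows + i) * item_size
            out[dst:dst + item_size] = data[src:src + item_size]
    return bytes(out)


def swap_byte_order(data: bytes, item_size: int) -> bytes:
    """Reverse the bytes of every item (big-endian <-> little-endian)."""
    return b"".join(data[k:k + item_size][::-1] for k in range(0, len(data), item_size))


def to_native(data: bytes, item_size: int, source_order: str) -> bytes:
    """Swap only if the declared order ("big"/"little") differs from this machine's."""
    return data if source_order == sys.byteorder else swap_byte_order(data, item_size)


if __name__ == "__main__":
    grid = struct.pack(">6i", 0, 1, 2, 3, 4, 5)             # 2 x 3, big-endian
    print(struct.unpack(">6i", transpose(grid, 2, 3, 4)))   # (0, 3, 1, 4, 2, 5)
    print(struct.unpack("=6i", to_native(grid, 4, "big")))  # (0, 1, 2, 3, 4, 5)
```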

17 Data query model
• In-dataset query
  – XPath against virtual XML
• Cross-dataset query
  – Link into multiple datasets
• Defining result format
  – XQuery-based return fragment
• Output interface
  – SAX events
[Diagram: applications (APP) receive SAX events from a utility built on the BinX library, which reads BinX data sources; variants shown produce DataBinX, VOTable and custom XQuery output, and apply XPath, XLink and transforms to the data sources]
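The query model treats binary data as if it were an XML document (virtual XML) and reports results as SAX events. The sketch streams binary records to a standard xml.sax ContentHandler, then collects the events into an ElementTree and queries it with ElementTree's limited XPath subset. The record layout (an 8-byte name and two doubles) and the element names are assumptions made for illustration, and the coordinates are sample values; this is not the BinX library's query engine.

```python
import struct
import xml.etree.ElementTree as ET
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl

# Hypothetical record layout: 8-byte ASCII name, then two 64-bit floats.
RECORD = struct.Struct("<8sdd")
FIELDS = ("name", "ra", "dec")


def emit_sax(data: bytes, handler: ContentHandler) -> None:
    """Present binary records to a SAX handler as a virtual XML document."""
    no_attrs = AttributesImpl({})
    handler.startDocument()
    handler.startElement("table", no_attrs)
    for values in RECORD.iter_unpack(data):
        handler.startElement("row", no_attrs)
        for field, value in zip(FIELDS, values):
            if isinstance(value, bytes):
                text = value.rstrip(b"\0").decode("ascii")
            else:
                text = repr(value)
            handler.startElement(field, no_attrs)
            handler.characters(text)
            handler.endElement(field)
        handler.endElement("row")
    handler.endElement("table")
    handler.endDocument()


class TreeHandler(ContentHandler):
    """Turn SAX events into an ElementTree so the virtual XML can be queried."""

    def __init__(self):
        super().__init__()
        self._builder = ET.TreeBuilder()
        self.root = None

    def startElement(self, name, attrs):
        self._builder.start(name, dict(attrs.items()))

    def characters(self, content):
        self._builder.data(content)

    def endElement(self, name):
        self._builder.end(name)

    def endDocument(self):
        self.root = self._builder.close()


if __name__ == "__main__":
    data = RECORD.pack(b"Procyon", 114.827, 5.227) + RECORD.pack(b"Sirius", 101.287, -16.716)
    handler = TreeHandler()
    emit_sax(data, handler)
    # ElementTree supports only a subset of XPath, including [child='text'].
    for row in handler.root.findall("row[name='Procyon']"):
        print(row.findtext("ra"), row.findtext("dec"))  # 114.827 5.227
```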

18 Data service model
• Publishing logical datasets in BinX
  – Dataset from one binary file
  – Dataset from several binary files
  – Dataset from multiple data sources
[Diagram: a client accesses BinX datasets backed by binary files, a database (DB) and the Grid]

19 Data transportation model
• DataBinX as interlingua
[Diagram labels: XML document, XSLT, DataBinX, Schema, BinX Schema, BinX Util, BinX + Binary, ZIP tool, ZIP (MIME), Send, Receive; the same chain appears on the sending and the receiving side]
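The packaging step of this pipeline, bundling a BinX document with its binary file so the two travel as one ZIP archive (MIME type application/zip), can be sketched with Python's zipfile module. The member names are hypothetical, and the sketch covers only the ZIP stage, not the XSLT or BinX utility stages.

```python
import io
import zipfile

# Hypothetical member name for the description document.
DESCRIPTION_NAME = "description.xml"


def pack_bundle(description_xml: str, binary: bytes, data_name: str = "data.bin") -> bytes:
    """Sender side: put the description and its binary file into one ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(DESCRIPTION_NAME, description_xml)
        zf.writestr(data_name, binary)
    return buffer.getvalue()


def unpack_bundle(archive: bytes) -> dict:
    """Receiver side: restore every member as name -> bytes."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


if __name__ == "__main__":
    payload = bytes(range(256)) * 16
    archive = pack_bundle("<description/>", payload)
    files = unpack_bundle(archive)
    print(len(payload), len(archive), files["data.bin"] == payload)
```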

20 Application in Astronomy, Case Study 1: Data Conversion Between FITS and VOTable

21 Application in astronomy
• FITS and VOTable conversion
[Diagram: a DataBinX utility built on the BinX library core, converting between a FITS file (header SIMPLE = T … END, then binary data 01010101) and a VOTable document (<?xml version=…)]

22 FITS file
Primary HDU
  Header (80-character cards, columns 0 to 79):
    SIMPLE = T / file does conform to FITS standard
    BITPIX = 8 / number of bits per data pixel
    NAXIS = 1 / number of data axes
    …
    END
  Data: 3D 4A 14 0F 1C FE 25 04 … …
Extension
  Header:
    XTENSION= 'BINTABLE' / binary table extension
    BITPIX = 8 / 8-bit bytes
    NAXIS = 2 / 2-dimensional binary table
    …
    END
  Data: 7B 3E 40 2C 16 70 E7 6F … …
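The header cards on this slide follow the fixed FITS layout: 80-character cards with the keyword in the first eight columns, a header that ends at END, and headers and data padded to 2880-byte blocks. This is a simplified sketch, not a complete reader: it handles logical, integer, float and simple string values, and ignores details such as escaped quotes, D exponents and CONTINUE cards that a full reader (for example astropy.io.fits) handles. make_card is a hypothetical helper used only to build test input.

```python
from typing import Dict, Tuple

CARD = 80     # every header card is 80 ASCII characters
BLOCK = 2880  # headers and data are padded to multiples of 2880 bytes


def parse_value(field: str):
    """Parse the value part of a card (simplified: no '' escapes, no D exponents)."""
    field = field.strip()
    if field.startswith("'"):
        return field[1:field.index("'", 1)].rstrip()  # quoted string
    field = field.split("/", 1)[0].strip()             # drop the comment
    if field in ("T", "F"):
        return field == "T"                             # logical
    for convert in (int, float):
        try:
            return convert(field)
        except ValueError:
            pass
    return field or None


def parse_header(raw: bytes) -> Tuple[Dict[str, object], int]:
    """Read cards up to END; return (keyword -> value, header length in bytes)."""
    cards = {}
    for start in range(0, len(raw) - CARD + 1, CARD):
        card = raw[start:start + CARD].decode("ascii")
        keyword = card[:8].rstrip()
        if keyword == "END":
            used = start + CARD
            return cards, (used + BLOCK - 1) // BLOCK * BLOCK  # whole blocks
        if card[8:10] == "= ":
            cards[keyword] = parse_value(card[10:])
    raise ValueError("no END card found")


def make_card(keyword: str, value: str, comment: str = "") -> str:
    """Format one fixed-format card (strings left-aligned, others right-aligned)."""
    text = f"{value:<20}" if value.startswith("'") else f"{value:>20}"
    card = f"{keyword:<8}= {text}"
    if comment:
        card += f" / {comment}"
    return card.ljust(CARD)


if __name__ == "__main__":
    header = "".join([
        make_card("SIMPLE", "T", "file does conform to FITS standard"),
        make_card("BITPIX", "8", "number of bits per data pixel"),
        make_card("NAXIS", "1", "number of data axes"),
        make_card("NAXIS1", "8", "length of data axis 1"),
        "END".ljust(CARD),
    ]).ljust(BLOCK).encode("ascii")
    print(parse_header(header))
```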

23 VOTable
[Example VOTable; recoverable cell values: Procyon, 114.827, 5.227, 4 5 3 4 3 2 1 2 3 3 5 6]

24 FITS → DataBinX → VOTable
• FITS to VOTable conversion
[Diagram labels: FITS Schema, BinX Schema, BinX, Preprocessor, DataBinX Utility, DataBinX, XSLT transformer, VOTable]
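The slides perform the DataBinX-to-VOTable step with an XSLT transformer. As a substitute technique, not the authors' stylesheet, the sketch below builds the target VOTable structure (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD) from already-decoded rows with Python's ElementTree; the VOTable namespace and most optional attributes are left out.

```python
import xml.etree.ElementTree as ET


def rows_to_votable(fields, rows) -> str:
    """fields: list of (name, VOTable datatype); rows: tuples of values."""
    votable = ET.Element("VOTABLE", version="1.1")
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    for name, datatype in fields:
        attrs = {"name": name, "datatype": datatype}
        if datatype == "char":
            attrs["arraysize"] = "*"  # variable-length string
        ET.SubElement(table, "FIELD", attrs)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


if __name__ == "__main__":
    fields = [("name", "char"), ("ra", "double"), ("dec", "double")]
    print(rows_to_votable(fields, [("Procyon", 114.827, 5.227)]))
```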

25 VOTable → DataBinX → FITS
• VOTable to FITS conversion
[Diagram labels: VOTable, XSLT Preprocessor, XSLT transformer, DataBinX, FITS Schema, BinX Schema, BinX, DataBinX Utility, Binary Data, Post processor, FITS Header]

26 FITS-VOTable experiment
• Sample FITS file
  – A data table of 82 rows × 20 fields
  – File size: 37 KB
• DataBinX generated by the DataBinX utility
  – Time spent: 268 ms
  – DataBinX document size: 1.2 MB
• VOTable transformed by MSXML
  – Time spent: about 1 second
  – VOTable document size: 51 KB

27 Application in Astronomy, Case Study 2: Data Transportation by Pipelining BinX and VOTable

28 The Problem
• Three kinds of VOTable data sources
  – Pure XML VOTable (large)
  – VOTable + FITS (small)
  – VOTable + Binary (smaller)
• Difficulties
  – An additional parser is needed for VOTable + Binary
  – Limited binary format
  – Byte order and data types

29 The Solution: VOTable + BinX
• No coding necessary
• Smaller data files
• Easy to separate and restore
• Pipelined to work in the background
• Platform independent

30 Approaches
1. Embedded BinX
2. BinX document linking
Perhaps another method?

31 Embedded BinX
• Example: http://www.edikt.org/binx/2003/06/binx

32 BinX Document Linking
• Example:

33 Comparison of the two approaches
• Embedded BinX
  – Advantages: one annotation file; consistency with the VOTable definitions
  – Disadvantages: clutters the VOTable document; difficult to parse
• BinX document linking
  – Advantages: keeps the VOTable clean; easy to parse
  – Disadvantages: needs a separate BinX document; difficult to keep the two documents consistent
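The main drawback listed for document linking is keeping two files consistent. A check like the hypothetical one below compares the FIELD list of a VOTable with the field layout taken from a separate description. The byte sizes are those of VOTable's BINARY serialization for fixed-size scalar types, and the layout format (a list of name and byte-size pairs) is an assumption, not something BinX defines.

```python
import xml.etree.ElementTree as ET

# Byte sizes of fixed-size scalar VOTable datatypes in BINARY serialization.
VOTABLE_SIZES = {"boolean": 1, "unsignedByte": 1, "short": 2, "int": 4,
                 "long": 8, "float": 4, "double": 8}


def check_consistency(votable_xml: str, layout: list) -> list:
    """Compare VOTable FIELDs with a hypothetical [(name, byte_size), ...] layout.

    Matches un-namespaced FIELD elements only; returns a list of problems.
    """
    root = ET.fromstring(votable_xml)
    fields = [(f.get("name"), f.get("datatype")) for f in root.iter("FIELD")]
    problems = []
    if len(fields) != len(layout):
        problems.append(f"field count differs: {len(fields)} vs {len(layout)}")
    for (v_name, v_type), (b_name, b_size) in zip(fields, layout):
        if v_name != b_name:
            problems.append(f"name mismatch: {v_name!r} vs {b_name!r}")
        expected = VOTABLE_SIZES.get(v_type)  # None for types not listed above
        if expected != b_size:
            problems.append(f"{v_name}: {v_type} is {expected} bytes, layout says {b_size}")
    return problems


if __name__ == "__main__":
    vot = ('<VOTABLE><RESOURCE><TABLE><FIELD name="ra" datatype="double"/>'
           '<FIELD name="mag" datatype="float"/></TABLE></RESOURCE></VOTABLE>')
    print(check_consistency(vot, [("ra", 8), ("mag", 8)]))
```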

34 BinX Software Today and the Future

35 Future releases
• Utilities (GUI BinX editor)
• XPath-based data query
• DFDL support
• Text file support
• Output through SAX events
• Output as XQuery return fragments
• Database interfacing
• Java wrapper for utilities

36 Support
• Information and software download:
  – http://www.edikt.org/binx (coming soon)
• Questions:
  – support@edikt.org
• Requirements and suggestions:
  – tedwen@edikt.org
  – robertc@edikt.org

