Introduction to the BinX Library eDIKT project team Ted Wen Robert Carroll


1 Introduction to the BinX Library
- eDIKT project team
- Ted Wen, tedwen@nesc.ac.uk
- Robert Carroll, robertc@nesc.ac.uk

2 Agenda
- About the BinX project
- A brief introduction to the BinX language
- Introduction to the BinX library
- Advanced API to the BinX library
- Use cases and requirements
  - Dr Bob Mann
  - Dr Chris Maynard
- Discussion

3 About the BinX project

4 The problem
- XML is useful to represent metadata
- Scientific datasets can be too large in XML
- Most scientific data are in binary files
- Binary data files are not all standardized
- Binary data files are platform-dependent

5 BinX – a solution
- Initially designed for the Grid environment
- Annotate data schema for any binary file
- Data elements are marked up in XML
- Describe three levels of features in a binary file
  - Underlying physical representation (byte order)
  - Primitive data types (integer, float)
  - Structure of the dataset (array, table)
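The lowest of the three levels, byte order, is what makes a binary file platform-dependent. A minimal sketch of reading a 32-bit unsigned integer independently of the host's byte order follows. The names `ByteOrder` and `read_u32` are hypothetical and are not taken from the BinX library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type (not BinX API): the byte order the file was written in.
enum class ByteOrder { BigEndian, LittleEndian };

// Assemble a 32-bit unsigned integer from four raw bytes using shifts.
// The bytes are combined arithmetically, so the result does not depend
// on the byte order of the machine running this code.
std::uint32_t read_u32(const unsigned char* bytes, ByteOrder order) {
    assert(bytes != nullptr);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        // In big-endian data the first byte is the most significant one;
        // in little-endian data it is the last byte.
        const std::size_t index = (order == ByteOrder::BigEndian) ? i : 3 - i;
        value = (value << 8) | bytes[index];
    }
    return value;
}
```

Given the declared byte order of the file, the same call returns the same value on any host, which is the property an annotation of the physical representation is meant to provide.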

6 The BinX project at eDIKT
- Implementing a software library for BinX
- Develop a series of tools based on the library
- Choose C++ for performance
- Write portable code for different platforms
- Robust and easy to use

7 Development status
- Requirement gathering from July 2002
- Development started in October 2002
- Prototype finished in December 2002
- Alpha version complete in April 2003
- Beta version to be released in June 2003

8 The deliverables
- The BinX library
  - Compiled code on different platforms
  - Source code with Open Source license
- Documentation
  - Users guide
  - Developers guide
- Utilities and examples

9 The BinX Language

10 What is BinX?
- The Binary XML Description Language
- A language for annotating binary data files
- It describes data types, data structures and attributes such as byte order
- A BinX document is an XML file with metadata of a binary data file

11 A BinX document
[Figure: annotated BinX document; labels: root element, data class section, data instance section, abstract data type]

12 Data elements
- Primitive data elements
  - Byte, character, integer, real
- Complex data elements
  - Arrays, struct, union
- User-defined data elements

13 Primitive data types
- Bit
- Character
- Integer
- Real

14 Complex data types
- Arrays
  - Repetitive collection of any data element
  - Multidimensional
  - Three types of arrays
    - Fixed-length array
    - Variable-length array
    - Streamed array
- Struct
  - A sequence of data elements
- Union
  - One of a group of possible data elements, conditional on the discriminant
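To make the union idea concrete, here is a rough sketch of decoding a value whose type is chosen by a discriminant. The layout (a one-byte discriminant followed by a 4-byte payload in host byte order) and the names `UnionValue` and `decode_union` are assumptions made for illustration. They do not come from the slides or from the BinX library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical decoded union: exactly one of the two payload fields is
// meaningful, as indicated by `kind`.
struct UnionValue {
    enum class Kind { Integer, Real };
    Kind kind;
    std::int32_t integer;  // meaningful when kind == Kind::Integer
    float real;            // meaningful when kind == Kind::Real
};

// Assumed layout: bytes[0] is the discriminant (0 = integer, 1 = real),
// bytes[1..4] hold the payload in the host's byte order.
UnionValue decode_union(const unsigned char* bytes) {
    assert(bytes != nullptr);
    static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");
    UnionValue result{UnionValue::Kind::Integer, 0, 0.0f};
    switch (bytes[0]) {
    case 0:
        std::memcpy(&result.integer, bytes + 1, sizeof result.integer);
        break;
    case 1:
        result.kind = UnionValue::Kind::Real;
        std::memcpy(&result.real, bytes + 1, sizeof result.real);
        break;
    default:
        throw std::runtime_error("unknown discriminant");
    }
    return result;
}
```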

15 Arrays
- Fixed-length array
- Variable-length array
- Streamed array
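The three array kinds differ in where the element count comes from. The sketch below uses one-byte elements to keep byte order out of the picture. It assumes that a variable-length array stores its count in a single byte directly before the elements, and that a streamed array runs to the end of the data. Both assumptions and all function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Fixed-length array: the element count is known from the schema.
Bytes read_fixed(const Bytes& data, std::size_t offset, std::size_t count) {
    assert(offset <= data.size() && count <= data.size() - offset);
    const auto first = data.begin() + static_cast<std::ptrdiff_t>(offset);
    return Bytes(first, first + static_cast<std::ptrdiff_t>(count));
}

// Variable-length array: the count is stored in the data itself
// (assumed here: one byte immediately before the elements).
Bytes read_variable(const Bytes& data, std::size_t offset) {
    assert(offset < data.size());
    const std::size_t count = data[offset];
    return read_fixed(data, offset + 1, count);
}

// Streamed array: no count is stored; elements are assumed to continue
// until the end of the data.
Bytes read_streamed(const Bytes& data, std::size_t offset) {
    assert(offset <= data.size());
    return read_fixed(data, offset, data.size() - offset);
}
```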

16 Struct

17 Union

18 User-defined data type

19 Data elements as instances

20 Reference defined elements

21 The BinX Library Alpha version

22 Fundamental requirements
- Access to data elements in binary files via BinX
  - Parse the BinX document
  - Build in-memory data structures
  - Read data values from the binary file
- Automatic conversion
  - Byte ordering
  - Padding
- Producing BinX document and binary data
  - Generate BinX document for data structures
  - Save assigned data values into binary files
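Padding matters because compilers often insert unused bytes between struct fields so that each field starts at an aligned address. A reader then has to skip those bytes. The following is a minimal sketch of computing field offsets under natural alignment. `Field`, `align_up` and `field_offsets` are hypothetical names, and the alignment rule is an assumption, not a description of how the BinX library handles padding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one struct field.
struct Field {
    std::size_t size;       // size of the field in bytes
    std::size_t alignment;  // required alignment in bytes, must be > 0
};

// Round `offset` up to the next multiple of `alignment`.
std::size_t align_up(std::size_t offset, std::size_t alignment) {
    assert(alignment > 0);
    return (offset + alignment - 1) / alignment * alignment;
}

// Compute the byte offset of every field, inserting padding wherever a
// field would otherwise start at a misaligned position.
std::vector<std::size_t> field_offsets(const std::vector<Field>& fields) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (const Field& field : fields) {
        cursor = align_up(cursor, field.alignment);
        offsets.push_back(cursor);
        cursor += field.size;
    }
    return offsets;
}
```

For a 1-byte field followed by a 4-byte, a 2-byte and an 8-byte field, each aligned to its own size, this places the fields at offsets 0, 4, 8 and 16.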

23 General use cases
- Data conversion (byte order)
- Data extraction (sub-dataset)
- Data combination (two arrays to one)
- Data presentation (browse, pure XML)

24 BinX Components
- The library has core functionality to support generic utilities and applications
[Figure: layered component diagram]
- BinX Library Core: BinX core functionality (parse BinX document, read binary data)
- Utilities: generic tools (data conversion, extraction, packing/unpacking)
- Applications: domain-specific

25 The BinX library core
- Input: SchemaBinX, binary data file
- Output: DataBinX, in-memory dataset
[Figure: binary data (0101010101) passes through the BinX library into an in-memory data structure; values are loaded on demand]
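"Values loaded on demand" suggests that the in-memory structure holds placeholders and reads a value from the file only when it is first requested. One possible sketch of that idea is below. `LazyValue` is a hypothetical class and does not reflect the actual BinX library API. The loader stands in for whatever code would seek into the binary file and decode the value.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical lazily loaded value: the loader runs at most once, on the
// first call to get(), and the result is cached afterwards.
class LazyValue {
public:
    explicit LazyValue(std::function<std::int32_t()> loader)
        : loader_(std::move(loader)), loaded_(false), value_(0) {
        assert(static_cast<bool>(loader_));
    }

    // Return the value, loading it first if that has not happened yet.
    std::int32_t get() {
        if (!loaded_) {
            value_ = loader_();
            loaded_ = true;
        }
        return value_;
    }

    bool loaded() const { return loaded_; }

private:
    std::function<std::int32_t()> loader_;
    bool loaded_;
    std::int32_t value_;
};
```

With this design, building the in-memory structure for a large file costs little up front, and only the elements an application actually touches are read.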

26 The BinX Utilities
- DataBinX generator
- DataBinX splitter
- SchemaBinX creator
- Binary file indexer

27 DataBinX generator
- Put binary data inside XML
- For browsing, web service return, query result set
[Figure: binary data (0101010101) passes through the BinX library and comes out as XML containing the value 100]
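The idea of putting binary data inside XML can be sketched as follows: decode each value from the raw bytes and write it out as element text. The input here is assumed to be a sequence of big-endian 32-bit unsigned integers. The element names `array` and `value` are placeholders chosen for this sketch and are not the element names of the DataBinX format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Convert raw big-endian 32-bit unsigned integers into a small XML
// fragment with one element per value. Element names are placeholders.
std::string binary_to_xml(const std::vector<unsigned char>& bytes) {
    assert(bytes.size() % 4 == 0);
    std::ostringstream out;
    out << "<array>";
    for (std::size_t i = 0; i < bytes.size(); i += 4) {
        const std::uint32_t value =
            (static_cast<std::uint32_t>(bytes[i]) << 24) |
            (static_cast<std::uint32_t>(bytes[i + 1]) << 16) |
            (static_cast<std::uint32_t>(bytes[i + 2]) << 8) |
            static_cast<std::uint32_t>(bytes[i + 3]);
        out << "<value>" << value << "</value>";
    }
    out << "</array>";
    return out.str();
}
```

Each 4-byte value turns into a start tag, decimal text and an end tag, which is why the XML form is convenient for browsing but much larger than the binary form.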

28 DataBinX splitter
- The reverse of DataBinX generator
- Generate binary file for testing, transportation
- Cross-platform (byte order)
[Figure: XML containing the value 100 passes through the BinX library and comes out as binary data (0101010101)]

29 SchemaBinX creator
- GUI and Web-based utilities
- Build BinX document interactively
- Create a BinX document based on another

30 Binary file indexer
- Generating indices for binary data files
- Such indices can be used for fast data access
[Figure: binary data (0101010101) passes through the BinX library to produce an index of elements X, Y with offsets 0000, 0004]
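An index of byte offsets lets a reader jump straight to a record instead of scanning the file from the start. The sketch below builds such an index for variable-length records, assuming each record begins with a one-byte length prefix. That record layout and the name `build_index` are assumptions for illustration and say nothing about the index format the BinX utility produces.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Scan the data once and record the starting offset of every record.
// Assumed layout: one length byte, followed by that many payload bytes.
std::vector<std::size_t> build_index(const std::vector<unsigned char>& data) {
    std::vector<std::size_t> offsets;
    std::size_t pos = 0;
    while (pos < data.size()) {
        offsets.push_back(pos);
        const std::size_t length = data[pos];
        if (length > data.size() - pos - 1) {
            throw std::runtime_error("truncated record");
        }
        pos += 1 + length;
    }
    assert(pos == data.size());  // every byte belongs to exactly one record
    return offsets;
}
```

Once the index exists, record `n` starts at `offsets[n]`, so access no longer depends on the lengths of the preceding records.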

31 Applications for astronomy
- FITS and VOTable conversion
[Figure: a FITS file (header "SIMPLE = T … END" followed by binary data 01010101) and an XML document (<?xml version=. …) connected through the DataBinX utility and the BinX library core]

32 FITS to VOTable conversion
[Figure: FITS → DataBinX → VOTable pipeline; components shown: preprocessor, SchemaBinX, DataBinX utility, XSLT transformer]

33 VOTable to FITS conversion
[Figure: VOTable → DataBinX → FITS pipeline; components shown: XSLT transformer, XSLT preprocessor, SchemaBinX, DataBinX utility, post processor; outputs: FITS header and binary data]

34 FITS-VOTable experiment
- Sample FITS file
  - A data table of 82 rows × 20 fields
  - File size: 37KB
- Generated DataBinX by DataBinX utility
  - Time spent: 268 ms
  - DataBinX document size: 1.2MB
- VOTable transformed by MSXML
  - Time spent: about 1 second
  - VOTable document size: 51KB

35 Possible future releases
- DataBinX parsing
- Utilities (GUI BinX editor)
- XPath-based data query
- DFDL support
- Preserving special tags
  - For comments, application-specific tags
- Text file support

36 Features or issues to consider
- Converting floating point numbers
  - 80-bit, 96-bit, 128-bit floating point
- Array manipulation (slice, section)
- SAX-based XML document parsing
  - Use cases in place of DOM parsing
  - Built in the library or as add-on component?
- Database support
  - Annotating database tables?
  - Query database tables through BinX?
- Java version of the library
  - Keeping exactly the same features with the C++ version?
- Supporting XQuery
  - Query binary data files with XQuery on BinX
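To illustrate why wide floating point formats are an issue, here is a sketch of converting one 80-bit value to a `double`. It assumes the value is in the x87 extended-precision layout (1 sign bit, 15-bit exponent with bias 16383, 64-bit significand with an explicit integer bit) stored little-endian in 10 bytes. The slides do not say which 80-bit format is meant, and `decode_extended80` is a hypothetical name. Precision beyond the 53 significand bits of a `double` is lost.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decode a little-endian x87 80-bit extended-precision value into a double.
double decode_extended80(const unsigned char* b) {
    assert(b != nullptr);
    // Bytes 0..7 hold the 64-bit significand, least significant byte first.
    std::uint64_t significand = 0;
    for (int i = 7; i >= 0; --i) {
        significand = (significand << 8) | b[i];
    }
    // Bytes 8..9 hold the 15-bit exponent and, in the top bit, the sign.
    const unsigned exponent = ((b[9] & 0x7Fu) << 8) | b[8];
    const bool negative = (b[9] & 0x80u) != 0;

    double magnitude;
    if (exponent == 0x7FFFu) {
        // All-ones exponent: infinity if the fraction bits are zero,
        // otherwise NaN. The explicit integer bit is ignored here.
        magnitude = ((significand << 1) == 0)
                        ? std::numeric_limits<double>::infinity()
                        : std::numeric_limits<double>::quiet_NaN();
    } else {
        // value = significand * 2^(exponent - bias - 63); a stored
        // exponent of 0 (zero or denormal) is treated as exponent 1.
        const int scale =
            static_cast<int>(exponent == 0 ? 1u : exponent) - 16383 - 63;
        magnitude = std::ldexp(static_cast<double>(significand), scale);
    }
    return negative ? -magnitude : magnitude;
}
```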

37 Support
- For problems of usage:
  - http://www.edikt.org/binx (coming soon)
  - support@edikt.org
- For requirements and suggestions:
  - tedwen@edikt.org
  - robertc@edikt.org

