Data Format Description Language (DFDL) WG Martin Westhead EPCC, University of Edinburgh Alan Chappell PNNL


1 Data Format Description Language (DFDL) WG Martin Westhead EPCC, University of Edinburgh M.Westhead@epcc.ed.ac.uk Alan Chappell PNNL chappella@battelle.org

2 Agenda Introduction and welcome - Martin Westhead 10 mins Binary Format Description Language (BFD) - Alan Chappell 10 mins Binary XML (BinX) - Stephen Rutherford 10 mins DFDL - Martin Westhead 15 mins – Big picture – Structural Description Language – Charter (20 mins Discussion) Examples repository - Alan Chappell 10 mins – Bruce Barkstrom Examples at NASA (15 mins Discussion)

3 Motivation There will never be a standard data format –E.g. XML – verbose, tree-based, explicit structure –Legacy formats –Application specific formats –One size will never fit all But could we provide a language for describing formats? –Transparency of physical representation –Automatic format conversion –Unambiguous description of data

4 There's more… Explicit structure enables: Standard transformation to/from XML representation –Could allow application to read/write XML –But provide underlying efficient binary representation Data stream/file becomes database –Point to parts of the structure –Extract parts of the structure –Modify parts of the structure –Integrate parts of different structures

5 And more… Generic tools possible –Browsing –Conversion and transformation Annotation of data –E.g. identify bits that depict hurricane in an image Enables general semantic labels, many ontologies could be developed e.g.: –S.I. units, SQL types, Time –Community specific labels, starClass = whiteDwarf –Application specific labels, nodeColour = green Could lead to a standard transformation language

6 Not fairy tales Based on implemented work –BinX http://www.epcc.ed.ac.uk/gridserve/WP5/Binx/ –BFD part of the Scientific Annotation Middleware project (http://www.scidac.org/SAM/) Generalized and extended a little Formal semantics Foundation for extensibility

7 Approach Separate out structure and semantics General structural language –Repetition –Pointers –References to data –New structures can be built (compositionality) Semantics –Hard to express so…we don't –General labeling –Label semantics defined elsewhere (ontologies) –Labels can be added (extensibility)

8 Structure – arbitrary labels [Figure: a hierarchy of labels, listed from top to bottom as fooSet, fooPair, foo, bunchThings, thing, drawn over the bit sequence 0 1 1 0 0 1 1 1]

9 Structure – example labels [Figure: the same hierarchy with example labels, listed from top to bottom as complex Array, complex, float, byte, bit, drawn over the bit sequence 0 1 1 0 0 1 1 1]

10 Structural language Formal semantics –Structured binary sequence –Defines hierarchical structure over underlying sequence of binary values Language for describing hierarchical structure –Repetition Explicit number of repeats Termination characters –Data reference Conditionals Data size –Pointers Scope –As general as possible but –Must be concise and implementable Draft language definition on web page (www.epcc.ed.ac.uk/dfdl)
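
The two repetition styles named on this slide (an explicit number of repeats and a termination character) can be illustrated with a minimal Python sketch. This is not DFDL or BinX code: the function names, the one-byte count prefix, the big-endian 32-bit floats and the zero terminator are all assumptions chosen for illustration.

```python
import struct

def read_counted_floats(buf, offset=0):
    """Explicit-count repetition (assumed layout): a 1-byte count,
    then that many big-endian 32-bit floats."""
    count = buf[offset]
    values = struct.unpack_from(f">{count}f", buf, offset + 1)
    return list(values), offset + 1 + 4 * count

def read_terminated(buf, offset=0, terminator=0):
    """Termination-character repetition: bytes up to, but not
    including, the terminator byte (assumed here to be zero)."""
    end = buf.index(terminator, offset)
    return buf[offset:end], end + 1

# Hypothetical sample: two floats with a count prefix, then a
# zero-terminated run of bytes.
data = struct.pack(">B2f", 2, 1.5, -2.0) + b"abc\x00"
floats, pos = read_counted_floats(data)
text, pos = read_terminated(data, pos)
```

Each reader returns the parsed value together with the next offset, so structures can be composed by chaining readers over the same underlying byte sequence.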

11 CSV file example
char:=byte
data:=[(char - [',']).*]
field:=[data; [',']]
finalField:=[data; [\n]]
row:=[field.*] :: [finalField]
table:=[row.*]
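
As a rough illustration of what the CSV grammar on this slide describes, the sketch below hand-codes a parser in Python that produces the same table/row/field hierarchy. It is not generated from the grammar and is not part of any DFDL tool; the name `parse_table` is hypothetical. It also assumes that `data` excludes newlines as well as commas, which the slide's rule does not state explicitly.

```python
def parse_table(raw: bytes):
    """table := [row.*]; each row is zero or more comma-terminated
    fields followed by one newline-terminated final field."""
    table, row, data = [], [], bytearray()
    for b in raw:
        if b == ord(","):        # field := [data; [',']]
            row.append(bytes(data))
            data.clear()
        elif b == ord("\n"):     # finalField := [data; [\n]]
            row.append(bytes(data))
            data.clear()
            table.append(row)
            row = []
        else:                    # data := run of ordinary bytes
            data.append(b)
    if data or row:
        raise ValueError("input does not end with a newline-terminated final field")
    return table

rows = parse_table(b"1,2,3\n4,5,6\n")
```

The result keeps only the values, nested by row, with the separators dropped.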

12 Semantic labels Many ontologies possible Initial scope probably: –Basic types (floating point, integer, character) –Simple structures (structs, arrays, tables) Obvious extensions: –SQL types –XML Schema types Key WG goal: –Define form and requirements of new ontologies

13 What is an Ontology? XML Schema for new types Structural description of new types Definition of core API behaviour on new type API extensions Relationships to other types

14 WG goals Formal language for DFDL data structure Standard representation of this language in XML Requirements for DFDL ontology Basic types ontology Basic structures ontology

15 Currently under discussion Abstraction from the underlying binary –Compression, encoding, encryption –Physical vs. conceptual binary sequence Abstraction of description –complex:=[foo; foo] –Instantiate foo:= float or foo:= double at use time Filtering of results –Getting to the data model and leaving the format behind –CSV -> [[value; value; value]; [value; value; value]]
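
The "abstraction of description" idea, where complex:=[foo; foo] is written once and foo is bound to float or double at use time, can be sketched in Python as a reader factory. The names `FORMATS` and `make_complex_reader` are hypothetical, and big-endian IEEE encoding is an assumption; the slide does not say how the binding would actually be expressed in DFDL.

```python
import struct

# Assumed mapping from the placeholder's concrete type to a struct code.
FORMATS = {"float": "f", "double": "d"}

def make_complex_reader(foo: str):
    """Bind the placeholder type foo in complex := [foo; foo]."""
    codec = struct.Struct(">" + FORMATS[foo] * 2)  # big-endian is assumed

    def read(buf, offset=0):
        re, im = codec.unpack_from(buf, offset)
        return complex(re, im), offset + codec.size

    return read

read_c32 = make_complex_reader("float")    # foo := float
read_c64 = make_complex_reader("double")   # foo := double
z32, n32 = read_c32(struct.pack(">2f", 1.5, -0.5))
z64, n64 = read_c64(struct.pack(">2d", 1.5, -0.5))
```

Both readers yield the same conceptual value while consuming 8 and 16 bytes respectively, which is the separation between description and physical layout that the slide is pointing at.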

16 DFDL in the VO Generic tools Metadata possibilities –Ontologies can define relationships between types –E.g. polar to Cartesian –Standard classes over data objects
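
One way to picture an ontology that defines relationships between types, such as polar to Cartesian, is a lookup table of conversion functions keyed by source and target type. The Python sketch below is purely illustrative: the `Polar` and `Cartesian` classes, the `RELATIONSHIPS` registry and the `convert` helper are hypothetical names, and angles are assumed to be in radians.

```python
import math
from typing import NamedTuple

class Polar(NamedTuple):
    r: float
    theta: float  # radians (assumed)

class Cartesian(NamedTuple):
    x: float
    y: float

def polar_to_cartesian(p: Polar) -> Cartesian:
    return Cartesian(p.r * math.cos(p.theta), p.r * math.sin(p.theta))

# Hypothetical registry of relationships between labelled types.
RELATIONSHIPS = {(Polar, Cartesian): polar_to_cartesian}

def convert(value, target):
    """Look up and apply the relationship from type(value) to target."""
    return RELATIONSHIPS[(type(value), target)](value)

point = convert(Polar(2.0, math.pi / 2), Cartesian)
```

A generic tool would only need the registry to move between representations, without knowing anything specific about either type.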

17 Getting involved Webpages: http://www.epcc.ed.ac.uk/dfdl Mailing list (dfdl@gridforum.org) My address: M.Westhead@epcc.ed.ac.uk

