1 ILDG File Format
Chip Watson, for Middleware & MetaData Working Groups

2 Outline
December 3, 2004, ILDG 5 Workshop, Chip Watson
The (Real) Requirements
Soft Requirements
Issues
Options
Status
Proposal

3 The Real File Format Requirements
Must be able to share configuration files
– Find and retrieve the files
  • Addressed by meta data catalog, middleware components
– Consume (use) foreign files
  • Potential implications on how to produce files & meta data
Must have a (recommended) way to keep correspondence between binary data in files and the full meta data in the MDC
Must not keep mutable (changeable) meta data within the binary files
– Otherwise maintenance is too painful

4 Soft Requirements
Making foreign files useable: format should…
Adapt easily to variability in binary data type
– single / double precision
– byte ordering (consensus seems to be big endian)
– 3x3 or 3x2 (consensus seems to be 3x3)
Support data integrity checks
– CRC, plaquette
Allow additional (collaboration specific) data to be included
Make it easy to skip over uninteresting pieces
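As a rough illustration of what "adapting to variability" could mean for a reader, the sketch below decodes a buffer of reals for a given precision and byte order, computes a checksum, and rebuilds the third row of a link matrix stored in 3x2 form. The function names are hypothetical, the choice of zlib's CRC-32 is an assumption (the slide does not say which CRC is meant), and the 3x2 reconstruction relies on the standard SU(3) identity that the third row is the complex conjugate of the cross product of the first two.

```python
import struct
import zlib


def decode_reals(payload: bytes, precision: int = 32, big_endian: bool = True) -> list[float]:
    """Decode a flat buffer of IEEE floats with the given precision and byte order."""
    if precision not in (32, 64):
        raise ValueError("precision must be 32 or 64")
    size = precision // 8
    if len(payload) % size:
        raise ValueError("payload length is not a multiple of the element size")
    code = "f" if precision == 32 else "d"
    order = ">" if big_endian else "<"
    count = len(payload) // size
    return list(struct.unpack(f"{order}{count}{code}", payload))


def crc32_of(payload: bytes) -> int:
    """CRC-32 as implemented by zlib (assumed variant, not specified by the slide)."""
    return zlib.crc32(payload) & 0xFFFFFFFF


def third_row(a, b):
    """Rebuild the third row of an SU(3) matrix from its first two rows (3x2 storage)."""
    cross = (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )
    return tuple(c.conjugate() for c in cross)


# Example: three big-endian single-precision values and their checksum.
raw = struct.pack(">3f", 1.0, 0.5, -2.0)
values = decode_reals(raw, precision=32, big_endian=True)
checksum = crc32_of(raw)
```

A reader written this way only needs the precision, byte order, and row layout from the meta data to interpret a foreign file.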

5 Issues
How to incorporate legacy data?
– Convert & re-store?
– Provide conversion utility (convert at use)?
How to include collaboration specific preferences or standards?
– Certainly want to avoid double storing data (collaboration specific format, and ILDG format)
Simplicity vs flexibility…
– Flexibility (to address everyone’s desires) comes at a price; can the price be kept low enough?

6 General Approaches
Virtual shared format (different formats, common way to read, hide actual storage format)
– binX as universal reader
  • Collaborations provide binX description
OR
– C code as reader
  • Collaborations provide C code
  • Need to develop a common calling convention (API)
Physical shared format
– Data retrieved within ILDG is in this format
  • May require double storage, or conversion on the fly
OR
– Translation tools are provided by each group

7 Option 1: Binary-only Files
Implications:
– Meta data exists only in the MDC
– Users must keep the correspondence between the file copy and the meta data
  • File naming conventions (Global File Name, GFN)
  OR
  • Local database to track correspondence file : GFN

8 Option 2: NERSC style
Meaning:
– ASCII header containing essential meta data
Implications:
– Develop new standard for header
– Can include GFN, to allow retrieval of other meta data from MDC
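To make the ASCII-header option concrete, here is a minimal sketch of a reader. It assumes the header is a block of `KEY = VALUE` lines between `BEGIN_HEADER` and `END_HEADER` markers, in the style of the NERSC archive convention; the sample keys are illustrative, and the `GFN` key in particular is hypothetical, since the slide only proposes that a new header standard could carry one.

```python
def parse_ascii_header(blob: bytes) -> tuple[dict[str, str], int]:
    """Return the header fields and the byte offset where the binary data starts."""
    end_marker = b"END_HEADER\n"
    end = blob.find(end_marker)
    if not blob.startswith(b"BEGIN_HEADER\n") or end < 0:
        raise ValueError("no ASCII header found")
    fields: dict[str, str] = {}
    # Skip the BEGIN_HEADER line, then split each remaining line at the first '='.
    for line in blob[:end].decode("ascii").splitlines()[1:]:
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key.strip()] = value.strip()
    return fields, end + len(end_marker)


# Illustrative file image: a short header followed by 8 bytes of binary data.
sample = (
    b"BEGIN_HEADER\n"
    b"DATATYPE = 4D_SU3_GAUGE_3x3\n"
    b"FLOATING_POINT = IEEE32BIG\n"
    b"GFN = example-global-file-name\n"  # hypothetical key
    b"END_HEADER\n" + b"\x00" * 8
)
header, offset = parse_ascii_header(sample)
```

With the GFN recovered from the header, the remaining meta data could be looked up in the MDC rather than stored in the file.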

9 Option 3: Structured File Format
Goal: encapsulate, in an extensible way, binary data and meta data within a single file
Good Candidate: LIME / SciDAC-derived format
– SciDAC software committee considered several possibilities for encapsulation (including tar, cpio)
– DIME (Microsoft Direct Internet Message Encapsulation), similar in approach to MIME, used for e-mail attachments, was considered a good fit
  http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnservice/html/service01152002.asp
– LIME == LQCD modification of DIME to be a bit simpler, and support 64 bit sizes for records
– Software implementation (library) exists

10 Option 3 (cont): LIME
Details:
– File has multiple messages, messages have multiple records
– Record format:
  • 32 bits: 3 flags, id-length (13), type-format (3), type-length (13)
  • record id (variable length, round up to 4 byte multiple)
  • record type (variable length, round up)
  • data length (64 bit; DIME was 32)
  • payload (round up)
– SciDAC Records contain either XML meta data (string), or binary
– Possible records:
  • ILDG meta data (XML)
  • binX descriptor for binary layout
  • Collaboration specific extensions
  • Binary data (stored using NERSC conventions)
– ILDG meta data record options:
  • Existing configuration schema (subset, non-mutable)
  • OR, new, simpler (flat) schema
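The record layout listed above can be sketched as a pair of pack/unpack helpers. This follows only the field list on the slide; the bit order within the 32-bit word, the big-endian encoding, and the zero padding are assumptions, and the record ids and types in the example are made up. It is not the layout of any released LIME library.

```python
import struct


def _pad4(data: bytes) -> bytes:
    """Zero-pad to the next multiple of 4 bytes."""
    return data + b"\x00" * (-len(data) % 4)


def pack_record(rec_id: bytes, rec_type: bytes, payload: bytes,
                flags: int = 0, type_format: int = 0) -> bytes:
    if len(rec_id) >= 1 << 13 or len(rec_type) >= 1 << 13:
        raise ValueError("id and type lengths must fit in 13 bits")
    if not (0 <= flags < 8 and 0 <= type_format < 8):
        raise ValueError("flags and type_format must fit in 3 bits")
    # Assumed bit order: flags(3) | id-length(13) | type-format(3) | type-length(13)
    word = (flags << 29) | (len(rec_id) << 16) | (type_format << 13) | len(rec_type)
    return (struct.pack(">I", word) + _pad4(rec_id) + _pad4(rec_type)
            + struct.pack(">Q", len(payload)) + _pad4(payload))


def unpack_record(buf: bytes, offset: int = 0) -> tuple[dict, int]:
    """Decode one record; return its fields and the offset of the next record."""
    (word,) = struct.unpack_from(">I", buf, offset)
    id_len = (word >> 16) & 0x1FFF
    type_len = word & 0x1FFF
    pos = offset + 4
    rec_id = buf[pos:pos + id_len]
    pos += id_len + (-id_len % 4)
    rec_type = buf[pos:pos + type_len]
    pos += type_len + (-type_len % 4)
    (data_len,) = struct.unpack_from(">Q", buf, pos)  # 64-bit payload length
    pos += 8
    payload = buf[pos:pos + data_len]
    pos += data_len + (-data_len % 4)
    record = {"flags": word >> 29, "type_format": (word >> 13) & 0x7,
              "id": rec_id, "type": rec_type, "payload": payload}
    return record, pos


# Two records back to back: an XML meta data record and a binary record.
buf = (pack_record(b"1", b"ildg-format", b"<ildgFormat/>")
       + pack_record(b"2", b"binary", b"\x01\x02\x03\x04\x05", flags=1))
first, next_offset = unpack_record(buf)
second, end_offset = unpack_record(buf, next_offset)
```

Because every record carries its own lengths, a reader can step from one record to the next using the returned offset and skip payloads it does not care about.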

11 ILDG record idea (minimalist, from Carlton):
<ildgFormat> 1.0 big 32 20 20 20 64 </ildgFormat>
This is a bit more verbose than the NERSC ASCII header, but is completely extensible (add new fields without breaking old applications), and the string can be parsed by standard XML libraries (which are already planned to be used for ILDG meta data).
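The extensibility claim can be illustrated with a standard XML parser that reads the fields it knows and ignores the rest. Only the `ildgFormat` root element comes from the slide; the child element names (`version`, `endian`, `precision`, `lx`, `ly`, `lz`, `lt`) and the extra `collabNote` element are hypothetical placeholders chosen for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical child elements and the type each value is converted to.
KNOWN_FIELDS = {
    "version": str,
    "endian": str,
    "precision": int,
    "lx": int,
    "ly": int,
    "lz": int,
    "lt": int,
}


def parse_ildg_format(xml_text: str) -> dict:
    """Read known fields from an ildgFormat record, skipping unknown elements."""
    root = ET.fromstring(xml_text)
    if root.tag != "ildgFormat":
        raise ValueError("expected an <ildgFormat> root element")
    fields = {}
    for child in root:
        convert = KNOWN_FIELDS.get(child.tag)
        if convert is not None and child.text is not None:
            fields[child.tag] = convert(child.text.strip())
    return fields


# A record with one element an older reader has never heard of.
record_xml = """
<ildgFormat>
  <version>1.0</version>
  <endian>big</endian>
  <precision>32</precision>
  <lx>20</lx><ly>20</ly><lz>20</lz><lt>64</lt>
  <collabNote>added later by some collaboration</collabNote>
</ildgFormat>
"""
fmt = parse_ildg_format(record_xml)
```

The unknown element is skipped without error, which is the sense in which new fields can be added without breaking old applications.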

12 Current Status
ILDG board mandated a solution to file formats to be found prior to this workshop (missed goal)
There is a wide range of opinions on best path forward (XML, NERSC format, pure binary)
There may be a current movement towards accepting XML and LIME

13 Proposal
1. Current ad-hoc committee to work out implications of adopting LIME (January 2005)
  • Standardize ILDG record XML schema
  • Produce doc, simple test codes to show usage
2. Compare to virtual file format, and to pure binary and NERSC-like (pros and cons) and select a path forward (Jan 2005)
3. Refine selected approach, reaching version 1.0 by the end of February 2005
  • Documentation of schema, code
  • C library (if appropriate), test codes available for download

