ILDG File Format
Chip Watson, for Middleware & MetaData Working Groups


December 3, 2004, ILDG 5 Workshop, Chip Watson

Outline
- The (Real) Requirements
- Soft Requirements
- Issues
- Options
- Status
- Proposal

The Real File Format Requirements
- Must be able to share configuration files
  - Find and retrieve the files
    - Addressed by meta data catalog, middleware components
  - Consume (use) foreign files
    - Potential implications on how to produce files & meta data
- Must have a (recommended) way to keep correspondence between binary data in files and the full meta data in the MDC
- Must not keep mutable (changeable) meta data within the binary files
  - Otherwise maintenance is too painful

Soft Requirements
Making foreign files useable: format should…
- Adapt easily to variability in binary data type
  - single / double precision
  - byte ordering (consensus seems to be big endian)
  - 3x3 or 3x2 (consensus seems to be 3x3)
- Support data integrity checks
  - CRC, plaquette
- Allow additional (collaboration specific) data to be included
- Make it easy to skip over uninteresting pieces
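A minimal Python sketch of what a reader has to cope with under these soft requirements: either precision, either byte order, an optional 3x2 layout, and a checksum. The function names are hypothetical, and the use of zlib's CRC-32 is an assumption (the slide names "CRC" without specifying a variant). The third-row reconstruction uses the standard SU(3) identity that the third row is the complex conjugate of the cross product of the first two.

```python
import struct
import zlib


def decode_reals(buf: bytes, precision: int, byte_order: str = ">") -> list:
    """Unpack IEEE floats; precision is 32 or 64, byte_order is '>' (big) or '<' (little)."""
    code = {32: "f", 64: "d"}[precision]
    count = len(buf) // (precision // 8)
    # struct.unpack raises struct.error if buf is not a whole number of values.
    return list(struct.unpack(f"{byte_order}{count}{code}", buf))


def crc_matches(buf: bytes, expected_crc: int) -> bool:
    """Compare zlib's CRC-32 of the payload with a stored value (assumed variant)."""
    return (zlib.crc32(buf) & 0xFFFFFFFF) == expected_crc


def third_row(r1, r2):
    """Rebuild row 3 of an SU(3) matrix stored as 3x2: conj(r1 x r2)."""
    cross = (
        r1[1] * r2[2] - r1[2] * r2[1],
        r1[2] * r2[0] - r1[0] * r2[2],
        r1[0] * r2[1] - r1[1] * r2[0],
    )
    return tuple(c.conjugate() for c in cross)
```

The plaquette check mentioned on the slide needs the full lattice geometry and is not sketched here.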

Issues
- How to incorporate legacy data?
  - Convert & re-store?
  - Provide conversion utility (convert at use)?
- How to include collaboration specific preferences or standards?
  - Certainly want to avoid double storing data (collaboration specific format, and ILDG format)
- Simplicity vs flexibility…
  - Flexibility (to address everyone’s desires) comes at a price; can the price be kept low enough?

General Approaches
- Virtual shared format (different formats, common way to read, hide actual storage format)
  - binX as universal reader
    - Collaborations provide binX description
  OR
  - C code as reader
    - Collaborations provide C code
    - Need to develop a common calling convention (API)
- Physical shared format
  - Data retrieved within ILDG is in this format
    - May require double storage, or conversion on the fly
  OR
  - Translation tools are provided by each group
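The "virtual shared format" idea can be illustrated with a small reader registry. This is only a sketch in Python rather than the C calling convention the slide has in mind: each collaboration registers a reader for its own storage format, and callers go through one entry point that hides the layout. All names (`register_reader`, `read_configuration`, `collab_a`, `collab_b`) and the choice of a flat list of floats as the common in-memory form are assumptions made for illustration.

```python
import struct
from typing import Callable, Dict, List

# Assumed common in-memory form: a flat list of floats.
Reader = Callable[[bytes], List[float]]
_READERS: Dict[str, Reader] = {}


def register_reader(format_name: str, reader: Reader) -> None:
    """A collaboration supplies the reader for its own storage format."""
    _READERS[format_name] = reader


def read_configuration(format_name: str, raw: bytes) -> List[float]:
    """Single entry point: callers never see the storage layout."""
    try:
        reader = _READERS[format_name]
    except KeyError:
        raise ValueError(f"no reader registered for format {format_name!r}") from None
    return reader(raw)


# Two toy storage formats that differ in precision and byte order.
register_reader("collab_a", lambda raw: list(struct.unpack(f">{len(raw) // 8}d", raw)))
register_reader("collab_b", lambda raw: list(struct.unpack(f"<{len(raw) // 4}f", raw)))
```

The same values stored by either toy format come back in the same in-memory form, which is the property the virtual approach relies on.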

Option 1: Binary-only Files
Implications:
- Meta data exists only in the MDC
- Users must keep the correspondence between the file copy and the meta data
  - File naming conventions (Global File Name, GFN)
  OR
  - Local database to track correspondence file : GFN
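The "local database" alternative could be as small as a single table. The sketch below uses Python's built-in `sqlite3`; the table name, the function names, and the GFN string are hypothetical, since the slides do not define a GFN syntax.

```python
import sqlite3


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local table that maps file copies to GFNs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS copies ("
        "local_path TEXT PRIMARY KEY, gfn TEXT NOT NULL)"
    )
    return db


def record_copy(db: sqlite3.Connection, local_path: str, gfn: str) -> None:
    """Remember which GFN a local file copy corresponds to."""
    db.execute("INSERT OR REPLACE INTO copies VALUES (?, ?)", (local_path, gfn))
    db.commit()


def gfn_for(db: sqlite3.Connection, local_path: str):
    """Return the GFN for a local copy, or None if it is not tracked."""
    row = db.execute(
        "SELECT gfn FROM copies WHERE local_path = ?", (local_path,)
    ).fetchone()
    return row[0] if row else None
```

With the GFN in hand, the full meta data would then be looked up in the MDC; that lookup is outside this sketch.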

Option 2: NERSC style
Meaning:
- ASCII header containing essential meta data
Implications:
- Develop new standard for header
- Can include GFN, to allow retrieval of other meta data from MDC
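A hedged sketch of how a reader might separate such a header from the binary payload. It assumes the layout commonly used for NERSC-style files, `KEY = VALUE` lines between `BEGIN_HEADER` and `END_HEADER` markers; the `GFN` key is hypothetical, since the slide only proposes that a new header standard could carry it.

```python
def split_header(blob: bytes):
    """Return (header fields, binary payload) for a NERSC-style file image."""
    start, end = b"BEGIN_HEADER\n", b"END_HEADER\n"
    if not blob.startswith(start):
        raise ValueError("missing BEGIN_HEADER marker")
    stop = blob.index(end)  # raises ValueError if the header is unterminated
    text = blob[len(start):stop].decode("ascii")
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields, blob[stop + len(end):]
```

A reader would use the returned fields (for example the GFN) to fetch the remaining meta data from the MDC, and the payload bytes for the configuration itself.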

Option 3: Structured File Format
- Goal: encapsulate, in an extensible way, binary data and meta data within a single file
- Good Candidate: LIME / SciDAC-derived format
  - SciDAC software committee considered several possibilities for encapsulation (including tar, cpio)
  - DIME (Microsoft Direct Internet Message Encapsulation), similar in approach to MIME and used for attachments, was considered a good fit us/dnservice/html/service asp
  - LIME == LQCD modification of DIME to be a bit simpler, and support 64 bit sizes for records
  - Software implementation (library) exists

Option 3 (cont): LIME
Details:
- File has multiple messages, messages have multiple records
- Record format:
  - 32 bits: 3 flags, id-length (13), type-format (3), type-length (13)
  - record id (variable length, round up to 4 byte multiple)
  - record type (variable length, round up)
  - data length (64 bit; DIME was 32)
  - payload (round up)
- SciDAC Records contain either XML meta data (string), or binary
- Possible records:
  - ILDG meta data (XML)
  - binX descriptor for binary layout
  - Collaboration specific extensions
  - Binary data (stored using NERSC conventions)
- ILDG meta data record options:
  - Existing configuration schema (subset, non-mutable)
  - OR, new, simpler (flat) schema
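The record layout listed on this slide can be sketched as a pack/unpack pair. This follows the slide's description only, not the released LIME library, whose on-disk header may differ. The assumptions are: big-endian fields, the flags in the top three bits of the first word, and every "round up" meaning a 4-byte multiple. The function names are hypothetical.

```python
import struct


def _pad4(n: int) -> int:
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def pack_record(flags: int, type_format: int, rec_id: bytes,
                rec_type: bytes, payload: bytes) -> bytes:
    """Serialize one record using the field order given on the slide."""
    if len(rec_id) >= 1 << 13 or len(rec_type) >= 1 << 13:
        raise ValueError("id and type lengths must fit in 13 bits")
    # Bits 31-29: flags, 28-16: id length, 15-13: type format, 12-0: type length.
    word = ((flags & 0x7) << 29) | (len(rec_id) << 16) \
        | ((type_format & 0x7) << 13) | len(rec_type)
    out = struct.pack(">I", word)
    out += rec_id.ljust(_pad4(len(rec_id)), b"\0")
    out += rec_type.ljust(_pad4(len(rec_type)), b"\0")
    out += struct.pack(">Q", len(payload))  # 64-bit data length
    out += payload.ljust(_pad4(len(payload)), b"\0")
    return out


def unpack_record(buf: bytes, offset: int = 0):
    """Parse one record; return (record dict, offset of the next record)."""
    (word,) = struct.unpack_from(">I", buf, offset)
    id_len = (word >> 16) & 0x1FFF
    type_len = word & 0x1FFF
    pos = offset + 4
    rec_id = buf[pos:pos + id_len]
    pos += _pad4(id_len)
    rec_type = buf[pos:pos + type_len]
    pos += _pad4(type_len)
    (data_len,) = struct.unpack_from(">Q", buf, pos)
    pos += 8
    payload = buf[pos:pos + data_len]
    pos += _pad4(data_len)
    record = {"flags": word >> 29, "type_format": (word >> 13) & 0x7,
              "id": rec_id, "type": rec_type, "payload": payload}
    return record, pos
```

Because every record carries its own lengths, a reader can jump straight to the returned next offset, which is what makes it cheap to skip over uninteresting pieces.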

ILDG record idea (minimalist, from Carlton):

<ildgFormat> big big </ildgFormat>

This is a bit more verbose than the NERSC ASCII header, but is completely extensible (add new fields without breaking old applications), and the string can be parsed by standard XML libraries (which are already planned to be used for ILDG meta data).
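The extensibility claim can be illustrated with a standard XML library. In the sketch below only the root element name `ildgFormat` comes from the slide; the child element names (`byteOrder`, `precision`, `colorRows`) are hypothetical placeholders, not the proposed schema. An "old" reader that asks only for the fields it knows keeps working when a newer record adds another field.

```python
import xml.etree.ElementTree as ET

# Fields an "old" application knows about (hypothetical names).
KNOWN_FIELDS = ("byteOrder", "precision")


def read_format_record(xml_text: str) -> dict:
    """Extract the known fields and ignore anything added later."""
    root = ET.fromstring(xml_text)
    if root.tag != "ildgFormat":
        raise ValueError(f"unexpected root element {root.tag!r}")
    # findtext returns None for a missing field instead of raising.
    return {name: root.findtext(name) for name in KNOWN_FIELDS}


OLD_RECORD = (
    "<ildgFormat><byteOrder>big</byteOrder>"
    "<precision>64</precision></ildgFormat>"
)
NEW_RECORD = (
    "<ildgFormat><byteOrder>big</byteOrder><precision>64</precision>"
    "<colorRows>3</colorRows></ildgFormat>"
)
```

Both records parse to the same dictionary for the old reader, whereas a fixed-position ASCII header would need an explicit version check to achieve the same effect.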

Current Status
- ILDG board mandated a solution to file formats to be found prior to this workshop (missed goal)
- There is a wide range of opinions on best path forward (XML, NERSC format, pure binary)
- There may be a current movement towards accepting XML and LIME

Proposal
1. Current ad-hoc committee to work out implications of adopting LIME (January 2005)
   - Standardize ILDG record XML schema
   - Produce doc, simple test codes to show usage
2. Compare to virtual file format, and to pure binary and NERSC-like (pros and cons), and select a path forward (January 2005)
3. Refine selected approach, reaching version 1.0 by the end of February 2005
   - Documentation of schema, code
   - C library (if appropriate), test codes available for download