NetCDF An Effective Way to Store and Retrieve Scientific Datasets Jianwei Li 02/11/2002.

Motivation
Related scientific datasets in a single file
  – Define a file format
  – Present the multi-dimensional arrays in a self-describing way
Common data access through a simple interface
  – Typical ways of accessing multi-dimensional arrays
  – Platform-independent / network-transparent
Here comes netCDF!

Outline for today
– Overview of netCDF
– netCDF file format
– netCDF library and programming model
– Advantages & disadvantages
– Future plans for netCDF

What Is netCDF?
Self-describing. Portable. Direct-access. Appendable. Sharable.
NetCDF (network Common Data Form) is an interface for array-oriented data access. It defines a machine-independent file format for representing multi-dimensional arrays with ancillary data, and provides support for the creation, access, and sharing of array-oriented data. The netCDF software is implemented as I/O libraries for C, FORTRAN, C++, Perl, and other languages for which a netCDF library is available.

Why use netCDF?
– Facilitate the use of common datasets by distinct applications.
– Permit datasets to be transported between or shared by dissimilar computers transparently, i.e., without translation.
– Reduce the programming effort usually spent interpreting formats.
– Reduce errors arising from misinterpreting data and ancillary data.
– Facilitate using output from one application as input to another.
– Establish an interface standard that simplifies the inclusion of new software into existing systems.

Platforms that netCDF runs on
AIX, HPUX-9.05, IRIX, IRIX64, MSDOS (gcc, f2c, GNU make), Linux, OSF, OpenVMS, OS/2, SUNOS, ULTRIX, UNICOS, Windows NT.
These range from Cray supercomputers to personal computers and include most UNIX-based workstations.

netCDF dataset components
Dimensions: name, length
  – Fixed dimension
  – UNLIMITED dimension
Variables (named arrays): name, type, shape, attributes
  – Fixed-size variables: arrays of fixed dimensions
  – Record variables: arrays whose most-significant (first) dimension is UNLIMITED
  – Coordinate variables: 1-D arrays with the same name as their dimension
Attributes: name, type, values, length
  – Variable attributes
  – Global attributes

An Example netCDF File

netCDF File Format
Header
  – Number of records (current length of the UNLIMITED dimension)
  – Dimension list
  – Global attribute list
  – Variable list
Data (row-major)
  – Fixed-size data: the data for each variable is stored contiguously
  – Record data: a variable number of fixed-size records, each of which contains one record for each of the record variables, in order
Both header and data are encoded in an extended form of XDR (External Data Representation).

Data access in a netCDF dataset
Forms of access:
  – access to all elements (the whole array);
  – access to individual elements, specified with an index vector;
  – access to array sections, specified with an index vector and a count vector;
  – access to subsampled array sections, specified with an index vector, count vector, and stride vector;
  – access to mapped array sections, specified with an index vector, count vector, stride vector, and an index mapping vector (which maps the indices of an element in the external subarray to its offset in the internal memory buffer).
Direct access! Efficient for reading small subsets out of large datasets.

Dataset Access Example
Start[] = {2, 1}, Count[] = {3, 2}, Stride[] = {3, 2}, Imap[] = {1, 3}

netCDF Library & Programming Model
Modes: define mode, data mode
IDs: dataset ID, dimension ID, variable ID, attribute number
Typical usage sequences:
  – Create & write
  – Read by name
  – Read sequentially
  – Add dimensions, variables, attributes

Example library functions

netCDF data types and type conversion
– Different library functions for each data type
– Automatic conversion to or from the external type
– Conversion from one numeric type to another

netCDF type   C API mnemonic   Description
byte          NC_BYTE          8-bit signed or unsigned integers
char          NC_CHAR          8-bit characters for text data
short         NC_SHORT         16-bit signed integers
int           NC_INT           32-bit signed integers
float         NC_FLOAT         32-bit IEEE floating-point
double        NC_DOUBLE        64-bit IEEE floating-point

Other Data Types?
Data structures
  – No direct support
  – Some data structures can be built by grouping arrays, or by using the data in one array as pointers (indices) into another array. This is inefficient, and may not be self-describing!
Character strings
  – Not a primitive netCDF external data type
  – Character-string attributes: well represented by NC_CHAR values and their length
  – Character-string variables: use a character-position dimension as the most rapidly varying (last) dimension of the variable (dimension length = maximum string length)

Sharing Data Between Readers and a Writer
– Supports multiple readers and a single writer
– Create or open the dataset in NC_SHARE mode to use unbuffered I/O
– Or call nc_sync() to force changes to be written back to disk
– Header changes made in define mode are synchronized to disk only when nc_enddef() is called; a reader must then call nc_sync() to bring its in-memory copy of the header up to date

Advantages
– Simple and useful libraries for array-oriented data access
– The dataset is self-describing
– Portable and easy to share
– Flexible and efficient data access
– Mechanism to support extendable arrays

Disadvantages
– No support for multiple concurrent writers
– Limited number of external numeric data types: 8-, 16-, and 32-bit integers, and 32- or 64-bit floating-point numbers
– No support for nested data structures such as trees, nested arrays, or other recursive structures
– Only one unlimited dimension; multiple variables sharing the unlimited dimension must all grow together
– Redefinition (structure changes) that requires more file space is accomplished by copying the dataset

Future plans for netCDF
– Parallel write support
– Support for a bit data type and transparent data packing for bit variables
– Support for efficient structure changes
– Support for nested arrays and other data structures