Data Standards Workflow
Raw data → Scripts → Database
Extract: store raw data in Subversion to keep track of its history (OpenEarth RawData)
Transform: add meta information; script to convert raw data into netCDF (OpenEarth Tools)
Load: stored files (netCDF) accessible through the web (OpenEarth OPeNDAP)
Provide: charts & maps, tools and websites
Transform
- Add metadata
- Store in netCDF
- Save script in Subversion
Add metadata
Use the INSPIRE metadata form to store information about the dataset. Click "Launch editor".
Add metadata: validation
Turn validation on.
Add metadata: file identification
Enter the location of the data in Subversion (micore).
Add metadata: quality
Describe the history of your data.
Add metadata: constraints
Please fill in the limitations of use.
Add metadata: save metadata file
Store it in course/Pcnumber/inspire_description.xml
1. Save the metadata file (local)
2. Add it to Subversion (local)
3. Commit => metadata into Subversion (remote)
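Steps 2 and 3 correspond to an `svn add` followed by an `svn commit`. A minimal MATLAB sketch, assuming the `svn` command-line client is on the system path and MATLAB's current folder is inside your working copy; the commit message is a placeholder, and `Pcnumber` must be replaced by your own folder as on the slide. With `dryrun = true` it only prints the commands.

```matlab
% Add the saved metadata file to Subversion and commit it (sketch).
% Assumes the svn command-line client is installed and that the current
% folder is inside the OpenEarth working copy.
metadatafile = fullfile('course', 'Pcnumber', 'inspire_description.xml');
message = 'Add INSPIRE metadata';   % placeholder commit message
dryrun = true;                      % set to false to actually run svn

commands = {sprintf('svn add "%s"', metadatafile), ...
            sprintf('svn commit -m "%s" "%s"', message, metadatafile)};

for k = 1:numel(commands)
    if dryrun
        fprintf('%s\n', commands{k});   % only show what would be run
    else
        [status, output] = system(commands{k});
        if status ~= 0
            error('svn failed: %s', output);
        end
    end
end
```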
Store in netCDF
- What's netCDF?
- Write a script to transform data into netCDF
- Use the CF convention
What is netCDF?
A data format defined by Unidata, used to store coverage data and other multidimensional data. Metadata are described with the CF metadata convention.
What is netCDF?
An array-based data structure for storing multidimensional data:
- N-dimensional coordinate systems: X coordinate (e.g. longitude), Y coordinate (e.g. latitude), Z coordinate (e.g. altitude), time dimension, … other dimensions
- Variables: support for multiple variables (temperature, humidity, pressure, salinity, etc.)
- Geometry, implicit or explicit: regular grid (implicit), irregular grid, points
Storing multidimensional data
As a table of (X, Y, Z, Q) records: 32 numbers.
As coordinate axes X, Y, Z plus an array of Q: 14 numbers.
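The two counts are consistent with a 2 × 2 × 2 grid (an assumption, since the original figure is not available): as a table every one of the 8 points needs its own X, Y, Z and Q value, while as an array only the three coordinate axes and the 8 Q values are stored. A small MATLAB check of that arithmetic:

```matlab
% Numbers needed to store one quantity Q on an nx-by-ny-by-nz grid.
nx = 2; ny = 2; nz = 2;              % assumed grid size
npoints = nx * ny * nz;

% Table layout: one (X, Y, Z, Q) record per grid point.
table_count = 4 * npoints;

% Array layout: three coordinate vectors plus the Q array.
array_count = nx + ny + nz + npoints;

fprintf('table: %d numbers, array: %d numbers\n', table_count, array_count);
```

The gap grows with the grid: for a 100 × 100 × 10 grid the table layout needs 400,000 numbers and the array layout 100,210.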
Data model
A common data model for netCDF and other formats; it is also usable for HDF, OPeNDAP, GRIB, etc. See the netCDF Java library for details.
ArcGIS
ArcGIS also reads and writes netCDF files.
Your favorite text editor
An XML representation of a netCDF file (NcML) can be read and edited in any text editor.
Other tools
NCO (netCDF Operators):
# diff
ncdiff -v time file1.nc file2.nc
# compression & packing
ncpdq -4 -L 9 in.nc out.nc    # deflated packing (~80% lossy compression)
# selecting variables by regex
ncks -v '^Q..' in.nc          # Q01--Q99, QAA--QZZ, etc.
IDV (Integrated Data Viewer): very useful. Web hyperslabs, cool! Not so stable.
Write script
Read raw data:
- Read header line
- Read data
- Read all data
- Create a function to read all data
- Use the function in MATLAB
Raw data into an empty netCDF file:
- Create an empty netCDF file
- Add dimensions and variables
- Store variables
- Read values
Reading raw data into memory
Use a MATLAB function such as fscanf to read the file data into an array.
Example: transect.txt file
Header line: year, number of points
Points: X Z
        X Z
        …
Location: OpenEarthRawData\course\example\raw
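If the course file is not at hand, a small file in the same layout can be generated for testing. A minimal sketch; the file name `transect_sample.txt` and all values are made up and are not the course data. It writes three years with four points each: one header line, then one X Z pair per line.

```matlab
% Write a small test file in the transect.txt layout:
%   header line: year number_of_points
%   then one "X Z" pair per line
samplefile = fullfile(tempdir, 'transect_sample.txt');
years = [1999 2000 2001];
x = [-135 -130 -125 -120];           % made-up coastward distances
z = [353 350 340 300; ...            % made-up heights, one row per year
     354 348 338 298; ...
     350 345 336 295];

fid = fopen(samplefile, 'w');
for i = 1:numel(years)
    fprintf(fid, '%d %d\n', years(i), numel(x));
    % [x; z(i,:)] is 2-by-n; fprintf consumes it column by column,
    % which gives the pairs X1 Z1, X2 Z2, ...
    fprintf(fid, '%d %d\n', [x; z(i, :)]);
end
fclose(fid);
```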
Read header line
>> fid = fopen('..\raw\transect.txt')
fid =
    15
>> header = fscanf(fid, '%d', 2)
header =
        2000
          58
>> year = header(1)
year =
        2000
>> npoint = header(2)
npoint =
    58
Read data
% read header
header = fscanf(fid, '%d', 2);
year = header(1);
npoint = header(2);
% read data: npoint pairs of X Z
data = fscanf(fid, '%d', npoint*2);
data = reshape(data, [2, npoint]);
% use column vectors: one row per point, columns X and Z
data = data';
Read all data
% preallocate all data (time, coastward)
transectseries = NaN(3, 58);
coastward_distance = NaN(58, 1);
time = NaN(3, 1);
% open file and get file id
fid = fopen('..\raw\transect.txt');
i = 1;
while ~feof(fid)
    % read header
    header = fscanf(fid, '%d', 2);
    if numel(header) < 2
        break;   % no complete header left (e.g. trailing blank line)
    end
    year = header(1);
    % store year in time
    time(i) = year;
    npoint = header(2);
    % read data
    data = fscanf(fid, '%d', npoint*2);
    data = reshape(data, [2, npoint]);
    % use column vectors
    data = data';
    % store data in transect series
    transectseries(i,:) = data(:,2);
    coastward_distance(:) = data(:,1);
    % skip the rest of the current line
    fgetl(fid);
    i = i + 1;
end
fclose(fid);
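The loop above hard-codes 3 years and 58 points. A variant that stops when no further header can be read and takes the sizes from the file itself; a sketch that assumes every year has the same points as the first one, as in the course file. It writes its own small test file (`transect_test.txt`, made-up values) so it runs on its own.

```matlab
% Create a small test file (made-up values) in the transect.txt layout.
filename = fullfile(tempdir, 'transect_test.txt');
fid = fopen(filename, 'w');
fprintf(fid, '1999 3\n-135 353\n-130 354\n-125 350\n');
fprintf(fid, '2000 3\n-135 352\n-130 351\n-125 349\n');
fclose(fid);

% Read all years without knowing their number in advance.
fid = fopen(filename, 'r');
if fid < 0
    error('Cannot open %s', filename);
end
time = [];
transectseries = [];
coastward_distance = [];
while true
    header = fscanf(fid, '%d', 2);
    if numel(header) < 2
        break;                                 % no more records
    end
    npoint = header(2);
    % size [2, npoint] fills one X Z pair per column; transpose to npoint-by-2
    data = fscanf(fid, '%d', [2, npoint])';
    time(end+1, 1) = header(1);                % grow one row per year
    transectseries(end+1, :) = data(:, 2)';
    coastward_distance = data(:, 1);
end
fclose(fid);
```

Growing the arrays is slower than preallocating, but for a handful of years the difference is negligible and the script no longer depends on the file size.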
Create a function
function transect = readtransect(filename)
% READTRANSECT read all years of a transect file into a struct
% preallocate all data (time, coastward)
transectseries = NaN(3, 58);
coastward_distance = NaN(58, 1);
time = NaN(3, 1);
% open file and get file id
fid = fopen(filename);
if fid < 0
    error('readtransect:open', 'Cannot open %s', filename);
end
i = 1;
while ~feof(fid)
    % read header
    header = fscanf(fid, '%d', 2);
    if numel(header) < 2
        break;   % no complete header left
    end
    year = header(1);
    % store year in time
    time(i) = year;
    npoint = header(2);
    % read data
    data = fscanf(fid, '%d', npoint*2);
    data = reshape(data, [2, npoint]);
    % use column vectors
    data = data';
    % store data in transect series
    transectseries(i,:) = data(:,2);
    coastward_distance(:) = data(:,1);
    % skip the rest of the current line
    fgetl(fid);
    i = i + 1;
end
fclose(fid);
transect = struct('series', transectseries, ...
    'distance', coastward_distance, 'time', time);
end
Use the new function
>> data = readtransect('..\raw\transect.txt')
data =
      series: [3x58 double]
    distance: [58x1 double]
        time: [3x1 double]
Loading data into netCDF
- What does a netCDF file look like?
- Required meta information
netCDF file transect.nc
netcdf transect {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward) ;
        coastward_distance:unit = "metre" ;
    float year(time) ;
        year:unit = "year" ;
    float height(time, coastward) ;
        height:unit = "metre" ;
data:
 coastward_distance = -135, -130, …, 150, 160, 170, 180, 190, 200, 210, 220 ;
 year = 1999, 2000, 2001 ;
 height = 353, 354, … -142, -146, -170, -206, -232, -273, -309, -346, -375, -388, … -32, … -92, -110, -127, -143, -156, -177, -211, -259, -303, -334 ;
}
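The slides build this file step by step with the snctools functions (nc_create_empty, nc_add_dimension, nc_addvar, nc_varput). For comparison, a sketch of the same structure with MATLAB's built-in high-level netCDF functions (`nccreate`, `ncwrite`, `ncwriteatt`, `ncdisp`); the output file name and the data are placeholders, not the course values. MATLAB lists dimensions in the reverse (Fortran) order of ncdump, so `height` is created with `{'coastward', 58, 'time', 3}` to end up as `height(time, coastward)` in the file.

```matlab
% Build a file with the structure of transect.nc using built-in functions.
outputfile = fullfile(tempdir, 'transect_builtin.nc');
if exist(outputfile, 'file')
    delete(outputfile);          % nccreate cannot redefine existing variables
end

ncoast = 58;
ntime = 3;
distance = linspace(-135, 220, ncoast)';   % placeholder coordinates
years = [1999; 2000; 2001];
height = rand(ntime, ncoast);              % placeholder data, (time, coastward)

nccreate(outputfile, 'coastward_distance', ...
    'Dimensions', {'coastward', ncoast}, 'Datatype', 'single');
nccreate(outputfile, 'year', ...
    'Dimensions', {'time', ntime}, 'Datatype', 'single');
% MATLAB order {coastward, time} is stored as height(time, coastward)
nccreate(outputfile, 'height', ...
    'Dimensions', {'coastward', ncoast, 'time', ntime}, 'Datatype', 'single');

ncwriteatt(outputfile, 'coastward_distance', 'unit', 'metre');
ncwriteatt(outputfile, 'year', 'unit', 'year');
ncwriteatt(outputfile, 'height', 'unit', 'metre');

% Write data as single to match the declared 'float' variables.
ncwrite(outputfile, 'coastward_distance', single(distance));
ncwrite(outputfile, 'year', single(years));
ncwrite(outputfile, 'height', single(height'));   % transpose to MATLAB order
ncdisp(outputfile);
```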
Create an empty netCDF file
>> outputfile = 'transect.nc';
>> nc_create_empty(outputfile)
>> nc_dump(outputfile)
netcdf transect.nc {
dimensions:
variables:
}
Add dimensions
nc_add_dimension(outputfile, 'coastward', 58)
nc_add_dimension(outputfile, 'time', 3)
nc_dump(outputfile)
netcdf transect.nc {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
}
See also: help nc_add_dimension
Add variables
coastwardVariable = struct(...
    'Name', 'coastward_distance',...
    'Nctype', 'float',...
    'Dimension', {{'coastward'}},...
    'Attribute', struct('Name', 'unit', 'Value', 'metre')...
    );
nc_addvar(outputfile, coastwardVariable);
timeVariable = struct(...
    'Name', 'year',...
    'Nctype', 'float',...
    'Dimension', {{'time'}},...
    'Attribute', struct('Name', 'unit', 'Value', 'year')...
    );
nc_addvar(outputfile, timeVariable);
heightVariable = struct(...
    'Name', 'height',...
    'Nctype', 'float',...
    'Dimension', {{'time', 'coastward'}},...
    'Attribute', struct('Name', 'unit', 'Value', 'metre')...
    );
nc_addvar(outputfile, heightVariable);
nc_dump(outputfile)
See also: help nc_addvar
Result
netcdf transect.nc {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward), shape = [58]
        coastward_distance:unit = "metre"
    float year(time), shape = [3]
        year:unit = "year"
    float height(time,coastward), shape = [3 58]
        height:unit = "metre"
}
Store variables
nc_varput(outputfile, 'height', data.series)
nc_varput(outputfile, 'year', data.time)
nc_varput(outputfile, 'coastward_distance', data.distance)
See also: help nc_varput
Result: netCDF file transect.nc
netcdf transect {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward) ;
        coastward_distance:unit = "metre" ;
    float year(time) ;
        year:unit = "year" ;
    float height(time, coastward) ;
        height:unit = "metre" ;
data:
 coastward_distance = -135, -130, …, 150, 160, 170, 180, 190, 200, 210, 220 ;
 year = 1999, 2000, 2001 ;
 height = 353, 354, … -142, -146, -170, -206, -232, -273, -309, -346, -375, -388, … -32, … -92, -110, -127, -143, -156, -177, -211, -259, -303, -334 ;
}
Read values
surface(nc_varget(outputfile, 'height')')
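`surface` as used above plots against array indices. A sketch that also reads the coordinate variables so the axes show distance and year; it uses MATLAB's built-in `ncread` instead of snctools and first writes a tiny placeholder file (`transect_plot.nc`, made-up values) so it runs on its own. Note that `ncread` returns `height` in MATLAB order, coastward by time.

```matlab
% Tiny placeholder file so the example runs on its own (made-up values).
ncfile = fullfile(tempdir, 'transect_plot.nc');
if exist(ncfile, 'file'), delete(ncfile); end
nccreate(ncfile, 'coastward_distance', 'Dimensions', {'coastward', 4});
nccreate(ncfile, 'year', 'Dimensions', {'time', 3});
nccreate(ncfile, 'height', 'Dimensions', {'coastward', 4, 'time', 3});
ncwrite(ncfile, 'coastward_distance', [-135; -130; -125; -120]);
ncwrite(ncfile, 'year', [1999; 2000; 2001]);
ncwrite(ncfile, 'height', [353 354 350; 340 338 336; 300 298 295; 250 247 244]);

% Read back: height comes out as coastward-by-time (MATLAB order).
distance = ncread(ncfile, 'coastward_distance');
years    = ncread(ncfile, 'year');
height   = ncread(ncfile, 'height');

% surf(X, Y, Z): X runs along the columns of Z (years),
% Y along the rows of Z (coastward distance).
surf(years, distance, height);
xlabel('year');
ylabel('coastward distance [metre]');
zlabel('height [metre]');
```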
CF convention
Climate and Forecast (CF) metadata convention.
- Initially developed for climate and forecast data: atmosphere, surface and ocean model-generated data
- Also used for observational datasets
- Used by USGS, NOAA, ArcGIS and GDAL
CF is the most widely used convention for geospatial netCDF data.
Improve output
Store extra attributes:
- title
- author
- standard_name
Note that CF specifies the attribute name units (plural), not unit.
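A sketch of these extra attributes with MATLAB's built-in `ncwriteatt`, where location `'/'` writes a global attribute. The file name and attribute values are placeholders. `author` is not one of the global attributes CF itself defines (those include title, institution, source, history, references and comment), but extra attributes are allowed. The `Conventions` version string and the `standard_name` value `surface_altitude` for the transect height are assumptions; check them against the CF documents and the CF standard name table before use.

```matlab
% Tiny placeholder file so the example runs on its own.
ncfile = fullfile(tempdir, 'transect_cf.nc');
if exist(ncfile, 'file'), delete(ncfile); end
nccreate(ncfile, 'height', 'Dimensions', {'coastward', 4, 'time', 3});

% Global attributes: location '/' is the root of the file.
ncwriteatt(ncfile, '/', 'title', 'Cross-shore transect heights');  % placeholder
ncwriteatt(ncfile, '/', 'author', 'Your Name');                    % placeholder
ncwriteatt(ncfile, '/', 'Conventions', 'CF-1.4');                  % assumed version

% Variable attributes: CF uses 'units' (plural) and 'standard_name'.
ncwriteatt(ncfile, 'height', 'units', 'metre');
ncwriteatt(ncfile, 'height', 'standard_name', 'surface_altitude'); % assumed choice
ncwriteatt(ncfile, 'height', 'long_name', 'height along the transect');

ncdisp(ncfile);
```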
Save script
1. Save the script (local, using MATLAB)
2. Add it to Subversion (local)
3. Commit => script into Subversion (remote)