SCIAMACHY features and usage with respect to end-users
The typical fate of retrieval people dealing with large datasets…
C. Frankenberg, SRON team, IUP Heidelberg team

ADAGUC meeting, KNMI, De Bilt, 03/04 October
SCIAMACHY on ENVISAT, a brief introduction

SCIAMACHY, a grating spectrometer (© IUP Bremen)

SCIAMACHY nadir mode, global coverage every 6 days

SCIAMACHY data viewer (1 orbit = 300 MB)

Scientific question in my case: retrieval of CH4 and CO2
  Spectra → vertical column densities of CO2 and CH4 → xVMR(CH4)

CH4 VMR, August through November 2003
  Frankenberg et al., Assessing methane emissions from global space-borne observations, Science 2005

Issues related to ADAGUC
  SCIAMACHY data access: 5 GB/day, direct download from the Netherlands SCIAMACHY Data Center
  Data access: binary PDS file
    No library available at that time
    Official reading tool not useful for nearly operational retrievals
    Own C/C++ access routine was written
    Complex code structure: retrieval and data access are difficult to separate
  Too instrument-specific to be of general interest in ADAGUC

Issues related to ADAGUC
  General procedure:
  1) Level 1 PDS file: each geographic entity (usually a 60 × 120 km rectangle) comprises spectra and numerous auxiliary datasets
  2) Retrieval via own C++ code, results stored in a so-called level 2 file
  3) Level 2 file (own format, so far ASCII): each geographic entity comprises e.g. the CH4 total column and additional parameters such as cloud cover, albedo, fit error, etc.
  4) Generating gridded plots from the level 2 files depending on filter criteria (e.g. CloudTopHeight < 1 km, fitError < 2%)
  5) Comparing data (raw and gridded) with other datasets (e.g. model output, retrievals of other groups, other satellite sensors)
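
Steps 3 and 4 of the procedure (filter the level 2 records, then average them per lat/lon box) could look roughly like the sketch below. It is only an illustration, not the C++/IDL code used for the talk: the record field names (`lat`, `lon`, `ch4`, `cloud_top_km`, `fit_error`), the 1° box size, and the function name `grid_mean` are all assumptions; only the two filter thresholds come from the slide.

```python
from collections import defaultdict


def grid_mean(records, cell_deg=1.0, max_cloud_top_km=1.0, max_fit_error=0.02):
    """Average CH4 per lat/lon box over the records that pass both filters.

    `records` is an iterable of dicts with the hypothetical keys
    lat, lon, ch4, cloud_top_km and fit_error (fit error as a fraction).
    Each record is assigned to the box containing its center point.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        # Filter criteria from the slide: CloudTopHeight < 1 km, fitError < 2%.
        if rec["cloud_top_km"] >= max_cloud_top_km:
            continue
        if rec["fit_error"] >= max_fit_error:
            continue
        # Floor division yields the box index, also for negative coordinates.
        key = (int(rec["lat"] // cell_deg), int(rec["lon"] // cell_deg))
        sums[key] += rec["ch4"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up sample records, for illustration only.
sample_records = [
    {"lat": 50.2, "lon": 5.5, "ch4": 1700.0, "cloud_top_km": 0.5, "fit_error": 0.010},
    {"lat": 50.8, "lon": 5.1, "ch4": 1800.0, "cloud_top_km": 0.2, "fit_error": 0.015},
    {"lat": 50.5, "lon": 5.5, "ch4": 9999.0, "cloud_top_km": 2.0, "fit_error": 0.010},
    {"lat": -0.5, "lon": -0.5, "ch4": 1750.0, "cloud_top_km": 0.0, "fit_error": 0.000},
]
gridded = grid_mean(sample_records)
```

Changing the filter criteria only means calling `grid_mean` again with different thresholds, which mirrors the filter, plot, re-filter loop of the procedure.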

What is of general interest?
  Points 3-5:
  3) Output file generation (file format, no standards!)
  4) Gridding and plotting data based on predefined selection criteria
  5) Comparing datasets

Output file generation
  Why ASCII?
    Human readable
    Easiest exchange between different groups (preferred format for the comparison between SRON, IUP Bremen, IUP Heidelberg)
    Variety of Linux tools available for processing, most notably awk
  Drawbacks…
    Slow access, big files, files not self-describing
  Why didn’t I use HDF/netCDF/GIS format?
    Lazy (additional work, new skills necessary)
    Awk tools not available

Gridding, projections, plotting
  What did I use?
    Admittedly very simple methods: lat/lon box gridding with own routines, IDL plotting/projection routines
  What would be nice?
    Better gridding options (e.g. weighting by the overlapping area)
    Data conversion tools for easier access to tools such as GMT (Generic Mapping Tools)
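
As an illustration of what "weighting by the overlapping area" could mean, here is a minimal sketch. It makes strong simplifying assumptions that are not from the talk: each ground pixel is treated as a rectangle aligned with the lat/lon axes, and overlap is measured in square degrees (no cos(latitude) correction). Real SCIAMACHY footprints are general quadrilaterals and would need a proper polygon intersection. All names are hypothetical.

```python
import math
from collections import defaultdict


def overlap_1d(a_min, a_max, b_min, b_max):
    """Length of the overlap of the intervals [a_min, a_max] and [b_min, b_max]."""
    return max(0.0, min(a_max, b_max) - max(a_min, b_min))


def grid_area_weighted(pixels, cell_deg=1.0):
    """Area-weighted mean per grid cell.

    `pixels` is an iterable of (lat_min, lat_max, lon_min, lon_max, value).
    Each pixel contributes to every cell it overlaps, with a weight equal
    to the overlap area in square degrees.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for lat_min, lat_max, lon_min, lon_max, value in pixels:
        # Range of cell indices the pixel can touch.
        i_lo, i_hi = math.floor(lat_min / cell_deg), math.ceil(lat_max / cell_deg)
        j_lo, j_hi = math.floor(lon_min / cell_deg), math.ceil(lon_max / cell_deg)
        for i in range(i_lo, i_hi):
            for j in range(j_lo, j_hi):
                dlat = overlap_1d(lat_min, lat_max, i * cell_deg, (i + 1) * cell_deg)
                dlon = overlap_1d(lon_min, lon_max, j * cell_deg, (j + 1) * cell_deg)
                weight = dlat * dlon
                if weight > 0.0:
                    weighted_sum[(i, j)] += weight * value
                    weight_total[(i, j)] += weight
    return {key: weighted_sum[key] / weight_total[key] for key in weighted_sum}


# Made-up pixels: the first one straddles two 1-degree cells in longitude.
sample_pixels = [
    (0.0, 1.0, 0.5, 1.5, 100.0),
    (0.0, 1.0, 0.0, 0.5, 200.0),
]
weighted_grid = grid_area_weighted(sample_pixels)
```

Compared with assigning each pixel to the single box containing its center, a pixel that straddles a box boundary now contributes to both boxes in proportion to the covered area.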

Comparing datasets
  A headache
    Even within SCIA: different pixel sizes; comparing different species needs averaging to the lowest resolution. How to do the averaging?
    Processing a lot of files is slow due to the ASCII format
  Data exchange
    In my case only within the atmospheric community, so no direct problems, as people were experienced with the formats; ASCII no problem anyway (but slow and large)
    What is needed for the GIS community: level 2 and/or level 3 (gridded) data?

What I find ideal…
  Results stored in a relational database management system (RDBMS), with routines for extracting subsets to HDF, netCDF, ASCII
  Why? Database systems are meant for large datasets and complex queries to derive subsets
  Simple example in SQL:
    select avg(CH4) from results where latitude>50 and latitude […] 0.2 and cloudCover<0.05
  FAST due to indexing (tested with a test database with 5 million entries, one query takes no time)!
  Selection criteria easy (no awk necessary)
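
To make the RDBMS idea concrete, the following sketch runs a query of the same shape against an in-memory SQLite database via Python's standard `sqlite3` module. SQLite is a stand-in chosen so the example is self-contained; the talk does not say which database was tested. The table layout, the index, the sample rows, and the upper latitude bound of 60 are assumptions, and the part of the slide's WHERE clause that is not legible is deliberately left out.

```python
import sqlite3

# In-memory database with a hypothetical "results" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results ("
    "latitude REAL, longitude REAL, CH4 REAL, cloudCover REAL)"
)
# An index on latitude lets the database avoid a full table scan
# for latitude-range selections.
conn.execute("CREATE INDEX idx_results_latitude ON results (latitude)")

# Made-up rows, for illustration only.
rows = [
    (52.0, 5.0, 1750.0, 0.01),
    (55.0, 8.0, 1770.0, 0.03),
    (57.0, 9.0, 1900.0, 0.50),   # too cloudy, excluded by the filter
    (10.0, 20.0, 1700.0, 0.00),  # outside the latitude band
]
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", rows)


def mean_ch4(connection, lat_min, lat_max, max_cloud_cover):
    """Average CH4 over a latitude band for nearly cloud-free scenes."""
    cursor = connection.execute(
        "SELECT avg(CH4) FROM results "
        "WHERE latitude > ? AND latitude < ? AND cloudCover < ?",
        (lat_min, lat_max, max_cloud_cover),
    )
    return cursor.fetchone()[0]


band_mean = mean_ch4(conn, 50.0, 60.0, 0.05)
```

The selection criteria live entirely in the WHERE clause, so changing a filter means changing one query parameter rather than rewriting an awk pipeline over many ASCII files.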

Even better: Spatial SQL
  Spatial SQL: spatial extension of the database systems (e.g. points, polygons, etc.)
  Example syntax (Postgres):
    SELECT ch4_total_column FROM results WHERE distance( center_point, GeomFromText( 'POINT( )', -1 ) ) < 100
  Dumpers to e.g. “shape files” available:
    pgsql2shp [ ]
  Direct connection to data viewers such as QGIS possible
  Web interface to the interactive plotting tool MapServer
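
For comparison, this is roughly what a "all results within some distance of a point" selection involves when done by hand, without a spatial database. The sketch is not what the Postgres query above computes internally: it assumes center points given as latitude/longitude in degrees, a radius in kilometres, and a spherical Earth with great-circle (haversine) distances. The slide states neither the units nor the reference point, so those are assumptions, as are all names and sample values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(records, lat0, lon0, radius_km):
    """Return the records whose center point lies closer than radius_km."""
    return [
        rec for rec in records
        if haversine_km(rec["lat"], rec["lon"], lat0, lon0) < radius_km
    ]


# Made-up records with hypothetical field names.
sample_points = [
    {"lat": 52.1, "lon": 5.2, "ch4_total_column": 1.0},
    {"lat": 52.6, "lon": 5.2, "ch4_total_column": 2.0},
    {"lat": 54.0, "lon": 5.2, "ch4_total_column": 3.0},
]
nearby = within_radius(sample_points, 52.1, 5.2, 100.0)
```

This loop scans every record on each call, whereas a spatial database can answer the same question from a spatial index and hand the result directly to tools such as QGIS.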

What takes most of the time?
  SCIA data format: esp. level 2 files for validation are far too complex and frustrate people
  Data filtering → plotting → interpreting → change filters, and so forth
  An interactive data viewer would be great (such as in GIS: click on the point and you get additional information)

Lots of time for discussion
  Website for spatial RDBMS: