BARRODALE COMPUTING SERVICES LTD. www.barrodale.com Managing and serving large volumes of gridded spatial environmental data Adit Santokhee, Chunlei Liu,

Slides:



Advertisements
Similar presentations
Netlobs Manipulating Gridded Data in a Relational World Neil Stamps Technical Architect.
Advertisements

BlogMyData A Virtual Research Environment for collaborative visualization of environmental data Andrew Milsted, Jeremy Frey Department of Chemistry, University.
® OGC Web Services Initiative, Phase 9 (OWS-9): Innovations Thread - OPeNDAP James Gallagher and Nathan Potter, OPeNDAP © 2012 Open Geospatial Consortium.
GI Systems and Science January 30, Points to Cover  Recap of what we covered so far  A concept of database Database Management System (DBMS) 
T HE W EB - BASED I NTERFACE TO C ENSUS I NTERACTION D ATA - WICID Presentation to the ESRC Research Methods Festival Adam Dennett Centre for Interaction.
ArcGIS Geodatabase Miles Logsdon Spatial Information Technologies, UW Garry Trudeau - Doonesbury.
Raster Data in ArcSDE 8.2 Why Put Images in a Database? What are Basic Raster Concepts? How Raster data stored in Database?
Dynamic Quick View, interoperability and the future Jon Blower, Keith Haines, Chunlei Liu, Alastair Gemmell Environmental Systems Science Centre University.
Exploring large marine datasets using an interactive website and Google Earth Jon Blower, Dan Bretherton, Keith Haines, Chunlei Liu, Adit Santokhee Reading.
The NERC Cluster Grid Dan Bretherton, Jon Blower and Keith Haines Reading e-Science Centre Environmental Systems Science Centre.
Printed by STORING AND MANIPULATING GRIDDED DATA IN SPATIALLY-ENABLED DATABASES Adit Santokhee, Jon Blower, Keith Haines Reading.
The International Surface Pressure Databank (ISPD) and Twentieth Century Reanalysis at NCAR Thomas Cram - NCAR, Boulder, CO Gilbert Compo & Chesley McColl.
Активное распределенное хранилище для многомерных массивов Дмитрий Медведев ИКИ РАН.
TPAC Digital Library Talk Overview Presenter:Glenn Hyland Tasmanian Partnership for Advanced Computing & Australian Antarctic Division Outline: TPAC Overview.
Unidata TDS Workshop THREDDS Data Server Overview October 2014.
The use of standard OGC web services in integrating distributed model, satellite and in-situ datasets Alastair Gemmell Jon Blower Keith Haines Environmental.
EGU 2011 TIGGE, TIGGE LAM and the GIFS T. Paccagnella (1), D. Richardson (2), D. Schuster(3), R. Swinbank (4), Z. Toth (3), S.
The importance of locality in the visualization of large datasets John Brooke 1, James Marsh 1, Steve Pettipher 1, Lakshmi Sastry 2 1 The University of.
© University of Reading 2008www.reading.ac.uk Reading e-Science Centre September 10, 2015 Integrating a Web Map Service into the THREDDS Data Server Jon.
GADS: A Web Service for accessing large environmental data sets Jon Blower, Keith Haines, Adit Santokhee Reading e-Science Centre University of Reading.
DISTRIBUTED DATA FLOW WEB-SERVICES FOR ACCESSING AND PROCESSING OF BIG DATA SETS IN EARTH SCIENCES A.A. Poyda 1, M.N. Zhizhin 1, D.P. Medvedev 2, D.Y.
Running Climate Models On The NERC Cluster Grid Using G-Rex Dan Bretherton, Jon Blower and Keith Haines Reading e-Science Centre Environmental.
Unidata’s TDS Workshop TDS Overview – Part II October 2012.
Rendering Adaptive Resolution Data Models Daniel Bolan Abstract For the past several years, a model for large datasets has been developed and extended.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
Unidata TDS Workshop TDS Overview – Part I XX-XX October 2014.
Spatiotemporal Tile Indexing Scheme Oscar Pérez Cruz Polytechnic University of Puerto Rico Mentor: Dr. Ranga Raju Vatsavai Computational Sciences and Engineering.
1 The NERC DataGrid DataGrid The NERC DataGrid DataGrid AHM 2003 – 2 Sept, 2003 e-Science Centre Metadata of the NERC DataGrid Kevin O’Neill CCLRC e-Science.
1 Foundations V: Infrastructure and Architecture, Middleware Deborah McGuinness TA Weijing Chen Semantic eScience Week 10, November 7, 2011.
Animation of DSM2 Outputs in ArcMap Siqing Liu Bay Delta Office Department of Water Resources 2/17/2015.
Slide 1 TIGGE phase1: Experience with exchanging large amount of NWP data in near real-time Baudouin Raoult Data and Services Section ECMWF.
DELIVERING ENVIRONMENTAL WEB SERVICES (DEWS) Partners: UK Met Office (Lead Partner), British Atmospheric Data Centre (BADC), British Maritime Technology.
Accomplishments and Remaining Challenges: THREDDS Data Server and Common Data Model Ethan Davis Unidata Policy Committee Meeting May 2011.
ILDG Middleware Status Chip Watson ILDG-6 Workshop May 12, 2005.
Integrated Grid workflow for mesoscale weather modeling and visualization Zhizhin, M., A. Polyakov, D. Medvedev, A. Poyda, S. Berezin Space Research Institute.
BARRODALE COMPUTING SERVICES LTD. Spatial Data Activities at the Reading e-Science Centre Adit Santokhee, Jon Blower, Keith Haines Reading.
DAP4 James Gallagher & Ethan Davis OPeNDAP and Unidata.
Unidata TDS Workshop THREDDS Data Server Overview
, Key Components of a Successful Earth Science Subsetter Architecture ASDC Introduction The Atmospheric Science Data Center (ASDC) at NASA Langley Research.
Composing workflows in the environmental sciences using Web Services and Inferno Jon Blower, Adit Santokhee, Keith Haines Reading e-Science Centre Roger.
Requests from F. Blanc Retrieve information from each individual national reports and present this. –2. Data serving –The HYCOM products are freely available.
Data Discovery and Access to The International Surface Pressure Databank (ISPD) 1 Thomas Cram Gilbert P. Compo* Doug Schuster Chesley McColl* Steven Worley.
Sciamachy features and usage with respect to end-users The typical fate of retrieval people dealing with large datasets… C. Frankenberg, SRON team, IUP.
The CERA2 Data Base Data input – Data output Hans Luthardt Model & Data/MPI-M, Hamburg Services and Facilities of DKRZ and Model & Data Hamburg,
NetCDF file generated from ASDC CERES SSF Subsetter ATMOSPHERIC SCIENCE DATA CENTER Conversion of Archived HDF Satellite Level 2 Swath Data Products to.
GEON2 and OpenEarth Framework (OEF) Bradley Wallet School of Geology and Geophysics, University of Oklahoma
CCGrid, 2012 Supporting User Defined Subsetting and Aggregation over Parallel NetCDF Datasets Yu Su and Gagan Agrawal Department of Computer Science and.
John Pickford IBM H11 Wednesday, October 4, :30. – 14:30. Platform: Informix Practical Applications of IDS Extensibility (Part 2 of 2)
GIS in the cloud: implementing a Web Map Service on Google App Engine Jon Blower Reading e-Science Centre University of Reading United Kingdom
00/XXXX 1 Data Processing in PRISM Introduction. COCO (CDMS Overloaded for CF Objects) What is it. Why is COCO written in Python. Implementation Data Operations.
SCD Research Data Archives; Availability Through the CDP About 500 distinct datasets, 12 TB Diverse in type, size, and format Serving 900 different investigators.
Using Google Maps and other OpenSource GIS software for displaying geospatial data Jon Blower, Dan Bretherton, Keith Haines, Chunlei Liu, Adit Santokhee.
ERDDAP The Next Generation of Data Servers Bob Simons DOC / NOAA / NMFS / SWFSC / ERD Monterey, CA Disclaimer: The opinions expressed.
AHM04: Sep 2004 Nottingham CCLRC e-Science Centre eMinerals: Environment from the Molecular Level Managing simulation data Lisa Blanshard e- Science Data.
Recent Enhancements to Quality Assurance and Case Management within the Emissions Modeling Framework Alison Eyth, R. Partheepan, Q. He Carolina Environmental.
UC 2006 Tech Session 1 NetCDF in ArcGIS 9.2. UC 2006 Tech Session2 Overview Introduction to Multidimensional DataIntroduction to Multidimensional Data.
British Atmospheric Data Centre ( Searching: Whither NDG? Bryan Lawrence.
Grid Remote Execution of Large Climate Models (NERC Cluster Grid) Dan Bretherton, Jon Blower and Keith Haines Reading e-Science Centre
Data Discovery and Access to The International Surface Pressure Databank (ISPD) 1 Thomas Cram Gilbert P. Compo* Doug Schuster Chesley McColl* Steven Worley.
NcBrowse: A Graphical netCDF File Browser Donald Denbo NOAA-PMEL/UW-JISAO
Advances with the DDS David J. S. Poulter, British Oceanographic Data Centre, National Oceanography Centre, UK
1. Gridded Data Sub-setting Services through the RDA at NCAR Doug Schuster, Steve Worley, Bob Dattore, Dave Stepaniak.
Reading e-Science Centre Technical Director Jon Blower ESSC Director Rachel Harrison CS Director Keith Haines ESSC Associated Personnel External Collaborations.
DELIVERING ENVIRONMENTAL WEB SERVICES (DEWS)
Data Browsing/Mining/Metadata
Spatial Data Activities at the Reading e-Science Centre
MERRA Data Access and Services
TIGGE Archives and Access
Comparing NetCDF and a multidimensional array database on managing and querying large hydrologic datasets: a case study of SciDB– P5 Haicheng Liu.
ECMWF usage, governance and perspectives
Presentation transcript:

BARRODALE COMPUTING SERVICES LTD. Managing and serving large volumes of gridded spatial environmental data Adit Santokhee, Chunlei Liu, Jon Blower, Keith Haines Reading e-Science Centre University of Reading, United Kingdom Ian Barrodale, E. Davies Barrodale Computing Services Ltd., Victoria, Canada

BARRODALE COMPUTING SERVICES LTD. Background  At Reading we hold copies of various datasets (~2TB) –Mainly from models of oceans and atmosphere –Four-dimensional, spatio-temporal data –Also some observational data (e.g. satellite data) –From UK Met Office, National Oceanography Centre, European Centre for Medium Range Weather Forecasts (ECMWF), more  Most of these datasets are in the form of files –Variety of formats (e.g. NetCDF, GRIB, HDF5) –Large 4D spatio-temporal grids –Contain data about many variables (e.g. temperature, salinity etc) –Data are discretized on a number of different grids (e.g. standard lat-lon grids of different resolutions, rotated grids)

BARRODALE COMPUTING SERVICES LTD. Background (2)  We required a Web Service interface to this data store –Hence development of GADS (Grid Access Data Service) –Developed as part of GODIVA project (Grid for Ocean Diagnostics, Interactive Visualisation and Analysis NERC e-Science pilot project) –Originally developed by Woolf et al (2003)  GADS similar to OPeNDAP in principle, but uses SOAP messaging instead of URL-based queries –Hides more details of storage than OPeNDAP  Modern database systems now include capability for storing geospatial data –ReSC has been evaluating Informix with Grid DataBlade –Our data are spatio-temporal, i.e. four dimensional! (x, y, z, t) –A lot of geospatial systems are “two and a half”-dimensional.  We are investigating whether data can be managed and served more efficiently by being stored in DB

BARRODALE COMPUTING SERVICES LTD. GADS  User’s don’t need to know anything about storage details  Can expose data with conventional names without changing data files  Users can choose their preferred data format, irrespective of how data are stored  Behaves as aggregation server –Delivers single file, even if original data spanned several files  Deployed as a Web Service –Can be called from any platform/language –Can be called programmatically (easily incorporated into larger systems), workflows –Java / Apache Axis / Tomcat  Doesn’t support “advanced” features like interpolation, re-projection, etc  Will be re-engineered as WCS-compliant in near future

BARRODALE COMPUTING SERVICES LTD. GODIVA Web Portal ( Allows users to interactively select data for download using a GUI Users can create movies on the fly cf. Live Access Server

BARRODALE COMPUTING SERVICES LTD. The Grid DataBlade  Plug-in for the IBM Informix database (also version for PostgreSQL)  Written, supplied and supported by Barrodale Computing Services  Stores gridded data and metadata in an object-relational DBMS –Any data on a regular grid, not just met-ocean –Stores grids using a tiling scheme in conjunction with Smart BLOBS –Manages own (low-level) metadata automatically  Provides functions to load data directly from GIEF (Grid Import Export Format) file format (netCDF) –Metadata is automatically read from the GIEF file  Provides functions to extract data from the grids –Extractions can be sliced, subsetted or at oblique angles to the original axes

BARRODALE COMPUTING SERVICES LTD. Main features of the Grid DataBlade  Can store: –1D: timeseries, vectors –2D: raster images, arrays –3D: spatial volumes, images at different times –4D: volumes at different times –5D: 4D grids with a set of variables at each 4D point  Provides interpolation options using N-Linear, nearest- neighbour or user-supplied interpolation schemes  Provides C, Java and SQL APIs

BARRODALE COMPUTING SERVICES LTD. Example use of Grid DataBlade  extract data along a path e.g. along a ship track that involves many “legs”. The DataBlade automatically does interpolation along the path. Using North Atlantic FOAM Data

BARRODALE COMPUTING SERVICES LTD. Testing Methodology  A key criterion for evaluating a data server is the time required to extract a certain volume of data from an archive and re-package it as a new file, ready for download  We performed test extractions of data from the UK Met Office operational North Atlantic marine forecast dataset, which has a total size of 100 GB  The data are stored under GADS as a set of NetCDF files and another copy is held in the Informix database –There is one NetCDF file per time step, i.e. one per model day, in GADS –Data ordering is (time, depth, latitude, longitude)  We tested many parameters that control the data extraction time, including: –size of the extracted data in MB –number of source files used in the extraction –shape of the extracted data volume

BARRODALE COMPUTING SERVICES LTD. Testing: notes  GADS is just an interface: we are really comparing the DataBlade with the latest version of the Java NetCDF API  We also tested our OPeNDAP aggregation server –Much slower than both GADS and DataBlade –Probably because it was based on earlier NetCDF API –Would expect performance to be close to that of GADS with same API. –Detailed results not reported here  Reported times include time to parse query, search metadata, extract data and produce data product

BARRODALE COMPUTING SERVICES LTD. Preliminary test results Shape of extracted data : 50 * increasing depth * 50 * 50 DataBlade is faster than GADS for small data extraction

BARRODALE COMPUTING SERVICES LTD. Preliminary test results Shape of extracted data : 50 * increasing depth * 200 * 200 DataBlade’s performance declines sharply around 40 MB point

BARRODALE COMPUTING SERVICES LTD. Preliminary test results GADS : Max 360 FILES DataBlade : Max 12 SBLOBs (1 blob = 30 Files) Shape of extracted data : fixed depth level but different lat-lon space DataBlade’s performance declines around 40 MB point (120 files)

BARRODALE COMPUTING SERVICES LTD. Preliminary test results Shape for single BLOB extract: time * 1 * 128 * 200 Shape for 10 BLOBS extract : 10 * 1 * 128 * 200 (per BLOB)

BARRODALE COMPUTING SERVICES LTD. Preliminary test results Shape for single File extract : 1*depth*400 *400 Shape for 100 Files extract : 100*1*lat*lon

BARRODALE COMPUTING SERVICES LTD. Preliminary test results Shape for 360 Files : 360 * 1 * lat * lon Shape for 30 Files : 30 *10 * lat * lon

BARRODALE COMPUTING SERVICES LTD. Conclusions (1)  We found that in general, for extracted data volumes below 10MB, the database outperformed GADS  Above 40MB, GADS was generally found to be capable of the fastest extractions  The performance of the DataBlade decreased dramatically when attempting to extract more than 100MB of data in a single query  With DataBlade, extracting large amounts of data from a single blob is also faster than extracting similar sized data from multiple blobs but the difference is small for small areas –Similarly with GADS the extraction time increases with the number of source files involved

BARRODALE COMPUTING SERVICES LTD. Conclusions (2)  The Grid DataBlade is optimised to support its entire feature set; in particular, it is optimised to retrieve relatively small (a few tens of megabytes) of data rapidly in the case where multiple users are querying the database simultaneously –Multiple simultaneous connections not yet tested  Data archives (where users typically download large amounts of data at a time) most efficiently implemented using something like GADS  Data browse tools (where one needs to quickly generate pictures of small subsets of data in different projections) best implemented using something like the DataBlade

BARRODALE COMPUTING SERVICES LTD. Future Work  GADS will be re-engineered as OGC-compliant Web Coverage Server  The DataBlade does not support spherical rotations of the latitude-longitude geographic coordinate system (i.e., shifts of poles). –A lot of the model data we are dealing with are actually stored as rotated grids with the North Pole shifted  There is currently no (efficient) way to fetch portions of a grid based on its contents –for example, “in what locations did the temperature exceed 25 Celsius during the summer of 2003?”  It is also not currently possible to create virtual datasets –Would like to dynamically create a density field from temperature and salinity  Evaluation of Grid DataBlade for PostgreSQL DBMS  Feeding results back to NetCDF engineers