Australian Geoscience Data Cube


1 Australian Geoscience Data Cube
A Collaboration between Geoscience Australia, CSIRO and NCI
CEOS WGISS 40
Simon Oliver – Geoscience Australia
Robert Woodcock - CSIRO
The Australian Geoscience Data Cube is an innovative project that is transforming the way we analyse large gridded datasets such as Earth observation data. It allows us to harness the power of high performance computation through the creation of high performance data, which in turn allows us to deliver useful information into the hands of decision makers and the public. The Australian Geoscience Data Cube (AGDC) is being developed as a partnership between Geoscience Australia (GA), Australia’s National Computational Infrastructure (NCI), and the Commonwealth Scientific and Industrial Research Organisation (CSIRO). Its main aim is to support the management and quantitative analysis of massive volumes of Earth observation (EO) and other geoscientific data.

2 Overview Brief Review – Australian Geoscience Data Cube and CEOS
Update on progress:
Open Source collaboration and wiki
Version 1 API
Version 2 Roadmap: multidimensional storage units; ingest support for multiple sensors; PC, Cloud, & HPC deployment; analysis and production pipeline; provenance
The Future: Data Cubes and WGISS
Analysis Ready Data as an input to AGDC
Discrete Global Grid Systems - OGC SWG
Current Status and implications for AGDC evolution
Discussion: Sentinel-2 / SPOT-5/6 prototype processing hub to ARD?...link to CWIC system?

3 Data-Intensive Quantitative Science
The Australian Geoscience Data Cube (AGDC):
Supports management and quantitative analysis of massive volumes of Earth observation (EO) and other geoscientific data
Bring users to the data
Pixels as observations
The AGDC as a sensor-independent system for management, analysis and sharing of EO data

4 AGDC Overview A series of data structures and tools to enable efficient analysis of large earth observation archives in HPC environments Simple Data Structures Spatially regular tiles Managed by a relational database Calibrated and Standardised Unique Observations Surface Reflectance Observations Quality Assured Observations Flagged for cloud, cloud shadow, saturation and other quality indicators Open source software Analysis Ready Data CEOS WGISS40

5 Landsat processing pipeline
Analysis Ready Data preparation
DataCube
Solid science: taking the data to comparable, quality assured measurements
Adapting software systems for embarrassingly (massively) parallel processing, enabling quality assurance, and building code workflows that allow processes and experiments to be iterated and improved. This is where workflows are important (I will discuss in a minute how we are tackling this).

6 Traditional remote sensing product process
Working as individuals, gathering vector data, gathering EO: everyone has to do almost everything.
The traditional remote sensing product process, in which data selection, retrieval, processing and derived product creation only begin when a request is made, cannot meet increasing demand or deliver the full value of Earth observation data. (click for graphic change)
The data cube addresses the first 80% of the traditional remote sensing process, which makes the data much more useful and available for rapidly producing new information products. Automating the data preparation step frees a large number of downstream researchers to focus on algorithm development rather than on organising and preparing data for analysis. In this new approach all of the work that was duplicated, inefficient and ad hoc becomes routine and operational, and the data is ready-made, online as infrastructure. The Data Cube paradigm essentially transitions us to a situation where fundamental data about our Earth is ‘already there’ as infrastructure, in the same way as electricity and running water are.
A side effect of this approach for CSIRO is the elevation of the role of data preparation, calibration and validation, and the mitigation of risk in Flagships. A system-wide view emerges in which the work of a few key groups has wide impact on a large number of Flagships. At the moment there is considerable risk because some Flagships depend on per-project data preparation and cal/val activities in other Flagships, where an independent choice could cut off data supply.

7 New Data Cube remote sensing paradigm
A common analytical framework for High Performance EO data processing
Simple data access and analysis
Robust processes
Quality assured unique observations
Process once, use many times
The Australian Geoscience Data Cube (AGDC) is a common analytical framework composed of a series of data structures and tools which facilitate the organisation and analysis of large gridded data collections. The standardised data infrastructure of the data cube removes the need for difficult and time-consuming pre-processing of the data for individual applications. Effort can be directed more productively toward developing more and better information products with increased value for the public. The Data Cube makes comprehensive information about our Earth available as information infrastructure, supporting the digital economy and enabling downstream users, such as industry, to leverage it to create economic activity and employment.
The first data stream to be transformed into the data cube structure is the Australasian Landsat archive. Australia has collected Landsat data using its ground station at Alice Springs since 1979; however, before the data cube only a few images at a time could be retrieved from the archive and processed to make useful information. Now all Landsat data from 1984 (Landsat 5) onwards is immediately available as surface reflectance. The next data collection to be included was MODIS ( in and ongoing).
The success of the AGDC is due to simple data structures, robust processes, and calibrated, standardised and quality assured unique observations, which together create High Performance Data.

8 Update on Progress Update on progress:
Open Source collaboration and wiki
Version 1 API
Version 2 Roadmap

9 Current Status Partners: GA, CSIRO and the NCI
International collaborators are increasingly involved, including the USGS, NASA and CEOS.
In the midst of moving to an updated version reflecting recent advances in technology
Supporting the establishment of other international data cubes, initially with Kenya and Colombia
AGDC is currently supporting a range of remote sensing applications across the water, vegetation and mineral domains
Providing valuable information for environmental monitoring and modelling across all Australian jurisdictions

10 WOfS Summary Product Example
Sum the derived temporal water stack: number of water observations per pixel
Sum the derived “real” observations for every pixel from the Pixel Quality
Produce the ratio as a percentage for display
WOfS WMS
Menindee Lakes as shown in WOfS, with associated legend
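The three summary steps above can be sketched in a few lines. This is a minimal pure-Python illustration, assuming the water detections and the pixel-quality "clear observation" flags arrive as equally shaped stacks of 0/1 grids; the function name and data layout are hypothetical and are not taken from the WOfS code base.

```python
def water_summary(water_stack, clear_stack):
    """Percentage of clear observations in which each pixel was seen as water.

    water_stack[t][row][col] is 1 where water was detected at time t.
    clear_stack[t][row][col] is 1 where pixel quality marks a real observation.
    Pixels that were never clearly observed are returned as None.
    """
    n_rows = len(water_stack[0])
    n_cols = len(water_stack[0][0])
    summary = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            # Step 2: count the real (clear) observations for this pixel.
            clear = sum(layer[r][c] for layer in clear_stack)
            # Step 1: count water detections, ignoring obscured observations.
            wet = sum(
                w[r][c]
                for w, q in zip(water_stack, clear_stack)
                if q[r][c]
            )
            # Step 3: express the ratio as a percentage for display.
            out_row.append(100.0 * wet / clear if clear else None)
        summary.append(out_row)
    return summary
```

Dividing by the number of clear observations rather than by the number of acquisitions keeps persistently cloudy pixels from appearing drier than they are.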

11 Using tidal models to map tidal extents
Tidal Range of >10m
Tidal Zone Extent: can be attributed with offsets of LAT (Lowest Astronomical Tide) to lowest observed tide and HAT (Highest Astronomical Tide) to highest observed
Tidal Zone Morphology: fraction of water observations over the time series. Can we attribute this with depths?

12 CSIRO Examples Vicarious calibration sites
Identify climatic zones, spatial and temporal variation, and seasonal suitability for calibration activities
Landsat MODIS blending: blend Landsat and MODIS scenes to produce Landsat-like data (25 m resolution) with the MODIS repeat cycle (approximately every 4 days)
GEOGLAM Rangelands: remote sensing derived information on rangeland and pasture cover and plant available water content for the globe

13 Open Data and Code AGDC Web: http://www.datacube.org.au
AGDC Wiki:
Code repositories are available through GitHub:
Data is also available as individual files on the NCI THREDDS catalogue:

14 AGDC v1 HPC Deployment Continental workflow support
Command line and Python interface

15 AGDC v1 - High Performance Computing
National Computational Infrastructure: 57,472 cores (2.6 GHz) in 3,592 compute nodes; 160 TBytes (approx.) of main memory; 10 PBytes (approx.) of usable fast file system (for short-term scratch space).
And other CSIRO systems via managed replicas
The Australian Government recognised the need to invest in High Performance Computing research infrastructure. As a result the Australian Geoscience Data Cube resides on the country’s largest supercomputer, the National Computational Infrastructure at ANU. Without access to this sort of computing power and storage, none of what we are now doing with the AGDC would be possible. The NCI is accessible to researchers across Australian government and academia. This means it is not just a big computer but a place to do big collaborative science, and it is producing an increase in derived information products available to Australian government, industry and the public.

16 AGDC v1 API Applications
Bare Soil
Landsat Clean Pixel
Landsat Median Mosaic
Wetness in the landscape – Tasseled Cap Wetness Index
Big Data for Environmental Monitoring
Command line for non-Python users
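The slide only names these applications. As one possible reading of "Landsat Median Mosaic" combined with clean-pixel selection, the sketch below takes the per-pixel median over the observations flagged as clear. The function name, the input layout and the use of a simple median are assumptions for illustration; the AGDC v1 API applications may be implemented differently.

```python
from statistics import median


def median_mosaic(band_stack, clear_stack):
    """Per-pixel median of one band over time, using clear observations only.

    band_stack[t][row][col] holds the band value at time t.
    clear_stack[t][row][col] is 1 where the pixel passed quality checks.
    Pixels with no clear observation are returned as None.
    """
    n_rows = len(band_stack[0])
    n_cols = len(band_stack[0][0])
    mosaic = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            # Keep only values whose quality flag marks them as clear.
            values = [
                band[r][c]
                for band, quality in zip(band_stack, clear_stack)
                if quality[r][c]
            ]
            out_row.append(median(values) if values else None)
        mosaic.append(out_row)
    return mosaic
```

A median is a common choice for mosaics because a single undetected cloud or shadow in the time series has little effect on the result.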

17 Earth Observation Informatics Platforms
The growth issues will require fundamental changes to how EO research is done, and create massive opportunity
Components of EOI Platforms
Community: Supply, Distribution, Coordination, National, International
Quality: Calibration, Validation, Provenance, Versioning
Platforms: Data storage, Data management, Analysis, Operations, PC, Cloud, HPC
Highlight key risks and mitigating changes in each area
+ Increase in Volume means EO data will likely be:
++ single Supply to Oz, to a distribution node, not to each researcher (practical outcome of network limits at the suppliers!)
++ Large scale analysis will need to be near large compute: too difficult to move
++ Managed replication of subsets will be required to distribute “small scale” analysis and ensure most up to date data
+ Increase in Velocity means:
++ near real-time processing: L0/1B data preparation through to L2 and beyond. Automation will be required
++ Access to ancillary data (calibration, validation, spectral libraries) will need to be equally straightforward
+ Increase in Variety:
++ need to ease the burden of integrating disparate sources of data: CSIRO IM&T SISS, TERN, AuScope, IMOS, AGDC
All of this means massive opportunity for scientific advancement
Highly dependent on being well coordinated as a community: security of supply at international levels, operations support via NEOS-IP and HPC centres (need to lobby together for these), sharing data in ways that support re-use, discovery and access
Example (draft ToR) EOI Australian Satellite Calibration Working Group
Proposed Terms of Reference:
3. To be a central point for coordination of a disparate group of cal-val activities related to Earth Observations from Space (EOS).
2. To provide a collective national forum to highlight the importance of cal-val for the improvement of EOS and to ensure its sustainability.
4. To identify and record plans, needs and priorities in cal-val facilities, activities and capability depending on national and international priorities.
5. Via relevant representatives, to communicate these priorities to the Australian EOS community and to the relevant national committees (e.g. Australian Government Earth Observation from Space Working Group).
6. To support Australian representation on international forums involved in satellite calibration and validation.
Supported Communities of Practice aid coordination: Cal-Val Working Group ToR, SAR CoP, AGDC, …

18 AGDC v2 Multidimensional storage units
Ingest support for multiple sensors
PC, Cloud, & HPC deployment
Towards an Observation and Measurement approach to metadata
Analysis and production pipeline
Provenance
UI Reference Implementation
Apache v2 License

19 The Future: Data Cubes and WGISS
Analysis Ready Data as an input to Data Cubes
Discrete Global Grid Systems - OGC DGGS SWG
Current Status and implications for Data Cube
Discussion: Sentinel-2 / SPOT-5/6 prototype processing hub to ARD?...link to CWIC system?

20 Analysis Ready Data
Analysis Ready Data (ARD) is satellite data that have been processed and organized so users are not required to invest time and resources in specialized skills to apply corrections for: instrument calibration (gains, offsets); geolocation (spatial alignment); and radiometry (solar illumination, incidence angle, topography, atmospheric interference). In addition, ARD products are organized in a defined structure with associated metadata, quality flags and products.
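To make the "instrument calibration (gains, offsets)" correction concrete, the sketch below applies the usual linear rescaling of raw digital numbers (DN) to a physical quantity such as at-sensor radiance. The function names, the nodata convention and the gain and offset values in the usage note are hypothetical; real coefficients come from each sensor's metadata, and the geolocation and radiometric corrections listed above are not covered here.

```python
def dn_to_radiance(dn, gain, offset):
    """Linear instrument calibration: radiance = gain * DN + offset."""
    return gain * dn + offset


def calibrate_band(dns, gain, offset, nodata=0):
    """Calibrate a sequence of digital numbers, passing nodata through as None.

    The nodata value of 0 is an assumption made for this example.
    """
    return [
        None if dn == nodata else dn_to_radiance(dn, gain, offset)
        for dn in dns
    ]
```

For example, `calibrate_band([0, 100, 200], gain=0.5, offset=-1.0)` treats the first value as missing and rescales the other two. ARD removes the need for each user to locate these coefficients and repeat this step.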

21 Analysis Ready Data Where is ARD processing best performed?
What is ARD for different satellite/sensor types? e.g. SAR
Limitations in ARD: making choices too early?
Grids, resampling, resolutions…?
Corrections?

22 On Grids - A bit about DGGS
A DGGS is a spatial reference system that uses a hierarchical tessellation of cells to partition and address the globe. DGGS are characterized by the properties of their cell structure, geo-encoding, quantization strategy and associated mathematical algorithm.
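The idea of a hierarchical tessellation that both partitions and addresses the globe can be illustrated with a toy quadtree over longitude and latitude. This is only a sketch of hierarchical cell addressing: a plain lat/lon quadtree does not give equal-area cells, which DGGS designs generally aim for, and the function names and digit scheme are invented for this example.

```python
# Whole-globe extent as (west, east, south, north) in degrees.
WORLD = (-180.0, 180.0, -90.0, 90.0)


def cell_address(lon, lat, level):
    """Return a quadtree address: one digit (0-3) per refinement level."""
    west, east, south, north = WORLD
    digits = []
    for _ in range(level):
        mid_lon = (west + east) / 2.0
        mid_lat = (south + north) / 2.0
        quadrant = 0
        if lon >= mid_lon:
            quadrant += 1  # eastern half
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:
            quadrant += 2  # northern half
            south = mid_lat
        else:
            north = mid_lat
        digits.append(str(quadrant))
    return "".join(digits)


def cell_bounds(address):
    """Invert cell_address: return (west, east, south, north) of a cell."""
    west, east, south, north = WORLD
    for digit in address:
        quadrant = int(digit)
        mid_lon = (west + east) / 2.0
        mid_lat = (south + north) / 2.0
        if quadrant & 1:
            west = mid_lon
        else:
            east = mid_lon
        if quadrant & 2:
            south = mid_lat
        else:
            north = mid_lat
    return west, east, south, north
```

The address of a parent cell is always a prefix of its children's addresses, so moving between resolutions is a string operation rather than a reprojection.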

23 OGC Discrete Global Grid Systems (DGGS) SWG
Sept/Oct 2015 – Candidate DGGS Core Standard released for 30-day public comment period
Nov/Dec 2015 – OGC DGGS v1.0 Core Standard adopted by OGC (assuming no major issues raised during public comment period)
Beginning Jan 2016 – OGC DGGS SWG to begin elaboration of Extension Standards to the DGGS Core Standard
Anticipated Extensions include:
Interoperability interface protocols for OGC Web Services (e.g. WCS, WCPS, WCTiles, etc…) to facilitate DGGS-to-DGGS communication and processing
3D (and higher dimensional) DGGS Specifications
Best Practice Guide

24 Future: Possible WGISS involvement
AWS - S3 object storage study
The Future: Data Cubes and WGISS
Analysis Ready Data as an input to AGDC
Discrete Global Grid Systems - OGC SWG
Current Status and implications for AGDC evolution
Discussion: Sentinel-2 / SPOT-5/6 prototype processing hub to ARD?...link to CWIC system?

