1 Adventures in Web Services for Large Geophysical Datasets Joe Sirott PMEL/NOAA.

2 Motivation
Zonal averages of precipitation trends. From Zhang et al., Nature 448, 461-465 (26 July 2007)

3 Seasonal zonal averages of Arctic temperature trends
From Graversen et al., Nature 451, 53-56 (3 Jan 2008)

4 Use case
- Calculate zonally averaged seasonal temperature trends from the 20th-century climate experiment from four climate models (NASA GISS, NCAR PCM and CCSM, GFDL CM2.1, and Hadley CM3) in the CMIP3 archives, from 30N to 90N
- Total of 81 files in 36 GB
- Time period of interest: 1979-2000

5 Recipe is…
- Regrid all model data to a common grid
- Calculate seasonal ensemble means for all models for 30N-90N, 1979-2000
- Calculate zonal means from the seasonal ensemble means
- Calculate seasonal trends from the zonal means
- Plot/download results
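
The middle steps of the recipe can be sketched in a few lines of plain Python. This is a minimal illustration, not the analysis code behind the talk: it assumes every model field has already been regridded to a common grid and reduced to one seasonal mean per year, indexed [year][lat][lon], and all function names are hypothetical. At real CMIP3 sizes an array library would be the natural choice.

```python
from statistics import fmean


def ensemble_mean(fields):
    """Average several models' fields element by element.

    Each field is indexed [year][lat][lon] and is assumed to be on a
    common grid already (the regridding step is not shown here).
    """
    n_years = len(fields[0])
    n_lat = len(fields[0][0])
    n_lon = len(fields[0][0][0])
    return [[[fmean(f[t][j][i] for f in fields) for i in range(n_lon)]
             for j in range(n_lat)]
            for t in range(n_years)]


def zonal_mean(field):
    """Average over longitude, giving values indexed [year][lat]."""
    return [[fmean(lat_row) for lat_row in year] for year in field]


def linear_trend(years, values):
    """Ordinary least-squares slope of values against years."""
    y_bar, v_bar = fmean(years), fmean(values)
    num = sum((y - y_bar) * (v - v_bar) for y, v in zip(years, values))
    den = sum((y - y_bar) ** 2 for y in years)
    return num / den


def zonal_trends(years, fields):
    """Trend per latitude row (units per year) from the ensemble mean."""
    zm = zonal_mean(ensemble_mean(fields))
    n_lat = len(zm[0])
    return [linear_trend(years, [zm[t][j] for t in range(len(years))])
            for j in range(n_lat)]
```

Calling zonal_trends once per season, with that season's yearly means for each model, gives one trend value per latitude row, which is the small final product that gets plotted or downloaded.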

6 Traditional approach
- Find datasets/variables of interest
- Download individual data files or subset with OPeNDAP
- Analyze data locally
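
Subsetting with OPeNDAP amounts to appending a constraint expression to the dataset URL. The sketch below builds such a URL, assuming a DAP2 server and index-based hyperslabs of the form [start:stop] or [start:stride:stop] with an inclusive stop. The helper names, the example host and the variable name tas are hypothetical, and the index ranges are made up for illustration.

```python
def dap_hyperslab(var, *ranges):
    """Return a DAP2 projection such as 'tas[0:11][40:59][0:143]'.

    Each range is (start, stop) or (start, stride, stop); in DAP2 the
    stop index is inclusive.
    """
    parts = []
    for r in ranges:
        if len(r) not in (2, 3):
            raise ValueError("range must be (start, stop) or (start, stride, stop)")
        parts.append("[" + ":".join(str(int(v)) for v in r) + "]")
    return var + "".join(parts)


def subset_url(dataset_url, var, *ranges, response="dods"):
    """Build a request URL for one subsetted variable.

    'dods' asks for the binary data response and 'ascii' for a text
    rendering of the same subset.
    """
    return f"{dataset_url}.{response}?{dap_hyperslab(var, *ranges)}"


# Hypothetical dataset: 12 time steps, a band of latitude rows, every
# second longitude column.
url = subset_url("http://example.org/dap/tas_A1.nc", "tas",
                 (0, 11), (40, 59), (0, 2, 142))
```

Only the requested hyperslab crosses the network, which is why the OPeNDAP route moves far less data than downloading whole files.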

7 Problems with traditional approach
- Awkward user interface(s)
- Obscure UI naming conventions make it difficult to find variables of interest
- Datasets often aren't aggregated
- Subsetting and/or aggregation services often fail with large datasets (e.g. out of memory errors)
- Requires downloading 36 GB of data (file download) or ~2.5 GB (OPeNDAP) for a final product of ~5 KB

8 More modern approach
- Aggregated data
- Spatial or temporal subsetting
- Meaningful variable and dataset names
- Modern Web UI

9 Mandatory product plug

10 Dapper (dapper.pmel.noaa.gov/dapper)
- Web server that provides distributed access to in-situ or gridded data via the OPeNDAP protocol
- Aggregates local files, or remote datasets via HTTP or OPeNDAP
- Streams data (no more "out of memory" errors)
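
Why streaming avoids out-of-memory failures can be shown with a toy example. This is not Dapper's implementation, only a sketch of the general idea: several sources are presented as one ordered stream of chunks, and a statistic is accumulated chunk by chunk so that memory use depends on the chunk size rather than the dataset size. In-memory lists stand in for files here, and all names are hypothetical.

```python
from itertools import chain


def read_chunks(values, chunk_size):
    """Yield successive chunks of one source (a stand-in for reading a file)."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def aggregate(sources, chunk_size=1024):
    """Present several sources as a single lazy stream of chunks, in order."""
    return chain.from_iterable(read_chunks(s, chunk_size) for s in sources)


def streaming_mean(chunks):
    """Mean over all chunks, holding only one chunk in memory at a time."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data")
    return total / count


mean = streaming_mean(aggregate([[1, 2, 3], [4, 5]], chunk_size=2))
```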

11 DChart (dapper.pmel.noaa.gov/dchart)
- Browser-based tool for visualizing or downloading in-situ or gridded ocean or atmospheric data
- Also aggregates data
- AJAX-based user interface
- Access to ~3.5 TB of gridded data
- Configurable UI

13 What's missing?
- Still requires downloading ~2.5 GB for a final product of ~5 KB
- Lots of clicking to download multiple datasets
- BIG problem for AR5 data needs (>1 PB)

14 Missing piece

15 Ideal analysis environment (scientist perspective)
- Highly interactive (i.e. command line)
- Scripting in familiar language of choice (bash, Python, Ruby, Matlab)
- Access to multiple tools (Matlab, nco, cdo, GrADS, Ferret, gdal, …)
- Access to custom home-grown tools
- Storage of intermediate products (anomalies, statistics, etc.)

16 Limitations of Web services
- Users locked in to backend analysis software
- Difficult to debug
- Steep learning curve
- How to handle long-lived operations?
- Security problems
- No (or limited) scripting capabilities
- Not interactive

17 A cloud computing alternative
- Upload data to cloud
- Move computation to data
- Boot VM preloaded with common analysis tools
- Users can customize (and share) VM images and data
- Users have full ssh access to Xen VM(s) running Linux with local access to data stored in cloud

18 Amazon AWS
- Amazon EC2
  - Uses customizable Linux Xen image
  - Start 1-100 hosts in parallel
  - $0.10/instance-hour
- Amazon S3
  - Data storage service
  - $0.15/GB/month for storage
  - Data transfer in: $0.10/GB
  - Data transfer out: $0.18/GB
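
As a rough back-of-envelope check, the prices on this slide can be applied to the 36 GB use case. The rates below are the ones quoted above. The one month of storage and ten instance-hours of compute are assumptions chosen only for illustration, and the function name is hypothetical.

```python
# Rates quoted on the slide (US dollars).
EC2_PER_INSTANCE_HOUR = 0.10
S3_STORAGE_PER_GB_MONTH = 0.15
S3_TRANSFER_IN_PER_GB = 0.10
S3_TRANSFER_OUT_PER_GB = 0.18


def cloud_cost(data_gb, months, instance_hours, download_gb):
    """Itemized cost of uploading, storing, analyzing and downloading."""
    return {
        "upload": data_gb * S3_TRANSFER_IN_PER_GB,
        "storage": data_gb * S3_STORAGE_PER_GB_MONTH * months,
        "compute": instance_hours * EC2_PER_INSTANCE_HOUR,
        "download": download_gb * S3_TRANSFER_OUT_PER_GB,
    }


# 36 GB of model output, kept for one month (assumed), analyzed for ten
# instance-hours (assumed), with only the ~5 KB final product downloaded.
costs = cloud_cost(data_gb=36, months=1, instance_hours=10, download_gb=5e-6)
total = sum(costs.values())
```

Under these assumptions the total comes to roughly $10, dominated by upload and storage; the cost of downloading the final product is negligible.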

19 Cloud analysis architecture

20 Sample workflow (free version)
1. User authenticated via Web UI
2. EC2 instance booted with OPeNDAP access to datasets (stored on S3 or EC2 volumes)
3. User rpms installed (optional)
4. ssh access to instance using ssh keypair (generated when account issued)
5. User analyzes, downloads, visualizes, ...
6. Instance restored to pool after user done (or after period of inactivity)
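
Steps 2 and 6 imply some bookkeeping for a pool of instances: hand one out, track activity, and reclaim it when the user finishes or goes idle. The following is a toy model of that bookkeeping only, not a description of any deployed system. It makes no calls to EC2, and the class, its methods, the instance ids and the one-hour default timeout are all hypothetical.

```python
import time


class InstancePool:
    """Lease instances to users and reclaim them after inactivity."""

    def __init__(self, instance_ids, idle_timeout_s=3600.0, clock=time.monotonic):
        self._free = list(instance_ids)
        self._leases = {}  # instance id -> (user, time of last activity)
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock

    @property
    def free(self):
        """Ids of instances currently available, in hand-out order."""
        return tuple(self._free)

    def acquire(self, user):
        """Step 2: hand the next free instance to an authenticated user."""
        if not self._free:
            raise RuntimeError("no free instances in pool")
        instance_id = self._free.pop(0)
        self._leases[instance_id] = (user, self._clock())
        return instance_id

    def touch(self, instance_id):
        """Record user activity so the instance is not reclaimed as idle."""
        user, _ = self._leases[instance_id]
        self._leases[instance_id] = (user, self._clock())

    def release(self, instance_id):
        """Step 6: return an instance to the pool when the user is done."""
        del self._leases[instance_id]
        self._free.append(instance_id)

    def reap_idle(self):
        """Step 6: reclaim instances idle for at least the timeout."""
        now = self._clock()
        idle = [i for i, (_, last) in self._leases.items()
                if now - last >= self._idle_timeout_s]
        for instance_id in idle:
            self.release(instance_id)
        return idle
```

The injectable clock is a design choice that lets the idle-timeout logic be exercised without waiting in real time.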

21 Analysis cloud advantages
- Scalable
- Data lives in same network as software
- No user software lock-in
- Users can work in familiar environment
- Security problems reduced
- Interactive
- Access to debugging tools
- BUT: lots of details to work out!

22 Questions?

23 More info
- PMEL Dapper Server: http://dapper.pmel.noaa.gov/dapper
- PMEL DChart: http://dapper.pmel.noaa.gov/dchart
- Downloads, propaganda: http://www.epic.noaa.gov/epic/software/dapper/ and http://www.epic.noaa.gov/epic/software/dchart/
- Joe.Sirott@noaa.gov

