Presentation is loading. Please wait.

Presentation is loading. Please wait.

Successful Strategies for Overcoming the Obstacles in Acquisition, Management, and Analysis of CMIP5 Data Jennifer Miletta Adams IGES/COLA AMS 2013.

Similar presentations


Presentation on theme: "Successful Strategies for Overcoming the Obstacles in Acquisition, Management, and Analysis of CMIP5 Data Jennifer Miletta Adams IGES/COLA AMS 2013."— Presentation transcript:

1 Successful Strategies for Overcoming the Obstacles in Acquisition, Management, and Analysis of CMIP5 Data Jennifer Miletta Adams IGES/COLA AMS 2013

2 Workflow Requirements No,,,, et al. Script-Based Flexible Automated Runs in a UNIX environment

3 Workflow Elements 1. Create list of desired data: all available models and ensembles for a subset of experiments, realms, frequencies, and variables 2. Keep track of what has already been acquired 3. Identify what is available 4. Get needed data 5. Make data user-friendly

4 Specification of Desired Data piControl/atmos/mon/Amon/ clt hfls hfss hurs pr prsn prw ps psl rlds rlus rlut rlutcs rsdt rsut rsutcs tas uas vas piControl/atmos/day/day/ clt hfls hfss huss pr prsn psl rlds rlus rlut tas uas vas piControl/ocean/mon/Omon/ msftmyz psu rhopoto so thetao tos uo vo piControl/ocean/day/day/ tos piControl/land/mon/Lmon/ evspsblsoi evspsblveg lai mrro mrso mrsos tran piControl/land/day/day/ mrsos piControl/landIce/mon/LImon/ snc snw piControl/seaIce/mon/OImon/ sic sit historical/atmos/mon/Amon/ clt hfls hfss hurs pr prsn prw ps psl rlds rlus rlut rlutcs rsdt rsut rsutcs tas uas vas historical/atmos/day/day/ clt hfls hfss huss pr prsn psl rlds rlus rlut tas ua uas va vas historical/atmos/fx/fx/ sftlf historical/ocean/mon/Omon/ msftmyz psu tauuo tauvo tos historical/ocean/day/day/ tos historical/land/mon/Lmon/ cropFrac evspsblsoi evspsblveg lai mrro mrso mrsos tran historical/land/day/day/ mrsos historical/landIce/mon/LImon/ snc snw historical/seaIce/mon/OImon/ sic sit transix transiy

5 Keep Track of Acquired Data I Local CMIP5 data files are stored under the following directory structure (10 keywords): /cmip5 /data /Experiment /Realm /Frequency /MIP-Table /Variable /Institute.Model /Ensemble /Version /datafiles.nc

6 Keep Track of Acquired Data II Use the “find” command to create a master list of all subdirectory names under /cmip5/data Updated daily, this master list shows all acquired data residing on local disk Users can sort or filter this list in order to discover if data they need has been acquired

7 Discovery of Available Data I Build a Dataset Search URL: http://pcmdi9.llnl.gov/esg-search/search?type=Dataset &latest=true &replica=false &facets=id &limit=0 &project=CMIP5 &experiment=piControl &realm=atmos &time_frequency=mon &cmor_table=Amon &variable=clt&variable=hfls….&variable=vas

8 Discovery of Available Data II Capture dataset search results into a text file called “tmp” using wget : wget –O tmp “$URL” Remove debris text from “tmp” to extract relevant information: grep 'name=\"cmip5.’ tmp > tmp2 sed s/' tmp3 sed s/'\">1 '//g tmp3 > tmp4 sed s/'\">2 '//g tmp4 > result

9 Discovery of Available Data III The result text file contains a list of dataset IDs and data nodes for all available data that match my search criteria: cmip5.output1.BCC.bcc-csm1-1-m.piControl.mon.atmos.Amon.r1i1p1.v20120705|bcccsm.cma.gov.cn cmip5.output1.BCC.bcc-csm1-1.piControl.mon.atmos.Amon.r1i1p1.v1|bcccsm.cma.gov.cn cmip5.output1.BNU.BNU-ESM.piControl.mon.atmos.Amon.r1i1p1.v20120626|esg.bnu.edu.cn cmip5.output1.CCCma.CanESM2.piControl.mon.atmos.Amon.r1i1p1.v20120623|dapp2p.cccma.ec.gc.ca cmip5.output1.CMCC.CMCC-CM.piControl.mon.atmos.Amon.r1i1p1.v20120627|adm07.cmcc.it etc. This list of what is available is compared to the master list of what has been acquired to determine what is needed

10 Get Needed Data a)Determine number of files for each data set b)Download wget script, give it a unique name c)Keep authentication certificates up-to-date d)Execute wget script e)Put files in proper place under /cmip5/data/

11 Determine Number of Files Build a File Search URL: http://pcmdi9.llnl.gov/esg-search/search?type=File &dataset_id= cmip5.output1.NCAR.CCSM4.rcp85.v1|tds.ucar.edu &variable=clt&variable=hfls….&variable=vas Extract number of files from result : wget –q tmp “$URL” –O - | grep numFound

12 Download WGET Script I Build a wget URL: http://pcmdi9.llnl.gov/esg-search/wget? &dataset_id= cmip5.output1.NCAR.CCSM4.rcp85.v1|tds.ucar.edu &limit=1000 &variable=clt&variable=hfls….&variable=vas If number of files > 1000: You need separate URLs: Append “&offset=1000” to 1 st URL to get 2 nd group of files Append “&offset=2000” to 1 st URL to get 3 rd group of files

13 Download WGET Script II Build a meaningful name for wget script: wget.cmip5.output1.NCAR.CCSM.rcp85.mon.atmos.Amon.r1i1p1.v1.sh Use wget to download the wget script : wget –q –O wgetname.sh “$URL”

14 User Access and Authentication Register with ESGF and get an OpenID and password e.g. https://pcmdi9.llnl.gov/esgf-idp/openid/jennifer Enroll in appropriate group (e.g. CMIP5 research) Obtain or renew certificates for user authentication Use the python utility “MyProxyClient” to renew certificates automatically: #!/bin/bash export X509_CERT_DIR=$HOME/.esg/certificates export X509_USER_PROXY=$HOME/.esg/credentials.pem < /homes/jma/pass /usr/local/bin/myproxyclient \ logon -s pcmdi9.llnl.gov -o $X509_USER_PROXY \ -p 7512 –T -l jennifer -S

15 Execute WGET Script Run wget script, capture all output in a log file: wgetname.sh -v > wget.log 2>&1 & A wget may fail for any number of reasons:  Data node down  Data node throttling number of simultaneous wgets  File not found  Checksum failure  Certificate expired, or authorization failed  Connection timeout  Forbidden If at first you don’t succeed, try, try, try again Failure is an option

16 Make Data User-Friendly Create GrADS descriptor files Aggregate files over time dimension Make use of ensemble dimension when appropriate Identify missing or overlapping time periods Assign non-standard dimensions (e.g. basin averages or fixed fields) Handle 365_day calendars Create PDEF files for non-rectilinear grids For ocean and sea ice realms ESMF’s RegridWeightGen utility generates the interpolation weights Vector fields must be rotated from grid-relative to Earth-relative coordinates before interpolation

17 Get Additional Information About CMIP5: http://cmip-pcmdi.llnl.gov/cmip5/ cmip5-helpdesk@stfc.ac.uk About ESGF: http://esgf.org/wiki/ESGF_Index/ esgf-user@lists.llnl.gov About this presentation: jma@iges.org


Download ppt "Successful Strategies for Overcoming the Obstacles in Acquisition, Management, and Analysis of CMIP5 Data Jennifer Miletta Adams IGES/COLA AMS 2013."

Similar presentations


Ads by Google