
Describe workflows used to maintain and provide the RDA to users – Both are 24x7 operations. Transition to the NWSC with zero downtime. NWSC is new environment.


1 NWSC Planning for RDA – 21 Dec. 2011
Describe workflows used to maintain and provide the RDA to users
  – Both are 24x7 operations
Transition to the NWSC with zero downtime
NWSC is new environment
  – Processing adjustments and test
Today - starting point for actionable plan
  – Focus on NWSC DAV, HPC, & CFDS
Baseline metrics
  – 7000 unique users annually
  – 1.4 PB of primary data – HPSS (2x in total)
  – 450 TB GLADE, permanent data for users, areas for data preparation
  – Web servers and DB servers – DSG
  – Use 6 DAV servers, mirage 0-5

2 Requirements for RDA Data Processing Services @ NWSC
Homogeneous architecture and OS
Common file system for RDA product development, NCAR access, and connection to DSS web servers
  – CFDS usage metrics for NCAR users at NWSC?
Read/write connectivity to DB servers from Caldera, Geyser, and Yellowstone
Dedicated and shared compute resources for user driven workload and burst DSS needs to prepare data
  – For example: A DSS dedicated system or queues, minimum restrictions?

3 NWSC RDA Systems Structure

4 RDA data processing examples and tools
Run Research Data Archive Management System (RDAMS) tools and daemons, executed as user "rdadata"
  – dsarch, archive files from work disk spaces to HPSS and to CFDS
  – gather-metadata, read all incoming files to verify content, and create metadata records for DBs
  – dsrqst, manage delayed mode user requests
      subsetting, process data extraction and re-dimensioning
      format conversion, e.g. GRIB2 to netCDF
      file staging, bulk data moves, HPSS file to CFDS /transfer
  – dsupdt, complex DB governed scripting to regularly download new data, routine growth for 150+ datasets

5 Daemon managed data processing work flow
  – A system initialized daemon named "dsstart" checks on dsrqst daemon status
  – A cron job checks on the status of the "dsstart" daemon on each server
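
The two-level supervision on this slide (a cron job watches "dsstart", and "dsstart" watches dsrqst) can be illustrated with a minimal sketch. It is not the RDAMS implementation: the helper names `process_is_running` and `ensure_running`, the use of `pgrep`, and the restart callback are all assumptions made for illustration, since the slides do not say how the daemons are actually checked or started.

```python
import subprocess
from typing import Callable, Optional


def process_is_running(name: str) -> bool:
    """Return True if a process named exactly `name` exists.

    Assumption: a Unix-like host with `pgrep` available. `pgrep -x`
    exits with status 0 when at least one process matches.
    """
    result = subprocess.run(
        ["pgrep", "-x", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def ensure_running(
    name: str,
    is_running: Callable[[str], bool] = process_is_running,
    start: Optional[Callable[[str], None]] = None,
) -> str:
    """Check one daemon and optionally restart it.

    Returns "ok" if the daemon is up, "restarted" if it was down and
    the (hypothetical) `start` callback was invoked, "down" otherwise.
    """
    if is_running(name):
        return "ok"
    if start is not None:
        start(name)
        return "restarted"
    return "down"


if __name__ == "__main__":
    # Outer layer: what a cron entry on each server might run.
    print("dsstart:", ensure_running("dsstart"))
    # Inner layer: the kind of check "dsstart" itself performs.
    print("dsrqst:", ensure_running("dsrqst"))
```

Passing the liveness check and the restart action in as callables keeps the supervision logic separate from site-specific details, so the same function serves both layers.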

6 Current Scale of Activity
System works well and demand is accelerating upward
Subsetting, format conversion, file staging
  – 166 user requests/week
  – 1-2 hours, average execution time/request
  – 65 TB/week, input data volume processed
  – 3 TB/week, output data volume for users
385 TB data added to RDA in FY 2011
  – In one case the data processing was too large for the mirage servers. Used Lynx, 3-4 weeks, 5-7 concurrent streams
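
The weekly figures on this slide imply a few derived quantities that help with capacity planning. The sketch below works them out. Assumptions: the input volume is read as 65 terabytes, matching the units of the output figure; 1 TB is taken as 1000 GB; and the function name `weekly_load` is made up for this example.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_load(
    requests_per_week: int = 166,
    hours_per_request: tuple = (1.0, 2.0),  # average execution time range
    input_tb_per_week: float = 65.0,        # assumed to be terabytes
    output_tb_per_week: float = 3.0,
) -> dict:
    """Derive planning numbers from the slide's weekly metrics."""
    low_hours = requests_per_week * hours_per_request[0]
    high_hours = requests_per_week * hours_per_request[1]
    return {
        # Total execution hours needed per week (low, high).
        "execution_hours": (low_hours, high_hours),
        # Average number of requests running at any moment (low, high).
        "avg_concurrent_requests": (
            low_hours / HOURS_PER_WEEK,
            high_hours / HOURS_PER_WEEK,
        ),
        # How much input is read per unit of output delivered.
        "input_to_output_ratio": input_tb_per_week / output_tb_per_week,
        # Average delivered volume per request, in GB (1 TB = 1000 GB).
        "avg_output_gb_per_request": output_tb_per_week * 1000 / requests_per_week,
    }


if __name__ == "__main__":
    for key, value in weekly_load().items():
        print(key, value)
```

Under these assumptions the request workload needs 166 to 332 execution hours per week, which averages out to roughly one to two requests running at any time, and about 22 units of data are read for every unit delivered to users.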

