The UNC ADCIRC Application on the SURAGrid Steve Thorpe MCNC firstname.lastname@example.org SURAGrid Application Workshop February 22, 2006 Washington, D.C. You can grab a copy of these slides by following links from http://www.mcnc.org/thorpe/presentations
2 Outline
– What is the UNC ADCIRC Application?
– Why ADCIRC is Important to Grid-Enable
– Current Status of Grid-Enabling
– Next Steps Towards SURAGrid Deployment
– Steps for Day 2
– Future Plans
– Conclusions
3 Motivation: Weather Strikes Daily!
Hurricane Season 2005
– 26 named storms, 14 hurricanes, 3 with major impact
– billions of dollars in economic losses
We need to…
– provide early, accurate, and frequent forecasts and dissemination of information
– provide infrastructure to solve interdisciplinary problems
– interact in real time, i.e., evaluate and adapt
– innovate new methodologies for prediction
Source: NOAA 2005 …2006…
4 Part of SURA's SCOOP Program
The Southeastern Universities Research Association's (SURA) Coastal Ocean Observing and Prediction (SCOOP) Program
SCOOP goals
– improve predictions and mitigate the impact of coastal phenomena such as extra-tropical storms and hurricanes
– implement a comprehensive observing system that will validate accurate and timely short- and long-term predictions
– open web portal access to basic and analyzed data and linked numerical models, available in real time
– a "plug and play" model for the next generation
From www.openioos.org
5 SCOOP Participant Institutions Gulf of Maine Ocean Observing System (GoMOOS) Louisiana State University MCNC National Oceanic and Atmospheric Administration SURA Texas A&M University University of Alabama, Huntsville (UAH) University of Florida University of Maryland University of Miami University of North Carolina, Chapel Hill / RENCI Virginia Institute of Marine Science (VIMS)
6 ADCIRC Circulation Model
Developed by:
– Dr. Rick Luettich, UNC Chapel Hill's Institute of Marine Sciences
– Dr. Joannes Westerink, University of Notre Dame's Department of Civil Engineering and Geologic Sciences
ADCIRC is a finite element method (FEM) shallow-water model for computing tidal and storm surge water levels and depth-averaged currents
It is the primary model used by the NC SCOOP team
7 NC SCOOP Team
UNC Marine Sciences – Brian Blanton, Rick Luettich, Larry Mason (ITS): ADCIRC development/implementation
Renaissance Computing Institute (RENCI) – Lavanya Ramakrishnan, Brad Viviano, Howard Lander, Dan Reed: a joint institute spanning UNC-CH, NCSU, and Duke; involved in national grid efforts such as Linked Environments for Atmospheric Discovery (LEAD) and TeraGrid
MCNC – Michael Garvin, Steve Thorpe, Chuck Kesler: a non-profit org committed to advancing education, innovation, and economic development throughout NC by delivering next-generation IT services; subcontractor to RENCI
8 SCOOP Tasks*
– Data Standards
– Data Translation, Transport, & Management
– Modeling Configuration Tool Set
– Visualization Services
– Verification Services
– Storage and Computing Services
– Security
– Grid Management Middleware
*The NC team has focused especially on the areas shown in blue
9 NC SCOOP Project Goals Large scale model availability via web-based portal –users can access the model easily –enable model runs with different input datasets (e.g. meteorological data) –facilitate model output distribution Distributed data management for collecting, archiving and providing access to model output products Model execution in a Grid computing environment –automatically find available computational resources
10 High Level View of Where ADCIRC Fits In
[Diagram: wind inputs (UF analytical/GFDL winds; NCEP NAM/POLAR winds) arrive via LDM at a computational grid that executes the ADCIRC model; model output flows via LDM and OPeNDAP to SCOOP archival and visualization, and on to other distribution, processing, and visualization services.]
11 NC SCOOP V1 – Hindcast Data Flow
1. Manually specify model run parameters in the RENCI/UNC portal
2. Make a tarball of the needed archived files
3. Third-party transfer between the portal host and the compute host
4. Execution of the requested simulation on the compute host
[Diagram: RENCI/UNC portal host (Globus Gatekeeper, GridFTP, MyProxy) with NFS-mounted mass storage holding NCEP daily model runs and an OPeNDAP server, plus an LDM feed to UAH; MCNC grid compute host (Globus Gatekeeper, GridFTP, LSF queue); UNC experimental SCOOP machine; UNC production system.]
12 NC SCOOP Portal (1/4)
[Screenshot: portal front page showing Models (ADCIRC), data access via OPeNDAP, and GridFTP transfer.]
13 NC SCOOP Portal (2/4) – OPeNDAP Access
[Screenshot: access to operational (daily) ADCIRC output via OPeNDAP/LDM: global elevation and global velocity fields.]
14 NC SCOOP Portal (3/4) – Hindcast
[Screenshot: submit a compute job and set run dates (Hurricane Ivan); current ADCIRC grid with a 16-CPU decomposition.]
15 NC SCOOP Portal (4/4) – Solution Display
[Screenshot: Hurricane Ivan results; a 14-day simulation on 16 CPUs ran in about 8 minutes. Note: storm surge and tides are both reflected in the water levels.]
17 Outline
– What is the UNC ADCIRC Application?
– Why ADCIRC is Important to Grid-Enable
– Current Status of Grid-Enabling
– Next Steps Towards SURAGrid Deployment
– Steps for Day 2
– Future Plans
– Conclusions
18 NCEP and WANAF Wind Ensemble
[Images: NCEP ensemble perturbation forecast storm tracks and UFL-SCOOP analytic forecast storm tracks for Hurricane Emily, both with initial time 2005071800.]
We need lots of compute resources, as each wind forecast can drive a different ADCIRC simulation. We refer to a set of such runs as an ensemble.
19 Two ADCIRC Input Wind Predictions for Tropical Storm Arlene
– Standard NCEP model
– NAH, NCEP's hurricane model
The NAH model's improved results can substantially affect the skill of ADCIRC's storm surge forecast.
20 Automated Execution
[Diagram: OpenIOOS user interface over visualization, verification/validation, resource selection, data management, and application environment layers; translation, archive, and catalog services for observations, winds, and model results; the ADCIRC model behind a resource access layer.]
ADCIRC is automatically triggered upon arrival of input data sets. Since jobs are not manually initiated in this scenario, we need grid technologies to help find compute resources to run the jobs.
21 Benefits of Grid-Enabling ADCIRC
Take advantage of "compute on demand" cycles provided by grids
Prediction results are more valuable because we get them sooner
Allows us to run more models, and at higher resolutions
Can adjust compute location and model configuration dynamically based on:
– compute load on available resources
– number of CPUs available
– resolution of the model's finite element mesh
– number of ensemble members
– etc.
22 Outline
– What is the UNC ADCIRC Application?
– Why ADCIRC is Important to Grid-Enable
– Current Status of Grid-Enabling
– Next Steps Towards SURAGrid Deployment
– Steps for Day 2
– Future Plans
– Conclusions
23 NC SCOOP Team Has Developed a Storm Surge Ensemble Prediction System
1. The National Hurricane Center issues the hurricane forecast track
2. The University of Florida computes the forecast wind ensemble
3. UNC creates tars of winds, mesh, and initial conditions
4. ADCIRC computes coastal water levels for each ensemble member across the SCOOP grid (UNC, MCNC, UAH, UFL, LSU, VIMS, …, SURAGrid), with jobs migrating to available resources
5. Develop and publish the water level ensemble forecast from the ensemble solutions
24 Ensemble System Can Improve Forecast Results
Left: ADCIRC max water level for a 72 hr Hurricane Katrina forecast starting 29 Aug 2005, driven by the "usual, always-available" ETA winds.
Right: ADCIRC max water level over ALL of the UFL ensemble wind fields for the same 72 hr Hurricane Katrina forecast starting 29 Aug 2005.
Images credit: Brian O. Blanton, Dept of Marine Sciences, UNC Chapel Hill
25 Real-Time Resource Selection API
We developed a simple Java API
– We want to use more than just MCNC and UNC resources for the multiple runs required by the ensemble system
– The API answers the question "What is the best place for me to run this job, right now?"
– Bases its choice on current availability of compute and data resources
– Allows arbitrary ranking algorithm(s) through user-supplied Java plug-ins
– Uses the Java CoG Kit plus the usual GT 3.2.1 tools: MDS for information sharing, GridFTP for file transfer, pre-WS GRAM for job submission
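A minimal sketch of the plug-in idea described above (the class and method names here are hypothetical illustrations, not the actual SCOOP API): a user-supplied ranking policy scores each resource, and the chooser filters out unreachable sites and returns the best-scoring one.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a grid resource as seen by the chooser:
// a host name, its free CPU count, and whether its services answered.
record Resource(String name, int freeCpus, boolean servicesUp) {}

// User-supplied ranking plug-in: a higher score means a better place to run.
interface RankingPolicy {
    double score(Resource r);
}

public class ResourceChooser {
    // Answers "what is the best place for me to run this job, right now?":
    // drop sites whose services are down, then rank the rest.
    public static Optional<Resource> choose(List<Resource> candidates,
                                            RankingPolicy policy) {
        return candidates.stream()
                .filter(Resource::servicesUp)
                .max(Comparator.comparingDouble(policy::score));
    }

    public static void main(String[] args) {
        List<Resource> sites = List.of(
                new Resource("scoop.ncgrid.org", 16, true),
                new Resource("dante1.renci.org", 64, true),
                new Resource("big.example.org", 128, false)); // services down
        // Simple example policy: prefer the site with the most free CPUs.
        choose(sites, Resource::freeCpus)
                .ifPresent(r -> System.out.println(r.name())); // prints dante1.renci.org
    }
}
```

Swapping in a different RankingPolicy (e.g., one weighting CPU speed or prior reliability) changes the choice without touching the chooser itself, which is the point of the plug-in design.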
26 Real-Time Resource Selection
Given a set of remote resources, what is the best one for me to run on, right now?
For each candidate resource (each running MDS, GridFTP, and a gatekeeper), the chooser asks:
– Are the services up?
– How many CPUs can I get (from the queue)? If MDS is available, query it; else run a probe job to find the number of CPUs and a rough time estimate
The resource chooser combines monitoring, meta-scheduling, and policy.
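The MDS-if-available-else-probe fallback above can be sketched as follows (illustrative only: queryMds and runProbeJob are hypothetical stand-ins for the real GT 3.2.1 MDS query and probe-job submission, and the CPU counts are placeholder values):

```java
import java.util.OptionalInt;

public class FreeCpuLookup {
    // Stand-in for an MDS query; returns empty when MDS is unreachable.
    // (Placeholder value: a real query would parse the MDS response.)
    static OptionalInt queryMds(String host, boolean mdsUp) {
        return mdsUp ? OptionalInt.of(32) : OptionalInt.empty();
    }

    // Stand-in for submitting a small probe job to count CPUs and get a
    // rough time estimate: slower, but works whenever the gatekeeper is up.
    static int runProbeJob(String host) {
        return 8; // placeholder value
    }

    // Prefer the cheap MDS query; fall back to a probe job only if needed.
    static int freeCpus(String host, boolean mdsUp) {
        return queryMds(host, mdsUp).orElseGet(() -> runProbeJob(host));
    }

    public static void main(String[] args) {
        System.out.println(freeCpus("scoop.ncgrid.org", true));  // via MDS: 32
        System.out.println(freeCpus("dante1.renci.org", false)); // via probe: 8
    }
}
```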
27 Additional Thoughts on the Resource Chooser
Considered other meta-scheduler technologies
– Too complex, proprietary components, non-Java, etc.
– The basics that come with GT 3.2.1, plus our simple API, work for us
Possible future directions for the API
– Speed up resource choosing by enabling threads
– Add plug-ins that take into account job priority and additional resource characteristics (e.g., CPU speed, prior reliability, …)
– Apply it to models other than ADCIRC
– Revisit whether to use other meta-scheduler technologies
28 SCOOP Partner Grid Resources
We've established grid resources at MCNC and RENCI
– GT 3.2.1 based
– MCNC: scoop.ncgrid.org – cluster, 16 TB storage, LSF; GridFTP, MDS, GRAM
– UNC/RENCI: dante1.renci.org – cluster, mass storage system, PBS; GridFTP, MDS, GRAM
– Users reach these through the RENCI portal: www.scoop.unc.edu
Connected to non-NC SCOOP partner resources also (UAH, UFL, LSU, VIMS, …)
– Not always easy!
30 Grid Testbed Experiences (TAMU, LSU, UF, UNC, MCNC, UAH)
Components at every site
– Globus gatekeeper, GridFTP, PBS/LSF, MDS
Globus setup issues at compute sites
– firewall problems
– CA trust problems
– "old"-style cert problems
– MDS not set up
– job manager (PBS or LSF) not set up
– tools sometimes not installed (e.g., uudecode)
This is NOT easy; it takes a LOT of effort!
31 SURAGrid Partner Grid Resources
Initial progress has been made at:
– Louisiana State University (Tevfik Kosar, Hartmut Kaiser, Gabrielle Allen)
– Texas A&M University (Steve Johnson)
– University of Alabama Huntsville (Sandi Redman)
– University of Kentucky (Vikram Gazula)
– University of Southern California (Nirmal Seenu)
– TACC (Ashok Adiga)
– Have I missed any others?
Status varies by site:
– the "machine ordering" stage
– the "GT3 installation" stage
– the "configure scheduler" stage
– the "GT3 connectivity / firewall problem resolution" stage
32 Outline
– What is the UNC ADCIRC Application?
– Why ADCIRC is Important to Grid-Enable
– Current Status of Grid-Enabling
– Next Steps Towards SURAGrid Deployment
– Steps for Day 2
– Future Plans
– Conclusions
33 Next Steps Toward SURAGrid Deployment (1 of 2)
Ensure your site has met the basic requirements (tomorrow, day 2 of the workshop, and beyond if necessary)
Howard, Steve, and Lavanya will then work to test ADCIRC on your system:
– First, "standalone" from the command line
– Next, through the Globus Gatekeeper and using the resource-choosing API
– This assumes you've first completed the steps outlined at http://www.ccs.uky.edu/SCOOP (see upcoming slides)
34 Next Steps Toward SURAGrid Deployment (2 of 2)
If/when the testing is successful, your resource(s) will be added to those chosen from for the regular ADCIRC runs
We hope to complete further testing over the next couple of weeks to two months
If all goes well, there will be many more CPUs available in time for hurricane season (June 1)!
35 Outline
– What is the UNC ADCIRC Application?
– Why ADCIRC is Important to Grid-Enable
– Current Status of Grid-Enabling
– Next Steps Towards SURAGrid Deployment
– Steps for Day 2
– Future Plans
– Conclusions
36 Ensure Your Site Has Met the Basic Requirements for ADCIRC (1 of 2)
Pre-Web Services versions of the Globus Toolkit services
– We presume you're using GT 3.2.1, although in theory GT versions 2.4 through 4.x should work
– See www.globus.org
You need these three Globus services:
– GridFTP server for file transfer
– GRAM server for job submission
– MDS for information sharing
37 Ensure Your Site Has Met the Basic Requirements for ADCIRC (2 of 2)
You'll also need:
– RENCI's and MCNC's installations to trust your CA
– your installation to trust RENCI's and MCNC's CAs
– a back-end queuing system such as LSF or PBS
– an mpich-based mpirun behind the queuing system
– an adapter (this allows Globus to submit jobs to the back-end queuing system and to publish information about the queuing system through MDS)
– a Linux x86-based system
– preferably a cluster (a single node would almost never be chosen for an ADCIRC run)
38 Follow the Steps at http://www.ccs.uky.edu/SCOOP
This site has some (basic) setup instructions (thanks to the University of Kentucky's Vikram Gazula for this web space!). It covers:
– letting your GT installation trust the RENCI and MCNC certificate authorities
– account setup for Lavanya, Howard, and Steve, plus their /etc/grid-security/grid-mapfile entries
– setting up Globus with your back-end scheduler
– how to test MDS using grid-info-search (you should get back an "Mds-Computer-Total-Free-nodeCount" entry)
– testHosts.sh, a script that can do limited checks of your GRAM, MDS, and GridFTP installations
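Since MDS2 is an ordinary LDAP service (by default on port 2135 with base DN "mds-vo-name=local, o=grid"), the grid-info-search check above can also be reproduced from Java via JNDI. The sketch below only builds the connection parameters and search filter; the live search is left commented out because it needs a reachable MDS server:

```java
import java.util.Hashtable;
import javax.naming.Context;

public class MdsCheck {
    static final String BASE_DN = "mds-vo-name=local, o=grid"; // default MDS2 base DN
    static final String FILTER = "(Mds-Computer-Total-Free-nodeCount=*)";

    // Build the JNDI environment for an anonymous LDAP bind to a GRIS.
    static Hashtable<String, String> env(String host) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://" + host + ":2135"); // default MDS2 port
        return env;
    }

    public static void main(String[] args) {
        System.out.println(env("scoop.ncgrid.org").get(Context.PROVIDER_URL));
        // With a live MDS you would then run something like:
        // new javax.naming.directory.InitialDirContext(env(host))
        //         .search(BASE_DN, FILTER, null);
    }
}
```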
39 Setup-Related Questions
Before contacting us, check http://www.ccs.uky.edu/SCOOP to see if your answer is buried somewhere within.
Next, we recommend email to "scoop-support at renci dot org"
– Howard Lander, Steve Thorpe, and Lavanya Ramakrishnan are on this list; we'll try to check email frequently tomorrow.
If that doesn't produce a satisfactory response, you can try IM'ing us (we could try to set up a group chat). Our AIM addresses are:
– Steve: thorpe682
– Howard: howardlander
– Lavanya: lavanyaRamakrish
40 Last Resort
If you're still striking out, you can try calling Howard and/or Steve:
– Howard's office: (919) 445-9651
– Steve's home office: (610) 866-3286 (Note: I usually pick up only after I hear a friendly voice on the machine!)
– Steve's cell: (919) 724-9654
41 Outline
– What is the UNC ADCIRC Application?
– Why ADCIRC is Important to Grid-Enable
– Current Status of Grid-Enabling
– Next Steps Towards SURAGrid Deployment
– Steps for Day 2
– Future Plans
– Conclusions
42 Future NC SCOOP Plans (1 of 2)
Resource Chooser API extensions
– threading to speed up choosing
– additional plug-ins (CPU speed, reliability, data location, priorities based on urgency, …)
– apply it to models other than ADCIRC
Improve fault tolerance of the ADCIRC workflow
– e.g., if a job is high priority, submit the same job multiple times
– or if a job fails, restart it ASAP
43 Future NC SCOOP Plans (2 of 2) Better integration with other SCOOP efforts –UFL’s Virtual Cluster setup –Data management –Catalog and archive access –“Coupling” different models (e.g. ADCIRC and SWAN) Model verification activities
44 Outline
– What is the UNC ADCIRC Application?
– Why ADCIRC is Important to Grid-Enable
– Current Status of Grid-Enabling
– Next Steps Towards SURAGrid Deployment
– Steps for Day 2
– Future Plans
– Conclusions
45 Concluding Thoughts
It can be time consuming to establish grid connectivity among organizations…
– configuration of firewalls, NAT, the Globus Toolkit, certificate authorities, and job scheduler adapters
– systems administrator nervousness
– etc.
…but it's worth it!
– SCOOP partners are creating a distributed, scalable, modular resource that empowers scientists at multiple institutions
– This will advance the science of predicting surge and wave impacts during storms, and other environmental hazards
46 Special Thanks … to Brian Blanton and Lavanya Ramakrishnan for their help with these slides! … to Philip Bogden, Joanne Bintz, Mary Fran Yafchak and others from SURA for their SCOOP project leadership … to Art Vandenberg for his tireless Gridification efforts … to you, the SURAGridsters who are generously sharing your resources!!
47 Questions? Also, don’t forget about scoop-support at renci dot org