On-Demand Lake Circulation Modeling
Chin Wu, Nobuaki Kimura: Environmental Fluid Mechanics Laboratory, University of Wisconsin
Paul Hanson, Tim Kratz, Tim Meinke, Luke Winslow: Center for Limnology, Trout Lake Station, University of Wisconsin
Kenneth Chiu, Yinfei Pan: Grid Computing Research Laboratory, SUNY-Binghamton
On-Demand Circulation Modeling
Hindcast:
–Enter a date/time range.
–Met data is extracted from the CFL database.
–The model is launched.
–Results are displayed.
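The first two hindcast steps (enter a date/time range, extract met data for it) can be sketched roughly as follows. This is only an illustration: the record layout, the sample values, and the names `parse_range` and `extract_met` are assumptions, and an in-memory list stands in for the CFL database.

```python
from datetime import datetime

# Hypothetical met records (made-up values, not real CFL data):
# (timestamp, wind speed in m/s, air temperature in deg C).
MET_RECORDS = [
    (datetime(2006, 7, 1, 9, 0), 3.2, 21.5),
    (datetime(2006, 7, 1, 10, 0), 4.1, 22.0),
    (datetime(2006, 7, 1, 11, 0), 3.8, 23.1),
    (datetime(2006, 7, 1, 12, 0), 2.9, 23.4),
]


def parse_range(start_text, end_text, fmt="%Y-%m-%d %H:%M"):
    """Parse the user-entered range and reject empty or reversed ranges."""
    start = datetime.strptime(start_text, fmt)
    end = datetime.strptime(end_text, fmt)
    if end <= start:
        raise ValueError("end of range must be after start")
    return start, end


def extract_met(records, start, end):
    """Return the records with start <= timestamp <= end, in time order."""
    return sorted(r for r in records if start <= r[0] <= end)


start, end = parse_range("2006-07-01 09:30", "2006-07-01 12:00")
selected = extract_met(MET_RECORDS, start, end)
```

In a real deployment the extraction step would be a query against the site database; the range check is placed first so that a bad request never reaches the model.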
Goals
Develop a reusable solution for high-quality lake circulation modeling with validation.
Provide historical results on demand for analysis.
Support data assimilation, coupled models, etc.
Support pluggable models.
Support outsourcing most ICT deployment to a remote site.
–Lake sites do not have the expertise, so they just provide the data.
Data Sources
Model inputs
–Meteorological data
–Water temperature, etc.
Model outputs (validation or assimilation)
–Acoustic Doppler Current Profiler (ADCP): provides 3D flow vectors for a vertical column of water, one vector for each depth.
–Measured waves: subsurface (via high-frequency temperature fluctuations) and surface.
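An ADCP profile, with one 3D flow vector per depth, could be represented along the following lines. This is a minimal sketch: the class name `FlowVector`, the field names, the axis convention, and the sample numbers are all assumptions made for illustration, not the project's actual data format.

```python
import math
from dataclasses import dataclass


@dataclass
class FlowVector:
    depth_m: float  # depth of the measurement cell, in meters
    u: float        # eastward velocity component, m/s (assumed convention)
    v: float        # northward velocity component, m/s (assumed convention)
    w: float        # upward velocity component, m/s (assumed convention)

    def speed(self):
        """Magnitude of the 3D velocity vector."""
        return math.sqrt(self.u ** 2 + self.v ** 2 + self.w ** 2)


# Hypothetical profile: one vector for each depth in the water column.
profile = [
    FlowVector(2.0, 0.03, 0.04, 0.0),
    FlowVector(4.0, 0.01, 0.0, 0.0),
    FlowVector(6.0, 0.0, 0.0, 0.0),
]


def fastest_depth(vectors):
    """Return the depth at which the flow speed is largest."""
    return max(vectors, key=lambda fv: fv.speed()).depth_m
```

Keeping the depth inside each vector means a profile is just a list, which makes it easy to compare observed and predicted velocities depth by depth.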
Challenges
Cyberinfrastructure is a social/institutional problem as much as a technical problem.
–E.g., technically, interoperability is a solved problem.
Not a human-computer interaction problem, but a human-institution-computer interaction challenge. (“Institution” in the broad sense.)
–Two-way interaction: institutions can themselves be changed.
Iterative
–Develop a rapid prototype, get feedback, repeat.
–Collaborative
–Avoid over-engineering; be pragmatic.
Instrumented buoys and locations
[Figure: map of Trout Lake with the buoy locations labeled RUSS and deep ADCP. Instruments shown: ADCP, thermistor chain, dissolved oxygen sensor.]
Dataflow
CAST modes: HIND (hindcasting), NOW (nowcasting), FORE (forecasting, extension)
Pre-processing: FLOW (data retrieval and conversion)
CIRC (circulation module)
Post-processing: ANT (animation of temperature and velocity), COMP (comparison between observation and prediction)
ADCP (observed velocity)
Workflows in WB-CAST
[Diagram: data path between sites. Trout Lake: ADCP unit and datalogger, connected through Serial2Ethernet (IP addr. + port #) and Ethernet radio. Trout Lake Station: Ethernet radio and ADCP computer. Madison CFL: Oracle, ADCP binary, ADCP meta, logger data. Madison EFM: ADCP binary, logger data, model results.]
Pre- & Post-processing of the circulation model
[Diagram: processing chain on a Linux computer, covering the FLOW, CIRC, ANT, and COMP modules. Labels: database (DB-Badger, CFL), extract logger-net data; Matlab-1, conversion to input files; model, compute velocity, water temperature, etc.; output, binary; Matlab-2, creating plot files; extract_adcp, ADCP binary.]
Driving the 3D circulation model
Input data
–Initial conditions (bathymetry, water temperature, velocity, surface elevation): temperature profile, velocity profile
–Forcing as time series (MED): wind, heat flux, etc.
Characteristics
–Non-hydrostatic pressure
–Bottom partial cell
–Finite volume method
Output
Computation using the model
WI-CAST NOW, time scale (hourly simulation): 9:00 to 12:00 is covered by three runs (Run1, Run2, Run3), producing Output1, Output2, and Output3, followed by integration and animation.
WI-CAST HIND, time scale (3-hour simulation): 9:00 to 12:00 is covered by only one run, producing Output1, Output2, and Output3, followed by integration and animation.
An output is produced at each hour.
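The difference between the two modes can be sketched as a run plan: nowcasting launches one run per hour, while hindcasting launches a single run over the whole range and still produces one output per hour. The function names are hypothetical, and the assumption that each output is written at the end of its hour is mine, not stated on the slide.

```python
def plan_nowcast(start_hour, end_hour):
    """One run per simulated hour; each run yields one output."""
    return [(h, h + 1) for h in range(start_hour, end_hour)]


def plan_hindcast(start_hour, end_hour):
    """A single run over the whole range, with an output at each hour.

    Assumes outputs are written at the end of every simulated hour.
    """
    run = (start_hour, end_hour)
    output_hours = list(range(start_hour + 1, end_hour + 1))
    return run, output_hours


# The 9:00 to 12:00 example from the slide.
nowcast_runs = plan_nowcast(9, 12)
hindcast_run, hindcast_outputs = plan_hindcast(9, 12)
```

Both plans yield three outputs for the three-hour window; only the number of model launches differs.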
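[Diagram: system architecture. A web browser (Firefox, IE, Netscape …) connects over the Internet, through a firewall, to the web server (Tomcat), which uses a request directory and a local database server (MySQL). The application server checks for requests, finds requests (do modeling …), performs data acquisition from the remote database server (Oracle), runs modeling and animation, and when finished updates the database.]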
The client web browser initiates a Modeling & Animation (M&A) request.
The server side generates a request file corresponding to this request (this project uses Java Server Pages), puts it into a request directory (which works like a request queue), and then adds a request record to the local database server. Its default status is “pending”.
The client side holds the browser, waiting for M&A to finish.
On the server side, a daemon process called “checkrequest” checks whether the request directory has requests.
If it does, the request is processed by the Application Server (a logical server here). This includes “data acquisition” from a remote database at EFM, which stores all the scientific data collected, then running the modeling process, and finally rendering the output animation files.
When M&A finishes on the Application Server, it updates the request record in the local database server to a “finish” status.
Finally, the client browser finds that its request has been fulfilled and moves on to the animation display stage.
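The request-directory queue and the status record can be sketched as follows. This is a rough illustration under several assumptions: an in-memory SQLite database stands in for the local MySQL server, the `.req` file format and the function names are invented, and `run_modeling_and_animation` is a placeholder for data acquisition, modeling, and rendering.

```python
import sqlite3
import tempfile
from pathlib import Path


def submit_request(db, request_dir, request_id, start, end):
    """Web-server side: write a request file and record it as pending."""
    (request_dir / f"{request_id}.req").write_text(f"{start}\n{end}\n")
    db.execute(
        "INSERT INTO requests (id, status) VALUES (?, 'pending')",
        (request_id,),
    )
    db.commit()


def run_modeling_and_animation(start, end):
    """Placeholder for data acquisition, modeling, and rendering."""
    return f"animation for {start} to {end}"


def check_requests(db, request_dir):
    """One pass of the daemon: process every request file found."""
    done = []
    for path in sorted(request_dir.glob("*.req")):
        start, end = path.read_text().splitlines()[:2]
        run_modeling_and_animation(start, end)
        db.execute(
            "UPDATE requests SET status = 'finish' WHERE id = ?",
            (path.stem,),
        )
        path.unlink()  # remove the request from the queue
        done.append(path.stem)
    db.commit()
    return done


def get_status(db, request_id):
    """What the waiting client would poll for."""
    row = db.execute(
        "SELECT status FROM requests WHERE id = ?", (request_id,)
    ).fetchone()
    return row[0]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE requests (id TEXT PRIMARY KEY, status TEXT)")
request_dir = Path(tempfile.mkdtemp())

submit_request(db, request_dir, "req001", "2006-07-01 09:00", "2006-07-01 12:00")
status_before = get_status(db, "req001")
processed = check_requests(db, request_dir)
status_after = get_status(db, "req001")
```

A real daemon would call `check_requests` in a loop with a sleep between passes; a single pass is shown here so the example terminates.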
Future Work
Gather feedback!
Package as a toolkit? Service? Opal?
Parallelization, increased resolution
Job scheduling, Pragma integration?
Data assimilation using MPC?
Coupled models
–Biological, chemical
–Fluid-surface interactions
Real-time wave reconstruction from captured video
Credits
Chin Wu, Nobuaki Kimura (EFM-UWI)
–Modeling, output components
Paul Hanson, Luke Winslow (CFL-UWI)
–Data extraction, processing
Tim Kratz, Tim Meinke (TLS-UWI)
–Equipment, deployment, sensor network
Yinfei Pan, Kenneth Chiu (SUNY-Binghamton)
–ADCP acquisition/management, job launching, monitoring, integration, web development
Acknowledgements
We’d like to acknowledge the generous support of the Moore Foundation, the NSF LTER program, and NSF awards DBI , IIS , CNS , OCI
Application Server Overload
Now: one day’s prediction needs about one hour of computing on a 2.2 GHz two-CPU machine.
Thus, our further steps will be:
–Make the computation available on a whole cluster.
–Then, make it work as web-based computing (deploy web services).
The key problem here
–The model computation continuously generates time-sequence-related meta and data files.
–It is hard to make all the computing procedures and functions parallel immediately.
–Idea: store the modeling output whenever possible.
On one cluster, the “checkrequest” daemon works as a job scheduler.
On the web, employ a distributed data hash and implement four types of services: job schedule service, data locating service, data acquisition service, and data storage service.
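The idea of storing modeling output whenever possible can be illustrated with a small store that runs the model only for hours it has not seen before. This is a simplified sketch: the class `OutputStore` and the stand-in model function are hypothetical, and it treats each hour as independent, which ignores the fact that real runs depend on the preceding model state.

```python
class OutputStore:
    """Keeps every hourly output so overlapping requests reuse earlier work."""

    def __init__(self, run_model):
        self._run_model = run_model  # callable: hour key -> output
        self._outputs = {}
        self.model_runs = 0          # how many times the model was invoked

    def get(self, hour_key):
        """Return the stored output, computing it only on the first request."""
        if hour_key not in self._outputs:
            self._outputs[hour_key] = self._run_model(hour_key)
            self.model_runs += 1
        return self._outputs[hour_key]

    def get_range(self, hour_keys):
        return [self.get(k) for k in hour_keys]


def fake_model(hour_key):
    """Stand-in for the expensive circulation model."""
    return f"output for {hour_key}"


store = OutputStore(fake_model)
first = store.get_range(["09:00", "10:00", "11:00"])
runs_after_first = store.model_runs
second = store.get_range(["10:00", "11:00", "12:00"])
runs_after_second = store.model_runs
```

The second request overlaps the first on two hours, so only one additional model invocation is needed.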
[Diagram: Web Server, Job Schedule Server, Cluster, Workstation, Parallel Analysis Server (with Time Vector Sequence Database), and Central Database (Meta, Data and Animation).]
Steps shown for incoming requests:
–Get the time range available within the time range requested.
–Select a time vector sequence based on the minimum total expected time.
–Form the job division and schedule jobs onto computers in the cluster.
–Store back each time vector’s delay time corresponding to a time range and update the expected time.
–Get data from the central database.
PS: A time vector here is a set of (procedure id, expected time) pairs. The procedures in a time vector are supposed to run at the same time.
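Selecting a time vector sequence by minimum total expected time might look like the sketch below. Several points are assumptions rather than details from the slides: because the procedures in one time vector run at the same time, the vector's expected time is taken as the maximum over its pairs; a sequence's total is the sum over its vectors; and the update rule is a simple weighted average with a made-up weight `alpha`. The procedure ids and times are invented for illustration.

```python
def vector_time(time_vector):
    """Expected time of one time vector: its slowest concurrent procedure."""
    return max(expected for _proc_id, expected in time_vector)


def sequence_time(sequence):
    """Total expected time of a sequence of time vectors run back to back."""
    return sum(vector_time(v) for v in sequence)


def select_sequence(candidates):
    """Pick the candidate sequence with the minimum total expected time."""
    return min(candidates, key=sequence_time)


def update_expected(old_expected, measured, alpha=0.5):
    """Blend a newly measured time into the stored expectation (assumed rule)."""
    return (1 - alpha) * old_expected + alpha * measured


# Two hypothetical ways to divide the same job (times in minutes).
seq_a = [[("extract", 5.0), ("convert", 8.0)], [("model", 60.0)]]
seq_b = [[("extract", 5.0)], [("convert", 8.0)], [("model", 60.0)]]

best = select_sequence([seq_a, seq_b])
```

Here `seq_a` wins because running extraction and conversion concurrently costs 8 minutes instead of 13 before the model step.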