Building the Computational Infrastructure for DART


1 Building the Computational Infrastructure for DART
David Abramson, Jagan Kommineni, Tim Ho, Ilkay Altinas

2 Outline
- The GriddLeS IO Library
- Kepler & Grid Workflows
- Kepler + GriddLeS = Flexible Workflows
- Transparent Data Replication (SI5)
- Active Data (SI6)

3 GriddLeS: Reusing Legacy Code
Use legacy applications within the workflow rather than writing new programs.
Existing programs:
- Are often written in a range of legacy languages such as Fortran and C.
- Often use conventional file IO operations like READ and WRITE.
- May be old and are not well suited to modification.

Example legacy Fortran (excerpt as shown on the slide):

      end if
      deltt = deltt2 * 0.5
      do 100 m=1,mx
      do 200 j=1,jx
      if(j.eq.1.and.m.eq.1) go to 200
      l = j+m-2
      kl = float(l*(l+1))
      dkl = kl-2.
c Apply the horizontal diffusion
      pt(j,m) = pt(j,m) - dkl*hdiff*pm(j,m)
      ct(j,m) = ct(j,m) - dkl*hdiff*cm(j,m)
      zt(j,m) = zt(j,m) - dkl*hdiff*zm(j,m)
      ppv=pm(j,m)+deltt2*pt(j,m)
      if ( imp.eq.1 ) then
c Do a semi-implicit time step
      ccv = ( cm(j,m) + deltt2* ( ct(j,m) + kl*( zm(j,m) +
     & deltt*(zt(j,m)-zmean*cm(j,m)*.5))))/
     & ( 1. + deltt*deltt*kl*zmean )
      zzv = zm(j,m) + deltt2*( zt(j,m) - zmean*(cm(j,m)+ccv)*.5 )
      else
c Do an explicit time step
      ccv=cm(j,m)+deltt2*(ct(j,m)+kl*z(j,m))
      zzv = zm(j,m) + deltt2*( zt(j,m) - zmean*c(j,m) )
      if (ifirst.eq.0) then
c Here we do the Asselin time filtering. Note we filter AND update
c ( [alpha]m=[alpha] ), so the '...m' appears on lhs rather than the
c current values. nb that ppv is the future p value at this stage.
      pm(j,m)=p(j,m) + vnu*(pm(j,m)-2.*p(j,m)+ppv)
      cm(j,m)=c(j,m) + vnu*(cm(j,m)-2.*c(j,m)+ccv)
      zm(j,m)=z(j,m) + vnu*(zm(j,m)-2.*z(j,m)+zzv)
      p(j,m)=ppv
      c(j,m)=ccv
      z(j,m)=zzv
c Do a forward time step w/o updating the previous step values or time
c filtering
      c(j,m) = ccv
      z(j,m) = zzv
  200 continue
  100 continue
c
c turn off forward timestep flag (may already be off)
      ifirst=0
      return
      end

[Diagram labels: legacy code + Workstations; The Grid]

4 GriddLeS
- Legacy applications need to be shielded from IO details in the Grid:
  - Local files
  - Remote files
  - Replicated files
  - Producer-consumer pipes
- Don’t want to lock in the IO model when the application is written (or even Grid-enabled).
- Choice of IO model should be:
  - Dynamic
  - Late bound

5 Flexible IO in GriddLeS
[Diagram labels: Legacy Application; open(), read(), write(), seek(), close(); FileMultiplexer; Late bound decision; Local File; Remote File; Cache; Remote Application Process; GRS; Replica (×4)]

6 Interprocess Communication in GriddLeS
Writer Application:
    fd = open("blah", "w");
    :
    write(fd, ...)

Reader Application:
    fd = open("blah", "r");
    :
    read(fd, ...)

[Diagram labels: blah; socket; cache]
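The writer/reader pairing above can be sketched in a few lines. This is only an illustration, not the GriddLeS implementation: `GridBuffer`, `writer`, `reader` and `run_pipeline` are hypothetical names, and an in-process queue stands in for the socket and cache shown on the slide. The point is that the reader blocks until the writer has produced data, so the two programs can overlap instead of exchanging a finished file.

```python
import queue
import threading


class GridBuffer:
    """Hypothetical in-process stand-in for the socket plus cache between two programs."""

    def __init__(self):
        self._chunks = queue.Queue()

    def write(self, data: bytes) -> None:
        self._chunks.put(data)

    def close(self) -> None:
        self._chunks.put(None)  # end-of-stream marker

    def read(self) -> bytes:
        """Block until a chunk is available; return b"" at end of stream."""
        chunk = self._chunks.get()
        if chunk is None:
            self._chunks.put(None)  # keep end-of-stream visible to later reads
            return b""
        return chunk


def writer(buf: GridBuffer) -> None:
    # Plays the role of the Writer Application: write(fd, ...)
    for i in range(3):
        buf.write(f"record {i}\n".encode())
    buf.close()


def reader(buf: GridBuffer, out: list) -> None:
    # Plays the role of the Reader Application: read(fd, ...)
    while True:
        chunk = buf.read()
        if not chunk:
            break
        out.append(chunk)


def run_pipeline() -> list:
    buf = GridBuffer()
    received = []
    t_reader = threading.Thread(target=reader, args=(buf, received))
    t_writer = threading.Thread(target=writer, args=(buf,))
    t_reader.start()
    t_writer.start()
    t_writer.join()
    t_reader.join()
    return received
```

Calling `run_pipeline()` returns the three records in the order they were written, even though the reader thread is started first.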

7 GriddLeS Implementation
- FileMultiplexer: a small piece of software placed between the application and the operating system.
- Trapping mechanism: open, read, write, seek, close, stat.
- When the application attempts certain system calls, the FileMultiplexer grabs control and manipulates the results by using client modules (such as the web service client, SRB client, Globus replica client and GFarm client).

[Diagram labels: Application; FileMultiplexer; Grid Buffer Client; GNS Client; SRB Client / Globus Replica Client; GFarm Client; Operating System]
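To make the trapping idea concrete, here is a minimal sketch of a multiplexer that decides at `open()` time which client module serves a file. It is an assumption-laden illustration: the class names, the `routes` table and the in-memory "remote" store are all hypothetical, and the real FileMultiplexer intercepts system calls rather than being called explicitly.

```python
import io
import os
import tempfile


class LocalFileClient:
    """Serves a file from the local file system."""

    def open(self, target, mode):
        return open(target, mode)


class InMemoryRemoteClient:
    """Hypothetical stand-in for a remote client module: serves bytes from a dict."""

    def __init__(self, store):
        self.store = store

    def open(self, target, mode):
        return io.BytesIO(self.store[target])


class FileMultiplexer:
    """Routes each open() to a client module chosen at call time (late binding)."""

    def __init__(self, routes, clients):
        self.routes = routes    # file name -> (client name, target); hypothetical table
        self.clients = clients  # client name -> client module

    def open(self, name, mode="rb"):
        # Files without a route fall back to ordinary local IO (assumption of this sketch).
        client_name, target = self.routes.get(name, ("local", name))
        return self.clients[client_name].open(target, mode)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, "topography.dat")
        with open(local_path, "wb") as f:
            f.write(b"local bytes")
        mux = FileMultiplexer(
            routes={
                "input.dat": ("local", local_path),
                "forecast.dat": ("remote", "remote-store/forecast.dat"),
            },
            clients={
                "local": LocalFileClient(),
                "remote": InMemoryRemoteClient(
                    {"remote-store/forecast.dat": b"remote bytes"}
                ),
            },
        )
        results = []
        # The "application" code is identical for both files.
        for name in ("input.dat", "forecast.dat"):
            with mux.open(name) as f:
                results.append(f.read())
        return results
```

The application-side loop never learns where the bytes came from; changing the `routes` table is enough to rebind a file to a different source.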

8 GriddLeS Architecture
[Diagram labels, shown once per application (two applications in total): Application (Write, Read, etc); File Multiplexer; GNS; GRS; Grid Buffer; Client; Server; Grid FTP; Local File System; Remote File. Shared between them: GriddLeS Name Server (GNS)]

9 Configuring an application
- The GriddLeS Name Service stores configuration information on a particular application.
- Set of entries, keyed on file name and machine name.
- Different behaviour:
  - Open local file
  - Open remote file
  - Open replicated file (performance-based sourcing)
  - Open pipe between applications
  - Locate cache file(s)
- No fixed number (or location) of GNSs.
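A name-service table keyed on file name and machine name might look like the sketch below. The `Mode`, `Entry` and `NameService` types, the host names and the targets are all hypothetical; the slides do not describe the GNS record format, so this only illustrates the keyed lookup.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    LOCAL = "local file"
    REMOTE = "remote file"
    REPLICATED = "replicated file"
    PIPE = "pipe between applications"


@dataclass(frozen=True)
class Entry:
    mode: Mode
    target: str  # path, host:path, replica set name or peer application (illustrative)


class NameService:
    """Hypothetical table of entries keyed on (file name, machine name)."""

    def __init__(self):
        self._entries = {}

    def register(self, file_name, machine, entry):
        self._entries[(file_name, machine)] = entry

    def lookup(self, file_name, machine):
        # Unregistered files are treated as plain local files (assumption of this sketch).
        return self._entries.get((file_name, machine), Entry(Mode.LOCAL, file_name))


def demo():
    gns = NameService()
    gns.register("winds.dat", "hostA", Entry(Mode.PIPE, "regional-model@hostB"))
    gns.register("winds.dat", "hostC", Entry(Mode.REMOTE, "hostA:/data/winds.dat"))
    return (
        gns.lookup("winds.dat", "hostA").mode,
        gns.lookup("winds.dat", "hostC").mode,
        gns.lookup("scratch.tmp", "hostA").mode,
    )
```

The same file name resolves to a pipe on one machine and a remote file on another, and re-registering an entry changes the behaviour without touching the application.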

10 Grid Workflows
- A workflow captures the linkage of constituent tasks together in a hierarchical fashion to build larger, complex tasks.
- Workflow is concerned with the automation of procedures whereby files and data are passed between participants according to a defined set of rules to achieve an overall goal.
- It is possible to build Grid workflows in which a number of otherwise independent legacy applications are run in a “pipeline”.
- These workflows are called virtual applications and can run on virtual organizations.
- The individual components process data from an arbitrary source, ranging from:
  - databases, files, replicas, data from other processes
  - real-time data from scientific instruments
- Kepler workflow system

11 Genomics: Promoter Identification Workflow
Source: Matt Coleman (LLNL)

12 Ecology: GARP Analysis Pipeline for Invasive Species Prediction
[Figure: GARP analysis pipeline.
Data items:
(a) Species presence & absence points (native range); Species presence & absence points (invasion area)
(b) Environmental layers (native range); Environmental layers (invasion area)
(c) Integrated layers (native range); Integrated layers (invasion area)
(d) Training sample; Test sample
(e) GARP rule set
(f) Native range prediction map; Invasion area prediction map
(g) Model quality parameter
(h) Selected prediction maps
Steps and other labels: EcoGrid Query; Layer Integration; Sample Data; +A1, +A2, +A3; Calculation; Map Generation; Validation; User; Generate Metadata; Archive To Ecogrid; Registered Ecogrid Database]
Source: NSF SEEK (Deana Pennington et al., UNM)

13 Source: NIH BIRN (Jeffrey Grethe, UCSD)

14 Sample Atmospheric Science Workflow
Drag-and-drop utilities from the Actor and Director libraries
- The Graph Editor consists of a Director library and an Actor library.
- Director: governs the execution of a composite entity (model): scheduling, dispatching threads, generating code, etc.
- Actor: an encapsulation of parameterized actions.
- Building a workflow is as simple as dragging actors from the library.

15 Kepler Directors Orchestrate Workflow
- Synchronous Data Flow
  - Consumer actors are not started until the producer completes.
  - Files are copied from producer to consumer.
- Process Networks
  - All actors execute concurrently.
  - Communication is through TCP/IP sockets.
- Dedicated IO
- IO modes produce different performance results.
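The difference between the two execution styles described on this slide can be sketched as follows. This is not Kepler code: the function names are hypothetical, a Python list stands in for the copied file and a bounded queue stands in for the socket. Both styles compute the same result; what changes is whether the consumer can start before the producer has finished.

```python
import queue
import threading


def produce(n):
    """Producer actor: emits n values."""
    for i in range(n):
        yield i * i


def run_staged(n):
    """File-copy style: the consumer starts only after the producer completes."""
    staged = list(produce(n))       # 'staged' stands in for the file copied to the consumer
    return [x + 1 for x in staged]  # consumer actor


def run_concurrent(n):
    """Process-network style: both actors run at once and talk through a channel."""
    channel = queue.Queue(maxsize=1)  # stands in for the TCP/IP socket
    results, events = [], []

    def producer():
        for x in produce(n):
            channel.put(x)            # blocks while the channel is full
        channel.put(None)             # end-of-stream marker
        events.append("producer done")

    def consumer():
        while True:
            x = channel.get()
            if x is None:
                break
            results.append(x + 1)
            events.append("consumed")

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, events
```

Because the channel holds only one item, the producer cannot finish until the consumer has already taken data, so the `events` list always records at least one "consumed" entry before "producer done".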

16 Integrating Kepler & GriddLeS
[Diagram labels: Application (Write, Read, etc); File Multiplexer; GNS; SRB; Grid Buffer; Client; Server; Grid FTP; Local File System; Remote File; GriddLeS Name Server (GNS); Make Gridlet; Actor; Gridlet; Run Application; UPDATE GNS; Atmospheric Science Workflow]

17 Transparent Data Replication (SI5)

18 Transparent Replication
[Diagram labels: Real time data; Data Fusion; General Circulation model; Topography Database; Regional weather model; Vegetation Database; Emissions Inventory; Photo-chemical pollution model; Particle dispersion model; Bushfire model]

19 The GriddLeS Replication Service
[Diagram labels: Application (Write, Read, etc); File Multiplexer; GNS Client; GriddLeS Name Server; Local File Client; Local File System; Remote File Client; Grid FTP Server; Grid Buffer Client; Grid Buffer Server; SRB Client; SRB Server; NWS Client; NWS Server; IO Network Monitor; GRS]
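Performance-based sourcing of a replicated file can be illustrated with a small selection routine. This is a sketch under stated assumptions, not the GRS algorithm: the `Replica` record, the cost model (latency plus size divided by bandwidth) and all numbers are hypothetical, standing in for whatever forecasts a network monitor would supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    host: str
    bandwidth_mbps: float  # forecast bandwidth to this host (hypothetical value)
    latency_s: float       # forecast connection latency (hypothetical value)


def estimated_transfer_s(replica: Replica, size_mb: float) -> float:
    """Simple cost model: latency plus transfer time at the forecast bandwidth."""
    return replica.latency_s + (size_mb * 8.0) / replica.bandwidth_mbps


def choose_replica(replicas, size_mb: float) -> Replica:
    """Pick the replica with the lowest estimated transfer time."""
    if not replicas:
        raise ValueError("no replicas registered for this file")
    return min(replicas, key=lambda r: estimated_transfer_s(r, size_mb))


REPLICAS = [
    Replica(host="far-fast", bandwidth_mbps=100.0, latency_s=0.2),
    Replica(host="near-slow", bandwidth_mbps=10.0, latency_s=0.01),
]
```

With these illustrative numbers, `choose_replica(REPLICAS, 100.0)` selects `far-fast` for a 100 MB file, while `choose_replica(REPLICAS, 0.01)` selects `near-slow` for a 10 kB file, because latency dominates for small transfers.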

20 Architecture of the GRS
[Diagram labels: Application (Write, Read, etc); File Multiplexer; GNS Client; GriddLeS Name Server; Local File Client; Local File System; Remote File Client; Grid FTP Server; Grid Buffer Client; Grid Buffer Server; SRB Client; SRB Server; NWS Client; NWS Server; RLS Client; RLS Server; IO Network Monitor; GRS]

21 Architecture of the GRS
[Diagram labels: Application (Write, Read, etc); File Multiplexer; GNS Client; GriddLeS Name Server; Local File Client; Local File System; Remote File Client; Grid FTP Server; Grid Buffer Client; Grid Buffer Server; SRB Client; SRB Server; NWS Client; NWS Server; RLS Client; RLS Server; GFarm Client; GFarm Server; IO Network Monitor; GRS]

22 Access to Metadata
[Diagram labels: Application (Write, Read, etc); File Multiplexer; GNS Client; GriddLeS Name Server; Local File Client; Local File System; Remote File Client; Grid FTP Server; Grid Buffer Client; Grid Buffer Server; SRB Client; SRB Server; NWS Client; NWS Server; RLS Client; RLS Server; GFarm Client; GFarm Server; IO Network Monitor; GRS; MD (metadata, shown at several components)]

23 Active Data (SI6)

24 Active Data
[Diagram labels: General Circulation model; Topography Database; Regional weather model; Vegetation Database; Emissions Inventory; Photo-chemical pollution model; Particle dispersion model; Bushfire model]

25 Active Data
[Diagram labels: General Circulation model; Topography Database; Regional weather model; Vegetation Database; Emissions Inventory; Photo-chemical pollution model; Particle dispersion model; Bushfire model]

26 Active Data
[Diagram labels: General Circulation model; Topography Database; Regional weather model; Vegetation Database; Emissions Inventory; Photo-chemical pollution model; Particle dispersion model; Bushfire model]

27 Active Data – File Fault
[Diagram labels: Application (Write, Read, etc); File Multiplexer; GNS Client; GriddLeS Name Server; Local File Client; Local File System; Remote File Client; Grid FTP Server; Grid Buffer Client; Grid Buffer Server; SRB Client; SRB Server; NWS Client; NWS Server; RLS Client; RLS Server; GFarm Client; GFarm Server; IO Network Monitor; GRS; MD (metadata, shown at several components)]

28 Active Data – Resourcing
[Diagram labels: Application (Write, Read, etc); File Multiplexer; GNS Client; GriddLeS Name Server; Local File Client; Local File System; Remote File Client; Grid FTP Server; Grid Buffer Client; Grid Buffer Server; SRB Client; SRB Server; NWS Client; NWS Server; RLS Client; RLS Server; GFarm Client; GFarm Server; IO Network Monitor; GRS; MD (metadata, shown at several components)]

29 Conclusion & Further work
- Leverage existing workflow systems
- Flexible IO model allows dynamic decisions
- Developing pool of applications
- Requires some software modification!

30 Acknowledgements
- CSIRO Division of Atmospheric Sciences: John McGregor, Jack Katzfey and Martin Dix
- Funding & Support:
  - Australian Research Council
  - Australian Government (DCITA, DEST)
  - Hewlett Packard
  - US National Science Foundation
  - US Department of Energy

31 Questions?

