
VGrADS and GridSolve Asim YarKhan Jack Dongarra, Zhiao Shi, Fengguang Song Innovative Computing Laboratory University of Tennessee VGrADS Workshop – September.


1 VGrADS and GridSolve
Asim YarKhan, Jack Dongarra, Zhiao Shi, Fengguang Song
Innovative Computing Laboratory, University of Tennessee
VGrADS Workshop, September 2005

2 GridSolve Architecture
[Diagram. Components: Client, Agent, servers. Arrow labels: request, server list, data, result.]
- Client call: [x,y,z,info] = gridsolve('dgesv', A, B)
- Agent functions: resource discovery, scheduling, load balancing, fault tolerance
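The request flow in the diagram (the client asks the agent, gets back a server list, then sends data to a server and receives the result) can be sketched as a toy model. This is not GridSolve code: the `Server`, `Agent`, and `SERVICES` names, the ranking rule (speed scaled by idle fraction), and the retry-on-failure loop are all assumptions chosen only to illustrate scheduling and fault tolerance.

```python
from dataclasses import dataclass

# Hypothetical stand-in for services installed on a server.
SERVICES = {"add": lambda a, b: a + b}


@dataclass
class Server:
    name: str
    mflops: float        # assumed peak speed
    load: float          # fraction of capacity in use, 0.0 to 1.0
    alive: bool = True

    def run(self, service, *args):
        if not self.alive:
            raise ConnectionError(f"{self.name} is unreachable")
        return SERVICES[service](*args)


class Agent:
    """Toy agent: knows the servers and ranks them for a request."""

    def __init__(self, servers):
        self.servers = servers

    def server_list(self, service):
        # Assumed ranking rule: most available speed first.
        return sorted(self.servers,
                      key=lambda s: s.mflops * (1.0 - s.load),
                      reverse=True)


def gridsolve(agent, service, *args):
    """Try servers in ranked order; fall through to the next one on failure."""
    for server in agent.server_list(service):
        try:
            return server.run(service, *args), server.name
        except ConnectionError:
            continue
    raise RuntimeError("no server could complete the request")


servers = [Server("slow", 1000, 0.0),
           Server("fast", 4000, 0.5, alive=False),
           Server("mid", 3000, 0.5)]
result, used = gridsolve(Agent(servers), "add", 2, 3)
print(result, used)
```

Here the highest-ranked server is marked unreachable, so the call falls through to the next entry in the list without the caller having to handle the failure.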

3 GridSolve philosophy
- Simple-to-use access to complicated software libraries
- Access to better hardware and software
- Selection of the best machine in your grid to service a user request
- Heterogeneous computing
- Portability
  - Non-portable code can be run from a client on any architecture as long as there is a server provisioned with the code
- Legacy codes easily wrapped into services

4 GridSolve Usage with VGrADS
- Simple-to-use access to complicated software libraries, with no knowledge of grid-based computing
- Selection of the best machines in your grid to service a user request
- Portability
  - Non-portable code can be run from a client on any architecture as long as there is a server provisioned with the code
- Legacy codes easily wrapped into services
- Plug into the VGrADS framework
- Using the vgES for resource selection and launching of the application:
  - Integrated performance information
  - Integrated monitoring
  - Fault prediction
  - Integrating the software and resource information repositories

5 VGrADS/GridSolve Architecture
[Diagram. Components: Client, Agent, Service Catalog, Software Repository, Server. Arrow labels: request, vgDL, resource list, query, software location, transfer, start server, register, server info, data, result.]
- Client call: [x,y,z,info] = gridsolve('foo', A, B)

6 Agent
- Agent is specific for the client
  - Initially the agent contains no resource information; it is obtained from the vgES
- Agent requests information from the service catalog about the possible services and their complexity in order to estimate the resources required (vgDL)
- For each service request
  - Estimate resources required from problem complexity
    - vgDL spec: vgdl = Clusterof [N]; node = {node.memory > 500MB, node.speed > 2000};
    - vgid = vgCreateVG(vgserver, vgdl, 1000, ns-server-script)
  - Return the set of resources to the client
  - The ns-server-script launched by the vgES fetches and deploys needed services on selected VGrADS resources

7 Service Provider
- Compile a service
  - Use standard GridSolve
  - Write service interface description, compile, get binary
- Send binary to repository (e.g. IBP)
- Inform a service notification system or update catalog
  - This catalog can then be used by the agent to fetch and install the actual service binaries

8 Fault Tolerant GridSolve/VGrADS Demo
- Demonstrate fault tolerant PCG application running on dynamic grid resources
  - Diskless checkpointing of changing data (handled by application)
  - For surviving k failures, you need k additional checkpoint processes
  - Very low overhead for checkpoints
- End-to-end demonstration of VGrADS approach
  - Easy Matlab frontend
  - GridSolve provides service description, argument marshalling
  - VGrADS provides resource location, allocation and management
  - FT-MPI provides fault notification and a rebuild of the MPI environment
  - Fault tolerant PCG implementation provides a checkpoint/restart version of a parallel PCG
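The idea of diskless checkpointing with one extra checkpoint process (the k = 1 case) can be sketched as follows. The slide does not say how the checkpoint is encoded; this sketch assumes a floating-point sum checksum, which is one common choice, and simulates the processes as plain Python lists rather than MPI ranks. The function names are hypothetical.

```python
def take_checkpoint(local_data):
    """Checkpoint a list of per-process vectors entirely in memory.

    Each compute process keeps a copy of its own vector; the single
    checkpoint process stores the element-wise sum of all vectors.
    """
    copies = [list(v) for v in local_data]
    checksum = [sum(column) for column in zip(*local_data)]
    return copies, checksum


def recover(copies, checksum, failed):
    """Rebuild the checkpointed vector of the failed process.

    Subtracting the survivors' copies from the checksum leaves the lost
    vector. Assumes at least two compute processes and a single failure.
    """
    survivors = [v for i, v in enumerate(copies) if i != failed]
    return [c - sum(column) for c, column in zip(checksum, zip(*survivors))]


# Three simulated compute processes, each holding part of the changing data.
data = [[1.0, 2.0], [3.0, 4.5], [0.5, 0.25]]
copies, checksum = take_checkpoint(data)

# Process 1 is lost; its checkpointed state is rebuilt from the others.
restored = recover(copies, checksum, failed=1)
print(restored)
```

With real floating-point data the recovered values can differ from the originals by rounding error, and surviving more than one simultaneous failure needs additional, independent checksums, which matches the slide's point that k failures require k checkpoint processes.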

9 PCG: With Different MPI Implementations
http://icl.cs.utk.edu/ft-mpi/

Size    NProcs  LAM-7.0.4  MPICH2-1.0  FT-MPI  FT-MPI ckpt      FT-MPI exit 1 proc
                                               /2000 iters      @10000 iters
165K    15      522.5      536.3       517.8   518.9            521.7
329K    30      532.9      542.9       532.2   533.3            537.5
658K    60      545.5      553.0       546.5   547.8            554.2
1317K   120     674.3      624.4       622.9   624.4            637.1

- 64 dual-processor 2.4 GHz AMD Opteron nodes
- Nodes are connected with Gigabit Ethernet
- bcsstk17: size 10974 x 10974; non-zeros: 428650; sparsity: 39 non-zeros per row on average
- Source: linear equation from elevated pressure vessel
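To put the "very low overhead" claim in numbers, the FT-MPI columns of the table can be compared directly. The sketch below copies the three FT-MPI values per row from the slide and reports the relative increase over the plain FT-MPI run; the slide does not state the unit of the timings, so only percentages are computed.

```python
# nprocs -> (FT-MPI, FT-MPI with checkpoint every 2000 iterations,
#            FT-MPI with one process exiting at iteration 10000)
timings = {
    15: (517.8, 518.9, 521.7),
    30: (532.2, 533.3, 537.5),
    60: (546.5, 547.8, 554.2),
    120: (622.9, 624.4, 637.1),
}


def overhead_pct(base, other):
    """Relative increase of `other` over `base`, in percent."""
    return 100.0 * (other - base) / base


ckpt_overheads = {n: overhead_pct(base, ckpt)
                  for n, (base, ckpt, _) in timings.items()}
fail_overheads = {n: overhead_pct(base, fail)
                  for n, (base, _, fail) in timings.items()}

for n in timings:
    print(f"{n:4d} procs: checkpointing +{ckpt_overheads[n]:.2f}%, "
          f"one failure +{fail_overheads[n]:.2f}%")
```

For these rows, checkpointing adds well under half a percent, and recovering from a single failure adds between roughly 0.8% and 2.3%.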

10 Question: vgDL for pcg Application
- Too many ways to express the vgDL
  - Give me the best set of processors within some cluster that have at least M free memory all together for an application which will perform C floating point operations
  - Use a time constraint to help
    - Problem must be solved in T time
    - Still needs an nprocs_req
- Example (format string followed by its arguments)
    sample_cluster = ClusterOf(node)[%d] {
      node = [ ( (Memory_Avail>%d) && (Clock>%d) && (Processor==Pentium) && (OS==Linux) ) ]
      [ Rank=Memory_Avail ]
    }
    arguments: nproc_req, (input_size+output_size)*2, total_mflop/time_constraint/nproc_req
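The format string in the example can be filled in as in the sketch below. The template text and the three argument expressions come from the slide; the function name `build_vgdl`, the input validation, and the truncation to integers (mirroring the `%d` placeholders) are assumptions, and the slide does not state the units of the sizes or the time constraint.

```python
VGDL_TEMPLATE = (
    "sample_cluster = ClusterOf(node)[{nproc}] "
    "{{ node = [ ( (Memory_Avail>{mem}) && (Clock>{clock}) "
    "&& (Processor==Pentium) && (OS==Linux) ) ] "
    "[ Rank=Memory_Avail ] }}"
)


def build_vgdl(nproc_req, input_size, output_size, total_mflop, time_constraint):
    """Render the slide's vgDL template for one service request."""
    if nproc_req <= 0 or time_constraint <= 0:
        raise ValueError("nproc_req and time_constraint must be positive")
    # Per-node memory bound: twice the combined input and output size.
    mem = (input_size + output_size) * 2
    # Per-node speed bound: total work spread over the time budget and nodes.
    clock = total_mflop / time_constraint / nproc_req
    return VGDL_TEMPLATE.format(nproc=nproc_req, mem=int(mem), clock=int(clock))


spec = build_vgdl(nproc_req=4, input_size=100, output_size=50,
                  total_mflop=80000, time_constraint=10)
print(spec)
```

Note that the caller still has to supply `nproc_req`, which is the open question the slide raises: the time constraint alone does not determine how many processors to ask for.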

11 Work in Fault Tolerance
- Determine appropriate checkpoint intervals and number of checkpoint processors
  - From historical information and statistical prediction
  - From monitoring
  - Use for task migration if problems emerge
- Other fault tolerance techniques
  - Checkpoint processors participate in computation
  - Algorithm-based fault tolerance
- Generalize to library of fault tolerant routines
  - Conjugate gradient solver
  - Matrix multiply
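One simple way to turn historical failure information into a checkpoint interval is Young's first-order approximation, interval = sqrt(2 * C * MTBF), where C is the cost of one checkpoint and MTBF is the mean time between failures. The slides do not say which model the authors intend, so this is only an illustrative sketch; the function names and the sample numbers are hypothetical.

```python
import math


def estimate_mtbf(failure_times):
    """Mean gap between consecutive failure timestamps (historical data)."""
    if len(failure_times) < 2:
        raise ValueError("need at least two failure timestamps")
    gaps = [b - a for a, b in zip(failure_times, failure_times[1:])]
    return sum(gaps) / len(gaps)


def young_interval(checkpoint_cost, mtbf):
    """Young's approximation for the time between checkpoints.

    Both arguments must use the same time unit; the result is in that unit.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


# Hypothetical history: failures observed at these times (seconds).
mtbf = estimate_mtbf([0.0, 3000.0, 7200.0])
interval = young_interval(checkpoint_cost=2.0, mtbf=mtbf)
print(mtbf, interval)
```

The approximation assumes the checkpoint cost is small compared with the MTBF; a cheaper checkpoint or a less reliable machine both shorten the recommended interval.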

12 Work in GridSolve/VGrADS
- Better construction of vgDL from service complexity information
- GridSolve monitor not yet integrated
- vgES monitor to be integrated
- Need to add more sites
  - Currently working on adding UCSD CSAG cluster

13 The End Questions?

