
1 The LDCM Grid Prototype
Jeff Lubelczyk & Beth Weinstein
January 4, 2005

2 Prototype Introduction
Sponsored by NASA LDCM, NASA/GSFC Code 580
Team: 586/585/SGT/QSS/Aerospace Corp/USGS EDC
A Grid infrastructure allows scientists at resource-poor sites access to remote resource-rich sites
  Enables greater scientific research
  Maximizes existing resources
  Limits the expense of building new facilities
The objective of the LDCM Grid Prototype (LGP) is to assess the applicability and effectiveness of a data grid to serve as the infrastructure for research scientists to generate virtual Landsat-like data products

3 LGP Key POCs
Sponsors
  LDCM - Bill Ochs, Matt Schwaller
  Code 500/580 - Peter Hughes, Julie Loftis
LGP Team members
  Jeff Lubelczyk (Lead)
  Gail McConaughy (SDS Lead Technologist)
  Beth Weinstein (Software Lead)
  Ben Kobler (Hardware, Networks)
  Eunice Eng (Software Dev, Data)
  Valerie Ward (Software Dev, Apps)
  Ananth Rao ([SGT] Software Arch/Dev, Grid Expertise)
  Brooks Davis ([Aerospace Corp] Globus/Grid Admin Expert)
  Glenn Zenker ([QSS] System Admin)
USGS
  Stu Doescher (Mgmt)
  Chris Doescher (POC)
  Mike Neiers (Systems Support)
Science Input
  Jeff Masek, 923 (Blender)
  Robert Wolfe, 922 (Blender, Data)
  Ed Masuoka, 922 (MODIS, Grid)
LDCM Prototype Liaison
  Harper Prior (SAIC)
CEOS grid working group (CA)
  Ken McDonald
  Yonsook Enloe [SGT]

4 Grid - A Layer of Abstraction
[Diagram: layered stack, top to bottom]
  Application, User Client
  Grid Middleware
    Security (Authentication, Authorization)
    Resource Discovery
    Storage Management
    Scheduling and Job Management
  Storage and Compute resources: West Coast/Platform A, On Campus/Platform A, East Coast/Platform C
Grid Middleware packages the underlying infrastructure into defined APIs
A common package is the Globus Toolkit
  –Open source, low cost, flexible solution

5 What the current data grid provides
Security Infrastructure
  Globus Gatekeeper
  Authentication (PKI)
  Authorization
Resource Discovery
  Monitoring and Discovery Service (MDS) [LDAP-like]
Storage Management and Brokering
  Metadata catalogs
  Replica Location Service
    Allows use of logical file names
    –Physical locations are hidden
  Storage Resource Management
  GridFTP
    –Retrieves data using physical file names
  Data formats and subsetting
Job Scheduling and Resource Allocation
  GRAM (Globus Resource Allocation Manager) -- Provides a single common API for requesting and using remote system resources
[Diagram: Globus Toolkit 2.4.2: Globus Gatekeeper, GRAM, GridFTP]
Note: Portions of the Globus Toolkit used in Capability 1
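The split between logical and physical file names can be illustrated with a small lookup table. This is only a sketch of the idea and does not use the Globus Replica Location Service API; the catalog contents, host names, and the `resolve` helper are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical in-memory stand-in for a replica catalog (not the Globus RLS API).
# Keys are logical file names; values are the physical replica URLs.
REPLICAS = {
    "lfn:modis-scene-1": [
        "gsiftp://node-a.example.org/data/modis/scene1.hdf",
        "gsiftp://node-b.example.org/cache/scene1.hdf",
    ],
    "lfn:ledaps-scene-1": [
        "gsiftp://node-c.example.org/data/ledaps/scene1.hdf",
    ],
}


def resolve(logical_name, preferred_host=None):
    """Return one physical URL for a logical name, preferring a given host."""
    replicas = REPLICAS.get(logical_name)
    if not replicas:
        raise KeyError(f"no replica registered for {logical_name}")
    if preferred_host is not None:
        for url in replicas:
            if urlparse(url).hostname == preferred_host:
                return url
    # Fall back to the first registered replica.
    return replicas[0]
```

A caller works only with the logical name; the physical URL that a transfer tool such as GridFTP would need is looked up at the last moment.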

6 High Level Schedule
Major Milestones
  12/03 - Prototype start
  6/04 - Demo of Capability 1 grid infrastructure
    Demonstrate simple file transfers and remote application execution at multiple GSFC labs and USGS EDC
    Ready to build application on top of basic infrastructure
  12/04 - Demo of Capability 1
    Provide and demonstrate a grid infrastructure that enables a user program to access and process remote heterogeneous instrument data at multiple GSFC labs and USGS EDC
  3/05 - Demo of Capability 2 grid infrastructure
    Demonstrate file transfers and remote application execution at multiple GSFC labs, USGS EDC, and ARC/GSFC commodity resources to assess scalability
  6/05 - Demo of Capability 2
    Enable the data fusion (blender) algorithm to obtain datasets, execute, and store the results on any resource within the Virtual Organization (GSFC labs, USGS EDC, ARC/GSFC)

7 The LDCM Demonstration …
Prepares two heterogeneous data sets at different remote locations for like-footprint comparison from a science user's home site
The MODIS Reprojection Tool (MRT) serves as our typical science application developed at the science user's site (Building 32 in demo)
  mrtmosaic and resample (subset and reproject)
  Operates on MODIS and LEDAPS (Landsat) surface reflectance scenes
Data distributed at remote facilities
  Building 23 (MODIS scenes)
  USGS/EDC (LEDAPS scenes)
Solves a realistic scientific scenario using grid-enabled resources

8 Capability 1 Virtual Organization
[Network diagram]
Nodes
  edclxs66: USGS EDC, Sioux Falls, SD
  LGP23: GSFC B23/W316
  LGP32: Science User_1, GSFC B32/C101
Capability 1 Installed Equipment
  Dell/Linux Server, Dual Xeon Processors, 8 GB Memory, 438 GB Disk Storage
  Dell/Linux Server, Quad Xeon Processors, 16 GB Memory, 438 GB Disk Storage
  Dell/Linux Server, Dual Xeon Processors, 8 GB Memory, 438 GB Disk Storage
Networks
  GSFC SEN: 1 Gbps Backbone
  MAX (College Park): OC48, 2.4 Gbps Backbone
  vBNS+ (Chicago): OC48, 2.4 Gbps Backbone
  USGS/EDC: 1 Gbps Backbone
Link labels between USGS/EDC and GSFC
  1 Gbps
  OC12, 622 Mbps, Shared with DREN
  1 Gbps
Legend
  SEN: Science and Engineering Network
  MAX: Mid-Atlantic Crossroads
  DREN: Defense Research and Engineering Network
  vBNS+: Very high Performance Network Service
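The link rates on this slide allow a rough, best-case estimate of how long a file transfer would take. The sketch below assumes an ideal link with no protocol overhead and no competing traffic, and the 500 MB file size is a hypothetical figure, not a scene size taken from the slides. Which hops a given transfer actually crosses is also an assumption.

```python
def transfer_seconds(size_bytes, link_mbps):
    """Ideal time to move size_bytes over a link rated at link_mbps (10**6 bits/s)."""
    if link_mbps <= 0:
        raise ValueError("link rate must be positive")
    return size_bytes * 8 / (link_mbps * 1_000_000)


def path_seconds(size_bytes, hop_rates_mbps):
    """A multi-hop path can go no faster than its slowest hop."""
    return transfer_seconds(size_bytes, min(hop_rates_mbps))


# Link rates named on the slide, in Mbps.
RATES_MBPS = {"1 Gbps": 1000, "OC48": 2400, "OC12": 622}

example_size = 500 * 10**6  # hypothetical 500 MB file

# Best case on a 1 Gbps link alone.
local_estimate = transfer_seconds(example_size, RATES_MBPS["1 Gbps"])

# Best case if the path also crosses the OC12 link (assumed path).
wide_area_estimate = path_seconds(example_size, RATES_MBPS.values())
```

Under these assumptions the shared OC12 link, not the 1 Gbps site backbones, sets the ceiling for any transfer that crosses it.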

9 A Typical Science Application
MODIS Reprojection Tool (MRT)
  Software suite distributed by LP DAAC
  Applications used include
    mrtmosaic.exe
      –Create 1 scene from adjacent scenes
    resample.exe (Subset)
      –Geographic
      –Band/Channel
      –Projection
  Each operates on MODIS and LEDAPS scene data
Visualization Tool -- Software to display scenes
  HDFLook

10 Data
MODIS - MOD09GHK
  MODIS/Terra Surface Reflectance Daily L2G Global 500m SIN Grid V004
  Sinusoidal projection
  7 Scenes
    Washington D.C. (H = 11,12, V = 5)
    Pacific NW (H = 9, V = 4)
  Obtained from LP DAAC ECS Data Pool
LEDAPS - L7ESR
  LEDAPS Landsat-7 Corrected Surface Reflectance
  UTM projection
  2 Scenes
    Washington D.C. (Path = 15, Row = 33)
    Pacific NW areas (Path = 48, Row = 26)
  Obtained from LEDAPS website
Both compatible with the MRT
All like-area scenes are as temporally coincident as possible

11 4 Scenarios to Illustrate Grid Flexibility
Data Services (Move application to data)
  Transfer the MRT to the remote hosts and process the data remotely, sending the results back to the science facility
Batch Execution (Parallel computing)
  Demonstrate the execution of the MRT in a parallel batch environment
Local Processing (User prefers to process locally)
  Transfer the selected data sets to the science user site for processing
Third Party Processing (No local resource usage)
  Perform a third party data transfer and process the data remotely
Grid flexibility maximizes science resources
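One way to see the trade-off between the Data Services and Local Processing scenarios is to count the bytes that cross the network to or from the science user's site. The sketch below covers only those two scenarios, and every size in it is a hypothetical placeholder rather than a measurement from the prototype.

```python
def bytes_moved(strategy, app_bytes, input_bytes, output_bytes):
    """Bytes transferred to or from the science user's site for one run."""
    if strategy == "data_services":
        # Ship the application to the data, then bring the results home.
        return app_bytes + output_bytes
    if strategy == "local_processing":
        # Pull the selected input data sets to the user's site and run there.
        return input_bytes
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical sizes, for illustration only.
APP = 5 * 10**6       # application binary
INPUTS = 400 * 10**6  # input scenes
OUTPUT = 40 * 10**6   # subsetted, reprojected result

remote_cost = bytes_moved("data_services", APP, INPUTS, OUTPUT)
local_cost = bytes_moved("local_processing", APP, INPUTS, OUTPUT)
```

With these placeholder sizes, moving the application is much cheaper than moving the data, but the balance depends entirely on the real sizes involved.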

12 How we make this happen
Command line interface to execute the LDCM Grid Prototype (LGP) driver program
The LGP Driver
  Manages the execution of a specified application
  Transfers the application and data as needed
  Uses configuration files as inputs to describe:
    The executable and its location
    The data sets and their location
    The location of the resulting output file(s)
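The slides do not show the format of the LGP configuration files, so the following is only a sketch of what such a file might carry. The INI layout, the section and key names, the URLs, and the `load_job` helper are all hypothetical; the point is simply that the three items listed above (executable, data sets, output location) are enough to describe a run.

```python
import configparser

# Hypothetical configuration text; the real LGP file format is not shown in the slides.
EXAMPLE = """
[executable]
name = resample
location = gsiftp://science-site.example.org/apps/resample

[data]
inputs =
    gsiftp://data-site.example.org/scenes/scene1.hdf
    gsiftp://data-site.example.org/scenes/scene2.hdf

[output]
location = gsiftp://science-site.example.org/results/
"""


def load_job(text):
    """Parse the configuration text into a plain dictionary describing one job."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "executable": parser["executable"]["location"],
        # A multi-line value holds one input URL per line.
        "inputs": parser["data"]["inputs"].split(),
        "output": parser["output"]["location"],
    }


job = load_job(EXAMPLE)
```

A driver could then compare the host in each URL with the chosen execution host to decide which files need to be transferred before the job starts.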

13 Capability 1 Software Framework
LDCM Grid Prototype (LGP) Driver
  Provides a generic software system architecture based on Globus services
LGP Driver high-level services
  Session Manager – grid session initiation and user authentication using proxy certificates
  Data Manager – file transfer using GridFTP
  Job Manager – job submission and status in a grid environment
Utilizes the Java Commodity Grid Kits (CoGs)
  Supplies a layer of abstraction from underlying Globus services
  Simplifies the programming interface
[Diagram: software stack, top to bottom]
  LGP Driver (Java 1.4.2): Session, Data, Job
  Java CoG 1.1
  Globus Toolkit 2.4.2: Globus Gatekeeper, GRAM, GridFTP

14 Demo Scenario 1: Input Data
[Images: LEDAPS - L7ESR; MODIS - MOD09GHK]

15 LDCM VO Demo Scenario 1: Data Services
[Diagram labels: GSFC B32 (MRT, data.txt); USGS EDC (LEDAPS); GSFC B23 (MOD09GHK); Grid Node: GT 2.4.3, GridFTP Server]
1) Move resample.exe from B32 to EDC
2) Run resample.exe on 1 LEDAPS file
3) Move LEDAPS resample output from EDC to B32
4) Move mrtmosaic.exe from B32 to B23
5) Run mrtmosaic.exe with 2 MOD09GHK files
6) Move resample.exe from B32 to B23
7) Run resample.exe on mrtmosaic output
8) Move mrtmosaic resample output from B23 to B32
9) Display LEDAPS and MODIS resampled output using HDFLook

16 LDCM VO Demo Scenario 1: Data Services
[Diagram labels: GSFC B32 (MRT: mrtmosaic, resample; data.txt); USGS EDC (LEDAPS – L7ESR); GSFC B23 (MODIS – MOD09GHK); products: MODIS Mosaic, LEDAPS Subset, MODIS Mosaic Subset]
1) Move resample from B32 to EDC
2) Run resample on 1 LEDAPS file
3) Move LEDAPS resample output from EDC to B32
4) Move mrtmosaic from B32 to B23
5) Run mrtmosaic with 2 MOD09GHK files
6) Move resample from B32 to B23
7) Run resample on mrtmosaic output
8) Move mrtmosaic resample output from B23 to B32
9) Display LEDAPS and MODIS resampled output using HDFLook
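The nine steps of Scenario 1 form a small dependency graph: the LEDAPS branch at EDC and the MODIS branch at B23 do not depend on each other until the final display step. The sketch below encodes one reading of those dependencies (an interpretation of the slide, not something the authors state) and groups the steps into waves that could in principle run concurrently. It uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Step numbers follow the slide. Each entry maps a step to the steps it waits for.
# This dependency reading is an assumption based on the step descriptions.
DEPENDS_ON = {
    1: set(),    # move resample to EDC
    2: {1},      # run resample on LEDAPS file
    3: {2},      # move LEDAPS output to B32
    4: set(),    # move mrtmosaic to B23
    5: {4},      # run mrtmosaic on MOD09GHK files
    6: set(),    # move resample to B23
    7: {5, 6},   # run resample on mrtmosaic output
    8: {7},      # move MODIS output to B32
    9: {3, 8},   # display both outputs
}


def waves(depends_on):
    """Group steps into waves; steps in the same wave have no mutual dependencies."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


schedule = waves(DEPENDS_ON)
```

Under this reading the nine sequential steps collapse into five waves, which is the kind of structure a workflow tool can exploit.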

17 Capability 1 Task Requirements Completed
Science user is at B32 and the data is at EDC and B23
2 - 3 instrument types
10 - 20 scenes
Spatially and temporally coincident data
Algorithm must run on B23, B32, and EDC
Command-line invocation from client side
Perform distributed computation
Share distributed data
Verified by executing the 4 scenarios

18 Next Steps -- Capability 2
Capability 2 (C2)
  Integrate with the Blender team
    Collaborate to identify meaningful C2 data sets
    Demonstrate blender algorithm
  Assess Grid performance
    Expand the VO to include ARC supercomputing if available
    Performance Goals
      –Demonstrate the processing of 1 day's worth of data in the grid environment (~250 scenes)
  Grid Workflow -- increase automation
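The performance goal of roughly 250 scenes per day translates directly into a time budget per scene. The arithmetic below is a sketch: the 250 figure comes from the slide, while the worker counts and the 600-second per-scene processing time are hypothetical inputs chosen only to show the calculation.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_scene_budget(scenes_per_day, workers=1):
    """Average wall-clock time available per scene to keep up with the daily volume."""
    return SECONDS_PER_DAY * workers / scenes_per_day


def workers_needed(scenes_per_day, seconds_per_scene):
    """Minimum number of equal workers needed to finish one day's scenes in a day."""
    return math.ceil(scenes_per_day * seconds_per_scene / SECONDS_PER_DAY)


single_worker_budget = seconds_per_scene_budget(250)       # about 345.6 s per scene
three_worker_budget = seconds_per_scene_budget(250, 3)     # hypothetical 3 workers
needed_at_600s = workers_needed(250, 600)                  # hypothetical 600 s per scene
```

A single resource would have a little under six minutes per scene, including transfers; anything slower has to be spread across more resources.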

19 Grid Workflow
Our current capabilities allow us to submit jobs only to a specified resource
The goal of the next phase will be to provide the ability to submit a job to the Grid Virtual Organization
  Grid resource management
  Scheduling policy
  Maximize grid resources
  Manage sub tasks
  Reliable job completion
  Checkpointing and job migration
  Leverage wasted CPU cycles
Next step: Examine Condor and Pegasus open source Globus toolkit workflow extensions

20 Concept of a Future Grid Architecture - LDCM example
[Architecture diagram. Legend: Existing C1 Grid Infrastructure; Proposed C2 Grid Infrastructure; Future Grid Components]
Sites: NASA/GSFC, USGS/EDC, Univ. of MD, DAAC, Research Community, Scientist Data Site
Data: Landsat Data, MODIS, VIRS & other, Research Archive, Reflectance Pdts, Research Community Data Server
Components: Session Manager, Job Manager, Data Manager, Data Node/Manager (two instances), Grid Resource Manager, Grid Workflow Engine, Science Product Interface, Product Def., Data Product Distribution, Product Status & Recovery
VO management: VO Grid Operators Interface, VO Grid Config, VO Grid Resource Status, Failure Recovery, Overall VO Grid Management

21 Acronym List
FTP      File Transfer Protocol
LDCM     Landsat Data Continuity Mission
LEDAPS   Landsat Ecosystem Disturbance Adaptive Processing System
LGP      LDCM Grid Prototype
LP DAAC  Land Processes Distributed Active Archive Center
MODIS    Moderate Resolution Imaging Spectroradiometer
MRT      MODIS Reprojection Tool

22 Condor, Condor-G, DAGMan
Condor addresses many workflow challenges for Grid applications.
  Managing sets of subtasks
  Getting the tasks done reliably and efficiently
  Managing computational resources
Similar to a distributed batch processing system, but with some interesting twists.
  Scheduling policy
  ClassAds
  DAGMan
  Checkpointing and Migration
  Grid-aware & Grid-enabled
  Flocking (linking pools of resources) & Glide-ins
See http://www.cs.wisc.edu/condor/ for more details
Chart author: Lee Liming, Argonne National Laboratory
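DAGMan takes a plain-text description of a workflow in which `JOB` lines name each task and its Condor submit file, and `PARENT ... CHILD ...` lines declare ordering. As a sketch of how a set of subtasks might be handed to it, the helper below generates those lines from a dependency map. The job names and the `.sub` submit file names are hypothetical, and the submit files themselves are not shown.

```python
def dagman_lines(depends_on, submit_file=lambda job: f"{job}.sub"):
    """Build DAGMan input lines from a map of job name -> set of parent job names."""
    # One JOB line per task, naming its (hypothetical) submit description file.
    lines = [f"JOB {job} {submit_file(job)}" for job in sorted(depends_on)]
    # One PARENT/CHILD line per task that has prerequisites.
    for child in sorted(depends_on):
        parents = sorted(depends_on[child])
        if parents:
            lines.append(f"PARENT {' '.join(parents)} CHILD {child}")
    return lines


# Hypothetical job names loosely modeled on the MRT processing steps.
WORKFLOW = {
    "mosaic": set(),
    "resample_modis": {"mosaic"},
    "resample_ledaps": set(),
    "display": {"resample_modis", "resample_ledaps"},
}

dag_text = "\n".join(dagman_lines(WORKFLOW))
```

Writing `dag_text` to a file gives DAGMan enough information to run independent jobs as resources allow and to hold back `display` until both resample jobs have finished.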

23 Pegasus Workflow Transformation
Converts Abstract Workflow (AW) into Concrete Workflow (CW).
  Uses Metadata to convert user request to logical data sources
  Obtains AW from Chimera
  Uses replication data to locate physical files
  Delivers CW to DAGMan
  Executes using Condor
  Publishes new replication and derivation data in RLS and Chimera (optional)
See http://pegasus.isi.edu/ for details
[Diagram labels: Chimera Virtual Data Catalog, Replica Location Service, Metadata Catalog, Storage System, Compute Server, DAGMan, Condor]
Chart author: Lee Liming, Argonne National Laboratory

