Computing and Data Infrastructure for Large-Scale Science Deploying Production Grids: NASA’s IPG and DOE’s Science Grid William E. Johnston


1 Computing and Data Infrastructure for Large-Scale Science
Deploying Production Grids: NASA’s IPG and DOE’s Science Grid
William E. Johnston (wejohnston@lbl.gov)
Lawrence Berkeley National Lab and NASA Ames
www.ipg.nasa.gov doesciencegrid.org

2 Motivation for Science Grids
Large-scale science and engineering are done through the interaction of geographically and organizationally dispersed
–people
–heterogeneous computing systems
–data management systems
–instruments
The overall motivation for “Grids” is to facilitate the routine interaction of these resources in order to support science and engineering that is
–widely distributed
–multi-institutional
–data intensive
–collaborative

3 Lessons Learned for Building Large-Scale Grids
Four main points:
–deploying operational infrastructure
–cross-site trust
–dealing with Grid technology scaling issues
–listening to the users

4 Basic Grid Services
There is a set of basic functions that all Grids must have in order to be called a Grid (the “neck of the hourglass” of Grids):
–the Grid Security Infrastructure (“GSI” – the tools and libraries that provide Grid security)
–the Grid Information Service (“GIS” – the basic resource discovery mechanism)
–a Grid job initiation mechanism
–a basic data management mechanism such as GridFTP
–a Grid event mechanism (however, this is still under development)
Most of the deployment issues relate to these.
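The five basic services above can be pictured together in a toy sketch. The class below is purely illustrative stand-in code, not Globus: it mimics, in memory, what GSI (CA-based authentication), the GIS (register/discover), job initiation, GridFTP-style data movement, and an event mechanism each contribute. All names here are made up.

```python
# Illustrative sketch only: an in-memory stand-in for the five basic Grid
# services (in a real deployment these are GSI, the GIS/MDS, job submission,
# GridFTP, and a Grid event service). Not Globus code.
class ToyGrid:
    def __init__(self, trusted_cas):
        self.trusted_cas = set(trusted_cas)   # GSI: which identity issuers we accept
        self.resources = {}                   # GIS: resource name -> attributes
        self.files = {}                       # data management: path -> bytes
        self.subscribers = []                 # event mechanism: callbacks

    # GSI stand-in: accept a credential only if its issuing CA is trusted
    def authenticate(self, user_dn, issuer_ca):
        return issuer_ca in self.trusted_cas

    # GIS stand-in: resource registration and discovery
    def register(self, name, **attrs):
        self.resources[name] = attrs

    def discover(self, **query):
        return [n for n, a in self.resources.items()
                if all(a.get(k) == v for k, v in query.items())]

    # job initiation stand-in: run a command on a discovered resource
    def submit_job(self, resource, command):
        if resource not in self.resources:
            raise LookupError(f"unknown resource {resource!r}")
        self.publish({"event": "job-submitted", "resource": resource})
        return {"resource": resource, "command": command, "state": "submitted"}

    # data management stand-in (GridFTP-like put/get)
    def put(self, path, data):
        self.files[path] = data

    def get(self, path):
        return self.files[path]

    # event mechanism stand-in: publish/subscribe
    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        for cb in self.subscribers:
            cb(event)
```

The point of the sketch is the "hourglass": every resource exposes this same narrow service set, and everything above (applications) and below (site systems) plugs into it.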

5 Lessons Learned for Building Large-Scale Grids
Like networks, successful Grids involve almost as much sociology as technology, and therefore establishing good working relationships among all of the people involved is essential.

6 Deploying Operational Infrastructure
Establish an Engineering Working Group that involves the Grid deployment teams at each site
–schedule weekly meetings / telecons
–involve Globus experts in these meetings
–establish an archived EngWG email list
Set up liaisons with the systems administrators for all systems that will be involved (computation and storage)
–this is especially important if the resources that you expect to incorporate into your Grid are not in your organization, or not in your part of your organization

7 Deploying Operational Infrastructure
Identify the computing and storage resources to be incorporated into the Grid
–be sensitive to the fact that opening up systems to Grid users may turn lightly or moderately loaded systems into heavily loaded systems
–batch schedulers may have to be installed on systems that previously did not use them in order to manage the increased load
–carefully consider the issue of co-scheduling!
  many potential Grid applications need this
  only a few available schedulers provide it (e.g. PBSPro)
  this is an important issue for building distributed systems
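The co-scheduling problem flagged above is easy to state concretely: a co-allocating scheduler must find one time window in which every resource a job needs is free at once. The sketch below is a simplified illustration under assumed inputs (free windows as (start, end) intervals), not how PBSPro or any real scheduler implements it.

```python
# Illustrative sketch only: the co-allocation problem behind "co-scheduling".
# Each resource reports its free time windows; we look for the earliest
# window of the requested duration that all resources can cover at once.
def earliest_common_window(free_windows, duration):
    """free_windows: one list of (start, end) intervals per resource.
    Return the earliest (start, start + duration) covered by every
    resource, or None if no common window of that length exists."""
    def covered(windows, start, end):
        # free over [start, end] iff some single free interval contains it
        return any(s <= start and end <= e for s, e in windows)

    # the earliest feasible start is always some resource's window start,
    # so those are the only candidate start times we need to check
    candidates = sorted(s for windows in free_windows for s, _ in windows)
    for start in candidates:
        if all(covered(w, start, start + duration) for w in free_windows):
            return (start, start + duration)
    return None
```

Even this toy version shows why co-scheduling is hard operationally: it only works if every scheduler involved can publish its availability and honor an advance reservation, which few batch schedulers of the time could do.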

8 Grid Information System
Plan for a GIS/GIIS server at each distinct site with significant resources
–this is important in order to avoid single points of failure
The structure of the GIIS is one of the basic scaling issues for Grids.
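The failure-isolation argument for a per-site GIIS can be sketched in a few lines. This is illustrative code with made-up site and resource names, not the LDAP-based protocol the real GIS uses: each site keeps its own index, and a Grid-wide search queries sites independently, so an unreachable site hides only its own resources.

```python
# Illustrative sketch only: one information index per site, queried
# independently, so no single index is a single point of failure.
class SiteGIIS:
    def __init__(self, site):
        self.site = site
        self.entries = {}          # resource name -> attribute dict
        self.up = True             # simulate the server being reachable

    def register(self, name, **attrs):
        self.entries[name] = attrs

    def search(self, **query):
        if not self.up:
            raise ConnectionError(f"GIIS at {self.site} unreachable")
        return [n for n, a in self.entries.items()
                if all(a.get(k) == v for k, v in query.items())]

def grid_search(site_indexes, **query):
    """Query every site GIIS; skip (rather than fail on) unreachable sites."""
    found = []
    for giis in site_indexes:
        try:
            found.extend(giis.search(**query))
        except ConnectionError:
            continue               # that site's resources are simply not visible
    return found
```

How the site indexes are linked into a hierarchy (and how far a search is scoped) is exactly the GIIS structuring/scaling issue the slide refers to.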

9 Cross Site Trust
Set up or identify a CA to issue Grid user identity certificates
–the basis of the GSI
–the basic trust management mechanism
The Certificate Policy Statement (CPS) codifies how you will run your CA and to whom you will issue certificates
–cross-site trust is based on this
Don’t try to invent your own CPS!
–look at the ESnet CP (envisage.es.net) and the Grid Forum CP

10 Defining / Understanding the Extent of “Your” Grid
The “boundaries” of a Grid are primarily determined by two factors:
–what CAs you trust
  this is explicitly configured in each Globus environment
  however, there is no guarantee that every resource in what you think is “your” Grid trusts the same set of CAs – i.e. each resource potentially has a different space of users
  in fact, this will be the norm if the resources are involved in multiple virtual organizations, as they frequently are in the high energy physics experiment communities
–how you scope the searching of the GIS/GIISs
  this depends on the model that you choose for structuring your directory services
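The first factor – each resource trusting its own set of CAs – means the Grid's boundary differs per user. A minimal illustrative sketch, with hypothetical CA and resource names: the resources a user can reach are exactly those whose trusted-CA list includes the CA that issued that user's certificate.

```python
# Illustrative sketch only: the "extent of your Grid" as seen by one user
# is determined by which resources trust that user's issuing CA.
# All CA and resource names below are hypothetical.
def usable_resources(resource_trust, user_ca):
    """resource_trust: dict mapping resource name -> set of trusted CA names.
    Return the set of resources reachable by a user whose certificate
    was issued by user_ca."""
    return {r for r, cas in resource_trust.items() if user_ca in cas}
```

Two users on the "same" Grid can thus see different resource sets, which is the norm when resources serve multiple virtual organizations.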

11 Maintaining Local Control
Establish the conventions for the Globus mapfile
–maps user Grid identities to system UIDs
–this is the basic local control / authorization mechanism for each individual compute and storage platform
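The mapfile itself is a simple text format: each line pairs a quoted certificate subject (distinguished name) with a local account name, and an identity with no entry is simply not authorized on that platform. The parser below is an illustrative sketch of that mechanism, with made-up DNs and account names; it is not the Globus implementation.

```python
# Illustrative sketch only: parsing grid-mapfile-style lines that map a
# quoted certificate DN to a local account name. The DNs and account
# names used in the test data are made up.
def parse_gridmap(text):
    """Return a dict mapping certificate DN -> local user name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue               # skip blanks and comments
        # format: "DN with spaces" localname
        if line.startswith('"'):
            dn, _, local = line[1:].partition('"')
            mapping[dn] = local.strip()
    return mapping

def authorize(mapping, dn):
    """Map an authenticated Grid identity to a local account, or None (deny)."""
    return mapping.get(dn)
```

This is why the mapfile preserves local control: the site, not the Grid, decides which Grid identities appear in it and which local UID each one gets.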

12 Take Good Care of the Users as Early as Possible
Establish a Grid/Globus application specialist group
–they should be running sample jobs as soon as the prototype-production system is operational
–they should serve as the interface between users and the Globus system administrators to solve Globus-related application problems
Identify early users and have the Grid/Globus application specialists assist them in getting jobs running on the Grid

13 For the Full Talk
grid.lbl.gov/~wej/Grids


