OAK RIDGE NATIONAL LABORATORY, U.S. DEPARTMENT OF ENERGY. Deployment, Deployment, Deployment. March 2002. Randy Burris, Center for Computational Sciences.


1 OAK RIDGE NATIONAL LABORATORY, U.S. DEPARTMENT OF ENERGY
  Deployment, Deployment, Deployment
  March 2002
  Randy Burris
  Center for Computational Sciences, Oak Ridge National Laboratory

2 Overview of this presentation
  - Our goal: let scientists (our customers) do science without worrying about their computer environment
  - Our clientele:
    - Four disciplines (climate, astrophysics, genomics and proteomics, high-energy physics)
    - National labs and universities
    - Using resources all over the country
    - Residing all over the place
  - We must deploy the result (“Deploy or die”)

3 Well, OK. But… deploy what?
  - Where are the commonalities in our space?
    - Security and trust – nonexistent to extreme
    - Network connectivity – dialup to OC12
    - File sizes – bytes to terabytes
    - File location – local unit to partitions around the world
    - Visualization – static to dynamic real-time
    - And so on.
  - We can’t do it all.
  - So exactly what are we going to deploy?
  - And how should we proceed?

4 Achieving successful deployment
  - For each of the 4 projects, define basic steps:
    - Define target environment(s)
    - Characterize successful deployment (in each)
    - Prototype in a close-to-production environment
    - Deploy in production
  - In parallel with the above:
    - Produce documentation at every step
    - Develop tools for support staff
  - Start now.

5 Step 1: Define target environment(s)
  - We cannot support all combinations.
    - Security – {DCE, Kerberos, PKI, gss}, firewalls, …
    - Compute resource – MPP, cluster, workstation, …
    - User platform – MPP, cluster, Unix/linux, Windows, …
    - Storage
      - Storage resource – HPSS, PVFS, … ?
      - User API for access to data
        - NetCDF, HDF5, both, something else?
        - HRM, pftp, GridFTP, hsi, …
    - Network
      - WAN – GigE/jumbo, FastE, OC12, OC3, ESnet, hops, …
      - LAN – GigE, FastE, iSCSI, FibreChannel, …
    - Visualization – CAVE, workstations, Palm Pilots, …
  - We will have to choose.
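The scale of "all combinations" can be made concrete with a quick count. The sketch below is illustrative only: it uses the options named on this slide, leaves out the open-ended entries ("…", "something else?") and the WAN entries that are not link types (ESnet, hops), and multiplies the number of choices in each dimension. The dictionary keys, the function name, and the `shortlist` are made up for this example; the shortlist is a hypothetical narrowing, not a selection proposed in the slides.

```python
# Options named on the slide. Open-ended entries are omitted, so the real
# space is larger than what is counted here.
options = {
    "security": ["DCE", "Kerberos", "PKI", "gss"],
    "compute": ["MPP", "cluster", "workstation"],
    "user_platform": ["MPP", "cluster", "Unix/linux", "Windows"],
    "storage": ["HPSS", "PVFS"],
    "data_api": ["NetCDF", "HDF5"],
    "transfer": ["HRM", "pftp", "GridFTP", "hsi"],
    "wan": ["GigE/jumbo", "FastE", "OC12", "OC3"],
    "lan": ["GigE", "FastE", "iSCSI", "FibreChannel"],
    "visualization": ["CAVE", "workstation", "Palm Pilot"],
}


def count_combinations(opts):
    """Multiply the number of choices in every dimension."""
    total = 1
    for choices in opts.values():
        total *= len(choices)
    return total


# Hypothetical shortlist (an assumption for illustration only): narrowing a
# few dimensions shrinks the space that must be tested and documented.
shortlist = dict(
    options,
    security=["Kerberos", "PKI"],
    user_platform=["Unix/linux"],
    storage=["HPSS"],
    transfer=["GridFTP", "hsi"],
    wan=["OC12"],
    lan=["GigE"],
    visualization=["CAVE", "workstation"],
)

print("all named options:", count_combinations(options))
print("hypothetical shortlist:", count_combinations(shortlist))
```

Even with the open-ended entries left out, the named options alone give tens of thousands of distinct environments, which is why a choice has to be made.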

6 Step 2: Characterize successful deployment
  - A. Correct operation in the security environment
  - B. Optimized performance in the target network environment
  - C. Rugged infrastructure
  - D. Unobtrusive infrastructure
  - E. Thorough documentation for users and support staff

7 Step 2: Characterize A: Security
  - I believe we must define the environment into which we intend to deploy.
    - Starting now
    - Because it will take a long time and will almost certainly require development.
  - Questions to which we need answers:
    - Are we concerned with DOE sites or DOE+NSF+…?
    - Are there circumstances where clear-text passwords are OK? Where no security is OK?
    - Must we support authentication in PKI, GSI, DCE and/or Kerberos?
    - Will all of our infrastructure work with firewalls at one or both ends of a transfer? Whose firewalls, what filtering parameters, …

8 Step 2: Characterize B: Network
  - On what network are the end nodes?
  - What is our target environment – ESnet, ESnet+Internet2, Grid, www, …
  - What throughput is needed for effective science?
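One rough way to approach the throughput question is to estimate how long a terabyte-scale file takes to move over each candidate link. The sketch below is a back-of-the-envelope calculation, not a measurement: the rates are nominal line rates, the 56 kbit/s dialup figure is an assumed modem speed, and the `efficiency` parameter is a hypothetical knob for the fraction of line rate actually achieved.

```python
# Nominal line rates in bits per second. Real throughput is lower because of
# protocol overhead, latency, and end-host limits.
LINK_RATES_BPS = {
    "dialup (56k)": 56_000,       # assumed modem speed
    "FastE": 100_000_000,
    "OC3": 155_520_000,
    "OC12": 622_080_000,
    "GigE": 1_000_000_000,
}

ONE_TB = 10**12  # one terabyte, decimal definition, in bytes


def transfer_seconds(size_bytes, rate_bps, efficiency=1.0):
    """Seconds needed to move size_bytes at rate_bps scaled by efficiency."""
    if rate_bps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("rate must be positive and efficiency in (0, 1]")
    return size_bytes * 8 / (rate_bps * efficiency)


for name, rate in LINK_RATES_BPS.items():
    hours = transfer_seconds(ONE_TB, rate) / 3600
    print(f"{name:>12}: {hours:12.1f} hours per terabyte")
```

At nominal OC12 rate a terabyte takes roughly three and a half hours; over the assumed dialup link it would take more than four years. Estimates like these help decide which ends of the "dialup to OC12" range are worth supporting for a given file size.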

9 Step 2: Characterize C: Rugged
  - Must not crash (of course)
  - Must be in service when needed
  - Must be secure
  - Must have a support plan (which does not require an army of support people)
  - Must have a trouble-resolution mechanism and resources
  - Must survive normal maintenance
    - System software patches and upgrades
    - Equipment upgrades

10 Step 2: Characterize D: Unobtrusive
  - User should need minimal knowledge
    - The deeper the infrastructure, the less the user should need to know
  - User should be protected from mistakes
    - Try not to let the user screw things up
    - Documentation and real-time warnings
    - Effective defaults

11 Step 2: Characterize E: Documentation
  - White papers to inform larger community
  - For users: how-to-use documents
  - For system-admin staff:
    - How to install, debug, maintain, troubleshoot
  - For user-support staff:
    - How to troubleshoot
    - Tuning knobs
  - For programmers:
    - Overview documents to give context
    - Correct interface documents
    - Correct documentation for all appropriate platforms

12 Step 3: Prototype in close-to-production environment
  - Example of deployment approach on Probe:
    - Deploy early prototypes in Oak Ridge and NERSC
      - Use Probe, Probe HPSS, Production HPSS and supercomputers
      - Use (and require) documented code and procedures
    - As development progresses, evaluate and address deployment issues such as security, network performance, system-admin documentation
    - As prototype becomes more robust, migrate more functions to Oak Ridge and NERSC production environments
    - Continue to evaluate and address deployment issues that now include user and user-support documentation
    - Iterate as necessary
  - When this sequence is done, you’re in production.

13 Overview of ORNL Architecture, March 2002
  [Diagram. Labels shown, in extraction order: Stingray RS/6000 S80; Marlin RS/6000 H70; STK Library; 220 GB SCSI RAID; 360 GB Sun FibreChannel RAID; Origin 2000 Reality Monster; STK Library; IBM and Compaq Supercomputers and 64-node linux cluster; Probe; Production; Gigabit Ethernet (jumbo frames); Production HPSS; Probe HPSS; Disk Cache; CAVE; Other Probe Nodes; Disk Cache; 360 GB FibreChannel RAID; 600 GB SCSI JBOD; External ESnet Router]

14 Example: How Terascale Supernova Initiative could be prototyped
  [Diagram. Labels shown, in extraction order: Stingray RS/6000 S80; Marlin RS/6000 H70; Origin 2000 Reality Monster; External ESnet Router; IBM and Compaq Supercomputers; Probe; Production; Production HPSS; Probe HPSS; CAVE; Other Probe Nodes. Roles annotated: Bulk storage; Data reduction, pre-viz manipulation; Rendering]

15 We should start right away:
  - Select initial, intermediate and ultimate target environments
    - Including supported applications, platforms, security and target network
    - Describe in a white paper
  - Seek common elements in supported applications
    - Develop a deployment plan for common elements
    - Write white paper describing deployment plan
      - Specify our approach to deploying support for those elements
      - Identify unmet requirements and how to remedy them
      - Describe our approach to ruggedness and unobtrusiveness
  - Address non-common elements in supported applications
    - Seek to minimize their impact
    - Specify our approach to deploying support for those elements
    - Develop deployment plans and describe them
    - Write white paper describing deployment plan

16 DISCUSSION?

17 Serious questions for early resolution
  - What is the role of HPSS?
    - HPSS will never be pervasive – expensive.
    - Treat HPSS sites as primary repositories?
  - Which file transfer protocol(s) do we support?
    - GridFTP, pftp, hsi

18 Probe – “Place to be”
  Overview of ORNL Probe Cell, February 2002
  [Diagram. Labels shown, in extraction order: Stingray RS/6000 S80; Marlin RS/6000 H70; STK Silo; 200 GB SCSI RAID Disks; Sun E250; Compaq DS20; 360 GB Sun FibreChannel Disks; 360 GB STK FibreChannel Disks; FibreChannel Switch; GSN Switch; Origin 2000 Reality Monster; RS/6000 B80; External ESnet Router; To NERSC Probe; Sun Ultra 10; STK Silo; IBM and Compaq Supercomputers; 3494 Library; GSN Bridge; RS/6000 44P-170; Probe; Production; Sun E450; IBM F50; SGI Origin 200; Gigabit Ethernet; Intel Dual P-III Linux]

19 Backup slide

20 Technology on hand and available
  - Software
    - HPSS (unlimited instantiations) and HPSS development license
    - HDF5, NetCDF
    - R, ggobi
    - gcc suite
    - C on Solaris, AIX, IRIX and Tru64
    - Fortran on AIX
    - Oracle 8i and DB2 (current developer’s editions) on AIX
    - Globus 2.0/AIX and Solaris
    - HRM
    - Inter-HPSS hsi application
    - OPNET modeling product
    - MPI/IO testbed
  - 18 nodes – IBM/AIX, Sun/Solaris, SGI/IRIX, Compaq/Tru64
  - GRID nodes (Sun/Solaris, IBM/AIX, possibly linux)
  - ESnet III OC12 externally, GigE jumbo and Fast Ethernet internally
  - Web100 and NET100 participation

