TeraGrid: A Powerful, Parallel, Fast, and Free Computational Resource A Bytes ‘n Bites Presentation Michael J. Reale and Dr. James Wolf Information Technology.

1 TeraGrid: A Powerful, Parallel, Fast, and Free Computational Resource A Bytes ‘n Bites Presentation Michael J. Reale and Dr. James Wolf Information Technology Center 10/14/2010

2 What is TeraGrid? TeraGrid is an open scientific discovery infrastructure combining leadership class resources at eleven partner sites to create an integrated, persistent computational resource. It is the world's largest, most comprehensive distributed cyberinfrastructure for open scientific research. U.S. researchers and educators may request access to any or all of TeraGrid’s resources at no cost to their research project.

3 What resources does it provide? Massively parallel computing power ▫More than a petaflop (10^15 floating-point operations per second) of total computing capability Storage space ▫More than 30 petabytes of online and archival data storage, with rapid access and retrieval over high-performance networks A large selection of software installed and ready for use Advanced Support for TeraGrid Applications (ASTA)

4 How do I get started? Contact us and we’ll help you get a Startup Allocation! ▫ ▫ t.cgi Also, you may want to check: ▫ #howstart

5 Outline Access Computation Queues and Wait Times Data Visualization Educational Allocations Training and Help

6 “Speak, friend, and enter…” -- J.R.R. Tolkien, The Fellowship of the Ring

7 Logging into TeraGrid There are three ways to log in that work with all TeraGrid sites: ▫The TeraGrid Portal ▫Globus Toolkit Software ▫GSI-SSHTerm All of the above use Single Sign-On (SSO)

8 The TeraGrid Portal Log in to http://portal.teragrid.org Go to “My TeraGrid” → “Accounts” Click the “Login” link for the site you wish to connect to The Java SSH terminal will open in the browser

9 Globus Toolkit Software: Compiling from Source
Get the source tarball (4.0.8 worked for me; the latest version, 5.0.2, did not) from
▫wget source-installer.tar.bz2
Unpack it (this will take a bit)
▫tar -xvf gt4.0.8-all-source-installer.tar.bz2
▫cd gt*-installer
Compile and install to $HOME/globus (this will also take a bit)
▫./configure --prefix=$HOME/globus
▫make gsi-myproxy gsi-openssh gridftp
▫make install
Add to your path
▫export PATH=$HOME/globus/bin:$PATH
You can get rid of the source directory and tarball when you’re done. The installed software takes about 91MB of space.
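
Once the install finishes and PATH is updated, it is worth confirming that the client tools used in the rest of this talk can actually be found. The following is a minimal sketch, not part of the Globus Toolkit; the helper name check_tools is made up for illustration.

```shell
#!/bin/bash
# Report which of the given commands are missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
# (check_tools is a hypothetical helper, not a Globus command.)
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools used later in these slides.
check_tools myproxy-logon gsissh globus-url-copy grid-proxy-info \
    || echo "Check that \$HOME/globus/bin is on your PATH."
```

If anything is reported missing, the most likely cause is that the export PATH line has not been added to the current shell session.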

10 Globus Toolkit Software: TeraGrid Client Toolkit Installing the entire Globus Toolkit can be a bit of an ordeal. So, if you are running Linux or Mac, you can install the TeraGrid Client Toolkit, which contains the subset of the Globus Toolkit software needed to log in and work with TeraGrid: ▫ Untar the file, and run the following: ▫cd teragrid-client-0.4.0 ▫./install-teragrid-client WARNING: It is not supported on all platforms, but you may be able to trick it into installing (see Appendix A).

11 Globus Toolkit Software: Logging In Let us assume that the Globus Toolkit is installed in $HOME/globus. The script on the next slide will set up your environment for logging into TeraGrid sites. If you named this script “”, you would call the following with your TeraGrid portal username: ▫source After entering your TeraGrid portal password, you should be able to log into any site using gsissh: ▫gsissh Note: depending on how you installed the Globus Toolkit, you may need to add the library path for Glite to LD_LIBRARY_PATH as well.

12 Globus Toolkit Software: Login Script
#!/bin/bash
GLOBUS_LOCATION=$HOME/globus
export LD_LIBRARY_PATH=$GLOBUS_LOCATION/lib
MYPROXY_SERVER_PORT=7514
export GLOBUS_LOCATION MYPROXY_SERVER MYPROXY_SERVER_PORT
. $GLOBUS_LOCATION/etc/
grid-proxy-destroy
unset X509_CERT_DIR
unset X509_USER_CERT
unset X509_USER_KEY
unset X509_USER_PROXY
myproxy-logon -T -l $1
export X509_CERT_DIR=$HOME/.globus/certificates
export X509_USER_CERT=`(grid-proxy-info -path)`
export X509_USER_KEY=`(grid-proxy-info -path)`
export X509_USER_PROXY=`(grid-proxy-info -path)`
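
The proxy credential that myproxy-logon fetches expires after a while, and gsissh stops working when it does. As a rough sketch, the remaining lifetime can be read with grid-proxy-info -timeleft (which prints seconds) and turned into something readable. The helper name format_timeleft is an assumption for illustration, not part of the toolkit.

```shell
#!/bin/bash
# Convert a number of seconds into "Nh MMm", or "expired" when no time is left.
# (format_timeleft is a hypothetical helper.)
format_timeleft() {
    local secs=$1
    if [ "$secs" -le 0 ]; then
        echo "expired"
    else
        printf '%dh %02dm\n' $((secs / 3600)) $(((secs % 3600) / 60))
    fi
}

# Only query the proxy when the Globus tools are installed.
if command -v grid-proxy-info >/dev/null 2>&1; then
    secs=$(grid-proxy-info -timeleft 2>/dev/null) || secs=0
    format_timeleft "${secs:-0}"
fi
```

When the result is "expired", rerunning the login script should fetch a fresh proxy.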

13 GSI-SSHTerm Go to https://security.ncsa.illinois.edu/gsi-sshterm/ Download the “Java Web Start Version” (make sure to pick the one using TeraGrid credentials) The program provides both an SSH terminal and an SFTP tool. Further instructions for use can be found here: mediawiki/images/a/a4/HowToUseGSISSHApplet.pdf

14 GSI-SSHTerm: Login Problems Make sure there are no trailing spaces in the address box when connecting. Otherwise, the program silently fails.

15 “On two occasions I have been asked, ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.” -- Charles Babbage

16 Computational Resource Types Regular ▫SMP ▫MPP ▫Cluster Visualization Special New

17 SMP (Symmetric Multiprocessing) Characteristics: ▫Multiple CPUs ▫Same cabinet ▫Share the same memory Good for large memory jobs (even serial ones) ▫Caveats:  Need to request enough nodes to get the desired memory  Memory access outside node is slower Examples: ▫NCSA – Cobalt  SGI Altix system  Primarily for large shared memory applications  Will be replaced by a new system called Ember (SGI UV System) ▫PSC – Pople  SGI Altix system  Primarily for shared memory and hybrid architectures ▫IU – Quarry  Used for web services hosting and Science Gateways

18 MPP (Massively Parallel Processing) Characteristics: ▫Uses up to thousands of processors ▫Same cabinet ▫Distributed shared memory Good for jobs that need high-performance but also need lots and lots of cores Examples: ▫IU – Big Red  IBM e1350 PowerPC; distributed, shared-memory cluster intended to run parallel as well as serial applications ▫NCAR – Frost  IBM BlueGene/L – 8,192 processors; highly scalable platform for developing, testing, and running parallel MPI applications ▫NICS – Kraken  Cray XT5 – 66,048 processors; intended for highly scalable applications (minimum startup allocation request is 100,000 SUs) ▫TACC – Ranger  Sun Constellation – 62,976 processors; intended for codes scalable to thousands of cores

19 Cluster Characteristics: ▫No real cap on the number of processors/nodes ▫Different physical machines (may also have heterogeneous composition of nodes) ▫Memory not shared Good for massively parallel jobs with little inter-process communication Examples: ▫NCSA – Abe (Part of ASQL)  Dell PowerEdge 1955 – 9,600 processors; intended for highly parallel, scalable applications ▫LONI – QueenBee (Part of ASQL)  Dell PowerEdge 1950 – 5,344 processors; intended for parallel applications scalable up to 5,344 cores ▫Purdue – Steele (Part of ASQL)  Dell PowerEdge 1950 – 7,144 processors (1600 are available in the longest running production queue)  Suited for a wide range of serial and small/medium parallel jobs  Longest Wall Time (720 hours) ▫TACC – Lonestar (Part of ASQL)  Dell PowerEdge 1955 – 5,840 processors; intended primarily for applications scalable up to 4,096 cores

20 Visualization Intended for visualization and graphical applications Examples: ▫TACC – Longhorn  Dell/NVIDIA Visualization and Data Analysis Cluster  A hybrid CPU/GPU system designed for remote, interactive visualization and data analysis, but it also supports production, compute-intensive calculations on both the CPUs and GPUs via off-hour queues ▫TACC – Spur  Sun Visualization Cluster  128 compute cores / 32 NVIDIA FX5600 GPUs  Intended for serial and parallel visualization applications that take advantage of large per-node memory, multiple computing cores, and multiple graphics processors ▫NICS – Nautilus  Visualization and Analysis  SGI UltraViolet  1024 cores (Intel Nehalem)  4 TB global shared memory  1 PB file system  Pre-production, but accepting allocations

21 Special NCSA – Lincoln ▫Dell PowerEdge 1950 / NVIDIA Tesla S1070 ▫Intended for applications that can make use of heterogeneous processors (CPU and GPU) Purdue – Condor (high throughput) ▫Pool of more than 27,000 processors ▫Various architectures and operating systems ▫Designed for high-throughput computing and is excellent for parameter sweeps, Monte Carlo simulations, and other serial applications SDSC – DASH ▫Intel Nehalem – 544 processors ▫vSMP (virtual shared memory) software from ScaleMP that aggregates memory across 16 nodes. This allows applications to address 768GB of memory ▫4 TB of Flash memory configurable as fast file I/O subsystem or extended fast virtual memory swap space.

22 New FutureGrid: A grid testbed (Indiana University and many other partners) ▫A collection of different systems ▫Focus on virtual machines ▫Not currently allocated via standard TeraGrid POPS allocation system ▫More info:  MATLAB on the TeraGrid ▫New system from Cornell ▫Not currently allocated via standard TeraGrid POPS allocation system ▫Need a local copy of MATLAB and the MATLAB Parallel Computing Toolbox ▫More info:  ▫How to request access: 

23 Resource Selection Spreadsheet Lists resource sites, queues, job statistics, and links to manuals ▫ /7/70/ResourceInfo_SelectionAid.xls

24 More information… The preceding slides are largely from Kim Dillman’s presentation given at TG’10: ▫ /1/12/ComputeSession-ChampionsTG10.pdf

25 “PATIENCE, n. A minor form of despair, disguised as a virtue.” -- Ambrose Bierce

26 Queue Policies Many different factors can limit and/or prioritize jobs. Possible limits: ▫# of jobs queued or running per user ▫# of jobs queued or running per project allocation ▫# of total nodes per user Priorities can be affected by: ▫Wall time request ▫# of CPUs (sometimes want more, sometimes less) ▫Time already spent waiting in the queue ▫Number of jobs previously run by the user

27 Some useful PBS Commands
List queues defined:
▫qstat -Q
Get details about the queues:
▫qstat -Qf
List jobs in queue:
▫qstat -a
List only your jobs:
▫qstat -u
List details of a job:
▫qstat -f
Show estimated job start (on some systems):
▫showstart
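
The commands above inspect queues; getting a job into one requires a job script. Below is a minimal sketch of a PBS job script, assuming a Torque-style resource syntax (nodes=N:ppn=M). The job name, queue name, and program are placeholders, and the exact resource syntax differs between sites, so check the site's user guide before using it.

```shell
#!/bin/bash
# Job name (placeholder).
#PBS -N water_test
# Queue name (placeholder; list the real ones with qstat -Q).
#PBS -q normal
# Request 2 nodes with 8 processors each (Torque-style syntax, site-dependent).
#PBS -l nodes=2:ppn=8
# Wall time request; queue policies often favor shorter requests.
#PBS -l walltime=01:00:00

# PBS sets PBS_O_WORKDIR to the directory qsub was run from; fall back to
# the current directory so the script also runs outside the batch system.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

NODES=2
PPN=8
NPROCS=$((NODES * PPN))
echo "Running on $NPROCS processors"

# Hypothetical launch line; the MPI launcher and its options are site-specific.
# mpirun -np "$NPROCS" ./my_program
```

Saved as, say, job.pbs, this would be submitted with qsub job.pbs and then tracked with the qstat commands listed above.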

28 Karnak Prediction Service Gives the following: ▫System Information  # of running jobs, waiting jobs, and used processors  Information about status of queues ▫Wait Time Predictions ▫Start Time Predictions Warnings: ▫Not real-time information ▫Pages don’t display in IE for some reason (Firefox works)

29 “To know that we know what we know, and to know that we do not know what we do not know, that is true knowledge.” -- Copernicus

30 Transferring Data For large files, it’s best to use something that supports the GridFTP protocol: ▫Uberftp ▫Globus Toolkit functions (globus-url-copy) GridFTP supports: ▫Multithreaded transfers ▫Striping over several hosts ▫3rd-party transfers ▫Transfer rates as high as 750MB/s (network-permitting)

31 globus-url-copy example
 export POPLE_HOME=`gsissh 'cd $HOME; pwd'`
 export RANGER_HOME=`gsissh 'cd $HOME; pwd'`
 globus-url-copy -vb -fast -stripe -tcp-bs 8M -p 8 file://$POPLE_HOME/gaussian/water.cube gsi$RANGER_HOME/water2.cube
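
Because a mistyped URL can send a large transfer to the wrong place, it can help to print the full command before running it. The sketch below only builds and prints a globus-url-copy command line using the flags from this slide. The helper name build_copy_cmd, the host gridftp.example.org, and the paths are placeholders, not real TeraGrid endpoints; gsiftp:// is the URL scheme GridFTP servers use.

```shell
#!/bin/bash
# Print (do not run) a globus-url-copy command for a local-to-remote transfer.
# Arguments: local source path, remote GridFTP host, remote destination path.
# (build_copy_cmd is a hypothetical helper; host and paths are placeholders.)
build_copy_cmd() {
    local src_path=$1 dst_host=$2 dst_path=$3
    echo "globus-url-copy -vb -fast -stripe -tcp-bs 8M -p 8" \
         "file://$src_path gsiftp://$dst_host$dst_path"
}

build_copy_cmd /home/user/gaussian/water.cube gridftp.example.org /home/user/water2.cube
```

Once the printed command looks right, it can be pasted into the shell or run directly.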

32 FTP on TeraGrid It is now possible to use an FTP client to connect directly to TeraGrid resources using the following address: ▫ftp://vfs.teragrid.org WARNING: This is regular FTP, NOT SFTP.

33 Storage Options: General HOME directory ▫Permanent (non-purged) ▫Not very big ▫Visible to all nodes in a cluster SCRATCH space ▫Temporary ▫Shared among other users ▫Fairly large ▫Visible to all nodes in a cluster Parallel file systems ▫Temporary ▫Fast ▫Large ▫Visible to all nodes in a cluster Archival (mass) storage ▫Permanent (can be replicated) ▫Slow ▫Quite large ▫Visible to all TeraGrid sites A complete list of the filesystems available at each site can be found here:

34 Storage Options: GPFS-WAN Project Space If you have a lot of data that needs to be analyzed, you can request project space on GPFS-WAN (Global Parallel File System-Wide Area Network) ▫Total size: 475 TB (quotas based on request) ▫Not purged, but also not backed up ▫Data and directories will be removed one month after the assigned TeraGrid project allocation expires ▫Mounted on IU’s Big Red site More info: ▫

35 Storage Options: Indiana University HPSS Archive Default quota: 5TB (but you can ask for more) Fast access: 100’s MB/sec ▫Recommend GridFTP clients Two copies stored on two separate sites Use if you have: ▫Files of at least 1MB (single file can be up to 10TB) that are rarely updated and need to be kept a long time ▫Files that are read often (frequently accessed files tend to stay on disk cache) Not good for: ▫Small files ▫Files that will frequently change Should ask for allocation on Big Red More info: ▫ tg10-archival-storage.pdf
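
Since the archive handles small files poorly, a common workaround is to bundle a directory of small files into one tar file before sending it. Here is a minimal sketch of that idea; the helper name bundle_for_archive is made up, and the 1 MB threshold simply mirrors the guideline on this slide.

```shell
#!/bin/bash
# Bundle a directory into a single tar file and report whether the result
# meets the 1 MB minimum suggested for archive storage.
# Arguments: directory to bundle, output tar file.
# (bundle_for_archive is a hypothetical helper.)
bundle_for_archive() {
    local dir=$1 out=$2 min_bytes=$((1024 * 1024)) size
    tar -cf "$out" -C "$(dirname "$dir")" "$(basename "$dir")" || return 1
    # Arithmetic expansion strips any padding that wc adds on some systems.
    size=$(( $(wc -c < "$out") ))
    if [ "$size" -lt "$min_bytes" ]; then
        echo "warning: $out is only $size bytes (under 1 MB)"
    else
        echo "ok: $out is $size bytes"
    fi
}

# Example (paths are placeholders):
#   bundle_for_archive "$HOME/results/run42" "$HOME/run42.tar"
```

The resulting tar file can then be transferred with a GridFTP client like any other large file.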

36 Storage Options: NCSA Tape Archive OR Can login using ssh/gsissh Two copies of each file are made without requiring user interaction Directories: ▫Home directory  Quota: 1 TB (cannot request more) ▫Project directories (named with three letter NCSA PSN)  Quotas:  Startup: 1TB  TRAC: 5 TB  Can ask for supplement through POPS system  PI is owner of project folder; user subdirectories created under project space  Soft links in the user's home directory to their subdirectory in projects area

37 “I see nobody on the road,” said Alice. “I only wish I had such eyes,” the King remarked in a fretful tone. “To be able to see Nobody! And at such a distance too!” -- Lewis Carroll

38 TACC Longhorn Visualization Portal TACC's Dell XD Visualization Cluster ▫2048 compute cores ▫14.5 TB aggregate memory ▫512 GPUs ▫QDR InfiniBand interconnect ▫Connected by 10GigE to Ranger's Lustre parallel file system Select session type: ▫VNC (need to create password) ▫EnVision Select number of nodes (1 to 16, which translates to 8-128 processors, respectively)
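
The node-to-processor mapping on the portal works out to 8 processors per node (1 node gives 8, 16 nodes give 128). A small sketch of that arithmetic, with the portal's 1 to 16 node limit enforced; the function name longhorn_procs is made up for illustration.

```shell
#!/bin/bash
# Number of processors for a given node count on the Longhorn portal,
# assuming 8 processors per node and the portal's 1-16 node range.
# (longhorn_procs is a hypothetical helper.)
longhorn_procs() {
    local nodes=$1
    if [ "$nodes" -lt 1 ] || [ "$nodes" -gt 16 ]; then
        echo "nodes must be between 1 and 16" >&2
        return 1
    fi
    echo $((nodes * 8))
}

longhorn_procs 4   # prints 32
```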

39 ParaView Free to download Can be run on your desktop/laptop, but also can take advantage of HPC resources Already installed on TACC Longhorn Among others, loads the following file formats: ▫*.cube files (Gaussian) ▫*.vtk files (LAMMPS, with a little work) Lots of rendering and visualization options Tutorial (for version 3.8): ▫

40 ParaView for Gaussian

41 ParaView for LAMMPS
Your LAMMPS job must create a dump file
▫dump 1 peptide atom 10 dump.peptide
Download the script:
▫
Use script to convert to a series of VTK files (creates peptide0000.vtk, peptide0001.vtk, etc.):
▫python -i
▫> d = dump("dump.peptide")
▫> v = vtk(d)
▫> v.many("peptide")
Load all of the files into ParaView

42 ParaView for LAMMPS

43 ParaView on TACC Longhorn Assuming you are using the Visualization Portal with a VNC session… ▫In one xterm:  module load paraview  vglrun paraview ▫In the other:  module load paraview  env NO_HOSTSORT=1 ibrun tacc_xrun pvserver ▫WARNING: You can minimize the windows that open, but do NOT close them! ▫In the ParaView GUI window:  File  Connect  Add Server  Enter a name  Configure  Under “Startup Type,” select “Manual”  Select the name of your server configuration, and click "Connect"  In the xterm where you launched ParaView server, you should see "Client connected." ▫To increase the image quality, go to Options on the Portal page, and select a higher number for the JPEG image quality. More info:

44 Longhorn Visualization Portal: Using a VNC client Although you can connect to a Longhorn VNC session through a browser, the remote screen usually doesn’t display properly (cut off, and you can’t scroll). However, when you start a job, you are given an address to connect to Longhorn through a VNC client you can run locally. ▫See the “Jobs” tab in the Longhorn portal Some example VNC clients: ▫Windows/Linux:  TightVNC  TurboVNC  UltraVNC ▫Mac:  Chicken of the VNC WARNING! You MUST kill the job in the portal. Exiting the VNC client will NOT end the job!

45 TACC Portal Demo Log into the Longhorn Visualization Portal. Start a VNC session job. Get the VNC address. Connect using TightVNC. Start and run ParaView.

46 “A mind once stretched by a new idea never regains its original dimensions.” -- Anonymous

47 Educational Allocations If you are teaching a class and would like to use TeraGrid resources in the classroom, you can! Apart from the regular Startup and Research allocations, there is an “Educational” allocation option as well. ▫It is recommended that you request it fairly early, before the semester starts. ▫Once you have your class roster, you can add your students to the account (they should have their login information within one to two weeks). For more information: ▫

48 “Live as if you were to die tomorrow. Learn as if you were to live forever.” -- Gandhi “I cannot teach anybody anything, I can only make them think.” -- Socrates

49 Training and Help CI-Train Project ▫ TeraGrid @ Binghamton webpage (FAQ and assorted documentation): ▫ The Campus Champions (us!) ▫ GRID-L ▫Binghamton listserv for those interested in grid and parallel computing ▫Send email to LISTSERV@LISTSERV.BINGHAMTON.EDU with body text:  SUBSCRIBE GRID-L Firstname Lastname

50 “The important thing is not to stop questioning. Curiosity has its own reason for existing.” -- Albert Einstein

51 Appendix A: TeraGrid Client Toolkit Installation Problems If you run into problems, you may have to trick the thing into installing: ▫export TERAGRID_CLIENT_PLATFORM=linux-rhel-4 ▫export VDT_ALLOW_UNSUPPORTED=1 Replace linux-rhel-4 with the operating system nearest yours. ▫See for the list. Then, rerun the installation script. If you still have problems, check the install.log and contact us.

52 Appendix B: globus-url-copy Parameters: ▫-vb  display bytes transferred and average performance ▫-fast  Recommended when using GridFTP servers. Use MODE E for all data transfers, including reusing data channels between list and transfer operations. ▫-stripe  enable striped transfers on supported servers ▫-tcp-bs  specifies the size (in bytes) of the TCP buffer to be used by the underlying ftp data channels  8M is a good value, although technically the best is determined by:  bandwidth in Megabits per second (Mbs) * RTT in milliseconds (ms) * 1000 / 8  RTT = roundtrip time it takes a packet to get from the source to the destination (use ping to determine this) ▫-p  specifies the number of parallel data connections that should be used
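
The buffer-size rule above is simple enough to compute directly. A minimal sketch of that formula in integer shell arithmetic follows; the function name tcp_buffer_bytes is made up, and the inputs should be rounded to whole numbers because ping reports fractional milliseconds.

```shell
#!/bin/bash
# TCP buffer size in bytes from the rule:
#   bandwidth (Mb/s) * RTT (ms) * 1000 / 8
# Arguments: bandwidth in megabits per second, round-trip time in milliseconds.
# (tcp_buffer_bytes is a hypothetical helper; integer inputs only.)
tcp_buffer_bytes() {
    local mbps=$1 rtt_ms=$2
    echo $((mbps * rtt_ms * 1000 / 8))
}

tcp_buffer_bytes 1000 64   # 1 Gb/s link, 64 ms round trip: prints 8000000
```

For a 1 Gb/s path with a 64 ms round trip this gives 8,000,000 bytes, which is consistent with the 8M value suggested as a good default.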

53 Appendix C: Transferring to/from HPSS
globus-url-copy
▫4-way transfer (port 2814):
 globus-url-copy -vb -fast -stripe -tcp-bs 8M -p 4 file://$POPLE_HOME/gaussian/water2.cube gsi
▫8-way transfer (port 2818):
 globus-url-copy -vb -fast -stripe -tcp-bs 8M -p 8 file://$POPLE_HOME/gaussian/water2.cube gsi mreale/water.cube file://$POPLE_HOME/gaussian/water2.cube
Uberftp
▫uberftp> open -P 2814
▫uberftp> lopen
▫uberftp> parallel 4
▫uberftp> blksize 8388608
▫uberftp> put bigfile.tar "bigfile.tar,,4"

54 Appendix D: NCSA Tape Archive
Check quotas/usage: mssquota
Quota Information: mreale
Home Directory            Usage       Quota       % Used Date Updated
------------------------- ----------- ----------- ------ -------------------
/UROOT/u/ac/mreale        0.04 KB     1.00 TB     1.00   2010-09-07 01:51:19
Note: Project area information shown once there’s something in there.
Transfer files:
▫globus-url-copy file:///absolute/path/tofile gsi
▫uberftp
