Ultimate Integration Joseph Lappa Pittsburgh Supercomputing Center ESCC/Internet2 Joint Techs Workshop.


1 Ultimate Integration Joseph Lappa Pittsburgh Supercomputing Center ESCC/Internet2 Joint Techs Workshop

2 Agenda Supercomputing 2004 Conference Application –Ultimate Integration Resource Overview Did it work? What did we take from it?

3 Supercomputing 2004 Annual Conference –Supercomputers –Storage –Network hardware –Original reason for application Bandwidth Challenge –Didn’t apply due to time

4 Application Requirements Runs on Lemieux (PSC’s supercomputer) Application Gateways (AGW) Cisco CRS-1 –40Gb/sec OC-768 cards –Few exist Single application Be used with another demo on the show floor if possible

5 Ultimate Integration Application Checkpoint Recovery System –Program: garden-variety Laplace solver instrumented to save its memory state in checkpoint files –Checkpoints memory to remote network clients –Runs on 34 Lemieux nodes
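To make the checkpoint/recovery idea on slide 5 concrete, here is a minimal sketch of a Jacobi-style Laplace solver that periodically saves its state and can resume from the last checkpoint. It is not the PSC code: the grid size, the checkpoint interval, the file name, and the use of `pickle` on a local file (rather than streaming memory to remote network clients) are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


def make_grid(n=8):
    """Square grid with a hot top edge (100.0) and cold everywhere else."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [100.0] * n
    return grid


def jacobi_step(grid):
    """One Jacobi sweep: each interior cell becomes the mean of its 4 neighbours."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def save_checkpoint(path, step, grid):
    """Write state to a temp file, then rename, so a crash never leaves a torn file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "grid": grid}, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return (step, grid) from the last completed checkpoint."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    return state["step"], state["grid"]


def solve(grid, total_steps, path, every=10, start_step=0):
    """Iterate from start_step to total_steps, checkpointing every `every` steps."""
    for step in range(start_step + 1, total_steps + 1):
        grid = jacobi_step(grid)
        if step % every == 0:
            save_checkpoint(path, step, grid)
    return grid


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "laplace.ckpt")
    solve(make_grid(), 25, ckpt)            # "crashes" after step 25
    step, grid = load_checkpoint(ckpt)      # last checkpoint was taken at step 20
    grid = solve(grid, 40, ckpt, start_step=step)
    print("resumed from step", step, "and finished at step 40")
```

Because the update is deterministic, resuming from the step-20 checkpoint and running to step 40 yields the same grid as an uninterrupted 40-step run.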

6 Lemieux TCS System 750 Compaq Alphaserver ES45 nodes –SMP Four 1GHz Alpha Processors 4 GB of Memory Interconnection –Quadrics Cluster Interconnect Shared memory library

7 Application Gateways 750 GigE connections are very expensive Reuse Quadrics network to attach cheap Linux boxes with GigE –15 AGWs Single processor Xeons 1 Quadrics card 2 Intel GigE –Each GigE card maxes out at 990 Mb/sec –Only need 30 GigE to fill link to TeraGrid Web100 kernel
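The sizing argument on slide 7 is simple arithmetic, sketched below using only the figures quoted on the slide (15 gateways, 2 GigE cards each, 990 Mb/sec per card, 750 Lemieux nodes). The function and constant names are illustrative, not taken from any PSC tooling.

```python
AGW_COUNT = 15          # application gateways (slide 7)
GIGE_PER_AGW = 2        # Intel GigE cards per gateway
MBPS_PER_GIGE = 990     # observed ceiling per GigE card, in Mb/sec
LEMIEUX_NODES = 750     # nodes that would otherwise each need a GigE port


def aggregate_gbps(agws, ports_per_agw, mbps_per_port):
    """Total gateway egress capacity in Gb/sec."""
    return agws * ports_per_agw * mbps_per_port / 1000.0


ports = AGW_COUNT * GIGE_PER_AGW
capacity = aggregate_gbps(AGW_COUNT, GIGE_PER_AGW, MBPS_PER_GIGE)
print(f"{ports} GigE ports, about {capacity:.1f} Gb/sec aggregate")
print(f"{LEMIEUX_NODES // ports}x fewer GigE ports than wiring every node")
```

With these inputs the gateways expose 30 GigE ports and roughly 29.7 Gb/sec of aggregate capacity, against 750 ports for a GigE connection on every node.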

8 Application Gateways

9 Network Cisco 6509 –Sup720 –WS-X6748-SFP –Two WS-X6704-10GE Used 4 10GE interfaces OSPF load balancing was my real worry – >30 GE streams over 4 links
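The worry on slide 9 is that per-flow load balancing across equal-cost paths does not guarantee an even split: if too many ~990 Mb/sec streams hash onto the same 10GE link, that link saturates while others sit idle. The sketch below illustrates the effect with a generic stable hash over hypothetical flow tuples. The hash function, addresses, and ports are assumptions; the actual hashing used by the 6509 is not described in the slides.

```python
import hashlib
from collections import Counter

LINKS = 4                   # 10GE interfaces used (slide 9)
LINK_CAPACITY_MBPS = 10_000
STREAM_MBPS = 990           # per-GigE ceiling quoted on slide 7


def pick_link(flow, links=LINKS):
    """Map a flow tuple to a link index with a stable hash (stand-in for the router's)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % links


def link_loads(flows, links=LINKS):
    """Offered load per link in Mb/sec if every flow runs at STREAM_MBPS."""
    counts = Counter(pick_link(f, links) for f in flows)
    return [counts.get(i, 0) * STREAM_MBPS for i in range(links)]


# Hypothetical flows: (src address, dst address, src port, dst port, protocol)
flows = [(f"10.0.0.{i}", f"10.1.0.{i % 8}", 5001 + i, 5001, "tcp")
         for i in range(32)]

loads = link_loads(flows)
for i, mbps in enumerate(loads):
    flag = "OVERSUBSCRIBED" if mbps > LINK_CAPACITY_MBPS else "ok"
    print(f"link {i}: {mbps / 1000:.2f} Gb/sec offered ({flag})")
```

A perfectly even split of 32 streams would put 8 on each link (about 7.9 Gb/sec); any link that draws 11 or more streams exceeds 10 Gb/sec.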

10 Network Cisco CRS-1 –40 Gb/sec slot –16 slots –For Demo Two OC-768 cards –Ken Goodwin’s and Kevin McGratten’s big worry was the OC-768 transport Two 8 Port 10 GE cards –Running production IOS-XR code –Had problems with tracking hardware Ran both with 2 switching fabrics removed, with no effect on traffic

11 Network Cisco CRS-1 –One at Westinghouse Machine Room –One on show floor Fork lift needed to place it –7 feet tall –939 lbs empty –1657 lbs fully loaded

12 The Magic Box Stratalight – OTS 4040 transponder “compresses” the 40 Gb/s signal to fit into the spectral bandwidth of a traditional 10G wave –http://www.stratalight.com/ Uses proprietary encoding techniques The Stratalight transponder was connected to the Mux/Demux of the 15454 as an alien wavelength

13 Time Dependencies OC-768 wasn’t worked on until one week before the conference

14 OC-768

17 Where Does the Data Land? Lustre Filesystem –http://www.lustre.org/ Developed by Cluster File Systems –http://www.clusterfs.com/ POSIX compliant, Open Source, parallel file system Separates metadata and data objects to allow for speed and scaling

18 The Show Floor 8 Checkpoint Servers with 10GigE and InfiniBand connections 5 Lustre OSTs connected via InfiniBand with 2 SCSI disk shelves (RAID5) Lustre metadata server (MDS) connected via InfiniBand
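The split between metadata and data objects described on slides 17 and 18 can be sketched as follows: a metadata lookup returns a file's layout once, after which a client can compute which object storage target (OST) holds any byte and talk to that OST directly. This is a simplified round-robin striping model, not Lustre's implementation. The `Layout` class, the `locate` function, and the 1 MiB stripe size are assumptions; only the count of five OSTs comes from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """What a metadata server would return for a file: where its stripes live."""
    stripe_size: int    # bytes per stripe (assumed value)
    ost_ids: tuple      # OSTs holding this file's data objects, in stripe order


def locate(layout, offset):
    """Return (ost_id, offset_within_object) for a byte offset in the file."""
    stripe_index = offset // layout.stripe_size
    ost_slot = stripe_index % len(layout.ost_ids)       # round-robin over OSTs
    full_rounds = stripe_index // len(layout.ost_ids)   # earlier stripes on that OST
    object_offset = full_rounds * layout.stripe_size + offset % layout.stripe_size
    return layout.ost_ids[ost_slot], object_offset


# Five OSTs as on the show floor; 1 MiB stripes are an assumption.
layout = Layout(stripe_size=1 << 20, ost_ids=(0, 1, 2, 3, 4))

for offset in (0, 1 << 20, 5 * (1 << 20) + 7):
    ost, obj_off = locate(layout, offset)
    print(f"file offset {offset} -> OST {ost}, object offset {obj_off}")
```

Because consecutive stripes land on different OSTs, a large sequential write such as a checkpoint is spread across all five data servers in parallel, while the metadata server is consulted only for the layout.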

19 The Show Floor

20 The Demo

21 How well did it run? Laplace Solver w/ Checkpoint Recovery –Using 16 Application Gateways (32 GigE connections): 31.1 Gb/s Only 32 Lemieux nodes were available IPERF –Using 17 Application Gateways + 3 single GigE attached machines: 35 Gb/s Zero SONET errors reported on interface Over 44 TB were transferred
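The results on slide 21 can be put in per-connection terms with a short calculation. All inputs are figures quoted in the slides (31.1 and 35 Gb/s totals, gateway counts, the 990 Mb/sec GigE ceiling from slide 7, and the nominal 40 Gb/sec OC-768 rate from slide 4); the helper name and the dictionary layout are illustrative.

```python
GIGE_CEILING_MBPS = 990   # per-card ceiling quoted on slide 7
OC768_GBPS = 40           # nominal OC-768 rate quoted on slide 4


def per_connection_mbps(total_gbps, connections):
    """Average throughput per GigE connection in Mb/sec."""
    return total_gbps * 1000.0 / connections


# name -> (measured total in Gb/s, number of GigE connections)
runs = {
    "laplace": (31.1, 16 * 2),       # 16 gateways, 2 GigE each
    "iperf": (35.0, 17 * 2 + 3),     # 17 gateways plus 3 single-GigE machines
}

for name, (gbps, conns) in runs.items():
    each = per_connection_mbps(gbps, conns)
    print(f"{name}: {conns} connections, {each:.0f} Mb/sec each, "
          f"{100 * each / GIGE_CEILING_MBPS:.1f}% of GigE ceiling, "
          f"{100 * gbps / OC768_GBPS:.1f}% of nominal OC-768")
```

The Laplace run averages about 972 Mb/sec per connection, close to the 990 Mb/sec ceiling, so it was limited by the number of available GigE connections rather than by the OC-768 path.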

22 The Team

23 Just Demoware? AGWs –qsub command now has AGW option Can do accounting (and possibly billing) MySQL database with Web100 stats –Validated that AGW was a cost-effective solution OC-768 Metro can be done by mere mortals

24 Just Demoware?? Application receiver –Laplace solver ran at PSC –Checkpoint receiver program tested / run at both NCSA and SDSC Ten IA64 compute nodes as receiver ~10 Gb/sec Network to Network (/dev/null) –990 Mb/sec * 10 streams

25 Thank You

