
1 UltraLight: Network & Applications Research at UF
Dimitri Bourilkov, University of Florida
CISCO - UF Collaborative Team Meeting, Gainesville, FL, September 12, 2006

2 D.Bourilkov UltraLight2 Overview: an NSF Project

3 D.Bourilkov UltraLight3 The UltraLight Team
 Steering Group: H. Newman (Caltech, PI), P. Avery (U. Florida), J. Ibarra (FIU), S. McKee (U. Michigan)
 Project Management: Richard Cavanaugh (Project Coordinator), PI and Working Group Coordinators:
 Network Engineering: Shawn McKee (Michigan); + S. Ravot (LHCNet), R. Summerhill (Abilene/HOPI), D. Pokorney (FLR), J. Ibarra (WHREN, AW), C. Guok (ESnet), L. Cottrell (SLAC), D. Petravick, M. Crawford (FNAL), S. Bradley, J. Bigrow (BNL), et al.
 Applications Integration: Frank Van Lingen (Caltech); + I. Legrand (MonALISA), J. Bunn (GAE + TG), C. Steenberg, M. Thomas (GAE), Sanjay Ranka (Sphinx), et al.
 Physics Analysis User Group: Dimitri Bourilkov (UF; CAVES, CODESH)
 Network Research, WAN in Lab Liaison: Steven Low (Caltech)
 Education and Outreach: Laird Kramer (FIU), + H. Alvarez, J. Ibarra, H. Newman

4 D.Bourilkov UltraLight4 Large Hadron Collider, CERN, Geneva: 2007 Start
 pp collisions at √s = 14 TeV, L = 10^34 cm^-2 s^-1; 27 km tunnel in Switzerland & France
 Experiments: ATLAS and CMS (general purpose, pp and HI), ALICE (HI), LHCb (B-physics), TOTEM
 Physics: Higgs, SUSY, Extra Dimensions, CP Violation, QG Plasma, … the Unexpected
 5000+ physicists, 250+ institutes, 60+ countries
 Challenges: analyze petabytes of complex data cooperatively; harness global computing, data & NETWORK resources

5 D.Bourilkov UltraLight5 LHC Data Grid Hierarchy [tier diagram]
CERN/Outside ratio smaller; expanded role of Tier1s & Tier2s: greater reliance on networks
CERN/Outside resource ratio ~1:4; Tier0/(ΣTier1)/(ΣTier2) ~1:2:2
DISUN: 4 of 7 US CMS Tier2s shown, with ~8 MSi2k and 1.5 PB disk by 2007
>100 Tier2s at LHC; link speeds in the hierarchy range from 2.5 - 30 Gbps up to 10-40+ Gbps
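
The two ratios are consistent with each other; a one-line check (my own, not on the slide), assuming resources split 1:2:2 between Tier0, the Tier1 sum and the Tier2 sum:

awk 'BEGIN { t0 = 1; t1 = 2; t2 = 2;                           # Tier0 : sum(Tier1s) : sum(Tier2s) ~ 1:2:2
             printf "CERN : Outside = 1 : %g\n", (t1 + t2) / t0 }'   # prints 1 : 4, the quoted resource ratio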

6 D.Bourilkov UltraLight6 Tier-2s ~100 Identified – Number still growing

7 D.Bourilkov UltraLight7 HENP Bandwidth Roadmap for Major Links (in Gbps) Continuing Trend: ~1000 Times Bandwidth Growth Per Decade; HEP: Co-Developer as well as Application Driver of Global Nets
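
~1000 times per decade amounts to roughly a doubling of usable bandwidth every year; a one-line check (my own, not on the slide):

awk 'BEGIN { printf "ten annual doublings: factor %d per decade\n", 2^10 }'   # ~1000x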

8 D.Bourilkov UltraLight8 Data Samples and Transport Scenarios

10^7 Event Samples   Data Volume (TBytes)   Transfer Time (hrs) @ 0.9 Gbps   @ 3 Gbps      @ 8 Gbps
AOD                  0.5 - 1                1.2 - 2.5                        0.37 - 0.74   0.14 - 0.28
RECO                 2.5 - 5                6 - 12                           1.8 - 3.7     0.69 - 1.4
RAW+RECO             17.5 - 21              43 - 86                          13 - 26       4.8 - 9.6
MC                   20                     98                               30            11

 10^7 events is a typical data sample for analysis or reconstruction development [Ref.: MONARC]; equivalent to just ~1 day's running
 Transporting datasets with quantifiable high performance is needed for efficient workflow, and thus efficient use of CPU and storage resources
 One can only transmit ~2 RAW+RECO or MC samples per day on a 10G path
 Movement of 10^8 event samples (e.g. after re-reconstruction) will take ~1 day (RECO) to ~1 week (RAW, MC) with a 10G link at high occupancy
 Transport of significant data samples will require one, or multiple, 10G links
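
The transfer times in the table follow directly from volume and line rate (hours = TBytes x 8e12 / rate / 3600). A one-line check of the RECO row, not part of the original slide and ignoring protocol overhead:

awk 'BEGIN { tb = 5; gbps = 3;                          # RECO sample: 5 TBytes over a 3 Gb/s path
             hrs = tb * 8e12 / (gbps * 1e9) / 3600;     # TBytes -> bits, divide by rate, seconds -> hours
             printf "%.1f hours\n", hrs }'              # prints 3.7, the upper RECO entry at 3 Gbps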

9 D.Bourilkov UltraLight9 UltraLight Goals
Goal: Enable the network as an integrated managed resource
Meta-Goal: Enable physics analysis & discoveries which otherwise could not be achieved
Caltech, Florida, Michigan, FNAL, SLAC, CERN, BNL, Internet2/HOPI; UERJ (Rio), USP (Sao Paulo), FIU, KNU (Korea), KEK (Japan), TIFR (India), PERN (Pakistan)
NLR, ESnet, CENIC, FLR, MiLR, US Net, Abilene, JGN2, GLORIAD, RNP, CA*net4; UKLight, NetherLight, Taiwan
Cisco, Neterion, Sun, …
Next generation information system, with the network as an integrated, actively managed subsystem in a global Grid
Hybrid network infrastructure: packet-switched + dynamic optical paths
End-to-end monitoring; real-time tracking and optimization
Dynamic bandwidth provisioning; agent-based services spanning all layers

10 D.Bourilkov UltraLight10 Large Scale Data Transfers
Network aspect: the Bandwidth*Delay Product (BDP); we have to use TCP windows matching it in the kernel AND the application
On a local connection with 1 GbE and RTT 0.19 ms, to fill the pipe we need around 2*BDP:
2*BDP = 2 * 1 Gb/s * 0.00019 s = ~48 KBytes
Or, for a 10 Gb/s LAN: 2*BDP = ~480 KBytes
Now on the WAN: from Florida to Caltech the RTT is 115 ms, so to fill the pipe at 1 Gb/s we need 2*BDP = 2 * 1 Gb/s * 0.115 s = ~28.8 MBytes, etc.
User aspect: are the servers on both ends capable of matching these rates for useful disk-to-disk transfers? Tune the kernels, get the highest possible disk read/write speed, etc.
Tables turned: the WAN outperforms disk speeds!
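
The window sizes quoted above can be reproduced with two one-liners (my own sketch, not from the slide; the rates and RTTs are the ones quoted in the text):

awk 'BEGIN { printf "LAN 1 GbE, RTT 0.19 ms: 2*BDP = %.0f KBytes\n", 2 * 1e9 * 0.00019 / 8 / 1e3 }'   # ~48 KBytes
awk 'BEGIN { printf "WAN 1 Gb/s, RTT 115 ms: 2*BDP = %.1f MBytes\n", 2 * 1e9 * 0.115 / 8 / 1e6 }'     # ~28.8 MBytes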

11 D.Bourilkov UltraLight11 bbcp Tests
bbcp was selected as the starting tool for data transfers on the WAN:
Supports multiple streams, highly tunable (window size etc.), peer-to-peer type
Well supported by Andy Hanushevsky from SLAC
Used successfully in BaBar
I used it in 2002 for CMS production: massive data transfers from Florida to CERN; the only limits observed at the time were disk writing speed (LAN) and the network (WAN)
Starting point Florida → Caltech: < 0.5 MB/s on the WAN, very poor performance
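
For reference, this is the kind of invocation used in the later examples (slides 13-14); the flag descriptions in the comments are my reading of the options actually shown there, not text from the slides:

# -s 8   use 8 parallel TCP streams
# -w 10m request a 10 MByte TCP window (matched against the kernel limits)
# -P 10 -V -f   progress reports, verbose output, overwrite the target if it exists
bbcp -s 8 -f -V -P 10 -w 10m big2.root uldemo@192.84.86.66:dimitri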

12 D.Bourilkov UltraLight12 Evolution of Tests Leading to SC|05
End points in Florida (uflight1) and Caltech (nw1): AMD Opterons over the UltraLight network
Tuning of Linux kernels (2.6.x) and bbcp window sizes: a coordinated, iterative procedure (an illustrative tuning sketch follows below)
Current status (for file sizes ~2 GB):
6-6.5 Gb/s with iperf
up to 6 Gb/s memory to memory
2.2 Gb/s ramdisk → remote disk write
> the speed was the same writing to a SCSI disk (supposedly capable of less than 80 MB/s) or to a RAID array, so de facto the data always goes first to the memory cache (the Caltech node has 16 GB RAM)
Used successfully with up to 8 bbcp processes in parallel from Florida to the show floor in Seattle; CPU load still OK
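
The slide does not list the exact kernel settings used on uflight1 and nw1; the sketch below only illustrates the kind of 2.6-kernel socket-buffer tuning involved, with placeholder limits sized generously above the ~29 MByte 2*BDP computed earlier:

# Illustrative values only, not the ones used in the tests: raise the socket-buffer
# ceilings so that an application window of tens of MBytes (bbcp -w, iperf -w) can be granted.
sysctl -w net.core.rmem_max=67108864                 # max receive buffer (64 MB)
sysctl -w net.core.wmem_max=67108864                 # max send buffer (64 MB)
sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"    # min / default / max TCP receive window
sysctl -w net.ipv4.tcp_wmem="4096 65536 67108864"    # min / default / max TCP send window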

13 D.Bourilkov UltraLight13 bbcp Examples Florida → Caltech

[bourilkov@uflight1 data]$ iperf -i 5 -c 192.84.86.66 -t 60
------------------------------------------------------------
Client connecting to 192.84.86.66, TCP port 5001
TCP window size: 256 MByte (default)
------------------------------------------------------------
[ 3] local 192.84.86.179 port 33221 connected with 192.84.86.66 port 5001
[ 3]  0.0- 5.0 sec  2.73 GBytes  4.68 Gbits/sec
[ 3]  5.0-10.0 sec  3.73 GBytes  6.41 Gbits/sec
[ 3] 10.0-15.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 3] 15.0-20.0 sec  3.73 GBytes  6.40 Gbits/sec

bbcp -s 8 -f -V -P 10 -w 10m big2.root uldemo@192.84.86.66:/dev/null
bbcp: uflight1.ultralight.org kernel using a send window size of 20971584 not 10485792
bbcp: Sink I/O buffers (245760K) > 25% of available free memory (231836K); copy may be slow
bbcp: Creating /dev/null/big2.root
Source cpu=5.654 mem=0K pflt=0 swap=0
File /dev/null/big2.root created; 1826311140 bytes at 432995.1 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=3.768 mem=0K pflt=0 swap=0
1 file copied at effectively 260594.2 KB/s

bbcp -s 8 -f -V -P 10 -w 10m big2.root uldemo@192.84.86.66:dimitri
bbcp: uflight1.ultralight.org kernel using a send window size of 20971584 not 10485792
bbcp: Creating ./dimitri/big2.root
Source cpu=5.455 mem=0K pflt=0 swap=0
File ./dimitri/big2.root created; 1826311140 bytes at 279678.1 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=10.065 mem=0K pflt=0 swap=0
1 file copied at effectively 150063.7 KB/s

14 D.Bourilkov UltraLight14 bbcp Examples Caltech → Florida

[uldemo@nw1 dimitri]$ iperf -s -w 256m -i 5 -p 5001 -l 8960
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 512 MByte (WARNING: requested 256 MByte)
------------------------------------------------------------
[ 4] local 192.84.86.66 port 5001 connected with 192.84.86.179 port 33221
[ 4]  0.0- 5.0 sec  2.72 GBytes  4.68 Gbits/sec
[ 4]  5.0-10.0 sec  3.73 GBytes  6.41 Gbits/sec
[ 4] 10.0-15.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 4] 15.0-20.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 4] 20.0-25.0 sec  3.73 GBytes  6.40 Gbits/sec

bbcp -s 8 -f -V -P 10 -w 10m big2.root uldemo@192.84.86.179:/dev/null
bbcp: Sink I/O buffers (245760K) > 25% of available free memory (853312K); copy may be slow
bbcp: Source I/O buffers (245760K) > 25% of available free memory (839628K); copy may be slow
bbcp: nw1.caltech.edu kernel using a send window size of 20971584 not 10485792
bbcp: Creating /dev/null/big2.root
Source cpu=5.962 mem=0K pflt=0 swap=0
File /dev/null/big2.root created; 1826311140 bytes at 470086.2 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=4.053 mem=0K pflt=0 swap=0
1 file copied at effectively 263793.4 KB/s

15 D.Bourilkov UltraLight15 SuperComputing 05 Bandwidth Challenge: 475 TBytes transported in < 24 h; above 100 Gbps for hours
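
For scale (a check of my own, not on the slide): moving 475 TBytes in just under 24 hours corresponds to a sustained average of roughly 44 Gb/s, consistent with the peaks above 100 Gbps:

awk 'BEGIN { tbytes = 475; hours = 24;               # 475 TBytes in just under 24 hours
             printf "sustained average >= %.0f Gb/s\n", tbytes * 8e12 / (hours * 3600) / 1e9 }'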

16 D.Bourilkov UltraLight16 Outlook
The UltraLight network is already very performant; SC|05 was a big success
The hard problem from the user perspective now is to match it with servers capable of sustained rates for large files > 20 GB (when the memory caches are exhausted); fast disk writes are key (RAID arrays)
To fill a 10 Gb/s pipe we need several pairs (3-4) of servers at the ~2-3 Gb/s per pair achieved so far
Next step: disk-to-disk transfers between Florida, Caltech, Michigan, FNAL, BNL and CERN; preparations for SC|06 (next talk)
More info: http://ultralight.caltech.edu

