1 Masaki Hirabaru, IPSJ Shikoku Chapter Lecture (情報処理学会四国支部講演会), December 17, 2004: Long-Distance High-Performance Data Transfer over the Internet

2 Data Transfer, Then and Now
Packet transfer delay = size / link speed + distance / speed of light
Packet size = 1500 bytes, distance = 10,000 km
1) Link speed = 9600 bps: delay = 156 ms + 33 ms
2) Link speed = 1 Gbps: delay = 15 µs + 33 ms
The network bandwidth has grown enormously, yet throughput does not follow!
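A quick way to see the shift is to compute both terms of the formula. Below is a minimal Python sketch with the slide's values; note that with the size counted as 1500 bytes the 9600 bps serialization term comes out to 1.25 s, so the slide's 156 ms figure corresponds to counting 1500 bits.

    # Packet transfer delay = serialization (size/rate) + propagation (distance/c).
    C = 3.0e8              # speed of light [m/s], as on the slide
    SIZE_BITS = 1500 * 8   # packet size: 1500 bytes
    DIST_M = 10_000e3      # distance: 10,000 km

    def packet_delay(rate_bps):
        """Return (serialization, propagation) delay in seconds."""
        return SIZE_BITS / rate_bps, DIST_M / C

    for rate in (9600, 1e9):
        ser, prop = packet_delay(rate)
        print("%12.0f bps: %10.3f ms + %.1f ms" % (rate, ser * 1e3, prop * 1e3))

Either way, once the link is fast the propagation term dominates, so high throughput requires keeping the long pipe full.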

3 Radio Telescopes (photos): NICT Kashima Space Center, Onsala Space Observatory, Urumqi; antenna diameters shown: 20m, 34m, 18m, 25m

4 VLBI* System Transitions
1st Generation (1983~): open-reel tape, hardware correlator; K3 correlator (center) and K3 recorder (right); 64 Mbps
2nd Generation (1990~): cassette tape, hardware correlator, e-VLBI over ATM; K4 terminal and K4 correlator; 256 Mbps
3rd Generation (2002~): PC-based system, hard-disk storage, software correlator, e-VLBI over the Internet; K5 data acquisition terminal; 1~2 Gbps
* Very Long Baseline Interferometry (超長基線電波干渉計)

5 VLBI (Very Long Baseline Interferometry): a radio signal from a star reaches antennas A and B with a relative delay d; each antenna digitizes the signal (A/D against a reference clock) and feeds it to a correlator. (Figure: baseline A-B geometry with delay d.)
e-VLBI: geographically distributed observation, interconnecting radio antennas over the world via the Internet, serving astronomy (gigabit / real-time VLBI at ~Gbps, multi-gigabit-rate sampling) and geodesy. It is a Large Bandwidth-Delay Product Network issue.

6 Motivations
MIT Haystack – CRL Kashima e-VLBI experiment on August 27, 2003, to measure UT1-UTC within 24 hours:
– 41.54 GB CRL => MIT at 107 Mbps (~50 mins); 41.54 GB MIT => CRL at 44.6 Mbps (~120 mins)
– RTT ~220 ms; UDP throughput 300-400 Mbps, yet TCP only ~6-8 Mbps (per session, tuned)
– BBFTP with 5 x 10 TCP sessions was used to gain performance
– Result: UT1-UTC = -32338.7280 +/- 23.90 usec
HUT – CRL Kashima Gigabit VLBI experiment:
– RTT ~325 ms; UDP throughput ~70 Mbps, yet TCP only ~2 Mbps (as is), ~10 Mbps (tuned)
– NetAnts (5 TCP sessions with an FTP stream-restart extension)
These applications need high-speed, real-time, reliable transfer of huge data volumes over long-haul paths.

7 TCP Dynamic Behavior (figure: sending rate vs. time, showing the exponential slow-start phase, the congestion-avoidance sawtooth, and the available-bandwidth ceiling)
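A toy per-RTT model makes the sawtooth concrete. This is a sketch with illustrative constants, not the actual Linux implementation:

    # Toy TCP congestion control: exponential slow start, then AIMD.
    ssthresh = 64      # slow-start threshold [segments], illustrative
    capacity = 1667    # bottleneck capacity [segments per RTT], illustrative
    cwnd, samples = 1.0, []
    for rtt in range(5000):
        if cwnd < ssthresh:
            cwnd *= 2                  # slow start: double per RTT
        else:
            cwnd += 1                  # congestion avoidance: +1 segment per RTT
        if cwnd > capacity:            # queue overflows at the bottleneck
            cwnd /= 2                  # loss -> multiplicative decrease
        samples.append(min(cwnd, capacity))
    print("average utilization: %.0f%%" % (100 * sum(samples) / len(samples) / capacity))

With a long RTT each additive-increase step takes 200 ms, so recovering from a single halving takes hundreds of seconds, which is why average utilization stays well below the link rate.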

8 Example: From Tokyo to Boston
TCP on a fast long path with a bottleneck: Tokyo sender – Los Angeles – Boston receiver, with one-way delays of 50 ms and 100 ms on the two legs; bandwidths 1G and 0.8G, bottleneck buffer 25 MB. The sender's rate control depends on loss-detection feedback from the receiver: it takes 150 ms to learn of a loss (buffer overflow), and the queue keeps overflowing during that period. 150 ms is very long for a high-speed network: at 1 Gbps it corresponds to ~19 MB on the wire.
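The ~19 MB figure is simply rate times reaction delay; a one-line check in Python (decimal megabytes):

    rate_bps = 1e9    # sending rate: 1 Gbps
    delay_s = 0.150   # time until the sender learns of the loss
    print("%.1f MB on the wire" % (rate_bps * delay_s / 8 / 1e6))  # -> 18.8 MB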

9 Conditions
– a single TCP stream (not multiple streams)
– memory to memory (no disk access)
– a single bottleneck
– keep the end-to-end principle (no relays)
– a packet-switched network (scalable)
Target: consume all the available bandwidth

10 Example: How much speed can we get? (Figures a-1 and a-2: a GbE sender crossing a high-speed backbone to a receiver behind a 100M link, with L2/L3 switches at the edges; RTT 200 ms in both cases.)

11 Average TCP throughput: less than 20 Mbps, even when the sending rate is limited to 100 Mbps. This is TCP's fundamental behavior.
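One way to see why a long-RTT path performs so poorly is the Mathis et al. approximation, throughput ≈ (MSS/RTT) x 1.22/sqrt(p). The sketch below uses illustrative loss rates; the slide's <20 Mbps figure depends on the actual loss pattern:

    from math import sqrt

    MSS_BITS = 1500 * 8   # segment size
    RTT = 0.2             # 200 ms path

    def mathis_bps(loss_rate):
        """Steady-state Reno throughput estimate (Mathis et al., 1997)."""
        return (MSS_BITS / RTT) * 1.22 / sqrt(loss_rate)

    for p in (1e-3, 1e-4, 1e-6):
        print("loss %g: %6.1f Mbps" % (p, mathis_bps(p) / 1e6))
    # Even a 0.01% loss rate caps a 200 ms path near 7 Mbps.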

12 1st Step: Tuning a Host with UDP
Possible bottlenecks: CPU, I/O, NIC, memory, PCI/PCI-X bus, disk, driver buffer, interrupt coalescing/delay, MTU, etc.
Iperf theoretical UDP throughput: 957 Mbps (IPv4)
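The 957 Mbps figure follows from framing overhead on gigabit Ethernet with a 1500-byte IP MTU; a sketch of the arithmetic:

    MTU = 1500                      # IP MTU [bytes]
    UDP_PAYLOAD = MTU - 20 - 8      # minus IPv4 and UDP headers -> 1472 bytes
    ETH_OVERHEAD = 14 + 4 + 8 + 12  # Ethernet header + FCS + preamble + inter-frame gap
    ON_WIRE = MTU + ETH_OVERHEAD    # 1538 bytes occupy the wire per datagram
    print("%.0f Mbps" % (1000.0 * UDP_PAYLOAD / ON_WIRE))  # -> 957 Mbps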

13 2nd Step: Tuning a Host with TCP
Linux 2.4.26 (RedHat 9) with Web100
Maximum socket buffer size (TCP window size)
– net.core.wmem_max, net.core.rmem_max (64MB)
– net.ipv4.tcp_wmem, net.ipv4.tcp_rmem (64MB)
Driver descriptor length
– e1000: TxDescriptors=1024, RxDescriptors=256 (default)
Interface queue length
– txqueuelen=100 (default)
– net.core.netdev_max_backlog=300 (default)
Interface queue discipline
– fifo (default)
MTU
– mtu=1500 (IP MTU)
Web100 (incl. HighSpeed TCP)
– net.ipv4.web100_no_metric_save=1 (do not store TCP metrics in the route cache)
– net.ipv4.WAD_IFQ=1 (do not send a congestion signal on buffer full)
– net.ipv4.web100_rbufmode=0, net.ipv4.web100_sbufmode=0 (disable auto-tuning)
– net.ipv4.WAD_FloydAIMD=1 (HighSpeed TCP)
– net.ipv4.web100_default_wscale=7 (default)
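These kernel limits matter because a single connection must hold a full bandwidth-delay product in flight. An application can then request matching buffers through the standard socket API; a minimal sketch (the kernel clips the request to the net.core.*mem_max limits above):

    import socket

    # BDP for a 1 Gbps, 200 ms RTT path: the window needed to fill the pipe.
    bdp_bytes = int(1e9 * 0.2 / 8)   # 25 MB

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set before connect()/listen() so TCP can negotiate a large enough window.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))  # Linux reports ~2x the request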

14 Network Diagram for TransPAC/I2 Measurement (Oct. 2003)
Japan/Korea: Kashima (0.1G) – Tokyo XP; Koganei (1G) and Kitakyushu via QGPOP; Fukuoka – Genkai XP; KOREN 2.5G SONET (Seoul XP, Daejon, Taegu, Busan, Kwangju) over APII/JGN; in-country distances 100-1,000 km.
Trans-Pacific and US: TransPAC 2.5G, ~9,000 km to Los Angeles; Abilene 10G via Chicago (~4,000 km) to Indianapolis (I2 venue, 1G) and New York – MIT Haystack (0.1G).
Europe: GEANT 2.4G – NORDUnet – Funet 0.6G – Helsinki / HUT, ~7,000 km.
Measurement hosts: servers (general and e-VLBI); Abilene Observatory: servers at each NOC; CMM: common measurement machines.
Sender: Mark5, Linux 2.4.7 (RH 7.1), P3 1.3 GHz, 256 MB memory, GbE SK-9843. Receiver: PE1650, Linux 2.4.22 (RH 9), Xeon 1.4 GHz, 1 GB memory, GbE Intel Pro/1000 XT. Iperf UDP: ~900 Mbps (no loss).

15 TransPAC/I2 #1: Reno (Window 64MB)

16 Analyzing Advanced TCP Dynamic Behavior in a Real Network (example: from Tokyo to Indianapolis at 1 Gbps with HighSpeed TCP). The data was obtained during the e-VLBI demonstration at the Internet2 Member Meeting in October 2003.

17 Replaying in a Laboratory: Evaluation of Advanced TCPs
Setup: Linux 2.4 sender (GbE) → ENP2611 network-processor emulator → Linux TCP receiver (GbE); RTT 200 ms (100 ms one-way); only 800 Mbps available. (Photo: ENP2611 board.)

18 Test Result #1 (bottleneck queue size 100 packets): TCP NewReno (Linux) vs. HighSpeed TCP (Web100)

19 Example of Advanced TCPs with different bottleneck queue sizes (plots: BIC TCP and FAST TCP, each at queue sizes of 100 and 1000 packets)
BIC TCP: http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/
FAST TCP: a delay-based TCP built on TCP Vegas, http://netlab.caltech.edu/FAST/

20 Measuring Bottleneck Queue Sizes
Typical bottleneck cases (figures a, b): a router or switch feeding a slower link, e.g. 1 Gbps into 100 Mbps.
Method: the sender emits a packet train of capacity C through the bottleneck toward the receiver; from the lost packet and the measured packets,
Queue Size = C x (Delay_max - Delay_min)
Switch/Router Queue Size Measurement Results:
Device     Queuing Delay (us)   Capacity (Mbps)   Estimated Queue Size (1500B)
Switch A         6161               100*                  50
Switch B        22168               100*                 180
Switch C        20847               100*                 169
Switch D          738              1000                   60
Switch E         3662              1000                  298
Router F       148463              1000                12081
Router G       188627              1000                15350
* set to 100M for measurement
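The "Estimated Queue Size" column follows directly from the formula; a sketch reproducing two rows (small differences from the table come from the measured minimum delay, taken as zero here):

    PKT_BITS = 1500 * 8   # queue size expressed in 1500-byte packets

    def queue_packets(capacity_mbps, delay_max_us, delay_min_us=0.0):
        """Queue size = C x (delay_max - delay_min), in packets.
        Mbps x us = bits, since the 1e6 and 1e-6 factors cancel."""
        return capacity_mbps * (delay_max_us - delay_min_us) / PKT_BITS

    print(round(queue_packets(100, 6161)))      # Switch A -> ~51 (table: 50)
    print(round(queue_packets(1000, 148463)))   # Router F -> ~12372 (table: 12081)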

21 Experiment for High-Performance Scientific Data Transfer
Japan/Korea: Kashima 1G (10G) – Tokyo XP / JGN II I-NOC (10G); Koganei 1G (10G); Fukuoka – Genkai XP – Kitakyushu; KOREN 2.5G SONET (Seoul XP, Daejon, Taegu, Busan, Kwangju) over APII/JGNII; JGN II 10G; U of Tokyo 1G.
US and Europe: TransPAC ~9,000 km to Los Angeles; Abilene (Chicago, Indianapolis, Pittsburgh, Washington DC, ~4,000 km); MIT Haystack 2.4G; GEANT – SWITCH, ~7,000 km.
Servers at measurement points: bwctl, perf, and e-VLBI servers.
*Performance Measurement Point Directory: http://e2epi.internet2.edu/pipes/pmp/pmp-dir.html
A BWCTL account is available for the CMMs, including for Korean researchers. This is an international collaboration in support of science applications.

22 VLBI Antenna Locations in North-East Asia
Japan: Shintotsukawa 3.8m; Tomakomai 11m, FTTH (100M), 70 km from Sapporo; Mizusawa 10m/20m, 118 km from Sendai; Tsukuba 32m, OC48/ATMx2 SuperSINET; Kashima 34m, 1Gx2 JGN, OC48/ATM Galaxy; Koganei 34m, 1Gx2 JGN, OC48/ATM Galaxy; Usuda 64m, OC48/ATM Galaxy; Nobeyama 45m, OC48/ATM Galaxy; Gifu 11m/3m, OC48/ATMx2 SuperSINET; Yamaguchi 32m, 1G, 75M SINET; Ishigaki 20m; Ogasawara 20m; Chichijima 10m; Iriki 20m; Kagoshima 6m; Aira 10m.
China: Nanshan (Urumqi) 25m, 70 km from Urumqi; Miyun (Beijing) 50m, 50 km from Beijing, 2 Mbps; Yunnan (Kunming) 3m (40m), 10 km from Kunming; Sheshan (Shanghai) 25m, 30 km from Shanghai, observatory on CSTNET at 100M.
Korea: Jeju 20m, Tamna U; Seoul 20m, Yonsei U; Ulsan 20m, U Ulsan; Daejon 14m, Taeduk.
Legend: connected / not yet connected / antenna under construction.

23 Speed Races
Internet2 Land Speed Record
– single regular TCP stream, metric: speed x distance
– 6.57 Gbps by Caltech (7.21 Gbps by U Tokyo)
SC Bandwidth Challenge
– more than 100 Gbps
– AIST, U Tokyo, JAXA, …

24 Summary and Future Work
High-performance scientific data transfer faces network issues that we still need to work out.
Big-science applications such as e-VLBI and high-energy physics need cooperation with network researchers.
Deployment of a performance measurement infrastructure is ongoing on a worldwide basis.

