Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Building a connection-oriented internet Outline –What are we doing? - cheetah –Research problems –Engineering problems –Why we are doing this? - vision/motivation.

Similar presentations


Presentation on theme: "1 Building a connection-oriented internet Outline –What are we doing? - cheetah –Research problems –Engineering problems –Why we are doing this? - vision/motivation."— Presentation transcript:

1 1 Building a connection-oriented internet Outline –What are we doing? - cheetah –Research problems –Engineering problems –Why we are doing this? - vision/motivation –Hasn't this been attempted before? Malathi Veeraraghavan Univ. of Virginia mv@cs.virginia.edu Talk at Georgia Tech., March. 30, 2005

2 2 What are we doing? Building a wide-area network called CHEETAH: Circuit-switched High-speed End-to-End Transport ArcHitecture Writing software to run on Linux end hosts to use this network Applications –File transfers –Remote visualization –Web downloads

3 3 NSF-funded project Participants: –Malathi Veeraraghavan, UVA –Nagi Rao, Bill Wing, Tony Mezzacappa, ORNL –Ibrahim Habib, CUNY –John Blondin, NCSU $3.5M project for three years, 2004-2007 Acknowledgment: NSF EIN grant ANI-0335190

4 4 What’s the cheetah network? End hosts with two Ethernet NICs each –Primary NIC connected to the enterprise LAN/Internet –Secondary NIC connected to an MSPP Network nodes are MSPPs –An MSPP is an Ethernet-SONET gateway Solution leverages “king of LANs” (Ethernet) and “king of MANs/WANs (SONET)” Key aspect: Dynamic bandwidth sharing

5 5 Multi-Service Provisioning Platform (MSPP) 10/100M Ethernet 1Gbps Ethernet Crossconnect (VT1.5 or STS1) OC12/OC48/OC192 SONET card PC WAN access Control An OC1 rate SONET crossconnect with optional Ethernet interface cards Ethernet cards implement GFP to map Ethernet frames into SONET frames (EoS) Sycamore’s SN16000 implements GMPLS protocols

6 6 GMPLS protocols Triumvirate to build large-scale networks in “plug-and-play” mode –LMP to discover neighbors –OSPF-TE for routing –RSVP-TE for signaling Should be able to create distributed networks with “minimal” admin support

7 7 Signaling to setup/release Ethernet-EoS-Ethernet circuits Gateways available that can crossconnect a Gigabit Ethernet port to an equivalent-rate time-division or wavelength- division multiplexed signal dynamically Control Gigabit Ethernet interface card Time-division or wavelength-division multiplexing optical interface card Circuit based gateway Circuit based gateway Circuit based gateway Circuit based gateway Circuit based gateway signaling engine: dynamic call setup/release Gigabit Ethernet interfaces to hosts Gigabit Ethernet interfaces to hosts Setup connection (make reservation) Release connection (release resources) Transfer file

8 8 Holding times of circuits 100ms to few seconds/minutes –upper limit should be imposed for increased sharing; value depends on bandwidth requested –if 1Gbps, upper limit can be a few minutes –if 100Mbps, upper limit can be a couple of hours Apps –File transfer; e.g. 100MB on 1Gbps: 800ms –Remote viz. session: one or two hours at 100Mbps

9 9 Cheetah software on end hosts DNS query (to check if far end host is also on cheetah) Routing decision to check whether to use the TCP/IP path or a cheetah circuit Signaling client to request a circuit Fixed Rate Transport Protocol (FRTP) designed for circuits

10 10 Demo #1 (at SC2004): Web Application Web server (MVSTU2) Web client (MVSUT3) At the web server side –Hyperlink to file is a CGI script (download.cgi); filename embedded in hyperlink –Download.cgi is started automatically at server when user clicks hyperlink, which triggers CHEETAH FT sender –CHEETAH FT Sender initiates CHEETAH circuit setup by calling RSVP-TE client. –CHEETAH FT Sender starts data transfer using dual paths: FRTP/circuit and TCP/IP At the web client side –A RSVP-TE client is running as daemon to accept the circuit setup request. –A CHEETAH FT receiver is running as daemon to receive the user data Web Browser (e.g. Mozilla) Web Server (e.g. Apache) download.cgi Data transfer URL Response RSVP-TE Messages RSVP-TE client FRTP TCP CHEETAH FT sender RSVP-TE client FRTP TCP CHEETAH FT receiver

11 11 File transfers on circuits Seems like a good app. for high bandwidth –Can absorb “any” bandwidth you can allocate to the transfer (subject to PC limitations) –No intrinsic burstiness move bits from one disk to another

12 12 Better or worse than file transfers on TCP/IP? General thinking: –Circuits good for “large” files –eScience apps create large files –Emission delay of 2.2 hours for a 1TB file on a 1Gbps circuit Not a scalable network solution if used exclusively for such very large files

13 13 Cheetah solution: Leverage presence of Internet path Use second NICs at hosts for circuit connectivity leaving primary NIC for Internet access Connectionless Internet End host I End host II Circuit-Switched Network Attempt circuit setup If rejected, fall back to using TCP/IP Should we attempt a circuit setup for ALL file transfers? Two paths available

14 14 Expected delay on TCP/IP path Main factors: –Round-Trip Time (RTT) – main T prop –Prob. of packet loss on IP path, p, –Bottleneck link rate J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, “Modeling TCP Throughput: A Simple Model and its Empirical Validation,” Proc. of ACM SIGCOMM 98, Aug. 31 - Sep. 4, Vancouver Canada, pp. 303-314. Throughput B(p): approximately reciprocal of expected delay Other terms: –W max : receiver window size –b= 2 (ACK-every-other- segment) –T 0 : initial time-out

15 15 Mean TCP delays 1.Input parameters plus the time to transfer a 1GB file and a 1TB file Loss Round- trip prop. delay Impact of propagation delay Low impact of bottleneck link rate in wide-area networks Impact of packet loss rate

16 16 Delays incurred in using an end-to-end circuit Circuit setup delay + File transfer delay m sig : message length; r s : signaling link rate Loads:  sig and  sp : sig. link and processor T sp : signaling protocol processing delay k: number of switches; T prop : r.t. prop. delay f: file size r c : circuit rate Acknowledgment: NSF ANI grant 0087487 for hardware signaling

17 17 Should the application attempt a circuit setup or not? Mean delay if a circuit setup is attempted P b : call blocking probability in the circuit-switched network If circuit setup fails, fall back to Internet path

18 18 Routing decision

19 19 Numerical results link rate = 1Gbps T prop = 0.1ms T prop = 50ms

20 20 When r c = 100Mbps and T prop = 0.1ms Crossover file sizes r c = 1Gbps, T prop = 0.1ms

21 21 Utilization considerations Example: in 50ms scenario, if we transfer a 100KB file over a 100Mbps path, transfer time is only 8ms. Circuit utilization is 8/(50+8) = 13.7% Two opposing factors –If the crossover file size (beyond which circuit setup is attempted) is increased per-circuit utilization increases traffic load decreases (Pareto distribution of file sizes), which means aggregate utilization decreases

22 22 Aggregate utilization u a  : traffic load m: number of circuits P b : call blocking probability  m uaua For a 1% call blocking probability P b = 0.01 24.8% 58.2% 84.6% 1 10 100 4 17 117 Assuming file size follows Pareto distribution –Define fractional offered load  (fraction of  ) 40KB81% 330KB71% 80MB51% 

23 23 Plot of utilization u with r c = 100Mbps, k=20 P b =0.3P b =0.01

24 24 Cheetah network deployment Control card OC192 card GbE/ 10GbE card GbE/10GbE Ethernet Switch To Cray ORNL Circuit based gateway To DC – Dragon Atlanta NLR WDM GaTech WDM GaTech WDM ORNL WDM NLR SOX/SLR NC GbE/10GbE Ethernet Switch To cluster computer NCSU MCNC/NLR Circuit-based gateway OC192 card Control card GbE/ 10GbE card OC192 card 10GbE OC192 (10 Gbps) 10 Gbps G. Tech SLR OC192 card Control card GbE/ 10GbE card OC192 card SLR

25 25 Connecting Cheetah to Dragon and Ultrascience networks Dragon Cheetah DOE Ultrascience network (ORNL) Acknowledgment: DOE grant

26 26 All this is fun, but What are the research problems? –Bandwidth sharing modes Low load performance Scheduled vs. immediate-request Fairness –Mismatch between multitasking end hosts and TDM circuits

27 27 Fixing the bandwidth for the transfer could be a bad thing: low load problem Varying bandwidth list scheduling algorithm –uses knowledge of file size to make varying bandwidth allocations for transfer –catch: requires circuit switches to be reprogrammed multiple times within lifetime of a transfer (circuit) Capacity C Packet Switch 1.... N 2 3 Each transfer gets C/N capacity 1 2 3 N The lone remaining transfer enjoys full capacity C Capacity C Circuit Switch 1.... N 2 3 Each transfer is allocated C/N capacity 1 2 3 N The lone remaining transfer continues with capacity allocation C/N

28 28 Scheduled vs. immediate-request calls Session type requests: long holding times (2 hours) specific rate remote visualizations scientists participate in sessions best served with an advance reservation File transfer requests: file sizes provided not holding times max rate specified but any rate can be allocated scientists not involved; just computers Small files (e.g. 1 GB on 1 Gbps takes 8 sec) should be handled in immediate-request mode Large files (e.g. 1 TB on 1 Gbps takes 2.2 hours) should be handled in scheduled mode should we allocate 10Gbps and finish in 800 sec? immediate-request? or scheduled? depends on m, the number of 10Gbps circuits

29 29 Fairness Call admission algorithms –Use Markov Decision Process (MDP) tools to balance fairness and overall throughput –Long-path and short-path calls –Large files (high-BW) and short files (low-BW) calls –Multi-level answer rather than binary accept/reject Both with Fixed bandwidth and Varying bandwidth

30 30 Multi-level problem Perhaps a new problem? –Real-time (interactive) audio-video applications generate data at a certain rate (constant or variable) implication: application requests the required bandwidth from the network, and answer is binary (accept or reject); multiple classes –File transfers: “any” bandwidth that the network can provide could be acceptable implication: application requests a MAX bandwidth, but the answer can be multi-level

31 31 Mismatch between multitasking end hosts and TDM circuits Variability in sender: –other processes (e.g. matlab) + disk access (disk head location) Variability in receiver: if buffer not emptied out, data loss occurs Network protocols network card network card Filesystem File transfer Matlab kernel user space Circuit-switched network Network protocols Filesystem File transfer Matlab

32 32 Effects of mismatch in nature of circuits and nature of hosts Choose a high circuit rate and receive buffer can overflow causing losses –impacts delay + utilization (retransmissions) Choose a low circuit rate and delay can be high If sending rate is not matched exactly with circuit rate –circuit lies idle; utilization impacted

33 33 Fixed Rate Transport Protocol (FRTP) Set up a circuit at a carefully chosen rate Send data at that rate –hard to meter out data at a fixed rate from a multitasking sender when that rate is high (Linux system time granularity: 10ms) No changes of sending rate –i.e., no flow control or congestion control Packet losses recovered through retransmissions –no timers needed, just negative ACKs because of in-sequence delivery

34 34 Experimental results 1.062590 1.790200 RELATIVE TRANSFER DELAY CIRCUIT UTILIZATION (%) CIRCUIT RATE (Mbps)

35 35 Current work Experimenting with RT schedulers to schedule file transfer task in a set rhythm Experimenting with file systems to characterize file write time to collect data to then determine circuit rate and receive buffer size

36 36 Engineering problems Need to use VLAN based switches between end hosts and MSPPs –Costly otherwise VLSR: Virtual Label Switch Router –External GMPLS controller for Ethernet (VLAN) switches Understood need for making it a connection-oriented internetwork Acknowledgment: DOE grant

37 37 Connection-oriented networks Circuit switched –Time Division Multiplexed (SONET) Equipment vendors: Sycamore, Ciena Network: Cheetah, UltraScience Net, CA*net 4 –Wavelength Division Multiplexed (WDM) Equipment vendors: Movaz, Calient, LambdaOptical Network: Dragon, OMNInet, Internet2 HOPI

38 38 Connection-oriented networks Packet switched –Multiprotocol Label Switching (MPLS) Equipment vendors: Cisco, Juniper Network: Internet2, ESnet –Virtual Local Area Network (VLAN) Equipment vendors: Dell, Intel, Foundry, Extreme Network: Enterprise local area networks Just need to “enable” connection-oriented network through already deployed boxes

39 39 Bandwidth sharing problem in heterogeneous network Problem: –Tradeoff of fairness and utilization becomes more difficult when these crossconnect granularities are considered a c d b e f 2, 30Mbps 5, 100Mbps 1, 150Mbps 1, 10Mbps 1, 50Mbps 1, 500Mbps 2, 50Mbps 1, 50Mbps Request for 30Mbps connection 1Mbps 51Mbps 10Gbps Switch granularity

40 40 Interconnecting these networks Tricky business! Involves many levels of interworking protocols –User (data) plane –Signaling protocols (for connection setup/release) –Routing protocols (for reachability, topology, loading data dissemination)

41 41 But We need to solve this internetworking problem for a true connection-oriented service to flourish! Acknowledgment: DOE grant

42 42 Why do this? Two simple views –Purpose of a communication link, and by extension, a communication network –Analogy with transportation modes

43 43 Why do this? View 1: Purpose of a communication link and by extension a communication network –To provide connectivity between a data sending entity and a data receiving entity –Quantify connectivity bandwidth is a primary measure –Shouldn’t we have a network that provides users specific bandwidth levels as requested, and when requested, on a dime?

44 44 Why do this? View 2: Analogy with people/goods transportation modes –unreserved travel: roadways –reserved travel: airline seat So why not at least two such networks for moving data?

45 45 Would anyone use it? Don’t know Depends on the business case –What’s the cost of building this network? –What’s the market? –Can the service price be set to turn a profit, i.e., to let companies survive?

46 46 Hasn't this been attempted before? ATM-to-the-desktop –Goal: to enable an end-to-end connection-oriented service –It was a homogeneous network – all ATM switches –Recognition of need to interwork with IP LANE, MPOA Soon morphed into ATM networks offering connection- oriented service to interconnect routers NOT end hosts –Application focus: mostly multimedia delay-sensitive but “low” bandwidth could be supported with simple priority queueing added to connectionless packet switches First difference: aiming for a heterogeneous internet using already deployed switches and gateways

47 47 Second difference Ipsilon’s IP switch –Flow classification at “airport” to trigger connection setup –Questions of scalability – notion of having to hold “state” information for millions of flows No, just the ones who requested bandwidth airport Call to make a reservation (if only for part of the distance: airport-to-airport) CL network CO network

48 48 Summary Rich new set of research problems Experimental challenges a plenty! Real opportunity to deploy a CO internetwork Web site: http://cheetah.cs.virginia.edu


Download ppt "1 Building a connection-oriented internet Outline –What are we doing? - cheetah –Research problems –Engineering problems –Why we are doing this? - vision/motivation."

Similar presentations


Ads by Google