Inter-Datacenter Bulk Transfers with NetStitcher
Nikolaos Laoutaris, Telefonica Research
Joint work with Michael Sirivianos, Xiaoyuan Yang and Pablo Rodriguez
Big data is important for business, science, and society at large
Densification of IT with datacenters/cloud fuels the big data mill
Value proposition
NetStitcher is a solution for moving petabytes across the Internet
— TCP, single-path routing, and the end-to-end principle are not good for bulk transfers
— It is cost-effective because it uses leftover bandwidth
Trend 1 – Fault tolerance to catastrophic failures
Trend 2 – PoP replication for improved user QoS
Start thinking long-tail …
Additional applications
— Scientific computing
— Distributed production/delivery of movies
And things will get worse
NetStitcher in a nutshell
A system for carrying bulk data for large customers
— Volume ~ TBytes / PBytes
— Delivery time ~ hours / days
Main idea
— Peak-load dimensioning & backup paths leave lots of leftover bandwidth
— Create a volume service for interconnecting datacenters: X TBs from A to B within the next Y hours
(figure: sender and receiver bandwidth timelines)
Leftover b/w appears whenever and wherever
You may have guessed already: store & forward is the solution
Stitching together leftover bandwidth is tricky
(figure: nodes in time zone A and time zone B)
A storage overlay that is aware of:
— leftover network bandwidth
— leftover edge bandwidth
NetStitcher’s bag of tricks No in-network constraints and time-aligned sender and receiver bandwidth availability — NetStitcher can perform end-to-end transfer
NetStitcher’s bag of tricks In-network constraints and time-aligned sender and receiver bandwidth availability — NetStitcher can perform multi-path overlay routing
NetStitcher’s bag of tricks In-network constraints and misaligned sender, receiver and intermediate node bandwidth availability — NetStitcher can perform multi-path and multi-hop store and forward
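The three cases above can be illustrated with a toy simulation. This is a minimal sketch with hypothetical per-slot capacities (GB per hour), not the NetStitcher scheduler itself; it only shows why multi-hop store and forward recovers volume that an end-to-end transfer loses when sender and receiver availability windows never overlap:

```python
# Hypothetical leftover capacities (GB per hour) over 6 one-hour slots.
# The sender is idle early in its night, the receiver late in its night,
# and a relay in a middle time zone bridges the two windows.
sender_up = [10, 10, 0, 0, 0, 0]   # sender uplink free only in slots 0-1
relay_cap = [10, 10, 0, 0, 10, 10] # relay free early (to receive) and late (to send)
recv_down = [0, 0, 0, 0, 10, 10]   # receiver downlink free only in slots 4-5

def end_to_end(up, down):
    # a direct transfer needs sender and receiver free in the SAME slot
    return sum(min(u, d) for u, d in zip(up, down))

def store_and_forward(up, relay, down):
    # hop 1: sender -> relay while both are free; data waits in relay storage
    # hop 2: relay -> receiver when both are free
    # (simplified: one capacity number covers the relay's up- and downlink)
    stored = delivered = 0
    for u, r, d in zip(up, relay, down):
        stored += min(u, r)        # push into relay storage
        take = min(stored, r, d)   # drain storage toward the receiver
        stored -= take
        delivered += take
    return delivered

print(end_to_end(sender_up, recv_down))                    # 0 GB: windows never overlap
print(store_and_forward(sender_up, relay_cap, recv_down))  # 20 GB via the relay
```

With misaligned windows the direct transfer moves nothing, while the relay turns the two disjoint windows into a usable path.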
How do we schedule around all these constraints ?
Time expansion of a dynamic graph
(figure: the dynamic graph is expanded over time slots t = 1, 2, 3. Each node, i.e. the source Src, intermediate nodes I1 and I2, and destination Dst, is replicated once per slot: Src(1), I1(2), Dst(3), and so on. Edges N_Src-I1(t), N_I1-I2(t), N_I2-Dst(t) carry the per-slot network constraints; edges S_Src(t), S_I1(t), S_Dst(t) connect a node's copies across consecutive slots and carry the storage constraints; a virtual source and sink plus per-node uplink & downlink constraints complete the flow network.)
Uplink & downlink constraints
(figure detail: the transfer Src(2) -> I1(2) over edge N_Src-I1(2) is bounded by the Src uplink constraint U_Src(2), the network constraint, and the I1 downlink constraint D_I1(2); storage edges S_Src and S_I1 carry data between consecutive slots.)
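Scheduling on the time-expanded graph reduces to a max-flow computation: the deliverable volume is the maximum flow from the virtual source to the virtual sink. The sketch below builds a tiny time-expanded network (all capacities hypothetical, in GB) with one intermediate node and solves it with a plain Edmonds-Karp max-flow, standing in for whatever solver the real system uses:

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    # Edmonds-Karp: repeatedly augment along shortest residual paths
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:      # BFS in the residual graph
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t                   # recover the augmenting path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:                 # update residual capacities
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        flow += bottleneck

cap = defaultdict(lambda: defaultdict(int))
T = 3                 # time slots
up    = [50, 50, 0]   # Src uplink availability per slot
down  = [0, 40, 40]   # Dst downlink availability per slot
net   = 30            # per-slot network capacity on each hop
store = 100           # storage available at intermediate node I1

for t in range(T):
    cap['S'][f'Src{t}'] = up[t]            # uplink constraint
    cap[f'Src{t}'][f'I1_{t}'] = net        # network constraint, hop 1
    cap[f'I1_{t}'][f'Dst{t}'] = net        # network constraint, hop 2
    cap[f'Dst{t}']['T'] = down[t]          # downlink constraint
    if t + 1 < T:
        cap[f'I1_{t}'][f'I1_{t+1}'] = store  # storage edge across slots

result = max_flow(cap, 'S', 'T')
print(result)  # 60: 30 GB direct in slot 1 plus 30 GB stored at I1 for slot 2
```

Data injected in slot 0, when the destination is offline, rides the storage edge I1(0) -> I1(1) -> I1(2) and drains in slot 2, which is exactly the multi-hop store-and-forward case.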
But we need to predict the future (of bandwidth) International backbone traffic
Prediction is easy when data are bulk
1. Periodic patterns
2. We care about VOLUMES, not RATES
— VOLUME = ∫ RATE(t) dt
In our NetStitcher implementation we use:
— A simple Sparse Periodic Auto-Regression predictor (Chen et al., NSDI'08)
— Recomputation of the transmission schedule
— An end-game "pull mode" to handle occasional churn/prediction failures
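A minimal sketch of the prediction idea, in the spirit of sparse periodic auto-regression: forecast a slot's leftover volume from the same hour-of-day on previous days. The uniform coefficients (a plain periodic mean) and the synthetic diurnal trace are simplifying assumptions; the predictor of Chen et al. fits the coefficients from data:

```python
PERIOD = 24  # slots per day (hourly slots)

def spar_predict(history, t, n_days=3):
    # predict slot t as the mean of the same hour over the last n_days days
    # (a sparse periodic AR with uniform coefficients 1/n_days)
    past = [history[t - d * PERIOD] for d in range(1, n_days + 1)
            if t - d * PERIOD >= 0]
    return sum(past) / len(past) if past else 0.0

# hypothetical leftover volumes (GB per slot) with a strong diurnal pattern:
# lots of leftover bandwidth at night (slots 0-5), little during the day
day = [90.0] * 6 + [10.0] * 18
history = day * 7  # one week of observations

print(spar_predict(history, len(history)))       # night slot of day 8 -> 90.0
print(spar_predict(history, len(history) + 12))  # midday slot -> 10.0
```

Because the schedule cares about per-slot volumes rather than instantaneous rates, even this crude periodic mean tracks the diurnal pattern well, and the recomputation plus end-game pull mode absorb the occasional miss.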
Case study 1: Equinix datacenters at 22 locations across North America
How much data can we back up?
— 3 hours used for backup (3-6 am local time at the datacenter)
— 1 Gbps network access capacity
NetStitcher can move 5× more bytes
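A back-of-envelope check on the raw numbers from this slide (the 5× gain itself comes from stitching windows across time zones, which this arithmetic does not model):

```python
# How many bytes fit into one nightly 3-hour window at 1 Gbps of
# leftover access capacity on a single link?
window_s = 3 * 3600              # 3-6 am local time, in seconds
rate_bps = 1e9                   # 1 Gbps access link
volume_bytes = window_s * rate_bps / 8
print(volume_bytes / 1e12)       # 1.35 TB per night per 1 Gbps link
```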
Case study 2: Telefonica CDN
— 49 servers in Europe, Latin America and the USA, spanning GMT-1 to GMT-8
— Need to send a 4.2 TB file over 24 h
— Beyond leftover: 95th-percentile pricing at $7/Mbps/month; storage cost $0.055/GB/month
(figure: map of TIWS/CDN service centers and entry/end points in Europe, the USA and Latin America, Phase I and Phase II, 2011)
NetStitcher 80-90% cheaper between Europe & US
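A crude cost sketch using only the prices on this slide. It compares paying transit for the sustained rate that 4.2 TB in 24 h implies against paying for NetStitcher storage on leftover bandwidth, and deliberately ignores billing details such as how a 95th-percentile commitment amortizes over a month:

```python
TB = 1e12

volume_bits = 4.2 * TB * 8                   # 4.2 TB file
rate_mbps = volume_bits / (24 * 3600) / 1e6  # sustained rate over 24 h
transit_cost = rate_mbps * 7.0               # $7 / Mbps / month transit
storage_cost = 4.2 * TB / 1e9 * 0.055        # $0.055 / GB / month storage

print(round(rate_mbps))                      # ~389 Mbps sustained
print(round(transit_cost), round(storage_cost))        # ~$2722 vs ~$231
print(round(100 * (1 - storage_cost / transit_cost)))  # ~92% cheaper
```

The rough ~90% saving is in the same ballpark as the 80-90% figure reported on the slide.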
Conclusion: a practical application of DTNs
The utilization of a network can be improved, but for this we need:
1. Delay-elastic traffic to be shifted into off-peak hours
2. In-network storage
3. High-level knowledge of traffic behaviour throughout the day
More info at: