Published by Savanna Galey. Modified over 2 years ago.

1
New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks
Brian Cho, Indranil Gupta
University of Illinois at Urbana-Champaign

2
Motivation: Ad-hoc Data Processing
Data-intensive research on OpenCirrus
– Federated cloud: diverse geographic locations
– Data scale of TBs
Limited wide-area bandwidth is a big bottleneck: a transfer over the Internet can take days or weeks [Garfinkel 07]
Success story: Washington Post
– Hillary Clinton White House schedule, released as 17,481 pages of non-searchable PDF images
– Converted to searchable text and delivered to the newsroom within the same news cycle
– Done within 26 hours with Amazon AWS
Pay for bandwidth and computer usage

3
Bulk Transfer Options
Internet Transfer
– Grid: [GridFTP]
– PlanetLab: [CoBlitz 06]
Disk Shipping Transfer
– [Jim Gray 03]
– [PostManet 04]
– [DOT 06]
– Amazon AWS Import/Export
Pandora (People and networks moving data around)
– First ever solution to transfer data cooperatively between multiple sources with Internet and shipping edges
– Produces optimal transfer plans that obey time deadlines and minimize dollar cost
– Better than Internet-only and shipping-only strategies

4
Option 1: Internet Transfer
[Figure: Data Source (Illinois) and Data Source (CMU) send data over the Internet to the Computation Provider (Amazon)]
– Link bandwidth: 5-20 Mbps (1 TB: 5-20 days)
– Link cost labels in the figure: $0.10 per GB; No Cost
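The "1 TB: 5-20 days" figure can be checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a fully utilized link; the function names are illustrative and not from the talk. It also prices 1 TB at the $0.10 per GB rate shown on the slide.

```python
def transfer_days(size_tb: float, mbps: float) -> float:
    """Days needed to push size_tb terabytes through a link of mbps megabits/s."""
    bits = size_tb * 1e12 * 8        # decimal terabytes -> bits
    seconds = bits / (mbps * 1e6)    # megabits/s -> bits/s
    return seconds / 86_400          # seconds -> days


def internet_cost_usd(size_tb: float, usd_per_gb: float = 0.10) -> float:
    """Dollar cost of moving size_tb terabytes at a flat per-GB rate."""
    return size_tb * 1000 * usd_per_gb


slow_days = transfer_days(1.0, 5)    # roughly 18.5 days at 5 Mbps
fast_days = transfer_days(1.0, 20)   # roughly 4.6 days at 20 Mbps
cost_1tb = internet_cost_usd(1.0)    # about $100 at $0.10 per GB
```

Under these assumptions the slide's range is a rounded version of about 4.6 to 18.5 days.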

5
Option 2: Disk Shipping Transfer
[Figure: Data Source (Illinois) and Data Source (CMU) ship disks to the Computation Provider (Amazon)]
– Disk interface: 40 MB/s
– Shipping rates per disk, one set per link in the figure:
  Overnight: $60, Two-Day: $30, Ground: $10
  Overnight: $50, Two-Day: $25, Ground: $5
  Overnight: $40, Two-Day: $15, Ground: $5
– Per-GB and per-disk fees: $0.02 per GB, $80 per Disk

6
Cooperative Transfer Solutions
Good solutions
– Meet deadlines
– Minimize dollar cost
Complexity
– Global scale
– Many strategies
– Collaboration helps
How to find the best solution?
[Figure: Open Cirrus Sites]

7
Example: Minimize Dollar Cost
[Figure: Data Source A, Data Source B, Cloud Service Provider]
– Data at the sources: 0.8 TB, 1.2 TB
– Steps shown: 15 Days at No Cost; 5 Days, Ground: $5; 14 hours
– Loading: $40, Handling: $80
– Total Cost: $125, Total Time: 20 Days

8
Example: Meet Deadline (3 days) while Minimizing Dollar Cost
[Figure: Data Source A, Data Source B, Cloud Service Provider]
– Data at the sources: 0.8 TB, 1.2 TB
– Steps shown: 1 Day, Overnight: $40; 6 hours; 1 Day, Overnight: $50; 14 hours
– Loading: $40, Handling: $80
– Total Cost: $210, Total Time: 3 Days
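The totals in the two examples can be reproduced from the rates on the disk-shipping slide. The sketch below assumes that the $0.02 per GB fee is the loading charge and the $80 per disk fee is the handling charge (this matches Loading: $40 for 2 TB), and that all 2 TB arrive on a single disk. Function names are illustrative.

```python
DISK_MB_PER_S = 40            # disk interface speed from the slides
LOADING_USD_PER_GB = 0.02     # assumed to be the "Loading" fee
HANDLING_USD_PER_DISK = 80    # assumed to be the "Handling" fee


def loading_hours(size_tb: float) -> float:
    """Hours to copy size_tb terabytes through the disk interface."""
    return size_tb * 1e6 / DISK_MB_PER_S / 3600


def plan_cost(total_tb: float, shipping_fees: list[float], disks: int = 1) -> float:
    """Shipping fees plus per-GB loading plus per-disk handling."""
    loading = total_tb * 1000 * LOADING_USD_PER_GB
    return sum(shipping_fees) + loading + disks * HANDLING_USD_PER_DISK


cheap_plan = plan_cost(2.0, [5.0])           # one Ground shipment
fast_plan = plan_cost(2.0, [40.0, 50.0])     # two Overnight shipments
hours_2tb = loading_hours(2.0)               # close to the slide's 14 hours
hours_08tb = loading_hours(0.8)              # close to the slide's 6 hours
```

With these assumptions the cheap plan comes to $125 and the 3-day plan to $210, and the 14-hour and 6-hour steps appear consistent with loading 2 TB and 0.8 TB at 40 MB/s.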

9
Outline
Motivation
Problem Formulation
– Graph Model
– Flow Over Time
Solution: Pandora
Experimental Results
Conclusion

10
Graph Model: Internet Links
[Figure: Site A and Site B, each with inet_out and inet_in vertices]
– Edges at inet_in / inet_out: Incoming/Outgoing BW
– Internet link between sites: Capacity (Mb/s), Cost ($/GB), Transit time (almost instantaneous)

11
Graph Model: Shipment Links
[Figure: Site A and Site B, each with inet_out, inet_in and ship_in vertices]
– Edges at inet_in / inet_out: Incoming/Outgoing BW
– Edge at ship_in: Disk Interface BW (e.g., 40 MB/s), Cost: Loading ($/GB)
– Internet link between sites: Capacity (Mb/s), Cost ($/GB), Transit time (almost instantaneous)
– Shipment link between sites: Capacity (almost infinite), Cost: Shipping and Handling ($/Disk), Transit time (Hrs)
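One way to picture the graph model is as a list of annotated edges. The sketch below is a hypothetical encoding: the `Edge` class, the `site_pair_edges` helper, and the exact wiring between a site and its inet_out, inet_in and ship_in vertices are a reading of the slide, not the paper's definition. The numbers in the usage lines are taken from earlier slides purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    capacity_mbps: float          # math.inf stands in for "almost infinite"
    cost_per_gb: float = 0.0      # linear cost in $/GB
    cost_per_disk: float = 0.0    # fixed cost in $/disk
    transit_hours: float = 0.0    # 0 stands in for "almost instantaneous"


def site_pair_edges(a, b, out_bw, in_bw, link_bw, link_cost,
                    ship_cost, ship_hours, disk_mbps=320.0, loading_cost=0.02):
    """Edges for sending data from site a to site b (assumed wiring)."""
    return [
        Edge(a, f"{a}.inet_out", out_bw),                        # outgoing BW
        Edge(f"{a}.inet_out", f"{b}.inet_in", link_bw,
             cost_per_gb=link_cost),                             # Internet link
        Edge(f"{b}.inet_in", b, in_bw),                          # incoming BW
        Edge(a, f"{b}.ship_in", math.inf,
             cost_per_disk=ship_cost, transit_hours=ship_hours), # shipment link
        Edge(f"{b}.ship_in", b, disk_mbps,
             cost_per_gb=loading_cost),                          # disk interface
    ]


# 40 MB/s disk interface = 320 Mb/s; other values are illustrative.
edges = site_pair_edges("A", "B", out_bw=20, in_bw=20, link_bw=10,
                        link_cost=0.10, ship_cost=50, ship_hours=24)
shipment = [e for e in edges if e.cost_per_disk > 0]
```

Keeping per-GB and per-disk costs as separate fields reflects the slide's distinction between linear Internet and loading costs and the per-disk shipping cost.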

12
Data Transfer Over Time
Goal: Meet time deadline T while minimizing dollar cost C
Hard problem on graph with both Internet and Shipment links
– NP-Hard
– Formal problem and proof in paper
Solution: Pandora computes optimal and approximate solutions

13
Solution: Pandora Overview
Transform into static time-expanded network
– Decomposition of shipping edges
Solve min-cost flow on static network
– Mixed Integer Program
– Optimizations to reduce computation time

14
Time-expanded Network [Ford Fulkerson 58]
Intuitively, incorporate time into the graph to create an extended graph representation
– Make T (= deadline) copies of each vertex
– Draw edges according to transit time
– Draw holdover edges
Disk shipment represented as time-expanded network
[Figure: time axis with edges of transit time τ = 1 and τ = 3, and T = 5]
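The construction can be sketched in a few lines. This is a minimal illustration, assuming integer transit times, layers numbered 0 to T-1, and free holdover at every vertex; capacities and costs are left out, and the three-vertex example (with τ = 1 and τ = 3, T = 5 as in the figure) is hypothetical.

```python
def time_expand(vertices, edges, T):
    """Build the edge list of a time-expanded network.

    vertices: iterable of vertex names
    edges:    iterable of (u, v, transit) with integer transit >= 0
    T:        number of time layers (copies of each vertex)
    Returns a list of ((u, t), (v, t2)) pairs.
    """
    expanded = []
    # One copy of each edge per departure time that still arrives by layer T-1.
    for u, v, tau in edges:
        for t in range(T - tau):
            expanded.append(((u, t), (v, t + tau)))
    # Holdover edges let data wait at a vertex from one layer to the next.
    for v in vertices:
        for t in range(T - 1):
            expanded.append(((v, t), (v, t + 1)))
    return expanded


net = time_expand(["A", "B", "C"], [("A", "B", 1), ("B", "C", 3)], T=5)
```

In this example the τ = 1 edge gets four copies, the τ = 3 edge gets two, and each vertex gets four holdover edges.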

15
Decomposed Shipping Edges
Decompose shipping edges into fixed-cost edges, each with:
1. Transit time
2. Fixed cost
3. Capacity
[Figure: three edges labeled cost = $130, capacity = 2 TB; cost = $110, capacity = 2 TB; cost = $100, capacity = 2 TB]
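To see what fixed-cost edges imply, the sketch below reads the figure as three parallel edges, each paid in full once used and each carrying up to 2 TB. That reading is an assumption, as is the helper name. It enumerates subsets of edges and returns the cheapest set whose combined capacity covers a demand, which is practical only for a handful of edges.

```python
from itertools import combinations


def cheapest_edge_set(edges, demand_tb):
    """edges: list of (fixed_cost_usd, capacity_tb).

    Returns (total_cost, chosen_edges) for the cheapest subset whose
    capacity covers demand_tb, or None if no subset is large enough.
    """
    best = None
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            if sum(cap for _, cap in subset) >= demand_tb:
                cost = sum(c for c, _ in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    return best


figure_edges = [(100, 2.0), (110, 2.0), (130, 2.0)]
three_tb = cheapest_edge_set(figure_edges, 3.0)   # needs two edges
```

The cost jumps in steps as demand crosses each 2 TB boundary, which is the behavior that makes the flow problem harder than ordinary linear-cost min-cost flow.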

16
Solution: Min-cost Flow Calculation using Mixed-Integer Program
Fixed-cost edges make min-cost flow calculation NP-Hard
Mixed-Integer Program (MIP)
– Binary variable y_e defined on fixed-cost edges
Goal: Minimize dollar cost
Subject to
– Capacity constraints (flow_e ≤ capacity_e · y_e)
– Conservation of flow
– Demands of sources and sink
Proof of NP-Hardness and formal MIP in paper
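The formal MIP is given in the paper and is not reproduced on the slide. As an illustration only, a generic fixed-charge min-cost flow program over the time-expanded network has the shape below. All symbols are assumptions introduced here: E is the edge set, F ⊆ E the fixed-cost edges, f_e the flow, c_e the per-unit cost, k_e the fixed cost, u_e the capacity, and b_v the supply (positive at sources, negative at the sink, zero elsewhere).

```latex
\begin{aligned}
\min \quad & \sum_{e \in E} c_e f_e \;+\; \sum_{e \in F} k_e y_e \\
\text{s.t.} \quad
& \sum_{e \in \delta^{+}(v)} f_e \;-\; \sum_{e \in \delta^{-}(v)} f_e = b_v
  && \forall v \in V \\
& 0 \le f_e \le u_e && \forall e \in E \setminus F \\
& 0 \le f_e \le u_e \, y_e && \forall e \in F \\
& y_e \in \{0, 1\} && \forall e \in F
\end{aligned}
```

Setting y_e = 0 forces the flow on a fixed-cost edge to zero, while y_e = 1 pays k_e once and opens the full capacity u_e.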

17
Optimizations: Overview
Size of MIP grows linearly with deadline T
– Worst-case running time grows exponentially with T
Reduce size of the MIP
– Reduce number of shipment edges
– Δ-condensed time-expanded networks
More optimizations in paper

18
Optimizations: Reduce number of shipment edges
Can remove redundant shipment edges
Example:
– Overnight shipment sent anytime before 4pm will arrive at destination at 8am
[Figure: departures at 7am, noon, 1pm, 2pm, 3pm and 4pm all arriving at 8am]
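The pruning idea can be sketched as follows, assuming that waiting at the sender is free (a holdover edge) so that, among shipments with the same arrival time and cost, only the latest departure is needed. The function name and the hour encoding (hours since midnight of day 0, so 8am next day is 32) are illustrative, and the paper's actual rule may be more general.

```python
def prune_shipments(edges):
    """edges: list of (depart_hour, arrive_hour, cost_usd) for one site pair.

    Keeps, for each (arrival, cost) pair, only the latest departure.
    """
    latest = {}
    for depart, arrive, cost in edges:
        key = (arrive, cost)
        if key not in latest or depart > latest[key]:
            latest[key] = depart
    return sorted((d, a, c) for (a, c), d in latest.items())


# Departures at 7am, noon, 1pm, 2pm, 3pm, 4pm all arrive at 8am next day (hour 32);
# a 5pm departure misses the cutoff and arrives a day later (hour 56).
overnight = [(h, 32, 40.0) for h in (7, 12, 13, 14, 15, 16)] + [(17, 56, 40.0)]
pruned = prune_shipments(overnight)
```

Here seven candidate edges shrink to two, which reduces the number of binary variables in the MIP.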

19
Optimization: Δ-condensed Time-expanded Network [Fleischer Skutella 07]
Each batch of Δ consecutive time units is condensed into one virtual time unit
Solution has
– Minimum cost
– Deadline approximation depending on Δ
More details in paper
[Figure: example with Δ = 2]
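The bookkeeping behind condensing can be sketched as below. It follows the usual construction for condensed time-expanded networks (transit times rounded up to whole multiples of Δ, per-step capacities scaled by Δ, and about T/Δ layers); whether Pandora uses exactly these rules is an assumption, and the sketch does not reproduce the deadline guarantee mentioned on the slide.

```python
import math


def condense(edges, T, delta):
    """edges: list of (u, v, transit, capacity_per_time_unit).

    Returns (layers, condensed_edges) where transit is measured in
    virtual time units of length delta and capacity is per virtual unit.
    """
    layers = math.ceil(T / delta)
    condensed = [
        (u, v, math.ceil(tau / delta), cap * delta)
        for u, v, tau, cap in edges
    ]
    return layers, condensed


layers, condensed = condense(
    [("A", "B", 1, 10.0), ("B", "C", 3, 5.0)], T=5, delta=2
)
```

With Δ = 2 the five original layers become three, so the time-expanded network and the MIP built on it shrink by roughly a factor of Δ.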

20
Experimental Setup
Trace-driven
– Wrote scripts to communicate with FedEx web services: queried package rates and destination time
– Internet BW from PlanetLab measurements
GNU Linear Programming Kit (GLPK)

21
Experimental Results: 8 sources, 0.25 TB per node, Heterogeneous BW
[Figure: sources 1-8 each send 0.25 TB to sink t; link width proportional to BW]
Direct Internet
– Cost: $200
– Time: 280 hrs
– Cannot take advantage of heterogeneous bandwidth
Direct Overnight
– Cost: $1,500
– Time: 38 hrs
– Cannot fill disks to capacity

22
Experimental Results: 8 sources, 0.25 TB per node, Heterogeneous BW (continued)
[Figure: Pandora's plan over sources 1-8 and sink t; flow labels 1.92 TB, 0.14 TB, 0.06 TB, 0.08 TB]
Pandora, Deadline = 96 hrs
– Cost: $183
– Time: < 96 hrs

23
Experimental Results: Optimizations
Reducing shipment edges decreases computation time
Using Δ-condensed time-expanded networks decreases computation time
– Deadlines met in our experiments
[Figure: plot series for 1 source and 2 sources]

24
Conclusion
First ever solution to transfer data cooperatively between multiple sources with Internet and shipping edges
Produces optimal transfer plans that obey time deadlines and minimize dollar cost
Better than Internet-only and shipping-only strategies
Reasonable computation time by using optimizations
