SDN + Storage. Outline Measurement of storage traffic Network aware placement Control of resources SDN + Resource allocation – Predicting Resources utilization.


1 SDN + Storage

2 Outline Measurement of storage traffic Network aware placement Control of resources SDN + Resource allocation – Predicting Resources utilization – Bring it all together

3 HDFS Storage Patterns – Map tasks read from HDFS Local read versus non-local read Rack locality or not 80% locality!

4 HDFS Storage Patterns – Map tasks read from HDFS Local read versus non-local read Rack locality or not 80% Cross-rack traffic

5 HDFS Storage Patterns – Reducers write to HDFS 3 copies of each file are written to HDFS 2 rack-local and 1 non-rack-local Fault tolerance and good performance THERE MUST BE CROSS-RACK TRAFFIC Ideal goal: minimize congestion

6 Real Life Traces Analyze Facebook traces: – 33% of time spent in network – Network links are highly utilized; why? – Determine cause of network traffic 1. Job output 2. Job input 3. Pre-processing

7 Current Ways To Improve HDFS Transfers Change Network Paths – Hedera, MicroTE, c-Through, Helios Change Network Rates – Orchestra, D3 Increase Network Capacity – VL2, PortLand (Fat-Tree)

8 The Case for Flexible Endpoints [Figure labels: 90%, 80%, 20%, 90%] The traffic matrix limits the benefits – of techniques that change paths – of techniques that change network rates The ability to change the matrix is important

9 Flexible Endpoints in HDFS Recall: Constraint placed by HDFS – 3 replicas – 2 fault domains – Doesn’t matter where as long as constraints are met The source of transfer is fixed! – However destination, location of 3 replicas is not fixed

10 Sinbad Determine placement for block replica – Place replicas to avoid hotspots – Constraints: 3 copies Spread across 2 fault domains Benefits – Faster writes: – Faster transfers
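The placement constraints on this slide can be written down as a small validity check. This is only a sketch: the function name, the `rack_of` mapping, and the reading of "spread across 2 fault domains" as "at least two racks" are assumptions for illustration, not Sinbad's or HDFS's actual code.

```python
def satisfies_constraints(placement, rack_of, copies=3, min_domains=2):
    """Check a candidate replica placement against the slide's constraints.

    placement: list of node names chosen for the replicas (hypothetical).
    rack_of:   dict mapping node name -> rack (fault domain) name.
    Assumes "2 fault domains" means at least `min_domains` distinct racks.
    """
    nodes = set(placement)
    # Exactly `copies` replicas, each on a different node.
    if len(placement) != copies or len(nodes) != copies:
        return False
    # The replicas must span enough fault domains.
    racks = {rack_of[node] for node in nodes}
    return len(racks) >= min_domains
```

Any placement that passes this check is acceptable to the file system, which is what leaves room to choose the destinations that avoid hotspots.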

11 Sinbad: Ideal Algorithm Input: – Blocks of diff size – Links of diff capacity Objective: – Minimize write time (transfer time) Challenges: Lack of future knowledge – Location & duration of hotspots – Size and arrival times of new replicas

12 Sinbad Heuristic Assumptions – Link utilizations are stable True for 5-10 seconds – All blocks have the same size Fixed-size large blocks Heuristic: – Pick the least-loaded link/path – Send a block from the file with the least amount left to send
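The two-part heuristic can be sketched in a few lines. The names below (`PendingFile`, `pick_next_transfer`, the link identifiers) are hypothetical, and the sketch assumes link utilizations have already been measured and stay stable for the few seconds the decision is used.

```python
from dataclasses import dataclass


@dataclass
class PendingFile:
    name: str
    remaining_blocks: int  # fixed-size blocks still to be written


def pick_next_transfer(link_utilization, pending_files):
    """Return (file_name, link_id) for the next block write, or None.

    link_utilization: dict link_id -> current utilization in [0, 1].
    pending_files:    list of PendingFile.
    """
    candidates = [f for f in pending_files if f.remaining_blocks > 0]
    if not candidates or not link_utilization:
        return None
    # Send a block from the file with the least amount left to send.
    next_file = min(candidates, key=lambda f: f.remaining_blocks)
    # Place it over the least-loaded link.
    best_link = min(link_utilization, key=link_utilization.get)
    return next_file.name, best_link
```

A real placement decision would also have to respect the replica constraints (3 copies, 2 fault domains); this sketch covers only the ordering and link choice.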

13 Sinbad Architecture Recall: the original DFS is a master-slave architecture Sinbad has a similar architecture

14 Sinbad Determine placement for block replica – Place replicas to avoid hotspots – Constraints: 3 copies Spread across 2 fault domains Benefits – Faster writes: – Faster transfers

15 Orchestrating the Entire Cluster How to control Compute, Network, Storage? Challenges from Sinbad – How to determine future replica demands? You can’t control job arrival You can control task scheduling If you predict job characteristics you can determine future – How to determine future hot spots? Control all network traffic (SDN) Use future

16 Ideal Centralized Entity Controls: – Storage, CPU, N/W Determines: – Which task to run – Where to run the task – When to start Network transfer What rate to transfer at Which network path

17 Predicting Job Characteristics To predict resources that a job needs to complete, what do you need?

18 Predicting Job Characteristics – Job’s DAG (job’s traces history) – Computations time for each node – Data transfer size between nodes – Transfer time between nodes

19 Things you absolutely know! Input data – Size of input data – Location of all replicas – Split of input data Job’s DAG – # of Map tasks – # of Reduce tasks [Figure: Map and Reduce stages over HDFS; 200 GB input, 3 mappers, 2 reducers]

20 Approaches to Prediction: Input/Intermediate/Output Data Assumption: – Map & Reduce run the same code over and over – Code gives the same ratio of reduction E.g. 50% reduction from Map input to intermediate data E.g. 90% reduction from intermediate data to output Implication: – Given the size of the input, you can determine the size of future transfers Problem: – Not always true! [Figure: Map and Reduce stages over HDFS; data sizes 200 GB, 100 GB, 10 GB, 10 GB]
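The fixed-ratio assumption turns into simple arithmetic. A minimal sketch, using the slide's example ratios as defaults; the function and parameter names are made up for illustration.

```python
def predict_sizes(input_gb, map_reduction=0.5, reduce_reduction=0.9):
    """Predict intermediate and output data sizes from the input size.

    Assumes each stage shrinks its input by a fixed, previously observed
    ratio (50% for Map, 90% for Reduce in the slide's example).
    """
    intermediate_gb = input_gb * (1.0 - map_reduction)
    output_gb = intermediate_gb * (1.0 - reduce_reduction)
    return intermediate_gb, output_gb
```

For the 200 GB input in the figure, `predict_sizes(200)` gives roughly 100 GB of intermediate data and roughly 10 GB of output (up to floating-point rounding). The prediction is only as good as the assumption that the ratios repeat from run to run.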

21 Approaches to Prediction: Task Run Time Assumption: – Task is dominated by reading input – Time to run a task is essentially the time to read its input If local: time to read from disk If non-local: time to read across the network Implication: – If you can model read time, you can determine task run time Problems: – How do you model disk I/O? – How do you model I/O interrupt contention? [Figure: Map and Reduce stages over HDFS; data sizes 200 GB, 100 GB, 10 GB, 10 GB]
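Under the "read time dominates" assumption, a first-cut model is just input size divided by the relevant bandwidth. The sketch below is illustrative only: the bandwidth defaults are invented placeholders, and it ignores exactly the disk and interrupt contention effects the slide lists as problems.

```python
def predict_task_time(input_mb, local, disk_mb_per_s=100.0, net_mb_per_s=50.0):
    """Estimate task run time in seconds as the time to read the input.

    local=True  -> input is read from the local disk.
    local=False -> input is read across the network.
    The default bandwidths are hypothetical, not measured values.
    """
    bandwidth = disk_mb_per_s if local else net_mb_per_s
    return input_mb / bandwidth
```

In practice the bandwidth terms would have to come from measurement, and the network term depends on what else is using the path at the time.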

22 Predict Job Runs Given: – Predictions of tasks, of transfers, and of the DAG Can you predict job completion time? – How do you account for interleaving between jobs? – How do you determine optimal # of slots? – How do you determine optimal network bandwidth?

23 Really easy, right? – But what happens if the network only has 2 slots? You can’t run the maps in parallel [Figure: Map and Reduce stages over HDFS; data sizes 200 GB, 100 GB, 10 GB, 10 GB; timeline labels 0 sec, 40 sec]

24 Which tasks to run in which order? How many slots to assign? [Figure: Map and Reduce stages over HDFS; data sizes 200 GB, 100 GB, 10 GB, 10 GB; timeline labels 3 sec, 13 sec, 33 sec]
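The effect of task order and slot count can be explored with a small greedy scheduler: each task, in the given order, starts on whichever slot frees up first. This is a sketch with hypothetical task durations; it ignores network transfers and the Map-before-Reduce dependency.

```python
import heapq


def makespan(task_durations, slots):
    """Finish time when tasks run in the given order on `slots` slots.

    Each task is assigned to the slot that becomes free earliest.
    """
    if slots < 1:
        raise ValueError("need at least one slot")
    free_at = [0.0] * slots  # time at which each slot becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for duration in task_durations:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish
```

With three equal 10-second tasks, three slots finish in 10 seconds while two slots need 20, because the third task has to wait for a second wave. Reordering unequal tasks changes the result as well, which is why both questions on the slide matter.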

25 Approaches to Prediction: Job Run Times Assumption: – Job runtime = Function(# slots) Implication: – Given N slots, I can predict completion time Jockey Approach [EuroSys’12] – Track job progress: fraction of completed tasks – Build a map of [{% done + # of slots} -> time to complete] – Use a simulator to build the map – Iterate through all possible combinations of # of slots and % done Problems: – Ignores network transfers: network congestion – Cross-job contention on a server can impact completion time – Not all tasks are equal: # of tasks done isn’t a good representation of progress
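The table-building step can be illustrated with a toy simulator. This is not Jockey's simulator: it assumes identical tasks that run in waves, and all names and parameters are hypothetical. It therefore shares the weaknesses listed above (no network, all tasks treated as equal).

```python
def simulate_remaining_time(total_tasks, pct_done, slots, task_time):
    """Toy simulator: identical tasks run in waves of `slots` at a time."""
    done = total_tasks * pct_done // 100
    remaining = total_tasks - done
    waves = -(-remaining // slots)  # ceiling division
    return waves * task_time


def build_completion_table(total_tasks, max_slots, task_time, steps=10):
    """Map (% done, # of slots) -> predicted time to complete.

    Iterates over every combination of progress bucket and slot count,
    as the slide describes, and records the simulator's answer.
    """
    table = {}
    for step in range(steps + 1):
        pct_done = step * 100 // steps
        for slots in range(1, max_slots + 1):
            table[(pct_done, slots)] = simulate_remaining_time(
                total_tasks, pct_done, slots, task_time
            )
    return table
```

At run time, a scheduler would look up the job's current progress in the table and pick the smallest slot count whose predicted remaining time still meets the deadline.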

26 Open Questions What about background traffic? – Control messages – Other bulk transfer What about unexpected events? – Failures? – Loss of data? What about protocol inefficiencies? – Hadoop scheduling – TCP inefficiencies – Server scheduling
