The Only Constant is Change: Incorporating Time-Varying Bandwidth Reservations in Data Centers
Di Xie, Ning Ding, Y. Charlie Hu, Ramana Kompella

Presentation transcript:

The Only Constant is Change: Incorporating Time-Varying Bandwidth Reservations in Data Centers
Di Xie, Ning Ding, Y. Charlie Hu, Ramana Kompella

Review
Towards Predictable Datacenter Networks
SIGCOMM ’11
Virtual network abstractions: Virtual Cluster and Virtual Oversubscribed Cluster
Oktopus system: allocation methods
– Greedy algorithm
Performance guarantees, tenant costs, provider revenue

Contrast
Paper | Towards Predictable Datacenter Networks | The Only Constant is Change: Incorporating Time-Varying Bandwidth Reservations in Data Centers
Conference | SIGCOMM ’11 | SIGCOMM ’12
Team | Microsoft Research | Purdue University
Problem | Performance guarantees, tenant costs, provider revenue | Data center utilization, tenant cost
Virtual network | VC / VOC | TIVC (Temporally-Interleaved Virtual Clusters)
Allocation method | Greedy algorithm | Dynamic programming

Cloud Computing is Hot
[Figure label: Private Cluster]

Key Factors for Cloud Viability
Cost
Performance
BW variation in the cloud due to contention
Causing unpredictable performance

Reserving BW in Data Centers
SecondNet [Guo’10]
– Per VM-pair, per VM access bandwidth reservation
Oktopus [Ballani’11]
– Virtual Cluster (VC)
– Virtual Oversubscribed Cluster (VOC)

How BW Reservation Works
[Figure: a request under the Virtual Cluster model, with N VMs attached to a virtual switch; bandwidth vs. time, with a constant reservation B from time 0 to T]
1. Determine the model
2. Allocate and enforce the model
Only fixed-BW reservation

Network Usage for MapReduce Jobs
[Figure panels: Hadoop Sort, 4GB per VM; Hadoop Word Count, 2GB per VM; Hive Join, 6GB per VM; Hive Aggregation, 2GB per VM]
Time-varying network usage

Motivating Example
4 machines, 2 VMs/machine, non-oversubscribed network
Hadoop Sort
– N: 4 VMs
– B: 500Mbps/VM
[Figure labels: 1Gbps, 500Mbps; not enough BW]

Under Fixed-BW Reservation Model
[Figure: Virtual Cluster model, bandwidth vs. time with 1Gbps and 500Mbps levels; Job1, Job2, Job3]

Under Time-Varying Reservation Model
[Figure: TIVC model for Hadoop Sort, bandwidth vs. time with 1Gbps and 500Mbps levels; Job1 through Job5 (J1 through J5)]
Doubles VM utilization, network utilization, and job throughput

Temporally-Interleaved Virtual Cluster (TIVC)
Key idea: time-varying BW reservations
Compared to fixed-BW reservation
– Improves data center utilization: better network utilization and better VM utilization
– Increases cloud provider’s revenue
– Reduces cloud user’s cost
– Without sacrificing job performance

Challenges in Realizing TIVC
What are the right model functions?
How to automatically derive the models?
How to efficiently allocate TIVC?

How to Model Time-Varying BW?
[Figure: bandwidth usage over time for Hadoop Hive Join]

TIVC Models
[Figure: Virtual Cluster and TIVC model shapes; labels T 11 and T 32]

Hadoop Sort

Hadoop Word Count

Hadoop Hive Join

Hadoop Hive Aggregation

Our Approach
Observation: many jobs are repeated many times
– E.g., 40% of jobs are recurring in Bing’s production data center [Agarwal’12]
– The data itself may change across runs, but its size remains about the same
Profiling: same configuration as production runs
– Same number of VMs
– Same input data size per VM
– Same job/VM configuration
How much BW should we give to the application?

Impact of BW Capping
[Figure label: no-elongation BW threshold]

Generate Model for Individual VM
1. Choose B_b
2. In periods where B > B_b, set the reservation to B_cap
[Figure: BW vs. time, with levels B_b and B_cap]
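
The two steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `pulse_model`, the sample trace, and the chosen values of B_b and B_cap are all hypothetical, and the trace is assumed to be sampled at fixed intervals.

```python
def pulse_model(trace, b_base, b_cap):
    """Per-interval reservation: b_cap where measured usage exceeds b_base, else b_base."""
    if b_base > b_cap:
        raise ValueError("base bandwidth must not exceed the cap")
    return [b_cap if usage > b_base else b_base for usage in trace]


# Hypothetical per-interval usage of one VM, in Mbps (not taken from the talk).
trace = [50, 80, 400, 450, 90, 60, 420, 70]
model = pulse_model(trace, b_base=100, b_cap=500)
print(model)  # [100, 100, 500, 500, 100, 100, 500, 100]
```

The result is a base reservation with pulses at the cap, which is the shape the slide's figure describes.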

Maximal Efficiency Model
Enumerate B_b to find the maximal efficiency model
[Figure: BW vs. time, with levels B_b and B_cap]
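
The enumeration might look like the sketch below. The slide does not define efficiency, so this assumes it is the traffic volume actually sent divided by the bandwidth volume reserved. The helper names, the sample trace, and the choice to use the distinct trace values as candidates for B_b are also assumptions made for illustration.

```python
def pulse_model(trace, b_base, b_cap):
    """Reserve b_cap where usage exceeds b_base, else b_base."""
    return [b_cap if usage > b_base else b_base for usage in trace]


def efficiency(trace, model):
    """Assumed metric: traffic volume sent / bandwidth volume reserved."""
    return sum(trace) / sum(model)


def best_base(trace, b_cap):
    """Try each distinct usage level as B_b and keep the most efficient one."""
    candidates = sorted(set(trace))
    return max(
        candidates,
        key=lambda b: efficiency(trace, pulse_model(trace, b, b_cap)),
    )


# Hypothetical usage trace in Mbps, assumed to be positive and at most b_cap.
trace = [50, 80, 400, 450, 90, 60, 420, 70]
b_best = best_base(trace, b_cap=500)
print(b_best)  # 90
```

A base that is too low turns small fluctuations into full pulses, and a base that is too high over-reserves during quiet periods, so the best value sits just above the background traffic level.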

TIVC Allocation Algorithm
Spatio-temporal allocation algorithm
– Extends VC allocation algorithm to time dimension
– Employs dynamic programming

TIVC Allocation Algorithm
Bandwidth requirement of a valid allocation
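
The requirement itself did not survive in the transcript. As a hedged illustration only, the sketch below applies the Virtual Cluster rule from Oktopus (a link that separates m of a job's N VMs from the rest must carry min(m, N - m) times the per-VM bandwidth) independently at each time slot. It assumes every VM of the job shares the same per-slot reservation. All names and numbers are hypothetical.

```python
def link_demand(m, n, per_vm):
    """Bandwidth a link needs per time slot when it separates m of n VMs from the rest."""
    k = min(m, n - m)
    return [k * b for b in per_vm]


def fits(m, n, per_vm, residual):
    """True if the link's residual bandwidth covers the demand in every time slot."""
    if len(per_vm) != len(residual):
        raise ValueError("per_vm and residual must cover the same time slots")
    return all(d <= r for d, r in zip(link_demand(m, n, per_vm), residual))


per_vm = [100, 500, 100, 500]  # hypothetical per-VM reservation per slot, Mbps
print(link_demand(2, 4, per_vm))                    # [200, 1000, 200, 1000]
print(fits(2, 4, per_vm, [1000, 1000, 300, 1000]))  # True
print(fits(3, 4, per_vm, [1000, 400, 1000, 1000]))  # False
```

Checking each slot separately is what lets jobs whose pulses do not overlap share a link that a fixed reservation would consider full.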

TIVC Allocation Algorithm
Allocate VMs needed by a job
Dynamic programming over depth and number of VMs
Observation: a suballocation of K1 VMs in a depth-(d-1) subtree can be reused when searching for a valid suballocation of K2 VMs in the parent depth-d subtree (K2 > K1)
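
A much-simplified sketch of a bottom-up search over tree depth and VM count follows. It is not the Proteus algorithm: it only decides whether a job fits rather than choosing a placement, and it assumes every VM has the same per-slot reservation and that each link must carry min(k, N - k) times that reservation. The `Node` class and the small tree are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    uplink: list                 # residual uplink bandwidth per time slot (Mbps)
    slots: int = 0               # free VM slots (hosts only)
    children: list = field(default_factory=list)


def uplink_ok(k, n, per_vm, uplink):
    """Check the uplink in every time slot when k of n VMs sit below it."""
    need = min(k, n - k)
    return all(need * b <= r for b, r in zip(per_vm, uplink))


def feasible_counts(node, n, per_vm):
    """VM counts the subtree under node can host, built from its children's results."""
    if not node.children:
        inside = set(range(min(node.slots, n) + 1))
    else:
        inside = {0}
        for child in node.children:
            child_counts = feasible_counts(child, n, per_vm)
            inside = {a + b for a in inside for b in child_counts if a + b <= n}
    return {k for k in inside if uplink_ok(k, n, per_vm, node.uplink)}


def can_host(root, n, per_vm):
    return n in feasible_counts(root, n, per_vm)


def make_tree():
    """Two racks of two hosts, two slots per host; the root has no uplink."""
    def host():
        return Node(uplink=[1000, 1000], slots=2)

    def rack():
        return Node(uplink=[1000, 1000], children=[host(), host()])

    return Node(uplink=[], children=[rack(), rack()])


print(can_host(make_tree(), 4, [100, 500]))  # True
print(can_host(make_tree(), 4, [100, 600]))  # False
```

Each subtree's set of feasible counts is computed once and reused by its parent, which mirrors the reuse the observation above describes.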

Challenges in Realizing TIVC
What are the right model functions?
How to automatically derive the models?
How to efficiently allocate TIVC?

Proteus: Implementing TIVC Models
1. Determine the model
2. Allocate and enforce the model

Evaluation
Large-scale simulation
– Performance
– Cost
– Allocation algorithm
Prototype implementation
– Small-scale testbed

Simulation Setup
3-level tree topology
– 16,000 hosts x 4 VMs
– 4:1 oversubscription
[Figure labels: 50Gbps, 10Gbps, 1Gbps; 20 aggregation switches, 20 ToR switches, 40 hosts]
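
The figure labels are consistent with the stated totals under one reading, checked below. The reading is an assumption: 20 aggregation switches, 20 ToR switches per aggregation switch, and 40 hosts per ToR switch, with 1Gbps host links, 10Gbps ToR uplinks, and 50Gbps aggregation uplinks.

```python
# Assumed reading of the figure labels (not stated explicitly on the slide).
AGGR_SWITCHES = 20
TORS_PER_AGGR = 20
HOSTS_PER_TOR = 40
VMS_PER_HOST = 4
HOST_LINK_GBPS = 1
TOR_UPLINK_GBPS = 10
AGGR_UPLINK_GBPS = 50

hosts = AGGR_SWITCHES * TORS_PER_AGGR * HOSTS_PER_TOR
vm_slots = hosts * VMS_PER_HOST

# Oversubscription = bandwidth entering a switch from below / its uplink bandwidth.
tor_oversub = HOSTS_PER_TOR * HOST_LINK_GBPS / TOR_UPLINK_GBPS
aggr_oversub = TORS_PER_AGGR * TOR_UPLINK_GBPS / AGGR_UPLINK_GBPS

print(hosts, vm_slots)            # 16000 64000
print(tor_oversub, aggr_oversub)  # 4.0 4.0
```

Under that reading both switch levels are oversubscribed 4:1, matching the slide.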

Batched Jobs
Scenario: 5,000 time-insensitive jobs
[Figure: completion time reduction; values 42%, 21%, 23%, 35%; mixed workload: 1/3 of each type]
All remaining results are for the mixed workload

Varying Oversubscription and Job Size % reduction for non-oversubscribed network

Dynamically Arriving Jobs
Scenario: accommodate users’ requests in a shared data center
– 5,000 jobs, Poisson arrival, varying load
Rejected: VC: 9.5%, TIVC: 3.4%

Analysis: Higher Concurrency
Under 80% load
– 7% higher job concurrency
– 28% higher VM utilization
– Rejected jobs are large
– 28% higher revenue
[Figure labels: Charge VMs, VM]

Tenant Cost and Provider Revenue
Charging model
– VM time T and reserved BW volume B
– Cost = N (k_v T + k_b B)
– k_v = 0.004$/hr, k_b = $/GB
12% less cost for tenants
Providers make more money
[Figure label: Amazon target utilization]
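
The charging formula can be evaluated directly. The sketch below assumes T is VM time in hours and B is reserved bandwidth volume in GB, both per VM, as the factor N suggests. The value of k_b is missing from the transcript, so the number used here is a placeholder chosen only to make the example run. The job size is likewise hypothetical.

```python
def tenant_cost(n_vms, hours, reserved_gb, k_v, k_b):
    """Cost = N (k_v T + k_b B), with T in hours and B in GB per VM."""
    return n_vms * (k_v * hours + k_b * reserved_gb)


K_V = 0.004             # $/hr, from the slide
K_B_PLACEHOLDER = 0.01  # $/GB, hypothetical: the slide's value is not in the transcript

# Hypothetical job: 4 VMs, 2 hours each, 10 GB of reserved bandwidth volume per VM.
cost = tenant_cost(4, 2.0, 10.0, K_V, K_B_PLACEHOLDER)
print(f"{cost:.3f}")  # 0.432
```

Because the bandwidth term scales with reserved volume rather than peak rate, a time-varying reservation that trims idle periods lowers B and therefore the tenant's bill.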

Testbed Experiment
Setup
– 18 machines
– tc and NetFPGA rate limiter
Real MapReduce jobs
Procedure
– Offline profiling
– Online reservation

Testbed Result
TIVC finishes jobs faster than VC; Baseline finishes the fastest

Conclusion
Network reservations in the cloud are important
– Previous work proposed fixed-BW reservations
– However, cloud apps exhibit time-varying BW usage
We propose the TIVC abstraction
– Provides time-varying network reservations
– Automatically generates the model
– Efficiently allocates and enforces reservations
Proteus shows that TIVC significantly benefits both cloud providers and users

Thanks