
1 Finding Constant From Change: Revisiting Network Performance Aware Optimizations on IaaS Clouds
Yifan Gong (1), Bingsheng He (1), Dan Li (2)
(1) Nanyang Technological University, (2) Tsinghua University

2 Outline: Introduction; RPCA-based approach; Evaluation results; Conclusions

3 Outline: Introduction; RPCA-based approach; Evaluation results; Conclusions

4 Infrastructure as a Service Is Pervasive
Many cloud providers supply IaaS clouds: Amazon EC2, Google Compute Engine, Rackspace.
IaaS clouds have emerged as a popular computing infrastructure for many distributed applications: life science [Nagavaram 2011], physics [Nunez 2010], big data processing [He 2013].

5 Network Heterogeneity in IaaS Clouds
[Figure: topology of a data center in an IaaS cloud (switches, racks, machines), with low-performance and high-performance links illustrating network heterogeneity.]

6 Network Performance-aware Optimization: State of the Art
Network performance-aware optimization is an effective approach to optimizing distributed applications in the cloud: MPI collective operations [Gong 2013], topology mapping [Hoefler 2011], workflow management [Spooner 2005].
Definition of network performance-aware optimization: estimating or measuring the network performance of all links and carefully selecting communication links to minimize the network transfer time of the application.

7 Example of MPI Broadcast
[Figure: MPI broadcast of data from the root machine 0 to machines 1 to 7 using the binomial tree algorithm.]
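The binomial tree on this slide can be written down as a round-by-round send schedule. The sketch below is an illustration, not code from the talk: the function name `binomial_broadcast_schedule` is hypothetical, and it uses one common ordering (in round k, every rank that already holds the data sends to the rank 2^k positions away). MPI libraries may visit the rounds in a different order while still building a binomial tree.

```python
def binomial_broadcast_schedule(n_procs, root=0):
    """Return a list of rounds; each round is a list of (sender, receiver) ranks.

    Ranks are relabeled relative to the root so that the root acts as rank 0.
    In round k, every rank whose relative rank is below 2**k already holds the
    data and sends it to the relative rank 2**k higher, if that rank exists.
    """
    rounds = []
    mask = 1
    while mask < n_procs:
        sends = []
        for rel in range(mask):
            dst_rel = rel + mask
            if dst_rel < n_procs:
                # Map relative ranks back to actual ranks.
                src = (rel + root) % n_procs
                dst = (dst_rel + root) % n_procs
                sends.append((src, dst))
        rounds.append(sends)
        mask <<= 1
    return rounds


for k, sends in enumerate(binomial_broadcast_schedule(8)):
    print(f"round {k}: {sends}")
# round 0: [(0, 1)]
# round 1: [(0, 2), (1, 3)]
# round 2: [(0, 4), (1, 5), (2, 6), (3, 7)]
```

With 8 machines the broadcast finishes in log2(8) = 3 rounds, and the number of machines holding the data doubles every round.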

8 Network Performance-aware Optimization of MPI Broadcast
[Figure: topology of machines m0 to m7 and two broadcast trees rooted at machine 0. Default binomial tree: Total Cost = 6 + 6 + 6 = 18. Network performance-aware tree: Total Cost = 6 + 2 + 2 + 2 = 12.]
Performance Matrix:
      m0  m1  m2  m3  m4  m5  m6  m7
m0     0   6   6   6   6   6   2   2
m1     6   0   2   2   2   2   6   6
m2     6   2   0   2   2   2   6   6
m3     6   2   2   0   2   2   6   6
m4     6   2   2   2   0   2   6   6
m5     6   2   2   2   2   0   6   6
m6     2   6   6   6   6   6   0   2
m7     2   6   6   6   6   6   2   0
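The two totals on this slide can be checked against the performance matrix under a simple cost model, which is an assumption here: transfers within one round run in parallel, so a round costs as much as its slowest link, and the total cost is the sum over rounds. The `binomial` schedule assumes rank i runs on machine mi; the `aware` schedule is a hypothetical tree that reproduces 6 + 2 + 2 + 2 = 12 and is not necessarily the exact tree drawn on the slide.

```python
# Performance matrix from the slide: cost of the link between machines mi and mj.
PERF = [
    [0, 6, 6, 6, 6, 6, 2, 2],
    [6, 0, 2, 2, 2, 2, 6, 6],
    [6, 2, 0, 2, 2, 2, 6, 6],
    [6, 2, 2, 0, 2, 2, 6, 6],
    [6, 2, 2, 2, 0, 2, 6, 6],
    [6, 2, 2, 2, 2, 0, 6, 6],
    [2, 6, 6, 6, 6, 6, 0, 2],
    [2, 6, 6, 6, 6, 6, 2, 0],
]


def schedule_cost(rounds, perf):
    """Assumed cost model: each round costs its slowest link; rounds add up."""
    return sum(max(perf[src][dst] for src, dst in sends) for sends in rounds)


def covers_all(rounds, root, n):
    """Check that senders already hold the data and every machine is reached."""
    have = {root}
    for sends in rounds:
        for src, _ in sends:
            assert src in have, f"m{src} sends before it has the data"
        have |= {dst for _, dst in sends}
    return have == set(range(n))


# Network-oblivious binomial tree with rank i placed on machine mi.
binomial = [[(0, 1)], [(0, 2), (1, 3)], [(0, 4), (1, 5), (2, 6), (3, 7)]]

# Hypothetical network-aware tree: cross a cost-6 link once, then use cost-2 links.
aware = [[(0, 1)], [(0, 6), (1, 2)], [(6, 7), (1, 3), (2, 4)], [(1, 5)]]

print(schedule_cost(binomial, PERF))  # 18 = 6 + 6 + 6
print(schedule_cost(aware, PERF))     # 12 = 6 + 2 + 2 + 2
```

The aware tree uses one more round than the binomial tree, yet it is cheaper because only its first round touches a slow link.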

9 Network Interference Problem
[Figure: applications on Machine 0 and Machine 1 share the network with a background application, which causes network interference.]
Network performance changes over time because of network interference.

10 Cloud Performance Dynamics: Latency/Bandwidth
[Figure: distribution of latency and bandwidth between a pair of instances (virtual machines) over a long period; bandwidth ranges from 45 to 110 MB/s.]
Network performance changes over time, and interference significantly affects it. As a result, network performance-aware optimizations cannot work as intended.

11 Our Work
Problem
- How to distinguish network interference in the cloud?
- How to enable existing network performance-aware optimizations in the dynamic cloud environment?
Our solution: Robust Principal Component Analysis (RPCA)
- Decouple the constant component from the dynamic network performance while minimizing the difference between the network performance and the constant component
- Use the interference component to determine the effectiveness of network performance-aware optimizations
- Utilize the constant component to guide network performance-aware optimizations

12 Robust Principal Component Analysis (RPCA)
[Figure: video surveillance example. Frames 1 to 4, captured at t1 to t4, form the video; RPCA splits it into a low-rank matrix D plus a sparse matrix E. Figures are cited from [Wright 2009].]
Mathematical problem: $\min_{D,E}\ \operatorname{rank}(D) + \lambda \|E\|_0$ subject to $A = D + E$
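Minimizing rank plus the l0 norm directly is intractable in general, so RPCA is usually solved through its convex relaxation (Principal Component Pursuit): minimize $\|D\|_* + \lambda \|E\|_1$ subject to $A = D + E$, with the common default $\lambda = 1/\sqrt{\max(m, n)}$. The sketch below uses the inexact augmented Lagrange multiplier (ALM) method for that relaxation. It is a swapped-in solver shown for illustration, not necessarily the one the authors used, and `rpca_ialm` is a hypothetical name.

```python
import numpy as np


def soft_threshold(x, tau):
    """Entrywise shrinkage: the proximal operator of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def rpca_ialm(A, lam=None, tol=1e-7, max_iter=500):
    """Split A into low-rank D and sparse E by solving
    min ||D||_* + lam * ||E||_1  subject to  A = D + E
    with the inexact augmented Lagrange multiplier method."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    norm_A = np.linalg.norm(A, "fro")
    spectral = np.linalg.norm(A, 2)
    # Scale the initial dual variable so it is feasible for the dual problem.
    Y = A / max(spectral, np.abs(A).max() / lam)
    mu, mu_max, rho = 1.25 / spectral, 1.25e7 / spectral, 1.5
    D = np.zeros_like(A)
    E = np.zeros_like(A)
    for _ in range(max_iter):
        # Sparse part: shrink the current residual entrywise.
        E = soft_threshold(A - D + Y / mu, lam / mu)
        # Low-rank part: singular value thresholding.
        U, s, Vt = np.linalg.svd(A - E + Y / mu, full_matrices=False)
        D = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # Dual ascent on the constraint A = D + E, with a growing penalty.
        residual = A - D - E
        Y = Y + mu * residual
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual, "fro") <= tol * norm_A:
            break
    return D, E


# Usage: a rank-3 matrix corrupted by a few large sparse errors.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 60))
sparse = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.05
sparse[mask] = rng.uniform(-10, 10, mask.sum())
D, E = rpca_ialm(low_rank + sparse)
print(np.linalg.norm(D - low_rank) / np.linalg.norm(low_rank))
```

The two alternating updates are the closed-form proximal steps for the nuclear norm and the l1 norm, which is what makes this solver short.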

13 Outline: Introduction; RPCA-based approach; Evaluation results; Conclusions

14 Definitions
(a) Performance matrix: we can measure the performance of each link among a set of instances; the all-link network performance is defined as the performance matrix.
(b) Temporal performance matrix (TP-matrix) A: a performance matrix reflects only a snapshot of network performance at a certain point in time. We define the temporal performance matrix as the combination of performance matrices over a continuous time interval.

15 Analogy to RPCA in Computer Vision
[Figure: just as video frames 1 to 4 at t1 to t4 are stacked, the performance matrices measured at t1 to t4 are each reshaped into a column vector and combined into the temporal performance matrix. The snapshots differ only in a few entries (e.g., 2.1 instead of 2, 6.1 instead of 6).]
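The reshape-and-combine step can be sketched in a few lines. This is an illustration under assumptions: the helper names `tp_matrix` and `snapshot` are hypothetical, each snapshot is flattened in row-major order, and the 4 x 4 values (2 within a pair, 6 across pairs, with occasional +0.1 deviations) are made up in the spirit of the slide rather than copied from it.

```python
import numpy as np


def tp_matrix(snapshots):
    """Stack performance-matrix snapshots into a temporal performance matrix.

    Each n x n snapshot is flattened (row-major) into a column of length n*n,
    so column t of the result is the snapshot measured at time t.
    """
    return np.column_stack([np.asarray(p, dtype=float).ravel() for p in snapshots])


def snapshot(tp, t, n):
    """Reshape column t of a TP-matrix back into an n x n performance matrix."""
    return tp[:, t].reshape(n, n)


# Hypothetical 4-instance example: two pairs of fast links (2), slow links across (6).
base = np.array([[0, 2, 6, 6],
                 [2, 0, 6, 6],
                 [6, 6, 0, 2],
                 [6, 6, 2, 0]], dtype=float)
snaps = [base.copy() for _ in range(4)]  # measurements at t1..t4
snaps[0][0, 1] += 0.1                    # small deviation at t1
snaps[3][3, 1] += 0.1                    # small deviation at t4

tp = tp_matrix(snaps)
print(tp.shape)  # (16, 4): one column per point in time
```

Keeping one column per time step means a link that never changes shows up as a row of identical values, which is what the low-rank part of RPCA captures.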

16 Analogy to RPCA in Computer Vision (cont.)
[Figure: RPCA splits the video (original matrix A) into a low-rank matrix D and a sparse matrix E. Likewise, it splits the temporal performance matrix A (snapshots t1 to t4) into matrix D (constant), whose columns are all the same unperturbed snapshot, plus matrix E (interference), whose few nonzero entries are 0.1.]
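The reason this split suits RPCA is a rank argument: if nothing interferes, every column of the TP-matrix is the same snapshot, so the constant part has rank 1, while interference touches only a few entries. The toy values below are hypothetical (same 4 x 4 pattern as above); the point is that three 0.1 deviations raise the rank of A from 1 to 4 even though E has only three nonzeros.

```python
import numpy as np

# Hypothetical constant snapshot: fast links (2) within pairs, slow links (6) across.
base = np.array([[0, 2, 6, 6],
                 [2, 0, 6, 6],
                 [6, 6, 0, 2],
                 [6, 6, 2, 0]], dtype=float)
T = 4  # snapshots t1..t4

D = np.column_stack([base.ravel()] * T)  # constant component: identical columns
E = np.zeros_like(D)                     # interference: a few small deviations
E[1, 0] = 0.1    # link (0, 1) at t1 (row-major index 0*4 + 1)
E[11, 1] = 0.1   # link (2, 3) at t2 (index 2*4 + 3)
E[13, 3] = 0.1   # link (3, 1) at t4 (index 3*4 + 1)
A = D + E

print(np.linalg.matrix_rank(D), np.linalg.matrix_rank(A), np.count_nonzero(E))
# 1 4 3
```

Ordinary PCA on A would spread those deviations over every component; RPCA instead assigns them to the sparse E and keeps D at rank 1.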

17 RPCA-based Approach
[Figure: the temporal performance matrix is decomposed into matrix D (constant) plus matrix E (interference, entries of 0.1). A column of D is reshaped back into a performance matrix that guides the network performance-aware optimization; E is used to calculate the quantity of interference.]

18 Basic Procedure
- A is the temporal performance matrix obtained from direct measurement.
- Use robust PCA to distinguish the interference: $\min_{D,E}\ \operatorname{rank}(D) + \lambda \|E\|_0$ subject to $A = D + E$, where D is the constant component and E represents the interference.
- Use matrix E to calculate the quantity of interference in the cluster and to predict the efficiency of network performance-aware optimizations: $\mathrm{Norm}(E) = \|E\| / \|A\|$.
- Use matrix D to optimize applications.
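The last two steps of the procedure can be sketched once D and E are available. Two choices here are assumptions, since the slide does not fix them: Norm(E) uses the Frobenius norm, and the constant performance matrix is obtained by averaging the columns of D and reshaping the result. The helper names and the toy D and E are hypothetical.

```python
import numpy as np


def interference_ratio(E, A):
    """Norm(E) = ||E|| / ||A||, here with the Frobenius norm (an assumption)."""
    return np.linalg.norm(E, "fro") / np.linalg.norm(A, "fro")


def constant_performance_matrix(D, n):
    """Collapse the low-rank component into one n x n matrix by averaging columns."""
    return D.mean(axis=1).reshape(n, n)


# Hypothetical decomposition result for 4 instances over 4 snapshots.
base = np.array([[0, 2, 6, 6],
                 [2, 0, 6, 6],
                 [6, 6, 0, 2],
                 [6, 6, 2, 0]], dtype=float)
D = np.column_stack([base.ravel()] * 4)
E = np.zeros_like(D)
E[1, 0] = E[11, 1] = E[13, 3] = 0.1
A = D + E

print(round(interference_ratio(E, A), 4))  # quantity of interference in the cluster
print(constant_performance_matrix(D, 4))   # input to the network-aware optimization
```

A small Norm(E) means the measured links are mostly stable, so an optimization planned on the constant matrix is more likely to pay off.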

19 Implementation Details
Model calibration: How can the calibration overhead for each performance matrix be reduced? How many performance matrices should be calibrated?
Update maintenance: When should the network performance be re-calibrated?
Detailed techniques are in the paper.

20 Outline: Introduction; RPCA-based approach; Evaluation results; Conclusions

21 Experiment Setup
Experiments on Amazon EC2: 196 m1.medium instances.
Simulation with ns-2: tree-structured topology with 1024 machines in total, in 32 racks (10 Gb/s), each containing 32 servers (1 Gb/s).
Applications: MPI Broadcast/Scatter, topology mapping, conjugate gradient (CG), N-body.

22 Comparisons
- Baseline: simulates running directly in the cloud environment, without network performance-aware optimizations.
- Heuristics: we capture the TP-matrix and use the average value of each link to optimize the applications.
- RPCA: network performance-aware optimizations are guided by the constant component captured by our RPCA approach.
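The Heuristics comparison reduces to a row-wise mean of the TP-matrix. The sketch below (hypothetical helper name and toy values) shows why that average can mislead: one interference spike on link (0, 1) pulls its average from 2 to 3, whereas a decomposition into D plus E would ideally attribute the spike to E and leave the constant value at 2.

```python
import numpy as np


def heuristic_matrix(tp, n):
    """'Heuristics' comparison: average each link over all snapshots."""
    return tp.mean(axis=1).reshape(n, n)


# Hypothetical 4-instance example with four snapshots.
base = np.array([[0, 2, 6, 6],
                 [2, 0, 6, 6],
                 [6, 6, 0, 2],
                 [6, 6, 2, 0]], dtype=float)
snapshots = [base.copy() for _ in range(4)]
snapshots[3][0, 1] = 6.0  # one interference spike on link (0, 1) at t4
tp = np.column_stack([s.ravel() for s in snapshots])

h = heuristic_matrix(tp, 4)
print(h[0, 1])  # 3.0: (2 + 2 + 2 + 6) / 4, biased away from the constant 2
```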

23 The Impact of Interference
[Figure, two panels: the impact of interference on different applications in the virtual cluster (RPCA approach), and the impact of interference on different approaches in the virtual cluster (MPI Scatter), plotted against Norm(E); annotated improvements 25%, 32%, 37%, 24%, 32%.]
The interference in Amazon EC2 is relatively low (Norm(E) is about 0.1), and the performance improvement in Amazon EC2 is about 25% to 37%.

24 Overall Performance of CG
[Figure: the trend of improvement for CG, annotated with 31% and 14%.]
When the vector size is small, our algorithm is slower than MPICH2-based CG because of the calibration and calculation overheads. As the vector size increases, the performance gain compensates for the overhead, yielding 31% and 14% performance improvement over Baseline and Heuristics, respectively.

25 Background Traffic Simulation
We choose the links and then vary two parameters to control the background traffic:
- The waiting time between sending messages, which follows a Poisson distribution with expected value λ
- The message size
[Figure: the impact of network interference on Norm(E): (a) different expected values (λ) of the Poisson distribution; (b) different message sizes.]
Norm(E) clearly correlates positively with the background traffic in the simulated cluster.
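The two knobs on this slide can be illustrated with a small traffic generator. This is a sketch under assumptions, not the ns-2 setup used in the talk: `background_traffic` is a hypothetical helper, waiting times are drawn from a Poisson distribution with mean λ exactly as the slide states, and time units are whatever λ is expressed in. A smaller λ or a larger message size raises the offered load, roughly message size / λ.

```python
import numpy as np


def background_traffic(lam, msg_size, n_messages, seed=0):
    """Return (send_times, sizes) for one background flow on a chosen link.

    Waiting times between consecutive messages are Poisson-distributed with
    expected value lam; every message has the same size in bytes.
    """
    rng = np.random.default_rng(seed)
    waits = rng.poisson(lam, size=n_messages)
    send_times = np.cumsum(waits)
    sizes = np.full(n_messages, msg_size)
    return send_times, sizes


times, sizes = background_traffic(lam=5.0, msg_size=64 * 1024, n_messages=10_000)
offered_load = sizes.sum() / times[-1]  # bytes per time unit, close to msg_size / lam
print(offered_load)
```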

26 Outline: Introduction; RPCA-based approach; Evaluation results; Conclusions

27 Conclusions
- The network in the cloud is dynamic, and network performance-aware optimizations cannot be used directly because of network interference.
- We separate the constant component and the interference in the network performance and calculate the quantity of interference, $\mathrm{Norm}(E) = \|E\| / \|A\|$.
- We study the relationship between the interference and the efficiency of network performance-aware optimizations.
- We propose approaches to enable existing and new network performance-aware optimizations.

28 Thank you!

