1 Black Box Methods for Inferring Parallel Applications' Properties in Virtual Environments
Ashish Gupta
PhD Final Talk, March 2008
Committee: Prof. Peter Dinda, Prof. Fabian Bustamante, Prof. Yan Chen, Prof. Dongyan Xu (Purdue University)

2 Introduction

3 Background
Virtual Machine Distributed Computing
Virtuoso
– Middleware for autonomic virtual machine distributed computing
– Presents a simple abstraction for distributed computing, insulating users from the underlying computational, networking and middleware complexities
– The VMM abstracts computational resources
– VNET abstracts different networking domains into one; it is also an ideal point for monitoring
– Autonomic resource management

4 Problem Introduction
Adaptation
– Resources can be heterogeneous (CPU, memory, etc.)
– If resources are shared, their availability can also be highly dynamic
– Application demands also change!
Autonomic computing:
– What is available?
– What is required?
– How can we effectively match the two?
– One of the major components → what does the distributed application want?
– Should work with existing, unmodified applications and operating systems

5 Adaptation in Virtuoso
Input for the adaptation problem, as formalized in my colleague Ananth Sundararaj's thesis
Different input components:
– VM/application demands
– Resource availability
– User constraints
Many more properties are inferable

6 Thesis Statement
My thesis is that it is feasible to infer, to a significant degree, various useful demands and behaviors of a parallel application running inside a collection of VMs using a black box model. To evaluate this thesis, I enumerate and define various demands and types of behavior that can be inferred, and I also design, implement and evaluate ideas and approaches for inferring them. One of the demands I infer is the communication behavior and the runtime topology of a parallel application. I also show how to infer some very useful runtime properties of a parallel application, such as its runtime performance, its slowdown under external load and its global bottlenecks. Significantly, all of this is done using black box assumptions and without specific assumptions about the application or the operating system. I also give evidence of how automatic black box inference can assist in adapting the application and its resource usage, resulting in improved performance.
(Chapters 2, 3 and 4; Appendices A and B)

7 Black Box Assumption and Impact
Black box: we can make no assumptions about the implementation, behavior or internal state of the guest OS/module beyond its external interface
– Lowers the barrier to adoption of the new inference techniques
– Helps deploy my work to legacy applications
– Mainly accomplished by looking at external signals: traffic, host load, etc.
Not tied to Virtuoso
– Other systems such as softUDC [91], XenoServer [52], SODA [87], Violin [88], In-VIGO [13] and VioCluster [136] can also benefit from black box application inference
– Potentially any virtual distributed system

8 Components of My Dissertation (grouped into Inference and Adaptation)
1. Virtual Topology and Traffic Inference Framework
2. Black box metrics for absolute performance
3. Ball in the Court principles to compute application slowdown
4. Global bottlenecks using time decomposition
5. Increasing Application Performance in Virtual Environments through Run-time Inference and Adaptation
6. Free Network Measurement for Adaptive Virtualized Distributed Computing

9 Topics I Cover (grouped into Inference and Adaptation)
– Virtual Topology and Traffic Inference Framework (Chapter 2)
– Black box metrics for absolute performance (Chapter 3)
– Ball in the Court principles to compute application slowdown (Chapter 4)
– Global bottlenecks using time decomposition (Chapter 5)
– Increasing Application Performance in Virtual Environments through Run-time Inference and Adaptation (Appendix A)
– Free Network Measurement for Adaptive Virtualized Distributed Computing (Appendix B)

10 BSP Application Model
– Multiple processes execute a common kernel
– Execution alternates between one or more computing phases and one or more communication phases
– The combination of a schedule of computation and communication phases → a super-step
– A very popular model for implementing a large variety of scientific applications and parallel algorithms
– Original paper: Valiant [167]

11 Patterns
A synthetic workload generator, developed earlier, that models a BSP application
– Can execute many different types of topologies common in BSP programs
Some parameters:
– Topology
– Number of processors
– Message size
– Number of iterations
– Flops per element
– Memory reads/writes per element
Topologies:
– N-dimensional mesh
– N-dimensional torus
– N-dimensional hypercube
– Binary reduction tree
– All-to-all
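Patterns' own source is not shown in the talk. As an illustration only, the standard neighbor rules for the topologies listed above can be sketched in Python; the function names and the heap-style rank numbering of the reduction tree are assumptions for this example, not Patterns' actual API.

```python
def mesh_neighbors(coord, dims):
    """Neighbors of `coord` in an N-dimensional mesh with side lengths `dims` (no wraparound)."""
    result = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            pos = coord[axis] + step
            if 0 <= pos < size:
                nb = list(coord)
                nb[axis] = pos
                result.add(tuple(nb))
    return result


def torus_neighbors(coord, dims):
    """Like the mesh, but each axis wraps around (modular arithmetic)."""
    result = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            nb = list(coord)
            nb[axis] = (coord[axis] + step) % size
            if tuple(nb) != tuple(coord):  # a side of length 1 has no distinct neighbor
                result.add(tuple(nb))
    return result


def hypercube_neighbors(rank, n_dims):
    """In an n-dimensional hypercube, neighbors differ from `rank` in exactly one bit."""
    return {rank ^ (1 << bit) for bit in range(n_dims)}


def reduction_tree_neighbors(rank, n_procs):
    """Hypothetical heap-style binary tree: parent (r - 1) // 2, children 2r + 1 and 2r + 2."""
    nbs = {c for c in (2 * rank + 1, 2 * rank + 2) if c < n_procs}
    if rank > 0:
        nbs.add((rank - 1) // 2)
    return nbs


def all_to_all_neighbors(rank, n_procs):
    """Every other process is a neighbor."""
    return {r for r in range(n_procs) if r != rank}


print(sorted(torus_neighbors((0, 0), (3, 3))))  # [(0, 1), (0, 2), (1, 0), (2, 0)]
```

The torus differs from the mesh only in the wraparound, which is why a corner process has four neighbors in a 3 x 3 torus but two in a 3 x 3 mesh.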

12 Patterns Application's Capabilities
(Figure panels: 2-D Mesh, 3-D Toroid, 3-D Hypercube, Reduction Tree, All-to-All)

13 NAS Parallel Benchmarks
– Developed by NASA [17, 172]
– A set of programs to evaluate the performance of parallel supercomputers
– Representative of CFD applications
– Used to generate realistic parallel application workloads
– 5 kernel benchmarks: EP, MG, FT, IS, CG (five very frequently used numerical methods)

14 Different Contributions of My Dissertation (grouped into Inference and Adaptation)
– Virtual Topology and Traffic Inference Framework (Chapter 2)
– Black box metrics for absolute performance (Chapter 3)
– Ball in the Court principles to compute application slowdown (Chapter 4)
– Global bottlenecks using time decomposition (Chapter 5)
– Increasing Application Performance in Virtual Environments through Run-time Inference and Adaptation (Appendix A)
– Free Network Measurement for Adaptive Virtualized Distributed Computing (Appendix B)

15 Collective Inference for Topology: VTTIF

16 Goal of VTTIF
Low-level traffic monitoring → ? → application topology
An online topology inference framework for a VM environment

17 VTTIF Architecture
(Figure components: traffic analyzer, rate-based change detection, traffic matrix, query agent, network scheduling agent, VNET daemon, VMs, VNET overlay network to other VNET daemons, physical host)
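The architecture figure names a traffic matrix and rate-based change detection but not their internals. As an illustration only, here is a minimal Python sketch of how such components could fit together. The class and function names, the 50% relative-change threshold, and pruning edges that carry less than 10% of the heaviest edge's rate are all assumptions for this example, not VTTIF's documented behavior.

```python
from collections import defaultdict


class TrafficMatrix:
    """Hypothetical per-daemon traffic matrix: bytes seen per (src VM, dst VM) pair."""

    def __init__(self):
        self.bytes = defaultdict(int)

    def observe(self, src, dst, size):
        # Called for every packet a (hypothetical) VNET hook sees.
        self.bytes[(src, dst)] += size

    def rates(self, window_s):
        # Convert byte counts for the current window into bytes/second.
        return {pair: b / window_s for pair, b in self.bytes.items()}

    def reset(self):
        self.bytes.clear()


def rate_changed(old_rates, new_rates, threshold=0.5):
    """Assumed rule: report a change if any pair's rate moves by more than `threshold` (relative)."""
    for pair in set(old_rates) | set(new_rates):
        old, new = old_rates.get(pair, 0.0), new_rates.get(pair, 0.0)
        base = max(old, new)
        if base > 0 and abs(new - old) / base > threshold:
            return True
    return False


def infer_topology(rates, prune_fraction=0.1):
    """Assumed rule: keep edges carrying at least `prune_fraction` of the heaviest edge's rate."""
    if not rates:
        return set()
    peak = max(rates.values())
    return {pair for pair, r in rates.items() if r >= prune_fraction * peak}


tm = TrafficMatrix()
tm.observe("vm1", "vm2", 4000)
tm.observe("vm2", "vm1", 4000)
tm.observe("vm1", "vm3", 40)  # background chatter, pruned as noise
print(sorted(infer_topology(tm.rates(window_s=1.0))))  # [('vm1', 'vm2'), ('vm2', 'vm1')]
```

Gating topology recomputation on `rate_changed` means the (possibly expensive) inference and any downstream adaptation only run when the application's communication pattern actually shifts.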

18 Inferred Topology: Parallel Integer Sort

19 Black Box Measures of performance

20 The Problem
Performance of BSP applications is an important goal
– A lot of work is dedicated to improving the performance of parallel applications, e.g. Virtuoso and VioCluster
How do we measure performance in a black box fashion?
– The current way in Virtuoso is manual (e.g. Lin et al. [106])
Impact
– Would enable superior adaptation algorithms
– Automated evaluation and adaptation cycle
– Generate reports on the effectiveness of different adaptation/scheduling methods

21 Cost Model for BSP Applications
Popular strategy: break a super-step into its components:
– Computational cost
– Communication cost of the global exchange of data
– Synchronization cost
Terms of the cost model: computation cost, communication cost, number of super-steps, synchronization latency cost, and speed of computation in FLOPS
This is a static model of performance, and it requires detailed application profiling and access to source code
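The slide lists the terms of the cost model, but the equation itself did not survive. A minimal sketch, assuming the standard BSP form T = (W + g·H + S·l) / r, where W sums the per-super-step maximum local work, H sums the per-super-step h-relations, S is the number of super-steps, and r is the computation speed in flops per second; the exact notation in the dissertation may differ.

```python
def bsp_cost_seconds(work, h_relations, g, l, r):
    """
    Assumed standard BSP cost: T = (W + g*H + S*l) / r.
      work[i]        : max local computation of any process in super-step i (flops)
      h_relations[i] : max words sent or received by any process in super-step i
      g              : communication cost per word, in flop units
      l              : synchronization (barrier) latency, in flop units
      r              : computation speed, in flops per second
    """
    if len(work) != len(h_relations):
        raise ValueError("need one (w, h) pair per super-step")
    supersteps = len(work)  # S
    cost_flops = sum(work) + g * sum(h_relations) + supersteps * l
    return cost_flops / r


print(bsp_cost_seconds([100, 100], [10, 10], g=2, l=50, r=10))  # 34.0
```

Every input here has to come from profiling or source inspection, which is exactly the limitation that motivates a black box alternative.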

22 Super-step Approach
– The super-step structure is an invariant
– The number of super-steps depends on parameters and data
Another possible measure of performance: the number of super-steps executed per second, or the iteration rate → a dynamic metric
Multiple super-steps for dynamic applications → the iteration rate is not constant

23 A New Black Box Metric: Round Trip Iteration Rate (RIR)
– Based solely on the communication behavior of the application
– Correlated to the iteration rate
– Indirectly measures the number of process interactions → this indicates progress, since synchronization happens at the end of a super-step
Approach: examined various properties of the traffic trace
– The inter send-packet delay exhibits interesting properties

24 Inter Send-Packet Delay
(Trace excerpt for host 176: receive from 175, send to 175, receive from 175, send to 175, ...)

25 Inter Send-Packet Delay
Traffic trace for Patterns (message size = 4000 bytes, computation per iteration = 100 MFlops)
– Clustering based on inter-send delays
– The count in the cluster matches the actual iteration rate output by the application (325)
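The slide does not say which clustering method was used. As a stand-in, this Python sketch uses a one-dimensional k-means with k = 2 to separate short intra-burst gaps from long inter-iteration gaps; the toy trace is synthetic (5 iterations, 3 sends per burst), not one of the measured traces.

```python
def inter_send_delays(send_times):
    """Gaps between consecutive send-packet timestamps (seconds)."""
    return [b - a for a, b in zip(send_times, send_times[1:])]


def two_means_1d(values, iterations=50):
    """Split 1-D values into a 'small' and a 'large' cluster (k-means with k = 2)."""
    lo, hi = min(values), max(values)
    small, large = list(values), []
    for _ in range(iterations):
        small = [v for v in values if abs(v - lo) <= abs(v - hi)]
        large = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(small) / len(small) if small else lo
        new_hi = sum(large) / len(large) if large else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return small, large


# Toy trace: 5 iterations, each a burst of 3 sends 1 ms apart, iterations 1 s apart.
send_times = [i + k * 0.001 for i in range(5) for k in range(3)]
small, large = two_means_1d(inter_send_delays(send_times))
print(len(large))  # 4: one large gap between each pair of consecutive iterations
```

In a long trace the size of the large-gap cluster approaches the number of iterations, which is why its count can track the iteration rate.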

26 Patterns: Clusters Without Load
Actual: 568, reported: 569

27 Patterns: Clusters With External CPU Load
Actual: 142, reported: 153
Actual execution time ratio: 3.922
Ratio from reported iteration rates (569 / 153): 3.72, within about 5% of the actual ratio

28 Why Does the Circled Bin Correspond to the Iteration Rate?
– Each iteration consists of send, receive and compute phases
– Each item in the large inter send-packet delay cluster marks the gap that represents an inter-process interaction
– Each inter-process interaction represents progress in the super-step
(Figure: send packets over time)

29 Plotting Inter Send-Packet Delay for MG
1. No clean clusters for a more complex application
2. Delays shift towards the right under greater load

30 Computing the RIR Metric
– Previous examples were for Patterns: the static performance case, with easy clustering
– Applications like MultiGrid from the NAS benchmarks → changing iteration rate
For a given packet time series:
1. Count send pairs whose inter-packet delay exceeds c * RTT
2. The send pair must be interleaved by one receive
Based on BSP principles
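The two counting rules can be sketched in Python as follows. Assumptions: packets arrive as time-ordered (timestamp, direction) tuples, "interleaved by one receive" is read as at least one receive between the two sends, the RTT is already known, and c defaults to 1.1, the value used in the MG evaluation.

```python
def rir_count(packets, rtt, c=1.1):
    """
    packets: time-ordered list of (timestamp_seconds, direction), direction in {"send", "recv"}.
    Counts consecutive send pairs that (1) are separated by more than c * RTT and
    (2) have at least one received packet between them.
    """
    count = 0
    last_send = None
    saw_recv = False
    for ts, direction in packets:
        if direction == "recv":
            saw_recv = True
        elif direction == "send":
            if last_send is not None and saw_recv and ts - last_send > c * rtt:
                count += 1
            last_send = ts
            saw_recv = False  # the next pair needs its own interleaved receive
    return count


packets = [(0.0, "send"), (0.0005, "recv"), (0.010, "send"),
           (0.0101, "send"), (0.0102, "recv"), (0.020, "send")]
print(rir_count(packets, rtt=0.001))  # 2
```

The middle pair (0.010 to 0.0101 s) is rejected on both rules: no receive separates it and the gap is below c * RTT.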

31 RIR Time Series
For dynamic applications, RIR changes with time, so we need a time series of RIR over the trace

32 Outputting the RIR Time Series: Workflow
1. Sniff packets (using tcpdump/libpcap for the VM traffic)
2. Keep the send packets that satisfy the two conditions
3. Slide a 1 second window over these send packets (sliding interval = Δt, sampling duration = T)
4. Get a new time series i1, i2, i3, i4, i5, ..., in denoting the RIR for each 1 second sliding window; each number represents the number of iterations in a particular 1 second window instance
5. Compute derivative metrics from this time series: average RIR, CDF, power spectrum, super-phase period
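Steps 3 and 4 of the workflow amount to counting qualifying send timestamps in a sliding window. A minimal Python sketch, assuming the timestamps from step 2 are already sorted and that Δt defaults to 0.1 s (the talk does not give its value).

```python
import bisect


def rir_time_series(send_times, duration, window=1.0, step=0.1):
    """
    send_times: sorted timestamps (s) of send packets that passed the two RIR conditions.
    duration:   sampling duration T of the capture (s).
    step:       sliding interval (the slide's Δt); 0.1 s is an assumed default.
    Returns i_1, ..., i_n: the count of qualifying sends in each [start, start + window).
    """
    series = []
    k = 0
    # An integer window index avoids accumulating floating-point error in the start time.
    while k * step + window <= duration + 1e-9:
        start = k * step
        lo = bisect.bisect_left(send_times, start)
        hi = bisect.bisect_left(send_times, start + window)
        series.append(hi - lo)
        k += 1
    return series


print(rir_time_series([0.5, 1.5, 2.5, 3.5], duration=4.0, step=1.0))  # [1, 1, 1, 1]
```

Binary search keeps each window count at O(log n), so a small Δt (high sampling rate) stays cheap even for long traces.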

33 Representing Dynamic Performance
Define a spectrum of metrics for dynamic performance:
– RIR_avg: long-term stationary average of the RIR time series
– RIR-CDF: CDF of the RIR time series, indicating the spread of iteration rates
– RIR-PS: phase structure, periodicities, application fingerprinting, statistical scheduling
– RIR-PSE: summary of the periodic behavior over multiple super-steps
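The first two metrics in this spectrum reduce to simple statistics over the RIR time series. A minimal Python sketch, assuming the series is a list of per-window counts; the function names are illustrative.

```python
import bisect


def rir_average(series):
    """RIR_avg: mean of the RIR time series (meaningful once the trace covers a stationary stretch)."""
    return sum(series) / len(series)


def empirical_cdf(series):
    """RIR-CDF as (value, fraction of windows with RIR <= value) pairs."""
    ordered = sorted(series)
    n = len(ordered)
    return [(v, bisect.bisect_right(ordered, v) / n) for v in sorted(set(ordered))]


series = [2, 4, 4, 10]
print(rir_average(series))    # 5.0
print(empirical_cdf(series))  # [(2, 0.25), (4, 0.75), (10, 1.0)]
```

Two applications can share the same RIR_avg while having very different CDFs, which is what lets the CDF distinguish how their iteration rates are spread.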

34 Computing the Stationary Average: Sampling Issues
Sliding window resolution (sampling rate)
– Needs to be high enough to capture any important high-frequency behavior
Capture duration
– Long enough to capture the stationarity of the signal (repetition of all super-steps)
Assumption:
– The iteration dynamics of the application are indeed empirically stationary in the long run
– For a dynamic application that consists of repetitive phases, this means capturing enough of its performance behavior to include this repetitive element
Capturing stationarity: power spectrum based techniques help us determine the right sampling rate and duration (details in the dissertation)

35 Effectiveness of RIR_avg
For the MG application running under different load conditions (100% and 60% CPU load), the predicted execution time error rates were 13% and 7% respectively (completely black box)
– Value of c = 1.1 here (the c * RTT factor)

36 RIR Time Series Graph
(Figure: a super-step is marked)

37 Can We Predict the Slowdown of an Application if We Put It Under Load?
For the IS and MG applications, which application may be hurt more if one of the processes from each application shares its physical host with an external computational load?
– The impact: we can now determine in advance the impact of external load if we must choose one of these applications to be influenced by the load
Scheduling fact: depending on the scheduling algorithm, the effect of load on different RIR regions can be different (Govindan et al. [60])
– Reason: scheduling is handled differently for CPU-bound processes than for I/O-bound processes
– "Providing enough CPU is not enough, an equally important consideration is to provide the CPU at the right time"

38 Role of the CDF
– The RIR-CDF can be used to predict which application will be more affected by external load (for dynamic applications)
– Very useful when extra load needs to be introduced over existing applications due to demand
Scheduling using the CDF:
– From the normal CDF, we predict a slowdown CDF based on a slowdown mapping
– Slowdown CDF → what will the RIR-CDF of the application look like after load?
– Slowdown mapping → what is the slowdown for a particular RIR under a particular load?
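The talk does not give the form of the slowdown mapping. A minimal Python sketch, assuming the mapping is a function from an unloaded RIR value to a slowdown factor for one fixed load level, applied sample by sample to produce the predicted loaded RIR series; the two-level `example_mapping` is invented purely for illustration and makes no claim about which RIR regions real schedulers penalize.

```python
import bisect


def predict_loaded_series(series, slowdown_for_rir):
    """Apply a slowdown mapping sample by sample: predicted loaded RIR = RIR / slowdown(RIR)."""
    return [r / slowdown_for_rir(r) for r in series]


def cdf_at(series, threshold):
    """Fraction of RIR samples that are <= threshold (one point of the empirical CDF)."""
    ordered = sorted(series)
    return bisect.bisect_right(ordered, threshold) / len(ordered)


# Hypothetical two-level mapping for a single load level, invented for illustration.
def example_mapping(rir):
    return 1.5 if rir >= 50 else 3.0


unloaded = [30, 60, 90, 120]
loaded = predict_loaded_series(unloaded, example_mapping)  # [10.0, 40.0, 60.0, 80.0]
print(cdf_at(unloaded, 40), cdf_at(loaded, 40))  # 0.25 0.5
```

Comparing the predicted slowdown CDFs of two applications under the same mapping is one way to pick the one that would be hurt less by the extra load.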

39 Other Metrics
Power spectrum of RIR
– Gives an idea of the super-step structure of the application
– Length of a super-step
– A summary of the power spectrum and its significant frequencies serves as a fingerprint/super-step snapshot

40 A Sample Power Spectrum for 4 Processes
Consistency across processes

41 Example for MG
– Significant frequency separation
– Execution time = 19.44 seconds
– Super-phase period = 4.1 seconds (1/0.244 Hz)
– Number of super-phases ~4
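The super-phase period above is the reciprocal of the most significant frequency (1 / 0.244 Hz ≈ 4.1 s). A minimal Python sketch of that step, assuming an evenly sampled RIR series whose sample interval equals the sliding interval Δt; it uses a direct DFT periodogram to stay dependency free, which is a simplification rather than the dissertation's exact spectral method.

```python
import math


def dominant_period(series, sample_interval):
    """
    Period (s) of the strongest non-DC frequency in an evenly sampled series,
    found with a direct DFT periodogram (O(n^2); fine for short RIR traces).
    """
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]  # remove the DC component (the average RIR)
    best_k, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    freq = best_k / (n * sample_interval)  # bin index -> Hz
    return 1.0 / freq


# Synthetic RIR series: a 4 s super-phase, sampled every 0.1 s for 20 s.
series = [10 + 5 * math.sin(2 * math.pi * (t * 0.1) / 4.0) for t in range(200)]
print(dominant_period(series, 0.1))  # 4.0
```

Dividing the execution time by this period gives the approximate number of super-phases, as on the MG slide.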

42 Recap
– We can deduce the performance of a BSP application using black box means
– We can predict the performance of an application when it is subjected to load
– Entirely based on packet analysis
– New metric called RIR: Round trip Iteration Rate
– More complex metrics for dynamic applications: RIR_avg, RIR-CDF, RIR power spectrum

43 Ball In the Court Methods

44 The Problem
– Last chapter: predicting performance under load (slowdown CDF)
– This chapter: can we predict the performance of an application if an existing external load were removed? I.e., what is the slowdown of the application under the current load conditions?

45 Ball in the Court
– BSP super-step → computation, communication, synchronization
– Communication follows a certain schedule, with computation in between
– Each process acts (computes) on a message before sending out the next one
– This acting on a message is called "Ball in the Court" (BIC)
– The ball is the process's responsibility to do some local processing and then interact with other processes; the court is the local host
(Figure: BIC delay)

46 Developing a Strategy
– Focus on one process
– If just one process is loaded, the entire application is slowed down because of that one process
– All processes operate in sync, and iterations can only proceed once the loaded process performs its duties
(Figure: iterations stretched under load)

47 Why BIC Delay?
– The traffic trace captures the behavior of the process for the entire duration
– If the process slows down, the time length of the trace will increase
– There will be corresponding changes in the trace as well. What are those changes? → the seed of the idea

48 Approach to Computing the BIC Delay
Let's compute the time differential for event pairs, for *.176
(Figure annotation: some computation, 6 ms)

49 With Load, Some BIC Delays Get Larger
(Figure: loaded process vs. unloaded process)
1. Sending an ack → the process's responsibility
2. This responsibility was hugely inflated in the loaded case (64 µs to 23822 µs)
3. BIC delays for other receives are similar (a receive → BIC for other processes)

50 An Algorithm for BIC Delay?
Can we estimate the total BIC delay from the traffic trace alone? We investigated this question with different approaches.
Each packet can be of one of the following types:
1. Send Packet (SP)
2. Send Ack (SA)
3. Receive Packet (RP)
4. Receive Ack (RA)
We can pair up consecutive packets to form event pairs, e.g. SA followed by SP → SA-SP
With 4 event possibilities, we have 16 event pairs

51 BIC Events
*-S* event pairs → can be classified as BIC events
Intuitively, the process has either:
– received a packet and is responsible for sending the next one, or
– just sent a packet and is responsible for sending the next one as well

52 Computing the BIC Delay
1. Count all BIC event pairs and sum up their time differentials → total BIC delay
2. BIC delays are partitioned by recipient IP address
– E.g.: RP (175) – SP (173) → the delay goes into the bucket of 173
Some special cases and concerns for counting BIC delays are explained in the dissertation (page 140)
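The two steps above can be sketched in Python as follows. Assumptions: each trace entry is a (timestamp, packet type, peer IP) tuple with the type already labeled SP, SA, RP or RA, and the special cases the dissertation handles (page 140) are ignored.

```python
from collections import defaultdict

SEND_TYPES = {"SP", "SA"}  # send packet, send ack


def bic_delays(trace):
    """
    trace: time-ordered list of (timestamp_seconds, ptype, peer_ip), ptype in {"SP", "SA", "RP", "RA"}.
    Every consecutive pair whose second packet is a send (*-S*) is a BIC event; its time
    differential is charged to the peer that the send is addressed to.
    Returns (total_bic_delay, per_peer_bic_delay).
    """
    total = 0.0
    per_peer = defaultdict(float)
    for (t0, _, _), (t1, ptype, peer) in zip(trace, trace[1:]):
        if ptype in SEND_TYPES:
            delta = t1 - t0
            total += delta
            per_peer[peer] += delta
    return total, dict(per_peer)


trace = [(0.000, "RP", "175"), (0.006, "SP", "173"),
         (0.007, "RA", "173"), (0.010, "SA", "175")]
print(bic_delays(trace))
```

Here the RP(175)-SP(173) pair charges 6 ms to 173, the SP-RA pair is not a BIC event, and the RA-SA pair charges 3 ms to 175, for a total of 9 ms.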

53 Using BIC Delays to Measure Application Imbalance
– Get a measure of the local computing/processing time for which each process is responsible
– Comparing this local time with that of another, unloaded process → gives an idea of the imbalance for the loaded process, i.e., how much extra time the loaded process is taking
– The way we compare → produces different strategies for computing slowdown

54 Using BIC Delays to Measure Application Imbalance
Compute BIC delay → imbalance algorithm → slowdown

55 Computing the No-Load Runtime Using BIC Delays
Global BIC imbalance algorithm:
– Assumes all processes are executing the same BSP kernel
1. Compute the BIC delay for the loaded process
2. Select a partner process for comparison: what if the loaded process were put in the shoes of the partner process? The choice of partner process can affect results: select the least loaded one for the most optimistic results, or one that reflects possible migration conditions
3. Compute the "imbalance" as the difference between the BIC delays of these two processes
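A minimal Python sketch of the three steps, assuming per-process BIC delays have already been computed and that the "least loaded" partner is approximated by the process with the smallest BIC delay; the per-process values in the example are hypothetical, while 152.67 s and 124.49 s are the IS figures from the evaluation slide.

```python
def global_bic_imbalance(bic_by_process, loaded, partner=None):
    """
    bic_by_process: {process_id: total BIC delay in seconds}
    loaded:         id of the process believed to be loaded
    partner:        process to compare against; by default the one with the smallest
                    BIC delay (assumed proxy for 'least loaded', the optimistic choice)
    Returns (partner, imbalance).
    """
    if partner is None:
        candidates = {p: d for p, d in bic_by_process.items() if p != loaded}
        partner = min(candidates, key=candidates.get)
    imbalance = bic_by_process[loaded] - bic_by_process[partner]
    return partner, imbalance


def estimated_no_load_runtime(loaded_runtime, imbalance):
    """Remove the extra time attributed to the loaded process."""
    return loaded_runtime - imbalance


bic = {"176": 130.0, "175": 5.51, "173": 6.0}  # hypothetical per-process BIC delays
print(global_bic_imbalance(bic, loaded="176")[0])            # 175
print(round(estimated_no_load_runtime(152.67, 124.49), 2))   # 28.18
```

Passing an explicit `partner` models the migration-oriented choice instead of the optimistic one.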

56 Global BIC Algorithm

57 Evaluation with the IS Application
Balanced case: total runtime = 14.36 seconds
Loaded case: total runtime = 152.67 seconds
– Imbalance between 176 and 175 = 124.49 seconds
– Actual difference = 138.41 seconds
Using completely black box means, we could determine how slowly the application is running under load, without knowing about the unloaded case

58 More Sophisticated Imbalance Algorithms
Process-level BIC imbalance algorithm
– A better fit for more dynamic applications like the NAS benchmarks
– Sometimes the work done by all processes is not the same
– The algorithm takes into account inter-process interactions and BIC delays
– Multi-iteration BIC-delay bias-aware imbalance algorithm
BIC algorithm for multi-load situations
– Covered in detail in the dissertation
– Gives a range of slowdown (optimistic and pessimistic values)
Evaluated with the MG and IS benchmarks

59 Other Contributions in the Chapter
Metrics for computing imbalance amongst processes
– Give an idea of the heterogeneity experienced by different processes
– Draw attention to applications that could use a lot of help
– Three imbalance metrics: two global (standard deviation from the average, squared distance) and one process-level imbalance metric
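The two global metrics can be sketched in Python over per-process BIC delays. Assumptions: "standard deviation from average" is the population standard deviation, and "squared distance" is read as the sum of squared deviations from the average; the dissertation's exact definitions may differ, and the process-level metric is not attempted here.

```python
import math


def std_from_average(values):
    """Population standard deviation of per-process BIC delays around their mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def squared_distance_from_average(values):
    """Assumed reading of 'squared distance': sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


delays = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical per-process BIC delays (s)
print(std_from_average(delays))               # 2.0
print(squared_distance_from_average(delays))  # 32.0
```

Both are zero for a perfectly balanced application; the squared distance grows with the number of processes, while the standard deviation is normalized per process.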

60 Conclusions
– How slowly is the application running because of externally imposed load?
– The Ball in the Court concept and ways to measure it
– Various algorithms to compute the imbalance: the global imbalance algorithm, the process-level bias-aware BIC imbalance algorithm, and the multi-load BIC imbalance algorithm
– Application imbalance metrics

61 Global bottlenecks and Time Decomposition

62 Global Bottleneck
– In the BIC discussion, we could identify slow processes using their BIC delays
– This chapter extends the argument to network and I/O resources
– It looks at the problem of global time decomposition: where does the application spend its time, at the network resource and process level?
– Time decomposition can point out potential sources of imbalance
Some of the metrics output are:
– Cumulative message latency
– Average latencies observed
– Cumulative message transfer time
– Average bandwidth observed
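A minimal Python sketch of how the four listed metrics could be aggregated per network link. The message record is hypothetical: it assumes the send time, first-byte arrival and last-byte arrival of each message are observable, defines latency as first-byte arrival minus send time, and defines transfer time as the remaining time to deliver the payload; the dissertation's definitions may differ.

```python
from collections import defaultdict


def decompose(messages):
    """
    messages: iterable of dicts with hypothetical fields
      link  : (src, dst) pair
      sent  : time the first byte left the sender (s)
      first : time the first byte reached the receiver (s)
      done  : time the last byte reached the receiver (s)
      size  : message size in bytes
    Returns per-link cumulative/average latency, cumulative transfer time, average bandwidth.
    """
    acc = defaultdict(lambda: {"n": 0, "latency": 0.0, "transfer": 0.0, "bytes": 0})
    for m in messages:
        a = acc[m["link"]]
        a["n"] += 1
        a["latency"] += m["first"] - m["sent"]   # assumed latency definition
        a["transfer"] += m["done"] - m["first"]  # assumed transfer-time definition
        a["bytes"] += m["size"]
    report = {}
    for link, a in acc.items():
        report[link] = {
            "cumulative_latency": a["latency"],
            "average_latency": a["latency"] / a["n"],
            "cumulative_transfer_time": a["transfer"],
            "average_bandwidth": a["bytes"] / a["transfer"] if a["transfer"] > 0 else float("inf"),
        }
    return report
```

Sorting links by cumulative latency or transfer time then points at the non-host resources worth focusing on.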

63 Flowchart for Diagnosis
Questions/problems and the tools that address them:
– Measure current performance (Chapter 3): Is the application running at the expected speed compared to previously known instances? In the multiple-application case, which application will be slowed down more by load? Tools: traffic-based iteration rate metrics, RIR time series, dynamic performance metrics
– BIC analysis for local imbalance (Chapter 4): indicates whether the application is highly imbalanced, the amount of slowdown, and which host to focus on. Tools: global BIC imbalance algorithm, process-level bias-aware BIC imbalance algorithm, multi-load BIC imbalance algorithm, imbalance metrics
– Global time decomposition (Chapter 5): shows the time allocation for network components and which non-host resources to focus on. Tools: global time decomposition techniques for various network metrics, Unix tools to find the root cause
– Eliminate a bottleneck using adaptation mechanisms (Appendices A, B): migration, network overlay links, routing rules, network reservation

64 Different Contributions of the Dissertation (grouped into Inference and Adaptation)
– Virtual Topology and Traffic Inference Framework (Chapter 2)
– Black box metrics for absolute performance (Chapter 3)
– Ball in the Court principles to compute application slowdown (Chapter 4)
– Global bottlenecks using time decomposition (Chapter 5)
– Increasing Application Performance in Virtual Environments through Run-time Inference and Adaptation (Appendix A)
– Free Network Measurement for Adaptive Virtualized Distributed Computing (Appendix B)
* An API is also defined in the dissertation

65 Adaptation
– Monitoring and inference: 1. VTTIF, 2. network monitoring
– Adaptation mechanisms: 1. overlay topology, 2. forwarding rules, 3. VM migration
– Applications: 1. BSP, 2. transactional e-commerce
– Application performance measure: application throughput
– Adaptation algorithm and optimization metric: application throughput; 1. single metric, 2. combined metric

66 Effect on BSP Application Throughput
– For a high compute/communicate ratio, migration + topology dramatically improves performance
Adapting to external load imbalance:
– External load removed, but I/O still dominates, so topology helps
– External load removed, so the application can drive higher I/O

67 Free Network Measurement for Adaptive Virtualized Distributed Computing

68 Closest Related Work
Wood et al. [178]: Black-box and gray-box strategies for virtual machine migration, NSDI 2007
– Goal: to improve the performance of stand-alone applications running inside VMs
– Black box and gray box strategies; black box → externally visible parameters like CPU load, disk swapping activity and network usage
– Focuses on stand-alone applications
– I focus on distributed parallel applications → collective properties

69 Closest Related Work
Aguilera et al. [14]: Performance debugging for distributed systems of black boxes, SOSP 2003
– Goal: to find components that contribute to high latency along critical message paths in a distributed application
– Somewhat analogous to the BIC chapter
– Assumes a constant delay per component
– Unclear how their techniques translate to parallel applications (cyclic computation)
– No estimate of actual slowdown
Reynolds et al. [133]: WAP5: black-box performance debugging for wide-area systems, WWW 2006
– Focuses on wide-area systems, with more emphasis on network delay aspects
– The main work focuses on identifying causal relationships between messages
– Assumes one-to-one message causality → not so in parallel applications
– Acknowledges that it will not work for barrier-type communications

70 Related Publications
– A. Gupta, P. A. Dinda. Inferring the Topology and Traffic Load of Parallel Programs Running in a Virtual Machine Environment. In Proceedings of the 10th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2004), June 2004.
– A. Sundararaj, A. Gupta, P. Dinda. Dynamic Topology Adaptation of Virtual Networks of Virtual Machines. In Proceedings of the Seventh Workshop on Languages, Compilers and Run-time Support for Scalable Systems (LCR 2004).
– A. Sundararaj, A. Gupta, P. Dinda. Increasing Distributed Application Performance in Virtual Environments through Run-time Inference and Adaptation. In Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing (HPDC 2005).
– A. Gupta, A. Sundararaj, M. Zangrilli, P. Dinda, B. B. Lowekamp. Free Network Measurement for Adaptive Virtualized Distributed Computing. In Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium, 2006.

71 Other Work
Sketch-based reverse hashing work:
– R. Schweller et al. Reversible Sketches: Enabling Monitoring and Analysis over High-speed Data Streams. IEEE/ACM Transactions on Networking, 2006.
– R. Schweller et al. Monitoring Flow-level High-speed Data Streams with Reversible Sketches. In Proceedings of IEEE INFOCOM 2006.
– R. Schweller, A. Gupta, E. Parsons, Y. Chen. Reverse Hashing Algorithms for Sketch-based Change Detection on High-speed Networks. In Proceedings of the ACM SIGCOMM Internet Measurement Conference, October 2004, Taormina, Sicily.
User feedback work:
– P. Dinda, G. Memik, R. Dick, B. Lin, A. Mallik, A. Gupta, S. Rossoff. The User in Experimental Computer Systems Research. In Proceedings of the Workshop on Experimental Computer Science (ExpCS 2007).
– A. Gupta, B. Lin, P. Dinda. Measuring and Understanding User Comfort with Resource Borrowing. In Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing (HPDC 2004), Honolulu, Hawaii.
– B. Lin, A. Gupta, P. Dinda. Measuring, Understanding, and Exploiting Direct User Input in Resource Scheduling. Journal paper in submission.
Accepted posters:
– A. Gupta, P. Dinda, F. Bustamante. Distributed Popularity Indices. Poster presentation at ACM SIGCOMM 2005, Philadelphia.
– A. Gupta, M. Sanghi, P. Dinda, F. Bustamante. Magnolia: A Novel DHT Architecture for Keyword-based Search. Poster presentation at Networked Systems Design and Implementation (NSDI 2005), Boston.
– A. Gupta et al. Free Network Measurement for Adaptive Virtualized Distributed Computing. Poster presentation at Supercomputing 2005, Seattle.

72 Components of My Dissertation (grouped into Inference and Adaptation)
1. Virtual Topology and Traffic Inference Framework
2. Black box metrics for absolute performance
3. Ball in the Court principles to compute application slowdown
4. Global bottlenecks using time decomposition
5. Increasing Application Performance in Virtual Environments through Run-time Inference and Adaptation
6. Free Network Measurement for Adaptive Virtualized Distributed Computing

73 Thank you!

74 Backup slides

75 Infrastructure
The Virtuoso cluster is an IBM e1350 with 32 compute nodes, each with dual 2.2 GHz Intel HT Xeon processors (except the management node, which is faster), 1.5 GB RAM, and 40 GB of disk. The machines also have 1 Gbit interfaces. In some evaluation scenarios in later chapters (Chapters 3, 4, 5), I use 4 of these machines, which form a mini-cluster of their own; these machines are connected via a 100 Mbit switch.
Virtual machine monitors used: I use both VMware and the popular open source Xen VMM in my evaluations.
– VMware GSX Server 2.5 (VTTIF evaluation)
– Xen version 3.0.3-rc3-1.2798.f with Linux kernel release 2.6.18-1.2798.fc6xen (other chapters)

76 Increasing Application Performance In Virtual Environments Through Run-time Inference and Adaptation: Dynamically adapt unmodified applications on unmodified operating systems in virtual environments to the available resources. The adaptation mechanisms are application independent and controlled automatically, without user or developer help. Demonstrate the feasibility of adaptation at the level of a collection of VMs connected by virtual networks, and show that its benefits can be significant for two classes of applications. Published in LCR 2004 and HPDC 2005.

77 Optimization Problem (2/2): Topology + Migration. Informally stated: Input: the network traffic load matrix of the application and the topology of the network. Output: a mapping of VMs to hosts, an overlay topology connecting the hosts, and forwarding rules on that topology, such that the application throughput is maximized. The algorithm is described in detail in the paper.
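The slide leaves the algorithm to the paper. As a rough illustration only, the sketch below shows one simple greedy heuristic for the VM-to-host part of the problem: rank VM pairs by the traffic they exchange and place the heaviest pairs on the best-connected host pairs. The name `greedy_map`, the one-VM-per-host restriction, and the host-to-host bandwidth matrix input are assumptions for this example; it ignores overlay topology and forwarding rules and is not the algorithm from the paper.

```python
from itertools import combinations


def greedy_map(traffic, bandwidth):
    """Hypothetical greedy VM-to-host mapping (one VM per host).

    traffic[i][j]   : traffic from VM i to VM j (any consistent unit)
    bandwidth[a][b] : available bandwidth between hosts a and b
    Returns a dict {vm_index: host_index}.
    """
    n_vms, n_hosts = len(traffic), len(bandwidth)
    if n_vms > n_hosts:
        raise ValueError("this sketch assumes at most one VM per host")
    # Rank VM pairs by the traffic they exchange in both directions.
    vm_pairs = sorted(combinations(range(n_vms), 2),
                      key=lambda p: traffic[p[0]][p[1]] + traffic[p[1]][p[0]],
                      reverse=True)
    mapping, free = {}, set(range(n_hosts))
    for a, b in vm_pairs:
        if a in mapping and b in mapping:
            continue
        if a not in mapping and b not in mapping:
            # Neither VM placed yet: use the best-connected free host pair.
            h1, h2 = max(combinations(sorted(free), 2),
                         key=lambda p: bandwidth[p[0]][p[1]])
            mapping[a], mapping[b] = h1, h2
            free -= {h1, h2}
        else:
            # One VM placed: put its partner on the best free host next to it.
            placed, other = (a, b) if a in mapping else (b, a)
            h = max(sorted(free), key=lambda x: bandwidth[mapping[placed]][x])
            mapping[other] = h
            free.remove(h)
    # A lone VM (no pairs to rank) still needs a host.
    for vm in range(n_vms):
        if vm not in mapping:
            mapping[vm] = min(free)
            free.remove(mapping[vm])
    return mapping
```

For example, with traffic `[[0, 100, 0], [100, 0, 1], [0, 1, 0]]` and bandwidth `[[0, 10, 100], [10, 0, 10], [100, 10, 0]]`, the heavily communicating VMs 0 and 1 land on hosts 0 and 2, the only host pair with 100 units of bandwidth.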

78 Evaluation. Applications: Patterns, a synthetic BSP benchmark; TPC-W, a transactional web e-commerce benchmark. Benefits of adaptation (performance speedup): adapting to the compute/communicate ratio; adapting to external load imbalance.

79 Effect on BSP Application Throughput: Adapting to Compute/Communicate Ratio. For a high compute/communicate ratio, migration + topology dramatically improves performance. Less time is spent in I/O, so migration alone is enough. Since I/O dominates, a drop in latency improves performance. Even a small amount of I/O takes up significant time.

80 TPC-W: Throughput (WIPS) With Image Server Facing External Load
                No Topology   Topology
No Migration    1.216         1.76
Migration       1.4           2.52
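Reading the TPC-W table as No Migration: 1.216 / 1.76 WIPS and Migration: 1.4 / 2.52 WIPS (without / with topology adaptation), which is an assumption about the original slide layout, the speedups over the unadapted configuration can be computed directly:

```python
# Assumed reading of the TPC-W table: throughput in WIPS, keyed by
# (VM migration enabled, overlay topology adaptation enabled).
WIPS = {
    (False, False): 1.216,
    (False, True): 1.76,
    (True, False): 1.4,
    (True, True): 2.52,
}


def speedups(table):
    """Throughput of each configuration relative to no adaptation at all."""
    base = table[(False, False)]
    return {config: wips / base for config, wips in table.items()}


for (migration, topology), s in speedups(WIPS).items():
    print(f"migration={migration!s:<5} topology={topology!s:<5} speedup={s:.2f}x")
```

Under that reading, topology adaptation alone gives about 1.45x, migration alone about 1.15x, and both together about 2.07x.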

81 Free Network Measurement for Adaptive Virtualized Distributed Computing. Adaptation, a four-step process: 1. Automatically infer application demands (network/CPU). 2. Monitor resource availability (bandwidth/latency/CPU). 3. Adapt the distributed application for better performance/cost effectiveness. 4. Reserve resources when possible. WREN paper, published in IPDPS 2006.

82 What is WREN? Developed by my colleague Marcia Zangrilli from the College of William & Mary. What does it do? 1. Observes incoming/outgoing packets. 2. Performs online analysis to derive latency/bandwidth information for all host pair connections. 3. Answers network queries for any pair of hosts.
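WREN's packet analysis itself is not described on the slide. The sketch below only illustrates the query side: a hypothetical `PairMeasurements` table that keeps an exponentially weighted moving average (EWMA) of passively derived latency and bandwidth samples per host pair and answers queries for any pair. The class, its parameters, the symmetric-path assumption, and the EWMA smoothing are illustrative choices, not WREN's API.

```python
class PairMeasurements:
    """Hypothetical per-host-pair table of passively measured path properties."""

    def __init__(self, alpha=0.25):
        self.alpha = alpha        # weight of the newest sample in the EWMA
        self.latency = {}         # (host, host) -> smoothed latency in ms
        self.bandwidth = {}       # (host, host) -> smoothed bandwidth in Mbit/s

    @staticmethod
    def _key(a, b):
        return tuple(sorted((a, b)))   # assume symmetric paths

    def _update(self, table, key, sample):
        old = table.get(key)
        table[key] = sample if old is None else (1 - self.alpha) * old + self.alpha * sample

    def observe(self, src, dst, latency_ms=None, bandwidth_mbps=None):
        """Feed one sample derived from packets observed between src and dst."""
        key = self._key(src, dst)
        if latency_ms is not None:
            self._update(self.latency, key, latency_ms)
        if bandwidth_mbps is not None:
            self._update(self.bandwidth, key, bandwidth_mbps)

    def query(self, a, b):
        """Return (latency_ms, bandwidth_mbps); None where nothing was observed."""
        key = self._key(a, b)
        return self.latency.get(key), self.bandwidth.get(key)
```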

83 An important Contribution: Problem Formalization

84 Two approaches to adaptation

85 VTTIF in a dynamic topology environment. Parameters: smoothing window (the sliding window duration over which updates are aggregated); update rate; detection threshold.
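A minimal sketch of how the three parameters could interact, assuming each host reports per-pair byte counts at the update rate: the smoothing window bounds which updates are summed, and the detection threshold is a relative change measured against the last reported matrix. The class name, the relative-threshold rule, and the bytes-per-second units are assumptions for illustration, not VTTIF's actual implementation.

```python
from collections import deque


class SmoothedTrafficMatrix:
    """Illustrative sliding-window smoothing with threshold change detection."""

    def __init__(self, window_s, threshold):
        self.window_s = window_s      # smoothing window, in seconds (> 0)
        self.threshold = threshold    # relative change that counts as a new topology
        self.updates = deque()        # (timestamp, {(src, dst): bytes})
        self.reported = {}            # matrix at the last reported change

    def add_update(self, t, counts):
        """Add one update (arriving at the update rate); return True on a change."""
        self.updates.append((t, counts))
        while self.updates[0][0] <= t - self.window_s:
            self.updates.popleft()    # drop updates older than the window
        return self._changed()

    def smoothed(self):
        """Average rate per pair over the window, in bytes per second."""
        totals = {}
        for _, counts in self.updates:
            for pair, nbytes in counts.items():
                totals[pair] = totals.get(pair, 0) + nbytes
        return {pair: total / self.window_s for pair, total in totals.items()}

    def _changed(self):
        current = self.smoothed()
        for pair in set(current) | set(self.reported):
            old, new = self.reported.get(pair, 0.0), current.get(pair, 0.0)
            # A pair appearing from zero also counts as a change.
            if abs(new - old) > self.threshold * old:
                self.reported = current
                return True
        return False
```

Comparing against the last reported matrix rather than the previous update keeps small fluctuations from triggering repeated topology changes, which is one way to avoid oscillations.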

86 Questions: Are these techniques applicable to other applications? Why do you think they are generic enough, even for BSP applications? How did you test their generality? (Answers: the reasoning for their derivation was based on the BSP model, not on any particular application; they were tested on Patterns and then tried on other, more complex applications; VTTIF was used by other components, such as adaptation.) What modifications would you suggest to the network hardware, the VMM, or at the router level? How could you address time synchronization issues? What is the closest work to yours, and how is it different from what's out there? (Black box inference, NSDI 2007; black box debugging from HP Labs.)

87 How about applying the techniques beyond BSP applications (distributed apps, P2P, web apps)? What are the limitations of these techniques? When will they not work? What are the assumptions? What is the main research contribution of your work?

88 Bin

89 Evidence of Adaptation Driven by Inference

90 Evidence of Adaptation: How can automated inference assist automated adaptation without user intervention and application knowledge? Significant work already done in this area by me: using multiple mechanisms for adaptation; combining resource availability with application demands to match needs; non-scientific apps also investigated (multi-tier web sites). More in the following slides.


92 Effect on BSP Application Throughput: Adapting to External Load Imbalance. For a high compute/communicate ratio, migration + topology dramatically improves performance. External load removed, but I/O still dominates, so topology helps. External load removed, so the application can drive higher I/O.

93 Evaluation Scenario 2: a large 256-host topology, with 32 potential hosts and 8 virtual machines. Results for a multi-constraint cost function (bandwidth and latency): annealing is easy to adapt and finds good mappings compared to the heuristic.
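To make the annealing comparison concrete, here is a small simulated-annealing sketch for the VM-to-host mapping with a two-part cost: traffic-weighted latency plus traffic-weighted inverse bandwidth. The cost weights, the move set (move a VM to an unused host, or swap two VMs), the geometric cooling schedule, and all names are assumptions chosen for illustration; they are not the cost function or parameters used in the evaluation.

```python
import math
import random


def mapping_cost(mapping, traffic, latency, bandwidth, w_lat=1.0, w_bw=1.0):
    """Hypothetical multi-constraint cost of placing VM i on host mapping[i]."""
    cost = 0.0
    n = len(mapping)
    for i in range(n):
        for j in range(i + 1, n):
            t = traffic[i][j] + traffic[j][i]
            hi, hj = mapping[i], mapping[j]
            if t and hi != hj:
                cost += t * (w_lat * latency[hi][hj] + w_bw / bandwidth[hi][hj])
    return cost


def anneal(traffic, latency, bandwidth, steps=5000, t0=10.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    n_vms, n_hosts = len(traffic), len(latency)
    current = rng.sample(range(n_hosts), n_vms)        # one VM per host
    cur_cost = mapping_cost(current, traffic, latency, bandwidth)
    best, best_cost, temp = list(current), cur_cost, t0
    for _ in range(steps):
        cand = list(current)
        vm = rng.randrange(n_vms)
        free = [h for h in range(n_hosts) if h not in cand]
        if free and rng.random() < 0.5:
            cand[vm] = rng.choice(free)                 # move a VM to an unused host
        else:
            other = rng.randrange(n_vms)                # or swap two VMs' hosts
            cand[vm], cand[other] = cand[other], cand[vm]
        c = mapping_cost(cand, traffic, latency, bandwidth)
        # Accept improvements; accept worse mappings with Boltzmann probability.
        if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / temp):
            current, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = list(cand), c
        temp *= cooling
    return best, best_cost
```

Because the cost is an ordinary function, adding another constraint means adding a term to `mapping_cost`, which illustrates why annealing is easy to adapt compared to a hand-written heuristic.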

94 The New Adaptation Process



97 Summary of contributions: Virtual Topology and Traffic Inference Framework. 1. Infer the traffic matrix for a BSP application. 2. Infer the spatial communication topology of the application. 3. Online implementation with low performance overhead. 4. Handling dynamic situations and avoiding oscillations. 5. Integration with the adaptation system.

98 Summary of contributions: Black box metrics for Absolute Performance. 1. How to evaluate BSP application performance under black box conditions? 2. A new set of metrics based on traffic. 3. RIR (Round-trip Iteration Rate), correlated to the iteration rate. 4. Complex metrics for dynamic applications: RIR avg, RIR-CDF, RIR-PS. 5. Scheduling using these advanced metrics (slowdown CDF). 6. Other applications beyond performance: fingerprinting and statistical scheduling.

99 Summary of contributions: Ball in The Court principles to compute application slowdown. 1. Quantitative estimate of the application slowdown under external load. 2. Amount of speedup achievable. 3. Can greatly help in adaptation/scheduling decisions, since it predicts the benefit of migration in advance. 4. Concept of Ball In the Court (BIC) delays. 5. Using BIC delays to compute imbalance in the application. 6. Algorithms to compute slowdown: the Global Imbalance Algorithm, the Process Level BIC imbalance algorithm, and the Multi-load BIC imbalance algorithm. 7. Metrics to indicate the amount of imbalance in the application as a whole.

100 Summary of contributions: Global Bottlenecks using time decomposition. 1. How can we find the global bottleneck for the application? 2. Time decomposition of execution time, which helps identify regions of great imbalance. 3. Includes the network aspects. 4. Many new metrics for latency and bandwidth time components. 5. Evaluation with various load scenarios.

101 Summary of contributions: Virtual Topology and Traffic Inference Framework; Black box metrics for Absolute Performance; Ball in The Court principles to compute application slowdown; Global Bottlenecks using time decomposition; Increasing Application Performance In Virtual Environments through Run-time Inference and Adaptation; Free Network Measurement For Adaptive Virtualized Distributed Computing.

102 Summary of contributions: Virtual Topology and Traffic Inference Framework; Black box metrics for Absolute Performance; Ball in The Court principles to compute application slowdown; Global Bottlenecks using time decomposition; Increasing Application Performance In Virtual Environments through Run-time Inference and Adaptation; Free Network Measurement For Adaptive Virtualized Distributed Computing.

103 Overview of the Talk: Introduction and Background; Overview of contributions; Dynamic Topology Inference; Measures of Absolute Performance; Ball in the Court Delays and predicting unloaded performance; Global Bottleneck Detection.

104 Overview of the Talk: Runtime Adaptation using black box inference; Contributions Recap; Conclusion.

105 Virtual Machine Distributed Computing. Advantages: helps overcome heterogeneity of resources, middleware, and operating systems; easier to administer (virtual administration); autonomic resource management.

106 Impact: Furthers the goal of autonomic computing. The black box approach impacts a huge set of applications. Helps lower the entry barriers for distributed and parallel autonomic computing. Shared resources: no new tweaks needed to existing applications.

107 Impact: Adaptation is just one application of inference. Others include application/resource management and accounting; dynamic problem detection and debugging; bottleneck detection; and intrusion detection (abnormal behavior of part of the application, poor correlation with others).

108 Application Model: BSP parallel applications. Two benchmarks being considered: Patterns and the NAS benchmarks.

109 Citations: [167] VALIANT, L. A bridging model for parallel computation. Communications of the ACM 33, 8 (Aug. 1990), 103-111.

110 Two threads to my dissertation. 1. Application inference: Inference has to be fully automated, with zero intervention and no application or OS changes. Research question: whether, and to what extent, we can infer the application's demands and behavior using only passive observations, i.e., a black box model. E.g., computational load, communication behavior, application topology, performance behavior, prediction, and current bottlenecks. System inference is not part of my thesis, but I've contributed there too.

111 Two threads to my dissertation. 2. Applying inference: autonomic adaptation. Virtuoso implements dynamic adaptation according to changing application demands, and inference provides an updated view. Different adaptation mechanisms are available, such as VM migration and overlay networking. Adapt the application if performance degrades or an opportunity exists to boost performance. So far, there is limited evidence of how the inference-adaptation combination can boost performance.

112 Other inference properties: traffic topology for BSP-style distributed applications (work done in my initial VTTIF paper); measuring the performance of a BSP application; computing the effect of load on a BSP application using passive measurements; finding global bottlenecks in a BSP application.

113 Impact: It was the first paper to propose and demonstrate collective inference for a virtual distributed environment. It was also my first Virtuoso project.

114 A VNET virtual layer: a virtual LAN over the wide area, with the VNET layer running above the physical layer. VNET abstraction: a set of VMs on the same Layer 2 network (a virtual Ethernet LAN).

115 Overall Design: VNET abstraction: a set of VMs on the same Layer 2 network. Extend VNET to include the required features, with monitoring at the Ethernet packet level. The challenge here: there is no manual control. How do we detect interesting parallel program communication?

116 Dynamic Topology Inference by VTTIF (VNET daemons on the hosts, VNET daemon at the proxy): 1. Fast updates. 2. Low pass filter aggregation into a smoothed, aggregated traffic matrix. 3. Threshold change detection, producing the topology change output.
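The last step turns the aggregated traffic matrix into a topology. One simple way to do that, sketched here as an assumption rather than VTTIF's exact rule, is to keep only the host pairs whose traffic rate is at least a fixed fraction of the heaviest pair and treat the survivors as undirected edges. `infer_topology` and `prune_fraction` are hypothetical names.

```python
def infer_topology(matrix, prune_fraction=0.1):
    """Hypothetical pruning rule: matrix maps (src, dst) -> traffic rate.

    Keeps pairs carrying at least prune_fraction of the heaviest pair's rate
    and returns them as a set of undirected edges.
    """
    peak = max(matrix.values(), default=0)
    if peak == 0:
        return set()
    edges = set()
    for (src, dst), rate in matrix.items():
        if src != dst and rate >= prune_fraction * peak:
            edges.add(frozenset((src, dst)))   # a->b and b->a are one edge
    return edges
```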

117 Contributions: Understanding the correlation between network traffic and inter-process interactions. Defining a black box measure of application performance (RIR) and techniques to derive it. Advanced metrics for dynamic applications and their performance. Frequency-domain-based metrics for greater insight into the application. Applications to advanced scheduling and application fingerprinting.

118 Solving the issues: sampling rate. Compute at a "high enough" sample rate to capture enough high-frequency dynamics. Compute the power spectrum (the squared magnitude of the Fourier transform) and integrate it to calculate the total energy in the signal. Sample at a slightly lower rate and see the reduction in energy. Repeat until we start losing energy beyond a certain threshold (5%); this indicates that at this rate we start losing a significant part of the signal, and gives us the 95% cutoff sampling rate. If the actual sampling rate is much higher (5 to 10 times) than this cutoff point, we are doing all right.
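The resampling loop above can be approximated with a spectral shortcut: if lowering the sampling rate to fs' discards the energy above fs'/2 (ideal low-pass resampling, which is an assumption), then the 95% cutoff rate is twice the frequency below which 95% of the non-DC energy lies. The sketch computes that with NumPy's FFT and checks the 5x margin; the function names are hypothetical.

```python
import numpy as np


def energy_cutoff_frequency(signal, fs, fraction=0.95):
    """Lowest frequency (Hz) below which `fraction` of the non-DC energy lies."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                 # drop the DC component
    power = np.abs(np.fft.rfft(x)) ** 2              # one-sided power spectrum
    total = power.sum()
    if total == 0:
        return 0.0                                   # constant signal: no dynamics
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    cumulative = np.cumsum(power) / total            # integrated spectrum, 0..1
    idx = min(int(np.searchsorted(cumulative, fraction)), len(freqs) - 1)
    return float(freqs[idx])


def sampling_rate_ok(signal, fs, margin=5.0, fraction=0.95):
    """Check fs against the rate needed to keep `fraction` of the energy.

    The needed rate is twice the cutoff frequency (Nyquist), i.e. the rate at
    which lowering fs would start to discard more than 1 - fraction of the energy.
    """
    needed = 2.0 * energy_cutoff_frequency(signal, fs, fraction)
    return fs >= margin * needed, needed
```

For a 2 Hz sinusoid sampled at 100 Hz, the cutoff frequency is 2 Hz, so the needed rate is 4 Hz and 100 Hz passes the 5x check comfortably.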

119 Solving the issues

120 Capturing the right time duration: again a power-spectrum-based approach, which helps us determine whether our sample size captures most of the energy in the signal. Ideally we capture as much as resources allow. The process: after capturing, test the amount of energy lost if we shorten the sample size; find the cut-off point as before; compare the cut-off sample size with our measured sample size. Once the sampling rate and sampling duration are satisfactory, RIR avg is computed by averaging the time series.

121 Effect of load on the RIR time series: 1. Expansion effect. 2. The basic structure of the time series stays the same. 3. Each phase is more irregular (more spikes). 4. The series captures the time dynamics of the application.

122 The RIR-CDF metric: more exhaustive than the RIR avg; a global metric. Uses: statistical scheduling; application fingerprinting; performance comparison of runs at a more detailed level than the RIR avg.

123 The RIR-CDF metric: The CDF indicates the RIR region in which the application spends most of its time. The RIR-CDF can be used, via a slowdown mapping, to predict which application will be more affected by external load (for dynamic applications). I show how we can predict the slowdown for MG vs. IS.
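An empirical RIR-CDF is just the sorted RIR samples paired with cumulative probabilities. The sketch assumes the RIR time series is uniformly sampled, so the fraction of samples at or below a value equals the fraction of time spent there; the function names are hypothetical.

```python
import numpy as np


def rir_cdf(rir_series):
    """Empirical CDF of a RIR time series: sorted values and P(RIR <= value)."""
    values = np.sort(np.asarray(rir_series, dtype=float))
    probs = np.arange(1, len(values) + 1) / len(values)
    return values, probs


def fraction_of_time_at_or_below(rir_series, threshold):
    """Share of (uniformly spaced) samples whose RIR is <= threshold."""
    values, probs = rir_cdf(rir_series)
    idx = np.searchsorted(values, threshold, side="right")
    return 0.0 if idx == 0 else float(probs[idx - 1])
```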

124 Taking guidance from the RIR-CDF: a proof of concept. The CDF indicates the RIR region in which the application spends most of its time. Scheduling fact: depending on the scheduling algorithm, the effect of load on different RIR regions is different (Govindan et al.). Reason: scheduling is handled differently for CPU-bound processes vs. I/O-bound processes; "Providing enough CPU is not enough, an equally important consideration is to provide the CPU at the right time." Higher RIR means more inter-process interaction, which can use more efficient communication; lower RIR means more compute-intensive work or large messages. How do applications with different RIRs suffer from external load?

125 Effect of load on different RIR areas: a performance difference of over 12.46 times between the extremes under full load. Applications with higher iteration rates always seem to be more drastically affected by external load than those with lower iteration rates.

126 Scheduling using the CDF: The slowdown mapping can be complicated and built over a long history. We use results from the Patterns benchmark to provide a simpler mapping. Input: RIR value. Output: fractional slowdown. Assumed load: 100%. More realistic mappings may take bandwidth and CPU utilization as input.

127 Scheduling using the CDF: For the IS and MG applications we have seen so far, which application may be hurt more if one of the processes from each application shares the physical host with an external computational load? Approach: map each RIR value to its slowdown using the mapping, yielding a slowdown CDF. The impact: we can now determine in advance the impact of external load if we must choose one of these applications to be influenced by the load.
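The mapping step can be sketched as interpolation over a calibration table: each RIR sample is mapped to a slowdown value, the sorted values give the slowdown CDF, and their mean gives an average for comparing applications. The calibration values below are placeholders chosen only so the code runs; they are not the Patterns-derived mapping, and linear interpolation is an assumption.

```python
import numpy as np

# Placeholder calibration points (RIR -> fractional slowdown at 100% load),
# standing in for a mapping measured with the Patterns benchmark.
MAP_RIR = np.array([1.0, 10.0, 100.0, 1000.0])
MAP_SLOWDOWN = np.array([0.50, 0.30, 0.10, 0.05])


def slowdown_series(rir_series, map_rir=MAP_RIR, map_slowdown=MAP_SLOWDOWN):
    """Map each RIR sample to a slowdown value (clamped at the table ends)."""
    return np.interp(np.asarray(rir_series, dtype=float), map_rir, map_slowdown)


def slowdown_cdf(rir_series):
    """Empirical CDF of the mapped slowdown values."""
    values = np.sort(slowdown_series(rir_series))
    return values, np.arange(1, len(values) + 1) / len(values)


def average_slowdown(rir_series):
    """Mean mapped slowdown, one number per application for comparison."""
    return float(slowdown_series(rir_series).mean())
```

Comparing `average_slowdown` for two applications' RIR series is the kind of side-by-side decision the slide describes for MG and IS.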

128 Slowdown CDF for MG (Avg = 0.103) and slowdown CDF for IS (Avg = 0.065). More time spent in low regions.

129 Actual slowdown: MG 2.4 times; IS 15.52 times. IS is affected much more by the load, as predicted. A better mapping function can yield more powerful slowdown estimates for complex applications. This shows how we can make complex decisions for dynamic applications using the RIR-CDF. Other applications: statistical scheduling ideas (examples).

130 Power Spectrum: Other applications of the power spectrum: statistical scheduling decisions, application categorization, and component separation (more discussion in the dissertation). Steps in computing the power spectrum: numerous measures are taken to get a representative power spectrum (windowing functions, zero padding, eliminating the DC bias). Evaluation with another application, IS (Integer Sort) from the NAS benchmarks, gives similar results. Implementation details and making it work online.
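The three preparation steps named on the slide can be combined as below: subtract the mean to eliminate the DC bias, apply a Hann window, and zero-pad to the next power of two before taking the FFT. The Hann window and power-of-two padding are assumptions; the dissertation may use other windowing functions or padding lengths.

```python
import numpy as np


def power_spectrum(signal, fs, pad_to=None):
    """One-sided power spectrum of a uniformly sampled series, e.g. RIR samples.

    Steps: remove the DC bias, apply a Hann window, zero-pad, then |FFT|^2.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                  # eliminate the DC bias
    x = x * np.hanning(len(x))                        # taper edges to reduce leakage
    n = pad_to if pad_to is not None else 1 << (len(x) - 1).bit_length()
    spectrum = np.fft.rfft(x, n=n)                    # rfft zero-pads x up to n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, np.abs(spectrum) ** 2
```

For example, 500 samples at 50 Hz are padded to 512 points, giving 257 frequency bins; a 5 Hz component riding on a constant offset shows up as the peak near 5 Hz rather than at 0 Hz, because the offset was removed first.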

131 Why do I want to do this? Close synchronization in BSP means a single slow process can slow down the entire application drastically: in the IS application, loading one process causes a 15-fold increase in execution time. The same external load affects different applications differently, and there is no uniform decision-making method for improving the performance of loaded applications. Therefore, it is very useful to know the impact of any decision before actually executing the steps of adaptation.

132 The Benefits: If we know how badly the application is affected by the load, appropriate steps can be taken to alleviate it, especially in multi-application scenarios: migrating the process VM, or moving the external load. Constraints: black box, with no intrusion into the application; no extra loading or removal of load (no perturbation).

133 Process Level BIC imbalance Algorithm: A better fit for more dynamic applications like the NAS benchmarks, where the work done by all processes is sometimes not the same. The algorithm takes into account inter-process interactions and BIC delays. The phenomenon of load bias is also observed: a heavily loaded process affects the BIC delay of other, unloaded processes too (shifting of work). Covered in detail in the dissertation. Gives a range of slowdown (optimistic and pessimistic values). Evaluation with MG: range of slowdown = [28.67s, 50.78s]; actual slowdown = 42.67s.

