1 LACSI 2002, slide 1 Performance Prediction for Simple CPU and Network Sharing
Shreenivasa Venkataramaiah and Jaspal Subhlok
University of Houston, LACSI Symposium 2002

2 LACSI 2002, slide 2 Distributed Applications on Networks: Resource Selection, Mapping, Adapting
[Figure: an application task graph (Data, Pre, Sim 1, Sim 2, Model, Vis, Stream) being mapped onto a network]

3 LACSI 2002, slide 3 Resource Selection Framework
[Figure: a Network Model supplies measured & forecast network conditions (current resource availability); together with an Application Model it feeds the prediction of application performance under current network conditions, which drives resource selection and scheduling. The Application Model is the subject of this paper; the Network Model work focuses on building logical network maps.]

4 LACSI 2002, slide 4 Building the "Sharing Performance Model"
Sharing Performance Model: predicts application performance under a given availability of CPU and network resources.
1. Execute the application on a controlled testbed
– monitor CPU and network during execution
2. Analyze the measurements to generate the sharing performance model
– application resource needs determine performance
– the application is treated as a black box
[Figure: testbed of two CPUs connected through a router]

5 LACSI 2002, slide 5 Resource Shared Execution Principles
Network Sharing
– sharing changes the observed bandwidth and latency
– effective application-level latency and bandwidth determine the time to transfer a message
CPU Sharing
– the scheduler attempts to give equal CPU time to all processes
– a competing process is first awarded "idle time", then competes for an overall equal share of the CPU

6 LACSI 2002, slide 6 CPU Sharing (1 competing process)
[Figure: CPU time-slice timelines comparing dedicated and CPU-shared execution, with corresponding progress marked; legend: application using CPU, CPU idle, competing process using CPU]
If an application keeps the CPU 100% busy in dedicated execution, its execution time will double when the CPU is shared with a compute-intensive process.

7 LACSI 2002, slide 7 CPU Sharing (1 competing process)
[Figure: dedicated vs. CPU-shared execution timelines for a partially idle application]
– If the CPU is mostly idle (less than 50% busy) in dedicated execution, execution time is unchanged under CPU sharing.
– If the CPU is busy 50–100% of the time in dedicated execution, execution time increases by between 0 and 100%.
– The slowdown is predictable if the usage pattern is known (see the sketch below).
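
These two rules can be captured in a few lines. A minimal sketch, assuming fair equal-share scheduling and a single compute-intensive competitor (the function and its interface are illustrative, not from the paper):

```python
def shared_execution_time(dedicated_time, cpu_busy_fraction):
    """Predict wall-clock time when the CPU is shared with one
    compute-intensive process under fair (equal-share) scheduling.

    dedicated_time    -- wall-clock time of the dedicated run (seconds)
    cpu_busy_fraction -- fraction of the dedicated run the CPU was busy
    """
    # The competitor is first awarded the idle time; the application
    # is still guaranteed (at least) half of the CPU overall.
    if cpu_busy_fraction <= 0.5:
        return dedicated_time          # idle time absorbs the competitor
    # The application needs busy_fraction * T of CPU time, but now
    # receives CPU at only a 50% rate.
    return 2.0 * cpu_busy_fraction * dedicated_time

# 100% busy -> time doubles; 50% busy -> unchanged; 75% busy -> +50%
for u in (1.0, 0.5, 0.75):
    print(u, shared_execution_time(100.0, u))
```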

8 LACSI 2002, slide 8 Methodology for Building an Application's Sharing Performance Model
1. Execute the application on a controlled testbed and measure system-level activity
– such as CPU and network usage
2. Analyze the measurements and reconstruct program-level activity
– such as message exchanges and synchronization waits
3. Develop the sharing performance model by modeling execution under different sharing scenarios
This paper is limited to predicting execution time with one shared node and/or link in a cluster.

9 LACSI 2002, slide 9 Measurement and Modeling of Communication
1. The tcpdump utility records all TCP segments exchanged by the executing nodes.
2. The sequence of application messages is inferred by analyzing the TCP stream (Singh & Subhlok, CCN 2002); a sketch follows below.
The goal is to capture the size and sequence of application messages, such as MPI messages.
– This can also be done by instrumenting or profiling the application.
– That is more precise, but the application is no longer a black box (access to the source, or the ability to link in a profiler, is needed).
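
The published analysis (Singh & Subhlok, CCN 2002) is more elaborate; the sketch below only illustrates the idea, assuming text output from `tcpdump -tt -n` and a simple inter-segment gap heuristic for coalescing TCP segments into application messages. The function names, line format, and the 5 ms threshold are assumptions, not the paper's algorithm:

```python
import re

# Assumed line format from `tcpdump -tt -n`, e.g.:
#   1034567890.123456 IP 10.0.0.1.5001 > 10.0.0.2.5002: Flags [P.], ... length 1448
LINE = re.compile(r'^(\d+\.\d+) IP (\S+) > (\S+?):.* length (\d+)')

def infer_messages(trace_lines, gap=0.005):
    """Coalesce TCP segments of each (src, dst) flow into application-level
    messages: segments arriving within `gap` seconds of the previous one
    are treated as part of the same message (a heuristic threshold)."""
    last_seen = {}   # flow -> [last timestamp, current message record]
    messages = []    # [flow, total bytes], in order of first segment
    for line in trace_lines:
        m = LINE.match(line)
        if not m or int(m.group(4)) == 0:
            continue                       # skip pure ACKs / non-data lines
        t = float(m.group(1))
        flow = (m.group(2), m.group(3))
        size = int(m.group(4))
        prev = last_seen.get(flow)
        if prev is not None and t - prev[0] < gap:
            prev[1][1] += size             # same message: accumulate bytes
            prev[0] = t
        else:
            record = [flow, size]          # a new application message begins
            messages.append(record)
            last_seen[flow] = [t, record]
    return messages
```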

10 LACSI 2002, slide 10 Measurement and Modeling of CPU Activity
1. CPU status (busy or idle) is probed at a fine grain (every 20 milliseconds) with a top-based program, which obtains CPU utilization data from the Unix kernel over a specified interval of time.
2. This yields the CPU busy/idle sequence of the application's execution at each node.
3. The CPU busy time is divided into compute and communication time, based on the time it takes to send the application's messages.
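
A rough sketch of such a probe, reading Linux's /proc/stat rather than the paper's top-based FreeBSD tool; the 50% busy cutoff per 20 ms sample is an assumption:

```python
import time

def cpu_times():
    """Read the aggregate CPU counters from the first line of /proc/stat:
    'cpu user nice system idle iowait ...' (Linux shown here)."""
    with open('/proc/stat') as f:
        values = [int(v) for v in f.readline().split()[1:]]
    return values[3], sum(values)          # (idle jiffies, total jiffies)

def busy_idle_sequence(duration, interval=0.02):
    """Sample CPU state every `interval` seconds (20 ms, as in the paper)
    and return a list of 'busy'/'idle' labels."""
    seq = []
    prev_idle, prev_total = cpu_times()
    deadline = time.time() + duration
    while time.time() < deadline:
        time.sleep(interval)
        idle, total = cpu_times()
        d_idle, d_total = idle - prev_idle, total - prev_total
        seq.append('busy' if d_total and d_idle / d_total < 0.5 else 'idle')
        prev_idle, prev_total = idle, total
    return seq
```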

11 LACSI 2002, slide 11 Prediction of Performance with a Shared CPU and Communication Link
It is assumed that we know:
– the load on the shared node
– the expected latency and bandwidth on the shared link
The execution time of every computation phase and the time to transfer every message can then be computed, yielding an estimate of the overall execution time (sketched below).
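
A minimal sketch of that computation, assuming activity is represented as an alternating list of compute phases and messages, and a latency + size/bandwidth transfer model (both are assumptions, not the paper's exact formulation):

```python
def predict_execution_time(phases, cpu_share, latency, bandwidth):
    """Estimate shared execution time for one node's recorded activity.

    phases    -- list of ('compute', cpu_seconds) or ('message', bytes)
    cpu_share -- CPU fraction the application receives (0.5 against one
                 compute-intensive competitor, 1.0 when dedicated)
    latency   -- effective application-level latency of the link (seconds)
    bandwidth -- effective application-level bandwidth (bytes/second)
    """
    total = 0.0
    for kind, amount in phases:
        if kind == 'compute':
            total += amount / cpu_share             # compute stretched by CPU sharing
        else:
            total += latency + amount / bandwidth   # per-message transfer model
    return total

# 1 s of compute plus a 1 MB message on a half-shared CPU and a
# 1 ms / 10 MB/s link: 2.0 + 0.001 + 0.1 = ~2.101 s
print(predict_execution_time([('compute', 1.0), ('message', 1_000_000)],
                             cpu_share=0.5, latency=0.001, bandwidth=10e6))
```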

12 LACSI 2002, slide 12 Validation
– Resource utilization of the Class A MPI NAS benchmarks was measured on a dedicated testbed.
– A sharing performance model was developed for each benchmark program.
– Measured performance with competing loads and limited bandwidth was compared with estimates from the sharing performance model.
(All measurements presented are on 500 MHz Pentium Duos, a 100 Mbps network, TCP/IP, and FreeBSD; dummynet was employed to control network bandwidth.)

13 LACSI 2002, slide 13 Discovered Communication Structure of NAS Benchmarks
[Figure: inferred communication graphs over the four nodes (0, 1, 2, 3) for each benchmark: BT, CG, IS, EP, LU, MG, SP]

14 LACSI 2002, slide 14 CPU Behavior of NAS Benchmarks
[Figure: measured CPU busy/idle behavior of the NAS benchmarks]

15 LACSI 2002, slide 15 Predicted and Measured Performance with Resource Sharing
[Figure: predicted vs. measured performance under resource sharing]

16 LACSI 2002, slide 16 Conclusions (2 more slides though)
A sharing performance model can be built by non-intrusive execution monitoring of an application treated as a black box, and it estimates performance under simple sharing fairly well.
Major challenges:
– Prediction is tied to the data set; the hope is that resource selection may still be good even if the estimates are off.
– Prediction with traffic on all links and computation loads on all nodes?
– Is the overall approach practical for large-scale grid computing?

17 LACSI 2002, slide 17 Sharing of Resources on Multiple Nodes and Links
The impact of sharing can be estimated on individual nodes, but the impact on overall execution is difficult to model because of the combination of:
– synchronization waits with unbalanced execution
– independent scheduling (lack of gang scheduling), e.g., one node is ready to communicate but the other is swapped out
Preliminary result: the lack of gang scheduling has a modest overhead (~5–40%) for small clusters (up to ~20 processors), not an order-of-magnitude overhead.

18 LACSI 2002, slide 18 Scalability of Sharing Performance Models
That is, is the whole idea of using network measurement tools and application information to make resource selection decisions practical? The jury is still out.
An alternate approach being studied:
– automatically build an execution skeleton: a short-running program that reflects the execution behavior of an application (a toy sketch follows below)
– the performance of the skeleton is then a measure of the full application's performance; run it to estimate performance on a given network
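
A toy sketch of the skeleton idea, replaying a recorded compute/communicate sequence at reduced scale; the phase representation matches the earlier sketches, and the no-op `send` stands in for real (e.g. MPI) messaging:

```python
import time

def busy_wait(seconds):
    """Burn CPU (rather than sleep) so the skeleton loads the
    processor the way the real computation would."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass

def run_skeleton(phases, scale=0.1, send=lambda nbytes: None):
    """Replay a recorded compute/communicate sequence at reduced scale,
    returning the skeleton's own wall-clock time on the target network."""
    start = time.perf_counter()
    for kind, amount in phases:
        if kind == 'compute':
            busy_wait(amount * scale)      # scaled-down computation phase
        else:
            send(int(amount * scale))      # scaled-down message transfer
    return time.perf_counter() - start
```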

