1 A case for resource discovery in shared distributed platforms
David Oppenheimer, UCB ROC Retreat, 12 January 2005

2 Introduction
Application performance is a function of
1. the resources available to the application
2. the resources needed by the application, or "application sensitivity to resource constraints"
At the summer retreat, we described SWORD:
- at app deployment time, find the best set of nodes given
  1. the resources available on a set of distributed nodes
  2. the application's sensitivity to resource constraints
- assumptions:
  1. available resources vary among nodes enough to matter (spare CPU, memory, disk space; inter-node latency, available bandwidth; ...)
  2. applications are sensitive to resource constraints enough to matter
Focus of this talk: verify assumption (1)

3 Introduction (cont.)
Questions we will address
- is there enough variation among nodes at any given (deployment) time to justify service placement?
- is there enough variation over time on a single node to justify periodic task migration?
- are there correlations between attributes on a single node, or among nodes at the same site?
All of these questions are important in designing a system for resource discovery and service placement (like SWORD)

4 Outline
1. How much does the available amount of per-node resources vary among nodes at a fixed time?
2. How much does the available amount of per-node resources vary over time? How much do inter-node latency and available bandwidth vary over time?
3. On a given node, are any per-node attributes strongly correlated? Are inter-node latency and available bandwidth correlated?

5 Experimental environment
Per-node attributes: Ganglia, CoMon
- two-week period (Oct 10 - Oct 24, 2004)
- each node polled every 5 minutes
- free memory, free swap, free disk, load average, network bytes sent and received per second, # of active slices
Inter-node latency: all-pairs pings
- one-month period ending Oct 24, 2004
- each pair of nodes measured every 15 minutes
Inter-node bandwidth: Iperf
- one-month period ending Oct 24, 2004
- each pair of nodes measured 1-2x per week
About 250 nodes in the trace each day

6 Outline
1. How much does the available amount of per-node resources vary among nodes at a fixed time?
2. How much does the available amount of per-node resources vary over time? How much do inter-node latency and available bandwidth vary over time?
3. On a given node, are any per-node attributes strongly correlated? Are inter-node latency and available bandwidth correlated?

7 Resource heterogeneity: averages
How much do available resources vary over the trace?
Table: mean, std. dev., 10th %ile, and 90th %ile across nodes for # of CPUs, CPU speed (MHz), total disk (GB), total memory (MB), and total swap (GB).

8 Resource heterogeneity: averages
How much do available resources vary over the trace?
Table: mean, std. dev., 10th %ile, and 90th %ile across nodes for 1-min load average, free memory (MB), free swap (MB), free disk (GB), active slices, bytes/s in, and bytes/s out.
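Summary statistics of the kind tabulated on these two slides can be reproduced with a few lines of NumPy. This is a sketch only: the `free_mem` array and its distribution are synthetic stand-ins for one attribute's per-node samples, not the study's Ganglia/CoMon data.

```python
import numpy as np

# Hypothetical per-node samples of one attribute (e.g. free memory, MB);
# in the study these would come from the Ganglia/CoMon trace (~250 nodes/day).
rng = np.random.default_rng(0)
free_mem = rng.lognormal(mean=5.0, sigma=0.8, size=250)

# The same four columns as the slides' tables, computed across nodes.
stats = {
    "mean": free_mem.mean(),
    "std. dev.": free_mem.std(ddof=1),
    "10th %ile": np.percentile(free_mem, 10),
    "90th %ile": np.percentile(free_mem, 90),
}
for name, value in stats.items():
    print(f"{name:>10}: {value:8.1f}")
```

The same computation, applied per attribute, fills in one table row at a time.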

9 Resource heterogeneity: CV vs. time
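The coefficient of variation (CV: standard deviation divided by mean, computed across nodes at each sample time) plotted on this slide can be sketched as below. The `load_avg` matrix is synthetic stand-in data, not the trace; one day of 5-minute epochs for 250 nodes is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical trace: rows = 5-minute epochs (one day), columns = 250 nodes.
load_avg = rng.gamma(shape=2.0, scale=1.5, size=(288, 250))

# CV across nodes at each epoch: one point per epoch on a "CV vs. time" curve.
cv_per_epoch = load_avg.std(axis=1, ddof=1) / load_avg.mean(axis=1)
print(cv_per_epoch.shape)
```

A high CV at a given instant means the nodes differ widely at that moment, which is exactly the across-node heterogeneity that motivates placement.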

10 Outline
1. How much does the available amount of per-node resources vary among nodes at a fixed time?
2. How much does the available amount of per-node resources vary over time? How much do inter-node latency and available bandwidth vary over time?
3. On a given node, are any per-node attributes strongly correlated? Are inter-node latency and available bandwidth correlated?

11 Variability of per-node attributes over time


14 Can rank the degree of variability of each attribute
- disk, swap < mem, load < net bytes; # of slices: moderate to significant
- the CDF curve shifts right as the interval length increases
  - attributes vary less over short time periods than over long ones
  - migration interval: find the "sweet spot" in the curve of variability vs. interval length
- the CDF slope decreases as the median variability of the attribute increases
  - may be able to classify nodes as high/low variability over time for mem, load, and net bytes (they have high median variability)
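A minimal sketch of the per-node variability analysis behind such CDFs, assuming variability is measured as the CV of an attribute within fixed-length windows (the slide does not spell out the exact metric, and the `trace` here is synthetic stand-in data):

```python
import numpy as np

def per_node_variability(trace, interval):
    """One variability score per node: the CV of that node's attribute
    within each window of `interval` epochs, averaged over windows."""
    epochs, nodes = trace.shape
    n_win = epochs // interval
    win = trace[:n_win * interval].reshape(n_win, interval, nodes)
    cv = win.std(axis=1, ddof=1) / win.mean(axis=1)
    return cv.mean(axis=0)

rng = np.random.default_rng(2)
trace = rng.gamma(2.0, 1.5, size=(4032, 250))  # two weeks of 5-min samples

# Sorting the per-node scores gives the CDF for each interval length;
# on a real trace the CDF shifts right as the interval grows.
for interval in (12, 72, 288):  # 1 hour, 6 hours, 1 day
    scores = np.sort(per_node_variability(trace, interval))
    print(interval, round(float(np.median(scores)), 3))
```

Plotting `scores` against rank for each interval reproduces the family of CDF curves the slide describes.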

15 Inter-node latency and BW variation over time
Most nodes have low latency (and bandwidth) variability even over a month-long trace
- migration may not be worthwhile

16 Outline
1. How much does the available amount of per-node resources vary among nodes at a fixed time?
2. How much does the available amount of per-node resources vary over time? How much do inter-node latency and available bandwidth vary over time?
3. On a given node, are any per-node attributes strongly correlated? Are inter-node latency and available bandwidth correlated?

17 Correlation among per-node attributes
No strong correlations between different attributes
- though some one-hour trace segments had some
Some correlation between the same attribute on nodes at the same site
Table: pairwise correlation coefficients (r) among load_one, mem_free, swap_free, disk_free, actv_slice, byte_in, and byte_out.
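Pairwise attribute correlations like those in the slide's table can be computed with `numpy.corrcoef`. Everything below is illustrative stand-in data, with one artificial dependence injected between `byte_in` and `byte_out` so the matrix shows a visible off-diagonal entry.

```python
import numpy as np

rng = np.random.default_rng(3)
attrs = ["load_one", "mem_free", "swap_free", "disk_free",
         "actv_slice", "byte_in", "byte_out"]
# Hypothetical time series (4032 five-minute samples per attribute);
# the study would use one node's Ganglia/CoMon measurements here.
data = rng.normal(size=(len(attrs), 4032))
# Inject a partial correlation between byte_in and byte_out for illustration.
data[6] = 0.5 * data[5] + 0.5 * data[6]

r = np.corrcoef(data)  # r[i, j] = Pearson correlation of attrs[i], attrs[j]
print(np.round(r, 2))
```

Repeating this per node, then comparing the same attribute across nodes at one site, covers both correlation questions from the outline.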

18 Correlation between latency and available BW
Moderate inverse power-law correlation (r = -.59)
- using latency to estimate BW gives 233% error
- some nodes are bandwidth-capped, some in weird ways
Some node pairs showed strong latency-BW correlation
- 17% of estimates within 25%, 56% within 50%
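A power-law fit of bandwidth against latency amounts to linear regression in log-log space. The sketch below generates hypothetical node-pair measurements with an inverse power-law relationship (the exponent -0.8 and the noise level are invented, not the study's values) and recovers the exponent and the per-pair relative estimation error:

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical node-pair measurements: latency (ms) and available
# bandwidth (Mb/s) following an inverse power law plus lognormal noise.
latency_ms = rng.lognormal(mean=4.0, sigma=0.7, size=1000)
bw_mbps = 500.0 * latency_ms ** -0.8 * rng.lognormal(0.0, 0.5, size=1000)

# Fit log(bw) = a + b * log(latency): a straight line in log-log space
# is exactly a power law, bw = e^a * latency^b.
b, a = np.polyfit(np.log(latency_ms), np.log(bw_mbps), deg=1)
pred = np.exp(a) * latency_ms ** b
rel_err = np.abs(pred - bw_mbps) / bw_mbps
print(f"exponent b = {b:.2f}, "
      f"median relative error = {np.median(rel_err):.0%}")
```

The fraction of pairs with `rel_err` below 0.25 or 0.50 corresponds to the "within 25% / within 50%" figures on the slide.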

19 Conclusion
1. How much does the available amount of per-node resources vary among nodes at a fixed time?
   significantly; enough to warrant service placement
2. How much does the available amount of per-node resources vary over time? How much do inter-node latency and available bandwidth vary over time?
   moderate variability; may warrant migration
3. On a given node, are any per-node attributes strongly correlated? Are inter-node latency and available bandwidth correlated?
   no strong correlation between different attributes
   some correlation between the same attribute at the same site
   latency can coarsely predict available bandwidth

20 Future work
Ask the same questions, but use an application model to answer them rather than analysis of raw data
- different apps have different resource sensitivities
- different apps have different migration costs
Can we predict attribute values?
- give warning before migration
- or just don't bother to deploy on "bad" nodes
How much "better" could we do if SWORD could schedule jobs?

