Texas Learning and Computation Center High Performance Systems Lab Automatic Clustering of Grid Nodes Nov 14, 2005 Qiang Xu, Jaspal Subhlok University.


1 Texas Learning and Computation Center High Performance Systems Lab Automatic Clustering of Grid Nodes Nov 14, 2005 Qiang Xu, Jaspal Subhlok University of Houston

2 Grid Scheduler
Computational Resource | CPU, memory
Network Link | Latency, Bandwidth
Network Topology
"I will decide which group of nodes is best for an application!"

3 Network Topology
Fine-grained physical network topology --- Hard! (heterogeneous, dynamic, and distributed nature of a grid system)
We focus on the "logical" network topology: the connectivity between nodes based on observed behavior.
1) Easier to compute
2) Sufficient to tackle the resource selection problem

4 Discover Clusters/Logical Topology
A set of nodes with IP addresses / hostnames
Connectivity?

5 Discover Clusters/Logical Topology
Cluster A, Cluster B, Cluster C
Dist(A-B), Dist(B-C), Dist(A-C)
Nodes close to each other -> same cluster

6 Outline
Introduction
Internet -> Geometric Space
Automatic Clustering
Experiments and Result
Conclusion

7 Internet Topology Map 1
A macroscopic snapshot of the Internet: 4 April 2005 - 17 April 2005.

8 Internet Topology Map 2
Internet map as of 1998 by Bill Cheswick (Bell Labs) and Hal Burch (CMU)

9 Why Geometric Space?
Internet Topology Map --- Complex! "I can't tell the distance between nodes!"
Geometric Space (N-dimensional Euclidean space)
GNP (Global Network Positioning) --- T. S. Eugene Ng and Hui Zhang, INFOCOM'02

10 Magic Landmarks!
Landmarks: a set of nodes distributed across the Internet
(Figure labels: Node, Landmark; link values 3, 12, 8)

11 Geometric Space
1. One axis per landmark
2. Coordinate of a node ≡ latency from each landmark.
X4=12, Y4=8, Z4=3
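The mapping on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `to_coordinates` and the landmark names "X", "Y", "Z" are assumptions made here, and the latencies are the values shown on the slide.

```python
from typing import Dict, List, Tuple


def to_coordinates(latency_ms: Dict[str, float],
                   landmarks: List[str]) -> Tuple[float, ...]:
    """Return a node's coordinates: one axis per landmark, in landmark order.

    latency_ms maps a landmark name to the latency measured between that
    landmark and the node.
    """
    return tuple(latency_ms[name] for name in landmarks)


# Hypothetical landmark names; latencies taken from the slide (X4, Y4, Z4).
landmarks = ["X", "Y", "Z"]
node4 = to_coordinates({"X": 12.0, "Y": 8.0, "Z": 3.0}, landmarks)
print(node4)  # (12.0, 8.0, 3.0)
```

Fixing the landmark order once keeps every node's coordinates on the same axes, which is what makes distances between nodes comparable.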

12 Internet -> Geometric Space
Complex Internet Structure -> Simple Geometric Space

13 Advantage of Geometric Space
Simple --- distance in geometric space is well defined, e.g. the Euclidean distance.
Scalable --- for M nodes:
  Pairwise distance among M nodes -> M*M probes
  Mapping to geometric space -> M*N probes
  N is the number of landmarks; a number ~7 is known to be sufficient.
Easy to manage --- only need to control the landmarks
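As a rough worked example of the scalability argument, the two probe counts can be compared directly. The function names are made up for this sketch; the values M = 36 and N = 3 are the node and landmark counts reported on the experiments slide later in the deck.

```python
def pairwise_probes(m: int) -> int:
    """Probes needed if every node measures every node (M*M, as on the slide)."""
    return m * m


def landmark_probes(m: int, n: int) -> int:
    """Probes needed if every node is measured only against N landmarks (M*N)."""
    return m * n


m, n = 36, 3  # compute nodes and landmarks from the experiments slide
print(pairwise_probes(m), landmark_probes(m, n))  # 1296 108
```

The pairwise count grows quadratically with the number of nodes, while the landmark count grows linearly as long as N stays fixed.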

14 Outline
Introduction
Internet -> Geometric Space
Automatic Clustering
Experiments and Result
Conclusion

15 Again the problem!
Cluster A, Cluster B, Cluster C
Dist(A-B), Dist(B-C), Dist(A-C)

16 Place Nodes in Geometric Space!
Simple Geometric Space
How do I cluster?

17 Distance and Threshold
Network Distance:
Threshold:
If Distance < Threshold, nodes belong to the same logical cluster
- N is the number of landmarks
- The T parameter describes how close nodes have to be to be in the same cluster; for a typical domain to be one cluster, T = 1 ms
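The distance test can be sketched as follows, assuming the network distance is the Euclidean distance named elsewhere in the deck. The transcript does not preserve how the threshold is computed from N and T, so this sketch simply takes the threshold as a number; `distance` and `same_cluster` are illustrative names, not the authors' code.

```python
import math
from typing import Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two nodes' landmark-latency coordinates."""
    if len(a) != len(b):
        raise ValueError("both nodes need one coordinate per landmark")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_cluster(a: Sequence[float], b: Sequence[float],
                 threshold: float) -> bool:
    """Apply the slide's rule: same logical cluster if Distance < Threshold.

    The threshold is an input here (assumption); its derivation from N and T
    is not reproduced.
    """
    return distance(a, b) < threshold


print(distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))           # 5.0
print(same_cluster((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), 6.0))  # True
```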

18 Build Undirected Graph
All grid nodes are graph nodes
Add an edge between nodes if Distance < Threshold
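A minimal sketch of this graph construction, assuming Euclidean distance over landmark coordinates and a threshold supplied by the caller. The name `build_graph`, the adjacency-set representation, and the sample coordinates are all assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Set


def build_graph(coords: Dict[str, Sequence[float]],
                threshold: float) -> Dict[str, Set[str]]:
    """Build an undirected graph as adjacency sets.

    Every grid node becomes a graph node; two nodes are joined by an edge
    when their Euclidean distance is below the threshold.
    """
    graph: Dict[str, Set[str]] = {node: set() for node in coords}
    for u, v in combinations(coords, 2):
        dist = math.sqrt(sum((x - y) ** 2
                             for x, y in zip(coords[u], coords[v])))
        if dist < threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Hypothetical coordinates: a and b are close together, c is far from both.
coords = {"a": (1.0, 1.0), "b": (1.2, 1.0), "c": (9.0, 9.0)}
print(build_graph(coords, threshold=1.0))
# {'a': {'b'}, 'b': {'a'}, 'c': set()}
```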

19 Typical Case
Edge exists if Distance < Threshold
Clusters are obvious and easy to distinguish!

20 Pathological Case
Border node? Where are the clusters?
General case: find maximal cliques in the graph; each clique is a cluster
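The slide does not say which clique-finding algorithm is used. One standard choice is the basic Bron-Kerbosch recursion, sketched below on a small hypothetical graph in which node x is a border node between two groups. The graph and the name `maximal_cliques` are assumptions for illustration.

```python
from typing import Dict, List, Set


def maximal_cliques(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques: List[Set[str]] = []
    if not graph:
        return cliques

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # r: current clique, p: candidates to extend it, x: already explored.
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(graph), set())
    return cliques


# Hypothetical graph: {a, b, c} and {d, e} are tight groups, x borders both.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"},
    "x": {"c", "d"}, "d": {"x", "e"}, "e": {"d"},
}
print(sorted(sorted(c) for c in maximal_cliques(graph)))
# [['a', 'b', 'c'], ['c', 'x'], ['d', 'e'], ['d', 'x']]
```

In this example the border node x appears in two small cliques of its own. How such overlaps are resolved is not covered in the transcript.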

21 Summary of Inter-domain Clustering
1. Place nodes in the geometric space.
2. Calculate the Euclidean distance.
3. Build a graph based on distance and threshold.
4. Find the maximal cliques.
Inter-domain clustering --- good!
Intra-domain clustering --- not good enough!

22 Intra-domain Clustering
Nodes in the same domain but in different subnets.
Short latency --- less than 1 ms.
Landmark-based approach --- resolution is not sufficient! Measurement error ~ real latency
We need to change the approach for intra-domain clustering!

23 Intra-domain Clustering
1. Distance between nodes is the directly measured latency instead of the projected geometric distance. (M × M probes, but M is smaller and measurements are quick.)
2. Basis for clustering is relative: the distance between any two nodes inside a cluster is within β% of the smallest distance in the cluster.

24 Intra-domain Clustering Procedure
Initially each node is a cluster
Each edge is a measured latency
REPEAT:
  Select the least-cost edge, say connecting clusters A and B
  If A and B are not the same cluster, and this edge cost is within β% of the least-cost edges inside A and B, then combine them into one cluster
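One possible reading of this procedure is sketched below. It rests on several assumptions not stated on the slide: edges are visited in ascending latency order, "within β%" means cost <= (1 + β/100) × the least-cost edge inside a cluster, and a single-node cluster (which has no inside edge yet) imposes no constraint. The function name and the sample latencies are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def intra_domain_clusters(nodes: List[str],
                          latency: Dict[Tuple[str, str], float],
                          beta: float) -> List[List[str]]:
    """Greedy merge of clusters over measured latencies (assumed reading)."""
    parent = {n: n for n in nodes}  # union-find forest
    least_inside: Dict[str, Optional[float]] = {n: None for n in nodes}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    limit = 1.0 + beta / 100.0
    # Visit edges from least to greatest measured latency.
    for (u, v), cost in sorted(latency.items(), key=lambda kv: kv[1]):
        a, b = find(u), find(v)
        if a == b:
            continue  # both endpoints are already in the same cluster
        within = all(least_inside[c] is None or cost <= limit * least_inside[c]
                     for c in (a, b))
        if within:
            known = [x for x in (least_inside[a], least_inside[b], cost)
                     if x is not None]
            parent[b] = a
            least_inside[a] = min(known)

    groups: Dict[str, List[str]] = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical latencies in ms: two tight groups joined by slower links.
nodes = ["p1", "p2", "p3", "s1", "s2"]
latency = {("p1", "p2"): 0.09, ("p1", "p3"): 0.10, ("p2", "p3"): 0.10,
           ("s1", "s2"): 0.10}
latency.update({(p, s): 0.40 for p in ("p1", "p2", "p3") for s in ("s1", "s2")})
print(intra_domain_clusters(nodes, latency, beta=50.0))
# [['p1', 'p2', 'p3'], ['s1', 's2']]
```

Under this reading, two isolated single-node clusters always merge on their first connecting edge, so a stricter rule for singletons may be needed in practice.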

25 Outline
Introduction
Internet -> Geometric Space
Automatic Clustering
Experiments and Result
Conclusion

26 Experiments
Inter-Domain Clustering
  3 Landmarks: UT (Austin), Rice, CMU
  36 Compute Nodes: Rice, UT-Dallas, TAMU-College Station, TAMU-Galveston
Intra-Domain Clustering
  4 clusters at University of Houston: PGH201, Itanium, Opteron, Stokes
TCP ping (not ICMP ping) to measure latency

27 Inter-domain Cluster (2 landmarks)
Plot legend: UT Dallas, TAMU Galveston, TAMU College Station, Rice
Cannot distinguish between UT Dallas & TAMU Galveston

28 Inter-domain Cluster (3 landmarks)
Plot legend: UT Dallas, TAMU Galveston, TAMU College Station, Rice
4 clusters are well distinguished

29 Inter-domain Cluster (2 landmarks)
Plot legend: UT Dallas, TAMU Galveston, TAMU College Station, Rice

30 Intra-domain Cluster Latency
Latency between Nodes (ms)
Clusters   PGH201   Opteron   Itanium   Stokes
PGH201     0.09     0.32                0.30
Opteron    0.25     0.09                0.50
Itanium    0.30     0.10                0.35
Stokes     0.40     0.50      0.60      0.10

31 Illustration of Intra-domain Clusters
Plot legend: UT Dallas, TAMU Galveston, TAMU College Station, Rice

32 Future Work
Integrate into a grid scheduling system
Use bandwidth as a factor for clustering
Dynamically update logical clusters
Nodes behind a NAT (network address translation) -- nodes with local IP addresses

33 Conclusions
Efficient and scalable procedure to hierarchically group distributed nodes into logical clusters
Validation with experiments on nodes distributed across Texas
An important step for scheduling in a grid environment

34 Questions?
Thank you!

