Presentation is loading. Please wait.

Presentation is loading. Please wait.

Dr. Multicast for Data Center Communication Scalability Ymir Vigfusson Hussam Abu-Libdeh Mahesh Balakrishnan Ken Birman Cornell University Yoav Tock IBM.

Similar presentations


Presentation on theme: "Dr. Multicast for Data Center Communication Scalability Ymir Vigfusson Hussam Abu-Libdeh Mahesh Balakrishnan Ken Birman Cornell University Yoav Tock IBM."— Presentation transcript:

1 Dr. Multicast for Data Center Communication Scalability Ymir Vigfusson Hussam Abu-Libdeh Mahesh Balakrishnan Ken Birman Cornell University Yoav Tock IBM Research Haifa HotNets, October 5, 2008

2 IP Multicast in Data Centers IPMC is not used in data centers

3 IP Multicast in Data Centers IPMC is not used in data centers Would speed up products that use multicast

4 IP Multicast in Data Centers Why is IP multicast rarely used?

5 IP Multicast in Data Centers Why is IP multicast rarely used? o Limited IPMC scalability on switches/routers and NICs

6 IP Multicast in Data Centers Why is IP multicast rarely used? o Limited IPMC scalability on switches/routers and NICs o Broadcast storms: Loss triggers a horde of NACKs, which triggers more loss, etc. o Disruptive even to non-IPMC applications.

7 IP Multicast in Data Centers IP multicast has a bad reputation

8 IP Multicast in Data Centers IP multicast has a bad reputation o Works great up to a point, after which it breaks catastrophically

9 IP Multicast in Data Centers Bottom line: o Administrators have no control over multicast use... o Without control, they opt for never.

10

11 Dr. Multicast

12 Dr. Multicast (MCMD) Policy: Permits data center operators to selectively enable and control IPMC Transparency: Standard IPMC interface, system calls are overloaded. Performance: Uses IPMC when possible, otherwise point-to-point unicast Robustness: Distributed, fault-tolerant service

13 Terminology Process: Application that joins logical IPMC groups Logical IPMC group: A virtualized abstraction Physical IPMC group: As usual UDP multi-send: New kernel-level system-call Collection: Set of logical IPMC groups with identical membership

14 Acceptable Use Policy Assume a higher-level network management tool compiles policy into primitives Explicitly allow a process to use IPMC groups o allow-join(process,logical IPMC) o allow-send(process,logical IPMC) UDP multi-send always permitted Additional restraints o max-groups(process,limit) o force-udp(process,logical IPMC)

15 Overview Library module Mapping module Gossip layer Optimization questions Results

16 Transparent. Overloads the IPMC functions o setsockopt(), send(), etc. Translation. Logical IPMC map to a set of P-IPMC/unicast addresses. o Two extremes MCMD Library Module

17 MCMD Agent runs on each machine o Contacted by the library modules o Provides a mapping One agent elected to be a leader: o Allocates IPMC resources according to the current policy MCMD Mapping Role

18 Allocating IPMC resources: An optimization problem Procs L-IPMC MCMD Mapping Role This box intentionally left BLACK Procs Collections L-IPMC

19 Runs system-wide as part of the agent Automatic failure detection Group membership fully replicated via gossip o Node reports its own state o Future: Replicate more selectively o Leader runs optimization algorithm on data and reports the mapping MCMD Gossip Layer

20 But gossip is slow... Implications: o Slow propagation of group membership o Slow propagation of new maps o We assume a low rate of membership churn Remedy: Broadcast module o Leader broadcasts urgent messages o Bounded bandwidth of urgent channel o Trade-off between latency and scalability MCMD Gossip Layer

21 Overview Library module Mapping module Gossip layer Optimization questions Results

22 Optimization Questions Procs L-IPMC BLACK Collections Procs L-IPMC First step: compress logical IPMC groups

23 klk;l Optimization Questions How compressible are subscriptions? o Multi-objective optimization: Minimize number of collections Minimize bandwidth overhead on network o Thm: The general problem is NP-complete o Thm: In uniform random allocation, "little" compression opportunity. o Social preferences o Lots of duplicates due to replication (e.g. for load balancing)

24 klk;l Optimization Questions Which collections get an IPMC address? o Thm: Ordered by decreasing traffic*size, assign P-IPMC addresses greedily, we minimize bandwidth. Tiling heuristic: o Sort L-IPMC by traffic*size o Greedily collapse identical groups o Assign IPMC to collections in reverse order of traffic*size, UDP-multisend to the rest Building tilings incrementally

25 klk;l Experimental Results

26 Insignificant overhead when mapping L- IPMC to P-IPMC. klk;l Overhead (max. throughput)

27 Insignificant overhead when mapping L- IPMC to P-IPMC. klk;l Overhead (CPU utilization)

28 klk;l Network Overhead Gossip Layer uses constant background bandwidth, urgent channel behaves well

29 Latency Latency of propagation of joins/leaves and new maps

30 A malfunctioning node bombards an existing IPMC group. MCMD policy prevents ill-effects klk;l Policy control < Traffic starts <New policy

31 Conclusion IPMC has been a bad citizen...

32 Conclusion IPMC has been a bad citizen... Dr. Multicast has the cure! Opportunity for big performance enhancements and policy control.

33 Thank you!

34

35 Insignificant overhead when mapping L-IPMC to P-IPMC. klk;l Overhead

36 A malfunctioning node bombards an existing IPMC group. MCMD policy prevents ill-effects klk;l Policy control

37 A malfunctioning node bombards an existing IPMC group. MCMD policy prevents ill-effects klk;l Policy control

38 Linux kernel module increases UDP-multisend throughput by 17% (compared to user-space UDP-multisend) klk;l Overhead

39 klk;l Latency of events Gossip: 99% of nodes aware of change within 9 epochs (now 1 sec)

40 Conclusions Policy: Allows data center operators to enable and control IPMC Transparency: Standard IPMC interface, system calls are overloaded. Performance: Uses IPMC when possible, otherwise point-to-point UDP Robustness: Distributed, fault-tolerant service

41 klk;l Results Library Module o Insignificant slowdown o Linux Kernel module provides 17% speed-up for UDP multi-send

42 klk;l Optimization questions Users Topics This box intentionally left BLACK Users Groups Topics Multi-objective: o Minimize number of groups o Minimize bandwidth overhead on network Thm: This problem is NP-complete o Reduction to Minimum Normal Set Basis

43 MCMD Library Layer Overloads the IPMC functions o setsockopt(), send(), etc. Translates logical IPMC addresses to physical IPMC, or point-to-point UDP packets depending on policy Notifies MCMD immediately about joins/leaves Learns about new mappings from MCMD Keeps statistics about group traffic rates

44 MCMD Library Layer Overloads the IPMC functions o setsockopt(), send(), etc. Translates logical IPMC addresses to physical IPMC, or point-to-point UDP packets depending on policy Caches translation maps Maintains a connection to MCMD for updates

45

46 Overview Library module Mapping module Gossip layer Optimization questions Results


Download ppt "Dr. Multicast for Data Center Communication Scalability Ymir Vigfusson Hussam Abu-Libdeh Mahesh Balakrishnan Ken Birman Cornell University Yoav Tock IBM."

Similar presentations


Ads by Google