
1 1.10.2012 SEATTLE - A Scalable Ethernet Architecture for Large Enterprises
T-110.6120 – Special Course in Future Internet Technologies
M.Sc. Pekka Hippeläinen, IBM, phippela@gmail

2 1.10.2012 SEATTLE
- Based on, and pictures borrowed from: Kim, C.; Caesar, M.; Rexford, J. Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises
- Is it possible to build a protocol that maintains the same configuration-free properties as Ethernet bridging, yet scales to large networks?

3 1.10.2012 Contents
- Motivation: the network management challenge
- Ethernet features: ARP and DHCP broadcasts
- 1) Ethernet bridging
- 2) Scaling with hybrid networks
- 3) Scaling with VLANs
- Distributed hashing
- The SEATTLE approach
- Results
- Conclusions

4 1.10.2012 Network management challenge
- IP networks require massive effort to configure and manage
- As much as 70% of an enterprise network's cost goes to maintenance and configuration
- Ethernet is much simpler to manage, but it does not scale well beyond small LANs
- The SEATTLE architecture aims to provide the scalability of IP with the management simplicity of Ethernet

5 1.10.2012 Why is Ethernet so wonderful?
- Easy to set up, easy to manage
- A DHCP server, some hubs, plug'n'play

6 18.09.2012 Flooding query 1: DHCP requests
- Let's say node A joins the Ethernet
- To get an IP address (or confirm one), node A sends a DHCP request as a broadcast
- The request floods through the broadcast domain

7 1.10.2012 Flooding query 2: ARP
- For node A to communicate with node B in the same broadcast domain, the sender needs the MAC address of node B
- Let's assume that node B's IP address is known
- Node A sends an Address Resolution Protocol (ARP) broadcast to find out the MAC address of node B
- As with the DHCP broadcast, the request is flooded through the whole broadcast domain
- This is essentially an {IP -> MAC} mapping (see the sketch below)
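
A minimal sketch of this lookup-with-broadcast-fallback flow; the names (arp_cache, resolve, broadcast_arp_request) are illustrative, not from the slides or the paper:

```python
arp_cache = {}  # {ip -> mac} mappings learned from earlier replies

def resolve(ip, broadcast_arp_request):
    """Return the MAC for ip, broadcasting an ARP request on a cache miss."""
    if ip not in arp_cache:
        # Cache miss: the request floods the whole broadcast domain.
        arp_cache[ip] = broadcast_arp_request(ip)
    return arp_cache[ip]
```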

8 1.10.2012 Why is flooding bad?
- Large Ethernet deployments contain a vast number of hosts and thousands of bridges
- Ethernet was not designed for such a scale
- Virtualization and mobile deployments cause many dynamic events, generating control traffic
- Broadcast messages must be processed by the end hosts, interrupting the CPU
- The bridges' forwarding tables grow roughly linearly with the number of hosts

9 1.10.2012 1) Ethernet bridging
- Ethernet consists of segments, each comprising a single physical layer
- Ethernet bridges interconnect segments into a multi-hop network, i.e., a LAN
- This forms a single broadcast domain
- A bridge learns how to reach a host by inspecting incoming frames and associating the source MAC address with the incoming port
- The bridge stores this information in a forwarding table and uses the table to forward frames in the correct direction (see the sketch below)
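
A minimal sketch of the learn-and-forward logic, assuming each frame carries source and destination MACs; the class and field names are illustrative:

```python
class LearningBridge:
    def __init__(self, ports):
        self.ports = ports   # port identifiers on this bridge
        self.table = {}      # forwarding table: {source MAC -> port seen on}

    def handle(self, src_mac, dst_mac, in_port):
        # Learn: associate the source MAC with the incoming port.
        self.table[src_mac] = in_port
        if dst_mac in self.table:
            return [self.table[dst_mac]]   # known destination: one port
        # Unknown destination: flood out every port except the incoming one.
        return [p for p in self.ports if p != in_port]
```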

10 1.10.2012 Bridge spanning tree
- One bridge is configured to be the root bridge
- The other bridges collectively compute a spanning tree based on their distance to the root
- Traffic is therefore forwarded not along the shortest path but along the spanning tree
- This approach avoids broadcast storms
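
A rough sketch of the resulting tree, assuming unit link costs and a BFS from the root (real STP elects the root by bridge ID and uses per-link path costs); names are illustrative:

```python
from collections import deque

def spanning_tree(adjacency, root):
    """BFS from the root bridge; returns {bridge -> parent} tree edges."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in parent:
                parent[v] = u   # v reaches the root through u
                queue.append(v)
    return parent
```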

12 1.10.2012 2) Hybrid IP/Ethernet
- In this approach, multiple LANs are interconnected by IP routing
- In hybrid networks each LAN contains at most a few hundred hosts, which form an IP subnet
- An IP subnet is associated with an IP prefix
- Assigning IP prefixes to subnets and associating subnets with router interfaces is a manual process
- Unlike a MAC address, which is a host identifier, an IP address denotes the host's current location in the network

14 1.10.2012 Drawbacks of the hybrid approach
- The biggest drawback is configuration overhead
  - Router interfaces must be configured
  - A host must have the correct IP address for the subnet it is located in (DHCP can be used)
- Networking policies are usually defined per network prefix, i.e., per topology
  - When the network changes, the policies must be updated
- Limited mobility support
  - Mobile users & virtualized hosts in datacenters
  - If the IP address is to stay constant, the host must stay on the same subnet

15 1.10.2012 3) Virtual LANs
- Overcome some problems of Ethernet and IP networks
- Administrators can logically group hosts into the same broadcast domain
- VLANs can be configured to overlap, by configuring bridges rather than hosts
- Broadcast overhead is reduced by the isolated domains
- Mobility is simplified: the IP address can be retained while moving between bridges

16 1.10.2012 Virtual LANs
- Traffic from B1 to B2 can be 'trunked' over multiple bridges
- Inter-domain traffic needs to be routed

17 1.10.2012 Drawbacks of VLANs
- Trunk configuration overhead
  - Extending a VLAN across multiple bridges requires the VLAN to be configured at each participating bridge, often manually
- Limited control-plane scalability
  - Forwarding-table entries and broadcast traffic for every active host and every visible VLAN
- Insufficient data-plane efficiency
  - A single spanning tree is still used within each VLAN
  - Inter-VLAN traffic must be routed via IP gateways

18 1.10.2012 Distributed Hash Tables
- Hash tables store {key -> value} pairs
- With multiple nodes, there is a nice way to:
  - keep the nodes symmetric
  - distribute the hash-table entries evenly among the nodes
  - keep the reshuffling of entries small when nodes are added or removed
- The idea is to calculate H(key), which is mapped to a host; one can visualize this as mapping to an angle (or to a point on a circle)

19 1.10.2012 Distributed Hash Tables
- Each node is mapped to several randomly distributed points on the circle
- Thus each node owns multiple buckets
- One calculates H(key) and stores the entry at the node owning that bucket
- If a node is removed, its values are reassigned to the next buckets on the circle
- If a node is added, only the entries falling into its new buckets are moved (see the sketch below)
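
A minimal consistent-hashing sketch with virtual nodes, assuming SHA-1 as the hash H; the Ring class is illustrative, not the paper's implementation:

```python
import bisect
import hashlib

def H(key: str) -> int:
    """Hash a key to a point on the circle."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self):
        self._points = []   # sorted hash points on the circle
        self._owner = {}    # {hash point -> node that owns the bucket}

    def add(self, node: str, vnodes: int = 8):
        # Each node is mapped to several pseudo-random points (buckets).
        for i in range(vnodes):
            p = H(f"{node}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = node

    def lookup(self, key: str) -> str:
        # The owner is the first point clockwise from H(key).
        i = bisect.bisect(self._points, H(key)) % len(self._points)
        return self._owner[self._points[i]]
```

Removing a node deletes only its own points, so only the entries in those buckets shift to the next point clockwise; the rest stay put.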

20 1.10.2012 SEATTLE approach 1/2
- 1) Switches calculate shortest paths among themselves
- This is a link-state protocol, essentially Dijkstra's algorithm (see the sketch below)
- A switch-level discovery protocol: Ethernet hosts do not respond
- The switch topology is much more stable than the host-level topology
- It is also much more scalable than tracking hosts
- Each switch has an ID: one MAC address of the switch's interfaces
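
A minimal sketch of the shortest-path computation each switch runs over the link-state topology; 'graph' maps a switch to {neighbor: link cost} and is an illustrative structure:

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra: distance from 'source' to every reachable switch."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue   # stale heap entry
        for v, cost in graph[u].items():
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist
```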

21 1.10.2012 SEATTLE approach 2/2
- 2) A DHT is run across the switches
- {IP -> MAC} mapping: essentially an ARP lookup that avoids flooding
- {MAC -> location} mapping: once the egress switch is located, routing along the shortest path can be used
- The DHCP service location can also be stored
- SEATTLE thus reduces flooding, allows shortest-path forwarding, and offers a nice way to locate the DHCP service (see the example below)
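
How these mappings might land on resolver switches, reusing the Ring sketch from slide 19; the switch names and addresses are illustrative (only the "DHCP_SERVER" key appears in the slides):

```python
ring = Ring()
for switch in ["s1", "s2", "s3", "s4"]:
    ring.add(switch)

# Each published pair goes to the switch that F(key) selects:
print(ring.lookup("10.0.0.7"))           # resolver for an {IP -> MAC} entry
print(ring.lookup("aa:bb:cc:00:11:22"))  # resolver for a {MAC -> location} entry
print(ring.lookup("DHCP_SERVER"))        # resolver for the DHCP service location
```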

22 1.10.2012 SEATTLE
- Control overhead is reduced with consistent hashing
  - When the set of switches changes due to network failure or recovery, only some entries must be moved
- Load balancing with virtual switches
  - If some switches are more powerful, a switch can represent itself as many, attracting more load (see the example below)
- Flexible service discovery
  - Mainly DHCP, but it could be something like {"PRINTER" -> location}
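
The virtual-switch idea maps directly onto the virtual nodes in the Ring sketch above: a more powerful switch registers more points and attracts a proportionally larger share of the entries (the counts below are illustrative):

```python
weighted = Ring()
weighted.add("small-switch", vnodes=4)
weighted.add("big-switch", vnodes=16)   # ~4x the share of the key space
```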

23 1.10.2012 Topology changes
- Adding and removing switches or links alters the topology
- Switch/link failures and recoveries can also lead to partitioning events (rarer)
- Non-partitioning link failures are easy to handle: the resolver for a hash entry does not change

24 1.10.2012 Switch failures
- If a switch fails or recovers, hash entries need to be moved
- The switch that published a value monitors the liveness of its resolver, republishing the entry when needed
- The entries also carry a TTL (see the sketch below)
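
A small sketch of the TTL-and-republish bookkeeping a publishing switch might do; the class, the 60-second TTL value and the liveness flag are all illustrative assumptions:

```python
import time

class PublishedEntry:
    TTL = 60.0   # seconds; illustrative value, not from the paper

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.published_at = time.time()

    def needs_republish(self, resolver_alive: bool) -> bool:
        # Republish when the entry has expired or the resolver has died.
        expired = time.time() - self.published_at > self.TTL
        return expired or not resolver_alive
```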

25 1.10.2012 Partitioning events
- Each switch also has to keep track of its locally stored location entries
- If switch s_old is removed or becomes unreachable, all switches must remove the location entries pointing to it
- This approach handles partitioning events correctly

26 1.10.2012 Scaling: location
- Hosts use a directory service to publish and maintain {MAC -> location} mappings
- When host a with address mac_a arrives, it attaches to switch S_a (steps 1-3 in the figure)
- Switch S_a publishes {mac_a -> location} by calculating the correct bucket F(mac_a), i.e., the resolver switch
- When node b wants to send a message to node a, F(mac_a) is calculated to fetch the location
- 'Reactive resolution': even cache misses do not lead to flooding (see the sketch below)
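
A sketch of reactive resolution at the ingress switch: a local cache is consulted first, and a miss costs one unicast DHT lookup rather than a flood. It reuses the ring from the earlier sketches; location_cache and dht_lookup are illustrative names:

```python
location_cache = {}  # {MAC -> egress switch} entries learned from lookups

def locate(mac, dht_lookup):
    if mac not in location_cache:
        resolver = ring.lookup(mac)   # F(mac) picks the resolver switch
        # One unicast query to the resolver; nothing is broadcast.
        location_cache[mac] = dht_lookup(resolver, mac)
    return location_cache[mac]        # then forward along the shortest path
```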

27 1.10.2012 Scaling: ARP
- When node b makes an ARP request, SEATTLE converts it into a unicast lookup at resolver F(IP_a) for the {IP_a -> mac_a} entry
- The resolver switch F(IP_a) is usually different from F(mac_a)
- Optimization for hosts making ARP requests
  - The F(IP_a) resolver can also store mac_a and S_a
  - When node b makes the F(IP_a) ARP lookup, the mac_a -> S_a mapping is also cached at S_b (see the sketch below)
  - The shortest path (path 10 in the figure) can now be used
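
The optimization as a combined resolver entry: one ARP lookup returns both the MAC and the location, so S_b can prime its location cache in the same step. Field names and values are illustrative:

```python
# Entry stored at the F(IP_a) resolver (illustrative values):
arp_entry = {"ip": "10.0.0.7", "mac": "aa:bb:cc:00:11:22", "location": "S_a"}

def seattle_arp(entry, arp_cache, location_cache):
    arp_cache[entry["ip"]] = entry["mac"]             # answers the ARP query
    location_cache[entry["mac"]] = entry["location"]  # shortest path usable now
```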

28 1.10.2012 Handling host dynamics
- Location change
  - Wireless handoff
  - VM moved while retaining its MAC
- Host MAC address change
  - NIC replaced
  - Failover event
  - VM migration forcing a MAC change
- Host IP change
  - DHCP lease expires
  - Manual reconfiguration

29 1.10.2012 Insert, delete and update
- Location change
  - Host h moves from s_old to s_new
  - s_new updates the existing MAC-to-location entry
- MAC change
  - IP-to-MAC update
  - MAC-to-location deletion (old) and insertion (new)
- IP change
  - S_h deletes the old IP-to-MAC entry and inserts the new one

30 1.10.2012 Ethernet: bootstrapping hosts
- Hosts are discovered by their access switches
  - SEATTLE switches snoop ARP requests
  - Most OSes generate an ARP request at boot or when an interface comes up
  - DHCP messages, or detecting a host going down, can also be used
- Host configuration without broadcast
  - The DHCP server hashes the string "DHCP_SERVER" and stores its location at the resulting switch
  - The "DHCP_SERVER" string is then used to locate the service
- No need to broadcast for ARP or DHCP

31 1.10.2012 Scalable and flexible VLANs
- To support broadcasts, the authors suggest using groups
- Similar to a VLAN, a group is defined as a set of hosts that share the same broadcast domain
- Groups are not limited to layer-2 reachability
- Multicast-based group-wide broadcasting
  - A multicast tree with a broadcast root for each group
  - F(group_id) is used to locate the broadcast root (see the sketch below)
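
Locating a group's broadcast root with the same hashing machinery, reusing the ring from the earlier sketches; the "group:" key prefix is an illustrative convention:

```python
def broadcast_root(group_id: str) -> str:
    # The multicast tree for the group is rooted at the switch F(group_id).
    return ring.lookup(f"group:{group_id}")
```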

32 1.10.2012 Simulations
- 1) Campus: ~40,000 students
  - 517 routers and switches
- 2) AP-Large (access provider)
  - 315 routers
- 3) Datacenter (DC)
  - 4 core routers with 21 aggregation switches
- Routers were converted to SEATTLE switches

33 1.10.2012 Cache timeout, AP-Large with 50k hosts
- The shortest-path cache timeout affects the number of location lookups
- Even with a 60 s timeout, 99.98% of packets were forwarded without a lookup
- Control overhead (blue) decreases very fast, whereas the table size increases only moderately
- Shortest paths are used for the majority of routing in these simulations

34 1.10.2012 Table size increase in DC
- Ethernet bridges store an entry for each destination: ~O(sh) state across the network
- SEATTLE requires only ~O(h) state, since only the access and resolver switches need to store location information for each host
- With this topology the table size was reduced by a factor of 22
- In the AP-Large case the factor increased to 64

35 1.10.2012 Control overhead in AP-Large
- The number of control messages over all links in the topology, divided by the number of switches and the duration of the trace
- SEATTLE significantly reduces control overhead in the simulations
- This is mainly because Ethernet generates network-wide floods for a significant number of packets

36 1.10.2012 Effect of switch failures in DC
- Switches were allowed to fail randomly
- The average recovery time was 30 seconds
- SEATTLE can use all the links in the topology, whereas Ethernet is restricted to the spanning tree
- Ethernet must recompute the tree, causing outages

37 1.10.2012 Effect of host mobility in Campus
- Hosts were randomly moved between access switches
- For high mobility rates, SEATTLE's loss rate was lower than Ethernet's
- In Ethernet, it takes some time for switches to evict stale location information and re-learn the new location
- SEATTLE provided low loss and low broadcast overhead

38 1.10.2012 What was omitted
- The authors suggest multi-level one-hop DHTs
  - With large, dynamic networks it can be beneficial to store entries nearby
  - This is achieved with regions and a backbone: border switches connect to the backbone switches
- Topology changes
  - An approach to seamless mobility is described in the paper
  - Updating remote host caches requires switch-based MAC revocation lists
- Some simulation results
- The authors also made a sample implementation

39 1.10.2012 Conclusions
- Operators today face challenges in managing and configuring large networks, largely due to the complexity of administering IP networks
- Ethernet is not a viable alternative: poor scaling and inefficient path selection
- SEATTLE promises scalable, self-configuring routing
- Simulations suggest efficient routing and low latency with quick recovery
- Host mobility is supported with low control overhead
- Ethernet stacks at end hosts are not modified

40 1.10.2012 Thank you for your attention! Questions? Comments?

