Published by Derek Elliott. Modified over 9 years ago.
1
Routing and Forwarding (4/17/2012)
2
Recap
- Internet routing is divided into intra-AS (intradomain) and inter-AS (interdomain) routing
  - Technical reasons
  - Non-technical reasons
- BGP (Border Gateway Protocol), a path-vector protocol, is the de facto standard interdomain protocol
3
Routing: Example
[Figure: AS A (OSPF), AS B (OSPF intra routing), AS C, AS D, AS E, AS F, and AS I, with routers a1, a2, b, d, d1, d2, and i.]
- a1 -> i: "I can reach hosts in D; my path: A D"
- How to specify "hosts in D"?
4
IP Addressing Scheme: Requirements
- We need an address to uniquely identify each destination
- Routing scalability needs flexibility in the aggregation of destination addresses
  - we should be able to aggregate a set of destinations as a single routing unit
- Preview: the unit of routing in the Internet is a network; the destinations in the routing protocols are networks
5
IP Address: An IP Address Identifies an Interface
- IPv4 address: 32-bit identifier for an interface
- interface:
  - routers typically have multiple interfaces
  - a host may have multiple interfaces
  - try % /sbin/ifconfig -a
- Example: 223.1.3.2 = 11011111 00000001 00000011 00000010
[Figure: three networks joined by a router, with interfaces 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.9, 223.1.3.1, 223.1.3.2, and 223.1.3.27.]
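The dotted-decimal to binary conversion on this slide can be reproduced in a few lines. This is a minimal sketch using Python's standard `ipaddress` module; the helper name `to_binary_octets` is made up for illustration.

```python
import ipaddress

def to_binary_octets(dotted: str) -> str:
    """Render an IPv4 address as four space-separated 8-bit groups."""
    value = int(ipaddress.IPv4Address(dotted))  # the 32-bit identifier
    bits = format(value, "032b")                # zero-padded to 32 bits
    return " ".join(bits[i:i + 8] for i in range(0, 32, 8))

print(to_binary_octets("223.1.3.2"))  # 11011111 00000001 00000011 00000010
```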
6
IP Addressing
- IP address:
  - network part
  - host part
- What's a network? (from the IP address perspective)
  - a unit of routing: addresses that can be routed together (depends on the routing protocol)
[Figure: interconnected networks with interfaces 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.6, 223.1.3.1, 223.1.3.2, 223.1.3.27, 223.1.7.0, 223.1.7.1, 223.1.8.0, 223.1.8.1, 223.1.9.1, and 223.1.9.2.]
7
IP Addressing: Classful Addressing
Given the notion of "network", let's re-examine IP addresses. "Classful" addressing in the original IP design:
  Class  Address range
  A      1.0.0.0 to 127.255.255.255
  B      128.0.0.0 to 191.255.255.255
  C      192.0.0.0 to 223.255.255.255
  D      224.0.0.0 to 239.255.255.255
  E      240.0.0.0 to 255.255.255.255
Problem of classful addressing?
8
IP Addressing: CIDR
- (Static) classful addressing:
  - inefficient use of address space, address space exhaustion; e.g., a class A net is allocated enough addresses for 16 million hosts, and a class B address may also be too big
  - not flexible for aggregation
- CIDR: Classless InterDomain Routing
  - the network portion of the address has arbitrary length
  - address format: a.b.c.d/x, where x is the number of bits in the network portion of the address
  - example: 200.23.16.0/23 = 11001000 00010111 0001000 | 0 00000000 (network part | host part)
- Some systems use a mask (1s indicate network bits) instead of the /x format
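The /x format and the mask format describe the same split. As a small sketch with Python's standard `ipaddress` module, using the slide's 200.23.16.0/23 example (the two test addresses are picked only for illustration):

```python
import ipaddress

net = ipaddress.ip_network("200.23.16.0/23")

print(net.netmask)        # mask form of /23
print(net.num_addresses)  # size of the host part: 2**(32 - 23)

# The mask notation parses to the same network as the /x notation.
same = ipaddress.ip_network("200.23.16.0/255.255.254.0")

# Membership depends only on the first 23 bits.
inside = ipaddress.ip_address("200.23.17.200") in net
outside = ipaddress.ip_address("200.23.18.1") in net
print(same == net, inside, outside)
```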
9
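CIDR Address Aggregation
[Figure: AS I, containing networks 130.132.1/24, 130.132.2/24, and 130.132.3/24, connected through router i to router a1 in AS A (OSPF); AS D with routers d and d1.]
- i -> a1: "I can reach 130.132/22; my path: I"
- intradomain routing uses the /24 prefixes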
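A quick check of the aggregation on this slide, sketched with Python's standard `ipaddress` module: each internal /24 must fall inside the single /22 that is announced to the neighboring AS. The variable names are illustrative.

```python
import ipaddress

aggregate = ipaddress.ip_network("130.132.0.0/22")
internal = [ipaddress.ip_network(f"130.132.{i}.0/24") for i in (1, 2, 3)]

# Every internal /24 is covered by the announced /22.
covered = all(net.subnet_of(aggregate) for net in internal)

# The /22 spans four /24s (130.132.0/24 through 130.132.3/24).
spanned = [str(n) for n in aggregate.subnets(new_prefix=24)]
print(covered, spanned)
```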
10
CIDR Address Aggregation
[Figure: nodes A, B, C, E, F, and G. Routes shown (prefix: path): x00/24: B; x01/24: C; x10/24: E; x11/24: F; x11/24: G F; aggregate x/22: A.]
11
Routing Table Size of BGP (number of globally advertised, aggregated networks)
- Active BGP Entries: http://bgp.potaroo.net/as1221/bgp-active.html
- Internet Growth: http://www.caida.org/research/topology/as_core_network/historical.xml
12
Routing Table Prefix Length Distribution
13
IP Addressing: How to Get One?
Q: How does an ISP get its block of addresses?
A: ICANN (Internet Corporation for Assigned Names and Numbers)
  - allocates addresses
  - manages DNS
  - assigns domain names, resolves disputes
Use % whois -h whois.arin.net "n " to check addresses allocated to
14
IP Addresses: How to Get One?
Q: How does a host get an IP address?
- Statically configured
  - wintel: control-panel -> network -> configuration -> tcp/ip -> properties
  - unix: % /sbin/ifconfig eth0 inet 192.168.0.10 netmask 255.255.255.0
- DHCP (Dynamic Host Configuration Protocol): dynamically get an address from a server
  - "plug-and-play"
15
DHCP: Dynamic Host Configuration Protocol
Goal: allow a host to dynamically obtain its IP address from a network server when it joins the network
  - can renew its lease on the address in use
  - allows reuse of addresses (a host only holds an address while connected)
  - support for mobile users who want to join the network
DHCP msgs:
  - host broadcasts "DHCP discover" msg
  - DHCP server responds with "DHCP offer" msg
  - host requests IP address: "DHCP request" msg
  - DHCP server sends address: "DHCP ack" msg
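The four-message exchange can be modeled as a small state table on the server side. The sketch below is a toy model only: it exchanges no real packets, and the class `ToyDhcpServer`, its method names, the address pool, and the MAC address are all hypothetical.

```python
import ipaddress

class ToyDhcpServer:
    """Toy model of the discover/offer/request/ack exchange (no real packets)."""

    def __init__(self, pool_cidr: str):
        # Hypothetical pool: every usable host address of the given network.
        self.free = [str(h) for h in ipaddress.ip_network(pool_cidr).hosts()]
        self.offered = {}  # client MAC -> address offered but not yet acked
        self.leases = {}   # client MAC -> address currently leased

    def on_discover(self, mac: str) -> str:
        """Handle "DHCP discover"; the return value is the "DHCP offer"."""
        if mac in self.leases:
            return self.leases[mac]
        addr = self.offered.get(mac) or self.free.pop(0)
        self.offered[mac] = addr
        return addr

    def on_request(self, mac: str, addr: str) -> bool:
        """Handle "DHCP request"; True stands for "DHCP ack"."""
        if self.leases.get(mac) == addr:  # renewal of a lease in use
            return True
        if self.offered.get(mac) == addr:
            del self.offered[mac]
            self.leases[mac] = addr
            return True
        return False

    def release(self, mac: str) -> None:
        """The host leaves; its address goes back to the pool for reuse."""
        addr = self.leases.pop(mac, None)
        if addr is not None:
            self.free.append(addr)

server = ToyDhcpServer("192.168.1.0/29")
offer = server.on_discover("aa:bb:cc:00:00:01")
acked = server.on_request("aa:bb:cc:00:00:01", offer)
print(offer, acked)
```

Keeping offers and leases in separate tables mirrors the two-phase nature of the exchange: an address is only bound to the host once the request has been acknowledged.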
16
Network Address Translation: Motivation
[Figure: local network (e.g., home network) 192.168.1.0/24 with hosts 192.168.1.2, 192.168.1.3, and 192.168.1.4; NAT router with LAN-side address 192.168.1.1 and public address 138.76.29.7; rest of Internet.]
- A local network uses just one public IP address as far as the outside world is concerned
- Each device on the local network is assigned a private IP address
- Datagrams with source or destination in this network have 192.168.1/24 addresses for source, destination (as usual)
- All datagrams leaving the local network have the same single source NAT IP address, 138.76.29.7, but different source port numbers
17
Private IP
18
NAT: Network Address Translation
Implementation: a NAT router must:
  - outgoing datagrams: replace (source IP address, port #) of every outgoing datagram with (NAT IP address, new port #); remote clients/servers will respond using (NAT IP address, new port #) as the destination addr.
  - remember (in the NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair
  - incoming datagrams: replace (NAT IP address, new port #) in the dest fields of every incoming datagram with the corresponding (source IP address, port #) stored in the NAT table
19
NAT: Network Address Translation
[Figure: hosts 192.168.1.2, 192.168.1.3, and 192.168.1.4 behind a NAT router with LAN-side address 192.168.1.1 and WAN-side address 138.76.29.7.]
NAT translation table:
  WAN side addr        LAN side addr
  138.76.29.7, 5001    192.168.1.2, 3345
  ...                  ...
1. Host 192.168.1.2 sends datagram to 128.119.40.186, 80 (S: 192.168.1.2, 3345; D: 128.119.40.186, 80)
2. NAT router changes datagram source addr from 192.168.1.2, 3345 to 138.76.29.7, 5001, and updates the table (S: 138.76.29.7, 5001; D: 128.119.40.186, 80)
3. Reply arrives with dest. address 138.76.29.7, 5001 (S: 128.119.40.186, 80; D: 138.76.29.7, 5001)
4. NAT router changes datagram dest addr from 138.76.29.7, 5001 to 192.168.1.2, 3345 (S: 128.119.40.186, 80; D: 192.168.1.2, 3345)
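The translation table and the four steps above can be sketched as two dictionary lookups. This is an illustrative model, not a real NAT implementation: the class `NatRouter`, its method names, and the choice of 5001 as the first allocated port are assumptions made to match the slide's example.

```python
class NatRouter:
    """Toy NAT: rewrites (IP, port) pairs using a translation table."""

    def __init__(self, public_ip: str, first_port: int = 5001):
        self.public_ip = public_ip
        self.next_port = first_port  # assumed allocation policy: sequential
        self.out_map = {}            # (LAN ip, LAN port) -> WAN port
        self.in_map = {}             # WAN port -> (LAN ip, LAN port)

    def outgoing(self, src, dst):
        """Step 2: replace the source with (NAT IP, new port), remember the pair."""
        if src not in self.out_map:
            self.out_map[src] = self.next_port
            self.in_map[self.next_port] = src
            self.next_port += 1
        return (self.public_ip, self.out_map[src]), dst

    def incoming(self, src, dst):
        """Step 4: replace the destination with the stored LAN-side pair."""
        ip, port = dst
        if ip != self.public_ip or port not in self.in_map:
            return None              # no table entry: the datagram is dropped
        return src, self.in_map[port]

nat = NatRouter("138.76.29.7")
sent = nat.outgoing(("192.168.1.2", 3345), ("128.119.40.186", 80))
reply = nat.incoming(("128.119.40.186", 80), ("138.76.29.7", 5001))
print(sent)
print(reply)
```

The `None` branch also shows why hosts behind a NAT are not directly addressable: an unsolicited incoming datagram has no table entry to map it to.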
20
Network Address Translation: Advantages
- No need to be allocated a range of addresses from the ISP: just one public IP address is used for all devices
  - the 16-bit port-number field allows roughly 60,000 simultaneous connections (2^16 = 65,536 port values) through that single address!
  - can change ISP without changing addresses of devices in the local network
  - can change addresses of devices in the local network without notifying the outside world
- Devices inside the local net are not explicitly addressable by, or visible to, the outside world (a security plus)
21
Network Address Translation: Problems
- If both hosts are behind NAT, they will have difficulty establishing a connection
- NAT is controversial:
  - routers should only process up to layer 3
  - violates the end-to-end argument
    - NAT possibility must be taken into account by app designers, e.g., P2P applications
  - address shortage should instead be solved by having more addresses: IPv6!
22
Outline
- Admin. and recap
- BGP
- IP addressing
- IP forwarding
23
IP Datagram Format
Header fields (each row of the header is 32 bits wide):
  - ver: IP protocol version number
  - head. len: header length (bytes)
  - type of service: "type" of data
  - length: total datagram length (bytes)
  - 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
  - time to live: max number of remaining hops (decremented at each router)
  - upper layer: upper layer protocol to deliver payload to
  - Internet checksum
  - 32-bit source IP address
  - 32-bit destination IP address
  - Options (if any): e.g., timestamp, record route taken, specify list of routers to visit
  - data (variable length, typically a TCP or UDP segment)
How much overhead with TCP?
  - 20 bytes of TCP
  - 20 bytes of IP
  - = 40 bytes + app layer overhead
24
Data Forwarding: Steps
- Error checking, e.g., check header checksum; if error, set error flag
- Decrement TTL; if TTL == 0, set error flag
- If error, drop the packet and generate an ICMP report
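The error-checking steps above can be sketched on a raw 20-byte IPv4 header. The checksum is the standard 16-bit one's-complement Internet checksum; the helper names (`internet_checksum`, `make_header`, `forward_check`) and the sample header values are assumptions for illustration, and ICMP generation is reduced to returning a reason string.

```python
import ipaddress
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def make_header(src: str, dst: str, ttl: int) -> bytearray:
    """Build a minimal 20-byte header (no options) with a valid checksum."""
    header = bytearray(struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 40, 0, 0, ttl, 6, 0,        # ver/len, ToS, length, id, frag, TTL, TCP, checksum
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed))
    header[10:12] = struct.pack("!H", internet_checksum(bytes(header)))
    return header

def forward_check(header: bytearray):
    """Return ("forward", header) or ("drop", reason for the ICMP report)."""
    if internet_checksum(bytes(header)) != 0:  # a valid header sums to zero
        return "drop", "bad header checksum"
    ttl = header[8] - 1                        # TTL is byte 8 of the header
    if ttl <= 0:
        return "drop", "TTL expired"
    header[8] = ttl
    header[10:12] = b"\x00\x00"                # checksum must be recomputed
    header[10:12] = struct.pack("!H", internet_checksum(bytes(header)))
    return "forward", header

verdict, out = forward_check(make_header("223.1.1.1", "223.1.2.3", ttl=64))
print(verdict, out[8])
```

Because the TTL changes at every hop, the checksum has to be recomputed at every router, which is one reason the IP checksum covers only the header.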
25
The Network Layer
Host, router network layer functions (the network layer sits between the transport layer (TCP, UDP) and the link and physical layers):
- Routing protocols: path selection; RIP, OSPF, BGP
- The IP protocol: addressing, datagram format
- ICMP protocol: error reporting, router "signaling"
- forwarding
26
ICMP: Internet Control Message Protocol
- communicates network-level information
  - error reporting: unreachable host, network, port, protocol
  - echo request/reply (used by ping)
- network-layer protocol "above" IP:
  - ICMP msgs are carried in IP datagrams
- ICMP message: type, code, plus the first 8 bytes of the IP datagram causing the error
[Figure: ICMP message layout: type, code, checksum, ICMP message body.]
  Type  Code  Description
  0     0     echo reply (ping)
  3     0     dest network unreachable
  3     1     dest host unreachable
  3     2     dest protocol unreachable
  3     3     dest port unreachable
  3     6     dest network unknown
  3     7     dest host unknown
  4     0     source quench (congestion control - not used)
  8     0     echo request (ping)
  9     0     route advertisement
  10    0     router discovery
  11    0     TTL expired
  12    0     bad IP header
- traceroute is developed by a clever use of ICMP
27
Data Forwarding: Steps
- If no error, look up the packet's destination address in the forwarding table:
  - if the datagram is for a host on a directly attached network, it is now the job of the link layer
  - otherwise, lookup: find the next-hop router and its outgoing interface
- if needed, do fragmentation
- forward the packet to the outgoing interface (to the next-hop neighbor)
Try % netstat -rn to see the forwarding table
28
Forwarding Look Up
  #    prefix
  a)   00001
  b)   00010
  c)   00011
  d)   001
  e)   0101
  f)   011
  g)   10
  h)   100
  i)   1010
  j)   1011
  k)   1100
  default: -
[Figure: binary decision tree whose marked nodes a through k correspond to the prefixes above.]
The networks are represented by a decision tree, e.g., a Patricia trie, which is searched for the longest match of the destination address.
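A minimal sketch of longest-prefix matching over the prefixes listed on this slide. It uses a plain (uncompressed) binary trie rather than a Patricia trie, which would additionally collapse single-child chains; the class and method names are illustrative.

```python
class BinaryTrie:
    """Uncompressed binary trie keyed by bit strings such as "1011"."""

    def __init__(self):
        self.root = {}

    def insert(self, prefix: str, label: str) -> None:
        node = self.root
        for bit in prefix:
            node = node.setdefault(bit, {})
        node["label"] = label  # marks the end of a stored prefix

    def longest_match(self, address: str, default: str = "default") -> str:
        node = self.root
        best = node.get("label", default)
        for bit in address:
            if bit not in node:
                break
            node = node[bit]
            best = node.get("label", best)  # remember the deepest marked node
        return best

PREFIXES = {
    "00001": "a", "00010": "b", "00011": "c", "001": "d", "0101": "e",
    "011": "f", "10": "g", "100": "h", "1010": "i", "1011": "j", "1100": "k",
}

trie = BinaryTrie()
for prefix, label in PREFIXES.items():
    trie.insert(prefix, label)

print(trie.longest_match("10110000"))  # matches g (10) and j (1011); j is longer
print(trie.longest_match("11010000"))  # no stored prefix matches
```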
29
Content Addressable Memory (CAM)
- Standard computer memory (Random Access Memory, or RAM)
  - Word Read(addr)
  - Write(addr, Word)
- CAM
  - Word Read(addr)
  - Write(addr, Word)
  - AddrList Search(Word)
30
Ternary CAM
- Binary CAM
  - It searches words consisting entirely of 1s and 0s.
- TCAM
  - It allows a third matching state of "X" or "Don't Care" for one or more bits in the stored data word.
  - "10XX0" will match "10000", "10010", "10110", "10100".
31
Prototype of TCAM-Based IP Table Lookup
  Physical Address   IP Prefix (4 Bytes)         Next Hop (4 Bytes)
  0x00000000         0x010203XX (1.2.3.0/24)     1
  0x00000008         0x050607XX (5.6.7.0/24)     2
  0x00000010         0x0102XXXX (1.2.0.0/16)     0
Keep an eye on hardware revolutions:
- Solid State Drive (SSD)
- Optical switching or routing?
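The table above can be mimicked in software to see why entry order matters. The sketch below models `Search(Word)` as a scan that returns every matching physical address, and it assumes (as is common for TCAMs, but not stated on the slide) that the lowest matching address wins, so longer prefixes must be stored first. Each "X" here covers one hex digit (4 bits), following the slide's notation; all function names are illustrative.

```python
# (physical address, stored pattern, next hop), copied from the table above
TCAM = [
    (0x00000000, "010203XX", 1),  # 1.2.3.0/24
    (0x00000008, "050607XX", 2),  # 5.6.7.0/24
    (0x00000010, "0102XXXX", 0),  # 1.2.0.0/16
]

def ternary_match(pattern: str, word: str) -> bool:
    """True if every position is equal or is a don't-care "X" in the pattern."""
    return len(pattern) == len(word) and all(
        p in ("X", w) for p, w in zip(pattern, word))

def search(word: str) -> list:
    """AddrList Search(Word): all physical addresses whose entry matches."""
    return [addr for addr, pattern, _ in TCAM if ternary_match(pattern, word)]

def lookup(ip: str):
    """Next hop of the lowest-address match (assumed priority rule), or None."""
    key = "".join(f"{int(octet):02X}" for octet in ip.split("."))
    hits = search(key)
    if not hits:
        return None
    first = min(hits)
    return next(hop for addr, _, hop in TCAM if addr == first)

print(lookup("1.2.3.4"))  # matches both the /24 and the /16; the /24 is stored first
print(lookup("1.2.9.9"))  # matches only the /16
```

A hardware TCAM compares all entries in parallel in a single cycle; the list scan here only reproduces the result, not the speed.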
32
Example 1 (Same Network): A -> B
- look up dest address
- find that dest is on the same net
- link layer will send the datagram directly inside a link-layer frame
Datagram: misc fields | src 223.1.1.1 | dst 223.1.1.3 | data
Forwarding table in A:
  Dest. Net.    next router   Nhops
  223.1.1/24                  1
  223.1.2/24    223.1.1.4     2
  223.1.3/24    223.1.1.4     2
  0.0.0.0/0     223.1.1.4     -
[Figure: hosts A (223.1.1.1), 223.1.1.2, and B (223.1.1.3) on network 223.1.1/24; router with interfaces 223.1.1.4, 223.1.2.9, 223.1.3.27, and 223.1.4.1 (to Internet); hosts 223.1.2.1 and 223.1.2.2 on 223.1.2/24; hosts 223.1.3.1 and 223.1.3.2 on 223.1.3/24.]
33
Example 2 (Different Networks): A -> E
- look up dest address in forwarding table
- routing table: next-hop router to dest is 223.1.1.4
- link layer sends datagram to router 223.1.1.4 inside a link-layer frame
  - the dest. of the link-layer frame is 223.1.1.4
Datagram: misc fields | src 223.1.1.1 | dst 223.1.2.3 | data
Forwarding table in A:
  Dest. Net.    next router   Nhops
  223.1.1/24                  1
  223.1.2/24    223.1.1.4     2
  223.1.3/24    223.1.1.4     2
  0.0.0.0/0     223.1.1.4     -
[Figure: hosts A (223.1.1.1), 223.1.1.2, and B (223.1.1.3) on network 223.1.1/24; router with interfaces 223.1.1.4, 223.1.2.9, 223.1.3.27, and 223.1.4.1 (to Internet); hosts 223.1.2.1 and E (223.1.2.3) on 223.1.2/24; hosts 223.1.3.1 and 223.1.3.2 on 223.1.3/24.]
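Host A's decision in both examples comes down to one longest-prefix lookup in its forwarding table. This is a small sketch with Python's standard `ipaddress` module; the table mirrors the slide, with `None` standing in for "directly attached" (an encoding chosen here, not taken from the slide), and 8.8.8.8 is an arbitrary outside address.

```python
import ipaddress

# Forwarding table in A: (destination network, next router or None if direct)
TABLE_A = [
    ("223.1.1.0/24", None),
    ("223.1.2.0/24", "223.1.1.4"),
    ("223.1.3.0/24", "223.1.1.4"),
    ("0.0.0.0/0", "223.1.1.4"),
]

def link_layer_target(dst: str, table=TABLE_A) -> str:
    """IP address the link-layer frame is sent to for destination dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(net), hop)
               for net, hop in table if addr in ipaddress.ip_network(net)]
    # The most specific (longest) matching prefix wins.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return dst if hop is None else hop

print(link_layer_target("223.1.1.3"))  # Example 1: same network, send directly
print(link_layer_target("223.1.2.3"))  # Example 2: hand to router 223.1.1.4
print(link_layer_target("8.8.8.8"))    # falls through to the default route
```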
34
Example 2 (Different Networks): A -> E
Arriving at router 223.1.1.4, destined for 223.1.2.3
- look up dest address in router's forwarding table
- E is on the same network as the router's interface 223.1.2.9
  - router, E directly attached
- link layer sends datagram to 223.1.2.3 inside a link-layer frame via interface 223.1.2.9
- datagram arrives at 223.1.2.3!! (hooray!)
Datagram: misc fields | src 223.1.1.1 | dst 223.1.2.3 | data
Forwarding table in router:
  Dest. Net.    router   Nhops   interface
  223.1.1/24    -        1       223.1.1.4
  223.1.2/24    -        1       223.1.2.9
  223.1.3/24    -        1       223.1.3.27
  0.0.0.0/0     -        -       223.1.4.1
[Figure: hosts A (223.1.1.1), 223.1.1.2, and B (223.1.1.3) on network 223.1.1/24; router with interfaces 223.1.1.4, 223.1.2.9, 223.1.3.27, and 223.1.4.1 (to Internet); hosts 223.1.2.1 and E (223.1.2.3) on 223.1.2/24; hosts 223.1.3.1 and 223.1.3.2 on 223.1.3/24.]
35
What a Router Looks Like: Outside
36
Look Inside a Router
Two key router functions:
- run routing algorithms/protocols (RIP, OSPF, BGP)
- switch datagrams from incoming to outgoing ports
37
Input Port Functions
- physical layer: bit-level reception
- data link layer: e.g., Ethernet
- network layer: look up the output port using the forwarding table
38
Switching: Low End
39
Switching Via an Interconnection Network
- Overcomes bus bandwidth limitations
- Fragments each datagram into fixed-length cells and switches the cells through the fabric
- Crossbar, Banyan networks, and others
- Cisco 12416: switches 320 Gbps (upgradeable to 1.28 Tbps) with 16 slots (each 10G full-duplex) through the crossbar interconnection network
40
New Potential Bottleneck: Output Ports
- Due to output port contention and head-of-the-line (HOL) blocking (i.e., a queued datagram at the front of the queue prevents others in the queue from moving forward)
41
Head-of-Line Blocking Limits Throughput
- Due to output-port contention and HOL blocking, the stable throughput is only around 2 - sqrt(2) ≈ 0.586 of line speed!
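The 0.586 figure is the large-switch limit for saturated FIFO input queues with uniformly random destinations. The simulation below is a rough sketch under exactly those assumptions (every input always has a cell waiting, each output serves one randomly chosen contender per slot); the function name, port count, slot count, and seed are arbitrary choices, and a finite switch lands slightly above the limit.

```python
import math
import random

def hol_saturation_throughput(n_ports: int = 32, slots: int = 2000, seed: int = 1) -> float:
    """Fraction of line speed achieved by an n x n switch with FIFO input queues."""
    rng = random.Random(seed)
    # Destination output of the cell at the head of each input queue.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    served = 0
    for _ in range(slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)          # one cell per output per slot
            hol[winner] = rng.randrange(n_ports)  # next cell moves to the head
            served += 1
        # Losing inputs keep their head cell: this is the HOL blocking.
    return served / (n_ports * slots)

limit = 2 - math.sqrt(2)
measured = hol_saturation_throughput()
print(round(limit, 3), round(measured, 3))
```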
42
Avoiding Port Contention and HOL Blocking
- Virtual output queueing
- Input/output port matching algorithm
- Switch fabric speedup, e.g., two cells to one output port
For more details: http://www.cisco.com/warp/public/63/arch12000-swfabric.html
43
Output Ports
- Buffering is required when datagrams arrive from the fabric faster than the transmission rate
- Queueing (delay) and loss occur due to output port buffer overflow!
- Scheduling and queue/buffer management choose among queued datagrams for transmission
44
Backup Slides
45
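Another Method for Checking Convergence: Dispute Wheels
- A dispute wheel
  - u1 prefers R1 Q2 over Q1
  - u2 prefers R2 Q3 over Q2
  - etc.
[Figure: dispute wheel around destination d with nodes u1, u2, u3, ..., un, spoke paths Q1, Q2, Q3, ..., Qn, and rim paths R1, R2, R3, ..., Rn; an example with nodes 0, 1, 2, 3 and path rankings 320, 30; 210, 20; 130, 10.]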
46
Patterns of Valid Routes
- Consider how a path is extended according to the export policies (format: Destination, link type, ...)
  - case 1: "Dest, ... CP" can be extended to "Dest, ... CP CP", "Dest, ... CP PC", or "Dest, ... CP PP"
  - case 2: "Dest, ... PC" can be extended to "Dest, ... PC PC"
  - case 3: "Dest, ... PP" can be extended to "Dest, ... PP PC"
- Valid consecutive link types (starting from source to destination, i.e., reverse)
  - PC PC
  - CP PC
  - PP PC
  - CP CP
  - CP PP
47
Case 1: The first link of Q1 is a PC link
- Then the first link of R1 Q2 must be a PC link
  - because u1 chooses R1 Q2 and C > E/P
- Thus all remaining links along R1 Q2 are PC links
  - because only PC follows PC
- Thus the first link of Q2 is a PC link
- ...
- All links along R1, ..., Rn are PC links
- The network has a PC loop: contradiction!
[Figure: dispute wheel around d with nodes u1, u2, ..., un, spokes Q1, Q2, ..., Qn, and rim paths R1, R2, ..., Rn-1, Rn; the first link of Q1 is marked PC.]
48
Case 2: The first link of Q1 is a CP/PP link
- All links along Rn are CP links, because all links before CP/PP are CP
- The first link of Qn must be a CP/PP link
  - because C > E/P
- ...
- All links along R1, ..., Rn are CP links
- The network has a PC loop: contradiction!
[Figure: dispute wheel around d with nodes u1, u2, ..., un, spokes Q1, Q2, ..., Qn, and rim paths R2, ..., Rn-1, Rn; the first links of Q1 and Qn are marked CP/PP and the links of Rn are marked CP.]
49
Backup: IP Multicast
50
IP Fragmentation & Reassembly
- Network links have an MTU (max. transfer size): the largest possible link-level frame
  - different link types have different MTUs, e.g., the Ethernet MTU is 1500 bytes
- A large IP datagram is divided ("fragmented")
  - one datagram becomes several datagrams
  - "reassembled" only at the final destination
  - IP header bits are used to identify and order related fragments
[Figure: fragmentation (in: one large datagram; out: 3 smaller datagrams), then reassembly at the destination.]
51
IP Fragmentation and Reassembly
Example: a 4000-byte datagram, MTU = 1500 bytes. One large datagram becomes several smaller datagrams:
  datagram      length   ID   fragflag   offset
  original      4000     x    0          0
  fragment 1    1500     x    1          0
  fragment 2    1500     x    1          1480
  fragment 3    1040     x    0          2960
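The numbers in this example follow from a short calculation, sketched below. It assumes a 20-byte header with no options; the function name and dictionary keys are illustrative. The offsets on the slide are in bytes, while the header's fragment offset field stores them in units of 8 bytes, so both are shown.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20) -> list:
    """Split a datagram of total_length bytes into fragments that fit the MTU."""
    payload = total_length - header_len
    # Data per fragment must be a multiple of 8 bytes (except in the last one).
    max_data = (mtu - header_len) // 8 * 8
    fragments, offset = [], 0
    while offset < payload:
        size = min(max_data, payload - offset)
        more = offset + size < payload
        fragments.append({
            "length": size + header_len,  # each fragment carries its own header
            "offset_bytes": offset,       # value shown on the slide
            "offset_field": offset // 8,  # value stored in the IP header
            "fragflag": int(more),        # 1 = more fragments follow
        })
        offset += size
    return fragments

for frag in fragment(4000, 1500):
    print(frag)
```

The 4000-byte datagram carries 3980 bytes of data, which splits as 1480 + 1480 + 1020; adding a 20-byte header to each piece gives the 1500, 1500, and 1040 lengths in the table.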
52
IP Multicast: Service Model
- Multicast group concept: use of indirection
  - A group is identified by a location-independent logical address (class D IP address: prefix 1110)
- Open group model
  - Anyone can send packets to the "logical" group address
  - Anyone can join a group and receive packets
- Normal, best-effort delivery semantics of IP
- Needed: infrastructure to deliver mcast-addressed datagrams to all hosts that have joined that multicast group
[Figure: hosts 128.119.40.186, 128.59.16.12, 128.34.108.63, and 128.34.108.60 in multicast group 226.17.30.197.]
53
Multicast Across LANs
- Goal: find a tree (or trees) connecting routers having local mcast group members
  - source-based: a different tree from each sender to the receivers
    - Distance-vector multicast routing protocol (DVMRP)
    - Protocol-independent multicast, dense mode (PIM-DM)
  - shared-tree: the same tree is used by all group members
    - Core-Based Tree (CBT)
    - Protocol-independent multicast, sparse mode (PIM-SM)
[Figure: a shared tree compared with source-based trees.]
54
Source Tree: Reverse Path Flooding (RPF)
- A router x forwards a packet from source (S) iff it arrives via neighbor y, and y is on the shortest path from x back to S
- The packet is replicated to all but the incoming interface
[Figure: source S and routers x, y, z, t, and a, connected by links of cost 1.]
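The forwarding rule above can be sketched as a check against each router's shortest path back to the source. The topology below is hypothetical (the slide's figure cannot be recovered exactly), all links are assumed to have cost 1 so breadth-first search gives shortest paths, and ties are broken by neighbor order.

```python
from collections import deque

def parents_toward_source(graph: dict, source: str) -> dict:
    """For each router, its next hop on a shortest (unit-cost) path to source."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return parent

def rpf_forward(graph: dict, source: str, router: str, arrived_from: str) -> list:
    """Neighbors the packet is replicated to; empty if the RPF check fails."""
    parent = parents_toward_source(graph, source)
    if parent.get(router) != arrived_from:
        return []  # did not arrive via the reverse shortest path: discard
    return [n for n in graph[router] if n != arrived_from]

# Hypothetical topology with unit-cost links.
GRAPH = {
    "S": ["x", "y"],
    "x": ["S", "y", "z"],
    "y": ["S", "x", "z"],
    "z": ["x", "y"],
}

print(rpf_forward(GRAPH, "S", "x", "S"))  # accepted, flooded to the other neighbors
print(rpf_forward(GRAPH, "S", "x", "y"))  # copy relayed by y is discarded
```

In this topology z still receives two copies (one from x, one from y) and discards one of them, which is the redundancy the child-link improvement is meant to remove.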
55
Reverse Path Forwarding: Improvement
- Basic idea: forward a packet from S only on child links for S
- A child link of router x for source S
  - a link that has x as its parent on the shortest path from the link to S
  - a child x notifies its parent y (through the routing protocol) that it has selected y as its parent
[Figure: source S and routers x, y, z, t, and a.]
56
Reverse Path Forwarding: Pruning
- No need to forward datagrams down a subtree with no mcast group members
- "prune" msgs are sent upstream by a router with no downstream group members
[Figure: source S and routers R1 through R7; the legend distinguishes routers with an attached group member, routers with no attached group member, prune messages (P), and links with multicast forwarding.]
57
Pruning
- Prune (Source, Group) at a leaf router if it has no members
  - send a No-Membership Report (NMR) up the tree
- If all children of router R prune (S,G)
  - propagate the prune for (S,G) to its parent
- What do you do when a member of a group (re)joins?
  - send a Graft message to the upstream parent
- How to deal with failures?
  - prune dropped
  - flow is reinstated
  - downstream routers re-prune
- Note: again a soft-state approach
58
Implementation of Source Trees in the Internet
- Multicast OSPF (MOSPF)
  - Membership is part of the link state distribution; calculate source-specific, pre-pruned trees
- Reverse Path Forwarding
  - Distance Vector Multicast Routing Protocol (DVMRP)
  - Protocol Independent Multicast, Dense Mode (PIM-DM): very similar to DVMRP
  - Difference: PIM uses any unicast routing algorithm to determine the path from a router to the source; DVMRP uses distance vector
  - Question: what is the state requirement of Reverse Path Forwarding?
59
Building a Shared Tree
- Steiner tree: minimum-cost tree connecting all routers with attached group members
- A Steiner tree is not a spanning tree, because you do not need to connect all nodes in the network
- Problem is NP-hard
- Excellent heuristics exist
- Not used in practice:
  - computational complexity
  - information about the entire network is needed
  - monolithic: rerun whenever a router needs to join/leave
60
Center (Core) Based Shared Tree
- Single delivery tree shared by all
- One router is identified as the "center" of the tree
- Tree construction is receiver-based
  - edge router sends a unicast join-msg addressed to the center router
  - join-msg is "processed" by intermediate routers and forwarded towards the center
  - join-msg either hits an existing tree branch for this center, or arrives at the center
  - path taken by the join-msg becomes a new branch of the tree for this router
- A sender unicasts a packet to the center
  - The packet is distributed on the tree when it hits the tree
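The receiver-based construction can be sketched as a walk along the unicast route towards the center that stops at the first on-tree router. Everything named below is hypothetical: the routers (`m1`, `m2`, `m3`, `a`, `b`, `core`), the next-hop table, and the function `join`.

```python
def join(tree: set, next_hop_to_core: dict, router: str) -> list:
    """Process a join-msg from router; return the new branch it creates."""
    branch = []
    node = router
    while node not in tree:               # stop on hitting the tree (or the core)
        branch.append(node)
        node = next_hop_to_core[node]     # forwarded by unicast routing
    tree.update(branch)                   # path taken becomes a new branch
    return branch

# Hypothetical unicast next hops towards the core router.
NEXT_HOP = {"m1": "a", "m2": "a", "m3": "b", "a": "core", "b": "a"}

tree = {"core"}                           # initially only the center is on the tree
first = join(tree, NEXT_HOP, "m1")        # travels all the way to the core
second = join(tree, NEXT_HOP, "m3")       # stops at a, which is already on the tree
print(first, second, sorted(tree))
```

Each join only touches routers on the path towards the center, so no router needs a view of the whole network, in contrast to computing a Steiner tree.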
61
Example: M3 Joins
- Group members: M1, M2
[Figure: core router, members M1, M2, and M3, sender S1, the shared tree, and M3's join message travelling towards the core.]
Discussion: what is the property of the constructed tree?
62
Example: M1 Sends Data
- Group members: M1, M2, M3
- M1 sends data
[Figure: core router, members M1, M2, and M3, and sender S1; the legend distinguishes control (join) messages from data.]
63
Shared Tree Protocols in the Internet
- Core Based Tree
- Protocol Independent Multicast (PIM), sparse mode
- The catch: how do you know the center?
  - session announcement
64
Mbone: Tunneling
Q: How to connect "islands" of multicast routers in a "sea" of unicast routers?
- mcast datagram is encapsulated inside a "normal" (non-multicast-addressed) datagram
- normal IP datagram is sent through the "tunnel" via regular IP unicast to the receiving mcast router
- receiving mcast router unencapsulates it to get the mcast datagram
[Figure: physical topology compared with logical topology.]