1 Networking
Nick Feamster, Georgia Tech

2 Goal of This Tutorial
Teach engineers the basics of networking and ISP operations
Networks today
  Business models
  Operations (NOC, operators)
Common problems
Measurement, monitoring, and security

3 Today’s Networks
Service provider business models
Network operations center
Network operators and engineers

4 Business Models
Increasingly commoditized (see Geoff Huston’s talk at NANOG)
Status quo: establish transit costs, bill at 95th percentile of usage
Future: differential pricing, preference for certain groups of users, applications

5 Billing for Internet Usage
95th percentile billing
  Customer network pays for a “committed information rate” (CIR)
  Throughput is measured every 5 minutes (typically with SNMP; flow statistics can also be used for billing)
  Customer is billed based on the 95th percentile of these measurements
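The billing rule above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's actual billing code: the nearest-rank percentile method and the rule of charging the larger of the CIR and the 95th percentile are assumptions, and the function names are hypothetical.

```python
def percentile_95(samples_mbps):
    """Return the 95th-percentile sample using the nearest-rank method (an assumed convention)."""
    if not samples_mbps:
        raise ValueError("need at least one throughput sample")
    ordered = sorted(samples_mbps)
    # Integer ceiling of 0.95 * n, as a 1-based rank into the sorted samples.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def billable_rate(samples_mbps, cir_mbps):
    """Assumed rule: bill the larger of the committed rate and the 95th percentile."""
    return max(cir_mbps, percentile_95(samples_mbps))
```

With 5-minute samples, the top 5% of intervals (roughly 36 hours in a 30-day month) are discarded, so short bursts do not raise the bill.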

6 Net Neutrality

7 Network Operations Operators run the day-to-day operations of the network Adjusting to shifts in traffic, failures, etc. Responding to security threats Provisioning new customers

8 Point-of-Presence (PoP)
A “cluster” of routers in a single physical location
Inter-PoP links
  Long distances
  High bandwidth
Intra-PoP links
  Cables between racks or floors
  Aggregated bandwidth

9 Example: Abilene Network Topology

10 Another Example Backbone

11 Internet Routing Overview
Autonomous Systems (ASes), e.g., Abilene, Comcast, Georgia Tech, AT&T, Cogent
Intradomain (i.e., “intra-AS”) routing
Interdomain routing

12 Internet Routing Protocol: BGP
Autonomous Systems (ASes)
Route advertisement: Destination, Next-hop, AS Path (/16, 174… 2637)
Figure labels: Session, Traffic
Destination-based routing: tables look like a set of possible routes and a ranking over these routes (pop up a simplified table fragment)
Slide notes: Diagram of routing table is very confusing because it’s not pointing to anything. Green arrow should be shorter, and is too thick; green is a message. More intuition about how the system actually works. Don’t say “interdomain”.

13 Two Flavors of BGP
External BGP (eBGP): exchanging routes between ASes
Internal BGP (iBGP): disseminating routes to external destinations among the routers within an AS
Question: What’s the difference between IGP and iBGP?

14 IPv4 Addresses: Networks of Networks
Topological addressing
32-bit number in “dotted-quad” notation: 130.207.7.36
  Network (16 bits): 130.207
  Host (16 bits): 7.36
Problem: 2^32 addresses is a lot of table entries
Solution: routing based on network and host
A /16 is a 16-bit prefix with 2^16 IP addresses
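The network/host split can be checked with Python's standard `ipaddress` module. This is a small sketch; `split_address` is a hypothetical helper name, and the 16-bit split is the one shown on the slide.

```python
import ipaddress


def split_address(dotted_quad, prefix_len):
    """Return (network, host_part) for an IPv4 address and a prefix length."""
    iface = ipaddress.ip_interface(f"{dotted_quad}/{prefix_len}")
    host_bits = 32 - prefix_len
    # Mask off the network bits to leave only the host portion.
    host_part = int(iface.ip) & ((1 << host_bits) - 1)
    return iface.network, host_part


network, host = split_address("130.207.7.36", 16)
print(network)                # network portion as a prefix
print(network.num_addresses)  # 2**16 addresses in a /16
print(host)                   # host portion as an integer (7 * 256 + 36)
```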

15 Pre-1994: Classful Addressing
Class A: leading bit 0, 8-bit network ID, /8 blocks (e.g., MIT has /8)
Class B: leading bits 10, 16-bit network ID, /16 blocks (e.g., Georgia Tech has /16)
Class C: leading bits 110, 24-bit network ID, /24 blocks (e.g., AT&T Labs has /24)
Class D: leading bits 1110, multicast addresses
Class E: leading bits 1111, reserved for experiments
Simple forwarding: address range specifies network ID length
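Because the class is encoded in the leading bits, a router could read the network ID length directly from the address. The following sketch shows that lookup; the function name and the test addresses are illustrative assumptions.

```python
import ipaddress


def address_class(dotted_quad):
    """Classify an IPv4 address by its leading bits, as in classful addressing."""
    first_octet = int(ipaddress.IPv4Address(dotted_quad)) >> 24
    if first_octet & 0b10000000 == 0b00000000:
        return "A"  # leading bit 0, 8-bit network ID
    if first_octet & 0b11000000 == 0b10000000:
        return "B"  # leading bits 10, 16-bit network ID
    if first_octet & 0b11100000 == 0b11000000:
        return "C"  # leading bits 110, 24-bit network ID
    if first_octet & 0b11110000 == 0b11100000:
        return "D"  # leading bits 1110, multicast
    return "E"      # leading bits 1111, reserved
```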

16 Classless Interdomain Routing (CIDR)
Use two 32-bit numbers to represent a network: network number = IP address + mask
Example: BellSouth Prefix: /22 IP Address: “Mask”:
Address no longer specifies network ID range
New forwarding trick: longest prefix match
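Longest prefix match can be illustrated with a linear scan over a small table. This is a sketch for intuition only: real routers use tries or TCAMs, and the prefixes and next-hop labels below are made-up examples, not the BellSouth prefix from the slide.

```python
import ipaddress

# Hypothetical forwarding table: prefix -> next-hop label.
example_table = {
    "0.0.0.0/0": "default",
    "10.0.0.0/8": "A",
    "10.1.0.0/16": "B",
    "10.1.4.0/22": "C",
}


def longest_prefix_match(table, destination):
    """Return (network, next_hop) for the most specific matching prefix, or None."""
    dest = ipaddress.IPv4Address(destination)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best


print(longest_prefix_match(example_table, "10.1.5.9"))
print(ipaddress.ip_network("10.1.4.0/22").netmask)  # the "mask" half of a /22
```

Several prefixes can cover the same destination; the entry with the longest mask wins, which is what lets overlapping blocks of different sizes coexist.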

17 Benefits of CIDR
Efficiency: can allocate blocks of prefixes at a finer granularity
Hierarchy: prefixes can be aggregated into supernets (not always done; typically not, in fact)
Figure labels: Customer 1 /24, Customer 2 /24, AT&T /8, Internet

18 Growth of IP Prefixes

19 1994-1998: Linear Growth
About 10,000 new entries per year
In theory, less instability at the edges (why?)
Source: Geoff Huston

20 Around 2000: Fast Growth Resumes
T. Hain, “A Pragmatic Report on IPv4 Address Space Consumption”, Cisco IPJ, September 2005 Claim: remaining /8s will be exhausted within the next 5-10 years.

21 Fast Growth Resumes
Significant contributor: multihoming
Figure labels: Dot-Bomb Hiccup; Rapid growth in routing tables
Source: Geoff Huston

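22 The Address Allocation Process
IANA allocates address space to the regional Internet registries (AfriNIC, APNIC, ARIN, LACNIC, RIPE), which allocate to networks such as Georgia Tech
Allocation policies of RIRs affect pressure on IPv4 address space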

23 Common Problems
Diagnosis and troubleshooting (hence, measurement)
Traffic engineering
Security
Design and capacity planning

24 What can go wrong?
Some downtime is very hard to protect against…
Two-thirds of the problems are caused by configuration of the routing protocol

25 Measurement and Monitoring

26 Passive vs. Active Measurement
Passive measurement: collection of packets and flow statistics of traffic that is already flowing on the network
  Packet traces
  Flow statistics
  Application-level logs
Active measurement: inject “probing” traffic to measure various characteristics
  Traceroute
  Ping
  Application-level probes (e.g., Web downloads)

27 Billing for Internet Usage
95th percentile billing
  Customer network pays for a “committed information rate” (CIR)
  Throughput is measured every 5 minutes (typically with SNMP; flow statistics can also be used for billing)
  Customer is billed based on the 95th percentile of these measurements

28 Passive Traffic Data Measurement
SNMP byte/packet counts: everywhere
Packet monitoring: selected locations
Flow monitoring: typically at edges (if possible)
  Direct computation of the traffic matrix
  Input to denial-of-service attack detection
Deep packet inspection: also at edge, where possible

29 Simple Network Management Protocol
Management Information Base (MIB)
  Information store
  Unique variables named by OIDs
  Accessed with SNMP
Specific MIBs for byte/packet counts (per link)
Figure labels: SNMP Manager, Agent, Managed Objects, DB

30 SNMP (Passive)
Advantage: ubiquitous
  Supported on all networking equipment
  Multiple products for polling and analyzing data
Disadvantages
  Coarse granularity
  Cannot express complex queries on the data
  Unreliable delivery of the data using UDP
Utility
  Link utilization (billing)
  Traffic matrix inference

31 Packet-level Monitoring
Passive monitoring to collect full packet contents (or at least headers)
Advantages: lots of detailed information
  Precise timing information
  Information in packet headers
Disadvantages: overhead
  Hard to keep up with high-speed links
  Often requires a separate monitoring device

32 Full Packet Capture (Passive)
Example: Georgia Tech OC3Mon Rack-mounted PC Optical splitter Data Acquisition and Generation (DAG) card Source:

33 What is a flow?
Source IP address
Destination IP address
Source port
Destination port
Layer 3 protocol type
TOS byte (DSCP)
Input logical interface (ifIndex)

34 Cisco NetFlow
Basic output: “flow record”
  Most common version is v5
  Current version (9) is being standardized in the IETF (template-based)
    More flexible record format
    Much easier to add new flow record types
Flow records are exported from the core network to a collector (PC) for collection and aggregation
  Export packets are approximately 1500 bytes and carry 20-50 flow records each
  Sent more frequently if traffic increases

35 Flow Record Contents
Basic information about the flow…
  Source and destination IP address and port
  Packet and byte counts
  Start and end times
  ToS, TCP flags
…plus information related to routing
  Next-hop IP address
  Source and destination AS
  Source and destination prefix

36 Aggregating Packets into Flows
Criterion 1: set of packets that “belong together”
  Source/destination IP addresses and port numbers
  Same protocol, ToS bits, …
  Same input/output interfaces at a router (if known)
Criterion 2: packets that are “close” together in time
  Maximum inter-packet spacing (e.g., 15 sec, 30 sec)
  Example: flows 2 and 4 are different flows due to time
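The two criteria can be combined into a short aggregation routine. This is a simplified sketch, not NetFlow's implementation: the `Packet` fields, the 5-tuple key (ToS and interfaces are omitted), and the dictionary-based flow record are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical packet summary; a real monitor would parse these from headers.
Packet = namedtuple("Packet", "time src dst sport dport proto size")


def aggregate_flows(packets, timeout=30.0):
    """Group packets by key (criterion 1) and split on idle gaps > timeout (criterion 2)."""
    active = {}    # flow key -> currently open flow record
    finished = []  # flow records closed by a timeout
    for pkt in sorted(packets, key=lambda p: p.time):
        key = (pkt.src, pkt.dst, pkt.sport, pkt.dport, pkt.proto)
        flow = active.get(key)
        if flow is not None and pkt.time - flow["end"] > timeout:
            finished.append(flow)  # gap too long: close the old record
            flow = None
        if flow is None:
            flow = {"key": key, "start": pkt.time, "end": pkt.time,
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["end"] = pkt.time
        flow["packets"] += 1
        flow["bytes"] += pkt.size
    finished.extend(active.values())
    return sorted(finished, key=lambda f: f["start"])
```

Two packets with identical addresses and ports still land in different flow records when they are separated by more than the timeout, which is the situation the slide's example describes.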

37 Reducing Measurement Overhead
Filtering: on interface
  Destination prefix for a customer
  Port number for an application (e.g., 80 for Web)
Sampling: before insertion into flow cache
  Random, deterministic, or hash-based sampling
  1-out-of-n or stratified based on packet/flow size
  Two types: packet-level and flow-level
Aggregation: after cache eviction
  Packets/flows with same next-hop AS
  Packets/flows destined to a particular service

38 Packet Sampling for Flow Monitoring
Packet sampling before flow creation (Sampled NetFlow)
  1-out-of-m sampling of individual packets (e.g., m=100)
  Creation of flow records over the sampled packets
Reducing overhead
  Avoid per-packet overhead on (m-1)/m packets
  Avoid creating records for a large number of small flows
Increasing overhead (in some cases)
  May split some long transfers into multiple flow records, due to larger time gaps between successive packets
Figure labels: time, not sampled, timeout, two flows
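The splitting effect is easy to reproduce. In the sketch below, deterministic 1-out-of-m sampling is applied to the packet timestamps of a single long transfer, and a simple counter reports how many flow records a given timeout would produce. The function names and the numbers in the usage note are illustrative assumptions.

```python
def sample_one_in_m(items, m):
    """Deterministic 1-out-of-m sampling: keep every m-th item, starting with the first."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return items[::m]


def count_flow_records(timestamps, timeout):
    """Count the records created when a gap longer than timeout starts a new record."""
    records = 0
    last = None
    for t in timestamps:
        if last is None or t - last > timeout:
            records += 1
        last = t
    return records
```

For example, a transfer with one packet per second for 100 seconds forms a single record under a 30-second timeout. After 1-out-of-40 sampling, the surviving packets are 40 seconds apart, so the same transfer is reported as three records.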

39 Sampling: Flow-Level Sampling
Sampling of flow records evicted from flow cache
  When evicting flows from table or when analyzing flows
Stratified sampling to put weight on “heavy” flows
  Select all long flows and sample the short flows
Reduces the number of flow records
  Still measures the vast majority of the traffic
Figure: Flow 1, 40 bytes; Flow 2, bytes; Flow 3, 8196 bytes; Flow 4, bytes; Flow 5, 532 bytes; Flow 6, 7432 bytes; sampling probabilities shown: 0.1%, 100%, 10%
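A size-stratified sampler along these lines might look as follows. The three probabilities come from the slide's figure, but the byte thresholds that separate the strata are hypothetical, and scaling each kept record by 1/p to keep byte totals unbiased is a common estimator that the slide does not state.

```python
import random


def sampling_probability(flow_bytes, small=100, large=5000):
    """Map flow size to a sampling probability; thresholds are assumed, not from the slide."""
    if flow_bytes >= large:
        return 1.0    # keep every heavy flow
    if flow_bytes >= small:
        return 0.1    # sample medium flows
    return 0.001      # rarely keep tiny flows


def sample_flows(flows, rng=None):
    """Return (name, bytes, scaled_bytes) for each sampled flow record."""
    rng = rng or random.Random()
    kept = []
    for name, size in flows:
        p = sampling_probability(size)
        if rng.random() < p:
            # Scale by 1/p so that expected byte totals are preserved (assumed estimator).
            kept.append((name, size, size / p))
    return kept
```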

40 Two Main Approaches
Packet-level monitoring
  Keep packet-level statistics
  Examine (and potentially log) a variety of packet-level statistics: essentially, anything in the packet
  Timing
Flow-level monitoring
  Monitor packet-by-packet (though sometimes sampled)
  Keep aggregate statistics on a flow

41 Packet Capture on High-Speed Links
Example: Georgia Tech “OC3Mon” Rack-mounted PC Optical splitter Data Acquisition and Generation (DAG) card Source:

42 Characteristics of Packet Capture
Allows inspection on every packet on 10G links Disadvantages Costly Requires splitting optical fibers Must be able to filter/store data

43 Data Measurement Repositories
Abilene/Internet2 Observatory
  Configuration examples
  SNMP data
  ISIS, BGP routing data, NetFlow traffic data
RouteViews
  BGP updates
  BGP table snapshots

44 Multihoming and Traffic Engineering

45 What is Multihoming?
The use of redundant network links for the purposes of external connectivity
Can be achieved at many layers of the protocol stack and many places in the network
  Multiple network interfaces in a PC
  An ISP with multiple upstream interfaces
Can refer to having multiple connections to
  The same ISP
  Multiple ISPs

46 Why Multihome?
Redundancy
Availability
Performance
Cost
Interdomain traffic engineering: the process by which a multihomed network configures its network to achieve these goals

47 Redundancy Maintain connectivity in the face of:
Physical connectivity problems (fiber cut, device failures, etc.) Failures in upstream ISP

48 Performance
Use multiple network links at once to achieve higher throughput than over a single link
Allows incoming traffic to be load-balanced (figure: 30% of traffic on one link, 70% on the other)

49 Multihoming in IP Networks Today
Stub AS: no transit service for other ASes
  No need to use BGP
Multi-homed stub AS: has connectivity to multiple immediate upstream ISPs
  Need BGP
  No need for a public AS number
  No need for IP prefix allocation
Multi-homed transit AS: connectivity to multiple ASes and transit service
  Need BGP, public AS number, IP prefix allocation

50 BGP or no?
Advantages of static routing
  Cheaper/smaller routers (less true nowadays)
  Simpler to configure
Advantages of BGP
  More control of your destiny (have providers stop announcing you)
  Faster/more intelligent selection of where to send outbound packets
  Better debugging of net problems (you can see the Internet topology now)

51 Same Provider or Multiple?
If your provider is reliable, fast, and affordable, and offers good tech support, you may want to multi-home initially to them via some backup path (slow is better than dead). Eventually you’ll want to multi-home to different providers, to avoid failure modes due to one provider’s architecture decisions.

52 Multihomed Stub: One Link
Multiple links between same pair of routers
Default routes to “border”
  Downstream (“stub”) ISP’s routers configure default (“static”) routes pointing to border router
Upstream ISP advertises reachability

53 Multihomed Stub: Multiple Links
Multiple links from the “stub” ISP to different upstream routers
BGP for load balance at edge
Internal routing for “hot potato”
Use BGP to share load
Use private AS number (why is this OK?)
As before, upstream ISP advertises prefix

54 Multihomed Stub: Multiple ISPs
Figure: “stub” ISP connected to Upstream ISP 1 and Upstream ISP 2
Many possibilities
  Load sharing
  Primary-backup
  Selective use of different ISPs
Requires BGP, public AS number, etc.

55 Multihomed Transit Network
Figure: transit ISP connected to ISP 1, ISP 2, and ISP 3
BGP everywhere
Incoming and outgoing traffic
Challenge: balancing load on intradomain and egress links, given an offered traffic load

56 Interdomain Traffic Engineering
The process by which a network operator configures the network to achieve
  Traffic load balance
  Redundancy (primary/backup), etc.
Two tasks
  Outbound traffic control
  Inbound traffic control
Key problems: predictability and scalability

57 Outbound Traffic Control
Easier to control than inbound traffic
  Destination-based routing: sender determines where the packets go
Control over next-hop AS only
  Cannot control selection of the entire path
Control with local preference (figure: choosing between Provider 1 and Provider 2)
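Local preference works because it is compared before AS path length when a router ranks candidate routes. The sketch below models only those first two steps of route selection; the `Route` class, the prefix, and the AS numbers (taken from the range reserved for documentation) are illustrative assumptions, and later tie-breakers are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop_as: int
    local_pref: int
    as_path: tuple


def best_route(routes):
    """Prefer highest local preference, then shortest AS path (later tie-breakers omitted)."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


candidates = [
    Route("192.0.2.0/24", 64501, local_pref=100, as_path=(64501, 64510)),
    Route("192.0.2.0/24", 64502, local_pref=200, as_path=(64502, 64520, 64521, 64510)),
]
print(best_route(candidates).next_hop_as)  # Provider 2 wins despite the longer path
```

The operator chooses only the next-hop AS here; the rest of the path is whatever that provider selected.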

58 Outbound Traffic: Load Balancing
Control routes to provider per-prefix
  Assign local preference across destination prefixes
  Change the local preference assignments over time
Useful inputs to load balancing
  End-to-end path performance data
  Outbound traffic statistics per destination prefix
Challenge: getting from traffic volumes to groups of prefixes that should be assigned to each link
  Premise of “intelligent route control” products

59 Traffic Engineering Goals
Predictability
  Ensure the BGP decision process is deterministic
  Assume that BGP updates are (relatively) stable
Limit overhead introduced by routing changes
  Minimize frequency of changes to routing policies
  Limit number of prefixes affected by changes
Limit impact on how traffic enters the network
  Avoid new routes that might change neighbor’s mind
  Select route with same attributes, or at least path length

60 Managing Scale
Destination prefixes
  More than 90,000 destination prefixes
  Don’t want to have per-prefix routing policies
  Small fraction of prefixes contribute most of the traffic
  Focus on the small number of heavy hitters
  Define routing policies for selected prefixes
Routing choices
  About 27,000 unique “routing choices”
  Help in reducing the scale of the problem
  Small fraction of “routing choices” contribute most traffic
  Focus on the very small number of “routing choices”
  Define routing policies on common attributes

61 Achieving Predictability
Route prediction with static analysis
  Helpful to know effects before deployment
  Static analysis can help
Figure labels: BGP policy configuration, Topology, BGP routing model, eBGP routes, Offered traffic, Flow of traffic through the network

62 Challenges to Predictability
For transit ISPs: effects on incoming traffic Lack of coordination strikes again!

63 Inter-AS Negotiation
Coordination aids predictability
  Negotiate where to send
  Inbound and outbound
  Mutual benefits
How to implement?
  What info to exchange?
  Protecting privacy?
  How to prioritize choices?
  How to prevent cheating?
Figure labels: Provider A, Provider B, multiple peering points, “Hot Potato” routing, Destination 1, Destination 2

64 Outbound: Multihoming Goals
Redundancy
  Dynamic routing will fail over to backup link
Performance
  Select provider with best performance per prefix
  Requires active probing
Cost
  Select provider per prefix over time to minimize the total financial cost

65 Inbound Traffic Control
More difficult: no control over neighbors’ decisions
Three common techniques (previously discussed)
  AS path prepending
  Communities and local preference
  Prefix splitting
How does today’s paper (MONET) control inbound traffic?
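AS path prepending, the first of these techniques, can be illustrated with a toy model. The sketch assumes the remote network ranks routes purely by AS path length, which is exactly the assumption that can fail in practice because the neighbor's local preference takes precedence. The function names and AS numbers (from the range reserved for documentation) are hypothetical.

```python
def announce(own_as, extra_prepends=0):
    """AS path announced by the origin: its own AS, repeated for each extra prepend."""
    return (own_as,) * (1 + extra_prepends)


def preferred_link(paths_seen):
    """Link a remote AS would pick if it only compared AS path lengths (an assumption)."""
    return min(paths_seen, key=lambda link: len(paths_seen[link]))


# Paths as seen by a remote AS after each upstream ISP adds its own AS number.
paths_seen = {
    "ISP 1": (64501,) + announce(64500),                    # primary link: no prepending
    "ISP 2": (64502,) + announce(64500, extra_prepends=2),  # backup link: path made longer
}
print(preferred_link(paths_seen))
```

Prepending is only a hint: a remote AS that assigns a higher local preference to routes via ISP 2 will still send traffic that way.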

66 How many links are enough?
K upstream ISPs Not much benefit beyond 4 ISPs Akella et al., “Performance Benefits of Multihoming”, SIGCOMM 2003

67 Problems with Multihoming in IPv4
Routing table growth
  Provider-based addressing
  Advertising prefix out multiple ISPs: can’t aggregate
Poor control over inbound traffic
  Existing mechanisms do not allow hosts to control inbound traffic

68 Internet Routing Overview
Autonomous Systems (ASes), e.g., Abilene, Comcast, Georgia Tech, AT&T, Cogent
Intradomain (i.e., “intra-AS”) routing
Interdomain routing

69 Configuration Problems: “AS 7007”
“…a glitch at a small ISP… triggered a major outage in Internet access across the country. The problem started when MAI Network Services...passed bad router information from one of its customers onto Sprint.”, April 25, 1997 UUNet Sprint Florida Internet Barn

70 Diagnosis and Troubleshooting
“…a glitch at a small ISP… triggered a major outage in Internet access across the country. The problem started when MAI Network Services...passed bad router information from one of its customers onto Sprint.”, April 25, 1997
“Microsoft's websites were offline for up to 23 hours...because of a [router] misconfiguration…it took nearly a day to determine what was wrong and undo the changes.”, January 25, 2001
“WorldCom Inc…suffered a widespread outage on its Internet backbone that affected roughly 20 percent of its U.S. customer base. The network problems…affected millions of computer users worldwide. A spokeswoman attributed the outage to "a route table issue."”, October 3, 2002
“A number of Covad customers went out from 5pm today due to, supposedly, a DDOS (distributed denial of service attack) on a key Level3 data center, which later was described as a route leak (misconfiguration).”, February 23, 2004

71 Operator Mailing List (NANOG)
Date: Mon, 18 Oct :15:
Subject: Level 3 US east coast "issues"
Level 3 experiencing widespread "unspecified routing issues" on the US east coast. Master ticket Anyone have more specific information?
Date: Mon, 18 Oct :20: (EDT)
Subject: Re: Level 3 US east coast "issues"
Level 3 is currently experiencing a backbone outage causing routing instability and packet loss. We are working to restore and will be sending hourly updates…

72 Operator Mailing List
Compare: 83 power outages, 1 fire
“Two rats crawled through an underground cable conduit into a cabinet of power switching gear adjacent to the Stanford University cogeneration plant, and caused an explosion that cut off power to the Stanford area.” (October 12, 1996)
Note: Only includes problems openly discussed on this list.
Slide notes: XXX need to regenerate bar graphs to give equal weighting to years (i.e., do it in three-year chunks, starting 96-98, 99-01, 02-04?). I think that way the comparisons are easier to make. How fatal were these errors?

73 Routing Configuration
Filtering: route advertisement
Ranking: route selection
Dissemination: internal route advertisement
Figure labels: Customer, Competitor, Primary, Backup
Slide notes: XXX What problem does factoring solve? Need to tie into the challenge this approach solves (“dealing with complexity”). This slide seems like an orphan slide. Can’t see the transition both to and from. The next slide is on vis, which flows well from the prev slide!

74 Internet Business Model (Simplified)
Customer/Provider: one AS pays another for reachability to some set of destinations
“Settlement-free” peering: bartering. Two ASes exchange routes with one another.
Routes to a destination: via a provider (pay to use), via a peer (free to use), via a customer (get paid to use)
Preferences implemented with local preference manipulation

75 Peering Contracts: Consistent Export
Rules of settlement-free peering:
  Advertise routes at all peering points
  Advertised routes must have equal “AS path length”
Enables “hot potato” routing
Figure labels: Sprint, AT&T, “equally good” routes

76 Consistent Export
Possible causes
  Malice/deception
  iBGP signaling partition
  Inconsistent export policy
Figure labels: Neighbor AS, Export, Export Clause, Prepend
Two different export policies:
  neighbor route-map PEER permit 10 set prepend
  neighbor route-map PEER permit 10 set prepend 123

77 Inconsistent Export in Practice
Feamster et al., “BorderGuard: Detecting Cold Potatoes from Peers”. ACM IMC, October 2004.

78 Blackholes
Date: Thu, 18 Jul 2002 06:05:10 -0400 (EDT)
From: Chad Oleary
Subject: Re: problems with 701
To:
We're starting to see the same issues with UUNet, again. Anyone else seeing this? Trying to reach Qwest...
traceroute to ( ), 30 hops max, 38 byte packets 1 ( ) ms ms ms Serial2-10.GW1.TPA2.ALTER.NET ( ) ms ms ms at XL4.ATL1.ALTER.NET ( ) ms ms ms 4 XL2.ATL5.ALTER.NET ( ) ms ms ms 5 POS7-0.BR2.ATL5.ALTER.NET ( ) ms ms ms 6 * * * 7 * * *

79 Security

80 Security: “Bogon” Routes
Feamster et al., “An Empirical Study of ‘Bogon’ Route Advertisements”. ACM CCR, January 2005.

81 Spam, Phishing, etc.
Unsolicited commercial email
As of about August 2008, estimates indicate that about 95% of all email is spam
Common spam filtering techniques
  Content-based filters
  DNS Blacklist (DNSBL) lookups: significant fraction of today’s DNS traffic!
Can IP addresses from which spam is received be spoofed?

82 Spam and Routing

83 Worms and Botnets

84 What is a Worm?
Code that replicates and propagates across the network
  Often carries a “payload”
Usually spread via exploiting flaws in open services
  “Viruses” require user action to spread
First worm: Robert Morris, November 1988
  6-10% of all Internet hosts infected (!)
Many more since, but none on that scale until July 2001

85 Example Worm: Code Red
Initial version: July 13, 2001
Exploited known ISAPI vulnerability in Microsoft IIS Web servers
1st through 20th of each month: spread
20th through end of each month: attack
Payload: Web site defacement
Scanning: random IP addresses
Bug: failure to seed random number generator

86 Code Red: Revisions
Released July 19, 2001
Payload: flooding attack on
  Attack was mounted at the IP address of the Web site
Bug: died after 20th of each month
Random number generator for IP scanning fixed

87 Code Red: Host Infection Rate
Measured using backscatter technique Exponential infection rate

88 Designing Fast-Spreading Worms
Hit-list scanning
  Time to infect first 10k hosts dominates infection time
  Solution: reconnaissance (stealthy scans, etc.)
Permutation scanning
  Observation: most scanning is redundant
  Idea: shared permutation of address space. Start scanning from own IP address. Re-randomize when another infected machine is found.
Internet-scale hit lists
  Flash worm: complete infection within 30 seconds

89 Botnets
Bots: autonomous programs performing tasks
  Plenty of “benign” bots, e.g., weatherbug
Botnets: group of bots
  Typically carries malicious connotation
  Large numbers of infected machines
  Machines “enlisted” with infection vectors like worms (last lecture)
Available for simultaneous control by a master
Size: up to 350,000 nodes (from today’s paper)

90 “Rallying” the Botnet
Easy to combine worm, backdoor functionality
Problem: how to learn about successfully infected machines?
Options
  Hard-coded address

91 Dynamic DNS
Botnet master typically runs some IRC server on a well-known port (e.g., 6667)
Infected machine contacts botnet with pre-programmed DNS name (e.g., )
Dynamic DNS: allows controller to move about freely
Figure labels: Botnet Controller (IRC server), Infected Machine

92 Some Defenses

93 Idea #1: Ingress Filtering
Drop all packets with source address other than /24
RFC 2827: routers install filters to drop packets from networks that are not downstream
Feasible at edges
Difficult to configure closer to network “core”
Figure labels: Internet, /24
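The filter itself is a membership test of the source address against the prefixes known to be downstream. A minimal sketch follows; the function name is hypothetical, and 192.0.2.0/24 is a documentation prefix standing in for the customer's /24, which the transcript does not give.

```python
import ipaddress


def ingress_permits(source_ip, downstream_prefixes):
    """Permit a packet only if its source address lies in a downstream prefix."""
    src = ipaddress.ip_address(source_ip)
    return any(src in ipaddress.ip_network(prefix) for prefix in downstream_prefixes)


customer = ["192.0.2.0/24"]  # placeholder for the customer's prefix
print(ingress_permits("192.0.2.10", customer))    # legitimate source
print(ingress_permits("198.51.100.7", customer))  # spoofed source, dropped
```

At the edge the list of downstream prefixes is short and stable, which is why the slide calls the approach feasible there and difficult closer to the core.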

94 Idea #2: uRPF Checks
Unicast Reverse Path Forwarding (uRPF)
Strict mode: accept packet from interface only if forwarding table entry for source IP address matches ingress interface
Cisco: “ip verify unicast reverse-path”
Requires symmetric routing
Figure: router “A” with strict-mode uRPF enabled; routing table with columns Destination and Next Hop and entries / Int. 1 and / Int. 2; a packet arrives from the wrong interface
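The strict-mode check amounts to a forwarding-table lookup on the source address instead of the destination. The sketch below is a simplified model, not router code: the table layout, interface names, and documentation prefixes are assumptions standing in for the entries elided from the slide.

```python
import ipaddress


def urpf_strict_accept(fib, source_ip, ingress_interface):
    """Accept only if the best route back to the source points out the ingress interface."""
    src = ipaddress.ip_address(source_ip)
    best = None
    for prefix, interface in fib.items():
        net = ipaddress.ip_network(prefix)
        if src in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, interface)
    return best is not None and best[1] == ingress_interface


# Hypothetical forwarding table: destination prefix -> outgoing interface.
fib = {"192.0.2.0/24": "Int. 1", "198.51.100.0/24": "Int. 2"}
print(urpf_strict_accept(fib, "192.0.2.5", "Int. 1"))  # arrives on the expected interface
print(urpf_strict_accept(fib, "192.0.2.5", "Int. 2"))  # arrives on the wrong interface
```

The second call also shows why symmetric routing is required: a legitimate packet that arrives on a different interface than the one used to reach its source is rejected as well.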

95 Problems with uRPF Asymmetric routing

96 S-BGP
Address-based PKI: validate signatures
  Authentication of ownership for IP address blocks, AS number, an AS's identity, and a BGP router's identity
  Use existing infrastructure (Internet registries etc.)
  Routing origination is digitally signed
  BGP updates are digitally signed
Route attestations: a new, optional, BGP transitive path attribute carries digital signatures covering the routing information in updates

97 Practical Problems with S-BGP
Requires public-key infrastructure
Lots of digital signatures to calculate and verify
  Message overhead
  CPU overhead
  Calculation expense is greatest when topology is changing
  Caching can help
Route aggregation is problematic (maybe that’s OK)
Secure route withdrawals when link or node fails?
Address ownership data out of date
Deployment
