Lecture 13 – Network Mapping

Lecture 13 – Network Mapping
15-829A/18-849B/95-811A/19-729A Internet-Scale Sensor Systems: Design and Policy Lecture 13 – Network Mapping

Why Predict Network Performance?
In any large distributed system must assign responsibility to nodes Want to maximize performance, availability, etc. How to choose? Communication patterns are often localized Need to find nodes that are near each other Need geographic & network mapping Lecture #13

Difficult to find nearby nodes quickly and efficiently
Network Mapping Difficult to find nearby nodes quickly and efficiently Huge number of paths to measure TCP bandwidth and RTT probes are time-consuming No clean mapping of IP address  location (geographic or network topology) Lecture #13

Outline IP2Geo IDMaps SPAND GNP Lecture #13

IP2Geo Techniques GeoTrack GeoPing GeoCluster DNS-based Latency-based
BGP-based Lecture #13

GeoTrack Traceroute to host DNS names for routers near host
Typically city, airport, country names embedded in names Need ISP-specific heuristics to reduce false positives Codes used and location in name Impact of proxies Always returns location of proxy Client may be anywhere Lecture #13

GeoTrack Performance FooTV CDF looks identical to proxy location PDF
Most FooTV customers use a proxy Median Error: 590km University hosts work well No proxy, always on, traceroute always completes Median error: 102km Lecture #13

GeoPing Can latency be converted to distance?
Distance to different locations grouped by latency While good at short latencies  error grows for longer latencies Can’t break speed of light, but can get slow route to nearby places Lecture #13

GeoPing Performance Match host to other nearby nodes
Delay map: vector of delays to known n landmarks Matching: most similar delay map Euclidean distance in n dimensional space Use matched host’s location Lecture #13

GeoCluster Classify all hosts in an address cluster as same location
BGP address prefixes as starting point Address allocation in CIDR blocks Basic granularity of announcements BGP announcements may be subdivided to reflect different attributes (e.g. policy) 100,000 address prefixes announced  from 12,000 ASes Address prefixes may still be large entities May span large geographic areas Lecture #13

GeoCluster Sub-clustering
When prefix is large  look at consensus on location If consensus is weak  split prefix in half Repeat if needed until too few samples Proxies result in lack of consensus and therefore no location report Lecture #13

GeoCluster Performance
Without sub-clustering University data set Median error: 28km No report for 12% of hosts bCentral data set Median error: 685km Much larger ISPs cause problems 23% with no answer Lecture #13

GeoCluster Performance
bCentral poor performance due to large size of prefixes Can subclustering help?  yes Note that even /24 clusters don’t work well  surprising given the address allocation of Internet Lecture #13

Network Distance Round-trip propagation and transmission delay
Reflects Internet topology and routing A good first order performance optimization metric Helps achieve low communication delay A reasonable indicator of TCP throughput Can weed out most bad choices But the O(N2) network distances are also hard to determine efficiently in Internet-scale systems Lecture #13

Active Measurements Network distance can be measured with ping-pong messages But active measurement does not scale Lecture #13

Scaling Alternatives Lecture #13

Sharing Measurements IDMaps [Francis et al ’99]
Server A/B 50ms Probe A B Lecture #13

Assumptions Probe nodes approximate direct path
May require large number Careful placement may help Requires that distance between end-points is approximated by sum Triangle inequality must hold (i.e., (a,c) > (a,b) + (b,c) Lecture #13

Triangle Inequality in the Internet
Lecture #13

SPAND Design Choices Measurements are shared Measurements are passive
Hosts share performance information by placing it in a per-domain repository Measurements are passive Application-to-application traffic is used to measure network performance Measurements are application-specific When possible, measure application response time, not bandwidth, latency, hop count, etc. 3 basic design choices that SPAND uses Lecture #13

SPAND Architecture Internet Client Packet Capture Host Performance
Data Performance Server Perf. Reports Perf Query/ Response Client Lecture #13

SPAND Assumptions Geographic Stability: Performance observed by nearby clients is similar  works within a domain Amount of Sharing: Multiple clients within domain access same destinations within reasonable time period  strong locality exists Temporal Stability: Recent measurements are indicative of future performance  true for 10’s of minutes Lecture #13

Prediction Accuracy Packet capture trace of IBM Watson traffic
Cumulative Probability Ratio of Predicted to Actual Throughput Packet capture trace of IBM Watson traffic Compare predictions to actual throughputs Lecture #13

First Key Insight With millions of hosts, “What are the O(N2) network distances?” may be the wrong question Instead, could we ask: “Where are the hosts in the Internet?” What does it mean to ask “Where are the hosts in the Internet?” Do we need a complete topology map? Can we build an extremely simple geometric model of the Internet? Lecture #13

New Fundamental Concept: “Internet Position”
Using GNP, every host can have an “Internet position” O(N) positions, as opposed to O(N2) distances Accurate network distance estimates can be rapidly computed from “Internet positions” (x2,y2,z2) y “Internet position” is a local property that can be determined before applications need it Can be an interface for independent systems to interact (x1,y1,z1) x z (x4,y4,z4) (x3,y3,z3) Lecture #13

Vision: Internet Positioning Service
(2,0) (6,0) (1,3) (2,4) (5,4) (7,3) Enable every host to independently determine its Internet position Internet position should be as fundamental as IP address “Where” as well as “Who” Lecture #13

Global Network Positioning (GNP) Coordinates
Model the Internet as a geometric space (e.g. 3-D Euclidean) Characterize the position of any end host with geometric coordinates Use geometric distances to predict network distances (x2,y2,z2) y (x1,y1,z1) x z (x4,y4,z4) (x3,y3,z3) Lecture #13

Landmark Operations (Basic Design)
y x (x2,y2) (x1,y1) (x3,y3) L1 L2 L3 L1 L2 L3 Measure inter-Landmark distances Use minimum of several round-trip time (RTT) samples Internet Compute coordinates by minimizing the discrepancy between measured distances and geometric distances Cast as a generic multi-dimensional minimization problem, solved by a central node Lecture #13

Ordinary Host Operations (Basic Design)
(x2,y2) (x1,y1) (x3,y3) L2 L1 L3 y (x4,y4) L1 L3 x L2 Internet Each host measures its distances to all the Landmarks Compute coordinates by minimizing the discrepancy between measured distances and geometric distances Cast as a generic multi-dimensional minimization problem, solved by each host Lecture #13

Overall Accuracy 0.1 0.28 Lecture #13

Why the Difference? IDMaps overpredicts IDMaps
GNP (1-dimensional model) IDMaps overpredicts Lecture #13

Next Lecture Lecturer: Brad Karp Topic: distributed systems Readings
Fault tolerance Replication Logging/monitoring Readings Ivy Byzantine Fault Tolerance Lecture #13

Lecture 13 – Network Mapping

Similar presentations

Presentation on theme: "Lecture 13 – Network Mapping"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Lecture 13 – Network Mapping

Similar presentations

Presentation on theme: "Lecture 13 – Network Mapping"— Presentation transcript:

Similar presentations

About project

Feedback