Internet Measurement (and some inference & modeling)

1 Internet Measurement (and some inference & modeling)
Shivkumar (“Shiv”) Kalyanaraman Rensselaer Polytechnic Institute GOOGLE: “Shiv RPI”

2 Topics
- Measurement philosophy: why, what, when, where, how?
- Some measurement projects & results
- Techniques: passive & active
  - Packet tracing
  - SNMP
  - Probing
- Inference and modeling
  - Tomography & traffic matrix estimation for network engineering
  - Traffic modeling
  - Rocketfuel: inferring topologies from outside ISP networks

3 Why Measurement?
- We built it, we depend on it, so we must try to understand it … as it works in reality.
- Measurement gives us the data and the basis for this understanding.
- Modeling, inference, etc. extract new understanding and learning from the data.
- Complex interactions between protocols were not well modeled during their design.
- Troubleshooting and network management need support.
- Wide-area behavior is unpredictable.
- Change is normal.

4 Characteristics of the Internet
The Internet is
- Decentralized (loose confederation of peers)
- Self-configuring (no global registry of topology)
- Stateless (limited information in the routers)
- Connectionless (no fixed connection between hosts)
These attributes contribute
- To the success of the Internet
- To the rapid growth of the Internet
- … and to the difficulty of controlling the Internet!
[Figure: sender and receiver connected through an ISP]

5 Internet Measurement Challenges
- Size of the Internet
  - O(100M) hosts, O(1M) routers, O(10K) networks
- Complexity of the Internet
  - Components, protocols, applications, users
- Constant change is the norm
  - Web, e-commerce, peer-to-peer, wireless, next?
- The Internet was not developed with measurement as a fundamental feature
- Nearly every network operator would like to keep most data on their network private
Floyd and Paxson, “Difficulties in Simulating the Internet”, IEEE/ACM Transactions on Networking, 2000.

6 Themes
- Measurement has been the basis for critical improvements
  - Without measurement, what do you know?
- Measurement capability in the Internet is limited
  - The systems were not designed to support measurement
  - Measurement tools and infrastructures are few and limited
- Size, diversity, complexity, and change
  - Measurement data presents many challenges
- Networking researchers need better connections with experts in other domains

7 Operator Philosophy: Tension With IP
- Accountability of network resources
  - But, routers don’t maintain state about transfers
  - But, measurement isn’t part of the infrastructure
- Reliability/predictability of services
  - But, IP doesn’t provide performance guarantees
  - But, equipment is not especially reliable (no “five-9s”)
- Fine-grain control over the network
  - But, routers don’t do fine-grain resource allocation
  - But, network automatically re-routes after failures
- End-to-end control over communication
  - But, end hosts and applications adapt to congestion
  - But, traffic may traverse multiple domains of control

8 Network Operations: Measure, Model, and Control
[Figure: control loop. Measurements of the operational network (topology/configuration, offered traffic) feed a network-wide “what-if” model; the model drives control, i.e., changes to the network.]

9 “Operations” Research: Detect, Diagnose, and Fix
- Detect: note the symptoms of a problem
  - Periodic polling of link load statistics
  - Active probes measuring performance
  - Customer complaining (via the phone network?)
- Diagnose: identify the illness
  - Change in user behavior?
  - Router/link failure or policy change?
  - Denial of service attack?
- Fix: select and dispense the medicine
  - Routing protocol reconfiguration
  - Installation of packet filters
- Network measurement plays a key role in each step!

11 Traffic Measurement: Control vs. Discovery
- Discovery: characterizing the network
  - End-to-end characteristics of delay, throughput, and loss
  - Verification of models of TCP congestion control
  - Workload models capturing the behavior of Web users
  - Understanding self-similarity/multi-fractal traffic
- Control: managing the network
  - Generating reports for customers and internal groups
  - Diagnosing performance and reliability problems
  - Tuning the configuration of the network to the traffic
  - Planning outlay of equipment (routers, proxies, links)

43 Measurement Techniques

44 Time Scales for Network Operations
- Minutes to hours
  - Denial-of-service attacks
  - Router and link failures
  - Serious congestion
- Hours to weeks
  - Time-of-day or day-of-week engineering
  - Outlay of new routers and links
  - Addition/deletion of customers or peers
- Weeks to years
  - Planning of new capacity and topology changes
  - Evaluation of network designs and routing protocols

45 Traffic Measurement: SNMP Data
- Simple Network Management Protocol (SNMP)
  - Router CPU utilization, link utilization, link loss, …
  - Collected from every router/link every few minutes
- Applications
  - Detecting overloaded links and sudden traffic shifts
  - Inferring the domain-wide traffic matrix
- Advantage
  - Open standard, available for every router and link
- Disadvantage
  - Coarse granularity, both spatially and temporally
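As a rough illustration of how link utilization is typically derived from polled SNMP octet counters, the sketch below turns two successive counter readings into an average utilization. It assumes a 32-bit counter that wraps at 2^32 and at most one wrap per polling interval; the function names and sample readings are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # assumption: a 32-bit counter that wraps at 2^32


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Octets counted between two polls, allowing for a single counter wrap."""
    return (current - previous) % modulus


def link_utilization(prev_octets, curr_octets, interval_s, link_bps):
    """Average utilization (0..1) of a link over one polling interval."""
    bits = 8 * counter_delta(prev_octets, curr_octets)
    return bits / (interval_s * link_bps)


# Hypothetical readings taken 300 s apart on a 100 Mbit/s link;
# the counter wrapped between the two polls.
util = link_utilization(4_294_000_000, 1_500_000_000, 300, 100_000_000)
print(f"average utilization: {util:.1%}")
```

Because the result is an average over the whole polling interval, short bursts inside the interval are invisible, which is the coarse temporal granularity the slide lists as a disadvantage.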

47 Traffic Measurement: Packet-Level Traces
- Packet monitoring
  - IP, TCP/UDP, and application-level headers
  - Collected by tapping individual links in the network
- Applications
  - Fine-grain timing of the packets on the link
  - Fine-grain view of packet header fields
- Advantages
  - Most detailed view possible at the IP level
- Disadvantages
  - Expensive to have in more than a few locations
  - Challenging to collect on very high-speed links
  - Extremely high volume of measurement data

49 Extracting Data from IP Packets
[Figure: an application message (e.g., HTTP response) carried in several TCP segments]
- Many layers of information
  - IP: source/dest IP addresses, protocol (TCP/UDP), …
  - TCP/UDP: src/dest port numbers, seq/ack, flags, …
  - Application: URL, user keystrokes, BGP updates, …
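To make the layering concrete, here is a minimal sketch that pulls the IP-level and TCP/UDP-level fields named above out of a raw IPv4 packet using only the standard library. It assumes an unfragmented IPv4 packet that starts at the IP header; `parse_ipv4_packet` and the sample packet are illustrative, not part of any monitoring tool.

```python
import socket
import struct


def parse_ipv4_packet(packet: bytes) -> dict:
    """Extract IP-level and TCP/UDP-level header fields from a raw IPv4 packet."""
    (ver_ihl, tos, total_len, _ident, _frag, _ttl, proto, _cksum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (ver_ihl & 0x0F) * 4  # IHL counts 32-bit words
    fields = {
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
        "protocol": proto,  # 6 = TCP, 17 = UDP
        "tos": tos,
        "total_length": total_len,
    }
    if proto in (6, 17):  # both TCP and UDP headers start with the two ports
        sport, dport = struct.unpack("!HH", packet[header_len:header_len + 4])
        fields.update(src_port=sport, dst_port=dport)
    if proto == 6:  # TCP: sequence number, ack number, offset/flags word
        seq, ack, off_flags = struct.unpack(
            "!IIH", packet[header_len + 4:header_len + 14])
        fields.update(seq=seq, ack=ack, tcp_flags=off_flags & 0x01FF)
    return fields


# Hypothetical TCP SYN from 10.0.0.1:12345 to 192.0.2.7:80 (checksums left at 0).
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                        socket.inet_aton("10.0.0.1"),
                        socket.inet_aton("192.0.2.7"))
tcp_header = struct.pack("!HHIIHHHH", 12345, 80, 1000, 0,
                         (5 << 12) | 0x02, 65535, 0, 0)
fields = parse_ipv4_packet(ip_header + tcp_header)
print(fields)
```

Application-level fields (URLs, BGP updates) would need payload reassembly across segments, which this sketch deliberately leaves out.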

50 Aggregating Packets into Flows
- Set of packets that “belong together”
  - Source/destination IP addresses and port numbers
  - Same protocol, ToS bits, …
  - Same input/output interfaces at a router (if known)
- Packets that are “close” together in time
  - Maximum inter-packet spacing (e.g., 15 sec, 30 sec)
  - Example: flows 2 and 4 are different flows due to time
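The grouping rule above can be sketched as follows: packets sharing a key (here the 5-tuple of addresses, ports, and protocol) belong to one flow unless the gap since the previous packet exceeds a timeout. The `Packet` record, the 30-second default, and the sample trace are assumptions made for illustration.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "time src dst sport dport proto size")


def aggregate_flows(packets, timeout=30.0):
    """Group packets into flows by 5-tuple, splitting on idle gaps > timeout."""
    active = {}    # key -> flow record currently open for that key
    finished = []  # flows closed because of an idle gap
    for p in sorted(packets, key=lambda pkt: pkt.time):
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flow = active.get(key)
        if flow is not None and p.time - flow["end"] > timeout:
            finished.append(flow)  # gap too long: close the old flow
            flow = None
        if flow is None:
            flow = {"key": key, "start": p.time, "end": p.time,
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["end"] = p.time
        flow["packets"] += 1
        flow["bytes"] += p.size
    finished.extend(active.values())
    return sorted(finished, key=lambda f: f["start"])


# Hypothetical trace: the last packet has the same 5-tuple as the first three
# but arrives 50 s later, so it starts a new flow.
trace = [
    Packet(0.0, "10.0.0.1", "192.0.2.7", 12345, 80, 6, 60),
    Packet(2.0, "10.0.0.2", "192.0.2.7", 23456, 53, 17, 80),
    Packet(5.0, "10.0.0.1", "192.0.2.7", 12345, 80, 6, 1500),
    Packet(10.0, "10.0.0.1", "192.0.2.7", 12345, 80, 6, 1500),
    Packet(60.0, "10.0.0.1", "192.0.2.7", 12345, 80, 6, 60),
]
flows = aggregate_flows(trace)
for f in flows:
    print(f["key"], f["start"], f["end"], f["packets"], f["bytes"])
```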

71 Summary: Traffic Measurement: Flow-Level Traces
- Flow monitoring (e.g., Cisco Netflow)
  - Measurements at the level of sets of related packets
  - Single list of shared attributes (addresses, port #s, …)
  - Number of bytes and packets, start and finish times
- Applications
  - Computing application mix and detecting DoS attacks
  - Measuring the traffic matrix for the network
- Advantages
  - Medium-grain traffic view, supported on some routers
- Disadvantages
  - Not uniformly supported across router products
  - Large data volume, and may slow down some routers
  - Memory overhead (size of flow cache) grows with link speed

72 Summary: Reducing Packet/Flow Measurement Overhead
- Filtering: select a subset of the traffic
  - E.g., destination prefix for a customer
  - E.g., port number for an application (e.g., 80 for Web)
- Aggregation: grouping related traffic
  - E.g., packets/flows with same next-hop AS
  - E.g., packets/flows destined to a particular service
- Sampling: subselecting the traffic
  - Random, deterministic, or hash-based sampling
  - 1-out-of-n or stratified based on packet/flow size
- Combining filtering, aggregation, and sampling
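The three 1-out-of-n sampling styles mentioned above differ mainly in how the keep/drop decision is made. The sketch below contrasts them on a hypothetical list of packet records; the record layout, the choice of CRC32 as the hash, and the helper names are all assumptions for illustration.

```python
import random
import zlib


def sample_deterministic(packets, n):
    """Keep every n-th packet (count-based 1-out-of-n)."""
    return [p for i, p in enumerate(packets) if i % n == 0]


def sample_random(packets, n, rng):
    """Keep each packet independently with probability 1/n."""
    return [p for p in packets if rng.random() < 1.0 / n]


def sample_hash(packets, n, key):
    """Keep a packet when a hash of selected fields lands in 1 of n buckets."""
    return [p for p in packets if zlib.crc32(key(p)) % n == 0]


# Hypothetical packet records: (source address, destination port, packet id).
packets = [("10.0.0.%d" % (i % 7), 80 if i % 2 == 0 else 53, i)
           for i in range(1000)]

# Filtering first (Web traffic only), then sampling the filtered subset.
web = [p for p in packets if p[1] == 80]
web_sample = sample_deterministic(web, 10)

rand_sample = sample_random(packets, 10, random.Random(1))
hash_key = lambda p: ("%s:%d" % (p[0], p[2])).encode()
hash_sample = sample_hash(packets, 10, hash_key)
print(len(web_sample), len(rand_sample), len(hash_sample))
```

One reason to prefer the hash-based variant is that the decision depends only on packet content, so two monitors applying the same hash to the same fields select the same packets.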

73 Summary: Comparison of Techniques
                  Filtering                          Aggregation                    Sampling
Precision         exact                              exact                          approximate
Generality        constrained a-priori               constrained a-priori           general
Local processing  filter criterion for every object  table update for every object  only sampling decision
Local memory      none                               one bin per value of interest  none
Compression       depends on data                    depends on data                controlled

79 Inference and Modeling…

80 DATA-DRIVEN…

82 Eg: The Network Design Problem
[Figure, two panels: communication demands (200, 65, 258, 134, 30, 42) among Düsseldorf, Frankfurt, Berlin, Hamburg, and München; potential topology & capacities for the same five cities]

85 Traffic Modeling …

96 Mandelbrot’s Construction
- Renewal reward processes and their aggregates
  - Aggregate is made up of many constituents
  - Each constituent is of the on/off type
  - On/off periods have a “duration”
  - Constituents make contributions (“rewards”) when “on”
  - Constituents make no contributions when “off”
- What can be said about the aggregate?
  - In terms of assumed type of “randomness” for durations and rewards
  - In terms of implied type of “burstiness”
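A small simulation can make the construction tangible. The sketch below superposes on/off constituents whose period durations are drawn from a Pareto distribution; with a shape parameter between 1 and 2 the durations have finite mean but infinite variance. The unit reward per "on" slot, the slot-based time axis, and all names and parameter values are assumptions chosen for illustration.

```python
import random


def on_off_source(n_slots, alpha, rng):
    """0/1 activity of one constituent; on and off durations are Pareto(alpha)."""
    activity = []
    on = rng.random() < 0.5  # random initial state
    while len(activity) < n_slots:
        # paretovariate() returns values >= 1; cap to keep memory bounded
        duration = min(int(rng.paretovariate(alpha)), n_slots)
        activity.extend([1 if on else 0] * duration)
        on = not on
    return activity[:n_slots]


def aggregate(n_sources, n_slots, alpha, seed=0):
    """Per-slot sum of rewards over all constituents (reward 1 when on)."""
    rng = random.Random(seed)
    sources = [on_off_source(n_slots, alpha, rng) for _ in range(n_sources)]
    return [sum(column) for column in zip(*sources)]


# 20 constituents, 1000 time slots, shape 1.5 (infinite-variance durations).
load = aggregate(20, 1000, 1.5)
print(min(load), max(load), sum(load) / len(load))
```

Rerunning with a larger shape parameter (for example 3.0, which gives finite variance) and comparing how the aggregate looks when averaged over longer windows is one informal way to explore the question the slide poses.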

97 Mandelbrot’s Types of “Randomness”
- Distribution functions/random variables
  - “Mild” → finite variance (Gaussian)
  - “Wild” → infinite variance
- Correlation function of stochastic process
  - None → “IID” (independent, identically distributed)
  - “Mild” → short-range dependence (SRD, Markovian)
  - “Wild” → long-range dependence (LRD)

98 Mandelbrot’s Types of “Burstiness”
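                              Correlation structure: Mild   Correlation structure: Wild
Distribution function: Wild   Bursty                        BURSTY
Distribution function: Mild   smooth                        bursty
- Tail-driven burstiness (“Noah effect”): along the distribution function axis
- Dependence-driven burstiness (“Joseph effect”): along the correlation structure axis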

99 Type of Burstiness: “Smooth”
[Figure, two panels, both on log-linear scales. CCDF 1-F(x): 1-F(x) on log scale vs. x on linear scale. Correlation function r(n): r(n) on log scale vs. lag n on linear scale.]

100 Type of Burstiness: “bursty”
[Figure, two panels. CCDF 1-F(x) on log-linear scale: 1-F(x) on log scale vs. x on linear scale. Correlation function r(n) on log-log scale: r(n) on log scale vs. lag n on log scale.]

101 Type of Burstiness: “Bursty”
[Figure, two panels. CCDF 1-F(x) on log-log scale: 1-F(x) on log scale vs. x on log scale. Correlation function r(n) on log-linear scale (marked “?” on the slide): r(n) on log scale vs. lag n on linear scale.]

102 Type of Burstiness: “BURSTY”
[Figure, two panels, both on log-log scales (both marked “?” on the slide). CCDF 1-F(x): 1-F(x) on log scale vs. x on log scale. Correlation function r(n): r(n) on log scale vs. lag n on log scale.]

103 Mandelbrot’s Types of “Burstiness”
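                              Correlation structure: Mild   Correlation structure: Wild
Distribution function: Wild   Bursty                        BURSTY
Distribution function: Mild   smooth                        bursty
- Tail-driven burstiness (“Noah effect”): along the distribution function axis
- Dependence-driven burstiness (“Joseph effect”): along the correlation structure axis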

104 Inference For Network Engineering: Traffic Matrix Estimation…

105 Network Engineering: Inference
- Reliability analysis
  - Predicting traffic under planned or unexpected router/link failures
- Traffic engineering
  - Optimizing OSPF weights to minimize congestion
- Capacity planning
  - Forecasting future capacity requirements
[Figure annotation: routes change under failures]
Speaker notes: Network engineering? Many people think the network is just a collection of dumb pipes. So do we really need to engineer a collection of dumb pipes? The answer is, of course, yes. It takes quite a bit of intelligence to run a large IP network. Here are some common network engineering tasks …

106 Traffic Matrix Problem

118 i.e. # Unknowns > # Equations

119 Naïve Approach
- Naïve approach: singular value decomposition …
- In real networks the problem is highly under-constrained

120 Simple Gravity Model
- Motivated by Newton’s Law of Gravitation
- Assume traffic between sites is proportional to traffic at each site
  $y_1 \propto x_1 x_2$, $y_2 \propto x_2 x_3$, $y_3 \propto x_1 x_3$
- Assume there is no systematic difference between traffic in different locations
  - Only the total volume matters
  - Could include a distance term, but locality of information is not so important in the Internet as in other networks
Speaker notes: An alternative approach is gravity modeling. It is motivated by Newton’s law of gravitation and has been commonly used in fields social engineering …
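One common way to turn the proportionality above into numbers is to normalize by the total traffic, so that the estimate for a source-destination pair is (traffic leaving the source) × (traffic entering the destination) / (total traffic). The sketch below assumes that normalization and uses made-up per-site volumes; `simple_gravity` and the site names are hypothetical.

```python
def simple_gravity(traffic_out, traffic_in):
    """Gravity estimate: demand(s, d) = out[s] * in[d] / total traffic."""
    total = sum(traffic_in.values())
    return {(s, d): traffic_out[s] * traffic_in[d] / total
            for s in traffic_out for d in traffic_in}


# Hypothetical per-site totals (same units, both summing to 100).
traffic_out = {"A": 60.0, "B": 30.0, "C": 10.0}
traffic_in = {"A": 20.0, "B": 50.0, "C": 30.0}

estimate = simple_gravity(traffic_out, traffic_in)
for (src, dst), volume in sorted(estimate.items()):
    print(f"{src} -> {dst}: {volume:.1f}")
```

With this normalization each row of the estimate sums to the traffic leaving that site, but nothing forces the estimate to agree with the loads measured on individual links.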

121 Simple Gravity Model
- Better than naïve, but still not very accurate

122 Generalized Gravity Model
- Internet routing is asymmetric
  - Hot potato routing: use the closest exit point
- Generalized gravity model
  - For outbound traffic, assumes proportionality on a per-peer basis (as opposed to per-router)
[Figure labels: peer links, access links]
Speaker notes: A major reason that the simple gravity model does not work so well is the asymmetric nature of Internet routing …

123 Generalized Gravity Model
Fairly accurate given that no link constraint is used

124 Tomographic Approach
- Apply the link constraints: $x = A^{T} y$
[Figure: three routers (1, 2, 3) connected by route 1, route 2, and route 3]
Speaker notes: In gravity modeling, the only information we use is the total load at each node. The tomographic approach instead tries to estimate traffic matrices by applying the link constraints …

125 Tomographic Approach
- Under-constrained linear inverse problem
- Find additional constraints based on models
  - Typical approach: use higher order statistics
- Disadvantages
  - Complex algorithm: doesn’t scale
    - Large networks have nodes, routes
  - Reliance on higher order statistics is not robust given the problems in SNMP data
    - Artifacts, missing data
    - Violations of model assumptions (e.g. non-stationarity)
    - Relatively low sampling frequency: 1 sample every 5 min
    - Unevenly spaced sample points
  - Not very accurate, at least on simulated TM
Speaker notes: As I mentioned before, the problem is a highly under-constrained linear inverse problem. The natural solution to this is of course to add some constraints …

126 Inference: Network Tomography
- From link counts to the traffic matrix
[Figure: sources and destinations joined through a network, with observed link counts of 5Mbps, 3Mbps, 4Mbps, and 4Mbps]

127 Tomography: Formalizing the Problem
- Source-destination pairs
  - $p$ is a source-destination pair of nodes
  - $x_p$ is the (unknown) traffic volume for this pair
- Routing
  - $R_{lp} = 1$ if link $l$ is on the path for src-dest pair $p$
  - Or, $R_{lp}$ is the proportion of $p$’s traffic that traverses $l$
- Links in the network
  - $l$ is a unidirectional edge
  - $y_l$ is the observed traffic volume on this link
- Relationship: $y = Rx$ (now work back to get $x$)
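The relationship y = Rx can be checked on a toy network. The sketch below builds R for a hypothetical three-node line topology A - B - C with unidirectional links and single-path routing, computes the link loads for one traffic vector, and shows a second, different traffic vector that produces exactly the same loads. The topology and all volumes are invented for illustration.

```python
# Hypothetical line topology A - B - C with four unidirectional links.
links = ["A->B", "B->C", "C->B", "B->A"]
paths = {
    ("A", "B"): ["A->B"],
    ("A", "C"): ["A->B", "B->C"],
    ("B", "A"): ["B->A"],
    ("B", "C"): ["B->C"],
    ("C", "A"): ["C->B", "B->A"],
    ("C", "B"): ["C->B"],
}
pairs = sorted(paths)  # fixes the ordering of the entries of x


def routing_matrix(links, paths, pairs):
    """R[l][p] = 1 if link l is on the path of source-destination pair p."""
    return [[1 if link in paths[p] else 0 for p in pairs] for link in links]


def link_loads(R, x):
    """Compute y = R x."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]


R = routing_matrix(links, paths, pairs)
x_true = [5, 3, 4, 2, 1, 6]   # volumes in the order given by `pairs`
y = link_loads(R, x_true)

x_other = [6, 2, 4, 3, 1, 6]  # a different traffic matrix ...
print(y, link_loads(R, x_other))  # ... with identical link loads
```

Here there are six unknowns but only four link measurements, so the link loads alone cannot distinguish `x_true` from `x_other`.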

128 Tomography: Single Observation is Insufficient
- Linear system is underdetermined
  - Number of nodes $n$
  - Number of links $e$ is around $O(n)$
  - Number of src-dest pairs $c$ is $O(n^2)$
  - Dimension of solution sub-space at least $c - e$
- Multiple observations are needed
  - $k$ independent observations (over time)
  - Stochastic model with src-dest counts Poisson & i.i.d.
  - Maximum likelihood estimation to infer traffic matrix
Vardi, “Network Tomography,” JASA, March 1996

130 Tomography: Challenges
- Limitations
  - Cannot handle packet loss or multicast traffic
  - Statistical assumptions don’t match IP traffic
  - Significant error even with large # of samples
  - High computation overhead for large networks
- Directions for future work
  - More realistic assumptions about the IP traffic
  - Partial queries over subgraphs in the network
  - Incorporating additional measurement data

131 Tomo-gravity
- “Tomo-gravity” = tomography + gravity modeling
- Exploit topological equivalence to reduce problem size
- Use least-squares method to get the solution, which
  - Satisfies the constraints
  - Is closest to the gravity model solution
  - Can use weighted least-squares to make more robust
[Figure: the least-squares solution is the point of the constraint subspace closest to the gravity model solution]
Speaker notes: Now I can explain how we come up with the name tomo-gravity. Tomo-gravity = tomography + gravity modeling. It tries to take the best of both approaches …
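The least-squares step can be illustrated in its simplest, unweighted form: given a gravity estimate g and link constraints Rx = y, take the point satisfying the constraints that is closest to g in the Euclidean sense, x = g + Rᵀ(RRᵀ)⁻¹(y − Rg). The sketch below is a simplification under stated assumptions, not the production method: it assumes R has full row rank, ignores weighting and non-negativity, and uses a hypothetical four-link, six-pair routing matrix with made-up loads.

```python
def solve(M, b):
    """Solve M z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mat_vec(R, x):
    """Compute R x."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]


def project_onto_constraints(R, y, g):
    """Point closest to g (Euclidean) with R x = y; R must have full row rank."""
    residual = [yi - ri for yi, ri in zip(y, mat_vec(R, g))]
    RRt = [[sum(a * b for a, b in zip(ri, rj)) for rj in R] for ri in R]
    z = solve(RRt, residual)
    correction = [sum(R[l][p] * z[l] for l in range(len(R)))
                  for p in range(len(g))]
    return [gi + ci for gi, ci in zip(g, correction)]


# Hypothetical routing matrix (4 links x 6 source-destination pairs),
# observed link loads y, and a flat gravity-style prior g.
R = [[1, 1, 0, 0, 0, 0],
     [0, 1, 0, 1, 0, 0],
     [0, 0, 0, 0, 1, 1],
     [0, 0, 1, 0, 1, 0]]
y = [8.0, 5.0, 7.0, 5.0]
g = [4.0, 4.0, 4.0, 4.0, 4.0, 4.0]

x_hat = project_onto_constraints(R, y, g)
print([round(v, 3) for v in x_hat], mat_vec(R, x_hat))
```

A weighted variant would scale the distance to g per element before projecting, and a practical implementation would also have to keep the estimates non-negative; both are left out here.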

132 Tomo-gravity: Accuracy
Accurate within 10-20% (esp. for large elements)

133 Tomo-gravity Solution
- Tomo-gravity infers traffic matrices from widely available measurements of link loads
  - Accurate: especially accurate for large elements
  - Robust: copes easily with data glitches, loss
  - Flexible: extends easily to incorporate more detailed measurements, where available
  - Fast: for example, solves AT&T’s IP backbone network in a few seconds
- In daily use for AT&T IP network engineering
  - Reliability analysis, capacity planning, and traffic engineering

134 Summary: Tomo-gravity
- Tomo-gravity takes the best of both tomography and gravity modeling
  - Simple, and quick
    - A few seconds for whole AT&T backbone
  - Satisfies link constraints
    - Gravity model solutions don’t
  - Uses widely available SNMP data
  - Can work within the limitations of SNMP data
    - Only uses first order statistics → interpolation very effective
- Limited scope for improvement
  - Incorporate additional constraints from other data sources: e.g., Netflow where available
- Operational experience very positive
  - In daily use for AT&T IP network engineering
  - Successfully prevented service disruption during simultaneous link failures
Speaker notes: To summarize, tomo-gravity really works! It takes the best of both tomography and gravity modeling … (end with the story on how we successfully prevented service disruption during disastrous simultaneous link failures)

