Real Time Network Policy Checking using Header Space Analysis


1 Real Time Network Policy Checking using Header Space Analysis
Presented by Nikhil, Jiham, Aagam

2 Background: Network Debugging is Hard !
Forwarding state is hard to analyze because it is: distributed across multiple tables and boxes; written to the network by multiple independent writers (different protocols, network admins); presented in different formats by vendors.
The forwarding state of a network is hard to analyze. It is the set of forwarding rules installed in the tables of switches, routers, and other networking boxes that together decide how an incoming packet is processed by the boxes and sent to an output port. These forwarding rules are distributed across multiple tables and multiple boxes that together determine the end-to-end behavior of the network.
Forwarding state is hard to analyze because, first of all, it is distributed across multiple tables and boxes, which makes the overall system behavior harder to understand. Second, it is written to the network by multiple independent writers, such as different protocol instances or a network admin acting manually, and these independently written states may interact in complex ways with unforeseen results. Third, the forwarding state is presented in different formats by different vendors, which makes it harder for humans to understand. Finally, the forwarding state is not directly controllable or observable by the network admin.
Overall, forwarding state is not constructed in a way that lends itself well to checking and verification. That is why network debugging is hard.

3 Prior Work : Header Space Analysis
For network analysis, these simple questions need to be answered: Can A talk to B? What are all the packet headers from A that can reach B? Loops? Isolation? Protocol-independent, general.
Prior work by the authors of the paper sets the foundation for NetPlumber. It is called Header Space Analysis, and it provides a uniform, vendor-independent, and protocol-agnostic model of the network using a geometric model of packet processing.
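
The geometric model can be made concrete with a small sketch. It assumes headers are written as fixed-length wildcard strings over `0`, `1`, and `x` (x = either bit), so that a set of headers is a region of the header space and set intersection is computed bit by bit. The function name `intersect` and the string encoding are illustrative choices, not taken from the slides.

```python
from typing import Optional


def intersect(a: str, b: str) -> Optional[str]:
    """Intersect two wildcard headers; None means the intersection is empty."""
    if len(a) != len(b):
        raise ValueError("headers must have the same length")
    out = []
    for ca, cb in zip(a, b):
        if ca == "x":
            out.append(cb)          # wildcard: the other side decides
        elif cb == "x" or ca == cb:
            out.append(ca)          # compatible bits
        else:
            return None             # 0 meets 1: no common header
    return "".join(out)


both = intersect("10xx", "1x0x")    # headers matched by both expressions
empty = intersect("10xx", "11xx")   # disjoint regions
```

Questions such as "which headers from A can reach B?" then reduce to repeatedly intersecting and transforming such regions along a path.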

4 Header Space Analysis : Snapshot-based Checking
Header Space Analysis is used to create a model of the network and use that to check network properties

5 Header Space Analysis : Snapshot-based Checking
So the tool gets a complete snapshot of the forwarding states of a network

6 Header Space Analysis : Snapshot-based Checking
It converts that snapshot into a model of each box using a transfer function, and then uses that model to check properties like

7 Header Space Analysis : Snapshot-based Checking
“Can host A talk to host B?” or “Is there any forwarding loop in the network?”

8 Stream-based Checking
We know that network changes happen all the time: rules may get added or deleted, or even a batch of rules may get added and deleted at the same time. As a result, the network may change from one state to another, and these changes have the potential to break things and cause policy violations.
[Figure axis: Time]

9 Stream-based Checking
So the authors of the paper wanted to try a new approach to network verification that verifies a stream of network updates in real time and makes sure that it does not violate any network policies or invariants. More specifically, they wanted to design a new model of the network to which they can apply this stream of updates, along with the policies and invariants to check, so that the system constantly gives a yes or no answer on whether they are being violated. Ideally, we can then prevent errors before they happen, or raise an alarm if something goes wrong in the network.
[Figure axis: Time]

10 Stream-based Checking
[Figure axis: Time]

11 Stream-based Checking
[Figure axis: Time]

12 Real-time Policy Checking
Prevent errors before they hit the network. Report a violation as soon as it happens.

13 Related Work Programming foundations
Frenetic: provides high-level abstractions to achieve per-packet and per-flow consistency during network updates.
Offline checking. rcc: verifies BGP configurations. NICE: applies model checking to find bugs in OpenFlow programs. HSA: checks data plane correctness against invariants. Anteater: uses Boolean expressions and SAT solvers for modeling and checking.
Other related work includes Frenetic, which verifies network updates for per-packet and per-flow consistency, whereas NetPlumber verifies forwarding policies. There are many offline checking tools, but their main problem is that they cannot prevent bugs from damaging the network until the check is run.

14 Related Work Online Monitoring
OFRewind: captures and reproduces problematic OpenFlow command sequences. ATPG: monitors the network by periodically sending test packets. NDB: network debugger. VeriFlow: most closely related to NetPlumber. Verifies policies in real time. Uses a trie structure to search rules. Determines the affected equivalence classes (ECs) and updates the forwarding graph for each affected class. Similar runtime performance to NetPlumber.
Online monitoring tools help troubleshoot network programs at run-time. These tools can complement, but not replace, real-time policy verification. VeriFlow is the tool most similar to NetPlumber. It verifies the compliance of network updates with specified policies in real time; however, NetPlumber can additionally verify arbitrary header modifications, including rewriting and encapsulation. NetPlumber is also protocol-independent.

15 Outline
NetPlumber: real-time policy checking tool. How does it work? How to check policy? How to parallelize? Evaluation.

16 NetPlumber : Real time policy Checking Tool
Observe state changes (installation or removal of rules, link up or down events). Update event (NetPlumber in turn updates its internal model of the network). Check policies.
Figure: deploying NetPlumber as a policy checker in SDNs.

17 NetPlumber : Approach to conventional networks
SNMP Trap NetPlumber

18 Plumbing Graph : Nodes and Edges
The plumbing graph captures all possible paths of flows through the network. Nodes: forwarding rules in the network. Directed edges: next hop dependency of rules. Rule A has a next hop dependency to rule B if: there is a physical link from A’s box to B’s box; and A.range has an intersection with B.domain.
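
The two conditions for a next hop dependency can be sketched as follows. This is a minimal illustration, assuming headers are wildcard strings over `0`, `1`, `x` and that links are box-to-box pairs (ports are ignored). `Rule`, `out_range`, and `build_pipes` are hypothetical names; `out_range` stands in for the slide's A.range.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


def intersect(a: str, b: str) -> Optional[str]:
    """Bitwise intersection of two wildcard headers; None if empty."""
    out = []
    for ca, cb in zip(a, b):
        if ca == "x":
            out.append(cb)
        elif cb == "x" or ca == cb:
            out.append(ca)
        else:
            return None
    return "".join(out)


@dataclass(frozen=True)
class Rule:
    name: str
    box: str
    domain: str      # headers the rule matches
    out_range: str   # headers the rule can emit


def build_pipes(rules: List[Rule],
                links: Set[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (from_rule, to_rule, overlap) for every next hop dependency."""
    pipes = []
    for a in rules:
        for b in rules:
            if (a.box, b.box) not in links:
                continue                     # condition 1: physical link
            overlap = intersect(a.out_range, b.domain)
            if overlap is not None:          # condition 2: range meets domain
                pipes.append((a.name, b.name, overlap))
    return pipes


rules = [
    Rule("A", "box1", "1xxx", "10xx"),
    Rule("B", "box2", "1x0x", "1x0x"),
    Rule("C", "box2", "0xxx", "0xxx"),
]
pipes = build_pipes(rules, {("box1", "box2")})
```

Here only A to B becomes an edge: C sits on a linked box, but nothing A emits falls in C's domain.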

19 Plumbing Graph : Nodes and Edges
Intra-table dependency of rules: the plumbing graph needs to consider rule priorities. Each rule node keeps track of the higher-priority rules in the same table. It calculates its overlap with the domain of each higher-priority rule and subtracts that overlap from its own domain.
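
A small sketch of the intra-table dependency, assuming a table is a list of wildcard match patterns ordered from highest to lowest priority. Instead of materializing the subtracted domain, it records the overlaps to subtract and tests concrete headers against them. The function names are hypothetical.

```python
from typing import List, Optional


def intersect(a: str, b: str) -> Optional[str]:
    """Bitwise intersection of two wildcard headers; None if empty."""
    out = []
    for ca, cb in zip(a, b):
        if ca == "x":
            out.append(cb)
        elif cb == "x" or ca == cb:
            out.append(ca)
        else:
            return None
    return "".join(out)


def higher_priority_overlaps(table: List[str], index: int) -> List[str]:
    """Parts of rule `index`'s domain that higher-priority rules take away."""
    overlaps = []
    for higher in table[:index]:
        overlap = intersect(higher, table[index])
        if overlap is not None:
            overlaps.append(overlap)
    return overlaps


def handles(table: List[str], index: int, header: str) -> bool:
    """True if the concrete header is processed by rule `index`."""
    if intersect(table[index], header) is None:
        return False
    return all(intersect(o, header) is None
               for o in higher_priority_overlaps(table, index))


table = ["10xx", "1xxx", "xxxx"]   # highest priority first
```

With this table, the second rule matches `1xxx` on paper but effectively handles only headers starting with `11`, because `10xx` is claimed by the rule above it.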

20 Plumbing Graph Plumbing graph of a simple network consisting of 4 switches each with one table

21 NetPlumber : Source and Probe nodes
To compute reachability, NetPlumber inserts flow from the source port into the plumbing graph and propagates it towards the destination. Source node: “flow generator”, all-wildcard headers. Sink nodes: the dual of source nodes. Probe node: “flow monitor”; can be attached to appropriate locations of the plumbing graph.

22 NetPlumber : Computing Reachability
Finding reachability between S and P.
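
Reachability between S and P can be sketched as a flow propagation over the plumbing graph. The sketch assumes rules only filter headers (no rewriting), represents pipes as an adjacency dictionary, and carries the path as the flow's history. All node names and the `propagate` function are illustrative.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def intersect(a: str, b: str) -> Optional[str]:
    """Bitwise intersection of two wildcard headers; None if empty."""
    out = []
    for ca, cb in zip(a, b):
        if ca == "x":
            out.append(cb)
        elif cb == "x" or ca == cb:
            out.append(ca)
        else:
            return None
    return "".join(out)


def propagate(pipes: Dict[str, List[str]], domains: Dict[str, str],
              node: str, flow: str,
              history: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str, Tuple[str, ...]]]:
    """Yield (node, flow, history) for every node the flow reaches."""
    narrowed = intersect(flow, domains[node])
    if narrowed is None or node in history:
        return                       # flow dies here, or we found a loop
    path = history + (node,)
    yield node, narrowed, path
    for nxt in pipes.get(node, []):
        yield from propagate(pipes, domains, nxt, narrowed, path)


pipes = {"S": ["r1", "r3"], "r1": ["r2"], "r2": ["P"], "r3": ["r4"], "r4": ["P"]}
domains = {"S": "xxxx", "r1": "10xx", "r2": "1x0x",
           "r3": "0xxx", "r4": "1xxx", "P": "xxxx"}

at_probe = [(flow, path)
            for node, flow, path in propagate(pipes, domains, "S", "xxxx")
            if node == "P"]
```

The source injects the all-wildcard flow; only headers `100x` survive to the probe, and the recorded path shows which rules processed them.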

23 NetPlumber : Updating State
As events occur in the network, NetPlumber needs to update its plumbing graph and re-route the flows. Such events include: adding new rules, deleting rules, link up, link down, adding new tables, and deleting tables.

24 NetPlumber : Updating State
Adding rule 1.2 (shaded in green) to table 1. Results: 3 new pipes; 1 new intra-table dependency; new flows added (highlighted in bold); flows deleted.

25 NetPlumber : Updating State
Complexity for the addition or deletion of a single rule: O(r + spd), where r is the number of entries in each table, s is the number of source nodes, p is the number of pipes connected to the rule, and d is the network diameter.

26 NetPlumber Each flow at any point in the plumbing graph, carries its complete history. By traversing backward, we can examine the entire history of the flow; all the rules that have processed this flow along the path.

27 NetPlumber—Checking Policies
Each probe node is configured with: a filter flowexp, which constrains the set of flows that should be examined by the probe node, and a test flowexp, which is the constraint that is checked on the matching flows.
Universal check: $\forall \{f \mid f \sim \mathit{filter}\} : f \sim \mathit{test}$. All flows that satisfy the filter flowexp satisfy the test flowexp as well.
Existential check: $\exists \{f \mid f \sim \mathit{filter}\} : f \sim \mathit{test}$. There exists a flow that satisfies both the filter and test flowexps.
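
The two probe modes can be sketched with plain predicates standing in for flowexps. This is an assumption made for illustration: the actual flowexp language is a regular-expression-like grammar over flow history, which is not reproduced here. `Flow`, `check_forall`, and `check_exists` are hypothetical names.

```python
from typing import Callable, Iterable, NamedTuple, Tuple


class Flow(NamedTuple):
    header: str
    path: Tuple[str, ...]   # rules that processed the flow, in order


Predicate = Callable[[Flow], bool]


def check_forall(flows: Iterable[Flow], filt: Predicate, test: Predicate) -> bool:
    """Universal probe: every flow matching the filter passes the test."""
    return all(test(f) for f in flows if filt(f))


def check_exists(flows: Iterable[Flow], filt: Predicate, test: Predicate) -> bool:
    """Existential probe: some flow matching the filter passes the test."""
    return any(test(f) for f in flows if filt(f))


flows = [Flow("100x", ("S", "r1", "r2")), Flow("0xxx", ("S", "r3"))]


def starts_with_one(f: Flow) -> bool:
    return f.header[0] == "1"


def starts_with_zero(f: Flow) -> bool:
    return f.header[0] == "0"


def via_r1(f: Flow) -> bool:
    return "r1" in f.path
```

Note that the universal check is vacuously true when no flow matches the filter, which is why isolation-style policies are naturally written with it.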

28 Flowexp Language Grammar

29 NetPlumber—Checking Policies
Policy: Guests can not access server S.

30 NetPlumber—Checking Policies
Policy: http traffic from client C to server S doesn’t go through more than 4 hops.

31 NetPlumber—Checking Policies
Policy: traffic from client C to server S should go through middle box M.
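
The three example policies above can be approximated as universal checks over the flows seen by a probe at server S. This sketch assumes each flow records its source and the rule nodes it traversed, counts a hop as one entry in that path, and omits the header filter for http. The switch names and helper functions are hypothetical.

```python
from typing import List, NamedTuple, Set, Tuple


class Flow(NamedTuple):
    source: str
    path: Tuple[str, ...]   # nodes traversed before reaching the probe


def guests_isolated(flows: List[Flow], guest_sources: Set[str]) -> bool:
    """Guests cannot access S: no flow at the probe originates from a guest."""
    return all(f.source not in guest_sources for f in flows)


def within_hops(flows: List[Flow], source: str, max_hops: int) -> bool:
    """Traffic from `source` takes at most `max_hops` hops."""
    return all(len(f.path) <= max_hops for f in flows if f.source == source)


def passes_waypoint(flows: List[Flow], source: str, waypoint: str) -> bool:
    """Traffic from `source` always traverses `waypoint`."""
    return all(waypoint in f.path for f in flows if f.source == source)


flows_at_s = [
    Flow("C", ("sw1", "M", "sw3")),
    Flow("G", ("sw2", "sw3")),
]
```

In this made-up state, client C complies with the hop and middlebox policies, while the guest flow reaching S is an isolation violation.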

32 Why Dependency Graph Helps
Incremental update: only have to trace through the dependency subgraph affected by an update. Flexible policy expression: probe and source nodes are flexible to place and configure. Parallelization: can partition the dependency graph into clusters to minimize inter-cluster dependencies.

33 Distributed NetPlumber

34 Distributed NetPlumber
Each instance of NetPlumber is responsible for checking a subset of rules that belong to one cluster (i.e. a FEC). Rules that belong to more than one cluster will be replicated on all the instances they interact with. Probe nodes are replicated on all instances to ensure global verification. The final result will be ready after the last instance is done with its job.
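
The replication scheme can be sketched as a simple assignment step. It assumes the clustering has already been computed and is given as a map from each rule to the clusters it interacts with; `assign_rules` and `combine` are hypothetical helper names.

```python
from typing import Dict, List, Set


def assign_rules(rule_clusters: Dict[str, Set[int]]) -> Dict[int, Set[str]]:
    """Give each instance the rules of its cluster, replicating shared rules."""
    instances: Dict[int, Set[str]] = {}
    for rule, clusters in rule_clusters.items():
        for cluster in clusters:
            instances.setdefault(cluster, set()).add(rule)
    return instances


def combine(per_instance_ok: List[bool]) -> bool:
    """The global verdict is known only once every instance has reported."""
    return all(per_instance_ok)


rule_clusters = {"r1": {0}, "r2": {1}, "default": {0, 1}}
instances = assign_rules(rule_clusters)
```

A rule that touches many clusters, such as a default rule, is copied to every one of them, which is why adding instances eventually stops paying off.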

35 Experiment on Google WAN
Google inter-datacenter WAN. Largest deployed SDN, running OpenFlow. Around 143,000 rules.
Google WAN is an inter-datacenter network connecting Google data centers worldwide. It is the largest deployed SDN, runs OpenFlow, and has about 143,000 rules. Its topology is shown in the diagram here.

36 Experiment on Google WAN
Policy check: all 52 edge switches can talk to each other. More than 2500 pairwise reachability checks. Used two snapshots taken 6 weeks apart. Used the first snapshot to create the initial NetPlumber state and used the diff as a sequential update.
So the policy check on this network is that all 52 edge switches can talk to each other, which translates into more than 2500 pairwise reachability checks. We check this by connecting a source node and a probe node to each of those edge switches. We take two snapshots 6 weeks apart, use the first snapshot to load NetPlumber and the diff as an incremental update, and verify that reachability was maintained during that time.
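
As a quick sanity check on the count, and assuming reachability is checked separately in each direction (ordered pairs of distinct edge switches), the number of checks for n = 52 switches is:

```latex
\[
  n(n-1) = 52 \times 51 = 2652 > 2500 .
\]
```

Counting unordered pairs instead would give 1326, so the ordered reading is the one consistent with "more than 2500".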

37 Experiment on Google WAN
Google WAN is more interesting from a scale point of view, as we can see here. This graph shows the runtime of NetPlumber on Google WAN. The x axis shows the runtime in milliseconds and the y axis is the CDF of runtime per rule update. As you can see, about 60 percent of rule changes can be verified in less than 1 millisecond and 95% of updates can be verified within 10 milliseconds. There is a tail of updates that take longer to verify; these are the default or aggregate rules that are added and deleted simultaneously in the dependency graph. As a point of reference, if we wanted to use Hassel, the snapshot-based Header Space Analysis tool, it would take Hassel 100s minimum.
Another observation is that increasing the number of NetPlumber instances improves the runtime, but only up to a point. Beyond about 5 instances on Google WAN the runtime does not get better, since the dependency graph of Google WAN has around 5 natural clusters; if you go beyond that, you replicate rules across the clusters and do not get much benefit for policy verification.

38 Benchmarking Experiment
For a single pairwise reachability check.
They ran a benchmarking test for a single pairwise reachability check across three networks: Google, Stanford, and Internet2. As you can see, the add-rule update time for all networks is well under 1 millisecond, but the add-link update time takes a few seconds. The reason is that changing links changes many things in the dependency graph, so it takes longer to verify. That should be okay because link changes are not that frequent, so we can still use NetPlumber for most networks. However, if a network aims to be energy efficient by turning links on and off, then NetPlumber may not be a good tool.

39 Limitations: Relies on reading the state of network devices, so it cannot model middleboxes with dynamic state. Greater processing time for verifying link updates. Not suitable for networks with a high rate of link up/down events.

40 Conclusions: Designed a protocol-independent system for real-time network policy checking. Key component: a dependency graph of forwarding rules, capturing all flow paths. Benefits: incremental update, flexible policy expressions, parallelization by clustering.

41 AVANT-GUARD: Scalable and Vigilant Switch Flow Management in Software-Defined Networks
Reference Paper

42 Avant-Guard 2 Primary functions of Avant-Guard:
Connection Migration - Mitigates the risk of DoS attacks by reducing the number of interactions between the data plane and the control plane.
Actuating Triggers - Collects network information and inserts conditional flow rules when certain criteria are met.

43 Avant-Guard

44 Attack Scenario: SYN Flood Attack
Standard Handshake: SYN Flood:

45 SYN Cookies
When the server receives a TCP SYN packet, it responds with a TCP SYN+ACK packet with a carefully crafted sequence number based on connection parameters t, m, and s:
t - Time of connection
m - Maximum Segment Size (MSS)
s - Hash of src IP, src port, dest IP, dest port, and t
When the client responds, the server can reconstruct the SYN queue entry by validating the sequence number. No connection state is stored on the server until the handshake is complete.
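
The construction can be sketched as packing t, m, and s into a 32-bit sequence number. The layout used here (5 bits for a coarse time counter, 3 bits for an MSS table index, 24 bits of hash) follows the classic SYN cookie description, but the secret key, the use of SHA-256, the MSS table values, and all function names are assumptions made for illustration. Real implementations also mix in the client's initial sequence number, which is left out.

```python
import hashlib
from typing import Optional

# Hypothetical 8-entry table; m is a 3-bit index into it.
MSS_TABLE = [216, 536, 1024, 1220, 1380, 1440, 1460, 8960]


def _hash24(secret: bytes, src_ip: str, src_port: int,
            dst_ip: str, dst_port: int, counter: int) -> int:
    """s: 24-bit keyed hash over the connection 4-tuple and the time counter."""
    data = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{counter}".encode()
    return int.from_bytes(hashlib.sha256(secret + data).digest()[:3], "big")


def make_cookie(secret: bytes, src_ip: str, src_port: int, dst_ip: str,
                dst_port: int, mss: int, counter: int) -> int:
    """Build the SYN+ACK sequence number from t, m, and s."""
    t = counter % 32
    m = max((i for i, v in enumerate(MSS_TABLE) if v <= mss), default=0)
    s = _hash24(secret, src_ip, src_port, dst_ip, dst_port, counter)
    return (t << 27) | (m << 24) | s


def check_cookie(secret: bytes, src_ip: str, src_port: int, dst_ip: str,
                 dst_port: int, cookie: int, counter: int,
                 max_age: int = 2) -> Optional[int]:
    """Validate a returned cookie; return the negotiated MSS or None."""
    t, m, s = cookie >> 27, (cookie >> 24) & 0x7, cookie & 0xFFFFFF
    for age in range(max_age + 1):
        c = counter - age
        if c >= 0 and c % 32 == t and \
                _hash24(secret, src_ip, src_port, dst_ip, dst_port, c) == s:
            return MSS_TABLE[m]
    return None


conn = ("10.0.0.1", 1234, "10.0.0.2", 80)
cookie = make_cookie(b"secret", *conn, mss=1460, counter=100)
```

Because everything needed to validate the final ACK is recomputed from the packet and the secret, the server keeps no per-connection state during the handshake.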

46 Connection Migration Classification Report Migration Relay

47 Connection Migration: Classification

48 Connection Migration: Classification

49 Connection Migration Classification Report Migration Relay

50 Connection Migration: Report
After a connection is validated in the Classification stage, the Report stage checks if the connection has any corresponding flow rules. If the session does not have any flow rules, the data plane forwards this flow request to the control plane. If the control plane approves, the connection is moved to the next stage.

51 Connection Migration Classification Report Migration Relay

52 Connection Migration: Migration
The Connection Migration (CM) module initiates a TCP 3-way handshake with the destination host. If the handshake is successful, the data plane reports this to the control plane. Even if the handshake is not successful (due to an unavailable host or closed port), the results are reported to the control plane.

53 Connection Migration Classification Report Migration Relay

54 Connection Migration: Relay
The data plane relays all TCP data packets between the source and destination

55 Connection Migration Classification Report Migration Relay

56 Connection Migration: Example

57 Actuating Triggers
Avant-Guard provides triggers that allow the data plane to report network status and payload information to the control plane. The triggers can also activate flow rules when specific conditions are met. The actuating triggers have four steps:
Defining trigger conditions on the control plane
Registering conditions with the data plane
Checking if conditions are met on the data plane
Triggering a callback event or inserting a flow rule if a condition is met

58 Actuating Triggers

59 Actuating Triggers: Defining a Condition
There are three types of trigger conditions: payload-based, traffic-rate-based, and rule-activation. When creating a trigger condition, the control plane defines the type, sets the condition, and includes a pointer to a predefined flow rule.

60 Actuating Triggers
After a condition is defined, the control plane registers the condition with the data plane. If the conditions are met, the data plane notifies the control plane (using a “trigger” extension to OpenFlow written by the authors).
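
The define, register, check, and fire steps can be sketched as below. This is only a toy model: `TriggerCondition`, `DataPlane`, the statistics dictionary, and the rule string are hypothetical and do not reflect the authors' OpenFlow extension. The sketch both installs the predefined rule and notifies the controller when a condition fires, which is an assumption about how the two actions combine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TriggerCondition:
    kind: str                                    # "payload", "traffic-rate", "rule-activation"
    predicate: Callable[[Dict[str, int]], bool]  # the condition itself
    flow_rule: Optional[str] = None              # predefined rule to activate


class DataPlane:
    def __init__(self, notify: Callable[[str, Dict[str, int]], None]) -> None:
        self.conditions: List[TriggerCondition] = []
        self.flow_table: List[str] = []
        self.notify = notify                     # callback into the control plane

    def register(self, condition: TriggerCondition) -> None:
        """Step 2: the control plane registers a condition with the data plane."""
        self.conditions.append(condition)

    def observe(self, stats: Dict[str, int]) -> None:
        """Steps 3 and 4: check each condition and fire it if it is met."""
        for cond in self.conditions:
            if not cond.predicate(stats):
                continue
            if cond.flow_rule is not None and cond.flow_rule not in self.flow_table:
                self.flow_table.append(cond.flow_rule)
            self.notify(cond.kind, stats)


notifications: List[str] = []
dp = DataPlane(notify=lambda kind, stats: notifications.append(kind))

# Step 1: the control plane defines a traffic-rate-based condition.
dp.register(TriggerCondition("traffic-rate",
                             lambda s: s["pps"] > 1000,
                             flow_rule="drop src=10.0.0.9"))
dp.observe({"pps": 10})      # below threshold: nothing happens
dp.observe({"pps": 5000})    # threshold exceeded: rule installed, controller notified
```

Keeping the check on the data plane means the controller is only contacted when something actually fires, in line with the goal of reducing data plane to control plane interactions.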

61 Attack Scenario 1: DDoS

62 Attack Scenario 2: Malicious Payload

63 Attack Scenario 3: Nmap Without Avant-Guard: With Avant-Guard:

64 Evaluation Connection Migration Overhead: Actuating Triggers Overhead:
Average OpenFlow connection establishment delay: us Average Avant-Guard delay: us (0.626% increase) Actuating Triggers Overhead:

65 Avant-Guard: Conclusion
2 Primary functions of Avant-Guard:
Connection Migration - Mitigates the risk of DoS attacks by reducing the number of interactions between the data plane and the control plane.
Actuating Triggers - Collects network information and inserts conditional flow rules when certain criteria are met.

