Building OC-768 Monitor using GS Tool Vladislav Shkapenyuk Theodore Johnson Oliver Spatscheck June 2009.


1 Building OC-768 Monitor using GS Tool Vladislav Shkapenyuk Theodore Johnson Oliver Spatscheck June 2009

2 GS Tool Overview
High-performance data stream processing
 – Monitor network traffic using SQL queries
 – Rapid development of complex monitoring systems, with automated complex optimizations
Joint project between Database and Networking Research
 – A new model of database processing, which became popular around that time
 – Overcame snorts of derision
Currently the standard DPI solution in AT&T

3 Current DPI Deployments
Backbone router monitoring
 – Mandatory for OC-768
 – 2 OC-768 probes in NY54, one each in Orlando and St. Louis
 – Emergency deployments to debug routing problems (incorrect packet forwarding, SNMP pollers hitting routers too hard)
ICDS probes
 – Use the OC-768 probe in New York
 – Track RTT and loss rates for all eyeballs
DNS monitoring
 – Identify performance problems and attacks
DPI/Mobility
 – 2 probes monitoring mobility data; soon there will be 8
 – Each observes 700k packets/sec: 2 GB/minute, 3 TB/day, 1 PB/year
 – Will monitor mobility traffic in the core network once Mobility makes the transition

4 DPI Deployments (cont.)
DPI/DSL
 – 22 probes monitoring access routers
 – E.g. usage billing for heavy users (Reno)
DPI/Lightspeed
 – Debugging Microsoft software
 – Proactive video quality management
Smersh
 – Monitor all FP internet traffic
Many others in deployment or trial
 – EMEA VPN, Project Madison, Project Gemini

5 GS Tool Architecture
Lightweight architecture
 – Generated code
 – Templatized operators
 – Avoid data copies
Early data reduction
 – Prefilter
 – Low-level queries: filtering, transformation, (partial) aggregation
 – Push operators into the NIC (Network Interface Card): filtering and transformation
[Diagram: ring buffer feeding the prefilter and low-level queries q1–q4 in the RTS, which feed high-level queries Q1–Q4]

6 Scaling Issues
Exploding data rates
 – The GS architecture worked well for up to OC-48 rates (2.5 Gbit/sec)
 – 10GigE is a new standard: need automatic parallelization for multi-core Gigascope
 – OC-768 links carry 2x40 Gbit/sec of traffic (112 million packets/sec): 26 cycles per tuple on a 4 GHz machine; exceeds the capabilities of modern computer buses; need traffic partitioning / cluster computing
Exploding query sets
 – I used to think that 50 queries is a lot
 – Now up to 170 on some major apps
Exploding query complexity
 – Complex joins
 – Complex aggregation group definitions (TCP flows, etc.)
 – Alerting

7 [Central-office wiring diagram: the T and R fibers of one OC-768 link are tapped by optical splitters; LLGX demultiplexes OC-768 to OC-192, feeding up to 8 10GE fibers (ends hanging in the rack, ready to be connected to Gigascope ports to analyze 10GE data). 1 Gigascope = CPU with an OC-192/10GE card, an FE port, and the Gigascope application; a management router provides OOB connectivity, with the data warehouse and users in the same rack. Gigascope would be deployed on only one end of each of the two OC-768s planned in CI. Graph provided by Fred Imbrogno]

8 OC-768 Probe Architecture

9 First Prototype – Four Dwarfs

10 Current Version
[Photos: splitter box; splitter box (back)]

11 OC-768 Probe in Detail (cont.)
[Photos: GS management cabinet; GS OC-768 probe]

12 Technical Challenges
Liveness management
Multi-query optimization
 – Avoid the cost of invoking queries needlessly
 – Combine common predicates
Load shedding
 – Query-aware load shedding that returns meaningful results
Distributed query optimization
 – Query-aware stream partitioning for distributed evaluation
Pushing operators into the NIC and below
 – Using TCAMs for filtering, sampling, and partitioning

13 Earlier Data Reduction
Prefilter
 – Move predicates below query invocation: avoid the cost of invoking queries that will reject the record
 – With many queries and a high packet rate, procedure invocation is expensive
Example of multi-query optimization
 – Goes beyond sharing common predicates
[Diagram: NIC feeding the prefilter, which dispatches to queries Q1–Q3]

14 Step 1: Gather Cheap Predicates

Query 1:
Select time, timestamp, len, caplen, raw_packet_data
From [link1].packet
Where len <= 47

Query 2:
Select @Host, @Name, time, len, timewarp, raw_packet_data
From IPV4
Where MPLS_is_valid_1 = FALSE and ipversion = 4
  and (hdr_length > len-2 or total_length > len-2)  // IP + PPP type

Predicate                                  | Appears In
len <= 47                                  | 1
MPLS_is_valid_1 = FALSE                    | 2
ipversion = 4                              | 2
hdr_length > len-2 or total_length > len-2 | 2

15 Step 2: Assign Predicates to Bits

Predicate                                  | Bit
len <= 47                                  | 0
MPLS_is_valid_1 = FALSE                    | 1
ipversion = 4                              | 2
hdr_length > len-2 or total_length > len-2 | 3

Step 3: Assign Signatures to Queries

Query | Signature
1     | 1
2     | 14
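
Steps 2 and 3 above can be sketched as follows. This is an illustrative Python sketch only: Gigascope generates C code for the prefilter, and the packet layout (a dict with hypothetical field names) is an assumption.

```python
# Illustrative sketch of the prefilter's bit-signature scheme; the
# packet dict layout and field names below are hypothetical.

# Step 2: each cheap predicate is assigned a bit position.
PREDICATES = [
    lambda p: p["len"] <= 47,                       # bit 0
    lambda p: not p["mpls_valid"],                  # bit 1 (MPLS_is_valid_1 = FALSE)
    lambda p: p["ipversion"] == 4,                  # bit 2
    lambda p: p["hdr_length"] > p["len"] - 2
              or p["total_length"] > p["len"] - 2,  # bit 3
]

# Step 3: a query's signature is the OR of the bits of the predicates it
# requires: query 1 needs bit 0 (signature 1); query 2 needs bits 1, 2,
# and 3 (signature 2 + 4 + 8 = 14).
QUERY_SIGNATURES = {"q1": 1, "q2": 14}

def prefilter_bits(pkt):
    """Evaluate every cheap predicate once, setting one bit per match."""
    bits = 0
    for i, pred in enumerate(PREDICATES):
        if pred(pkt):
            bits |= 1 << i
    return bits

def queries_to_invoke(pkt):
    """Invoke a query only when all of its signature bits are set."""
    bits = prefilter_bits(pkt)
    return [q for q, sig in QUERY_SIGNATURES.items() if bits & sig == sig]
```

A packet that fails any of a query's cheap predicates skips that query with no procedure call, which is the cost the prefilter exists to avoid.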

16 DSL Query Set
[Chart: maximum throughput doubled on the DSL query set]

17 Load Shedding
Best to never drop input records
 – Sometimes we have no choice (e.g., during a DDoS attack)
Second best is to shed load gracefully
Problem: query-oblivious load shedding destroys query results
Solution: analyze the query set to determine an acceptable sampling strategy

Select tb, SrcIP, DstIP, SrcPort, DstPort, count(*), OR(flags)
From TCP
Group by time/60 as tb, SrcIP, DstIP, SrcPort, DstPort

18 Query-Aware Load Shedding
Sampling methods: per-tuple, per-group
Strongly compatible
 – The output of the query over a sample is a sample of the output of the query over the entire stream
Weakly compatible
 – "Good approximations": OK for aggregates (e.g., SUM has a good approximation)
Per-group sampling
 – Define the group as a subset of the partitioning fields: the group-by fields for aggregation, the equi-join fields for joins
 – Reconcile the sampling choices at the leaves
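
Per-group sampling can be sketched like this. This is a minimal Python sketch, not Gigascope's implementation; the hash function and tuple layout are assumptions.

```python
import zlib

def keep_tuple(tup, group_fields, rate=0.25):
    """Per-group sampling sketch: hash the group-defining fields so that
    every tuple of a group shares one keep/drop decision. Surviving
    groups then carry exact aggregates, so the query output over the
    sample is a sample of the full output (strong compatibility)."""
    key = "|".join(str(tup[f]) for f in group_fields)
    h = zlib.crc32(key.encode()) & 0xFFFFFFFF  # deterministic 32-bit hash
    return h < rate * 2**32                    # keep roughly `rate` of groups
```

For the flow query on the previous slide the group fields would be (SrcIP, DstIP, SrcPort, DstPort): every tuple of a given flow gets the same decision, so sampled flows keep exact count(*) and OR(flags) values.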

19 Reconciliation

Q1: Select R.a, S.b From R, S Where R.c = S.d
Q2: Select c, e, count(*) From R Group By c, e
R:  Select a, c, e From TCP Where Pr
S:  Select b, d From TCP Where Ps

[Diagram: query trees for Q1 and Q2 over subqueries R and S, annotated with the candidate sampling fields (d; c; c, e) to be reconciled at the leaves]

20 Why Reconciliation Isn’t Trivial

21 Distributed Query Optimization
The OC-768 monitor partitions 40 Gbit/sec streams into four 10 Gbit/sec streams
 – Partitioning is done by Hydra in hardware: only one shot at partitioning
 – Each GS server monitors a single 10GE link
The query evaluation plan depends on the method of partitioning
 – Result reintegration is cheap if the partitioning is compatible: some partitioning strategies are much better than others
Example: traffic flow computation on 8 nodes

SELECT tb, srcIP, destIP, srcPort, destPort, count(*) as cnt,
       sum(len), OR_AGGR(flags), other_stats ...
FROM TCP
GROUP BY time/10 as tb, srcIP, destIP, srcPort, destPort;

 – In the worst case a single flow will result in 8 partial flows
 – Partial results need to be combined, so the load on the final aggregator node can exceed the load on a centralized system
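
The final-aggregation step for the flow query above can be sketched as follows (illustrative Python; the tuple layout of the per-node partial aggregates is an assumption):

```python
def merge_partial_flows(partials):
    """Merge per-node partial aggregates for the flow query: count(*)
    and sum(len) add across partials, and OR_AGGR(flags) ORs. Keys are
    the group-by fields (tb, srcIP, destIP, srcPort, destPort)."""
    merged = {}
    for key, (cnt, byte_sum, flags) in partials:
        c, s, f = merged.get(key, (0, 0, 0))
        merged[key] = (c + cnt, s + byte_sum, f | flags)
    return merged
```

With 8 nodes, one flow can arrive as up to 8 partials with the same key; each extra partial adds work at the aggregator, which is why a partitioning that keeps a flow on a single node avoids the merge entirely.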

22 Distributed Multi-Query Optimization
Straightforward to compute a good partitioning for simple individual queries
 – Similar to finding a compatible sampling method
 – Prior work in the parallel database and networking communities
We run a query set, not a single query
 – Network monitoring query sets are large and complex, leading to conflicting partitioning requirements
 – It may not be possible to be compatible with all queries, so we need to choose a globally optimal partitioning
How do we find an optimal partitioning strategy for the query set?
 – Repartitioning is not acceptable

23 Reconcile This

24 Query-Aware Partitioning
Find the best partitioning set for each query node
 – Partitioning set: the attributes used in the partitioning hash function
Reconcile partitioning sets
 – Find the largest partitioning set compatible with two queries; for simple cases the reconciled set is the intersection of the two sets
Find an optimal partitioning set for all queries
 – No single set is likely to be compatible with all queries
 – Dynamic programming algorithm to efficiently search the space of solutions
 – Goal: minimize the cost of copying tuples across the network
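
For the simple case, reconciliation can be sketched as a set intersection (a minimal Python sketch; the dynamic-programming search over candidate sets is omitted):

```python
def reconcile(partitioning_sets):
    """Reconcile per-query partitioning sets: in the simple case, the
    largest set compatible with all of the queries is the intersection
    of their individual partitioning sets."""
    sets = [set(s) for s in partitioning_sets]
    result = sets[0]
    for s in sets[1:]:
        result &= s
    return result
```

For example, a flow query partitioning on {srcIP, destIP, srcPort, destPort} and a per-source query needing {srcIP} reconcile to {srcIP}, which keeps each query's groups on a single node. An empty intersection means no single hash is compatible with both queries, which is where the globally optimal search comes in.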

25 Query-Oblivious vs. Query-Aware Partitioning
[Charts comparing query-oblivious and query-aware partitioning]

26 Live OC-768 Feed Experiments
OC-768 monitor deployed in the NY54 switching center
Live internet backbone traffic captured using an optical splitter on an OC-768 router link
 – OC-768 traffic is split into four 10GigE lines
 – Each line is monitored by a separate dual-core 3 GHz Xeon server with 4 GB of RAM, running Linux 2.4.21
 – Servers are connected by a Gigabit LAN
One direction of the OC-768 link was monitored, observing approximately 1.6 million packets per second (about 7.3 Gbit/sec)

27 Computing HTTP Traffic Flows

28 Computing TCP Jitter (self-join)

29 Even Earlier Data Reduction
Push predicates and projection operators into the NIC
Adapt operators to NIC capabilities
 – Snap length: transfer only the initial portion of a packet to the server
 – TCAM: high-speed classification hardware
Push common predicates, sampling, and partitioning into the NIC
Even below the NIC?
[Diagram: NIC feeding the prefilter and queries Q1–Q3]

30 TCAM Programming
Ternary Content Addressable Memory
 – Each memory line stores a rule with an associated action
Relatively easy to program; actions can be assigned whenever we have a match in the CAM
 – We use it to implement partitioning, sampling, and tuple filtering
 – Also possible on the 10GigE card
 – Enables parallel processing at the low level
Need to translate the partitioning set into TCAM rules
 – Also selection predicates and sampling rules
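
Translating a hash partitioning into TCAM-style rules might look like this. This is a hypothetical sketch; real TCAM rule formats and programming interfaces are device-specific.

```python
def tcam_partition_rules(num_bits):
    """Hypothetical translation of a partitioning into TCAM rules.
    A rule is (value, mask, action); a header field matches when
    field & mask == value. Here we match the low `num_bits` bits of a
    32-bit field (e.g. srcIP) to fan packets out to 2**num_bits nodes."""
    mask = (1 << num_bits) - 1  # ternary mask: only the low bits are "care" bits
    return [(v, mask, "node%d" % v) for v in range(1 << num_bits)]

def tcam_lookup(rules, field_value):
    """First-match semantics, mimicking a TCAM's priority order."""
    for value, mask, action in rules:
        if field_value & mask == value:
            return action
    return None
```

Selection predicates and sampling rules would compile to further (value, mask) lines with drop/keep actions ahead of the partitioning rules.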

31 Conclusions
First and only OC-768 monitor in existence
 – The glory and the pain of all first adopters
DSMS technology is necessary to scale to these levels of data rates and query complexity
 – Multi-query optimization
 – Intelligent sampling for load shedding
 – Distributed query optimization
 – TCAM programming ...
Successfully deployed in the AT&T core network
 – Many techniques developed for OC-768 are now widely used in Mobility, ICDS, and other deployments
