Building OC-768 Monitor using GS Tool
Vladislav Shkapenyuk, Theodore Johnson, Oliver Spatscheck
June 2009

GS Tool Overview
High-performance data stream processing
  – Monitor network traffic using SQL queries.
  – Rapid development of complex monitoring systems, with automated complex optimizations.
Joint project between Database and Networking Research
  – A new model of database processing, which became popular around that time.
  – Overcame snorts of derision.
Currently the standard DPI solution in AT&T.

Current DPI Deployments
Backbone router monitoring
  – Mandatory for OC-768
  – 2 OC-768 probes in NY54, one in Orlando and St. Louis
  – Emergency deployments:
      – debug routing problems (incorrect packet forwarding)
      – SNMP pollers hitting routers too hard
ICDS probes
  – Using the OC-768 probe in New York
  – Track RTT and loss rates for all eyeballs
DNS monitoring
  – Identify performance problems and attacks
DPI/Mobility
  – 2 probes monitoring mobility data, soon to be 8. Each observes 700k packets/sec: 2 GB/minute, 3 TB/day, 1 PB/year
  – Will monitor mobility traffic in the core network once Mobility makes the transition

DPI Deployments (cont.)
DPI/DSL
  – 22 probes monitoring access routers
  – E.g., usage billing for heavy users (Reno)
DPI/Lightspeed
  – Debugging Microsoft software
  – Proactive video quality management
Smersh
  – Monitor all FP internet traffic
Many others in deployment or trial
  – EMEA VPN, Project Madison, Project Gemini

GS Tool Architecture
Lightweight architecture
  – Generated code
  – Templatized operators
Avoid data copies
Early data reduction
  – Prefilter
  – Low-level queries: filtering, transformation, (partial) aggregation
  – Push operators into the NIC (network interface card): filtering and transformation
[Figure: ring buffer, prefilter, low-level queries q1–q4, RTS, queries Q1–Q4]
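The split between low-level queries doing partial aggregation and higher-level queries finishing the job can be sketched as follows. This is a minimal illustration, not GS Tool code: it assumes the low-level query keeps a tiny direct-mapped table and, on a collision, evicts the resident partial count downstream rather than growing the table; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Iterator, List, Optional, Tuple

Entry = Tuple[Hashable, int]

def low_level_count(keys: Iterable[Hashable], slots: int = 4) -> Iterator[Entry]:
    """Partial count(*) per key using a tiny direct-mapped table.

    When a new key collides with the resident one, the resident partial
    aggregate is emitted downstream instead of growing the table.
    """
    table: List[Optional[Entry]] = [None] * slots
    for key in keys:
        i = hash(key) % slots
        entry = table[i]
        if entry is not None and entry[0] == key:
            table[i] = (key, entry[1] + 1)  # same group: update in place
        else:
            if entry is not None:
                yield entry                  # evict the partial aggregate
            table[i] = (key, 1)
    for entry in table:                      # flush at end of stream / time bucket
        if entry is not None:
            yield entry

def high_level_count(partials: Iterable[Entry]) -> Dict[Hashable, int]:
    """Merge partial counts into exact per-key totals."""
    totals: Counter = Counter()
    for key, n in partials:
        totals[key] += n
    return dict(totals)
```

The low-level stage emits fewer records than it reads whenever keys repeat, yet the merged result is exact: for `keys = [1, 1, 2, 1, 5, 5, 5, 9, 1]`, `high_level_count(low_level_count(keys))` gives counts of 4, 1, 3 and 1 for keys 1, 2, 5 and 9.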

Scaling Issues
Exploding data rates
  – The GS architecture worked well up to OC-48 rates (2.5 Gbit/sec)
  – 10 GigE is a new standard: need automatic parallelization for multi-core Gigascope
  – OC-768 links carry 2 × 40 Gbit/sec of traffic (112 million packets/sec)
      – 26 cycles per tuple on a 4 GHz machine
      – exceeds the capabilities of modern computer buses
      – need traffic partitioning / cluster computing
Exploding query sets
  – I used to think that 50 queries was a lot
  – Now up to 170 on some major apps
Exploding query complexity
  – Complex joins
  – Complex aggregation group definitions (TCP flows, etc.)
  – Alerting
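The per-packet CPU budget is simply clock rate divided by packet rate. The sketch below assumes a single core must handle every packet at the 112 million packets/sec quoted on the slide; the exact figure depends on the clock rate assumed, but it comes out to only a few dozen cycles per packet either way.

```python
def cycles_per_packet(clock_hz: float, packets_per_sec: float) -> float:
    """CPU cycles available per packet if one core must keep up with the link."""
    return clock_hz / packets_per_sec

OC768_PPS = 112e6  # packet rate quoted on the slide

for ghz in (3, 4):
    budget = cycles_per_packet(ghz * 1e9, OC768_PPS)
    print(f"{ghz} GHz: {budget:.1f} cycles/packet")
# 3 GHz: 26.8 cycles/packet
# 4 GHz: 35.7 cycles/packet
```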

[Figure: central office deployment diagram. Labeled components include CRS-1 and Avici routers, 1 OC-768 link (T and R fibers), splitters, LLGX (OC-768 -> OC-192), ULH, T3, 4 10GE links, a management router, FE, CBB, an NM cabinet with OOB connectivity, and a data warehouse with users in the same rack. Graph provided by Fred Imbrogno.]
Notes on the diagram:
  – n <= 8 10GE fibers, with ends hanging in the rack, ready to be connected to Gigascope ports to analyze 10GE data
  – 1 Gigascope = CPU with an OC-192/10GE card, an FE port, and the Gigascope application
  – Gigascope would be deployed on only one end of each of the two OC-768s planned in CI.

OC-768 Probe Architecture

First Prototype – Four Dwarfs

Current Version
[Photos: Splitter Box; Splitter Box (back)]

OC-768 Probe in Detail (cont.)
[Photos: GS Management Cabinet; GS OC-768 Probe]

Technical Challenges
Liveness management
Multi-query optimization
  – avoid the cost of invoking queries needlessly
  – combine common predicates
Load shedding
  – query-aware load shedding that returns meaningful results
Distributed query optimization
  – query-aware stream partitioning for distributed evaluation
Pushing operators into the NIC and below
  – using TCAMs for filtering, sampling, and partitioning

Earlier Data Reduction
Prefilter
  – Move predicates below query invocation: avoid the cost of invoking queries that will reject the record
  – With many queries and a high packet rate, procedure invocation is expensive
Example of multi-query optimization
  – goes beyond sharing common predicates
[Figure: NIC, prefilter, low-level queries q1–q4, queries Q1–Q3]

Step 1: Gather cheap predicates
    Select time, timestamp, len, caplen, raw_packet_data
    From [link1].packet
    Where len <= 47

    Select time, len, timewarp, raw_packet_data
    From IPV4
    Where MPLS_is_valid_1 = FALSE and ipversion = 4 and
      ( // IP + PPP type
        hdr_length > len-2 or total_length > len-2)

Predicate                                     Appears In
len <= 47                                     1
MPLS_is_valid_1 = FALSE                       2
ipversion = 4                                 2
hdr_length > len-2 or total_length > len-2    2

Step 2: Assign predicates to bits
Predicate                                     Bit
len <= 47                                     0
MPLS_is_valid_1 = FALSE                       1
ipversion = 4                                 2
hdr_length > len-2 or total_length > len-2    3

Step 3: Assign signatures to queries
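The three steps can be sketched as a bitmask prefilter: evaluate each cheap predicate once per packet, pack the results into a bitmask using the bit assignment above, and invoke a query only if every bit in its signature is set. This is an illustrative sketch, not the GS Tool implementation. It assumes packets are plain dicts, and it reads the "Appears In" column as query numbers, so the signatures (query 1 uses bit 0, query 2 uses bits 1 to 3) are a hypothetical assignment.

```python
from typing import Callable, Dict, List

Packet = Dict[str, int]  # packet fields modeled as a dict (assumption)

# Cheap predicates from Step 2, listed in bit order.
PREDICATES: List[Callable[[Packet], bool]] = [
    lambda p: p["len"] <= 47,                                   # bit 0
    lambda p: not p["MPLS_is_valid_1"],                         # bit 1
    lambda p: p["ipversion"] == 4,                              # bit 2
    lambda p: (p["hdr_length"] > p["len"] - 2
               or p["total_length"] > p["len"] - 2),            # bit 3
]

def predicate_bits(pkt: Packet) -> int:
    """Evaluate every cheap predicate once and pack the results into a bitmask."""
    mask = 0
    for bit, pred in enumerate(PREDICATES):
        if pred(pkt):
            mask |= 1 << bit
    return mask

# Step 3 (hypothetical signatures): the bits each query requires.
SIGNATURES = {"q1": 0b0001, "q2": 0b1110}

def prefilter(pkt: Packet) -> List[str]:
    """Return only the queries whose required predicates all hold."""
    mask = predicate_bits(pkt)
    return [q for q, sig in SIGNATURES.items() if (mask & sig) == sig]
```

A 40-byte IPv4 packet sets all four bits and reaches both queries; a 1500-byte IPv4 packet fails bit 0 and reaches only `q2`; an IPv6 packet reaches neither, so no query procedure is invoked for it at all.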

DSL Query Set
Doubled maximum throughput on the DSL query set

Load Shedding
Best to never drop input records
  – Sometimes we have no choice (e.g., during a DDoS attack)
Second best is to shed load gracefully
Problem: query-oblivious load shedding destroys query results
Solution: analyze the query set to determine an acceptable sampling strategy
    Select tb, SrcIP, DstIP, SrcPort, DstPort, count(*), OR(flags)
    From TCP
    Group by time/60 as tb, SrcIP, DstIP, SrcPort, DstPort

Query-Aware Load Shedding
Sampling methods
  – Per-tuple
  – Per-group
Strongly compatible
  – The output of the query over a sample is a sample of the output of the query over the entire stream
Weakly compatible
  – "Good approximations" are acceptable for aggregates (e.g., SUM has a good approximation)
Per-group sampling
  – Define the group as a subset of the partitioning fields:
      – Aggregation: group-by fields
      – Join: equi-join fields
Reconcile the sampling choices at the leaves
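The difference between the two sampling methods can be seen on the flow query from the previous slide. In this sketch (illustrative only; packets are reduced to integer tuples, and CRC32 of the key's text form is an assumed stand-in for the sampling hash), per-tuple sampling keeps every flow but corrupts its count, while per-group sampling keyed on a subset of the group-by fields keeps or drops each flow whole, so every surviving output row is exactly the row the full query would produce (strong compatibility).

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (time, SrcIP, DstIP, SrcPort, DstPort, flags); ints stand in for addresses.
Pkt = Tuple[int, int, int, int, int, int]
GroupKey = Tuple[int, int, int, int, int]  # (tb, SrcIP, DstIP, SrcPort, DstPort)

def flow_query(pkts: Iterable[Pkt]) -> Dict[GroupKey, Tuple[int, int]]:
    """count(*) and OR(flags), grouped by time/60 and the connection 4-tuple."""
    acc: Dict[GroupKey, List[int]] = defaultdict(lambda: [0, 0])
    for t, src, dst, sport, dport, flags in pkts:
        agg = acc[(t // 60, src, dst, sport, dport)]
        agg[0] += 1
        agg[1] |= flags
    return {k: (n, f) for k, (n, f) in acc.items()}

def per_tuple_sample(pkts: List[Pkt], every: int) -> List[Pkt]:
    """Query-oblivious shedding: keep every `every`-th packet."""
    return pkts[::every]

def per_group_sample(pkts: List[Pkt], percent: int) -> List[Pkt]:
    """Query-aware shedding: hash SrcIP, DstIP, SrcPort, DstPort (a subset of
    the group-by fields) so each group is kept or dropped as a unit."""
    kept = []
    for p in pkts:
        bucket = zlib.crc32(repr(p[1:5]).encode()) % 100
        if bucket < percent:
            kept.append(p)
    return kept
```

With 20 flows of 6 packets each, `per_tuple_sample(pkts, 2)` still yields 20 flows, but every count drops from 6 to 3; `per_group_sample(pkts, 50)` yields fewer flows, each with its exact count and flags.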

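Reconciliation
    Q1: Select R.a, S.b From R, S Where R.c = S.d
    Q2: Select c, e, count(*) From R Group By c, e
    R:  Select a, c, e From TCP Where Pr
    S:  Select b, d From TCP Where Ps
[Figure: query plan with Q1 and Q2 over R and S, annotated with sampling fields: Q1 needs c on R and d on S, Q2 needs c, e on R; reconciled at the leaves to c on R and d on S]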
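For the simple example above, reconciling at the leaves amounts to intersecting the field sets each query demands of the same input stream. The sketch below hand-encodes the slide's requirements in a hypothetical dictionary format. It deliberately ignores the extra constraint that R.c and S.d must be hashed the same way for the join to stay compatible, which is one reason reconciliation is not trivial in general.

```python
from typing import Dict, Set

Requirements = Dict[str, Dict[str, Set[str]]]  # query -> leaf -> fields

# Hand-derived from the slide's example (hypothetical encoding).
REQUIREMENTS: Requirements = {
    "Q1": {"R": {"c"}, "S": {"d"}},   # join predicate R.c = S.d
    "Q2": {"R": {"c", "e"}},          # Group By c, e over R
}

def reconcile_leaves(reqs: Requirements) -> Dict[str, Set[str]]:
    """Intersect all field sets demanded of each leaf stream.

    An empty set for a leaf means no per-group sampling of that leaf is
    compatible with every query above it.
    """
    leaves: Dict[str, Set[str]] = {}
    for per_leaf in reqs.values():
        for leaf, fields in per_leaf.items():
            if leaf in leaves:
                leaves[leaf] &= fields
            else:
                leaves[leaf] = set(fields)
    return leaves
```

Here `reconcile_leaves(REQUIREMENTS)` yields `{"R": {"c"}, "S": {"d"}}`: sampling R on c satisfies both Q1 and Q2, since c is a subset of Q2's grouping fields.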

Why Reconciliation Isn’t Trivial

Distributed Query Optimization
The OC-768 monitor partitions 40 Gbit/s streams into four 10 Gbit/s streams
  – partitioning is done by Hydra in hardware, so there is only one shot at partitioning
  – each GS server monitors a single 10GE link
The query evaluation plan depends on the method of partitioning
  – result reintegration is cheap if the partitioning is compatible
  – some partitioning strategies are much better than others
Example: traffic flow computation on 8 nodes
    SELECT tb, srcIP, destIP, srcPort, destPort, count(*) as cnt, sum(len), OR_AGGR(flags), other_stats …
    FROM TCP
    GROUP BY time/10 as tb, srcIP, destIP, srcPort, destPort;
  – in the worst case, a single flow will result in 8 partial flows
  – partial results need to be combined, so the load on the final aggregator node can exceed the load on a centralized system
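The worst case above can be reproduced by counting the partial flows the final aggregator must merge under two routing policies. This is a simplified model, not Hydra's actual behavior: flow keys are integer 4-tuples, all packets are assumed to fall into one time bucket, and Python's built-in `hash` stands in for the partitioning hash function. Round-robin spraying is query-oblivious; hashing on any nonempty subset of the group-by fields keeps each flow on one node.

```python
from typing import Callable, List, Tuple

FlowKey = Tuple[int, int, int, int]      # srcIP, destIP, srcPort, destPort
Router = Callable[[int, FlowKey], int]   # (packet index, key) -> node id

def round_robin(nodes: int) -> Router:
    """Query-oblivious: spray packets across nodes regardless of content."""
    return lambda i, key: i % nodes

def hash_on(fields: Tuple[int, ...], nodes: int) -> Router:
    """Query-aware: hash only the chosen positions of the flow key."""
    return lambda i, key: hash(tuple(key[f] for f in fields)) % nodes

def partial_flows(keys: List[FlowKey], route: Router) -> int:
    """Distinct (node, flow) partial aggregates the final aggregator must merge."""
    return len({(route(i, k), k) for i, k in enumerate(keys)})
```

For 3 flows of 16 consecutive packets each on 8 nodes, round-robin produces 24 partial flows (8 per flow), while hashing on the full 4-tuple, or on just srcIP and destIP, produces 3.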

Distributed Multi-query Optimization
Straightforward to compute a good partitioning for simple individual queries
  – similar to finding a compatible sampling method
  – prior work in the parallel database and networking communities
We run a query set, not a single query
  – Network monitoring query sets are large and complex, with conflicting partitioning requirements
  – It may not be possible to be compatible with all queries, so we need to choose a globally optimal partitioning
How do we find an optimal partitioning strategy for the query set?
  – Repartitioning is not acceptable.

Reconcile This

Query-aware partitioning
Find the best partitioning set for each query node
  – Partitioning set: the attributes used in the partitioning hash function
Reconcile partitioning sets
  – Find the largest partitioning set compatible with two queries
  – In simple cases, the reconciled set is the intersection of the two sets
Find an optimal partitioning set for all queries
  – No single set is likely to be compatible with all queries
  – A dynamic programming algorithm efficiently searches the space of solutions
  – Goal: minimize the cost of copying tuples across the network
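The optimization goal can be illustrated with a toy cost model. This is a brute-force search over candidate partitioning sets, not the dynamic programming algorithm mentioned on the slide. The query set and the copy costs are hypothetical. The sketch assumes a query is compatible with a partitioning set when that set is a nonempty subset of the query's own best set, and that an incompatible query pays a fixed cost for copying its tuples across the network.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

# Hypothetical query set: best partitioning set and the cost of copying its
# tuples across the network if the chosen partitioning is incompatible.
QUERIES: Dict[str, Tuple[FrozenSet[str], int]] = {
    "flows":   (frozenset({"srcIP", "destIP", "srcPort", "destPort"}), 100),
    "per_src": (frozenset({"srcIP"}), 40),
    "pairs":   (frozenset({"srcIP", "destIP"}), 30),
    "per_dst": (frozenset({"destIP"}), 50),
}

def compatible(p: FrozenSet[str], best: FrozenSet[str]) -> bool:
    """Hashing on a nonempty subset of a query's set keeps each group on one node."""
    return bool(p) and p <= best

def copy_cost(p: FrozenSet[str], queries=QUERIES) -> int:
    """Total copy cost of all queries that are incompatible with p."""
    return sum(cost for best, cost in queries.values() if not compatible(p, best))

def best_partitioning(queries=QUERIES) -> FrozenSet[str]:
    """Exhaustively try every nonempty attribute subset; prefer larger sets on ties."""
    attrs = sorted(set().union(*(best for best, _ in queries.values())))
    candidates = [frozenset(c) for r in range(1, len(attrs) + 1)
                  for c in combinations(attrs, r)]
    return min(candidates, key=lambda p: (copy_cost(p, queries), -len(p), sorted(p)))
```

With these made-up costs, partitioning on `destIP` wins (only `per_src` must be copied, cost 40), narrowly beating `srcIP` (cost 50). Exhaustive search grows exponentially with the number of attributes, which is why a smarter search is needed for real query sets.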

Query-oblivious vs Query-aware Partitioning
[Figures: Query-oblivious; Query-aware]

Live OC-768 Feed Experiments
OC-768 monitor deployed in the NY54 switching center.
Live internet backbone traffic captured using an OC-768 router optical splitter.
  – OC-768 traffic is split into four 10GigE lines
  – each line is monitored by a separate dual-core 3 GHz Xeon server with 4 GB of RAM, running Linux
  – servers are connected by a Gigabit LAN
One direction of the OC-768 link was monitored, observing approximately 1.6 million packets per second (about 7.3 Gbit/sec).
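The two observed rates imply an average packet size, and a per-server load if traffic were spread evenly across the four servers. The even split is an assumption for illustration; the actual split depends on the partitioning.

```python
def avg_packet_bytes(bits_per_sec: float, packets_per_sec: float) -> float:
    """Mean packet size implied by a bit rate and a packet rate."""
    return bits_per_sec / packets_per_sec / 8

OBSERVED_BPS = 7.3e9   # from the slide
OBSERVED_PPS = 1.6e6   # from the slide
SERVERS = 4

print(avg_packet_bytes(OBSERVED_BPS, OBSERVED_PPS))  # 570.3125 bytes
print(OBSERVED_PPS / SERVERS)  # 400000.0 packets/sec per server (even split)
```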

Computing HTTP Traffic Flows

Computing TCP Jitter (self-join)

Even Earlier Data Reduction
Push predicates and projection operators into the NIC
Adapt operators to NIC capabilities
  – Snap length: transfer only the initial portion of a packet to the server
  – TCAM: high-speed classification hardware
Push common predicates, sampling, and partitioning into the NIC
Even below the NIC?
[Figure: NIC, prefilter, low-level queries q1–q4, queries Q1–Q3]

TCAM Programming
Ternary Content Addressable Memory
  – each memory line stores a rule with an associated action
Relatively easy to program: an action can be assigned to any match in the CAM
  – we are using it to implement partitioning, sampling, and tuple filtering
  – also possible on the 10GigE card
  – enables parallel processing at the low level
Need to translate the partitioning set into TCAM rules
  – also selection predicates and sampling rules
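A software model of ternary matching shows how a partitioning set, a selection predicate, and sampling can all become TCAM rules. This is a hypothetical illustration: each rule is a value/mask pair (mask bit 0 means "don't care"), and the first matching rule wins. A real TCAM compares all entries in parallel; the loop only emulates the priority order. Because a plain ternary match cannot compute a hash, the sketch partitions on the low bits of a 32-bit destination address as a stand-in for the partitioning hash.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class TernaryRule:
    value: int   # bit pattern to compare
    mask: int    # 1 = care, 0 = don't care (the "X" in ternary matching)
    action: str  # hypothetical action name

def tcam_lookup(rules: List[TernaryRule], key: int) -> Optional[str]:
    """Return the action of the first (highest-priority) matching rule."""
    for rule in rules:
        if (key & rule.mask) == (rule.value & rule.mask):
            return rule.action
    return None

def partition_rules(bits: int) -> List[TernaryRule]:
    """One rule per output link, matching the low `bits` bits of the key."""
    mask = (1 << bits) - 1
    return [TernaryRule(v, mask, f"port{v}") for v in range(1 << bits)]

# A filter rule placed ahead of four partitioning rules (hypothetical).
RULES = [TernaryRule(0x0A000000, 0xFF000000, "drop")] + partition_rules(2)
```

Here `tcam_lookup(RULES, 0x0A000001)` returns `"drop"`, while `0xC0A80003` goes to `"port3"` and `0xC0A80004` to `"port0"`. Sampling fits the same scheme: installing only the rule for `port0` would keep one quarter of the addresses.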

Conclusions
First and only OC-768 monitor in existence
  – the glory and the pain of all first adopters
DSMS technology is necessary to scale to these levels of data and query complexity
  – multi-query optimization
  – intelligent sampling for load shedding
  – distributed query optimization
  – TCAM programming …
Successfully deployed in the AT&T core network
  – many techniques developed for OC-768 are now widely used in Mobility, ICDS, and other deployments