Supporting real-time & offline network traffic analysis Chung-Min Chen Munir Cochinwala Allen Mcintosh Marc Pucci Telcordia Technologies Applied Research.


1 Supporting real-time & offline network traffic analysis
Chung-Min Chen, Munir Cochinwala, Allen Mcintosh, Marc Pucci
Telcordia Technologies Applied Research, Morristown, NJ, USA

2 Outline
• OSS Requirements
• Work Proposal
• Stream Data Management Issues
• Traffic Warehouse
• Tribeca: a stream database manager

3 OSS Requirements
OSS                             Data time frame / resp. time
Traffic control & monitoring    seconds – minutes
Service level agreement         15 min. – hours
Capacity planning               weeks – months

4 Work proposal (system overview)
[Figure: system overview diagram. Labels: EMS, tcpdump, SNMP agent (x2), adaptor, R (x2), BPF, Stream Engine, LAN, WAN, Live SQL, Live Monitor client (x3), DBMS, Warehouse]

5 Real-time traffic analysis: state-of-industry
• Ad hoc or canned programs/scripts
  – Slow deployment
  – No data sharing
  – Hard to maintain, with little reuse
• Traditional DBMS:
  – Can it keep up with high line speeds (e.g., OC48)?
  – Cumbersome to program (write into DB, then query)
  – Semantic mismatch between “stream” and “relation”

6 Stream Data Management
• “stream” as a first class object (like “relation”)
• Stream:
  – a continuous, unbounded sequence of records with a total ordering
• Issues
  – Stream algebra
  – Data types
  – Query language
  – Implementation

7 Stream Algebra
• Operators:
  – Selection: relatively easy
  – Join: can be defined nicely (assuming unbounded buffer)
  – Demultiplex/multiplex: the result could be multiple streams
• Operands:
  – Stream + stream
  – Stream + relation
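The operators above can be sketched with plain Python iterators. This is only an illustration under stated assumptions: the record layout, the function names (select, demux, join_with_relation) and the customers table are hypothetical and do not come from the slides. It shows selection, a demultiplex that yields one sub-stream per key, and the stream + relation case of join; the stream + stream case is left out because it needs the buffering discussed on the later slides.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List


def select(stream: Iterable[dict], pred: Callable[[dict], bool]) -> Iterator[dict]:
    """Selection: stateless, each record is either passed on or discarded."""
    return (rec for rec in stream if pred(rec))


def demux(stream: Iterable[dict], key: str) -> Dict[object, List[dict]]:
    """Demultiplex: split one stream into one sub-stream per key value.

    The sub-streams are materialized as lists only because the input
    in this sketch is finite.
    """
    out = defaultdict(list)
    for rec in stream:
        out[rec[key]].append(rec)
    return dict(out)


def join_with_relation(stream: Iterable[dict], relation: Dict[object, dict], key: str) -> Iterator[dict]:
    """Stream + relation join: look each record up in a static table."""
    for rec in stream:
        match = relation.get(rec[key])
        if match is not None:
            yield {**rec, **match}


# Hypothetical sample data.
packets = [
    {"src": "10.0.0.1", "size": 40},
    {"src": "10.0.0.2", "size": 1500},
    {"src": "10.0.0.1", "size": 576},
]
customers = {"10.0.0.1": {"customer": "A"}}

large = list(select(packets, lambda r: r["size"] > 250))
by_src = demux(packets, "src")
joined = list(join_with_relation(packets, customers, "src"))
```

Selection and the relation join keep no state per record, which is why they are the easy cases; demux is the operator whose result type is "several streams" rather than one.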

8 Data Types
• BLOB:
  – leaves the burden to the application developers
• Conventional relational data types:
  – Need “adaptors” to convert from raw types to relational types
• Native support for structured binary object (SBO)
  – Separate fields at “bit” level
  – Most flexible & efficient, but requires re-implementation of the database type system
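As an illustration of what separating fields at the bit level involves, here is a minimal sketch that unpacks the 5-byte ATM UNI cell header (GFC 4 bits, VPI 8, VCI 16, PT 3, CLP 1, HEC 8). The choice of the ATM header as the example and the function name parse_atm_uni_header are assumptions made for this sketch; the slides do not show how an SBO type would be declared.

```python
def parse_atm_uni_header(header: bytes) -> dict:
    """Split a 5-byte ATM UNI cell header into its bit fields."""
    if len(header) != 5:
        raise ValueError("ATM cell header must be exactly 5 bytes")
    # The first four bytes hold GFC, VPI, VCI, PT and CLP, most significant bit first.
    word = int.from_bytes(header[:4], "big")
    return {
        "gfc": (word >> 28) & 0xF,     # generic flow control, 4 bits
        "vpi": (word >> 20) & 0xFF,    # virtual path identifier, 8 bits
        "vci": (word >> 4) & 0xFFFF,   # virtual channel identifier, 16 bits
        "pt": (word >> 1) & 0x7,       # payload type, 3 bits
        "clp": word & 0x1,             # cell loss priority, 1 bit
        "hec": header[4],              # header error control, 8 bits
    }
```

None of the fields except HEC falls on a byte boundary, which is the situation where a BLOB pushes the shifting and masking onto every application and a relational adaptor has to copy each field into its own column.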

9 Stream Query Language
• How to handle multi-stream output, e.g. group-by?
    select avg(ip_stream.packet_size)
    from ip_stream
    group by ip_stream.source_ip_addr
• How to handle indefinite waiting in a join?
    select * from s1, s2 where s1.packet_id = s2.packet_id
• Time window clause, temporal attributes/operators, …
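One possible reading of the time window clause, sketched in Python: the group-by emits one result per fixed window instead of waiting for the end of an unbounded stream, and the join discards records that have waited longer than a bound. The tuple layouts, the function names and the max_wait parameter are assumptions for illustration, not syntax proposed in the slides.

```python
from collections import defaultdict


def windowed_group_avg(records, window_sec):
    """Yield (window_start, {source: avg_size}) once per fixed time window.

    Assumes records arrive in timestamp order as (ts, source, size) tuples.
    """
    current = None
    sums = defaultdict(lambda: [0, 0])  # source -> [total size, record count]
    for ts, source, size in records:
        start = ts - (ts % window_sec)
        if current is not None and start != current:
            yield current, {s: t / n for s, (t, n) in sums.items()}
            sums.clear()
        current = start
        sums[source][0] += size
        sums[source][1] += 1
    if current is not None:
        yield current, {s: t / n for s, (t, n) in sums.items()}


def window_join(events, max_wait):
    """Join s1 and s2 on packet_id, giving up on records older than max_wait.

    events: time-ordered (ts, stream_name, packet_id) tuples, where
    stream_name is "s1" or "s2". Yields (packet_id, match_time).
    """
    pending = {"s1": {}, "s2": {}}  # per stream: packet_id -> arrival time
    for ts, name, pid in events:
        # Evict unmatched records that have waited too long.
        for side in pending.values():
            for old in [p for p, t in side.items() if ts - t > max_wait]:
                del side[old]
        other = "s2" if name == "s1" else "s1"
        if pid in pending[other]:
            del pending[other][pid]
            yield pid, ts
        else:
            pending[name][pid] = ts
```

Both functions trade completeness for bounded state: a late s2 record whose partner was already evicted is never matched, which is the incomplete-query problem listed under implementation issues.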

10 Implementation Issues
• Bounded buffer management
• Time-constrained query processing: must beat the buffer refresh rate
• Storage & I/O bandwidth requirement (OC48 or higher?)
• Migration of data & processing to disk
• Data loss & incomplete query
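To make the buffer and bandwidth concerns concrete, here is a small sketch: a bounded buffer that counts the records it has to drop, and a helper that estimates how long a buffer lasts when data arrives faster than it is processed. The class, the function and the example buffer size are hypothetical; the only external fact used is the OC-48 line rate of 2488.32 Mbit/s.

```python
from collections import deque

OC48_BITS_PER_SEC = 2_488_320_000  # OC-48 line rate: 48 x 51.84 Mbit/s


class BoundedBuffer:
    """FIFO buffer with a fixed capacity that drops new records when full."""

    def __init__(self, capacity: int) -> None:
        self._queue = deque()
        self.capacity = capacity
        self.dropped = 0  # records lost because the consumer fell behind

    def put(self, record) -> bool:
        if len(self._queue) >= self.capacity:
            self.dropped += 1
            return False
        self._queue.append(record)
        return True

    def get(self):
        """Return the oldest record, or None if the buffer is empty."""
        return self._queue.popleft() if self._queue else None


def seconds_until_full(buffer_bytes: float, in_rate: float, out_rate: float) -> float:
    """Time until overflow, with arrival and drain rates in bytes per second."""
    if in_rate <= out_rate:
        return float("inf")
    return buffer_bytes / (in_rate - out_rate)


line_rate_bytes = OC48_BITS_PER_SEC / 8  # about 311 MB/s at full line rate
```

For example, seconds_until_full(64 * 2**20, line_rate_bytes, 7_000_000) models a 64 MiB buffer drained at 7 MB/s (the upper Tribeca rate quoted on a later slide) and gives roughly 0.22 seconds, assuming the link is saturated. That worst case is why the slide pairs bounded buffers with data loss and incomplete queries.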

11 Traffic Warehouse
• Repository of traffic data for off-line analysis
• Efficient navigation across protocol stack & other business table dimensions
• Storage (cluster, parallelism)
• Distributed warehouse approach
• Chen et al. [SIGMOD2000]:
  – HTTP, FTP, TCP, IP
  – tcpdump, HTTP server logs
• Caceres et al. [IEEE Comm. 2000]: AT&T WorldNet data warehouse

12 Tribeca* [VLDB96, USENIX98]
• Single stream input (no “join”)
• Supported operators:
  – Selection
  – Projection
  – Aggregates
  – Mux/demux multi-stream output
  – Time window
• User-defined data type and extraction functions (in C)
• Tested on ATM cell traces
• Achieved 5-7 MB/s (30-40k rec/s) processing rate on a Sun Sparc10
*former contributors: M. Sullivan, Y. Saraiya, A. Heybey

13 Tribeca: example query
[Figure: ATM VCs (virtual circuits)]
• Q1: Count the accumulated number of large IP packets (> 250 bytes) transmitted over the link.
• Q2: Find the number & avg length of TCP/IP packets for every successive 5-second time window. Save to a file.

14 Tribeca: example query
    source_stream s1 is {live, “atm_link_1476”, AtmCellTrace}
    result_stream r1 is {file “res1”}
    stream_demux {s1.atm.vci} p1
[Figure: data-flow diagram. Labels: atm cells, s1, demux on VCI, p1]

15 Tribeca: example query
    source_stream s1 is {live, “atm_link_1476”, AtmCellTrace}
    result_stream r1 is {file “res1”}
    stream_demux {s1.atm.vci} p1
    stream_proj {{p1.assemble_ip}} p2
    stream_mux p2 p3
[Figure: data-flow diagram. Labels: atm cells, s1, demux, assemble, extract, mux, p2, IP packets, p3]
assemble_ip is a user-defined function.

16 Tribeca: example query
    source_stream s1 is {live, “atm_link_1476”, AtmCellTrace}
    result_stream r1 is {file “res1”}
    stream_demux {s1.atm.vci} p1
    stream_proj {{p1.assemble_ip}} p2
    stream_mux p2 p3
    stream_qual {p3.length.geq 250} p4
    stream_agg {p4.count}
[Figure: data-flow diagram. Labels: atm cells, s1, demux, assemble, extract, mux, IP packets, length > 250, p4, count, display]

17 Tribeca: example query
    source_stream s1 is {live, “atm_link_1476”, AtmCellTrace}
    result_stream r1 is {file “res1”}
    stream_demux {s1.atm.vci} p1
    stream_proj {{p1.assemble_ip}} p2
    stream_mux p2 p3
    stream_qual {p3.length.geq 250} p4
    stream_agg {p4.count}
    stream_qual {{p3.type.eq TCP}} p5
    stream_agg {p5.count, p5.length.avg} on fixed window {5 sec} r1
[Figure: data-flow diagram. Labels: atm cells, s1, demux, assemble, extract, mux, IP packets, length > 250, count, display, type = TCP, p5, fixed 5 sec window, count, avg (length), r1 (save to file)]
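The data flow of this query can be imitated in a few lines of Python, purely to show what each stage computes. This is not Tribeca code. The cell tuple layout and the toy assemble_ip below are stand-ins invented for the sketch (the real assemble_ip is a user-defined function whose logic the slides do not show), and the size test follows the wording of Q1 (> 250 bytes).

```python
from collections import defaultdict


def assemble_ip(cells):
    """Demux cells by VCI, reassemble packets per VC, and mux them back together.

    Toy model: each cell is a (ts, vci, nbytes, proto, is_last) tuple, and a
    packet is the run of cells on one VCI ending with is_last=True.
    """
    partial = defaultdict(int)  # vci -> bytes accumulated for the open packet
    for ts, vci, nbytes, proto, is_last in cells:
        partial[vci] += nbytes
        if is_last:
            yield {"ts": ts, "length": partial.pop(vci), "type": proto}


def run_queries(cells, window_sec=5, min_len=250):
    """Return (Q1 count, Q2 list of (window_start, count, avg_length))."""
    large = 0
    windows = []
    cur, n, total = None, 0, 0
    for pkt in assemble_ip(cells):
        if pkt["length"] > min_len:  # Q1: running count of large IP packets
            large += 1
        if pkt["type"] != "TCP":     # Q2 only considers TCP packets
            continue
        start = int(pkt["ts"] // window_sec) * window_sec
        if cur is not None and start != cur:
            windows.append((cur, n, total / n))  # close the previous window
            n, total = 0, 0
        cur = start
        n += 1
        total += pkt["length"]
    if cur is not None:
        windows.append((cur, n, total / n))
    return large, windows
```

In this sketch a window with no TCP packets produces no row at all; whether Tribeca emits an empty result for such a window is not stated in the slides.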

18 Tribeca
• data type inheritance (IP - TCP, UDP)
• window: fixed vs. moving; user-defined delimiter
• record: fixed length, variable length, framing
• implementation optimization
  – dual buffers
  – minimize data copying: passing pointers instead
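The difference between a fixed and a moving window can be shown with a small sketch. It assumes count-based windows over an in-memory list for simplicity, whereas the example query uses a time-based window; the function names are hypothetical.

```python
from collections import deque


def fixed_window_avg(values, size):
    """One average per non-overlapping block of `size` records."""
    out = []
    for i in range(0, len(values) - size + 1, size):
        out.append(sum(values[i:i + size]) / size)
    return out


def moving_window_avg(values, size):
    """One average per record once the window is full; windows overlap."""
    window = deque(maxlen=size)  # the oldest record falls out automatically
    out = []
    for v in values:
        window.append(v)
        if len(window) == size:
            out.append(sum(window) / size)
    return out


print(fixed_window_avg([1, 2, 3, 4, 5, 6], 3))   # [2.0, 5.0]
print(moving_window_avg([1, 2, 3, 4, 5, 6], 3))  # [2.0, 3.0, 4.0, 5.0]
```

A fixed window can discard its state after each result, while a moving window has to retain the last `size` records, so it costs more buffer space per query.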

19 Related Activities
• CAIDA
• SLAC
• NLANR
• XIWT
  – AT&T, HP, Sun, Telcordia, …
  – passive Internet traffic collection at major Internet backbone routers

20 Related Work
• Tangram [Parker90,92]
  – a model that captures streams, sets and parallelism
  – more a state machine than a query language
• SEQ [Seshadri95,96]
  – static sequences
• Datacycle [Bowen92]
  – information filtering on broadcast data

