Published by Mervyn Stephens; modified over 9 years ago
1
A Data Stream Management System for Network Traffic Management
Shivnath Babu, Stanford University
Lakshminarayanan Subramanian, Univ. California, Berkeley
Jennifer Widom, Stanford University
NRDM, Santa Barbara, CA, May 25, 2001
2
Network Traffic Management
Large networks are growing complex and difficult to manage
– Increasing demands, overprovisioning, hardware changes, manual configuration
– Lack of information to configure the network for effective usage
Traffic management involves:
– Collecting data, e.g., packet traces, network-flow data, SNMP data
– Processing data, e.g., computing link utilization, per-hop delays, traffic demands
– Deploying mechanisms to control traffic, e.g., changing routing parameters
Network traffic management is becoming an important part of the Internet infrastructure
Data management forms a core part of traffic management
3
Traffic Management: Data Collection
Many data sources
– Packet and flow traces
– Router forwarding tables and configuration data
– SNMP data
– Active measurements of packet delay, link utilization
Data is collected continuously
– Networks must run 24/7, for everything
– Huge and fast-growing databases
Many current traffic management systems store collected data in file systems or data warehouses
4
Traffic Management: Data Processing
Sophisticated data processing is required
– Measuring link utilization: aggregate packet traces
– Maintaining network topology: join SNMP data from different network elements
– Deriving traffic demands: join network flow traces, router forwarding tables and configuration data, and SNMP data
– Anomaly detection, traffic modeling, traffic prediction, and many others
Most current traffic management systems process data using ad-hoc scripts or software toolkits
5
Challenge in Data Management: Online Data Processing
Most current traffic management applications process data offline
– Huge volume of data
– Complex processing involved
Offline processing is indeed appropriate for some applications
– E.g., capacity planning, determining pricing plans
Many traffic management applications need online processing
– E.g., congestion cause detection, resource allocation for guaranteed QoS, detecting denial-of-service attacks, detecting Service-Level Agreement violations, admission control and traffic policing
6
Online Processing
What’s wrong with using a file system and procedural processing?
– Difficult to maintain and reuse (not a long-term solution)
What’s wrong with using a Database Management System (DBMS)?
– A DBMS expects all data to be managed as persistent data sets
– A DBMS assumes “one-time” queries against stored, finite data
7
A Data Stream Management System (DSMS) for Online Processing
Data streams are the appropriate model for online processing
– Data changes frequently (often exclusively through insertions)
– It is impractical to operate on the same data multiple times
Continuous queries -- issued once and run “forever”
Performance
– Need continuous-query optimization
– Need adaptive query optimization
A Data Stream Management System for traffic management
– Idea: support online processing with continuous queries over data streams
8
A Data Stream Management System for Online Processing (cont’d)
[Figure: packet traces, flow traces, router forwarding tables, SNMP data, and active measurements arrive as streams at the Data Stream Management System, which answers continuous queries for applications based on online processing]
9
Continuous Query over a Single Data Stream
[Figure: Data Stream → Q → A?]
Many options, with different ramifications
Stream is infinite and append-only (e.g., packet traces)
– The size of A is unbounded for a filter query -- A cannot be stored
– Stream out A -- but a self-join query requires unbounded intermediate state to compute A
– Issue updates to tuples in A -- e.g., for an aggregation query
Stream has updates and deletions (e.g., SNMP data)
– These often require more intermediate state to compute A
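A minimal Python sketch of two of these options, assuming a hypothetical packet record with only a flow identifier and a length (neither field comes from the talk): a filter can stream each answer out immediately with no state, whereas a self-join must retain every packet seen so far because any future packet may match it.

```python
from typing import Iterable, Iterator, NamedTuple


class Packet(NamedTuple):
    # Hypothetical, minimal packet record used only for illustration.
    flow: str    # flow identifier
    length: int  # bytes


def filter_query(stream: Iterable[Packet], min_length: int) -> Iterator[Packet]:
    """Filter: every qualifying packet is a final answer, so it is streamed
    out at once and no intermediate state is kept."""
    for pkt in stream:
        if pkt.length >= min_length:
            yield pkt


def same_flow_self_join(stream: Iterable[Packet]) -> Iterator[tuple[Packet, Packet]]:
    """Self-join on flow: pairs each packet with every earlier packet of the
    same flow. Any past packet may match a future one, so all packets are
    retained and the intermediate state grows without bound."""
    seen: dict[str, list[Packet]] = {}
    for pkt in stream:
        for earlier in seen.get(pkt.flow, []):
            yield earlier, pkt
        seen.setdefault(pkt.flow, []).append(pkt)
```

Both functions are generators, so they can consume an unbounded stream lazily; only the self-join's `seen` table grows as packets arrive.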
10
Operator Architecture in a DSMS
Stream
– Append-only semantics: result tuples that won’t change later
– Update semantics: updates to the current result
Store: result tuples that could change later
Scratch: intermediate state needed to compute future results
Throw: unneeded data
11
Example Queries from Traffic Management
Single packet trace as the input data stream (IP headers over a link)
Continuous query 1: Link utilization (total #bytes sent over the link)
– Store -- sum of packet lengths
– Stream -- empty
– Scratch -- empty
Continuous query 2: Number of flows per protocol
[Figure: Packet Trace Stream → Flow Identifier (Scratch) → Per-Protocol #flows counter (Store)]
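Queries 1 and 2 might be sketched in Python as below. The header fields and class names are hypothetical, and a flow is assumed to be identified by the usual 5-tuple (source, destination, ports, protocol); the talk does not define the flow identifier. Comments mark which state plays the Store and Scratch roles.

```python
from dataclasses import dataclass, field
from typing import NamedTuple

FlowId = tuple[str, str, int, int, str]  # assumed 5-tuple flow identifier


class PacketHeader(NamedTuple):
    # Hypothetical subset of IP/transport header fields.
    src: str
    dst: str
    sport: int
    dport: int
    protocol: str
    length: int  # bytes


@dataclass
class LinkUtilization:
    """Continuous query 1 (hypothetical class): Store holds the running sum
    of packet lengths; Stream and Scratch stay empty."""
    total_bytes: int = 0  # Store

    def on_packet(self, pkt: PacketHeader) -> None:
        self.total_bytes += pkt.length


@dataclass
class FlowsPerProtocol:
    """Continuous query 2 (hypothetical class): Scratch remembers flow
    identifiers already seen; Store holds the per-protocol count of flows."""
    seen_flows: set[FlowId] = field(default_factory=set)              # Scratch
    flows_per_protocol: dict[str, int] = field(default_factory=dict)  # Store

    def on_packet(self, pkt: PacketHeader) -> None:
        flow_id = (pkt.src, pkt.dst, pkt.sport, pkt.dport, pkt.protocol)
        if flow_id in self.seen_flows:
            return  # packet of a known flow: nothing more to keep (Throw)
        self.seen_flows.add(flow_id)
        self.flows_per_protocol[pkt.protocol] = (
            self.flows_per_protocol.get(pkt.protocol, 0) + 1
        )
```

The Scratch set grows with the number of distinct flows; a real system would also need to expire finished flows, which this sketch omits.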
12
Example Queries from Traffic Management (cont’d)
Continuous query 3: Join packet traces collected from different points in the network to measure packet delays (or identify routes)
[Figure: Packet trace 1 and Packet trace 2 → Symmetric Hash-Join with hash tables HT 1 and HT 2 (Scratch) → Stream]
Efficient intermediate state management
– Intermediate state is theoretically unbounded
– Use of constraints can reduce intermediate state
– Can reclaim memory after each match
– Approximate answers can further reduce intermediate state
– Can you trade precision for state?
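A symmetric hash join might look like the following sketch. It assumes each observation carries a hypothetical `packet_id` that identifies the same packet at both monitoring points, that the monitors' clocks are synchronized, and that a packet appears at most once per trace; that last constraint is what allows memory to be reclaimed after each match.

```python
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    # packet_id is a hypothetical identifier for the same packet seen at
    # both monitoring points (e.g. a digest of invariant header bytes).
    packet_id: str
    timestamp: float  # seconds; the two monitors' clocks are assumed synchronized


class SymmetricHashJoin:
    """Joins packet trace 1 and packet trace 2 on packet_id as tuples arrive
    from either side: insert into your own hash table, probe the other one.

    Assumes a packet appears at most once per trace, so a matched entry is
    deleted immediately (memory reclamation). Packets that never match, e.g.
    ones dropped between the two points, still accumulate without bound."""

    def __init__(self) -> None:
        self.ht1: dict[str, float] = {}  # HT 1: unmatched tuples from trace 1
        self.ht2: dict[str, float] = {}  # HT 2: unmatched tuples from trace 2

    def on_trace1(self, obs: Observation) -> Optional[float]:
        t2 = self.ht2.pop(obs.packet_id, None)  # probe HT 2; pop reclaims memory
        if t2 is None:
            self.ht1[obs.packet_id] = obs.timestamp  # wait for the partner tuple
            return None
        return t2 - obs.timestamp  # delay from point 1 to point 2

    def on_trace2(self, obs: Observation) -> Optional[float]:
        t1 = self.ht1.pop(obs.packet_id, None)  # probe HT 1; pop reclaims memory
        if t1 is None:
            self.ht2[obs.packet_id] = obs.timestamp  # wait for the partner tuple
            return None
        return obs.timestamp - t1  # delay from point 1 to point 2
```

Because either input can arrive first, neither trace has to be fully read before results are produced, which is why the symmetric variant suits streams.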
13
Example Queries from Traffic Management (cont’d)
Continuous query 4: Identify the top 5% of (source IP address, destination IP address) pairs with maximum bandwidth consumption over a link
Non-trivial query over a stream
– The number of distinct pairs can vary
– The bandwidth consumption of each pair can vary
– How much intermediate state is needed?
[Figure: Packet trace → Count Distinct Pairs → Bandwidth Consumption of Pairs → Top 5% Pairs, with intermediate state in Scratch and Store and results output as a Stream]
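An exact but memory-hungry sketch of query 4 keeps one byte counter per distinct pair and computes the top 5% on demand. The class and method names are hypothetical; the point is that the counter table, and therefore the state, grows with the number of distinct pairs.

```python
import heapq
from collections import defaultdict


class TopPairs:
    """Exact sketch of continuous query 4 (hypothetical class name). One byte
    counter is kept per distinct (source, destination) pair, so state grows
    with the number of pairs; the top percent is recomputed on demand."""

    def __init__(self, percent: int = 5) -> None:
        self.percent = percent
        self.bytes_by_pair: defaultdict[tuple[str, str], int] = defaultdict(int)

    def on_packet(self, src: str, dst: str, length: int) -> None:
        self.bytes_by_pair[(src, dst)] += length  # bandwidth consumption of the pair

    def current_top(self) -> list[tuple[tuple[str, str], int]]:
        n_pairs = len(self.bytes_by_pair)
        k = (n_pairs * self.percent + 99) // 100  # ceil(percent% of pairs), integer math
        return heapq.nlargest(k, self.bytes_by_pair.items(), key=lambda kv: kv[1])
```

Integer ceiling division avoids floating-point rounding when converting the percentage into a count of pairs.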
14
Further Challenges in Data Management: Distributed Stream Processing
Data is collected from different points in a network
The structure of an Internet Service Provider imposes restrictions
– Core routers are sensitive (so are the network operators)
Sending collected data to a central processing site is harmful
– Additional load on the network
– Hinders real-time processing
– Won’t scale with the network and traffic
Truly distributed processing is infeasible for many queries
– Goal: minimize communication traffic
– Trade communication traffic for precision
15
Example Queries from Traffic Management (cont’d)
Continuous query 5: Identify the top 5% of destination IP addresses with maximum bandwidth consumption (to detect denial-of-service attacks)
[Figure: three “CQ 5 local” operators feed a “CQ 5 global” operator, which outputs a Stream]
A hierarchical processing structure could also be useful
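One way to split query 5 between local and global sites is sketched below under stated assumptions: each local site reduces its packets to per-destination byte totals, and the global site merges the partial totals. The function names and the hypothetical `ship_limit` parameter are illustrative only; `ship_limit` shows the communication/precision trade-off, since shipping only each site's heaviest destinations cuts traffic but can change the answer.

```python
from collections import Counter
from typing import Iterable, Optional


def local_cq5(
    packets: Iterable[tuple[str, int]], ship_limit: Optional[int] = None
) -> dict[str, int]:
    """Hypothetical local part, run at one monitoring point: bytes per
    destination address.

    ship_limit=None ships the full partial aggregate, so the global answer is
    exact. A number ships only that many heaviest destinations: less
    communication traffic, but the global answer may become approximate."""
    totals: Counter[str] = Counter()
    for dst, length in packets:
        totals[dst] += length
    if ship_limit is None:
        return dict(totals)
    return dict(totals.most_common(ship_limit))


def global_cq5(
    partials: Iterable[dict[str, int]], percent: int = 5
) -> list[tuple[str, int]]:
    """Hypothetical global part: merge partial aggregates, then take the top
    percent of the destinations reported to this site."""
    merged: Counter[str] = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts rather than replacing them
    k = (len(merged) * percent + 99) // 100  # integer ceiling of percent% of destinations
    return merged.most_common(k)
```

For example, if site A sees d1: 100, d2: 500, d3: 50 bytes and site B sees d1: 450, d4: 40 bytes, then `global_cq5` with `percent=25` returns `[("d1", 550)]` from full partials, but `[("d2", 500)]` when each site is called with `ship_limit=1`, because d1 is not the heaviest destination at site A.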
16
Summary of Basic Problems and Techniques
Continuous queries over data streams are a unique combination of:
– Online processing
– Storage constraints -- the amount of available memory is bounded
  – Query result size may be unbounded
  – Intermediate state may be unbounded
Relevant techniques
– Online data structures (not build-and-throw)
– Summarization: samples, histograms, wavelets, fractals
– Adaptivity to data characteristics, flow rates, and amount of memory
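As one concrete instance of summarization under a memory bound, classic reservoir sampling (Algorithm R) keeps a uniform random sample of fixed size k from a stream of unknown length. This is a standard technique chosen here for illustration, not one the talk prescribes, and the class and parameter names are hypothetical.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class Reservoir(Generic[T]):
    """Hypothetical class: uniform random sample of at most k items from a
    stream of unknown, possibly unbounded length, using O(k) memory
    (reservoir sampling, Algorithm R). The sample is valid after every
    arrival, so it can be queried continuously."""

    def __init__(self, k: int, seed: Optional[int] = None) -> None:
        self.k = k
        self.seen = 0  # items offered so far
        self.sample: List[T] = []
        self._rng = random.Random(seed)

    def offer(self, item: T) -> None:
        self.seen += 1
        if len(self.sample) < self.k:
            self.sample.append(item)  # fill the reservoir first
            return
        # Keep the new item with probability k / seen, evicting a random slot.
        j = self._rng.randrange(self.seen)
        if j < self.k:
            self.sample[j] = item
```

Memory stays at k items no matter how long the stream runs, which is the property the storage constraint above demands; the price is that answers computed from the sample are approximate.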
17
Some Simplifying Assumptions
Made in this talk, but not necessarily in the work
Traffic management data is clean
– In practice, data is dirty: incomplete, inconsistent
– Temporal uncertainties
– Could be reduced as the importance of traffic management is realized
Traffic management data is tuple-oriented
– Often true
– Implications for the query language
18
Conclusions
Traffic management requires efficient data management
Many traffic management applications benefit from online data processing
Case for a Data Stream Management System (DSMS)
– Provides continuous queries over data streams for online processing
– Many interesting research issues
– Work is in progress
Additional references
– S. Babu and J. Widom. Continuous queries over data streams. http://dbpubs.stanford.edu/pub/2001-9
– STREAM project homepage: http://www-db.stanford.edu/stream