High Performance Simulation of Internet Worms David M. Nicol Dept. of ECE, & Information Trust Institute University of Illinois, Urbana-Champaign.

Presentation transcript:

High Performance Simulation of Internet Worms David M. Nicol Dept. of ECE, & Information Trust Institute University of Illinois, Urbana-Champaign

2 problem background
Worm: malware that, through automated means, gains access to a host, implants a copy of itself, and uses the host to attempt to infect others.
Evaluation of defenses against worms is most readily facilitated by simulation / emulation
- e.g. use a simulator to generate traffic thrown against defenses
Worms that search for targets can generate a lot of traffic
- fast-scanning worms saturate links and may topple routers
Problems:
- Find efficient and accurate ways of simulating infection growth of random scanning worms
- Find efficient and accurate ways of simulating scan traffic across large networks

3 variance is important
Random scanning induces variation in infection times, and it can be significant.
[Figure: 20 independent sample paths, Code Red vII parameters. Annotations: 10% of infections; variance in time of k-th infection; variance in I(t).]
Early life: when the worm must be detected if defenses are to be effective.
Our evaluations need to be based on a range of possible behaviors; one sample path is not enough.

4 detailed model
Like Code Red vII (except for uniform scanning)
- 100 concurrent scanning threads per infected host
- Long time-out delay when SYN is not acked
- Short delay with SYN-ACK return
- State-independent probability of receiving SYN-ACK
- State-dependent probability of infection from responding host
[Diagram labels: TCP open timeout; SYN-ACK returned, no infection; infection; inner cycle; outer cycle.]
Very small probabilities of infection imply that the geometric number of inner cycles can be replaced with an exponential distribution.

5 Time of Next Infection (TNI)
Given I currently infected hosts, the time to the next infection is the minimum of 100 x I independent exponentials, hence is exponential with rate [equation not preserved; its annotated factors are: scan rate, number of targets, Prob{hit a target}].
State evolution of I(t) is thus a continuous-time Markov chain.
Compare this formulation with the classic SI equations: TNI is a discrete version of SI.
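The Markov-chain view on this slide can be sketched as a sampler that draws one exponential holding time per infection. The rate expression below (threads per host x infected hosts x per-thread scan rate x fraction of the address space that is still susceptible) is an assumption, since the slide's own formula was not preserved; `simulate_tni` and all parameter names are hypothetical.

```python
import random


def simulate_tni(n_susceptible, address_space, scan_rate,
                 threads_per_host=100, i0=1, seed=None):
    """Return the infection times of one sample path of a TNI-style chain.

    ASSUMPTION: with I infected hosts and S susceptibles, the time to the
    next infection is exponential with rate
        threads_per_host * I * scan_rate * S / address_space.
    """
    rng = random.Random(seed)
    t = 0.0
    infected = i0
    susceptible = n_susceptible
    times = []
    while susceptible > 0:
        rate = threads_per_host * infected * scan_rate * susceptible / address_space
        t += rng.expovariate(rate)  # minimum of independent exponentials
        infected += 1
        susceptible -= 1
        times.append(t)
    return times


if __name__ == "__main__":
    # Several independent sample paths show the spread in early infection times.
    for seed in range(3):
        path = simulate_tni(50, 10_000, 1.0, seed=seed)
        print(f"seed {seed}: 5th infection at t = {path[4]:.3f}")
```

Running several seeds side by side is one way to see why a single sample path is not enough.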

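6 TNI and variance
The time of an infection is a sum of inter-arrival times. [Equation not preserved.]
The variance contributed by the j-th inter-infection time grows small quickly with increasing j.
Variance is dominated by early inter-infection times.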
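A worked version of this argument, assuming the inter-infection times $X_j$ are independent exponentials with rates $\lambda_j$ (symbols introduced here for illustration, not taken from the slides):

```latex
T_k = \sum_{j=1}^{k} X_j, \qquad X_j \sim \mathrm{Exp}(\lambda_j)
\quad\Longrightarrow\quad
\operatorname{Var}(T_k) = \sum_{j=1}^{k} \operatorname{Var}(X_j)
                        = \sum_{j=1}^{k} \frac{1}{\lambda_j^{2}} .
```

While the infection rate $\lambda_j$ is still growing with the number of infected hosts, each term $1/\lambda_j^{2}$ shrinks quadratically, so the first few terms account for most of the sum.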

7 TNI verification
The time of the i-th infection and I(t) under TNI are essentially indistinguishable from those of the detailed model at the 95% significance level (K-S test).

8 hybrid model
TNI is more efficient when susceptibles are hard to find; SI is more efficient when I(t) is rising rapidly.
- Early life: few infections
- Mid-life: many infections, many susceptibles
- Late life: few susceptibles
Decision policy: which infections to simulate with TNI, and which with numerical integration of SI.
Problem: minimize execution time, subject to the constraint that the relative error in variance for every time of infection is bounded.
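For the numerical-integration side of the hybrid, a minimal sketch follows. It assumes the textbook SI form dI/dt = beta * I * (N - I) and a fixed-step forward-Euler scheme; the slides do not preserve their exact equations or integrator, and `integrate_si` and its parameters are hypothetical names.

```python
def integrate_si(n_hosts, beta, i0=1.0, dt=0.01, t_end=20.0):
    """Forward-Euler integration of the assumed SI model dI/dt = beta*I*(N-I).

    Returns a list of (time, infected) pairs. The step size dt should be
    small relative to 1 / (beta * n_hosts) for the scheme to stay stable.
    """
    t = 0.0
    infected = float(i0)
    path = [(t, infected)]
    steps = int(round(t_end / dt))
    for _ in range(steps):
        infected += dt * beta * infected * (n_hosts - infected)
        t += dt
        path.append((t, infected))
    return path


if __name__ == "__main__":
    path = integrate_si(n_hosts=1000, beta=0.001)
    print(f"I({path[-1][0]:.1f}) = {path[-1][1]:.2f}")
```

The work here is proportional to the number of time steps, whereas a TNI sampler does work proportional to the number of infections, which is the trade-off the decision policy exploits.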

9 results
a) For any ε and infection index i, in logarithmic time find the largest index f(i, ε) s.t. the relative error in variance is < ε if TNI is used before i and after f(i, ε).
[Figure: axis "Infection number", with i and f(i, ε) marked and the TNI regions labeled.]

10 results
b) Cost of numerical integration of SI as a function of infection index is convex and symmetric, while cost of TNI per infection is constant.
c) For given i and f(i, ε) we can compute the least-cost policy.
d) Optimum policy is tractable.
[Figure: axis "Infection number", with i and f(i, ε) marked; regions labeled "TNI sampling" and "Numerical integration".]

11 ratio of SI solution cost to hybrid cost
[Figure axes: size of each subnetwork relative to Internet; factor by which S(0) exceeds avg. per subnet. Annotations: very high ratio; approx. 10% of subnet hosts are susceptible.]

12 ratio of TNI cost to hybrid cost
[Figure axes: size of each subnetwork relative to Internet; factor by which S(0) exceeds avg. per subnet. Annotation: densely distributed susceptibles.]
While TNI is "usually" much faster than the differential equation model, the hybrid protects against the dense cases.

13 backbone and sub-networks
Imagine a large backbone network
- Routers modeled in detail
- Sub-networks attach through backbone routers
Worm state is advanced individually in sub-networks.
At each time-step, each sub-network offers a description of scan flows (dest, rate) to the backbone simulator.
Problem now is to determine delivered scan rates to each sub-network.
[Figure labels: AT&T, AboveNet, Exodus, Cable&Wireless, Level3, Verio, Sprint, UUNet.]

14 Resolution and Transparency
All of a port's final output flows can be resolved once all of its input flow values are resolved. But to break cycles we need to be smarter.
Try to resolve final output flow values based on upper bounds
- Notice that every output flow is bounded from above by its input flow rate, so every flow can be bounded by its ingress rate
- A port is transparent if the sum of input rate bounds is no greater than the output bandwidth
Example (figure and equations not preserved):
1. A port becomes transparent; a flow rate becomes resolved
2. Ports become resolved
3. Flows become resolved
4. Repeat

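15 Modeling congestion
Even though flows are acyclic, dependency cycles may form in the definition of flow rates.
[Equation not preserved: the output rate "out" is defined from the input rate "in" by one expression under congestion and another otherwise (no congestion).]
[Figure labels: in, out, "depends on".]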

16 State Change Rules
Port states are {resolved, transparent, unresolved}
Flow states are {settled, bounded}
Rule 1: port resolution
Pre-condition: Port state is not resolved and all input flow states are settled
Action: Mark port state as resolved, compute all output flow values, mark each as settled

17 State Change Rules
Port states are {resolved, transparent, unresolved}
Flow states are {settled, bounded}
Rule 2: port transparency
Pre-condition: Port state is unresolved and sum of input rate bounds is less than bandwidth
Action: Mark port state as transparent. For every input rate that is settled, mark corresponding output rate as settled

18 State Change Rules
Port states are {resolved, transparent, unresolved}
Flow states are {settled, bounded}
Rule 3: settle state transition
Pre-condition: Port state is transparent, some input flow is settled, and corresponding output flow is not
Action: Mark corresponding output flow as settled, with value equal to input flow value

19 State Change Rules
Port states are {resolved, transparent, unresolved}
Flow states are {settled, bounded}
Rule 4: flow bound transition #1
Pre-condition: Port state is unresolved, the fair proportion relative to settled flows of an input flow rate exceeds bound on output flow
Action: Lower corresponding output flow bound to be equal to fair proportion of input flow bound
[Equation not preserved; one annotated term is the sum of settled flow rates.]

20 State Change Rules
Port states are {resolved, transparent, unresolved}
Flow states are {settled, bounded}
Rule 5: flow bound transition #2
Pre-condition: Port state is not resolved, the flow rate bound of an input flow is less than the corresponding output flow bound
Action: Set bound of output flow equal to bound on input rate
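Two of the building blocks behind these rules can be sketched in a few lines: the transparency test and the output computation used once every input is settled (Rule 1). The proportional-sharing formula under congestion is an assumption, because the slide that defined it lost its equation; the function and flow names are hypothetical.

```python
def is_transparent(input_bounds, bandwidth):
    """A port is transparent if the sum of input rate bounds is no greater
    than the output bandwidth (wording of the transparency definition;
    Rule 2 states the condition as 'less than')."""
    return sum(input_bounds.values()) <= bandwidth


def resolve_port(settled_inputs, bandwidth):
    """Rule 1 sketch: with every input settled, compute every output rate.

    ASSUMPTION: when the offered load exceeds the bandwidth, each flow
    receives a share of the bandwidth proportional to its input rate.
    """
    total = sum(settled_inputs.values())
    if total <= bandwidth:
        return dict(settled_inputs)  # no congestion: out = in
    return {flow: rate * bandwidth / total
            for flow, rate in settled_inputs.items()}


if __name__ == "__main__":
    print(is_transparent({"f1": 30, "f2": 40}, 100))
    print(resolve_port({"f1": 60, "f2": 60}, 100))
```

A transparent port passes each settled input through unchanged, which is what lets Rules 2 and 3 settle flows without waiting for every input of the port.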

21 Cycle Resolution
After all that, we may still be left with cycles of unresolved ports.
General problem is solution of a system of non-linear equations
- Solution methods are generally iterative
The number of iterations, and the cost of each iteration, is the principal issue
- We explore "fixed-point" iteration. Each iteration:
  - freeze all input rates
  - compute output rates based on frozen input rates
  - compare new solutions with old for convergence
Our experiments define convergence as the relative difference between successive flow value solutions being less than 0.1% for all flow values.
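The iteration loop described on this slide can be sketched generically. The `update` callback stands in for the bandwidth-sharing equations, which the slides do not spell out; `toy_update` is a made-up two-variable contraction used only to exercise the loop, and all names are hypothetical.

```python
def fixed_point(update, rates, rel_tol=1e-3, max_iter=1000):
    """Fixed-point iteration with a 0.1% relative-difference stopping rule.

    `update` maps a dict of frozen rates to a dict of new rates.
    Returns (rates, iterations_used).
    """
    for iteration in range(1, max_iter + 1):
        frozen = dict(rates)      # freeze all input rates
        rates = update(frozen)    # compute outputs from the frozen inputs
        # Converged when every flow changed by less than rel_tol, relatively.
        if all(abs(rates[k] - frozen[k]) < rel_tol * max(abs(rates[k]), 1e-12)
               for k in rates):
            return rates, iteration
    raise RuntimeError("fixed-point iteration did not converge")


def toy_update(r):
    """Toy cyclic dependency (NOT the slides' equations): x and y each
    depend on the other's previous value; the fixed point is x = y = 2."""
    return {"x": 0.5 * r["y"] + 1.0, "y": 0.5 * r["x"] + 1.0}


if __name__ == "__main__":
    solution, n_iter = fixed_point(toy_update, {"x": 0.0, "y": 0.0})
    print(solution, n_iter)
```

Freezing the inputs before each sweep makes every iteration independent of the order in which ports are visited.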

22 Results
Convergence behavior
- Examine # ports in cycle and iterations for convergence
- Vary topology
[Table, values not preserved. Columns: Topology, #routers, #links, #flows, Mbps; four topology rows.]
[Table, values not preserved. Columns: Topology, median #ports in cycles, median #iterations; three topology rows.]
Dependency reduction is effective.
Fixed-point algorithm converges quickly.

23 Results
We ask
- How fast does it run?
- What is speedup relative to pure packet simulation?
Experiments run on PC: 1.5 GHz CPU, 3Gb memory, Linux OS
Topologies: [Table, values not preserved. Columns: Topology, #routers, #links, #flows, Mbps; four topology rows.]
Results: [Table, values not preserved. Columns: Topology, secs/time-step (20% link util.), secs/time-step (50% link util.); four topology rows.]
For a 1 sec time-step, faster than real-time on a model equivalent to 1.9G pkt-evts/sec (1K bytes/pkt).

24 Results
We ask
- How fast does it run?
- What is speedup relative to pure packet simulation?
Experiments run on PC: 1.5 GHz CPU, 3Gb memory, Linux OS
Topologies: [Table, values not preserved. Columns: Topology, #routers, #links, #flows, Mbps; four topology rows.]
Results: [Table, mostly not preserved. Columns: Link util., speedup (two column pairs). Surviving values: a first link utilization of 10% and a final speedup of 1135.]
Directly compare with packet-oriented simulation, using exactly the same input flow rates, on Top-1: speedup over a wide range of loads.

25 Primer on packet forwarding
CIDR addressing : /24
Forwarding decisions represented in trie
- Branch on most significant bits in IP address
- Remember last node seen with interface label
- Example: (addresses not preserved)
[Trie figure: nodes carry (label, interface) pairs matching the table below.]
label  IP space  Output interface
a      00*       2
b      0100*     3
c      1*        1
d      1000*     5
e      101*      4
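The lookup procedure on this slide (branch on the leading bits, remember the last labeled node) can be sketched with a small binary trie built from the five table entries. Addresses are given as bit strings for brevity; the class and function names are hypothetical.

```python
class TrieNode:
    """One node of a binary forwarding trie."""

    def __init__(self):
        self.children = {}     # maps "0" / "1" to a child TrieNode
        self.interface = None  # output interface if a prefix ends here


def build_trie(table):
    """Build a trie from a {prefix_bits: interface} mapping."""
    root = TrieNode()
    for prefix, interface in table.items():
        node = root
        for bit in prefix:
            node = node.children.setdefault(bit, TrieNode())
        node.interface = interface
    return root


def lookup(root, address_bits):
    """Longest-prefix match: walk the most significant bits and remember
    the last node seen that carries an interface label."""
    node = root
    best = root.interface
    for bit in address_bits:
        node = node.children.get(bit)
        if node is None:
            break
        if node.interface is not None:
            best = node.interface
    return best


# Prefixes and interfaces from the slide's table (labels a to e).
TABLE = {"00": 2, "0100": 3, "1": 1, "1000": 5, "101": 4}

if __name__ == "__main__":
    trie = build_trie(TABLE)
    for addr in ("00110000", "01001111", "10001010", "10110000", "11000000"):
        print(addr, "->", lookup(trie, addr))
```

An address such as 1100... matches only the short prefix 1*, so it leaves through interface 1, while 1000... is claimed by the longer prefix 1000* and leaves through interface 5.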

26 flow optimizations
Number of unique source-destination scan flows grows as the square of the number of sub-networks.
Run-time can be improved by reducing # flows.
Exploit the common situation that payload does not matter for modeling
- Flows are sufficiently identified by (dest, rate)
- We can merge flows targeting the same destination: just add rates
[Figure contrasting "No merging" and "Merging"; destination addresses not preserved. Flow rates shown include 5, 27, 19 and a merged rate of 46 (= 27 + 19).]
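Merging by destination amounts to summing rates per key. A minimal sketch, with placeholder destination names because the figure's addresses were not preserved; the rates 27 and 19 are the ones that combine into 46 on the slide.

```python
def merge_flows(flows):
    """Merge (dest, rate) flows that target the same destination by
    adding their rates. Returns a {dest: total_rate} mapping."""
    merged = {}
    for dest, rate in flows:
        merged[dest] = merged.get(dest, 0) + rate
    return merged


if __name__ == "__main__":
    # "dst-A" and "dst-B" are hypothetical destination prefixes.
    flows = [("dst-A", 5), ("dst-B", 27), ("dst-B", 19)]
    print(merge_flows(flows))
```

Because source identity is discarded, the number of flows a link carries is bounded by the number of distinct destinations rather than by the number of source-destination pairs.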

27 merge improvement
Reduction of # flows is topology dependent.
For illustrative purposes consider a tree.
Count "flow crossings": sum over all links of # flows.
[Figure label: common destination.]
Flow compression ratio is approximately log(# subnetworks) / 2.

28 splitting flows
During peak worm scanning a sub-network will be scanning all other sub-networks.
Reduce # flows by offering to the network an aggregate description, e.g. /0 (all IP addresses).
A router must split an aggregate flow descriptor when subspaces are routed through different interfaces.
[Figure, address prefixes not preserved: a ( /8, 400) flow is split into ( /10, 100), ( /10, 100) and ( /9, 200).]

29 splitting and tries
Push labels of exit points up the trie.
[Trie figure: nodes carry (label, interface) pairs matching the table below, annotated with sink labels sink:3, sink:2, sink:3, sink:5, sink:4.]
label  IP space  Output interface
a      00*       2
b      0100*     3
c      1*        1
d      1000*     5
e      101*      4
Splitting Algorithm
a) Follow MSB of prefix to completion, or to a node with conflicted children
b) If completed, route all to implied interface. If conflicted children:
   - Build set of descendants with earliest sink labels
   - Partition flow in proportion to routed space
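The last step, partitioning a flow in proportion to routed space, can be sketched on its own. This covers only the proportional split, not the trie walk that finds the descendants; it assumes the given sub-prefixes lie inside the aggregate prefix, and the bit strings are made up because the slide's addresses were not preserved. The 400 -> 100 / 100 / 200 figures are the ones from the splitting-flows example.

```python
def split_flow(prefix, rate, routed_prefixes):
    """Split an aggregate (prefix, rate) flow across more specific prefixes.

    Each sub-prefix with k extra bits covers 1 / 2**k of the aggregate's
    address space and receives that fraction of the rate.
    """
    shares = []
    for sub in routed_prefixes:
        if not sub.startswith(prefix):
            raise ValueError(f"{sub!r} is not inside {prefix!r}")
        extra_bits = len(sub) - len(prefix)
        shares.append((sub, rate / 2 ** extra_bits))
    return shares


if __name__ == "__main__":
    # Hypothetical /8 aggregate carrying rate 400, routed as two /10s and a /9.
    aggregate = "00001010"
    subs = ["0000101000", "0000101001", "000010101"]
    for sub, share in split_flow(aggregate, 400, subs):
        print(f"/{len(sub)} -> {share}")
```

Uniform random scanning is what justifies the proportional split: every address in the aggregate is equally likely to be a scan target.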

30 run-time gains are good
Very fast worm, 4 backbone networks spanning O(100) to O(1000) routers: merge+split yields a 15x improvement in execution time.

31 memory gains are good
Very fast worm, 4 backbone networks spanning O(100) to O(1000) routers: merge+split yields a 10x reduction in flow interfaces.

32 summary
Developed a hybrid model of a worm whose execution time is optimal subject to a constraint on error in variance.
Consider backbone + sub-networks model
- Developed a fast solution for solving a huge system of coupled non-linear equations describing bandwidth sharing
- Developed optimizations that reduce # flows managed in the backbone by an order of magnitude
We are simulating the interactions of infrastructure and worms on very large examples.