Traceback Pat Burke Yanos Saravanos. Agenda Introduction Problem Definition Traceback Methods  Packet Marking  Hash-based Conclusion References.

Traceback Pat Burke Yanos Saravanos

Agenda Introduction Problem Definition Traceback Methods  Packet Marking  Hash-based Conclusion References

Why Use Traceback? SPAM DoS Insider attacks Worms / Viruses  Code Red (2001) spreading at 8 hosts/sec  Slammer Worm (2003) spreading at 125 hosts/sec

Why Use Traceback? Currently very difficult to find spammers, virus authors  Easy to spoof IPs  No inherent tracing mechanism in IP Blaster virus author left clues in code, was eventually caught What if we could trace packets back to point of origin?

Packet Tracing

Benchmarks Effect on throughput  Amount of overhead added to the packets False positive rate Computational intensity  Time required to trace an attack  Amount of data required to trace an attack  CPU/memory usage on router Collisions  For hash-based traceback methods

FDPM Flexible Deterministic Packet Marking

Packet Marking Add information to the packets so that paths can be retraced to original source Methods for marking packets  Probabilistic Node Marking Edge Marking  Deterministic

Probabilistic Packet Marking (PPM) Using probability, router marks a packet  With router IP address (node marking)  With edge of paths (edge marking) 95% accuracy, requires ~300,000 packets PPM needs a large number of packets and creates very high computational load

PPM Nodes - Cons Large number of false positives  DDoS with 25 hosts requires several days and has thousands of false positives Slow convergence rate  For 95% success, we need 300,000 packets Attacker can still inject modified packets into PPM network (mark spoofing) This is only for a single attacker

Deterministic Packet Marking (DPM) Every packet is marked Spoofed marks are overwritten with correct marks

DPM Incoming packets are marked Outgoing packets are unaltered Requires more overhead than PPM Less computation required Probability of generating ingress IP address (1-p) d-1

DPM Mark 16-bit mark field 1-bit flag field

DPM Mark Encoding Two 16-bit fields and a 1-bit flag IP populates one of the two fields with probability of 0.5  Set flag to 1 if using the higher end bits Can be made more secure by using non- uniform probability distributions

DPM Claimed to have 0 false positives Claimed to converge very quickly  99% probability of success with 7 packets  99.9% probability of success with only 10 packets Has not been tested on large networks Cannot deal with NAT

Flexible DPM DPM uses 17 bits in the IP header to store marking information FDPM uses a variable length TOS added to the mark  TOS is between 0 and 8 bits, mark is 17-25 bits Split ingress IP into k segments, send in separate packets  Segment numbers keep address order consistent  Reconstruction process recognizes packets from same source

FDPM Reconstruction Mark Recognition  Store reconstruction packets in cache  Split IP header into fields to find mark length Address Recovery  Analyze and store mark in recovery table  Different source IPs may have the same digest (hash value) and collisions may occur More than one entry is created

Flow-Based Marking Mark packets selectively according to flow properties when router has heavy traffic  Reduce load on router while still marking Packets are classified according to destination IP address Uses flow thresholds  L max is threshold where router’s load is exceeded (called the overload problem)

Traceable Sources

Overload Problem Under heavy traffic, router can randomly mark packets Both attack and normal packets may be marked Flow-based marking as mentioned before is much more efficient

FDPM Performance FDPM can trace many more sources with less computational load since it uses variable mark lengths Incorporates overload prevention to keep router from failure Requires more overhead (up to 25 bits instead of only 17) PPMDPMFDPM Packets for Traceback10 3 10 2 Traceable Sources10 2 10 3 10 5 Computational LoadHighMediumLow Overload PreventionNone Good

HASH-BASED TRACEBACK Source Path Isolation Engine (SPIE)

SPIE - Overview Each router along a packet’s transmission path computes a set of Hash-codes (digests) associated with each packet The time-tagged digests are stored in router- memory for some time period  Limited by available router resources Traceback is initiated only by “authenticated agent requests” to the SPIE Traceback Manager (STM)  Executed by means of a broadcast message Results in the construction of a complete attack graph within the STM

SPIE - Assumptions Packets may be addressed to multiple destinations Attackers are aware they are being traced Routers may be subverted, but not often Routing within the network may be unstable  Traceback must deal with divergent paths Packet size should not grow as a result of traceback  1 byte increase in size = 1% increase in resource use  Very controversial … self-enabling assumption End hosts may be resource constrained Traceback is an infrequent operation  Broadcast messages can have a significant impact on internet performance Traceback should return entire path, not just source

SPIE - Architecture DGA (Data Generation Agent) Resident in SPIE-enhanced routers to produce digests and store them in time-stamped digest tables. Implemented as software agents, interface cards, or dedicated aux boxes SCAR (SPIE Collection and Reduction Agents) Data concentration point for some regional area. When traceback is requested, SCAR’s initiate a broadcast request for traceback and produce regional attack graphs based upon data from constituent DGA’s STM (SPIE Traceback Manager) Controls the SPIE system. Verifies authenticity of a traceback request, dispatches the request to the appropriate SCAR’s, gathers regional attack graphs, and assembles the complete attack graph.

SPIE - Hashing Multiple hash-codes (hash-codes, different groupings of fields) are calculated for each package based on 24 bytes of relatively invariant fields in the header plus the first 8 byte of the payload. Masked (gray) areas are NOT used in hash-code calculation For each packet received, SPIE computes k independent n-bit digests, and sets the corresponding bits in the 2n bit digest table [Bloom Filter] (nominally 256 Mb per filter). Each filter contains the digests of multiple packages (approximately 50M packets per filter)… as large as possible, but avoiding collisions.

SPIE – Hashing Collisions The figure to the right shows the fraction of packets that collide as a function of prefix length.  The WAN represents 985,150 packets between 6,031 host pairs collected at the University of Florida OC-3 gateway.  The LAN trace consists of one million packets between 2,879 pairs at the MIT Lab for Computer Science. WAN.00092% LAN.139%

SPIE – Hashing When one Bloom Filter is “full”, the next one is initialized and time tagged to record the receipt of the next packet. Can be implemented as a circular buffer of filters.  For security purposes, each SPIE Agent generates a new set of k input vectors to the Bloom Filters with each filter change Based on a pseudo-random number generator independently seeded at each router  These vectors are stored with the associated filter. SPIE never needs to record any packet payload information  The first 8 bytes of the payload can be regenerated from the hashing, given the stored input vectors

SPIE – Traceback Processing The SPIE Traceback Manager controls the process  Cryptographically verifies that the authenticity and integrity of the traceback request message Authorized requester Packet ID Victim Approximate time of attack  Dispaches the request to the appropriate SPIE Collection and Reduction Agents (SCARs) SCARs poll their assigned Data Collection Agents (DCAs)  DCAs poll their assigned routers  If the response from the targeted SCARs indicates that other regional SCARs are involved in the Trace, the STM sends another direct request This loop continues until all branches terminate  Gathers the resulting attack graphs from the (SCARs)  Assembles them into a Complete Attack Graph

SPIE – Traceback Processing (Cont) SPIE-enhanced routers hash the data received in the Traceback Request to determine whether or not the target message passes through the router  Computes k digests using the appropriate input vectors  Checks for a “1” in each of the corresponding K locations of the digest table “near” the target time If ALL associated bits are set, it is highly likely that the packet was stored.  It is within the realm of possibilities that the Filter is saturated with an overabundance of packets, creating a false positive.  This is controlled by limiting the number of digests in each filter, depending upon Digest Table size and the mean volume of packet traffic.

SPIE – Traceback Processing (Cont) Reverse Path Flooding, starting at the Victim’s router (V) and proceeding backwards toward the Attacker (A). Solid arrows represent the attack path. Dashed arrows are SPIE queries. Queries are dropped by routers that did not forward the packet in question. ATTACK PATH: A R2 R5 R7 R9 V

SPIE – Metrics

SPIE – Implementation Issues PRO  Single packet tracing is feasible  Automated processing by SPIE- enhanced routers make spoofing difficult, at best  Relatively low storage required Only digests and time are stored  Does not aid in eavesdropping of payload data Payload is not stored CON  Requires specially configured (SPIE-enhanced) routers. Probability of detection is directly related to the number of available SPIE-enhanced routers in the network in question  Storage in routers is a limiting factor in the window of time in which a packet may be successfully traced May consider some sort of filtering of packets to be digested  May have the appearance of a loss of anonymity across the Internet

Conclusions DoS, worms, viruses continuously becoming more dangerous Attacks must be shut down quickly and be traceable Integrating traceback into next generation Internet is critical

Conclusions Flexible Deterministic Packet Marking  As fast as regular DPM, faster than PPM  Requires more overhead than DPM, but traces more sources and less computational load Hash-based Traceback  No packet overhead  New, more capable routers

Conclusions Cooperation is required  Routers must be built to handle new tracing protocols  ISPs must provide compliance with protocols  Internet is no longer anonymous Some issues must still be solved  NATs  Collisions

References Belenky, A., Ansari, N. “IP Traceback with Deterministic Packet Marking”. IEEE Communications Letter, April 2003. Savage, S., et al. “Practical Network Support for IP Traceback”. Department of Computer Science, University of Washington. Snoeren, A., Partridge, Craig, et al. “Single-Packet IP Traceback”. IEEE/ACM Transactions on Networking, December 2002. Xiang, Y., Zhou, W. “A Defense System Against DDoS Attacks by Large-Scale IP Traceback”, IEEE 2005.

Traceback Pat Burke Yanos Saravanos. Agenda Introduction Problem Definition Traceback Methods  Packet Marking  Hash-based Conclusion References.

Similar presentations

Presentation on theme: "Traceback Pat Burke Yanos Saravanos. Agenda Introduction Problem Definition Traceback Methods  Packet Marking  Hash-based Conclusion References."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Traceback Pat Burke Yanos Saravanos. Agenda Introduction Problem Definition Traceback Methods  Packet Marking  Hash-based Conclusion References.

Similar presentations

Presentation on theme: "Traceback Pat Burke Yanos Saravanos. Agenda Introduction Problem Definition Traceback Methods  Packet Marking  Hash-based Conclusion References."— Presentation transcript:

Similar presentations

About project

Feedback