Internet Intrusions: Global Characteristics and Prevalence Presented By: Zhichun Li Using slides from Vinod Yegneswaran’s presentation at SIGMETRICS 2003.


1 Internet Intrusions: Global Characteristics and Prevalence Presented By: Zhichun Li Using slides from Vinod Yegneswaran’s presentation at SIGMETRICS 2003

2 Overview
Data Sources
Intrusion Characteristics
  – Port and source distribution
Projection to the global address space
Implications of Shared Information
  – Does information sharing help?
  – How much information is needed?

3 Goals
This paper aims to:
Show the volume of intrusion attempts
Show the distribution of intrusions
  – In terms of both source and victim
Show the impact of various scan types
Expand findings to the global scope

4 Data Sources
To extend the findings to the global scope, the data must:
Come from many ASes
Be spread both geographically and over the IP address space

5 DSHIELD
http://www.dshield.org (part of SANS Institute)
Firewall / NIDS logs, ~1600 networks
  – BlackIce Defender, Cisco PIX Firewall, ipchains
  – Snort, ZoneAlarm Pro, Portsentry
4 months (Aug 2001, May-July 2002)
  – 60 million scans, 375K dest IPs per month
  – 5 Class B, 45 Class C, many others

6 DSHIELD Data
Lowest common denominator approach
  – simplicity, diversity, unbiased
Pitfalls
  – packet headers, active connection info
  – flooding: intentional, misconfiguration (broadcast, half-life)
  – Spoofed sources

7 DSHIELD
Red dots represent participating ASes
Grey lines demonstrate connectivity between ASes
Dots closer to the center indicate ASes closer to the Internet backbone

8 Worms
Code Red I
  – July 12, 2001, 2-phase attack, random propagation
Code Red II
  – Aug 4, 2001, “local-random propagation”
Nimda
  – Sep 18, 2001, “local-random propagation”
SQL-snake
  – May 2002, port 1433, random propagation
  – emails passwords and sysinfo to ixltd@postone.com

9 Scan Types
Vertical Scan
  – Multiple ports on 1 victim by 1 source
Horizontal Scan
  – 1 port on multiple victims by 1 source
Coordinated Scans
  – Multiple sources aimed at a /24 space
Stealth Scans
  – Horizontal or vertical
  – Characterized by a very low frequency
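The scan-type definitions above can be sketched in code. This is a minimal illustration, not the authors' classifier: the record layout `(src_ip, dst_ip, dst_port)`, the function names, and the `min_targets` / `min_sources` cutoffs are all assumptions introduced here, and grouping coordinated scans by destination port as well as /24 prefix is a simplification. Stealth scans are left out because detecting them needs timestamps.

```python
import ipaddress
from collections import defaultdict


def classify_sources(records, min_targets=5):
    """Label sources as vertical and/or horizontal scanners.

    records: iterable of (src_ip, dst_ip, dst_port) tuples.
    min_targets: hypothetical cutoff, not taken from the paper.
    """
    ports_per_victim = defaultdict(set)   # (src, dst) -> ports probed
    victims_per_port = defaultdict(set)   # (src, port) -> hosts probed
    for src, dst, port in records:
        ports_per_victim[(src, dst)].add(port)
        victims_per_port[(src, port)].add(dst)

    labels = defaultdict(set)
    # Vertical: one source probing many ports on a single victim.
    for (src, _dst), ports in ports_per_victim.items():
        if len(ports) >= min_targets:
            labels[src].add("vertical")
    # Horizontal: one source probing a single port on many victims.
    for (src, _port), victims in victims_per_port.items():
        if len(victims) >= min_targets:
            labels[src].add("horizontal")
    return dict(labels)


def coordinated_targets(records, min_sources=5):
    """Return (/24 prefix, port) pairs probed by many distinct sources."""
    sources = defaultdict(set)
    for src, dst, port in records:
        # strict=False lets a host address stand in for its enclosing /24.
        prefix = str(ipaddress.ip_network(f"{dst}/24", strict=False))
        sources[(prefix, port)].add(src)
    return {key for key, srcs in sources.items() if len(srcs) >= min_sources}
```

A source can carry both labels if it sweeps many hosts and also probes many ports on one of them.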

10 Intrusion Characteristics
Port Distribution
  – Monitor the destination port for intrusion attempts
Source Distribution
  – Look for trends in the source address associated with intrusions
  – Group intrusions into port 80, port 1433, and non-worm scans

11 Port Distribution

12 Source Distribution
(Figure: three panels for port 80, port 1433, and non-worm scans, all June 2002)

13 Persistence of Worm Activity
3 months of data: May-July 2002 (CDF)
Half-life ~18 days (/24), 6 hours (/32)

14 Date Characteristics Code Red 1 was still very much alive!!

15 Top Sources
Mainly applies to non-worm scans
Results will show that only a few sources are responsible for a significant amount of the scans
  – Zipf distribution
Argument for a blacklist

16 Top Sources Zipf distribution (power law) CDF (source IP rank vs num scans : log-log scale)

17 Top Sources May 2002 scan volume: overall vs top 100 sources Top 100 sources account for 50% of all scans in any month
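The "top 100 sources" statistic can be computed from a flat list of scan source addresses. The sketch below assumes one list entry per logged scan; the function name and input layout are illustrative and not taken from the paper.

```python
from collections import Counter


def top_n_share(scan_sources, n=100):
    """Fraction of all scans attributable to the n busiest sources.

    scan_sources: iterable with one source identifier per logged scan.
    """
    counts = Counter(scan_sources)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _source, count in counts.most_common(n))
    return top / total
```

On a heavy-tailed (Zipf-like) source distribution this fraction stays large even for small `n`, which is the basis of the blacklist argument.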

18 Source Coordination Aug 2001: 8 of the top 20 sources display identical ON/OFF behavior Such clusters common among top 20 sources of all 4 months! All sources scan more than 5 distinct /16s.

19 Source Coordination May 2002: ON/OFF pattern (4 out of top 20 sources) Staggering behavior (identical attack or attack tool)

20 Identification of Scan Types
Still look at only non-worm scans
Horizontal scans make up the majority of the scans
More vertical scan episodes
Surprisingly high number of coordinated scans
Stealth scans occur much less frequently, but are usually vertical scans

21 Scan Types Number of Scans

22 Scan Types Number of Episodes

23 Global Projections
Question: How has the scanning trend changed over the past year?
  – Must extend the data to the entire Internet
Simply average the data and multiply by 2^32
  – Possible because data comes from a broad range of sources

24 Projection of Port 80 Scans
Port 80 scans show a decreasing trend – biased by release of CR I/II
May-July 2002 relatively steady with small upward slope

25 Projection of Non-worm Scans
Projection: (avg scans per IP) * num IPs – similar projections for /24 and /16 aggregates
25B scans / day
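The projection formula is simple enough to write out. The sketch below assumes the monitored addresses are a representative sample of the full IPv4 space; the function name and parameters are illustrative. The example call reuses the rough monthly figures from the DSHIELD slide (60 million scans, 375K destination IPs) and a 30-day month purely as sample inputs; the paper's own computation may differ.

```python
def project_global_scans(total_scans, monitored_ips, address_space=2**32):
    """Scale the average scans per monitored IP up to the whole address space."""
    if monitored_ips <= 0:
        raise ValueError("monitored_ips must be positive")
    return total_scans / monitored_ips * address_space


# Illustrative inputs only: monthly totals converted to a per-day estimate.
monthly_estimate = project_global_scans(60e6, 375e3)
daily_estimate = monthly_estimate / 30
```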

26 Implications of Shared Information
Many have looked to pool resources
Do not identify speed of attacks
Can gain a view of trends in attacks, though

27 Information Theoretic Approach
Relative Entropy – measure of the distributional similarity between two variables
Marginal Utility – amount of information gained by adding more samples
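For reference, relative entropy (Kullback-Leibler divergence) between two discrete distributions p and q is D(p || q) = sum over x of p(x) * log2(p(x) / q(x)). A minimal sketch follows, assuming each distribution is given as a dict from outcome to probability; that representation is a choice made here, not something stated on the slides.

```python
import math


def relative_entropy(p, q):
    """D(p || q) in bits for discrete distributions given as dicts.

    Returns infinity when p puts mass on an outcome that q rules out.
    """
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0:
            continue  # terms with p(x) = 0 contribute nothing
        q_x = q.get(outcome, 0.0)
        if q_x == 0:
            return math.inf
        total += p_x * math.log2(p_x / q_x)
    return total
```

The value is zero when the two distributions are identical and grows as they diverge; note that it is not symmetric in p and q.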

28 Information Theoretic Approach
Goal – how much does adding intrusion logs improve the resolution of identifying “worst offenders”?
Can be measured using marginal utility
  – Number of experiments is the number of logs identified

29 Evaluation of Marginal Utility Approach
Use 100 /16s and 100 /24s from the total data sets
  – Chosen at random
Received promising results about the amount gained from adding more data sets

30 Marginal Utility for Worst Offenders Random day, 100 random /16s and /24s Diminished returns after 40 /16s and 50 /24s
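One way to reproduce a diminishing-returns curve like this is to pool logs one at a time and measure how much the pooled offender distribution shifts with each addition. The sketch below is an assumption about how such a measurement could be set up, not the paper's exact procedure: it takes each log as a `Counter` mapping source to scan count and scores the k-th log by the relative entropy D(P_{k-1} || P_k) between the pooled distributions before and after adding it. All function names are hypothetical.

```python
import math
from collections import Counter


def _normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def _kl_bits(p, q):
    """D(p || q) in bits; assumes q covers every outcome in p."""
    return sum(p_x * math.log2(p_x / q[x]) for x, p_x in p.items() if p_x > 0)


def marginal_utilities(logs):
    """Information gained by each log after the first non-empty one.

    logs: sequence of Counters mapping source -> number of scans.
    Pooling only ever adds outcomes, so D(previous || current) is finite.
    """
    pooled = Counter()
    previous = None
    gains = []
    for log in logs:
        pooled.update(log)
        if not pooled:
            continue  # nothing observed yet
        current = _normalize(pooled)
        if previous is not None:
            gains.append(_kl_bits(previous, current))
        previous = current
    return gains
```

When later logs mostly repeat sources already seen in similar proportions, the gains shrink toward zero, which is the flattening the slide describes.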

31 Marginal Utility for Detecting Target Ports Random day, 100 random /16s and /24s Diminished returns after 40 nodes.

32 Conclusion
A lot of scanning directed away from port 80
  – 25B scans per day, 25% non port 80
A set of worst offenders does exist who are responsible for a lot of the scanning
Combining data from multiple sites gives more information
  – Data from larger sites is more useful

33 Backup for discussion
Data bias
  – Different platforms: BlackIce Defender, Cisco PIX, ZoneAlarm, Linux ipchains, Portsentry and Snort
  – 1600 firewall/NIDS across geography and IP space

34 Internet Intrusion vs. Scan
Scanning is the most common and versatile type of intrusion
Normally, before compromising a host, attackers need to scan to find vulnerabilities
From scans we can learn about attackers' attempts

35 Spoof bounce
Up to now, not widely used
Although we cannot track where the scan packet was sent from, we can still track the receiver or sensor
Known existing tools: Idlescan

36 Projection of whole Internet
Pretty rough but should work
The set of provider networks is reasonably well distributed (both geographically and over the IP space)
Using the routable IP space from the BGP table should be a better plan

37 Information sharing vs. privacy
What is shared are scanning attempts, which may be malicious, so sharing them normally won’t hurt people’s privacy
We may also build BGP-like policy control into information sharing

38 Scan episodes
The scans sent by one attacker

39 100 /16s and 100 /24s
DSHIELD data set: 5 Class B, 45 Class C, many others
Here the 100 /16s are 100 /16 prefixes, although only 5 are full
Same thing for the 100 /24s

40 Scan Speed
Stealth scan
  – Interval between scans should less 180 seconds
Horizontal scans and vertical scans
  – 1 hour is the upper bound
  – Normal time interval is much less

41 Service Distribution of Scans

