1 Spamscatter: Characterizing Internet Scam Hosting Infrastructure (section: Introduction)
David S. Anderson, Chris Fleizach, Stefan Savage, and Geoffrey M. Voelker
University of California, San Diego
Usenix Security 2007, Aug. 9th, 2007

2 Motivation
70 billion spam messages are sent every day for a simple reason: advertising websites.
A scam, then, is any website marketed using spam.
This online resource is directly implicated in the spam profit cycle, meaning it is rarer and more valuable.
Characterizing the scam infrastructure helps
– Reveal the dynamics and business pressures exerted on spammers
– Identify means to reduce unwanted sites and spam

3 Spamscatter Approach
Mine a large quantity of spam
– Extract URLs
– Probe machines hosting the scams
This works because URLs must be correct
– Follow the scent of money…
All we need is a reliably large source of spam
– We have access to a four-letter top-level domain producing 150K spam messages per day
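The "extract URLs" step can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function names `extract_urls` and `hosts_to_probe`, the regular expression, and the sample message are all assumptions made for this example.

```python
import re
from email import message_from_string
from urllib.parse import urlsplit

# Deliberately simple pattern (an assumption): http/https links up to
# whitespace, quotes, or angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_urls(raw_message):
    """Return the distinct URLs found in the text parts of one message."""
    msg = message_from_string(raw_message)
    urls = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to UTF-8.
            text = payload.decode("utf-8", errors="replace")
        for match in URL_RE.findall(text):
            url = match.rstrip(".,;:)")  # drop trailing sentence punctuation
            if url not in urls:
                urls.append(url)
    return urls


def hosts_to_probe(urls):
    """Return the sorted set of host names that the URLs point at."""
    hosts = {urlsplit(u).hostname for u in urls}
    return sorted(h for h in hosts if h)


# Made-up message for illustration only.
sample = (
    "Subject: great deals\n\n"
    "Buy at http://example.com/a. Also http://shop.example.net/b?id=1\n"
)
print(hosts_to_probe(extract_urls(sample)))
```

Real spam obfuscates links far more aggressively than this pattern handles, so a production extractor would need HTML parsing and redirect handling on top of it.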

4 Understanding scams
Are scams distributed across different servers?
Do different scams share the same server?
How long do scams stay active?
How reliable is their hosting?
Where are scam servers located?
Why is it useful to study these characteristics?

5 Spamscatter and the Scam (section: Methodology)

6 Methodology
Data collection
– Extract links from large spam feed
– Probe links every 3 hours for 7 days
– Record browser redirection
– Save screenshots
Analysis
– Identify scams across servers and domains
– Report on distributed and shared infrastructure, lifetime, stability, and location
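The probing cadence on this slide (every 3 hours for 7 days) works out to 56 probes per link if the first probe is sent immediately and the window is half-open. A small sketch, assuming a hypothetical `probe_schedule` helper and an arbitrary placeholder start time:

```python
from datetime import datetime, timedelta


def probe_schedule(first_seen, interval_hours=3, duration_days=7):
    """List the probe times for one link, starting when it is first seen."""
    step = timedelta(hours=interval_hours)
    end = first_seen + timedelta(days=duration_days)
    times = []
    current = first_seen
    while current < end:  # half-open window: no probe exactly at the end
        times.append(current)
        current += step
    return times


# Placeholder timestamp (an assumption, not a date from the study).
schedule = probe_schedule(datetime(2000, 1, 1, 0, 0))
print(len(schedule), schedule[0], schedule[-1])
```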

7 Identifying Scams
Goal: Identify multiple hosts in the same scam, since many scams are spread across different IPs and domain names
Naïve approaches:
1. Correlate independent spam emails
2. Use HTML content returned from the web server
Limitations:
– Spam has too much chaff and obfuscation
– HTML is uninteresting and mostly composed of images
– Web crawlers fail with frames, iframes, and JavaScript

8 Image Shingling
Solution: Use rendered screenshots of web pages for correlation
– How to compare upwards of 10,000 images?
Image shingling, based on the text shingling idea [BRO97]
– Fragment images into blocks and hash the blocks
– Two images are similar if T% of the hashed blocks are the same (T = 70-80%)
– Shingling allows us to essentially compare all images in O(N lg N)
– Resilient to small variations among images
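The block-hashing idea can be sketched in a few lines. This is an illustrative approximation under stated assumptions, not the authors' implementation: images are modelled as lists of rows of grayscale values (0 to 255), the 4-pixel block size and SHA-1 hash are arbitrary choices, and the names `shingle`, `similarity`, and `same_scam` are hypothetical.

```python
import hashlib
from collections import Counter


def shingle(image, block=4):
    """Hash every block x block tile; partial tiles at the edges are skipped."""
    height = len(image)
    width = len(image[0]) if height else 0
    hashes = Counter()
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            tile = bytes(
                image[row][col]
                for row in range(top, top + block)
                for col in range(left, left + block)
            )
            hashes[hashlib.sha1(tile).hexdigest()] += 1
    return hashes


def similarity(a, b):
    """Fraction of hashed blocks shared by two shingle multisets."""
    total = max(sum(a.values()), sum(b.values()))
    if total == 0:
        return 0.0
    shared = sum((a & b).values())  # multiset intersection: minimum counts
    return shared / total


def same_scam(img_a, img_b, threshold=0.70, block=4):
    """Treat two screenshots as one scam if enough blocks match."""
    return similarity(shingle(img_a, block), shingle(img_b, block)) >= threshold


# Toy 8x8 "screenshots": the second differs in a single pixel,
# so one of its four tiles hashes differently.
page = [[r * 8 + c for c in range(8)] for r in range(8)]
variant = [row[:] for row in page]
variant[0][0] = 255
print(similarity(shingle(page), shingle(variant)))
```

This version compares images pairwise. Reaching the O(N lg N) behaviour mentioned on the slide would additionally require sorting or indexing the block hashes across all images, which is omitted here.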

9 An Example Scam: “Downloadable Software”
Scam perspective
– 99 observed virtual hosts
– 3 IP addresses
– Operated for months
– 85 senders
– No forwarding used
– 5,535 probes (97% successful)

10 Clustering with Image Shingling
Images differ slightly
Some pages rotate content

11 Location
2 web servers in China; 1 web server in Russia
85 senders from 30 countries (28 from US)
Map legend: blue marks web servers hosting Downloadable Software; red marks spam relays (hosts that sent us spam)

12 Shared Infrastructure
One of the IPs (221.4.246.3) hosting “Downloadable Software” was also hosting “Toronto Pharmacy”
Server located in Guangzhou, China

13 Summary Statistics (section: Results)
1 week of spam collection: Nov. 28th – Dec. 4th
2 weeks of probing: Nov. 28th – Dec. 11th
Count      Description
1,087,711  Spam messages
319,700    30% contain links
36,390     11.3% are distinct links
7,029      19.3% resolve to unique IP addresses
2,334      33.2% resolve to distinct scams
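The percentages on this slide can be checked against the counts. The sketch below assumes that the five counts are successive stages of one funnel, each percentage being relative to the stage before it. That pairing is inferred from the numbers, not stated on the slide.

```python
# Counts from the slide, ordered as an assumed funnel.
stages = [
    ("spam messages", 1_087_711),
    ("contain links", 319_700),
    ("distinct links", 36_390),
    ("unique IP addresses", 7_029),
    ("distinct scams", 2_334),
]


def stage_ratios(funnel):
    """Return (label, count / previous count) for every stage after the first."""
    ratios = []
    for (_, prev), (label, count) in zip(funnel, funnel[1:]):
        ratios.append((label, count / prev))
    return ratios


for label, ratio in stage_ratios(stages):
    print(f"{label}: {ratio:.1%} of previous stage")
```

Under that reading the computed ratios agree with the slide to within rounding. The first stage comes out slightly under the stated 30%.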

14 Distributed Infrastructure (section: Results - Infrastructure)
To what extent is the infrastructure distributed for scams?
Most scams are not distributed:
– 94% used one IP
Top three distributed scams were extensive
– 22, 30, and 45 IPs
Top three virtual-hosted scams
– 110, 695, and 3029 domain names

15 Shared Infrastructure
To what extent do multiple scams share infrastructure?
38% of scams hosted on a machine with at least one other scam
10 IPs hosted 10 or more scams
Top three shared IPs
– 15, 18, and 22 scams
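Both infrastructure questions (how many scams use a single IP, and how many share a machine with another scam) reduce to counting over (scam, IP) pairs. A minimal sketch, assuming a hypothetical `infrastructure_stats` helper and made-up observations that use documentation-range addresses:

```python
from collections import defaultdict


def infrastructure_stats(observations):
    """Summarise (scam, ip) pairs into distributed and shared fractions."""
    ips_by_scam = defaultdict(set)
    scams_by_ip = defaultdict(set)
    for scam, ip in observations:
        ips_by_scam[scam].add(ip)
        scams_by_ip[ip].add(scam)

    total = len(ips_by_scam)
    if total == 0:
        return {"single_ip_fraction": 0.0, "shared_fraction": 0.0}

    # Scams served from exactly one IP address.
    single_ip = sum(1 for ips in ips_by_scam.values() if len(ips) == 1)
    # Scams with at least one IP that also hosts a different scam.
    shared = sum(
        1
        for ips in ips_by_scam.values()
        if any(len(scams_by_ip[ip]) > 1 for ip in ips)
    )
    return {
        "single_ip_fraction": single_ip / total,
        "shared_fraction": shared / total,
    }


# Made-up data: scams A and B share one address; C and D stand alone.
toy = [
    ("A", "192.0.2.1"),
    ("A", "192.0.2.2"),
    ("B", "192.0.2.2"),
    ("C", "192.0.2.3"),
    ("D", "192.0.2.4"),
]
print(infrastructure_stats(toy))
```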

16 Scam Lifetime & Stability (section: Results - Lifetime)
How long are scams active, and how reliable are the hosts?
Scam web hosts seem to be taken down shortly after scams disappear
Overall scam lifetime approached two weeks
Reliability is high, usually above 97%

17 Spam campaign lifetime
How long do spam campaigns last for a scam?
137 spam messages per scam (average)
Most spam campaigns relatively short
– 88% last 20 hours or less
Only 8% last more than 2 days
Scam lifetimes considerably longer
– on average one week
Figure annotations: < 20 hours, < 2 days

18 Location (section: Results - Location)
Where are scam hosting servers located?
Map legend: blue marks web servers; red marks spam relays

19 Location
Web Servers
Rank  Country  Count  Percent
1     usa      5884   57.40%
2     chn      741    7.23%
3     can      379    3.70%
4     gbr      315    3.07%
5     fra      314    3.06%
6     deu      258    2.52%
7     rus      185    1.80%
8     kor      181    1.77%
Spam Relays
Rank  Country  Count  Percent
1     usa      54159  14.50%
2     fra      26371  7.06%
3     esp      25196  6.75%
4     chn      24833  6.65%
5     pol      21199  5.68%
6     ind      20235  5.42%
7     deu      18678  5.00%
8     kor      17446  4.67%

20 Scam Categorization (section: Results - Categorization)
Scam category                 % of scams
Uncategorized                 29.57%
Information Technology        16.67%
Dynamic Content               11.52%
Business and Economy          6.23%
Shopping                      4.30%
Financial Data and Services   3.61%
Illegal or Questionable       2.15%
Adult                         1.80%
Message Boards and Clubs      1.80%
Web Hosting                   1.63%

21 Lifetime of scams with Categorization
More than 40% of malicious scams disappear before 120 hours
The same is true for less than 15% of all scams

22 Summary (section: Conclusion)
Started with over 1 million spam messages and coalesced to fewer than 2,500 scams
Image shingling allowed us to scalably determine if two sites were part of the same scam
Most scams use one web server (vulnerable to blacklisting)
– Scams may use many virtual domains that point to one IP
Most scams are not malicious per se
Scam infrastructure is more stable, longer lived, and more concentrated in the US than spam senders

23 Spammers beware; these boffins are on the prowl
Questions and Answers

24 Spamscope Visibility (section: Supplementary Information)
Collected spam from news.admin.net-abuse.sightings, a newsgroup for contributing spam
For a 3 day period, we saw
– 6,977 spam from the newsgroup -> 205 scams
– 113,216 spam from our feed -> 1,687
12% of the newsgroup scams were in ours
The “largest” scams (most emails and most domains/IP) were seen in both feeds

25 Blacklists (section: Results - Blacklisting)
Host type    Classification  % of hosts
Spam relay   Open proxy      72.27%
Spam relay   Spam host       5.86%
Scam host    Open proxy      2.06%
Scam host    Spam host       14.86%
9.7% of the scam hosts also sent us spam

26 Web Server OS
Rank  Operating system           Percent
1     Linux recent 2.4 (1)       11.97%
2     Windows 2000 (SP1+)        11.05%
3     Akamai ???                 10.86%
4     Windows 2000 SP4           8.25%
5     Linux recent 2.4 (2)       7.84%
6     FreeBSD 4.6-4.8            7.72%
7     Slashdot or BusinessWeek   7.04%
8     FreeBSD 5.0                6.49%
9     Windows XP SP1             5.90%
10    Linux older 2.4            5.56%

27 URL Classification
WISP Dynamic Content                         17.931%
WISP Uncategorized                           13.965%
WISP Illegal or Questionable                 10.306%
WISP Information Technology                  9.051%
WISP Shopping                                4.872%
WISP Business and Economy                    4.733%
WISP Financial Data and Services             4.626%
WISP Personals and Dating                    1.867%
WISP Advertisements                          1.249%
WISP Educational Institutions                1.247%
WISP Pay-to-Surf                             1.022%
WISP Search Engines and Portals              0.884%
WISP Supplements and Unregulated Compounds   0.865%
WISP Sex                                     0.862%

28 Image Clustering
1 week of spam collection: Nov. 28th – Dec. 4th
2 weeks of probing: Nov. 28th – Dec. 11th
Count      Description
2,541,486  Total probes
250,864    9.8% of probes result in a captured image
9,572      3.8% of screenshots are the 'first' screenshot for a scam
2,334      Clusters detected by image shingling

29 Image Shingling
For a typical day of screenshots, we tested various thresholds
A 70% threshold provided a good balance between flexibility and accuracy

30 Overlap of pairs of scams on the same server
For scams running on the same server, how much time do they overlap?
96% of all scam pairs overlapped with each other when they remained active
Only 10% of scams fully overlapped each other
Figure annotation: one week

31 IP ranges
What are the network locations of scams and spam relays?
The cumulative distribution of IP addresses is highly non-uniform
Majority of spam relays (60%) fall between 58.* -> 91.*
Most scams (50%) fall between 64.* -> 72.*
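Statements such as "60% fall between 58.* -> 91.*" amount to bucketing addresses by their first octet. A small sketch, assuming hypothetical helper names and made-up sample addresses rather than data from the study:

```python
import ipaddress


def first_octet(address):
    """Return the leading octet of a dotted-quad IPv4 address."""
    return int(ipaddress.IPv4Address(address)) >> 24


def fraction_in_range(addresses, low, high):
    """Share of addresses whose first octet lies in [low, high]."""
    if not addresses:
        return 0.0
    hits = sum(1 for a in addresses if low <= first_octet(a) <= high)
    return hits / len(addresses)


# Made-up addresses for illustration only.
sample_ips = ["58.1.2.3", "91.0.0.1", "64.5.6.7", "10.0.0.1"]
print(fraction_in_range(sample_ips, 58, 91))
print(fraction_in_range(sample_ips, 64, 72))
```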

