1 Characterizing Botnet from Email Spam Records Presenter: Yi-Ren Yeh ( 葉倚任 ) Authors: L. Zhuang, J. Dunagan, D. R. Simon, H. J. Wang, I. Osipkov, G. Hulten,

Presentation transcript:

1 Characterizing Botnets from Email Spam Records. Presenter: Yi-Ren Yeh (葉倚任). Authors: L. Zhuang, J. Dunagan, D. R. Simon, H. J. Wang, I. Osipkov, G. Hulten, and J. Tygar. USENIX LEET 2008

2 Outline: Introduction; Overview; Methodology; Metrics and Findings; Conclusion

3 Introduction: Spam is a driving force in the economics of botnets. This work maps botnet membership and other botnet characteristics using spam traces. By grouping similar messages into spam campaigns and then grouping related campaigns, the authors identify a set of botnets. The analysis uses a large trace of spam from the Hotmail Web mail service.

4 Pros and Cons of Using Spam: Pros: the analysis can be done on an existing trace from one of the small number of large Web mail providers; spam is directly related to the economic motivation behind many botnets; analyzing spam is potentially a less ad hoc and easier task than analyzing IRC/DNS logs. Cons: the approach cannot uncover botnets that are not involved in spamming.

5 Contributions: The first work to analyze the behavior of entire botnets (in contrast to individual bots) from spam messages. The first to study botnet traces based on economic motivation and monetizing activities. New findings about botnets involved in spamming.

6 Overview: The major steps in the proposed method are: (1) Cluster messages into spam campaigns: spam messages with identical or similar content are taken to come from the same controlling entity, and fingerprints are used to cluster the messages. (2) Assess IP dynamics: extract the average time until an IP address gets reassigned and the IP reassignment range (both are measured per C-subnet). (3) Merge spam campaigns into botnets via the overlap of their sending hosts.

7 Methodology: Datasets and initial processing; Identifying spam campaigns; Skipping spam from non-bots; Assessing IP dynamics; Identifying botnets; Estimating botnet size

8 Datasets and Initial Processing: The data are collected from the Hotmail Web mail service (Junk Mail). 5 million spam messages are randomly sampled from those collected over a 9-day period from May 21, 2007 to May 29, 2007. A reliable sender IP address is extracted heuristically for each message. The body parts are parsed to get both HTML and text from each message.

9 Identifying Spam Campaigns: Use ad hoc approaches to pre-clean the raw content and keep only the rendered content. Use the shingling algorithm to cluster near-duplicate content together. Associate each spam campaign with the list {(IP_i, t_i)} of IP events, each consisting of an IP address IP_i and a sending time t_i.
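The near-duplicate clustering step can be illustrated with a minimal sketch. It is not the authors' implementation: the shingle width `w`, the Jaccard `threshold`, and the greedy one-pass clustering are all assumptions chosen for brevity, and the function names are hypothetical.

```python
import re


def shingles(text: str, w: int = 4) -> set:
    """Return the set of w-word shingles (contiguous word windows) of a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        # Short texts yield a single shingle holding all of their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_campaigns(messages: list, threshold: float = 0.5, w: int = 4) -> list:
    """Greedy clustering: a message joins the first cluster whose
    representative (its first member) is similar enough, else starts a new one."""
    clusters = []  # each entry: (representative shingle set, member indices)
    for idx, msg in enumerate(messages):
        sh = shingles(msg, w)
        for rep, members in clusters:
            if jaccard(rep, sh) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((sh, [idx]))
    return [members for _, members in clusters]
```

Each returned list of message indices plays the role of one spam campaign. At the scale of millions of messages, a production system would more likely compare compact fingerprints of the shingle sets than the sets themselves.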

10 Skipping Spam from Non-bots: Exclude a message if its sender IP address is on the whitelist. Remove campaigns whose senders are all within a single C-subnet. Remove campaigns with senders from fewer than three geographic locations (cities).
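The three filters on this slide can be sketched as follows. This is an illustration under stated assumptions: a C-subnet is taken to be a /24 IPv4 network, each event is a hypothetical `(ip, city)` pair with the city already geolocated, and the order in which the filters are applied is a guess.

```python
import ipaddress
from typing import Optional


def c_subnet(ip: str) -> str:
    """Return the /24 network containing an IPv4 address (assumed meaning of C-subnet)."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def keep_campaign(events: list, whitelist: set, min_cities: int = 3) -> Optional[list]:
    """Apply the three filters to one campaign.

    events: list of (ip, city) pairs (hypothetical layout).
    Returns the surviving events, or None if the campaign is dropped.
    """
    # 1. Drop events whose sender IP is whitelisted.
    events = [(ip, city) for ip, city in events if ip not in whitelist]
    if not events:
        return None
    # 2. Drop campaigns whose senders all fall in a single C-subnet.
    if len({c_subnet(ip) for ip, _ in events}) == 1:
        return None
    # 3. Drop campaigns seen from fewer than min_cities cities.
    if len({city for _, city in events}) < min_cities:
        return None
    return events
```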

11 Assessing IP Dynamics: Assume that IP address reassignment is a Poisson process. Measure two IP address reassignment parameters in each C-subnet (via MSN): the average lifetime J_t of an IP address on a particular host, and the maximum distance J_r between IP addresses assigned to the same host.

12 Assessing IP Dynamics: Rule of aggregation. Among all IP addresses in the same C-subnet, consider two events (IP_1, t_1) and (IP_2, t_2). If either IP_1 or IP_2 is outside the distance range (J_r) of the other, the two events are regarded as coming from two different machines. If IP_1 and IP_2 are within the distance range (J_r) of each other, two cases are considered: the host keeps the same IP address after an interval of duration t_2 - t_1, or an IP reassignment happens during an interval of duration t_2 - t_1.
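The probabilities attached to the two cases are not preserved in the transcript. Under the Poisson assumption stated earlier, with mean address lifetime J_t, the chance that no reassignment occurs within an interval of length Δt is exp(-Δt / J_t), and the chance of at least one reassignment is its complement. The sketch below encodes only that much. Treating the "distance" between two addresses as the absolute difference of their integer values is an assumption, and the function name is hypothetical.

```python
import ipaddress
import math
from typing import Optional, Tuple


def pair_probabilities(ip1: str, t1: float, ip2: str, t2: float,
                       j_t: float, j_r: int) -> Optional[Tuple[float, float]]:
    """Evaluate the aggregation rule for two IP events.

    Returns None when the addresses are farther apart than j_r, meaning the
    events are treated as coming from two different machines. Otherwise
    returns (p_keep, p_reassign) for the interval between the two events,
    with times and j_t expressed in the same unit.
    """
    a1 = int(ipaddress.IPv4Address(ip1))
    a2 = int(ipaddress.IPv4Address(ip2))
    if abs(a1 - a2) > j_r:
        return None
    dt = abs(t2 - t1)
    p_keep = math.exp(-dt / j_t)  # no reassignment during dt
    return p_keep, 1.0 - p_keep   # at least one reassignment during dt
```

For example, with J_t = 24 hours and two events 24 hours apart, the host keeps its address with probability exp(-1), roughly 0.37.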

13 Assessing IP Dynamics

14 Identifying Botnets: Given two spam campaigns SC_1 and SC_2, how do we know whether they share the same controller? Over all events in SC_1, a weight W [formula not preserved in this transcript] measures the fraction of events in SC_1 that are connected to some event in SC_2, where i and j denote IP events in SC_1 and SC_2, respectively. W, called the connectivity degree, ranges from 0 to 1. A threshold of 0.2 is selected as reasonable.
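Since the formula for W is missing, the following sketch implements only the verbal definition: the fraction of events in SC_1 connected to at least one event in SC_2. The `connected` predicate is a hypothetical stand-in for whatever test the authors use to decide that two IP events come from the same machine. Merging two campaigns whenever W reaches the threshold in either direction, and using union-find to form the groups, are likewise assumptions.

```python
from typing import Callable, List, Set


def connectivity_degree(sc1: list, sc2: list, connected: Callable) -> float:
    """Fraction of events in sc1 that are connected to some event in sc2."""
    if not sc1:
        return 0.0
    hits = sum(1 for i in sc1 if any(connected(i, j) for j in sc2))
    return hits / len(sc1)


def merge_campaigns(campaigns: List[list], connected: Callable,
                    threshold: float = 0.2) -> List[Set[int]]:
    """Group campaign indices into botnets using union-find."""
    n = len(campaigns)
    parent = list(range(n))

    def find(x: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in range(n):
        for b in range(n):
            if a != b and connectivity_degree(campaigns[a], campaigns[b], connected) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for idx in range(n):
        groups.setdefault(find(idx), set()).add(idx)
    return sorted(groups.values(), key=min)
```

The pairwise loop is quadratic in the number of campaigns, which is fine for a sketch but would need indexing by subnet or similar pruning on a real trace.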

15 Identifying Botnets

16 Estimating Botnet Size: Assumption: each bot sends approximately the same number of spam messages. Known quantities: r, the downsample rate of the dataset; N, the number of spam messages observed; N_1, the number of bots observed with only one spam message in the dataset. The goal is to estimate s, the mean number of spam messages sent per bot, and b, the number of bots (i.e., the botnet size).

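17 Estimating Botnet Size: The estimated number of spam messages from a botnet is N/r = sb. The expected number of bots observed with only one spam message is expressed in terms of s, b, and r [equation not preserved in this transcript]. Solving the two relations gives the average number of spam messages sent per bot (s) and the botnet size (b) [equations not preserved in this transcript].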
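The expressions on this slide did not survive in the transcript, but they can plausibly be reconstructed from the stated assumptions. If each bot sends s messages and each message is kept in the sample independently with probability r, a bot shows exactly one message with probability s·r·(1 - r)^(s - 1). The expected number of such bots is therefore N_1 = b·s·r·(1 - r)^(s - 1). Substituting N = r·s·b gives N_1 = N·(1 - r)^(s - 1), so s = 1 + ln(N_1 / N) / ln(1 - r) and b = N / (r·s). The independent-sampling model is an assumption of this sketch and may differ in detail from the authors' derivation.

```python
import math
from typing import Tuple


def estimate_botnet_size(n_observed: float, n_single: float, r: float) -> Tuple[float, float]:
    """Estimate (s, b) from a downsampled spam trace.

    n_observed: N, number of spam messages observed for the botnet.
    n_single:   N_1, number of bots observed with exactly one message.
    r:          downsample rate, strictly between 0 and 1.
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    if not 0.0 < n_single <= n_observed:
        raise ValueError("need 0 < N_1 <= N")
    # From N_1 = N * (1 - r) ** (s - 1), solve for s.
    s = 1.0 + math.log(n_single / n_observed) / math.log(1.0 - r)
    # From N / r = s * b, solve for b.
    b = n_observed / (r * s)
    return s, b
```

When every observed message comes from a distinct bot (N_1 = N), the estimate collapses to s = 1 and b = N / r.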

18 Metrics and Findings: Spam campaign duration; Botnet sizes; Per-day aspect: life span of botnets and spam campaigns; Geographic distribution of botnets

19 Spam Campaign Duration: The duration of a spam campaign is the time between the first and the last message seen from that campaign. Over 50% of spam campaigns finish within 12 hours.

20 Spam Campaign Duration: Short-lived spam campaigns actually have larger volume. More than 70% of spam messages are sent by spam campaigns lasting less than 8 hours.
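The two duration statistics can be computed as in the sketch below, given per-campaign sending times. The dictionary layout and function names are hypothetical, and the timestamps in the usage note are toy values, not data from the study.

```python
from datetime import datetime, timedelta


def campaign_duration(send_times: list) -> timedelta:
    """Time between the first and the last message seen from a campaign."""
    return max(send_times) - min(send_times)


def fraction_of_campaigns_within(campaigns: dict, limit: timedelta) -> float:
    """Share of campaigns whose duration is at most `limit`."""
    short = sum(1 for times in campaigns.values() if campaign_duration(times) <= limit)
    return short / len(campaigns)


def message_share_of_short_campaigns(campaigns: dict, limit: timedelta) -> float:
    """Share of all messages sent by campaigns lasting less than `limit`."""
    total = sum(len(times) for times in campaigns.values())
    short = sum(len(times) for times in campaigns.values()
                if campaign_duration(times) < limit)
    return short / total
```

With a toy input of one three-message campaign spanning 3 hours and one two-message campaign spanning 30 hours, half of the campaigns finish within 12 hours and three fifths of the messages come from campaigns shorter than 8 hours.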

21 Botnet Sizes

22 Botnet Sizes

23 Botnet Sizes

24 Per-day Aspect: Life Span of Botnets and Spam Campaigns: 60% of the spam received from botnets each day is sent by long-lived botnets.

25 Geographic Distribution of Botnets: About half of the botnets detected in the JMS dataset control machines in over 30 countries. The total number of bots seen during the 9-day observation period of the JMS dataset is about 460,000 machines.

26 Conclusion: This work is a first step toward studying botnets through their economic motivations. It gives a picture of bot activity by directly tracing the actual operation of bots through one of their primary revenue sources (spam). It yields estimates of botnet size, behavioral characteristics, and the geographic distribution of botnets.