Bayesian Bot Detection Based on DNS Traffic Similarity Ricardo Villamarín-Salomón, José Carlos Brustoloni Department of Computer Science University of.

Bayesian Bot Detection Based on DNS Traffic Similarity
Ricardo Villamarín-Salomón, José Carlos Brustoloni
Department of Computer Science, University of Pittsburgh
SAC '09, Proceedings of the 2009 ACM Symposium on Applied Computing
陳怡寧

Outline
– Introduction
– Bayesian method
– Methodology
– Experimental results
– Discussion and limitations
– Conclusion

Introduction -- Problem
– Many botnets have centralized command and control (C&C) servers with fixed IP addresses or domain names.
– In such botnets, bots can be detected by their communication with hosts whose IP address or domain name is that of a known C&C server.
– To evade detection, botmasters are increasingly obfuscating C&C communication, e.g., by using fast-flux or P2P.

Introduction -- Goal
Hypothesis:
– Regardless of obfuscation, commands tend to cause similar activities in bots belonging to the same botnet.
– These similar activities distinguish the bots from other hosts.
Assume that at least one bot in a botnet is known. Then use a Bayesian approach to find other hosts with similar DNS traffic.

[Figure: message flow among a normal host, a normal DNS server, B1 (name servers), B2 (web servers), and M (mothership). Steps: (1) Query FQDN; (2) Ask B1; (3) Query B1; (4) Ask M how to answer; (5) Answer B2; (6) Answer B2; (7) HTTP GET; (8) GET redirection; (9) Response malicious website; (10) Download website. Annotation: analyze domains queried.]

Bayesian method (1/4)
– B: blacklist (domain names of known C&C servers)
– H_bl: hosts in the blacklist B
– D_I: domain names queried by hosts in H_bl
– D_N: domain names queried by hosts in H − H_bl
[Figure labels: H_bl; H_u (uninfected hosts); H_s (infected hosts but not in H_bl); q]

Bayesian method (2/4)
1. Assign a score to every query q ∈ Q, indicating the probability that a host making it is infected.
2. Assign to each host a score that combines the scores of all the queries it made.

Bayesian method (3/4)
– q_j: query
– I_hi: whether the host h_i is infected
– The probability that a host h_i will send query q_j

Bayesian method (4/4)
Assume P(I_hi = 1) = 0.5.
An extreme case:
– If the only host querying a given domain belongs to H_bl, S_h(q_j) will be 1 (and 0 if that host does not belong to H_bl).
– So this value needs to be tuned.
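The per-query score can be illustrated with Bayes' rule. The sketch below is only an illustration, not the authors' code. It assumes that P(q_j | infected) is estimated as the fraction of H_bl hosts that sent q_j, and that P(q_j | not infected) is the fraction of the remaining hosts that sent it. The function name `query_score` and its arguments are hypothetical.

```python
def query_score(n_bl_querying, n_bl, n_other_querying, n_other, prior_infected=0.5):
    """Posterior probability that a host sending the query is infected.

    Assumption: the two likelihoods are estimated as simple host fractions.
    """
    p_q_given_infected = n_bl_querying / n_bl if n_bl else 0.0
    p_q_given_clean = n_other_querying / n_other if n_other else 0.0
    numerator = p_q_given_infected * prior_infected
    denominator = numerator + p_q_given_clean * (1.0 - prior_infected)
    if denominator == 0.0:
        return prior_infected  # query never observed: fall back to the prior
    return numerator / denominator


# Extreme cases from the slide: a name queried by a single host.
only_bl = query_score(1, 4, 0, 100)     # only an H_bl host queried it
only_other = query_score(0, 4, 1, 100)  # only a non-H_bl host queried it
both = query_score(2, 4, 50, 100)       # equally common in both groups
```

With a single observation the score jumps to 1 or 0, which is the extreme behaviour the slide says must be tuned.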

Beta distribution (1/2)
The Beta distribution is a family of continuous probability distributions defined on the interval (0, 1) and parameterized by two positive shape parameters, α and β.
The tuning calculation is based on:
– Observed DNS traffic
– x: the a priori belief that a domain name that was never queried before will be queried by an infected host.

Beta distribution (2/2)
– n: the number of trials
– s: the number of successes involving q
– N_qj: the total number of times a query q_j has been made during the traffic monitoring period
– f = α + β: a constant interpreted as the strength we want to give to x
– α = f · x
With f = 1, x = 0.5, and N_qj = 0, the result is 0.5, which avoids extreme values.
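The tuning formula itself is not preserved in this transcript. One form consistent with the parameters listed here is the smoothing from Robinson's spam-filtering work [14], S'(q) = (f·x + N·S(q)) / (f + N). This is the mean of a Beta posterior with prior α = f·x, β = f·(1 − x), after N trials with N·S(q) successes. The sketch below assumes that form; `smoothed_score` is a hypothetical name.

```python
def smoothed_score(raw_score, n_queries, x=0.5, f=1.0):
    """Pull a raw per-query score toward the prior belief x.

    Assumed form: S'(q) = (f*x + N*S(q)) / (f + N), with f > 0.
    """
    return (f * x + n_queries * raw_score) / (f + n_queries)


never_seen = smoothed_score(1.0, 0)   # no observations: returns the prior x
seen_once = smoothed_score(1.0, 1)    # one observation: halfway to the raw score
seen_often = smoothed_score(1.0, 99)  # many observations: close to the raw score
```

The more often a query is observed, the less weight the prior x carries, so names seen only once no longer produce scores of exactly 0 or 1.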

Select indicators
Previous studies [14][15] show that robust indicators are obtained by taking the geometric mean of the host's most extreme S'_h(q) values (those closest to 0 and 1).
[14] Gary Robinson, "Spam Detection," [Online]
[15] Greg Louis, "Bogofilter Calculations: Comparing Geometric Mean with Fisher's Method for Combining Probabilities," [Online]

Combined score
– N(h) and I(h) indicate how likely it is that a host is infected or non-infected, respectively.
– Combined score definition: C(h)
– Modify C(h) so that we get a score between 0 and 1.
– P(h) indicates our degree of belief that a host is infected.
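The formulas for this slide are not preserved in the transcript. As a hedged illustration, the sketch below follows the geometric-mean combination from Robinson's spam-filtering scheme [14]: two indicators are built from the extreme per-query scores, their normalized difference gives a value in [−1, 1], and that value is rescaled to [0, 1]. It assumes that T_l and T_h are the cut-offs that define "extreme" scores. The names `combined_score`, `infected_ind`, and `clean_ind` are hypothetical.

```python
import math


def combined_score(scores, t_low, t_high=0.95):
    """Combine a host's per-query scores into one belief value in [0, 1].

    Assumption: only scores <= t_low or >= t_high count as indicators.
    """
    extreme = [s for s in scores if s <= t_low or s >= t_high]
    if not extreme:
        return 0.5  # no strong evidence either way
    n = len(extreme)
    # Indicator that grows when the scores are close to 1.
    infected_ind = 1.0 - math.prod(1.0 - s for s in extreme) ** (1.0 / n)
    # Indicator that grows when the scores are close to 0.
    clean_ind = 1.0 - math.prod(extreme) ** (1.0 / n)
    total = infected_ind + clean_ind
    if total == 0.0:
        return 0.5
    c = (infected_ind - clean_ind) / total  # in [-1, 1]
    return (1.0 + c) / 2.0                  # rescaled to [0, 1]


likely_bot = combined_score([0.99, 0.98, 0.5], t_low=0.1)
likely_clean = combined_score([0.02, 0.01], t_low=0.1)
undecided = combined_score([0.5, 0.4], t_low=0.1)
```

A host would then be flagged when its combined value exceeds a chosen threshold.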

Methodology
In these experiments, the authors use two sets of hosts:
(1) Computers known with certainty to be infected (variants of the same bot were run on computers under the authors' control to collect the DNS traffic of infected hosts).
(2) Hosts confidently known to be uninfected.
In the infected host set, the traces were altered so that some hosts are masked (hosts whose traces are unmodified are unmasked hosts).
The Bayesian method is applied to the merged traces to observe:
(1) which uninfected hosts were classified as such;
(2) which masked hosts were identified as infected, based on non-blacklisted names that both masked and unmasked hosts queried.

Blacklist and Bot Specimens
– Malware samples: MWCollect
– Blacklist of C&C servers: Shadowserver
– Bot selection criteria:
  – Had the same name in both VirusTotal and Kaspersky antivirus
  – Contacted the same known C&C server
  – Had distinct MD5 signatures
– Selected bots: Backdoor.Win32.SdBot.cmz, Net-Worm.Win32.Bobic.k

DNS Data Collection
Uninfected hosts:
– CSL-1: 89 PCs in instructional laboratories of the University of Pittsburgh, February 13-14, 2008
– CSL-2: 89 PCs in instructional laboratories of the University of Pittsburgh, February 14-15, 2008
Infected hosts:
– sandnet + a DNS server + bot specimens


Test Traces
– Altered traces: each blacklisted name is obfuscated by appending a non-existent ccTLD (.nv) to it.
– SdBot-V1-1-T: the traces of all infected hosts except SdBot-V1-1 are altered.
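The trace alteration can be sketched as a small rewriting step over (host, queried name) pairs. This is an illustrative reconstruction under assumptions: the trace format, the function names, the example domain names, and the host label SdBot-V1-2 are all hypothetical; only the `.nv` suffix and the host label SdBot-V1-1 come from the slide.

```python
def mask_name(name, blacklist, fake_tld="nv"):
    """Append a non-existent TLD to a blacklisted name; leave other names alone."""
    stripped = name.rstrip(".").lower()
    if stripped in blacklist:
        return f"{stripped}.{fake_tld}"
    return name


def mask_trace(trace, blacklist, unmasked_hosts):
    """Rewrite blacklisted names for every host except the unmasked ones."""
    return [
        (host, name if host in unmasked_hosts else mask_name(name, blacklist))
        for host, name in trace
    ]


blacklist = {"cc.example.com"}  # hypothetical blacklisted C&C name
trace = [
    ("SdBot-V1-1", "cc.example.com"),
    ("SdBot-V1-2", "cc.example.com"),
    ("SdBot-V1-2", "www.example.org"),
]
masked = mask_trace(trace, blacklist, unmasked_hosts={"SdBot-V1-1"})
```

After masking, only the unmasked host still appears to query a blacklisted name, so the other infected hosts can be found only through the non-blacklisted names they share with it.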

Evaluation Metrics
– Recall, or True Positive Rate (TPR)
– False Positive Rate (FPR)
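Both metrics follow the standard definitions: TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The sketch below computes them from sets of host identifiers. The function name and the example hosts are hypothetical.

```python
def tpr_fpr(predicted_infected, actually_infected, all_hosts):
    """Return (TPR, FPR) for a classification over a set of hosts."""
    actually_clean = all_hosts - actually_infected
    tp = len(predicted_infected & actually_infected)
    fn = len(actually_infected - predicted_infected)
    fp = len(predicted_infected & actually_clean)
    tn = len(actually_clean - predicted_infected)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


hosts = {"a", "b", "c", "d", "e"}
tpr, fpr = tpr_fpr(predicted_infected={"a", "c"},
                   actually_infected={"a", "b"},
                   all_hosts=hosts)
```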

Experimental Results
We wanted to find parameters that yield good classification results with trace CSL-1-SdBot-T, and then see whether the same parameters are effective with trace CSL-2-Bobic.k-T.
We set T_h = 0.95 and P(I_h) = 0.5, and set the threshold of P(h) to 0.9. What about T_l?

Selecting T_l

FPR & TPR

True Positive
TP is caused by the name ad.doubleclick.net, which was queried by 0.87% of the uninfected hosts and the only misclassified masked hosts.

CSL-2-Bobic.k-T

Discussion and Limitations
FP occurs:
– If the parameters are not well tuned
– If a domain name is queried only by infected hosts and one or a few of the uninfected hosts
FN occurs:
– If the parameters are not well tuned
– When domain names that are very popular during a time period are queried by both infected and uninfected hosts

Conclusion
– Proposed and evaluated a Bayesian method for botnet detection.
– In this study, we found that the technique successfully recognized C&C servers with multiple domain names, while generating few or no false positives.

Comments
– The sample of DNS traffic from infected hosts is too small.
– Are the parameters of the Bayesian method really suitable for all kinds of bots?
– We can use the bots found by M8000 as seeds and collect DNS traffic to find other unspecified infected hosts.