A Framework for Detection and Measurement of Phishing Attacks. Reporter: Li, Fong Ruei, National Taiwan University of Science and Technology, 2/25/2016.


1 A Framework for Detection and Measurement of Phishing Attacks Reporter: Li, Fong Ruei National Taiwan University of Science and Technology 2/25/2016 Slide 1 (of 35)

2 Machine Learning and Bioinformatics Laboratory
Reference
- Workshop on Rapid Malcode: Proceedings of the 2007 ACM Workshop on Recurring Malcode, Alexandria, Virginia, USA
- Session: Threats
- Pages: 1-8
- Year of publication: 2007
- ISBN: 978-1-59593-886-2

3 Outline
- Introduction
- Phishing URL Types
- Modeling Phishing URLs
- Feature Analysis
- Training With Features
- Analysis and Findings
- Conclusion

4 INTRODUCTION
- Phishing is a form of identity theft that combines
  - social engineering techniques
  - sophisticated attack vectors
- Its goal is to harvest financial information from unsuspecting consumers.
- Often a phisher tries to lure her victim into clicking a URL pointing to a rogue page.

5 PHISHING URL TYPES
- We examined a black list of phishing URLs maintained by Google.
- This black list is used to provide phishing protection in Firefox.

6 PHISHING URL TYPES
- The prominent obfuscation techniques are:
  - Type I: Obfuscating the host with an IP address
  - Type II: Obfuscating the host with another domain
  - Type III: Obfuscating with large host names
  - Type IV: Domain unknown or misspelled
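
As an illustration of Type I, the check below flags URLs whose host is a literal IP address. It is a minimal sketch, not the detection method from the slides: the function name `is_type1_url` is hypothetical, and it only recognizes standard dotted IPv4 and bracketed IPv6 hosts, not hex or other encoded forms.

```python
import ipaddress
from urllib.parse import urlsplit


def is_type1_url(url: str) -> bool:
    """Return True if the URL's host is a literal IP address (hypothetical helper)."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        # ip_address() raises ValueError for anything that is not an IP literal.
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True
```

The other three types need knowledge of the targeted organization, so they cannot be detected from the URL syntax alone in this way.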

7 PHISHING URL TYPES

8 MODELING PHISHING URLS
- We use a logistic regression classifier.
- The model is trained on a black list and a white list, built as follows:
  - We use 1245 URLs from the phishing black list as our training black list.
  - We use a list of the top 1000 most popular URLs as the basis of our training white list.
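
To make the classifier concrete, here is a minimal logistic regression trained with batch gradient descent. It is a sketch under stated assumptions: the two binary features, the six toy rows, the learning rate and the epoch count are all made up for illustration and are not the paper's data or training procedure.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability that feature vector x belongs to the positive (phishing) class."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Toy, made-up rows: [host_is_ip_address, domain_in_white_table]
X = [[1, 0], [1, 0], [0, 0], [0, 1], [0, 1], [0, 1]]
y = [1, 1, 1, 0, 0, 0]  # 1 = phishing (black list), 0 = benign (white list)
w, b = train_logistic(X, y)
```

On this toy data the weight for the IP-host feature comes out positive and the weight for the white-table feature comes out negative, which matches the direction the slides describe for these features.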

9 MODELING PHISHING URLS
- Feature Analysis
- We categorize our features into four groups:
  - Page Based
  - Domain Based
  - Type Based
  - Word Based

10 MODELING PHISHING URLS
- Page Based (Page Rank):
  - a numeric value on a scale of [0,1]
  - measures the relative importance of a page within a set of web pages

11 MODELING PHISHING URLS
- Page Based:
  - Figure: Page Rank distribution for the hostnames of the white list and black list URLs

12 MODELING PHISHING URLS
- Domain Based
  - This category contains only one feature: whether or not the URL's domain name can be found in the White Domain Table.

13 MODELING PHISHING URLS
- Domain Based
  - 51.2% of the white list URLs were present in the White Domain Table.
  - 0.2% of the black list URLs were found in this table.

14 MODELING PHISHING URLS
- Type Based
  - Type I URL
    - Almost none of the non-phishing (white list) URLs in our training data contain host obfuscation.
    - A significant portion of the phishing URLs are host obfuscated with an IP address.
  - Type II URL
    - A portion of the black list URLs are Type II URLs.

15 MODELING PHISHING URLS
- Type Based
  - Figure: Distribution of Type I and Type II URLs in the training data

16 MODELING PHISHING URLS
- Type Based
  - Type III URL
    - We determine the number of characters present after an organization name in the hostname.

17 MODELING PHISHING URLS
- Type Based
  - Non-phishing URL
    - http://by124fd.bay124.hotmail.msn.com/cgi-bin/getmsg
    - 0 characters after msn.com and before the path separator
    - The maximum number observed in a white list URL is 14 characters.
  - Type III phishing URLs
    - 7.34 characters (on average) after the target and before the path separator
    - a maximum of 63 characters
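
The Type III measurement can be sketched as counting the hostname characters that follow the organization's domain. This is an illustrative assumption about how the count is taken: `chars_after_org` is a hypothetical name, the organization domain is passed in by the caller, and the separating dot is counted as a character, which the slides do not specify.

```python
from urllib.parse import urlsplit


def chars_after_org(url, org_domain):
    """Count hostname characters after org_domain and before the path separator.

    Returns None when org_domain does not occur in the hostname.
    """
    host = urlsplit(url).hostname or ""
    idx = host.find(org_domain)
    if idx == -1:
        return None
    return len(host) - (idx + len(org_domain))


# The white list example from the slide: nothing follows msn.com.
benign = chars_after_org(
    "http://by124fd.bay124.hotmail.msn.com/cgi-bin/getmsg", "msn.com"
)

# A made-up Type III style URL: the organization name is followed by extra host text.
suspicious = chars_after_org(
    "http://www.example.com.account-check.test/login", "example.com"
)
```

Here `benign` is 0 and `suspicious` is 19, because ".account-check.test" follows the organization's domain inside the hostname.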

18 MODELING PHISHING URLS
- Word Based Features
  - Phishing URLs are found to contain several suggestive word tokens.
    - login and signin are very often found in a phishing URL.
  - We discarded all tokens with length < 5.
    - These contained several common URL parts such as http:// and www.
  - We discarded organization name tokens.
  - We further removed query parameters.
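
The filtering steps on this slide can be sketched as follows. The function name `word_tokens` and the `org_names` argument are hypothetical, and splitting on non-alphanumeric characters is an assumption, since the slides do not say how URLs are tokenized.

```python
import re
from urllib.parse import urlsplit


def word_tokens(url, org_names=frozenset()):
    """Extract candidate word tokens from a URL (illustrative sketch)."""
    parts = urlsplit(url)
    # Use only host and path, so the scheme and query parameters are dropped.
    text = (parts.netloc + parts.path).lower()
    tokens = re.split(r"[^a-z0-9]+", text)
    # Discard short tokens (length < 5) and organization name tokens.
    return [t for t in tokens if len(t) >= 5 and t not in org_names]


tokens = word_tokens(
    "http://www.example.com/secure/login.php?user=abcdef", org_names={"example"}
)
```

For this made-up URL the result is `["secure", "login"]`: `www`, `com` and `php` are too short, `example` is treated as an organization name, and the query string is ignored.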

19 MODELING PHISHING URLS
- Figure: Distribution of the word based features in our training set

20 MODELING PHISHING URLS
- Training With Features
  - Our labeled data consisted of 2508 URLs:
    - 1245 were phishing URLs
    - 1263 were benign URLs
  - Phishing URLs were placed under the positive (true) class.
  - Non-phishing ones were under the negative (false) class.
  - 66% of URLs were used for training and the remaining 34% were used as the test set.
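
A randomized 66/34 partition like the one described can be sketched as below. The function name, the fixed seed and the choice to truncate the training size to a whole number are assumptions; the slides do not say how the split was rounded or randomized.

```python
import random


def split_urls(urls, train_fraction=0.66, seed=0):
    """Shuffle the labeled URLs and cut them into training and test sets."""
    shuffled = list(urls)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # truncation is an assumption
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 2508 labeled URLs (integers used as placeholders).
train, test = split_urls(range(2508))
```

Under these assumptions, 2508 URLs yield 1655 training examples and 853 test examples.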

21 MODELING PHISHING URLS
- To indicate the relative strength of each feature in identifying a phishing URL, we report the corresponding odds ratios, e^coefficient.
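
For reference, the standard logistic regression model relates the coefficients to odds as follows. The symbols ($p$ for the probability that a URL is phishing, $x_i$ for the feature values, $\beta_i$ for the coefficients, $n$ for the number of features) are notation introduced here, not taken from the slides.

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_{i=1}^{n} \beta_i x_i,
\qquad
\text{odds ratio for feature } i = e^{\beta_i}
```

Since phishing is the positive class, an odds ratio above 1 means the feature raises the odds that a URL is phishing, and a ratio below 1 means it lowers them.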

22 MODELING PHISHING URLS

23 MODELING PHISHING URLS
- Evaluation Result
  - We evaluated the trained model on the 34% test set split.
  - We performed our evaluation over multiple runs with randomized partitioning.
  - This evaluation gave us an average accuracy of 97.31% with
    - True Positive Rate of 95.8%
    - False Positive Rate of 1.2%
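
The three reported metrics follow from the confusion matrix counts on the test set. The sketch below uses the standard definitions; the function name and the example counts are made up for illustration and are not the paper's actual counts.

```python
def evaluate(tp, fp, tn, fn):
    """Return (accuracy, true positive rate, false positive rate)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)  # share of phishing URLs that were caught
    fpr = fp / (fp + tn)  # share of benign URLs wrongly flagged
    return accuracy, tpr, fpr


# Hypothetical counts: 100 phishing and 100 benign test URLs.
accuracy, tpr, fpr = evaluate(tp=95, fp=1, tn=99, fn=5)
```

With these hypothetical counts the accuracy is 0.97, the true positive rate is 0.95 and the false positive rate is 0.01.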

24 ANALYSIS AND FINDINGS
- We collected several million URLs from August 20th to August 31st, 2006.
- The data consisted of two main components:
  - unique URLs which are visited each day
  - consecutive look up requests to these URLs

25 ANALYSIS AND FINDINGS
- Average Phishing URLs per day
  - The average number of phishing URLs which have been visited from Google's toolbar in a day.
  - We find that on average there are
    - 777 URL phishing attacks in a day
    - 5073 viewers to a phishing page

26 ANALYSIS AND FINDINGS
- Average Phishing URLs per day
  - Figure: The distribution of phishing attacks on each day of our study

27 ANALYSIS AND FINDINGS
- Average Phishing URLs per day

28 ANALYSIS AND FINDINGS
- Average Phishing URLs per day

29 ANALYSIS AND FINDINGS
- Average Potential Phishing Victims per day
  - Determine how many users interact with a phishing page.
  - A user that has any interaction at a site classified as phishing is regarded as a potential phishing victim.

30 ANALYSIS AND FINDINGS
- Average Potential Phishing Victims per day
  - Based on the number of users who view phishing pages in a day, we can further infer the Potential Success Rate of a phisher as follows:

31 ANALYSIS AND FINDINGS
- Average Potential Phishing Victims per day
  - Figure: The distribution of phishing attacks on each day of our study

32 ANALYSIS AND FINDINGS
- Distribution of Phishing by Organization

33 ANALYSIS AND FINDINGS
- Geographical Distribution of Phishing
  - To determine the country that hosts a particular phishing URL, we used Google's IP to Geo-Location infrastructure.

34 Anti-Phishing Tools

35 CONCLUSION
- We use our features in a logistic regression classifier that achieves a very high accuracy.
- One of the major contributions of this work is a large scale measurement study conducted on Google Toolbar URLs.
- On average we found around 777 unique phishing pages per day, and on average 8.24% of the users who view phishing pages are potential phishing victims.

