A Framework for Detection and Measurement of Phishing Attacks
Reporter: Li, Fong Ruei, National Taiwan University of Science and Technology, 2/25/2016

Slide 1 (of 35): A Framework for Detection and Measurement of Phishing Attacks

Machine Learning and Bioinformatics Laboratory

Slide 2 (of 35): Reference
• Workshop On Rapid Malcode: Proceedings of the 2007 ACM Workshop on Recurring Malcode, Alexandria, Virginia, USA
• Session: Threats
• Pages:
• Year of Publication: 2007
• ISBN:

Slide 3 (of 35): Outline
• Introduction
• Phishing URL Types
• Modeling Phishing URLs
• Feature Analysis
• Training With Features
• Analysis and Findings
• Conclusion

Slide 4 (of 35): INTRODUCTION
• Phishing is a form of identity theft that uses social engineering techniques and sophisticated attack vectors to harvest financial information from unsuspecting consumers.
• Often a phisher tries to lure her victim into clicking a URL that points to a rogue page.

Slide 5 (of 35): PHISHING URL TYPES
• We examined a black list of phishing URLs maintained by Google.
• This black list is used to provide phishing protection in Firefox.

Slide 6 (of 35): PHISHING URL TYPES
• The prominent obfuscation techniques are:
  - Type I: Obfuscating the host with an IP address
  - Type II: Obfuscating the host with another domain
  - Type III: Obfuscating with large host names
  - Type IV: Domain unknown or misspelled

Slide 7 (of 35): PHISHING URL TYPES
[Figure]

Slide 8 (of 35): MODELING PHISHING URLS
• We use a logistic regression classifier.
• The model is trained on a black list and a white list:
  - We use 1245 URLs from Google's black list as our training black list.
  - We use a list of the top 1000 most popular URLs as the basis of our training white list.
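
The slides name the classifier but not how it was trained. As a rough illustration only, the sketch below fits a logistic regression by plain batch gradient descent on the log-loss; the two toy features (in_white_domain_table, host_is_ip), the learning rate, the epoch count, and the data are hypothetical, not the authors' setup.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, split by sign to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(phishing | x); w[0] is the bias, w[1:] are the feature weights."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on the mean log-loss; labels: 1 = phishing, 0 = benign."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/d(logit)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Hypothetical toy data: each row is [in_white_domain_table, host_is_ip]
X = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [0, 0], [0, 0]]
y = [0, 0, 0, 1, 1, 1, 1, 0]
w = train_logistic_regression(X, y)
print(round(predict_proba(w, [0, 1]), 3), round(predict_proba(w, [1, 0]), 3))
```

In practice a library implementation with regularization would replace this loop; the point here is only the shape of the model, a weighted sum of features passed through the logistic function.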

Slide 9 (of 35): MODELING PHISHING URLS
Feature Analysis
• We categorize our features into four groups:
  - Page Based
  - Domain Based
  - Type Based
  - Word Based

Slide 10 (of 35): MODELING PHISHING URLS
Page Based (PageRank)
• A numeric value on a scale of [0, 1]
• Reflects the relative importance of a page within a set of web pages
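
The slides do not say how the page-based value is computed beyond calling it PageRank. A minimal power-iteration sketch of the textbook PageRank, on a hypothetical four-page link graph, shows why each score falls in [0, 1]: the scores form a probability distribution that sums to 1. The damping factor of 0.85 and the iteration count are conventional choices, not values from the slides.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Textbook PageRank by power iteration; links maps page -> outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}  # random-jump share
        for p in pages:
            out = links.get(p, [])
            if out:
                share = damping * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical link graph: "d" has no inbound links, "c" has three.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print({p: round(r, 3) for p, r in ranks.items()})
```

A page with many inbound links from important pages ("c") ends up with a high score, while an unlinked page ("d") keeps only the random-jump share.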

Slide 11 (of 35): MODELING PHISHING URLS
Page Based
[Figure: PageRank distribution of the hostnames of the white list and black list URLs]

Slide 12 (of 35): MODELING PHISHING URLS
Domain Based
• This category contains only one feature: whether or not the URL's domain name can be found in the White Domain Table.

Slide 13 (of 35): MODELING PHISHING URLS
Domain Based
• 51.2% of the white list URLs were present in the table.
• 0.2% of the black list URLs were found in the table.
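
The domain-based feature reduces to a set lookup. In this sketch the contents of WHITE_DOMAIN_TABLE are hypothetical (the real table is Google's and is not shown), and registered_domain naively keeps the last two hostname labels, which is wrong for multi-part suffixes such as co.uk; a real implementation would consult the Public Suffix List.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the White Domain Table.
WHITE_DOMAIN_TABLE = {"google.com", "msn.com", "paypal.com"}


def registered_domain(hostname: str) -> str:
    """Naive registered domain: the last two labels of the hostname."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def in_white_domain_table(url: str, table=WHITE_DOMAIN_TABLE) -> int:
    """Binary feature: 1 if the URL's domain is in the table, else 0."""
    host = urlsplit(url).hostname or ""
    return int(registered_domain(host) in table)


print(in_white_domain_table("http://mail.msn.com/inbox"))             # 1
print(in_white_domain_table("http://paypal.com.evil.example/login"))  # 0
```

Note that the second URL contains "paypal.com" but its registered domain is evil.example, so it is not matched.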

Slide 14 (of 35): MODELING PHISHING URLS
Type Based
• Type I URLs
  - Almost none of the non-phishing (white list) URLs in our training data use host obfuscation.
  - A significant portion of the phishing URLs have the host obfuscated with an IP address.
• Type II URLs
  - A portion of the black list URLs are Type II URLs.
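
The Type I feature can be sketched with the standard ipaddress module; it recognizes dotted-decimal IPv4 and IPv6 literals only, not encoded forms. The Type II check is a hypothetical heuristic, assuming the target organization's domain is known: the organization's name appears in the path while the host belongs to some other domain.

```python
import ipaddress
from urllib.parse import urlsplit


def is_type_one(url: str) -> int:
    """1 if the URL's host is a literal IPv4/IPv6 address (Type I), else 0."""
    host = urlsplit(url).hostname
    if host is None:
        return 0
    try:
        ipaddress.ip_address(host)
        return 1
    except ValueError:
        return 0


def is_type_two(url: str, org_domain: str) -> int:
    """Hypothetical Type II heuristic: the organization's name is in the path,
    but the host is not the organization's domain or one of its subdomains."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    org_domain = org_domain.lower()
    org_name = org_domain.split(".")[0]  # e.g. "paypal" from "paypal.com"
    host_is_org = host == org_domain or host.endswith("." + org_domain)
    return int(not host_is_org and org_name in parts.path.lower())


print(is_type_one("http://192.0.2.10/paypal/login.html"))                     # 1
print(is_type_two("http://www.example.net/paypal/login.html", "paypal.com"))  # 1
```

An IP-hosted URL with the organization's name in its path satisfies both checks, so a classifier built on these would test Type I first.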

Slide 15 (of 35): MODELING PHISHING URLS
Type Based
[Figure: Distribution of Type I and Type II URLs in the training data]

Slide 16 (of 35): MODELING PHISHING URLS
Type Based
• Type III URLs
  - We determine the number of characters present after the organization's name in the hostname.

Slide 17 (of 35): MODELING PHISHING URLS
Type Based
• Non-phishing URL (bin/getmsg): 0 characters after msn.com and before the path separator
  - The maximum observed in a white list URL is 14 characters.
• Type III phishing URLs: 7.34 characters on average after the target and before the path separator
  - A maximum of 63 characters
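
The Type III count can be sketched as follows, assuming the target organization's domain (for example msn.com) is already known. The count runs from the end of the first occurrence of that domain to the end of the hostname, which is where the path separator begins; any :port is excluded because urlsplit().hostname strips it. Both example URLs are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def chars_after_org(url: str, org_domain: str) -> Optional[int]:
    """Characters in the hostname after the organization's domain, or None if absent."""
    host = (urlsplit(url).hostname or "").lower()
    idx = host.find(org_domain.lower())
    if idx < 0:
        return None
    return len(host) - (idx + len(org_domain))


print(chars_after_org("http://mail.msn.com/inbox", "msn.com"))                         # 0
print(chars_after_org("http://paypal.com.secure-login.example/signin", "paypal.com"))  # 21
```

A long tail after the organization's name, as in the second URL, is the pattern that separates Type III phishing URLs from legitimate ones.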

Slide 18 (of 35): MODELING PHISHING URLS
Word Based Features
• Phishing URLs are found to contain several suggestive word tokens; for example, login and signin are very often found in phishing URLs.
• We discarded all tokens with length < 5; these contain several common URL parts, such as www.
• We discarded organization name tokens.
• We further removed query parameters.
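
The token filtering described above can be sketched like this. Splitting on non-alphanumeric characters, the SUGGESTIVE list (only the two words named on the slide), and the example URL are assumptions; the authors' exact tokenizer is not described. Query parameters and the fragment are removed by keeping only the hostname and path.

```python
import re
from typing import Dict, Iterable, List
from urllib.parse import urlsplit

SUGGESTIVE = ("login", "signin")  # only the words named on the slide


def url_tokens(url: str, org_names: Iterable[str] = ()) -> List[str]:
    """Tokens of hostname + path, dropping short tokens and organization names."""
    parts = urlsplit(url)
    text = ((parts.hostname or "") + parts.path).lower()  # query string dropped
    orgs = {o.lower() for o in org_names}
    tokens = re.split(r"[^a-z0-9]+", text)
    return [t for t in tokens if len(t) >= 5 and t not in orgs]


def word_features(url: str, org_names: Iterable[str] = ()) -> Dict[str, int]:
    """One binary feature per suggestive word."""
    toks = set(url_tokens(url, org_names))
    return {w: int(w in toks) for w in SUGGESTIVE}


url = "http://www.paypal.com.example.net/signin/confirm.php?user=abc"
print(url_tokens(url, ["paypal"]))     # ['example', 'signin', 'confirm']
print(word_features(url, ["paypal"]))  # {'login': 0, 'signin': 1}
```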

Slide 19 (of 35): MODELING PHISHING URLS
[Figure: Distribution of these features in our training set]

Slide 20 (of 35): MODELING PHISHING URLS
Training With Features
• Our labeled data consisted of 2508 URLs:
  - 1245 phishing URLs
  - 1263 benign URLs
• Phishing URLs were placed in the positive (true) class, and non-phishing ones in the negative (false) class.
• 66% of the URLs were used for training and the remaining 34% as the test set.
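
A randomized 66%/34% partition might look like the sketch below; calling it with different seeds gives the repeated randomized runs used in the evaluation. The seed parameter and the use of Python's random.Random are illustrative choices, and the split is unstratified because the slides do not say whether class ratios were preserved.

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_66_34(urls: Sequence[str], labels: Sequence[int],
                seed: Optional[int] = None) -> Tuple[List[tuple], List[tuple]]:
    """Shuffle indices, then take the first 66% for training and the rest for testing."""
    idx = list(range(len(urls)))
    random.Random(seed).shuffle(idx)
    cut = round(0.66 * len(idx))
    train = [(urls[i], labels[i]) for i in idx[:cut]]
    test = [(urls[i], labels[i]) for i in idx[cut:]]
    return train, test


# Placeholder data with the slide's class counts: 1245 phishing (1), 1263 benign (0).
urls = [f"url-{i}" for i in range(2508)]
labels = [1] * 1245 + [0] * 1263
train, test = split_66_34(urls, labels, seed=0)
print(len(train), len(test))  # 1655 853
```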

Slide 21 (of 35): MODELING PHISHING URLS
• To indicate the relative strength of each feature in identifying a phishing URL, we report the corresponding odds ratio, $e^{\beta_i}$, where $\beta_i$ is the logistic regression coefficient of feature $i$.
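
The odds ratio follows from the logistic model's log-odds form. Writing $p$ for the predicted probability that a URL is phishing, $\beta_0$ for the intercept, and $\beta_i$ for the coefficient of feature $x_i$ (notation introduced here, not from the slides), increasing $x_i$ by one while holding the other features fixed multiplies the odds by $e^{\beta_i}$:

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_{j} \beta_j x_j
\quad\Longrightarrow\quad
\frac{\operatorname{odds}(x_i + 1)}{\operatorname{odds}(x_i)}
  = \frac{\exp\!\big(\beta_0 + \sum_{j \ne i} \beta_j x_j + \beta_i (x_i + 1)\big)}
         {\exp\!\big(\beta_0 + \sum_{j \ne i} \beta_j x_j + \beta_i x_i\big)}
  = e^{\beta_i}
```

An odds ratio above 1 means the feature raises the odds that a URL is phishing; an odds ratio below 1 means it lowers them.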

Slide 22 (of 35): MODELING PHISHING URLS
[Figure]

Slide 23 (of 35): MODELING PHISHING URLS
Evaluation Result
• We evaluated the trained model on the 34% test split.
• We performed the evaluation over multiple runs with randomized partitioning.
• This evaluation gave an average accuracy of 97.31%, with:
  - a true positive rate of 95.8%
  - a false positive rate of 1.2%
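
As a consistency check, accuracy can be recomputed from the two reported rates, assuming the test split has the same class balance as the full labeled set (1245 phishing, 1263 benign). Because the slide reports averages over several runs, this is only an approximation, but it reproduces the reported 97.31%.

```python
def accuracy_from_rates(n_pos: int, n_neg: int, tpr: float, fpr: float) -> float:
    """Accuracy = (TP + TN) / N, with TP = TPR * positives and TN = (1 - FPR) * negatives."""
    true_pos = tpr * n_pos
    true_neg = (1.0 - fpr) * n_neg
    return (true_pos + true_neg) / (n_pos + n_neg)


acc = accuracy_from_rates(n_pos=1245, n_neg=1263, tpr=0.958, fpr=0.012)
print(f"{acc:.2%}")  # 97.31%
```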

Slide 24 (of 35): ANALYSIS AND FINDINGS
• We collected several million URLs from August 20th to August
• The data consisted of two main components:
  - the unique URLs visited each day
  - the consecutive lookup requests to these URLs

Slide 25 (of 35): ANALYSIS AND FINDINGS
Average Phishing URLs per day
• The average number of phishing URLs visited through Google's toolbar in a day
• We find that, on average, in a day there are:
  - 777 phishing URL attacks
  - 5073 viewers of phishing pages
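
The per-day averages can be sketched as below, assuming each lookup record is a hypothetical (day, url, user_id) tuple for a URL already classified as phishing. The slides do not describe the actual log format, and "viewers" is read here as distinct users per day.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Tuple


def daily_averages(lookups: Iterable[Tuple[str, str, str]]) -> Tuple[float, float]:
    """Mean distinct phishing URLs per day and mean distinct viewers per day."""
    urls_by_day = defaultdict(set)
    users_by_day = defaultdict(set)
    for day, url, user in lookups:
        urls_by_day[day].add(url)
        users_by_day[day].add(user)
    days = sorted(urls_by_day)
    if not days:
        return 0.0, 0.0
    return (mean(len(urls_by_day[d]) for d in days),
            mean(len(users_by_day[d]) for d in days))


# Hypothetical records: day d1 has 2 URLs and 2 users, day d2 has 1 of each.
lookups = [("d1", "u1", "a"), ("d1", "u2", "b"), ("d1", "u1", "b"), ("d2", "u3", "a")]
print(daily_averages(lookups))  # (1.5, 1.5)
```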

Slide 26 (of 35): ANALYSIS AND FINDINGS
Average Phishing URLs per day
[Figure: The distribution of phishing attacks on each day of our study]

Slide 27 (of 35): ANALYSIS AND FINDINGS
Average Phishing URLs per day
[Figure]

Slide 28 (of 35): ANALYSIS AND FINDINGS
Average Phishing URLs per day
[Figure]

Slide 29 (of 35): ANALYSIS AND FINDINGS
Average Potential Phishing Victims per day
• We determine how many users interact with a phishing page.
• A user who has any interaction with a site classified as phishing is regarded as a potential phishing victim.

Slide 30 (of 35): ANALYSIS AND FINDINGS
Average Potential Phishing Victims per day
• Based on the number of users who view phishing pages in a day, we can further infer the Potential Success Rate of a phisher as follows:
[Formula]
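
The formula itself did not survive in the transcript. Reading it from the conclusion, which states that on average 8.24% of users who view phishing pages are potential victims, a plausible form is the percentage of viewers who become potential victims. This reconstruction is an assumption, and the counts in the example are hypothetical.

```python
def potential_success_rate(potential_victims: int, viewers: int) -> float:
    """Assumed definition: percentage of phishing-page viewers who are potential victims."""
    if viewers <= 0:
        raise ValueError("viewers must be positive")
    if not 0 <= potential_victims <= viewers:
        raise ValueError("potential_victims must be between 0 and viewers")
    return 100.0 * potential_victims / viewers


# Hypothetical counts, not data from the study.
print(potential_success_rate(potential_victims=412, viewers=5000))  # 8.24
```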

Slide 31 (of 35): ANALYSIS AND FINDINGS
Average Potential Phishing Victims per day
[Figure: The distribution of phishing attacks on each day of our study]

Slide 32 (of 35): ANALYSIS AND FINDINGS
Distribution of Phishing by Organization
[Figure]

Slide 33 (of 35): ANALYSIS AND FINDINGS
Geographical Distribution of Phishing
• To determine the country that hosts a particular phishing URL, we used Google's IP-to-geolocation infrastructure.

Slide 34 (of 35): Anti-Phishing Tools
[Figure]

Slide 35 (of 35): CONCLUSION
• We use our features in a logistic regression classifier that achieves very high accuracy.
• One of the major contributions of this work is a large-scale measurement study conducted on Google Toolbar URLs.
• On average, we found around 777 unique phishing pages per day, and on average 8.24% of the users who view phishing pages are potential phishing victims.