Reporter: Jing Chiu Advisor: Yuh-Jye Lee 2011/3/17 1 Data Mining and Machine Learning Lab.

Reporter: Jing Chiu Advisor: Yuh-Jye Lee Email: D9815013@mail.ntust.edu.tw

- Authors:
  - Anh Le, Athina Markopoulou (University of California, Irvine)
  - Michalis Faloutsos (University of California, Riverside)
- Source:
  - To appear in IEEE INFOCOM 2011 Mini Conference, Shanghai, China, April 10-15, 2011 (poster, tech report)

- Introduction
- Dataset and Feature Extraction
- Classification Algorithms
- Evaluation Results
- System Deployment
- Conclusion

- "How well can one detect phishing URLs using only lexical features compared to using full features?"
- PhishDef properties:
  - High accuracy: 96%-97%
  - Lightweight
    - Low latency
    - Imposes a modest overhead
  - Proactive approach
    - As opposed to reactively relying on blacklists
  - Resilience to noise
    - Accuracy falls from 95% to 86% as noise rises from 5% to 45%

- Dataset
  - Malicious URLs
    - PhishTank
    - MalwarePatrol
  - Legitimate URLs
    - Yahoo Directory
    - Open Directory (DMOZ)
- External Feature Collection
  - WHOIS
  - Team Cymru

- Feature Extraction
  - Automatically selected features
    - Delimiters: '/', '?', '.', '=', '_', '&' and '-'
    - Four parts: domain name, directory, file name, argument
  - Obfuscation-resistant lexical features
    - Four different URL obfuscation techniques
    - Five categories of hand-selected lexical features
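The tokenization described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `split_url`, `tokenize`, and `bag_of_words` are hypothetical, and turning tokens into part-prefixed bag-of-words features is an assumption, since the slide does not say how tokens become features.

```python
import re
from urllib.parse import urlsplit

# The seven delimiters listed on the slide.
DELIMITERS = "/?.=_&-"
_SPLIT_RE = re.compile("[" + re.escape(DELIMITERS) + "]")


def split_url(url):
    """Split a URL into the four parts named on the slide."""
    parts = urlsplit(url)
    # Assumption: the last path segment is the file name and
    # everything before it is the directory.
    directory, _, file_name = parts.path.rpartition("/")
    return {
        "domain": parts.netloc,
        "directory": directory,
        "file": file_name,
        "argument": parts.query,
    }


def tokenize(text):
    """Split text on the slide's delimiters, dropping empty tokens."""
    return [token for token in _SPLIT_RE.split(text) if token]


def bag_of_words(url):
    """Return part-prefixed tokens (assumed feature representation)."""
    features = set()
    for part, text in split_url(url).items():
        for token in tokenize(text):
            features.add(f"{part}:{token.lower()}")
    return features


if __name__ == "__main__":
    example = "http://www.example.com/a/b/login.php?user=1&id=2"
    print(split_url(example))
    print(sorted(bag_of_words(example)))
```

Prefixing each token with its part keeps, for example, `login` in the domain name distinct from `login` in the file name.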

- (I) Obfuscating the host with an IP address
- (II) Obfuscating the host with another domain
- (III) Obfuscating with large host names
- (IV) Domain unknown or misspelled

- Features related to the full URL
  - Length of the URL (Type II)
  - Number of dots in the URL (Type II)
  - Blacklisted words (Type IV)
    - confirm, account, banking, secure, ebayisapi, webscr, login and signin
    - Paypal, free, lucky and bonus
- Features related to the domain name
  - Length of the domain name (Type III)
  - IP or port number is used in the domain name (Type I)
  - Number of tokens of the domain name (Type III)
  - Number of hyphens used in the domain name (Type III)
  - The length of the longest token (Type III)
- Features related to the directory
  - Length of the directory (Type II)
  - Number of sub-directory tokens (Type II)
  - Length of the longest sub-directory token (Type II)
  - Maximum number of dots and other delimiters used in a sub-directory token (Type II)
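The full-URL and domain-name features listed above could be computed roughly as follows. This is a hedged sketch: the function and key names are hypothetical, domain tokens are assumed to be separated by dots only, and blacklisted words are assumed to be matched as case-insensitive substrings, none of which the slide specifies. Directory features can be derived analogously from the path.

```python
import ipaddress
from urllib.parse import urlsplit

# Blacklisted words taken from the slide (lowercased).
BLACKLISTED_WORDS = (
    "confirm", "account", "banking", "secure", "ebayisapi", "webscr",
    "login", "signin", "paypal", "free", "lucky", "bonus",
)


def host_is_ip(host):
    """True if the host is a literal IPv4/IPv6 address.

    Hex- or decimal-encoded addresses are not detected by this sketch.
    """
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def has_explicit_port(parts):
    """True if the URL carries a well-formed explicit port."""
    try:
        return parts.port is not None
    except ValueError:  # malformed port such as ":abc"
        return False


def url_and_domain_features(url):
    parts = urlsplit(url)
    host = parts.hostname or ""  # empty when the URL has no scheme
    # Assumption: domain tokens are the dot-separated labels.
    tokens = [token for token in host.split(".") if token]
    lowered = url.lower()
    return {
        "url_length": len(url),
        "url_dots": url.count("."),
        "blacklisted_words": sum(word in lowered for word in BLACKLISTED_WORDS),
        "domain_length": len(host),
        "ip_or_port": host_is_ip(host) or has_explicit_port(parts),
        "domain_tokens": len(tokens),
        "domain_hyphens": host.count("-"),
        "longest_domain_token": max((len(t) for t in tokens), default=0),
    }


if __name__ == "__main__":
    print(url_and_domain_features("http://paypal.com.secure-login.example.net/"))
```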

- Features related to the file name
  - Length of the file name (Type II)
  - Number of dots and other delimiters used in the file name (Type II)
- Features related to the argument part
  - Length of the argument part
  - Number of variables
  - Length of the longest variable value
  - The maximum number of delimiters used in a value
- Summary of dataset
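The file-name and argument features above might be extracted along these lines. This is an illustrative sketch only: the function and key names are hypothetical, "variables" are assumed to be the `name=value` pairs of the query string, and value lengths are measured after percent-decoding because `parse_qsl` decodes values.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# The delimiter set listed earlier in the deck.
_DELIMS = re.compile(r"[/?.=_&\-]")


def file_and_argument_features(url):
    parts = urlsplit(url)
    # Assumption: the file name is the last path segment.
    file_name = parts.path.rpartition("/")[2]
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    values = [value for _, value in pairs]
    return {
        "file_length": len(file_name),
        "file_delimiters": len(_DELIMS.findall(file_name)),
        "argument_length": len(parts.query),
        "num_variables": len(pairs),
        "longest_value": max((len(v) for v in values), default=0),
        "max_value_delimiters": max(
            (len(_DELIMS.findall(v)) for v in values), default=0
        ),
    }


if __name__ == "__main__":
    example = "http://example.com/dir/update.account.php?user=alice&token=a.b-c_d"
    print(file_and_argument_features(example))
```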

- Batch Learning
  - Support Vector Machine (SVM)
- Online Learning
  - Online Perceptron (OP)
  - Confidence Weighted (CW)
  - Adaptive Regularization of Weights (AROW)
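To illustrate the online setting, the sketch below implements a diagonal-covariance variant of AROW over sparse feature dictionaries. The update follows the standard AROW rule (update only when the margin is below 1, shrinking per-feature variance as evidence accumulates); the diagonal approximation, the class name `DiagonalAROW`, the default `r=1.0`, and the toy features are assumptions for illustration and are not taken from the presentation.

```python
class DiagonalAROW:
    """AROW with a diagonal covariance (illustrative sketch)."""

    def __init__(self, r=1.0):
        self.r = r        # regularization parameter (assumed default)
        self.mu = {}      # mean weight per feature, implicitly 0.0
        self.sigma = {}   # variance per feature, implicitly 1.0

    def score(self, x):
        return sum(self.mu.get(k, 0.0) * v for k, v in x.items())

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        """Process one example; x is {feature: value}, y is +1 or -1."""
        margin = y * self.score(x)
        if margin >= 1.0:
            return  # confident and correct: no update
        confidence = sum(self.sigma.get(k, 1.0) * v * v for k, v in x.items())
        beta = 1.0 / (confidence + self.r)
        alpha = (1.0 - margin) * beta
        for k, v in x.items():
            s = self.sigma.get(k, 1.0)
            self.mu[k] = self.mu.get(k, 0.0) + alpha * y * s * v
            # Diagonal approximation of Sigma - beta * Sigma x x^T Sigma.
            self.sigma[k] = s - beta * s * s * v * v


if __name__ == "__main__":
    model = DiagonalAROW()
    # Toy stream: +1 = phishing, -1 = legitimate (made-up features).
    stream = [
        ({"file:login": 1.0, "domain:paypal": 1.0}, 1),
        ({"domain:news": 1.0, "directory:sports": 1.0}, -1),
    ]
    for features, label in stream:
        model.update(features, label)
    print(model.predict({"file:login": 1.0}))
```

Each URL is processed once and then discarded, which is what distinguishes the online learners from the batch SVM in the list above.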

- Batch-based vs. online algorithms
  - SVM vs. AROW
  - Yahoo-Phish

- Lexical features vs. full features
  - OP, CW and AROW
  - Yahoo-Phish

- Obfuscation-resistant (OR) lexical features
  - Performance of AROW with/without OR features after the last URL

- The resilience of AROW to noisy data
  - AROW and CW
  - Yahoo-Phish
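A noise experiment of this kind can be set up by corrupting a fraction of the training labels before feeding them to the learner. The helper below is a hypothetical sketch: it assumes that "noise" means training labels flipped independently with a fixed probability, which the slides do not state explicitly, and the name `flip_labels` and the seed are illustrative.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Flip each +1/-1 label independently with probability noise_rate."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return [-y if rng.random() < noise_rate else y for y in labels]


if __name__ == "__main__":
    clean = [1, -1] * 10
    for rate in (0.05, 0.25, 0.45):
        noisy = flip_labels(clean, rate)
        flipped = sum(a != b for a, b in zip(clean, noisy))
        print(f"noise rate {rate:.2f}: {flipped} of {len(clean)} labels flipped")
```

Only the training labels would be corrupted; accuracy would still be measured against the clean labels.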

- Minimum/maximum URL similarity distance distribution


- Proposed PhishDef, a proactive defense scheme against phishing attacks
- PhishDef detects phishing URLs on the fly
- PhishDef uses only lexical features
  - High accuracy (97%)
  - Low overhead
  - Resilient to noisy training data
- Implemented as Firefox and Chrome add-ons

- Q&A
