PhishDef: URL Names Say It All Anh Le, Athina Markopoulou University of California, Irvine USA Michalis Faloutsos University of California, Riverside USA.

1 PhishDef: URL Names Say It All
Anh Le, Athina Markopoulou (University of California, Irvine, USA)
Michalis Faloutsos (University of California, Riverside, USA)

2 What is Phishing?
Social engineering and technical means used to steal consumers’ personal identity, data, etc.
Causes billions of dollars in losses annually.

3 Antiphishing.org

4 Example of a Phishing Site

5 Current Protection: Google Safe Browsing, Microsoft SmartScreen, Third-Party

6 Current Protection Model: Google Safe Browsing
Motivation: Blacklist-based protection is reactive and cannot protect against zero-day phishing.

7 Outline
o Phishing Background
o Motivation
o Our proposal
o New Protection Model
o Learning Algorithms
o Dataset
o Feature Selection
o Evaluation Results
o Concluding Remarks

8 Our Proposed Protection Model
Main challenges: accuracy and classification latency
Which classification algorithm works best?
Which set of features works best?

9 Prior Work (Server-Side Classification)
o Whittaker et al. [NDSS ’10]
  o Google Safe Browsing
o Ma et al. [SIGKDD ’09]
  o Batch-based Classification
o Ma et al. [ICML ’09]
  o Batch-based vs. Online Learning

10 Main Contributions
o New Protection Model:
  o Client-side classification
o Propose using Adaptive Regularization of Weights (AROW)
  o High accuracy
  o Resilient to noise
o Set of Lexical Features
  o Fast to extract at client side
  o Obfuscation resistant

11 Machine Learning Algorithms
Batch-based: Support Vector Machine
Online: Perceptron; Confidence-Weighted (CW) [Dredze et al., ICML 2008]; Adaptive Regularization of Weights (AROW) [Crammer et al., NIPS 2009]

12 Online Classification
Online Perceptron: maintain a weight vector and use it for classification.
[Figure labels: Client Side; Server Side; Trained Beforehand; Extract In Real Time]
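The idea of maintaining a weight vector and using it for classification can be illustrated with a minimal online Perceptron sketch. This is not the authors' implementation: the sparse dictionary representation, the feature names, and the convention that +1 means phishing and -1 means benign are all assumptions made for illustration.

```python
def predict(weights, features):
    """Return +1 (assumed: phishing) or -1 (assumed: benign) from the sign of w.x."""
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1 if score >= 0 else -1


def perceptron_update(weights, features, label):
    """Update weights in place on a mistake; return True if an update happened."""
    if predict(weights, features) == label:
        return False
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + label * value
    return True


# Toy stream of (features, label) pairs; the feature names are hypothetical.
stream = [
    ({"token:login": 1.0, "token:secure": 1.0}, 1),
    ({"token:wiki": 1.0}, -1),
]
weights = {}
for features, label in stream * 3:  # a few passes over the toy data
    perceptron_update(weights, features, label)

print(weights)
print([predict(weights, f) for f, _ in stream])
```

The weight vector changes only when a prediction is wrong, so each example costs one sparse dot product plus, at most, one sparse addition.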

13 Online Classification
Confidence-Weighted (CW): minimum change, enough to correct the last mistake
Adaptive Regularization of Weights (AROW): minimum change, penalty for mistake, increasing confidence
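A rough sketch of one AROW step may help make the three ingredients concrete. It follows the closed-form update published by Crammer et al., but uses a diagonal covariance approximation for brevity. The function name, the dictionary representation, the default variance of 1.0, and the regularization value `r=1.0` are assumptions for illustration, not details taken from the slides.

```python
def arow_update(mean, var, features, label, r=1.0):
    """One AROW step with a diagonal covariance (a simplifying assumption).

    mean: per-feature weight means; var: per-feature variances (default 1.0).
    label is +1 or -1. Returns True if the model was updated.
    """
    margin = sum(mean.get(k, 0.0) * v for k, v in features.items())
    # Uncertainty about this example under the current model (x^T Sigma x).
    confidence = sum(var.get(k, 1.0) * v * v for k, v in features.items())
    loss = max(0.0, 1.0 - label * margin)  # hinge loss: penalty for mistake
    if loss == 0.0:
        return False
    beta = 1.0 / (confidence + r)
    alpha = loss * beta
    for k, v in features.items():
        sigma = var.get(k, 1.0)
        # Move the mean more along features the model is less sure about.
        mean[k] = mean.get(k, 0.0) + alpha * label * sigma * v
        # Shrink the variance: confidence grows for features that were seen.
        var[k] = sigma - beta * sigma * sigma * v * v
    return True


mean, var = {}, {}
example = {"token:login": 1.0}  # hypothetical feature
arow_update(mean, var, example, 1)
print(mean, var)
```

Because the step size is scaled by `1 / (confidence + r)` rather than forced to correct the example fully, a single mislabeled example moves the weights less than it would under a hard constraint, which is consistent with the noise resilience claimed for AROW.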

14 Dataset
o Phishing URLs
  o PhishTank (4,082)
  o MalwarePatrol (2,001)
o Benign URLs
  o Open directory (4,012)
  o Yahoo directory (4,143)
o Time period: June 2010

15 Feature Selection
o Lexical Features
o External Features
  o Country, AS number, registration date, registrant, registrar, etc.
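Lexical features are computed from the URL string alone, with no remote lookups, unlike the external features listed above. The sketch below shows what such an extractor might look like. The specific features (lengths, dot count, IP-address host, host and path tokens) and their names are illustrative assumptions; the transcript does not list the authors' actual feature set.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url):
    """Return a sparse feature dict computed from the URL string only."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.lower()
    features = {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "num_dots_in_host": float(host.count(".")),
        # 1.0 if the host looks like a dotted-quad IPv4 address.
        "host_is_ip": 1.0 if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) else 0.0,
    }
    # Bag-of-words tokens, kept separate for the host and the path.
    for token in re.split(r"[^a-z0-9]+", host):
        if token:
            features["host_token:" + token] = 1.0
    for token in re.split(r"[^a-z0-9]+", path):
        if token:
            features["path_token:" + token] = 1.0
    return features


sample = lexical_features("http://paypal.example.com/login/secure.php")
for name in sorted(sample):
    print(name, sample[name])
```

Everything here is string processing on the client, which is why this kind of feature is cheap to extract at classification time.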

16 Outline
o Phishing Background
o Motivation
o Our proposal
o New Protection Model
o Learning Algorithms
o Dataset
o Feature Selection
o Evaluation Results
o Concluding Remarks

17 Evaluation Results: Lexical vs. Full Features
Lexical features alone are better suited than full features for client-side phishing classification.
Full features: (+) ~1%; (-) dependency on remote server; (-) avg. latency: 1.64 s

18 Evaluation Results: CW vs. AROW
AROW is more resilient to noise than CW.

19 Conclusion: PhishDef
o Client-side phishing classification system
o Proactive, on-the-fly classification of zero-day phishing URLs
o Low client-side delay (ms), high accuracy (97%)
o Resilient to noisy data
o Future Work:
  o Develop an add-on for Firefox

20 Questions

22 Example of a Phishing Site

23 Evaluation Results: Batch-Based vs. Online Learning
Online learning outperforms batch-based learning for phishing classification.

24 Chrome 11 > Firefox 4

