Visual-Similarity-Based Phishing Detection Eric Medvet, Engin Kirda, Christopher Kruegel SecureComm 2008 Sep.
OUTLINE Introduction Our Approach Experimental Evaluation Conclusion And something else
Introduction – Phishing
Introduction – Related Work level solution ◦ Filters and content-analysis Browser-integrated solution ◦ SpoofGuard ◦ PwdHash ◦ AntiPhish: keeps track of sensitive information ◦ DOMAntiPhish: compares the DOMs of the pages
Introduction – Related Work(cont.) But the most popular and widely deployed solutions are based on blacklists. ◦ IE 7 browser ◦ Google Safe Browsing ◦ NetCraft toolbar ◦ eBay toolbar ◦ etc.
Introduction – Why Phishing Works "Why Phishing Works", Proc. CHI (2006) ◦ SMTP does not contain any authentication mechanisms. ◦ About two million users gave information to spoofed websites, resulting in direct losses of $1.2 billion (2003) ◦ 23% of users base their trust only on page content
Introduction – Why Phishing Works(cont.) APWG detected more than 25,000 unique phishing URLs in Dec “Do-it-yourself” phishing kits are available for download free of charge from the internet. More sophisticated phishing attacks. ◦ Application-level vulnerability
Our Approach Based on a browser plugin ◦ AntiPhish ◦ DOMAntiPhish Comparing the visual similarity
Our Approach – Signature Extraction Three features ◦ Text pieces: content, color, size, font family, position ◦ Images embedded in the page: src value, area, color, Haar compression, position ◦ Overall visual appearance of the page: color and Haar compression Page signature: S(w) =
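The three feature groups listed on this slide could be held in a structure like the one below. This is only a minimal sketch: the class names, field names, and field types are hypothetical and are not taken from the talk, and the Haar-compressed data is represented as a plain list of coefficients for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Color = Tuple[int, int, int]      # assumed RGB triple
Position = Tuple[int, int]        # assumed (x, y) in pixels


@dataclass
class TextPiece:
    content: str
    color: Color
    size: float
    font_family: str
    position: Position


@dataclass
class EmbeddedImage:
    src: str
    area: int
    color: Color
    haar: List[float]             # placeholder for the Haar-compressed image
    position: Position


@dataclass
class OverallAppearance:
    color: Color
    haar: List[float]             # placeholder for the Haar-compressed page


@dataclass
class PageSignature:
    """Hypothetical container for the three feature groups of a page."""
    texts: List[TextPiece] = field(default_factory=list)
    images: List[EmbeddedImage] = field(default_factory=list)
    overall: Optional[OverallAppearance] = None


signature = PageSignature(
    texts=[TextPiece("Home banking", (0, 0, 0), 12.0, "Arial", (10, 20))]
)
```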
Our Approach – Signature comparison Similarity between textual contents: $d_l(T, \hat{T})$: Levenshtein distance Similarity between colors: $L_1(C, \hat{C})$: 1-norm distance
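The two distances named on this slide can be sketched as follows. This is a textbook implementation, not the authors' code; treating a color as a tuple of numeric channels is an assumption made for the example.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a               # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def l1_distance(c: Sequence[float], c_hat: Sequence[float]) -> float:
    """1-norm distance: sum of absolute per-channel differences."""
    if len(c) != len(c_hat):
        raise ValueError("colors must have the same number of channels")
    return sum(abs(x - y) for x, y in zip(c, c_hat))


print(levenshtein("Home banking", "Your banking"))   # 3
print(l1_distance((255, 0, 0), (250, 10, 0)))        # 15
```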
Our Approach – Signature comparison Home banking Welcome! Copyright 2007 $t_1$ = $t_2$ = $t_3$ =
Our Approach – Signature comparison Your banking Welcome! $T_1$ = $T_2$ =
Our Approach – Signature comparison $t_1$ = $T_1$ =
Our Approach – Signature Similarity score $s_t$: average of the largest $n$ elements of $S_t$ Final similarity score: ◦ $s = a_t s_t + a_i s_i + a_o s_o$ ◦ Threshold $d$ Two pages are similar if and only if $s \geq d$
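The scoring step on this slide can be sketched as below. The sketch assumes that a similarity matrix such as $S_t$ is available as a list of rows of numbers and that its largest elements are taken over the whole matrix. The function names are hypothetical. The slide gives no value for $n$.

```python
from typing import Sequence


def mean_of_largest(matrix: Sequence[Sequence[float]], n: int) -> float:
    """Average the n largest elements of a similarity matrix."""
    values = sorted((v for row in matrix for v in row), reverse=True)
    top = values[:n]
    if not top:
        raise ValueError("matrix is empty or n is not positive")
    return sum(top) / len(top)


def final_score(s_t: float, s_i: float, s_o: float,
                a_t: float, a_i: float, a_o: float) -> float:
    """Weighted sum s = a_t*s_t + a_i*s_i + a_o*s_o."""
    return a_t * s_t + a_i * s_i + a_o * s_o


def is_similar(s: float, d: float) -> bool:
    """Two pages are similar if and only if s >= d."""
    return s >= d


# Made-up matrix, partial scores, weights, and threshold for illustration only.
s_t = mean_of_largest([[0.9, 0.1], [0.2, 0.8]], n=2)
s = final_score(s_t, 0.5, 0.5, a_t=1.0, a_i=1.0, a_o=1.0)
print(is_similar(s, d=1.5))
```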
Experimental Evaluation Web page dissimilarity level ◦ Level 0: almost perfect visual match ◦ Level 1: some different elements ◦ Level 2: noticeable differences Dataset ◦ 41 positive pairs (from PhishTank) ◦ 161 negative pairs (common web pages)
Experimental Evaluation(cont.) Training set ◦ 14 positive pairs and 21 negative pairs ◦ Error for pair $k$: $e_k = 0$ if the pair is a true positive or true negative, $e_k = |s - d|$ otherwise ◦ $s = a_t s_t + a_i s_i + a_o s_o$ with $a_t = 2.11$, $a_i = 0.11$, $a_o = 1.20$ Threshold $d = 0.956$
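The per-pair error on this slide can be written out as a small function. This is a sketch under assumptions: a pair is taken to be classified as similar when $s \geq d$, and summing the errors over the training set in `total_error` is an illustrative choice, since the slide does not say how the individual errors are combined or minimized.

```python
from typing import Iterable, Tuple


def pair_error(s: float, d: float, is_positive: bool) -> float:
    """0 for a correctly classified pair, |s - d| for a misclassified one."""
    predicted_similar = s >= d
    if predicted_similar == is_positive:
        return 0.0
    return abs(s - d)


def total_error(pairs: Iterable[Tuple[float, bool]], d: float) -> float:
    """Assumed aggregate: sum of per-pair errors over (score, label) pairs."""
    return sum(pair_error(s, d, label) for s, label in pairs)


d = 0.956                                   # threshold reported on the slide
training = [(1.20, True), (0.90, True), (0.30, False), (1.00, False)]  # made-up scores
print(total_error(training, d))
```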
Experimental Evaluation(cont.)
Figure 2: One of the two missed positive pairs
Experimental Evaluation(cont.) Environment ◦ Dual AMD Opteron 64, 8GB RAM, Linux OS Computation Time ◦ 3.8 sec for positive pairs ◦ A few milliseconds for negative pairs after optimization
Conclusion A comparison technique that eliminates the shortcomings of AntiPhish and DOMAntiPhish Can also be integrated into any other anti-phishing system that can provide a list of legitimate sites
And something else "Visual similarity-based phishing detection without victim site information", CICS '09, IEEE ◦ Visual similarity between phished sites ◦ Virtual screen of X window to display a web browser ◦ Use ImgSeek to find similar images