Phishing Webpage Detection Jau-Yuan Chen COMS E6125 WHIM March 24, 2009.



2 What is “Phishing”?
Source: "Phishing Activity Trends Report," APWG (Anti-Phishing Working Group), December 2008
APWG definition:
– Phishing is a criminal mechanism employing both social engineering and technical subterfuge to steal consumers’ personal identity data and financial account credentials.
– Social-engineering schemes use spoofed e-mails purporting to be from legitimate businesses and agencies to lead consumers to counterfeit websites designed to trick recipients into divulging financial data such as usernames and passwords.
– Technical-subterfuge schemes plant crimeware onto PCs to steal credentials directly, often using systems to intercept consumers’ online account usernames and passwords, and to corrupt local navigational infrastructures to misdirect consumers to counterfeit websites (or to authentic websites through phisher-controlled proxies used to monitor and intercept consumers’ keystrokes).

3 Severity of the “Phishing” Problem
– The number of crimeware-spreading sites infecting PCs with password-stealing crimeware reached an all-time high of 31,173 in December 2008.
– Unique phishing reports submitted to APWG reached a yearly high of 34,758 in December 2008.
– In 2007, according to a survey by Gartner, Inc.:
  – more than $3.2 billion was lost to phishing attacks in the US;
  – 3.6 million adults lost money in phishing attacks.

4 WHY PHISHING PAGE DETECTION?

5 eBay? It’s difficult to distinguish these pages!

6 Most Targeted Industry

7 Current Anti-phishing Solutions
– Text-based page analysis:
  – URL analysis
  – HTML parsing
  – keyword extraction
– However, phishers can easily evade detection by using non-HTML components, such as:
  – images
  – Flash
  – ActiveX, etc.
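To make the weakness concrete, here is a minimal sketch of keyword-extraction style text analysis using only the Python standard library. The keyword list `BRAND_KEYWORDS` and the helper names are hypothetical illustrations, not taken from any deployed anti-phishing tool. A page that renders its brand as HTML text is flagged, while a page that shows the same brand only inside an image produces no keyword hits at all.

```python
from html.parser import HTMLParser

# Hypothetical brand/credential keywords, for illustration only.
BRAND_KEYWORDS = {"ebay", "paypal", "sign in", "password"}


class VisibleTextExtractor(HTMLParser):
    """Collects lower-cased text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip().lower())


def keyword_hits(html: str) -> set:
    """Return the keywords that occur in the page's text nodes."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = " ".join(parser.chunks)
    return {kw for kw in BRAND_KEYWORDS if kw in text}


text_page = "<html><body><h1>PayPal</h1><p>Sign in to your account</p></body></html>"
# Same intent, but the brand and login prompt live inside login.png.
image_page = '<html><body><img src="login.png"><form><input type="password"></form></body></html>'

print(sorted(keyword_hits(text_page)))   # ['paypal', 'sign in']
print(sorted(keyword_hits(image_page)))  # []
```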

8 Image-based Anti-phishing Scheme
– Focuses on "what you see", not "how the page is composed"!
– J.-Y. Chen and K.-T. Chen, “A Robust Local Feature-based Scheme for Phishing Page Detection and Discrimination,” Web 2.0 Trust 2008.
– K.-T. Chen, J.-Y. Chen, C.-R. Huang, and C.-S. Chen, “Fighting Phishing with Discriminative Keypoint Features of Webpages,” IEEE Internet Computing, to appear.

9 Image-based Page Matching
(Steps: Image-based Page Matching → Page Scoring → Page Classification)
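The slides do not spell out the matching procedure, so the following is only a generic sketch. It assumes each page has already been reduced to a list of local keypoint descriptors (fixed-length float vectors). Each descriptor of the query page is paired with its nearest target descriptor by Euclidean distance and kept only if that distance is within a hypothetical threshold `max_dist`. This is not claimed to be the authors' exact matching rule.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def match_keypoints(
    query: List[Descriptor], target: List[Descriptor], max_dist: float
) -> List[Tuple[int, int]]:
    """Return (query_index, target_index) pairs for accepted nearest-neighbor matches.

    max_dist is a hypothetical acceptance threshold, not a value from the slides.
    """
    matches: List[Tuple[int, int]] = []
    if not target:
        return matches
    for i, q in enumerate(query):
        # Euclidean distance from this query descriptor to every target descriptor.
        dists = [math.dist(q, t) for t in target]
        j = min(range(len(target)), key=dists.__getitem__)
        # Keep the nearest neighbor only if it is close enough.
        if dists[j] <= max_dist:
            matches.append((i, j))
    return matches


query = [[0.0, 0.0], [5.0, 5.0], [9.0, 0.0]]
target = [[0.1, 0.0], [5.0, 5.2]]
print(match_keypoints(query, target, max_dist=0.5))  # [(0, 0), (1, 1)]
```

A real descriptor set would be much larger, so a spatial index (for example a k-d tree) would normally replace the linear scan over `target`.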

10 Page Scoring
[Figure labels: “effective grids”, “a successful match”]
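The slide keeps only two figure labels, "effective grids" and "a successful match", so the scoring rule below is a hypothetical reading of them, not the authors' published formula. It assumes the page is cut into a `rows × cols` grid (4 × 4 here, an arbitrary choice), treats a cell as effective when it contains at least one target-page keypoint, and scores the page as the fraction of effective cells that also contain a successfully matched keypoint.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def cell_of(p: Point, cell_w: float, cell_h: float) -> Cell:
    """Map a pixel coordinate (x, y) to its (column, row) grid cell."""
    return (int(p[0] // cell_w), int(p[1] // cell_h))


def grid_score(
    target_keypoints: Iterable[Point],
    matched_keypoints: Iterable[Point],
    width: int,
    height: int,
    rows: int = 4,
    cols: int = 4,
) -> float:
    """Hypothetical score: share of effective cells that contain a successful match."""
    cell_w, cell_h = width / cols, height / rows
    # A cell is "effective" here if any target-page keypoint falls inside it.
    effective: Set[Cell] = {cell_of(p, cell_w, cell_h) for p in target_keypoints}
    # Cells where at least one matched keypoint landed, restricted to effective cells.
    hit = {cell_of(p, cell_w, cell_h) for p in matched_keypoints} & effective
    return len(hit) / len(effective) if effective else 0.0


keypoints = [(10, 10), (250, 20), (420, 300), (790, 590)]
matched = [(10, 10), (420, 300)]
print(grid_score(keypoints, matched, width=800, height=600))  # 0.5
```

Scoring by cells rather than by raw match counts keeps one heavily textured region (such as a logo) from dominating the score.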

11 Page Classification
– Naïve Bayesian classifier with 10-fold cross-validation
– Training data: a pre-stored phishing page set and a legitimate page set
  – phishing page set (positive data set): comparisons between phishing pages and their target pages
  – legitimate page set (negative data set): comparisons between legitimate pages of different sites
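A naïve Bayesian classifier with 10-fold cross-validation can be sketched with the standard library alone. The sketch assumes each comparison is summarized as a vector of continuous scores and models each feature as an independent Gaussian per class; the slide does not state which likelihood model was used, and the data below are synthetic, not the comparisons from the evaluation.

```python
import math
import random
from statistics import mean, pvariance
from typing import Dict, List, Sequence, Tuple

Model = Dict[int, Tuple[float, List[float], List[float]]]


def fit(X: Sequence[Sequence[float]], y: Sequence[int]) -> Model:
    """Estimate a class prior plus per-feature Gaussian mean/variance for each label."""
    model: Model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        means = [mean(c) for c in cols]
        variances = [pvariance(c) + 1e-9 for c in cols]  # small floor avoids zero variance
        model[label] = (len(rows) / len(X), means, variances)
    return model


def log_gaussian(v: float, mu: float, var: float) -> float:
    return -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)


def predict(model: Model, x: Sequence[float]) -> int:
    """Pick the label with the largest log prior plus summed per-feature log-likelihoods."""
    def log_posterior(label: int) -> float:
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            log_gaussian(v, m, s) for v, m, s in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


def cross_validate(X, y, k: int = 10, seed: int = 0) -> float:
    """Accuracy over k folds; each sample is held out exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)


# Synthetic one-feature data: high match scores for phishing/target pairs (label 1),
# low scores for pairs of unrelated legitimate pages (label 0).
rng = random.Random(1)
X = [[rng.gauss(0.8, 0.05)] for _ in range(50)] + [[rng.gauss(0.2, 0.05)] for _ in range(50)]
y = [1] * 50 + [0] * 50
print(cross_validate(X, y))
```

Folds here come from a seeded shuffle without stratification; with a class ratio as skewed as 2,058 positive to 44,000 negative comparisons, stratified folds would be the safer choice.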

12 PERFORMANCE EVALUATION

13 Data Description
– Phishing pages: 2,058 pages on 74 sites
  – sources: http://www.phishtank.com, http://www.antiphishing.org
  – records of the top 5 phishing target sites make up more than half of our records
– Potential target pages: 300 vulnerable pages
  – source: http://www.ciphertrust.com/resources/statistics/
– Pre-stored data set
  – positive: 2,058 comparisons
  – negative: 44,000 comparisons

Domain               Number of Records
eBay                 701
PayPal               632
Marshall & Ilsley    138
Charter One          116
Bank of America      51
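As a quick check of the claim that the top five targets account for more than half of the records, the table can be summed directly. This assumes the per-domain record counts are counts of the 2,058 phishing pages.

```python
# Record counts per target domain, copied from the table above.
top5 = {
    "eBay": 701,
    "PayPal": 632,
    "Marshall & Ilsley": 138,
    "Charter One": 116,
    "Bank of America": 51,
}
total_phishing_pages = 2058

top5_total = sum(top5.values())
share = top5_total / total_phishing_pages
print(top5_total, f"{share:.1%}")  # 1638 79.6%
```

Under that assumption the five domains hold 1,638 of the 2,058 pages, comfortably more than half.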

14 Earth Mover’s Distance (EMD) based Scheme
– Fu et al., IEEE Trans. on Dependable & Secure Computing, 2006
– the first image-based phishing detection approach
– uses the EMD to evaluate the distance between two signatures
  – Signature (S): the frequency and the centroid of each color used
  – Weights (p, q): the ground distance between two signature entries is a linear combination, weighted by p and q, of the Euclidean distance between their colors and the distance between their centroids
  – Visual similarity degree (VSD): $\mathrm{VSD} = 1 - (\mathrm{EMD})^{\alpha}$
– Pros: simple and fast
– Cons: only suitable for basic phishing cases
  – it tends to fail if phishing pages and the official ones are only partially similar
  – however, phishing pages are usually partially different from their targets!
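The EMD scheme's ground distance and similarity score can be sketched as follows. The code assumes each signature entry is (color, centroid, weight) with RGB colors in 0..255 and centroids normalized to the unit square, and it normalizes both distances to [0, 1] by their largest possible values; that normalization is an assumption, not necessarily Fu et al.'s exact choice. Computing the EMD itself requires solving a transportation problem, which this sketch takes as a given input.

```python
import math
from typing import Tuple

# Hypothetical signature entry: ((r, g, b) in 0..255, (x, y) centroid in [0, 1]^2, weight)
Feature = Tuple[Tuple[float, float, float], Tuple[float, float], float]

MAX_COLOR_DIST = math.sqrt(3) * 255  # assumed normalizer: largest possible RGB distance
MAX_POS_DIST = math.sqrt(2)          # diagonal of the unit square


def ground_distance(f1: Feature, f2: Feature, p: float = 0.5, q: float = 0.5) -> float:
    """Linear combination of normalized color distance (weight p) and centroid distance (weight q)."""
    color_d = math.dist(f1[0], f2[0]) / MAX_COLOR_DIST
    pos_d = math.dist(f1[1], f2[1]) / MAX_POS_DIST
    return p * color_d + q * pos_d


def visual_similarity(emd: float, alpha: float = 0.5) -> float:
    """VSD = 1 - EMD**alpha; emd is assumed to be already normalized to [0, 1]."""
    return 1.0 - emd ** alpha


black_top_left = ((0, 0, 0), (0.0, 0.0), 0.5)
white_bottom_right = ((255, 255, 255), (1.0, 1.0), 0.5)
print(ground_distance(black_top_left, white_bottom_right))  # approximately 1.0
print(visual_similarity(0.04))                               # approximately 0.8
```

With α = 0.5 the EMD is square-rooted, so small distances are amplified: an EMD of 0.04 gives a VSD of 0.8 rather than 0.96.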

15 Parameter Settings
CCH (contrast context histogram) settings
– levels to describe salient points (L) = 4
– Euclidean distance between two salient points (Dist) = 7 pixels
– input image size: original webpage resolution (mostly 800 × 600)
– k-means parameter (k) = 4
– naïve Bayesian classifier
EMD settings
– we follow the suggestions in Fu et al.'s previous work
– input image size: 100 × 100 (Lanczos3 resampling algorithm)
– color degrading factor (CDF): 32
– amplifier for the EMD value (α): 0.5
– number of colors used for the signature (|S_s|): 20
– weight for the color distance (p): 0.5
– weight for the color centroid distance (q): 0.5
– a naïve Bayesian classifier is used instead of a per-page threshold
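The EMD-side parameters can be illustrated with a signature-extraction sketch. It assumes the page has already been resampled to 100 × 100 (the Lanczos3 step is not shown), that the color degrading factor maps each channel value v to v // CDF, and that an entry's weight is its share of all pixels; these are assumptions about details the slide leaves open. The degraded colors are therefore bin indices (0..7 for CDF = 32), not 0..255 values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
Entry = Tuple[RGB, Tuple[float, float], float]


def color_signature(pixels: List[List[RGB]], cdf: int = 32, max_colors: int = 20) -> List[Entry]:
    """Return up to max_colors (degraded color, centroid, pixel share) entries, most frequent first."""
    height, width = len(pixels), len(pixels[0])
    counts: Dict[RGB, int] = defaultdict(int)
    sum_x: Dict[RGB, float] = defaultdict(float)
    sum_y: Dict[RGB, float] = defaultdict(float)
    for y, row in enumerate(pixels):
        for x, (r, g, b) in enumerate(row):
            # Assumed color degrading: integer-divide each channel by the CDF.
            key = (r // cdf, g // cdf, b // cdf)
            counts[key] += 1
            # Accumulate coordinates normalized to [0, 1] for the centroid.
            sum_x[key] += x / max(width - 1, 1)
            sum_y[key] += y / max(height - 1, 1)
    total = width * height
    # Keep the most frequent degraded colors (ties keep first-seen order).
    top = sorted(counts, key=counts.get, reverse=True)[:max_colors]
    return [(c, (sum_x[c] / counts[c], sum_y[c] / counts[c]), counts[c] / total) for c in top]


# Tiny 2 x 2 "page": a white top row, near-black bottom row.
image = [[(255, 255, 255), (255, 255, 255)], [(0, 0, 0), (10, 10, 10)]]
print(color_signature(image))
# [((7, 7, 7), (0.5, 0.0), 0.5), ((0, 0, 0), (0.5, 1.0), 0.5)]
```

Degrading first merges near-identical shades, here (0, 0, 0) and (10, 10, 10), so the 20-color budget is spent on visually distinct colors.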

16 Top 5 Phishing Target Sites – AUC
– CCH: 0.998
– EMD: 0.956
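AUC, the metric reported above, equals the probability that a randomly chosen phishing comparison receives a higher score than a randomly chosen legitimate one. A direct pairwise sketch, with ties counted as one half (the score lists below are made-up examples):

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


print(auc([0.9, 0.8, 0.4], [0.1, 0.5]))  # 5 of 6 pairs correct
print(auc([0.9, 0.8], [0.1, 0.2]))       # 1.0, perfect separation
```

The double loop is O(n·m); for 2,058 positive and 44,000 negative comparisons a rank-based (sort-then-count) formulation would be much faster while giving the same value.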

17 Impact of Image Size on Computation Time

18 Conclusions
– We proposed an image-based phishing detection technique based on local features.
– Our experimental results show:
  – a phishing recognition rate of over 96%, and
  – less than 0.30 seconds per phishing identification on average.
– Our experiments show that local features are more suitable than global information for phishing page detection.

19 THANK YOU!

