Reporter: Jing Chiu. Advisor: Yuh-Jye Lee. 2015/7/18, Data Mining & Machine Learning Lab.

1 Reporter: Jing Chiu
  Advisor: Yuh-Jye Lee
  Email: D9815013@mail.ntust.edu.tw
  2015/7/18, Data Mining & Machine Learning Lab

2 Paper Information
  - Authors
    - Ying Pan, School of Information Systems, Singapore Management University
    - Xuhua Ding, School of Information Systems, Singapore Management University
  - Source
    - Annual Computer Security Applications Conference 2006 (ACSAC’06)

3 Outline
  - Introduction
  - Related Work
  - Analysis of Phishing Pages
  - Mechanism
    - Architecture
    - Identity Extractor
    - Page Classifier
    - Feature Vector Generation
  - Experiments
    - Experiments of Identity Extractor
    - Experiments of Page Classifier
  - Conclusion

4 Introduction
  - A common factor among all phishing sites
    - They maliciously mislead users into believing that they are other, legitimate sites
    - A phishing site maliciously claims a false identity
  - Proposed method
    - Use web DOM objects to obtain the web identity
    - Use the web identity to capture phishing-site anomalies

5 Related Work
  - Existing anti-phishing schemes
    - Server-based schemes
      - Require server authentication to defend against phishing attacks
      - Blacklisting services
    - Browser-based schemes
      - Browsers regulate web pages’ visual behaviors to prevent cheating
      - Blacklist plug-ins in the browser
    - Proactive schemes
      - Detect phishing pages based on visual similarity
      - Detect phishing pages by phishing-related activity

6 Analysis of Phishing Pages
  - Web identity: a set of words that uniquely identifies the web site’s ownership in cyberspace
    - An abbreviation of the organization’s full name
    - A unique string appearing in its domain name
  - A phishing web site with its own identity A attempts to claim a false identity B
  - A list of characteristics of phishing pages
    - Based on a study of about 300 phishing sites from APWG’s repository
    - List I & List II

7 Mechanism
  - Architecture
  - Identity Extractor
  - Page Classifier
  - Feature Vector Generation

8 Architecture

9 Identity Extractor
  - Extract identity from DOM objects/properties
    - Title
    - Description
    - Copyright
    - ALT/title
    - Address
    - Body
    - Related DOM objects/properties
  - Extract identity by the following steps
    - Form an identity-relevant object set D
    - Initialize a word set W from D as identity candidates
    - Use Chi-square to separate identity words from ordinary words
    - Identity Extraction Algorithm (I, II) III

10 Page Classifier
  - Support Vector Machine
    - LibSVM
  - Feature Vector Generation
    - Given the identity set I
    - 10 features are extracted
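
A minimal sketch of how one page's ten feature values could be written in LibSVM's sparse text input format (`<label> <index>:<value> ...`). The function name `to_libsvm_line`, the feature ordering, the example values, and the choice of +1 as the positive label are assumptions for illustration; the slides do not specify them.

```python
def to_libsvm_line(label, features):
    """Format one sample as a line of LibSVM's sparse text format.

    label:    +1 or -1 (which class counts as positive is an assumption).
    features: sequence of numeric values, e.g. an assumed ordering
              F1, F2, F31, F32, F33, F4, F51, F52, F6, F7.
    """
    # LibSVM feature indices start at 1; zero-valued features may be omitted.
    parts = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([f"{label:+d}"] + parts)


# Hypothetical feature values for a single page.
print(to_libsvm_line(1, [1, 0, 0.5, 0, 0.25, 1, 0, 0, -1, 0]))
# prints: +1 1:1 3:0.5 5:0.25 6:1 9:-1
```

Lines in this form can be collected into a training file and passed to LibSVM's command-line tools.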

11 Feature Vector Generation
  - Feature 1: URL address
    - F1 = 1 if no identity appears in the URL address
    - F1 = 0 if the page uses only an IP address that cannot be resolved into a host name
    - F1 = -1 otherwise
  - Feature 2: DNS record
    - F2 = -1 if all identities are substrings of the DNS record R
    - F2 = 0 if no record is returned
    - F2 = 1 otherwise
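
The two rules above can be sketched as plain functions. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, the host name and DNS record are passed in as strings (no network lookup is done), matching is case-insensitive substring matching, and the unresolved-IP case is checked first.

```python
def feature_url(identities, hostname):
    """F1: URL address.

    hostname is the host name in the URL, or None when the URL uses a
    bare IP address that could not be resolved into a host name.
    """
    if hostname is None:
        return 0
    host = hostname.lower()
    if not any(word.lower() in host for word in identities):
        return 1   # no identity appears in the URL
    return -1


def feature_dns(identities, dns_record):
    """F2: DNS record. dns_record is the record R as text, or None."""
    if not dns_record:
        return 0   # no record returned
    record = dns_record.lower()
    if all(word.lower() in record for word in identities):
        return -1  # every identity is a substring of R
    return 1


# "examplebank" is a made-up identity used only for this example.
print(feature_url({"examplebank"}, "login.example.net"))   # prints: 1
print(feature_dns({"examplebank"}, "examplebank.com"))     # prints: -1
```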

12 Feature Vector Generation (cont.)
  - Feature 3.1-3.3: URL of anchor
    - F31: Nil anchor (points to nothing)
    - F32: ID anchor (points to another domain that contains the identity)
    - F33: Domain anchor (points to a foreign domain)
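
The slide names the three anchor categories but not how F31 to F33 are computed from them, so the sketch below only sorts anchors into categories and counts them. The helper names are hypothetical, host names are compared exactly (a fuller version would compare registered domains), and treating an ID anchor as also being a foreign-domain anchor is an assumption.

```python
from collections import Counter
from urllib.parse import urlparse


def classify_anchor(href, page_host, identities):
    """Return the categories ("nil", "id", "domain") one anchor falls into."""
    href = (href or "").strip()
    if href in ("", "#") or href.lower().startswith("javascript:"):
        return {"nil"}                      # points to nothing
    host = (urlparse(href).hostname or "").lower()
    if not host or host == page_host.lower():
        return set()                        # relative link or same host
    categories = {"domain"}                 # points to a foreign host
    if any(word.lower() in host for word in identities):
        categories.add("id")                # foreign host contains an identity
    return categories


def count_anchor_types(hrefs, page_host, identities):
    """Count how many anchors on a page fall into each category."""
    counts = Counter()
    for href in hrefs:
        counts.update(classify_anchor(href, page_host, identities))
    return counts


# Hypothetical page hosted on login.example.net claiming "examplebank".
hrefs = ["#", "http://www.examplebank.com/help", "/about.html"]
counts = count_anchor_types(hrefs, "login.example.net", {"examplebank"})
print(counts["nil"], counts["id"], counts["domain"])   # prints: 1 1 1
```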

13 Feature Vector Generation (cont.)
  - Feature 4: Server form handler
    - F4 = 1 if any void or foreign form handler exists
    - F4 = 0 if there is no form
    - F4 = -1 otherwise
  - Feature 5.1-5.2: Request URL
    - F51: ID request URL (points to another domain that contains the identity)
    - F52: Domain request URL (points to a foreign domain)
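
A minimal sketch of Feature 4, assuming the form `action` attributes have already been pulled out of the page. The function name is hypothetical, "void" is taken to mean a missing, empty, `#`, or `javascript:` action, and "foreign" is taken to mean an absolute URL whose host differs from the page's host; the slides do not pin these definitions down.

```python
from urllib.parse import urlparse


def feature_form_handler(form_actions, page_host):
    """F4: server form handler.

    form_actions: one entry per <form> on the page, holding its action
                  attribute (None when the attribute is missing).
    """
    if not form_actions:
        return 0                            # the page has no form
    for action in form_actions:
        action = (action or "").strip()
        if action in ("", "#") or action.lower().startswith("javascript:"):
            return 1                        # void handler
        host = urlparse(action).hostname
        if host and host.lower() != page_host.lower():
            return 1                        # handled by a foreign host
    return -1


print(feature_form_handler(["http://collector.example.org/post"], "login.example.net"))  # prints: 1
print(feature_form_handler(["/login"], "login.example.net"))                             # prints: -1
```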

14 Feature Vector Generation (cont.)
  - Feature 6: Domain in cookie
    - F6 = 1 if any foreign domain exists in a cookie
    - F6 = 0 if there is no domain in the cookies or there are no cookies
    - F6 = -1 otherwise
  - Feature 7: Certificate in SSL
    - F7 = 1 if one of the claimed identities does not appear in the certificate, or the URL specified in the certificate is different from L
    - F7 = 0 if SSL is not applied
    - F7 = -1 otherwise
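
Feature 6 can be sketched as follows, assuming the `Domain` attribute of each cookie has already been collected. The function name is hypothetical, and "foreign" is interpreted here with the usual cookie domain-matching rule (the page host equals the cookie domain or is a subdomain of it); the slides do not define the test precisely.

```python
def feature_cookie_domain(cookie_domains, page_host):
    """F6: domain in cookie.

    cookie_domains: the Domain attribute of each cookie set by the page
                    (None or "" when a cookie carries no domain).
    """
    domains = [d.lstrip(".").lower() for d in cookie_domains if d]
    if not domains:
        return 0                            # no cookies, or none with a domain
    host = page_host.lower()
    for domain in domains:
        if not (host == domain or host.endswith("." + domain)):
            return 1                        # cookie names a foreign domain
    return -1


print(feature_cookie_domain([".examplebank.com"], "login.example.net"))  # prints: 1
print(feature_cookie_domain([".example.net"], "login.example.net"))      # prints: -1
```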

15 Experiments
  - Dataset
    - 279 phishing pages vs. 100 official pages
    - The 279 attacks have only 49 different targets
  - Experiments of Identity Extractor
    - Results for three web pages
    - Success rate
  - Experiments of Page Classifier
    - Dataset
      - Training set size: 50 positive + 50 negative
      - Testing set size: 50 pages
      - Positive portions: 2%, 6%, 10%, 20%, 30%, 40%, 50%
    - Use FP rate and miss rate (FN rate) as measurements
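
The two measurements can be computed as below, using the standard definitions FP rate = FP / (FP + TN) and miss rate = FN / (FN + TP). The function name, the 0/1 label encoding, and the reading of "positive" as "phishing" are assumptions; the labels in the example are made up and are not results from the paper.

```python
def fp_and_miss_rate(y_true, y_pred):
    """Return (false-positive rate, miss rate).

    Labels: 1 = positive (assumed to mean phishing), 0 = negative.
    """
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    negatives = sum(1 for t, _ in pairs if t == 0)
    positives = sum(1 for t, _ in pairs if t == 1)
    fp_rate = fp / negatives if negatives else 0.0
    miss_rate = fn / positives if positives else 0.0
    return fp_rate, miss_rate


# Made-up labels: 2 positives, 4 negatives.
print(fp_and_miss_rate([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0]))
# prints: (0.25, 0.5)
```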

16 Exp. of Identity Extractor
  - Identity extraction results of three web pages
  - Success rate (λ) of the identity extractor
    - N is the total number
    - n is the correct number

17 Exp. of Page Classifier

18 Exp. of Page Classifier (cont.)

19 Conclusion
  - The benefits
    - Does not require online interaction with a third party
    - Does not require users to change their navigation behavior
  - Resistant to adaptive phishing attackers
    - Completely evading the scheme imposes a high cost on the attacker

20 Characteristics of Phishing Pages I
  - Disguised keyword/description
    - A phishing page uses the fake identity to pretend to be a normal site
  - Abnormal URL
    - The host name in the URL, or resolved from the IP, does not match the claimed identity
  - Abnormal DNS record
    - The DNS record usually contains identity information
  - Abnormal anchors
    - The domains of the anchors’ URLs differ from the page’s domain, and these domains contain the claimed identity
    - Anchors do not link to any page

21 Characteristics of Phishing Pages II
  - Abnormal server form handler
    - The form has no action, or the action is handled by a server in a different domain
  - Abnormal request URL
    - A phishing site usually has objects that reference the real site
  - Abnormal cookie
    - A phishing site’s cookies point either to its own domain (inconsistent with the claimed identity) or to the real site (inconsistent with its own domain)
  - Abnormal certificate in SSL
    - The Distinguished Names in the certificates are inconsistent with the claimed identities

22 Identity Extraction Algorithm
  - Input: web page P; output: identity set I
  - Construction of object set D
    - From the related DOM objects/properties
  - Construction of word set W
    - Tokenization by stop marks, removal of stop words, and stemming
    - Remove from D every object d that consists only of stop words
  - Calculation of the occurrences C_{w,d}
    - Supplement of body object
  - Calculation of term frequency
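
The word-set and occurrence steps can be sketched as below. This is a simplified illustration: the stop-word list is a tiny hypothetical one, stemming and the body-object supplement are left out, and term frequency is assumed to mean a word's share of all remaining words. The function names and the sample page are made up.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def tokenize(text):
    """Split on non-alphanumeric stop marks and drop stop words (no stemming)."""
    words = re.split(r"[^a-z0-9]+", text.lower())
    return [w for w in words if w and w not in STOP_WORDS]


def occurrences(objects):
    """Map each object d to a Counter of C_{w,d}.

    objects: dict of object name -> text. Objects left with no words
    (i.e. made up only of stop words) are removed.
    """
    counts = {}
    for name, text in objects.items():
        words = tokenize(text)
        if words:
            counts[name] = Counter(words)
    return counts


def term_frequency(counts):
    """Relative frequency of each word over all remaining objects."""
    totals = Counter()
    for counter in counts.values():
        totals.update(counter)
    n = sum(totals.values())
    return {word: k / n for word, k in totals.items()}


page = {
    "title": "Welcome to ExampleBank",
    "copyright": "of the",
    "description": "ExampleBank online banking",
}
counts = occurrences(page)
print(sorted(counts))                          # prints: ['description', 'title']
print(term_frequency(counts)["examplebank"])   # prints: 0.4
```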

23 Identity Extraction Algorithm (cont.)
  - Calculation of expected probability
  - Calculation of χ² value
  - Output an identity set with the largest χ² value
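
The formulas on this slide did not survive in the transcript, so the sketch below substitutes a generic Pearson χ² statistic and may differ from the paper's exact definition. For each word w it compares the observed counts C_{w,d} with the counts expected if w were spread across objects in proportion to object size, then returns the highest-scoring words. The function names, the parameter `k`, and the sample counts are hypothetical.

```python
def chi_square_scores(counts):
    """Pearson chi-square score per word.

    counts: dict of object name -> dict of word -> occurrences C_{w,d}.
    Expected count of word w in object d is n_w * (size of d / total size).
    """
    object_totals = {d: sum(c.values()) for d, c in counts.items()}
    grand_total = sum(object_totals.values())
    word_totals = {}
    for c in counts.values():
        for word, k in c.items():
            word_totals[word] = word_totals.get(word, 0) + k

    scores = {}
    for word, n_w in word_totals.items():
        chi2 = 0.0
        for d, c in counts.items():
            expected = n_w * object_totals[d] / grand_total
            if expected == 0:
                continue                    # skip empty objects
            observed = c.get(word, 0)
            chi2 += (observed - expected) ** 2 / expected
        scores[word] = chi2
    return scores


def top_identities(counts, k=1):
    """Return the k words with the largest chi-square score."""
    scores = chi_square_scores(counts)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up counts for two objects.
counts = {"title": {"examplebank": 2}, "body": {"welcome": 1, "login": 1}}
print(chi_square_scores(counts)["examplebank"])   # prints: 2.0
print(top_identities(counts))                     # prints: ['examplebank']
```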

24 Related DOM objects/properties

