N-Gram-based Dynamic Web Page Defacement Validation Woonyon Kim Aug. 23, 2004 NSRI, Korea.


1 N-Gram-based Dynamic Web Page Defacement Validation Woonyon Kim Aug. 23, 2004 NSRI, Korea

2 Contents
- Introduction
- Related Works
- N-Gram Frequency Index
- N-Gram-based Index Distance
- Experiments
- Conclusions

3 Introduction
Defacement of Web Sites
- CSI/FBI 2001
  - 38% of web sites were hacked.
  - 21% of hacked sites were not aware of their own defacement.
- Zone-h
  - The number of defaced web pages is increasing rapidly year by year (.kr domain: about 200% increase).
Current solutions
- Hash-based detection systems for minimizing damage
- Intrusion-tolerant systems for continuous service
Problems of current solutions
- Current solutions use a hash code as the validation metric. A hash code can't accommodate dynamically changing content.

4 Introduction
N-Gram-based Index Distance (NGID)
- A validation metric for dynamically changing web pages
- The sum of the absolute differences between the frequency probabilities of the N-Grams found in both indexes
- NGID represents the similarity of two web pages.
- NGID can be used to validate web pages with dynamic components as well as static ones.

5 Related Works
Hash-based validation system
- Detects web page defacement by comparing two hash codes
- A hash code is a useful metric for large, static web pages.
- A hash code does not work properly on dynamically changing web pages.
Intrusion-tolerant system
- A hash code is used to validate web pages.
- It has the same limitation on dynamic web pages.
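The limitation of hash comparison on dynamic pages can be illustrated with a minimal sketch. The page strings and the helper names `md5_digest` and `hash_valid` are hypothetical, chosen only for illustration; MD5 is used because it is the hash named in the experiments.

```python
import hashlib


def md5_digest(page: str) -> str:
    """Return the hex MD5 digest of a page's text (UTF-8 encoded)."""
    return hashlib.md5(page.encode("utf-8")).hexdigest()


def hash_valid(reference: str, current: str) -> bool:
    """Hash-based validation: the page is valid only if the digests match exactly."""
    return md5_digest(reference) == md5_digest(current)


# Hypothetical page: only the embedded timestamp differs between two fetches.
baseline = "<html><body><h1>News</h1><p>Updated 10:00</p></body></html>"
refetched = "<html><body><h1>News</h1><p>Updated 10:30</p></body></html>"

print(hash_valid(baseline, baseline))   # unchanged page passes
print(hash_valid(baseline, refetched))  # legitimate update is flagged as a change
```

Any change to the page, legitimate or not, produces a different digest, so a hash alone cannot separate a routine content update from a defacement.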

6 N-Gram Frequency Index (1)
N-Gram
- An N-character slice of a string
- For example, for "TEXT"
  - 2-Grams: TE, EX, XT
N-Gram Frequency Index
- An index file sorted from the most frequent N-Grams to the least frequent ones
- N-Grams below a particular rank are cut off, so minor changes are ignored. This property lets the N-Gram Frequency Index tolerate dynamic content.
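The "TEXT" example can be reproduced with a short sketch. The function name `ngrams` is an assumption made for illustration; the slides do not name any implementation.

```python
def ngrams(text: str, n: int) -> list[str]:
    """Return every N-character slice of `text`, in order of appearance."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(ngrams("TEXT", 2))  # ['TE', 'EX', 'XT']
```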

7 N-Gram Frequency Index (2)
How to generate
- Count the frequencies of all N-Grams in a web page.
- Sort the N-Grams from the most frequent to the least frequent.
- Cut off the N-Grams below a particular rank.
- Sum the frequencies of the remaining N-Grams.
- Compute the probability of each remaining N-Gram from its frequency.
- Save each N-Gram, its frequency, and its probability into an index file.
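The generation steps above could be sketched roughly as follows. This is an illustration under assumptions, not the author's implementation: the slides give neither the value of N nor the cut-off rank, so `n=2` and `top_k=100` are placeholders, the probability is assumed to be the frequency divided by the sum of the kept frequencies, and the index is returned as a list instead of being written to a file.

```python
from collections import Counter


def build_index(page: str, n: int = 2, top_k: int = 100) -> list[tuple[str, int, float]]:
    """Build an N-Gram frequency index as (n-gram, frequency, probability) rows."""
    # Count the frequency of every N-Gram in the page.
    counts = Counter(page[i:i + n] for i in range(len(page) - n + 1))
    # Sort from most to least frequent and cut off everything below rank top_k.
    kept = counts.most_common(top_k)
    # Sum the frequencies of the remaining N-Grams.
    total = sum(freq for _, freq in kept)
    # Probability of each remaining N-Gram (assumed: frequency / kept total).
    return [(gram, freq, freq / total) for gram, freq in kept]


for row in build_index("abababab", n=2, top_k=2):
    print(row)
```

Because the probabilities are normalized over the kept N-Grams only, rare N-Grams that fall below the cut-off do not affect the index at all.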

8 N-Gram-based Index Distance (NGID)
The sum of the absolute differences between the frequency probabilities of the same N-Grams found in both web pages.
A metric for detecting whether a web page has been defaced.
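Read literally, the definition can be written as the formula below. The notation is an assumption introduced here: $G_A$ and $G_B$ denote the sets of N-Grams kept in the indexes of pages $A$ and $B$, and $P_A(g)$, $P_B(g)$ denote the stored probabilities of N-Gram $g$. The slides do not say how an N-Gram that appears in only one index is treated, so the sum is restricted to the common N-Grams.

```latex
\mathrm{NGID}(A, B) \;=\; \sum_{g \,\in\, G_A \cap G_B} \bigl|\, P_A(g) - P_B(g) \,\bigr|
```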

9 N-Gram-based Index Distance
Evaluation is done by comparing the NGID to a validation threshold.
Evaluation
- Valid: NGID <= Validation Threshold
- Invalid: NGID > Validation Threshold
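A minimal sketch of the evaluation rule, assuming each index is available as a mapping from N-Gram to probability. The function names and the sample probabilities are hypothetical; the sum runs over N-Grams present in both indexes, which is a literal reading of the definition, and the default threshold of 0.1 is the value chosen in the experiments.

```python
def ngid(reference: dict[str, float], current: dict[str, float]) -> float:
    """Sum of absolute probability differences over N-Grams found in both indexes."""
    shared = reference.keys() & current.keys()
    return sum(abs(reference[g] - current[g]) for g in shared)


def is_valid(reference: dict[str, float], current: dict[str, float],
             threshold: float = 0.1) -> bool:
    """Valid when NGID <= threshold, invalid when NGID > threshold."""
    return ngid(reference, current) <= threshold


# Hypothetical indexes: a lightly updated page and a heavily altered one.
normal = {"ab": 0.50, "bc": 0.30, "cd": 0.20}
updated = {"ab": 0.48, "bc": 0.32, "cd": 0.20}
altered = {"ab": 0.10, "bc": 0.10, "zz": 0.80}

print(is_valid(normal, updated))  # small shift stays under the threshold
print(is_valid(normal, altered))  # large shift exceeds the threshold
```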

10 Experiments
Assumptions
- Select 100 web pages.
- Choose 0.1 as the Validation Threshold of NGID.
Procedure for false positive
- Connect to one selected web page at a time from a remote location.
- Download the page and save it to a file.
- Validate it using NGID.
- Validate it using Hash Code.
- The above four steps are repeated.
  - Every 30 minutes over a day

News Paper  Broadcast  Portal  Public  Total
38          15         14      33      100

11 Experiments
False Positive

                              News Paper  Broadcast  Portal  Public  Total
No. of Web Sites              38          15         14      33      100
No. of False Positive (MD5)   29          14         12      8       63
No. of False Positive (NGID)  1           1          0       0       2
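The counts in the table can be turned into rates with a few lines of arithmetic. This sketch assumes each count is the number of sites that produced a false positive; the variable names are illustrative.

```python
# Counts copied from the false positive table.
sites = {"News Paper": 38, "Broadcast": 15, "Portal": 14, "Public": 33}
fp_md5 = {"News Paper": 29, "Broadcast": 14, "Portal": 12, "Public": 8}
fp_ngid = {"News Paper": 1, "Broadcast": 1, "Portal": 0, "Public": 0}

total_sites = sum(sites.values())
overall_md5 = sum(fp_md5.values()) / total_sites
overall_ngid = sum(fp_ngid.values()) / total_sites

for category, count in sites.items():
    md5_rate = fp_md5[category] / count
    ngid_rate = fp_ngid[category] / count
    print(f"{category:<10}  MD5 {md5_rate:6.1%}  NGID {ngid_rate:6.1%}")
print(f"{'Total':<10}  MD5 {overall_md5:6.1%}  NGID {overall_ngid:6.1%}")
```

Overall, MD5 flags 63 of the 100 sites while NGID flags 2.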

12 Experiments False Positive

13 Experiments
NGID value as time flows
[Figure: NGID value over time, annotated with the time of contents update]

14 Experiments
Procedure for false negative
- Collect 50 web pages, consisting of normal pages and hacked pages, from zone-h.
- Validate them using NGID.
- Validate them using Hash Code.
Result of Hash code
- 50 web pages are detected as defaced.
- The number of false negatives is 0.

15 Experiments False Negative

16 Conclusions
N-Gram-based Index Distance
- A metric for evaluating dynamic web page defacement
- NGID can validate dynamically changing web pages.
Future Works
- A learning model is needed to determine the validation threshold for each web page.
- A feedback mechanism for the normal index is needed.

