Harvesting SSL Certificate Data to Identify Web-Fraud
Reporter: 鄭志欣
Advisor: Hsing-Kuo Pao
Date: 2010/10/04

Conference
Mishari Al Mishari, Emiliano De Cristofaro, Karim El Defrawy, and Gene Tsudik. "Harvesting SSL Certificate Data to Identify Web-Fraud." Submitted to ICDCS’10.

Outline
- Introduction
- X.509 certificates
- Measurements and Analysis of SSL Certificates
- Certificate-Based Classifier
- Conclusion

Introduction
- Web-fraud is one of the most unpleasant features of today’s Internet.
  - Phishing, typosquatting
- Can we use the information in SSL certificates to identify web-fraud activities such as phishing and typosquatting without compromising user privacy?
- This paper presents a novel technique to detect web-fraud domains that use HTTPS.

Typosquatting

Contributions
- The classifier achieves a detection accuracy of over 80% and, in some cases, as high as 95%.
- The classifier is orthogonal to prior mitigation techniques and can be integrated with other methods.
- The classifier relies only on data in the SSL certificate, not on any private user information.

X.509 certificates

Measurements and Analysis of SSL Certificates
- A. HTTPS Usage and Certificate Harvest
  - Legitimate
  - Phishing and Typosquatting
- B. Certificate Analysis
  - Analysis of Certificate Boolean Features
  - Analysis of Certificate Non-Boolean Features

A. HTTPS Usage and Certificate Harvest

A. HTTPS Usage and Certificate Harvest
- Legitimate and Popular Domain Data Sets
  - Alexa: the 100,000 most popular domains according to Alexa.
  - .com: 100,000 random samples from the .com domain zone file, collected from VeriSign.
  - .net: 100,000 random samples from the .net domain zone file, collected from VeriSign.
- We find that 34% of Alexa domains use HTTPS, compared with 21% in .com and 16% in .net. (Commercial)

A. HTTPS Usage and Certificate Harvest
- Phishing Data Set
  - We collected 2,811 domains considered to be hosting phishing scams from the PhishTank web site.
  - 30% of these phishing web sites employ HTTPS.
- Typosquatting Data Set
  - We first identified the typo domains in our .com and .net data sets by using Google’s typo correction service.
  - We discovered that 9,830 out of 38,617 are parked domains.

B. Certificate Analysis

| Feature | Name | Type | Used in Classifier | Notes |
|---|---|---|---|---|
| F1 | md5 | boolean | Yes | The signature algorithm of the certificate is "md5WithRSAEncryption" |
| F2 | bogus subject | boolean | Yes | The subject section of the certificate has bogus values (e.g., "-", "Somestate", "somecity") |
| F3 | self-signed | boolean | Yes | The certificate is self-signed |
| F4 | expired | boolean | Yes | The certificate is expired |
| F5 | verification failed | boolean | No | The certificate passed the verification of OpenSSL 0.9.8k 25 Mar 2009 (for Debian Linux) |
| F6 | common certificate | boolean | Yes | The certificate of the given domain is the same as a certificate of another domain |
| F7 | common serial | boolean | Yes | The serial number of the certificate is the same as the serial number of another one |
| F8 | validity period > 3 years | boolean | Yes | The validity period is more than 3 years |
| F9 | issuer common name | string | Yes | The common name of the issuer |
| F10 | issuer organization | string | Yes | The organization name of the issuer |
| F11 | issuer country | string | Yes | The country name of the issuer |
| F12 | subject country | string | Yes | The country name of the subject |
| F13 | exact validity duration | integer | No | The number of days between the starting date and the expiration date |
| F14 | serial number length | integer | Yes | The number of characters in the serial number |
| F15 | host-common name distance | real | Yes | The Jaccard distance between the host name and the common name in the subject section |
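To make the feature table concrete, here is a minimal sketch of how a few of the features (F1, F3, F4, F8, F13, F14) might be derived from an already-parsed certificate. The record layout (`signature_algorithm`, `subject`, `issuer`, `not_before`, `not_after`, `serial_number`) and the function name are hypothetical, not taken from the paper, and comparing subject with issuer is only a simplified stand-in for a real self-signed check.

```python
from datetime import datetime


def certificate_features(cert, now):
    """Derive a few table features from a parsed certificate record.

    `cert` is a hypothetical dict; the paper does not specify this layout.
    """
    validity_days = (cert["not_after"] - cert["not_before"]).days
    return {
        # F1: certificate signed with md5WithRSAEncryption
        "F1_md5": cert["signature_algorithm"] == "md5WithRSAEncryption",
        # F3: simplified assumption, subject equal to issuer
        "F3_self_signed": cert["subject"] == cert["issuer"],
        # F4: expiration date already passed at time `now`
        "F4_expired": now > cert["not_after"],
        # F8: validity period longer than 3 years (3 * 365 days here)
        "F8_validity_gt_3_years": validity_days > 3 * 365,
        # F13: days between the starting date and the expiration date
        "F13_validity_days": validity_days,
        # F14: number of characters in the serial number string
        "F14_serial_length": len(cert["serial_number"]),
    }


# Made-up example record, for illustration only.
example = {
    "signature_algorithm": "md5WithRSAEncryption",
    "subject": "CN=example.test",
    "issuer": "CN=example.test",
    "not_before": datetime(2008, 1, 1),
    "not_after": datetime(2012, 1, 1),
    "serial_number": "00",
}
features = certificate_features(example, now=datetime(2010, 10, 4))
```

The string features (F9 to F12) would simply be copied from the issuer and subject fields, so they are omitted from the sketch.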

B. Certificate Analysis
- Analysis of Certificate Boolean Features

B. Certificate Analysis
F14: Serial Number Length
[Figure: CDF of serial number length for Alexa, .com, .net, (c) phishing, and (d) typosquatting]
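The figure on this slide plots a cumulative distribution function (CDF) of serial number lengths. As a rough sketch of how such a curve can be computed, the hypothetical helper below returns, for each distinct length, the fraction of certificates whose serial number is at most that long. The sample lengths are made up for illustration.

```python
def empirical_cdf(values):
    """Return (value, fraction of samples <= value) for each distinct value."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i, value in enumerate(ordered):
        # Emit a point only at the last occurrence of each distinct value.
        if i == n - 1 or ordered[i + 1] != value:
            points.append((value, (i + 1) / n))
    return points


# Made-up serial number lengths, not data from the paper.
cdf_points = empirical_cdf([2, 2, 4, 8])
```

Plotting one such curve per data set gives a direct visual comparison of how the distributions differ.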

Certificate Analysis
F15: Jaccard Distance
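The Jaccard distance between two sets A and B is 1 - |A ∩ B| / |A ∪ B|. The slides do not say how the host name and the common name are turned into sets, so the sketch below assumes sets of lowercase character bigrams; the function names and that tokenization are illustrative choices, not the authors' method.

```python
def char_ngrams(text, n=2):
    """Set of character n-grams of a lowercased string (assumed tokenization)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_distance(host_name, common_name, n=2):
    """1 - |A ∩ B| / |A ∪ B| over the two n-gram sets."""
    set_a = char_ngrams(host_name, n)
    set_b = char_ngrams(common_name, n)
    union = set_a | set_b
    if not union:
        # Both strings too short to yield any n-gram: treat as identical.
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)
```

A distance of 0 means the host name and the certificate's common name yield identical n-gram sets, while a distance of 1 means they share no n-gram at all.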

Summary of Certificate Feature Analysis
- Around 20% of legitimate popular domains still use the signature algorithm "md5WithRSAEncryption" despite its clear insecurity.
- A significant percentage (> 30%) of legitimate domain certificates are expired and/or self-signed.
- Duplicate certificate percentages are very high in phishing domains.
- For most features, the difference in distributions between Alexa and the malicious sets is larger than that between .com/.net and the malicious sets.
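The duplicate-certificate observation corresponds to feature F6 (common certificate). One way to measure it, sketched here under the assumption that each domain's certificate is available as raw bytes, is to fingerprint every certificate and count the domains whose fingerprint also appears for another domain. The function name and the input layout are hypothetical.

```python
import hashlib
from collections import Counter


def duplicate_certificate_share(cert_bytes_by_domain):
    """Fraction of domains whose certificate is also served by another domain."""
    fingerprints = {
        domain: hashlib.sha256(raw).hexdigest()
        for domain, raw in cert_bytes_by_domain.items()
    }
    if not fingerprints:
        return 0.0
    counts = Counter(fingerprints.values())
    shared = sum(1 for fp in fingerprints.values() if counts[fp] > 1)
    return shared / len(fingerprints)


# Made-up certificate bytes, for illustration only.
share = duplicate_certificate_share(
    {"a.test": b"cert-1", "b.test": b"cert-1", "c.test": b"cert-2", "d.test": b"cert-3"}
)
```

The same counting approach applied to serial numbers instead of fingerprints would give feature F7 (common serial).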

Certificate-Based Classifier
- A. Phishing Classifier
- B. Typosquatting Classifier

Phishing Classifier

Table IV: Performance of classifiers. Data set consists of (A) 420 phishing certificates and (B) 420 non-phishing certificates (Alexa, .COM and .NET).
Columns: Classifier, Positive Recall, Positive Precision
Classifiers: Random Forest, SVM, Decision Tree, Bagging Decision Tree, Boosting Decision Tree, Nearest Neighbor, Neural Networks

Table V: Performance of classifiers. Data set consists of (A) 420 phishing certificates and (B) 420 non-phishing certificates (Alexa).
Columns: Classifier, Positive Recall, Positive Precision
Classifiers: Random Forest, SVM, Decision Tree, Bagging Decision Tree, Boosting Decision Tree, Nearest Neighbor, Neural Networks

Typosquatting Classifier

Table VI: Performance of classifiers. Data set consists of (A) 486 typosquatting certificates and (B) 486 non-typosquatting certificates (Top Alexa, .COM and .NET).
Columns: Classifier, Positive Recall, Positive Precision
Classifiers: Random Forest, SVM, Decision Tree, Bagging Decision Tree, Boosting Decision Tree, Nearest Neighbor, Neural Networks

Table VII: Performance of classifiers. Data set consists of (A) 486 typosquatting certificates and (B) 486 popular domain certificates.
Columns: Classifier, Positive Recall, Positive Precision
Classifiers: Random Forest, SVM, Decision Tree, Bagging Decision Tree, Boosting Decision Tree, Nearest Neighbor, Neural Networks (0.95)
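The tables report positive recall and positive precision. Under the standard definitions, with the fraudulent class as the positive class, recall is TP / (TP + FN) and precision is TP / (TP + FP). The sketch below computes both from true and predicted labels; the function name and the sample labels are illustrative and do not reproduce any result from the paper.

```python
def positive_recall_precision(y_true, y_pred):
    """Recall and precision for the positive class (label 1 = fraudulent)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Made-up labels: 4 fraudulent and 4 legitimate certificates.
recall, precision = positive_recall_precision(
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0],
)
```

In this made-up case, two of the four fraudulent certificates are caught (recall 0.5) and two of the three flagged certificates are truly fraudulent (precision 2/3).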

Conclusion
- We design and build a machine-learning-based classifier that identifies fraudulent domains using HTTPS based solely on their SSL certificates, thus also preserving user privacy.
- We believe that our results may serve as a motivating factor to increase the use of HTTPS on the Web.
- Use of HTTPS can help identify web-fraud domains.