Learning to Detect Phishing Emails (I. Fette, N. Sadeh, and A. Tomasic). Reporter: 鄭志欣. Advisor: Hsing-Kuo Pao.


Slide 1: Reporter: 鄭志欣. Advisor: Hsing-Kuo Pao. Paper: I. Fette, N. Sadeh, and A. Tomasic. Learning to detect phishing emails. In Proceedings of the International World Wide Web Conference (WWW), pages 649–656, 2007.

Slide 2: Outline. Introduction; Method; Empirical evaluation; Conclusion.

Slide 3: Introduction. Phishing uses spoofed websites to steal account information, such as logon credentials and identity information. Detecting phishing is a hard problem.

Slide 4: Method. PILFER is a machine-learning-based approach that classifies emails as phishing emails or ham (legitimate) emails. The following slides describe the feature set used in classification.

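Slide 5: IP-based URLs. Some phishing attacks are hosted on compromised PCs, which may have no DNS entry, so the simplest way for the email to link to them is by IP address instead of by domain name. This is a binary feature.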
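A minimal Python sketch of this feature, assuming the links have already been extracted from the email as URL strings. The function name `has_ip_based_url` is hypothetical and not taken from PILFER; it flags a URL whose host is a literal IPv4 or IPv6 address.

```python
from ipaddress import ip_address
from urllib.parse import urlsplit


def has_ip_based_url(urls) -> bool:
    """Return True if any URL's host is a literal IP address (hypothetical helper)."""
    for url in urls:
        # .hostname is lower-cased, with any port and IPv6 brackets removed
        host = urlsplit(url).hostname
        if host is None:
            continue
        try:
            ip_address(host)
        except ValueError:
            continue  # an ordinary domain name, not an IP address
        return True
    return False
```

For example, `has_ip_based_url(["http://192.168.1.10/paypal/login.php"])` is `True`, while a link to `https://www.paypal.com/signin` alone yields `False`.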

Slide 6: Age of linked-to domain names. Phishers register legitimate-sounding domain names such as palypal.com or paypal-update.com, and these domains often have a limited life. A linked-to domain is considered “fresh” if its WHOIS registration date is within 60 days of the date the email was sent. This is a binary feature.
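A minimal sketch of the 60-day “fresh domain” test. It assumes the registration date has already been obtained from a WHOIS lookup (not shown) and that the send date comes from the email; the name `is_fresh_domain` is hypothetical. “Within 60 days” is read inclusively, in either direction.

```python
from datetime import date

FRESH_WINDOW_DAYS = 60  # window size taken from the slide


def is_fresh_domain(registered_on: date, email_sent_on: date) -> bool:
    """True if the domain was registered within 60 days of the send date (hypothetical helper)."""
    return abs((email_sent_on - registered_on).days) <= FRESH_WINDOW_DAYS
```

For instance, a domain registered on 2006-03-01 and linked from an email sent on 2006-04-15 (45 days later) counts as fresh.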

Slide 7: Nonmatching URLs. This feature flags a link whose visible text reads paypal.com but which actually points to badsite.com; to the reader, the link appears to lead to paypal.com. This is a binary feature.
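A minimal sketch using Python's standard `html.parser`, assuming the email's HTML part is available as a string. The names `AnchorCollector` and `has_nonmatching_url` are hypothetical. As a simplification, the visible text is only compared when it looks like a URL or domain (contains a dot and no spaces), and host names are compared exactly, so `www.paypal.com` versus `paypal.com` would also be flagged.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class AnchorCollector(HTMLParser):
    """Collects (href, visible text) pairs for every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def _host(url_like: str):
    # Add a scheme if missing so urlsplit places the domain in .hostname
    if "://" not in url_like:
        url_like = "http://" + url_like
    return urlsplit(url_like).hostname


def has_nonmatching_url(html: str) -> bool:
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    for href, text in parser.anchors:
        # Only compare when the visible text itself looks like a URL or domain
        if " " in text or "." not in text:
            continue
        shown, actual = _host(text), _host(href)
        if shown and actual and shown != actual:
            return True
    return False
```

With the slide's example, `has_nonmatching_url('<a href="http://badsite.com/login">paypal.com</a>')` returns `True`.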

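Slide 8: “Here” links to a non-modal domain. Phishing emails contain prompts such as “Click here to restore your account access.” The modal domain is the domain the email links to most often; this feature flags any link whose text contains “link”, “click”, or “here” and that points to a domain other than the modal domain. This is a binary feature.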
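A minimal sketch of this feature with the standard library, assuming the HTML part is available as a string. The names `AnchorCollector`, `TRIGGER_WORDS`, and `here_link_to_nonmodal_domain` are hypothetical; matching the trigger words as whole, case-insensitive words is also an assumption, since the slide does not say how the text is matched.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumption: trigger words are matched as whole words, case-insensitively.
TRIGGER_WORDS = re.compile(r"\b(link|click|here)\b", re.IGNORECASE)


class AnchorCollector(HTMLParser):
    """Collects (href, visible text) pairs for every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def here_link_to_nonmodal_domain(html: str) -> bool:
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    # Pair each link's host name with its visible text, skipping links without a host
    links = [(urlsplit(href).hostname, text) for href, text in parser.anchors]
    links = [(host, text) for host, text in links if host]
    if not links:
        return False
    # Modal domain: the host linked to most often (ties go to the first one seen)
    modal_domain = Counter(host for host, _ in links).most_common(1)[0][0]
    return any(
        TRIGGER_WORDS.search(text) is not None and host != modal_domain
        for host, text in links
    )
```

In an email with two links to `www.paypal.com` and one “Click here” link to `badsite.com`, the modal domain is `www.paypal.com` and the feature fires.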

Slide 9: HTML emails. Emails are sent as plain text, as HTML, or as a combination of the two in multipart/alternative format. Launching an attack without using HTML is difficult, so this feature flags emails that contain an HTML part. This is a binary feature.
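A minimal sketch using Python's standard `email` package, assuming the raw message is available as a string. The name `is_html_email` is hypothetical. `Message.walk()` visits the message and every MIME subpart, so a multipart/alternative email with a text/html alternative is flagged.

```python
from email import message_from_string


def is_html_email(raw_message: str) -> bool:
    """True if any MIME part of the message has content type text/html (hypothetical helper)."""
    msg = message_from_string(raw_message)
    return any(part.get_content_type() == "text/html" for part in msg.walk())
```

A message with no Content-Type header defaults to text/plain and is therefore not flagged.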

Slide 10: Number of links. The number of links present in an email, counted as the <a> tags with an href attribute in its HTML. This is a continuous feature.
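A minimal sketch of the link count, assuming the HTML part is available as a string and that a link is an `<a>` tag carrying an `href` attribute. The names `LinkCounter` and `number_of_links` are hypothetical. `HTMLParser` lower-cases tag and attribute names, so `<A HREF=...>` is counted too.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts <a> tags that carry an href attribute."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1


def number_of_links(html: str) -> int:
    counter = LinkCounter()
    counter.feed(html)
    counter.close()
    return counter.count
```

An anchor used only as a bookmark (`<a name="x">`) has no `href` and is not counted.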

Slide 11: Number of domains. Take the domain names previously extracted from all of the links and count the number of distinct domains, looking only at the “main” part of each domain (for example, paypal.com for www.paypal.com). This is a continuous feature.
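A minimal sketch of the distinct-domain count, assuming the links are available as URL strings. The names `main_domain` and `number_of_domains` are hypothetical, and taking the last two labels as the “main” part is a deliberate simplification: it is wrong for multi-label suffixes such as co.uk, where a real implementation would consult the Public Suffix List.

```python
from ipaddress import ip_address
from urllib.parse import urlsplit


def main_domain(host: str) -> str:
    """Naive 'main part' of a host name: its last two labels (assumption)."""
    try:
        ip_address(host)
        return host  # literal IP addresses are kept whole
    except ValueError:
        pass
    labels = host.rstrip(".").split(".")
    return ".".join(labels[-2:])


def number_of_domains(urls) -> int:
    hosts = (urlsplit(url).hostname for url in urls)
    return len({main_domain(host) for host in hosts if host})
```

Links to `www.paypal.com`, `paypal.com`, and `login.badsite.com` give two distinct main domains.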

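Slide 12: Number of dots. Attackers add dots to links through subdomains that make a link appear to belong to a trusted site, and through redirection scripts that embed a second URL inside the first. This feature is the maximum number of dots (.) contained in any of the links present in the email, and is a continuous feature.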
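A minimal sketch, assuming the links are available as URL strings. The name `max_dots` is hypothetical. Dots are counted over the whole link rather than only the host name, so a redirection URL that embeds a second URL in its query string also scores higher.

```python
def max_dots(urls) -> int:
    """Maximum number of '.' characters in any link; 0 if there are no links (hypothetical helper)."""
    return max((url.count(".") for url in urls), default=0)
```

For example, `http://www.my-bank.update.example.com/login.php` contains five dots, compared with two for `http://www.paypal.com/`.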

Slide 13: Contains JavaScript. Attackers can use JavaScript to hide information from the user and potentially launch sophisticated attacks. An email is flagged with the “contains javascript” feature if the string “javascript” appears anywhere in the email, regardless of whether it is actually inside a <script> or <a> tag. This is a binary feature.
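A minimal sketch of this string test, assuming the raw email is available as a string. The name `contains_javascript` is hypothetical, and case-insensitive matching is an assumption (so “JavaScript” in a type attribute also counts).

```python
def contains_javascript(raw_email: str) -> bool:
    """True if the string 'javascript' appears anywhere, ignoring case (assumption)."""
    return "javascript" in raw_email.lower()
```

Following the slide's definition literally, a bare `<script>` tag without the word “javascript” is not flagged.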

Slide 14: Spam-filter output. The verdict (“ham” or “spam”) of a trained SpamAssassin installation using the default rule weights and threshold. This is a binary feature.
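A minimal sketch of turning the filter's verdict into a binary feature. It assumes the message has already been processed by SpamAssassin, which records its verdict in the `X-Spam-Status` header (for example `Yes, score=7.2 required=5.0 ...`); running SpamAssassin itself is not shown, and the name `spamassassin_says_spam` is hypothetical.

```python
from email import message_from_string


def spamassassin_says_spam(raw_message: str) -> bool:
    """True if SpamAssassin's X-Spam-Status header starts with 'Yes' (assumes prior processing)."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    return status.strip().lower().startswith("yes")
```

A message without the header (i.e., never scanned) is treated as ham by this sketch.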

Slide 15: Empirical evaluation. Machine-learning implementation; testing SpamAssassin; datasets; additional challenges; false positives vs. false negatives.

Slide 16: Machine-learning implementation (PILFER). First, a set of scripts extracts all of the features listed above. Second, a classifier is trained and tested using 10-fold cross-validation. The classifier is a random forest, which builds a number of decision trees; each tree chooses the attribute to split on at each level from a random selection of attributes, and the trees vote on the final classification.
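A minimal sketch of the 10-fold cross-validation split, using only the standard library. The name `k_fold_indices` and the shuffling seed are hypothetical; the paper's exact fold assignment is not described on the slide. Each sample lands in exactly one test fold, and the classifier (for example a random forest such as scikit-learn's `RandomForestClassifier`, not shown) would be trained on the other nine folds each time.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Reported accuracy would then be the average over the ten held-out folds.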

Slide 17: We use a random forest as the classifier.

Slide 18: Testing SpamAssassin. SpamAssassin is a widely deployed, freely available spam filter that is highly accurate at classifying spam emails. We classify the exact same dataset using SpamAssassin version 3.1.0 with the default thresholds and rules, in two configurations: untrained, and trained using 10-fold cross-validation.

Slide 19: Datasets. Two publicly available datasets are used: the ham corpora from the SpamAssassin project (6950 non-phishing, non-spam emails) and the phishingcorpus (approximately 860 phishing messages).
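A worked example on these dataset sizes (simple arithmetic, not a result reported in the paper): ham outnumbers phishing roughly 8 to 1, so a trivial classifier that labels every email as ham would already be about 89% accurate. This is why false positives and false negatives need to be examined separately rather than relying on overall accuracy alone.

```latex
\[
\text{accuracy}_{\text{all ham}} = \frac{6950}{6950 + 860} = \frac{6950}{7810} \approx 0.890
\]
```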

Slide 20: Additional challenges. The age of the dataset: phishing websites are short-lived, so some features, such as the age of the linked-to domain, can no longer be extracted from older emails, which makes testing difficult.

Slide 21: Results.


Slide 23: Conclusion. It is possible to detect phishing emails with high accuracy by using a specialized filter whose features are more directly applicable to phishing emails than those employed by general-purpose spam filters.

Slide 24: References.
I. Fette, N. Sadeh, and A. Tomasic. Learning to detect phishing emails. In Proceedings of the International World Wide Web Conference (WWW), pages 649–656, 2007.
Phishing%20 s.pptx
cht.blogspot.com/2010/01/phishing-mail.html
