Spam Filtering: An Artificial Intelligence Showcase
Presented by: Alex Misstear

What is Spam?
- Messages sent indiscriminately to a large number of recipients
- We all hate it
- Term attributed to a Monty Python skit
- Legitimate messages are sometimes referred to as “ham”

History of Spam
- First recorded case in 1978
  - An ad created by Digital Equipment Corporation
  - Sent to a few hundred recipients over ARPANET
  - Instant negative feedback, but it did result in some sales
- The term was first used in 1993, to describe a post to a USENET newsgroup that a software bug had generated by accident
  - Considered humorous at the time
- First major use as a business practice in 1994

Spam Everywhere
- Estimated spam rates (Symantec):
  - January 2013: 64.1%
  - December 2012: 70.6%
  - July 2012: 67.6%
  - January 2012: 69.0%
- At times these figures can exceed 80%

Filtering Techniques
- Rule based
  - Prone to false positives
  - E.g., the word "mortgage" appears in a lot of spam, but also in some very important ham
- Checksum filtering
  - Easily circumvented by senders, who insert random characters to disrupt the hash
- Blacklisting/whitelisting
  - Prone to complications for the recipient
- Bayesian filtering
  - Low false-positive rate
- Many more…
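
The two weaknesses named above can be illustrated with a toy sketch. The keyword rule, the example messages, and the helper names (rule_based_is_spam, checksum_is_spam) are all hypothetical; the checksum filter is assumed to store exact SHA-256 digests of messages already reported as spam.

```python
import hashlib
import random
import string

# Hypothetical keyword rule: flag any message containing "mortgage".
SPAM_KEYWORDS = {"mortgage"}


def rule_based_is_spam(message: str) -> bool:
    """Flag a message if any of its words matches a keyword rule."""
    return any(w.strip(".,!?") in SPAM_KEYWORDS for w in message.lower().split())


def checksum(message: str) -> str:
    """Exact SHA-256 digest of the message text."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


# Digests of messages that users have already reported as spam.
known_spam = {checksum("Lowest mortgage rates! Click now")}


def checksum_is_spam(message: str) -> bool:
    return checksum(message) in known_spam


ham = "Your mortgage documents are ready for signing."
spam_copy = "Lowest mortgage rates! Click now"
# The spammer appends random characters, so every copy hashes differently.
spam_variant = spam_copy + " " + "".join(random.choices(string.ascii_letters, k=8))

print(rule_based_is_spam(ham))         # True: a false positive on ham
print(checksum_is_spam(spam_copy))     # True: the exact copy is caught
print(checksum_is_spam(spam_variant))  # False: the random suffix evades the hash
```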

Bayesian Spam Filtering
- Particular chunks of text occur often in spam but seldom in ham messages
- First introduced in 1996
- Improved upon by Paul Graham in 2002
- Not just a simple text classification problem: obscure characters and HTML content are common
  - Leetspeak: v1agra
  - IP addresses
  - Empty HTML comments
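
Because of these obfuscations, a filter usually normalizes text before looking words up. The tokenizer below is a minimal sketch under stated assumptions: the leetspeak table is a small hypothetical one, HTML comments are deleted so that a word split by an empty comment is rejoined, other tags become word separators, and IP addresses are kept whole as tokens of their own.

```python
import re

# Hypothetical, deliberately small leetspeak table.
LEET_MAP = str.maketrans({"1": "i", "0": "o", "3": "e", "4": "a", "@": "a", "$": "s"})
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
HTML_TAG = re.compile(r"<[^>]+>")
IP_ADDRESS = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")
TOKEN = re.compile(r"[a-z0-9$@.]+")


def normalize(token: str) -> str:
    """Undo simple leetspeak, but leave IP addresses and pure numbers intact."""
    if IP_ADDRESS.fullmatch(token) or not any(c.isalpha() for c in token):
        return token
    return token.translate(LEET_MAP)


def tokenize(message: str) -> list[str]:
    """Lowercase tokens after stripping HTML comments and tags."""
    text = HTML_COMMENT.sub("", message)  # "v1<!---->agra" -> "v1agra"
    text = HTML_TAG.sub(" ", text)        # remaining tags act as separators
    raw = (t.strip(".") for t in TOKEN.findall(text.lower()))
    return [normalize(t) for t in raw if t]


print(tokenize("Cheap v1<!---->agra from 192.168.10.5 <b>today</b>."))
# ['cheap', 'viagra', 'from', '192.168.10.5', 'today']
```

Keeping IP addresses as tokens, rather than discarding them, lets the filter learn that links to raw IP addresses are themselves a spam signal.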

Concept
- Based on the idea that the probability of a message being spam is related to how the words it contains occurred in previous messages
- Each word can be used to help calculate this probability
- Maintain a database mapping each word to:
  - The probability that the word appears in spam
  - The probability that the word appears in ham
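
A minimal sketch of building that database, assuming messages are already labelled and using naive whitespace tokenization for brevity. Each word is counted at most once per message, so the stored values are the fraction of spam (or ham) messages containing the word. The function name and the tiny corpus are hypothetical.

```python
from collections import Counter


def build_word_db(spam_msgs: list[str], ham_msgs: list[str]) -> dict[str, tuple[float, float]]:
    """Map each word to (P(word appears | spam), P(word appears | ham))."""
    spam_counts = Counter(w for m in spam_msgs for w in set(m.lower().split()))
    ham_counts = Counter(w for m in ham_msgs for w in set(m.lower().split()))
    vocab = spam_counts.keys() | ham_counts.keys()
    return {
        w: (spam_counts[w] / len(spam_msgs), ham_counts[w] / len(ham_msgs))
        for w in vocab
    }


# Hypothetical training data.
spam = ["cheap pills now", "cheap loans today"]
ham = ["lunch today", "project meeting today", "cheap flights for the team"]
db = build_word_db(spam, ham)
print(db["cheap"])  # appears in 2/2 spam and 1/3 ham messages
```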

Bayes Theorem
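Applied to a single word, Bayes' theorem is commonly written as below. The symbols are assumed for this sketch: S is "the message is spam", H is "the message is ham", and W is "the message contains the word W". The result, Pr(S | W), is the word's "spamicity".

```latex
\Pr(S \mid W) = \frac{\Pr(W \mid S)\,\Pr(S)}{\Pr(W \mid S)\,\Pr(S) + \Pr(W \mid H)\,\Pr(H)}
```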

Biased & Unbiased Filtering
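As the terms are commonly used, an "unbiased" filter assumes no prior reason for a message to be spam rather than ham, i.e. Pr(S) = Pr(H) = 0.5, so the priors cancel; a "biased" filter plugs in some other prior, such as the observed share of spam. The hypothetical helper below makes the prior an explicit parameter.

```python
def spamicity(p_w_given_spam: float, p_w_given_ham: float, p_spam: float = 0.5) -> float:
    """Pr(S|W) via Bayes' theorem.

    p_spam = 0.5 gives an unbiased filter; any other prior gives a biased one.
    """
    p_ham = 1.0 - p_spam
    numerator = p_w_given_spam * p_spam
    denominator = numerator + p_w_given_ham * p_ham
    if denominator == 0:
        raise ValueError("word never seen in either spam or ham")
    return numerator / denominator


print(round(spamicity(0.05, 0.005), 3))              # unbiased: 0.909
print(round(spamicity(0.05, 0.005, p_spam=0.8), 3))  # biased toward spam: 0.976
```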

Example
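A worked example with hypothetical numbers: suppose a word appears in 20% of spam messages and 1% of ham messages. An unbiased filter and a filter biased by an assumed spam prior of 0.7 give:

```latex
\begin{aligned}
\text{unbiased, } \Pr(S) = 0.5:&\quad \Pr(S \mid W) = \frac{0.20 \times 0.5}{0.20 \times 0.5 + 0.01 \times 0.5} = \frac{0.20}{0.21} \approx 0.952 \\
\text{biased, } \Pr(S) = 0.7:&\quad \Pr(S \mid W) = \frac{0.20 \times 0.7}{0.20 \times 0.7 + 0.01 \times 0.3} = \frac{0.14}{0.143} \approx 0.979
\end{aligned}
```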

Applying Bayes Theorem
- Break messages down into words as they arrive
- Single out the most interesting/relevant words (in Paul Graham's approach, those whose spam probability in the database is farthest from a neutral 0.5, whether strongly spammy or strongly hammy)
- Generate the spamicity of each
- Combine all the spamicities
- If the overall spamicity is greater than a certain threshold, the message is marked as spam
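
The steps above can be sketched end to end as follows. This assumes a database that already maps each word to its spamicity, and it combines the selected words with the naive (independence-assuming, unbiased) formula. The defaults loosely follow Paul Graham's "A Plan for Spam" (15 words, a 0.9 threshold, 0.4 for unseen words, probabilities clamped to [0.01, 0.99]); treat them as assumptions, not tuned values.

```python
import math


def combine(probs: list[float]) -> float:
    """Naive Bayes combination of per-word spamicities."""
    prod_p = math.prod(probs)
    prod_q = math.prod(1.0 - p for p in probs)
    return prod_p / (prod_p + prod_q)


def classify(message, spamicity_db, n_interesting=15, threshold=0.9, unknown=0.4):
    """Return (is_spam, overall_spamicity) for a message."""
    words = set(message.lower().split())
    # Look up each word; clamp so a single word can never force 0 or 1.
    probs = [min(max(spamicity_db.get(w, unknown), 0.01), 0.99) for w in words]
    # "Interesting" words are those farthest from a neutral 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    overall = combine(probs[:n_interesting])
    return overall > threshold, overall


# Hypothetical per-word spamicities.
db = {"cheap": 0.95, "pills": 0.99, "meeting": 0.05, "today": 0.45}
print(classify("Cheap pills today", db))  # flagged as spam
print(classify("Meeting today", db))      # not flagged
```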

Combining Probabilities
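A common way to combine the individual spamicities p_i = Pr(S | W_i) of N selected words assumes that words occur independently (the "naive" assumption) and that the filter is unbiased, Pr(S) = Pr(H); under those assumptions the overall probability is:

```latex
p = \frac{p_1 p_2 \cdots p_N}{p_1 p_2 \cdots p_N + (1 - p_1)(1 - p_2) \cdots (1 - p_N)}
```

Words with p_i = 0.5 contribute the same factor to both products and so cancel out; this is why only the "interesting" words, far from 0.5, matter.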

Results
- Statistics vary with the individual and the messages received
- Spam detection rates of 99.7% are common, with a false-positive rate of 0.03%
- Calculating spamicity for phrases has been shown to improve these numbers slightly
- Requires an initial learning period with ham/spam classification feedback to build the database
  - Typically a couple of weeks
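
To make the two figures concrete: the detection rate is the fraction of spam that is caught, and the false-positive rate is the fraction of ham wrongly flagged. The sketch below computes both from paired predictions and true labels; the helper name and the counts are hypothetical, chosen so the results match the rates quoted above (3 misfiled ham messages out of 10,000 is 0.03%).

```python
def filter_rates(predicted_spam: list[bool], actually_spam: list[bool]) -> tuple[float, float]:
    """Return (detection rate, false-positive rate) from paired booleans."""
    pairs = list(zip(predicted_spam, actually_spam))
    spam_total = sum(1 for _, actual in pairs if actual)
    ham_total = len(pairs) - spam_total
    caught = sum(1 for pred, actual in pairs if pred and actual)
    false_pos = sum(1 for pred, actual in pairs if pred and not actual)
    return caught / spam_total, false_pos / ham_total


# Hypothetical evaluation: 1,000 spam and 10,000 ham messages.
actual = [True] * 1000 + [False] * 10000
predicted = [True] * 997 + [False] * 3 + [True] * 3 + [False] * 9997
print(filter_rates(predicted, actual))  # (0.997, 0.0003)
```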

Bayesian Poisoning
- Spammers send messages containing random, seemingly legitimate words to degrade the filter
  - Future spam messages may then get through
  - Can also increase the false-positive rate
- Difficult for the attacker to train the filter if no feedback is given (critical to protection)
- Can be countered with periodic retraining
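
The effect can be seen directly in the naive combination formula. In the sketch below, all spamicity values are hypothetical: a few strongly spammy words give an overall score near 1, but padding the same message with words the filter considers very hammy drags the score far below any sensible threshold. If retraining on the poisoned messages (labelled as spam) raises the injected words' spamicities toward a neutral 0.5, they stop influencing the result.

```python
import math


def combine(probs: list[float]) -> float:
    """Naive Bayes combination of per-word spamicities."""
    prod_p = math.prod(probs)
    prod_q = math.prod(1.0 - p for p in probs)
    return prod_p / (prod_p + prod_q)


spam_words = [0.99, 0.98, 0.97]  # hypothetical spammy words
injected = [0.01] * 4            # hypothetical "legitimate" filler words
retrained = [0.5] * 4            # the same filler after retraining, if neutral

print(combine(spam_words))              # close to 1: spam
print(combine(spam_words + injected))   # far below 0.5: slips through
print(combine(spam_words + retrained))  # same as spam_words alone
```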

Conclusion
- Bayesian filtering is considered the best approach:
  - Adaptive solution
  - Can look at more than just the message body
  - Inherently multilingual
  - Individuals/corporations can have their own filter, which learns from their message behavior
  - Difficult for attackers to circumvent
- Requires an initial learning period

Questions?