Spam: An Analysis of Spam Filters
Joe Chiarella, Jason O’Brien
Advisors: Professor Wills and Professor Claypool


Project Goals
To analyze the effectiveness of different kinds of spam filters.
Focused on SpamAssassin and Bogofilter.

SpamAssassin
Rule-based filter with over 400 rules.
Each rule has an associated weight.
The score of an email is the sum of the weights across all matching rules.
User-adjustable threshold.
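
The weighted-rule scoring described on this slide can be sketched roughly as follows. This is a minimal illustration, not SpamAssassin's implementation: the rule names, the tests, the weights, and the threshold of 3.0 are all hypothetical and were chosen only to show how matching rules add up to a score that is compared against a user-adjustable threshold.

```python
import re
from typing import Callable, List, Tuple

# Each rule is (name, test, weight). All rules and weights below are
# hypothetical examples, not taken from SpamAssassin's real rule set.
Rule = Tuple[str, Callable[[str], bool], float]

RULES: List[Rule] = [
    ("SUBJECT_ALL_CAPS",
     lambda m: bool(re.search(r"^Subject: [A-Z !]+$", m, re.M)), 1.5),
    ("MENTIONS_FREE_MONEY", lambda m: "free money" in m.lower(), 2.5),
    ("MANY_EXCLAMATIONS", lambda m: m.count("!") >= 3, 1.0),
    ("HAS_UNSUBSCRIBE", lambda m: "unsubscribe" in m.lower(), 0.5),
]


def score_message(message: str, rules: List[Rule] = RULES) -> Tuple[float, List[str]]:
    """Return the sum of weights of all matching rules and the matched rule names."""
    matched = [(name, weight) for name, test, weight in rules if test(message)]
    total = sum(weight for _, weight in matched)
    return total, [name for name, _ in matched]


def is_spam(message: str, threshold: float = 3.0) -> bool:
    """Flag the message when its score reaches the (illustrative) threshold."""
    score, _ = score_message(message)
    return score >= threshold


if __name__ == "__main__":
    sample = "Subject: FREE MONEY!!!\n\nClick now for free money!"
    print(score_message(sample), is_spam(sample))
```

Raising the threshold trades missed spam for fewer misclassified ham messages, which is why the threshold is left to the user.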

Bogofilter
Bayesian filter.
Calculates the probability that an email is spam using past email.
Looks at the frequency of words (not the order of words).
Accuracy should improve over time.
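
To make the word-frequency idea concrete, here is a small sketch of a generic naive Bayes text classifier with add-one smoothing. It is an assumption-laden stand-in, not Bogofilter's actual computation: the class name `NaiveBayesFilter`, the tokenizer, and the smoothing choices are all illustrative. It does show the two properties named on the slide: only word counts matter (not word order), and the estimate changes as more past email is learned.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Generic naive Bayes sketch (hypothetical; not Bogofilter's algorithm)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Bag of words: order is discarded, only frequencies are kept.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Learn from one past message labelled 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, words, label, vocab_size):
        total_messages = sum(self.message_counts.values())
        # Smoothed prior, so an untrained filter starts at 50/50.
        score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
        total_words = sum(self.word_counts[label].values())
        for word in words:
            count = self.word_counts[label][word]
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            score += math.log((count + 1) / (total_words + vocab_size + 1))
        return score

    def spam_probability(self, text):
        words = self.tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        spam = self._log_score(words, "spam", len(vocab))
        ham = self._log_score(words, "ham", len(vocab))
        diff = max(min(ham - spam, 700.0), -700.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("win free money now", "spam")
    f.train("meeting agenda for monday", "ham")
    print(f.spam_probability("free money"))
```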

Data Collection
Email collected from students, professors, small business employees, and free email accounts.
Ham emails and 5010 spam emails, separated into ham and spam mailboxes for each user.

Methodology
Compared the accuracy of SpamAssassin and Bogofilter on each user’s email.
Tested the same number of ham emails and spam emails from each user.
Ignored results from the first 50 emails to allow Bogofilter to learn.
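
One way to picture this evaluation procedure is the sketch below. It is only an interpretation of the slide: the function name `evaluate_filter`, the `classify` and `learn` callbacks, and the choice to train on every message right after classifying it are assumptions, since the transcript does not say how training was interleaved with testing. The warm-up of 50 messages mirrors the "first 50 emails" that were ignored.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def evaluate_filter(
    messages: Iterable[Tuple[str, bool]],
    classify: Callable[[str], bool],
    learn: Optional[Callable[[str, bool], None]] = None,
    warmup: int = 50,
) -> Dict[str, Optional[float]]:
    """Return per-class accuracy, ignoring results from the first `warmup` messages.

    `messages` yields (text, is_spam) pairs for one user.
    `classify` returns True when the filter calls a message spam.
    `learn`, if given, is called after each classification (assumed behaviour).
    """
    counts = {"ham": [0, 0], "spam": [0, 0]}  # [correct, total] per class
    for index, (text, is_spam) in enumerate(messages):
        predicted = classify(text)
        if learn is not None:
            learn(text, is_spam)
        if index < warmup:
            continue  # still in the warm-up period: result is not counted
        key = "spam" if is_spam else "ham"
        counts[key][1] += 1
        counts[key][0] += int(predicted == is_spam)
    return {
        key: (correct / total if total else None)
        for key, (correct, total) in counts.items()
    }


if __name__ == "__main__":
    mailbox = [("spam offer", True), ("hello", False)] * 30
    print(evaluate_filter(mailbox, classify=lambda text: "spam" in text))
```

Reporting ham and spam accuracy separately matters because misclassifying ham usually costs the user more than letting a spam message through.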

Comparison of Bogofilter and SpamAssassin on Ham (chart)
Legend: CP = Company Person, PR = Professor, ST = Student, FE = Free Email

Comparison of Bogofilter and SpamAssassin on Spam (chart)
Legend: CP = Company Person, PR = Professor, ST = Student, FE = Free Email

SpamAssassin Score Analysis

Conclusion
The effectiveness of Bogofilter and SpamAssassin depends greatly on the user.
Neither filter outperformed the other in all cases.
Filtering spam is hard.

Questions?