Spam? Not any more! Detecting spam emails using neural networks. ECE/CS/ME 539 project presentation, submitted by Sivanadyan, Thiagarajan.

Importance of the topic
- Spam is unsolicited and unwanted email
- It wastes bandwidth, storage space and, most of all, the recipient's time

Goals of the Anti-spam Network
- Reliably block spam mails
- Should not block any non-spam mail, but may allow a few spam mails to slip through (one simple way to bias a classifier in this direction is sketched below)
- Adapt to the specific types of messages
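The asymmetric goal above (never block legitimate mail, tolerate a little spam leaking through) can be approached by raising the decision threshold on the classifier's output instead of using a default 0.5 cut-off. A minimal sketch, assuming a trained model that outputs a spam probability per message; the function name and the 0.9 threshold are illustrative choices, not values from the project:

    import numpy as np

    def classify_conservatively(spam_probabilities, threshold=0.9):
        """Label a message as spam only when the model is very confident.

        A high threshold trades a few missed spam messages (false negatives)
        for fewer legitimate messages blocked by mistake (false positives).
        """
        spam_probabilities = np.asarray(spam_probabilities)
        return (spam_probabilities >= threshold).astype(int)  # 1 = spam, 0 = non-spam

    # Example with made-up model outputs for five messages
    print(classify_conservatively([0.05, 0.55, 0.92, 0.71, 0.98]))  # -> [0 0 1 0 1]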

Input Features – Data Set
- Original data set: 57 input attributes
- Output attribute: 1 (spam) or 0 (non-spam)
- Inputs are derived from the email content
- Attributes indicate the frequency of specific words and characters
- Examples: 'credit', 'free' (in spam); 'meeting', 'project' (in non-spam); see the sketch below for how such frequencies can be computed
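The word-frequency attributes described above can be computed directly from a message body. A minimal sketch of one common convention (frequency as a percentage of all words, in the style of the public 57-attribute Spambase format that this data set resembles; the exact attribute definitions used in the project are an assumption here):

    import re

    def word_frequency_features(text, vocabulary):
        """For each word in `vocabulary`, return its frequency in `text`
        as a percentage of all words in the message."""
        words = re.findall(r"[a-zA-Z']+", text.lower())
        total = max(len(words), 1)
        return [100.0 * words.count(w) / total for w in vocabulary]

    vocab = ["credit", "free", "meeting", "project"]   # illustrative subset of the 57 attributes
    email = "Get a FREE credit report today! Free offer ends soon."
    print(word_frequency_features(email, vocab))       # [10.0, 20.0, 0.0, 0.0]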

Preprocess the data
- Choose only the inputs which differ between spam and non-spam mails
- Two reduced data sets are obtained (21 inputs and 9 inputs)
- The data is made zero mean, unit variance (4025 input vectors)
- Split the data into two independent training and testing data sets (these steps are sketched below)
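A minimal sketch of this pipeline, assuming the data is already loaded as a NumPy array `X` (one row per message, 57 columns) with labels `y` (1 = spam, 0 = non-spam). The ranking score used to pick the most discriminative inputs is one plausible choice; the slides do not state the exact selection criterion:

    import numpy as np

    def preprocess(X, y, n_features=21, train_fraction=0.5, seed=0):
        """Select discriminative features, standardise, and split train/test."""
        X, y = np.asarray(X, dtype=float), np.asarray(y)

        # Keep the features whose spam / non-spam means differ the most,
        # relative to the overall spread of that feature.
        score = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
        score /= X.std(axis=0) + 1e-12
        keep = np.argsort(score)[::-1][:n_features]
        X = X[:, keep]

        # Zero mean, unit variance.
        X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

        # Two independent sets: training and testing.
        order = np.random.default_rng(seed).permutation(len(X))
        split = int(train_fraction * len(X))
        train, test = order[:split], order[split:]
        return X[train], y[train], X[test], y[test], keep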

MLP Implementation
- Learning by the back-propagation algorithm (a training sketch follows this slide)
- Using the complete data set:
  - Poor performance (classification rate: 63.2%)
  - Classified most of the mails as non-spam
- Using the reduced data set (21 inputs):
  - Good performance (classification rate: 93.8%)
  - All of the non-spam is detected
- Optimal MLP configuration: [figure in original slides]
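The slides do not name the software used for training, so the following is only an equivalent setup sketched with scikit-learn's MLPClassifier; the hidden-layer size and training settings are illustrative, not the project's reported optimal configuration. It assumes `X_train, y_train, X_test, y_test` come from the preprocessing step above:

    from sklearn.neural_network import MLPClassifier

    # A single hidden layer trained with gradient-descent back-propagation.
    mlp = MLPClassifier(hidden_layer_sizes=(20,),
                        activation="tanh",
                        solver="sgd",
                        learning_rate_init=0.01,
                        max_iter=1000,
                        random_state=0)

    mlp.fit(X_train, y_train)
    print("Classification rate: %.1f%%" % (100.0 * mlp.score(X_test, y_test)))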

Cross Validation
- Using the reduced data set (9 inputs):
  - Good performance (classification rate: 92.1%)
  - Nearly all of the non-spam is detected
  - Optimal MLP configuration: [figure in original slides]
- Using cross-validation (sketched below):
  - Negligible improvement in performance
  - Since all the data is derived from the same source, cross-validation offers no advantage
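A minimal sketch of the cross-validation check described above, again using scikit-learn as an assumed stand-in for the original tooling; the 5-fold setting is an illustrative choice, as the slides do not state the number of folds:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier

    mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0)

    # k-fold cross-validation on the (already standardised) reduced data set.
    scores = cross_val_score(mlp, X_train, y_train, cv=5)
    print("Fold accuracies:", np.round(scores, 3))
    print("Mean classification rate: %.1f%%" % (100.0 * scores.mean()))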

Inferences from the results
- A larger number of inputs does not necessarily improve performance
- It is important to remove redundant and irrelevant features
- There is no single optimal MLP configuration for all inputs; the network needs to be adapted to the content
- A combination of other types of spam filters together with neural networks can be used

Conclusion
- Neural networks are a viable option for spam filtering
- A number of heuristic methods are being increasingly applied in this field
- The differences between spam and 'good' emails need to be exploited
- Further opportunities:
  - Data sets from different sources need to be used for training
  - Fuzzy logic and combinational algorithms can be used in this application