Deep Belief Networks for Spam Filtering

Deep Belief Networks for Spam Filtering
Grigorios Tzortzis and Aristidis Likas
Department of Computer Science, University of Ioannina, Greece

Outline
- The spam phenomenon
  - What is spam
  - Spam filtering approaches
- Deep belief networks for spam detection
  - Training of DBNs
- Experimental evaluation
  - Datasets and preprocessing
  - Performance measures
  - Comparison to support vector machines (SVMs), considered state-of-the-art
- Conclusions

What is Spam?
- Unsolicited Bulk E-mail; in human terms, any e-mail you do not want
- A large fraction of all e-mail sent: the Radicati Group estimates that 62% of e-mail traffic in Europe is spam, with 16 billion spam messages sent every day
- Still growing: projected to reach 38 billion by 2010
- The best solution to date is spam filtering

Spam Filtering Approaches
- Knowledge engineering
  - Spam filters based on predefined and user-defined rules
  - Static rules are easily bypassed by spammers
  - Suffers from poor generalization
- Machine learning
  - Automatic construction of a classifier from a training set
  - Keeping the filter up to date is easy (retraining)
  - Higher generalization compared with rule-based filters

Machine Learning for Spam Detection
- Numerous classification methods have been proposed:
  - Naive Bayes (already used in commercial filters)
  - Support Vector Machines (SVMs)
  - etc.
- In this work we propose the use of a Deep Belief Network to tackle the spam problem

Deep Belief Networks (DBNs)
What is a DBN (for classification)?
- A feedforward neural network with a deep architecture, i.e. with many hidden layers
- Consists of visible (input) units, hidden units, and output units (for classification, one for each class)
- Higher levels provide abstractions of the input data
Parameters of a DBN
- W(j): weights between the units of layers j-1 and j
- b(j): biases of layer j (no biases in the input layer)

Training a DBN
Conventional approach: gradient-based optimization
- Random initialization of weights and biases
- Adjustment by backpropagation (using e.g. gradient descent) w.r.t. a training criterion (e.g. cross-entropy)
- Optimization algorithms get stuck in poor solutions due to the random initialization
Solution
- Hinton et al. [2006] proposed a greedy layer-wise unsupervised algorithm for initializing the DBN parameters
- Initialization phase: initialize each layer by treating it as a Restricted Boltzmann Machine (RBM)
- Recent work justifies its effectiveness (Hinton et al. [2006], Bengio et al. [2006])

Restricted Boltzmann Machines (RBMs)
- An RBM is a two-layer neural network with bidirectional connections
- Stochastic binary inputs (visible units) are connected to stochastic binary outputs (hidden units) using symmetrically weighted connections
Parameters of an RBM
- W: weights between the two layers
- b, c: biases for the visible and hidden layers respectively
Layer-to-layer conditional distributions (for logistic units) are given below.
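The conditional distributions appeared as formulas on the original slide and did not survive the transcript; for binary logistic units they take the standard factorized form, reproduced here for completeness (the slide's exact notation may differ):

P(h_j = 1 \mid v) = \sigma\Big(c_j + \sum_i W_{ij} \, v_i\Big), \qquad
P(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_j W_{ij} \, h_j\Big), \qquad
\sigma(x) = \frac{1}{1 + e^{-x}}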

RBM Training (Contrastive Divergence)
Remember that RBM training is unsupervised.
For every training example (data vector v):
- Propagate it from the visible to the hidden units and sample from the conditional P(h | v)
- Propagate the sample in the opposite direction using P(v | h), producing a confabulation (reconstruction) of the original data
- Update the hidden units once more, this time driven by the confabulation
- Update the RBM parameters
- Repeat
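To make the steps above concrete, here is a minimal numpy sketch of one CD-1 update for a single data vector. It is not the authors' code; the function name, learning rate and update form are illustrative assumptions for a plain binary RBM.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1, rng=np.random):
    """One contrastive-divergence (CD-1) step for one data vector v0.

    W: (n_visible, n_hidden) weights, b: visible biases, c: hidden biases.
    """
    # Up pass: hidden probabilities and a binary sample driven by the data
    h0_prob = sigmoid(c + v0 @ W)
    h0_sample = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)

    # Down pass: confabulation (reconstruction) of the visible units
    v1_prob = sigmoid(b + h0_sample @ W.T)

    # Second up pass, driven by the confabulation
    h1_prob = sigmoid(c + v1_prob @ W)

    # Update: difference between data-driven and confabulation-driven statistics
    W += lr * (np.outer(v0, h0_prob) - np.outer(v1_prob, h1_prob))
    b += lr * (v0 - v1_prob)
    c += lr * (h0_prob - h1_prob)
    return W, b, c
```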

DBN Training Revised
- Apply the RBM method to every layer (excluding the last layer for classification tasks)
  - The inputs to the first-layer RBM are the input examples
  - For higher-layer RBMs, feed as input the activations of the hidden units of the previous RBM when driven by data, not by confabulations
  - The weights of the final classification layer, W(L+1), are initialized randomly
- Good initializations are obtained this way
- Fine-tune the whole network by backpropagation w.r.t. a supervised criterion (e.g. mean squared error, cross-entropy)
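A schematic sketch of this greedy layer-wise procedure is given below, reusing the cd1_update function from the previous sketch. It is only an illustration of the scheme described on the slide; the layer sizes, epoch count and learning rate are placeholders, not the authors' settings.

```python
import numpy as np

def pretrain_dbn(data, layer_sizes, epochs=10, lr=0.1, rng=np.random):
    """Greedy layer-wise RBM pretraining.

    data: (n_examples, n_visible) binary matrix.
    layer_sizes: hidden layer sizes, e.g. [50, 50, 200]; the final softmax
    layer is excluded here and initialized randomly before fine-tuning.
    """
    params = []
    inputs = data
    for n_hidden in layer_sizes:
        n_visible = inputs.shape[1]
        W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        b = np.zeros(n_visible)
        c = np.zeros(n_hidden)
        for _ in range(epochs):
            for v0 in inputs:
                W, b, c = cd1_update(v0, W, b, c, lr=lr, rng=rng)
        params.append((W, b, c))
        # The next RBM is fed the data-driven hidden activations of this one
        inputs = 1.0 / (1.0 + np.exp(-(c + inputs @ W)))
    return params  # use as initialization, then fine-tune by backpropagation
```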

Testing Corpora
Three widely used datasets: LingSpam, SpamAssassin, EnronSpam

Corpus             | Messages | Spam Ratio | Message Format | Ham Source     | Spam Source
LingSpam           | 2893     | 16.6%      | Subject-Body   | Linguist List  | Creators' Inbox
SpamAssassin       | 6047     | 31.3%      | Raw            | User Donations | User Donations
EnronSpam (Enron1) | 5172     | 29%        |                | Enron Employee |

Performance Measures
- Accuracy: percentage of correctly classified messages
- Ham/Spam Recall: percentage of ham/spam messages that are correctly classified as ham/spam
- Ham/Spam Precision: percentage of messages classified as ham/spam that are indeed ham/spam
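Written out explicitly (the notation n_{x -> y}, the number of class-x messages classified as y, is introduced here and is not on the slide):

\text{Accuracy} = \frac{n_{s \to s} + n_{h \to h}}{n_s + n_h}, \qquad
\text{Spam Recall} = \frac{n_{s \to s}}{n_s}, \qquad
\text{Spam Precision} = \frac{n_{s \to s}}{n_{s \to s} + n_{h \to s}}

where n_s and n_h are the total numbers of spam and ham messages; Ham Recall and Ham Precision are defined symmetrically.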

Experimental Setup
- Message representation: x = [x1, x2, ..., xm]
  - Each attribute corresponds to a distinct word from the corpus
  - Frequency attributes are used (number of occurrences of the word in the message)
- Attribute selection
  - Stop words and words appearing in fewer than 2 messages were removed, followed by information gain selection (m = 1000 for SpamAssassin, m = 1500 for LingSpam and EnronSpam)
- All experiments were performed using 10-fold cross-validation
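A rough sketch of this preprocessing in scikit-learn is shown below. The paper's exact tokenization and information-gain code are not available; mutual_info_classif is used here as a stand-in for information gain, and m is the number of attributes kept (1000 or 1500 depending on the corpus).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def build_features(messages, labels, m=1500):
    # Word-frequency attributes; drop stop words and words seen in < 2 messages
    vectorizer = CountVectorizer(stop_words="english", min_df=2)
    X_counts = vectorizer.fit_transform(messages)

    # Keep the m most informative attributes
    selector = SelectKBest(mutual_info_classif, k=m)
    X_selected = selector.fit_transform(X_counts, labels)
    return X_selected, vectorizer, selector
```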

Experimental Setup - continued
SVM configuration
- Cosine kernel (the usual trend in text classification)
- The cost parameter C must be determined a priori; many values were tried and the best was kept
DBN configuration
- An m-50-50-200-2 DBN architecture (3 hidden layers) with softmax output units and logistic hidden units
- RBM training was performed using binary vectors for message representation (leads to better performance)
- Fine-tuning by minimizing the cross-entropy error (using frequency vectors)
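For illustration, a PyTorch sketch of the m-50-50-200-2 architecture with logistic hidden units and a softmax output, fine-tuned with cross-entropy, is given below. The layer sizes follow the slide; the optimizer, learning rate, and initializing the Linear layers from the pretrained RBM weights are assumptions and are not taken from the paper.

```python
import torch
import torch.nn as nn

m = 1500  # 1000 for SpamAssassin
dbn = nn.Sequential(
    nn.Linear(m, 50), nn.Sigmoid(),
    nn.Linear(50, 50), nn.Sigmoid(),
    nn.Linear(50, 200), nn.Sigmoid(),
    nn.Linear(200, 2),  # softmax is applied inside CrossEntropyLoss
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(dbn.parameters(), lr=0.1)

def fine_tune_step(x_batch, y_batch):
    # x_batch: frequency vectors; y_batch: 0 = ham, 1 = spam
    optimizer.zero_grad()
    loss = criterion(dbn(x_batch), y_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```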

Experimental Results

LingSpam (DBN 1500-50-50-200-2, SVM C=1)
Performance Measure | DBN    | SVM
Accuracy            | 99.45% | 99.24%
Spam Recall         | 98.54% | 96.67%
Spam Precision      | 98.2%  | 98.74%
Ham Recall          | 99.63% | 99.75%
Ham Precision       | 99.71% | 99.35%

SpamAssassin (DBN 1000-50-50-200-2, SVM C=10)
Performance Measure | DBN    | SVM
Accuracy            | 97.5%  | 97.32%
Spam Recall         | 95.51% | 95.24%
Spam Precision      | 96.4%  | 96.14%
Ham Recall          | 98.39% | 98.24%
Ham Precision       | 98.02% | 97.89%

EnronSpam (DBN 1000-50-50-200-2, SVM C=1)
Performance Measure | DBN    | SVM
Accuracy            | 97.43% | 96.92%
Spam Recall         | 96.47% | 97.27%
Spam Precision      | 94.94% | 92.74%
Ham Recall          | 97.83% | 96.78%
Ham Precision       | 98.53% | 98.84%

Experimental Results - continued
- The DBN achieves higher accuracy on all datasets
- It beats the SVM on all measures on SpamAssassin
- The DBN proved robust to variations in the number of units of each layer (the same architecture was kept in all experiments)
- DBN training is much slower than SVM training
- A very encouraging result, given that SVMs are considered state-of-the-art in spam filtering

Conclusions
- The effectiveness of the initialization method was demonstrated in practice
- DBNs constitute a viable new solution to e-mail filtering
- The selection of the DBN architecture needs to be addressed in a more systematic way:
  - Number of layers
  - Number of units in each layer

Thank you for listening. Any questions?