Analyzing Behavioral Features for Email Classification.


Analyzing Behavioral Features for Email Classification

Steve Martin, Anil Sewani, Blaine Nelson, Karl Chen, and Anthony Joseph {steve0, anil, nelsonb, quarl}, University of California at Berkeley

The Problem
Email abuse has become globally ubiquitous.
– By 2006, email traffic is expected to surge to 60 billion messages daily; however, spam already accounts for half the email sent on a daily basis worldwide.
– Nearly all of the most virulent worms of 2004 spread by email.
– Email system abuse results in huge damage costs.

Current Email Analysis
Many current methods for detecting email abuse examine characteristics of incoming email.
Example: Spam Detection
– Calculate statistical features on received mail and classify each message separately.
Example: Virus Scanning
– Generate a hash value for each incoming message and compare it with a stored database of values.
– Signatures must be predetermined by a human analyst.
These methods can be effective, but there is room for improvement.

Our Approach
There is a huge corpus of ignored data: outgoing email!
– We can't profile user behavior with incoming email alone.
– Outgoing email contains this information.
Calculate features on outgoing email.
– Observe a wide variety of statistics.
Build a statistical understanding of user behavior.
– Use it to classify email sent by individual users.
– This can detect sudden changes in behavior, such as worm/spam activity.

Ex. Outgoing Email Features
Per-Email Features:
– Contains HTML?
– Contains Scripts?
– Contains Images?
– Contains Links?
– MIME Types of Attachments
– Number of Attachments
– Number of Words in Body
– Number of Words in Subject
– Number of Chars in Subject
– ...
Per-User Features (calculated over a window of emails):
– Frequency of Sending
– No. of Unique 'To' Addresses
– No. of Unique 'From' Addresses
– Ratio of Emails w/ Attachments
– Average Word Length
– Avg. No. of Words per Body
– Avg. No. of Words per Subject
– Variance in Word Length
– Variance in No. of Words per Body
– ...
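As a concrete illustration, here is a minimal Python sketch of a few of the per-email features, using the standard library's email package. The slide does not define the features precisely, so the definitions below (e.g., what counts as a script or a link) are assumptions:

```python
import email
from email import policy

def per_email_features(raw_message: str) -> dict:
    """Compute a handful of the per-email features listed above."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    text, html, attachment_types = "", "", []
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            attachment_types.append(part.get_content_type())
        elif part.get_content_type() == "text/html":
            html += part.get_content()
        elif part.get_content_type() == "text/plain":
            text += part.get_content()
    subject = msg.get("Subject", "") or ""
    body = text or html
    return {
        "contains_html": bool(html),
        "contains_scripts": "<script" in html.lower(),   # assumed definition
        "contains_links": "http://" in body or "https://" in body,
        "attachment_mime_types": attachment_types,
        "num_attachments": len(attachment_types),
        "num_words_body": len(body.split()),
        "num_words_subject": len(subject.split()),
        "num_chars_subject": len(subject),
    }
```

The per-user features would then be computed by aggregating these dictionaries over a sliding window of each user's sent messages.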

1. Histogram Analysis
Histograms of separate users over specific features allow similarity estimation. In the example below, the left plot shows two users over the same feature; the right plot shows the difference between their values.
– This shows how the users differ over this feature.
– It can be used to detect differences in behavior between the two users.
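A minimal numpy sketch of this comparison; the slides do not specify the bin count or the distance measure, so the shared-edge binning and L1 distance below are assumptions:

```python
import numpy as np

def histogram_difference(values_a, values_b, bins=20):
    """Bin two users' values for one feature on shared edges and
    return the per-bin difference plus a single distance score."""
    values_a, values_b = np.asarray(values_a), np.asarray(values_b)
    lo = min(values_a.min(), values_b.min())
    hi = max(values_a.max(), values_b.max())
    edges = np.linspace(lo, hi, bins + 1)
    h_a, _ = np.histogram(values_a, bins=edges, density=True)
    h_b, _ = np.histogram(values_b, bins=edges, density=True)
    diff = h_a - h_b                       # the right-hand plot on the slide
    width = edges[1] - edges[0]
    distance = np.abs(diff).sum() * width  # L1 distance between the densities
    return diff, distance
```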

Per-Feature Histograms

2. Covariance Analysis
Goal: identify the features that vary most significantly with the labels.
Method 1: Principal Component Analysis (PCA)
– Determines a linear combination of relevant features that maximizes variance.
– Does not take labels or redundancy into account.
Method 2: Directions of Max Covariance
– Determines the directions in feature space that maximize the covariance between data and labels.
– Modified to take potential feature redundancy into account.
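The contrast between the two methods fits in a few lines of numpy (a sketch, not the authors' implementation): the PCA direction is computed from the data alone, while the max-covariance direction also uses the labels.

```python
import numpy as np

def pca_direction(X):
    # First principal axis: the direction of maximum variance in X,
    # computed from the centered data alone (labels never enter).
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def max_cov_direction(X, y):
    # Direction maximizing covariance between data and labels: for a
    # single label vector this is the normalized cross-covariance.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    c = Xc.T @ yc / (len(y) - 1)
    return c / np.linalg.norm(c)
```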

Greedy Feature Ranking
Rank features with a simple greedy approach using Directions of Max Covariance:
– Rank features by their contribution to the first principal component of the covariance matrix cov[data, labels].
Feature Ranking Algorithm (a runnable sketch follows below):
  Set F = all features.
  While F is not empty:
    CovMat = empirical covariance matrix
    V = principal component vector of CovMat, via SVD
    Select the feature f that contributes most to V
    Modify (deflate) CovMat to eliminate redundancy
    F = F - f
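Below is a minimal numpy sketch of this loop. The slide leaves the deflation step unspecified, so the regression-style deflation here (subtracting the part of each feature's label covariance already explained through the selected feature) is one plausible choice, not necessarily the authors' exact method:

```python
import numpy as np

def rank_features(X, y):
    """Greedy feature ranking via directions of max covariance."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)                  # center the data
    Y = np.asarray(y, dtype=float).reshape(n, -1)
    Yc = Y - Y.mean(axis=0)                  # center the labels
    Sxy = Xc.T @ Yc / (n - 1)                # feature/label cross-covariance (d x L)
    Sxx = Xc.T @ Xc / (n - 1)                # feature covariance (d x d)
    remaining, ranking = list(range(d)), []
    while remaining:
        # V = first principal direction of the (deflated) covariance matrix.
        U, _, _ = np.linalg.svd(Sxy[remaining], full_matrices=False)
        contrib = np.abs(U[:, 0])            # each remaining feature's contribution
        f = remaining.pop(int(np.argmax(contrib)))
        ranking.append(f)
        # Deflate: subtract the part of every feature's label covariance
        # already explained through f, so redundant features rank lower.
        beta = Sxx[:, [f]] / Sxx[f, f]       # assumes no constant features
        Sxy = Sxy - beta @ Sxy[[f], :]
    return ranking
```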

Feature Ranking Results

Application: Worm Detection
We can apply statistical learning on outgoing email to detect and prevent novel worm propagation.
– Success depends on the ability of the features to identify anomalous behavior.
We constructed training/test sets of real email traffic artificially 'infected' with viruses, applied the feature selection techniques, and then tested with different models.

Example Results
Features were added greedily using the selection algorithm. The graphs show that there exists an optimal set of features, beyond which performance decreases.
(Result panels: Support Vector Machines; Naïve Bayes Classifier)
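Curves like these can be reproduced with scikit-learn by scoring growing prefixes of the feature ranking; the specific models and 5-fold cross-validation below are assumptions, since the slides name only the classifier families:

```python
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def score_vs_num_features(X, y, ranking, make_model):
    # Evaluate a fresh model on growing prefixes of the greedy feature
    # ranking; plotting these scores gives curves shaped like the
    # slide's graphs, peaking at some optimal feature count.
    scores = []
    for k in range(1, len(ranking) + 1):
        model = make_model()
        scores.append(cross_val_score(model, X[:, ranking[:k]], y, cv=5).mean())
    return scores

# e.g.:
# svm_scores = score_vs_num_features(X, y, ranking, lambda: SVC(kernel="rbf"))
# nb_scores  = score_vs_num_features(X, y, ranking, GaussianNB)
```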

Conclusions and Future Work
Conclusion: analysis of email behavior could have many applications.
– Feature selection is extremely important to model performance.
Future work:
– Study the effects of feature selection on classification accuracy for other statistical models.
– Try similar analysis on existing anti-spam solutions.
– Cluster user behavior into sets of common models describing general behavior patterns.