
1 CISC 879 - Machine Learning for Solving Systems Problems. Presented by: Akanksha Kaul, Dept of Computer & Information Sciences, University of Delaware. SBMDS: An Interpretable String Based Malware Detection System Using SVM Ensemble with Bagging. Authors: Yanfang Ye, Lifei Chen, Dingding Wang, Tao Li, Qingshan Jiang, Min Zhao.

2 Motivation: There is an urgent need to detect malicious executables. Major threats: metamorphic executables (code that reprograms itself, and that can be capable of infecting two operating systems), polymorphic executables (code that disguises itself as non-malicious code), and previously unseen executables.

3 Need of the hour: SBMDS (String Based Malware Detection System). What is this system about? It performs interpretable string analysis. An interpretable string is a line of code in a program that contains both API execution calls and important semantic strings, which reflect the intent and goal of the program's author.

4 What is an interpretable string? Example from the worm “Nimda”: “html script language = ‘javascript’ window.open(‘readme.eml’)”. Another example: “&gameid= %s&pass=%s; myparentthreadid=%d; myguid=%s”. Not all strings are interpretable, for example “!0&0h0m0o0t0y0” and “*3d%3dtgyhjij”.

5 Major steps: (1) Construct the interpretable strings by developing a feature parser. (2) Perform feature selection to select informative strings. (3) Use an SVM ensemble with bagging to construct the classifier. (4) Run the malware detector, which also predicts the exact type of the malware.

6 Step 1: Develop the feature parser. 39,838 executables were collected from the Kingsoft Anti-virus lab. All executables are PE files. The parser extracts static features: API calls from the import table, and strings carrying semantic interpretation.
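
The string-extraction half of Step 1 can be sketched as follows. This is a minimal illustration and not the authors' parser: it assumes that candidate strings are runs of printable ASCII bytes, the name `extract_strings` and the `min_len` threshold are hypothetical, and it does not read the PE import table.

```python
import re
from typing import List

def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return every run of printable ASCII bytes that is at least min_len long.

    Assumption: "string" here means a printable-ASCII run, similar to what
    the Unix `strings` utility reports. The slides do not specify this.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Toy byte buffer standing in for the contents of an executable.
sample = b"\x00\x01window.open('readme.eml')\x00\xff\x02ab\x00&gameid=%s&pass=%s\x90"
print(extract_strings(sample))
```

Runs shorter than `min_len` (such as `ab` in the toy buffer) are dropped, which removes much of the noise before feature selection.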

7 Sample (Backdoor-Redgirl.exe): a string ending in “‘%s’ goto delete” suggests that the malware may generate a “.bat” file to delete itself.

8 Step 2: Feature selection. Selects only the interpretable strings from the huge set of strings obtained in the previous step, and assigns these strings as the signatures of the PE files.
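
One way to picture a string signature is as a binary presence vector over the selected strings. The sketch below assumes that encoding; the slides do not say how signatures are represented, and the function name `signature_vector` and the example strings are illustrative only.

```python
from typing import Iterable, List, Sequence

def signature_vector(file_strings: Iterable[str], selected: Sequence[str]) -> List[int]:
    """Encode a file as 1/0 flags: is each selected string present in the file?

    Assumption: a binary presence encoding. The selection criterion itself
    is not shown here; `selected` is taken as given.
    """
    present = set(file_strings)
    return [1 if s in present else 0 for s in selected]

# Hypothetical selected strings and the strings found in one file.
selected = ["window.open('readme.eml')", "goto delete", "CreateFileA"]
file_strings = ["CreateFileA", "!0&0h0m0o0t0y0", "goto delete"]
print(signature_vector(file_strings, selected))  # [0, 1, 1]
```

Strings that were not selected, such as the non-interpretable one in this example, simply do not appear in the vector.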

9 Step 3: Using SVM to classify. Why SVM? SVMs have shown state-of-the-art results in classification problems. Problem: the training complexity of an SVM depends on the size of the data set.

10 Problem: training accuracy levels off once the size of the data set reaches 3000.

11 Curse of dimensionality? A problem caused by the exponential increase in volume as dimensions are added to the feature space. How does SVM deal with the curse of dimensionality? Solution: use an SVM ensemble with bagging. What are SVM ensembles and bagging?

12 3.1 SVM ensemble with bagging. An ensemble is a set of classifiers whose individual decisions are combined in some way to classify new samples. The bagging technique (“BAGGING” = Bootstrap AGGregating) is applied to the training set: the training data set is sampled uniformly.
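
The sampling step of bagging can be sketched as below. This is a minimal illustration under stated assumptions: sampling is uniform with replacement (the usual bootstrap), the names `bootstrap_samples`, `n_models`, and `sample_size` are hypothetical, and training one SVM per sample is left out.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

def bootstrap_samples(
    data: Sequence[T],
    n_models: int,
    sample_size: Optional[int] = None,
    seed: int = 0,
) -> List[List[T]]:
    """Draw n_models bootstrap samples, uniformly and with replacement.

    Assumption: each sample defaults to the size of the original set.
    A smaller sample_size can be passed to keep each training run cheap.
    """
    rng = random.Random(seed)
    size = len(data) if sample_size is None else sample_size
    return [rng.choices(data, k=size) for _ in range(n_models)]

training_set = list(range(10))  # stand-in for labelled feature vectors
for sample in bootstrap_samples(training_set, n_models=3):
    print(sample)
```

Each returned sample would be used to train one member of the ensemble, so every member sees a slightly different view of the training data.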

13 3.2 Multi-classification. There are various classes of malware. The decisions of the ensemble members are combined by “MAJORITY VOTING”; when two classes receive identical numbers of votes, the class with the smallest index is chosen. Class indices: 0 = benign files, 1 = backdoors, 2 = spyware, 3 = trojans, 4 = worms.
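
The voting rule can be sketched in a few lines. This is an illustration of majority voting with a smallest-index tie-break, using the class indices from the slide; the function name `majority_vote` is hypothetical, and the votes are assumed to be one predicted class index per ensemble member.

```python
from collections import Counter
from typing import Sequence

# Class indices as listed on the slide.
LABELS = {0: "Benign", 1: "Backdoor", 2: "Spyware", 3: "Trojan", 4: "Worm"}

def majority_vote(votes: Sequence[int]) -> int:
    """Return the most frequent class index; ties go to the smallest index."""
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)

# Classes 1 and 3 both receive two votes, so the smaller index wins.
print(LABELS[majority_vote([3, 1, 3, 4, 1])])  # Backdoor
```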

14 Step 4: Malware detection. The detector is applied to unknown variants of malware. It decides whether a file is malicious or not, and which class the malware belongs to.

15 System architecture: 1. Feature Parser 2. Feature Selection 3. SVM Ensemble Classifier 4. Malware Detector

16 Why I chose this paper: it compares the system with popular anti-virus software. Points of comparison: 1. Detecting known variants of malware. 2. Detecting unknown variants. 3. Efficiency (detection time). 4. Number of false-positive detections.

17 Detecting Known Variants

18 Detecting Unknown Variants

19 Efficiency (Detection Time)

20 Number of False Positives

21 Conclusion: This system has already been incorporated into the scanning tool of a commercial anti-virus product. The name of the anti-virus product is not disclosed.

22 Questions?

23 All's Well That Ends Well. Thank you.

