
1 Ensemble Learning for Low-level Hardware-supported Malware Detection
Khaled N. Khasawneh*, Meltem Ozsoy***, Caleb Donovick**, Nael Abu-Ghazaleh*, and Dmitry Ponomarev** * University of California, Riverside, ** Binghamton University, *** Intel Corp. RAID 2015 – Kyoto, Japan, November 2015

2 Malware Growth
McAfee Labs: over 350M malware programs in their malware zoo
387 new threats every minute

3 Malware Detection Analysis
Static analysis
Searches for signatures in the executable
Can detect all known malware programs with no false alarms
Can't detect metamorphic malware, polymorphic malware, or targeted attacks

4 Malware Detection Analysis
Static analysis
Searches for signatures in the executable
Can detect all known malware programs with no false alarms
Can't detect metamorphic malware, polymorphic malware, or targeted attacks
Dynamic analysis
Monitors the behavior of the program
Can detect metamorphic malware, polymorphic malware, and targeted attacks
Adds substantial overhead to the system and has false positives

5 Two-level malware detection framework

6 Two-Level Malware Detection
MAP was introduced by Ozsoy et al. (HPCA 2015)
Explored a number of sub-semantic feature vectors
Single hardware-supported detector
Detects malware online (in real time)
Two-stage detection

7 Contributions of this work
Better hardware malware detection using an ensemble of detectors specialized for each type of malware
Metrics to measure the resulting advantages of using the two-level malware detection framework

8 Evaluation Methodology: workloads, features, performance measures

9 Data Set & Data Collection
Source of programs
Malware: MalwareDB, 3,690 total malware programs
Regular: Windows system binaries and other applications such as WinRAR, Notepad++, Acrobat Reader

Type      Total  Training  Testing  Cross-Validation
Backdoor   815     489       163         163
Rogue      685     411       137         137
PWS        558     335       111         111
Trojan    1123     673       225         225
Worm       473     283        95          95
Regular    554     332       111         111

Dynamic traces
Windows 7 virtual machine; firewall and security services were all disabled
A Pin tool was used to collect the features during execution

10 Feature Space
Instruction mix
INS1: frequency of instruction categories
INS2: frequency of most variant opcodes
INS3: presence of instruction categories
INS4: presence of most variant opcodes
Memory reference patterns
MEM1: histogram (count) of memory address distances
MEM2: binary (presence) of memory address distances
Architectural events
ARCH: total number of memory reads, memory writes, unaligned memory accesses, immediate branches, and taken branches
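As an illustration of how such sub-semantic features could be assembled, here is a minimal sketch (in plain Python rather than the hardware/Pin pipeline the deck describes) that builds an INS1-style category-frequency vector and a MEM1-style memory-distance histogram from a toy trace. The trace format, category list, and binning scheme are assumptions for the example, not the paper's actual definitions.

```python
from collections import Counter

# Hypothetical instruction categories (not the paper's actual list).
CATEGORIES = ["arith", "logic", "branch", "load", "store"]

def ins1_features(trace):
    """INS1-style vector: frequency of each instruction category."""
    counts = Counter(insn["category"] for insn in trace)
    total = max(len(trace), 1)
    return [counts[c] / total for c in CATEGORIES]

def mem1_features(trace, n_bins=8):
    """MEM1-style vector: histogram of distances between consecutive
    memory addresses, bucketed by magnitude (log2-style binning)."""
    addrs = [insn["addr"] for insn in trace if insn.get("addr") is not None]
    hist = [0] * n_bins
    for prev, cur in zip(addrs, addrs[1:]):
        dist = abs(cur - prev)
        b = min(dist.bit_length(), n_bins - 1)  # clamp large deltas to last bin
        hist[b] += 1
    return hist

# Toy trace: a few instructions, some with a memory operand address.
trace = [
    {"category": "load", "addr": 0x1000},
    {"category": "arith", "addr": None},
    {"category": "store", "addr": 0x1008},
    {"category": "branch", "addr": None},
]
print(ins1_features(trace))  # category frequencies in CATEGORIES order
print(mem1_features(trace))  # one address delta of 8 falls in bin 4
```

The binary "presence" variants (INS3, INS4, MEM2) would simply threshold these counts to 0/1, which is cheaper to compute in hardware.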

11 Detection Performance Measures
Sensitivity: percent of malware that was detected (true positive rate)
Specificity: percent of correctly classified regular programs (true negative rate)
Receiver Operating Characteristic (ROC) curve: summarizes the prediction performance over a range of detection thresholds
Area Under the Curve (AUC): traditional performance metric for the ROC curve
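These measures can be made concrete with a small sketch; the detector scores and the 0.5 threshold below are invented for illustration. The AUC is computed with the rank (Mann-Whitney) formulation, which is equivalent to integrating the ROC curve.

```python
# Labels: 1 = malware, 0 = regular program.
def sensitivity(labels, preds):
    """True positive rate: fraction of malware that was detected."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    return tp / sum(labels)

def specificity(labels, preds):
    """True negative rate: fraction of regular programs classified as regular."""
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return tn / (len(labels) - sum(labels))

def auc(labels, scores):
    """Area under the ROC curve: the probability that a random malware
    sample scores higher than a random regular one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]          # hypothetical detector outputs
preds = [int(s >= 0.5) for s in scores]     # one point on the ROC curve
print(sensitivity(labels, preds))           # 2 of 3 malware caught
print(specificity(labels, preds))           # both regular programs correct
print(auc(labels, scores))                  # threshold-independent summary
```

Sweeping the threshold instead of fixing it at 0.5 traces out the full ROC curve that the AUC summarizes.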

12 Specializing the Detectors for different malware types

13 Constructing Specialized Detectors
Specialized detectors for each malware type were trained only with the data of that type
Supervised learning with logistic regression was used
MEM1 detectors
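A minimal sketch of this per-type training setup, using a from-scratch logistic regression on synthetic two-dimensional data; the real detectors train on the Pin-collected feature vectors (e.g. MEM1), and the cluster positions, learning rate, and epoch count here are arbitrary illustration choices.

```python
import math
import random

def train_logreg(X, y, lr=0.5, epochs=200):
    """Plain SGD logistic regression; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                      # gradient of the log loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(model, x):
    w, b = model
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
regular = [[random.gauss(0.0, 0.3), random.gauss(0.0, 0.3)] for _ in range(50)]
by_type = {  # synthetic per-type malware clusters
    "worm":   [[random.gauss(2.0, 0.3), random.gauss(0.0, 0.3)] for _ in range(50)],
    "trojan": [[random.gauss(0.0, 0.3), random.gauss(2.0, 0.3)] for _ in range(50)],
}

# One detector per malware type, trained only on that type plus regular programs.
detectors = {}
for mtype, samples in by_type.items():
    X = samples + regular
    y = [1] * len(samples) + [0] * len(regular)
    detectors[mtype] = train_logreg(X, y)

# A worm-like sample should fire the worm detector but not the trojan one.
print({t: round(predict(m, [2.0, 0.0]), 2) for t, m in detectors.items()})
```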

14 General vs. Specialized Detectors
Per-type results (columns: Backdoor, PWS, Rogue, Trojan, Worm):
INS1: General 0.713, 0.909, 0.949, 0.715, 0.705; Specialized 0.892, 0.962, 0.727, 0.819
INS2: General 0.905, 0.946, 0.993, 0.768, 0.810; Specialized 0.895, 0.954, 0.976, 0.782, 0.984
INS3: General 0.837, 0.924, 0.527, 0.761, 0.840; Specialized 0.888, 0.991, 0.808, 0.852
INS4: General 0.866, 0.868, 0.914, 0.788, 0.830; Specialized 0.891, 0.941, 0.798, 0.869
MEM1: General 0.729, 0.893, 0.424, 0.650; Specialized 0.961, 0.921, 0.867, 0.871
MEM2: General 0.833, 0.947, 0.903, 0.843; Specialized 0.979, 0.931
ARCH: General 0.702, 0.919, 0.965, 0.763, 0.602; Specialized 0.686, 0.942, 0.970, 0.795, 0.560

18 Is There an Opportunity?
Best general detector (INS4) vs. best specialized detector per type:

Type      General  Specialized  Difference
Backdoor  0.8662   0.8956       0.0294
PWS       0.8684   0.9795       0.1111
Rogue     0.9149   0.9937       0.0788
Trojan    0.7887   0.8676       0.0789
Worm      0.8305   0.9842       0.1537
Average   0.8537   0.9441       0.0904

19 Ensemble Detectors

20 Ensemble Learning
Multiple diverse base detectors
Different learning algorithms
Different data sets
Combined to solve a problem

21 Decision Functions
Or'ing
High-confidence Or'ing

22 Decision Functions
Majority voting
Stacking
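The four decision functions named on these slides (Or'ing, high-confidence Or'ing, majority voting, and stacking) can be sketched as follows, applied to hypothetical base-detector scores; the thresholds and the fixed stacking weights are illustrative stand-ins for trained values.

```python
def oring(scores, thresh=0.5):
    """Flag malware if ANY base detector fires (high sensitivity,
    low specificity)."""
    return any(s >= thresh for s in scores)

def high_confidence_oring(scores, hi=0.9):
    """Flag only if some detector fires with high confidence."""
    return any(s >= hi for s in scores)

def majority_voting(scores, thresh=0.5):
    """Flag if more than half of the base detectors fire."""
    votes = sum(s >= thresh for s in scores)
    return votes > len(scores) / 2

def stacking(scores, meta_w, meta_b=0.0):
    """A meta-classifier combines the base detectors' raw scores;
    a fixed linear model stands in for a trained one here."""
    z = sum(w * s for w, s in zip(meta_w, scores)) + meta_b
    return z >= 0.0

scores = [0.95, 0.2, 0.4]                 # outputs of three base detectors
print(oring(scores))                       # True: one detector fired
print(high_confidence_oring(scores))       # True: 0.95 >= 0.9
print(majority_voting(scores))             # False: only 1 of 3 fired
print(stacking(scores, [1.0, 1.0, 1.0], -1.5))  # True: 1.55 - 1.5 >= 0
```

Stacking differs from the other three in that it sees the raw scores rather than binary votes, which is why it needs a training phase of its own.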

23 Ensemble Detectors
General ensemble: combines multiple general detectors (best of INS, MEM, ARCH)
Specialized ensemble: combines the best specialized detector for each malware type
Mixed ensemble: combines the best general detector with the best specialized detectors from the same feature vector

24 Offline Detection Effectiveness
Detector (decision function): Sensitivity / Specificity / Accuracy
Best General: 82.4% / 89.3% / 85.1%
General Ensemble, Or'ing: 99.1% / 13.3% / 65.0%
General Ensemble, High Confidence: 80.7% / 92.0%
General Ensemble, Majority Voting: 83.3% / 92.1% / 86.7%
General Ensemble, Stacking: 96.0% / 86.8%
Specialized Ensemble, Or'ing: 100% / 5% / 51.3%
Specialized Ensemble, High Confidence: 94.4% / 94.7% / 94.5%
Specialized Ensemble: 95.8% / 95.9%
Mixed Ensemble: 84.2% / 70.6% / 78.8% / 81.3% / 82.5%

27 Online Detection Effectiveness
A decision is made after every 10,000 committed instructions
An Exponentially Weighted Moving Average (EWMA) filters false alarms

Detector: Sensitivity / Specificity / Accuracy
Best General: 84.2% / 86.6% / 85.1%
General Ensemble (Stacking): 77.1% / 94.6% / 84.1%
Specialized Ensemble (Stacking): 92.9% / 92.0% / 92.3%
Mixed Ensemble (Stacking): 85.5% / 90.1% / 87.4%
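A minimal sketch of this filtering step, assuming illustrative values for the EWMA smoothing factor and alarm threshold (the deck does not state them): one raw 0/1 decision arrives per 10,000-instruction window, and the filter suppresses isolated positives while a sustained run of positives raises an alarm.

```python
def ewma_filter(raw_decisions, alpha=0.3, alarm=0.6):
    """Smooth a stream of per-window 0/1 decisions with an EWMA and
    report an alarm whenever the smoothed value crosses the threshold.
    alpha and alarm are illustrative, not the paper's tuned values."""
    smoothed, state = [], 0.0
    for d in raw_decisions:                # d: 0 = benign, 1 = malware
        state = alpha * d + (1 - alpha) * state
        smoothed.append(state >= alarm)
    return smoothed

# A lone positive window is filtered out; a sustained run raises an alarm.
raw = [0, 1, 0, 0, 1, 1, 1, 1, 1]
print(ewma_filter(raw))
```

A larger alpha makes the filter react faster but lets more isolated false positives through; a smaller alpha trades detection latency for specificity.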

28 Metrics to Assess Relative Performance of the Two-Level Detection Framework

29 Metrics
Work advantage
Time advantage
Detection performance

31 Time & Work Advantage Results
Time advantage and work advantage (figures)

32 Hardware Implementation
Physical design overhead
Area: % (Ensemble), 0.3% (General)
Power: 1.5% (Ensemble), 0.1% (General)
Cycle time: 9.8% (Ensemble), 1.9% (General)

33 Conclusions & Future Work
Ensemble learning with specialized detectors can significantly improve detection performance
Hardware complexity increases, but several optimizations are still possible
Some features are complex to collect; simpler features may carry the same information
Future work:
Demonstrate a fully functional system
Study how attackers could evolve, including adversarial machine learning

34 Thank you! Questions?

