Pallabi Parveen, Nate McDaniel, Varun S. Hariharan, Bhavani Thuraisingham and Latifur Khan Department of Computer Science at The University of Texas at.

1 Pallabi Parveen, Nate McDaniel, Varun S. Hariharan, Bhavani Thuraisingham and Latifur Khan Department of Computer Science at The University of Texas at Dallas

2  Insider Threat
 LZW & Quantized Dictionary
 Concept Drift
 Experiments & Results

3 An insider is someone who exploits, or intends to exploit, legitimate access to assets for unauthorised purposes. For example, over time a legitimate user may enter commands that read or write private data, or install malicious software.

4 CComputer Crime and Security Survey 2001 $$377 million financial losses due to attacks 449% reported incidents of unauthorized network access by insiders WWikiLeaks Breach Highlights Insider Security Threat--Even the toughest security systems sometimes have a soft center that can be exploited by someone who has passed rigorous screening http://www.scientificamerican.com/article.cfm?id =wikileaks-insider-threat

5  Reduce the false alarm rate without sacrificing the threat detection rate
 Threat detection is challenging because insiders mask and adapt their behavior to resemble legitimate system use

6  Normal users issue repetitive sequences of commands, system calls, etc.
 A sudden deviation from normal behavior raises an alarm indicating an insider threat
 To find an insider threat, we need to collect these repeated sequences of commands in an unsupervised fashion
 First challenge: variability in sequence length. Overcome by generating a dictionary of candidate patterns from the gathered data with the Lempel-Ziv-Welch (LZW) algorithm
 Second challenge: the huge size of the dictionary. Overcome by compressing the dictionary

7  Using an ensemble of models increases the accuracy of anomaly detection
 Each new data chunk creates a new model
 Problem: the ensemble holds K models, and a new chunk yields a (K+1)-th
 Solution: remove the least accurate model
 Majority voting across all models determines which model is performing worst
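The pruning rule on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Model` interface (a callable returning `True` for "anomalous"), the function names, and the use of disagreement with the majority vote as the accuracy proxy are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: a callable that returns True for "anomalous".
Model = Callable[[str], bool]


def majority_vote(models: Sequence[Model], sample: str) -> bool:
    """Return True when a strict majority of models flags the sample."""
    votes = sum(1 for model in models if model(sample))
    return votes * 2 > len(models)


def prune_ensemble(models: Sequence[Model], chunk: Sequence[str], k: int) -> List[Model]:
    """Keep at most k models, dropping whichever disagrees most with the majority."""
    kept = list(models)
    while len(kept) > k:
        majority = [majority_vote(kept, sample) for sample in chunk]
        # With no labels available, disagreement with the vote stands in for error.
        disagreements = [
            sum(model(sample) != verdict for sample, verdict in zip(chunk, majority))
            for model in kept
        ]
        del kept[disagreements.index(max(disagreements))]
    return kept
```

Calling `prune_ensemble(models, chunk, k)` after each new chunk keeps the ensemble at size K; ties in the vote are treated here as "not anomalous", which is one possible choice among several.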

8 [Architecture diagram: the system log is split into chunks (Chunk i, Chunk i+1) of system calls/commands. Gather data from Chunk i; index the system calls with Unicode; unsupervised sequence learning: generate an LZW dictionary (D) containing all possible patterns using the Lempel-Ziv-Welch algorithm, then compress it into the quantized dictionary (QD); incremental stream mining: update the previous QD and update the models; online learning: test on data from week i+1 and decide whether each system call/command j is an anomaly.]
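The Unicode indexing step can be illustrated with a short sketch. The function name, the mapping table, and the starting code point `0x4E00` are assumptions chosen for the example; the slides do not say which code points were used.

```python
from typing import Dict, Iterable


def index_commands(commands: Iterable[str], table: Dict[str, str], base: int = 0x4E00) -> str:
    """Encode each command as one Unicode character, extending `table` as needed."""
    symbols = []
    for command in commands:
        if command not in table:
            # Assumed scheme: the n-th distinct command gets code point base + n.
            table[command] = chr(base + len(table))
        symbols.append(table[command])
    return "".join(symbols)
```

Reusing the same `table` across chunks keeps the encoding stable, so a command sequence becomes an ordinary string that sequence-learning code can scan one symbol at a time.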

9 [Figure: the unlabeled data stream liftliftlift… is passed through LZW, giving the LZW dictionary {li, lif, lift, if, ift, iftl, ft, ftl, ftli, tl, tli, tlif}; lossy compression then reduces it to the quantized dictionary {lift}.]
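The dictionary-building pass of LZW on this example can be sketched as below. The function name is an assumption, and the sketch only collects the patterns LZW adds to its dictionary; it does not emit the compressed output codes.

```python
from typing import List


def lzw_dictionary(stream: str) -> List[str]:
    """Return the multi-symbol patterns LZW adds while scanning `stream`."""
    known = set(stream)  # the dictionary starts with every single symbol
    patterns = []
    current = ""
    for symbol in stream:
        candidate = current + symbol
        if candidate in known:
            # Keep extending the longest match seen so far.
            current = candidate
        else:
            # First time this pattern appears: record it and restart the match.
            known.add(candidate)
            patterns.append(candidate)
            current = symbol
    return patterns
```

Run on seven repetitions of `lift`, this sketch produces the same twelve patterns listed on the slide, which also shows why the dictionary grows quickly on repetitive input and needs compressing.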

11  Incremental learning is used for continuous dictionary updates
 The continuous data stream is partitioned into a sequence of discrete chunks (each may contain several user sessions)
 When a new chunk arrives, generate a new LZW dictionary from the new chunk while merging it with the previous quantized dictionary
 Apply our compression technique (CM) to this new LZW dictionary to generate a new quantized dictionary (NQD); this is the dynamic approach
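The chunk-by-chunk update can be sketched roughly as follows. The slides do not describe the compression technique (CM), so `keep_heaviest` is a stand-in that simply keeps the most frequent patterns; the weighted-dictionary representation and all function names are likewise assumptions made for the example.

```python
from collections import Counter
from typing import Callable, Dict

Weights = Dict[str, int]  # assumed representation: pattern -> observed weight


def lzw_counts(chunk: str) -> Counter:
    """Count how often each LZW pattern is created or re-matched in `chunk`."""
    counts: Counter = Counter()
    current = ""
    for symbol in chunk:
        candidate = current + symbol
        if len(candidate) > 1:
            known = candidate in counts
            counts[candidate] += 1
            current = candidate if known else symbol
        else:
            current = candidate
    return counts


def keep_heaviest(dictionary: Weights, size: int = 3) -> Weights:
    """Stand-in for CM: keep the `size` heaviest patterns (longer ones win ties)."""
    ranked = sorted(dictionary.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))
    return dict(ranked[:size])


def update_quantized(old_qd: Weights, chunk: str,
                     compress: Callable[[Weights], Weights] = keep_heaviest) -> Weights:
    """Merge the old quantized dictionary with the new chunk's LZW dictionary, then compress."""
    merged = Counter(old_qd)
    merged.update(lzw_counts(chunk))  # adds the new chunk's weights to the old ones
    return compress(dict(merged))
```

Passing `compress` as a parameter keeps the merge step separate from whichever quantization rule is actually used.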

12 [Diagram: Session 1, Session 2, …, Session n of each chunk pass through LZW to build an LZW dictionary; the LZW dictionary for the new chunk is merged with the old quantized dictionary (OQD) and compressed into the new quantized dictionary (NQD).]

13  Given a test data stream S and a quantized dictionary QD = {qd1, qd2, …}, an anomaly is a phrase/pattern in the stream that is more than α edit distance from every pattern in QD
 Steps for identifying non-matching phrases:
 Compute the edit distance matrix L between each phrase in the dictionary and the data stream S
 If a part of the stream is within edit distance α of a phrase, delete the matching part from S
 The patterns remaining in S are considered anomalies
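The matching steps above can be sketched as follows. This is a simplified illustration under stated assumptions: each dictionary pattern is compared only against stream windows of its own length, and matched positions are marked as covered rather than physically deleted. The function names are hypothetical.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                        # deletion
                               current[j - 1] + 1,                     # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def find_anomalies(stream: str, qd: Iterable[str], alpha: int) -> List[str]:
    """Return the runs of the stream not within `alpha` edits of any pattern in `qd`."""
    covered = [False] * len(stream)
    for pattern in qd:
        width = len(pattern)
        for start in range(len(stream) - width + 1):
            if edit_distance(stream[start:start + width], pattern) <= alpha:
                for position in range(start, start + width):
                    covered[position] = True
    anomalies, run = [], ""
    for symbol, matched in zip(stream, covered):
        if matched:
            if run:
                anomalies.append(run)
            run = ""
        else:
            run += symbol
    if run:
        anomalies.append(run)
    return anomalies
```

With α = 0 only exact occurrences of a pattern are removed; raising α tolerates small variations in the command sequence before anything is reported.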

14  User command patterns shift over time
 For example, a programmer slowly evolves into an advanced programmer
 Changes in users' habits should not be identified as anomalies
 Attribute natural changes to concept drift
 Concept drift can be added artificially, and anomalies are still detected

16  drift = [0.7071, 1.1180, 1.5811, 1.5811, 1.5811]
 Min/max distributions = [0.42929/0.57071, 0.08820/0.31180, 0/0.25811, 0/0.25811, 0/0.25811]

17  Modified Naïve Bayes that uses an incremental approach (NB-INC)*
 Unsupervised ensemble approach (USSL-GG) that incrementally tests for anomalies and performs best with an ensemble size of 3
(*) R. A. Maxion, “Masquerade detection using enriched command lines,” in Proc. IEEE International Conference on Dependable Systems & Networks (DSN), 2003, pp. 5–14.

19
           TPR              FPR              Accuracy         (unlabeled)      (unlabeled)      Time (sec)
Drift      NB-INC  USSL-GG  NB-INC  USSL-GG  NB-INC  USSL-GG  NB-INC  USSL-GG  NB-INC  USSL-GG  NB-INC  USSL-GG
0.000001   0.34    0.49     0.12    0.10     0.80    0.85     0.34    0.44     0.34    0.47     52.0    3.60
0.00001    0.36    0.58     0.12    0.09     0.79    0.87     0.36    0.50     0.36    0.54     50.8    3.54
0.0001     0.37    0.51     0.11    0.10     0.82    0.86     0.37    0.45     0.37    0.49     51.0    3.55
0.001      0.38    0.50     0.11    0.10     0.81    0.85     0.38    0.44     0.38    0.47     53.4    3.60

27  Ensemble-based stream mining effectively detects insider threats while coping with evolving concept drift
 Our approach adopts advantages from stream mining, compression, and ensembles:
 Compression gives unsupervised learning
 Stream mining offers adaptive learning
 Ensembles increase accuracy under concept drift

28
Approach    Un/Supervised  Drift  Insider Threat  Sequence
Ju          S              N      Y               Y
Maxion      S              N      Y               N
Liu         U              N      Y               Y
Wang        S              N      Y               N
Szymanski   S              N      Y               Y
Masud       S              Y      N               N
Parveen     U              Y      Y               N
USSL-GG     U              Y      Y               Y

29  Update existing models based on user feedback
 Update and refine models on ground truth when it is available

