
1 Effective Anomaly Detection with Scarce Training Data Presenter: 葉倚任 Authors: W. Robertson, F. Maggi, C. Kruegel, and G. Vigna NDSS 2010

2 Outline Introduction Training Data Scarcity Exploiting Global Knowledge Evaluation

3 Properties of Anomaly Detection Pros – Unknown attacks can be identified automatically, without any a priori knowledge about the application – No need to manually analyze applications composed of hundreds of components Cons – Tendency to produce a non-negligible amount of false positives – Critical reliance on the quality and quantity of the training data used to construct the models

4 Motivation Web application component invocations are non-uniformly distributed For rarely invoked components, it is often impossible to gather enough training data to accurately model their normal behavior No existing proposals satisfactorily address this problem

5 Contributions Provide evidence that web application traffic is distributed in a non-uniform fashion Propose an approach that addresses the problem of undertraining by exploiting global knowledge Evaluate the proposed approach on a large data set of real-world traffic from many web applications

6 Outline Introduction Training Data Scarcity Exploiting Global Knowledge Evaluation

7 Summary of Notation Notation – A: a set of web applications – R: a set of resource paths or components – P: parameters – Q: requests Each request is represented as a tuple pairing an application, a resource path, and its parameters
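As a minimal sketch of this notation in code (names are illustrative, not from the paper):

```python
from dataclasses import dataclass

# Hypothetical encoding of the slide's notation: a request ties together
# a web application (from A), a resource path (from R), and parameters (from P).
@dataclass(frozen=True)
class Request:
    application: str        # an element of A, e.g. "blog"
    resource_path: str      # an element of R, e.g. "/post/comment"
    parameters: tuple       # (name, value) pairs drawn from P

q = Request("blog", "/post/comment", (("id", "42"), ("text", "hi")))
```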

8 Summary of Notation (cont’d) The set of models associated with each unique parameter instance can be represented as a tuple, called a profile The knowledge base of an anomaly detection system trained on a web application is the set of its profiles

9 Multi-model Approach A profile for a given parameter is a tuple of models – one describes normal intervals for integers and string lengths – one models character strings as a ranked frequency histogram, or Idealized Character Distribution (ICD) – one models sets of character strings by inducing a Hidden Markov Model (HMM) – one models parameter values as a set of legal tokens
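A toy illustration of the multi-model idea, using simplified stand-ins for two of the models (the ICD and HMM models are omitted; class and method names are hypothetical):

```python
class LengthModel:
    # Simplified stand-in for the interval model: learns the [min, max]
    # range of string lengths seen during training.
    def __init__(self):
        self.lo, self.hi = None, None
    def train(self, value):
        n = len(value)
        self.lo = n if self.lo is None else min(self.lo, n)
        self.hi = n if self.hi is None else max(self.hi, n)
    def is_normal(self, value):
        return self.lo <= len(value) <= self.hi

class TokenModel:
    # Simplified stand-in for the token model: the set of legal values.
    def __init__(self):
        self.tokens = set()
    def train(self, value):
        self.tokens.add(value)
    def is_normal(self, value):
        return value in self.tokens

class Profile:
    # A profile aggregates several models for one parameter; a value is
    # flagged as anomalous if any model rejects it.
    def __init__(self):
        self.models = [LengthModel(), TokenModel()]
    def train(self, value):
        for m in self.models:
            m.train(value)
    def is_anomalous(self, value):
        return not all(m.is_normal(value) for m in self.models)
```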

10 The Problem Non-uniform training data In the case of low-traffic applications – the rate of client requests is inadequate to allow models to train in a timely manner In the case of high-traffic applications – a large subset of resource paths might fail to receive enough requests

11 Non-uniform training data

12 Outline Introduction Training Data Scarcity Exploiting Global Knowledge Evaluation

13 Exploiting Global Knowledge Parameters of the same type tend to induce model compositions that are similar to each other The goal is to substitute well-trained profiles for undertrained profiles of similar parameters of the same type The proposed method is composed of three phases – Enhanced training – Building profile knowledge bases – Mapping undertrained profiles to well-trained profiles

14 (figure)

15 Phase I: Enhanced training Generate undertrained profiles – Consider the sequence of client requests containing parameter p for application a_i – Randomly sample κ-length subsequences, where κ ranges over a set of small sample sizes – Each of the resulting profiles is then added to an undertrained knowledge base Each model monitors its stability during the training phase A well-trained, or stable, profile is stored in a well-trained knowledge base
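The sampling step above can be sketched as follows (function and parameter names are hypothetical; `make_profile` builds a fresh profile object exposing `train(value)`):

```python
import random

def undertrained_profiles(requests, kappas, samples_per_kappa, make_profile):
    # Phase I sketch: for each kappa, draw random kappa-sized samples of
    # the request sequence and train a (deliberately undertrained)
    # profile on each sample.
    knowledge_base = []
    for kappa in kappas:
        for _ in range(samples_per_kappa):
            sample = random.sample(requests, min(kappa, len(requests)))
            profile = make_profile()
            for value in sample:
                profile.train(value)
            knowledge_base.append((kappa, profile))
    return knowledge_base
```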

16 Phase II: Building profile knowledge bases Merge the individual knowledge bases into a global undertrained profile database Profile clustering is then performed in order to time-optimize query execution The profile clusters are computed with an agglomerative hierarchical clustering algorithm using group-average linkage
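A self-contained sketch of agglomerative clustering with group-average linkage, the linkage named on the slide (plain Python; `items` and `dist` are placeholders for profiles and the profile distance function):

```python
def group_average_clustering(items, dist, k):
    # Start with one singleton cluster of indices per item.
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Group-average linkage: mean pairwise distance
                # between the two clusters' members.
                d = sum(dist(items[i], items[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]
    return clusters
```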

17 Distance Measure More formally, the distance between two profiles c_i and c_j combines the distances between their corresponding models, where each model type supplies its own distance function

18 Distance Functions
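As a simplified illustration of per-model distance functions and their combination into a profile distance (the concrete functions below are toy stand-ins, not the paper's definitions):

```python
def interval_distance(m1, m2):
    # Toy distance between two interval models (lo, hi):
    # normalized difference of their bounds.
    lo1, hi1 = m1
    lo2, hi2 = m2
    span = max(hi1, hi2) - min(lo1, lo2) or 1
    return (abs(lo1 - lo2) + abs(hi1 - hi2)) / (2 * span)

def token_distance(m1, m2):
    # Toy distance between two token-set models: Jaccard distance.
    union = m1 | m2
    return 1 - len(m1 & m2) / len(union) if union else 0.0

def profile_distance(p1, p2):
    # Composite distance: average of the per-model distances.
    d_int = interval_distance(p1["interval"], p2["interval"])
    d_tok = token_distance(p1["tokens"], p2["tokens"])
    return (d_int + d_tok) / 2
```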

19 Phase III: Mapping undertrained profiles to well-trained profiles The mapping is implemented as follows – A nearest-neighbor match is performed between the undertrained profile and the clusters of the undertrained knowledge base – A nearest-neighbor match is then performed between the undertrained profile and the members of the closest cluster, to discover the stored undertrained profile at minimum distance – The corresponding well-trained profile is substituted for the undertrained one
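The two-step nearest-neighbor mapping can be sketched as follows (all names are hypothetical; `well_trained_for` associates each stored undertrained profile with its well-trained counterpart):

```python
def nearest(profile, candidates, dist):
    # Return the candidate at minimum distance from `profile`.
    return min(candidates, key=lambda c: dist(profile, c))

def map_to_well_trained(undertrained, clusters, well_trained_for, dist):
    # Step 1: find the closest cluster (by its closest member).
    closest_cluster = min(
        clusters,
        key=lambda cl: min(dist(undertrained, m) for m in cl))
    # Step 2: find the closest stored profile within that cluster.
    match = nearest(undertrained, closest_cluster, dist)
    # Step 3: substitute the associated well-trained profile.
    return well_trained_for[match]
```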

20 Mapping Quality

21 Mapping Quality Let a mapping be defined from an undertrained cluster to the maximum number of elements in that cluster that map to the same well-trained cluster The robustness metric ρ is then defined as the fraction of a cluster’s elements that map consistently, and a mapping is accepted only when ρ exceeds a minimum robustness threshold
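A sketch of this robustness computation under the reading above (the exact normalization in the paper may differ; `map_to_cluster` returns the well-trained cluster each profile maps to):

```python
def robustness(undertrained_cluster, map_to_cluster):
    # rho: fraction of the cluster's elements that map to the same
    # well-trained cluster (majority target over all elements).
    targets = [map_to_cluster(p) for p in undertrained_cluster]
    most_common = max(targets.count(t) for t in set(targets))
    return most_common / len(targets)
```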

22 Outline Introduction Training Data Scarcity Exploiting Global Knowledge Evaluation

23 Experimental Setting HTTP connections were observed over a period of approximately three months A portion of the resulting flows was then filtered using Snort to remove known attacks The data set contains 823 distinct web applications, 36,392 unique components, 16,671 unique parameters, and 58,734,624 HTTP requests

24 Profile clustering quality

25 Profile mapping robustness

26 Detection accuracy (100,000 attacks)

27 Conclusion Identified that non-uniform web client access distributions cause model undertraining Proposed the use of global knowledge bases of well-trained profiles to remediate a local scarcity of training data

