Anomaly Detection Using Symmetric Compression Benjamin Arai & Chris Baron Computer Science and Engineering Department University of California - Riverside.


1 Anomaly Detection Using Symmetric Compression Benjamin Arai & Chris Baron Computer Science and Engineering Department University of California - Riverside barai@cs.ucr.edu cbaron@cs.ucr.edu

2 What are anomalies? Something peculiar, irregular, or abnormal that is difficult to classify with the surrounding data. Anomalies are subject to interpretation. Two anomalies can look completely different from one another.

3 Why is this important? Security: abnormal activity, intrusion detection. Health: atypical rhythmic patterns (e.g., heartbeat, breathing). Equities and financial data. Detection in general.

4 Motivation Searching for a specific pattern is relatively easy for a computer (often in linear time) and has been well researched (e.g., KMP, Boyer-Moore, edit distance). How does a computer detect surprising patterns without being told in advance what they look like? Utilize Kolmogorov complexity with compression!

5 Kolmogorov complexity and information distance K(x): length of the smallest program that prints x. K(x|y): length of the smallest program that prints x given y as input. Information distance: how different are x and y? –Edit distance? –Normalize

6 Normalized information distance (K(x|y) + K(y|x)) / K(xy) –Close to 0: x and y are very similar –Close to 1: x and y are very different. Compression does a good job of estimating Kolmogorov complexity. We use compression to find anomalies.
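The normalized distance on this slide can be approximated with an off-the-shelf compressor. The sketch below is an illustration, not the authors' code. It assumes Python's bz2 module as the compressor and assumes that K(x|y) can be estimated as C(yx) - C(y), where C is the compressed length. The names compressed_size and approx_distance are hypothetical.

```python
import bz2


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes, used as a stand-in for K(data)."""
    return len(bz2.compress(data))


def approx_distance(x: bytes, y: bytes) -> float:
    """Estimate (K(x|y) + K(y|x)) / K(xy) from compressed lengths."""
    c_xy = compressed_size(x + y)
    c_yx = compressed_size(y + x)
    # Assumption: K(x|y) ~ C(yx) - C(y), the extra cost of x once y is known.
    # Clamp at zero because real compressors are not perfectly monotonic.
    k_x_given_y = max(c_yx - compressed_size(y), 0)
    k_y_given_x = max(c_xy - compressed_size(x), 0)
    return (k_x_given_y + k_y_given_x) / c_xy
```

Because a real compressor only approximates Kolmogorov complexity, the result is a rough score rather than an exact value in [0, 1].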

7 How compression works Create a dictionary that maps long sequences to short ones. The more often these long sequences occur, the better the compression (works well with text), e.g.: –the = 01 –and = 10 –algorithm = 11
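As a toy illustration of the dictionary idea on this slide, the sketch below replaces whole words with the short codes listed above. It is a deliberately simplified, hypothetical coder (the names DICTIONARY, encode, and decode are not from the presentation), and it assumes the input never contains the literal tokens 01, 10, or 11.

```python
# Word-to-code table taken from the slide's example.
DICTIONARY = {"the": "01", "and": "10", "algorithm": "11"}


def encode(text: str) -> list:
    """Replace each known word with its short code; keep other words as-is."""
    return [DICTIONARY.get(word, word) for word in text.split()]


def decode(tokens: list) -> str:
    """Invert encode() by mapping codes back to their words."""
    reverse = {code: word for word, code in DICTIONARY.items()}
    return " ".join(reverse.get(token, token) for token in tokens)
```

The more often the dictionary words appear in the input, the larger the share of tokens that shrink to two characters.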

8 Compression dictionary example

9 How compression works Bzip2 –Burrows-Wheeler transform –Huffman encoding –Compressed with dictionary. These methods combined give a practical estimate of Kolmogorov complexity.
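To see why compressed size can serve as a complexity estimate, the following sketch compares a highly repetitive input with unpredictable bytes of the same length. It assumes Python's standard bz2 module, which wraps bzip2; the sample inputs and the helper name compressed_size are made up for illustration.

```python
import bz2
import os


def compressed_size(data: bytes) -> int:
    """Length of the bzip2-compressed data, a rough proxy for its complexity."""
    return len(bz2.compress(data))


# Regular, repetitive text versus random bytes of equal length.
repetitive = b"the algorithm and the data " * 400
unpredictable = os.urandom(len(repetitive))

print(compressed_size(repetitive), compressed_size(unpredictable))
```

The repetitive input should shrink to a small fraction of its length, while the random input should barely shrink at all.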

10 Our algorithm Split the input string into equal sections –How many sections? Compress each section; sections containing anomalies should appear as outliers (judged by their normalized compressed size). For each section containing an anomaly, split it and compare against the section most likely not to contain an anomaly.

11 Pseudo code
Initial_cuts(data) {
    do {
        split(data, number of splits);
        compress splits;
        number of splits++;
    } while (no normalized split > threshold)
    base_check = minimal normalized coefficient
    for each normalized split > threshold {
        drill_down(normalized split);
    }
}
Normalized split x =

12 How it works (Example) Initial split 1.0

13 How it works Second split: 0.752880921895, 0.747119078105, 1.5

14 How it works Final split: 0.678651685393, 0.718202247191, 0.603146067416, 2.0

15 Preliminary results 0.165094 1.367925 0.04717 2.287736

16 Preliminary results 0.141892 0.628378 1.682432 0.02027 1.317568 2.209459

17 Preliminary results 0.169903 1.019417 0.776699 2.5

18 Preliminary results 0.909091 0.181818 2

19 Results (Partial Epilepsy 1)

20

21 0.678652 0.718202 0.603146 2

22 Results (Partial Epilepsy 2)

23

24 0.630324 2 0.739353

25 Multi-anomaly detection

26 Future research Extend tests to binary data (e.g., pictures, video). Finding anomalies in pairs of data –It is hot out –Chris is wearing a coat –It is hot out, and Chris is wearing a coat. Anomaly detection refinement?

27 Drill down
Drill_down(data) {
    a = data(0…n/2);
    b = data(n/2+1…n);
    if (size of data < size_threshold) {
        add data's coordinates to linked list and return;
    } else if (a is similar to b) {
        Drill_down(a);
        Drill_down(b);
    } else if (a is closer to mean) {
        Drill_down(b);
    } else {
        Drill_down(a);
    }
}
Drills down into splits containing anomalies to get a closer approximation.
Mean = slices, of size data/2, of the split most likely not to contain an anomaly.
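The drill-down recursion can be sketched in Python as shown below. This is an interpretation under assumptions, not the authors' code: "similar" and "closer to mean" are both measured with a bz2-based estimate of the normalized distance, the mean is represented by a hypothetical baseline argument (bytes taken from a split believed to be anomaly-free), and the offset, similar_eps, and found parameters are introduced only for this illustration.

```python
import bz2


def csize(data: bytes) -> int:
    """Compressed length, used as a stand-in for Kolmogorov complexity."""
    return len(bz2.compress(data))


def distance(x: bytes, y: bytes) -> float:
    """Compression-based estimate of (K(x|y) + K(y|x)) / K(xy)."""
    c_xy = csize(x + y)
    k_x_given_y = max(csize(y + x) - csize(y), 0)
    k_y_given_x = max(c_xy - csize(x), 0)
    return (k_x_given_y + k_y_given_x) / c_xy


def drill_down(data, offset, baseline, size_threshold, similar_eps, found):
    """Halve data repeatedly, following the half that looks less like baseline."""
    n = len(data)
    if n < size_threshold or n < 2:
        found.append((offset, offset + n))  # record coordinates and stop
        return
    half = n // 2
    a, b = data[:half], data[half:]
    if distance(a, b) < similar_eps:
        # Halves look alike, so the anomaly cannot be localized: follow both.
        drill_down(a, offset, baseline, size_threshold, similar_eps, found)
        drill_down(b, offset + half, baseline, size_threshold, similar_eps, found)
    elif distance(a, baseline[:half]) < distance(b, baseline[:len(b)]):
        # a is closer to the baseline, so the anomaly is more likely in b.
        drill_down(b, offset + half, baseline, size_threshold, similar_eps, found)
    else:
        drill_down(a, offset, baseline, size_threshold, similar_eps, found)
```

After the call, found holds (start, end) byte ranges that approximate where the anomalous data sits. On very short slices the compressor's fixed overhead dominates, so size_threshold should not be set too small.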

28 Questions? If you have any questions, please visit http://www.google.com

29 References M. Li, X. Chen, X. Li, B. Ma, and P. Vitanyi, The Similarity Metric, 2002. M. Burrows and D.J. Wheeler, A Block-sorting Lossless Data Compression Algorithm, Digital Systems Research Center, Palo Alto, CA, 1994. E. Keogh, S. Lonardi, and B. Chiu, Finding Surprising Patterns in a Time Series Database in Linear Time and Space, University of California Riverside, Riverside, CA, 2002.

