A Novel Technique for Learning Rare Events (11/11/05). Margaret H. Dunham, Yu Meng, Jie Huang. CSE Department, Southern Methodist University, Dallas, Texas.

Presentation transcript:

Slide 1: A Novel Technique for Learning Rare Events
Margaret H. Dunham, Yu Meng, Jie Huang
CSE Department, Southern Methodist University, Dallas, Texas
This material is based upon work supported by the National Science Foundation under Grant No. IIS

Slide 2: Objectives/Outline
Develop modeling techniques which can “learn/forget” past behavior of spatiotemporal events. Apply to prediction of rare events.
- Introduction
- EMM Overview
- EMM Applications to Rare Event Detection
- Future Work


Slide 4: Spatiotemporal Environment
- Events arrive in a stream.
- We cannot look at a snapshot of the data.
- At any time t, we can view the state of the problem at a site as represented by a vector of n numeric values: $V_t = \langle S_{1t}, S_{2t}, \ldots, S_{nt} \rangle$

         V_1    V_2    ...   V_q
  S_1    S_11   S_12   ...   S_1q
  S_2    S_21   S_22   ...   S_2q
  ...    ...    ...    ...   ...
  S_n    S_n1   S_n2   ...   S_nq
  (columns are ordered by time)

Slide 5: Spatiotemporal Modeling
- Example applications:
  - Flood prediction
  - Rare event detection (network traffic, automobile traffic)
- Requirements:
  - Capture time
  - Capture space
  - Dynamic
  - Scalable
  - Quasi-real time

Slide 6: Technique
- Spatiotemporal modeling technique based on Markov models (MMs).
- However:
  - The size of the MM depends on the size of the dataset.
  - The required structure of the MM is not known at model construction time.
  - As the real world being modeled by the MM changes, so should the structure of the MM. Thus not only should the transition probabilities change, but the number of states should also change to model the changing world more accurately.

Slide 7: MM
A first-order Markov chain is a finite or countably infinite sequence of events {E1, E2, …} over discrete time points, where $P_{ij} = P(E_j \mid E_i)$, and at any time the future behavior of the process is based solely on the current state.
A Markov Model (MM) is a graph with m vertices or states, S, and directed arcs, A, such that:
- $S = \{N_1, N_2, \ldots, N_m\}$, and
- $A = \{L_{ij} \mid i \in 1, 2, \ldots, m,\ j \in 1, 2, \ldots, m\}$, and
- each arc $L_{ij} = \langle N_i, N_j \rangle$ is labeled with a transition probability $P_{ij} = P(N_j \mid N_i)$.
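
The definition above can be made concrete with a small sketch. It assumes, as an illustration only, that states are plain string labels and that each P_ij is estimated as the fraction of observed departures from N_i that go to N_j; the function name `transition_probabilities` is hypothetical and does not come from the slides.

```python
from collections import defaultdict


def transition_probabilities(states):
    """Estimate P_ij = P(N_j | N_i) from one observed sequence of state labels."""
    # counts[i][j] is the number of observed transitions from state i to state j.
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1

    probs = {}
    for i, row in counts.items():
        total = sum(row.values())  # number of times state i was left
        probs[i] = {j: c / total for j, c in row.items()}
    return probs


# Toy sequence (assumed for illustration, not data from the talk).
P = transition_probabilities(["N1", "N2", "N1", "N3", "N1", "N2"])
```

Each row of `P` sums to 1 because it is normalized by the number of departures from that state. A static chain like this has one state per distinct label, which is the scalability problem the next slides raise.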

Slide 8: Problem with Markov Chains
- The required structure of the MC may not be certain at model construction time.
- As the real world being modeled by the MC changes, so should the structure of the MC.
- Not scalable: grows linearly with the number of events.
- Markov property
- Our solution:
  - Extensible Markov Model (EMM)
  - Cluster real-world events
  - Allow the Markov chain to grow and shrink dynamically

Slide 9: Objectives/Outline
Develop modeling techniques which can “learn/forget” past behavior of spatiotemporal events. Apply to prediction of rare events.
- Introduction
- EMM Overview
- EMM Applications to Rare Event Detection
- Future Work

Slide 10: Extensible Markov Model (EMM)
- Time-varying, discrete, first-order Markov model
- Nodes are clusters of real-world states.
- Learning continues during the application phase.
- Learning:
  - Transition probabilities between nodes
  - Node labels (centroid/medoid of cluster)
  - Nodes are added and removed as data arrives

Slide 11: Related Work
- Splitting Nodes in HMMs
  - Create new states by splitting an existing state
  - M. J. Black and Y. Yacoob, “Recognizing facial expressions in image sequences using local parameterized models of image motion,” Int. Journal of Computer Vision, 25(1), 1997.
- Dynamic Markov Modeling
  - States and transitions are cloned
  - G. V. Cormack and R. N. S. Horspool, “Data compression using dynamic Markov modeling,” The Computer Journal, Vol. 30, No. 6.
- Augmented Markov Model (AMM)
  - Creates new states if the input data has never been seen in the model, and transition probabilities are adjusted
  - Dani Goldberg and Maja J. Mataric, “Coordinating mobile robot group behavior using a model of interaction dynamics,” Proceedings of the Third International Conference on Autonomous Agents (Agents ’99), Seattle, Washington.

Slide 12: EMM vs AMM
Our proposed EMM model is similar to AMM, but is more flexible:
- EMM continues to learn during the application (prediction, etc.) phase.
- The EMM is a generic incremental model whose nodes can have any kind of representatives.
- State matching is determined using a clustering technique.
- EMM allows not only the creation of new nodes but also the deletion (or merging) of existing nodes. This allows the EMM to “forget” old information which may not be relevant in the future. It also allows the EMM to adapt to main-memory constraints for large-scale datasets.
- EMM performs one scan of the data and is therefore suitable for online data processing.

Slide 13: EMM
Extensible Markov Model (EMM): at any time t, an EMM consists of an MM and algorithms to modify it, where the algorithms include:
- EMMSim, which defines a technique for matching input data at time t + 1 against the existing states in the MM at time t.
- EMMBuild, which updates the MM at time t + 1 given the MM at time t and the classification measure result at time t + 1.
Additional algorithms are used to modify the model or for applications.

Slide 14: EMMBuild
Input:
  V_t = <S_1t, S_2t, ..., S_nt>: observed values at n different locations at time t.
  G: EMM with m states at time t-1.
  N_c: current state at time t-1.
Output:
  G: EMM graph at time t.
  N_c: current state at time t.

if G = empty then
    // Initialize G; the first input vector is the first state
    N_1 = V_t; CN_1 = 0; N_c = N_1;
else
    // Update G as new input comes in
    foreach N_i in G
        determine EMMSim(V_t, N_i);
    let N_n be the node with the largest similarity value, sim;
    if sim >= threshold then
        // Update matching state information
        CN_c = CN_c + 1;
        if L_cn exists
            CL_cn = CL_cn + 1;
        else
            create new transition L_cn = <N_c, N_n>; CL_cn = 1;
        N_c = N_n;
    else
        // Create a new state N_(m+1) represented by V_t
        create new node N_(m+1); N_(m+1) = V_t; CN_(m+1) = 0;
        create new transition L_c(m+1) = <N_c, N_(m+1)>; CL_c(m+1) = 1;
        CN_c = CN_c + 1;
        N_c = N_(m+1);
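
The EMMBuild pseudocode can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `EMM`, its attribute names, the use of cosine similarity for EMMSim, and the toy input vectors are all hypothetical. As in the pseudocode, a node keeps the first vector assigned to it as its representative; the centroid/medoid relabeling mentioned elsewhere in the talk is not modeled.

```python
import math


def cosine(a, b):
    """Cosine similarity, standing in for EMMSim (an assumption of this sketch)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class EMM:
    def __init__(self, threshold, sim=cosine):
        self.threshold = threshold
        self.sim = sim
        self.nodes = []      # node representatives N_i
        self.cn = []         # CN_i: number of transitions out of node i
        self.cl = {}         # CL_ij: count of transitions i -> j, keyed by (i, j)
        self.current = None  # index of the current node N_c

    def build(self, v):
        """Process one input vector V_t and return the index of the new current node."""
        v = tuple(v)
        if not self.nodes:
            # Empty model: the first input vector becomes the first state.
            self.nodes.append(v)
            self.cn.append(0)
            self.current = 0
            return self.current

        # Linear scan for the most similar existing node.
        sims = [self.sim(v, node) for node in self.nodes]
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] >= self.threshold:
            target = best                      # matched an existing state
        else:
            self.nodes.append(v)               # create a new state for V_t
            self.cn.append(0)
            target = len(self.nodes) - 1

        # Both branches of the pseudocode update the same counters.
        c = self.current
        self.cn[c] += 1
        self.cl[(c, target)] = self.cl.get((c, target), 0) + 1
        self.current = target
        return target

    def transition_probability(self, i, j):
        """P_ij estimated as CL_ij / CN_i."""
        return self.cl.get((i, j), 0) / self.cn[i] if self.cn[i] else 0.0


# Toy 2-D stream (assumed for illustration).
emm = EMM(threshold=0.99)
for v in [(1, 0), (1, 0.01), (0, 1), (1, 0), (0, 1)]:
    emm.build(v)
```

The sketch makes one pass over the stream and stores only counts, so transition probabilities are derived on demand rather than stored on the arcs.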

Slide 15: EMMSim
- Find the closest node to the incoming event.
- If none is “close”, create a new node.
- The label of a cluster is the centroid/medoid of its members.
- Problem:
  - O(n)
  - BIRCH O(lg n): Requires second phase to recluster initial
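
A sketch of the closest-node search follows. The slides name Jaccard, Dice, Cosine, and Overlap as similarity measures but do not give their formulas, so the real-valued forms below are an assumption (they are common generalizations of the set-based definitions). The helper names `_dot`, `_ratio`, and `closest_node` are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _ratio(num, den):
    # Guard against all-zero vectors; returning 0.0 here is a choice of this sketch.
    return num / den if den else 0.0


def dice(a, b):
    return _ratio(2 * _dot(a, b), _dot(a, a) + _dot(b, b))


def jaccard(a, b):
    return _ratio(_dot(a, b), _dot(a, a) + _dot(b, b) - _dot(a, b))


def cosine(a, b):
    return _ratio(_dot(a, b), math.sqrt(_dot(a, a) * _dot(b, b)))


def overlap(a, b):
    # Under this form the value can exceed 1 for real-valued vectors.
    return _ratio(_dot(a, b), min(_dot(a, a), _dot(b, b)))


def closest_node(v, nodes, sim=jaccard):
    """Linear O(n) scan: return (index, similarity) of the most similar node."""
    sims = [sim(v, node) for node in nodes]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]


# Node representatives and an incoming vector (values reused from the EMMBuild example slide).
nodes = [(14, 8, 2, 3, 1, 0, 0), (18, 10, 3, 3, 1, 0, 0)]
index, score = closest_node((17, 10, 2, 3, 1, 0, 0), nodes)
```

The scan touches every node, which is the O(n) cost the slide lists as a problem; an index structure such as the BIRCH tree mentioned above would replace `closest_node`.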

Slide 16: EMMBuild
Input vectors, in arrival order:
<18,10,3,3,1,0,0>
<17,10,2,3,1,0,0>
<16,9,2,3,1,0,0>
<14,8,2,3,1,0,0>
<14,8,2,3,0,0,0>
<18,10,3,3,1,1,0>
[Figure: successive EMM graphs over nodes N1, N2, and N3, with transition probabilities on the arcs.]

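Slide 17: EMMDecrement
Delete N2.
[Figure: EMM graph over nodes N1, N2, N3, N5, and N6 before the deletion, and over N1, N3, N5, and N6 after it, with transition probabilities on the arcs.]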

Slide 18: EMM Advantages
- Dynamic
- Adaptable
- Use of clustering
- Learns rare events
- Scalable:
  - Growth of the EMM is not linear in the size of the data.
  - Hierarchical feature of EMM
- Creation/evaluation in quasi-real time
- Distributed/hierarchical extensions

Slide 19: Growth of EMM - Serwent Data

Slide 20: EMM Performance - Growth Rate
[Table: columns Data, Sim, Threshold. Only the Overlap rows survive in the transcript.]
  Data      Sim       Values
  Serwent   Jaccard
            Dice
            Cosine
            Overlap   22334
  Ouse      Jaccard
            Dice
            Cosine
            Overlap   11111

Slide 21: EMM Performance - Growth Rate, Minnesota Traffic Data

Slide 22: Error Rates
- Normalized Absolute Ratio Error (NARE): NARE = [formula missing in transcript]
- Root Mean Square (RMS): RMS = [formula missing in transcript]
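
The formulas for these two error rates did not survive in the transcript. The sketch below therefore rests on assumed definitions, not on the slide: NARE is taken as the sum of absolute errors divided by the sum of observed values, and RMS as the square root of the mean squared error. The function names and sample numbers are hypothetical.

```python
import math


def nare(observed, predicted):
    """Assumed form: sum |O(t) - P(t)| divided by sum O(t)."""
    abs_error = sum(abs(o - p) for o, p in zip(observed, predicted))
    return abs_error / sum(observed)


def rms(observed, predicted):
    """Assumed form: sqrt of the mean of (O(t) - P(t))^2."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Made-up observed and predicted values, for illustration only.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
nare_value = nare(observed, predicted)
rms_value = rms(observed, predicted)
```

Under these assumed definitions NARE is unitless, while RMS is in the same units as the observations.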

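Slide 23: EMM Performance - Prediction
[Table: columns NARE, RMS, No. of States; rows RLF and EMM at three thresholds (Th=). The threshold values and table entries are missing in the transcript.]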

Slide 24: EMM Water Level Prediction - Ouse Data

Slide 25: Objectives/Outline
Develop modeling techniques which can “learn/forget” past behavior of spatiotemporal events. Apply to prediction of rare events.
- Introduction
- EMM Overview
- EMM Applications to Rare Event Detection
- Future Work

Slide 26: Rare Event
- Rare, anomalous, surprising
- Out of the ordinary
- Not outlier detection:
  - No knowledge of the data distribution
  - Data is not static
  - Must take temporal and spatial values into account
  - May be interested in a sequence of events
- Example: Snow in upstate New York is not rare
  - Snow in upstate New York in June is rare
- Rare events may change over time

Slide 27: Rare Event Examples
- The amount of traffic through a site in a particular time interval is extremely high or low.
- The type of traffic (i.e. source IP addresses or destination addresses) is unusual.
- Current traffic behavior is unusual based on recent previous traffic behavior.
- Unusual behavior at several sites.

Slide 28: What is a Rare Event?
- Not an outlier:
  - We don’t know anything about the distribution of the data. Even if we did, the data keeps changing. A model created from a static view may not fit tomorrow’s data.
- We view a rare event as:
  - An unusual state of the network (or a subset thereof).
  - A transition between network states which does not frequently occur.
- Base rare event detection on determining events, or transitions between events, that do not frequently occur.

Slide 29: Rare Event Examples - VoIP Traffic
- The amount of traffic through a site in a particular time interval is extremely high or low.
- The type of traffic (i.e. source IP addresses or destination addresses) is unusual.
- Current traffic behavior is unusual based on recent previous traffic behavior.
- Unusual behavior at several sites.

Slide 30: Rare Event Detection Applications
- Intrusion detection
- Fraud
- Flooding
- Unusual automobile/network traffic

Slide 31: Rare Event Detection Techniques
- Signature based
  - Created signatures for normal behavior
  - Rule based
  - Pattern matching
  - State transition analysis
- Statistical based
  - Profiles of normal behavior
- Data mining based
  - Classification
  - Clustering

Slide 32: EMM Rare Event Prediction - VoIP Traffic
- Predict rare events at a specific site (switch) representing an area of the network.
- Use:
  - Identify when a rare transition occurs
  - Identify a rare event by the creation of a new node
- Hierarchical EMM: Collect rare event information at a higher level by constructing an EMM of more global events from several sites.

Slide 33: Our Approach
- By learning what is normal, the model can predict what is not.
- Normal is based on likelihood of occurrence.
- Use EMM to build a model of behavior.
- We view a rare event as:
  - An unusual event
  - A transition between event states which does not frequently occur.
- Base rare event detection on determining events, or transitions between events, that do not frequently occur.
- Continue learning.

Slide 34: EMMRare
- The EMMRare algorithm indicates whether the current input event is rare. Using a threshold occurrence percentage, the input event is determined to be rare if either of the following occurs:
  - The frequency of the node at time t+1 is below this threshold.
  - The updated transition probability of the MC transition from the node at time t to the node at time t+1 is below the threshold.

Slide 35: Determining Rare
- The Occurrence Frequency (OF_c) of a node N_c is defined by: OF_c = [formula missing in transcript]
- Likewise, when determining what is meant by small for a transition probability, we should look at a normalized rather than the actual value. We thus define the Normalized Transition Probability (NTP_mn) from one state, N_m, to another, N_n, as: NTP_mn = [formula missing in transcript]
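
The two rarity tests from EMMRare can be sketched as below. The slide's formulas for OF_c and NTP_mn are missing from the transcript, so this sketch assumes that both are normalized by the total of all node counts: OF_c = CN_c / sum of CN_i, and NTP_mn = CL_mn / sum of CN_i. That normalization, the function names, and the toy counts are assumptions made for illustration.

```python
def occurrence_frequency(cn, c):
    """Assumed form: OF_c = CN_c / sum over all nodes of CN_i."""
    total = sum(cn.values())
    return cn[c] / total if total else 0.0


def normalized_transition_probability(cn, cl, m, n):
    """Assumed form: NTP_mn = CL_mn / sum over all nodes of CN_i."""
    total = sum(cn.values())
    return cl.get((m, n), 0) / total if total else 0.0


def is_rare(cn, cl, prev, cur, threshold):
    """Flag the event if either the node or the transition into it is infrequent."""
    return (
        occurrence_frequency(cn, cur) < threshold
        or normalized_transition_probability(cn, cl, prev, cur) < threshold
    )


# Toy counts (made up): node counts CN_i and transition counts CL_ij.
cn = {"N1": 90, "N2": 9, "N3": 1}
cl = {
    ("N1", "N1"): 80,
    ("N1", "N2"): 9,
    ("N1", "N3"): 1,
    ("N2", "N1"): 9,
    ("N3", "N1"): 1,
}
rare_jump = is_rare(cn, cl, "N1", "N3", threshold=0.05)
common_step = is_rare(cn, cl, "N1", "N1", threshold=0.05)
```

Dividing by the total count rather than by CN_m alone keeps a transition out of a rarely visited node from looking common just because it is that node's only exit, which appears to be the motivation for the slide's "normalized rather than actual value" remark.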

Slide 36: Ongoing/Future Work
- Extend to emerging patterns
- Incorporate techniques to reduce false alarms
- Extend to hierarchical/distributed

Slide 37: Conclusion
We welcome feedback.