On Anomalous Hot Spot Discovery in Graph Streams


Introduction: Background
  We care about data streams of interactions between network participants, e.g., in social networks and communication networks.
  Abrupt changes in the level and patterns of interaction among participants may be associated with critical events.
  A simple illustration

Introduction: Graph Stream
  Graph: e.g., a social networking service (SNS) or a communication network. Node: user. Edge: user interaction.
  Stream: a sequence of edges, (Node A, Node B, timestamp), ...
  Hot spot: a node showing abrupt changes in (a) activity level (high activity) or (b) patterns of activity at specific time periods, associated with anomalous or critical events in the underlying network.
  Application scenarios
    Social network: a person becomes popular.
    Social network: one of your followers could be a spammer.
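As a minimal sketch of the stream representation described above (not the presenters' code), an edge can be modeled as a timestamped node pair and the stream as a sequence of such edges. The names `Edge` and `activity_per_node` are hypothetical and introduced only for illustration; the sample stream is made-up data.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Edge(NamedTuple):
    """One stream element: an interaction between two nodes at a given time."""
    src: str
    dst: str
    timestamp: float


def activity_per_node(stream: Iterable[Edge]) -> Dict[str, int]:
    """Count how many edges touch each node; repeated edges are counted again."""
    counts: Dict[str, int] = defaultdict(int)
    for edge in stream:
        counts[edge.src] += 1
        counts[edge.dst] += 1
    return dict(counts)


# Made-up example stream: node A interacts three times, twice with B.
stream = [
    Edge("A", "B", 1.0),
    Edge("A", "C", 2.0),
    Edge("A", "B", 2.5),  # repeated interaction between the same pair
]
activity = activity_per_node(stream)
```

A raw count like this ignores time entirely; the model in the talk weights recent edges more heavily.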

Introduction: Basic Idea, Localized Principal Component Analysis (PCA)
  The adjacency matrix should capture edge correlations between the target node and the nodes in its neighborhood (locality).
  Analyze the edge correlation structure of a node using PCA:
    Changes in absolute level of activity: dominant eigenvalue
    Local edge correlation patterns: dominant eigenvector
  Challenging problems
    Anomalies at different time granularities
    Computational cost of PCA
    Stream updates
    High dimensionality
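The dominant eigenvalue and eigenvector that the basic idea relies on can be approximated by power iteration. The sketch below is illustrative only: `power_iteration` is a hypothetical helper, the 2x2 matrix is made-up data standing in for a node's local correlation matrix, and the code assumes a symmetric, non-negative matrix with a unique largest-magnitude eigenvalue.

```python
import math
from typing import List, Tuple


def power_iteration(matrix: List[List[float]],
                    iterations: int = 200) -> Tuple[float, List[float]]:
    """Approximate the dominant eigenpair of a symmetric non-negative matrix."""
    n = len(matrix)
    # All-ones start: for a non-negative matrix this vector is not
    # orthogonal to the (non-negative) dominant eigenvector.
    vector = [1.0] * n
    for _ in range(iterations):
        product = [sum(a * x for a, x in zip(row, vector)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:  # zero matrix: nothing to converge to
            return 0.0, vector
        vector = [x / norm for x in product]
    # Rayleigh quotient v^T M v for the unit-length vector v.
    product = [sum(a * x for a, x in zip(row, vector)) for row in matrix]
    eigenvalue = sum(x * y for x, y in zip(vector, product))
    return eigenvalue, vector


# Made-up symmetric matrix; its exact eigenvalues are (5 +/- sqrt(5)) / 2.
local_matrix = [[2.0, 1.0],
                [1.0, 3.0]]
eigenvalue, eigenvector = power_iteration(local_matrix)
```

In this reading, a jump in `eigenvalue` between time steps would signal a change in activity level, while a rotation of `eigenvector` would signal a change in the local correlation pattern.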

Model Framework
  Graph of the temporal network: G(t) = (N(t), A(t))
  Assumptions
    A sequence of edges is received continuously over time.
    The set of nodes changes over time.
    N(t) is the set of all distinct nodes in the stream at time t.
    A(t) is the sequence of all edges received so far; A(t) may contain repetitions.
  Model intuition
    Quantify interaction level and pattern (measure edges).
    LEVEL: model time decay, giving greater importance (weight) to recent edges.
    PATTERN: measure the temporal correlation of edge arrivals at the target node, using pairwise products.
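The LEVEL idea can be illustrated with exponential decay parameterized by a half-life, the parameter the experiments in this talk vary. The exact weighting function is not given in this transcript, so the form 0.5 ** ((now - edge_time) / half_life) below is an assumption, and `decay_weight` and `decayed_frequency` are hypothetical names. The pairwise-product PATTERN statistic is not sketched because its definition does not survive in the transcript.

```python
from typing import List


def decay_weight(edge_time: float, now: float, half_life: float) -> float:
    """Weight of an edge that arrived at edge_time, seen from time now.

    Assumed form: the weight halves every half_life time units.
    """
    return 0.5 ** ((now - edge_time) / half_life)


def decayed_frequency(arrival_times: List[float], now: float,
                      half_life: float) -> float:
    """Sum of decayed weights over all arrivals of one edge (or at one node)."""
    return sum(decay_weight(t, now, half_life) for t in arrival_times)


# Made-up arrivals at times 0, 1 and 2, observed at time 2, half-life 1:
# the weights are 0.25, 0.5 and 1.0.
level = decayed_frequency([0.0, 1.0, 2.0], now=2.0, half_life=1.0)
```

A short half-life makes the statistic react to brief bursts, while a long one tracks sustained activity, which is why several half-life values are used side by side.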

Model Framework

HotSpot Algorithm

Computational Challenges
  Principal component analysis
    Power iteration for the eigenproblem
  Decay-based approach
    All matrices, eigenvalues, and eigenvectors need to be updated.
  Lazy update technique
    Absent new arrivals, the quantities above at time t can be expressed purely as a function of their values at an earlier time t' (< t) and the elapsed time (t - t').
    There is no need to update matrix values explicitly just because of time decay.
    We do not monitor unusual inactivity.
    When edge (i, j) arrives, only the statistics of nodes i and j need to be updated.
  Scales well. Could be distributed if the data is partitioned properly.
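The lazy update technique can be sketched for a single decayed statistic: each node stores a value and the time it was last touched, and decay is applied only when that node is next touched or read. This is an illustration under an assumed exponential half-life decay, with hypothetical names (`LazyDecayedCounter`, `process_edge`); the talk applies the same idea to matrices, eigenvalues, and eigenvectors, which are not reproduced here.

```python
from typing import Dict, Optional


class LazyDecayedCounter:
    """Decayed edge count that is brought up to date only on demand."""

    def __init__(self, half_life: float) -> None:
        self.half_life = half_life
        self.value = 0.0
        self.last_update: Optional[float] = None

    def read(self, now: float) -> float:
        """Value at time now, computed from the stored value and (now - t')."""
        if self.last_update is None:
            return 0.0
        elapsed = now - self.last_update
        return self.value * 0.5 ** (elapsed / self.half_life)

    def add_edge(self, now: float) -> None:
        """Apply the pending decay, then count the new arrival."""
        self.value = self.read(now) + 1.0
        self.last_update = now


def process_edge(counters: Dict[str, LazyDecayedCounter],
                 i: str, j: str, now: float, half_life: float) -> None:
    """On arrival of edge (i, j), touch only the statistics of i and j."""
    for node in (i, j):
        if node not in counters:
            counters[node] = LazyDecayedCounter(half_life)
        counters[node].add_edge(now)


counters: Dict[str, LazyDecayedCounter] = {}
process_edge(counters, "A", "B", now=0.0, half_life=1.0)
process_edge(counters, "A", "C", now=1.0, half_life=1.0)  # B is not touched
```

Because every other node is left alone when an edge arrives, the work per edge is independent of the number of nodes, which is consistent with the scalability claim on this slide.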

Experimental Results: Experimental Setting
  Data sets
    DBLP data set: 1942 – 2012. Author pairs serve as edges, and the two nodes of a pair are distinct authors. 1,141,301 authors, 1,690,933 papers, and 7,778,687 author pairs in total.
    Internet Movie Database (IMDB) data set: 1892 – 2012. Director-actor pairs serve as edges; a director node would have a larger S(i,t) set. 1,008,978 records, 2,214,210 nodes, and 13,529,524 edges in total.
    Half-life values of 1, 2, 4, and 8 years, plus all of them together for multi-granularity analysis.
  Algorithms and implementation
    HotSpot algorithm implementation: C++.
    Eigen-solver: Intel Math Kernel Library (MKL) 11.0 update 1, optimized LAPACK.
    Nvidia CUDA 5.0 SDK: parallelized linear algebra functions (CUBLAS).
    Computing unit: Core 3.10GHz, 16GB of RAM.

Experimental Results: Case Study
  David Butler, director
    Half-life of 1 year: identified as a hot spot in 1929, 1934, 1943, 1949, 1956 and 1962 (temporary bursts of production).
    Half-life of 2 years: [...] and [...] (active period).
    Half-life of 4 years: [...] (peak period in career).
    Half-life of 8 years: not detected.
  Al Pacino, actor
    Detected 2 out of 3 times when he directed films, in 1996, [...]
  Thomas S. Huang, computer scientist
    Half-life of 1 year: 1997, 1998, 2001, 2006, 2007, 2008.
    Half-life of 2 years: [...]
    Half-life over 2 years: undetected.
  In total, we found 5589 hot spots in DBLP and [...] hot spots in IMDB across all half-life values.

Experimental Results: Performance Evaluation, Efficiency Tests
  [Figure panels: DBLP, IMDB]

Experimental Results: Performance Evaluation, Space Overhead Tests
  [Figure panels: DBLP, IMDB]

Thanks! Q&A?