
Streaming Models and Algorithms for Communication and Information Networks Brian Thompson (joint work with James Abello)


1 Streaming Models and Algorithms for Communication and Information Networks Brian Thompson (joint work with James Abello)

2 Outline: Introduction and Motivation; Our Approach (A Streaming Model; Algorithms); Experimental Results; Conclusions and Future Work


4 Problem Description
Data: a network (G; T), where G = (V, E) is a graph and T is a set of time-stamped events corresponding to nodes or edges in G.
Goals: identify recent correlated activity; measure influence between entities.
Challenges:
Scalability: networks may be very large, and space is limited.
Efficiency: the data rate is high and the information is time-sensitive.
Variability: entities have different temporal dynamics.

5 Related Work
Time-evolving graph model: a sequence of “snapshots” (the figure shows t = 1, t = 2, t = 3, t = 4).
Time series analysis.

6 Related Work
Cascade model: starting from a set of seed nodes, information (a product, news, a virus) propagates through the network.


8 Data Model
G is a graph. T is a set of time-stamped events corresponding to nodes or edges in G.

Source  Recipient  Content                Timestamp
Alice   (public)   “Fire at 2nd & Main!”  Tuesday, 9:25am
Bob     Cheng      (private message)      Tuesday, 9:27am
Cheng   (public)   “RT @Alice Fire...”    Tuesday, 9:28am

[Figure: graph on the nodes Alice, Bob, Cheng, Devika, Elina]

9 Data Model (node-centric)
[Figure: graph on the nodes Alice, Bob, Cheng, Devika, Elina]

10 Data Model (edge-centric)
[Figure: graph on the nodes Alice, Bob, Cheng, Devika, Elina]

11 Renewal Theory
[Figure: timeline starting at 0 with event times t1, t2, t3, t4, t5; S3 is labeled]

12 Renewal Theory
[Figure: timeline starting at 0 with event times t1, t2, t3, t4, t5; time t is labeled]

13 The REWARDS Model (REneWal theory Approach for Real-time Data Streams)
We model a stream of communication data from a node or across an edge as a renewal process.
[Figure: discrete-event sequence t1, t2, t3, t4, t5; inter-arrival time distribution ranging from x_min to x_max]

14 The REWARDS Model (REneWal theory Approach for Real-time Data Streams)
Given a stream of time-stamped events, we estimate the parameters of the renewal process for each node or edge based on the inter-arrival times.
[Figure: discrete-event sequence t1, t2, t3, t4, t5; inter-arrival time distribution ranging from x_min to x_max]
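The per-entity estimation step can be illustrated with a minimal sketch. The transcript does not say which parameters are estimated or how, so everything here is an assumption made for illustration: the class name `IatStats`, the helper `process_stream`, and the choice to track only the count, mean, minimum, and maximum of the inter-arrival times. The sketch also assumes timestamps arrive in nondecreasing order for each entity. Each event costs constant work.

```python
import math
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Optional, Tuple


@dataclass
class IatStats:
    """Running inter-arrival-time (IAT) summary for one node or edge (hypothetical)."""
    last_time: Optional[float] = None
    count: int = 0            # number of inter-arrival times seen so far
    total: float = 0.0        # sum of inter-arrival times
    x_min: float = math.inf   # smallest inter-arrival time seen
    x_max: float = 0.0        # largest inter-arrival time seen

    def update(self, t: float) -> None:
        """Fold a new event timestamp into the summary in O(1)."""
        if self.last_time is not None:
            gap = t - self.last_time
            self.count += 1
            self.total += gap
            self.x_min = min(self.x_min, gap)
            self.x_max = max(self.x_max, gap)
        self.last_time = t

    @property
    def mean(self) -> Optional[float]:
        """Mean inter-arrival time, or None before two events have been seen."""
        return self.total / self.count if self.count else None


def process_stream(events: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, IatStats]:
    """Consume (entity, timestamp) pairs and keep one summary per entity."""
    stats: Dict[Hashable, IatStats] = {}
    for entity, t in events:
        stats.setdefault(entity, IatStats()).update(t)
    return stats
```

An entity key can be a node identifier or a (source, recipient) pair, which covers both the node-centric and the edge-centric views.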


16 Recency
Goal: highlight recent activity.
Key idea: more recent = more relevant.
Challenge: the most frequent communicators will always seem “recent”, overshadowing others’ behavior. We call this time-scale bias.
[Figure: activity timelines for users alice1337 and bob_iz_kewl, marked at 8:00 am, 10:00 am, 12:00 pm, and now]

17 Recency

18 Recency
[Figure: Recency of Edge in Bluetooth Dataset]

19 Delay
Goal: measure the influence of entity A on entity B.
Key idea: study pairwise (A,B)-gaps.
Challenge: more frequent communicators will tend to always have shorter “gaps”. This is another example of time-scale bias.
[Figure: activity timelines for users alice1337 and bob_iz_kewl, marked at 8:00 am, 10:00 am, 12:00 pm, and now]

20 Delay

21 Delay


23 Divergence
Based on the Kolmogorov-Smirnov (KS) statistic, which compares the empirical distribution function (EDF) F_n(x) to a hypothesized CDF F(x).
Recency divergence compares recency values for a set of nodes or edges to the CDF of Uniform(0,1).
Delay divergence compares delay values for a set of edges, or for all (A,B)-gaps, to the CDF of Uniform(0,1).
[Figure: EDF plotted against the CDF, with KS = 0.32]
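As a small illustration of the comparison against Uniform(0,1), the sketch below computes the plain one-sample KS statistic, the largest vertical distance between the EDF of the values and the line F(x) = x. The function name `ks_uniform` is hypothetical. The MCD example in this deck reports divergence values above 1, so the slides presumably scale or transform the statistic in a way the transcript does not show; this sketch covers only the unscaled statistic.

```python
from typing import Iterable


def ks_uniform(values: Iterable[float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic against the Uniform(0,1) CDF.

    Assumes every value lies in [0, 1] and that at least one value is given.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    # The EDF jumps from i/n to (i+1)/n at xs[i]; check the gap on both sides.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
```

Values bunched near one end of the interval, such as many unusually recent events, give a statistic close to 1, while values spread evenly over [0, 1] give a statistic close to 0.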

24 Streaming Node-Centric Algorithm
Goal: flag times at which a node exhibits anomalous activity (indicated by an unusually high concentration of recent outgoing communication).
Approach: since the recency function is decreasing between consecutive communications, measure the recency divergence at a node only at times at which new activity occurs.

25 The MCD Algorithm (Maximal Component Divergence Algorithm)
Goal: identify subgraphs with correlated behavior.
Recency divergence to find recent anomalous activity.
Delay divergence to identify spheres of influence.
Challenge: how do we overcome the combinatorial explosion?

26 The MCD Algorithm (Maximal Component Divergence Algorithm)
1. Calculate edge weights using the recency or delay function.
2. Gradually decrease the threshold, updating components and divergence values as necessary.
3. Output: disjoint components with max divergence.

θ     Component               Div(C)
0.9   {V1, V2}                2.908
0.75  {V1, V2, V3}            2.723
0.7   {V1, V2, V3}            6.132
0.5   {V4, V5}                1.143
0.3   {V1, V2, V3, V4, V5}    2.380
0.1   {V1, V2, V3, V4, V5}    1.882

[Figure: graph on V1, V2, V3, V4, V5 with edge weights 0.9, 0.75, 0.7, 0.5, 0.3, 0.1, and component divergence labels 2.9, 2.7, 6.1, 1.1, 2.4]
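The three steps can be sketched with a union-find structure that adds edges in order of decreasing weight. This is an illustrative reading of the slide, not the authors' implementation. The function names, the greedy rule used to pick disjoint components, the placeholder divergence `ks_uniform`, and the edge endpoints in `EXAMPLE_EDGES` are all assumptions: the slide gives only the weights and the resulting components. Because the placeholder is an unscaled KS statistic, its values will not match the Div(C) column.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Set, Tuple

Edge = Tuple[str, str, float]                # (node, node, weight)
Record = Tuple[float, FrozenSet[str], float]  # (theta, component, divergence)


def ks_uniform(values: Sequence[float]) -> float:
    """Placeholder divergence: plain KS statistic against Uniform(0,1)."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def find(parent: Dict[str, str], v: str) -> str:
    """Union-find root lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def threshold_sweep(edges: Sequence[Edge],
                    divergence: Callable[[Sequence[float]], float]) -> List[Record]:
    """Lower the threshold edge by edge, recording each updated component."""
    parent: Dict[str, str] = {}
    nodes: Dict[str, Set[str]] = {}      # root -> nodes in its component
    weights: Dict[str, List[float]] = {}  # root -> edge weights in its component
    history: List[Record] = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        for x in (u, v):
            if x not in parent:
                parent[x], nodes[x], weights[x] = x, {x}, []
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            if len(nodes[ru]) < len(nodes[rv]):  # merge smaller into larger
                ru, rv = rv, ru
            parent[rv] = ru
            nodes[ru] |= nodes.pop(rv)
            weights[ru] += weights.pop(rv)
        weights[ru].append(w)
        # Naive: recomputes the divergence from scratch at every step.
        history.append((w, frozenset(nodes[ru]), divergence(weights[ru])))
    return history


def select_disjoint(history: Sequence[Record]) -> List[Record]:
    """Greedy assumption: take components by decreasing divergence, skipping overlaps."""
    chosen: List[Record] = []
    used: Set[str] = set()
    for rec in sorted(history, key=lambda r: r[2], reverse=True):
        if used.isdisjoint(rec[1]):
            chosen.append(rec)
            used |= rec[1]
    return chosen


# Hypothetical endpoints chosen to reproduce the component column of the slide.
EXAMPLE_EDGES: List[Edge] = [
    ("V1", "V2", 0.9), ("V2", "V3", 0.75), ("V1", "V3", 0.7),
    ("V4", "V5", 0.5), ("V3", "V4", 0.3), ("V2", "V5", 0.1),
]
```

Running `threshold_sweep(EXAMPLE_EDGES, ks_uniform)` yields the same sequence of components as the table. Applying `select_disjoint` to the table's own Div(C) values picks {V1, V2, V3} and then {V4, V5}.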

27 Sample Output

MCD    θ     #V(C)  E-frac  %E(C)  %E(G)
14.57  0.07  54     53/212  0.25   0.08
12.84  0.08  32     31/88   0.35   0.08
3.70   0.10  6      5/7     0.71   0.10
2.97   0.18  5      4/4     1.00   0.14
1.91   0.05  7      6/41    0.15   0.04


29 Robustness to Time Scale
Simulation: R-MAT model, 128 vertices, average degree 16.
Inter-arrival times (IATs) for edge activity are sampled from Bounded Pareto distributions, with rate parameter between 10 minutes and 1 week.
Every 5 days, a randomly selected node has anomalous activity at 10x its normal rate.
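For readers who want to reproduce this kind of simulation, the sketch below draws inter-arrival times from a Bounded Pareto distribution by inverting its CDF, F(x) = (1 - (L/x)^α) / (1 - (L/H)^α) for L ≤ x ≤ H. The function names and all parameter values are assumptions for illustration. The slide does not state the shape parameter, and it is not clear from the transcript how its "rate parameter" maps onto the bounds L and H.

```python
import random
from typing import List


def bounded_pareto_inverse_cdf(u: float, alpha: float, low: float, high: float) -> float:
    """Map u in [0, 1] to a Bounded Pareto value in [low, high]."""
    ratio = (low / high) ** alpha
    return low * (1.0 - u * (1.0 - ratio)) ** (-1.0 / alpha)


def sample_event_times(n: int, alpha: float, low: float, high: float,
                       rng: random.Random) -> List[float]:
    """Generate n event timestamps whose gaps are Bounded Pareto distributed."""
    t = 0.0
    times: List[float] = []
    for _ in range(n):
        t += bounded_pareto_inverse_cdf(rng.random(), alpha, low, high)
        times.append(t)
    return times


# Hypothetical parameters: gaps between 10 minutes and 1 week, in seconds.
example_times = sample_event_times(5, alpha=1.5, low=600.0, high=604800.0,
                                   rng=random.Random(0))
```

One way to imitate the anomalous period described on the slide is to divide the sampled gaps by 10 for the affected node during that window, which is again an assumption about how the 10x rate was applied.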

30 Robustness to Time Scale

31 Robustness to Time Scale
Conclusion: while it takes longer for anomalous activity to be recognized at nodes with lower rates, the magnitude of the peak seems to be independent of activity rate but highly correlated with degree.

32 Accuracy and Precision
Simulation: star network, 100 trials with only normal activity and 100 trials including a period of anomalous activity.
ROC curves show accuracy and precision for several methods for distinguishing between the two scenarios.
Conclusion: especially when variability is introduced, our approach outperforms the WtdDeg and Z-Score metrics.

33 Detection Latency
Data: Enron corpus, 1k nodes, 2k edges, 4k timestamps.
Compare our approach with the GraphScope algorithm.
Conclusion: the two algorithms seem to identify similar times of anomalous activity, but our approach based on the REWARDS model has a shorter response time.

34 Anomaly Detection in IP Traffic
Data: LBNL network trace, more than 9 million timestamps during one hour on December 15, 2004.
Compare our approach with total network volume and with “scanning activity” labeled by LBNL analysts.

35 Anomaly Detection in IP Traffic

36 Complexity Analysis
Dataset: Twitter messages, Nov. 2008 to Oct. 2009 (263k nodes, 308k edges, 1.1 million timestamps).
Updates: O(1) per communication.
MCD Algorithm: O(m log m), where m = number of edges; can be approximated in effectively O(m) time.


38 Future Work
Incorporate duration of communication and other node or edge attributes into our model.
Make use of geographical and textual content.
Use gap divergence to infer links, and compare to the approach of Gomez-Rodriguez et al.
Develop a streaming algorithm to identify emerging trends.

39 Acknowledgements
Part of this work was conducted at Lawrence Livermore National Laboratory, under the guidance of Tina Eliassi-Rad. This project is partially supported by a DHS Career Development Grant, under the auspices of CCICADA, a DHS Center of Excellence.
