Massive Streaming Data Analytics: A Case Study with Clustering Coefficients David Ediger Karl Jiang Jason Riedy David A. Bader Georgia Institute of Technology.


1 Massive Streaming Data Analytics: A Case Study with Clustering Coefficients
David Ediger, Karl Jiang, Jason Riedy, David A. Bader
Georgia Institute of Technology, Atlanta, GA, USA

2 STINGER Data Structure
Spatio-temporal Interaction Networks and Graphs (STING) Extensible Representation
General-purpose data structure for dynamic graphs
Efficient edge insertion/deletion (updates) with concurrent readers (analysis)

3 STINGER Data Structure
Array of linked lists, which may have empty slots (from deleting edges)
Additional stored info not in paper
Efficient updates
Concurrent reads (no locking)
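The "array of linked lists with empty slots" layout can be illustrated with a small sketch. This is not the STINGER implementation: the names `ToyGraph`, `EdgeBlock`, `BLOCK_SIZE`, and `EMPTY` are hypothetical, the block size is arbitrary, and the code is single-threaded, so it shows only the storage layout and slot reuse, not the lock-free concurrent reads mentioned on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

EMPTY = -1       # hypothetical marker for a slot freed by a deletion
BLOCK_SIZE = 4   # hypothetical number of edge slots per block


@dataclass
class EdgeBlock:
    """One node of a vertex's linked list; holds a fixed number of edge slots."""
    slots: List[int] = field(default_factory=lambda: [EMPTY] * BLOCK_SIZE)
    next: Optional["EdgeBlock"] = None


class ToyGraph:
    """Array indexed by source vertex; each entry heads a linked list of blocks."""

    def __init__(self, num_vertices: int) -> None:
        self.heads: List[Optional[EdgeBlock]] = [None] * num_vertices

    def insert_edge(self, u: int, v: int) -> bool:
        """Insert u -> v, reusing the first empty slot. Returns False if present."""
        first_free = None
        last = None
        block = self.heads[u]
        while block is not None:
            for i, w in enumerate(block.slots):
                if w == v:
                    return False              # edge already stored
                if w == EMPTY and first_free is None:
                    first_free = (block, i)   # remember a hole left by a deletion
            last = block
            block = block.next
        if first_free is None:                # no hole found: append a new block
            new_block = EdgeBlock()
            if last is None:
                self.heads[u] = new_block
            else:
                last.next = new_block
            first_free = (new_block, 0)
        target, i = first_free
        target.slots[i] = v
        return True

    def delete_edge(self, u: int, v: int) -> bool:
        """Delete u -> v by marking its slot empty; blocks are never unlinked."""
        block = self.heads[u]
        while block is not None:
            for i, w in enumerate(block.slots):
                if w == v:
                    block.slots[i] = EMPTY
                    return True
            block = block.next
        return False

    def neighbors(self, u: int) -> List[int]:
        """Collect the occupied slots of every block in u's list."""
        out: List[int] = []
        block = self.heads[u]
        while block is not None:
            out.extend(w for w in block.slots if w != EMPTY)
            block = block.next
        return out
```

Deleting an edge only marks its slot empty, so a later insertion can fill the hole without reshaping the list, which is one plausible reading of why updates stay cheap.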

4 Assumptions for parallelism
Single streaming source for inserts/deletes
Changes are scattered widely
 – Batches are sufficiently independent
Analysis kernels have small range
 – Graph change only requires access to local portions and affects small portion of output

5 Assumptions (continued)

6 Case Study: Updating Clustering Coefficients
Clustering coefficients measure the density of closed triangles
One way of determining if a graph is a small-world graph
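As an illustration of what "updating" can mean here, the sketch below uses the standard local clustering coefficient, 2 * T(v) / (d(v) * (d(v) - 1)), where T(v) is the number of triangles through v and d(v) its degree. The slide does not give the formula, so that definition, the adjacency-set representation, and the function names are assumptions for this example rather than the authors' code. The point is that inserting an edge (u, v) only changes triangle counts for u, v, and their common neighbors.

```python
def triangle_counts(adj):
    """Count triangles through each vertex.

    adj maps each vertex to the set of its neighbors (undirected, no self-loops).
    Each triangle through v is seen once from each of v's two other corners,
    hence the division by 2.
    """
    return {v: sum(len(adj[v] & adj[u]) for u in adj[v]) // 2 for v in adj}


def clustering(adj, tri):
    """Local clustering coefficient per vertex: 2*T(v) / (d(v) * (d(v) - 1))."""
    out = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        out[v] = 2.0 * tri[v] / (d * (d - 1)) if d > 1 else 0.0
    return out


def insert_edge(adj, tri, u, v):
    """Insert undirected edge (u, v) and update triangle counts locally."""
    adj.setdefault(u, set())
    adj.setdefault(v, set())
    tri.setdefault(u, 0)
    tri.setdefault(v, 0)
    if u == v or v in adj[u]:
        return  # self-loop or duplicate edge: nothing changes
    common = adj[u] & adj[v]  # every common neighbor closes one new triangle
    tri[u] += len(common)
    tri[v] += len(common)
    for w in common:
        tri[w] += 1
    adj[u].add(v)
    adj[v].add(u)


# Usage: start from the path 0 - 1 - 2, then close the triangle.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
tri = triangle_counts(adj)
insert_edge(adj, tri, 0, 2)
coeffs = clustering(adj, tri)
```

Only the neighbor sets of u and v are read during an insertion, which matches the earlier assumption that a graph change touches a local portion of the graph and a small portion of the output.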

7 Bloom filter
Consider an edge list represented as a bit array (1 bit per edge) => O(n) storage space
Bloom filter is a bit array with an arbitrary, smaller number of bits
A hash function maps a vertex to a specific bit
Small number of bits == high collision rate
To reduce false positives, use k independent hash functions to set multiple bits
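A minimal Bloom filter sketch for a vertex's neighbor set follows. The class name, the parameters `num_bits` and `num_hashes`, and the use of salted SHA-256 digests as the k hash functions are choices made for this example; the slides do not say which hash functions were used.

```python
import hashlib


class BloomFilter:
    """Bit array of num_bits bits, probed by num_hashes salted hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


# Usage: summarize the neighbors of one vertex, then test membership.
neighbors_of_u = BloomFilter(num_bits=256, num_hashes=3)
for w in (3, 17, 42):
    neighbors_of_u.add(w)
is_neighbor = 17 in neighbors_of_u
```

A membership test can report a vertex that was never added, but it never misses one that was, so an intersection computed through such filters can overcount but not undercount.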

8 Bloom filter

9 Testbed
Massively multi-threaded Cray XMT
 – 64 Threadstorm processors
Each running at 500 MHz
Each has 128 hardware streams maintaining a thread context
Context switches occur every cycle
512 GiB globally addressable shared memory
 – (holds 2 billion vertices and 17 billion edges)
Synthetic data
 – 16 million vertices, ~500 million edges

