Massive Streaming Data Analytics: A Case Study with Clustering Coefficients David Ediger Karl Jiang Jason Riedy David A. Bader Georgia Institute of Technology.

Massive Streaming Data Analytics: A Case Study with Clustering Coefficients
David Ediger, Karl Jiang, Jason Riedy, David A. Bader
Georgia Institute of Technology, Atlanta, GA, USA

STINGER Data Structure
Spatio-temporal Interaction Networks and Graphs (STING) Extensible Representation
– General-purpose data structure for dynamic graphs
– Efficient edge insertion and deletion (updates) with concurrent readers (analysis)

STINGER Data Structure
– Array of linked lists, which may contain empty slots left by deleted edges
– Stores additional information not described in the paper
– Efficient updates
– Concurrent reads (no locking)
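The array-of-linked-lists layout can be sketched as follows. This is a minimal Python sketch under stated assumptions, not STINGER's actual implementation: the names DynamicGraph, EdgeBlock, BLOCK_SIZE, and EMPTY are hypothetical, each vertex's list holds fixed-size edge blocks, and a deletion simply marks a slot empty so that a later insertion can reuse it. The additional stored information and the lock-free concurrent reads mentioned on the slide are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

EMPTY = -1      # hypothetical marker for an unused slot
BLOCK_SIZE = 4  # hypothetical number of slots per edge block (small for illustration)


@dataclass
class EdgeBlock:
    """Fixed-size block of neighbor slots, linked to the next block."""
    slots: List[int] = field(default_factory=lambda: [EMPTY] * BLOCK_SIZE)
    next: Optional[EdgeBlock] = None


class DynamicGraph:
    """Array indexed by vertex; each entry heads a linked list of edge blocks."""

    def __init__(self, num_vertices: int) -> None:
        self.heads: List[Optional[EdgeBlock]] = [None] * num_vertices

    def blocks(self, u: int) -> Iterator[EdgeBlock]:
        block = self.heads[u]
        while block is not None:
            yield block
            block = block.next

    def insert(self, u: int, v: int) -> None:
        """Add edge u -> v, reusing the first empty slot if there is one."""
        free: Optional[Tuple[EdgeBlock, int]] = None
        for block in self.blocks(u):
            for i, w in enumerate(block.slots):
                if w == v:
                    return  # edge already present; nothing to do
                if w == EMPTY and free is None:
                    free = (block, i)
        if free is None:
            # Every slot is occupied: prepend a new block to u's list.
            new_block = EdgeBlock(next=self.heads[u])
            self.heads[u] = new_block
            free = (new_block, 0)
        block, i = free
        block.slots[i] = v

    def delete(self, u: int, v: int) -> None:
        """Remove edge u -> v by marking its slot empty; blocks are never freed."""
        for block in self.blocks(u):
            for i, w in enumerate(block.slots):
                if w == v:
                    block.slots[i] = EMPTY
                    return

    def neighbors(self, u: int) -> List[int]:
        """Readers walk every block and skip empty slots."""
        return [w for block in self.blocks(u) for w in block.slots if w != EMPTY]


g = DynamicGraph(6)
for v in range(1, 6):
    g.insert(0, v)  # five edges need two blocks of four slots
g.delete(0, 3)      # leaves an empty slot behind
g.insert(0, 3)      # reuses an empty slot instead of allocating a block
print(sorted(g.neighbors(0)), len(list(g.blocks(0))))  # [1, 2, 3, 4, 5] 2
```

Readers skip empty slots rather than compacting the list, which keeps deletions cheap at the cost of occasionally scanning unused slots.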

Assumptions for parallelism
– A single streaming source supplies the inserts and deletes
– Changes are scattered widely, so batches are sufficiently independent
– Analysis kernels have a small range: a graph change requires access only to local portions of the graph and affects only a small portion of the output
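To make the locality assumption concrete, take clustering coefficients (the case study in this talk) as the analysis kernel. Inserting or deleting an undirected edge {u, v} can change the local clustering coefficients only of u, v, and their common neighbors. The sketch below, with hypothetical helper names affected_vertices and is_conflict_free, computes that set and applies a conservative test: if the updates in a batch touch pairwise disjoint vertex sets, no vertex is written by two updates. This illustrates the assumption; it is not a scheme taken from the slides.

```python
from typing import Dict, Iterable, Set, Tuple

Adjacency = Dict[int, Set[int]]


def affected_vertices(adj: Adjacency, u: int, v: int) -> Set[int]:
    """Vertices whose local clustering coefficient may change when the
    undirected edge {u, v} is inserted or deleted: the endpoints (degree and
    triangle count change) and their common neighbors (triangle count changes)."""
    return {u, v} | (adj.get(u, set()) & adj.get(v, set()))


def is_conflict_free(adj: Adjacency, batch: Iterable[Tuple[int, int]]) -> bool:
    """Conservative test: True if no vertex is affected by two updates in the batch."""
    seen: Set[int] = set()
    for u, v in batch:
        touched = affected_vertices(adj, u, v)
        if touched & seen:
            return False
        seen |= touched
    return True


# Triangle {0, 1, 2}, edge {3, 4}, isolated vertices 5 and 6.
adj: Adjacency = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}, 5: set(), 6: set()}
print(sorted(affected_vertices(adj, 0, 1)))     # [0, 1, 2]
print(is_conflict_free(adj, [(3, 5), (4, 6)]))  # True
print(is_conflict_free(adj, [(0, 1), (2, 5)]))  # False: both touch vertex 2
```

The test is sufficient but not necessary: two insertions that share only a common neighbor both add one triangle to it, and such increments commute if performed atomically.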

Assumptions (continued)

Case Study: Updating Clustering Coefficients
– Clustering coefficients measure the density of closed triangles
– They offer one way of determining whether a graph is a small-world graph
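For reference, one common definition is the local clustering coefficient of a vertex v with degree d_v and t_v triangles through it: c_v = 2 t_v / (d_v (d_v - 1)), taken as 0 when d_v < 2. The Python sketch below, with hypothetical function names, keeps per-vertex triangle counts and updates them locally when an undirected edge {u, v} is inserted: every common neighbor of u and v gains one triangle, and u and v each gain one triangle per common neighbor. It is a straightforward exact approach shown for illustration and is not necessarily the authors' kernel; deletions could be handled symmetrically by decrementing.

```python
from itertools import combinations
from typing import Dict, Set

Adjacency = Dict[int, Set[int]]


def triangle_counts(adj: Adjacency) -> Dict[int, int]:
    """From-scratch count of triangles through each vertex (undirected graph)."""
    tri = {v: 0 for v in adj}
    for v, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                tri[v] += 1  # closed triplet a - v - b
    return tri


def local_cc(adj: Adjacency, tri: Dict[int, int], v: int) -> float:
    """Local clustering coefficient c_v = 2 t_v / (d_v (d_v - 1))."""
    d = len(adj[v])
    return 0.0 if d < 2 else 2.0 * tri[v] / (d * (d - 1))


def insert_edge(adj: Adjacency, tri: Dict[int, int], u: int, v: int) -> None:
    """Insert undirected edge {u, v}, updating only the affected triangle counts."""
    if u == v or v in adj[u]:
        return  # ignore self-loops and duplicate edges
    common = adj[u] & adj[v]  # each common neighbor closes one new triangle
    for w in common:
        tri[w] += 1
    tri[u] += len(common)
    tri[v] += len(common)
    adj[u].add(v)
    adj[v].add(u)


adj: Adjacency = {v: set() for v in range(4)}
tri = triangle_counts(adj)
for u, v in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    insert_edge(adj, tri, u, v)
assert tri == triangle_counts(adj)  # incremental counts match a full recount
print(tri)                          # {0: 1, 1: 1, 2: 1, 3: 0}
print(local_cc(adj, tri, 2))        # 0.3333333333333333
```

Each insertion touches only u, v, and their common neighbors, which is the small-range property assumed for parallelism.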

Bloom filter
– Consider an edge list represented as a bit array with one bit per possible edge (one per potential neighbor vertex), which needs O(n) storage space
– A Bloom filter is a bit array with an arbitrary, smaller number of bits
– A hash function maps a vertex to a specific bit
– A small number of bits means a high collision rate
– To reduce false positives, use k independent hash functions to set multiple bits
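A minimal Bloom filter over vertex IDs might look like the Python sketch below. The hash family h_i(x) = ((a_i x + b_i) mod P) mod m, the prime P = 2^31 - 1, the seeded random coefficients, and the helpers build_filter and approx_common_neighbors are all assumptions for illustration. The last function shows one way such a filter could stand in for an exact neighbor list when counting common neighbors: a filter never reports a false negative, so the count can only be inflated by false positives.

```python
import random
from typing import Iterable, Iterator

P = 2_147_483_647  # the prime 2**31 - 1, used as the hash modulus (assumption)


class BloomFilter:
    """m-bit Bloom filter over non-negative vertex IDs using k hash functions
    h_i(x) = ((a_i * x + b_i) mod P) mod m with seeded random coefficients."""

    def __init__(self, m: int, k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.m = m
        self.bits = bytearray((m + 7) // 8)
        self.coeffs = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(k)]

    def _positions(self, x: int) -> Iterator[int]:
        for a, b in self.coeffs:
            yield ((a * x + b) % P) % self.m

    def add(self, x: int) -> None:
        for pos in self._positions(x):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, x: int) -> bool:
        """False means x was never added; True may be a false positive."""
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(x))


def build_filter(neighbors: Iterable[int], m: int, k: int, seed: int = 0) -> BloomFilter:
    bf = BloomFilter(m, k, seed)
    for w in neighbors:
        bf.add(w)
    return bf


def approx_common_neighbors(nbrs_u: Iterable[int], filter_v: BloomFilter) -> int:
    """Upper bound on |N(u) & N(v)|: false positives can only inflate the count."""
    return sum(1 for w in nbrs_u if filter_v.might_contain(w))


n_v = [1, 2, 3, 10, 20]
n_u = [2, 3, 4, 5, 6]
f_v = build_filter(n_v, m=64, k=3)
print(all(f_v.might_contain(w) for w in n_v))  # True: no false negatives
print(approx_common_neighbors(n_u, f_v))       # at least 2, the exact count
```

The choice of m and k trades memory against the false-positive rate: with fewer bits, distinct vertices collide more often, as the slide notes.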

Bloom filter (continued)

Testbed
Massively multithreaded Cray XMT
– 64 Threadstorm processors, each running at 500 MHz
– Each processor has 128 hardware streams, each maintaining a thread context
– Context switches occur every cycle
– 512 GiB of globally addressable shared memory (holds 2 billion vertices and 17 billion edges)
Synthetic data
– 16 million vertices, ~500 million edges
