On the role of Interactivity and Data Placement in Big Data Analytics Srini Parthasarathy OSU.


1 On the role of Interactivity and Data Placement in Big Data Analytics Srini Parthasarathy OSU

2 The Data Deluge: Data, Data Everywhere

3 Data Storage is Cheap: $600 buys a disk drive that can store all of the world's music [McKinsey Global Institute Special Report, June 11]

4 Data does not exist in isolation.

5 Data almost always exists in connection with other data – an integral part of the value proposition.

6 Social networks, protein interactions, Internet, VLSI networks, data dependencies, neighborhood graphs

7 Big Data Problem: All this data is only useful if we can scalably extract knowledge from such complex data

8 THIS TALK
– THE ROLE OF DATA PLACEMENT IN BIG DATA SYSTEMS
– THE ROLE OF VISUALIZATION AND INTERACTION IN BIG DATA ANALYSIS

9 GLOBAL GRAPHS

10 What? – System for deploying applications that process complex data
Why? – Seeks a balance between high productivity and high performance
How? – Built on top of PNL's GlobalArrays
– Trees (GlobalTrees, GlobalForests)
– Relational Arrays (ArrayDB-GA)
– Graphs (GlobalGraphs)
Data placement is key to high performance

11 Importance of Data Placement
Locality
– Placing related items close to each other so they may be processed together
Mitigating Impact of Data Skew
– Reducing load imbalance in a parallel setting
– Reducing variance in partition samples
Generating Stratified Samples
– Improving interactive performance

12 Key Ideas
Pivotization
– Convert data with complex structure into sets
– Each element of a set captures features of local topology
Hashing into Strata: hash related sets into similar bins
– Can employ a sketch-clustering algorithm
Partitioning: place strata into partitions for
– Locality
– Mitigating data skew
– Samples
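
The first two ideas on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the talk's implementation: the pivot choice (one undirected, labeled edge per pivot), the helper names `pivotize`, `minhash_sketch` and `hash_into_strata`, and the use of BLAKE2 as the hash family are all hypothetical. Grouping records whose sketches are identical is the simplest stand-in for the sketch-clustering step, which would also group sketches that are merely similar.

```python
import hashlib
from collections import defaultdict


def stable_hash(item, seed):
    """Deterministic 64-bit hash of `item` under hash function number `seed`."""
    data = f"{seed}:{item!r}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def pivotize(edges):
    """Turn a record (a list of labeled edges) into a set of pivots.

    Assumption: each pivot is one undirected edge, i.e. a tiny piece of
    local topology. Records are assumed to be non-empty.
    """
    return frozenset(tuple(sorted(edge)) for edge in edges)


def minhash_sketch(pivots, num_hashes=4):
    """Minwise-hash a pivot set: keep the minimum value under each hash function."""
    return tuple(
        min(stable_hash(p, seed) for p in pivots) for seed in range(num_hashes)
    )


def hash_into_strata(records, num_hashes=4):
    """Group record ids whose sketches are identical (a simplified stratum)."""
    strata = defaultdict(list)
    for record_id, edges in records.items():
        strata[minhash_sketch(pivotize(edges), num_hashes)].append(record_id)
    return dict(strata)


if __name__ == "__main__":
    records = {
        "r1": [("A", "B"), ("A", "C")],
        "r2": [("C", "A"), ("B", "A")],  # same structure as r1, listed differently
        "r3": [("X", "Y"), ("Y", "Z")],
    }
    for sketch, members in hash_into_strata(records).items():
        print(members, sketch)
```

Two pivot sets agree on any one minwise hash with probability equal to their Jaccard similarity, which is why records with similar local structure tend to land in the same or nearby strata.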

13 [Figure: data placement pipeline. Data records (Δ1 ... Δ25) undergo pivot transformations into pivot sets (PS-1 ... PS-25); minwise hashing on the pivot sets produces sketches (SK-1 ... SK-25), e.g. {1050, 2020, 3130, 1800} for SK-1 and {1050, 2020, 7225, 2020} for SK-25; SketchSort or SketchCluster groups the (Δ, SK) pairs into strata (S-1 ... S-128); partitioning and replication place the strata into partitions (P-1 ... P-8).]
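
The final stage of this pipeline, placing strata into partitions, can be illustrated with two contrasting policies. This is a hedged sketch only: the function names, the greedy largest-first rule and the round-robin dealing are assumptions chosen for illustration, and replication (shown in the figure) is not modeled.

```python
def place_for_locality(strata, num_partitions):
    """Keep every stratum whole so related records are processed together.

    Assumed policy: take strata from largest to smallest and give each one
    to the currently least-loaded partition, which also limits load imbalance.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key in sorted(strata, key=lambda k: len(strata[k]), reverse=True):
        target = min(partitions, key=len)
        target.extend(strata[key])
    return partitions


def place_for_stratified_samples(strata, num_partitions):
    """Deal the members of each stratum round-robin across partitions.

    Every partition then holds a near-equal share of every stratum, so each
    partition acts as a stratified sample of the whole data set.
    """
    partitions = [[] for _ in range(num_partitions)]
    slot = 0
    for key in sorted(strata):
        for record_id in strata[key]:
            partitions[slot % num_partitions].append(record_id)
            slot += 1
    return partitions


if __name__ == "__main__":
    strata = {
        "S-1": [1, 2, 3, 4, 5, 6],
        "S-2": [7, 8, 9],
        "S-3": [10, 11, 12],
    }
    print(place_for_locality(strata, 2))
    print(place_for_stratified_samples(strata, 2))
```

The two policies pull in opposite directions: the first favors locality, the second favors partitions that each resemble the full data, which is what keeps per-partition estimates low in variance.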

14 Frequent Tree Mining: our proposed approaches show 100X gains

15 WebGraph Compression: linear scale-up with no loss in compression ratio

16 PRISM-HD - PRobing the Intrinsic Structure and Makeup of High-dimensional Data

17 Visualization and Interactivity are key to discovery

18 PRISM-HD
What? – A novel mechanism for exploring complex data
Why? – The user is often overwhelmed by the characteristics of the data
– Befuddled about where to start
How? – Given a similarity measure of interest
– Compute the similarity graph at threshold (t)
Key: graphs are dimensionless
– Provide the user with graph visualization cues
User determines the next threshold and repeats
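
The threshold loop described on this slide can be sketched as follows. This is an illustrative sketch, not PRISM-HD itself: Jaccard similarity over sets is an assumed measure of interest, the all-pairs comparison is a brute-force stand-in, and the cues reported (edge count, density, number of connected components) are hypothetical examples of what a user might look at before choosing the next threshold.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (the assumed measure of interest)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_graph(items, threshold, similarity=jaccard):
    """Edges between all pairs of items whose similarity is at least `threshold`."""
    return {
        (u, v)
        for u, v in combinations(sorted(items), 2)
        if similarity(items[u], items[v]) >= threshold
    }


def count_components(nodes, edges):
    """Number of connected components, computed with a small union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})


def graph_cues(items, edges):
    """Simple summary cues for one threshold."""
    n = len(items)
    possible = n * (n - 1) / 2
    return {
        "edges": len(edges),
        "density": len(edges) / possible if possible else 0.0,
        "components": count_components(items, edges),
    }


if __name__ == "__main__":
    items = {
        "a": {1, 2, 3, 4},
        "b": {1, 2, 3, 4},
        "c": {1, 2, 5, 6},
        "d": {7, 8},
    }
    # In an interactive session the user would pick each next threshold
    # after inspecting the cues; here a fixed sequence stands in for that.
    for t in (0.9, 0.3):
        print(t, graph_cues(items, similarity_graph(items, t)))
```

Lowering the threshold can only add edges, so each step refines the previous graph rather than replacing it.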

19 [Figure panels: high threshold, moderate threshold, low threshold]

20 Benefits of Knowledge Caching

21 Benefits of Incremental Processing on Twitter: incremental estimates on Twitter, t_1 = 0.95

22 PRISM-HD and Global Graphs in Context: Leveraging Social Media in Emergency Response

23 Concluding Remarks
Data is everywhere
Data is fraught with complexities
– Dimensionality, dynamics, structure, massive…
Both data placement and data interactivity have an important role to play in big data analytics
– PRISM-HD and GlobalGraphs can help!

24 Thanks for your attention
Contact:
[Images: mining simulation data; medical image analysis; protein interaction network (yeast)]
Acknowledgements: Various NSF, NIH, DOE and industry grants

