Techniques for Visualizing Massive Data Sets


1 Techniques for Visualizing Massive Data Sets
Leilani Battle, Mike Stonebraker

2 Context
[Diagram labels: Visualization System, query, result, Database]
Have a database with lots of data
Want a visual overview (e.g., statistical plots such as scatterplots and heatmaps)
Want visualizations to be interactive (e.g., pan and zoom)

3 Problem
Performance: vis systems don’t scale well for big data, or are turning into databases
Over-plotting: makes visualizations unreadable; waste of time/resources

4 Solution: Resolution Reduction
[Diagram labels: Visualization System, Resolution Reduction Layer, Database; query, modified query, query plan, result, reduced result]
When a query will return too much data, reduce it: aggregate, sample, filter, etc.
“Too big” means the result will slow down the vis system or will cause over-plotting
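
The reduction decision on this slide can be sketched in a few lines. This is a minimal illustration, not ScalaR's implementation: the function names, the `max_rows` budget, and the in-memory `reduce_rows` stand-in for a rewritten database query are all assumptions made for the example. The only idea taken from the slide is that a size estimate (for instance, from the query plan) is compared against a budget and the result is aggregated or sampled when it is too big.

```python
def choose_reduction(estimated_rows, max_rows, method="aggregate"):
    """Pick a reduction for a result the estimate says is too big.

    estimated_rows: hypothetical row estimate, e.g. read from a query plan.
    max_rows: hypothetical budget the vis system can draw comfortably.
    Returns (method, factor); ("none", 1) means run the query unchanged.
    """
    if estimated_rows <= max_rows:
        return ("none", 1)
    # Ceiling division: smallest integer factor that fits the budget.
    factor = -(-estimated_rows // max_rows)
    return (method, factor)


def reduce_rows(rows, factor, method):
    """In-memory stand-in for the modified query (assumes numeric rows)."""
    if method == "sample":
        # Keep every factor-th row.
        return rows[::factor]
    if method == "aggregate":
        # Replace each block of `factor` consecutive rows with its mean.
        blocks = [rows[i:i + factor] for i in range(0, len(rows), factor)]
        return [sum(block) / len(block) for block in blocks]
    return rows
```

In a real deployment the reduction would be pushed into the database as a modified query rather than applied after the fact; the in-memory version is only there to make the two strategies concrete.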

5 ScalaR
Scalable vis system for data exploration
Web front-end
Uses SciDB (www.scidb.org)
Visualizes query results
Performs Resolution Reduction
Speaker note: advertise SciDB (open-source, array-oriented database system; great for scientific applications and scalable machine learning; give the URL scidb.org)

6 Demo of ScalaR

7 Array Browser
Collaboration with:
Brown: Justin DeBrabant, Stan Zdonik, Ugur Cetintemel
Stanford: Zhicheng Liu, Jeff Heer
Google Maps-style exploration experience
Fetches subsets of the data (aka data tiles)
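
To make "data tiles" concrete, here is a small sketch of how a viewport could be mapped to the tiles it overlaps. The fixed square `tile_size`, the integer array coordinates, and the `(row, col)` tile keys are assumptions for illustration; the slides do not describe the Array Browser's actual tiling scheme, and zoom levels are left out.

```python
def tiles_for_viewport(x0, y0, x1, y1, tile_size):
    """Return (row, col) keys of the tiles overlapping a viewport.

    The viewport is the half-open integer box [x0, x1) x [y0, y1) in
    array coordinates. tile_size is a hypothetical fixed tile edge length.
    """
    first_col, last_col = x0 // tile_size, (x1 - 1) // tile_size
    first_row, last_row = y0 // tile_size, (y1 - 1) // tile_size
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]
```

Panning then amounts to calling this again with a shifted box and fetching only the tile keys that are not already held by the client.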

8 Array Browser Example

9 Array Browser Architecture

10 Demo of Array Browser

11 Future Work: Prefetching
Goal: reduce user wait time by prefetching tiles
Cache tiles in the tile buffer
Need algorithms to decide what to prefetch

12 User Behavior Predictor (Seer)
Learn common query sequences from user traces
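
One simple way to learn common sequences from traces is a first-order transition count: for each move, remember which move most often followed it. The sketch below is an assumption about how such a predictor could look, not the Seer algorithm itself; the class name and the move labels used with it are hypothetical.

```python
from collections import Counter, defaultdict


class MovePredictor:
    """Hypothetical first-order model over user moves (e.g. pans and zooms)."""

    def __init__(self):
        # counts[prev][nxt] = how often `nxt` followed `prev` in the traces.
        self.counts = defaultdict(Counter)

    def train(self, trace):
        """Record every consecutive pair of moves in one user trace."""
        for prev, nxt in zip(trace, trace[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, last_move):
        """Return the most frequent follower of last_move, or None if unseen."""
        followers = self.counts.get(last_move)
        if not followers:
            return None
        return followers.most_common(1)[0][0]
```

The predicted move can be turned into a tile to prefetch by applying it to the user's current position.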

13 Statistical Analysis Predictor
Look for statistical similarities in tiles
Try to guess what’s important based on patterns
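
As an illustration of "statistical similarities", the sketch below compares tiles by their value histograms and picks the candidate closest to a tile the user has already looked at. The choice of histograms and histogram intersection is an assumption made for this example; the slides do not say which statistics the predictor uses. Tiles are assumed to be non-empty lists of numbers within `[lo, hi]`.

```python
def histogram(values, bins, lo, hi):
    """Normalized histogram of values over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp v == hi into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def most_similar(reference, candidates, bins=4, lo=0.0, hi=1.0):
    """Return the key of the candidate tile most similar to the reference."""
    ref_hist = histogram(reference, bins, lo, hi)
    return max(
        candidates,
        key=lambda k: similarity(ref_hist, histogram(candidates[k], bins, lo, hi)),
    )
```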

14 Using Multiple Predictors
Run multiple predictors (or experts) in parallel
Compare predictions to the user’s actual behavior
Use predictions from the best-performing expert
The best expert may change over time based on the user’s goals
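
The expert-selection loop on this slide can be sketched as follows. The scoring rule (an exponentially decayed hit count, so that the best expert can change as the user's goals change) is an assumption chosen for the example; the class name, the `decay` parameter, and the expert signature are hypothetical.

```python
class ExpertSelector:
    """Follow whichever expert has predicted the user best recently."""

    def __init__(self, experts, decay=0.9):
        # experts: name -> callable(history) -> predicted next tile key.
        self.experts = experts
        self.decay = decay  # hypothetical forgetting rate for old results
        self.scores = {name: 0.0 for name in experts}
        self.pending = {}

    def predict(self, history):
        """Ask every expert, then return the current best expert's guess."""
        self.pending = {name: fn(history) for name, fn in self.experts.items()}
        best = max(self.scores, key=self.scores.get)
        return self.pending[best]

    def observe(self, actual):
        """Score each expert's last guess against what the user really did."""
        for name, guess in self.pending.items():
            hit = 1.0 if guess == actual else 0.0
            self.scores[name] = self.decay * self.scores[name] + hit
```

Ties are broken by dictionary order here, so the first expert listed is followed until the observations favor another one.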

15 Other Challenges
Lots of interesting problems left to address
Best eviction policy for the tile buffer?
How to share data between multiple users?
More predictors?
Speaker note: explain these bullets. LRU and weighted LRU: lots of work done on these, but it is not clear how LRU works with multiple predictors
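
For reference, plain LRU eviction for a tile buffer can be sketched as below. This is the textbook policy mentioned in the speaker note, not a proposal from the talk; the class name and the `capacity` parameter are hypothetical, and the open question of how to combine eviction with multiple predictors is not addressed here.

```python
from collections import OrderedDict


class LRUTileBuffer:
    """Fixed-capacity tile cache that evicts the least recently used tile."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tiles = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        """Return the cached tile and mark it as recently used, or None."""
        if key not in self.tiles:
            return None
        self.tiles.move_to_end(key)
        return self.tiles[key]

    def put(self, key, tile):
        """Insert or refresh a tile, evicting the oldest one if over capacity."""
        self.tiles[key] = tile
        self.tiles.move_to_end(key)
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)
```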

16 Questions?

17

18 [Figure labels: Gemini, Sagittarius, Dogs, Cats]

19

20 Prefetching Experts
User behavior predictor (Seer): learn common query sequences from user traces
Stats analysis predictor: look for statistical similarities in tiles; try to guess what’s important based on patterns
Speaker note (2 min): describe movement patterns through the data set, and explain how Seer is great for this

