Towards Adaptive Dataflow Infrastructure Joe Hellerstein, UC Berkeley.


1 Towards Adaptive Dataflow Infrastructure Joe Hellerstein, UC Berkeley

2 Online Query Processing: The CONTROL Project (’96-’01)
- Data analysis on massive datasets takes forever
  - No feedback, 100% accuracy
- Challenge: make queries more like image delivery
  - But images are pre-encoded in a progressive format
  - A query is ad hoc
- Solution: Online Aggregation
  - Continuous sampling without replacement
  - New pipelining query processing algorithms with good statistical properties (e.g., Ripple Joins) and user control (Online Reordering, “Juggle”)
  - Estimators and confidence intervals for aggregates
  - Streaming samples, streaming answers
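The running estimate behind online aggregation can be illustrated for AVG. This is a minimal Python sketch, not the CONTROL project's code: it assumes the table fits in a list, visits rows in a random order (a sample without replacement), and reports a CLT-based normal-approximation interval with the standard finite-population correction (1 - n/N). The function name `online_avg` and the default `z = 1.96` (roughly 95% confidence) are illustrative choices.

```python
import math
import random
from typing import Iterator, List, Tuple


def online_avg(values: List[float], z: float = 1.96,
               seed: int = 0) -> Iterator[Tuple[int, float, float]]:
    """Yield (rows_seen, running_mean, ci_half_width) after every row.

    Hypothetical helper for illustration. Shuffling the rows and reading
    them in order is a sample without replacement; the interval is a
    normal approximation with finite-population correction (1 - n/N).
    """
    rows = list(values)
    random.Random(seed).shuffle(rows)
    N = len(rows)
    n, mean, m2 = 0, 0.0, 0.0  # m2: running sum of squared deviations (Welford)
    for x in rows:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n == N:
            half = 0.0  # every row seen: the estimate is exact
        elif n < 2:
            half = math.inf  # one sample says nothing about the spread
        else:
            s2 = m2 / (n - 1)  # sample variance
            half = z * math.sqrt((1 - n / N) * s2 / n)
        yield n, mean, half
```

Each yielded triple is one step of progressive refinement: the half-width starts at infinity, generally narrows as samples stream in, and reaches exactly 0 once every row has been read, at which point the running mean is the true average.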

3

4 Images Are Aggregates

5 Can Do Online “Enumeration” Too: “Potter’s Wheel”

6 Volatility in Streaming Queries: Analogies for Sensors
- Query engines map queries to dataflows
  - Flow graph laid out by a query optimizer (typically on a cluster)
  - Query executor runs the flow
- User priorities change during CONTROL queries
  - Breaks the “compile-then-run” query optimization paradigm
  - Dynamic reordering of commutative tasks: f(g(x)) or g(f(x))?
  - Dynamic reordering of data objects: x₁, x₂, x₃, …
  - Requires dynamic competition among choices: f(x) or f’(x)?
- Volatile networks are similar
  - Hard to predict rates of consumption/production a priori
  - Volatile over time, and queries may run “forever”
- Imagine an interactive user “cockpit” on the sensor net!
  - Added metrics of power and data quality
  - And different kinds of volatility, no doubt
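Dynamic reordering of commutative tasks can be sketched as a router that re-sorts a chain of conjunctive filters for every tuple, using statistics gathered while the query runs. This is a hypothetical, simplified Python illustration: the class `AdaptiveFilterChain` and its greedy "lowest observed pass rate first" policy are assumptions made here. The published Eddies design routes tuples with a lottery scheme that also keeps exploring alternatives, which this sketch omits.

```python
from typing import Callable, Dict, Iterable, List

Predicate = Callable[[int], bool]


class AdaptiveFilterChain:
    """Hypothetical eddy-like router over commutative (ANDed) filters.

    For every tuple, filters are tried in increasing order of their
    observed pass rate, so the filter most likely to reject goes first.
    Statistics are updated as tuples flow, so the order can change
    mid-query. Greedy ordering stands in for Eddies' lottery scheme.
    """

    def __init__(self, filters: Dict[str, Predicate]) -> None:
        self.filters = filters
        # Prior of 1 pass in 2 trials: every filter starts at rate 0.5.
        self.seen = {name: 2 for name in filters}
        self.passed = {name: 1 for name in filters}
        self.evaluations = 0  # total predicate calls made so far

    def pass_rate(self, name: str) -> float:
        return self.passed[name] / self.seen[name]

    def route(self, tuples: Iterable[int]) -> List[int]:
        out = []
        for t in tuples:
            # Re-plan per tuple: most selective filter first.
            for name in sorted(self.filters, key=self.pass_rate):
                self.evaluations += 1
                self.seen[name] += 1
                if self.filters[name](t):
                    self.passed[name] += 1
                else:
                    break  # rejected; remaining filters are skipped
            else:
                out.append(t)  # no filter rejected the tuple
        return out
```

Because the filters commute, the output is the same under any ordering; only the work changes. For example, with an `x % 2 == 0` filter and an `x % 7 == 0` filter over 0..999, the router soon learns to try the divisibility-by-7 test first and makes fewer predicate calls than a fixed evenness-first order, which needs 1,500.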

7 Adaptive Dataflow: Convergence of DBs/Nets
- The idea from two angles
  - Queries are flows; query optimization is routing
    - Sensor queries need nets-style adaptivity
  - New networking software looks like a query engine: Click, Scout, and also CANs
    - Sensor queries need DB-style semantic optimization (up to the app)
- Telegraph: An Adaptive Dataflow System
  - Boxes & Arrows dataflow programming
  - Adaptive reoptimization of the flow graph (Eddies)
  - Adaptive prioritization of the delivery (Juggle)
  - Adaptive load balancing and fault tolerance across nodes (FLuX)
  - Mix push and pull to blend streams and pools (Fjords)
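Adaptive prioritization of delivery can be illustrated with a buffer that reorders items by a user-assigned group priority that may change mid-query. This hypothetical Python sketch is not the Juggle operator itself: the names `JuggleBuffer`, `put`, `set_priority`, and `get` are illustrative, and it delivers items so that each group's share of output tracks its priority. A real implementation must also cope with buffer overflow (for example, by spilling to disk), which is omitted here.

```python
from collections import deque
from typing import Any, Deque, Dict, Hashable, Optional, Tuple


class JuggleBuffer:
    """Hypothetical priority-driven reordering buffer.

    Items arrive tagged with a group; get() hands out items so that each
    group's share of deliveries tracks its priority. Priorities may be
    changed at any time; a priority of 0 pauses a group.
    """

    def __init__(self) -> None:
        self.queues: Dict[Hashable, Deque[Any]] = {}
        self.priority: Dict[Hashable, float] = {}
        self.delivered: Dict[Hashable, int] = {}

    def _register(self, group: Hashable) -> None:
        self.queues.setdefault(group, deque())
        self.priority.setdefault(group, 1.0)  # default priority
        self.delivered.setdefault(group, 0)

    def put(self, group: Hashable, item: Any) -> None:
        self._register(group)
        self.queues[group].append(item)

    def set_priority(self, group: Hashable, p: float) -> None:
        self._register(group)
        self.priority[group] = p

    def get(self) -> Optional[Tuple[Hashable, Any]]:
        ready = [g for g, q in self.queues.items() if q and self.priority[g] > 0]
        if not ready:
            return None  # nothing deliverable right now
        # Deliver from the group furthest behind its priority-weighted share.
        g = min(ready, key=lambda k: self.delivered[k] / self.priority[k])
        self.delivered[g] += 1
        return g, self.queues[g].popleft()
```

With groups `a` and `b` holding equal backlogs and `a` raised to priority 2.0 against the default 1.0, the first six deliveries contain four items from `a` and two from `b`; within a group, items still come out in arrival order.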

8 Extra Slides on Telegraph

9 Telegraph Apps to Date
- Web queries: Election 2000, http://fff.cs.berkeley.edu
- Enhanced P2P functionality
  - Query by album or artist, via joins with web data
  - Working on pure P2P query processing
- Initial sensor app
  - Join I-80 traffic movement with webcams and incidents
  - Smart Dust mote simulations
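Joins against web sources like these must produce results while slow inputs are still arriving. A standard pipelining operator for equijoins is the symmetric hash join, closely related to the hash variant of the ripple join: read one row from each side in turn, insert it into that side's hash table, and probe the other side's table. The Python function below is an illustrative sketch under that assumption, with inputs modeled as iterables of (key, payload) pairs; the name `symmetric_hash_join` is hypothetical and this is not Telegraph's implementation.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Any, Dict, Hashable, Iterable, Iterator, List, Tuple

_MISSING = object()  # marks an exhausted input in zip_longest


def symmetric_hash_join(
    left: Iterable[Tuple[Hashable, Any]],
    right: Iterable[Tuple[Hashable, Any]],
) -> Iterator[Tuple[Hashable, Any, Any]]:
    """Equijoin two (key, payload) streams, emitting matches as soon as they exist.

    Sides are read alternately, one row at a time. Each new row is stored
    in its own side's hash table and probed against the other side's
    table, so neither input has to be fully read before output begins.
    """
    seen_left: Dict[Hashable, List[Any]] = defaultdict(list)
    seen_right: Dict[Hashable, List[Any]] = defaultdict(list)
    for l_row, r_row in zip_longest(left, right, fillvalue=_MISSING):
        if l_row is not _MISSING:
            key, payload = l_row
            seen_left[key].append(payload)
            for other in seen_right.get(key, []):
                yield key, payload, other
        if r_row is not _MISSING:
            key, payload = r_row
            seen_right[key].append(payload)
            for other in seen_left.get(key, []):
                yield key, other, payload
```

Every matching pair is emitted exactly once, when the later of its two rows arrives, so the first result can appear after reading a single row from each side.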

10 Telenap: Amazon Meets Napster

11 Movie Stars Who Donated to Bush

12 Query >> Search: http://fff.cs.berkeley.edu
- “Federated Facts and Figures”
- Yahoo join FECInfo

13 Query >> Search: http://fff.cs.berkeley.edu
- “Federated Facts and Figures”
- APBNews join FECInfo

