Engine Issues for Data Stream Processing. Mike Franklin, UC Berkeley. 1st Duodecennial SWiM Meeting, January 9, 2003.


1 Engine Issues for Data Stream Processing
Mike Franklin, UC Berkeley
1st Duodecennial SWiM Meeting, January 9, 2003

2 Panel Goals (Michael J. Franklin)
Identify those key areas where existing database engine technology falls short for supporting data streams.
– Succinct justification for the area (the Oracle talk at CIDR showed how to do a lot of interesting things using tables, standard indexes, and SQL)
– Identification of interesting research areas/open problems
– Road map for progress
To point out possible solutions, non-solutions, or just potential cool things.

3 Panel Structure
Approach: Panelists were asked to identify their #1 concern in engine design.
Panelists (in descending order of distance travelled):
– Alex Buchmann
– Ugur Cetintemel
– Ted Johnson
– Jennifer Widom

4 My #1 Issue(s): Sharing + Adaptivity
Sharing:
– Opportunity: standing queries
  can see and analyze most of the queries as a group
  long-lived queries mean benefits accrue and costs are amortized
– Benefit: scalability
  obvious: avoid duplicate work
  need to keep up with the dataflow; don't want to stall the pipeline (similar to staged db ideas)
  reduce cost of entry for new queries
Adaptivity:
– no stats, dynamic environment, …
– in particular, the query mix and workload intensity continually fluctuate.

5 Common Sub-expressions
Traditional MQO (multi-query optimization) approaches suffer from the same problems as traditional QP (query processing) approaches in streaming environments.
– namely, they are static
Insertion and removal of queries degrade global plan quality over time.
Two approaches:
– YFilter: shared XML filtering
– TelegraphCQ: extreme adaptive QP

6 YFilter: Shared Processing (Yanlei Diao)
XFilter showed how to use an event-based (SAX) parser to drive state transitions for XML filtering.
YFilter uses an NFA-based approach to share work among queries.
[Figure: NFA fragments for the location steps /a, //a, /*, and //*]

7 Combining NFA Fragments
[Figure: four panels, (a) through (d), showing how NFA fragments over the labels a, b, and * are combined]

8 YFilter: NFA Structure Matching
Q1 = /a/b
Q2 = /a/c
Q3 = /a/b/c
Q4 = /a//b/c
Q5 = /a/*/c
Q6 = /a//c
Q7 = /a/*/*/c
Q8 = /a/b/c
[Figure: combined NFA for Q1 to Q8, with accepting states labelled {Q1}, {Q2}, {Q3, Q8}, {Q4}, {Q5}, {Q6}, {Q7}]
Key to scalability is sharing of machine states and processing.
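The sharing idea on this slide can be illustrated with a minimal sketch. It is not YFilter's actual implementation: the class names (`State`, `SharedNFA`, `Matcher`) and the `match` helper are hypothetical, and only the four location-step forms from the earlier slide (/a, //a, /*, //*) are handled. Queries with a common prefix walk the same states, so Q3 and Q8 end in one accepting state, and a SAX parser drives the transitions with a stack of active state sets.

```python
import re
import xml.sax

class State:
    def __init__(self, self_loop=False):
        self.children = {}          # element name or '*' -> State
        self.eps = None             # state entered by an epsilon move for a '//' step
        self.self_loop = self_loop  # True for the '//' state: stays active on any element
        self.accepts = set()        # ids of queries that end in this state

class SharedNFA:
    def __init__(self):
        self.root = State()

    def add_query(self, qid, path):
        state = self.root
        # '/a//b/*' -> [('/', 'a'), ('//', 'b'), ('/', '*')]
        for axis, name in re.findall(r'(//|/)([^/]+)', path):
            if axis == '//':
                if state.eps is None:
                    state.eps = State(self_loop=True)
                state = state.eps
            if name not in state.children:
                state.children[name] = State()   # created once, then shared
            state = state.children[name]
        state.accepts.add(qid)

class Matcher(xml.sax.ContentHandler):
    def __init__(self, nfa):
        super().__init__()
        self.stack = [self._closure({nfa.root})]  # one active set per open element
        self.matched = set()

    @staticmethod
    def _closure(states):
        # One level is enough: a '//' state never has an epsilon move of its own.
        return set(states) | {s.eps for s in states if s.eps is not None}

    def startElement(self, name, attrs):
        nxt = set()
        for s in self.stack[-1]:
            for label in (name, '*'):
                if label in s.children:
                    nxt.add(s.children[label])
            if s.self_loop:
                nxt.add(s)
        nxt = self._closure(nxt)
        for s in nxt:
            self.matched |= s.accepts
        self.stack.append(nxt)

    def endElement(self, name):
        self.stack.pop()   # backtrack to the parent's active states

def match(nfa, xml_bytes):
    handler = Matcher(nfa)
    xml.sax.parseString(xml_bytes, handler)
    return handler.matched

QUERIES = {"Q1": "/a/b", "Q2": "/a/c", "Q3": "/a/b/c", "Q4": "/a//b/c",
           "Q5": "/a/*/c", "Q6": "/a//c", "Q7": "/a/*/*/c", "Q8": "/a/b/c"}

nfa = SharedNFA()
for qid, path in QUERIES.items():
    nfa.add_query(qid, path)
```

With this sketch, `match(nfa, b"<a><b><c/></b></a>")` returns the set Q1, Q3, Q4, Q5, Q6, Q8: the document has no c directly under a (Q2) and is not deep enough for Q7.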

9 The TelegraphCQ Approach
Aggressive adaptivity
– Say no to static dataflows
– Continuous adaptivity
Aggressive sharing
– Beyond common sub-expressions
– Easy addition of new queries
Sharing and Adaptivity: two sides of the same coin! Use a single framework for both.
[Figure: an Eddy over A, B, C, D, plus SteMs (labels D and B)]
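A rough sketch of how SteMs can stand in for a static join operator follows. Everything in it is an assumption made for illustration: the names `SteM` and `TinyEddy`, the dict-based tuples, and above all the fixed build-then-probe routing. A real eddy chooses routes per tuple based on observed behaviour, which this two-source toy does not attempt. What it does show is a join that exists only as two hash-indexed state modules and a router, with no fixed join operator in a plan.

```python
from collections import defaultdict

class SteM:
    """State module: a hash index over the tuples seen from one source."""
    def __init__(self, key):
        self.key = key                    # join attribute for this source
        self.index = defaultdict(list)

    def build(self, tup):
        self.index[tup[self.key]].append(tup)

    def probe(self, value):
        return list(self.index.get(value, []))

class TinyEddy:
    """Hypothetical router: build into the tuple's own SteM, then probe the other."""
    def __init__(self, left, right):
        # left and right are (source name, join attribute) pairs
        self.stems = {left[0]: SteM(left[1]), right[0]: SteM(right[1])}
        self.other = {left[0]: right[0], right[0]: left[0]}

    def route(self, source, tup):
        own = self.stems[source]
        own.build(tup)                                  # remember the tuple
        partner = self.stems[self.other[source]]
        matches = partner.probe(tup[own.key])           # find join partners so far
        return [{**tup, **m} for m in matches]          # concatenated result tuples

eddy = TinyEddy(("B", "b"), ("D", "d"))
first = eddy.route("B", {"b": 30})    # no D tuples yet, so no output
second = eddy.route("D", {"d": 30})   # joins with the stored B tuple
```

Because each arriving tuple is built before it probes, the result is the same whichever source delivers first, which is what lets the routing order change while data is flowing.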

10 Fun with Eddies and SteMs
Q1: select * from A, B, C, D where A.a = B.b and B.b = C.c and C.c = D.d
Q2: select * from B, D where B.b = D.d and B.b > 25
[Figure: an Eddy with SteMs for A, B, C, D, a Grouped Selection Filter, and an Output module]
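The slide only names the Grouped Selection Filter, so the following is a sketch of one way such a filter could work, not a description of TelegraphCQ's code. The class name `GroupedGreaterThanFilter`, its methods, and the second query "Q9" are hypothetical. The idea: all `attr > constant` predicates on one attribute, across many queries, are kept in a single sorted index, so one binary search per tuple finds every query whose predicate holds.

```python
import bisect

class GroupedGreaterThanFilter:
    """Indexes predicates of the form `attr > threshold` from many queries."""
    def __init__(self):
        self._thresholds = []   # sorted constants
        self._qids = []         # query ids, parallel to _thresholds

    def add(self, qid, threshold):
        i = bisect.bisect_left(self._thresholds, threshold)
        self._thresholds.insert(i, threshold)
        self._qids.insert(i, qid)

    def matching(self, value):
        # `value > threshold` holds exactly for thresholds strictly below value,
        # which are the entries before the first threshold >= value.
        i = bisect.bisect_left(self._thresholds, value)
        return set(self._qids[:i])

b_filter = GroupedGreaterThanFilter()
b_filter.add("Q2", 25)   # B.b > 25, from the slide
b_filter.add("Q9", 40)   # hypothetical extra query, for illustration only
```

Adding a new query is one insertion into the index rather than a new operator in the dataflow, which fits the slide deck's point about reducing the cost of entry for new queries.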

11 Dynamic Query Addition
[Figure: TelegraphCQ architecture diagram. Labels shown: TelegraphCQ Front End, Planner, Parser, Listener, Mini-Executor, Catalog, Proxy, Cursor, TelegraphCQ Back End, Modules, Scans, Split, CQEddy, TelegraphCQ Wrapper ClearingHouse, Wrappers, Shared Memory, Buffer Pool, Disk, Query Plan Queue, Eddy Control Queue, Query Result Queues. Legend: Data Tuples, Query + Control, Data + Query. Steps are numbered 1 to 9.]

