Engine Issues for Data Stream Processing. Mike Franklin, UC Berkeley. 1st Duodecennial SWiM Meeting, January 9, 2003.

Michael J. Franklin 2
Panel Goals
Identify those key areas where existing database engine technology falls short for supporting data streams.
– Succinct justification for the area (the Oracle talk at CIDR showed how to do a lot of interesting things using tables, standard indexes, and SQL)
– Identification of interesting research areas/open problems
– Road map for progress
To point out possible solutions, non-solutions, or just potentially cool things.

Michael J. Franklin 3
Panel Structure
Approach: panelists were requested to identify their #1 concern in engine design.
Panelists (in descending order of distance travelled):
– Alex Buchmann
– Ugur Cetintemel
– Ted Johnson
– Jennifer Widom

Michael J. Franklin 4
My #1 Issue(s): Sharing + Adaptivity
Sharing:
– Opportunity: standing queries
    can see and analyze most of the queries as a group
    long-lived queries mean benefits accrue, costs are amortized
– Benefit: scalability
    obvious: avoid duplicate work
    need to keep up with the dataflow: don't want to stall the pipeline (similar to staged DB ideas)
    reduce cost of entry for new queries
Adaptivity:
– no stats, dynamic environment, ...
– in particular, the query mix and workload intensity continually fluctuate.

Michael J. Franklin 5
Common Sub-expressions
Traditional multi-query optimization (MQO) approaches suffer from the same problems as traditional query processing (QP) approaches in streaming environments.
– namely, they are static
Insertion and removal of queries degrade global plan quality over time.
Two approaches:
– YFilter: shared XML filtering
– TelegraphCQ: extreme adaptive QP

Michael J. Franklin 6
YFilter: Shared Processing (Yanlei Diao)
XFilter showed how to use an event-based (SAX) parser to drive state transitions for XML filtering.
YFilter uses an NFA-based approach to share work among queries.
[Figure: NFA fragments for the four location steps /a, //a, /*, //*; transition labels a, *, ε]

Michael J. Franklin 7
Combining NFA Fragments
[Figure: four panels (a) to (d) showing pairs of NFA fragments being combined; transition labels a, b, *, ε]

Michael J. Franklin 8
YFilter: NFA Structure Matching
Q1=/a/b
Q2=/a/c
Q3=/a/b/c
Q4=/a//b/c
Q5=/a/*/c
Q6=/a//c
Q7=/a/*/*/c
Q8=/a/b/c
[Figure: combined NFA for Q1 to Q8; transition labels a, b, c, *, ε; accepting states {Q1}, {Q2}, {Q3, Q8}, {Q4}, {Q5}, {Q6}, {Q7}]
Key to scalability is sharing of machine states and processing.
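The shared-NFA idea on this slide can be illustrated with a small sketch. It is not YFilter's actual implementation: the class and function names (State, SharedNFA, FilterHandler, run_filter) are hypothetical, only child (/) and descendant (//) steps over element names and * are handled, and paths are assumed to start with / or //. Queries with a common prefix reuse the same states, and SAX start/end events push and pop the set of active states.

```python
import re
import xml.sax


class State:
    """One NFA state; states are shared by all queries with the same prefix."""
    def __init__(self):
        self.trans = {}         # element name or "*" -> State
        self.eps = None         # epsilon edge to the self-looping state used for "//"
        self.self_loop = False  # True for the "//" state: it stays active at any depth
        self.accepts = []       # ids of the queries accepted in this state


class SharedNFA:
    def __init__(self):
        self.start = State()

    def add_query(self, qid, path):
        s = self.start
        # Split "/a//b/*" into [("/", "a"), ("//", "b"), ("/", "*")].
        for axis, name in re.findall(r"(//|/)([^/]+)", path):
            if axis == "//":
                if s.eps is None:
                    s.eps = State()
                    s.eps.self_loop = True
                s = s.eps
            s = s.trans.setdefault(name, State())  # reuse an existing edge if present
        s.accepts.append(qid)


def closure(states):
    """Add the epsilon successor of every state in the set."""
    return set(states) | {s.eps for s in states if s.eps is not None}


class FilterHandler(xml.sax.ContentHandler):
    def __init__(self, nfa):
        super().__init__()
        self.stack = [closure({nfa.start})]  # one active-state set per open element
        self.matched = set()

    def startElement(self, name, attrs):
        nxt = set()
        for s in self.stack[-1]:
            for label in (name, "*"):
                t = s.trans.get(label)
                if t is not None:
                    nxt.add(t)
            if s.self_loop:
                nxt.add(s)
        nxt = closure(nxt)
        for s in nxt:
            self.matched.update(s.accepts)
        self.stack.append(nxt)

    def endElement(self, name):
        self.stack.pop()  # backtrack to the parent's active states


def run_filter(queries, xml_bytes):
    """Return the ids of all queries matched by at least one element."""
    nfa = SharedNFA()
    for qid, path in queries.items():
        nfa.add_query(qid, path)
    handler = FilterHandler(nfa)
    xml.sax.parseString(xml_bytes, handler)
    return handler.matched


QUERIES = {"Q1": "/a/b", "Q2": "/a/c", "Q3": "/a/b/c", "Q4": "/a//b/c",
           "Q5": "/a/*/c", "Q6": "/a//c", "Q7": "/a/*/*/c", "Q8": "/a/b/c"}
```

Calling run_filter(QUERIES, b"<a><b><c/></b></a>") runs all eight queries from the slide in a single pass over the document. Q3 and Q8 are the same path, so they end in the same accepting state, matching the {Q3, Q8} label in the figure.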

Michael J. Franklin 9
The TelegraphCQ Approach
Aggressive adaptivity
– Say no to static dataflows
– Continuous adaptivity
Aggressive sharing
– Beyond common sub-expressions
– Easy addition of new queries
[Figure: an Eddy over A, B, C, D combined with SteMs; labels A, B, C, D, Eddy, SteMs]
Sharing and adaptivity: two sides of the same coin! Use a single framework for both.
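To make "say no to static dataflows" concrete, here is a minimal sketch of an eddy-style router for selection operators. It is an illustration under stated assumptions, not TelegraphCQ code: the Eddy class, its method names, and the greedy "lowest observed pass rate first" policy are all hypothetical stand-ins for whatever routing policy a real eddy uses. The point is only that the operator order is chosen per tuple from running statistics rather than fixed in a plan.

```python
class Eddy:
    """Routes each tuple through all operators, choosing the order adaptively."""

    def __init__(self, operators):
        # operators: dict mapping a name to a predicate (tuple -> bool)
        self.ops = operators
        self.seen = {name: 0 for name in operators}    # tuples routed to each operator
        self.passed = {name: 0 for name in operators}  # tuples each operator let through

    def _pass_rate(self, name):
        # Operators with no history are assumed to pass everything.
        if self.seen[name] == 0:
            return 1.0
        return self.passed[name] / self.seen[name]

    def route(self, tup):
        """Return True if the tuple survives every operator."""
        done = set()
        while len(done) < len(self.ops):
            # Assumed policy: try the operator most likely to drop the tuple first.
            pending = [n for n in self.ops if n not in done]
            name = min(pending, key=self._pass_rate)
            self.seen[name] += 1
            if not self.ops[name](tup):
                return False
            self.passed[name] += 1
            done.add(name)
        return True


def run_eddy():
    eddy = Eddy({
        "positive": lambda x: x > 0,            # passes everything in this stream
        "multiple_of_10": lambda x: x % 10 == 0  # drops most tuples
    })
    results = [x for x in range(1, 101) if eddy.route(x)]
    return eddy, results
```

In run_eddy the router learns after the first tuple that multiple_of_10 is the more selective operator and sends later tuples there first, so the positive check runs far less often. The output is the same as with any fixed order; only the work done changes.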

Michael J. Franklin 10
Fun with Eddies and SteMs
Q1: select * from A, B, C, D where A.a = B.b and B.b = C.c and C.c = D.d
Q2: select * from B, D where B.b = D.d and B.b > 25
[Figure: an Eddy fed by A, B, C, D, connected to SteMs for A, B, C, D and a Grouped Selection Filter, producing Output]
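One plausible reading of the Grouped Selection Filter box is an index over the selection predicates of many queries on the same attribute, so that a tuple is checked against all of them at once. The sketch below assumes that reading and handles only predicates of the form attr > constant, such as B.b > 25 from Q2. The class name and the extra queries Q9 and Q10 in the usage are hypothetical.

```python
import bisect


class GroupedGreaterThanFilter:
    """Indexes predicates 'attr > c' from many queries on one attribute."""

    def __init__(self):
        self._consts = []  # predicate constants, kept sorted
        self._qids = []    # query id at the same position as its constant

    def add_query(self, qid, constant):
        i = bisect.bisect_right(self._consts, constant)
        self._consts.insert(i, constant)
        self._qids.insert(i, qid)

    def matching_queries(self, value):
        """Return the ids of all queries whose predicate 'value > c' holds."""
        # Every constant strictly below value is satisfied; one binary search finds them.
        k = bisect.bisect_left(self._consts, value)
        return set(self._qids[:k])


def demo_grouped_filter():
    f = GroupedGreaterThanFilter()
    f.add_query("Q2", 25)   # B.b > 25, from the slide
    f.add_query("Q9", 10)   # hypothetical extra query
    f.add_query("Q10", 40)  # hypothetical extra query
    return f
```

With this filter, demo_grouped_filter().matching_queries(30) evaluates three predicates with one binary search instead of three comparisons. Adding a new query is a single sorted insert, which fits the slide's theme of cheap query addition.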

Michael J. Franklin 11
Dynamic Query Addition
[Figure: TelegraphCQ architecture with numbered steps 1 to 9. Labels: TelegraphCQ Front End, Listener, Parser, Planner, Mini-Executor, Catalog, Proxy, Cursor; TelegraphCQ Back End, Modules, Split, Scans, CQEddy; TelegraphCQ Wrapper ClearingHouse, Wrappers; Shared Memory, Buffer Pool, Query Plan Queue, Eddy Control Queue, Query Result Queues; Disk. Legend: Data Tuples, Query + Control, Data + Query]
