Supporting Join Queries Talk by: Andy Cooke Collaborators: Alasdair Gray, Lisha Ma, and Werner Nutt Heriot-Watt University.


1 Supporting Join Queries Talk by: Andy Cooke Collaborators: Alasdair Gray, Lisha Ma, and Werner Nutt Heriot-Watt University

2 What queries would users like to ask? (1)
- A continuously executing query that might involve matching tuples across several streams, e.g. “stream to me the average net traffic passing between two ComputingElements (CEs)”.
  - The query needs to specify the age of tuples that can be matched (a “sliding window”),
  - e.g. “consider only tuples no older than 5 min. from now”.
Possibly interesting?
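A sliding-window match across two streams can be sketched as follows. This is only a minimal illustration, not R-GMA code: the function name `window_join`, the stream labels "A" and "B", and the tuple layout are all assumptions made for the example.

```python
from collections import deque

def window_join(events, window_seconds):
    """Join tuples from streams 'A' and 'B' on a shared key within a window.

    events: iterable of (timestamp, stream, key, value), ordered by timestamp.
    Yields (key, value_a, value_b) for pairs whose timestamps differ by at
    most window_seconds. (Hypothetical layout, chosen for illustration.)
    """
    buffers = {"A": deque(), "B": deque()}
    for ts, stream, key, value in events:
        other = "B" if stream == "A" else "A"
        # Evict tuples that have fallen out of the window.
        for buf in buffers.values():
            while buf and ts - buf[0][0] > window_seconds:
                buf.popleft()
        # Match the new tuple against what the other stream still holds.
        for _, other_key, other_value in buffers[other]:
            if other_key == key:
                pair = (value, other_value) if stream == "A" else (other_value, value)
                yield (key, *pair)
        buffers[stream].append((ts, key, value))

# Example: a 5-minute (300 s) window.
events = [
    (0, "A", "ce1", 10),
    (100, "B", "ce1", 20),   # within 300 s of the A tuple: matches
    (500, "B", "ce1", 30),   # the A tuple is now too old: no match
]
matches = list(window_join(events, 300))
```

The window bounds the memory each stream needs, which is why the query has to state it.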

3 What queries would users like to ask? (2)
- A “latest snapshot” query that joins the latest values of keys, e.g. “return all CEs that Steve is allowed to use” (Resource Broker).
  - This query would involve joining tuples from CE tables, VO tables and denied-users tables.
  Probably interesting!
- A “history” query involving self-joins and aggregation, e.g. “what was the growth in net traffic since last week?”
  Possibly interesting?

4 How can R-GMA answer such queries?
Observation:
- If all the relevant tuples are inside one DBMS, then we can pass the query on to that DBMS's query engine. EASY!
- If there is more than one relevant producer, then our mediator probably needs an execution engine. HARD!
In any case, we know that some R-GMA users are defining Archivers and querying these directly. However:
- the local answer may only be a subset of the global answer;
- they may get a wrong answer (if the query involves max, avg, count, etc.).
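The "wrong answer" risk for aggregates can be seen with a small worked example. The readings below are invented for illustration. The (sum, count) combination is a standard way to merge averages and is not something the slides propose.

```python
def average(values):
    return sum(values) / len(values)

# Hypothetical cpuLoad readings held by two separate Archivers.
archiver_a = [0.25, 0.75]
archiver_b = [1.0, 1.0, 1.0, 1.0]

local_avg = average(archiver_a)                # asks one Archiver only
global_avg = average(archiver_a + archiver_b)  # uses every relevant tuple

# Averaging the per-Archiver averages is also wrong when the sizes differ.
naive = average([average(archiver_a), average(archiver_b)])

# Shipping (sum, count) pairs lets a mediator combine averages correctly.
partials = [(sum(v), len(v)) for v in (archiver_a, archiver_b)]
combined = sum(s for s, _ in partials) / sum(n for _, n in partials)
```

Here only `combined` agrees with `global_avg`, while `local_avg` and `naive` both differ from it.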

5 Answering Joins using Archivers
tables: cpuLoad, discspace; condition: country = ‘britain’
Requirements:
- Complete views (I publish everything!).
- “Latest” or “History” query type (so the data is in a database, not a buffer).
- A smart registry (hmm... just need to go to 1 Archiver).
Tuple matching always needs to take place in the same database, and never across databases.
e.g. “SELECT * FROM cpuload c, discspace s WHERE c.site = s.site” can easily be answered using site archivers.
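The check a "smart registry" would need can be sketched roughly as below. The registry layout, the field names (`tables`, `complete`, `view`) and the function `archivers_for_join` are hypothetical and do not describe the actual R-GMA Registry. Conditions are simplified to attribute = value pairs.

```python
def archivers_for_join(registry, tables, condition):
    """Return the names of Archivers that could answer the join alone.

    An Archiver qualifies if it holds every table in the query, publishes a
    complete view, and its own view restriction is implied by the query
    condition (an unrestricted Archiver covers any condition).
    """
    wanted = set(tables)
    result = []
    for name, entry in registry.items():
        holds_all = wanted <= entry["tables"]
        view_ok = all(condition.get(k) == v for k, v in entry["view"].items())
        if holds_all and entry["complete"] and view_ok:
            result.append(name)
    return result

# Hypothetical registry contents.
registry = {
    "uk-archiver": {"tables": {"cpuLoad", "discspace"}, "complete": True,
                    "view": {"country": "britain"}},
    "cpu-only": {"tables": {"cpuLoad"}, "complete": True, "view": {}},
    "fr-archiver": {"tables": {"cpuLoad", "discspace"}, "complete": True,
                    "view": {"country": "france"}},
}
candidates = archivers_for_join(
    registry, ["cpuLoad", "discspace"], {"country": "britain"}
)
```

Only "uk-archiver" qualifies here, so all tuple matching would happen inside that one database.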

6 Problems with Answering Joins using Archivers (1)
- Archivers can’t access the tuples introduced by LatestProducers and DatabaseProducers.
- If a new LatestProducer registers:
  - the Archiver can’t stream from it;
  - the mediator needs to mediate between two producers, but doesn’t have a query engine!

7 Answering Joins using Archivers (2)
Problems:
- Archivers can’t access the tuples introduced by LatestProducers and DatabaseProducers.
  - If a new LatestProducer is registered, the Archiver cannot access these tuples because LatestProducers can’t answer stream queries.
  - Consider an Archiver at some site that pours the tuples from several StreamProducers into a LatestProducer.
  - Therefore the Mediator can’t rely on the Archiver’s query engine to return a complete answer, and so must mediate (hard!).
- What if one Archiver isn’t enough?
  - Consumer.canAnswer()? Consumer.getPlan()? (“you need an Archiver with these declarations”) (“I can’t answer your query, but could answer this sub-query”)
- What if the Archiver disappears before the consumer calls start()?
- Would a “Latest Archiver” be up-to-date enough?
- A new LatestProducer registers; the Archiver can’t stream from it; the mediator needs to mediate between two producers, but doesn’t have a query engine!
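One way to picture what a getPlan()-style call might return is sketched below. The slides only raise Consumer.canAnswer() and Consumer.getPlan() as questions, so the function `get_plan`, its return shapes and the archiver names here are purely hypothetical.

```python
def get_plan(query_tables, archivers):
    """Suggest how a join over query_tables could be answered.

    Returns ("archiver", name) if a single Archiver holds every table,
    ("partial", name, tables) naming the largest sub-query that one
    Archiver could answer, or ("none",) if no Archiver holds any table.
    """
    wanted = set(query_tables)
    best_name, best_tables = None, set()
    for name, tables in archivers.items():
        covered = wanted & set(tables)
        if covered == wanted:
            return ("archiver", name)
        if len(covered) > len(best_tables):
            best_name, best_tables = name, covered
    if best_name is None:
        return ("none",)
    return ("partial", best_name, sorted(best_tables))

# Hypothetical Archivers and the tables they declare.
archivers = {"a1": ["cpuLoad"], "a2": ["cpuLoad", "discspace"]}

full = get_plan(["cpuLoad", "discspace"], archivers)
partial = get_plan(["cpuLoad", "discspace", "netTraffic"], archivers)
```

The "partial" result corresponds to the “I can’t answer your query, but could answer this sub-query” message.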


9 Answering Joins without Archivers
Query Planning and Execution:
- What are the relevant Producers?
- What sub-queries should we send them?
- How should results be combined and operated on? (need a query engine!) Where?
Possible Query Engines:
- MySQL - dump all the data into MySQL … easy!
- Polar Star (Manchester)? … compatibility?
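The "dump all the data into a database" option can be sketched as below. Python's built-in sqlite3 stands in for MySQL purely so the example is self-contained. The column names (`cpu_load`, `free_gb`) and the function `join_via_scratch_db` are assumptions for the example, as is the idea that each producer's sub-query result arrives as a list of rows.

```python
import sqlite3

def join_via_scratch_db(cpuload_rows, discspace_rows):
    """Load sub-query results into a scratch database and join them there."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cpuload (site TEXT, cpu_load REAL)")
    conn.execute("CREATE TABLE discspace (site TEXT, free_gb REAL)")
    conn.executemany("INSERT INTO cpuload VALUES (?, ?)", cpuload_rows)
    conn.executemany("INSERT INTO discspace VALUES (?, ?)", discspace_rows)
    # The scratch database now acts as the query engine for the join.
    rows = conn.execute(
        "SELECT c.site, c.cpu_load, s.free_gb "
        "FROM cpuload c, discspace s "
        "WHERE c.site = s.site ORDER BY c.site"
    ).fetchall()
    conn.close()
    return rows

# Hypothetical results returned by two different producers.
joined = join_via_scratch_db(
    [("edinburgh", 0.5), ("glasgow", 0.9)],
    [("edinburgh", 120.0), ("london", 40.0)],
)
```

Every relevant tuple has to be copied before the join can run, which is where the inefficiency of this approach comes from.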

10 Conclusions
We could support some “global” join queries quite easily:
- when just one Archiver is enough (needs a smarter Registry)
- suggestions could be given when there isn’t one Archiver available (consumer.getPlan())
- and/or ad hoc joins could be answered (inefficiently) by first loading data into MySQL
But:
- what queries do users want to pose?
- shouldn’t we restrict users to using only StreamProducers?

