Website: Answering Continuous Queries Using Views Over Data Streams Alasdair J G Gray Werner.


Answering Continuous Queries Using Views Over Data Streams
Alasdair J G Gray, Werner Nutt
Website: http://www.macs.hw.ac.uk/~ajgg1/dis4g
Email: ajgg1@macs.hw.ac.uk

Introduction

We have developed techniques for planning the execution of continuous queries posed over a set of distributed data streams. The plan generated for a query produces an answer stream which meets the condition of the query.

A data stream is an append-only data source. An example is a sensor that continually publishes its reading. A continuous query is a query which, once issued, returns all fresh readings which meet the condition of the query. It can be seen as a subscription to data of interest.

Motivation: Grid Monitoring

A Grid is a collection of connected, geographically distributed computational resources belonging to several organisations. The Grid behaves as a single virtual supercomputer. The types of components found in a Grid and their interactions are shown in figure 1.

Components on the Grid require monitoring information about other components of the Grid. For example, the resource broker could be looking for a lightly loaded computing element to process a job, or the User Interface could be running a visualisation tool tracking the progress of a job.

Monitoring data is published about each resource on the Grid, e.g. a computing element publishes data about the number of jobs it is currently processing. This monitoring data can be seen as a stream. These streams are distributed across the Grid. Components must be able to locate and request monitoring data of interest.

Figure 1: The components of the European DataGrid. (Diagram labels: User Interface, Monitoring System, Resource Broker, Logging and Bookkeeping, Replica Catalogue, Computing Element, Storage Element, Computer; interactions: Status Information, Data Transfer, Job Submission, Query, Results.)

R-GMA: A Grid Monitoring System

R-GMA is a Grid monitoring system that has been developed as part of the DataGrid project.
It is an information integration system that provides a virtual database containing information about all the resources of a Grid. The architecture of R-GMA is shown in figure 2. R-GMA consists of:

Schema: provides a vocabulary with which to communicate.
Producers: publish monitoring information and respond to queries.
Consumers: query for monitoring information.
Republishers: query for monitoring information and publish their answers.
Registry: matches consumer requests for information with relevant publishers.

Republishers allow queries to be answered more efficiently. They collect together streams from the producers and make the combined stream available from a single point on the Grid. However, they make query answering more difficult, because tuples can come either directly from producers or from republishers.

Figure 2: The architecture of R-GMA. (Diagram labels: Producer, Republisher, Consumer, Schema, Registry; interactions: Register View, Register Query, Register Query & View (Q=V), Query, Data.)

Query Planning

Within R-GMA, logical reasoning is used to generate a query plan for a continuous query. The reasoning follows three phases and is distributed between the registry and the consumer.

1. For each consumer, the registry generates a list of publishers (producers and republishers) that could potentially contribute answers, i.e. publishers whose condition does not contradict the condition of the query. These are the relevant publishers.
2. The consumer keeps only those publishers that are not strictly covered by another relevant publisher; e.g. in figure 2, two of the producers are strictly covered by the republisher. These are the maximal relevant publishers. They are grouped into equivalence classes based on the tuples made available for the query. We call this a meta query plan.
3. From the meta query plan, the consumer constructs a query plan that contains one publisher from each equivalence class.

The query plans generated are sound, complete, duplicate free (each tuple is delivered once) and weakly ordered (for each stream, tuples appear in the order in which they were originally published).

Query Plan Maintenance

Continuous queries remain active from the point in time when they are created until they are stopped by the consumer. During this period, the set of publishers in the system can change. Therefore, the consumer's query plans must be maintained to reflect the current set of available publishers. There are four cases to consider:

1. A new producer is added to the system.
2. An existing producer is dropped from the system.
3. A new republisher is added to the system.
4. An existing republisher is dropped from the system.

In each case, the registry informs the consumer of the change. The consumer then consults its meta query plan to see if it needs amending. By using the meta query plan approach, we reduce how often a plan must be recalculated when a publisher is added to or removed from the system. The meta query plan holds a list of alternative publishers, so alterations to the executed query plan can be made quickly.
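
The three planning phases and the "publisher dropped" maintenance cases can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not R-GMA's implementation: the Publisher class, its offers field and all function names are hypothetical, and the logical reasoning over query and view conditions is replaced by a precomputed set of stream identifiers that each publisher can supply for the query (an empty set stands for a condition that contradicts the query).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publisher:
    name: str
    # Assumption: the streams this publisher can supply for the query,
    # already restricted to the query condition. Empty = contradiction.
    offers: frozenset


def relevant_publishers(publishers):
    """Phase 1 (registry): keep publishers that could contribute answers."""
    return [p for p in publishers if p.offers]


def meta_query_plan(relevant):
    """Phase 2 (consumer): keep maximal publishers, group equal offers."""
    # A publisher is strictly covered if another offers a proper superset.
    maximal = [p for p in relevant
               if not any(p.offers < q.offers for q in relevant)]
    classes = {}
    for p in maximal:
        classes.setdefault(p.offers, []).append(p)
    return list(classes.values())


def query_plan(meta_plan):
    """Phase 3 (consumer): pick one publisher from each equivalence class."""
    return [cls[0] for cls in meta_plan]


def drop_publisher(publishers, meta_plan, plan, dropped):
    """Maintenance: swap in an alternative if one exists, else replan."""
    remaining = [p for p in publishers if p != dropped]
    for cls in meta_plan:
        if dropped in cls and len(cls) > 1:
            alternatives = [p for p in cls if p != dropped]
            new_meta = [alternatives if c is cls else c for c in meta_plan]
            new_plan = [alternatives[0] if p == dropped else p for p in plan]
            return remaining, new_meta, new_plan
    # No alternative in the meta query plan: recompute from scratch.
    new_meta = meta_query_plan(relevant_publishers(remaining))
    return remaining, new_meta, query_plan(new_meta)


# Hypothetical setup: three producers, two republishers that both merge
# the streams of P1 and P2, and one producer irrelevant to the query.
publishers = [
    Publisher("P1", frozenset({"s1"})),
    Publisher("P2", frozenset({"s2"})),
    Publisher("P3", frozenset({"s3"})),
    Publisher("R1", frozenset({"s1", "s2"})),
    Publisher("R2", frozenset({"s1", "s2"})),
    Publisher("P4", frozenset()),
]
meta = meta_query_plan(relevant_publishers(publishers))
plan = query_plan(meta)
print([p.name for p in plan])  # ['P3', 'R1']

# Dropping R1 only needs a swap to R2; no replanning is required.
publishers, meta, plan = drop_publisher(publishers, meta, plan, plan[1])
print([p.name for p in plan])  # ['P3', 'R2']
```

In this sketch, a swap is safe because every publisher strictly covered by the dropped one is also strictly covered by its equivalent alternative, so the set of maximal relevant publishers is otherwise unchanged. Only when the dropped publisher has no alternative does the consumer fall back to recomputing the plan.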

