
1 A Data Stream Publish/Subscribe Architecture with Self-adapting Queries
Alasdair J G Gray and Werner Nutt
School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh
4th November 2005

2 Overview
4th Nov 2005, A.J.G. Gray and W. Nutt, CoopIS 2005
• Motivation
• Publish/subscribe architecture
• Answering a query
• Long-lived query plans
• Switching between data sources

3 Motivation
Scenario:
• Streams generated by distributed sensors
• Users are also distributed
• Use data integration to match users to streams
For example,
• Grid monitoring for logging and bookkeeping
• Sensor networks
[Figure: Grid, job progress, bookkeeping, monitoring data]

4 R-GMA: A Grid Monitoring System
• Grid monitoring system that integrates streams of data
• Deployed on several Grids
• Continuing to be developed as part of the EGEE project
• We are developing innovative extensions for R-GMA

5 Publishing Monitoring Data
• Data can be represented in terms of relations with
  - Keys: “what” and “where”
  - Measurements: the “value”
  - Timestamps: “when”
For example, Network ThroughPut
  NTP(from,to,tool,psize,latency,timestamp)
• One reading is a tuple in the relation
  ('hw','ral','ping',32,11.1,2005-06-24-15:05:34)
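
As an illustration only, the NTP reading above could be modelled as a typed record. The field names `from_site` and `to_site`, the `key()` helper, and the choice of which attributes count as keys (from, to, tool, psize) are assumptions made for this sketch, not something the slide states.

```python
from datetime import datetime
from typing import NamedTuple


class NTP(NamedTuple):
    """One network-throughput reading (illustrative field names)."""
    # Keys: "what" and "where" (assumed here: from, to, tool, psize)
    from_site: str   # 'from' is a Python keyword, so it is renamed
    to_site: str
    tool: str
    psize: int
    # Measurement: the "value"
    latency: float
    # Timestamp: "when"
    timestamp: datetime

    def key(self):
        """Key values identifying the channel this reading belongs to."""
        return (self.from_site, self.to_site, self.tool, self.psize)


# The example reading from the slide.
reading = NTP("hw", "ral", "ping", 32, 11.1, datetime(2005, 6, 24, 15, 5, 34))
print(reading.key())
```

Grouping readings by `key()` gives the per-channel view that the later ordering requirements refer to.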

6 Consuming Monitoring Data
• Users are interested in how the grid changes over time. For example,
  1. Latency for large packets sent from hw
  2. Links with a low latency as recorded by the PingER tool
• These can be expressed as SQL selection queries

7 Data Integration in a Publish/Subscribe Architecture
• Local as View Approach
  - Consumers pose a query over the schema to request streams
  - Producers describe their stream using a view on the schema
• Queries and views are selections over a single relation
[Figure: Producers, Registry, Data Streams, Consumers]

8 What is an Answer to a Query?
• Global relations contain no tuples (virtual relation)
• Need to translate into query over sources
• An answer stream should be
  - Sound
  - Complete
  - Duplicate free
  - Weakly ordered: all tuples that share the same key value will be in timestamp order
• Order in general is difficult in a distributed setting
• Weak order sufficient for more complex queries such as aggregates
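
The weak-order and duplicate-free properties can be stated as simple checks over a finite prefix of a stream. This is a minimal sketch: the function names and the `(channel key, timestamp)` tuple layout are hypothetical, chosen only to make the definitions concrete.

```python
def is_weakly_ordered(stream, key, timestamp):
    """True if tuples sharing a key value appear in timestamp order.

    Tuples with different keys may interleave in any order.
    """
    latest = {}  # key value -> most recent timestamp seen so far
    for t in stream:
        k, ts = key(t), timestamp(t)
        if k in latest and ts < latest[k]:
            return False
        latest[k] = ts
    return True


def is_duplicate_free(stream):
    """True if no tuple occurs twice in the stream."""
    seen = set()
    for t in stream:
        if t in seen:
            return False
        seen.add(t)
    return True


# (channel key, timestamp) pairs; names are illustrative.
interleaved = [("hw-ral", 3), ("ral-hw", 1), ("hw-ral", 8)]
out_of_order = [("hw-ral", 8), ("hw-ral", 3)]
```

The `interleaved` stream is not globally sorted by timestamp, yet it is weakly ordered because each channel on its own is in order; `out_of_order` is not.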

9 Query Planning: Consumer Query
• Satisfiability used to find relevant producers
Producers: S1: from='hw' ∧ tool='udp'; S2: from='hw' ∧ tool='ping'; S3: from='ral' ∧ tool='ping'; S4: from='ral' ∧ tool='udp'; S5: from='an' ∧ tool='ping'
Consumer queries: q1: from='hw' ∧ psize≥1024; q2: tool='ping' ∧ latency≤10.0

10 How does the Registry find Relevant Producers?
• Producer views are stored in a structured format
• Satisfiability check can be constructed as an SQL query
  SELECT producers
  WHERE NOT EXISTS
    (SELECT * WHERE contradictory condition);
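
The satisfiability test behind this lookup can be sketched outside SQL. The sketch below is not the R-GMA registry code: it assumes each condition is a conjunction with at most one closed interval per attribute (equality is a one-point interval, `None` means unbounded), and the names `overlaps`, `satisfiable`, `eq`, `VIEWS` and `relevant` are hypothetical. The producer views and queries are the ones used in the talk's running example.

```python
def overlaps(a, b):
    """True if closed intervals a and b share a value (None = unbounded)."""
    lo = max((x for x in (a[0], b[0]) if x is not None), default=None)
    hi = min((x for x in (a[1], b[1]) if x is not None), default=None)
    return lo is None or hi is None or lo <= hi


def satisfiable(c1, c2):
    """True if the conjunction of c1 and c2 can hold for some tuple.

    Assumes c1 and c2 are each satisfiable on their own.
    """
    return all(overlaps(c1[a], c2[a]) for a in c1.keys() & c2.keys())


def eq(value):
    """Equality condition as a one-point interval."""
    return (value, value)


# Producer views from the running example.
VIEWS = {
    "S1": {"from": eq("hw"), "tool": eq("udp")},
    "S2": {"from": eq("hw"), "tool": eq("ping")},
    "S3": {"from": eq("ral"), "tool": eq("ping")},
    "S4": {"from": eq("ral"), "tool": eq("udp")},
    "S5": {"from": eq("an"), "tool": eq("ping")},
}

q1 = {"from": eq("hw"), "psize": (1024, None)}     # from='hw' AND psize>=1024
q2 = {"tool": eq("ping"), "latency": (None, 10.0)}  # tool='ping' AND latency<=10.0


def relevant(query):
    """Names of producers whose view does not contradict the query."""
    return sorted(n for n, view in VIEWS.items() if satisfiable(query, view))


print(relevant(q1), relevant(q2))
```

A producer is dropped only when some shared attribute has contradictory restrictions, which mirrors the NOT EXISTS formulation on the slide.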

11 Scalability is an Issue
Problem: Every consumer contacting every producer of interest does not scale
• Even a small Grid of less than a dozen sites has problems
• Grids may contain thousands of resources
  - For example, Large Hadron Collider Computing Grid (LCG)

12 Republishers Allow the System to Scale
A republisher
• Consumes answers to a selection query
  - Merges "trickles" into streams
• Publishes
  - Answer stream
  - Latest-state answer
  - History
Problem: Choice in where to obtain information
[Figure: Producer S1, Producer S2, Republisher]

13 Query Planning in the Presence of Republishers
• Find all relevant publishers
• Rank according to data provided
• Meta query plan contains choice
• Query plan uses one of R1 or R3
Producers: S1: from='hw' ∧ tool='udp'; S2: from='hw' ∧ tool='ping'; S3: from='ral' ∧ tool='ping'; S4: from='ral' ∧ tool='udp'; S5: from='an' ∧ tool='ping'
Republishers: R1: from='hw'; R2: from='ral'; R3: from='hw' ∧ tool='ping'
Consumer queries: q1: from='hw' ∧ psize≥1024; q2: tool='ping' ∧ latency≤10.0

14 Weak Order is not Guaranteed
• Tuples for same channel
• (3) published before (8)
• Arrive at consumer in wrong order
[Figure labels: producers S1: from='hw' ∧ tool='udp'; S2: from='hw' ∧ tool='ping'; S3: from='ral' ∧ tool='ping'; S4: from='ral' ∧ tool='udp'; republishers latency≤5.0 and latency>5.0; consumer query q2: tool='ping' ∧ latency≤10.0; slow link; tuples (3) and (8)]

15 Generating Well Formed Query Plans
• A publisher is relevant for a global query if
  1. Conditions are satisfiable, and
  2. All measurements that agree on their key values come from the same publisher
• The measurement condition can be checked using entailment.
• Example on slide 13 was well formed.
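
One way to read the measurement condition, assumed here rather than taken from the slide, is that the restriction the query places on measurement attributes must entail the restriction in the publisher's view. The sketch below uses the same interval encoding of conditions as an assumption (one closed interval per attribute, `None` for unbounded); `MEASUREMENT_ATTRS`, `entails`, `measurement_part` and `measurement_entailing` are hypothetical names.

```python
MEASUREMENT_ATTRS = {"latency"}  # assumed: latency is the only measurement


def entails(c, d):
    """True if every tuple satisfying c also satisfies d.

    Conditions map attribute -> (lo, hi) closed interval; None = unbounded.
    Assumes c itself is satisfiable.
    """
    for attr, (dlo, dhi) in d.items():
        clo, chi = c.get(attr, (None, None))
        if dlo is not None and (clo is None or clo < dlo):
            return False
        if dhi is not None and (chi is None or chi > dhi):
            return False
    return True


def measurement_part(cond):
    """Restriction of a condition to the measurement attributes."""
    return {a: iv for a, iv in cond.items() if a in MEASUREMENT_ATTRS}


def measurement_entailing(query, view):
    """True if the query's measurement condition entails the view's."""
    return entails(measurement_part(query), measurement_part(view))


q2 = {"tool": ("ping", "ping"), "latency": (None, 10.0)}
r3 = {"from": ("hw", "hw"), "tool": ("ping", "ping")}  # no measurement restriction
low = {"latency": (None, 5.0)}                          # republisher of latency<=5.0
```

Under this reading `r3` passes for `q2`, while a republisher that keeps only latency≤5.0 fails, because readings of one channel with latency between 5.0 and 10.0 would have to come from elsewhere.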

16 Query Planning in the Presence of Republishers
Producers: S1: from='hw' ∧ tool='udp'; S2: from='hw' ∧ tool='ping'; S3: from='ral' ∧ tool='ping'; S4: from='ral' ∧ tool='udp'; S5: from='an' ∧ tool='ping'
Republishers: R1: from='hw'; R2: from='ral'; R3: from='hw' ∧ tool='ping'
Consumer queries: q1: from='hw' ∧ psize≥1024; q2: tool='ping' ∧ latency≤10.0

17 Plans Need to be Maintained
• Queries are long-lived
• Set of publishers can change
• Query plans should reflect changes
• What happens when we
  - Add a republisher?
  - Remove a republisher?

18 How does a new Republisher affect our Consumers?
• Find consumers for which R4 is relevant
• Compare R4 to publishers in Meta Query Plan
Producers: S1: from='hw' ∧ tool='udp'; S2: from='hw' ∧ tool='ping'; S3: from='ral' ∧ tool='ping'; S4: from='ral' ∧ tool='udp'; S5: from='an' ∧ tool='ping'
Republishers: R1: from='hw'; R2: from='ral'; R3: from='hw' ∧ tool='ping'; R4: TRUE
Consumer queries: q1: from='hw' ∧ psize≥1024; q2: tool='ping' ∧ latency≤10.0

19 General Case of Adding a Republisher
Republisher relevant for a consumer query, either
1. Republisher is not maximal relevant
   - No change in query plans
2. Equivalent Republisher
   - Change to the Meta Query Plan
   - No change to the Query Plan
3. Covering Republisher
   - Change to the Meta Query Plan
   - Change to the Query Plan

20 How does removing a Republisher affect our Consumers?
• Find all consumers for which R1 was relevant
• Update plans
Producers: S1: from='hw' ∧ tool='udp'; S2: from='hw' ∧ tool='ping'; S3: from='ral' ∧ tool='ping'; S4: from='ral' ∧ tool='udp'; S5: from='an' ∧ tool='ping'
Republishers: R1: from='hw'; R2: from='ral'; R3: from='hw' ∧ tool='ping'; R4: TRUE
Consumer queries: q1: from='hw' ∧ psize≥1024; q2: tool='ping' ∧ latency≤10.0

21 General Case of Dropping a Republisher
Republisher relevant for a consumer query, either
1. Republisher is not maximal relevant
   - No change in query plans
2. Equivalent Republisher
   - Change to the Meta Query Plan
   - May need to change the Query Plan
3. Covering Republisher
   - Change to the Meta Query Plan
   - Change to the Query Plan
   - Requires some method to patch the plan

22 Planning a Republisher Query
• Applying Consumer planning techniques results in a problem
Producers: S1: from='hw' ∧ tool='udp'; S2: from='hw' ∧ tool='ping'; S3: from='ral' ∧ tool='ping'; S4: from='ral' ∧ tool='udp'; S5: from='an' ∧ tool='ping'
Republishers: R1: from='hw'; R2: from='ral'; R3: from='hw' ∧ tool='ping'; R4: TRUE
Problem:
• Hierarchy contains cycles
• Republishers disconnected from Producers

23 Desirable Properties for a Hierarchy
• Correctness: streams answer queries
• Cycle freeness: loops can lead to duplicates
• Uniqueness: hierarchy defined for a set of publishers
• Local planning: Publishers and Consumers only need to communicate with the Registry

24 Generating Well Formed Hierarchies
• Need a stricter relevance criterion
• R1 can consume from R2 iff
  1. Everything R2 offers is relevant to R1, and
  2. R1 offers something R2 does not.
  Can be checked by entailment
Ensures
• No loops in the hierarchy
• Republishers connected to the Producers
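
The two conditions can be read as strict containment of view conditions: the source's condition entails the consumer's, but not the other way round. The sketch below follows that reading as an assumption; it uses the interval encoding of conditions (one closed interval per attribute, `None` for unbounded), represents TRUE as an empty condition, and the names `entails` and `can_consume` are hypothetical.

```python
def entails(c, d):
    """True if every tuple satisfying c also satisfies d.

    Conditions map attribute -> (lo, hi) closed interval; None = unbounded.
    The empty condition {} stands for TRUE. Assumes c is satisfiable.
    """
    for attr, (dlo, dhi) in d.items():
        clo, chi = c.get(attr, (None, None))
        if dlo is not None and (clo is None or clo < dlo):
            return False
        if dhi is not None and (chi is None or chi > dhi):
            return False
    return True


def can_consume(consumer, source):
    """Stricter relevance: source offers a proper subset of what consumer wants."""
    everything_relevant = entails(source, consumer)    # condition 1
    offers_more = not entails(consumer, source)        # condition 2
    return everything_relevant and offers_more


R1 = {"from": ("hw", "hw")}
R3 = {"from": ("hw", "hw"), "tool": ("ping", "ping")}
R4 = {}  # TRUE

print(can_consume(R1, R3), can_consume(R1, R4), can_consume(R4, R1))
```

Because the relation is a strict containment, no republisher can consume from itself or from an equivalent one, which is what rules out loops.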

25 Planning a Republisher Query: 2nd Attempt
• Stricter relevance criterion
• Republishers only consume from publishers below them
Producers: S1: from='hw' ∧ tool='udp'; S2: from='hw' ∧ tool='ping'; S3: from='ral' ∧ tool='ping'; S4: from='ral' ∧ tool='udp'; S5: from='an' ∧ tool='ping'
Republishers: R1: from='hw'; R2: from='ral'; R3: from='hw' ∧ tool='ping'; R4: TRUE
R4 is not relevant for R1

26 Computing Query Plans
Adding a new Consumer
1. Consumer contacts Consumer Agent
2. Consumer Agent contacts the Registry and receives a list of relevant publishers
3. Consumer Agent constructs Meta Query Plan and Query Plan
4. Consumer Agent contacts Publisher Agents in the Query Plan
5. Publisher starts streaming tuples to consumer agent
6. Consumer agent merges into a single answer stream
• Similar approach for adding a publisher
[Figure: Consumer, Consumer Agent, Registry and Producer Agent, with interactions numbered 1 to 6]

27 Maintaining Query Plans
• Agents maintain registry entry through a soft state registration mechanism
• Registry detects change in publisher set
  - Poses query over internal database
  - Informs affected consumers/republishers
• Consumer agent considers query plan
[Figure: Consumer, Consumer Agent, Registry, Producer Agent]

28 Consumer switching to new Publisher
• R1 equivalent to R2
• Plan changes to use R2
• Send timestamp of oldest tuple
• Stream from first tuple with timestamp
• Filter against latest-state buffer
Mechanism ensures answer stream properties
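
The switching steps can be sketched as follows. This is one interpretation of the bullets, not the protocol as implemented: it assumes the consumer keeps a latest-state buffer mapping each key to the newest timestamp already delivered, that the "oldest tuple" is the oldest entry in that buffer, and that the new publisher can replay its history. The function names and the `(key, timestamp, value)` tuple layout are hypothetical.

```python
def switch_point(latest_state):
    """Timestamp sent to the new publisher: oldest entry in the buffer."""
    return min(latest_state.values(), default=None)


def replay(history, since):
    """New publisher streams its history from the first tuple at `since`."""
    if since is None:
        return list(history)
    return [t for t in history if t[1] >= since]


def filter_against_latest_state(stream, latest_state):
    """Drop tuples already delivered; keep per-key timestamp order."""
    delivered = []
    for key, ts, value in stream:
        if key in latest_state and ts <= latest_state[key]:
            continue  # duplicate or stale for this key
        latest_state[key] = ts
        delivered.append((key, ts, value))
    return delivered


# Consumer has already delivered key 'a' up to time 5 and key 'b' up to time 3.
latest_state = {"a": 5, "b": 3}
# History held by the new publisher R2: (key, timestamp, value).
history = [("a", 1, 9.0), ("b", 3, 7.5), ("a", 5, 8.1), ("b", 6, 7.9), ("a", 7, 8.4)]

since = switch_point(latest_state)
new_tuples = filter_against_latest_state(replay(history, since), latest_state)
print(new_tuples)
```

Replaying from the oldest buffered timestamp avoids missing tuples, and the filter removes whatever was already delivered, so the answer stream stays complete, duplicate free and weakly ordered across the switch.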

29 Conclusions
• Formal framework for publishing and consuming stream data
• Partially implemented in R-GMA
• Republishers:
  - Allow system to scale
  - Complicate query answering problem
• Republishers require special planning
• We have developed algorithms that allow the system to adapt to changes in the set of publishers
• Protocol developed for switching between query plans

30 Example
• Description
Producers: S1: from='hw' ∧ tool='udp'; S2: from='hw' ∧ tool='ping'; S3: from='ral' ∧ tool='ping'; S4: from='ral' ∧ tool='udp'; S5: from='an' ∧ tool='ping'
Republishers: R1: from='hw'; R2: from='ral'; R3: from='hw' ∧ tool='ping'; R4: TRUE
Consumer queries: q1: from='hw' ∧ psize≥1024; q2: tool='ping' ∧ latency≤10.0

