Answering Continuous Queries Using Views Over Data Streams
Alasdair J G Gray and Werner Nutt

Introduction
We have developed techniques for planning the execution of continuous queries posed over a set of distributed data streams. The plan generated for a query produces an answer stream that meets the condition of the query.

A data stream is an append-only data source. An example is a sensor that continually publishes its readings. A continuous query is a query that, once issued, returns all fresh readings that meet its condition. It can be seen as a subscription to data of interest.

Motivation: Grid Monitoring
A Grid is a collection of connected, geographically distributed computational resources belonging to several organisations. The Grid behaves as a single virtual supercomputer. The types of components found in a Grid and their interactions are shown in figure 1.

Components on the Grid require monitoring information about other components of the Grid. For example, the resource broker could be looking for a lightly loaded computing element to process a job, or the User Interface could be running a visualisation tool that tracks the progress of a job.

Monitoring data is published about each resource on the Grid; e.g. a computing element publishes data about the number of jobs it is currently processing. This monitoring data can be seen as a stream, and these streams are distributed across the Grid. Components must be able to locate and request the monitoring data they are interested in.

[Figure 1: The components of the European DataGrid, including the User Interface, Resource Broker, Logging and Bookkeeping, Replica Catalogue, Computing Element, Storage Element and Monitoring System.]

R-GMA: A Grid Monitoring System
R-GMA is a Grid monitoring system that has been developed as part of the DataGrid project.
It is an information integration system that provides a virtual database containing information about all the resources of a Grid. The architecture of R-GMA is shown in figure 2. R-GMA consists of:
Schema: provides a vocabulary with which to communicate.
Producers: publish monitoring information and respond to queries.
Consumers: query for monitoring information.
Republishers: query for monitoring information and publish their answer.
Registry: matches consumer requests for information with relevant publishers.

Republishers allow queries to be answered more efficiently: they collect together the streams from the producers and make the combined stream available from a single point on the Grid. However, they make query answering more difficult, because a tuple can arrive either directly from a producer or via a republisher.

Query Planning
Within R-GMA, logical reasoning is used to generate a query plan for a continuous query. The reasoning follows three phases, listed below, and is distributed between the registry and the consumer. The query plans generated are sound, complete, duplicate free (each tuple is received exactly once) and weakly ordered (for each stream, tuples appear in the same order in which they were originally published).

Query Plan Maintenance
A continuous query is posed from the time it is created until the consumer stops it. During this period the set of publishers in the system can change, so the consumer's query plan must be maintained to reflect the current set of available publishers. There are four cases to consider, listed below. In each case, the registry informs the consumer of the change, and the consumer then consults its meta query plan to see whether the plan needs amending. Using a meta query plan reduces how often a plan must be recalculated when a publisher is added to or removed from the system: because the meta query plan holds a list of alternative publishers, the executed query plan can be altered quickly.
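The maintenance idea above, repairing the executed plan from a list of alternatives, can be illustrated with a small Python sketch. It is a minimal illustration under simplifying assumptions, not the R-GMA implementation: the meta query plan is held as a list of equivalence classes, each a list of publisher names that offer the same tuples for the query; the executed plan uses the first member of each class; and only removals are repaired locally. An addition, or the loss of a republisher with no alternative, is answered by asking for a new plan. `MetaQueryPlan`, `executed_plan`, `publisher_removed` and `publisher_added` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class MetaQueryPlan:
    """Hypothetical meta query plan: a list of equivalence classes, each a
    list of publisher names that offer the same tuples for the query."""
    classes: list
    republishers: set = field(default_factory=set)

    def executed_plan(self):
        # The executed query plan takes one publisher from each class.
        return [members[0] for members in self.classes]

    def publisher_removed(self, pub):
        """Handle a producer or republisher being dropped.

        Returns True if the plan was repaired locally and False if the
        consumer should ask for a new plan (a simplifying assumption)."""
        for i, members in enumerate(self.classes):
            if pub not in members:
                continue
            members.remove(pub)
            if members:
                return True  # an alternative publisher takes over
            del self.classes[i]
            # A dropped producer's stream simply ends, but producers that a
            # dropped republisher was covering must now be planned for.
            return pub not in self.republishers
        return True  # pub was not part of the meta query plan

    def publisher_added(self, pub):
        """A new producer or republisher: this sketch always replans."""
        return False


plan = MetaQueryPlan(classes=[["R1", "R2"], ["P3"]], republishers={"R1", "R2"})
print(plan.executed_plan())          # ['R1', 'P3']
print(plan.publisher_removed("R1"))  # True: R2 takes over
print(plan.executed_plan())          # ['R2', 'P3']
print(plan.publisher_removed("P3"))  # True: P3's stream has ended
print(plan.publisher_removed("R2"))  # False: replanning needed
```

Returning a flag rather than replanning inside the method keeps the sketch small; since the registry supplies the list of relevant publishers, a real replan would involve the registry again.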
[Figure 2: The architecture of R-GMA, showing a consumer, producers, a republisher, the registry and the schema. Producers register a view, the consumer registers a query, and the republisher registers a query and a view (Q = V).]

The three phases of query planning are:
1. For each consumer, the registry generates a list of publishers (producers and republishers) that could potentially contribute answers, i.e. publishers whose condition does not contradict the condition of the query. These are the relevant publishers.
2. The consumer keeps only those publishers that are not strictly covered by another relevant publisher; in figure 2, for example, two of the producers are strictly covered by the republisher. These are the maximal relevant publishers. They are grouped into equivalence classes according to the tuples they make available for the query. We call this a meta query plan.
3. From the meta query plan, the consumer constructs a query plan that contains one publisher from each equivalence class.

The four cases for query plan maintenance are:
1. A new producer is added to the system.
2. An existing producer is dropped from the system.
3. A new republisher is added to the system.
4. An existing republisher is dropped from the system.
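The three planning phases can be sketched in Python under a deliberately simplified model. The assumptions are ours, not R-GMA's: every tuple carries a value for a hypothetical `site` attribute, a condition (or a republisher's view) is a set of sites, a republisher collects every producer stream that overlaps its view, and the tuples a publisher makes available for a query are modelled as (producer, site) pairs. `Publisher`, `offered` and `plan_query` are illustrative names, not R-GMA APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publisher:
    # Hypothetical model: a condition (or republisher view) is a set of sites.
    name: str
    sites: frozenset
    is_republisher: bool = False


def offered(pub, producers, query):
    """(producer, site) pairs that `pub` makes available for `query`."""
    if not pub.is_republisher:
        return frozenset((pub.name, s) for s in pub.sites & query)
    # Assumption: a republisher collects every producer stream overlapping its view.
    return frozenset(
        (p.name, s) for p in producers for s in p.sites & pub.sites & query
    )


def plan_query(query, publishers):
    """Return (meta query plan, executed query plan) for a continuous query."""
    producers = [p for p in publishers if not p.is_republisher]
    offers = {p.name: offered(p, producers, query) for p in publishers}

    # Phase 1 (registry): relevant publishers, whose condition does not
    # contradict the query, i.e. shares at least one site with it.
    relevant = [p for p in publishers if p.sites & query]

    # Phase 2 (consumer): drop publishers strictly covered by another relevant
    # publisher, then group the maximal ones by the tuples they offer.
    maximal = [
        p for p in relevant
        if not any(offers[p.name] < offers[q.name] for q in relevant)
    ]
    classes = {}
    for p in maximal:
        classes.setdefault(offers[p.name], []).append(p.name)
    meta_plan = list(classes.values())

    # Simplification: only handle classes offering disjoint tuple sets, so
    # choosing one publisher per class cannot produce duplicates.
    keys = list(classes)
    if sum(len(k) for k in keys) != len(frozenset().union(*keys)):
        raise ValueError("overlapping equivalence classes: not handled by this sketch")

    # Phase 3 (consumer): pick one publisher from each equivalence class.
    return meta_plan, [members[0] for members in meta_plan]


P1 = Publisher("P1", frozenset({"A"}))
P2 = Publisher("P2", frozenset({"B"}))
P3 = Publisher("P3", frozenset({"C"}))
R = Publisher("R", frozenset({"A", "B"}), is_republisher=True)
R2 = Publisher("R2", frozenset({"A", "B", "C"}), is_republisher=True)

meta, chosen = plan_query(frozenset({"A", "B"}), [P1, P2, P3, R, R2])
print(meta)    # [['R', 'R2']]
print(chosen)  # ['R']
```

For a query over sites A and B, P3 is not relevant, P1 and P2 are strictly covered by R (as with the two producers in figure 2), and R2 offers the same tuples as R, so both fall into one equivalence class. The sketch raises an error when maximal publishers offer overlapping but unequal tuple sets, because picking one per class would then deliver some tuples twice; a full planner needs more than this simple model to handle that case.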