

1 Toward Replication in Grids for Digital Libraries with Freshness and Correctness Guarantees*
Fuat Akal, Heiko Schuldt and Hans-Jörg Schek
@unibas.ch, schek@inf.ethz.ch
University of Basel, Computer Science Department, Bernoullistr. 16, CH-4056 Basel, Switzerland
3rd VLDB Workshop on Data Management in Grids, Wien, Austria, 23 September 2007
* The work has been partly supported by the EU in the 6th Framework Programme within the project DILIGENT (contract No. IST-2003-004260).

2 Example Scenario
Satellite pictures of the Mediterranean Sea are taken continuously and stored as complex documents in a Digital Library (DL).
A typical activity is to generate periodic reports.
[Figure: Earth Observation images with their image features, storage properties, and metadata as XML documents. Sample values include MER_RR__2P and region names such as World, Europe, Bigger_Europe, Smaller_Europe, Mediterranean, Iberia, North_Atlantic, Africa, North_Africa, Middle_East, Portugal. The DL supports simple Boolean queries and image similarity queries.]

3 Watching the Environment Closely
Monitoring of the Mediterranean Sea
There are some busy oil terminals in the region
– Oil tankers are constantly at sea
– Potential oil spills into the sea
Scientist 1 in Athens, Greece: "I am interested in Greek coasts as of last week"
Scientist 2 in Antalya, Turkey: "Fresh Turkish water please"
Both are extremely concerned about the environment!
[Figure: Earth Observation data (satellite images, metadata, image features...) held in a Data Grid and accessed by both scientists.]

4 Desired Replica Management in the Grid
Assumption: All data is collected at a single node, e.g. ESA in Italy
Automatic selection of the best replica from the user's location
Replication at a higher level, e.g. collections, subcollections
Dynamic decision on when/where to create replicas, e.g. when sn 1 becomes a hot spot
Freshness and correctness guarantees on accessed data are ensured, e.g. "I want up-to-date data"
Scientists may also 1) write back their reports and/or 2) create versions of documents or annotate them
A sophisticated replication mechanism is required!
[Figure: A Data Grid of satellite images, metadata, image features... Labels: storage node 0, sn 1, sn 2, sn 3; Entire Mediterranean, Turkish Coasts, Greek Coasts (shown twice); Create Replica; Scientist 1 in Athens, Greece; Scientist 2 in Antalya, Turkey; Scientist 3 in Thessaloniki, Greece.]
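The "dynamic decision on when/where to create replicas" can be illustrated with a small sketch. This is not the DILIGENT policy, which the slides do not specify; it assumes a hypothetical rule in which a new replica is created once any node holding the data exceeds a load threshold, and the target is the least-loaded node that does not yet hold a copy. The function name, the threshold and the load values are all made up for illustration.

```python
def pick_replica_target(holders, load, hot_threshold=0.8):
    """Return the node that should receive a new replica, or None.

    holders: nodes that currently hold the data (hypothetical names).
    load: mapping node -> load in [0, 1].
    hot_threshold: assumed load above which a holder counts as a hot spot.
    """
    hot = [n for n in holders if load[n] >= hot_threshold]
    if not hot:
        return None  # no holder is a hot spot, so no new replica is needed
    # Candidate targets: nodes without a copy that are not overloaded themselves.
    spare = [n for n in load if n not in holders and load[n] < hot_threshold]
    return min(spare, key=lambda n: load[n]) if spare else None


# Hypothetical loads: sn1 has become a hot spot.
load = {"sn0": 0.40, "sn1": 0.90, "sn2": 0.30, "sn3": 0.50}
target = pick_replica_target(["sn1"], load)
```

A real policy would presumably also weigh the user's location and the cost of copying the data; the sketch only shows the trigger-and-target structure.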

5 Outline
Digital Library built atop a grid middleware
– Rich variety, structure and volume of data, e.g. traditional documents, complex multimedia objects
Simple Boolean queries as well as sophisticated multi-feature similarity queries
– Consistent access to up-to-date data may be essential
Rest of the talk is...
– Replication in a DB Cluster
– Transition from a DB cluster to the Grid
– DILIGENT Replication Architecture
– Conclusions and Outlook

6 Replication in a DB Cluster (PDBREP)
Available replication solutions for grid environments do not provide all of the desired properties just mentioned, e.g. freshness and correctness.
In our previous work [VLDB2005], we devised a replication protocol for database clusters named PDBREP.
– It already provides some properties of what we call desired replica management in the Grid, e.g. freshness, higher replication granularity.
Our approach in this work is to start with this protocol and adapt it to the grid.
PDBREP stands for PowerDB Replication, which was a project conducted at ETH Zurich, partially supported by Microsoft.

7 Replication in a DB Cluster (PDBREP)
[Figure: PDBREP architecture. Updates U: update(a) and queries Q: query(a, b, fr) enter through the Coordination Middleware. Updates go to the Update Node(s) and the Global Log; a Continuous Update Broadcast feeds the Local Update Queue of the Read-only Nodes, which hold different subsets of the data items a, b, c, d. Queued updates are applied by Continuous Update Propagation Transactions (only when the node is idle) and by Refresh Transactions (on demand). Example: U w(a), Q r(b) r(a), with distributed query execution.]
fr: freshness requirement, e.g. "I am fine with 2 minutes old data", "I want fresh data", etc.
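The interplay of the local update queue, propagation transactions and on-demand refresh transactions can be sketched as follows. This is a simplified illustration, not the PDBREP implementation: it assumes the freshness requirement is given as a maximum age in seconds, that every update carries its commit time, and that a single read-only node serves the whole query. All class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Update:
    ts: float    # commit time at the update node, in seconds (assumed)
    key: str
    value: str


@dataclass
class ReadOnlyNode:
    data: dict = field(default_factory=dict)
    queue: deque = field(default_factory=deque)  # local update queue, in log order

    def enqueue(self, update):
        """Receive a broadcast update; it is queued, not applied yet."""
        self.queue.append(update)

    def propagate_when_idle(self, max_updates=1):
        """Propagation transaction: apply a few queued updates while idle."""
        for _ in range(min(max_updates, len(self.queue))):
            self._apply(self.queue.popleft())

    def refresh(self, required_ts):
        """Refresh transaction: apply all queued updates up to required_ts."""
        while self.queue and self.queue[0].ts <= required_ts:
            self._apply(self.queue.popleft())

    def _apply(self, update):
        self.data[update.key] = update.value

    def query(self, keys, now, max_age):
        """Read keys so that every update older than max_age is visible."""
        self.refresh(now - max_age)  # bring the node up to the requested freshness
        return {k: self.data.get(k) for k in keys}


node = ReadOnlyNode()
node.enqueue(Update(ts=10.0, key="a", value="v1"))
node.enqueue(Update(ts=100.0, key="a", value="v2"))
two_minutes_old = node.query(["a", "b"], now=130.0, max_age=120.0)
fresh = node.query(["a"], now=130.0, max_age=0.0)
```

The first query tolerates data that is two minutes old, so only the update committed at time 10 has to be applied; the second asks for fresh data and forces the remaining update to be applied before reading.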

8 Transition to the Grid
We still distinguish update and read-only nodes
Potentially several update nodes
– We still assume that all updates are serialized into a global log
Broadcast of updates is not feasible; replicas subscribe for changes instead
Service-Oriented Architecture
More nodes, which are heterogeneous
Failures are more likely to happen
[Figure: Updates and queries pass through the Coordination Middleware to the Update Node(s) and Read-only Nodes, with a Global Log.]
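Replacing the broadcast with subscriptions can be pictured with a minimal in-process sketch. It assumes that replicas subscribe per data set and that the global log assigns a sequence number to every update; the slides do not describe the actual interface, so the class, its methods and the data set names are hypothetical.

```python
from collections import defaultdict


class GlobalLog:
    """Serializes all updates and forwards each one only to its subscribers."""

    def __init__(self):
        self.entries = []                     # (seq, dataset, operation), in serial order
        self.subscribers = defaultdict(list)  # dataset -> list of callbacks

    def subscribe(self, dataset, callback):
        """Register a replica's callback for changes to one data set."""
        self.subscribers[dataset].append(callback)

    def append(self, dataset, operation):
        """Record an update and deliver it to the subscribed replicas only."""
        seq = len(self.entries) + 1
        entry = (seq, dataset, operation)
        self.entries.append(entry)
        for callback in self.subscribers[dataset]:
            callback(entry)
        return seq


log = GlobalLog()
sn1_queue, sn2_queue = [], []  # update queues of two hypothetical replicas
log.subscribe("greek_coasts", sn1_queue.append)
log.subscribe("turkish_coasts", sn2_queue.append)

log.append("greek_coasts", "w(a)")
log.append("turkish_coasts", "w(b)")
log.append("greek_coasts", "w(c)")
```

Each replica receives only the updates for the data it holds, while the sequence numbers preserve the order of the single global log.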

9 Replication Granularity
The unit of replication is called a DataSet (DS)
– A DataSet can be a collection of documents, a subcollection, or as small as a single document.
– Rule-based definition: information on a specific region, documents not older than 30 days, documents created between date1 and date2, etc.
[Figure: A collection of satellite images and its metadata, divided into Subcollection 1 and Subcollection 2. DataSet 1 and DS 2 are drawn over the Entire Mediterranean, Turkish Coasts and Greek Coasts data.]
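A rule-based DataSet definition can be read as a conjunction of predicates over document metadata. The sketch below assumes each document carries a region and a creation date; the field names, rule helpers and sample documents are hypothetical and are not taken from DILIGENT.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Document:
    doc_id: str
    region: str
    created: date


def in_region(region):
    """Rule: information on a specific region."""
    return lambda doc, today: doc.region == region


def not_older_than(days):
    """Rule: documents not older than the given number of days."""
    return lambda doc, today: today - doc.created <= timedelta(days=days)


def created_between(start, end):
    """Rule: documents created between two dates, inclusive."""
    return lambda doc, today: start <= doc.created <= end


def select_dataset(documents, rules, today):
    """A DataSet is the set of documents that satisfy every rule."""
    return [d for d in documents if all(rule(d, today) for rule in rules)]


docs = [
    Document("img-1", "Greek_Coasts", date(2007, 9, 1)),
    Document("img-2", "Greek_Coasts", date(2007, 7, 1)),
    Document("img-3", "Turkish_Coasts", date(2007, 9, 20)),
]
today = date(2007, 9, 23)
recent_greek = select_dataset(docs, [in_region("Greek_Coasts"), not_older_than(30)], today)
```

With an empty rule list the DataSet is the whole collection, and a rule that matches one identifier would shrink it to a single document, which mirrors the range of granularities named above.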

10 DILIGENT Grid Replication Architecture
[Figure: Storage nodes sn 1, sn 2, sn 3, Storage Node 4 and sn 5 hold replicas of DS 1 to DS 4. An update queue (shown for DS 4) holds entries of the form TS x, W x, DS y; it is filled by subscription and applied by continuous propagation.]
Replica Catalog (DataSet : storage nodes): DS 1 : 1; DS 2 : 2,3; DS 3 : 5; DS 4 : 4
Freshness Repository: entries for DS 1 to DS 4 (values missing)
Load Repository: SN 1 : 50%; SN 2 : 25%; SN 3 : 60%; SN 4 : 30%; SN 5 : 50%
Other components: RMS, RSS, FTS, Access History
Request flow: (1) the Client issues Read(DS 2 (x), DS 4 (y), 0.6); (2.1) Locate bestReplicas, with further steps (2.2) and (2.3); (3) Read Data; (4) Log
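The "Locate bestReplicas" step can be sketched as a lookup across the three repositories. The replica catalog and load values below are the ones shown on the slide; the freshness values are invented, because the transcript does not preserve them, and the selection rule (least-loaded replica among those that meet the freshness requirement, otherwise the least-loaded replica, which then needs a refresh) is an assumption rather than the documented DILIGENT behaviour.

```python
# Values from the slide.
REPLICA_CATALOG = {"DS1": [1], "DS2": [2, 3], "DS3": [5], "DS4": [4]}
LOAD = {1: 0.50, 2: 0.25, 3: 0.60, 4: 0.30, 5: 0.50}

# Hypothetical freshness per (data set, storage node), 1.0 = fully up to date.
FRESHNESS = {
    ("DS1", 1): 1.0,
    ("DS2", 2): 0.4,
    ("DS2", 3): 0.9,
    ("DS3", 5): 0.7,
    ("DS4", 4): 0.8,
}


def locate_best_replica(dataset, required_freshness):
    """Return (storage node, needs_refresh) for one data set."""
    nodes = REPLICA_CATALOG[dataset]
    fresh = [n for n in nodes if FRESHNESS[(dataset, n)] >= required_freshness]
    candidates = fresh or nodes  # fall back to all replicas if none is fresh enough
    best = min(candidates, key=lambda n: LOAD[n])
    return best, best not in fresh


def plan_read(datasets, required_freshness):
    """Choose one replica per data set touched by a read request."""
    return {ds: locate_best_replica(ds, required_freshness) for ds in datasets}


# Corresponds to Read(DS 2 (x), DS 4 (y), 0.6) on the slide.
plan = plan_read(["DS2", "DS4"], 0.6)
```

Under the invented freshness values, DS 2 is read from storage node 3 even though node 2 is less loaded, because node 2 is too stale for a requirement of 0.6.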

11 Conclusions & Outlook
We presented the first steps of our ongoing work, whose ultimate goal is a fully integrated and self-managing replication subsystem for the Grid
We want to adapt an existing database replication mechanism, i.e. PDBREP, from database clusters to data grids
This looks feasible:
– Infrastructure-related assumptions, such as broadcasting changes to replicas, can easily be replaced by a subscription mechanism
– The additional components of the envisioned architecture that facilitate query scheduling can be included in PDBREP without major changes
Implementation of the DILIGENT replication on top of gLite is still ongoing

12 Thank you! Questions?

13 References
1. DILIGENT: A DIgital Library Infrastructure on Grid ENabled Technology. http://www.diligentproject.org/. IST-2003-004260.
2. F. Akal, C. Türker, H.-J. Schek, Y. Breitbart, T. Grabs, and L. Veen. Fine-Grained Replication and Scheduling with Freshness and Correctness Guarantees. In VLDB, pages 565–576, 2005.

