Toward Replication in Grids for Digital Libraries with Freshness and Correctness Guarantees* Fuat Akal, Heiko Schuldt and Hans-Jörg Schek


Toward Replication in Grids for Digital Libraries with Freshness and Correctness Guarantees*
Fuat Akal, Heiko Schuldt and Hans-Jörg Schek
University of Basel, Computer Science Department, Bernoullistr. 16, CH-4056 Basel, Switzerland
3rd VLDB Workshop on Data Management in Grids, Wien, Austria, 23 September 2007
* The work has been partly supported by the EU in the 6th Framework Programme within the project DILIGENT (contract No. IST ).

Example Scenario
Satellite pictures of the Mediterranean Sea are continuously taken and stored as complex documents in a Digital Library (DL).
A typical activity is to generate periodical reports.
[Figure labels: Earth Observation; Image Features; Storage Properties; MER_RR__2P MER …; region tags World, Europe, Bigger_Europe, Smaller_Europe, Mediterranean, Iberia, North_Atlantic, Africa, North_Africa, Middle_East, Portugal, ...; Metadata as XML Documents; Simple Boolean Queries; Image Similarity Queries]

Watching the Environment Closely
Monitoring of the Mediterranean Sea
There are some busy oil terminals in the region
– Oil tankers keep floating in the sea
– Potential oil spill into the sea
Data Grid (Earth Observation): satellite images, metadata, image features...
Scientist 1 in Athens, Greece: "I am interested in Greek coasts as of last week"
Scientist 2 in Antalya, Turkey: "Fresh Turkish water please"
Both are extremely concerned about the environment!

Desired Replica Management in the Grid
Assumption: All data is collected at a single node, e.g. ESA in Italy
Automatic selection of the best replica from the user's location
Replication at a higher level, e.g. collections, subcollections
Dynamic decision on when/where to create replicas, e.g. when sn 1 becomes a hot spot
Freshness and correctness guarantees on accessed data are ensured, e.g. "I want up-to-date data"
Scientists may also 1) write back their reports and/or 2) create versions of documents or annotate them
A sophisticated replication mechanism is required!
[Figure labels: Data Grid (satellite images, metadata, image features...); storage node 0, sn 1, sn 2, sn 3; Entire Mediterranean, Turkish Coasts, Greek Coasts; Create Replica; Scientist 1 in Athens, Greece; Scientist 2 in Antalya, Turkey; Scientist 3 in Thessaloniki, Greece]

Outline
Digital Library built atop a grid middleware
– Rich variety, structure, and volume of data, e.g. traditional documents, complex multimedia objects
– Simple Boolean queries as well as sophisticated multi-feature similarity queries
– Consistent access to up-to-date data may be essential
Rest of the talk:
– Replication in a DB Cluster
– Transition from a DB cluster to the Grid
– DILIGENT Replication Architecture
– Conclusions and Outlook

Replication in a DB Cluster (PDBREP)
Available replication solutions for grid environments do not meet all of the desired properties just mentioned, e.g. freshness and correctness.
In our previous work [VLDB2005], we devised a replication protocol for database clusters named PDBREP.
– It already provides some properties of what we call desired replica management in the Grid, e.g. freshness and higher replication granularity.
Our approach in this work is to start with this protocol and adapt it to the grid.
PDBREP stands for PowerDB Replication, which was a project conducted at ETH Zurich, partially supported by Microsoft.

Replication in a DB Cluster (PDBREP)
[Figure: Clients submit updates, U: update(a), and queries, Q: query(a, b, fr), to the Coordination Middleware. Update Node(s) execute U as w(a) and record it in the Global Log. A Continuous Update Broadcast feeds the Local Update Queue of each Read-only Node. Read-only Nodes hold subsets of the data (e.g. a,c and a,b,c,d); Q is run as a distributed query execution, r(b) and r(a).]
Continuous Update Propagation Transactions run only when the node is idle
Refresh Transactions run on demand
fr: freshness requirement, e.g. "I am fine with 2 minutes old data", "I want fresh data", etc.
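The interplay of the global log, the freshness requirement fr, and on-demand refresh can be illustrated with a minimal sketch. It is not the PDBREP implementation: the class names, the single in-memory log, and the choice to express fr as a maximum tolerated age in seconds are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalLog:
    """Hypothetical serialized log of (timestamp, key, value) updates."""
    entries: list = field(default_factory=list)

    def append(self, ts, key, value):
        self.entries.append((ts, key, value))


@dataclass
class ReadOnlyNode:
    """Hypothetical read-only replica that applies log entries in order."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries already applied locally

    def staleness(self, log, now):
        """Age of the oldest update not yet applied; 0.0 if fully up to date."""
        if self.applied >= len(log.entries):
            return 0.0
        return now - log.entries[self.applied][0]

    def refresh(self, log):
        """On-demand refresh: apply every pending log entry in log order."""
        while self.applied < len(log.entries):
            _, key, value = log.entries[self.applied]
            self.data[key] = value
            self.applied += 1

    def query(self, log, keys, max_age, now):
        """Read keys; refresh first only if the node is staler than max_age."""
        if self.staleness(log, now) > max_age:
            self.refresh(log)
        return {k: self.data.get(k) for k in keys}
```

In this sketch a query that tolerates two-minute-old data is answered from the local copy without a refresh, while a query with max_age set to 0 forces the pending updates to be applied first.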

Transition to the Grid
We still distinguish update and read-only nodes
Potentially several update nodes
– We still assume that all updates are serialized into a global log
Broadcast of updates is not feasible; replicas subscribe for changes instead
Service Oriented Architecture
More nodes, which are heterogeneous
Failures are more likely to happen
[Figure labels: Updates, Queries, Coordination Middleware, Update Node(s), Read-only Nodes, Global Log]
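Replacing broadcast with subscription can be sketched as follows. This is an illustrative assumption about how such a mechanism might look, not the DILIGENT design: the UpdateLog class, the per-dataset subscription key, and the use of plain lists as replica queues are hypothetical.

```python
from collections import defaultdict


class UpdateLog:
    """Hypothetical global log that forwards changes only to subscribers."""

    def __init__(self):
        self.entries = []                     # serialized global log
        self.subscribers = defaultdict(list)  # dataset name -> replica queues

    def subscribe(self, dataset, queue):
        """Register a replica's local update queue for one dataset."""
        self.subscribers[dataset].append(queue)

    def publish(self, dataset, change):
        """Append the change to the log and forward it to interested replicas."""
        seq = len(self.entries)  # position in the global serialization order
        self.entries.append((seq, dataset, change))
        for queue in self.subscribers[dataset]:
            queue.append((seq, change))
        return seq
```

A replica that subscribed only to one dataset receives nothing for changes to any other dataset, which is the point of moving away from broadcast when there are many heterogeneous nodes.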

Replication Granularity
The unit of replication is called a DataSet (DS)
– A DataSet can be a collection of documents, a subcollection, or as small as a single document.
– Rule-based definition: information on a specific region, documents not older than 30 days, documents created between date1 and date2, etc.
[Figure labels: Collection of Satellite Images and its metadata; Subcollection 1; Subcollection 2; DataSet 1; DS 2; Entire Mediterranean; Turkish Coasts; Greek Coasts]
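A rule-based DataSet definition can be read as a predicate over document metadata. The sketch below assumes documents are dictionaries with hypothetical "id", "region", and "created" fields; the function names are likewise illustrative and not taken from DILIGENT.

```python
from datetime import date, timedelta


def dataset_rule(region=None, max_age_days=None, created_between=None):
    """Build a membership predicate for a DataSet from optional rules."""
    def matches(doc, today):
        if region is not None and doc["region"] != region:
            return False
        if max_age_days is not None:
            if today - doc["created"] > timedelta(days=max_age_days):
                return False
        if created_between is not None:
            start, end = created_between
            if not (start <= doc["created"] <= end):
                return False
        return True
    return matches


def select(docs, rule, today):
    """Return the ids of all documents that belong to the DataSet."""
    return [doc["id"] for doc in docs if rule(doc, today)]
```

Passing today explicitly keeps membership reproducible; a rule such as dataset_rule(region="Greek Coasts", max_age_days=30) then describes "documents on Greek coasts not older than 30 days".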

DILIGENT Grid Replication Architecture
[Figure: storage nodes sn 1 to sn 5 holding DataSets DS 1 to DS 4; components RMS, RSS, FTS]
Replica Catalog (DataSet : storage nodes): DS 1 : 1; DS 2 : 2,3; DS 3 : 5; DS 4 : 4
Freshness Repository: DS 1 : ; DS 2 : , ; DS 3 : ; DS 4 :
Load Repository: SN 1 : 50%; SN 2 : 25%; SN 3 : 60%; SN 4 : 30%; SN 5 : 50%
Request flow:
(1) Client: Read(DS 2 (x), DS 4 (y), 0.6)
(2.1) Locate bestReplicas; (2.2); (2.3)
(3) Read Data
(4) Log (Access History)
Update path: subscription; continuous propagation; Update Queue with entries ... TS x, W x, DS y ...
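The "locate best replicas" step can be sketched as a lookup across the three repositories. The replica catalog and load values below are those shown on the slide; the freshness values, the (dataset, node) keying, and the selection policy (least-loaded node among those that are fresh enough) are assumptions made for illustration only.

```python
def best_replica(dataset, required_freshness, catalog, freshness, load):
    """Pick the least-loaded node whose replica meets the freshness bound.

    Returns None when no replica is fresh enough, in which case a refresh
    would be needed before the read can be served.
    """
    candidates = [
        node for node in catalog.get(dataset, [])
        if freshness.get((dataset, node), 0.0) >= required_freshness
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda node: load[node])


# Replica Catalog and Load Repository as shown on the slide.
catalog = {"DS1": [1], "DS2": [2, 3], "DS3": [5], "DS4": [4]}
load = {1: 0.50, 2: 0.25, 3: 0.60, 4: 0.30, 5: 0.50}
# Hypothetical freshness values; the slide's values are not available.
freshness = {("DS2", 2): 0.5, ("DS2", 3): 0.9, ("DS4", 4): 0.7}
```

With these assumed values, a read of DS2 with freshness 0.6 is routed to node 3 even though node 2 is less loaded, because node 2's replica is too stale.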

Conclusions & Outlook
We presented the first steps of our ongoing work, whose ultimate goal is a fully integrated and self-managing replication subsystem for the Grid
We want to adapt an existing database replication mechanism, i.e. PDBREP, from database clusters to data grids
This looks feasible:
– Infrastructure-related assumptions, such as broadcasting changes to replicas, can easily be replaced by a subscription mechanism
– The additional components in the envisioned architecture that facilitate scheduling of queries can be included in PDBREP without requiring major changes
Implementation of the DILIGENT replication on top of gLite is still ongoing

Thank you! Questions?

References
1. DILIGENT: A DIgital Library Infrastructure on Grid ENabled Technology. IST
2. F. Akal, C. Türker, H.-J. Schek, Y. Breitbart, T. Grabs, and L. Veen. Fine-Grained Replication and Scheduling with Freshness and Correctness Guarantees. In VLDB, pages 565–576, 2005.