From Digital Objects to Content across eInfrastructures: Content and Storage Management in gCube. Pasquale Pagano (CNR-ISTI) on behalf of Heiko Schuldt

From Digital Objects to Content across eInfrastructures
Content and Storage Management in gCube
Pasquale Pagano, CNR-ISTI
on behalf of Heiko Schuldt, Dept. of Computer Science, University of Basel, Switzerland

EGEE Conference, Budapest
Content & Storage Management: Challenges
- Store large volumes of digital content in the Grid
- But there is much more:
  - Metadata for each object
  - Automatically extracted features per object (e.g., color histograms for images)
  - Storage properties (e.g., size)
  - ... and all of it highly interconnected
[Figure: sample metadata records with values such as MER_RR__2P, MERIS, ALGAL_1 and region keywords World, Europe, Bigger_Europe, Smaller_Europe, Mediterranean, Iberia, North_Atlantic, Africa, North_Africa, Middle_East, Portugal, ENVI...]

Introduction
- gCube Data Management provides means for
  - Persistently storing and physically structuring content
  - Logical grouping of content in collections, independent of physical location
  - Logical sharing of content among several collections
  - Complex content consisting of several parts and having multiple representations
  - Replication and partitioning of content
  - Subscription and notification
  - Populating VREs by importing or linking pre-existing content
- gCube exploits the file-system-like functionality of Grid technology to manage content storage
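
The separation between logical grouping and physical location can be illustrated with a very small Python sketch. The identifiers, location strings, and helper names below are hypothetical and are not gCube's API; the point is only that collections refer to objects by ID, so an object can be shared between collections or moved without touching them.

```python
# Object id -> physical location (hypothetical location strings).
objects = {"img-1": "se-1:/data/a", "img-2": "se-2:/data/b"}

# Collections group object ids; they never store locations.
collections = {
    "europe": {"img-1", "img-2"},
    "iberia": {"img-1"},  # img-1 is shared by reference, not copied
}


def move(object_id, new_location):
    """Relocate an object; no collection needs to change."""
    objects[object_id] = new_location


def members(collection):
    """Return the ids grouped in a collection, in a stable order."""
    return sorted(collections[collection])


move("img-1", "se-3:/data/a")
```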

Content example (Earth Observation)
[Figure labels: Satellite Image; Image Features; Metadata as XML Document; Storage Properties; Feature Extraction; Metadata Management; Content & Storage Management. Sample metadata values: MER_RR__2P, MERIS, ALGAL_1, World, Europe, Bigger_Europe, Smaller_Europe, Mediterranean, Iberia, North_Atlantic, Africa, North_Africa, Middle_East, Portugal, ENVI...]

Document Model
- All entities in the VRE are information objects
  - Collections, documents, metadata
- Any information object can be stored or fetched independently of what it represents
[Figure: entity model. Entities: Information Object (Info_Object_ID); Storage_Property (Property_Name, Property_Type, Property_Value); Object Reference (Reference_Source, Reference_Target, Reference_Role, Reference_Propagation). Relations: comprise; ISA (BLOB object, File object)]
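
A minimal Python sketch of the entities named on this slide. The field names follow the diagram labels; the field types, the choice to hang properties and references off the information object, and the `put`/`fetch` helpers with their in-memory store are assumptions for illustration, not gCube's actual classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageProperty:
    # Mirrors Property_Name / Property_Type / Property_Value.
    property_name: str
    property_type: str
    property_value: str


@dataclass
class ObjectReference:
    # Mirrors Reference_Source / _Target / _Role / _Propagation.
    reference_source: str
    reference_target: str
    reference_role: str
    reference_propagation: bool = False  # assumed flag; the slide gives no type


@dataclass
class InformationObject:
    # Collections, documents, and metadata all share this one shape.
    info_object_id: str
    properties: List[StorageProperty] = field(default_factory=list)
    references: List[ObjectReference] = field(default_factory=list)


# Hypothetical in-memory store keyed by Info_Object_ID.
store: Dict[str, InformationObject] = {}


def put(obj: InformationObject) -> None:
    store[obj.info_object_id] = obj


def fetch(object_id: str) -> InformationObject:
    # Fetching does not depend on what the object represents.
    return store[object_id]


collection = InformationObject("col-1", [StorageProperty("kind", "string", "collection")])
document = InformationObject("doc-1", [StorageProperty("size", "int", "2048")])
collection.references.append(ObjectReference("col-1", "doc-1", "member"))
put(collection)
put(document)
```

Because a collection is just another information object, membership is expressed as a reference rather than as a special container type.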

Data Management Architecture
[Figure: three-layer architecture]
- Content Management Layer: Content Management Service, Collection Management Service, Notification Service, Archive Import Service
- Storage Management Layer: Storage Management Service
- Base Layer: Replication Management Service
  - Properties & relations: any off-the-shelf DBMS (e.g., MySQL), accessed via JDBC
  - Raw file content: any SRM-enabled SE (e.g., DPM), accessed via GFAL (Cataloguing API: LFC; Storage API: SRM)

gCube Storage Model
[Figure labels: Content Management Layer; Storage Management Layer; Base Layer; RDBMS; gLite SE; Document; Information Object; Content; GUID; BLOB object and File object (ISA)]

Where to Store What?
- An information object is stored in the relational database management system (RDBMS) and in the Storage Element (SE)
- The RDBMS is used to store
  - properties of documents
  - relationships between documents
- Grid storage is used to store raw data
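
The split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the RDBMS, a dictionary stands in for the Storage Element, and the table layout and function names are invented for the example.

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")  # stand-in for the RDBMS
db.execute("CREATE TABLE property (object_id TEXT, name TEXT, value TEXT)")
db.execute("CREATE TABLE relation (source TEXT, target TEXT, role TEXT)")

storage_element = {}  # stand-in for Grid storage: object id -> raw bytes


def store_document(raw: bytes, properties: dict) -> str:
    """Split one information object across the two stores."""
    object_id = str(uuid.uuid4())
    storage_element[object_id] = raw  # raw data goes to the SE
    db.executemany(  # properties go to the RDBMS
        "INSERT INTO property VALUES (?, ?, ?)",
        [(object_id, k, str(v)) for k, v in properties.items()],
    )
    return object_id


def relate(source: str, target: str, role: str) -> None:
    """Record a relationship between two documents in the RDBMS."""
    db.execute("INSERT INTO relation VALUES (?, ?, ?)", (source, target, role))


def load_document(object_id: str):
    """Reassemble raw content and properties from both stores."""
    rows = db.execute(
        "SELECT name, value FROM property WHERE object_id = ?", (object_id,)
    ).fetchall()
    return storage_element[object_id], dict(rows)


image = store_document(b"\x89PNG...", {"mime": "image/png"})
meta = store_document(b"<metadata/>", {"mime": "text/xml"})
relate(meta, image, "describes")
```

The shared object ID is what ties the database rows to the raw content, which is the role the storage model assigns to the GUID.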

Replication
- Replication
  - Fully or partially duplicates raw data among the nodes of a distributed system
- Replication management: responsible for the maintenance of replicas
  - Ensures consistency of multiple copies of the same data object
  - Identifies the master site (the original, updatable copy) and the slave sites (replicas)
- Basic approaches for maintaining replicas: eager vs. lazy
  - Eager replication: synchronizes replicas within the boundaries of the update transaction
  - Lazy replication: decouples synchronization from the updating transaction (replication is done in a separate transaction)
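
The difference between the two approaches can be sketched as follows. This is a toy model, not gCube code: there are no real transactions, the class names are made up, and "within the update transaction" is approximated by doing the work inside `update` before it returns.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Master:
    def __init__(self, replicas, eager):
        self.data = {}
        self.replicas = replicas
        self.eager = eager
        self.pending = []  # updates not yet propagated (lazy mode only)

    def update(self, key, value):
        self.data[key] = value
        if self.eager:
            # Eager: replicas are synchronized as part of the update itself.
            for replica in self.replicas:
                replica.apply(key, value)
        else:
            # Lazy: the update returns first; propagation happens later.
            self.pending.append((key, value))

    def propagate(self):
        """Separate step that refreshes the replicas in lazy mode."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.apply(key, value)
        self.pending.clear()


eager_replica = Replica()
Master([eager_replica], eager=True).update("doc", "v1")

lazy_replica = Replica()
lazy_master = Master([lazy_replica], eager=False)
lazy_master.update("doc", "v1")
stale_view = dict(lazy_replica.data)  # replica has not seen the update yet
lazy_master.propagate()
```

The eager variant makes every update pay for all replicas; the lazy variant returns sooner but leaves a window in which replicas serve older data.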

Replication Management Service
- The Replication Management Service is an internal service of the Base Layer
- Goal: increase the availability of content
- Provides transparent access to data storage
[Figure labels: Get Document; Store Document; Content Management; Storage Management; DB; SE; DB 1 ... DB m; SE 1 ... SE n]

Replication Management Service
- Features
  - Ensure consistency of multiple copies of the same data object
  - Support data with different degrees of freshness
  - Guarantee consistent reads for all objects of a collection
  - Handle replication lazily internally, while appearing eager from the user's perspective
  - Adjust to changing load by dynamically creating new replicas
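
One way to picture "different degrees of freshness" is a read router that accepts any replica whose lag is within what the reader tolerates. The sketch below is an assumption-laden illustration: measuring freshness as a version lag, the `load` metric, and all names are hypothetical and not taken from gCube.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicaState:
    name: str
    version: int  # last master version this replica has applied
    load: int     # current number of requests (hypothetical metric)


def pick_replica(replicas: List[ReplicaState], master_version: int,
                 max_lag: int) -> Optional[ReplicaState]:
    """Return the least-loaded replica at most `max_lag` versions behind."""
    fresh_enough = [r for r in replicas if master_version - r.version <= max_lag]
    if not fresh_enough:
        return None  # caller falls back to the master site
    return min(fresh_enough, key=lambda r: r.load)


replicas = [
    ReplicaState("se-1", version=10, load=5),
    ReplicaState("se-2", version=8, load=1),
    ReplicaState("se-3", version=3, load=0),
]
```

A reader that demands the latest version is sent to the up-to-date replica, while a reader that tolerates staleness can be served by a less loaded one.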

Advanced Replication in the Grid
- Update and read-only storage nodes in the Grid
  - A query can be served by any node; changes are sent only to update nodes
  - Many update nodes per object: correct serialization and propagation
- Specific features for replication in the Grid
  - Replicas subscribe for changes instead
  - Large number of heterogeneous nodes
[Figure labels: Update Sites; Update Transactions; Read-only Sites; Middleware]
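
The subscription idea can be sketched in Python as below. The class and method names are hypothetical, there is a single update node, and callbacks run synchronously, so this shows only the direction of the dependency (read-only nodes register their interest), not the serialization across many update nodes that the slide mentions.

```python
from collections import defaultdict


class UpdateNode:
    def __init__(self):
        self.objects = {}
        self.subscribers = defaultdict(list)  # object id -> callbacks

    def subscribe(self, object_id, callback):
        self.subscribers[object_id].append(callback)

    def write(self, object_id, value):
        # Changes are accepted here and pushed only to interested replicas.
        self.objects[object_id] = value
        for notify in self.subscribers[object_id]:
            notify(object_id, value)


class ReadOnlyNode:
    def __init__(self, update_node, object_ids):
        self.objects = {}
        for object_id in object_ids:
            update_node.subscribe(object_id, self.on_change)

    def on_change(self, object_id, value):
        self.objects[object_id] = value

    def read(self, object_id):
        return self.objects.get(object_id)


update_node = UpdateNode()
node_a = ReadOnlyNode(update_node, ["doc-1"])
node_b = ReadOnlyNode(update_node, ["doc-2"])
update_node.write("doc-1", "v1")
```

The update node does not need a fixed list of replicas; each read-only node decides which objects it wants to follow.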

Replication in gCube
- Replication Service: internal service of the Base Layer
- Provides transparent access to data stores
[Figure: Content Management (CM) Layer with ColM, CM, AI, NS; Storage Management (SM) Layer; Base Layer with the Replication Service and storage nodes 1 ... n, each with a Replicator; Metadata Management and Index Management as external services]
Steps shown in the figure:
- Change request to a document, e.g., update a document
- Invoking the appropriate SM operation
- A notification is generated to maintain metadata/indexes
- The update request is handled by the master site of the document
- The document is updated
- The update is propagated to the replicas
- Notifications are consumed by external services
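
The update path on this slide can be sketched roughly as follows. Everything here is a simplified assumption: storage nodes are plain dictionaries, the class names are invented, and the relative order of replica propagation and notification is chosen for the example rather than taken from the slide.

```python
class ReplicationService:
    """Routes each update to the document's master, then to its replicas."""

    def __init__(self, masters, replicas):
        self.masters = masters    # document id -> master node (a dict)
        self.replicas = replicas  # document id -> list of replica nodes

    def update(self, doc_id, content):
        self.masters[doc_id][doc_id] = content      # master applies the update
        for node in self.replicas.get(doc_id, []):  # replicas follow
            node[doc_id] = content


class StorageManagement:
    def __init__(self, replication, listeners):
        self.replication = replication
        self.listeners = listeners  # e.g. metadata and index maintenance

    def update_document(self, doc_id, content):
        self.replication.update(doc_id, content)
        for listener in self.listeners:  # notification step
            listener(doc_id)


node1, node2 = {}, {}
reindexed = []  # records which documents external services were told about
sm = StorageManagement(
    ReplicationService({"doc-1": node1}, {"doc-1": [node2]}),
    listeners=[reindexed.append],
)
sm.update_document("doc-1", "new content")  # the change request
```

The caller only ever talks to the storage management operation; which node is the master and where the replicas live stays hidden behind the replication service.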

Content & Storage Management in gCube: Summary
- Basic tools and protocols are in place, but gCube Data Management orchestrates all of them in order to
  - Associate the different parts of an information object
  - Associate each information object with all its metadata
  - Transparently replicate information objects and their metadata

Grid (gLite) Storage
- LFC: LCG File Catalog (LCG: LHC Computing Grid; LHC: Large Hadron Collider)
  - Centralized catalog for storing the locations of files stored in the Grid
  - The complete catalog can be replicated
- SRM: Storage Resource Manager. Interface to
  - Copy a file to a storage element
  - Gather information about a file stored in a storage element (SE)
  - Remove a file from an SRM storage
  - Retrieve information about an SRM-managed storage
- GFAL: Grid File Access Library
  - Provides calls for catalog interaction, storage management, and file access, and can be very handy when an application requires access to some part
- DPM: Disk Pool Manager
  - APIs for accessing local storage
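
The role of a file catalog, mapping a logical file name to the places where copies are stored, can be shown with a toy Python class. This is not the LFC or GFAL API; the class, its methods, and the example names and URLs are all hypothetical.

```python
from collections import defaultdict


class FileCatalog:
    """Toy catalog: logical file name -> set of physical replica locations."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn: str, location: str) -> None:
        self._replicas[lfn].add(location)

    def unregister(self, lfn: str, location: str) -> None:
        self._replicas[lfn].discard(location)

    def locations(self, lfn: str) -> list:
        # .get avoids creating an empty entry for unknown names.
        return sorted(self._replicas.get(lfn, ()))


catalog = FileCatalog()
# Made-up logical name and storage URLs, for illustration only.
catalog.register("/grid/demo/image-001", "srm://se1.example.org/demo/image-001")
catalog.register("/grid/demo/image-001", "srm://se2.example.org/demo/image-001")
catalog.unregister("/grid/demo/image-001", "srm://se1.example.org/demo/image-001")
```

Applications address a file by its logical name and let the catalog resolve it, so replicas can be added or removed without changing the name the application uses.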