1 P.Kunszt LCGP 13.3.2002 Data Management on the GRID Peter Z. Kunszt CERN Database Group EU DataGrid – Data Management.

2 Personal Information
PhD in theoretical physics (Lattice QCD) at U of Bern
‘Builder of the SDSS Project’ – design and implementation work on the SDSS science archive SX, both Objectivity and MS SQL Server
CERN Database Group
Activity task leader for Grid Data Management
Management of WP2 (Data Management) of the EDG Project

3 Scope of Data Management
Data Transfer
– Transport protocols
Data Access
– Remote I/O
– Security / Policies
Data Storage
– Hierarchical Storage
– Mass Storage
Replication
– Peer-to-Peer
– Centralized
– Distributed
– Automatic
Metadata management
– Scalable
– Distributed
– Consistent
Persistency
– Grid-enabled databases and data stores
– Independent of back-end implementation
Optimisation
– Data Access optimisation
– Cost minimisation

4 Vision of Grid Data Management
Distributed Shared Data Storage
Ubiquitous Data Access
Transparent Data Transfer and Migration
Consistency and Robustness
Optimisation

5 Vision of Grid Data Management
Distributed Shared Data Storage
– Different architectures
– Heterogeneous data stores
– Self-describing data and metadata

6 Vision of Grid Data Management
Ubiquitous Data Access
– Global Namespace
– Transparent security control and enforcement
– Access at any time from anywhere, physical data location irrelevant
– Automatic Data Replication and Validation
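
The global namespace idea can be illustrated with a minimal sketch: clients ask for a logical name, and a catalog resolves it to whichever physical copies exist. The class, the method names, and the "logical file name" / "physical file name" terminology used here are assumptions for illustration only, not the API of any replica catalog named in these slides.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy in-memory catalog: one logical name maps to many physical replicas.

    Hypothetical illustration only; not the Globus or EDG catalog interface.
    """

    def __init__(self):
        # logical file name -> set of physical file names (URLs)
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_name):
        """Record that a physical copy of the logical file exists."""
        self._replicas[logical_name].add(physical_name)

    def unregister(self, logical_name, physical_name):
        """Forget one physical copy; unknown entries are ignored."""
        self._replicas[logical_name].discard(physical_name)

    def lookup(self, logical_name):
        """Return all known physical copies, sorted for stable output."""
        physical_names = self._replicas.get(logical_name)
        if not physical_names:
            raise KeyError(f"no replica registered for {logical_name}")
        return sorted(physical_names)
```

A client would call `lookup` with the logical name only, so the physical location stays irrelevant to the caller, as the slide puts it.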

7 Vision of Grid Data Management
Transparent Data Transfer and Migration
– Protocol negotiation and multiple protocol support
– Management of data formats and database versions

8 Vision of Grid Data Management
Consistency and Robustness
– Replicated data is reasonably up-to-date
– Reliable data transfer
– Self-detecting and self-correcting mechanisms upon data corruption
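
One common way to approach reliable transfer and self-detection of corruption is to compare checksums and re-copy on mismatch. The sketch below shows that idea on local files, using only the Python standard library; the function names are hypothetical, and the slides do not say which mechanism the Grid middleware actually uses.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(source, target, max_attempts=3):
    """Copy source to target and retry until the checksums agree.

    Returns the number of attempts that were needed.
    """
    expected = sha256_of(source)
    for attempt in range(1, max_attempts + 1):
        shutil.copyfile(source, target)
        if sha256_of(target) == expected:
            return attempt
    raise OSError(f"checksum mismatch after {max_attempts} attempts: {target}")


def repair_if_corrupt(replica, expected_digest, good_copy):
    """Self-detect and self-correct: re-copy the replica if its digest is wrong.

    Returns True if a repair was performed, False if the replica was intact.
    """
    if sha256_of(replica) == expected_digest:
        return False
    copy_verified(good_copy, replica)
    return True
```

In a real deployment the expected digest would come from a catalog rather than being recomputed from the source, and the copy would go over a network transport instead of `shutil.copyfile`.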

9 Vision of Grid Data Management
Optimisation
– Customisation or self-adaptation to specific access patterns
– Distributed Querying, Data Analysis and Data Mining

10 Grid Data Management Dependencies
Performance Reliability Availability Usability
Media Hardware Operating System Local File System Network Software Protocols Storage System

11 Existing Middleware for Grid Data Management - Overview
Globus
– GridFTP
– Replica Catalog
– Replica Manager
EU DataGrid
– GDMP
– Replica Catalog
– Replica Manager
– Spitfire
Condor
– NeST
PPDG
– Magda
– JASMine
– GDMP
Griphyn/iVDGL
– Virtual Data Toolkit
Storage Resource Broker
Storage Resource Manager
ROOT
– Alien
Nimrod-G
Not exhaustive

12 Globus Data Management
GridFTP
– Fast, parallel file transfer
– Towards self-optimising system
– Work on reliable file transfer on top
Replica Catalog – jointly with EDG WP2
– Configurable
– Distributed, hierarchical
– Scalable
Replica Manager
Security infrastructure

13 European DataGrid WP2
GDMP – with PPDG
– In production with CMS for Objectivity replication
– Subscription-based replication
– Scalable architecture
Replica Catalog with Globus
Replica Manager and Optimiser
– Take Globus RM as core
– Additional modules for pre- and postprocessing of data
Replica Selection in the WP2 Optimisation task
– Simulator to test replica selection
Spitfire
– Unified front-end to databases
– Suitable for Grid and Application Metadata
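
Replica selection can be pictured as picking the copy with the lowest estimated access cost. The sketch below uses a deliberately simple cost model (latency plus size divided by bandwidth); the model, the `Replica` fields, and the function names are assumptions for illustration and are not taken from the WP2 optimiser or its simulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One physical copy plus hypothetical network estimates to reach it."""
    url: str
    bandwidth_mb_per_s: float
    latency_s: float


def estimated_transfer_time(replica, file_size_mb):
    """Assumed cost model: fixed latency plus size over bandwidth, in seconds."""
    return replica.latency_s + file_size_mb / replica.bandwidth_mb_per_s


def select_replica(replicas, file_size_mb):
    """Return the replica with the smallest estimated transfer time."""
    if not replicas:
        raise ValueError("no replicas to choose from")
    return min(replicas, key=lambda r: estimated_transfer_time(r, file_size_mb))
```

Under this model the best replica depends on file size: a nearby site with a slow link wins for small files, while a distant site with a fast link wins for large ones.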

14 WP2 Replica Manager Architecture
Core API
Optimisation API
Replica Catalogue
Metadata Catalogue

15 Condor Data Management
Condor Matchmaking
– Find optimal resource
Condor Network Storage (NeST)
– Generic access to storage – abstract storage interface
– Virtual Protocol Layer
– User Management and Reservation
Chirp
– Minimum set of file access requests
– Meta-management requests
Condor Bypass

16 PPDG / Griphyn Data Management
Globus, Condor, SRB
GDMP – with EDG
Magda
– To be used in ATLAS data challenges
– Metadata catalog
JASMine (JLAB Asynchronous Storage Manager)
– Storage Management and Resource
– Replica catalog based on MySQL, as Web Service
– Replication service
– File Server
Griphyn Virtual Data System

17 SRB, SRM
SDSC Storage Resource Broker
– Advanced resource techniques
– Replica Catalog based on Oracle; the catalog itself is replicated using Oracle’s replication mechanism
Storage Resource Manager (LBNL)
– Interfaces to any Storage System
– Joint functional definition with EDG, PPDG, Griphyn

18 Reference Technologies
P2P technology
– Gnutella
– Napster
– Freenet
– Oceanstore
– CHORD
– CAN
– JXTA Search
– Mojo Nation
Database technology
– Replication
– Distributed heterogeneous databases
– Query planning and optimization
Storage
– Unitree
– DMF
– HPSS
– Castor, Enstore, Eurostore
– SAM
File Systems
– AFS, Coda, Intermezzo
– NFS
– GPFS, CXFS, GFS, DFS, DAFS
– SlashGrid

19 Application to LCG Project
Bridge the gap between immediate needs of experiments for production quality grid middleware and existing prototype middleware
– Evolve existing grid middleware into production quality services
– LCG Project is a Deployment Grid – nevertheless we will need to do some development
Specialization of existing Grid Middleware to the LHC environment – explicitly to the tiered architecture model
Very close relations to Application Area Physics Data Management task
AFSGDM

20 Issues / Dangers
Commonalities – solving the same problems again and again; potential for duplication of effort
+ Think in Virtual Organisations
+ RTAGs, like Common Persistency Framework
Security – I can see what you can’t see
+ EDG Security Group – see Dave Kelsey’s talk
+ SciDAC
+ Building Trust relationships
Standardisation – bringing it all together and agree, agree, agree
+ OGSA
+ GGF
Consensus – too many cooks spoil the broth
+ Making decisions in time
+ Keeping agreements, sticking to standards
+ Avoid Micromanagement

