1 P.Kunszt LCGP 13.3.2002 Data Management on the GRID Peter Z. Kunszt CERN Database Group EU DataGrid – Data Management.

Slides:



Advertisements
Similar presentations
WP2: Data Management Gavin McCance University of Glasgow November 5, 2001.
Advertisements

WP2: Data Management Gavin McCance University of Glasgow.
Jens G Jensen Atlas Petabyte store Supporting Multiple Interfaces to Mass Storage Providing Tape and Mass Storage to Diverse Scientific Communities.
HEPiX GFAL and LCG data management Jean-Philippe Baud CERN/IT/GD.
Peter Berrisford RAL – Data Management Group SRB Services.
Data Management Expert Panel - WP2. WP2 Overview.
Data Management Expert Panel. RLS Globus-EDG Replica Location Service u Joint Design in the form of the Giggle architecture u Reference Implementation.
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
GGF Toronto Spitfire A Relational DB Service for the Grid Peter Z. Kunszt European DataGrid Data Management CERN Database Group.
The CrossGrid project Juha Alatalo Timo Koivusalo.
Robust Tools for Archiving and Preserving Digital Data Joseph JaJa, Mike Smorul, and Mike McGann Institute for Advanced Computer Studies Department of.
Grids and Grid Technologies for Wide-Area Distributed Computing Mark Baker, Rajkumar Buyya and Domenico Laforenza.
UMIACS PAWN, LPE, and GRASP data grids Mike Smorul.
Magda – Manager for grid-based data Wensheng Deng Physics Applications Software group Brookhaven National Laboratory.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
GridPP9 – 5 February 2004 – Data Management DataGrid is a project funded by the European Union GridPP is funded by PPARC WP2+5: Data and Storage Management.
5 November 2001F Harris GridPP Edinburgh 1 WP8 status for validating Testbed1 and middleware F Harris(LHCb/Oxford)
HEP Experiment Integration within GriPhyN/PPDG/iVDGL Rick Cavanaugh University of Florida DataTAG/WP4 Meeting 23 May, 2002.
ARGONNE  CHICAGO Ian Foster Discussion Points l Maintaining the right balance between research and development l Maintaining focus vs. accepting broader.
INFSO-RI Enabling Grids for E-sciencE gLite Data Management Services - Overview Mike Mineter National e-Science Centre, Edinburgh.
1 School of Computer, National University of Defense Technology A Profile on the Grid Data Engine (GridDaEn) Xiao Nong
DataGrid is a project funded by the European Union CHEP 2003 – March 2003 – Next Generation Data Mgmt... – n° 1 James Casey CERN
The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Dataset Caitlin Minteer & Kelly Clynes.
Miguel Branco CERN/University of Southampton Enabling provenance on large-scale e-Science applications.
San Diego Supercomputer Center National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center National Partnership for Advanced.
Grid Workload Management & Condor Massimo Sgaravatto INFN Padova.
Grid Status - PPDG / Magda / pacman Torre Wenaus BNL U.S. ATLAS Physics and Computing Advisory Panel Review Argonne National Laboratory Oct 30, 2001.
File and Object Replication in Data Grids Chin-Yi Tsai.
PPDG and ATLAS Particle Physics Data Grid Ed May - ANL ATLAS Software Week LBNL May 12, 2000.
Ruth Pordes, Fermilab CD, and A PPDG Coordinator Some Aspects of The Particle Physics Data Grid Collaboratory Pilot (PPDG) and The Grid Physics Network.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
Data Grid projects in HENP R. Pordes, Fermilab Many HENP projects are working on the infrastructure for global distributed simulated data production, data.
D C a c h e Michael Ernst Patrick Fuhrmann Tigran Mkrtchyan d C a c h e M. Ernst, P. Fuhrmann, T. Mkrtchyan Chep 2003 Chep2003 UCSD, California.
Virtual Data Grid Architecture Ewa Deelman, Ian Foster, Carl Kesselman, Miron Livny.
GriPhyN EAC Meeting (Jan. 7, 2002)Carl Kesselman1 University of Southern California GriPhyN External Advisory Committee Meeting Gainesville,
Author - Title- Date - n° 1 Partner Logo EU DataGrid, Work Package 5 The Storage Element.
Middleware for Grid Computing and the relationship to Middleware at large ECE 1770 : Middleware Systems By: Sepehr (Sep) Seyedi Date: Thurs. January 23,
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Management of Distributed Data Reagan W. Moore.
Replica Management Services in the European DataGrid Project Work Package 2 European DataGrid.
Owen SyngeTitle of TalkSlide 1 Storage Management Owen Synge – Developer, Packager, and first line support to System Administrators. Talks Scope –GridPP.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
Computing Sciences Directorate, L B N L 1 CHEP 2003 Standards For Storage Resource Management BOF Co-Chair: Arie Shoshani * Co-Chair: Peter Kunszt ** *
Grid User Interface for ATLAS & LHCb A more recent UK mini production used input data stored on RAL’s tape server, the requirements in JDL and the IC Resource.
The Particle Physics Data Grid Collaboratory Pilot Richard P. Mount For the PPDG Collaboration DOE SciDAC PI Meeting January 15, 2002.
PPDGLHC Computing ReviewNovember 15, 2000 PPDG The Particle Physics Data Grid Making today’s Grid software work for HENP experiments, Driving GRID science.
DGC Paris WP2 Summary of Discussions and Plans Peter Z. Kunszt And the WP2 team.
6/23/2005 R. GARDNER OSG Baseline Services 1 OSG Baseline Services In my talk I’d like to discuss two questions:  What capabilities are we aiming for.
1 e-Science AHM st Aug – 3 rd Sept 2004 Nottingham Distributed Storage management using SRB on UK National Grid Service Manandhar A, Haines K,
7. Grid Computing Systems and Resource Management
GRID ANATOMY Advanced Computing Concepts – Dr. Emmanuel Pilli.
Super Computing 2000 DOE SCIENCE ON THE GRID Storage Resource Management For the Earth Science Grid Scientific Data Management Research Group NERSC, LBNL.
Magda Distributed Data Manager Prototype Torre Wenaus BNL September 2001.
Distributed Data Access Control Mechanisms and the SRM Peter Kunszt Manager Swiss Grid Initiative Swiss National Supercomputing Centre CSCS GGF Grid Data.
Rights Management for Shared Collections Storage Resource Broker Reagan W. Moore
1 A Scalable Distributed Data Management System for ATLAS David Cameron CERN CHEP 2006 Mumbai, India.
Grid Status - PPDG / Magda / pacman Torre Wenaus BNL DOE/NSF Review of US LHC Software and Computing Fermilab Nov 29, 2001.
Production Mode Data-Replication Framework in STAR using the HRM Grid CHEP ’04 Congress Centre Interlaken, Switzerland 27 th September – 1 st October Eric.
Grid Activities in CMS Asad Samar (Caltech) PPDG meeting, Argonne July 13-14, 2000.
LHCC Referees Meeting – 28 June LCG-2 Data Management Planning Ian Bird LHCC Referees Meeting 28 th June 2004.
Current Globus Developments Jennifer Schopf, ANL.
Preservation Data Services Persistent Archive Research Group Reagan W. Moore October 1, 2003.
9/20/04Storage Resource Manager, Timur Perelmutov, Jon Bakken, Don Petravick, Fermilab 1 Storage Resource Manager Timur Perelmutov Jon Bakken Don Petravick.
Magda Distributed Data Manager Torre Wenaus BNL October 2001.
J Jensen / WP5 /RAL UCL 4/5 March 2004 GridPP / DataGrid wrap-up Mass Storage Management J Jensen
CERN IT-Storage Strategy Outlook Alberto Pace, Luca Mascetti, Julien Leduc
Gavin McCance University of Glasgow GridPP2 Workshop, UCL
GGF OGSA-WG, Data Use Cases Peter Kunszt Middleware Activity, Data Management Cluster EGEE is a project funded by the European.
Grid Portal Services IeSE (the Integrated e-Science Environment)
WP7 objectives, achievements and plans
Gridifying the LHC Data
Presentation transcript:

1 P.Kunszt LCGP Data Management on the GRID Peter Z. Kunszt CERN Database Group EU DataGrid – Data Management

2 P.Kunszt LCGP Personal Information PhD in theoretical physics (Lattice QCD) at U of Bern ‘Builder of the SDSS Project’ – design and implementation work on the SDSS science archive SX both Objectivity and MS SQLServer CERN Database Group Activity task leader for Grid Data Management Management of WP2 (Data Management) of the EDG Project

3 P.Kunszt LCGP Scope of Data Management Data Transfer –Transport protocols Data Access –Remote I/O –Security / Policies Data Storage –Hierarchical Storage –Mass Storage Replication –Peer-to-Peer –Centralized –Distributed –Automatic Metadata management –Scalable –Distributed –Consistent Persistency –Grid-enabled databases and data stores –Independent of back-end implementation Optimisation –Data Access optimisation –Cost minimsation

4 P.Kunszt LCGP Vision of Grid Data Management Distributed Shared Data Storage Ubiquitous Data Access Transparent Data Transfer and Migration Consistency and Robustness Optimisation

5 P.Kunszt LCGP Vision of Grid Data Management GRID Distributed Shared Data Storage – Different architectures – Heterogenous data stores – Self-describing data and metadata

6 P.Kunszt LCGP Vision of Grid Data Management GRID Ubiquitous Data Access – Global Namespace – Transparent security control and enforcement – Access from anytime anywhere, physical data location irrelevant – Automatic Data Replication and Validation

7 P.Kunszt LCGP Vision of Grid Data Management GRID Transparent Data Transfer and Migration – Protocol negotiation and multiple protocol support – Management of data formats and database versions

8 P.Kunszt LCGP Vision of Grid Data Management GRID Consistency and Robustness – Replicated data is reasonably up-to-date – Reliable data transfer – Self-detecting and self-correcting mechanisms upon data corruption      X

9 P.Kunszt LCGP Vision of Grid Data Management GRID Optimisation – Customisation or self-adaptation to specific access patterns – Distributed Querying, Data Analysis and Data Mining ? ? ? !

10 P.Kunszt LCGP Grid Data Management Dependencies Performance Reliability Availability Usability Media Hardware Operating System Local File System Network Software Protocols Storage System

11 P.Kunszt LCGP Existing Middleware for Grid Data Management - Overview Globus –GridFTP –Replica Catalog –Replica Manager EU DataGrid –GDMP –Replica Catalog –Replica Manager –Spitfire Condor –NeST PPDG –Magda –JASMine –GDMP Griphyn/iVDGL –Virtual Data Toolkit Storage Resource Broker Storage Resource Manager ROOT –Alien Nimrod-G Not exhaustive

12 P.Kunszt LCGP Globus Data Management GridFTP –Fast, parallel file transfer –Towards self-optimising system –Work on reliable file transfer on top Replica Catalog – jointly with EDG WP2 –Configurable –Distributed, hierarchical –Scalable Replica Manager Security infrastructure

13 P.Kunszt LCGP European DataGrid WP2 GDMP – with PPDG –In production with CMS for Objectivity replication –Subscription-based replication –Scalable architecture Replica Catalog with Globus Replica Manager and Optimiser –Take Globus RM as core –Additional modules for pre- postprocessing of data Replica Selection in the WP2 Optimisation task –Simulator to test replica selection Spitfire –Unified front-end to databases –Suitable for Grid and Application Metadata

14 P.Kunszt LCGP WP2 Replica Manager Architecture Core API Optimisation API Replica Catalogue Metadata Catalogue

15 P.Kunszt LCGP Condor Data Management Condor Matchmaking –Find optimal resource Condor Network Storage (NeST) –Generic access to storage – abstract storage interface –Virtual Protocol Layer –User Management and Reservation Chirp –Minimum set of file access requests –Meta-management requests Condor Bypass

16 P.Kunszt LCGP PPDG / Griphyn Data Management Globus, Condor, SRB GDMP – with EDG Magda –To be used in ATLAS data challenges –Metadata catalog JASMine JLAB Asynchronous Storage Manager –Storage Management and Resource –Replica catalog based on MySQL, as Web Service –Replication service –File Server Griphyn Virtual Data System

17 P.Kunszt LCGP SRB, SRM SDSC Storage Resource Broker –Advanced resource techniques –Replica Catalog based on Oracle, catalog itself is being replicated using Oracle’s replication mechanism Storage Resource Manager (LBNL) –Interfaces to any Storage System –Joint functional definition with EDG, PPDG, Griphyn

18 P.Kunszt LCGP Reference Technologies P2P technology –Gnutella –Napster –Freenet –Oceanstore –CHORD –CAN –JXTA Search –Mojo Nation Database technology –Replication –Distributed heterogeneous databases –Query planning and optimization Storage –Unitree –DMF –HPSS –Castor, Enstore, Eurostore –SAM File Systems –AFS, Coda, Intermezzo –NFS –GPFS, CXFS, GFS, DFS, DAFS –SlashGrid

19 P.Kunszt LCGP Application to LCG Project Bridge the gap between immediate needs of experiments for production quality grid middleware and existing prototype middleware –Evolve existing grid middleware into production quality services –LCG Project is a Deployment Grid – nevertheless we will need to do some development Specialization of existing Grid Middleware to the LHC environment – explicitly to the tiered architecture model Very close relations to Application Area Physics Data Management task AFSGDM

20 P.Kunszt LCGP Issues / Dangers Commonalities – solving the same problems again and again ; potential for duplication of effort +Think in Virtual Organisations +RTAGs, like Common Persistency Framework Security – i can see what you can’t see +EDG Security Group – see Dave Kelsey’s talk +SciDAC +Building Trust relationships Standardisation – bringing it all together and agree, agree, agree +OGSA +GGF Consensus – too many cooks spoil the broth +Making decisions in time +Keeping agreements, sticking to standards +Avoid Micromanagement