Data Management GridPP and EDG Gavin McCance University of Glasgow May 9, 2002


GridPP, 9 May, 2002 Gavin McCance 1/32
Overview
- Status of data management work
- Products delivered to 1.2
  - GDMP 3.0
  - Reptor: replica manager
  - Spitfire
  - Optor: grid simulation
- What’s currently available and future plans

2/32 WP2: Data Management
- Replication
  - Replica catalogue
  - Replica manager
- Query Optimisation*
  - Grid replica optimisation
- Meta-data management*
  - Secure, transparent access to meta-data
- Service discovery
*Direct UK involvement
Work is done within the EDG WP2 team (based at CERN)

3/32 General Status
- Deliverables on target
- Major software released for 1.2
- UK manpower based at Glasgow:
  - 2.5 RAs: me, Will Bell, Paul Millar (50%)
  - 1 PhD student, David Cameron
  - 1 more student to come in September

4/32 File Replication
- Requires: replica catalogue or replica location service
  - Keeps track of the mapping between a logical file name and its physical file names
- Requires: replica manager or replica management service
  - High-level tool to actually do the replication and manage what files are being replicated
[Diagram: logical file name (LFN) File-1, with sites Paris, Glasgow and Chicago]
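The LFN-to-PFN mapping kept by a replica catalogue can be pictured with a minimal in-memory sketch. This is only an illustration, not the Globus or EDG catalogue: the class name, method names and the `gsiftp://se.<site>.example/...` physical names are all made up for the example.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Toy catalogue: one logical file name (LFN) maps to many physical file names (PFNs)."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn, pfn):
        # Record that a physical copy of this logical file exists.
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn, pfn):
        # Forget one physical copy; drop the LFN once no copies remain.
        self._replicas[lfn].discard(pfn)
        if not self._replicas[lfn]:
            del self._replicas[lfn]

    def list_replicas(self, lfn):
        # Return all known physical copies, or an empty list for an unknown LFN.
        return sorted(self._replicas.get(lfn, ()))


catalogue = ReplicaCatalogue()
for site in ("paris", "glasgow", "chicago"):
    # Hypothetical PFNs, one per site named on the slide.
    catalogue.register("lfn:File-1", f"gsiftp://se.{site}.example/data/File-1")

print(catalogue.list_replicas("lfn:File-1"))
```

A replica manager would sit on top of such a catalogue, performing the copy itself and then calling `register` for the new physical name.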

5/32 File Replication
- Current replication functionality is provided by GDMP 3.0, new for the 1.2 release
- Used for mirroring of storage elements
- Implements a subscription-based replication model with security, and updates the Globus replica catalogue

6/32 GDMP
1. Site ‘B’ subscribes to site ‘A’s files
2. ‘A’ produces a new file; ‘B’ is notified of this
3. ‘B’ then starts the transfer of the new files from ‘A’
4. The replica catalogue at ‘B’ is updated to reflect the new file replica
[Diagram: Site A, Site B]
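The four steps of the subscription model can be sketched as a small in-memory simulation. This is not GDMP code: the `Site` class, its method names, and the use of a plain dictionary as the "replica catalogue" are assumptions made purely to illustrate the flow.

```python
class Site:
    """Toy site taking part in subscription-based replication."""

    def __init__(self, name):
        self.name = name
        self.files = {}        # file name -> contents held at this site
        self.subscribers = []  # sites to notify when a new file appears
        self.catalogue = {}    # toy replica catalogue: file name -> set of site names

    def subscribe_to(self, producer):
        # Step 1: this site subscribes to the producer's files.
        producer.subscribers.append(self)

    def publish(self, filename, data):
        # Step 2: the producer creates a file and notifies its subscribers.
        self.files[filename] = data
        self.catalogue.setdefault(filename, set()).add(self.name)
        for subscriber in self.subscribers:
            subscriber.on_new_file(self, filename)

    def on_new_file(self, producer, filename):
        # Step 3: the subscriber pulls the new file from the producer.
        self.files[filename] = producer.files[filename]
        # Step 4: the subscriber's catalogue records both copies.
        self.catalogue.setdefault(filename, set()).update({producer.name, self.name})


site_a, site_b = Site("A"), Site("B")
site_b.subscribe_to(site_a)
site_a.publish("run1.dat", b"event data")
print(sorted(site_b.catalogue["run1.dat"]))
```

In the real system the transfer in step 3 is a secured network copy; here it is reduced to a dictionary assignment.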

7/32 GDMP 3.0
Changes w.r.t. 2.*:
- New security model: host certificates
- Server delegation, i.e. accounts on the SE are not necessarily required
- Client-only install possible
- Basic space management
- Stand-alone server option
- ‘unsubscribe’ option

8/32 GDMP 3.0 status
- Final version of GDMP released for 1.2
- In future, GDMP will be absorbed into the Replica Manager Service, which will offer richer functionality
- SRPM, RPM, tarball, User Guide, Quick Config for EDG SEs:

9/32 Replica Location Service
- Current Globus replica catalogue is LDAP-based
- To be replaced with the new ‘GIGGLE’ framework Replica Location Service
- Joint EDG WP2 / Globus / PPDG project
- Trade-offs: global consistency, space, query / update overhead, reliability

10/32 RLS model…
- Reliable local state
- Relaxed global consistency
- Soft-state updates to global index nodes permit graceful behaviour in the face of network problems
- Secure access
- Implemented as a web service

11/32
[Diagram: an RLI above LRCs, with Storage Elements below]
Hierarchical indexing. The higher-level RLI contains pointers to lower-level RLIs or LRCs.
RLI = Replica Location Index
LRC = Local Replica Catalog
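The combination of reliable local state and soft-state updates to an index can be sketched as follows. This is a toy model, not the RLS implementation: the class names mirror the slide's terms, but every method, the time-to-live mechanism and the explicit `now` argument (used so the example is deterministic) are assumptions.

```python
import time


class LocalReplicaCatalog:
    """Toy LRC: authoritative LFN -> PFN mappings for one site."""

    def __init__(self, name):
        self.name = name
        self.mappings = {}  # LFN -> set of PFNs

    def add(self, lfn, pfn):
        self.mappings.setdefault(lfn, set()).add(pfn)

    def lfns(self):
        return set(self.mappings)


class ReplicaLocationIndex:
    """Toy RLI: remembers which source (LRC or lower RLI) claims to know each LFN."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._state = {}  # source name -> (set of LFNs, expiry time)

    def soft_state_update(self, source_name, lfns, now=None):
        # Each update replaces the previous one and expires unless refreshed.
        now = time.time() if now is None else now
        self._state[source_name] = (set(lfns), now + self.ttl)

    def lookup(self, lfn, now=None):
        # Return sources whose state has not expired; the caller then asks those sources.
        now = time.time() if now is None else now
        return sorted(
            name
            for name, (lfns, expiry) in self._state.items()
            if expiry > now and lfn in lfns
        )


lrc = LocalReplicaCatalog("lrc-glasgow")
lrc.add("lfn:File-1", "pfn-on-glasgow-se")
rli = ReplicaLocationIndex(ttl_seconds=60)
rli.soft_state_update(lrc.name, lrc.lfns(), now=0)
print(rli.lookup("lfn:File-1", now=30))  # state still fresh
print(rli.lookup("lfn:File-1", now=61))  # state expired without a refresh
```

If an LRC becomes unreachable, its entries simply age out of the index, which is one way to read the "graceful behaviour in face of network problems" mentioned on the RLS model slide.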

12/32 Scalable, reliable
- LFN namespace partitioned among RLIs
- Redundant RLIs for reliability
- Lossy compression
  - Higher-level RLIs may lose accuracy about mappings
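The slide does not say which lossy compression scheme is meant. One common way to summarise a set of names in fixed space is a Bloom filter, sketched below as an assumption: it never misses a name that was added, but it may occasionally report a name that was not, which matches the idea that higher-level indexes "may lose accuracy". The class and its parameters are illustrative only.

```python
import hashlib


class BloomFilter:
    """Fixed-size, lossy summary of a set of strings (false positives are possible)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # an integer used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True for every added item; may also be True for items never added.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


summary = BloomFilter()
for lfn in ("lfn:File-1", "lfn:File-2", "lfn:File-3"):
    summary.add(lfn)

print(summary.might_contain("lfn:File-2"))
```

A lower-level catalogue could send such a summary upwards instead of its full list of LFNs; a positive answer then has to be confirmed by querying the lower level.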

13/32 RLS status
- Currently alpha, for developers
- location-service/RLS.html
- New version will be progressively integrated with other replication software
- Testbed deployment in September release

14/32 Replica Management Service
- Web service under development (Reptor)
- Will absorb GDMP functionality and extend it
- Will use the Replica Location Service
- Two facets
  - Core Replica Management API
  - Optimisation API

15/32 Core Reptor API
Similar to GDMP API
- registerEntry
- copyFile
- copyAndRegisterFile
- replicateFile
- deleteFile
- listReplicas

16/32 Interactions with SE
Defined file types:
Physical file attribute    File type
Master                     permanent
Secondary copy             permanent, durable or volatile

17/32 RMS Current Status
- Testbed can use GDMP for 1.2
- Defined Reptor API currently wraps the Globus Replica Manager
- Will be developed progressively
- Full version on testbed in September
- Technical reports: http://cern.ch/grid-data-management/publications.html

18/32 Grid Query Optimisation
- Best place for a job? Joint WP1 / WP2 question…
- Approach: 2-phase optimisation
  - Phase 1: Find a suitable CE for job execution given the distribution of the files it will access
  - Phase 2: Re-optimise file access during job execution (due to the dynamic nature of the Grid, resource status changes over time)
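Phase 1 can be illustrated with a deliberately simple heuristic: pick the compute element (CE) that would have to fetch the fewest bytes from elsewhere. The slides do not state what cost measure is actually used, so the function name, its arguments and the "bytes to fetch" measure are all assumptions.

```python
def choose_compute_element(candidate_ces, needed_lfns, replicas_near, file_sizes):
    """Return the CE for which the least data must be fetched remotely.

    replicas_near maps a CE name to the set of LFNs already held on storage
    close to it; file_sizes maps each LFN to its size (any consistent unit).
    """

    def remote_bytes(ce):
        local = replicas_near.get(ce, set())
        return sum(file_sizes[lfn] for lfn in needed_lfns if lfn not in local)

    return min(candidate_ces, key=remote_bytes)


# Hypothetical job needing three files, with two candidate sites.
needed = ["lfn:f1", "lfn:f2", "lfn:f3"]
sizes = {"lfn:f1": 100, "lfn:f2": 500, "lfn:f3": 50}
near = {"ce-glasgow": {"lfn:f1", "lfn:f2"}, "ce-paris": {"lfn:f1"}}

print(choose_compute_element(["ce-paris", "ce-glasgow"], needed, near, sizes))
```

Phase 2 would then revisit individual file accesses while the job runs, since the replica layout and network conditions may have changed since the CE was chosen.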

19/32 Optimisation API
- initFilePrefetch(LFN[], CE, protocol[], fraction)
- cancelFilePrefetch(LFN[], CE)
- getBestFile(LFN[], protocol[], fraction)
- getNetworkCosts(SE1, SE2, filesize, protocol), from WP7
- getIOCosts(SE, PFN), from WP5
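How a "best file" choice might combine replica locations with a network cost estimate can be sketched as below. This is not the Reptor Optimisation API: the function names are Python stand-ins, the cost function takes fewer arguments than `getNetworkCosts`, and the bandwidth table contains invented numbers.

```python
# Hypothetical link bandwidths in MB/s between storage elements (illustrative numbers only).
TOY_BANDWIDTH = {
    ("se-paris", "se-glasgow"): 10.0,
    ("se-chicago", "se-glasgow"): 2.0,
}


def toy_network_cost(src_se, dst_se, filesize_mb):
    """Estimated transfer time in seconds; zero when the file is already local."""
    if src_se == dst_se:
        return 0.0
    return filesize_mb / TOY_BANDWIDTH[(src_se, dst_se)]


def get_best_replica(replicas, local_se, filesize_mb, network_cost=toy_network_cost):
    """replicas maps SE name -> PFN; return the (SE, PFN) pair with the lowest cost."""
    return min(
        replicas.items(),
        key=lambda item: network_cost(item[0], local_se, filesize_mb),
    )


replicas = {"se-paris": "pfn-paris", "se-chicago": "pfn-chicago"}
print(get_best_replica(replicas, "se-glasgow", 100.0))
```

A fuller version would also add an I/O cost term for reading the file from the chosen storage element, in the spirit of `getIOCosts`.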

20/32 Grid Replica Optimisation
- Controlled, intelligent replication to optimise the grid over the longer term
- Collect getBestFile requests
- ‘Intelligence’ based on algorithms
- Test replication algorithms on a data-centric grid simulator

21/32 Optor: replica optimiser simulation
- Simulate prototype Grid
- Input site policies and experiment data files
- Introduce replication algorithm:
  - Files are always replicated to the local storage
  - If necessary, the oldest files are deleted
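The basic replication algorithm described above can be sketched in a few lines. This is not Optor itself: the class is a stand-in, "oldest" is taken to mean the file that was replicated earliest, and file sizes and capacity are in arbitrary units.

```python
from collections import OrderedDict


class LocalStorage:
    """Toy storage element: always replicate on access, evict oldest replicas when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = OrderedDict()  # file name -> size, oldest replica first
        self.remote_fetches = 0     # how many accesses needed the network

    def used(self):
        return sum(self.files.values())

    def access(self, name, size):
        if name in self.files:
            return "local"
        self.remote_fetches += 1
        if size > self.capacity:
            return "remote"  # too large to keep a local copy
        while self.used() + size > self.capacity:
            self.files.popitem(last=False)  # delete the oldest replica
        self.files[name] = size
        return "replicated"


storage = LocalStorage(capacity=2)
results = [storage.access(name, 1) for name in ("a", "b", "a", "c", "a")]
print(results, storage.remote_fetches)
```

Counting `remote_fetches` over a stream of job requests gives a crude stand-in for the network traffic that a simulator would compare across replication algorithms.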

22/32 Optor first results
- Even a basic replication algorithm significantly reduces network traffic and program running times
- New economics-based algorithms under investigation!

23/32 Meta-data Management
- Spitfire v1.1.0 delivered
- A grid-enabled database service
- Grid-enabled front end to any type of RDBMS
- Examples:
  - Grid meta-data: replica catalogue, service registry
  - Application meta-data: experimental data catalogues, calibration data

24/32 V1.1.0 XSQL
The current Spitfire (v1.1.0) is based on XSQL templates on the server, e.g.
SELECT FILENAME FROM HFS_DATASET WHERE AND AND
File URL =

25/32 V1.1.0 Spitfire client
- Any HTTP client: either your own app, or a web-browser form
- POST an HTML FORM to with parameters run=25555, trig=highlumi, stat=good
- The operation is made on the database, and the result is sent back to the client…
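Because any HTTP client will do, the form POST can be prepared with the Python standard library alone. The sketch below only builds the request and does not send it. The server URL is missing from the transcript, so `SPITFIRE_URL` is a placeholder invented for the example; the three parameters are the ones quoted on the slide.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder only: the real server address is not given in the transcript.
SPITFIRE_URL = "https://spitfire.example.org/hypothetical/query"

# Form parameters quoted on the slide.
params = {"run": "25555", "trig": "highlumi", "stat": "good"}

# Encode the parameters exactly as a browser would for an HTML form POST.
body = urlencode(params).encode("ascii")

request = Request(
    SPITFIRE_URL,
    data=body,
    method="POST",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(request.get_method(), body.decode("ascii"))
```

Sending it would be a call to `urllib.request.urlopen(request)`; a real deployment would additionally present a client certificate over SSL, which is omitted here.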

26/32 Security Mechanism
[Flow diagram]
- Components: Servlet Container, SSLServletSocketFactory, TrustManager, Security Servlet, Authorization Module, Translator Servlet, Connection Pool, RDBMS
- Repositories: Trusted CAs, Revoked Certs repository, Role repository, Connection mappings
- Input: HTTP + SSL request + client certificate
- Decisions: Is certificate signed by a trusted CA? Has certificate been revoked? Does user specify role? (Yes: use that role. No: find default.) Role ok? Map role to connection id
- Output to the Translator Servlet: request and connection ID
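The decision chain in the diagram can be read as a single function from a client certificate and an optional role to a database connection ID. The sketch below is one interpretation only, not Spitfire code: certificates are reduced to plain dictionaries, no cryptographic verification is done, and every name and data structure is an assumption.

```python
def authorise(cert, requested_role, trusted_cas, revoked_serials,
              role_repository, default_roles, connection_mappings):
    """Return a connection ID for the request, or None if it must be refused."""
    # Is the certificate signed by a trusted CA? (issuer check only; no real crypto here)
    if cert["issuer"] not in trusted_cas:
        return None
    # Has the certificate been revoked?
    if cert["serial"] in revoked_serials:
        return None
    # Does the user specify a role? If not, find the default one.
    role = requested_role
    if role is None:
        role = default_roles.get(cert["subject"])
    # Role ok? The user must be allowed to take this role.
    if role is None or role not in role_repository.get(cert["subject"], set()):
        return None
    # Map the role to a connection ID for the connection pool.
    return connection_mappings.get(role)


# Hypothetical data for one user.
alice = {"subject": "/O=Grid/CN=Alice", "issuer": "/O=Grid/CN=Test CA", "serial": 42}
trusted = {"/O=Grid/CN=Test CA"}
roles = {"/O=Grid/CN=Alice": {"reader", "writer"}}
defaults = {"/O=Grid/CN=Alice": "reader"}
connections = {"reader": "conn-ro", "writer": "conn-rw"}

print(authorise(alice, None, trusted, set(), roles, defaults, connections))
```

The returned connection ID is what would accompany the request on its way to the database.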

27/32 V1.1.0
- V1.1.0 available for 1.2 release now!
- SRPM, RPM, tarball installation
- User / Admin / Quick Install guides

28/32 New Spitfire client (dev)
- Users can use either this or the v1.1.0 static (XSQL template based) functionality
- A database client API has been defined
- Will be implemented as a grid service using standard web service technologies

29/32 Client side API to access remote database
- DB Admin: Create(), Drop(), Alter() Table, Database
- DB Core functionality: Insert(), Update(), Delete(), Select()
- DB Role admin: secure, role-based authorisation
- DB Information: Schema, Quotas, Disk space

30/32 Extra functionality
To be developed:
- Distributed querying
- Replication of meta-data
- Automated expiration and cleanup
Discussions with UK DBTF and GGF Database Group

31/32 Service Index
- How do I find a specific grid service? E.g. replica location server, image database, information service
- XML service description: what, where, attributes, how to contact
- Scalable architectures for querying this developed
- Service index web service
- W. Hoschek’s thesis and paper
- API developed

32/32 More Info
More information available at…