Magda – Manager for grid-based data
Wensheng Deng, Physics Applications Software Group, Brookhaven National Laboratory

What is Magda?
A distributed data manager prototype for the ATLAS experiment, and a project affiliated with the Particle Physics Data Grid (PPDG). It uses the Globus Toolkit wherever applicable. It is an end-to-end application layered over grid middleware – it gets thinner the more middleware we are able to use.

Why is it needed?
People are distributed, so data and computing power are distributed as well, and people build networks to extend their reach. The experiment needs to know what data it has and where those data are, and it needs to send data to where computing power is available. Cataloging and data movement are therefore the motivation for Magda: users need convenient data lookup and retrieval!

How do we look at our data?
Data is distributed, so storage facilities are distributed. We use the word site to abstract a storage facility. Data at a storage facility is usually organized into directories; we use location to denote a directory. A storage facility is accessed from computers; we use host to represent a group of computers, and from a host one can access a set of sites. That is how Magda organizes data: site, location, host – sketched as relational tables below.
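The site/location/host model maps naturally onto relational tables. The following is a minimal sketch of how such tables might look in MySQL, driven from Perl DBI as Magda's scripts are. The table names, column names and connection parameters are illustrative assumptions, not the actual Magda schema.

```perl
#!/usr/bin/perl
# Illustrative sketch only: table and column names are assumptions,
# not the real Magda schema.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=magda;host=localhost',
                       'magda_user', 'secret', { RaiseError => 1 });

# A "site" abstracts a storage facility (e.g. CERN castor, BNL HPSS, a disk).
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS site (
        id   INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(64) NOT NULL
    )
});

# A "location" is a directory within a site where data is organized.
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS location (
        id      INT AUTO_INCREMENT PRIMARY KEY,
        site_id INT NOT NULL,
        path    VARCHAR(255) NOT NULL
    )
});

# A "host" represents a group of computers.
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS host (
        id   INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(64) NOT NULL
    )
});

# From a host one can access a set of sites.
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS host_site (
        host_id INT NOT NULL,
        site_id INT NOT NULL
    )
});

$dbh->disconnect;
```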

Architecture & Schema MySQL database at the core of the system. The DB interaction done via perl, C++, java, and cgi (perl) scripts. Users interact with the system via web interface and command line. For data movement gridFTP, bbftp and scp are used wherever applicable. –adaptable to available protocols. Principal components:  File catalog with logical & physical file info and metadata. support for master/replica instances.  Site, location and host relational tables realize our model.  Logical files can optionally be organized into collections.  Replication operations organized into reusable tasks.

[Diagram: AFS disk, mass store and NFS disk sites/locations accessed from a host, with magda_putfile feeding the MySQL catalog.]
A file spider crawls the data stores to populate and validate the catalogs. Catalog entries can also be added or modified individually from the command line.
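As an illustration of the spider idea, the sketch below walks one disk location and (re)registers every file it finds in a hypothetical fileCatalog table. The schema, the credentials and the choice of using the file basename as the logical name are assumptions made for this sketch, not Magda's actual implementation.

```perl
#!/usr/bin/perl
# Minimal sketch of one spider pass over a disk location.
use strict;
use warnings;
use DBI;
use File::Find;

my ($location_id, $root) = @ARGV;
die "usage: $0 <location-id> <directory>\n" unless defined $root;

my $dbh = DBI->connect('DBI:mysql:database=magda;host=localhost',
                       'magda_user', 'secret', { RaiseError => 1 });

my $ins = $dbh->prepare(q{
    REPLACE INTO fileCatalog (logical_name, location_id, size)
    VALUES (?, ?, ?)
});

# Walk the location and (re)register every regular file found, so that
# the catalog reflects what is actually present on disk.
find(sub {
    return unless -f $_;
    $ins->execute($_, $location_id, -s $_);   # basename used as logical name
}, $root);

$dbh->disconnect;
```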

File replication task
A task is defined by the user, who specifies the source collection and host, the transfer tool, pull or push mode, the destination host and location, and any intermediate caches. The source collection can be a set of files with a particular user-defined key, or files from the same location. Besides pull/push, third-party transfer is also supported. A task is reusable; a simplified sketch of executing one is shown below.
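The sketch below shows the core of executing such a task with scp, one of the transfer tools named above. The destination host and directory are made-up placeholders, and the staging through intermediate caches and the catalog bookkeeping (e.g. a transferStatus table) are deliberately left out.

```perl
#!/usr/bin/perl
# Sketch of running one replication task in push mode with scp;
# host and directory names are placeholders, not real endpoints.
use strict;
use warnings;

my @source_files = @ARGV;                       # files of the source collection
my $dest_host    = 'remote.example.org';        # hypothetical destination host
my $dest_dir     = '/data/magda/incoming';      # hypothetical destination location

foreach my $file (@source_files) {
    # Push each file to the destination location; a real task would also
    # stage through caches and update the transfer status in the catalog.
    my $rc = system('scp', $file, "$dest_host:$dest_dir/");
    warn "transfer failed for $file\n" if $rc != 0;
}
```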

[Diagram: files flow from the source location through a source cache and a destination cache to the destination location, with the MySQL tables fileCollection, transferStatus and fileCatalog shown alongside.]

Web interface
Present catalog content. Query catalog information. Update configuration.
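Since the web interface is built from cgi (perl) scripts, a minimal catalog query page might look like the sketch below. This is illustrative only: the table, column and parameter names and the credentials are assumptions and do not reflect the actual Magda pages.

```perl
#!/usr/bin/perl
# Minimal CGI sketch of a catalog query page (illustrative only).
use strict;
use warnings;
use CGI qw(param header escapeHTML);
use DBI;

my $pattern = param('lfn') // '%';   # hypothetical query parameter

my $dbh = DBI->connect('DBI:mysql:database=magda;host=localhost',
                       'magda_user', 'secret', { RaiseError => 1 });
my $sth = $dbh->prepare(
    'SELECT logical_name, size FROM fileCatalog WHERE logical_name LIKE ?');
$sth->execute($pattern);

# Render the matching catalog entries as a simple HTML table.
print header('text/html'), "<html><body><table>\n";
while (my ($name, $size) = $sth->fetchrow_array) {
    printf "<tr><td>%s</td><td>%d</td></tr>\n", escapeHTML($name), $size;
}
print "</table></body></html>\n";
$dbh->disconnect;
```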

Command line tools
magda_findfile
– Searches the catalog for logical files and their instances.
– Optionally shows only local instances.
magda_getfile
– Retrieves a file via catalog lookup.
– Creates a local soft link to a disk instance, or a local copy.
– A usage count maintained in the catalog manages deletion.
magda_putfile
– Archives files and registers them in the catalog.
magda_validate
– Validates file instances by comparing size and md5sum; a sketch of this check follows.
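The validation step can be illustrated with a short sketch that recomputes the size and md5sum of a disk instance and compares them with the catalog values, in the spirit of magda_validate. Table and column names and the connection details are assumptions.

```perl
#!/usr/bin/perl
# Sketch of the validation idea: compare disk size/md5sum with catalog values.
use strict;
use warnings;
use DBI;
use Digest::MD5;

my ($lfn, $path) = @ARGV;
die "usage: $0 <logical-file-name> <local-path>\n" unless defined $path;

# Look up the size and md5sum recorded in the (hypothetical) catalog table.
my $dbh = DBI->connect('DBI:mysql:database=magda;host=localhost',
                       'magda_user', 'secret', { RaiseError => 1 });
my ($cat_size, $cat_md5) = $dbh->selectrow_array(
    'SELECT size, md5sum FROM fileCatalog WHERE logical_name = ?',
    undef, $lfn);
$dbh->disconnect;

# Recompute the actual size and md5sum of the instance on disk.
my $size = -s $path;
open my $fh, '<', $path or die "cannot open $path: $!\n";
binmode $fh;
my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
close $fh;

if (!defined $cat_size) {
    print "NOT IN CATALOG: $lfn\n";
} elsif ($size == $cat_size && $md5 eq $cat_md5) {
    print "OK: $lfn\n";
} else {
    print "MISMATCH: $lfn (size $size vs $cat_size, md5 $md5 vs $cat_md5)\n";
}
```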

[Diagram: USATLAS Linux farm nodes acas001, acas002, acas003, …, acas055, each with local scratch disk such as /acas003.usatlas.bnl.gov/home/scratch.]
The local disks at the Linux farm nodes form the Magda site usatlasfarm; they are seen as a special storage site of type ‘farm’.

Usage so far
Distributed catalog for ATLAS:
– Catalogs ATLAS data at Alberta, CERN, Lyon, INFN (CNAF, Milan), FZK, IFIC, IHEP.su, itep.ru, NorduGrid, RAL, and many US institutes.
– Supported data stores: CERN castor, BNL HPSS, Lyon HPSS, RAL tape system, NERSC HPSS, disk, code repositories.
– 264K files in the catalog with a total size of 65.5 TB; scalability tested to 1.5M files.

Usage so far (cont'd)
In stable operation since May. Heavily used in ATLAS DC0 and DC1. Catalog entries come from 10 countries or regions. Data replication tasks have transferred more than 6 TB of data between BNL HPSS and CERN castor. Magda is a main component in US grid testbed production. Using Magda, the PHENIX experiment replicates data from BNL to Stony Brook and catalogs the data there. It is being evaluated by others.

Current and near-term work
Implement Magda as a file catalog back-end option for the LCG POOL persistency framework. Extend data replication usage to non-BNL, non-CERN institutions. Apply it in the ATLAS Data Challenges. It is under test in the EDG testbed. Continue evaluation and integration of middleware components (e.g. RLS).