Dynamic Data Replication in LCG 2008
Caitriana Nicholson, University of Glasgow

Outline
Introduction
Grid Replica Optimisation
The OptorSim grid simulator
OptorSim architecture
Experimental setup
Results
Conclusions

Introduction
Large Hadron Collider (LHC) at CERN will have raw data rate of ~15 PB/year
LHC Computing Grid (LCG) for data storage and computing infrastructure
2008 will be first full year of LHC running
Actual analysis behaviour still unknown
→ use simulation to investigate behaviour
→ investigate dynamic data replication

Grid Replica Optimisation
Many variables determine overall grid performance
– Impossible to reach one optimal solution!
Possible to optimise variables which are part of grid middleware
– Job scheduling, data management etc.
This talk considers data management only…
…and dynamic replica optimisation in particular

Dynamic Replica Optimisation
= optimisation of the placement of file replicas on grid sites…
…in a dynamic, automated fashion

Design of a Replica Optimisation Service
Centralised, hierarchical or distributed?
Pull or push?
Choosing a replication trigger
– On file request?
– On file popularity?
Aim to achieve global optimisation as a result of local optimisation
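One way to realise a popularity-based trigger is to create a local replica once a file's local access count crosses a threshold; triggering on every request is then just the special case of a threshold of one, which corresponds to the "always replicate" strategies described later. The sketch below illustrates the idea only; the class name, the counter and the threshold are assumptions for illustration, not the EDG or OptorSim interface.

```java
// Minimal sketch of a popularity-based replication trigger.
// All names and the threshold are illustrative assumptions.
import java.util.HashMap;
import java.util.Map;

class PopularityTrigger {
    private final Map<String, Integer> accessCount = new HashMap<>();
    private final int threshold;

    PopularityTrigger(int threshold) {
        this.threshold = threshold;
    }

    // Called on every local file request; returns true when the file has
    // become popular enough that a local replica should be created.
    boolean onFileRequest(String logicalFileName) {
        int count = accessCount.merge(logicalFileName, 1, Integer::sum);
        return count >= threshold;
    }
}
```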

OptorSim
OptorSim is a grid simulator with a focus on data management
Developed as part of European DataGrid Work Package 2
Based on EDG architecture
Used to examine automated decisions about replica placement and deletion

Architecture
Sites with Computing Element (CE) and/or Storage Element (SE)
Replica Optimiser decides replications for its site
Resource Broker schedules jobs
Replica Catalogue maps logical to physical filenames
Replica Manager controls and registers replications
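The Replica Catalogue's role, mapping each logical file name (LFN) to the physical file names (PFNs) of its replicas, can be pictured as a simple multimap. The sketch below is illustrative only; the class and method names are assumptions, not the actual EDG catalogue API.

```java
// Sketch of a Replica Catalogue: one logical file name (LFN) maps to the
// physical file names (PFNs) of all its replicas. Illustrative only.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReplicaCatalogue {
    private final Map<String, List<String>> replicas = new HashMap<>();

    // Called by the Replica Manager after a replication completes.
    void register(String lfn, String pfn) {
        replicas.computeIfAbsent(lfn, k -> new ArrayList<>()).add(pfn);
    }

    // Called when a replica is deleted from a Storage Element.
    void unregister(String lfn, String pfn) {
        List<String> pfns = replicas.get(lfn);
        if (pfns != null) {
            pfns.remove(pfn);
        }
    }

    List<String> listReplicas(String lfn) {
        return replicas.getOrDefault(lfn, new ArrayList<>());
    }
}
```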

Algorithms
Job scheduling
– Details not covered in this talk
– "QueueAccessCost" scheduler used in these results
Data replication
– No replication
– Simple replication: "always replicate, delete existing files if necessary"
   Least Recently Used (LRU)
   Least Frequently Used (LFU)
– Economic model: "replicate only if profitable"
   Sites "buy" and "sell" files using auction mechanism
   Files deleted if less valuable than new file
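For the simple replication strategies the key decision is which existing replica to delete when the Storage Element is full. Below is a minimal sketch of the LRU and LFU victim choices; the bookkeeping structures and names are assumptions for illustration, not OptorSim's implementation.

```java
// Sketch of LRU / LFU victim selection for the "always replicate, delete
// existing files if necessary" strategies. Illustrative bookkeeping only;
// assumes at least one replica is currently cached.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ReplicaEvictionPolicy {
    private final Map<String, Long> lastAccessTime = new HashMap<>();
    private final Map<String, Long> accessCount = new HashMap<>();

    void recordAccess(String file, long timeMillis) {
        lastAccessTime.put(file, timeMillis);
        accessCount.merge(file, 1L, Long::sum);
    }

    // Least Recently Used: delete the replica accessed longest ago.
    String lruVictim() {
        return Collections.min(lastAccessTime.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }

    // Least Frequently Used: delete the replica with the fewest accesses.
    String lfuVictim() {
        return Collections.min(accessCount.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }
}
```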

Experimental Setup - Jobs & Files
Job types based on computing models
"Dataset" for each experiment: ~1 year's AOD (analysis data), 2 GB files
Placed at CERN and Tier-1s at start

Job          Event size (kB)   Total no. of files   Files per job
alice-pp
alice-hi
atlas
cms
lhcb-small
lhcb-big

Experimental Setup - Storage Resources
CERN & Tier 1 site capacities from LCG Technical Design Report
"Canonical" Tier 2 capacity of 197 TB each (18.8 PB / 95 sites)
Define storage metric D = (average SE size) / (total dataset size)
Memory limitations -> scale down Tier 2 SE sizes to 500 GB
– Allows file deletion to start quickly
– Disadvantage of small D

Experimental Setup - Computing & Network
Most (chaotic) analysis jobs run at Tier 2s
– Tier 1s not given CE, except those running LHCb jobs
– CERN Analysis Facility with CE of 7840 kSI2k
– Tier 2s with averaged CE of 645 kSI2k each (61.3 MSI2k / 95 sites)
Network based on NREN topologies
– Sites connected to closest router
– Default of 155 Mbps if published value not available

Network Topology

Parameters
Job scheduler "QueueAccessCost"
– Combines data location and queue information
Sequential access pattern
1000 jobs per simulation
Site policies set according to LCG Memorandum of Understanding
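A plausible reading of the "QueueAccessCost" scheduler is that each candidate site is scored by the estimated cost of accessing the job's input data plus the estimated cost of waiting in that site's queue, and the job goes to the cheapest site. The sketch below captures that idea only; the cost model and class names are assumptions, not the scheduler's actual code.

```java
// Sketch of a QueueAccessCost-style choice: score each candidate site by
// (estimated file access cost + estimated queueing cost) and pick the
// minimum. The cost model is an illustrative assumption.
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class CandidateSite {
    final String name;
    final double fileAccessCost; // e.g. time to fetch input files not already on site
    final double queueCost;      // e.g. work already queued divided by site capacity

    CandidateSite(String name, double fileAccessCost, double queueCost) {
        this.name = name;
        this.fileAccessCost = fileAccessCost;
        this.queueCost = queueCost;
    }
}

class QueueAccessCostScheduler {
    CandidateSite choose(List<CandidateSite> candidates) {
        Comparator<CandidateSite> byTotalCost =
                Comparator.comparingDouble(s -> s.fileAccessCost + s.queueCost);
        return Collections.min(candidates, byTotalCost);
    }
}
```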

Evaluation Metrics
Different grid users will have different criteria of evaluation
Used in these summary results are:
– Mean job time: average time taken for a job to run, from scheduling to completion
– Effective Network Usage (ENU) = (file requests which use network resources) / (total number of file requests)
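The ENU definition above is simply a ratio of counters, as in the minimal sketch below (counter names are assumed for illustration); a lower value means more requests were satisfied from local replicas.

```java
// Effective Network Usage (ENU) = (file requests using network resources)
//                                 / (total number of file requests).
// Counter names are illustrative assumptions.
class EffectiveNetworkUsage {
    long networkFileRequests; // remote reads plus replications
    long totalFileRequests;   // all file requests, local or remote

    double value() {
        return totalFileRequests == 0
                ? 0.0
                : (double) networkFileRequests / totalFileRequests;
    }
}
```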

Results: Data Replication
Performance of algorithms measured with varying D
D varied by reducing dataset size
20-25% gain in mean job time as D approaches realistic value

Results: Data Replication
ENU shows similar gain
Allows clearer distinction between strategies

Results: Data Replication
Number of jobs increased to 4000
Mean job time increases linearly
Relative improvement as D increases will hold for higher numbers of jobs
Realistic number of jobs is >O(10000)

Results: Site Policies
Vary site policies:
– All Job Types: sites accept jobs from any VO
– One Job Type: sites accept jobs from one VO
– Mixed (default)
All Job Types is ~60% faster than One Job Type

Results: Site Policies
All Job Types also gives ~25% lower ENU than other policies
Egalitarian approach benefits all grid users

Results: Access Patterns
Sequential access likely for many physics applications
Zipf-like access will also occur
– Some files accessed frequently, many infrequently
Replication gives performance gain of ~75% when Zipf access pattern used
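A Zipf-like pattern means the r-th most popular file is requested with probability proportional to 1/r^s, so a few files are accessed very often and most only rarely. The sketch below samples file ranks from such a distribution; the exponent and seed are arbitrary illustrative choices, not the values used in these simulations.

```java
// Sketch of a Zipf-like file access pattern: rank r is requested with
// probability proportional to 1 / r^s. Exponent and seed are illustrative.
import java.util.Arrays;
import java.util.Random;

class ZipfAccessPattern {
    private final double[] cdf;
    private final Random rng = new Random(42);

    ZipfAccessPattern(int numberOfFiles, double exponent) {
        double[] weights = new double[numberOfFiles];
        double total = 0.0;
        for (int r = 1; r <= numberOfFiles; r++) {
            weights[r - 1] = 1.0 / Math.pow(r, exponent);
            total += weights[r - 1];
        }
        cdf = new double[numberOfFiles];
        double cumulative = 0.0;
        for (int i = 0; i < numberOfFiles; i++) {
            cumulative += weights[i] / total;
            cdf[i] = cumulative;
        }
    }

    // Returns the popularity rank (1 = most popular) of the next request.
    int nextFileRank() {
        double u = rng.nextDouble();
        int i = Arrays.binarySearch(cdf, u);
        int index = (i >= 0) ? i : -i - 1;
        return Math.min(index, cdf.length - 1) + 1; // clamp against rounding
    }
}
```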

Results: Access Patterns
ENU also ~75% lower with Zipf access
Any Zipf-like element makes replication highly desirable
Size of efficiency gain depends on streaming model, etc.

Conclusions
OptorSim used to simulate LCG in 2008
Dynamic data replication reduces running time of simulated grid jobs:
– 20% reduction with sequential access
– 75% reduction with Zipf-like access
– Similar reductions in network usage
Little difference between replication strategies
– Simpler LRU, LFU 20-30% faster than economic model
Site policy which allows all experiments to share resources gives most effective grid use

The End

Backup Slides

Replica optimiser architecture
Access Mediator (AM) - contacts replica optimisers to locate the cheapest copies of files and makes them available locally
Storage Broker (SB) - manages files stored in SE, trying to maximise profit for the finite amount of storage space available
P2P Mediator (P2PM) - establishes and maintains P2P communication between grid sites
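The Storage Broker's profit-maximising behaviour corresponds to the economic model on the Algorithms slide: a new replica is stored only if its estimated value exceeds that of the least valuable file currently held, which is then deleted. The sketch below illustrates that decision with a deliberately simple value estimate; the valuation and the surrounding auction machinery are assumptions, not the EDG Work Package 2 implementation.

```java
// Sketch of the economic-model deletion decision: store a new replica only
// if it is worth more than the least valuable file currently held.
// The value estimate and method names are illustrative assumptions.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class StorageBrokerSketch {
    // Estimated value per stored file, e.g. derived from recent access counts.
    private final Map<String, Double> estimatedValue = new HashMap<>();

    void setValue(String file, double value) {
        estimatedValue.put(file, value);
    }

    // Returns the file to delete to make room for newFile, or null if the
    // new replica is not worth storing (or nothing is stored yet).
    String fileToEvictFor(String newFile, double newFileValue) {
        if (estimatedValue.isEmpty()) {
            return null;
        }
        Map.Entry<String, Double> cheapest = Collections.min(
                estimatedValue.entrySet(), Map.Entry.comparingByValue());
        return newFileValue > cheapest.getValue() ? cheapest.getKey() : null;
    }
}
```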