Multi-Tiered Storage with Xrootd at ATLAS Western Tier 2 Andrew Hanushevsky Wei Yang SLAC National Accelerator Laboratory 1CHEP2012, New York2012-05-22.

Slides:

Advertisements

Similar presentations

Andrew Hanushevsky7-Feb Andrew Hanushevsky Stanford Linear Accelerator Center Produced under contract DE-AC03-76SF00515 between Stanford University.

Advertisements

EHarmony in Cloud Subtitle Brian Ko. eHarmony Online subscription-based matchmaking service Available in United States, Canada, Australia and United Kingdom.

What’s New: Windows Server 2012 R2 Tim Vander Kooi Systems Architect

Xrootd and clouds Doug Benjamin Duke University. Introduction Cloud computing is here to stay – likely more than just Hype (Gartner Research Hype Cycle.

Copy on Demand with Internal Xrootd Federation Wei Yang SLAC National Accelerator Laboratory Create Federated Data Stores for the LHC IN2P3-CC,

Automatic Storage Management The New Best Practice Steve Adams Ixora Rich Long Oracle Corporation Session id:

REDUNDANT ARRAY OF INEXPENSIVE DISCS RAID. What is RAID ? RAID is an acronym for Redundant Array of Independent Drives (or Disks), also known as Redundant.

Denny Cherry Manager of Information Systems MVP, MCSA, MCDBA, MCTS, MCITP.

Scale-Out File Server Clusters Storage Spaces Virtualization and Resiliency Hyper-V Clusters SMB Shared JBOD Storage PowerShell & SCVMM 2012 R2 Management.

Virtual Network Servers. What is a Server? 1. A software application that provides a specific one or more services to other computers  Example: Apache.

Outline Network related issues and thinking for FAX Cost among sites, who has problems Analytics of FAX meta data, what are the problems  The main object.

Simplify your Job – Automatic Storage Management Angelo Session id:

SQL Server 2008 & Solid State Drives Jon Reade SQL Server Consultant SQL Server 2008 MCITP, MCTS Co-founder SQLServerClub.com, SSC

Sponsored by: PASS Summit 2010 Preview Storage for the DBA Denny Cherry MVP, MCSA, MCDBA, MCTS, MCITP.

Experiences Deploying Xrootd at RAL Chris Brew (RAL)

CS 346 – Chapter 10 Mass storage –Advantages? –Disk features –Disk scheduling –Disk formatting –Managing swap space –RAID.

Database Services for Physics at CERN with Oracle 10g RAC HEPiX - April 4th 2006, Rome Luca Canali, CERN.

US ATLAS Western Tier 2 Status and Plan Wei Yang ATLAS Physics Analysis Retreat SLAC March 5, 2007.

Hadoop Hardware Infrastructure considerations ©2013 OpalSoft Big Data.

MCTS Guide to Microsoft Windows Vista Chapter 4 Managing Disks.

Large Scale Test of a storage solution based on an Industry Standard Michael Ernst Brookhaven National Laboratory ADC Retreat Naples, Italy February 2,

Xrootd, XrootdFS and BeStMan Wei Yang US ATALS Tier 3 meeting, ANL 1.

SLAC Experience on Bestman and Xrootd Storage Wei Yang Alex Sim US ATLAS Tier2/Tier3 meeting at Univ. of Chicago Aug 19-20,

D C a c h e Michael Ernst Patrick Fuhrmann Tigran Mkrtchyan d C a c h e M. Ernst, P. Fuhrmann, T. Mkrtchyan Chep 2003 Chep2003 UCSD, California.

Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.

Experience with the Thumper Wei Yang Stanford Linear Accelerator Center May 27-28, 2008 US ATLAS Tier 2/3 workshop University of Michigan, Ann Arbor.

Support in setting up a non-grid Atlas Tier 3 Doug Benjamin Duke University.

"1"1 Introduction to Managing Data " Describe problems associated with managing large numbers of disks " List requirements for easily managing large amounts.

Architecture and ATLAS Western Tier 2 Wei Yang ATLAS Western Tier 2 User Forum meeting SLAC April

Status & Plan of the Xrootd Federation Wei Yang 13/19/12 US ATLAS Computing Facility Meeting at 2012 OSG AHM, University of Nebraska, Lincoln.

Fast Crash Recovery in RAMCloud. Motivation The role of DRAM has been increasing – Facebook used 150TB of DRAM For 200TB of disk storage However, there.

USATLAS dCache System and Service Challenge at BNL Zhenping (Jane) Liu RHIC/ATLAS Computing Facility, Physics Department Brookhaven National Lab 10/13/2005.

SLACFederated Storage Workshop Summary For pre-GDB (Data Access) Meeting 5/13/14 Andrew Hanushevsky SLAC National Accelerator Laboratory.

Scientific Storage at FNAL Gerard Bernabeu Altayo Dmitry Litvintsev Gene Oleynik 14/10/2015.

US ATLAS Western Tier 2 Status Report Wei Yang Nov. 30, 2007 US ATLAS Tier 2 and Tier 3 workshop at SLAC.

Tier-2 storage A hardware view. HEP Storage dCache –needs feed and care although setup is now easier. DPM –easier to deploy xrootd (as system) is also.

PERFORMANCE AND ANALYSIS WORKFLOW ISSUES US ATLAS Distributed Facility Workshop November 2012, Santa Cruz.

PROOF tests at BNL Sergey Panitkin, Robert Petkus, Ofer Rind BNL May 28, 2008 Ann Arbor, MI.

Computing Issues for the ATLAS SWT2. What is SWT2? SWT2 is the U.S. ATLAS Southwestern Tier 2 Consortium UTA is lead institution, along with University.

1 Xrootd-SRM Andy Hanushevsky, SLAC Alex Romosan, LBNL August, 2006.

A UK Computing Facility John Gordon RAL October ‘99HEPiX Fall ‘99 Data Size Event Rate 10 9 events/year Storage Requirements (real & simulated data)

D0 Farms 1 D0 Run II Farms M. Diesburg, B.Alcorn, J.Bakken, R. Brock,T.Dawson, D.Fagan, J.Fromm, K.Genser, L.Giacchetti, D.Holmgren, T.Jones, T.Levshina,

Jérôme Jaussaud, Senior Product Manager

Database CNAF Barbara Martelli Rome, April 4 st 2006.

1 CEG 2400 Fall 2012 Network Servers. 2 Network Servers Critical Network servers – Contain redundant components Power supplies Fans Memory CPU Hard Drives.

Dynamic staging to a CAF cluster Jan Fiete Grosse-Oetringhaus, CERN PH/ALICE CAF / PROOF Workshop,

Bestman & Xrootd Storage System at SLAC Wei Yang Andy Hanushevsky Alex Sim Junmin Gu.

CMS: T1 Disk/Tape separation Nicolò Magini, CERN IT/SDC Oliver Gutsche, FNAL November 11 th 2013.

BNL dCache Status and Plan CHEP07: September 2-7, 2007 Zhenping (Jane) Liu for the BNL RACF Storage Group.

Introduce Caching Technologies using Xrootd Wei Yang 10/28/14ATLAS TIM 2014 Univ. Chicago1.

Data Analysis w ith PROOF, PQ2, Condor Data Analysis w ith PROOF, PQ2, Condor Neng Xu, Wen Guan, Sau Lan Wu University of Wisconsin-Madison 30-October-09.

© 2009 IBM Corporation Statements of IBM future plans and directions are provided for information purposes only. Plans and direction are subject to change.

Western Tier 2 Site at SLAC Wei Yang US ATLAS Tier 2 Workshop Harvard University August 17-18, 2006.

New Features of Xrootd SE Wei Yang US ATLAS Tier 2/Tier 3 meeting, University of Texas, Arlington,

STORAGE EXPERIENCES AT MWT2 (US ATLAS MIDWEST TIER2 CENTER) Aaron van Meerten University of Chicago Sarah Williams Indiana University OSG Storage Forum,

CERN Disk Storage Technology Choices LCG-France Meeting April 8 th 2005 CERN.ch.

BeStMan/DFS support in VDT OSG Site Administrators workshop Indianapolis August Tanya Levshina Fermilab.

Federating Data in the ALICE Experiment

Compute and Storage For the Farm at Jlab

a brief summary for users

Getting the Most out of Scientific Computing Resources

Getting the Most out of Scientific Computing Resources

Data Federation with Xrootd Wei Yang US ATLAS Computing Facility meeting Southern Methodist University, Oct 11-12, 2011.

Brookhaven National Laboratory Storage service Group Hironori Ito

Ákos Frohner EGEE'08 September 2008

The INFN Tier-1 Storage Implementation

Denny Cherry twitter.com/mrdenny

Large Scale Test of a storage solution based on an Industry Standard

DataOptimizer Transparent File Tiering for NetApp Storage Robert Graf

Presentation transcript:

Multi-Tiered Storage with Xrootd at ATLAS Western Tier 2 Andrew Hanushevsky Wei Yang SLAC National Accelerator Laboratory 1CHEP2012, New York

Why Divide Storage into Tiers? ALTAS production jobs stage input files to batch nodes, BUT analysis jobs read directly from Xrootd storage Need high performance storage to serve the random/sparse IO from analysis jobs Data becomes cold quickly Questions: Do we really need to keep on buying high performance storage for all data? Can we divide storage into tiers and federate them?

Data is cooling down … Age: Last access time in days Average over 17 identical Thumpers/Thors Over 30 days

Design a Two-Tier Storage for ATLAS? Storage cluster configured toward maximing capacity Storage cluster with high performance configuration All tier entrance sees all storage: SRM & Data management Data stage-in by production jobs GridFTP data export Top tier entrance: GridFTP data import (over WAN) Direct reading by analysis jobs All job outputs Internal data flow

Design details Uniform name space Data move between storage tiers automatically – Top moves cold data to back when space is needed – Reading at top triggers data stage-in from bottom Data movement is driven by the top tier – transparent to jobs, use Xrootd/FRM Validate checksum for all data movement

File Residency Manager FRM glues the two storage tiers together – It is Xrootd’s generic virtual remote storage interface. – What it does? Stage, Migration, Purge Stage: frm.xfr.copycmd in your_stage-in_script $LFN other_args Migration: frm.xfr.copycmd out your_migrate_script $LFN other_args Purge: frm.purge.policy * 1024g 2048g hold 48h polprog frm.purge.polprog pfn | your_purge_script other_args “stage” will put open() on hold until file is staged Simple to use. – Scripts. You have full control – Scripts’ return code and stdout will be checked Checking status: frm_admin –c config_file query all lfn qwt

Implementation at SLAC Top Tier storage – 17 Thumpers/Thors with Solaris/ZFS – 1TB /2 spindles, raid 1 (mirror), 7200rpm SATA – Total 748 spindling disks, ~340TB usable – Run xrootd/cmsd/FRM Bottom Tier storage – 5 hosts, each 6x 30-disk raid hot spare – 1 host, 6x 15-disk raid hot spare – 2TB/spindle, 5400rpm SATA – Total 990 spindling disks, ~1.6PB usable – Run xrootd/cmsd

Activity at Bottom Tier Activity at two time spots 10 minutes apart: 15:50 16: GB 1.0 GB Data out of the bottom tier Data to the bottom tier

Cost (delay) of Stage-in ~400 running jobs

Other benefits of tiered storage Outage of a node in bottom tier has very little impact to the operation Allow scheduling top tier node outage without down time – Need to vacation the data, could be days Data constantly spread across all top tier nodes. – Natural avoids hot spots. – Bottom tier doesn’t have to have more space

Operational Experience Moving small files is inefficient – This is the case for all storage and data transfers – Xrootd’s look-up delay for new files is expensive for small file migration from top to bottom. Bypass back tier redirector to improve performance Resource intensive operation when vacating a storage node. – Multiple steps, need to be careful

Future Developments Build 3 tier storage system – Top SSD (an array of 12 SSD’s is ready) We know we can get 48Gbps with 250K IOPS – Middle7,200rpm HD’s First stop for new data – Bottom5,400rpm HD’s Trick will be to properly manage the SSD’s – We suspect that as SSD’s become cheaper 2-tier setup will be more appealing