Tier-2 storage: a hardware view

HEP Storage
– dCache: needs feeding and care, although setup is now easier.
– DPM: easier to deploy.
– xrootd (as a system) is also in the picture, but has no SRM.
– dCache and DPM use a database for metadata (a single point of failure).
– Scalability is not much of an issue for a T2, although this depends on the access pattern. Any analysis experience?
File systems
– Mostly XFS, but it has its flaws.
– Many are looking at ZFS.
– GridKa uses GPFS.
– ext4: supports > 16 TB file systems and has extents (still in development).

Disk arrangements
For CMS in 2008, all T2s combined:
– 19.3 MSI2k (~800 kSI2k per average T2)
– 4.9 PB (~200 TB per average T2)
RAID groups of 8 data disks at 750 GB/disk → 340 disks in 34 RAID6 groups (34 × 8 × 50 = 13600 IOs/s)
800 kSI2k / 2 kSI2k per core → 400 cores
Available: 13600 / 400 = 34 IOs/core/s; writes reduce this by 50% → 17 IOs/core/s
50 MB/s / 17 IOs/core/s → ~3 MB per IO per core
1–3 MB/s per core → up to 1200 MB/s → ~24 data servers
Given the 34 RAID groups above, use 34 data servers; assume 50 MB/s per server, although today dCache tops out at around 30 MB/s per Java virtual machine.
(The sizing arithmetic above is sketched in code below.)
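A minimal Python sketch of the sizing arithmetic above; the 50 random IOs/s per SATA disk, 50 MB/s per server, and 1–3 MB/s per core figures are the slide's own assumptions, and the variable names are purely illustrative.

```python
import math

# Sketch of the T2 sizing arithmetic from the slide (illustrative only).
avg_t2_capacity_tb = 200    # ~200 TB per average T2
disk_size_tb       = 0.75   # 750 GB data disks
data_disks_per_grp = 8      # RAID6 groups: 8 data + 2 parity disks

raid_groups = math.ceil(avg_t2_capacity_tb / (disk_size_tb * data_disks_per_grp))  # -> 34
total_disks = raid_groups * (data_disks_per_grp + 2)                               # -> 340 incl. parity

iops_total    = raid_groups * data_disks_per_grp * 50   # ~50 random IOs/s per SATA disk -> 13600
cores         = 800 // 2                                # 800 kSI2k / 2 kSI2k per core   -> 400
iops_per_core = iops_total / cores / 2                  # halved for writes              -> ~17

aggregate_mb_s = cores * 3                              # 1-3 MB/s per core, upper bound -> 1200 MB/s
data_servers   = aggregate_mb_s / 50                    # at 50 MB/s per server          -> ~24

print(raid_groups, total_disks, iops_total, iops_per_core, aggregate_mb_s, data_servers)
```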

Disk thumb rules
– Access from many different cores means random access (even though you have large, > 2 GB files!).
– Average access time of SATA is ~15 ms: ~50 IOs/s.
– Average access time of FC/SAS disks is ~5 ms: ~150 IOs/s (see the sketch below).
– SATA read/write mix (buffers!): 1 write per 20 read accesses. End of story.
– SATA reliability is OK. Expect ~800 euro/TB (including the system).
– RAID6 is suggested, along with proper support (hot swap, alerts, failover).
– Experience != experience: see the summary on HEPiX.
– Budget for some servers that need to be HA (> 3000 euro).
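A small sketch of the access-time rule of thumb behind the IOs/s figures above (the slide rounds its numbers down to more conservative values, which also leaves headroom for transfer time and queuing):

```python
# Rule of thumb: random IOs/s ~= 1000 ms / average access time (ms),
# i.e. one random IO per seek + rotational-latency cycle, ignoring transfer time.
# The slide quotes more conservative rounded values (50 and 150 IOs/s).

def random_iops(avg_access_ms: float) -> float:
    return 1000.0 / avg_access_ms

print(f"SATA   (~15 ms): ~{random_iops(15):.0f} IOs/s")  # ~67, slide uses ~50
print(f"FC/SAS (~ 5 ms): ~{random_iops(5):.0f} IOs/s")   # ~200, slide uses ~150
```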

Disk configurations
Storage in a box (NAS):
– 16 to 48 disks with the server nodes in one case.
– Popular example: Sun Thumper, 48 disks, dual Opteron.
DAS: storage and server separate:
– The required IO rates are not a problem for big servers,
– but the random access from many servers still applies.
– May use some compute nodes to do the work, which would require SAS or FC connections to the storage.
Resilient dCache:
– Probably good for "read-mostly" data.
From the earlier core-to-disk estimate:
– Need ~20 big NAS boxes.
– Could be done with 4 servers, but not with 4 links.

Use Cases (thanks to Thomas Kress for the input)
MC and Pile-Up:
– Mostly CPU bound. Events are merged into large files before transfer to the T1 via an output buffer.
– 12 MB/s write and read streams on the buffer? Suggestion: 1 write stream per 20 read streams.
– The PileUp sample is … GB; random access by how many cores? Suggestion: spread it over many RAID groups.
Calibration:
– Storage area of 400 GB. Read only? Random or streaming access? Suggestion: at most 50 cores per disk (group).
Analysis:
– … TB per month with random access!!
– Average flow of ~80 MB/s from T1 to T2! (… files)
– Following the 1:20 ratio above, this means a system that sustains 1600 MB/s of reads? (see the sketch below)
– "a large part of the data remains available for an extended time" (TK)
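A minimal arithmetic check of the analysis estimate above; the 1:20 write-to-read ratio and the 50 MB/s per data server come from the earlier slides, and the server count at the end is only an illustration.

```python
# Analysis use case: data arrives from the T1 at ~80 MB/s and, with the
# 1 write : 20 reads rule of thumb, is re-read ~20 times at the T2.

inbound_write_mb_s = 80                                      # average T1 -> T2 flow
reads_per_write    = 20                                      # 1:20 write-to-read ratio

required_read_mb_s = inbound_write_mb_s * reads_per_write    # -> 1600 MB/s sustained reads
servers_at_50mb_s  = required_read_mb_s / 50                 # illustrative: ~32 servers at 50 MB/s each

print(required_read_mb_s, servers_at_50mb_s)
```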

ToDo
– Analyze access patterns.
– Simulate data/disk loss.
– Iterate the results.
– Join forces for hardware procurement.