Jun 29, 2010 1/25 Storage Evaluation on FG, FC, and GPCF
Gabriele Garzoglio, Computing Division, Fermilab
Overview:
– Introduction
– Lustre Evaluation: Status
  - The test bed
  - Results from std benchmarks
  - Results from app-based benchmarks
– Metrics measurement for the Run II experiments
– Conclusions and future work

Jun 29, 2010 2/25 Storage Evaluation on FG, FC, and GPCF
Context
Goal
– Evaluation of storage technologies for the use case of data-intensive Grid jobs.
Technologies considered
– Hadoop Distributed File System (HDFS)
– Lustre
– BlueArc (BA)
Targeted infrastructures:
– FermiGrid, FermiCloud, and the General Physics Computing Facility (GPCF).

Jun 29, 2010 3/25 Storage Evaluation on FG, FC, and GPCF
Collaboration
– REX was asked by Patty to evaluate storage technologies looking into the future.
– REX and DOCS have agreed to work together to reuse the FermiCloud infrastructure allocated to the original evaluation project AND to take advantage of REX contacts with the users.
– Other partners in the evaluation include the FermiGrid / FermiCloud, OSG Storage, DMS, and FEF groups at Fermilab.

Jun 29, 2010 4/25 Storage Evaluation on FG, FC, and GPCF
Evaluation Method
– Set the scale: measure storage metrics from running experiments to set the scale on expected bandwidth, typical file size, number of clients, etc. (Done)
– Check sanity: run standard benchmarks on the storage installations.
– Measure performance: study the response of the technology to the access patterns of real-life applications.
– Fault tolerance: simulate faults and study the reactions.

Jun 29, 2010 5/25 Storage Evaluation on FG, FC, and GPCF
Machine Specifications
FermiCloud (FC) Server Machines:
– Lustre 1.8.3: striped across 3 OSS, 1 MB block
– CPU: dual quad-core Xeon 2.67 GHz with 12 MB cache, 24 GB RAM
– Disk: 6 SATA disks in RAID 5 for 2 TB + 2 system disks (hdparm on the RAID: … MB/sec; a measurement sketch follows below)
– 1 Gb Ethernet + InfiniBand (not used yet)
ITB Server Machines:
– Lustre: striped across 2 OSS, 1 MB block
– CPU: dual quad-core Xeon 2.66 GHz with 4 MB cache, 16 GB RAM
– Disk: single 500 GB disk (hdparm on the disk: … MB/sec)
DMS Server Machine: Lustre 1.6
– same specs as ITB
ITB Client Machines:
– same specs as ITB
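
The hdparm figures quoted above refer to the raw buffered-read throughput of the server disks. A minimal sketch of how such a number is typically obtained (the device path is illustrative and root access is assumed):

```python
import re
import subprocess

def raw_read_throughput(device="/dev/sda"):
    """Measure buffered sequential-read throughput with `hdparm -t`.

    Assumes root privileges; `device` should name the RAID array or
    single disk under test (the path here is only an illustration).
    """
    out = subprocess.run(["hdparm", "-t", device],
                         capture_output=True, text=True, check=True).stdout
    # hdparm prints e.g. "Timing buffered disk reads: 600 MB in 3.00 seconds = 200.00 MB/sec"
    match = re.search(r"=\s*([\d.]+)\s*MB/sec", out)
    return float(match.group(1)) if match else None

if __name__ == "__main__":
    print("buffered read throughput: %s MB/sec" % raw_read_throughput())
```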

Jun 29, 2010 6/25 Storage Evaluation on FG, FC, and GPCF
Lustre Evaluation Test Bed, Phase 0 - ITB
[Diagram] Lustre Server (Dom0: 8 CPU, 16 GB RAM; 1 disk, 0.4 TB) serving 2 OST & 1 MDT over Ethernet to the FG ITB clients; BA mount also available on the clients.
NOTE: similar setup to the DMS Lustre evaluation:
– same servers
– 2 OST here vs. 3 OST for DMS.

Jun 29, 2010 7/25 Storage Evaluation on FG, FC, and GPCF
Lustre Evaluation Test Bed, Phase I (current configuration!)
[Diagram] Lustre Server (Dom0: 8 CPU, 24 GB RAM; 6 disks, 2 TB) serving 3 OST & 1 MDT over Ethernet to the FG ITB clients; BA mount also available on the clients (a striping sketch follows below).
NOTE: the MDS should be configured as RAID 10 rather than RAID 5.
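
The 3-OST layout with a 1 MB stripe used in this phase is the kind of setting usually applied per directory with the lfs tool. A hedged sketch, where the mount point and parameters are illustrative (note that Lustre 1.8 spells the stripe-size option `-s`, while newer releases use `-S`):

```python
import subprocess

def set_stripe(directory, stripe_count=3, stripe_size="1M"):
    """Apply a Lustre striping policy to a directory (illustrative values).

    New files created under `directory` inherit the layout: here, data is
    striped across 3 OSTs in 1 MB chunks, matching the Phase I test bed.
    """
    subprocess.run(["lfs", "setstripe", "-c", str(stripe_count),
                    "-S", stripe_size, directory], check=True)

def show_stripe(path):
    """Print the layout actually in effect, for verification."""
    subprocess.run(["lfs", "getstripe", path], check=True)

if __name__ == "__main__":
    set_stripe("/mnt/lustre/fermicloud-test")   # hypothetical mount point
    show_stripe("/mnt/lustre/fermicloud-test")
```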

Jun 29, 2010 8/25 Storage Evaluation on FG, FC, and GPCF
Lustre Evaluation Test Bed, Phase II
[Diagram] Same bare-metal Lustre Server as Phase I (Dom0: 8 CPU, 24 GB RAM; 6 disks, 2 TB; 3 OST & 1 MDT over Ethernet), now accessed by 6 x Lustre client VMs in addition to the FG ITB clients; BA mount also available.
NOTE: this configuration was attempted with the Phase 0 test bed; the VMs could access Lustre, but the clients had problems with the paravirtualizer.

Jun 29, 2010 9/25 Storage Evaluation on FG, FC, and GPCF
Lustre Evaluation Test Bed, Phase III
[Diagram] The Lustre Server itself runs in a VM (host Dom0: 8 CPU, 24 GB RAM; 6 disks, 2 TB; 3 OST & 1 MDT over Ethernet), accessed by 6 x Lustre client VMs and the FG ITB clients; BA mount also available.

Jun 29, 2010 10/25 Storage Evaluation on FG, FC, and GPCF
Standard Benchmarks Results - Summary
– mdtest: tests metadata rates from multiple clients (file/directory creation, stat, deletion). Setup: 48 clients on 6 VMs/nodes. Result: within 25% of the DMS results, or faster.
– IOZone: writes a 2 GB file from each client and performs read/write tests. Setup: 3-48 clients on 3 VMs/nodes. Result: write results match the DMS report, ~70 MB/sec ITB, ~85 MB/sec DMS, ~110 MB/sec FC; read results vary with memory and file size due to caching effects. (A driver sketch follows below.)
– Fileop: IOZone's metadata tests; measures rates of mkdir, chdir, open, close, etc. Setup: 1 client. Result: within 30% of the DMS results on most tests.
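
A minimal sketch of how such an IOZone throughput run might be driven from one client node, assuming iozone is installed and /mnt/lustre/iozone is a writable directory on the file system under test (paths, sizes, and client count are illustrative):

```python
import subprocess
from pathlib import Path

def run_iozone(target_dir="/mnt/lustre/iozone", clients=3,
               file_size="2g", record_size="1m", log="iozone.out"):
    """Run IOZone in throughput mode: `clients` concurrent processes, each
    working on its own 2 GB file with 1 MB records (tests 0 and 1 are
    sequential write/rewrite and read/reread).  Output is kept raw so the
    aggregate 'Children see throughput' lines can be inspected afterwards.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    files = [str(target / f"iozone_client_{i}") for i in range(clients)]
    cmd = ["iozone", "-i", "0", "-i", "1", "-r", record_size,
           "-s", file_size, "-t", str(clients), "-F", *files]
    with open(log, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)

if __name__ == "__main__":
    run_iozone()
```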

Jun 29, 2010 11/25 Storage Evaluation on FG, FC, and GPCF
mdtest results [plot]: 48 clients on 6 VMs on 6 different nodes.

Jun 29, 2010 12/25 Storage Evaluation on FG, FC, and GPCF
IOZone results [plots of aggregate bandwidth in KB/sec, with the DMS evaluation shown for comparison]: up to 48 clients on 3 VMs on 3 different nodes. The plots mark the 1 Gbit/s line and note a possible network bottleneck (1 Gbit/s corresponds to roughly 125 MB/sec, close to the ~110 MB/sec aggregate writes measured on FC).

Jun 29, 2010 13/25 Storage Evaluation on FG, FC, and GPCF
Fileop results [plot]: single client; opposite result to the mdtest comparison above…

Jun 29, 2010 14/25 Storage Evaluation on FG, FC, and GPCF
Response to real-life applications
– Lustre is pre-loaded with ROOT data from Minos and Nova (MC).
– A job invokes the application on 1 file at a time, for all files sequentially (a driver sketch follows below).
– Multiple clients are run simultaneously (1, 9, or 21).
– We measure bandwidth to storage, file size, and file processing time.
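
A minimal sketch of such a per-file driver; the application command (here the Minos `loon` skim with a macro) and the data path are illustrative assumptions, while the measured quantities are the ones listed above:

```python
import subprocess
import time
from pathlib import Path

def process_files(app_cmd, data_dir, pattern="*.root"):
    """Run an application over every file sequentially and record, per file,
    its size, the wall-clock processing time, and the implied read bandwidth.
    """
    records = []
    for path in sorted(Path(data_dir).glob(pattern)):
        size_mb = path.stat().st_size / 1e6
        start = time.time()
        subprocess.run(app_cmd + [str(path)], check=True)
        elapsed = time.time() - start
        records.append((path.name, size_mb, elapsed, size_mb / elapsed))
    return records

if __name__ == "__main__":
    # Hypothetical invocation: a Minos 'loon' skim over the pre-loaded files.
    for name, mb, sec, mbps in process_files(["loon", "skim.C"],
                                             "/mnt/lustre/minos"):
        print(f"{name}: {mb:.1f} MB in {sec:.1f} s -> {mbps:.1f} MB/s")
```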

Jun 29, 2010 15/25 Storage Evaluation on FG, FC, and GPCF
Minos [plot]
– 21 clients, Minos application (loon) skimming
– Random access to 1400 files (interrupted)
– Access is CPU-bound (checked via top)

Jun 29, 2010 16/25 Storage Evaluation on FG, FC, and GPCF
Minos [plot]
– 9 clients, Minos application (loon) skimming
– Random access to 1400 files (interrupted)
– Access is CPU-bound

Jun 29, 2010 17/25 Storage Evaluation on FG, FC, and GPCF
Minos [plot]
– 21 clients, Minos application (loon) skimming
– Sequential access to 1400 files (interrupted)
– Access is CPU-bound

Jun 29, 2010 18/25 Storage Evaluation on FG, FC, and GPCF
Nova [plot]
– 21 clients, Nova application (ana) skimming
– Random access to 330 files: real 9m5.8s, user 5m11.4s, sys 1m28.4s

Jun 29, 2010 19/25 Storage Evaluation on FG, FC, and GPCF
Nova [plots]
– 9 clients, Nova application (ana) skimming; random access to 330 files: real 8m54.1s, user 5m11.9s, sys 1m28.2s (see the timing breakdown sketch below)
– 1 client, Nova application (ana) skimming; random access to 330 files
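
The real/user/sys figures quoted above can be turned into a rough CPU-versus-waiting split, which is the kind of wall vs. user vs. sys comparison mentioned on the future-work slide. A minimal sketch using the 9-client Nova numbers from this slide:

```python
import re

def to_seconds(stamp):
    """Convert a time(1)-style stamp such as '8m54.1s' into seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)

def cpu_split(real, user, sys):
    """Fraction of wall time spent on CPU vs. off-CPU (I/O wait, scheduling, ...)."""
    wall = to_seconds(real)
    cpu = to_seconds(user) + to_seconds(sys)
    return cpu / wall, 1 - cpu / wall

if __name__ == "__main__":
    # 9-client Nova run quoted on the slide above: roughly 75% CPU, 25% waiting.
    cpu_frac, wait_frac = cpu_split("8m54.1s", "5m11.9s", "1m28.2s")
    print(f"CPU fraction: {cpu_frac:.0%}, off-CPU fraction: {wait_frac:.0%}")
```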

Jun 29, 2010 20/25 Storage Evaluation on FG, FC, and GPCF
DZero SAM File Access Metrics
– Metrics from the SAM database (for stations FNAL-CABSRV1 and FNAL-CABSRV2).
– Analysis of processes and files from all SAM projects that ended in the week of Jan 25, 2010.
– Jobs run on the CAB cluster. They request file delivery from the SAM-Grid system through the two stations above. Upon request, each system delivers files to a disk cache local to the node where the job is running. Some files may already be in the local cache.
– Analysis code evolved from SAM Monitoring (by Robert Illingworth).
– For all jobs on the CAB cluster, the data shows these distributions (a sketch of how they might be computed follows below):
  - file size
  - time to "consume" the file
  - bandwidth from local disk required to read the file
  - file delivery time
  - number of total and idle processes
– Details at
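
A minimal sketch of how such distributions could be derived, assuming the SAM records have been exported to a CSV whose column names (file_size_bytes, consume_seconds, delivery_seconds) are hypothetical:

```python
import csv
import statistics

def summarize(csv_path):
    """Compute simple distribution summaries for the per-file metrics listed
    above: file size, time to consume, implied local-disk read bandwidth,
    and delivery time.  The column names are assumptions about the export.
    """
    sizes, consume, bandwidth, delivery = [], [], [], []
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            size_mb = float(row["file_size_bytes"]) / 1e6
            t_consume = float(row["consume_seconds"])
            sizes.append(size_mb)
            consume.append(t_consume)
            bandwidth.append(size_mb / t_consume if t_consume > 0 else 0.0)
            delivery.append(float(row["delivery_seconds"]))
    for name, vals in [("file size (MB)", sizes),
                       ("time to consume (s)", consume),
                       ("read bandwidth (MB/s)", bandwidth),
                       ("delivery time (s)", delivery)]:
        print(f"{name}: median {statistics.median(vals):.1f}, "
              f"mean {statistics.mean(vals):.1f}, max {max(vals):.1f}")

if __name__ == "__main__":
    summarize("sam_cabsrv_week_jan25.csv")   # hypothetical export file
```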

Jun 29, 2010 21/25 Storage Evaluation on FG, FC, and GPCF
[Plots: distributions from the DZero SAM file access analysis above]

Jun 29, 2010 22/25 Storage Evaluation on FG, FC, and GPCF
CDF SAM / dCache File Access Metrics
– Metrics from the SAM database and the dCache billing log. Data from Jan 25, 2010 to Jan 28, 2010.
– The analysis associates event records according to this processing model (a correlation sketch follows below):
  - A user analysis (client) requests a file from SAM.
  - SAM returns a file location in dCache (a file create-time record is created in the SAM DB).
  - The client makes a transfer request to the dCache system (a request record is logged in the dCache billing log).
  - When the file is available on disk, the client can copy it locally or stream it from dCache (at the end of this process, a transfer record is logged in the billing log).
  - The user analysis processes the file and requests a new one from SAM (the status of the file just processed is changed to "consumed" in the SAM DB and the update time for the file is recorded).
– This analysis is limited in scope by design.
– Details at family.html
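
A minimal sketch of that correlation step, assuming both sources have been exported to dictionaries keyed by file name; the field names (create_time, consumed_time, transfer_time) are hypothetical labels for the timestamps described in the model above:

```python
from datetime import datetime

def correlate(sam_records, billing_records):
    """Join SAM DB and dCache billing-log records for the same file and
    derive two latencies per file, following the processing model above:
    delivery    = dCache transfer time - SAM create time,
    consumption = SAM consumed time   - dCache transfer time.
    """
    results = {}
    for fname, sam in sam_records.items():
        billing = billing_records.get(fname)
        if billing is None:
            continue  # file never appears in the billing log
        delivery = (billing["transfer_time"] - sam["create_time"]).total_seconds()
        consumption = (sam["consumed_time"] - billing["transfer_time"]).total_seconds()
        results[fname] = {"delivery_s": delivery, "consumption_s": consumption}
    return results

if __name__ == "__main__":
    sam = {"run123.root": {"create_time": datetime(2010, 1, 25, 10, 0, 0),
                           "consumed_time": datetime(2010, 1, 25, 10, 12, 0)}}
    billing = {"run123.root": {"transfer_time": datetime(2010, 1, 25, 10, 1, 30)}}
    print(correlate(sam, billing))
```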

Jun 29, 2010 23/25 Storage Evaluation on FG, FC, and GPCF
[Plots: distributions from the CDF SAM / dCache analysis above]

Jun 29, 2010 24/25 Storage Evaluation on FG, FC, and GPCF
Future work
Evolve the test bed
– Investigate network bottlenecks between client and server
– Install and run clients on FC; configure the MDS as RAID 10
– Improve machine monitoring (e.g. install Ganglia, HP Lustre monitoring, …)
– Test Lustre with NO striping (easier recovery)
Standard benchmarks
– Run with different configurations, e.g. more clients, O(1M) files, different numbers of OSS, …
App-based benchmarks
– Include more applications: Nova (reco, daq2root, pre-scaled skim), DZero / CDF (via SAM station)
– Evolve measurements (study wall (real) vs. user vs. sys time; include CPU, memory, network usage, …)
– Collaborate with the HEPiX Storage Group
Move to Phases II and III AND test fault tolerance
Study Hadoop

Jun 29, 2010 25/25 Storage Evaluation on FG, FC, and GPCF
Conclusions
– We are about 3 months behind the original plan: we were too optimistic about the delivery date of FC. Est. completion date: 1st Q.
– We have deployed a Lustre test bed and we are taking measurements.
– Thank you for the tremendous response to our requests for help.

Jun 29, 2010 Storage Evaluation on FG, FC, and GPCF
EXTRA SLIDES

Jun 29, 2010 Storage Evaluation on FG, FC, and GPCF
Storage evaluation metrics
Metrics from Stu, Gabriele, and DMS (Lustre evaluation):
– Cost
– Data volume
– Data volatility (permanent, semi-permanent, temporary)
– Access modes (local, remote)
– Access patterns (random, sequential, batch, interactive, short, long, CPU-intensive, I/O-intensive)
– Number of simultaneous client processes
– Acceptable latency requirements (e.g. for batch vs. interactive)
– Required per-process I/O rates
– Required aggregate I/O rates
– File size requirements
– Reliability / redundancy / data integrity
– Need for tape storage, either hierarchical or backup
– Authentication (e.g. Kerberos, X.509, UID/GID, AFS token) / Authorization (e.g. Unix permissions, ACLs)
– User & group quotas / allocation / auditing
– Namespace performance ("file system as catalog")
– Supported platforms and systems
– Usability: maintenance, troubleshooting, problem isolation
– Data storage functionality and scalability