Investigation of Storage Options for Scientific Computing on Grid and Cloud Facilities
Mar 24, 2011
Keith Chadwick for Gabriele Garzoglio
Computing Division, Fermilab

Overview
Context
Test Bed
Lustre Evaluation
− Standard benchmarks
− Application-based benchmark
− HEPiX Storage Group report
Current work (Hadoop Evaluation)

Work supported by the U.S. Department of Energy under contract No. DE-AC02-07CH11359

Acknowledgements
Ted Hesselroth, Doug Strain – IOZone performance measurements
Andrei Maslennikov – HEPiX storage group
Andrew Norman, Denis Perevalov – Nova framework for the storage benchmarks and HEPiX work
Robert Hatcher, Art Kreymer – Minos framework for the storage benchmarks and HEPiX work
Steve Timm, Neha Sharma – FermiCloud support
Alex Kulyavtsev, Amitoj Singh – Consulting
This work is supported by the U.S. Department of Energy under contract No. DE-AC02-07CH11359

Context
Goal
– Evaluation of storage technologies for the use case of data-intensive jobs on Grid and Cloud facilities at Fermilab.
Technologies considered
– Lustre (DONE)
– Hadoop Distributed File System (HDFS) (Ongoing)
– BlueArc (BA) (TODO)
– OrangeFS (new request) (TODO)
Targeted infrastructures
– FermiGrid, FermiCloud, and the General Physics Computing Farm.
Collaboration at Fermilab
– FermiGrid / FermiCloud, Open Science Grid Storage area, Data Movement and Storage, Running Experiments

Evaluation Method
Set the scale: measure storage metrics from running experiments to set the scale on expected bandwidth, typical file size, number of clients, etc.
http://home.fnal.gov/~garzogli/storage/cdf-sam-file-access-per-app-family.html
Measure performance
– run standard benchmarks on storage installations
– study the response of the technology to real-life application access patterns (root-based)
– use the HEPiX storage group infrastructure to characterize the response to IF applications
Fault tolerance: simulate faults and study the reactions.
Operations: comment on potential operational issues.

Lustre Test Bed: FCL "Bare Metal"
[Diagram: FG ITB clients (7 nodes, 21 VMs, BlueArc mount) connect over Ethernet to the FCL Lustre server (3 OST & 1 MDT, 2 TB on 6 disks); the server Dom0 has 8 CPUs and 24 GB RAM.]
Test: ITB clients vs. Lustre "Bare Metal".
Lustre server (FCL): Lustre 1.8.3, set up with 3 OSS (different striping); CPU: dual quad-core Xeon 2.67 GHz with 12 MB cache, 24 GB RAM; Disk: 6 SATA disks in RAID 5 for 2 TB + 2 system disks (hdparm: … MB/sec); 1 GbE + InfiniBand cards.
Clients (ITB): CPU: dual quad-core Xeon 2.66 GHz with 4 MB cache, 16 GB RAM; 3 Xen VMs per machine, 2 cores / 2 GB RAM each.

Lustre Test Bed: FCL "Virtual Server"
[Diagram: same hardware as above, but the Lustre server (3 OST & 1 MDT, 2 TB on 6 disks) runs in a VM; the FCL host (Dom0: 8 CPU, 24 GB RAM) also hosts 7 Lustre client VMs; FG ITB clients (7 nodes, 21 VMs, BlueArc mount) connect over Ethernet.]
Tests: ITB clients vs. Lustre virtual server; FCL clients vs. Lustre virtual server; FCL + ITB clients vs. Lustre virtual server.
Client VMs: 8 KVM VMs per machine; 1 core / 2 GB RAM each.

Data Access Tests
IOZone – writes a 2 GB file from each client and performs read/write tests. Setup: 3-48 clients on 3 VMs/node. (A sketch of how such a run can be driven follows below.)
Tests performed:
ITB clients vs. FCL bare-metal Lustre
ITB clients vs. virtual Lustre – virtual vs. bare-metal server
– read vs. different types of disk and net drivers for the virtual server
– read and write vs. number of virtual-server CPUs (no difference)
FCL clients vs. virtual Lustre – "on-board" vs. "remote" I/O
– read and write vs. number of idle VMs on the server
– read and write with and without data striping (no significant difference)
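A minimal sketch of how an IOZone throughput run like the one above could be driven, using IOZone's distributed (-+m) mode. The mount point, host names, and record size are illustrative assumptions, not the values used in the actual tests.

```python
#!/usr/bin/env python
# Sketch of an IOZone throughput run: 2 GB file per client, sequential
# write then read, across many clients mounting the same file system.
import subprocess

LUSTRE_MOUNT = "/mnt/lustre"                              # assumed client mount point
CLIENTS = ["itb-client%02d" % i for i in range(1, 22)]    # hypothetical host names

def write_client_list(path="clients.iozone"):
    # iozone -+m expects lines of: <hostname> <working dir> <path to iozone>
    with open(path, "w") as f:
        for host in CLIENTS:
            f.write("%s %s /usr/bin/iozone\n" % (host, LUSTRE_MOUNT))
    return path

def run_iozone(nclients):
    cfg = write_client_list()
    cmd = ["iozone",
           "-i", "0", "-i", "1",          # test 0 = write/rewrite, 1 = read/reread
           "-s", "2g",                    # 2 GB file per client, as in the tests
           "-r", "1m",                    # record size (assumed)
           "-t", str(nclients),           # number of concurrent clients
           "-+m", cfg]                    # distributed (network) throughput mode
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout                     # aggregate throughput is parsed from this

if __name__ == "__main__":
    print(run_iozone(21))
```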

ITB Clients vs. FCL Bare-Metal Lustre
Our baseline: 350 MB/s read, 250 MB/s write.

ITB Clients vs. FCL Virtual-Server Lustre
Changing the disk and net drivers on the Lustre server VM: 350 MB/s read, 70 MB/s write (vs. 250 MB/s write on bare metal).
[Plots: read and write I/O rates for bare metal, virtio for disk and net, virtio for disk with the default net driver, and default drivers for both disk and net.]
Conclusion: use virtio drivers for the network. (A quick check of which drivers a guest uses is sketched below.)
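Since the conclusion hinges on whether the server VM uses the paravirtualized virtio drivers, here is a small, hedged check of a libvirt/KVM guest's disk bus and NIC model via `virsh dumpxml`. The domain name is a placeholder.

```python
# Check whether a libvirt/KVM guest (e.g. the Lustre server VM) uses virtio
# for its disk and network devices.  Requires 'virsh' on the host; the domain
# name below is hypothetical.
import subprocess
import xml.etree.ElementTree as ET

def virtio_status(domain="lustre-oss-vm"):
    xml = subprocess.run(["virsh", "dumpxml", domain],
                         capture_output=True, text=True, check=True).stdout
    root = ET.fromstring(xml)
    disks = [d.find("target").get("bus")
             for d in root.iter("disk") if d.find("target") is not None]
    nics = [n.find("model").get("type")
            for n in root.iter("interface") if n.find("model") is not None]
    return {"disk_buses": disks, "nic_models": nics}

if __name__ == "__main__":
    # Expect 'virtio' entries when the paravirtualized drivers are in use.
    print(virtio_status())
```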

ITB & FCL Clients vs. FCL Virtual-Server Lustre
FCL clients vs. the FCL virtual server, compared with ITB clients vs. the FCL virtual server, with and without idle client VMs.
FCL clients are ~15% slower than ITB clients: not significant.
350 MB/s read, 70 MB/s write (vs. 250 MB/s write on bare metal).
[Plots: read and write rates for FCL and ITB clients.]

Application-based Tests
Focusing on root-based applications:
– Nova: ana framework, simulating a skim application – reads a large fraction of all events, then either disregards them all (read-only) or writes them all. (A stand-in for this read pattern is sketched below.)
– Minos: loon framework, simulating a skim application – the data is compressed, so access is CPU-bound (it does NOT stress the storage).
Tests performed:
Nova ITB clients vs. bare-metal Lustre – write and read-only
Minos ITB clients vs. bare-metal Lustre – diversification of applications
Nova ITB clients vs. virtual Lustre – virtual vs. bare-metal server
Nova FCL clients vs. virtual Lustre – "on-board" vs. "remote" I/O
Nova FCL / ITB clients vs. striped virtual Lustre – effect of striping
Nova FCL + ITB clients vs. virtual Lustre – bandwidth saturation
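A minimal stand-in for the skim-style access pattern described above: read (nearly) every "event" from a set of files and either discard the data (read-only case) or copy it to an output file (write-all case), reporting the effective per-client bandwidth. The file names, event size, and output path are illustrative assumptions, not the actual Nova framework.

```python
# Skim-like access pattern: sequential reads of all events, optionally
# copied to an output file, with the effective read bandwidth reported.
import time

EVENT_SIZE = 256 * 1024            # assumed average event size in bytes

def skim(files, write_all=False, out_path="/mnt/lustre/skim_out.dat"):
    t0, nbytes = time.time(), 0
    out = open(out_path, "wb") if write_all else None
    for path in files:
        with open(path, "rb") as f:
            while True:
                event = f.read(EVENT_SIZE)
                if not event:
                    break
                nbytes += len(event)
                if out:
                    out.write(event)       # "write all" variant
    if out:
        out.close()
    dt = time.time() - t0
    return nbytes / dt / 1e6               # effective bandwidth in MB/s

# Example (hypothetical paths): 10 input files of ~1.5 GB each, as in the
# 49-client saturation test.
# files = ["/mnt/lustre/nova/file%02d.root" % i for i in range(10)]
# print("%.1f MB/s" % skim(files))
```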

21 Nova Clients vs. Bare-Metal & Virtual Server
Read – ITB vs. virtual server: BW = … ± 0.08 MB/s (1 ITB client: 15.3 ± 0.1 MB/s)
Read – FCL vs. virtual server: BW = … ± 0.05 MB/s (1 FCL client: 14.4 ± 0.1 MB/s)
Read – ITB vs. bare metal: BW = … ± 0.06 MB/s (1 client vs. bare metal: 15.6 ± 0.2 MB/s)
Virtual clients on-board (on the same machine as the virtual server) are as fast as bare metal for reads.
The virtual server is almost as fast as bare metal for reads.

49 Nova ITB / FCL Clients vs. Virtual Server
49 clients (1 job / VM / core) saturate the bandwidth to the server. Is the distribution of the bandwidth fair?
Minimum processing time for 10 files (1.5 GB each) = 1268 s.
Client processing time ranges up to 177% of the minimum time.
Clients do NOT all get the same share of the bandwidth (within 20%).
No difference in bandwidth between ITB and FCL clients.
ITB clients: average time = 141 ± 4%, average BW = 9.0 ± 0.2 MB/s
FCL clients: average time = 148 ± 3%, average BW = 9.3 ± 0.1 MB/s
(The derivation of these fairness numbers is sketched below.)
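The fairness numbers on this slide can be derived from the per-client wall-clock times: each client processes 10 files of 1.5 GB, so its bandwidth is data volume over time, and fairness is the spread of times relative to the fastest client. A sketch, with the measured per-client times as a placeholder input:

```python
# Derive the fairness metrics from per-client processing times.
DATA_PER_CLIENT_MB = 10 * 1.5 * 1024       # 10 files x 1.5 GB (as stated above)

def fairness(times_sec):
    tmin = min(times_sec)                              # e.g. 1268 s in the measurement
    rel = [t / tmin * 100 for t in times_sec]          # time as % of the minimum
    bw = [DATA_PER_CLIENT_MB / t for t in times_sec]   # per-client bandwidth, MB/s
    return {
        "min_time_s": tmin,
        "max_rel_time_pct": max(rel),                  # 177% in the measurement
        "avg_rel_time_pct": sum(rel) / len(rel),
        "avg_bw_MBps": sum(bw) / len(bw),
    }

# Usage (placeholder): fairness(measured_times_for_49_clients)
```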

21 Nova ITB / FCL Clients vs. Striped Virtual Server
What effect does striping have on bandwidth?
NON-STRIPED:
Read – ITB vs. virtual server: BW = … ± 0.08 MB/s
Read – FCL vs. virtual server: BW = … ± 0.05 MB/s
STRIPED (4 MB stripes on 3 OSTs):
Read – ITB vs. virtual server: BW = … ± 0.01 MB/s
Read – FCL vs. virtual server: BW = … ± 0.03 MB/s
[Plot: I/O rates on the MDT and OSTs, ~14 MB/s.]
With striping the bandwidth is more "consistent", and slightly better on the OSTs.
(A sketch of setting this striping layout follows below.)
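A hedged sketch of how the striped layout used in this test (4 MB stripes across all 3 OSTs) could be set on a directory from a Lustre client with `lfs setstripe`. The mount point is a placeholder, and the stripe-size flag differs between Lustre releases (`-s` on 1.8.x, `-S` on newer versions), so check `lfs setstripe --help` on the installed version.

```python
# Set a 4 MB x 3-OST striping layout on a directory via 'lfs setstripe'.
import subprocess

def set_striping(directory="/mnt/lustre/nova", stripe_size="4M", stripe_count=3):
    cmd = ["lfs", "setstripe",
           "-s", stripe_size,        # stripe size (use -S on recent Lustre releases)
           "-c", str(stripe_count),  # number of OSTs to stripe over (-1 = all)
           directory]
    subprocess.run(cmd, check=True)
    # 'lfs getstripe <dir>' shows the resulting layout
    print(subprocess.run(["lfs", "getstripe", directory],
                         capture_output=True, text=True).stdout)
```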

HEPiX Storage Group
Collaboration with Andrei Maslennikov.
The Nova offline skim application is used to characterize storage solutions.
Lustre with an AFS front-end for caching has the best performance (AFS/VILU).

Current Work: Hadoop Evaluation
Hadoop: 1 metadata server + 3 storage servers. Testing access rates with different replica numbers. Clients access the data via FUSE.
Only semi-POSIX: a root application cannot write; untar returns before the data is available; chown does not support all features; … (a small probe of these behaviours is sketched below).
[Plots: read I/O rates, write I/O rates, and root-application read rates. For reference, Lustre on bare metal: 350 MB/s read, 250 MB/s write (root-application read: … ± 0.06 MB/s).]
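A small probe of the "semi-POSIX" behaviour seen through a Hadoop FUSE mount: attempt a plain write, a chown, and an immediate read-back, and report which operations fail. The mount point is a placeholder for the fuse-dfs mount used in the tests.

```python
# Probe basic POSIX operations through a (hypothetical) fuse-dfs mount point.
import errno
import os

FUSE_MOUNT = "/mnt/hadoop"                     # assumed fuse-dfs mount point

def probe(path=os.path.join(FUSE_MOUNT, "posix_probe.dat")):
    results = {}
    try:
        with open(path, "wb") as f:            # some apps (e.g. root) fail to write here
            f.write(b"x" * 4096)
        results["write"] = "ok"
    except OSError as e:
        results["write"] = errno.errorcode.get(e.errno, str(e))
    try:
        os.chown(path, os.getuid(), os.getgid())   # chown support is incomplete
        results["chown"] = "ok"
    except OSError as e:
        results["chown"] = errno.errorcode.get(e.errno, str(e))
    try:
        with open(path, "rb") as f:            # data may lag the write (the "untar" effect)
            results["readback_bytes"] = len(f.read())
    except OSError as e:
        results["readback_bytes"] = errno.errorcode.get(e.errno, str(e))
    return results

# print(probe())
```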

Conclusions
Performance
– The Lustre virtual server writes 3 times slower than bare metal. Use of virtio drivers is necessary but not sufficient.
– The HEP applications tested do NOT have high demands for write bandwidth; a virtual server may be valuable for them.
– Using VM clients on the Lustre VM server host gives the same performance as "external" clients (within 15%).
– Data striping has minimal (5%) impact on read bandwidth and none on write.
– Fairness of the bandwidth distribution is within 20%.
– More data will be coming through the HEPiX storage tests.
Fault tolerance (results not presented)
– Fail-out mode did NOT work.
– Fail-over tests show graceful degradation.
General operations
– Managed to destroy data with a change of the fault-tolerance configuration; could NOT recover from an MDT vs. OST de-synchronization.
– Some errors are easy to understand, some very hard.
– The configuration is coded on the Lustre partition and needs special commands to access it, making it difficult to diagnose and debug.

EXTRA SLIDES

Storage Evaluation Metrics
Metrics from Stu, Gabriele, and DMS (Lustre evaluation):
Cost
Data volume
Data volatility (permanent, semi-permanent, temporary)
Access modes (local, remote)
Access patterns (random, sequential, batch, interactive, short, long, CPU-intensive, I/O-intensive)
Number of simultaneous client processes
Acceptable latency requirements (e.g. for batch vs. interactive)
Required per-process I/O rates
Required aggregate I/O rates
File size requirements
Reliability / redundancy / data integrity
Need for tape storage, either hierarchical or backup
Authentication (e.g. Kerberos, X509, UID/GID, AFS token) / Authorization (e.g. Unix permissions, ACLs)
User & group quotas / allocation / auditing
Namespace performance ("file system as catalog")
Supported platforms and systems
Usability: maintenance, troubleshooting, problem isolation
Data storage functionality and scalability

Lustre Test Bed: ITB "Bare Metal"
[Diagram: FG ITB clients (7 nodes, 21 VMs, BlueArc mount) connect over Ethernet to the ITB Lustre server (2 OST & 1 MDT, 0.4 TB on 1 disk); the server Dom0 has 8 CPUs and 16 GB RAM.]
NOTE: similar setup to the DMS Lustre evaluation – same servers, but 2 OSTs here vs. 3 OSTs for DMS.

Machine Specifications
FCL client / server machines:
– Lustre 1.8.3: set up with 3 OSS (different striping)
– CPU: dual quad-core Xeon 2.67 GHz with 12 MB cache, 24 GB RAM
– Disk: 6 SATA disks in RAID 5 for 2 TB + 2 system disks (hdparm: … MB/sec)
– 1 GbE + InfiniBand cards
ITB client / server machines:
– Lustre …: striped across 2 OSS, 1 MB blocks
– CPU: dual quad-core Xeon 2.66 GHz with 4 MB cache, 16 GB RAM
– Disk: single 500 GB disk (hdparm: … MB/sec)
(The hdparm figures come from its buffered-read test; a minimal wrapper is sketched below.)
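The raw per-disk read rates quoted with hdparm (the numbers themselves were lost in this transcript) come from its buffered-read test, `hdparm -t`. A minimal wrapper; it needs root privileges and the device path is a placeholder.

```python
# Run 'hdparm -t' on a block device and extract the buffered read rate.
import re
import subprocess

def hdparm_read_MBps(device="/dev/sda"):
    out = subprocess.run(["hdparm", "-t", device],
                         capture_output=True, text=True, check=True).stdout
    # hdparm prints e.g. "Timing buffered disk reads: ... = 95.31 MB/sec"
    m = re.search(r"=\s*([\d.]+)\s*MB/sec", out)
    return float(m.group(1)) if m else None

# print(hdparm_read_MBps("/dev/sda"))
```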

Metadata Tests (mdtest)
mdtest: tests metadata rates from multiple clients – file/directory creation, stat, deletion.
Setup: 48 clients on 6 VMs on 6 different nodes.
[Plot legend: 1 ITB client vs. ITB bare-metal server; DMS results; 1 ITB client vs. FCL bare-metal server.]
(A hedged sketch of an mdtest launch follows below.)
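mdtest is an MPI benchmark, so a run similar to the one above (48 client processes spread over 6 VMs) would be launched with mpirun. In this sketch the hostfile, target directory, and per-task file count are assumptions, and the exact flags should be checked against the installed mdtest version.

```python
# Launch mdtest over MPI against a directory on the file system under test.
import subprocess

def run_mdtest(nprocs=48, hostfile="vm_hosts.txt",
               target="/mnt/lustre/mdtest", files_per_task=1000, iterations=3):
    cmd = ["mpirun", "-np", str(nprocs), "-hostfile", hostfile,
           "mdtest",
           "-n", str(files_per_task),   # files/directories created per task
           "-d", target,                # target directory on the file system
           "-i", str(iterations)]       # repeat and report mean rates
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout                   # creation/stat/removal rates per second
```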

Metadata Tests (fileop)
Fileop: IOZone's metadata test. Tests the rates of mkdir, chdir, open, close, etc., from a single client.
[Plot legend: 1 ITB client vs. ITB bare-metal server; 1 ITB client vs. FCL bare-metal server; DMS results.]
Note: the MDS should be configured as RAID 10 rather than RAID 5. Is the observed effect due to this?

Status and Future Work
Storage evaluation project status:
– Initial study of the data access model: DONE
– Deploy test bed infrastructure: DONE
– Benchmark commissioning: DONE
– Lustre evaluation: DONE
– Hadoop evaluation: STARTED
– OrangeFS and BlueArc evaluations: TODO
– Prepare final report: STARTED
Current completion estimate is May 2011.

ITB Clients vs. FCL Virtual-Server Lustre: CPU Count
Trying to improve write I/O by changing the number of CPUs on the Lustre server VM: 350 MB/s read, 70 MB/s write – NO DIFFERENCE.
[Plots: read and write I/O rates.]
Write I/O does NOT depend on the number of CPUs; 1 or 8 CPUs (3 GB RAM) are equivalent at this scale.

ITB & FCL Clients vs. Striped Virtual Server
What effect does striping have on bandwidth?
[Plots: read and write rates, with and without striping, for FCL and ITB clients.]
Reads with striping: FCL clients 5% faster, ITB clients 5% slower. Writes are the same. Not significant.

Fault Tolerance
Basic fault-tolerance tests of ITB clients vs. the FCL Lustre virtual server.
Read / write rates during IOZone tests when turning off 1, 2, or 3 OSTs or the MDT for 10 seconds or 2 minutes.
Two modes: fail-over vs. fail-out. Fail-out did not work.
Graceful degradation: if an OST is down, access is suspended; if the MDT is down, ongoing access is NOT affected.

1 Nova ITB Client vs. Bare Metal
Read & write: BW read = 2.63 ± 0.02 MB/s, BW write = 3.25 ± 0.02 MB/s.
Read-only: BW = 15.6 ± 0.2 MB/s.
Writing is always CPU-bound – it does NOT stress the storage.

1 Nova ITB / FCL Client vs. Virtual Server
1 FCL client – read BW = 14.9 ± 0.2 MB/s (bare metal: 15.6 ± 0.2 MB/s); with default disk and net drivers: BW = 14.4 ± 0.1 MB/s.
1 ITB client – read BW = 15.3 ± 0.1 MB/s (bare metal: 15.6 ± 0.2 MB/s).
The virtual server is as fast as bare metal for reads; the on-board client is almost as fast as the remote client.

21 Minos Clients
Minos application (loon) skimming: random access to 1400 files.
Minos loon is CPU-bound – it does NOT stress the storage.