Sensitivity of Cluster File System Access to I/O Server Selection
A. Apon, P. Wolinski, and G. Amerson, University of Arkansas

Overview
Benchmarking study
– Parallel Virtual File System (PVFS)
– Network File System (NFS)
Testing parameters include
– Pentium-based cluster node hardware
– Myrinet interconnect
– Varying number and configuration of I/O servers and client request patterns

Outline
File system architectures
Performance study design
Experimental results
Conclusions and future work

NFS Architecture
Client/server system; a single NFS server (Node 0) holds the data file for clients Node 1 through Node N
Each cluster node has a dual-processor Pentium, Linux, a hard disk, and lots of memory; nodes are connected by a network switch
[Diagram: data file on the NFS server, client nodes attached through the network switch]

PVFS Architecture
Also a client/server system
Many servers for each file; the data file is split into fixed-size stripes placed across the I/O servers in round-robin fashion
Each cluster node still has a dual-processor Pentium, Linux, a hard disk, and lots of memory; nodes are connected by a network switch
[Diagram: data file striped round-robin across Node 0, Node 1, and Node 2]
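
To make the round-robin layout concrete, here is a minimal sketch (our illustration, not code from the talk; the helper name and the simple layout formula are assumptions) that maps a byte offset to the I/O server and local stripe that hold it, given the stripe size and the number of I/O servers:

#include <stdio.h>

/* Hypothetical helper: with fixed-size stripes laid out round-robin
   across num_servers I/O servers, find which server and which local
   stripe hold a given byte offset. */
void locate_stripe(long long offset, long long stripe_size, int num_servers,
                   int *server, long long *local_stripe)
{
    long long stripe_index = offset / stripe_size;   /* global stripe number  */
    *server = (int)(stripe_index % num_servers);     /* round-robin owner     */
    *local_stripe = stripe_index / num_servers;      /* stripe on that server */
}

int main(void)
{
    int server;
    long long local_stripe;
    /* Example: 16 KB stripes over 4 I/O servers, byte offset 100000 */
    locate_stripe(100000, 16 * 1024, 4, &server, &local_stripe);
    printf("offset 100000 -> I/O server %d, local stripe %lld\n",
           server, local_stripe);
    return 0;
}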

PVFS Architecture
One node is a manager node
– Maintains metadata information for files
Configuration and usage options include:
– Stripe size
– Number of I/O servers
– Which nodes serve as I/O servers
– Native PVFS API vs. UNIX/POSIX API

Native PVFS API example

#include <pvfs.h>   /* native PVFS API header (name lost from the slide; assumed) */

int main() {
  int fd, bytes, bytes_read;
  fd = pvfs_open(fn, O_RDONLY, 0, NULL, NULL);
  ...
  pvfs_lseek(fd, offset, SEEK_SET);
  ...
  bytes_read = pvfs_read(fd, buf_ptr, bytes);
  ...
  pvfs_close(fd);
}

Performance Study Design
Goals
– Investigate the effect on cluster I/O when the NFS server or the PVFS I/O servers also act as clients
– Compare PVFS with NFS

Performance Study Design
Experimental cluster
– Seven dual-processor Pentium III 1 GHz computers, each with 1 GB of memory
– Dual EIDE disk RAID 0 subsystem in all nodes, measured throughput about 50 MBps
– Myrinet switches, 250 MBps theoretical bandwidth

Performance Study Design
Two extreme client workloads
– Local whole file (LWF): takes advantage of caching on the server side. One process per node; each process reads the entire file from beginning to end.
[Diagram: Node 1 through Node N each reading the whole file]

Performance Study Design
Two extreme client workloads
– Global whole file (GWF): minimal help from caching on the server side. One process per node; each process reads a different portion of the file, giving a balanced workload.
[Diagram: Node 1 through Node N each reading a distinct portion of the file]
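
As a minimal sketch of the GWF partitioning (our illustration, not the authors' code; the block-partition arithmetic and function name are assumptions), each of N client processes can compute a contiguous, roughly equal byte range so the file is covered exactly once:

#include <stdio.h>

/* Hypothetical GWF partitioning: client `rank` of `nprocs` reads the
   byte range [start, start+len) so the whole file is covered exactly
   once with a balanced workload. */
void gwf_portion(long long file_size, int rank, int nprocs,
                 long long *start, long long *len)
{
    long long base  = file_size / nprocs;     /* bytes per client        */
    long long extra = file_size % nprocs;     /* remainder spread around */
    *start = (long long)rank * base + (rank < extra ? rank : extra);
    *len   = base + (rank < extra ? 1 : 0);
}

int main(void)
{
    long long start, len;
    int nprocs = 7;                            /* seven cluster nodes */
    for (int rank = 0; rank < nprocs; rank++) {
        gwf_portion(1LL << 30, rank, nprocs, &start, &len);   /* 1 GB file */
        printf("rank %d: start %lld, length %lld\n", rank, start, len);
    }
    return 0;
}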

NFS Parameters
The mount on Node 0 is a local mount
– An optimization for NFS
The NFS server may or may not also participate as a client in the workload

PVFS Parameters
A preliminary study was performed to determine the "best" stripe size and request size for the LWF and GWF workloads
– Stripe size of 16 KB
– Request size of 16 MB
– File size of 1 GB
All I/O servers for a given file participate in all requests for that file
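
A quick sanity check of those numbers (the arithmetic below is ours, not from the slides): a 16 MB request spans 16 MB / 16 KB = 1024 stripes, far more than the at most seven I/O servers available, which is why every I/O server for a file participates in every request.

#include <stdio.h>

int main(void)
{
    long long request_size = 16LL * 1024 * 1024;   /* 16 MB request        */
    long long stripe_size  = 16LL * 1024;          /* 16 KB stripe         */
    int max_servers = 7;                           /* nodes in the cluster */

    long long stripes_per_request = request_size / stripe_size;   /* 1024 */
    printf("stripes per request: %lld\n", stripes_per_request);
    printf("every request touches all %d servers: %s\n", max_servers,
           stripes_per_request >= max_servers ? "yes" : "no");
    return 0;
}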

System Software
RedHat Linux version 7.1
Linux kernel version rc2
NFS protocol version 3
PVFS version
PVFS kernel version
Myrinet network drivers gm-1.5-pre3b
MPICH version 1.2.1

Experimental Pseudocode
For all nodes:
  Open the test file
  Barrier synchronize with all clients
  Get start time
  Loop to read/write my portion
  Barrier synchronize with all clients
  Get end time
  Report bytes processed and time
For Node 0:
  Receive bytes processed from all nodes, report aggregate throughput
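
This measurement loop maps naturally onto MPI (MPICH appears under system software). Below is a minimal sketch of such a harness, written by us as an illustration rather than taken from the study; the file path, request size, and the read-to-EOF loop are placeholder assumptions.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Placeholder path and request size; the study used a 1 GB file
       and 16 MB requests. */
    const char *path = "/mnt/pvfs/testfile";
    size_t request = 16 * 1024 * 1024;
    char *buf = malloc(request);

    int fd = open(path, O_RDONLY);                /* UNIX/POSIX API path */
    if (fd < 0) { perror("open"); MPI_Abort(MPI_COMM_WORLD, 1); }

    MPI_Barrier(MPI_COMM_WORLD);                  /* barrier synchronize */
    double start = MPI_Wtime();                   /* get start time      */

    long long bytes = 0;                          /* loop to read my portion
                                                     (here: read to EOF)   */
    ssize_t n;
    while ((n = read(fd, buf, request)) > 0)
        bytes += n;

    MPI_Barrier(MPI_COMM_WORLD);                  /* barrier synchronize */
    double elapsed = MPI_Wtime() - start;         /* get end time        */
    close(fd);

    /* Node 0 receives bytes processed and reports aggregate throughput */
    long long total = 0;
    MPI_Reduce(&bytes, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("aggregate throughput: %.1f MB/s\n",
               total / (1024.0 * 1024.0) / elapsed);

    free(buf);
    MPI_Finalize();
    return 0;
}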

Clearcache
Clear NFS client- and server-side caches
– Unmount NFS directory, shut down NFS
– Restart NFS, remount NFS directories
Clear server-side PVFS cache
– Unmount PVFS directories on all nodes
– Shut down PVFS I/O daemons and manager
– Unmount pvfs-data directory on slaves
– Restart PVFS manager and I/O daemons
– Remount PVFS directories on all nodes

Experimental Parameters
Number of participating clients
Number of PVFS I/O servers
PVFS native API vs. UNIX/POSIX API
I/O servers (NFS as well as PVFS) may or may not also participate as clients

Experimental Results
NFS
PVFS native API vs. UNIX/POSIX API
GWF, varying server configurations
LWF, varying server configurations

[Chart: NFS, LWF and GWF, with and without the server also reading]

[Chart: PVFS, LWF and GWF, native PVFS API vs. UNIX/POSIX API]

[Chart: PVFS UNIX/POSIX API compared to NFS]

[Chart: PVFS, GWF using the native API, servers added from Node 6 down]

[Chart: PVFS and NFS, GWF, 1 and 2 clients, with/without the server participating]

[Chart: PVFS, LWF using the native API, servers added from Node 6 down]

[Chart: PVFS and NFS, LWF, 1, 2, and 3 clients, with/without servers participating]

[Chart: PVFS, LWF and GWF, separate clients and servers, seven nodes]

Conclusions
NFS can take advantage of a local mount
NFS performance is limited by contention at the single server
– Limited to the disk throughput or the network throughput from the server, whichever has the most contention

Conclusions
PVFS performance generally improves (does not decrease) as the number of clients increases
– More improvement is seen with the LWF workload than with the GWF workload
PVFS performance improves when the workload can take advantage of server-side caching

Conclusions
PVFS is better than NFS for all types of workloads in which more than one I/O server can be used
PVFS UNIX/POSIX API performance is much lower than performance using the native PVFS API
– May be improved by a new release of the Linux kernel

Conclusions
For a given number of servers, PVFS I/O throughput decreases when the servers also act as clients
For the workloads tested, PVFS system throughput increases to the maximum possible for the cluster when all nodes participate as both clients and servers

Observation
The drivers and libraries were under constant upgrade during these studies. However, our recent experience indicates that they are now stable and interoperate well together.

Future Work
Benchmark cluster workloads that include both computation and file access
Expand the benchmarking to a cluster with a larger number of PVFS clients and PVFS servers