1 Small File File Systems
USC, Jim Pepin

2 Level Setting
- Small files are ‘normal’ for lots of people
  - Metadata substitute (lots of image data is done this way)
  - Comes from the ‘pc’/desktop world
  - These users have discovered ‘hpc’ but don’t want to change programs (not even to MPI)
  - Find ways to help (best is a ‘rewrite’, but that is not reasonable to expect)
- Small files are deadly to most file systems
  - Some more than others
  - Impact of ‘cluster’ systems

3 Level Setting
- Disks
  - SATA
    - Not fast
    - Reliability issues
    - Cheap
  - Fast disk (15k rpm etc.)
    - Not cheap
    - Fast
  - People are looking at ‘cheap’
    - Drives better backup/maintainability solutions
  - Distributed doesn’t mean ‘faster’
  - Virtualization can be your enemy (in some ways)

4 Basics
- 1800-node cluster
  - Presents special problems
- Myrinet ‘interconnect’
- Ethernet (Gb) data plane
- Fibre Channel disk/tape data plane (2 Gb/s)
  - 256+ disk/tape devices
- 15+ file servers
- 250+ TB disk
- Tape backup
  - DR site

5 Basics
- QFS base FS
  - Archiving and distributed access
  - Sun thing
- Local parallel FS on nodes
- NFS
  - Issues around it
- “Condo” disk versus condo nodes

6 Basics
- Three types of file systems
- Parallel FS on compute nodes (temp)
  - Exception on ‘condo’ nodes
- Small files
  - More directory transactions
  - Small frames win
  - No stripes
- Large files
  - More data transactions
  - Jumbo frames win
  - Stripes win
- Tuning is stripe factors and block sizes
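The small-file versus large-file contrast on this slide can be made concrete with rough arithmetic. The sketch below is an illustration only, not a measurement from the cluster described here: it assumes one directory/metadata transaction per file and one data transaction per block, and the function name and the 64 KiB block size are hypothetical.

```python
import math

def transaction_counts(total_bytes, file_size, block_size=64 * 1024):
    """Rough count of metadata and data transactions for a dataset.

    Assumptions (illustrative only): each file costs one directory/metadata
    transaction, and each block-sized chunk of a file costs one data
    transaction.
    """
    n_files = math.ceil(total_bytes / file_size)
    blocks_per_file = math.ceil(file_size / block_size)
    return {"metadata_ops": n_files, "data_ops": n_files * blocks_per_file}

GIB = 1024 ** 3
small = transaction_counts(GIB, 4 * 1024)  # 1 GiB stored as 4 KiB files
large = transaction_counts(GIB, GIB)       # 1 GiB stored as a single file
print(small, large)
```

Under these assumptions the same gibibyte costs 262,144 metadata transactions as 4 KiB files and one as a single file, which is why directory traffic, not bandwidth, tends to dominate the small-file case.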

7 Small Files
- Examples
  - Genomics group
    - Tens of thousands of files in a single directory
  - Natural language group
    - 50-250k files in a directory
    - Many nodes accessing the same data (dictionaries)
- Backups are ‘slower’ / ‘harder’
  - Reasons
    - Updating directory data
    - Blocking of data on tape
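Directories like the ones in these examples can be located before they cause trouble. The following is a minimal sketch using only the Python standard library; the function name and the default threshold of 10,000 entries are hypothetical choices for illustration, not values from the talk.

```python
import os

def crowded_directories(root, threshold=10_000):
    """Return (path, entry_count) pairs for directories under `root`
    that hold at least `threshold` entries, largest first."""
    crowded = []
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)
        if count >= threshold:
            crowded.append((dirpath, count))
    return sorted(crowded, key=lambda item: item[1], reverse=True)
```

Note that the walk is itself a stream of directory transactions, so on a busy shared file system it is probably best run sparingly.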

8 Small Files
- Ways to help
  - “Faster” disk (helps metadata/directory space)
  - Distributed file access (QFS)
    - Metadata is still a ‘block’
  - Read/write locks
  - Updating for distributed access
  - Next version scales better (lock improvements)
  - No free lunch
  - Special-purpose file systems and/or local space on cluster nodes (replication)
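One way the "local space on cluster nodes (replication)" idea might be realized without touching the user's program is to move a small-file directory as a single archive and unpack it on node-local disk, so the shared file system sees one large transfer instead of thousands of directory operations. This is a sketch under that assumption, not the method used at USC; the function names and paths are hypothetical.

```python
import os
import tarfile

def pack_directory(src_dir, archive_path):
    """Bundle everything under src_dir into one uncompressed tar archive.

    archive_path should live outside src_dir so the archive does not
    try to include itself.
    """
    with tarfile.open(archive_path, "w") as tar:
        tar.add(src_dir, arcname=".")
    return archive_path

def unpack_to_local(archive_path, local_dir):
    """Unpack the archive into node-local scratch space (local_dir).

    Only use this with archives you created yourself.
    """
    os.makedirs(local_dir, exist_ok=True)
    with tarfile.open(archive_path, "r") as tar:
        tar.extractall(local_dir)
    return local_dir
```

The program then reads ordinary files from the local copy, which fits the "don't change code" constraint. The same bundling also addresses the tape-blocking problem noted for backups, at the cost of keeping replicas in sync.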

9 Next generation
- Why change is needed
  - NFS doesn’t cut it
    - Why
- GPFS
  - Helps some
- 10 Gb hosts on ‘data plane’
  - Next month
- RAM disk for ‘metadata’?

10 Next generation
- Storage management solutions
  - SRB and friends
  - Database-based solutions
  - Lustre possible
  - Object storage
  - Performance for small files/objects is a question in my mind
- All these have potential but…
  - Back to ‘don’t change code’
  - “Virtualization” conundrum
- How to build massively parallel data spaces
  - HPCS/other projects

