
1 Dynamic and Scalable Distributed Metadata Management in Gluster File System
Huang Qiulan (huangql@ihep.ac.cn)
Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences

2 Topics
- Introduction to Gluster
- Gluster in IHEP: deployment, performance and issues
- Design and implementation: what we have done
- Experiment results
- Summary

3 Gluster Introduction

4 Gluster Overview
- Gluster is an open-source distributed file system
- Linear scale-out: supports several petabytes and thousands of client connections
- No metadata server: an elastic hashing algorithm distributes data efficiently across a fully distributed architecture
- Global namespace with POSIX support
- High reliability: data replication and data self-heal
- Design and implementation based on a stackable, modular user-space architecture

5 Gluster Architecture
- Brick servers store data on EXT3/EXT4/XFS
- Clients access the file system via the native TCP protocol, NFS, or Samba

6 Stackable and Modular Design
- Stackable and modular structure: each functional module is called a translator
- All translators are organized into a tree
- Greatly reduces the complexity of the system
- Makes it easy to extend system functionality (see the sketch below)
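To illustrate the translator idea, here is a minimal C sketch of a stackable module chain. It is not the real GlusterFS xlator API; the structure and function names (xlator, lookup) are assumptions chosen only to show how each layer does its own work and then hands the request to its child, so the translators form a tree.

/* Minimal sketch of a stackable translator chain (illustrative only;
 * the real GlusterFS xlator API differs). A request flows down the
 * tree and the reply flows back up. */
#include <stdio.h>

struct xlator;
typedef int (*lookup_fn)(struct xlator *self, const char *path);

struct xlator {
    const char    *name;     /* e.g. "performance/stat-prefetch" */
    lookup_fn      lookup;   /* this layer's implementation       */
    struct xlator *child;    /* next translator down the stack    */
};

/* A pass-through layer: do its own work, then hand off to the child. */
static int stat_prefetch_lookup(struct xlator *self, const char *path)
{
    printf("[%s] check prefetch cache for %s\n", self->name, path);
    return self->child->lookup(self->child, path);
}

static int dht_lookup(struct xlator *self, const char *path)
{
    printf("[%s] hash %s to a brick\n", self->name, path);
    return 0;                /* leaf of this sketch */
}

int main(void)
{
    struct xlator dht = { "cluster/distribute", dht_lookup, NULL };
    struct xlator sp  = { "performance/stat-prefetch", stat_prefetch_lookup, &dht };
    return sp.lookup(&sp, "/ybjgfs/argo/public");
}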

7 Gluster in IHEP

8 Deployment of the Gluster File System
- Version: 3.2.7 (optimized)
- I/O servers: 5 (39 bricks)
- Storage capacity: 315 TB
- Serves the cosmic-ray/astrophysics experiment YBJ-ARGO
- Layout: computing cluster connected over 10 Gb Ethernet to brick storage on SATA disk arrays, RAID 6 (main and extended)

9 Gluster Performance
- Per-server performance: peak I/O throughput reaches over 850 MB/s, saturating the 10 Gb Ethernet link
- As read/write requests increase, I/O wait on the data servers rises above 40%
- When I/O wait exceeds 20%, "ls" performance degrades noticeably

10 Gluster Issues
- Metadata problems:
  - When data servers are busy, "ls" performance degrades further
  - As the number of bricks grows, "mkdir" and "rmdir" performance gets worse
  - The directory tree becomes inconsistent
- When one brick has problems, client requests get stuck
- Ownership of link files changes to root:root
- Most of these problems are metadata-related: what can we do for Gluster?

11 Design and Implementation

12 Architecture of our system

13 Client-side volume file with the new metadata translator added

volume testvol-client-0
  type protocol/client
  option remote-host 192.168.23.31
  option remote-subvolume /data02/gdata01
  option transport-type tcp
end-volume

volume testvol-client-1
  type protocol/client
  option remote-host 192.168.122.196
  option remote-subvolume /data03/gdata01
  option transport-type tcp
end-volume

volume testvol-client-2
  type protocol/client
  option remote-host 192.168.23.31
  option remote-subvolume /data03/gdata01
  option transport-type tcp
end-volume

volume testvol-dht
  type cluster/distribute
  subvolumes testvol-client-0 testvol-client-1 testvol-client-2
end-volume

(Add new translator)
volume testvol-md
  type cluster/md
  subvolumes testvol-dht testvol-md-replica-0 testvol-md-replica-1
end-volume

volume testvol-stat-prefetch
  type performance/stat-prefetch
  subvolumes testvol-md
end-volume

volume testvol
  type debug/io-stats
  option latency-measurement on
  option count-fop-hits on
  subvolumes testvol-stat-prefetch
end-volume

(Diagram: client-side translator stack, top to bottom: VFS, stats prefetch, io cache, read ahead, MD volume, distribute/stripe/replication, clients)

14 How to distribute metadata?
Adaptive Directory Sub-tree Partition algorithm (ADSP), an improved sub-tree partition algorithm:
- Partitions the namespace into sub-trees at directory granularity
- Stores sub-trees on the storage devices using a flat structure
- Records sub-tree distribution information and file attributes in extended attributes
- Adjusts sub-tree placement adaptively based on the load of the metadata cluster
ADSP answers HOW and WHERE to distribute metadata (see the placement sketch below).
Features: high scalability, load balance, flexibility
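As a rough illustration of the "adjusts placement based on load" step, the following C sketch picks the least-loaded metadata server for a new directory sub-tree. The mds structure, the load metric, and the server list are hypothetical; the slide does not specify the actual balancing policy.

/* Hypothetical sketch of ADSP-style placement: assign a sub-tree to
 * the least-loaded metadata server; the assignment can be revisited
 * as load changes. Names and numbers are illustrative assumptions. */
#include <stdio.h>

#define N_MDS 4

struct mds { int id; long load; };   /* load = pending metadata ops */

/* Pick the metadata server with the lowest current load. */
static int adsp_place_subtree(const struct mds servers[], int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (servers[i].load < servers[best].load)
            best = i;
    return servers[best].id;
}

int main(void)
{
    struct mds cluster[N_MDS] = { {0, 120}, {1, 35}, {2, 300}, {3, 90} };
    int target = adsp_place_subtree(cluster, N_MDS);
    printf("place sub-tree /ybjgfs/argo/public on MDS %d\n", target);
    return 0;
}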

15 ADSP Implementation
- Flat structure on the storage device: the UUID (GFID) of a directory is used as its on-disk directory name
- The metadata layout is stored in extended attributes
- Sub-directories and files under a path such as /ybjgfs/argo/public are stored under the flat gfid(public) entry, with file metadata kept in extended attributes (see the xattr sketch below)
(Figure: file-system namespace /ybjgfs/argo/public, /ybjgfs/asgamma, /ybjgfs/argo/user mapped to flat entries argo/, argo/public, asgamma/, gfid(public)/, argo/user/ on metadata servers ID=1, ID=2, ..., ID=n; layout=2)
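Since the slide records sub-tree distribution information and file attributes in extended attributes, here is a small Linux setxattr/getxattr sketch of that idea. The xattr key "user.adsp.layout", its value format, and the flat GFID-named path are illustrative assumptions (GlusterFS itself keeps its own metadata under trusted.* xattrs, which require privileges).

/* Sketch of recording sub-tree placement in an extended attribute.
 * Path, key, and value format are made up for the demo. */
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

int main(void)
{
    /* The directory is stored flat under its GFID rather than its path name. */
    const char *flat_dir = "/data02/gdata01/3f2a9c7e-0b1d-4c52-9a8e-7d41c5b20f66";
    const char *layout   = "mds=2";   /* which metadata server holds the sub-tree */

    if (setxattr(flat_dir, "user.adsp.layout", layout, strlen(layout), 0) != 0) {
        perror("setxattr");
        return 1;
    }

    char buf[64] = {0};
    if (getxattr(flat_dir, "user.adsp.layout", buf, sizeof(buf) - 1) < 0) {
        perror("getxattr");
        return 1;
    }
    printf("sub-tree layout: %s\n", buf);
    return 0;
}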

16 How to locate data?
Distributed Unified Layout Algorithm (DULA), an improved consistent-hashing algorithm:
- Locates data without any routing information; average time complexity is O(1)
- The hash ring is divided into intervals of equal length
- All storage devices are mapped onto the hash ring, one device per interval
- Hash(GFID) → [start, end] identifies the owning device (see the sketch below)
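A minimal C sketch of the equal-interval idea follows. The FNV-1a hash is a stand-in (the slide does not name the actual hash function) and GFIDs are treated as plain strings; the point is that one hash plus one arithmetic step maps a GFID to its interval, giving O(1) lookup with no routing table.

#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in hash; the real system's hash is not specified. */
static uint32_t fnv1a_32(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) { h ^= (unsigned char)*s++; h *= 16777619u; }
    return h;
}

/* The 32-bit hash ring is split into n_bricks equal-length intervals;
 * interval i covers [i*2^32/n, (i+1)*2^32/n), so the owning brick is
 * found with one multiply and shift: O(1), no routing state. */
static unsigned dula_locate(const char *gfid, unsigned n_bricks)
{
    return (unsigned)(((uint64_t)fnv1a_32(gfid) * n_bricks) >> 32);
}

int main(void)
{
    const char *gfid = "9b4c2e1a-5d67-4f08-a3b9-2c6de1f0a711";
    printf("GFID %s -> brick %u of 3\n", gfid, dula_locate(gfid, 3));
    return 0;
}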

17 Experiment results

18 Metadata performance (1)
Metadata performance is greatly improved by ADSP:
- Directory operations: ADSP is about 2-3 times faster than Gluster
- File operations: ADSP is about 2 times faster than Gluster, and also better than Lustre

19 Metadata performance (2)

20 Metadata performance (3)

21 Metadata performance (4)
- 1,800,000 files under testdir; run "ls -lRa testdir"
- Our system (ZEFS) takes 276 minutes while Gluster takes 3643 minutes
- ZEFS is therefore about 13 times faster than Gluster

22 Summary
- Extended the Gluster framework with a metadata module
  - The ADSP algorithm handles metadata distribution and organization
  - The DULA algorithm locates data within the cluster
- Metadata performance is greatly improved
  - Single client, single process: directory operations are about 2-3 times faster than Gluster, file operations about 2 times faster, and our system outperforms Lustre
  - Multiple clients, multiple processes: under highly concurrent access to small files, our system is about 3-4 times faster than Gluster, and file size has little effect on directory-operation performance. The overall trend is that larger files give somewhat lower performance, but the effect is not pronounced.
- Better scalability than Gluster

23 Thank you. Questions? Author email: huangql@ihep.ac.cn

