
1 Building Advanced Storage Environment
Cheng Yaodong, Computing Center, IHEP
December 2002

2 Outline
◆ Current Environment
◆ Main Problems
◆ Solutions
◆ Related Techniques
◆ Introduction to CERN/CASTOR
◆ Test Environment

3 Current Storage Environment
◆ Isolated storage
  ■ Each server has its own storage
◆ Multi-platform
  ■ Red Hat Linux, HP-UX, Solaris, Windows
◆ Various media
  ■ Disk arrays; tapes including LTO, DLT, SDLT, etc.
◆ Obsolete management
◆ NFS

4 Isolated Storage
[Diagram: three storage islands (Sun, HP, and Dell storage), each with its own file system and volume manager]

5 Main Problems
◆ DAS (Directly Attached Storage) → data islands
◆ Bad scalability
◆ Low efficiency
◆ Inconvenient to use
◆ NFS
  ■ Overload on the system
  ■ Overhead on the network
◆ Small capacity

6 Solutions
◆ Building an Advanced Storage Environment
  ■ Provides:
    ● Remote access to disk files
    ● Disk pool management
    ● Indirect access to tape
    ● Volume manager
    ● Hierarchical Storage Manager functionality
  ■ Main objectives:
    ● Focused on HEP requirements
    ● Easy to use, deploy, and administer
    ● High performance
    ● Good scalability
    ● Available on most Unix systems and Windows NT
    ● Integration and virtualization of storage resources

7 Related Techniques
◆ Hierarchical Storage Manager (HSM)
◆ Distributed file system
◆ Storage Area Network (SAN)
◆ Virtual storage

8 Hierarchical Storage Manager
◆ Characteristics of data in High Energy Physics
  ■ 20% active, 80% non-active
◆ Layers of storage devices
◆ Data migration
◆ Data recall
◆ 3-tier storage infrastructure

9 Distributed File System
◆ Load balancing between storage devices
◆ Alleviates the load on the OS and network
◆ A single, shared name space for all users, from all machines
◆ Location-independent file sharing
◆ Client caching
◆ Extended security through Kerberos authentication and Access Control Lists
◆ Replication techniques for file system reliability

10 Storage Area Network
◆ A private network dedicated to storage
◆ Storage devices are connected to a switch through FCP, iSCSI, InfiniBand, and other protocols
◆ These protocols are designed specifically for transferring large amounts of data
◆ Servers are directly connected to the disks and share data
◆ Native file systems are used → much better performance than NFS
◆ Some HSM functionality is still needed

11 HSM SAN Model
[Diagram: HSM servers on a LAN share storage devices over a Storage Area Network]

12 Virtual Storage
◆ Map all storage resources to a virtual device or a single file space
◆ Integrating storage devices
  ■ Different storage connections: DAS, NAS, SAN
  ■ Different storage media: disk, tape
◆ Indirect access to physical storage devices
◆ Easy to use and administer
◆ Multi-platform support
◆ Data sharing

13 Our Implementation of Virtual Storage
[Diagram: Red Hat, HP, Solaris, and NT clients see a single virtual storage space; storage management software virtualizes the underlying physical storage devices and provides transparent access]

14 Introduction to CERN/CASTOR
◆ CERN Advanced STORage manager
  ■ In January 1999, CERN began to develop CASTOR
  ■ A Hierarchical Storage Manager used to store user and physics files
  ■ It manages the secondary and tertiary storage
  ■ Currently holds more than 1800 TB of data
  ■ The servers are installed in the computer center, while the clients are deployed on most of the computers, including the desktops
  ■ Automatic management of experiment data files
◆ Main access to data is through RFIO (Remote File I/O package)

15 Remote File I/O (RFIO)
◆ Provides transparent access to files: they can be local, remote, or HSM files
◆ There exist:
  ■ a command-line interface: rfcp, rfmkdir, rfdir
  ■ an Application Programming Interface (API)
◆ All calls handle standard file names and file descriptors (Unix or Windows)
◆ The routine names are obtained by prepending rfio_ to the standard POSIX system calls
◆ The function prototypes are unchanged
◆ The function name translation is done automatically by including the header file "rfio.h" (see the sketch below)
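
As a quick illustration of the API described above, here is a minimal read sketch. The CASTOR path is a made-up example and the rfio_perror error helper is assumed; treat this as a sketch, not a verified program.

```c
#include <fcntl.h>
#include <stdio.h>
#include "rfio.h"    /* enables the rfio_ name translation */

int main(void)
{
    char buf[4096];
    int n;

    /* hypothetical CASTOR path, for illustration only */
    int fd = rfio_open("/castor/ihep.ac.cn/user/c/cheng/test.dat",
                       O_RDONLY, 0644);
    if (fd < 0) {
        rfio_perror("rfio_open");   /* assumed RFIO error-reporting helper */
        return 1;
    }
    while ((n = rfio_read(fd, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, n, stdout);  /* copy the file to standard output */
    rfio_close(fd);
    return 0;
}
```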

16 RFIO Access to Data
[Diagram: an RFIO client accesses a local disk directly; to reach a remote disk, the client talks to RFIOD (the disk mover) on the remote machine]

17 Disk Pool
◆ A series of disks on different machines forms a disk pool managed by the stager
◆ Disk virtualization
◆ Allocates space in the disk pool to store files
◆ Makes space in the pools to store new files (garbage collector)
◆ Keeps a catalog of all files residing in the pools (a sketch of such an entry follows)
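
To make the catalog idea concrete, here is a hypothetical sketch of what one catalog entry might track. The field names are illustrative assumptions, not CASTOR's actual stager record.

```c
#include <sys/types.h>
#include <time.h>

/* Hypothetical stager catalog entry (illustrative assumption only). */
struct pool_file {
    char   castor_path[1024]; /* name-server path of the file */
    char   disk_server[64];   /* machine holding the disk copy */
    char   disk_path[1024];   /* local path on that machine */
    off_t  size;              /* file size in bytes */
    time_t last_access;       /* lets the garbage collector free LRU files */
    int    busy;              /* non-zero while a client has the file open */
};
```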

18 File Access in a Disk Pool
[Diagram: an RFIO client contacts the stager, which consults its catalog and directs the client to the RFIOD (disk mover) serving the disk pool]

19 CASTOR Name Server
◆ File names are in the form:
  ■ /castor/domain_name/experiment_name/… for example: /castor/ihep.ac.cn/ybj/
  ■ /castor/domain_name/user/… for example: /castor/ihep.ac.cn/user/c/cheng
◆ Role:
  ■ Implement a hierarchical view of the name space: files and directories
  ■ Remember the file residency on tertiary storage
  ■ Keep the file class definitions

20 CASTOR File Access
[Diagram: an RFIO client first resolves the file through the name server, then contacts the stager, whose catalog directs the client to the RFIOD (disk mover) serving the disk pool]

21 CASTOR Components
◆ The backend store consists of:
  ■ RFIOD (Disk Mover)
  ■ Name server
  ■ Volume Manager
  ■ Volume and Drive Queue Manager
  ■ RTCOPY daemon (Tape Mover)
  ■ Tpdaemon
◆ Main characteristics of the servers:
  ■ Distributed
  ■ Critical servers are replicated
  ■ Use the CASTOR Database (Cdb) or Open Source databases (MySQL)

22 Main Components
◆ Distributed components
  ■ Remote File I/O (RFIO)
  ■ CASTOR Name Server (Cns)
  ■ Stager
  ■ Tape Mover (RTCOPY)
  ■ Physical Volume Repository (Ctape)
◆ Central components
  ■ Volume Manager (VMGR)
  ■ Volume and Drive Queue Manager (VDQM)
  ■ Message Daemon

23 Stager
◆ Role: Storage Resource Manager
  ■ Disk pool manager
    ● Allocates space on disk to store files
    ● Keeps a catalog of all files residing in the pools
    ● Makes space in the pools to store new files (garbage collector)
  ■ Hierarchical Resource Manager
    ● Migrates files according to file class and disk pool policies
    ● Recalls files
  ■ Tape Stager (deprecated)
    ● Caches tape files on disk

24 File Classes
◆ Associated with each file or directory
◆ Inherited from the parent directory, but can be changed (at sub-directory level)
◆ Describe how the file is managed on disk, migrated, and purged
◆ File class attributes are (see the sketch below):
  ■ Ownership
  ■ Migration time interval
  ■ Minimum time before migration
  ■ Number of copies
  ■ Retention period on disk
  ■ Number of parallel streams (number of drives)
  ■ Tape pools
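
The attribute list above maps naturally onto a record; the sketch below mirrors it one field per attribute. Names and types are illustrative assumptions, not CASTOR's actual definition.

```c
#include <sys/types.h>

/* Illustrative file class record; not CASTOR's actual definition. */
struct file_class {
    uid_t  uid;                  /* ownership */
    gid_t  gid;
    int    migr_interval;        /* migration time interval (seconds) */
    int    min_time_before_migr; /* minimum time before migration (seconds) */
    int    nb_copies;            /* number of tape copies to make */
    int    retention_period;     /* retention period on disk (seconds) */
    int    nb_streams;           /* parallel streams = number of drives */
    char **tape_pools;           /* tape pools this class may write to */
    int    nb_tape_pools;
};
```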

25 Migration Policies
◆ The migration policy depends on:
  ■ File class
  ■ Disk pool
◆ Start migration when:
  ■ The amount of data ready to be migrated exceeds a given threshold
  ■ The percentage of free space falls below a given threshold
  ■ A time interval has elapsed
  ■ Migration can also be forced
◆ Stop migration when:
  ■ The data that was ready at migration start time has been migrated
◆ Algorithm (sketched below):
  ■ The least recently accessed files are migrated first
  ■ A maximum number of tape drives (parallel streams) can be set
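
The start conditions listed above translate directly into a simple test; the sketch below encodes them. The struct fields and units are illustrative assumptions.

```c
#include <time.h>

/* Illustrative pool state and policy; names and units are assumptions. */
struct pool_state {
    long   bytes_ready;     /* data currently queued for migration */
    double free_space_pct;  /* free space in the pool, 0..100 */
    time_t last_migration;  /* when migration last ran */
};

struct migr_policy {
    long   ready_threshold; /* start when this much data is ready */
    double min_free_pct;    /* ...or free space drops below this */
    int    interval;        /* ...or this many seconds have elapsed */
};

int should_start_migration(const struct pool_state *s,
                           const struct migr_policy *p,
                           time_t now, int forced)
{
    if (forced)                                 return 1; /* forced migration */
    if (s->bytes_ready >= p->ready_threshold)   return 1; /* enough data ready */
    if (s->free_space_pct < p->min_free_pct)    return 1; /* pool nearly full */
    if (now - s->last_migration >= p->interval) return 1; /* interval elapsed */
    return 0;
}
```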

26 Physical Volume Repository (Ctape)
◆ Dynamic configuration of tape drives
◆ Reservation of resources
◆ Drive allocation (when not using VDQM)
◆ Tape volume mounting and positioning
◆ Automatic label checking
◆ User-callable routines to write labels
◆ Drive status display
◆ Operator interface
◆ VMGR and VDQM interfaces
◆ Hardware supported:
  ■ Drives: DLT, LTO, IBM 3590, STK 9840, STK 9940
  ■ Robots: ADIC Scalar, IBM 3494, IBM 3584, Odetics, Sony DMS24, STK

27 Volume Manager (VMGR)
◆ Handles pools of tapes:
  ■ private to an experiment
  ■ public pool
  ■ supply pool
◆ Features (see the sketch below):
  ■ Determines the most appropriate tapes for storing files in a given tape pool, according to file size
  ■ Minimizes the number of tape volumes needed for a given file
◆ Tape volumes are administered by the Computer Center; they are neither owned nor managed by users
◆ There is one single Volume Manager
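
One plausible reading of the tape-selection feature is a best-fit search: among the tapes of the requested pool that can still hold the whole file, pick the one with the least free space, so files span as few volumes as possible. The sketch below encodes that reading; the struct and selection rule are assumptions, not VMGR's actual algorithm.

```c
#include <string.h>

/* Illustrative tape volume record; an assumption, not VMGR's format. */
struct tape_volume {
    char vid[7];     /* volume identifier */
    char pool[16];   /* tape pool this volume belongs to */
    long free_bytes; /* estimated remaining capacity */
};

const struct tape_volume *
pick_tape(const struct tape_volume *tapes, int n,
          const char *pool, long file_size)
{
    const struct tape_volume *best = NULL;
    for (int i = 0; i < n; i++) {
        if (strcmp(tapes[i].pool, pool) != 0) continue; /* wrong pool */
        if (tapes[i].free_bytes < file_size)  continue; /* file would not fit */
        if (best == NULL || tapes[i].free_bytes < best->free_bytes)
            best = &tapes[i];  /* tightest fit that still holds the file */
    }
    return best;  /* NULL means the file must span several volumes */
}
```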

28 Volume and Drive Queue Manager (VDQM)
◆ VDQM maintains a global queue of tape requests per device group
◆ VDQM maintains a global table of all tape drives
  ■ Provides tape server load balancing
  ■ Optimizes the number of tape mounts
◆ Tape requests are assigned a priority (see the sketch below):
  ■ Requests are queued in priority order
  ■ Requests with the same priority are queued in time order
◆ Drives may be dedicated
◆ Easy to add functionality such as:
  ■ Drive quotas
  ■ Fair-share scheduler (a prototype exists)
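
The two ordering rules translate into a comparison function: higher priority first, earlier arrival first within a priority level. Here is a minimal sketch suitable for qsort(); the request struct is an assumption, not VDQM's format.

```c
#include <stdlib.h>
#include <time.h>

/* Illustrative tape request record; an assumption, not VDQM's format. */
struct tape_request {
    int    priority;  /* larger value = more urgent */
    time_t queued_at; /* when the request entered the queue */
};

static int cmp_requests(const void *a, const void *b)
{
    const struct tape_request *ra = a, *rb = b;
    if (ra->priority != rb->priority)
        return rb->priority - ra->priority;  /* higher priority first */
    return (ra->queued_at > rb->queued_at)
         - (ra->queued_at < rb->queued_at);  /* then first come, first served */
}

/* usage: qsort(queue, n, sizeof queue[0], cmp_requests); */
```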

29 User Interface
◆ Command line
  ■ Name server commands: nsls, nsmkdir, nsrm, nstouch, nschmod, nsenterclass
  ■ RFIO commands: rfdir, rfcp, rfcat, rfchmod, rfrm, rfrename
◆ Application Programming Interface (API)
  ■ #include "rfio.h"
  ■ Link with the shift library (-lshift) when compiling
  ■ Two forms of routine names:
    ● obtained by prepending rfio_ to standard POSIX system calls, such as rfio_open, rfio_read, rfio_write, rfio_lseek, rfio_close, etc.
    ● The function prototypes are unchanged; the name translation is done automatically by including the header file "rfio.h" (see the sketch below)
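
Here is a minimal write-side companion to the read sketch on slide 15, again with a made-up CASTOR path; the compile line in the comment assumes the library setup described above.

```c
/* assumed compile line: cc demo.c -lshift -o demo */
#include <fcntl.h>
#include "rfio.h"

int main(void)
{
    const char msg[] = "hello castor\n";

    /* hypothetical CASTOR path, for illustration only */
    int fd = rfio_open("/castor/ihep.ac.cn/user/c/cheng/hello.txt",
                       O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return 1;
    rfio_write(fd, (char *)msg, sizeof msg - 1); /* write without the NUL */
    rfio_close(fd);
    return 0;
}
```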

30 Test Environment
◆ Hardware
  ■ Servers: Dell 6400, Dell 4400, Dell 2400, Dell GX110
  ■ Disk array, DAS disk
  ■ Tape library: ADIC Scalar 100 (2 HP LTO drives, 12+60 slots)
◆ Software
  ■ Operating system: Red Hat 7.2
  ■ Storage management software: CERN/CASTOR
  ■ Distributed file systems: NFS, AFS
  ■ Job scheduling system: PBS
  ■ Database: MySQL

32 Future Storage Environment

33 Conclusion
◆ Handles large amounts of data in a fully distributed environment
◆ Maps all storage resources to a single file space
◆ Users access files in that space through the command line or the API
◆ Users only need to remember file names; they do not need to know where their files are placed or whether the storage capacity is sufficient

34 Thanks!!

35 Storage Hierarchy

36 3-Tier Storage Infrastructure
[Diagram: Tier 1 primary storage (servers; very fast, $$$/MB); Tier 2 secondary storage (filers and "disk-to-disk" appliances on a storage network, LAN, or WAN; fast, $$/MB; backup/restore); Tier 3 tertiary storage (tape and optical libraries; slow, $/MB; archival/HSM); heterogeneous storage. Followed by a table of parameters of prevalent tapes.]

