
Università degli Studi di Pisa. EGEE is a project funded by the European Union under contract IST-2003-508833. Storage Management in the LHC Computing Grid.




1 Università degli Studi di Pisa. EGEE is a project funded by the European Union under contract IST-2003-508833. Storage Management in the LHC Computing Grid. Flavia Donno, PhD candidate in Computer Engineering, Universität Wien and University of Pisa. Forschungsprivatissimum # 415040, 27 June 2005.

2 Outline
- The Grid: what is it? Why the Grid at CERN?
- The LHC Computing Grid
- The LCG Architecture
- The Storage Element
- Hardware/Software solutions on the LAN
- Parallel Filesystems
- The SRM Protocol
- StoRM: A Storage Resource Manager for Filesystems
- The StoRM Architecture
- StoRM as Policy Enforcement Point (PEP) for Storage
- Status of the StoRM Project
- Conclusions

3 The Grid: what is it? Many definitions:
- It is an aggregation of geographically dispersed computing, storage, and network resources, coordinated to deliver improved performance, higher quality of service, better utilization, and easier access to data.
- It enables virtual, collaborative organizations sharing applications and data in an open, heterogeneous environment.
Researchers perform their activities regardless of geographical location, interact with colleagues, and share and access data. Scientific instruments and experiments produce huge amounts of data. The Grid: networked data processing centres, with "middleware" software as the "glue" binding the resources together.

4 Compute and Data Grids
- A compute grid is essentially a collection of distributed computing resources, within or across locations, aggregated to act as a unified processing resource or virtual supercomputer. Collecting these resources into a unified pool involves coordinated usage policies, job scheduling and queuing characteristics, grid-wide security, and user authentication.
- A data grid provides wide-area, secure access to current data. Data grids enable users and applications to manage and efficiently use database information from distributed locations. Much like compute grids, data grids rely on software for secure access and usage policies. They can be deployed within one administrative domain or across multiple domains.

5 The Grid: clusters, intra-grids, extra-grids

6 Why the Grid?
- Scale of the problems: frontier research in many different fields today requires world-wide collaborations (i.e. multi-domain access to distributed resources).
- Grids provide access to large data-processing power and huge data-storage capacity; as a grid grows, its usefulness increases (more resources become available).
- Large communities of possible Grid users: high energy physics; environmental studies (earthquake forecasting, geologic and climate changes, ozone monitoring); biology, genetics, Earth observation; astrophysics; new composite materials research; astronautics; etc.

7 Why the Grid @ CERN? CMS, ATLAS, LHCb: ~10 PetaBytes/year, ~10^8 events/year, ~10^3 batch and interactive users.

8 Why the Grid @ CERN? High-throughput computing (based on reliable "commodity" technology): more than 3,000 (dual-processor) PCs with Linux, more than 3 petabytes of data (on disk and tape). Nowhere near enough!

9 Why the Grid @ CERN? Problem: CERN alone can provide only a fraction of the necessary resources. Solution: computing centres, which were isolated in the past, should now be connected, uniting the computing resources of particle physicists around the world. Europe: 267 institutes, 4603 users. Elsewhere: 208 institutes, 1632 users.

10 The Grid Projects at CERN: LCG. The LCG (LHC Computing Grid) project started in 2002. Its goal is to build a world-wide computing infrastructure based on Grid middleware to offer a computing platform for the LHC experiments. http://www.cern.ch/lcg More than 23,000 HEP jobs running concurrently in a day.

11 The LCG Architecture
- It is currently based on the Globus Toolkit version 3 (not Web Service Resource Framework (WSRF) based).
- Features: single sign-on (Grid Security Infrastructure), delegation, remote submission (Globus Resource Allocation Manager), GridFTP, Monitoring and Discovery Service (MDS).
- Other projects contributing to the LCG middleware: European DataGrid, DataTAG, PPDG, GriPhyN, OSG, Condor.
- Services: Resource Broker, Virtual Organization Management Service, Data Management Service, Data Catalogues, Fabric Management and Configuration, Monitoring and Control, Storage Management Solutions.
(Photo on the slide: Ian Foster, Carl Kesselman, Steve Tuecke, the Globus Toolkit authors.)

12 The Middleware components (diagram). The job flow: from the UI a job goes to the Network Server on the RB node; the Workload Manager, with the Match-Maker/Broker and the Job Controller (CondorG), matches it against Computing Element and Storage Element characteristics and status published in the Information Service, consulting the Replica Location Service and VOMS; the job runs on a Computing Element, performing "Grid enabled" data transfers/accesses against Storage Elements, and the result is returned to the UI.
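The matchmaking step in this flow can be made concrete with a tiny sketch. The following Python fragment is purely illustrative: the class and function names are invented (they are not part of the LCG middleware), and the ranking rule is only a guess at the general idea, namely that the broker prefers Computing Elements with free slots whose close Storage Element already holds the input data.

```python
from dataclasses import dataclass, field

@dataclass
class StorageElement:
    name: str
    files: set = field(default_factory=set)   # logical file names held at this SE

@dataclass
class ComputingElement:
    name: str
    free_slots: int
    close_se: str                              # name of the "close" Storage Element

def match(job_input_file, ces, ses):
    """Toy match-maker: prefer a CE whose close SE already holds the input file."""
    holders = {se.name for se in ses if job_input_file in se.files}
    candidates = [ce for ce in ces if ce.free_slots > 0]
    # Rank CEs: data-local ones first, then by number of free slots.
    candidates.sort(key=lambda ce: (ce.close_se not in holders, -ce.free_slots))
    return candidates[0] if candidates else None

ses = [StorageElement("se.site-a", {"lfn:run123.dat"}), StorageElement("se.site-b")]
ces = [ComputingElement("ce.site-a", 4, "se.site-a"),
       ComputingElement("ce.site-b", 50, "se.site-b")]
print(match("lfn:run123.dat", ces, ses).name)   # -> ce.site-a (data locality wins)
```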

13 The Storage Element (diagram). Data movement and replication between Site A and Site B via GridFTP; data storage and access at each site; registration of data in the Globus Replica Catalog (RLS = Replica Location Service).
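The replication step in this picture combines a GridFTP transfer with a catalogue update. The sketch below is a minimal illustration of that two-step flow only; GridFtpClient and ReplicaCatalog are hypothetical stand-ins written for this transcript, not the actual Globus or RLS APIs, and the URLs are made up.

```python
class GridFtpClient:
    """Hypothetical stand-in for a third-party GridFTP transfer client."""
    def third_party_copy(self, src_url, dst_url):
        print(f"copy {src_url} -> {dst_url}")

class ReplicaCatalog:
    """Hypothetical stand-in for the Replica Location Service (RLS)."""
    def __init__(self):
        self.mappings = {}          # logical file name -> set of physical replicas
    def register(self, lfn, pfn):
        self.mappings.setdefault(lfn, set()).add(pfn)

def replicate(lfn, src_pfn, dst_pfn, ftp, catalog):
    # 1. Move the data between Storage Elements (third-party GridFTP transfer).
    ftp.third_party_copy(src_pfn, dst_pfn)
    # 2. Register the new physical replica under the same logical name.
    catalog.register(lfn, dst_pfn)

catalog = ReplicaCatalog()
replicate("lfn:run123.dat",
          "gsiftp://se.site-a/data/run123.dat",
          "gsiftp://se.site-b/data/run123.dat",
          GridFtpClient(), catalog)
print(catalog.mappings)
```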

14 Requirements for Storage. Users on the Grid share resources and access them concurrently. This requires:
- Transparent access to files (migration to/from disk pools, other site storage, Mass Storage Systems)
- File pinning
- File locking
- Space reservation and management
- File status notification
- Lifetime management
- Security
- Privacy
- Local policy enforcement
- High I/O performance
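To make this requirement list concrete, here is a minimal sketch of what a storage interface covering some of these points could look like. It is a hypothetical interface written for this transcript, not an existing API.

```python
from abc import ABC, abstractmethod

class GridStorage(ABC):
    """Hypothetical interface collecting the requirements listed above;
    purely illustrative, not an existing API."""

    @abstractmethod
    def reserve_space(self, bytes_needed: int, lifetime_s: int) -> str:
        """Reserve space and return a space token (space reservation/management)."""

    @abstractmethod
    def pin(self, file_name: str, lifetime_s: int) -> None:
        """Keep the file on disk (no migration back to tape) for the given lifetime."""

    @abstractmethod
    def lock(self, file_name: str) -> None:
        """Prevent concurrent modification by other Grid users (file locking)."""

    @abstractmethod
    def status(self, file_name: str) -> str:
        """Report whether the file is online, near-line, or being staged (status notification)."""
```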

15 HW/SW solutions on the LAN
- LCG has a hierarchical structure: CERN is the Tier-0 centre where data are collected; Tier-1 centres need to be able to serve ~petabytes of data; Tier-2 are smaller centres that give users access to ~100 terabytes of data; Tier-3 are small university sites.
- Mass Storage Systems (MSS) are normally hosted at Tier-0 and Tier-1 centres. Through robotic tape systems and home-developed solutions, data are transparently spooled from tape to disk servers and made available to users (CASTOR, ENSTORE, HPSS, JASMine, UniTree HSM, ...). Protocols for file access are normally "proprietary": rfio, dcap, ftp, ...
- Disk pool servers are based on low-cost parallel or serial ATA disks; they can operate at the block or file level and aggregate RAID (Redundant Array of Independent Disks) controllers and capacity. The arrays load-balance among self-contained storage modules so that performance can grow linearly (CASTOR disk pool, dCache, LCG DPM, SRB, SAM, ...). Access to files is provided via POSIX-like calls. Management is quite hard.

16 HW/SW solutions on the LAN
- A Storage Area Network (SAN) is a high-speed special-purpose network (or sub-network) that interconnects different kinds of data storage devices with associated data servers. SANs use Fibre Channel over high-speed fibre-optic or copper cabling and can reach data transfer rates of up to 200 MB/s. SANs support disk mirroring; backup and restore; archival and retrieval of archived data; data migration from one storage device to another; and the sharing of data among different servers in a network. SAN solutions operate at the block level.
- Network Attached Storage (NAS) is a product concept that packages file-system hardware and software with a complete storage I/O subsystem as an integrated file-server solution. NAS servers are normally specialized servers that can handle a number of network protocols, including Microsoft's NetBEUI and CIFS, Novell's NetWare IPX (Internetwork Packet Exchange), and Sun Microsystems' NFS. NAS systems provide dynamic load balancing, dynamic volume and file-system expansion, and a single, global namespace. NAS systems can deliver performance of tens of Gigabytes/sec in a standard sequential read/write test.

17 HW/SW solutions on the LAN
- Grid Storage refers to a topology for scaling the capacity of NAS in response to application requirements, and a technology for enabling and managing a single file system so that it can span an increasing volume of storage.
- NAS heads are the components containing a thin operating system optimized for NFS (or proprietary) protocol support and storage-device attachment. NAS heads are joined together using clustering technology to create one virtual head.
- The Distributed Storage Tank (DST) project by IBM aims, within the Global Grid Forum, to produce a standards-based Lightweight Directory Access Protocol (LDAP) server to act as the master namespace server.

18 Distributed and Parallel File Systems. Cluster and distributed file systems are an alternative form of shared file-system technology. They do not use a separate metadata server, are designed to work only in homogeneous server environments, and improving storage manageability is not a goal. Using very high-speed interconnects (switched Gigabit Ethernet, InfiniBand, etc.), such solutions provide POSIX I/O, centralized management, load balancing, monitoring, and fail-over capabilities.

19 Distributed and Parallel File Systems: IBM GPFS, Lustre, and PVFS2.
- Capacity: large files (10-50 GB), 100 TB file systems
- High throughput: wide striping, large blocks, throughputs of many GB/s
- Reliability and fault tolerance: node and disk failures
- Online centralized system management: dynamic configuration and monitoring
- Parallel data and metadata access: shared disks and distributed locking
- Space allocation at the file level
- Quota, metadata and file-lifetime management
- Access Control Lists (ACLs)
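Because these file systems present a POSIX mount to the application, ordinary file I/O is all that is needed. The sketch below assumes a made-up mount point (/gpfs/storage) and shows only standard POSIX calls (statvfs for free space, write plus fsync for data safety); nothing in it is specific to GPFS, Lustre or PVFS2.

```python
import os

MOUNT = "/gpfs/storage"          # assumed example mount point of the parallel FS

def free_space_gb(path=MOUNT):
    """Query free capacity with a standard POSIX statvfs call."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize / 1e9

def write_block(path, data: bytes):
    """Ordinary POSIX write: the parallel FS stripes the data across storage
    servers transparently; the application sees nothing special."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())     # make sure the data reached the storage servers

if __name__ == "__main__":
    print(f"{free_space_gb('/'):.1f} GB free")   # use '/' so the demo runs anywhere
```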

20 The SRM Protocol: Storage Resource Manager (SRM). Storage resource managers are middleware components that manage shared storage resources on the Grid and provide management functionality such as:
- Uniform access to heterogeneous types of storage
- File pinning
- Disk space allocation and advance disk-space reservation
- Protocol negotiation
- Lifetime management of files
- Management of security

21 The SRM Protocol: SRM functionality. The SRM interface specification describes:
- Space management functions: space reservation, dynamic space management
- Permission functions: permission setting over storage resources
- Data transfer functions: protocol negotiation, pinning of files, file lifetime management
- Status functions: status of asynchronous requests
Missing SRM functionality:
- File locking
- Quota management
- Local policy enforcement
- Security/privacy: not fully defined

22 SRM interface: methods definition
- Space management functions: srmReserveSpace, srmReleaseSpace, srmUpdateSpace, srmCompactSpace, srmGetSpaceMetaData, srmChangeFileStorageType, srmGetSpaceToken
- Permission functions: srmSetPermission, srmReassignToUser, srmCheckPermission
- Directory functions: srmMkdir, srmRmdir, srmRm, srmLs
- Data transfer functions: srmPrepareToGet, srmPrepareToPut, srmCopy, srmRemoveFiles, srmReleaseFiles, srmPutDone, srmAbortRequest, srmAbortFiles, srmSuspendRequest, srmResumeRequest
- Status functions: srmStatusOfGetRequest, srmStatusOfPutRequest, srmStatusOfCopyRequest, srmGetRequestSummary, srmGetRequestID
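These methods are invoked as SOAP calls against an SRM endpoint. The following sketch only illustrates the typical asynchronous "get" sequence (prepare, poll the status, transfer via the returned TURL, release); SrmClient and its methods are hypothetical stubs standing in for the SOAP calls named above, and the endpoint and file names are made up.

```python
import time

class SrmClient:
    """Hypothetical wrapper around the SRM v2.1 SOAP interface (illustration only)."""
    def __init__(self, endpoint):
        self.endpoint = endpoint
    def prepare_to_get(self, surls):              # would issue srmPrepareToGet
        return "req-001"
    def status_of_get_request(self, request_id):  # would issue srmStatusOfGetRequest
        return {"state": "SRM_SUCCESS",
                "turls": ["gsiftp://se.example.org/data/run123.dat"]}
    def release_files(self, request_id):          # would issue srmReleaseFiles
        pass

def fetch(surl):
    srm = SrmClient("httpg://se.example.org:8443/srm/managerv2")  # made-up endpoint
    req = srm.prepare_to_get([surl])              # asynchronous: returns a request id
    while True:
        st = srm.status_of_get_request(req)       # poll until the file is staged
        if st["state"] != "SRM_REQUEST_QUEUED":
            break
        time.sleep(5)
    turl = st["turls"][0]                         # transfer URL, e.g. a gsiftp:// URL
    print("transfer the file with GridFTP from", turl)
    srm.release_files(req)                        # tell the SRM the pin can be dropped

fetch("srm://se.example.org/data/run123.dat")
```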

23 StoRM for high-performing filesystems. StoRM is a Storage Resource Manager exposing a web service interface.
- The StoRM web service description (WSDL) is compliant with the SRM specification version 2.1.1.
- It is built on top of GPFS (which provides POSIX I/O).
- It guarantees coherent access to storage for both Grid and local applications; Grid users are authenticated with VOMS certificates.
- It extends the SRM interface with quota management, locking, ACLs and policy enforcement.
- It is integrated with: the Replica Consistency service; the Workload Management Service (WMS) Agreement Provider for advance reservation of storage resources; third-party SRM service implementations (SRM compliant).
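Since StoRM sits on top of a real POSIX filesystem, Grid-level authorization decisions can be enforced with ordinary filesystem mechanisms. The fragment below is only a sketch of that idea, not StoRM code: it maps a made-up Grid (VOMS) identity to a local account and grants read access with a standard POSIX ACL via the setfacl tool.

```python
import subprocess

# Made-up example mapping from a Grid (VOMS) identity to a local Unix account.
GRID_TO_LOCAL = {
    "/C=IT/O=INFN/CN=Flavia Donno": "cms001",
}

def grant_read(grid_dn: str, path: str) -> None:
    """Grant read access to the local account mapped to the Grid user,
    using a standard POSIX ACL (setfacl -m u:<user>:r <path>)."""
    local_user = GRID_TO_LOCAL[grid_dn]
    subprocess.run(["setfacl", "-m", f"u:{local_user}:r", path], check=True)

# Example (requires the file and the setfacl tool to exist):
# grant_read("/C=IT/O=INFN/CN=Flavia Donno", "/gpfs/storage/cms/run123.dat")
```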

24 StoRM

25 StoRM: WMS with reserveSpace (diagram). The SRM associates a space token with the reservation (PUSH or PULL); the SpaceToken is passed as a job parameter; the output is written at the end of the job; the space area can be in any SE.
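The reservation flow on this slide can be summarised in a few lines. The sketch is illustrative only: reserve_space and submit_job are hypothetical helpers standing in for srmReserveSpace on the chosen Storage Element and for the WMS submission; they only show how the space token is obtained and then travels with the job as a parameter.

```python
def reserve_space(se_endpoint: str, size_bytes: int) -> str:
    """Stand-in for srmReserveSpace on the chosen Storage Element:
    returns the space token the SRM associates with the reservation."""
    return "token-42"

def submit_job(executable: str, space_token: str) -> None:
    """Stand-in for WMS submission: the token is passed as a job parameter
    so the job can write its output into the reserved space area at the end."""
    print(f"submit {executable} with SpaceToken={space_token}")

token = reserve_space("httpg://storm.example.org:8443/srm/managerv2", 50 * 10**9)
submit_job("reco.sh", token)
```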

26 StoRM: The Server Architecture

27 StoRM: The Server Architecture

28 StoRM: The Server Architecture

29 StoRM as Policy Enforcement Point

30 Status of StoRM
- Main functionality is available.
- The request manager has been stress-tested; integration tests performed.
- The database schema is now stable.
- A first demo with WS-Agreement has been carried out successfully.
- Integration with just-in-time ACL management is in progress.
- Intense collaboration with IBM on GPFS functionality, the SRM definition and the GGF File System Working Group.
- Strong interest from Grid research communities in using StoRM; it will be deployed by EGEE/LCG.

31 Conclusion
- Grid storage access and management is still an open issue: many solutions exist but none covers all needs.
- Storage needs to be well characterized in the Grid information system.
- Integration with vendor hardware/software solutions is still not accomplished.
- The Global Grid Forum is trying to establish the basis of a standard for a Grid open filesystem; competition among vendors still makes the effort hard.
- StoRM is a step forward in this direction, proposing a Grid interface to distributed and parallel filesystems. StoRM exercises the software development cycle for the proposed standard SRM Grid interface and extends it.
- StoRM is in its testing phase. It will be adopted by the EGEE/LCG Grid for the High Energy Physics, Biology and other e-science communities.

32 Forschungsprivatissimum # 415040. Storage Management in the LHC Computing Grid. Flavia Donno, PhD candidate in Computer Engineering, Universität Wien and University of Pisa. And... hope you enjoyed this lecture. Thank you for your attention!

