StoRM + Lustre Proposal. YAN Tian, on behalf of the Distributed Computing Group. 2014.12.10

INTRODUCTION TO STORM: 1. Architecture 2. Security (X509) 3. User Access Management 4. Server Scalability

StoRM Architecture Overview
Simple architecture:
– the frontend (FE) handles authorization and SRM requests
– the database (DB) stores asynchronous SRM request information
– the backend (BE) executes synchronous/asynchronous requests and binds to the underlying file system
StoRM acts as a front end to the storage at a site.

StoRM Security
StoRM relies on the user's credentials for authentication and authorization. It supports VOMS extensions and can use them to define access policies (it is fully VOMS-aware).

User Access Management
StoRM manages access to a file in several steps (see the sketch below):
1. The user makes a request with his proxy.
2. StoRM checks whether the user may perform the requested operation on the requested resource.
3. StoRM asks the LCMAPS service for the user mapping.
4. StoRM enforces a real ACL on the requested files and directories.
5. Jobs running on behalf of the user can then access the data directly.
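
As an illustration of steps 3 to 5, the commands below show the kind of POSIX ACL the StoRM backend ends up setting once LCMAPS has mapped the grid user to a local account (besdirac in our setup). The file path is hypothetical; this is only a sketch of the mechanism, not StoRM's actual code.

```bash
# Illustrative only: the kind of ACL StoRM enforces after user mapping.
# The path is a made-up example; 'besdirac' is the locally mapped account.
FILE=/cefs/dirac/bes/user/y/yant/sample.dst

# grant the mapped account read/write access on the requested file
setfacl -m u:besdirac:rw- "$FILE"

# inspect the ACL that jobs running as besdirac will rely on
getfacl "$FILE"
```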

Scalability: single host vs. clustered deployment.

STORM + LUSTRE PERFORMANCE TEST: 1. Test Bed 2. SE Transfer Out Test (besdirac's dir. read) 3. Job Write to Lustre Test (besdirac's dir. write) 4. SE Transfer In Test (besdirac's dir. write) 5. DST Data Transfer between SEs (other users' dir. read; still to be done) 6. Multi-VO Support Test

Test Bed
– Single server without a data disk
– 10 Gbps network
– /cefs mounted with ACL enabled
– A subdirectory of /cefs (owned by account besdirac) is bind-mounted as the StoRM public directory (see the sketch below)
Hardware: Model: Dell PowerEdge R620; CPU: Xeon E5 v2 (8 cores); Memory: 64 GB; HDD: 300 GB SAS RAID-1; Network: 10 Gbps
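
A minimal sketch of how the test bed exposes Lustre to StoRM, assuming illustrative path names (/cefs/dirac for the besdirac-owned subdirectory, /storage/besdirac for the StoRM storage area); only the bind-mounted subtree becomes visible on the grid side.

```bash
# Sketch of the test-bed layout; the paths are assumptions, not the exact ones used.

# /cefs is already mounted with ACL support; quick sanity check from the SE node:
touch /cefs/dirac/.acl_test
setfacl -m u:besdirac:rw- /cefs/dirac/.acl_test && getfacl /cefs/dirac/.acl_test

# Expose only the besdirac-owned subdirectory of /cefs as the StoRM storage area:
mkdir -p /storage/besdirac
mount --bind /cefs/dirac /storage/besdirac
```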

SE Transfer Out Test
Test procedure (illustrated by the sketch below):
1. prepare 2,000 files of 1 GB each, located in /cefs
2. register their metadata into the DIRAC File Catalog (DFC)
3. transfer the dataset to the remote SE at WHU
Test results:
1. DFC registration takes 70 seconds, i.e., 35 seconds per 1k files
2. the average transfer speed is 80.9 MB/s, the peak speed is 91.9 MB/s
3. the one-time success rate is 100%
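
For reference, the replication step could be driven with standard DIRAC data-management commands roughly as follows; the LFN path and the proxy group are made up, while the SE names (IHEP-STORM, WHU-USER) are the ones used in the test.

```bash
# Hedged sketch of step 3 with generic DIRAC DMS commands.
dirac-proxy-init -g bes_user    # obtain a DIRAC proxy (group name is illustrative)

# replicate one already-registered file from IHEP-STORM to the WHU SE
dirac-dms-replicate-lfn /bes/user/y/yant/testdata/file_0001.dat WHU-USER

# confirm that both SEs now hold a replica
dirac-dms-lfn-replicas /bes/user/y/yant/testdata/file_0001.dat
```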

(Plot) IHEP-STORM → WHU-USER transfer: average 80.9 MB/s, peak 91.9 MB/s; 2 TB of data transferred in 7 hours.

(Plot) Transfer of the 2,000 files of 1 GB each: 100% success.

Job Write to Lustre Test
Facts about the test jobs:
– 200 M events in total, Bhabha simulation + reconstruction
– split by run, max 20k events/job
– 10,929 jobs submitted in total
– 10,282 jobs done (94.1%)
– reasons for failed jobs: 353 stalled (USTC unstable power supply), 275 overload (UMN node error), 6 application failures, 13 network failures
– 1.4 TB of data generated and uploaded to StoRM+Lustre (IHEP-STORM)
Test results:
1. no job failed because of an error uploading output data
2. 1.4 TB of output data written to the test SE with a high success rate
3. the output is immediately visible on Lustre

(Plot) More than 10k jobs completed over 3 days.

(Plot) 94.1% job success rate; no job failed while uploading output data.

(Plot) About 1.4 TB of output data written to StoRM+Lustre.

(Plot) Data uploaded with good quality.

(Plot) Output data can be seen on Lustre: data written by jobs appears there immediately.

SE Transfer In Test
Facts:
– transfer from the UMN SE to /cefs/tmp_storage/yant/transfer/DsHi
– 2.3 TB of MC samples (dst, rtraw, logs, scripts)
– files registered into the DFC in 12 m 50 s (48 s per 1k files)
– speed: 20 to 30 MB/s
– quality: > 99%


Multi-VO Test
Currently supported VOs: bes, cepc, juno. Each VO's users can read/write their own root directory; users from one VO cannot access another VO's files.
A test was performed (see the sketch below):
1. initialize a proxy as a cepc VO user
2. check whether the bes VO's directory is accessible
3. check whether the cepc VO's directory is accessible
4. srmcp test (read/write)
Test results:
1. the cepc user cannot visit the bes VO's directory
2. the cepc user can read/write the cepc VO's own directory
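
The access test can be reproduced with standard VOMS and SRM client commands along these lines; the SRM endpoint host, port, and storage paths below are illustrative, not the real service address.

```bash
# Hedged sketch of the multi-VO access test (endpoint and paths are examples).
voms-proxy-init --voms cepc                    # step 1: proxy as a cepc VO user

SRM='srm://storm.ihep.ac.cn:8444/srm/managerv2?SFN'

srmls "$SRM=/bes"                              # step 2: expected to be denied
srmls "$SRM=/cepc"                             # step 3: expected to succeed

# step 4: read/write test inside the cepc storage area (copies a local file out and back)
srmcp file:////tmp/hello.txt "$SRM=/cepc/user/test/hello.txt"
srmcp "$SRM=/cepc/user/test/hello.txt" file:////tmp/hello_back.txt
```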

(Screenshot) Registered as a cepc user: access to the bes VO's storage area fails; read/write in the cepc VO's own storage area succeeds.

SUMMARY AND DISCUSSION

Test Summary
– With ACL enabled on /cefs, read/write in besdirac's directory works
– Reading other users' data needs more debugging and testing
– The read speed (80 MB/s) is acceptable
– The write speed (20 to 30 MB/s) needs more testing
– Multi-VO support is working

Comparison of StoRM and dCache
– The StoRM solution is easier to install and maintain; no extra development is required
– The StoRM solution could be more efficient, since it needs neither registering Lustre metadata in advance nor moving data
– StoRM is a promising solution, and we will do more tests before making a final decision

Lustre Data Security
– The StoRM SE server acts like an lxslc5 login node; the Lustre file systems are mounted on it
– We use mount --bind to remount a subdirectory of Lustre as the StoRM public directory, so only this subdirectory is visible to grid users (via low-level SRM commands)
– Currently, in StoRM, all grid users are mapped to the AFS account 'besdirac', and reads/writes on Lustre are executed as user 'besdirac'. So only besdirac's directory can be modified; other users' data on Lustre is safe
In the production scenario:
– input/output data of DIRAC jobs will be located in one Lustre user's directory (i.e. besdirac); inside it, we create a subdirectory for each grid user
– when we need to transfer DSTs from IHEP to a remote site, the DST directory is mounted temporarily and read-only (see the sketch below)
– when transferring DSTs from a remote site back to IHEP, the data will be written into besdirac's directory
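
The temporary read-only export of a DST directory mentioned above could be done with a bind mount that is then remounted read-only, along the lines of this sketch (all paths are illustrative):

```bash
# Hedged sketch: temporarily expose a physics user's DST directory, read-only.
DST=/besfs/offline/data/dst/round05        # DST directory on Lustre (example path)
EXPORT=/storage/besdirac/dst_round05       # location visible through StoRM (example path)

mkdir -p "$EXPORT"
mount --bind "$DST" "$EXPORT"              # make the DST directory visible to StoRM
mount -o remount,ro,bind "$EXPORT"         # remount it read-only so grid access cannot modify it

# when the transfer campaign is finished:
umount "$EXPORT"
```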

Production Solution 1
– enable ACL and user_xattr on the production Lustre file systems (/besfs, /besfs2, /bes3fs, /junofs, etc.)
– create a directory for user besdirac on each Lustre, with several to dozens of TB of quota (depending on the physics users' requirements); see the sketch below
– disadvantage: the production Lustre systems are busy and cannot be shut down to enable ACLs
– a way out: this can be done during maintenance windows
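
Creating the besdirac area with a quota could look roughly like this; the mount point, group, and quota figures are placeholders (older lfs versions may require the limits in KB rather than with a T suffix):

```bash
# Hedged sketch: prepare a besdirac directory with a block quota on one production Lustre.
mkdir -p /besfs/dirac
chown besdirac:besdirac /besfs/dirac       # owner and group are placeholders

# e.g. a 20 TB soft / 22 TB hard block quota for user besdirac on /besfs
lfs setquota -u besdirac -b 20T -B 22T /besfs
```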

Production Solution 2
– prepare a separate Lustre file system, e.g. /diracfs, or convert the current IHEPD-USER 88 TB disk pool (or even the 126 TB of data disks) to Lustre
– advantage: the production Lustre systems are unaffected
– disadvantages:
  – the StoRM + (production) Lustre solution is abandoned
  – it is hard to enlarge /diracfs to the PB level