Distributed Computing for CEPC
YAN Tian, on behalf of the Distributed Computing Group, CC, IHEP
4th CEPC Collaboration Meeting, Sep. 2014 (Draft Version)
Outline
Introduction
Experience of BES-DIRAC Distributed Computing
– Computing model
– Computing resources list
– Official MC production
– Data transfer system
Distributed Computing for CEPC
– A test bed established
– Sharing resources with BES
– User job workflow
– Physics validation
– To do
INTRODUCTION Part I
About Distributed Computing
Distributed computing played an important role in the discovery of the Higgs boson: "Without the LHC Computing Grid, the discovery could not have occurred" (Foster). Many HEP experiments have employed distributed computing to integrate resources contributed by collaboration members, such as LHCb, Belle II, CTA, ILC, BESIII, etc. Large HEP experiments need plenty of computing resources, which may not be affordable for a single institution or university.
DIRAC: an Interware
DIRAC (Distributed Infrastructure with Remote Agent Control) is an interware for grid computing. It is powerful, flexible, and widely used as the central component of grid solutions. More info: DIRAC homepage; DIRAC GitHub.
DIRAC Users: LHCb, Belle II, CTA, ILC, etc.
LHCb: ~40,000 CPU cores
Belle II: ~12,000 CPU cores
CTA: ~5,000 CPU cores
ILC: ~3,000 CPU cores
EXPERIENCE OF BES-DIRAC DISTRIBUTED COMPUTING Part II
BES-DIRAC: Computing Model
[Diagram: raw data flows from the detector to the IHEP Data Center; through DIRAC, raw dst & randomtrg data are distributed from the central SE to remote site SEs, and MC dst produced at remote sites flows back; remote users and IHEP users access all dst.]
BES-DIRAC: Computing Resources List

 #  Contributor          CE Type          CPU Cores    SE Type   SE Capacity  Status
 1  IHEP                 Cluster + Cloud  144          dCache    214 TB       Active
 2  Univ. of CAS         Cluster          152          --        --           Active
 3  USTC                 Cluster          200 ~ 1280   dCache    24 TB        Active
 4  Peking Univ.         Cluster          100          --        --           Active
 5  Wuhan Univ.          Cluster          100 ~ 300    StoRM     39 TB        Active
 6  Univ. of Minnesota   Cluster          768          BeStMan   50 TB        Active
 7  JINR                 gLite + Cloud    100 ~ 200    dCache    8 TB         Active
 8  INFN & Torino Univ.  gLite + Cloud    264          --        20 TB        Active
 9  CERN                 Cloud            20           --        --           Active
10  Soochow Univ.        Cloud            20           --        --           Active
    Total (active)                        1868 ~ ...             ... TB
11  Shandong Univ.       Cluster          100          --        --           Preparing
12  BUAA                 Cluster          256          --        --           Preparing
13  SJTU                 Cluster          ...          --        ... TB       Preparing
    Total (all sites)                     ...                    ... TB
BES-DIRAC: Official MC Production

 #  Time  Task                         BOSS Ver.   Total Events  Jobs      Data Output
 1  ...   J/psi inclusive (round 05)   ...         ... M         32,...    ... TB
 2  ...   Psi(3770) (round 03, 04)     6.6.4.p...  ... M         69,...    ... TB
    Total                                          ... M         102,...   ... TB

Notes: 2nd batch of the 2nd production; physics validation check of the 1st production; kept ~1350 jobs running for one week; 2nd batch: Dec. 7~15.
BES-DIRAC: Simulation + Reconstruction
Simulation + reconstruction jobs are supported. Random-trigger (randomtrg) data has been distributed to remote sites with an SE. Jobs download randomtrg data from the local SE, or read it directly from an SE mounted on the worker nodes; a minimal sketch of the download path follows.
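As an illustration only: a minimal sketch of how a job script might fetch one random-trigger file through the standard DIRAC Python API, assuming a configured DIRAC client and proxy. The LFN is a hypothetical placeholder, and actual BES-DIRAC jobs may instead rely on the InputData machinery or a direct read from a mounted SE.

    # Hedged sketch: fetch one random-trigger file via the DIRAC API (hypothetical LFN).
    from DIRAC.Core.Base import Script
    Script.parseCommandLine()                    # initialize the DIRAC client environment

    from DIRAC.Interfaces.API.Dirac import Dirac

    dirac = Dirac()
    lfn = '/bes/randomtrg/round07/run_0001.raw'  # hypothetical LFN, for illustration only
    result = dirac.getFile(lfn)                  # download the file from an available replica
    if not result['OK']:
        print('download failed: %s' % result['Message'])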
BES-DIRAC: Data Transfer System
Data transferred from March to July 2014: 85.9 TB in total.

Transfer performance:
 Data           Source SE  Destination SE  Peak Speed  Average Speed
 randomtrg r04  USTC, WHU  UMN             96 MB/s     76.6 MB/s (6.6 TB/day)
 randomtrg r07  IHEP       USTC, WHU       191 MB/s    115.9 MB/s (10.0 TB/day)

Transferred datasets:
 Data Type            Data       Data Size  Source SE  Destination SE
 DST                  xyz        24.5 TB    IHEP       USTC
 DST                  psippscan  2.5 TB     IHEP       UMN
 Random trigger data  round ...  ... TB     IHEP       USTC, WHU, UMN, JINR
 Random trigger data  round ...  ... TB     IHEP       USTC, WHU, UMN
 Random trigger data  round ...  ... TB     IHEP       USTC, WHU, UMN
 Random trigger data  round ...  ... TB     IHEP       USTC, WHU, UMN
 Random trigger data  round ...  ... TB     IHEP       USTC, WHU, UMN, JINR
 Random trigger data  round ...  ... TB     IHEP       USTC, WHU

High quality (> 99% one-time success rate); high transfer speed (~1 Gbps to USTC, WHU, UMN; 300 Mbps to JINR). An illustrative replication sketch follows.
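For illustration only (the BES-DIRAC transfer system itself is a dedicated service built on top of DIRAC, not shown here): a minimal sketch of replicating a single file between SEs with the standard DIRAC Python API. The LFN and SE names are hypothetical placeholders.

    # Hedged sketch: replicate one file to an additional SE via the DIRAC API.
    from DIRAC.Core.Base import Script
    Script.parseCommandLine()

    from DIRAC.Interfaces.API.Dirac import Dirac

    dirac = Dirac()
    lfn = '/bes/randomtrg/round07/run_0001.raw'                          # hypothetical LFN
    result = dirac.replicateFile(lfn, 'WHU-USER', sourceSE='IHEP-USER')  # hypothetical SE names
    print(result)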
[Transfer monitoring plots: IHEP → USTC, WHU at 10.0 TB/day; USTC, WHU → UMN at 6.6 TB/day; one-time success rate > 99%.]
DISTRIBUTED COMPUTING FOR CEPC Part III
A Test Bed Established
[Diagram: a test bed based on the BES-DIRAC servers. Sites: IHEP PBS site (OS: SL 5.5), IHEP-OpenStack site, remote BUAA site (OS: SL 5.8), remote WHU site (OS: SL 6.4). Storage: IHEP Lustre and WHU SE. Supporting services: IHEP local resources, IHEP DB with a DB mirror, and a CVMFS server. Job flow: *.stdhep input data in, *.slcio output data out.]
Sharing Resources with BES
Which resources can be shared?
– Central DIRAC servers & maintainers (we hope the CEPC collaboration can contribute manpower).
– Computing & storage resources contributed by sites that wish to support both BES and CEPC, such as IHEP, WHU, BUAA, Soochow Univ., etc.
Multi-VO (Virtual Organization) support technology is under development:
– It is a grid framework for managing resources for multiple collaborations.
– The VOMS server has been configured and tested, and is now ready to use.
– The multi-VO workload management system is under testing.
– A StoRM SE with multi-VO support is under development.
User Job Workflow
Submit a user job step by step (a minimal sketch follows this list):
(1) upload input data to the SE
(2) prepare a JDL file: job.jdl
(3) prepare job.sh
(4) submit the job to DIRAC
(5) monitor the job status in the web portal
(6) download the output data to Lustre
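As an illustration only: a minimal sketch of the same workflow driven from the DIRAC Python API instead of a hand-written job.jdl, assuming a configured DIRAC client and proxy. The LFNs, SE name ('WHU-USER'), job name and output file are hypothetical placeholders, not CEPC defaults.

    # Hedged sketch: upload input, submit a job, then check status / retrieve output.
    from DIRAC.Core.Base import Script
    Script.parseCommandLine()

    from DIRAC.Interfaces.API.Dirac import Dirac
    from DIRAC.Interfaces.API.Job import Job

    dirac = Dirac()

    # (1) upload input data to an SE (hypothetical LFN and SE name)
    dirac.addFile('/cepc/user/t/tian/test/input.stdhep', 'input.stdhep', 'WHU-USER')

    # (2)-(4) describe the job (the API equivalent of job.jdl + job.sh) and submit it
    job = Job()
    job.setName('cepc_sim_reco_test')
    job.setExecutable('job.sh')                  # wrapper that runs simulation/reconstruction
    job.setInputData(['/cepc/user/t/tian/test/input.stdhep'])
    job.setOutputData(['output.slcio'], outputSE='WHU-USER')
    result = dirac.submitJob(job)
    print(result)

    # (5)-(6) status and output can also be handled from the API (or the web portal)
    # dirac.getJobStatus(result['Value'])
    # dirac.getJobOutputData(result['Value'])

The same steps can of course be carried out with the DIRAC command-line tools and a hand-written job.jdl, as listed in the slide above.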
Physics Validation Check
Ongoing; will be finished before Sep. 10.
To Do List
– Add and test new sites.
– Deploy a remote mirror of the MySQL database.
– Develop a frontend module for massive job splitting, submission, monitoring & data management (a possible sketch of the splitting idea follows this list).
– Refine multi-VO support to manage the BES & CEPC shared resources.
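Purely as a sketch of the planned splitting frontend (not an existing module): one simple approach is to loop over a list of input LFNs and submit one DIRAC job per chunk. The function name, chunk size and LFNs below are hypothetical.

    # Hedged sketch: bulk submission by splitting an input list into per-job chunks.
    from DIRAC.Core.Base import Script
    Script.parseCommandLine()

    from DIRAC.Interfaces.API.Dirac import Dirac
    from DIRAC.Interfaces.API.Job import Job

    def submit_in_chunks(lfns, chunk_size=5):
        """Submit one job per chunk of input LFNs; return the list of job IDs."""
        dirac = Dirac()
        job_ids = []
        for i in range(0, len(lfns), chunk_size):
            chunk = lfns[i:i + chunk_size]
            job = Job()
            job.setName('cepc_batch_%03d' % (i // chunk_size))
            job.setExecutable('job.sh')          # same wrapper script as for a single job
            job.setInputData(chunk)
            result = dirac.submitJob(job)
            if result['OK']:
                job_ids.append(result['Value'])
        return job_ids

    # Example (hypothetical LFNs):
    # submit_in_chunks(['/cepc/mc/sample_%04d.stdhep' % n for n in range(100)])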
Thanks
Thank you for your attention! Q & A
For further questions and cooperation, please contact ZHANG Xiaomei and YAN Tian.