Site Report from KEK, Japan JP-KEK-CRC-01 and JP-KEK-CRC-02 Go Iwai, KEK/CRC Grid Operations Workshop – 2007 Kungliga Tekniska högskolan, Stockholm, Sweden.


1 Site Report from KEK, Japan JP-KEK-CRC-01 and JP-KEK-CRC-02 Go Iwai, KEK/CRC Grid Operations Workshop – 2007 Kungliga Tekniska högskolan, Stockholm, Sweden 13-15 June 2007

2 DEPLOYMENT STATUS AT KEK JP-KEK-CRC-01 and JP-KEK-CRC-02 2007/6/13 Grid Operations Workshop at KTH, Stockholm

3 Logical Site Overview
[Diagram. Labels: KEK External Network; KEK Internal Network; KEK Firewall; Grid LAN (scoped only for Grids); Central Computing System (New KEK-CC); HPSS; JP-KEK-CRC-00; JP-KEK-CRC-01; JP-KEK-CRC-02; Production System; Not for WLCG; Staff's training; Will shift to PPS; Taiwan; Asia-Pacific region; APAN; Domestic institutes; U.S.A; SuperSINET]

4 Physical Site Overview
[Figure. Labels: KEK-1; KEK-2]

5 Brief Summary of LCG Deployment
JP-KEK-CRC-01
  since Nov. 2005.
  is registered to GOC and is ready for WLCG.
  is operated by KEK staff.
  Site Role:
    – practice for the production system JP-KEK-CRC-02.
    – test use among university groups in Japan.
  Resource and Component:
    – SL-3.0.5 w/ gLite-3.0 or later
    – CPU: 14, Storage: ~1.5TB
    – FTS, FTA, RB, MON, BDII, LFC, CE, SE
  Supported VOs:
    – belle, apdg, g4med, ppj, dteam, ops, calice, ilc and ail
JP-KEK-CRC-02
  since early 2006.
  is registered to GOC and is ready for WLCG.
  Site Role:
    – More stable services based on KEK-1 experience.
  Resource and Component:
    – SL or SLC w/ gLite-3.0 or later
    – CPU: 48, Storage: ~1TB (w/o HPSS)
    – Full components
  Supported VOs:
    – belle, apdg, g4med, atlasj, ppj, ilc, calice, dteam, ops and ail

6 Grid Related Services
We have our own GRID CA
  – started in Feb. 2006, and is recognized by LCG.
  – is accredited by APGRID PMA
  – http://gridca.kek.jp/
VO Membership Service
  – Supported VOs:
      apdg is the VO for Asia-Pacific Data Grid.
      belle is the VO for the Belle experiment.
      atlasj is the VO for the ATLAS experiment in Japan.
      g4med is the VO for Geant4 medical applications.
      PPJ is the VO for Particle Physics in Japan.
      ail is the VO for the Associated International Laboratory between Japan and France.
  – http://voms.kek.jp/
Local Mirror Service
  – SL, SLC, LCG, gLite
  – An update using apt-get takes ~30 minutes with the CERN or FNAL repositories.
      ~3 minutes with the KEK repository
  – http://hepdg.cc.kek.jp/mirror/
Semi-automatic Installation Service
  – WNs can be installed semi-automatically using PXE (Preboot eXecution Environment) and a kickstart configuration file.
  – http://hepdg.cc.kek.jp/install/
Site Portal
  – http://grid.kek.jp/
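The PXE and kickstart installation of WNs mentioned above can be sketched roughly as follows. This is only an illustration, not the KEK tooling: the function names, the kernel and initrd file names, and the kickstart URL are hypothetical. It relies on two general conventions: PXELINUX looks for a per-host file named "01-" followed by the dash-separated lowercase MAC address, and the installer of that era accepts a ks= boot parameter pointing at the kickstart file.

```python
def pxe_config_name(mac: str) -> str:
    """Return the per-host PXELINUX config file name for a MAC address.

    PXELINUX looks for "01-" (the ARP type for Ethernet) followed by the
    MAC address in lowercase hex, separated by dashes.
    """
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    for octet in octets:
        int(octet, 16)  # raises ValueError if the octet is not hexadecimal
    return "01-" + "-".join(octets)


def pxe_entry(ks_url: str) -> str:
    """Render a minimal PXELINUX entry that boots the installer with a kickstart file.

    The kernel and initrd file names are placeholders (assumptions).
    """
    lines = [
        "default install",
        "label install",
        "  kernel vmlinuz",
        f"  append initrd=initrd.img ks={ks_url}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical worker node MAC address and kickstart location.
    print(pxe_config_name("00:1A:2B:3C:4D:5E"))
    print(pxe_entry("http://example.org/install/wn.cfg"))
```

Writing one such file per WN into the TFTP server's pxelinux.cfg directory is one common way to make a reinstall a matter of rebooting the node; whether the KEK service works this way is not stated in the slides.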

7 People on Grid at KEK/CRC
7 people in total
CA
  – T. Sasaki and Y. Iida
VOMS
  – Y. Watase and G. Iwai
Site Operation and Security
  – KEK-0: G. Iwai
  – KEK-1: T. Sasaki, Y. Iida, Y. Watase and G. Iwai
  – KEK-2: T. Sasaki, Y. Watase and G. Iwai
Deployment
  – Y. Watase, Y. Iida and G. Iwai
Documentation
  – Y. Watase
Networking
  – S. Suzuki, S. Yashiro and Y. Iida
Application (SRB, Portal and some Gridified applications)
  – K. Murakami, Y. Iida and G. Iwai

8 OPERATION STATISTICS

9 Submitted GGUS Tickets in JFY2006
Total number of submitted tickets: 28
  – KEK-1: 11
  – KEK-2: 17

10 Number of Submitted Jobs in JFY2006
[Charts. Panels: JP-KEK-CRC-01; JP-KEK-CRC-02]

11 Normalized CPU time in JFY2006 (kSI2K*hrs)
[Charts. Panels: JP-KEK-CRC-01; JP-KEK-CRC-02]

12 VIRTUAL ORGANIZATION Belle Experiment and Accelerator Science

13 VO for the Belle Experiment
Belle VO is federated among 4 countries, 6 institutes, 9 sites.
  – Japan: Nagoya University and KEK
  – Taiwan: ASGC and NCU
  – Australia: University of Melbourne
  – Poland: CYFRONET
  – Korea University will join soon.
Started using SRB and LCG
Data distribution service using SRB-DSI
  – Belle already has a few PB of data in total, including hundreds of TB of DST and MC.
      Bulk file registration helps us: Sregister
      We do not move any of the data.
  – It is too difficult to physically export the existing data to LCG.
  – Benefits both native SRB users and LCG users.
SRB-DSI with LCG is in operation now.
[Map. Labels: CYFRONET, Poland; KEK, Japan; Nagoya Univ., Japan; Melbourne Univ., Australia; ASGC, Taiwan; NCU, Taiwan]


15 VO for the Accelerator Science
Domestic support
  – Typical case at a laboratory: a few staff members, ~10 students and no technician.
Start to monitor them centrally over the VO
  – Focused on their operation support
  – Not only for WLCG sites but also for non-WLCG sites
  – PPJ VO has been started for accelerator science in Japan.
  – Federated among a few universities.
      Tohoku Univ., Tsukuba Univ., Kobe Univ., Hiroshima Univ., Nagoya Univ. and KEK.
  – Usage: To share resources and experience among major groups (ILC, KamLAND, CDF and ATLAS) without depending on experimental projects.
[Figure label: Hiroshima IT]

16 Conclusion
Tools used in daily grid operations
  – Semi-automatic installation tools, only for WNs
      Most of the tools are handmade scripts.
  – Monitoring tools, e.g. SAM and GSTAT, are very useful.
  – GGUS Search and APWIKI are also useful.
  – We are testing auditing with nCircle, a vulnerability management system.
Scheduled interventions
  – 11 times in JFY2006
  – Due to
      Software/hardware upgrade and site reconfiguration
      Annual maintenance
      Replacement of host certificate
Unscheduled interventions
  – ~10 times/year
  – E.g. a failed site reconfiguration, or a power cut caused by a thunderstorm.
Domestic support in Japan
  – Important mission for KEK.
~90% of problems are detected by the COD, SAM, GSTAT and Nagios.
  – Our Grid operation is supported by the great efforts of APROC members at ASGC, Taiwan.
  – We would like to maintain a close collaboration with ASGC.

17 END Thank you

18 LCG with SRB at Belle VO
[Network diagram. Labels: KEK-CC; Grid LAN; B-NET; KEK-FB; KEK-2 202.13.197.0/24; New built; 130.87.224.0/21 SRB/MCAT 172.22.28.0/24; KEK-1 130.87.208.0/22; KEK-DMZ; KEK Firewall; GridFTP 130.87.104.0/22; HSM; NFS; SRB; GridFTP; SRB-DSI; Pluggable Extension; APAN; SuperSINET]
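As a small aid for reading the address ranges on this slide, the sketch below uses Python's standard ipaddress module to report which of the listed subnets a given address falls into. The pairing of labels with subnets is an assumption read off the flattened diagram text (the 130.87.224.0/21 range has no clear label there), and the example addresses are hypothetical.

```python
import ipaddress
from typing import Optional

# Subnets as they appear on the slide. The label-to-subnet pairing is an
# assumption based on the order of the flattened diagram text.
GRID_SUBNETS = {
    "KEK-2": ipaddress.ip_network("202.13.197.0/24"),
    "KEK-1": ipaddress.ip_network("130.87.208.0/22"),
    "GridFTP": ipaddress.ip_network("130.87.104.0/22"),
    "SRB/MCAT": ipaddress.ip_network("172.22.28.0/24"),
    "unlabelled": ipaddress.ip_network("130.87.224.0/21"),
}


def locate(address: str) -> Optional[str]:
    """Return the label of the subnet containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for label, network in GRID_SUBNETS.items():
        if ip in network:
            return label
    return None


if __name__ == "__main__":
    # Hypothetical host addresses, chosen only to exercise the lookup.
    for host in ("130.87.209.17", "202.13.197.5", "130.87.232.1"):
        print(host, "->", locate(host))
    for label, network in GRID_SUBNETS.items():
        print(label, network, network.num_addresses, "addresses")
```

The prefix lengths translate directly into range sizes: a /24 holds 256 addresses, a /22 holds 1024 and a /21 holds 2048.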

19 Points to Cover in Each Presentation
tools used in daily grid operations
what features are missing to make your work easier
examples of the most frequent scheduled interventions at your site
examples of the most frequent unscheduled interventions at your site
points to improve in communication with ROC, other sites, VOs, rest of the world...
How do you plan deployment of updates/new versions so continuous production is not interrupted?
Communication with users: how are you informed about operational problems at your site reported by local/remote users? Mail/GGUS/phone/other?
Correlation of cross-site issues: is the operations meeting enough for this? How do you do it otherwise?
What percentage of real site problems are detected and reported by the COD before you know about them?
usefulness of the following operations bodies/meetings and suggestions to improve them:
  – COD
  – your ROC support team
  – operations meeting

