Status Report on Tier-1 in Korea Gungwon Kang, Sang-Un Ahn and Hangjin Jang (KISTI GSDC) April 28, 2014 at 15th CERN-Korea Committee, Geneva Korea Institute.

1 Status Report on Tier-1 in Korea
Gungwon Kang, Sang-Un Ahn and Hangjin Jang (KISTI GSDC)
April 28, 2014 at 15th CERN-Korea Committee, Geneva
Korea Institute of Science and Technology Information
Global Science experiment Data hub Center

2 OUTLINE
• Computing Resources
• Operations
• Network
• Conclusion
28 April 2014, 15th CERN-Korea Committee

3 KISTI GSDC Tier-1 Team
ROLE                               Name
Representative                     Haeng-Jin Jang
System Management                  Hee-Jun Yoon
System Administration              Jeong-Heon Kim
Storage (Disk & Tape)              Hee-Jun Yoon, Sang-Oh Park
Network                            Hyoung-Woo Park, KISTI support (Dr. Bu-Seung Cho)
Site Operation & Administration    Il-Yeon Yeo, Sang-Un Ahn
KIAF Operation & User Support      Sang-Un Ahn
~ 9 people

4 Computing Resource Status
• 2013 Pledges (CPU): 25,000 HepSpec06
• Current HepSpec06: 28,055
• 2,524 job slots available (4 reserved slots for pilot jobs) with hyper-threading (H/T) enabled
• 2013 Pledges (Tape Storage): 1,500 TB
• Current tape capacity: 1,000 TB
• Pledges will be met this year
• 2013 Pledges (Disk Storage): 1,000 TB
• Current disk capacity: 966 TB (1,000 TB allocated, but the usable space is slightly below that)
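The pledge figures on this slide can be compared with current capacity by simple division. The following is a minimal sketch of that arithmetic; the dictionary keys and the `fulfilment` helper are illustrative names, not part of the presentation.

```python
# 2013 pledges and current capacities, copied from the slide.
# The key names are illustrative assumptions.
PLEDGED = {"cpu_hs06": 25_000, "tape_tb": 1_500, "disk_tb": 1_000}
CURRENT = {"cpu_hs06": 28_055, "tape_tb": 1_000, "disk_tb": 966}


def fulfilment(pledged, current):
    """Return current capacity as a percentage of the pledge, per resource."""
    return {name: 100.0 * current[name] / pledged[name] for name in pledged}


for name, pct in fulfilment(PLEDGED, CURRENT).items():
    status = "met" if pct >= 100.0 else "below pledge"
    print(f"{name:9s} {pct:6.1f}%  {status}")
```

With the slide's numbers, only the CPU pledge is exceeded; tape and disk are still below their pledged values.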

5 OPERATIONS

6 Total wall clock hours for ALICE jobs in the last 6 months
• KISTI: 3.9 % (including Tier-2)
• 3.58% (2013)
[Plot: jobs, Oct 2013 to Apr 2014, with levels of ~ 800, ~ 1800 and ~ 2500. Annotations: T1 worker nodes migration to 10GbE equipped ones; ALICE Central Service Maintenance; EMI-3 Migration & Delivery of full pledges]
• Current capacity: 2,524 job slots, 28.1 kHS06
  – 84 nodes, 32 (logical) cores per node, 11 HS06/core
• Maintenance issues
  – Worker nodes migration to 10GbE equipped ones
  – Middleware: EMI-3 migration (EMI-2 support ends on 30 April)
  – Delivered full pledges for 2013
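The capacity figures quoted here can be cross-checked against each other. The sketch below only redoes the arithmetic with the slide's numbers; the variable names are illustrative. The slide does not explain why the number of job slots is lower than the number of logical cores, so the code simply reports both.

```python
# Figures copied from the slide; variable names are illustrative.
NODES = 84
LOGICAL_CORES_PER_NODE = 32
JOB_SLOTS = 2_524
TOTAL_HS06 = 28_055  # quoted as 28.1 kHS06 on this slide

logical_cores = NODES * LOGICAL_CORES_PER_NODE
hs06_per_slot = TOTAL_HS06 / JOB_SLOTS

print(f"logical cores : {logical_cores}")
print(f"job slots     : {JOB_SLOTS}")
print(f"HS06 per slot : {hs06_per_slot:.1f}")
```

The per-slot value is consistent with the roughly 11 HS06/core stated on the slide.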

7 Site Reliability

8 KISTI Analysis Facility - KIAF
• Parallel Analysis Facility based on PROOF
• In operation since 2011, for ALICE use only
• 1 master, 8 worker nodes, 12 cores and 22 TB disk per node
• Similar size and utilization as CAF - CERN Analysis Facility

9 Plans for On-call Service
We are planning an on-call service, which will probably comprise three functions.
• Alarm system
  – Nagios + e-mail notifications
  – Implementing SMS plugin + Night Owl shift by private company
  – Tape system: hardware/software malfunction reported to IBM and third-party company
  – 24/7 support, intervention to be carried out within one day
  – Ongoing evaluation of monitoring frameworks, e.g. Icinga, Zabbix
• On-call scheme
  – One-week shift cycle with 5-6 personnel
  – Expecting 1 or 2 calls in a cycle: alarms from batch scheduler and services, WN servicing
  – From daily monitoring report: detailed action list on services and hardware incidents
• Night owl shift
  – Private company contract: on-site support
  – If necessary: SMS and e-mail notification to off-site on-duty experts
  – Supercomputing division at KISTI has been running a similar system for years
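A one-week shift cycle shared among 5-6 people amounts to a round-robin rota. The following is a minimal sketch of such a rota, not the scheme actually used at KISTI; the staff labels, the start date and the `weekly_rota` helper are all hypothetical.

```python
from datetime import date, timedelta
from itertools import cycle, islice


def weekly_rota(staff, first_monday, weeks):
    """Assign one on-call person per week, cycling through staff in order.

    Returns a list of (shift_start, shift_end, person) tuples.
    """
    rota = []
    for i, person in enumerate(islice(cycle(staff), weeks)):
        start = first_monday + timedelta(weeks=i)
        rota.append((start, start + timedelta(days=6), person))
    return rota


# Hypothetical placeholders: five on-duty experts, rota starting on a Monday.
STAFF = ["expert-1", "expert-2", "expert-3", "expert-4", "expert-5"]

for start, end, person in weekly_rota(STAFF, date(2014, 5, 5), weeks=7):
    print(f"{start} to {end}: {person}")
```

With five people, each person is on call once every five weeks; the sixth week returns to the first name in the list.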

10 NETWORK

11 Internal Network
• Internal network for Tier-1 is isolated from the computing centre service network
• Done in Oct 2013: internal network re-structuring (3-week shutdown)
• Preparation for upgrade of bandwidth of external network up to 10 Gbps
• Main switch upgrade: bandwidth up to 2.5 Tbps
• HA configuration of private network
• Remove bottlenecks to storage
• Full 20 Gbps configuration (Incoming/Outgoing)
• Replacing all switches with 10 Gbps ones: done on part of the service racks
• 1 Gbps switches in place for servers with 1 Gbps cards
• Worker nodes to be upgraded with 10 Gb cards
• Tape service nodes are being connected to the 10 Gbps switches

12 External Network
• Current bandwidth to CERN: 2 Gbps
• Dedicated link via Daejeon-Chicago-Amsterdam-Geneva
• Roadmap for 10 Gbps upgrade presented to WLCG MB and accepted
• Working on upgrading bandwidth up to 10 Gbps
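To get a feel for what the move from 2 Gbps to 10 Gbps means, the sketch below estimates idealized transfer times. The 100 TB volume, the `efficiency` parameter and the `transfer_hours` helper are assumptions for illustration only; real throughput on a shared, long-distance link would be lower.

```python
def transfer_hours(volume_tb, link_gbps, efficiency=1.0):
    """Idealized time in hours to move volume_tb terabytes over a link.

    Uses decimal units (1 TB = 1e12 bytes, 1 Gbps = 1e9 bit/s).
    efficiency is the assumed fraction of the link actually usable.
    """
    bits = volume_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600.0


# Hypothetical 100 TB data set at the current and the planned bandwidth.
for gbps in (2, 10):
    print(f"{gbps:2d} Gbps: {transfer_hours(100, gbps):6.1f} h")
```

At full link utilization the same data set moves five times faster after the upgrade, which is simply the ratio of the two bandwidths.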

13 LHC OPN
• KISTI T1 network (134.75.125.0/24) included into LHC OPN
• BGP peering between Kreonet router @ KISTI and LCG network @ CERN
• perfSONAR has been deployed for measuring bandwidth and latency; a firewall policy issue persists concerning the ports below 1024, e.g. 80 (http), 443 (https), 843 (bwctl)
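The prefix and port numbers on this slide can be checked with Python's standard `ipaddress` module. This is a small sketch only; the helper names and the sample addresses are hypothetical, and the port-to-service mapping is taken from the slide as given.

```python
import ipaddress

# Prefix and ports as listed on the slide.
KISTI_T1_NET = ipaddress.ip_network("134.75.125.0/24")
PERFSONAR_PORTS = {80: "http", 443: "https", 843: "bwctl"}


def in_t1_network(addr):
    """True if addr lies inside the KISTI T1 prefix announced to LHC OPN."""
    return ipaddress.ip_address(addr) in KISTI_T1_NET


def is_below_1024(port):
    """True for ports in the range the firewall policy issue concerns."""
    return 0 < port < 1024


print(KISTI_T1_NET.num_addresses)     # size of a /24
print(in_t1_network("134.75.125.10"))  # hypothetical host inside the prefix
print(in_t1_network("134.75.126.10"))  # hypothetical host outside the prefix
print(all(is_below_1024(p) for p in PERFSONAR_PORTS))
```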

14 Conclusion
• KISTI T1 was approved as a full T1 at the WLCG Overview Board meeting in Nov. 2013
  – The progress in ramping up its capability as a T1 was appreciated by the ALICE community, and a roadmap to a 10G network was accepted
• In Jan, KISTI T1 joined LHC OPN
• Over the last 6 months, KISTI T1 has been "shape-shifting" in terms of network
  – Core switches replaced (bandwidth: 0.9 Tbps → 2.5 Tbps)
  – Rack switches replaced (bandwidth: 1 Gbps → 10 Gbps)
  – Servers migrated to 10GbE equipped ones

