
1 Simon C. Lin and Hsin-Yen Chen, Academia Sinica Grid Computing Centre (ASGC), Taiwan. ATCF, KISTI, 23 Sep. 2015

2 Distributed Computing is Critical to Research Challenges in the 21st Century
Vision
- Build up capacity to manage, store, and analyze very large, heterogeneous and complex datasets
- Provide a secure & scalable research infrastructure to share data, tools and resources with multidisciplinary research communities
- Drive IT innovations
- Build the Internet-wide production distributed computing infrastructure with the world: ASGC has been the WLCG and e-Science Asia Centre since 2005

3 Strategy and Approach
e-Infrastructure Development
- Maximize availability, performance, and operation automation while minimizing operation cost
- Intelligent operation aided by intelligent monitoring
- Support HTC & HPC through Service Grid, Desktop Grid and Cloud
- Integrate new technologies as the e-Infrastructure evolves: distributed cloud over Grid
e-Science Development
- Target life science, earth science, climate change, social sciences and HEP
- Expedite innovation and international collaboration through DCI
- Capture requirements and improve the general/reusable DCI components
Regional and international collaboration are the necessary drivers for the e-Science communities
- Aggregate resources, creativity and momentum to achieve advanced knowledge
- International collaboration: WLCG, EGI, EMI, CHAIN, WeNMR, APAN, TEIN
- Dissemination and outreach for the communities: ISGC

4 Global Science Achievement: Accelerator, Detector & Grid

5 ATLAS & CMS Support at ASGC: DCI Development, Operation & Extension
- ~10% of the data is processed in Taiwan
- 16 Gb/s inbound performance reached on the ASGCNet 20 Gb link to Europe
- Accumulated CPU time (Aug. 2010 - May 2015): ATLAS (65%), Climate Change (22%), CMS (6%), Earth Science (4%), AMS (2%)

6 Distributed Computing Infrastructure (HW, SW, Networking)
Applications: Earthquake and Tsunami Early Warning, Drug Discovery Application Portal, Weather/Climate Change, Higgs Search

7 Regional e-Science Collaborations
- Started from big sciences, the new distributed infrastructure, and the human network across countries
- Taking advantage of global collaborations: middleware, user communities and applications, operation technology, etc.
- Saving lives by e-Science: natural disaster mitigation (including earthquake, climate, neglected diseases, etc.) is the common focal point
- Towards big data analysis: more countries are starting to deploy production applications (drug discovery, earthquake & tsunami simulation, weather simulation, climate change, etc.) and develop new features according to user requirements
- The vision is to share data, infrastructure, tools, analytics, human resources, etc.

8 Asia Pacific Regional Operation Centre (APROC)
Extending the infrastructure and maximizing availability since 2005:
- Support for 38 sites in 16 countries joining the worldwide Grid and e-Science collaborations: Australia, China, Hong Kong, India, Indonesia, Iran, Japan, Korea, Malaysia, New Zealand, Pakistan, the Philippines, Singapore, Thailand, Vietnam, and Taiwan
Training, Workshop and Internship Program (since 2003):
- Host of the International Symposium on Grids & Clouds (ISGC) annually
- 65 events coordinated in 9 countries (IN, KR, MN, MY, PH, SG, VN, TH and TW)
- Internship: 10 persons from DE, IN, JP, KR, MY and PK

9 ASGC on WLCG Technology
Areas | Activities
Distributed Computing | DIANE, Ganga, PanDA, JEDI, CVMFS, Ceph
Distributed Data Management | DPM, SRM-SRB/iRODS, Rucio, EOS
Information System Monitoring | GStat
Experiment Computing | ATLAS, CMS, AMS
Cloud Core Technology | VMIC, Distributed Cloud, OpenStack, Cloud Accounting, Container, CERNBox
Networking | LHCOPN, LHCONE, SDN
Regional Support | APROC, APGridPMA
Data Center | Intelligent & energy-saving center, system efficiency
High Level Coordination | GDB, MB, CB, C/RRB

10

11 Training, Workshop and GridCamp Events
Year | Grid Technology | e-Science Applications | Others
2015 | TW (CVMFS, OpenStack, Security) | TW (Disaster Mitigation) | ECAI, eLearning
2014 | TW (dCache, Security) | TW (WeNMR) | ECAI, eLearning
2013 | TW (dCache, DPM, Security, EMI, GridCamp) | TH (gWRF), TW (WeNMR) | TW (CHAIN-REDS)
2012 | TW (iRODS, VC, FIM, Cloud) | TW (WeNMR) | TW (CHAIN)
2011 | TW (iRODS, IDGF, IGTF) | TW (NDM, Life Sci), MN | TW (OGF31)
2010 | TW (gLite, VC, Security, iRODS) | TW (Social Simulation) | TW (EUAsiaGrid)
2009 | VN (Grid), TW (Grid, iRODS, Security, GridCamp) | MY (e-Science) | TW (EUAsiaGrid)
2008 | KR, PH, TW (EGEE, iRODS) | TW (WLCG) | TW (EUAsiaGrid)
2007 | VN (Grid), SG (Grid), MY (Grid), TW (EGEE, GridCamp) | |
2006 | TW (EGEE) | IN (WLCG) |
2005 | TW (Grid), TW (Grid @ 2 univ.) | TW (WLCG SC) |
2004 | TW (Grid), TW (Grid @ univ.) | |
2003 | TW (Grid) | TW (BioGrid) |

12 ASGC Computing Center
- Total capacity: 2 MW, 400 tons AHUs, 93 racks, ~800 m²
- Resources: 20,000 CPU cores, 12.5 PB disk, 4 PB tape
- Rack space usage (racks): AS e-Science 54.1 (58.4%), ASCC 8.4 (9.0%), IPAS 6.0 (6.3%), free 24.5 (26.3%)
- Power consumption and temperature of every piece of equipment are monitored every 10 seconds (a polling sketch follows below)
- Cooling power : CPU power ratio: summer 1 : 1.4, winter 1 : 2
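The slide does not describe how the 10-second sampling is implemented; the following is a minimal sketch only, assuming a hypothetical HTTP sensor endpoint that returns JSON readings per device. The URL, JSON field names and the 35 C alert threshold are illustrative assumptions, not ASGC's actual system.

```python
# Hypothetical 10-second polling loop for power/temperature readings.
# The endpoint URL and JSON fields are assumptions for illustration only.
import json
import time
import urllib.request

SENSOR_URL = "http://sensors.example.local/api/readings"  # hypothetical endpoint
POLL_INTERVAL = 10  # seconds, matching the sampling rate stated on the slide

def poll_once():
    """Fetch one snapshot of readings and report any obvious thermal issues."""
    with urllib.request.urlopen(SENSOR_URL, timeout=5) as resp:
        # Expected shape (assumed): [{"device": "rack42-pdu", "watts": 3100, "temp_c": 27.5}, ...]
        readings = json.load(resp)
    for r in readings:
        if r.get("temp_c", 0) > 35:
            print(f"WARNING: {r['device']} temperature {r['temp_c']} C")

if __name__ == "__main__":
    while True:
        try:
            poll_once()
        except OSError as err:
            print(f"poll failed: {err}")
        time.sleep(POLL_INTERVAL)
```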

13 ASGC Resources (Jan. 2015)
Resource Group | CPU (#cores) | Disk (TB) | Tape (TB) | Interconnection | User Groups
World Wide Grid | 5,508 | 6,800 | 4,000 | 10GbE storage servers | WLCG, EUAsiaGrid, EGI, e-Science
HPC | 10,212 | 3,356 | 0 | 10GbE + IB (DDR, QDR): 5,120 cores; HPC-10G: 1,956; NUWA: 1,984; TCCIP: 1,152 | HPC, ES, EC, Physics, LS
Cloud & Elastic System | 4,076 | 3,800 | 0 | 10GbE + 10GbE storage | Cloud and elastic resources: AMS

14 Automatic Deployment (NCU, NDHU, CERN, Asian sites and commercial clouds)
- Persistent replication of the software environment everywhere within hours and with less effort
- Software distribution as a global service: VM images, executables, libraries, databases
- Configuration management (a verification sketch follows below)
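CVMFS appears among ASGC's distributed-computing activities on slide 9 and is a common vehicle for this kind of global software distribution. As one hedged illustration of how a site could verify that the replicated environment is visible on a node, the sketch below scans /proc/mounts for CVMFS repositories; the repository names are assumed examples, not ASGC's actual configuration.

```python
# Minimal sketch: check that expected CVMFS software repositories are mounted
# on a node, by scanning /proc/mounts. Repository names are illustrative
# assumptions, not a statement of ASGC's actual setup.
EXPECTED_REPOS = ["atlas.cern.ch", "cms.cern.ch"]  # assumed examples
MOUNT_ROOT = "/cvmfs"

def mounted_cvmfs_repos(mounts_file="/proc/mounts"):
    """Return the set of CVMFS repositories currently mounted."""
    repos = set()
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 3 and fields[2] == "cvmfs":
                # fields[1] is the mount point, e.g. /cvmfs/atlas.cern.ch
                repos.add(fields[1].rsplit("/", 1)[-1])
    return repos

if __name__ == "__main__":
    present = mounted_cvmfs_repos()
    for repo in EXPECTED_REPOS:
        status = "OK" if repo in present else "MISSING"
        print(f"{MOUNT_ROOT}/{repo}: {status}")
```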

15 Castor: a single server almost reached the limit of an 8 Gb FC link (800 MB/s), sustaining ~770 MB/s

16 Storage System Performance
- A single storage server reached the 10 Gb network bandwidth limit (~1 GB/s read)
- Aggregated system performance for ATLAS+CMS achieved more than 6 GB/s read throughput (a back-of-the-envelope check follows below)
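As a rough sanity check on these numbers (my own back-of-the-envelope sketch, not from the slide): a 10 Gb/s link carries about 1.25 GB/s before protocol overhead, so roughly 1 GB/s per server is consistent with being network-bound, and a 6 GB/s aggregate implies at least six to seven servers reading in parallel. The ~80% efficiency factor below is an assumption.

```python
# Back-of-the-envelope check: convert link speed to byte throughput and
# estimate how many network-bound servers a 6 GB/s aggregate implies.
# The 80% efficiency factor is an assumed allowance for protocol overhead.
def link_ceiling_gbytes(link_gbits, efficiency=0.8):
    """Approximate usable throughput in GB/s for a link of link_gbits Gb/s."""
    return link_gbits / 8.0 * efficiency

per_server = link_ceiling_gbytes(10)   # ~1.0 GB/s per 10GbE storage server
aggregate_target = 6.0                 # GB/s, from the slide
servers_needed = aggregate_target / per_server

print(f"per-server ceiling : {per_server:.2f} GB/s")
print(f"servers for 6 GB/s : {servers_needed:.1f}")
```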

17 System Efficiency
Increasing both performance and reliability:
- Auto-tuning, anomaly detection and failure handling, based on identified key metrics and the monitoring system (a simple detection sketch follows below)
- Throughput analysis from daily operations and understanding bottlenecks
Growing system intelligence based on efficiency optimization and auto-control:
- Data center: power, thermal
- Scientific computing and HPC: networking, computing, storage
Key knobs: resource usage & throughput, failure rate, maximizing memory usage, kernel parameter tuning, cache strategy by access patterns, making best use of resources, ...
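The slide does not say how the anomaly detection works. As one hedged illustration, a simple approach is to flag metric samples that deviate from a rolling baseline by more than a few standard deviations; the window size, threshold and sample values below are arbitrary assumptions.

```python
# Simple z-score anomaly detector over a stream of metric samples.
# An illustration only; ASGC's actual detection logic is not described here.
from collections import deque
from statistics import mean, stdev

class RollingAnomalyDetector:
    def __init__(self, window=60, threshold=3.0):
        self.window = deque(maxlen=window)  # recent samples form the baseline
        self.threshold = threshold          # flag deviations beyond N sigma

    def observe(self, value):
        """Return True if `value` is anomalous relative to the rolling window."""
        anomalous = False
        if len(self.window) >= 10:
            mu, sigma = mean(self.window), stdev(self.window)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                anomalous = True
        self.window.append(value)
        return anomalous

if __name__ == "__main__":
    detector = RollingAnomalyDetector()
    # Hypothetical throughput samples in MB/s, with one obvious outlier.
    samples = [520, 510, 530, 525, 515] * 4 + [1900]
    for t, s in enumerate(samples):
        if detector.observe(s):
            print(f"sample {t}: {s} looks anomalous")
```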

18 Monitoring & Control
- Anomaly detection, failed-job analysis and usage pattern exploration support efficiency optimization, driven by a distributed analytics engine over the large logs produced every day
- Examples: failed-job analysis; resource usage analysis by user or status over time (a minimal aggregation sketch follows below)
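The slides do not name the analytics engine. As a minimal stand-in, the same kind of per-user and per-status accounting can be sketched with plain Python over a batch-job log; the log format and field names are assumptions, not ASGC's actual schema.

```python
# Minimal sketch of resource-usage aggregation by user and job status.
# Assumes a whitespace-separated job log with columns: user, status, cpu_hours.
from collections import defaultdict

def summarize(log_lines):
    """Aggregate CPU hours per (user, status) and count failed jobs per user."""
    cpu_by_user_status = defaultdict(float)
    failures_by_user = defaultdict(int)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed records
        user, status, cpu_hours = parts[0], parts[1], float(parts[2])
        cpu_by_user_status[(user, status)] += cpu_hours
        if status == "FAILED":
            failures_by_user[user] += 1
    return cpu_by_user_status, failures_by_user

if __name__ == "__main__":
    sample_log = [
        "atlas001 DONE 12.5",
        "atlas001 FAILED 0.3",
        "cms042 DONE 8.0",
        "cms042 FAILED 1.1",
    ]
    usage, failures = summarize(sample_log)
    for (user, status), hours in sorted(usage.items()):
        print(f"{user:10s} {status:7s} {hours:8.1f} CPU-hours")
    print("failed jobs:", dict(failures))
```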

19 Asset Management
Asset Management System: a core information system supporting rapid asset status query, control and dynamic resource arrangement
- Query by IP range, HW spec, rack location, or cluster (an IP-range query sketch follows below)
- Change management, configuration management, visualization, ...
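As a hedged illustration of the "query by IP range" capability, the sketch below filters an in-memory inventory with the standard-library ipaddress module; the records and field names are hypothetical examples, not ASGC's asset schema.

```python
# Minimal sketch of an asset query by IP range using the ipaddress module.
# The inventory records below are hypothetical examples.
import ipaddress

ASSETS = [
    {"host": "wn-0101", "ip": "10.1.1.15", "rack": "A07", "cluster": "grid"},
    {"host": "se-0042", "ip": "10.1.2.80", "rack": "B03", "cluster": "storage"},
    {"host": "hpc-0007", "ip": "10.2.0.9", "rack": "C11", "cluster": "hpc"},
]

def query_by_ip_range(assets, cidr):
    """Return assets whose IP address falls inside the given CIDR range."""
    network = ipaddress.ip_network(cidr)
    return [a for a in assets if ipaddress.ip_address(a["ip"]) in network]

if __name__ == "__main__":
    for asset in query_by_ip_range(ASSETS, "10.1.0.0/16"):
        print(asset["host"], asset["ip"], asset["rack"], asset["cluster"])
```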

20 ASGCNet Infrastructure

21

22 [Map: "LHCONE: A global infrastructure for the High Energy Physics (LHC and Belle II) data management" (W. E. Johnston, ESnet, 28 May 2015), showing LHCONE VRF domains, VRF aggregator networks, regional R&E exchange points, and LHC/Belle II Tier-1/2/3 sites worldwide, connected by 1, 10, 20/30/40 and 100 Gb/s links; see http://lhcone.net for details.]

23 Asia R&E Network backbone

24

25 LHC Network Challenges in Asia
Routing complexity
- BGP peering can be realized among NRENs, if agreed bilaterally
Network performance
- TCP throughput <= TCP window size / RTT (a worked example follows below)
- Asian Tier-Xs must tune server and client TCP kernel parameters to get better throughput
- LHCONE L3VPN could help in routing the application traffic
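To make the bandwidth-delay rule of thumb concrete, here is a small worked example; the ~300 ms round-trip time and the window sizes are illustrative assumptions, not measured ASGC values.

```python
# Worked example of the rule of thumb on the slide:
#   max single-stream TCP throughput ~= TCP window size / round-trip time.
# The RTT and window sizes are illustrative assumptions, not measurements.
def max_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound on single-stream TCP throughput, in Mb/s."""
    return window_bytes * 8 / rtt_seconds / 1e6

rtt = 0.300                      # ~300 ms, a plausible Asia-Europe round trip
default_window = 64 * 1024       # 64 KiB, a classic un-tuned window
tuned_window = 16 * 1024 * 1024  # 16 MiB, a tuned window

print(f"64 KiB window : {max_throughput_mbps(default_window, rtt):7.1f} Mb/s")
print(f"16 MiB window : {max_throughput_mbps(tuned_window, rtt):7.1f} Mb/s")

# Window needed to fill a 10 Gb/s path at this RTT (bandwidth-delay product):
bdp_bytes = 10e9 / 8 * rtt
print(f"window for 10 Gb/s at {rtt*1000:.0f} ms RTT: {bdp_bytes/1024/1024:.0f} MiB")
```

Running this shows why tuning matters at intercontinental distances: an un-tuned 64 KiB window caps a single stream at under 2 Mb/s, while a 16 MiB window allows a few hundred Mb/s.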

26 [Diagram: Taiwan Global R&E Network, showing TANet, TWAREN, ASNet and ASGCNet links (622 Mb/s to 10 Gb/s) between Taiwan, Hong Kong, Tokyo, Los Angeles, Palo Alto, Chicago, New York, Amsterdam and Geneva.]

27 [Diagram: ASGC e-Science global network, interconnecting ASGC (TW) via Hong Kong and the US with peer NRENs and exchange points, including SINET (JP), TWAREN/TANet, KREONET2 (KR), CERNET and CSTNET (CN), TEIN partner NRENs across Asia (MY, AU, IN, PH, ID, LK, NP, PK, SG, TH, VN), GEANT, Internet2, ESnet, NISN (NASA), NORDUnet, SURFnet, CESNET, AMS-IX, HKIX, HARNET, CA*net, StarLight, APAN-JP and CERN (LHCOPN/LHCONE), over links from 1 GE to 20 Gb/s.]

28 ASGCNet International Network
[Diagram: link upgrades (relocation and DWDM LAN-PHY interfaces for LHCONE), showing 10 Gb/s circuits from the ASGC data center in Taipei via Hong Kong (HKIX, CERNET, CSTNET, TEIN) to Chicago (StarLight, FNAL, Internet2, ESnet) and to Amsterdam/Geneva (AMS-IX, SURFnet, CESNET, NORDUnet, GEANT, CERNLight/LHCOPN), plus US connections to Palo Alto, New York, BNL, LBL and SLAC.]

29 LHCONE Networking in Asia
- ASGC could provide the 20 Gb global backbone (TW-US-EU) for Asian HEP communities
- ASGC has an open policy of supporting the networking of all LHC experiments (ATLAS, CMS, ALICE, LHCb) in Asia
- ASGC has good connectivity with APAN (incl. KR, JP, CN, etc.) and TEIN (incl. IN, TH, SG, PK, etc.)
- L3VPN Asia hub in Hong Kong

30 LHCONE VRF on ASGC
- LHCONE VRF between GEANT and ASGC at Amsterdam
- Implement the connection to the CERN LHCONE VRF at Amsterdam
- Implement the LHCONE VRF connecting Internet2 and ESnet at Chicago
- Implement the LHCONE VRF connecting Asian Tier-Xs at Hong Kong

31 LHCONE Asia Workshops
- Kick-off workshop: co-located with the APAN 38th Meeting (11-15 August 2014, Nantou, Taiwan). Advantage: all Asia Pacific NRENs were present
- LHCONE Asia Workshop: co-located with the annual ISGC (International Symposium on Grids & Clouds) in March. Advantages: (1) easier for Asian NRENs to get together; (2) brings together the user community, technology development and service providers
- ISGC 2015: 13-18 Mar. 2015, Academia Sinica, Taipei

32 Thanks! 감사합니다 謝謝

