
1 Building Virtual Scientific Computing Environment with OpenStack
Yaodong Cheng, CC-IHEP, CAS, chyd@ihep.ac.cn
ISGC 2015

2 Contents
- Requirements of scientific computing
- IHEP cloud platform
- Virtual machine types
- Virtual computing cluster
- DIRAC distributed computing
- Conclusion

3 Large science facilities
- IHEP: the largest fundamental research center in China
- IHEP serves as the backbone of China's large science facilities:
  - Beijing Electron Positron Collider (BEPCII) / BESIII
  - Yangbajing Cosmic Ray Observatory: ASγ and ARGO
  - Daya Bay Neutrino Experiment
  - China Spallation Neutron Source (CSNS)
  - Hard X-ray Modulation Telescope (HXMT)
  - Accelerator-Driven Sub-critical System (ADS)
  - Jiangmen Underground Neutrino Observatory (JUNO)
  - Under planning: BAPS, LHAASO, XTP, HERD, …

4 BEPCII/BESIII
- 36 institutions from China, US, Germany, Russia, Japan, …
- > 5 PB of data in the next 5 years
- ~5,000 CPU cores
- Simulation, reconstruction, analysis, …
- Long-term data preservation
- Data sharing between partners

5 Other experiments
- Daya Bay Neutrino Experiment: ~200 TB per year
- JUNO (Jiangmen Neutrino Experiment): ~500 TB per year
- LHAASO: 2 PB per year after 2017, accumulating 20+ PB over 10 years
- ATLAS and CMS Tier-2 site: 940 TB of disk, 1,088 CPU cores
- CSNS, HXMT, …
In total, about 5 PB of new data per year!

6 Computing resources status
- ~12,000 CPU cores in ~50 queues, managed by Torque/PBS; difficult to share between queues
- ~5 PB of disk: Lustre, GlusterFS, dCache/DPM, …
- ~5 PB of LTO4 tape: two IBM 3584 tape libraries, modified CERN CASTOR 1.7
Photos: PC farm built with blades; tape libraries.

7 In the future, …
- More HEP experiments: we will need to manage twice as many servers as today, or more
- But no significant increase in staff numbers is possible
- Is cloud a good solution? Is cloud suitable for scientific computing?
- Time to change the IT strategy!

8 What is Cloud?
- NIST gives the best-known definition (http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf)
- Essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service
- Service models: IaaS, PaaS, SaaS
- Deployment models: public, private, hybrid
Is cloud beneficial to scientific computing?

9 Easy to Maintain
- Hardware: services become independent of the underlying physical machine
- Cloud services: a single set of services for managing access to computing resources
- Scientific platforms: become a separate layer deployed, controlled, and managed by domain experts

10 Customized Environment
- Operating system suited to your application
- Your applications preinstalled and preconfigured
- CPU, memory, and swap sized for your needs

11 Dynamic Provisioning
- New storage and compute resources in minutes (or less)
- Resources freed just as quickly to facilitate sharing
- Create temporary platforms for variable workloads

12 IHEPCloud: a private IaaS platform
- Launched in May 2014 (http://cloud.ihep.ac.cn)
- Three usage scenarios:
  - Self-service virtual machine platform: users request and destroy VMs on demand
  - Virtual computing cluster: jobs are allocated to a virtual queue automatically when the physical queue is busy
  - Distributed computing: IHEPCloud works as a cloud site; DIRAC calls the cloud interface to start or stop virtual worker nodes (a minimal API sketch follows)
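As an illustration of the "start a virtual worker node" step, here is a minimal sketch of booting a VM through the OpenStack API. It uses the present-day openstacksdk Python library rather than whatever client IHEPCloud used in 2014, and the cloud, image, flavor, and network names are placeholders.

```python
import openstack

# Credentials are read from clouds.yaml; "ihepcloud" is a placeholder cloud name.
conn = openstack.connect(cloud="ihepcloud")

image = conn.compute.find_image("SL-6.5-x86_64")     # placeholder image name
flavor = conn.compute.find_flavor("m1.medium")       # placeholder flavor
network = conn.network.find_network("internal-192")  # placeholder network

# Boot one virtual worker node and wait until it is ACTIVE.
server = conn.compute.create_server(
    name="vm-worker-001",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.name, server.status)

# Deleting the worker again is a single call:
# conn.compute.delete_server(server.id)
```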

13 IHEPCloud services
- Who can use it? Any user with an IHEP email account
- How many resources per user? By default, 3 CPU cores and 15 GB of memory (see the quota sketch below)
- VM types:
  - Testing machine: full root privilege, no public storage
  - UI node: AFS authentication, no root privilege, public storage, and none of the login-node limits on memory, CPU time, or processes
- OS types: SL 5.5, SL 5.8, SL 6.5, SL 7 (64-bit), SL 6.5 (32-bit), Windows 7 (64-bit); new types are added on user request
- VM IP addresses: internal addresses (192.168.*.*) are allocated automatically; public addresses (202.122.35.*) require administrator approval
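The default allowance above (3 CPU cores, 15 GB of memory) corresponds to an OpenStack compute quota. A hedged sketch, assuming each user is mapped to a personal project (OpenStack quotas apply per project) and using openstacksdk; the project name is hypothetical:

```python
import openstack

conn = openstack.connect(cloud="ihepcloud")  # placeholder cloud name

# Assumed mapping: one personal project per user; "user-chyd" is hypothetical.
project = conn.identity.find_project("user-chyd")

conn.set_compute_quotas(
    project.id,
    cores=3,        # 3 virtual CPU cores per user
    ram=15 * 1024,  # RAM quota is expressed in MB, so 15 GB
)
print(conn.get_compute_quotas(project.id))
```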

14 Why does an end user need IHEPCloud?
- Virtual testing machine:
  - Develop programs or run tests
  - A VM is created in a few minutes
  - Log in to the VM via SSH, VNC, or remote desktop
- Virtual UI node:
  - Debug programs in the computing environment
  - Shared login nodes: lxslcxx.ihep.ac.cn
  - Login nodes have limits on memory, CPU time, user processes, …: if cputime > 45m and %CPU > 60%, the process is killed; users are also affected by other users
  - VMs are owned by a single user and have no such limitations

15 Virtual computing cluster
- If a job queue is busy, jobs can be allocated to a virtual queue (junoq: 128 CPU cores)
- We plan to put the service into production this year
- Workflow: a job is submitted; the cloud scheduler checks the queue load; if needed, IHEPCloud starts (and later stops) VMs in the virtual queue and the job is forwarded there (a rough sketch of this loop follows)
- For details, see Haibo's talk
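A minimal sketch of the cloud-scheduler loop described above, assuming Torque/PBS on the physical side and openstacksdk on the cloud side. The qstat parsing, the worker naming scheme, and the thresholds are illustrative placeholders, not IHEP's actual implementation.

```python
import subprocess
import time
import openstack

QUEUE = "junoq"
MAX_VIRTUAL_WORKERS = 16

def queued_jobs(queue):
    """Rough count of waiting jobs in a Torque/PBS queue via `qstat`."""
    out = subprocess.run(["qstat", queue], capture_output=True, text=True).stdout
    # In default qstat output the job state ("Q" = queued) is the next-to-last column.
    return sum(1 for line in out.splitlines()
               if len(line.split()) >= 6 and line.split()[-2] == "Q")

def virtual_workers(conn):
    """Virtual worker nodes we manage, identified by a naming convention."""
    return [s for s in conn.compute.servers() if s.name.startswith("vq-worker-")]

def scheduler_loop():
    conn = openstack.connect(cloud="ihepcloud")  # placeholder cloud name
    while True:
        waiting = queued_jobs(QUEUE)
        workers = virtual_workers(conn)
        if waiting > 0 and len(workers) < MAX_VIRTUAL_WORKERS:
            # Physical queue is busy: boot an extra worker
            # (see the create_server sketch earlier in this deck).
            pass
        elif waiting == 0:
            for s in workers:  # queue has drained: give the resources back
                conn.compute.delete_server(s.id)
        time.sleep(60)

if __name__ == "__main__":
    scheduler_loop()
```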

16 Distributed computing
- The distributed computing system has integrated cloud resources based on the pilot schema, implementing dynamic scheduling
- The cloud resources in use shrink and grow dynamically according to job requirements
- Lifecycle (diagram): users submit jobs; the distributed computing system creates VMs in the cloud; each VM fetches and runs jobs; once no jobs remain and the running jobs have finished, the VMs are deleted (a pilot-side sketch follows)
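A sketch of the pilot side of this lifecycle: the VM keeps asking the central task queue for work and asks to be removed once it has been idle for a while. fetch_job() and request_shutdown() are hypothetical placeholders standing in for the DIRAC pilot machinery.

```python
import time

def fetch_job():
    """Ask the central task queue for a matching job; return None when empty."""
    ...  # placeholder: in DIRAC this is handled by the pilot / job agent

def request_shutdown():
    """Tell the VM manager that this instance is idle and can be deleted."""
    ...  # placeholder: e.g. a status update that the VM scheduler acts on

def pilot_loop(idle_limit=600):
    idle_since = None
    while True:
        job = fetch_job()
        if job is not None:
            idle_since = None
            job.run()               # execute the payload
        else:
            idle_since = idle_since or time.time()
            if time.time() - idle_since > idle_limit:
                request_shutdown()  # no more work: let the cloud shrink
                return
            time.sleep(30)
```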

17 Cloud sites
- 5 cloud sites from Torino, JINR, CERN, and IHEP have been set up and connected to the distributed computing system
- In total about 320 CPU cores, 400 GB of memory, and 10 TB of disk

18 Cloud tests
- More than 4,500 jobs have been run, with a 96% success rate
- The main failure cause was lack of disk space
- Disk space will be extended in the IHEP cloud

19 Performance and physics validation
- Performance tests have shown that running times at the cloud sites are comparable to other production sites (simulation, reconstruction, downloading random-trigger data)
- Physics validation has shown that physics results are highly consistent between the clusters and the cloud sites

20 Architecture
Diagram: OpenStack (Dashboard and APIs) at the core, with CEPH as backend storage; a NetworkDB holding network information; DIRAC and the virtual cluster driving the cloud through the API; LDAP and UMT (IHEP email) authentication, interoperating with UMT (CAS Cloud); Puppet configuration management registering VMs; DNS registration; Nagios host and service monitoring; and log analysis.

21 IHEPCloud components
- Core middleware: OpenStack, an open-source cloud management system and the most popular choice
- Configuration management tool: Puppet, used to create VM images, manage applications inside VMs, and keep VMs consistent with the computing environment
- Authentication: IHEP email account and password; AFS authentication for UI nodes
- Network management: a central NetworkDB records MAC, IP, hostname, user, …; each VM gets a hostname under *.v.ihep.ac.cn; network traffic accounting (a hypothetical bookkeeping sketch follows)
- External storage: currently images and instances are stored on local disk; CEPH is being evaluated as a backend for Glance, Nova, and Cinder
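To illustrate the NetworkDB bookkeeping described above, here is a hypothetical sketch that walks the VMs through the OpenStack API and records MAC, IP, hostname, and owner in a local table, so that DNS registration and traffic accounting can use it. The SQLite backend and the table layout are assumptions for illustration only, not IHEP's actual NetworkDB.

```python
import sqlite3
import openstack

conn = openstack.connect(cloud="ihepcloud")  # placeholder cloud name

db = sqlite3.connect("networkdb.sqlite")     # assumed stand-in for the real NetworkDB
db.execute("""CREATE TABLE IF NOT EXISTS vm_network
              (hostname TEXT, mac TEXT, ip TEXT, owner TEXT)""")

for server in conn.compute.servers():
    # server.addresses maps network name -> list of address records
    for records in server.addresses.values():
        for rec in records:
            db.execute(
                "INSERT INTO vm_network VALUES (?, ?, ?, ?)",
                (f"{server.name}.v.ihep.ac.cn",
                 rec.get("OS-EXT-IPS-MAC:mac_addr"),
                 rec["addr"],
                 server.user_id),
            )
db.commit()
```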

22 Network in IHEPCloud
- Multiple IP subnets on one physical machine
- VLAN mode (L2 only, no router) in OpenStack Neutron; the IP gateway and 802.1Q tagging are handled by the hardware switch (a provider-network sketch follows)
- Problems with trunking: large MAC tables, pre-configuration required, risk of broadcast storms
- Future network: core layer VXLAN (hardware-based); access layer: OpenStack VLAN mode
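A hedged sketch of creating such an access-layer network with openstacksdk: a provider network in VLAN mode whose gateway lives on the hardware switch rather than on a Neutron router. The physical-network label, VLAN ID, and subnet values are placeholders.

```python
import openstack

conn = openstack.connect(cloud="ihepcloud")  # placeholder cloud name

# Provider VLAN network: Neutron handles only L2; 802.1Q tagging is done by the switch.
net = conn.network.create_network(
    name="vlan-202-122-35",
    provider_network_type="vlan",
    provider_physical_network="physnet1",  # placeholder provider label
    provider_segmentation_id=35,           # placeholder 802.1Q VLAN ID
)

# No Neutron router: the gateway address is owned by the hardware switch.
subnet = conn.network.create_subnet(
    network_id=net.id,
    ip_version=4,
    cidr="202.122.35.0/24",
    gateway_ip="202.122.35.1",
)
print(net.id, subnet.cidr)
```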

23 IHEPCloud current status
- Released on 18 November 2014, built on OpenStack Icehouse
- 1 control node and 7 computing nodes: 112 physical CPU cores / 224 virtual CPU cores and 896 GB of memory in total
- As of 11 March: 96 active VMs using 172 CPU cores and 628 GB of memory

24 Conclusion
- Cloud computing is widely accepted in both industry and science
- Scientific computing is preparing the move to the cloud
- IHEPCloud aims to provide a self-service virtual machine platform for IHEP users
- IHEPCloud also supports virtual clusters and distributed computing
- A small cloud platform has been built and opened to IHEP users free of charge
- More resources (1,000+ CPU cores) will be added to IHEPCloud this year
- We are investigating Shibboleth to build a federated cloud

25 Thank you! Any Questions?

