1
Cloud Updates at IN2P3-CC for the ATLAS VO (FR-cloud Regional Centers Meeting)
Vamvakopoulos Emmanouil, University of Tokyo, 10 Dec 2014
2
Existing OpenStack clusters
Deployment: Scientific Linux 6; packages from Griddynamics, then EPEL, and now RDO; configured with Puppet.
Core services / hosting cluster:
- DELL C6100 and R610 compute nodes (400 cores for test/dev VMs, 100 cores for infrastructure services)
- DELL PE R720xd (30 TB of Cinder volumes, analogous to Amazon EBS; see the sketch after this list)
- DELL PE R720xd with GPFS for shared instance storage
Computing cluster: 128 DELL M610 compute nodes (recycled batch worker nodes).
Capacity: ~1000 VMs (1 core, 20 GB disk, 6 GB RAM).
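As a concrete illustration of the Cinder/EBS analogy, a minimal sketch using the OpenStack CLI of that era; the volume name, size and IDs are placeholders, not the site's actual values:

    # Create a 10 GB Cinder volume (block storage, analogous to Amazon EBS).
    cinder create --display-name atlas-scratch 10
    # Attach it to a running instance as /dev/vdb.
    nova volume-attach <instance-id> <volume-id> /dev/vdb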
3
OpenStack context diagram
4
Implementing OpenStack HA
Keepalived → front-end high-availability failover (using IPVS/VRRP).
HAProxy → load balancing among redundant services:
- API services (keystone, glance, nova, cinder, neutron, ...)
- schedulers
- glance registry
DBMS: Galera cluster. Message queue: RabbitMQ cluster.
The HA front end (Keepalived + HAProxy) exposes virtual public IPs: cckeystone.in2p3.fr, ccnovaapi.in2p3.fr, cccinderapi.in2p3.fr.
→ No SPOF; the workload is load-balanced across the OpenStack controller services (see the sketch below).
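A minimal configuration sketch of the Keepalived + HAProxy pair, assuming illustrative addresses (192.0.2.x) and a keystone API on port 5000; the production files at IN2P3-CC will differ:

    # VRRP failover of the virtual public IP (e.g. the IP behind cckeystone.in2p3.fr).
    cat > /etc/keepalived/keepalived.conf <<'EOF'
    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 51
        priority 100
        virtual_ipaddress {
            192.0.2.10
        }
    }
    EOF

    # Round-robin load balancing across two redundant keystone API backends.
    cat >> /etc/haproxy/haproxy.cfg <<'EOF'
    listen keystone-api
        bind 192.0.2.10:5000
        balance roundrobin
        server controller1 192.0.2.11:5000 check
        server controller2 192.0.2.12:5000 check
    EOF

    service keepalived restart && service haproxy restart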
5
Interfacing Cloud VMs with the Grid System
VO needs and objective: ATLAS would like to use opportunistic cloud resources, in particular for simulation (single- and multi-core jobs). The goal is to deploy virtual machines as «worker nodes» on a cloud IaaS (Infrastructure as a Service) and to interface those VMs with the grid system.
See also: "Computing on the grid and in the cloud", chaired by Laurence Field (CERN), Tuesday 22 July 2014, 16:00-17:30 (Europe/Zurich), CERN IT Amphitheatre.
6
ATLAS Cloud news and updates
ATLAS Cloud Operations group: deploy the available, established solutions in order to bridge grid computing and cloud computing, and to ensure smooth operation of cloud resources.
7
Review: context diagram
BNL Elastic Cluster; BNL/Amazon AWS pilot.
8
ATLAS Cloud operation recommendations
Cloud operations:
- Use CernVM (3.x)
- Use standard cloud-init contextualization templates (see the sketch below)
- Publish to the federated Ganglia monitoring tool (central auto-listener)
- «CloudScheduler» is not 100% mandatory
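A minimal contextualization sketch, assuming a #cloud-config user-data file that installs and starts Condor and the Ganglia agent; the package names, the Condor head node and the boot command line are illustrative, not the site's actual template:

    cat > userdata.yaml <<'EOF'
    #cloud-config
    packages:
      - condor
      - ganglia-gmond
    runcmd:
      # Point the worker at a (hypothetical) central Condor pool.
      - echo "CONDOR_HOST = condor-head.example.org" >> /etc/condor/condor_config.local
      - service condor start
      - service gmond start
    EOF
    # Boot a worker-node VM with this user data (image/flavor names illustrative).
    nova boot --image cernvm3 --flavor m1.xlarge --user-data userdata.yaml atlas-wn-01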
9
ATLAS Ganglia monitoring
10
WLCG Cloud Usage Resource dashboard
Mixed sources (VO WMS / EGI APEL / Ganglia monitoring).
11
Cloud resources integration at IN2P3-CC
IN2P3-CC-T3 → an ATLAS site (in AGIS) without its own DDM endpoint: it uses the T1's storage (dCache).
PanDA queues: IN2P3-CC-T3-VM01 (online) and ANALYSIS_IN2P3-CC-T3-VM01 (offline).
Separate views in the Historical View and the SSB dashboard.
12
BNL Elastic Cluster solution: APF + Condor EC2
The HammerCloud test framework plays the user. PanDA, the VO WMS, holds the activated jobs (ready to dispatch). APF + Condor EC2 submits pilots and VMs so that the number of running jobs in the Condor cluster tracks the number of activated jobs (see the sketch below).
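A sketch of the kind of Condor-G EC2 submit file that AutoPyFactory drives; the endpoint URL, AMI id and credential paths are placeholders, not BNL's actual settings:

    cat > atlas-vm.sub <<'EOF'
    universe              = grid
    grid_resource         = ec2 https://cloud.example.org:8773/services/Cloud
    executable            = atlas-worker-vm
    ec2_ami_id            = ami-00000001
    ec2_instance_type     = m1.xlarge
    ec2_access_key_id     = /home/apf/ec2.access
    ec2_secret_access_key = /home/apf/ec2.secret
    ec2_user_data_file    = userdata.yaml
    queue
    EOF
    condor_submit atlas-vm.sub

In the EC2 grid universe the executable line is only a label; the VM image named by ec2_ami_id is what actually runs.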
13
Benchmark: ATLAS MC
Hardware: Intel Xeon, 16 CPUs (hyper-threading), 48 GB RAM, 160 GB scratch. ATLAS MC jobs were run both in VMs* and on the real worker nodes (CERN-P1 stress test, mc12).
Workload: AtlasG4_trf (512 events), AtlasProduction/17.2.2, dataset mc12_8TeV Herwigpp_pMSSM_DStau_MSL_120_M1_000.evgen.EVNT.e1707_tid…
16 concurrent jobs on both partitions, for ~7 cycles.
Wall-clock overhead: ~30%.
* VM flavour: 16 vCPUs / 32 GB RAM / 160 GB scratch.
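For clarity, the quoted figure is presumably the relative wall-clock penalty of the virtualized partition (an assumption; the slide states only the result):

    overhead = (T_VM - T_bare) / T_bare ≈ 0.30

with T the average wall-clock time per job on each partition.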
14
Benchmark: HepSpec06
Integer benchmarks: 471.omnetpp (C++, discrete event simulation), 473.astar (C++, path-finding algorithms), 483.xalancbmk (C++, XML processing).
Floating-point benchmarks: 444.namd (C++, biology/molecular dynamics), 447.dealII (C++, finite element analysis), 450.soplex (C++, linear programming/optimization), 453.povray (C++, image ray-tracing).
Total overhead (on the HepSpec06 mark): ~17%. The individual wall-clock overheads per HepSpec06 component exhibit large dispersion: floating-point vs. integer application behaviour?
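For reference, HepSpec06 is the all_cpp subset of SPEC CPU2006 run with the HEPiX configuration; a hedged sketch of the usual invocation (flags approximate, check the official HS06 recipe):

    # Run the C++ subset of SPEC CPU2006 with the HEPiX/CERN config file.
    runspec --config=linux32-gcc_cern.cfg --nobuild --noreportable all_cpp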
15
Memory performance and NUMA topology
Intel Xeon E5540 (8 MB cache, 2.53 GHz, 5.86 GT/s Intel QPI).
«Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor.»
Demonstration of the non-uniform memory access pattern, e.g.:
numactl --membind 1 --physcpubind 0 ./ramsmp -b 6 -m 32 -p 1 -l 3 (*)
Local → ~… MB/s; remote → ~… MB/s (*)
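A minimal sketch contrasting local and remote placement with the same RAMspeed/SMP run as on the slide (binary path and flags taken from the slide; node numbering assumes a two-socket host):

    # Local access: CPU 0 and memory both on node 0 (same socket).
    numactl --membind 0 --physcpubind 0 ./ramsmp -b 6 -m 32 -p 1 -l 3
    # Remote access: CPU 0 on node 0, memory forced onto node 1 (across QPI).
    numactl --membind 1 --physcpubind 0 ./ramsmp -b 6 -m 32 -p 1 -l 3
    # Inspect the node/CPU/memory layout of the host.
    numactl --hardware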
16
Next Steps
- Verify the benchmark on the latest hardware
- Optimize memory performance: e.g. NUMA topology vs. the size of the image
- Use the global validation kit (Alessandro De Salvo)
- Deploy the CernVM 3.x image solution (PaaS)
- Establish a local fair-share policy
- Finalize the last operational details: logging and activity tracking
17
Backup Slides
18
KVM test + NUMA
virsh numatune shows each guest's memory pinned to a single NUMA node:
  # virsh numatune cloudatlas
  numa_mode    : strict
  numa_nodeset : 0
  # virsh numatune cloudatlas2
  numa_mode    : strict
  numa_nodeset : 1
The per-node process memory usage (in MB) of the two qemu-kvm processes (PIDs 17998 and 18039) and the virsh vcpuinfo CPU/VCPU mapping confirm the placement.
Enforcement is via cgroups: the memory and cpuset controllers.
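A sketch of the commands that produce such a pinning (domain names from the slide; using --config so the setting applies at the next boot is a judgment call, since live re-pinning of strict-mode memory can be restricted):

    # Pin each guest's memory allocation to one NUMA node.
    virsh numatune cloudatlas  --mode strict --nodeset 0 --config
    virsh numatune cloudatlas2 --mode strict --nodeset 1 --config
    # Verify the vCPU-to-physical-CPU placement after (re)boot.
    virsh vcpuinfo cloudatlas | grep 'CPU:'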
19
Image software stack and size
Built with BoxGrinder: SL 6.5, HEP OS libraries, CVMFS, HTCondor; cloud-init with Condor and Ganglia modules.
VO profile from CVMFS; site configuration from CVMFS/AGIS; valid defaults only for CC-IN2P3.
Image flavour: 32 GB RAM, 16 vCPUs, 160 GB scratch space.
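A hedged sketch of a BoxGrinder appliance definition of the sort that could produce such an image; the package list and names are illustrative, not the site's actual definition:

    cat > atlas-wn.appl <<'EOF'
    name: atlas-wn
    os:
      name: sl          # Scientific Linux
      version: 6
    hardware:
      memory: 2048
    packages:
      - cvmfs
      - condor
      - ganglia-gmond
      - cloud-init
    EOF
    # Build the image (platform/delivery plugin flags omitted; see the BoxGrinder docs).
    boxgrinder-build atlas-wn.appl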
20
CernVM 3 and µCernVM
«µCernVM is the heart of the CernVM 3 virtual appliance. It is based on Scientific Linux 6 combined with a custom, virtualization-friendly Linux kernel. This image is also fully RPM based; you can use yum and rpm to install additional packages.»
- Small image size, ~20 MB
- Includes CVMFS, Condor, Ganglia
- Contextualization: HEPiX, EC2, cloud-init
- Stripped kernel with the latest paravirtualized drivers
- Versioned OS on CernVM-FS
- Supports system updates (via UnionFS)
J. Blomer et al., 2014, J. Phys.: Conf. Ser., "Micro-CernVM: slashing the cost of building and deploying virtual machines".
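One way to register and boot the ~20 MB µCernVM image on OpenStack, reusing the cloud-init user data sketched earlier; the image file, image and flavor names are placeholders:

    # Register the µCernVM boot image with Glance (v1 CLI of that era).
    glance image-create --name ucernvm-3 --disk-format raw \
        --container-format bare --file ucernvm-prod.hdd
    # Boot a worker node from it, contextualized via cloud-init user data.
    nova boot --image ucernvm-3 --flavor m1.xlarge --user-data userdata.yaml cvm-wn-01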
21
OpenStack architecture (Folsom)