Presentation is loading. Please wait.

Presentation is loading. Please wait.

Elastic Computing Resource Management Based on HTCondor

Similar presentations


Presentation on theme: "Elastic Computing Resource Management Based on HTCondor"— Presentation transcript:

1 Elastic Computing Resource Management Based on HTCondor
Qiulan Huang On behalf of IHEP Computer Center CHEP2016 San Francisco,

2 Elastic Computing Resource Management Based on HTCondor
Outline Background VCondor: Elastic Computing Resource Management System Based on HTCondor Running Status and Performance Analysis in IHEPCloud Summary 2016/10/12 Elastic Computing Resource Management Based on HTCondor

3 Traditional Architecture of HEP Computing Cluster
Cluster Resource Provided by Independent HEP Application Multiple Independent Application vs Multiple Independent Computing Queue 2016/10/12 Elastic Computing Resource Management Based on HTCondor

4 Existing Trouble in Traditional HEP Cluster Computing
Computing queues don’t share resource Different queues have different peak times of resource usage Massive jobs with a little resource or vice versa The overall resource utilization is low Virtual Computing Cluster Elastic computing resource management Elastic resource pool, scale up and down dynamically according to HEP Application jobs to improve resource utilization 2016/10/12 Elastic Computing Resource Management Based on HTCondor

5 Openstack/OpenNebula
Run jobs on Virtual Computing Cluster A virtual layer is constructed between the physical machines and the RMS (resource management system such as HTCondor or PBS) In the case of free resources in the pool, a job queue can use more than it owns WLCG Grid RMS VCondor/VPBS Virtual machines Openstack/OpenNebula Dedicated SGE working physical nodes VMM VMM VMM VMM Physical machines 2016/10/12 Elastic Computing Resource Management Based on HTCondor

6 Resource pool expansion
Condor Job Submit Node Openstack OpenNebula SN Job Compute Job 7 WN Condor Worker Node Compute Job Transfer VM Virtual Machine VCondor Workflow 5 VCondor 8 VM Creation 6 4 VM SN 1 3 VMQuota Job LHAASO 9 2 SN Job Job Job Job Job Job 11 10 WN WN JUNO Central Manager Job Job CONDOR POOL OF JOBS CONDOR POOL OF NODES 2016/10/12 Elastic Computing Resource Management Based on HTCondor

7 Elastic Computing Resource Management Based on HTCondor
Resource pool shrink Condor Job Submit Node Compute Job Openstack OpenNebula SN Job 7 WN Condor Worker Node Compute Job Transfer VM Virtual Machine VCondor Workflow 5 VCondor 8 VM Deletion 6 4 VM 3 VMQuota SN LHAASO 9 1 SN 2 Job Job Job Job Job 10 WN WN JUNO Central Manager 11 CONDOR POOL OF JOBS CONDOR POOL OF NODES 2016/10/12 Elastic Computing Resource Management Based on HTCondor

8 Elastic Computing Resource Management Based on HTCondor
VCondor’s Components Job submission (1)vcondor: JobMonitor: query and record job information and queue length changes NodeManager: use openstack api to create and destroy virtual machines DAEMON: Main module, periodically executed (2)VMQuota: Computing resource share management system 2016/10/12 Elastic Computing Resource Management Based on HTCondor

9 (1) JobMonitor: Analyze resource requirements
IHEPCloud: Virtual computing cluster 28 computing nodes,672 CPU cores HTCondor version:8.2.5 Provide support for IHEP experiment such as LHAASO, JUNO, BES, CEPC accelerator design 2016/10/12 Elastic Computing Resource Management Based on HTCondor

10 (2)NodeManager: VM creation and deletion
VM Class —Store vm properties, such as hostname, job queue name, idle or busy Icluster Abstract Class —Implementation of the interface support a variety of cloud computing platforms, like Openstack, OpenNebula, AWS EC2 Openstack Cluster Class —Communicate with the controller Nova through a REST client to create or destroy VMs on Openstack ResourcePool Class —Provide support for resource management across multiple cloud platforms Openstack VM Manager Create/Delete VM VM info <object> <object> REST Client Post,Put,Delete Get REST API Nova Keystone Neutron …… OpenStack VM VM VM VM 2016/10/12 Elastic Computing Resource Management Based on HTCondor

11 (3)VMQuota: Resource share management
Set up virtual computing queues for each HEP application, like LHAASO and JUNO Manage the upper thresholds, lower thresholds, and resource reservation time for each queue Queue Lower thresholds Upper thresholds Available resource Resource reservation time(seconds) LHAASO 100 400 200 600 JUNO 300 vcondor [{“ResID”:”juno”}] VMQuota Sending Requests Socket Socket Sending Replies [{“ResID”:”juno”,”MIN”:100,”AVAILABLE”:50}] VMQuota 2016/10/12 Elastic Computing Resource Management Based on HTCondor

12 Dynamic Scheduling Effect
Upper thresholds Jobs queuing, automatically adding virtual computing nodes Lower thresholds LHAASO Resource Pool: Automatically Scale up and down on demand 2016/10/12 Elastic Computing Resource Management Based on HTCondor

13 Elastic Computing Resource Management Based on HTCondor
VCondor:How to use Download VCondor from Make sure HTCondor and Openstack or OpenNebula are well configured Setup a VM Image with Condor installed Setup a VM Template with Image in the above The VCondor configuration file allows you to configure most of its functionality, open it up to get a usable installation Start VCondor and submit jobs, resource pool scale up and down dynamically 2016/10/12 Elastic Computing Resource Management Based on HTCondor

14 Elastic Computing Resource Management Based on HTCondor
Summary VCondor enables elastic resource management Has begun trial operation in IHEPCloud Current experiment: LHAASO, JUNO Provide support for HEP application resource sharing plan Future Achieve preemptive scheduling based on HTCondor Improve scheduling algorithm further: 目前的调度算法是FCFS,比较简单 Welcome to Contact us ! 2016/10/12 Elastic Computing Resource Management Based on HTCondor


Download ppt "Elastic Computing Resource Management Based on HTCondor"

Similar presentations


Ads by Google