1 Building on virtualization capabilities for ExTENCI
Carol Song and Preston Smith
Rosen Center for Advanced Computing, Purdue University
ExTENCI Kickoff Meeting, Fermilab, August 19, 2010

2 Overview
Develop and deploy virtualization technologies at TeraGrid and OSG sites to support scientific applications.
Joint effort: Purdue (TG) and Clemson (OSG); 0.5 FTE at Purdue
Leverage Purdue's work on virtualization under TeraGrid
Support science with virtualization tools and resources:
–CMS end users
–CMS Tier-3 administrators
–STAR end users
–TeraGrid end users
–TeraGrid resource providers
–Other applications that come along the way

3 Proposed work
(1) Virtual machine configuration and deployment at OSG and TeraGrid sites to enable interoperation
VO-configured virtual machines at Purdue and Clemson
STAR experiment
–Its data processing framework has recently been packaged into a Xen virtual machine
–Has been demonstrated running on Amazon's EC2 and on the Science Clouds running the Nimbus software
–Currently running at Clemson with its own application environment packaged as a virtual machine
–Under ExTENCI, we will deploy the STAR VM at Purdue to provide a STAR cloud that interoperates across TG and OSG (see the sketch below)
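The STAR cloud work builds on a VM image that already runs under EC2-style interfaces. As an illustration only (not the project's actual tooling), the Python sketch below shows how such a VO-configured image might be launched against a Nimbus cloud's EC2-compatible endpoint with boto; the endpoint, port, credentials, image ID, and instance type are all placeholders.

```python
# Illustrative only: launching a VO-configured VM through an EC2-compatible
# (Nimbus-style) endpoint with boto. All names/values below are placeholders.
import boto
from boto.ec2.regioninfo import RegionInfo

region = RegionInfo(name="nimbus", endpoint="cloud.example.edu")  # placeholder host
conn = boto.connect_ec2(
    aws_access_key_id="ACCESS_KEY",          # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
    is_secure=True,
    port=8444,                               # assumed EC2-interface port
    region=region,
    path="/",
)

# "emi-star" stands in for whatever image ID the STAR VM is registered under.
reservation = conn.run_instances(
    image_id="emi-star",
    min_count=1,
    max_count=4,
    instance_type="m1.large",
)
for inst in reservation.instances:
    print("%s %s" % (inst.id, inst.state))
```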

4 CMS experiment
–Virtualize a CMS-capable worker node
  The software stack is large
  Existing work by Vanderbilt, targeting EC2
–Nodes look like a CMS site (such as Purdue or Vanderbilt), fronted by their own CE
–Provision CMS VM worker nodes at Clemson to demonstrate how a grid site can gain additional resources on the cloud
  Conference deadlines
  High-priority or time-critical simulation or reconstruction

5 Hypervisor deployment on a large scale alongside traditional cluster operations
–Currently tens to several hundred hypervisors at sites (e.g., Purdue, Clemson, U of Wisconsin-Madison, UC, Fermilab), but nothing close to the overall size of the clusters on TG or OSG
–We will demonstrate how this can be done by planning and executing the deployment of a large cluster with hypervisors
–Purdue will deploy thousands of hypervisors on campus teaching lab systems and research clusters
  Enables applications, including STAR and other TeraGrid applications, to run at a much larger scale than they can currently, potentially enabling additional resource providers to join the TG/OSG interoperation cloud (see the inventory sketch below)
–Ongoing work at Clemson, plus international efforts (see Sebastien's presentation)
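To convey what operating that many hypervisors alongside normal cluster operations involves, here is a minimal inventory sketch, assuming the lab and cluster hypervisors run KVM managed by libvirt and are reachable over SSH; the host names are invented for illustration and are not the project's actual machines.

```python
# Minimal capacity/inventory check across a set of KVM hypervisors via libvirt.
# Assumption: libvirtd runs on each host and SSH access is configured.
import libvirt

HYPERVISORS = ["lab-hv001.example.edu", "lab-hv002.example.edu"]  # placeholders

for host in HYPERVISORS:
    uri = "qemu+ssh://%s/system" % host
    try:
        conn = libvirt.openReadOnly(uri)
    except libvirt.libvirtError as err:
        print("%s: unreachable (%s)" % (host, err))
        continue
    model, mem_mb, cpus = conn.getInfo()[:3]   # arch, memory (MB), logical CPUs
    running = conn.numOfDomains()              # VMs currently active
    print("%s: %d CPUs, %d MB RAM, %d VMs running" % (host, cpus, mem_mb, running))
    conn.close()
```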

6 (2) Virtualization to reduce Tier-3 site support costs
Many LHC institutions want to provide resources as a Tier-3 site
–Not every group can/should/wants to run a cluster and CE/SE!
–Working with OSG and US-CMS Tier-3 support staff to:
  Provide pre-configured CMS virtual machines for Tier-3 sites to access CMS data and run analysis and other applications without dedicated IT support personnel
–Assumption: VM deployment is already supported by most IT departments today
–Provide pre-packaged middleware systems that can be deployed on campuses to connect everyone to the national CI

7 Two types of CMS VM appliances will be created
–CMS user interface appliance (see the sketch below)
  Connect to CMS data and tools from the user's own computer (e.g., running Windows)
  –CRAB client, DBS client, VDT client, etc.
–CMS Tier-3 compute appliance
  Much like the CMS VM mentioned earlier
  Investigate using a VM appliance to allow a site to either:
  –Virtualize a CE
  –Provide virtual worker nodes in the cloud, running in a VM hypervisor, e.g., Xen, KVM, VMware, or VirtualBox
   »Minimal support needed by the site!
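For the user-interface appliance, the goal is that an end user simply imports and boots it on their own machine. Below is a minimal sketch with VirtualBox, assuming the appliance is distributed as an OVA; the file name and VM name are placeholders, not the project's actual artifacts.

```python
# Illustrative "import and boot" of a pre-packaged appliance with VirtualBox.
# Assumes VBoxManage is on the PATH; names below are placeholders.
import subprocess

APPLIANCE = "cms-ui-appliance.ova"   # hypothetical appliance file name
VM_NAME = "cms-ui"                   # hypothetical VM name embedded in the OVA

# Register the appliance with VirtualBox.
subprocess.check_call(["VBoxManage", "import", APPLIANCE])

# Boot it without opening a console window.
subprocess.check_call(["VBoxManage", "startvm", VM_NAME, "--type", "headless"])
```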

8 Demonstrate how virtualization can help turn Windows resources into useful scientific computing engines by deploying VMware and/or VirtualBox on Windows resources at Purdue and Clemson
–TG work is already under way at Purdue demonstrating how VMware-based Linux VMs can run on Windows lab machines and join a campus grid using virtual networks
–Will expand on this to build a Windows-based cloud across TeraGrid (Purdue) and OSG (Clemson)

9 (3) Extending access to larger resources through virtualization
Leverage Wispy, an experimental cloud resource on TeraGrid
Will extend the Nimbus cloud computing solution with "pilot" codes to enable a resource provider to use large Linux clusters and their batch systems as a cloud infrastructure
–Both PBS clusters and large Condor pools (using Condor's Virtual Machine Universe) will be connected to Nimbus resources with the pilot code
–This work will provide a much larger pool of resources accessible to applications such as the STAR and CMS experiments, improving the productivity of research communities
Will also create and test a TeraGrid VM appliance to run on a non-TeraGrid system (Clemson) to investigate the potential of smaller sites interoperating with the TeraGrid

10 Applications
–Testing will include the STAR experiment using the Nimbus EC2-like interface to submit the VO-configured VM to run on this cloud
–Coordinate with the US-CMS Monte Carlo production coordinator to use the Nimbus EC2-like capability to provide additional resources for time-critical data simulation
–Simulation of the entire human body arterial tree (Brown U)
–The build and deployment process will be documented and shared with TeraGrid and OSG resource providers

11 Related work on TG/OSG interoperability
Wispy
–128 cores (32 nodes with 4 cores and 16 GB RAM per node)
–Nimbus 2.4, KVM hypervisor
Purdue presented the "Wispy" Nimbus cloud at the 2009 OSG All-Hands meeting
–Also, a Purdue staff member led a hands-on tutorial for 20 participants at the OSG Site Administrator's workshop focused on virtual machines and cloud infrastructure
Bringing "OSG jobs" to TG resources
–Purdue assisted OSG engagement users in using the Steele TeraGrid resource to run "high-throughput HPC" (HTHPC) jobs: single-node, 8-CPU tasks
–This effort leverages previous Purdue and OSG work to enable easier submission of MPI jobs; these OSG HTHPC jobs have been successfully run on TG Steele. E.g., a total of 1,765 OSG MPI jobs consumed 138,122 wall-clock hours on Steele from April to June 2010

12 Purdue added Condor VM universe support to its Condor pool, and has deployed Linux VMs onto a select set of lab machines
–These changes allow OSG users to utilize Windows systems without porting their Linux codes, and to submit their own VMs to the Condor pool (see the submit sketch below)
Purdue staff were involved in a joint OSG/TeraGrid activity to provide a standard method for OSG sites to advertise MPI capabilities, and subsequently compile and execute MPI codes
–Purdue completed work with the Open Science Grid to provide this capability
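To illustrate what submitting a user-supplied VM to the pool's VM universe looks like from the user side, here is a minimal sketch assuming a KVM-based VM universe; the image name, memory size, and file names are placeholders rather than Purdue's actual configuration.

```python
# Minimal sketch: write a Condor VM-universe submit description and submit it.
# Assumption: the pool's VM universe is KVM-based; all names are placeholders.
import subprocess

submit_description = """\
universe      = vm
vm_type       = kvm
vm_memory     = 1024
vm_networking = true
vm_disk       = my-analysis-vm.img:hda:w
executable    = my-analysis-vm
log           = my-analysis-vm.log
queue
"""

with open("vm_job.submit", "w") as f:
    f.write(submit_description)

# Hand the job to the local Condor scheduler.
subprocess.check_call(["condor_submit", "vm_job.submit"])
```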

