Globus Virtual Workspaces OOI Cyberinfrastructure Design Meeting, San Diego, 17-19 October Kate Keahey University of Chicago Argonne National Laboratory.

1 Globus Virtual Workspaces
OOI Cyberinfrastructure Design Meeting, San Diego, 17-19 October
Kate Keahey, University of Chicago, Argonne National Laboratory
keahey@mcs.anl.gov

2 Why Virtual Workspaces?
(Slide footer: 10/18/07, ORION meeting. Virtual Workspaces: http://workspace.globus.org)
1) Configuration: finding an environment tailored to my application
2) Leasing: negotiating a resource allocation tailored to my needs

3 Why Virtual Workspaces: Challenges
- Quality of Service
  - We get: best-effort provisioning (one size fits all)
  - We need: advance reservations, urgent computing, periodic, best-effort, and others
- Quality of Life
  - Commonly heard: “I have 512 nodes I cannot use”
  - We need nodes we can use
- Separating environment/resource provisioning from job execution is simply a good idea
  - E.g. workflow-based applications

4 Elastic Computing
- Leases with clearly defined and enforceable service terms
  - When you need them and how you need them
- A variety of lease shapes
  - Short-term as well as long-term leases
  - No “one size fits all”: suitable availability
- Extending, reducing, renegotiating leases based on need
- Workspaces: resources you can use
  - Configured with your environment

5 What are Virtual Workspaces?
- A dynamically provisioned environment
  - Environment definition: we get exactly the (software) environment we need on demand.
  - Resource allocation: provision the resources the workspace needs (CPUs, memory, disk, bandwidth, availability), allowing for dynamic renegotiation to reflect changing requirements and conditions.
- Implementation
  - Traditional means: publishing, automated configuration, coarse-grained enforcement
  - Virtual Machines: encapsulated configuration and fine-grained enforcement
Paper: “Virtual Workspaces: Achieving Quality of Service and Quality of Life in the Grid”

6 Virtual Machines
[Diagram: hardware; above it a Virtual Machine Monitor (VMM) / hypervisor; above that, VMs with guest OSes (Linux, NetBSD, Windows) running applications. VMMs named: Xen, VMware, UML, KVM, Parallels, etc.]
- Encapsulate the environment
- Fast to deploy, enables short-term leasing
- Excellent enforcement and performance isolation
- Very good isolation
- Also: suspend/resume -> migration

7 Deploying Workspaces Remotely
[Diagram: the VWS Service in front of a pool of twelve pool nodes]
Workspace
- Workspace metadata
- Pointer to the image
- Logistics information
- Deployment request
- CPU, memory, node count, etc.
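The items listed on slide 7 can be pictured as two small records handed to the service. The following is only a sketch: the class names, field names, the grouping of the image pointer and logistics under "metadata", and the dictionary layout are all assumptions made for illustration, not the workspace service's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceMetadata:
    """What to deploy (hypothetical field names)."""
    image_uri: str   # pointer to the VM image
    logistics: dict  # logistics information; contents are illustrative


@dataclass(frozen=True)
class DeploymentRequest:
    """Resources to deploy onto (hypothetical field names)."""
    cpus: int
    memory_mb: int
    node_count: int

    def validate(self) -> None:
        # Reject requests that ask for zero or negative resources.
        if min(self.cpus, self.memory_mb, self.node_count) < 1:
            raise ValueError("cpus, memory_mb and node_count must be positive")


def build_request(meta: WorkspaceMetadata, req: DeploymentRequest) -> dict:
    """Combine both records into one illustrative request document."""
    req.validate()
    return {
        "metadata": {"image": meta.image_uri, "logistics": meta.logistics},
        "deployment": {
            "cpus": req.cpus,
            "memory_mb": req.memory_mb,
            "node_count": req.node_count,
        },
    }
```

A caller would build one `WorkspaceMetadata` and one `DeploymentRequest`, then pass the result of `build_request` to whatever client talks to the service.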

8 Interacting with Workspaces
[Diagram: the VWS Service and a pool of twelve pool nodes, with the Trusted Computing Base (TCB) marked]
The workspace service publishes information on each workspace as standard WSRF Resource Properties.
Users can query those properties to find out information about their workspace (e.g. what IP the workspace was bound to).
Users can interact directly with their workspaces the same way they would with a physical machine.

9 Workspace Service Components
[Diagram: the VWS Service and a pool of twelve pool nodes, with the Trusted Computing Base (TCB) marked]
- VWS Service: Workspace WSRF front-end that allows clients to deploy and manage virtual workspaces
- Resource manager for a pool of physical nodes: deploys and manages workspaces on the nodes
- Each node must have a VMM (Xen) installed, along with the workspace backend (the workspace control program, software that manages individual nodes)
- Contextualization creates a common context for a virtual cluster

10 Workspace Service Components
- GT4 WSRF front-end
  - Leverages GT core and services, notifications, security, etc.
  - Follows the OGF WS-Agreement provisioning model
- Publishes available lease terms
  - Provides lease descriptions
- Workspace Resource Manager (back-end)
  - Currently focused on Xen
  - Implements multiple deployment modes
- Contextualization
  - Put the virtual appliance in its deployment context
- Current release 1.2.3, available at:
  - http://workspace.globus.org

11 Workspace Resource Managers
- Default resource manager (basic slot fitting)
  - Commercial datacenter technology would also fit
- Amazon Elastic Compute Cloud (EC2)
  - Selling cycles as Xen VMs
  - Software similar to Workspace Service
- No virtual clusters, contextualization, fine-grain allocations, etc.
  - Grid credential admission -> EC2 charging model
- Workspace Pilot
  - Integrating VMs into current provisioning models (essentially a PBS glidein)
- Long-term solutions
  - Interleaving soft and hard leases
  - Providing better articulated leasing models
  - Developed in the context of existing schedulers
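Slide 11 describes the default resource manager only as "basic slot fitting". As a rough illustration of what slot fitting can mean, the sketch below places VM memory requests on pool nodes with a first-fit rule. The first-fit policy, the memory-only model, and the function name are assumptions; the slides do not say which algorithm the service uses.

```python
from typing import List, Optional


def first_fit(free_memory_mb: List[int], vm_requests_mb: List[int]) -> List[Optional[int]]:
    """Place each VM request on the first node with enough free memory.

    Returns one entry per request: the chosen node index, or None if no
    node can hold the request.
    """
    remaining = list(free_memory_mb)  # copy so the caller's list is untouched
    placement: List[Optional[int]] = []
    for need in vm_requests_mb:
        chosen = None
        for node, free in enumerate(remaining):
            if free >= need:
                chosen = node
                remaining[node] -= need  # reserve the slot on that node
                break
        placement.append(chosen)
    return placement
```

For example, with two nodes offering 2048 MB and 1024 MB, three 1024 MB requests fit and a fourth 512 MB request is turned away because both nodes are full by then.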

12 Providing Resources: The Workspace Pilot
- Challenge: find the simplest way to integrate VMs into current provisioning models
- Glide-ins (Condor): poor man’s resource leasing
  - Best-effort semantics: submit a job “pilot” that claims resources but does not run a job
- The Workspace Pilot
  - Resources run dom0
  - Pilot adjusts memory
  - VWS leases “slots” to VMs
  - Kill-all facility

13 Workspace Resource Management (long-term solutions)
- Challenge: How can we provide semantically rich leases in a cost-effective way?

14 [Figure: two schedules, each containing a SHORT-TERM LEASE: scheduling the lease without using virtualization, and scheduling the lease using virtualization]

15 Interleaving Soft and Hard Leases
Injected leases are short (1h-2h), very frequent (every 4 to 8 hours), and large (between 1/3 and 1/2 of the cluster's nodes).
Not using VMs (even with backfilling) results in a noticeable hit on runtime. In this case, the scheduler cannot readily start large parallel jobs because of the resource leases. With VMs, these jobs can be started and then suspended before the leases start.
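The argument on slide 15 can be made concrete with a toy model. It is not the experiment behind the slide: it assumes a single job submitted at time 0 that needs nodes every lease also needs, leases that recur at a fixed period, and zero suspend/resume overhead. The function and parameter names are invented for illustration.

```python
from typing import Optional


def finish_time(job_h: float, first_lease_h: float, lease_h: float,
                period_h: float, can_suspend: bool) -> Optional[float]:
    """Finish time (hours) of a job submitted at t=0 around periodic leases.

    Leases start at first_lease_h, first_lease_h + period_h, ... and each
    lasts lease_h hours. Returns None if the job can never be scheduled.
    """
    if period_h <= lease_h:
        raise ValueError("period_h must exceed lease_h")
    gap_h = period_h - lease_h  # free window between consecutive leases
    if job_h <= first_lease_h:
        return job_h  # finishes before the first lease begins
    if not can_suspend:
        # The job must run uninterrupted, so it needs a whole free window.
        if job_h <= gap_h:
            return first_lease_h + lease_h + job_h
        return None
    # With suspension: run until the first lease, then resume in each gap.
    remaining = job_h - first_lease_h
    t = first_lease_h + lease_h  # first lease ends, job resumes
    while remaining > gap_h:
        remaining -= gap_h
        t += period_h  # sit out the next lease, resume afterwards
    return t + remaining
```

With a 1-hour lease every 4 hours starting at hour 2, a 6-hour job never fits without suspension because no free window is long enough, while a suspendable job runs in the gaps and finishes at hour 8.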

16 Middleware Development: Contextualization
- Challenge: putting a VM in the deployment context of the Grid, site, and other VMs
  - Assigning and sharing IP addresses, name resolution, application-level configuration, etc.
- Management of Common Context
  - Configuration-dependent
- provides & requires
  - Common understanding between the image “vendor” and deployer
  - Mechanisms for securely delivering the required information to images across different implementations
[Diagram: a contextualization agent and the Common Context (IP, hostname, pk)]
Paper: “A Scalable Approach To Deploying And Managing Appliances”, TeraGrid conference 2007
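One way to read "provides & requires" on slide 16 is as a matching step: each appliance declares what it provides and what it requires, and a common context is assembled from the provided values. The sketch below illustrates only that matching idea. The data layout, key names, and function name are assumptions, and the secure delivery mechanism mentioned on the slide is not modeled.

```python
from typing import Dict, List, Tuple


def resolve_context(appliances: Dict[str, dict]) -> Tuple[dict, Dict[str, List[str]]]:
    """Match each appliance's requirements against what others provide.

    appliances maps a name to {"provides": {key: value}, "requires": [key, ...]}.
    Returns (resolved, missing): resolved[name][key] maps providers to values,
    and missing[name] lists required keys that nobody provides.
    """
    # Build the common context: key -> {provider name: value}.
    common: Dict[str, Dict[str, str]] = {}
    for name, spec in appliances.items():
        for key, value in spec["provides"].items():
            common.setdefault(key, {})[name] = value

    resolved: Dict[str, dict] = {}
    missing: Dict[str, List[str]] = {}
    for name, spec in appliances.items():
        resolved[name] = {k: common[k] for k in spec["requires"] if k in common}
        lacking = [k for k in spec["requires"] if k not in common]
        if lacking:
            missing[name] = lacking
    return resolved, missing
```

A deployer could refuse to start a virtual cluster while `missing` is non-empty, since some image would boot without information it declared as required.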

17 Workspace Ecosystem
- Resource Providers: local clusters; Grid resource providers (TeraGrid, OSG); commercial providers (EC2, Sun, slicehost). Provisioning a resource, not a platform.
- Appliance Providers: OSFarm, rPath, CohesiveFT, bcfg2, etc.; marketplaces of all kinds
- Virtual Organizations: configuration, attestation, maintenance
- Middleware: appliances --> resources; manage appliance deployment; combining networks and storage
- VWS, EC2, In-Vigo

18 STAR: Why Workspaces?
- STAR: a high-energy nuclear physics application
- Complex experimental application codes
  - Developed over more than 10 years by more than 100 scientists; comprises ~2 M lines of C++ and Fortran code
- Require complex, customized environments
  - Rely on the right combination of compiler versions and available libraries
  - Dynamically load external libraries depending on the task to be performed
- Environment validation
  - To ensure reproducibility and result uniformity across environments
  - Regression tests cannot be done on all OS flavors due to simple manpower considerations

19 Virtual Workspaces for STAR
- STAR image configuration
  - A virtual cluster composed of an OSG headnode and STAR worker nodes
- Using the workspace service over EC2 to provision resources
  - Allocations of up to 100 nodes
  - Dynamically contextualized for an out-of-the-box cluster

20 [Animated figure: “Running jobs” counters for the sites PDSF, Fermi, VWS/EC2, BNL and WSU, with “Job Completion” and “File Recovery” indicators]
With thanks to Jerome Lauret and Doug Olson of the STAR project

21 [Figure: accelerated display of a workflow job state at NERSC PDSF, EC2 (via Workspace Service) and WSU. Y = job number, X = job state]
With thanks to Jerome Lauret and Doug Olson of the STAR project

22 Parting Thoughts
- Division of labor
  - Resource providers
  - Appliance vendors
- Outsourcing of resource provisioning
  - We need rich lease semantics
- Moving toward a Grid where your environment can run on the available resources
  - Configure once, run many times
- Local resources, Grid resources, commercially available resources

23 Credits
- Workspace team:
  - Tim Freeman
  - Borja Sotomayor
- Guest appearances
  - Ian Foster, Frank Siebenlist
- STAR collaborators:
  - Jerome Lauret (BNL), Doug Olson (LBNL)

