Division of Labor: Tools for Growing and Scaling Grids Tim Freeman, Kate Keahey, Ian Foster, Abhishek Rana, Frank Wuerthwein, Borja Sotomayor.

12/05/06 ICSOC 06

Division of Labor

"The greatest improvements in the productive powers of labour, and the greater part of the skill, dexterity, and judgment with which it is anywhere directed, or applied, seem to have been the effects of the division of labour." (Adam Smith)

How can we implement division of labor in Grid computing?
- Requirements for an abstraction
- Tools to implement an abstraction

Overview
- Problem Definition
  - The Edge Service Use Case
- Workspace Service
  - Overview of the workspace service
  - Extensions to the workspace service
- Implementation and Evaluation
  - CPU enforcement
  - Network enforcement
- Status of the Edge Services Project
- Conclusions

Providers and Consumers

Resource provider:
- Has a limited number of resources
- Has to balance the software needs of multiple users
- Has to provide a limited execution environment for security reasons

Resource consumers:
- Want the resources when they need them, and as much as they need
- Want to use specific software packages
- Want as much control as possible over resources

The Edge Service Use Case

Edge Services: Challenges
- VO-specific Edge Services
  - Each VO has very specific configuration requirements
- Resource management
  - The VOs would like to provide quality of service to their users
  - The resource needs of the VOs change dynamically
- Dynamic, policy-based deployment and management of Edge Services
  - Updates, ephemeral edge services, infrastructure testing, short-term usage

Division of Labor Dimensions
- Environment and configuration
- Isolation
  - Critical from the provider's point of view if the VOs are to be allowed some independence
- Resource usage and accounting
  - Application-independent
  - Management along different resource aspects
  - Dynamically renegotiable/adaptable

Overview
- Problem Definition
  - The Edge Service Use Case
- Workspace Service
  - Overview of the workspace service
  - Extensions to the workspace service
- Implementation and Evaluation
  - CPU enforcement
  - Network enforcement
- Status of the Edge Services Project
- Conclusions

GT4 Workspace Service
- The GT4 Virtual Workspace Service (VWS) allows an authorized client to deploy and manage workspaces on demand
  - GT4 WSRF front-end
  - Leverages multiple GT services
- Currently implements workspaces as VMs
  - Uses the Xen VMM, but others could also be used
- Current release: December 2006

Workspace Service Usage Scenario

[Diagram: the VWS service node and an image node sit alongside a pool of nodes inside the Trusted Computing Base (TCB)]

- The workspace service has a WSRF front-end that allows users to deploy and manage virtual workspaces
- The VWS manages a set of nodes inside the TCB (typically a cluster); this is called the node pool
- Each node must have a VMM (Xen) installed, along with the workspace backend (software that manages individual nodes)
- VM images are staged to a designated image node inside the TCB

Deploying Workspaces

[Diagram: the VWS service deploys a workspace (workspace metadata + resource allocation) from the image node onto pool nodes]

- Adapter-based implementation model
  - Transport adapters: default scp, then GridFTP
  - Control adapters: default ssh (deprecated: PBS, SLURM)
  - VW deployment adapter: Xen (previous versions: VMware)

Interacting with Workspaces

- The workspace service publishes information on each workspace as standard WSRF Resource Properties
- Users can query those properties to find out information about their workspace (e.g., what IP the workspace was bound to)
- Users can interact directly with their workspaces the same way they would with a physical machine

Deployment Request Arguments
- A workspace, composed of:
  - VM image
  - Workspace metadata
    - XML document
    - Includes deployment-independent information: VMM and kernel requirements, NICs + IP configuration, VM image location
    - Need not change between deployments
- Resource allocation
  - Specifies availability, memory, CPU %, disk
  - Changes during or between deployments
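As a sketch of the two halves of a deployment request (not the actual GT4 XML schema; all field names here are illustrative):

```python
from dataclasses import dataclass

@dataclass
class WorkspaceMetadata:
    """Deployment-independent description of a workspace (illustrative fields)."""
    vmm_requirement: str     # e.g. the required Xen version
    kernel_requirement: str  # guest kernel requirement
    nics: list               # NIC + IP configuration entries
    image_location: str      # where the VM image is staged

@dataclass
class ResourceAllocation:
    """Deployment-specific allocation; may change during or between deployments."""
    memory_mb: int
    cpu_percent: int
    disk_gb: int
    duration_s: int          # availability window

@dataclass
class DeploymentRequest:
    metadata: WorkspaceMetadata
    allocation: ResourceAllocation

# Example request using the allocation figures shown later in the talk.
req = DeploymentRequest(
    WorkspaceMetadata("Xen 3.x", "2.6 guest kernel",
                      [{"nic": "eth0", "mode": "dhcp"}],
                      "gsiftp://image-node/images/vo1-ce.img"),
    ResourceAllocation(memory_mb=896, cpu_percent=45, disk_gb=4, duration_s=3600),
)
```

The point of the split is that the metadata can be reused verbatim across deployments while the allocation is renegotiated per deployment.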

Workspace Service Interfaces

- Workspace Factory Service: create()
  - Handles creation of workspaces: a client presents workspace metadata/image and a resource allocation, which the factory authorizes and instantiates
  - Also publishes information on what types of workspaces it can support
- Workspace Resource instance: inspect, manage, notify
  - Handles management of each created workspace (start, stop, pause, migrate, inspecting VW state, ...)
  - Resource Properties publish the assigned resource allocation, how the VW was bound to its metadata (e.g., IP address), duration, and state
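The factory/instance pattern above can be sketched in a few lines; this is an illustrative stand-in, not the real WSRF API (class and method names are invented):

```python
class WorkspaceResource:
    """Stand-in for one created workspace resource."""
    def __init__(self, metadata, allocation):
        self.metadata = metadata
        self.allocation = allocation
        self.state = "propagated"   # staged to a node, not yet running

    def start(self):
        self.state = "running"

    def pause(self):
        self.state = "paused"

    def stop(self):
        self.state = "shutdown"

    def resource_properties(self):
        # What a client would query: allocation, binding (e.g. IP), state
        return {"allocation": self.allocation,
                "ip": self.metadata.get("ip"),
                "state": self.state}

class WorkspaceFactory:
    """Stand-in factory: authorizes a request and instantiates a resource."""
    def __init__(self, supported_vmms=("Xen",)):
        self.supported_vmms = supported_vmms

    def create(self, metadata, allocation):
        if metadata["vmm"] not in self.supported_vmms:
            raise ValueError("unsupported VMM")
        return WorkspaceResource(metadata, allocation)

factory = WorkspaceFactory()
ws = factory.create({"vmm": "Xen", "ip": "10.0.0.5"}, {"cpu_percent": 45})
ws.start()
```

The separation mirrors the slide: the factory only creates and advertises; all lifecycle operations go to the per-workspace resource.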

Extensions to Resource Allocation

Overview
- Problem Definition
  - The Edge Service Use Case
- Workspace Service
  - Overview of the workspace service
  - Extensions to the workspace service
- Implementation and Evaluation
  - CPU resource allocation
  - Network resource allocation
- Status of the Edge Services Project
- Conclusions

Edge Services Today

[Diagram: jobs from VO1 and VO2 arrive at a single Compute Element; per-VO throughput in jobs per minute (jpm) is shown]

- Both VOs share the same resource
- Compute Element (CE) implemented as GT4 GRAM
- Job throughput is low, as both VOs are equally impacted by the high VO1 traffic

Allocating Resources for Edge Services

[Diagram: VO1 and VO2 each run GRAM in a separate workspace; each workspace's resource allocation: MEM 896 MB, CPU 45%, CPU arch AMD Athlon; Dom0 CPU: 10%; per-VO throughput in jobs per minute (jpm) is shown]

- Job throughput for VO2 is high, as it is unimpacted by the high VO1 traffic
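Note that the allocation in this scenario partitions the physical CPU completely: 45% per VO workspace plus 10% for dom0. A tiny budget check (illustrative helper, not part of the workspace service):

```python
def cpu_budget_ok(guest_allocations, dom0_percent):
    """A node's guest CPU allocations plus dom0's share must fit in 100%."""
    return sum(guest_allocations) + dom0_percent <= 100

# The slide's allocation: 45% + 45% + 10% = 100%, exactly filling the node.
assert cpu_budget_ok([45, 45], dom0_percent=10)
# Raising dom0's share without shrinking the guests would overcommit.
assert not cpu_budget_ok([45, 45], dom0_percent=20)
```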

Tracking Requests Over Time

- Histogram of request throughput
- Resource usage is enforced on an as-needed basis

Increasing Load on VO1

- Histogram of request throughput
- The load on VO1 is increased 2x and 3x
- Request throughput for VO2 is unimpacted

Network Resource Allocation

- Processing network traffic requires CPU
- In Xen, this holds for both dom0 and the guest domains
  - CPU allocation tradeoffs
  - Scheduling frequency
- The mechanism is general
  - Save for direct drivers

[Diagram: traffic at bandwidth B passes through dom0 to domU]

Network Resource Allocation

- Network allocation implementation
  - CPU allocations based on a parameter sweep, chosen to come close to the maximum bandwidth
  - Linux network shaping tools
- Negotiating network resource allocations
  - Policy: accept only CPU allocations that match the bandwidth
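A minimal sketch of such an admission policy, assuming a bandwidth-to-CPU mapping obtained from the parameter sweep (the first two table entries use the 4.1 MB/s ↔ 6% and 8.2 MB/s ↔ 14% figures from the following slides; the helper functions are illustrative):

```python
# Parameter-sweep results: incoming bandwidth (MB/s) -> minimum guest CPU
# share (%) observed to sustain it. The third entry is invented padding.
SWEEP = [(4.1, 6), (8.2, 14), (12.3, 24)]

def min_cpu_for_bandwidth(bandwidth_mbs):
    """Smallest measured CPU share that sustains the requested bandwidth."""
    for bw, cpu in SWEEP:
        if bandwidth_mbs <= bw:
            return cpu
    raise ValueError("requested bandwidth exceeds measured capacity")

def accept(requested_bw, requested_cpu):
    """Policy: accept only CPU allocations that match the bandwidth."""
    return requested_cpu >= min_cpu_for_bandwidth(requested_bw)

# A 4.1 MB/s request backed by a 6% CPU share is admitted;
# the same bandwidth with only a 3% share is refused.
assert accept(4.1, 6)
assert not accept(4.1, 3)
```

Rejecting mismatched pairs up front is what keeps the shaped bandwidth limit and the CPU share consistent, so neither resource silently throttles the other.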

Storage Element (SE) Edge Service

[Diagram: VO1 and VO2 each run a GridFTP server in a separate workspace; each workspace's resource allocation: MEM 128 MB, CPU 6%, CPU arch AMD Athlon, NIC incoming 4.1 MB/s; Dom0 CPU: 22%]

Negotiating Bandwidth

Renegotiating CPU and Bandwidth

[Diagram: one GridFTP workspace's allocation is renegotiated from CPU 6% and incoming 4.1 MB/s to CPU 14% and incoming 8.2 MB/s, while the other workspace keeps CPU 6% and 4.1 MB/s; Dom0 CPU: 22%]
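The renegotiation step can be sketched as an update that is only committed if the new CPU share still matches the new bandwidth (the figures come from the slide; the helper function is illustrative):

```python
def renegotiate(allocation, new_cpu, new_bw, min_cpu_for_bw):
    """Commit a renegotiated (CPU, bandwidth) pair only if the policy holds.

    allocation:     dict with 'cpu_percent' and 'incoming_mbs'
    min_cpu_for_bw: callable mapping bandwidth (MB/s) -> required CPU share (%)
    """
    if new_cpu < min_cpu_for_bw(new_bw):
        raise ValueError("CPU share too small for requested bandwidth")
    allocation["cpu_percent"] = new_cpu
    allocation["incoming_mbs"] = new_bw
    return allocation

# Figures from the slide: doubling incoming bandwidth from 4.1 to 8.2 MB/s
# requires raising the guest CPU share from 6% to 14% (more than double,
# since packet processing costs CPU in both dom0 and the guest).
vo_alloc = {"cpu_percent": 6, "incoming_mbs": 4.1}
renegotiate(vo_alloc, new_cpu=14, new_bw=8.2,
            min_cpu_for_bw=lambda bw: 6 if bw <= 4.1 else 14)
```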

Renegotiating CPU and Bandwidth

Renegotiating CPU

[Diagram: the renegotiated workspace's CPU share is raised again from 14% to 34%, with incoming bandwidth unchanged at 8.2 MB/s; the other workspace keeps CPU 6% and 4.1 MB/s; Dom0 CPU: 22%]

Renegotiating CPU

Edge Services: Status

- OSG activity
- Edge Services in use (database caches)
  - ATLAS: mysql-gsi DB built by the DASH project
  - CMS: Frontier database
- Base image library
  - SDSC: SL3.0.3, FC4, CentOS 4.1
  - FNAL: SL3.0.3, SL4, LTS 3, LTS 4
- Sites
  - Production: SDSC
  - Also testing at FNAL, UC, and ANL

Related Work

- Edge Service efforts
  - VO boxes, EGEE
  - APAC, static Edge Services
  - Grid-Ireland, static Edge Services
- OGF efforts: WS-Agreement, JSDL
- Managed services
- QoS with Xen
  - Padma Apparo, Intel (VTDC paper)
  - Rob Gardner & team, HP
  - Credit-based scheduler
- Grid computing and virtualization
  - Work at the University of Florida, Purdue, Northwestern, Duke, and others

Conclusions

- VM-based workspaces are a promising tool for implementing division of labor
- Renegotiation is an important resource management tool
  - Protocols
  - Enforcement methods: dynamic reallocation, migration, etc.
- Aggregate resource allocations
  - Different resource aspects influence each other
- More work on managing VM resources is needed