Division of Labor: Tools for Growing and Scaling Grids Tim Freeman, Kate Keahey, Ian Foster, Abhishek Rana, Frank Wuerthwein, Borja Sotomayor.

1 Division of Labor: Tools for Growing and Scaling Grids Tim Freeman, Kate Keahey, Ian Foster, Abhishek Rana, Frank Wuerthwein, Borja Sotomayor

2 12/05/06 ICSOC 06 Division of Labor
How can we implement division of labor in Grid computing?
"The greatest improvement in the productive powers of labour, and the greater part of the skill, dexterity, and judgment with which it is anywhere directed, or applied, seem to have been the effects of the division of labour." (Adam Smith)
- requirements for an abstraction
- tools to implement an abstraction

3 Overview
- Problem Definition
  - The Edge Service Use Case
- Workspace Service
  - Overview of the workspace service
  - Extensions to workspace service
- Implementation and Evaluation
  - CPU enforcement
  - Network Enforcement
- Status of the Edge Services Project
- Conclusions

5 Providers and Consumers
Resource provider:
- Has a limited number of resources
- Has to balance the software needs of multiple users
- Has to provide a limited execution environment for security reasons
Resource consumers:
- Want the resources when they need them and as much as they need
- Want to use specific software packages
- Want as much control as possible over resources

6 The Edge Service Use Case

7 Edge Services: Challenges
- VO-specific Edge Services
  - Each VO has very specific configuration requirements
- Resource management
  - The VOs would like to provide quality of service to their users
  - The resource needs of the VOs change dynamically
- Dynamic, policy-based deployment and management of Edge Services
  - Updates, ephemeral edge services, infrastructure testing, short-term usage

8 Division of Labor Dimensions
- Environment and Configuration
- Isolation
  - Critical from the point of view of the provider if the VOs are to be allowed some independence
- Resource usage and accounting
  - Application-independent
  - Management along different resource aspects
  - Dynamically renegotiable/adaptable

10 GT4 workspace service
- The GT4 Virtual Workspace Service (VWS) allows an authorized client to deploy and manage workspaces on demand.
  - GT4 WSRF front-end
  - Leverages multiple GT services
  - Currently implements workspaces as VMs
    - Uses the Xen VMM but others could also be used
  - Current release (December, 06)

11 Workspace Service Usage Scenario
[Diagram labels: VWS Node, VWS Service, Image Node, Pool node (repeated), Trusted Computing Base (TCB)]
- The workspace service has a WSRF frontend that allows users to deploy and manage virtual workspaces
- The VWS manages a set of nodes inside the TCB (typically a cluster). This is called the node pool.
- Each node must have a VMM (Xen) installed, along with the workspace backend (software that manages individual nodes)
- VM images are staged to a designated image node inside the TCB

12 Deploying Workspaces
[Diagram labels: VWS Service, Image Node, Pool node (repeated), Workspace (workspace metadata, resource allocation)]
- Adapter-based implementation model
  - Transport adapters
    - Default scp, then gridftp
  - Control adapters
    - Default ssh
    - Deprecated: PBS, SLURM
  - VW deployment adapter
    - Xen
    - Previous versions: VMware
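
The adapter-based model on this slide can be pictured with a small sketch. Everything named here (`TransportAdapter`, `stage_command`, the `TRANSPORTS` registry, the image path and node name) is a hypothetical illustration, not the workspace service's actual code; only the adapter names scp and gridftp, and scp being the default, come from the slide. The adapters only build a command line and do not run it.

```python
from typing import Dict, List, Protocol


class TransportAdapter(Protocol):
    """Anything that can build a command staging a VM image onto a pool node."""

    def stage_command(self, image_path: str, node: str) -> List[str]: ...


class ScpTransport:
    def stage_command(self, image_path: str, node: str) -> List[str]:
        # Assumption: the image keeps the same path on the pool node.
        return ["scp", image_path, f"{node}:{image_path}"]


class GridFtpTransport:
    def stage_command(self, image_path: str, node: str) -> List[str]:
        # globus-url-copy is the GridFTP command-line client; the URLs are illustrative.
        return ["globus-url-copy", f"file://{image_path}", f"gsiftp://{node}{image_path}"]


# Hypothetical registry keyed by a configuration name.
TRANSPORTS: Dict[str, TransportAdapter] = {
    "scp": ScpTransport(),
    "gridftp": GridFtpTransport(),
}


def select_transport(name: str = "scp") -> TransportAdapter:
    """Return the configured transport adapter; scp is the default, as on the slide."""
    try:
        return TRANSPORTS[name]
    except KeyError:
        raise ValueError(f"unknown transport adapter: {name}") from None


print(select_transport().stage_command("/images/vo1.img", "node07"))
```

Control and deployment adapters could follow the same shape: a small interface plus a registry, so that swapping ssh for a batch system, or Xen for another VMM, would not touch the service logic.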

13 Interacting with Workspaces
[Diagram labels: VWS Service, Image Node, Pool node (repeated), Trusted Computing Base (TCB)]
- The workspace service publishes information on each workspace as standard WSRF Resource Properties. Users can query those properties to find out information about their workspace (e.g. what IP the workspace was bound to)
- Users can interact directly with their workspaces the same way they would with a physical machine.

14 Deployment Request Arguments
- A workspace, composed of:
  - VM image
  - Workspace metadata
    - XML document
    - Includes deployment-independent information:
      - VMM and kernel requirements
      - NICs + IP configuration
      - VM image location
    - Need not change between deployments
- Resource Allocation
  - Specifies availability, memory, CPU%, disk
  - Changes during or between deployments
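
One way to picture the two request arguments is as a pair of records, one stable across deployments and one that varies. The sketch below is an assumption-laden illustration: the class and field names, the units, and the sample values other than 896 MB and 45% CPU (which appear on a later slide) are hypothetical, and the real metadata is an XML document rather than a Python object.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class WorkspaceMetadata:
    """Deployment-independent description; need not change between deployments."""
    vmm: str                 # VMM requirement, e.g. "xen"
    kernel: str              # kernel requirement (hypothetical free-form string)
    nics: Tuple[str, ...]    # NIC and IP configuration entries
    image_location: str      # where the VM image lives


@dataclass(frozen=True)
class ResourceAllocation:
    """Per-deployment resources; may change during or between deployments."""
    availability_minutes: int
    memory_mb: int
    cpu_percent: int
    disk_mb: int

    def __post_init__(self) -> None:
        if not 0 < self.cpu_percent <= 100:
            raise ValueError("cpu_percent must be in (0, 100]")
        if self.availability_minutes <= 0 or self.memory_mb <= 0 or self.disk_mb < 0:
            raise ValueError("availability and memory must be positive, disk non-negative")


@dataclass(frozen=True)
class DeploymentRequest:
    metadata: WorkspaceMetadata
    allocation: ResourceAllocation


meta = WorkspaceMetadata("xen", "2.6", ("eth0: dhcp",), "/images/vo1.img")
first = DeploymentRequest(meta, ResourceAllocation(60, 896, 45, 2048))
# A later deployment reuses the metadata and changes only the allocation.
second = replace(first, allocation=replace(first.allocation, cpu_percent=30))
print(second.metadata is first.metadata, second.allocation.cpu_percent)
```

Keeping the metadata immutable mirrors the slide's point that it need not change between deployments, while the allocation is the part that gets replaced.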

15 Workspace Service Interfaces
[Diagram labels: Workspace Factory Service, Workspace Service, Create(), Workspace Meta-data/Image, Resource Allocation, authorize & instantiate, inspect & manage, notify, Workspace Resource Instance]
- Workspace Factory Service: handles creation of workspaces. Also publishes information on what types of workspaces it can support
- Workspace Service: handles management of each created workspace (start, stop, pause, migrate, inspecting VW state, ...)
- Resource Properties publish the assigned resource allocation, how VW was bound to metadata (e.g. IP address), duration, and state
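
The split between a factory that authorizes and instantiates, and a per-workspace resource that is inspected and managed, can be sketched as follows. This is a minimal in-memory model under stated assumptions: the state names, the transition table, the client name, and the example IP address are hypothetical, and the real interfaces are WSRF web services rather than Python classes.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, Set

# Assumed management operations and the state each leads to.
TRANSITIONS = {
    ("stopped", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "start"): "running",
    ("running", "stop"): "stopped",
    ("paused", "stop"): "stopped",
}


@dataclass
class WorkspaceInstance:
    workspace_id: int
    cpu_percent: int
    ip_address: str
    state: str = "stopped"

    def manage(self, operation: str) -> None:
        """Apply a management operation, rejecting ones invalid in the current state."""
        key = (self.state, operation)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {operation} a {self.state} workspace")
        self.state = TRANSITIONS[key]

    def resource_properties(self) -> Dict[str, object]:
        """What a client could query: allocation, binding, and state."""
        return {
            "cpu_percent": self.cpu_percent,
            "ip_address": self.ip_address,
            "state": self.state,
        }


@dataclass
class WorkspaceFactory:
    authorized_clients: Set[str]
    _ids: Iterator[int] = field(default_factory=lambda: count(1), init=False, repr=False)

    def create(self, client: str, cpu_percent: int, ip_address: str) -> WorkspaceInstance:
        """Authorize the client, then instantiate a workspace resource."""
        if client not in self.authorized_clients:
            raise PermissionError(f"{client} may not create workspaces")
        return WorkspaceInstance(next(self._ids), cpu_percent, ip_address)


factory = WorkspaceFactory({"vo1-admin"})
workspace = factory.create("vo1-admin", 45, "192.0.2.10")
workspace.manage("start")
print(workspace.resource_properties())
```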

16 Extensions to Resource Allocation

17 Overview
- Problem Definition
  - The Edge Service Use Case
- Workspace Service
  - Overview of the workspace service
  - Extensions to workspace service
- Implementation and Evaluation
  - CPU resource allocation
  - Network resource allocation
- Status of the Edge Services Project
- Conclusions

18 Edge Services Today
[Diagram labels: GRAM, VO1, VO2; throughput labels only partly recovered: "jpm", "8 jpm"]
- Compute Element (CE) implemented as GT GRAM
- Both VOs share the same resource
- Job throughput is low as both VOs are equally impacted by the high VO1 traffic

19 Allocating Resources for Edge Services
[Diagram labels: Workspace Service, GRAM, GRAM, VO1, VO2; throughput labels only partly recovered: "4.18 jpm", "jpm"]
- Job throughput for VO2 is high as it is unimpacted by the high VO1 traffic
- Resource Allocation (listed twice, identically): MEM: 896 MB; CPU %: 45%; CPU arch: AMD Athlon
- Dom0 CPU %: 10%

20 Tracking Requests Over Time
- Histogram of request throughput
- Resource usage is enforced on an as-needed basis

21 Increasing Load on VO1
- Histogram of request throughput
- The load on VO1 increases 2x and 3x
- Request throughput for VO2 is unimpacted

22 Network Resource Allocation
[Diagram labels: B, dom0, domU]
- Processing network traffic requires CPU
- In Xen: for both dom0 and guest domains
  - CPU allocation tradeoffs
  - Scheduling frequency
- The mechanism is general
  - Save for direct drivers

23 Network Resource Allocation
- Network Allocation Implementation
  - CPU allocations based on a parameter sweep
    - Close to maximum bandwidth
  - Linux network shaping tools
- Negotiating network resource allocations
  - Policy: accepting only CPU allocations that match the bandwidth
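
The negotiation policy above could be read as a table lookup: a parameter sweep yields the CPU share needed to sustain a given bandwidth, and a request is accepted only if its CPU share covers its bandwidth. The sketch below assumes that reading. The function names are hypothetical, and the two table entries are borrowed from the allocation pairs shown on the Storage Element slides (4.1 MB/s with 6% CPU, 8.2 MB/s with 14% CPU); they stand in for sweep results that the slides do not list.

```python
from typing import List, Tuple

# (incoming bandwidth in MB/s, CPU % assumed sufficient to sustain it).
# Illustrative stand-ins for the results of a parameter sweep.
SWEEP: List[Tuple[float, int]] = [(4.1, 6), (8.2, 14)]


def required_cpu(bandwidth_mb_s: float) -> int:
    """CPU % of the smallest sweep point that covers the requested bandwidth."""
    for swept_bandwidth, cpu_percent in sorted(SWEEP):
        if bandwidth_mb_s <= swept_bandwidth:
            return cpu_percent
    raise ValueError("requested bandwidth exceeds the swept range")


def accept(bandwidth_mb_s: float, cpu_percent: int) -> bool:
    """Policy sketch: accept only if the CPU share can sustain the bandwidth."""
    return cpu_percent >= required_cpu(bandwidth_mb_s)


for bandwidth, cpu in [(4.1, 6), (8.2, 6), (8.2, 14)]:
    print(bandwidth, cpu, accept(bandwidth, cpu))
```

Under this reading, a request for 8.2 MB/s with only 6% CPU would be refused, because the guest could not process traffic at that rate.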

24 Storage Element (SE) Edge Service
[Diagram labels: Workspace Service, VO1 GridFTP, VO2 GridFTP, VO1, VO2]
- Resource Allocation (listed twice, identically): MEM: 128 MB; CPU %: 6%; CPU arch: AMD Athlon; NIC incoming: 4.1 MB/s
- Dom0 CPU %: 22%

25 Negotiating Bandwidth

26 Renegotiating CPU and Bandwidth
[Diagram labels: Workspace Service, VO1 GridFTP, VO2 GridFTP]
- Resource Allocation: MEM: 128 MB; CPU %: 6%; CPU arch: AMD Athlon; NIC incoming: 4.1 MB/s
- Resource Allocation: MEM: 128 MB; CPU %: 6%; CPU arch: AMD Athlon; NIC incoming: 4.1 MB/s
- Resource Allocation: MEM: 128 MB; CPU %: 14%; CPU arch: AMD Athlon; NIC incoming: 8.2 MB/s
- Dom0 CPU %: 22%

27 Renegotiating CPU and Bandwidth

28 Renegotiating CPU
[Diagram labels: Workspace Service, VO1 GridFTP, VO2 GridFTP]
- Resource Allocation: MEM: 128 MB; CPU %: 6%; CPU arch: AMD Athlon; NIC incoming: 4.1 MB/s
- Resource Allocation: MEM: 128 MB; CPU %: 14%; CPU arch: AMD Athlon; NIC incoming: 8.2 MB/s
- Resource Allocation: MEM: 128 MB; CPU %: 34%; CPU arch: AMD Athlon; NIC incoming: 8.2 MB/s
- Dom0 CPU %: 22%
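
The renegotiation steps on these slides (6% to 14% to 34% CPU, with dom0 holding 22%) can be replayed against a simple per-node budget. The sketch below is an illustration under assumptions: the `NodeBudget` class and its 100% ceiling are hypothetical, and the slides do not state which workspace's share is being raised, so the labels "vo1" and "vo2" are guesses.

```python
from typing import Dict


class NodeBudget:
    """Tracks CPU shares on one node; dom0's share is reserved up front."""

    def __init__(self, dom0_percent: int) -> None:
        self.dom0_percent = dom0_percent
        self.shares: Dict[str, int] = {}

    def total(self) -> int:
        return self.dom0_percent + sum(self.shares.values())

    def renegotiate(self, workspace: str, cpu_percent: int) -> bool:
        """Set a new share if the node stays within 100%; otherwise keep the old one."""
        proposed = self.total() - self.shares.get(workspace, 0) + cpu_percent
        if cpu_percent <= 0 or proposed > 100:
            return False
        self.shares[workspace] = cpu_percent
        return True


node = NodeBudget(dom0_percent=22)
node.renegotiate("vo1", 6)
node.renegotiate("vo2", 6)
node.renegotiate("vo2", 14)   # bandwidth doubled, CPU share raised to match
node.renegotiate("vo2", 34)   # CPU raised again at the same bandwidth
print(node.shares, node.total())
```

A request that would push the node past its budget is refused and the previous share is left in place, which is the simplest possible enforcement choice; the slides also mention migration as an alternative.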

29 Renegotiating CPU

30 Edge Services: Status
- OSG activity
- Edge Services in use (database caches)
  - ATLAS: mysql-gsi db built by the DASH project
  - CMS: frontier database
- Base Image library
  - SDSC: SL3.0.3, FC4, CentOS4.1
  - FNAL: SL3.0.3, SL4, LTS 3, LTS 4
- Sites
  - Production: SDSC
  - also testing at FNAL, UC and ANL

31 Related Work
- Edge Service efforts
  - VO boxes, EGEE
  - APAC, static Edge Services
  - Grid-Ireland, static Edge Services
- OGF efforts: WS-Agreement, JSDL
- Managed Services
- QoS with Xen
  - Padma Apparao, Intel (VTDC paper)
  - Rob Gardner & team, HP
  - Credit-based scheduler
- Grid computing and virtualization
  - Work at University of Florida, Purdue, Northwestern, Duke and others

32 Conclusions
- VM-based workspaces are a promising tool to implement division of labor
- Renegotiation is an important resource management tool
  - Protocols
  - Enforcement methods: dynamic reallocation, migration, etc.
- Aggregate resource allocations
  - Different resource aspects influence each other
- More work on managing VM resources is needed

