PlanetLab Operating System support* *a work in progress
What is it?
A distributed set of machines that must be shared efficiently, where “efficient” can mean many different things.
Goals
–A PlanetLab account, together with its associated resources, should span multiple nodes (a SLICE).
–Distributed virtualization
–Unbundled management
–Infrastructure services (running a platform as opposed to running an application) run over a SLICE, so that a variety of services can provide the same functionality.
Four main areas
–VM abstraction: Linux vserver
–Resource allocation + isolation: SCOUT
–Network virtualization
–Distributed monitoring
“Node Virtualization”
–Full virtualization like VMware: performance overhead, and a lot of memory consumed by each memory image
–Paravirtualization like Xen: more efficient, a promising solution (but still has memory constraints)
–Virtualization at the system call level like Linux vservers, UML: supports a large number of slices with reasonable isolation
OS for each VM?
Linux vservers: Linux inside Linux. Each vserver is a directory in a chroot jail.
Each virtual server
– shares binaries,
– has its own packages,
– has its own services,
– has a weaker form of root that provides a local superuser,
– has its own users, i.e. its own GID/UID namespace,
– is confined to using some IP addresses only, and
– is confined to some area(s) of the file system.
Communication among ‘vservers’
Not local sockets or IPC, but via IP
–Simplifies resource management and isolation
–Interaction is independent of their locations
Reduced resource usage
Physical memory
–Copy-on-write memory segments across unrelated vservers
Unification (disk space)
–Share files across contexts
–Hard-linked, immutable but unlinkable files
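The unification idea above can be sketched in a few lines. This is only an illustration, assuming a reference root and a vserver root on the same file system; the function names `unify` and `_digest` are hypothetical and are not part of the vserver tools. The sketch only does the hard linking and does not model the immutable marking that the real mechanism relies on.

```python
import hashlib
import os


def _digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def unify(reference_root, vserver_root):
    """Hard-link files in vserver_root to byte-identical files at the same
    relative path in reference_root. Returns the number of files unified.
    (Hypothetical helper; both roots must be on one file system.)"""
    unified = 0
    for dirpath, _dirs, files in os.walk(reference_root):
        for name in files:
            ref = os.path.join(dirpath, name)
            rel = os.path.relpath(ref, reference_root)
            target = os.path.join(vserver_root, rel)
            # Only regular files on both sides are candidates.
            if os.path.islink(ref) or os.path.islink(target):
                continue
            if not os.path.isfile(target):
                continue
            if os.path.samefile(ref, target):
                continue  # already shares an inode with the reference copy
            if _digest(ref) != _digest(target):
                continue  # contents differ, keep the private copy
            os.unlink(target)
            os.link(ref, target)  # one inode, two directory entries
            unified += 1
    return unified
```

After a call such as `unify(ref_root, vs_root)`, identical files occupy disk space once, while files the vserver has changed stay private.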
Required modifications for vserver
Notion of context
–Isolate a group of processes,
–Each vserver is a separate context,
–Add a context id to all inodes,
–Context-specific capabilities were added,
–Context limits can be specified,
–Easy accounting for each context.
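As a rough illustration of per-context limits and accounting, the sketch below keeps a usage counter per resource and refuses a charge that would exceed the limit. The class and method names (`Context`, `charge`, `release`, `LimitExceeded`) are hypothetical; the real mechanism lives inside the kernel, not in user-level code like this.

```python
from dataclasses import dataclass, field


class LimitExceeded(Exception):
    """Raised when a charge would push a context past its limit."""


@dataclass
class Context:
    ctx_id: int
    limits: dict                      # resource name -> maximum units
    usage: dict = field(default_factory=dict)

    def charge(self, resource, amount=1):
        """Account for `amount` units, or raise if the limit would be exceeded."""
        used = self.usage.get(resource, 0)
        limit = self.limits.get(resource)  # None means "no limit configured"
        if limit is not None and used + amount > limit:
            raise LimitExceeded(f"context {self.ctx_id}: {resource} limit {limit}")
        self.usage[resource] = used + amount

    def release(self, resource, amount=1):
        """Return units to the context, never dropping below zero."""
        self.usage[resource] = max(0, self.usage.get(resource, 0) - amount)
```

A wrapped call would charge the context before doing the real work, for example `ctx.charge("file_descriptors")` before opening a file, and release on close.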
vserver implementation
Initialize vserver
–Create a mirror of the reference root file system
–Create two identical login accounts
Switching from the default shell (modified shell)
–Switch to the slice's vserver security context
–Chroot to the vserver’s root file system
–Relinquish a subset of true superuser privileges
–Redirect into the other account in that vserver
Overall structuring
Central infrastructure services (PlanetLab Central)
–Central database of principals, slices, resource allocations and policies
–Creation and deletion of slices through an exported interface
Node manager
–Obtains resource information from the central server
–Binds resources to the local VM that belongs to a slice
Rcap -> acquire( Rspecs )
Bind( slice_id, Rcap )
** Every resource access goes through the node manager as a system call and is validated using the Rcap
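The acquire/bind pair above could look roughly like the following. This is a sketch under assumptions: only the names `acquire`, `bind`, Rspec, Rcap and `slice_id` come from the slide; the `NodeManager` and `RSpec` classes, the `check` method, and the use of a random token as the capability are illustrative guesses.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class RSpec:
    """Hypothetical resource specification: what is wanted, and how much."""
    resource: str
    amount: int


class NodeManager:
    def __init__(self, capacity):
        self._free = dict(capacity)   # resource name -> unallocated units
        self._rcaps = {}              # rcap -> RSpec it stands for
        self._bound = {}              # rcap -> slice_id it is bound to

    def acquire(self, rspec):
        """Reserve the resources in rspec and return an opaque capability."""
        if self._free.get(rspec.resource, 0) < rspec.amount:
            raise ValueError(f"not enough {rspec.resource} available")
        self._free[rspec.resource] -= rspec.amount
        rcap = secrets.token_hex(16)  # unguessable token (an assumption)
        self._rcaps[rcap] = rspec
        return rcap

    def bind(self, slice_id, rcap):
        """Attach a previously acquired capability to a slice."""
        if rcap not in self._rcaps:
            raise KeyError("unknown rcap")
        if rcap in self._bound:
            raise ValueError("rcap is already bound")
        self._bound[rcap] = slice_id

    def check(self, slice_id, rcap, resource):
        """Validate an access: the rcap must be bound to this slice and
        must cover the requested resource."""
        spec = self._rcaps.get(rcap)
        return (spec is not None
                and self._bound.get(rcap) == slice_id
                and spec.resource == resource)
```

Separating `acquire` from `bind` means a capability can be obtained first and attached to a slice later, which matches the two-step notation on the slide.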
Implementation
Non-renewable resources
–Disk space, memory pages, file descriptors
–Appropriate system calls are wrapped to check per-slice resource limits and increment usage.
Renewable resources
–Fairness and guarantees
Hierarchical token bucket queuing discipline
–Cap per-vserver total outgoing bandwidth
SILK for CPU scheduling
–Proportional share scheduling using resource containers
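To make the bandwidth cap concrete, here is a minimal single token bucket. It is a simplification: the hierarchical token bucket discipline arranges such buckets in a tree and lets children borrow from parents, which this sketch does not attempt. The class name, parameters and the explicit `now` argument are assumptions chosen to keep the example deterministic.

```python
class TokenBucket:
    """One bucket: tokens refill at `rate` bytes/second up to `burst` bytes."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate        # sustained rate, bytes per second
        self.burst = burst      # bucket depth, bytes
        self.tokens = burst     # start with a full bucket
        self.last = now         # time of the last refill, seconds

    def allow(self, nbytes, now):
        """Return True and consume tokens if nbytes may be sent at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False            # over the cap: caller should queue or drop
```

With one bucket per vserver, a packet is sent only when `allow(len(packet), now)` returns True, which bounds each vserver's total outgoing rate.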
“Network virtualization”
–Filters on network send and receive, like Exokernel and Nemesis.
–Sharing and partitioning a single network address space by using a safe version of raw sockets.
–Alternative approach (similar to Xen): assign a different IP address to each VM, each using the entire port space and managing its own routing table. The problem is that not enough IPv4 addresses are available, since on the order of 1000 would be needed per node.
Safe raw sockets
–The Scout module manages all TCP and UDP ports and ICMP IDs to ensure that there are no collisions between safe raw sockets and TCP/UDP/ICMP sockets.
–For each IP address, all ports are either free or "owned" by a slice.
–Two slices may split ownership of a port by binding it to different IP addresses.
–Only two IP addresses per node as of now: the external IP and the loopback address.
–A SLICE can reserve a port like any other resource (exclusive).
–A SLICE can open 3 sockets on a port: error socket, consumer socket, sniffer socket.
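The ownership rule above (per IP address, each port is free or owned by one slice) can be modeled as a small table keyed by address and port. This is a sketch only: the `PortTable` class and its methods are hypothetical, it ignores the protocol dimension and ICMP IDs, and `192.0.2.10` is a documentation address standing in for a node's external IP.

```python
class PortTable:
    """Tracks which slice owns each (ip, port) pair on a node."""

    def __init__(self, addresses):
        self._addresses = set(addresses)  # IPs the node actually has
        self._owner = {}                  # (ip, port) -> slice_id

    def reserve(self, slice_id, ip, port):
        """Give slice_id exclusive ownership of port on ip.
        Returns False if another slice already owns that pair."""
        if ip not in self._addresses:
            raise ValueError(f"{ip} is not an address of this node")
        key = (ip, port)
        current = self._owner.get(key)
        if current is not None and current != slice_id:
            return False
        self._owner[key] = slice_id
        return True

    def release(self, slice_id, ip, port):
        """Free the pair, but only if slice_id is the current owner."""
        if self._owner.get((ip, port)) == slice_id:
            del self._owner[(ip, port)]

    def owner(self, ip, port):
        """Return the owning slice, or None if the port is free on ip."""
        return self._owner.get((ip, port))
```

Because the key includes the address, two slices can each hold port 8080 as long as one binds it on the external IP and the other on loopback, which is the split-ownership case the slide describes.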
Monitoring
–An HTTP sensor server collects data from the sensor interface on each node.
–Clients can query the sensor database.
Scalability
Limited by disk space
Of course, also limited by kernel resources
–Need to recompile to increase resources