D u k e S y s t e m s Orca internals 101 Jeff Chase Orcafest 5/28/09
2 Summary of Earlier Talks Factor actors/roles along the right boundaries. –stakeholders, innovation, tussle Open contracts with delegation –resource leases Orca as control plane for GENI: –Aggregates are first-class (authority) –Slice controllers are first-class (SM) –Clearinghouse brokers enable policies under GSC direction
3 For more… For more on all of that, see the slides appended at the end of this presentation.
4 Orca Internals 101 Leasing core: Shirako Plugins and control flow Actor concurrency model Lease state machines Resource representations
5 Actors: The Big Picture [Diagram: the slice controller sends a request to the broker and receives a ticket; it redeems the ticket with the authority and receives a lease; the authority delegates resources to the broker.]
6 Actors: The Big Picture [Same diagram, annotated.] Integrate experiment control tools at the slice controller (e.g., Gush and DieselNet tools via XMLRPC to a generic slice controller). Integrate substrate at the authority with authority-side handler plugins. The request/ticket/redeem/lease exchanges are inter-actor RPC calls made automatically: you should not have to mess with them.
7 Terminology Slice controller == slice manager == service manager == guest controller –We blur the distinction between the actor and the controller module that it runs. Broker == clearinghouse (or a service within a clearinghouse) == agent (in SHARP, and in some code) Authority == aggregate manager –Controls some portion of the substrate for a site or domain under a Management Authority
8 Slice Controllers Separate application/environment demand management from resource arbitration. The slice controller monitors the guest and obtains/renews leases to meet demand; experiment control tools (e.g., Gush and DieselNet tools) drive it, e.g., via XMLRPC to a generic slice controller. Aggregates/authorities monitor resources, arbitrate access, and perform placement of guest requests onto resources. [Diagram: Experiment Manager → Slice Controller → Site Authority.]
9 ProtoGENI? Possibility: export the ProtoGENI XMLRPC interface from a generic slice controller. One thing to consider is how close it is to ProtoGENI, which is not that complicated; from a poster on protogeni.net (we should be able to support it):
–GetCredential() from the slice authority
–CreateSlice() goes to the slice authority
–Register() the slice with the clearinghouse
–ListComponents() goes to the CH and returns a list of AMs and CMs
–DiscoverResources() to an AM or CM returns rspecs
–RequestTicket() goes straight to an AM
–RedeemTicket() also goes to the AM
–StartSliver() to the AM after redeem: "bring sliver to a running state"
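The call order above can be sketched against in-memory stubs (all class and endpoint names here are hypothetical; a real client would speak XMLRPC to the slice authority, clearinghouse, and aggregate manager):

```python
# Hypothetical sketch of the ProtoGENI-style call sequence, recorded against
# in-memory stubs instead of live XMLRPC endpoints.

class StubEndpoint:
    """Stands in for a slice authority (SA), clearinghouse (CH), or aggregate manager (AM)."""
    def __init__(self, name, log):
        self.name, self.log = name, log

    def call(self, op, **args):
        self.log.append((self.name, op))      # record who was asked to do what
        return {"op": op, "args": args}       # a real endpoint returns credentials/rspecs/etc.

def run_sequence():
    log = []
    sa = StubEndpoint("SA", log)
    ch = StubEndpoint("CH", log)
    am = StubEndpoint("AM", log)
    sa.call("GetCredential")                  # credential from the slice authority
    sa.call("CreateSlice", slice="demo")
    ch.call("Register", slice="demo")         # register the slice with the clearinghouse
    ch.call("ListComponents")                 # returns the list of AMs and CMs
    am.call("DiscoverResources")              # returns rspecs
    am.call("RequestTicket")                  # straight to an AM
    am.call("RedeemTicket")
    am.call("StartSliver")                    # "bring sliver to a running state"
    return log
```

The point of the sketch is only the ordering and the division of labor among the three endpoints.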
10 Brokers and Ticketing Sites delegate control of resources to a broker –Intermediary/middleman Factor allocation policy out of the site –Broker arbitrates resources under its control –Sites retain placement policy "Federation" –Site autonomy –Coordinated provisioning SHARP [SOSP 2003] w/ Vahdat, Schwab [Diagram: the slice controller requests a ticket from the broker and redeems it for a lease at the authority; the site delegates resources to the broker.]
11 Actor structure: symmetry Actors are RPC clients and servers. Recoverable: commit state changes –Pluggable DB layer (e.g., MySQL) Common structures/classes in all actors –A set of slices, each with a set of leases (ReservationSet) in various states –Different *Reservation* classes with different state machine transitions –Generic resource encapsulator (ResourceSet and IConcreteSet) –Common kernel: the Shirako leasing core
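The common actor-side structures can be sketched as follows (a minimal Python mirror of the idea; the names echo but do not reproduce the actual Shirako Java classes):

```python
# Sketch of the common actor structures: every actor holds slices, each with
# a set of reservations (leases) in various states.
# Names are illustrative, not the real Shirako API.

class Reservation:
    def __init__(self, rid, state="Nascent"):
        self.rid, self.state = rid, state

class Slice:
    def __init__(self, name):
        self.name = name
        self.reservations = {}                # a ReservationSet, keyed by reservation id

    def add(self, r):
        self.reservations[r.rid] = r

class Actor:
    def __init__(self, name):
        self.name = name
        self.slices = {}

    def get_slice(self, name):
        return self.slices.setdefault(name, Slice(name))

    def reservations_in(self, state):
        # scan all slices for reservations in a given FSM state
        return [r for s in self.slices.values()
                  for r in s.reservations.values() if r.state == state]
```

The same shape appears in every actor; only the *Reservation* state machines differ by role.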
12 Actors and containers Actors run within containers –JVM, e.g., Tomcat –Per-container actor registry and keystore –Per-container MySQL binding Actor management interface –Useful for GMOC and portals –Not yet remoteable Portal attaches to a container –Tabs for different actor roles –Dynamically loadable controllers and views (Automat)
14 Shirako Kernel Snippet from the developer setup guide on the web. The paths have changed in the RENCI code base: prefix them with core/trunk.
15 Shirako Kernel Events The actor kernel (core) maintains state and processes events: –Local initiate: a start request from the local actor, e.g., from a portal command or a policy –Incoming request from a remote actor –Management API, e.g., from a portal or GMOC –Timer tick –Other notifications come through the tick or protocol APIs
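The event sources listed above might feed a single kernel queue, roughly like this (a sketch; the real core is Java and does much more per event):

```python
from collections import deque

# Sketch: one kernel queue fed by the event sources on the slide above.
class KernelSketch:
    def __init__(self):
        self.queue = deque()
        self.handled = []

    def post(self, kind, payload=None):
        # kind is one of: "local-initiate", "remote-request", "management", "tick"
        self.queue.append((kind, payload))

    def process_all(self):
        while self.queue:
            kind, payload = self.queue.popleft()
            self.handled.append(kind)         # a real kernel transitions lease FSMs here
```

Serializing all sources through one queue is what lets the core reason about one event at a time.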
16 Pluggable Resources and Policies [Diagram: the leasing core (instantiate guests, monitoring, state storage/recovery, negotiate contract terms, event handling, lease groups) with plug-in policy modules and controllers above it, and resource handlers and drivers below it to configure resources.]
17 Kernel control flow All ops come through KernelWrapper and Kernel (.java) –Wrapper: validates the request and access Most operations pertain to a single lease state machine (FSM), but many access global state, e.g., to allocate resources from a shared substrate. –Kernel: executes the op holding a global core lock –Nonblocking core, at least in principle
18 Kernel control flow Acquire the core lock → invoke a *Reservation* class to transition the lease FSM → release the core lock → commit the new state to the actor DB → execute async tasks, e.g., service* methods that invoke plugins and handlers. Ticks probe for completions of pending async tasks.
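That sequence (lock, FSM transition, unlock, commit, async tasks probed by ticks) can be sketched as follows (hypothetical names; the real kernel is Java and holds a single global core lock):

```python
import threading

# Sketch of the kernel op path: transition under the core lock, commit the new
# state to the actor DB, then run async tasks outside the lock; ticks probe
# pending tasks for completion. Illustrative only.
class Kernel:
    def __init__(self):
        self.lock = threading.Lock()
        self.db = []                          # stand-in for the pluggable DB layer
        self.pending = []                     # outstanding async tasks

    def process(self, reservation, transition):
        with self.lock:                       # 1. acquire the core lock
            reservation.state = transition(reservation.state)   # 2. lease FSM step
        # 3. lock is released on exit from the with-block
        self.db.append((id(reservation), reservation.state))    # 4. commit to the DB
        self.pending.append(lambda: True)     # 5. async task, e.g., a service* upcall

    def tick(self):
        # probe pending async tasks; drop the ones that report completion
        self.pending = [t for t in self.pending if not t()]
```

Keeping the handler work out from under the lock is what keeps the core nonblocking in principle.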
19 Lease State Machine [Diagram: lease states over time across the three actors. Service Manager: form resource request → Nascent; request ticket → Ticketed; redeem ticket → Joining → Active; request lease extend → Extending → Active; close handshake → Closed. Broker: policy selects resource types and sites and sizes unit quantities; returns the ticket and updates it on extend (Ticketed → Extending → Ticketed). Site Authority: policy assigns concrete resources to match the ticket (Priming); initializes resources when the lease begins (e.g., install nodes); returns and updates the lease; tears down/reclaims resources after the lease expires, or on guest-initiated close. Resources join the guest application, and the guest uses them for the original lease term; a reservation may change size (flex) on extend, and the guest may continue to extend the lease by mutual agreement.]
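The main transitions in the diagram can be written down as a table (a simplification: it folds the Priming/Joining sub-states into the redeem step):

```python
# Simplified lease FSM from the diagram; Priming/Joining are folded into the
# redeem step. State and event names follow the slide's labels.
TRANSITIONS = {
    ("Nascent",   "request ticket"): "Ticketed",
    ("Ticketed",  "redeem ticket"):  "Active",     # via Priming/Joining
    ("Active",    "request extend"): "Extending",
    ("Extending", "update ticket"):  "Active",     # may flex the unit count
    ("Active",    "close"):          "Closed",
}

def step(state, event):
    """Advance the lease FSM; reject illegal transitions."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} in {state}")
```

Each *Reservation* class in the code plays the role of one such table, with its own legal transitions.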
20 Handlers Invocation upcalls go through ResourceSet/ConcreteSet on relevant lease transitions. –Authority: setup/teardown –Slice controller: join/leave –Unlocked async task upcalls The relevant property sets are available to these handler calls –For resource type, configuration, local/unit –Properties are ignored by the core The ConcreteSet is associated with a ShirakoPlugin –e.g., COD manages nodes with IP addresses and invokes handlers via Ant
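The handler upcall surface might look like this sketch (a hypothetical Python mirror; the property sets are passed through untouched, since the core itself never interprets them):

```python
# Sketch of handler upcalls: the authority side gets setup/teardown, the
# slice-controller side gets join/leave, and every call receives the relevant
# property sets. All names here are illustrative.

class AuthorityHandler:
    def setup(self, unit, properties):
        unit["configured"] = properties.get("image.id")   # e.g., install a node image
        return unit

    def teardown(self, unit, properties):
        unit["configured"] = None
        return unit

class GuestHandler:
    def join(self, unit, properties):
        unit["joined"] = True                 # resource joins the guest application
        return unit

    def leave(self, unit, properties):
        unit["joined"] = False
        return unit
```

The symmetry (setup/teardown on one side, join/leave on the other) matches the lease transitions that trigger the upcalls.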
21 Drivers Note that a handler runs within the actor. So how do we run setup/teardown code on the component itself? How do we run join/leave code on the sliver? Option 1: the handler invokes management interfaces, e.g., XMLRPC, SNMP, ssh. Option 2: invoke a custom driver in a NodeAgent with secure SOAP.
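Option 1 amounts to the handler shelling out to a management interface; a sketch of that dispatch (the runner below is an invented stand-in that records commands instead of really invoking ssh/SNMP/XMLRPC):

```python
# Sketch of option 1: the handler, running inside the actor, drives the
# component through a management interface. RecordingRunner stands in for
# ssh/SNMP/XMLRPC and just records what would have been sent.

class RecordingRunner:
    def __init__(self):
        self.sent = []

    def run(self, host, command):
        self.sent.append((host, command))     # a real runner would contact the host
        return 0                              # pretend exit status: success

def setup_node(runner, host, image):
    # what an authority-side setup handler might do over the wire
    rc = runner.run(host, f"provision --image {image}")
    rc |= runner.run(host, "service start")
    return rc == 0
```

Option 2 would move this logic into a driver hosted by a NodeAgent on the component, invoked over secure SOAP.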
22 Example: VM instantiation [Diagram: VM instantiation, showing the handlers invoking the drivers.]
23 TL1 Driver Framework General TL1 (command line) framework –Substrate component command/response What to expect? –XML file
24 Orca: Actors and Protocols [Diagram: Guest, Broker, and Authority actors, each built on the common core. The Broker core holds a calendar and inventory; the Authority core holds resource pools. The Guest formulates requests and redeems tickets. Protocol calls: ticket/extendTicket and updateTicket between Guest and Broker; redeem/extendLease and updateLease between Guest and Authority; allocate/extend and assign within the Broker and Authority policies.]
25 Policy Plugin Points [Diagram: Service Manager (guest application): application resource request policy, lease event interface, join/leave handler for the service, leasing API. Broker: plug-in broker policies for resource selection and provisioning, broker service interface. Site Authority (host site resource pool): assignment policy, setup/teardown handlers for resources, leasing service interface, lease status notify. Policy plugins negotiate over allocation and configuration; properties are used to guide the negotiation.]
26 Property Lists [Diagram: the same actors as the previous slide. Broker policy examples: FCFS, priority, economic. Request properties: elastic, deferrable. Resource properties: machine.memory, machine.clockspeed. Configuration properties: image.id, public.key. Unit properties: host.ip, host.key.]
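The four property classes can be illustrated with plain dictionaries; the keys come from the slide, while the matching rule at the bottom (a broker checking a request against resource properties) is invented for illustration:

```python
# The slide's four property lists as plain dicts. The broker-side check below
# is an invented example of how properties guide negotiation; the core itself
# never interprets these keys.
request_props  = {"elastic": True, "deferrable": True, "machine.memory.min": 4096}
resource_props = {"machine.memory": 8192, "machine.clockspeed": 2400}
config_props   = {"image.id": "img-1", "public.key": "ssh-rsa AAAA..."}
unit_props     = {"host.ip": "10.0.0.7", "host.key": "host-fingerprint"}

def satisfies(request, resource):
    """Hypothetical broker check: does the resource meet the request's minimum?"""
    return resource.get("machine.memory", 0) >= request.get("machine.memory.min", 0)
```

Request and resource properties flow toward the broker; configuration properties flow to the authority's setup handler; unit properties flow back to the guest's join handler.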
27 Messaging Model Proxies are maintained in the actor registry –Asynchronous RPC –Upcalls to the kernel for incoming ops –Downcalls from the lease FSM for outgoing ops Local, SOAP w/ WS-Security, etc. –WSDL protocols, but incomplete Integration (e.g., XMLRPC) –The experiment manager calls into the slice controller module –Substrate ops go through the authority-side handler
28 The end, for now The presentation trails off… What follows are other slides from previous presentations dealing more with the concepts and rationale of Orca, and its use for GENI.
29 NSF GENI Initiative [Diagram: a sliverable GENI substrate (contributing domains/aggregates) hosting experiments (guests occupying slices), with GENI cast as wind tunnel, embedding, Petri dish, and observatory.]
30 Dreaming of GENI [Diagram: NSF GENI clearinghouse (component registry, usage policy engine), an optical switch component (channel, band, switch port, fiber ID) with a GID, and an aggregate Management Authority (MA).] 1. The CM self-generates a GID: public and private keys. 2. The CM sends the GID to the MA; out-of-band methods are used to validate that the MA is willing to vouch for the component. The CM delegates to the MA the ability to create slices. 3. The MA (because it has sufficient credentials) registers the name, GID, URIs, and some descriptive info, assuming the aggregate MA already has credentials permitting access to the component registry. 4. The MA delegates rights to NSF GENI so that NSF GENI users can create slices. Notes: identity and authorization are decoupled in this architecture. GIDs are used for identification only; credentials are used for authorization. I.e., the GID says only who the component is and nothing about what it can do or who can access it. Aaron Falk, GPO BBN http://groups.geni.net/
31 Slivers and Slices Aaron Falk, GPO BBN
32 GENI as a Programmable Substrate A diverse and evolving collection of substrate components –Different owners, capabilities, and interfaces A programmable substrate is an essential platform for R&D in network architecture at higher layers –Secure and accountable routing plane –Authenticated traffic control (e.g., free of DOS and spam) –Mobile social networking w/ volunteer resources –Utility networking –Deep introspection and adaptivity –Virtual tunnels and bandwidth-provisioned paths
33 Some Observations The Classic Internet is just an overlay –GENI is underlay architecture (underware) Incorporate edge resources: cloud computing + a sliverable network Multiple domains (MAD): not a Grid, but something like dynamic peering contracts –Decouple services from the substrate; manage the substrate; let the services manage themselves Requires predictable (or at least discoverable) allocations for reproducibility –QoS at the bottom or not at all?
34 Breakable Experimental Network (BEN) BEN is an experimental fiber facility Supports experimentation at metro scale –Distributed applications researchers –Networking researchers Enabling disruptive technologies –Not a production network Shared by the researchers at the three Triangle Universities –Coarse-grained time sharing is the primary mode of usage –Assumes some experiments must be granted exclusive access to the infrastructure
35 Resource Control Plane [Diagram: a resource control plane spanning hardware nodes hosting VMs, beneath middleware, cloud apps, services, and other guests.] Open Resource Control Architecture (Orca) –Contract model for resource peering/sharing/management –Programmatic interfaces and protocols –Automated lease-based allocation and assignment –Share the substrate among dynamic guest environments http://www.cs.duke.edu/nicl/
36 The GENI Control Plane Programmable substrate elements Dynamic end-to-end sliver allocation + control –Delegation of authority etc. –Instrumentation (feedback) Resource representation and exchange –Defining the capabilities of slivers –Network virtual resource Foundation for discovery –Of resources, paths, topology [Figure: slivers a, b, c with resource vectors r_a=(8,4), r_b=(4,8), r_c=(4,4) drawn from 16 CPU shares and bandwidth shares.]
37 Define: Control Plane GGF+GLIF: "Infrastructure and distributed intelligence that controls the establishment and maintenance of connections in the network, including protocols and mechanisms to disseminate this information; and algorithms for automatic delivery and on-demand provisioning of an optimal path between end points." s/connections/slices/ s/optimal path/embedded slices/ provisioning += and programmed instantiation
38 Control Plane Architecture The control plane defines the set of entities that interact to establish, maintain, and release resources and provide… (connection/slice) control functions. At bottom this is about modularity and interfaces. If the entities are owned, operated, and developed by different stakeholders over long time spans, then we call it architecture.
39 Key Questions Who are the entities (actors)? What are their roles and powers? Whom do they represent? Who says what to whom? What innovation is possible within each entity, or across entities? The control plane defines the set of entities that interact to establish, maintain, and release resources and provide… [connection, slice] control functions.
40 Foundational Architecture What is the architecture for the programmable substrate (and its control plane)? Is that an interesting question? Is it a new question? Is the substrate coherent? What makes it coherent? Why is that coherence valuable? Does there have to be just one? Can there be just one? Can we talk about the architecture of the substrate independent of its components?
41 Design Tensions Governance vs. freedom Coordination vs. autonomy Diversity vs. coherence Assurance vs. robustness Predictability vs. efficiency Quick vs. right Inclusion vs. entanglement Etc., etc. …
42 Design Tensions What is standardized vs. what is open to innovation? How can GENI be open to innovation in components/management/control? –We want it to last a long time. –Innovation is what GENI is for. Standardization vs. innovation –Lingua franca vs. Tower of Babel
43 Who Are the Actors? Principle #1: entities (actors) in the architecture represent the primary stakeholders. 1. Resource owners/providers (site or domain) 2. Slice owners/controllers (guests) 3. The facility itself, or resource scheduling services acting on its behalf. Others (e.g., institutions) are primarily endorsing entities in the trust chains.
44 Control Plane [Diagram: guests (cloud service, network service, etc.) above brokering intermediaries (clearinghouse), above infrastructure providers and their resources.] Plug guests, resources, and management policies into the cloud.
45 Contracts Principle #2: provide pathways for contracts among actors –Accountability [SHARP, SOSP 2003] Be open with respect to what promises an actor is permitted to make –Open innovation for contract languages and tools –Yes, we need at least one LCD: RSpec > HTML 1.0 Lingua franca vs. Tower of Babel Resource contracts are easier than service/interface contracts.
46 Rules for Resource Contracts Don't make promises you can't keep… but don't hide power. [Lampson] There are no guarantees, ever –Have a backup plan for what happens if assurances are not kept Provide sufficient power to represent what promises the actor is explicitly NOT making –E.g., temporary donation of resources –Best effort, probabilistic overbooking, etc. Incorporate time: start/expiration times –Resource contracts are leases (or tickets).
47 Leases Foundational abstraction: resource leases. A contract between a provider (site) and a guest –Binds a set of resource units from a site to a guest –Specified term (time interval) –Automatic extends (meter feeding) –Various attributes [Diagram: the guest sends a request; the provider site grants a lease containing the site's public key, the guest's public key, a resource description, …, a unique ID at the site, and the site's signature.]
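The lease record in the diagram can be mocked up with an HMAC standing in for the site's signature (a sketch under stated assumptions: real leases use asymmetric signatures over the site's key pair, and the field names below are illustrative):

```python
import hashlib, hmac, json

# Sketch of the lease record: the site binds resource units to a guest for a
# term and signs the whole thing. HMAC with a site secret stands in for a
# real public-key signature; field names are illustrative.

def issue_lease(site_secret, guest_pubkey, rdesc, term, uid):
    body = {"guest.pubkey": guest_pubkey, "resource": rdesc,
            "term": term, "id.at.site": uid}
    payload = json.dumps(body, sort_keys=True).encode()
    sig = hmac.new(site_secret, payload, hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}

def verify_lease(site_secret, lease):
    payload = json.dumps(lease["body"], sort_keys=True).encode()
    want = hmac.new(site_secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(want, lease["sig"])
```

The signature is what makes the lease an accountable contract: tampering with any field invalidates it.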
48 Resource Model The substrate consists of pools of typed raw resources distributed across the sites/domains –CPU shares, storage server shares [Jin04] –Measured, metered, independent units –Varying degrees of performance isolation –Could be any other commodity; units are interchangeable… for a given (site, type) pair –Virtualized networks –Attribute property lists for types and units –Multidimensional partitioning, e.g., slivers hosting virtual machines (VCL or Cluster-on-Demand) [Figure: slivers a, b, c with r_a=(8,4), r_b=(4,8), r_c=(4,4) drawn from 16 shares in each dimension.]
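The multidimensional partitioning in the figure can be sketched as vector arithmetic over a typed pool (illustrative; the numbers are the slide's r_a, r_b, r_c against a 16-share capacity in each dimension):

```python
# Sketch of multidimensional partitioning: a typed pool with a capacity vector
# hands out slivers whose resource vectors must fit component-wise.

class Pool:
    def __init__(self, capacity):
        self.free = list(capacity)            # e.g., (CPU shares, bandwidth shares)

    def allocate(self, vector):
        if all(f >= v for f, v in zip(self.free, vector)):
            self.free = [f - v for f, v in zip(self.free, vector)]
            return True
        return False                          # request does not fit the pool
```

Interchangeability of units within a (site, type) pair is what makes this simple vector bookkeeping sufficient.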
49 A user leases one or more Xen virtual machines provisioned with a selected image.
50 Network Description Language? [Example NDL (RDF) fragment describing the Netherlight Optical Exchange (lat 52.3561, long 4.9527) and the interface tdm3.amsterdam1.netherlight.net:501/3 with a capacity of 1.244E+9, referencing http://networks.internet2.edu/manlan/manlan.rdf]
51 Brokers and Ticketing Sites delegate control of resources to a broker –Intermediary/middleman Factor allocation policy out of the site –Broker arbitrates resources under its control –Sites retain placement policy "Federation" –Site autonomy –Coordinated provisioning [Diagram: the service manager requests a ticket from the broker and redeems it for a lease at the site authority; the site delegates resources to the broker.]
52 Delegation Principle #3: contracts enable delegation of powers –Delegation is voluntary and provisional; it is a building block for creating useful concentrations of power –Creates a potential for governance –Calendar scheduling, reservation –Double-edged sword? The facility can Just Say No.
53 Aggregation Principle #4: aggregate the resources for a site or domain –The primary interface is the domain/site authority Abstraction/innovation boundary –Keep components simple –Placement/configuration flexibility for the owner –Mask unscheduled outages by substitution –Leverage investment in technologies for site/domain management
54 BEN fiberplant A combination of NCNI fiber and campus fiber. Possible fiber topologies: [diagram of candidate topologies]
55 Infinera DTN PIC-based solution 100 Gbps DLM (digital line module) –Circuits provisioned at 2.5G granularity Automatic optical-layer signal management (gain control etc.) GMPLS-based control plane Optical express –All-optical node bypass
56 Experimentation on BEN Extend Orca to enable slivering of network elements: –Fiber switches –DWDM equipment –Routers Adapt mechanisms to enable flexible description of network slices –NDL Demonstrate end-to-end slicing on BEN –Create realistic slices containing compute, storage, and network resources –Run sample experiments on them
57 BEN Usage Experimental equipment is connected to the BEN fiberplant at BEN points of presence. Use MEMS fiber switches to switch experimental equipment in and out –Based on the experiment schedule By the nature of the facility, experiments running on it may be disruptive to the network. BEN points of presence are located at the RENCI engagement sites and the RENCI anchor site.
58 BEN Redux Reconfigurable optical plane –We will be seeking out opportunities to expand the available fiber topology Researcher equipment access at all layers –From dark fiber up Coarse-grained scheduling Researcher-controlled No single-vendor lock-in Equipment with exposable APIs Connectivity with substantial non-production resources
59 Elements of Orca Research Agenda Automate management inside the cloud –Programmable guest setup and provisioning Architect a guest-neutral platform –Plug in new guests through protocols; don't hard-wire them into the platform Design flexible security into an open control plane Enforce fair and efficient sharing for elastic guests Incorporate diverse networked resources and virtual networks Mine instrumentation data to pinpoint problems and select repair actions Economic models and sustainability
60 Leasing Virtual Infrastructure The hardware infrastructure consists of pools of typed raw resources distributed across sites –e.g., CPU shares, memory, etc. (slivers); storage server shares [Jin04] –Measured, metered, independent units –Varying degrees of performance isolation Policy agents control negotiation/arbitration –Programmatic, service-oriented leasing interfaces –Lease contracts [Diagram: the guest sends a request; the provider site grants a lease containing the site's public key, the guest's public key, a resource description, …, a unique ID at the site, and the site's signature. Slivers a, b, c with r_a=(8,4), r_b=(4,8), r_c=(4,4) drawn from 16 shares.]
61 Summary Factor actors/roles along the right boundaries –Stakeholders, innovation, tussle Open contracts with delegation Specific recommendations for GENI: –Aggregates are first-class entities –Component interface: permit innovation –Clearinghouse: enable policies under GSC direction
62 Modularize Innovation Control plane design should enable local innovation within each entity. Can GENI be a platform for innovation of platforms? Management services? –How do we carry forward the principle that PlanetLab calls "unbundled management"? E.g., how do we evolve standards for information exchange and contracts? –Lingua franca or Tower of Babel?
63 Slices: Questions What helper tools/interfaces must we have, and what do they require from the control plane? Will GENI enable research on new management services and the control plane? –If software is the secret weapon, what parts of the platform are programmable/replaceable? Co-allocation/scheduling of an end-to-end slice? –What do predictable and repeatable mean? –What assurances are components permitted to offer? What level of control/stability do we assume over the substrate?
64 Focus questions Specify/design the core services: –Important enough and hard enough to argue about –Must be part of facilities planning –Directly motivated by usage scenarios –Deliver maximum bang for ease of use –User-centric, network-centric Enable a flowering of extensions/plugins –Find/integrate technology pieces of value What requirements do these services place on other WGs?