A BRIEF INTRODUCTION TO HIGH ASSURANCE CLOUD COMPUTING WITH ISIS2 Ken Birman 1 Cornell University.


Isis2 System
- A prebuilt technology that automates many of the hard tasks involved in replicating services and the data on which they depend
- Targets cloud computing settings
- Available as open source from isis2.codeplex.com
- Intended to be easy to use…
- … but lacks commercial support, and also lacks some of the polish of a commercial product

Isis2 System
- C# library (but callable from any .NET language) offering replication techniques for cloud computing developers
- Based on a model that fuses the virtual synchrony and state machine replication models
- Goal is to make it easy for users to deal with some very hard problems seen in distributed settings:
  - Elasticity (sudden scale changes)
  - Potentially heavy loads
  - High node failure rates
  - Concurrent (multithreaded) apps
  - Long scheduling delays, resource contention
  - Bursts of message loss
  - Need for very rapid response times
  - Community skeptical of “assurance properties”

Supported languages…
- The main library is written in .NET / C#
- … and so it is best used from C# (C# evolved from Java)
- IronPython also works, and you can use a version of C++ called C++/CLI
- .NET runs on Windows
- Our developers use Visual Studio
- … but it can also be used on Linux
- You work with Mono and use the Mono developer environment, or compile using “mcs” at the command line

Other ways to access Isis2…
- New “outboard” option
- A recent addition. It allows native C++ and C users to access a subset of the Isis2 functionality
- Initial focus is on file replication for memory-mapped files and other big memory-mapped objects
- This outboard feature can be accessed from an Isis2 command-line tool
- To do so, you run the Isis2 system as a server
- Commands to it can then be issued from any language

Why Isis2?
- The new version of Isis is a completely new system, but builds on a long history
- First version was the Isis Toolkit (~ )
- That system was used to create the French Air Traffic Control system, the NYSE trading platform, the US Navy AEGIS core communications library and many other mission-critical applications
- After the Isis Toolkit, we created Horus and Ensemble (focus was on ties to formal methods)
- Isis2 is our latest and best technology. Pronounced “Isis two”, not “Isis squared”

What does Isis2 do?
- A library to help you create highly resilient, secure, scalable applications. The methods automate important tasks
- You can run your program on a few machines of your own, or on a big cluster, or in an enterprise network, or even on a cloud platform like Amazon EC2
- Isis2 is a “specialist” in solving problems involving data replication within “groups” of programs

Key Concept: Replication
Imagine that some collection of programs is running on some set of machines. Data is said to be replicated if each of them has a copy and sees the updates to that copy.

Replication in pictures
- This is an example of a non-replicated server with a client who issues some form of request
[Figure: timeline (time axis). A client sends “Update the monitoring and alarms criteria for Mrs. Marsh as follows…” to a single service instance and receives “Confirmed”. Labels: service response delay; the response delay seen by the end-user would include Internet latencies.]

Replication in pictures
- Here we replicate data on multiple programs. The group of programs offers better reliability than a single program, and could perform better too
[Figure: timeline. A client sends “Update the monitoring and alarms criteria for Mrs. Marsh as follows…” to a service group and receives “Confirmed”. Labels: service response delay; the response delay seen by the end-user would include Internet latencies.]

Things to notice
- Although there is a group of programs, the client still talks to some representative.
- Standard “web services” requests work in the usual way
- The group members are just normal programs and could be running on the same machines, or different machines. They could be running the same code, or different code.
- They cooperate to replicate data
- No variables or memory are physically shared by the programs.
- Instead, each has a private replica. When an update happens, a message is sent to tell all the group members to update their copies

The big picture… Steps in using Isis2

First questions
- Step back and draw a sketch of your application
- Client systems, if it has remote access
- The service you want to build that will use Isis2
- Data files the service may need, or databases
- Other subsystems or services it will interact with

Our example from earlier
[Figure: a client sends a request to the service group; the group computes, applies updates that modify replicated data, and returns a response.]

Where will the service run?
- Will you run your replicated service…
  - Next to its clients, on the same machines?
  - On EC2, RackSpace, GooglePlex, MSFT Azure, …?
  - In a cluster of computers up the hallway?
- How will clients connect to it?
  - Web Services remote method invocation?
  - CORBA RPC? TCP connections?

Now build a non-replicated instance
- Before adding Isis2 mechanisms, build a single instance of your server and test access to it from your client, or whatever else might access it
- Debug this code.
- Adding Isis2 functionality after the fact isn’t hard and may actually be much easier than using it from the outset!
- But planning ahead can help: using C#, C++/CLI or IronPython will be far easier than other options

… some people get stuck at step 1!
- Building a cloud-hosted service involves steps you may never have tried before
- For example, with Visual Studio, you create a new kind of service instance, and need to tell it which methods are remotely available, and then need to install and run it on your machine, which may involve firewall changes to allow clients to access it…
- … Isis2 can’t really help with that. But it isn’t terribly hard. Just do a small step at a time following the instructions for the package you decide to use

Why not native C or native C++?
- We do support a stripped-down way to use Isis2 from these and other languages, but you can only access part of the functionality
- We’ll discuss these features later, but not today
- If you know Java, you will find it easy to migrate to C#, and we would recommend that route
- C# works on both Windows and Linux (via Mono)

Designing the replicated functions
- Now you are ready to design the replication features of your service
- Ask: what data needs to be replicated?
- How will it be loaded initially? How will it be updated?
- Will you do load-balanced queries (if so they can just access any service member), or parallel queries?
- What synchronization or coordination is needed?
- Start with a good image of how you want it to work

State machine model
- Think about a state machine
  - Some sort of object or class, with state in it (data)
  - The state is updated deterministically when events occur (updates…), and it can be checkpointed and later loaded back in from the checkpoint
- With Isis2 we simply replicate a state machine!
  - You design an object or subsystem that has the replicated data in it
  - The events become totally ordered multicasts (the updates), plus group membership changes
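The state machine idea can be illustrated without any replication library. The following is a minimal Python sketch, not Isis2 code: the class name `ReplicatedCounter`, the event format and the JSON checkpoint are all assumptions chosen for illustration. It shows the two properties the slide relies on: updates are deterministic, and the state can be checkpointed and reloaded.

```python
import json


class ReplicatedCounter:
    """A tiny deterministic state machine (hypothetical example class)."""

    def __init__(self):
        self.state = {"x": 5}

    def apply(self, event):
        # Events are (operation, argument) pairs. The outcome depends only
        # on the current state and the event, never on clocks or randomness.
        op, arg = event
        if op == "add":
            self.state["x"] += arg
        elif op == "set":
            self.state["x"] = arg
        else:
            raise ValueError(f"unknown operation: {op}")

    def checkpoint(self):
        # Serialize the full state so it can be stored or shipped elsewhere.
        return json.dumps(self.state)

    @classmethod
    def from_checkpoint(cls, blob):
        machine = cls()
        machine.state = json.loads(blob)
        return machine


events = [("add", 150), ("set", 22), ("add", 1)]

a = ReplicatedCounter()
b = ReplicatedCounter()
for e in events:          # same events, same order, on both replicas
    a.apply(e)
    b.apply(e)

restored = ReplicatedCounter.from_checkpoint(a.checkpoint())
print(a.state, b.state, restored.state)
```

Because `apply` is deterministic, any two copies that see the same events in the same order end in the same state, which is what makes replicating the object by multicasting its updates possible.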

Replication in pictures
- By the time you are ready to code, you’ll have a picture similar to this one, specific to your setting
[Figure: timeline. A client sends “Update the monitoring and alarms criteria for Mrs. Marsh as follows…” to a service group hosted on EC2 and receives “Confirmed”. An ordered multicast replicates the update within the group. Labels: service response delay; the response delay seen by the end-user would include Internet latencies.]

From this picture…
- Now your needs are more concrete
  - How will I represent the replicated state in my group?
  - How can I copy the current data to a joining member?
  - I don’t want to reply to the client until all the replicas have been updated. What’s the best way to do that?
  - I’m worried about speed. What will limit response delays? How fast can a service like this run?
- This set of MOOC modules will help you answer such questions, but experiments and optimization of your solution are certainly needed too…

Let’s replicate! (some data…)

How would you solve this problem?
- Assume you have access to a client-server technology, like the ones used to build web services
- You would need a list of group members
- Then when an update happens, the program receiving the request could update the others

Replication version 0
[Figure: each replica starts with X=5; the update “Set x=x+150” is applied and each replica ends with X=155.]

Things you’ll need to think about
- Keep track of group membership. Can it change?
  - If new members join, how do they learn the initial value of x, or other “replicated state”?
  - When someone joins, how do we update the group membership list?
  - If a member fails, how are they dropped from the group?
- Implement the 1-to-(N-1) “multicast”
- Handle reliability issues: what if a request times out?
- Handle security: should we use SSL? Something else?
- What if two conflicting updates happen concurrently?
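To make the membership questions above concrete, here is a minimal Python sketch of the bookkeeping a do-it-yourself solution would need. It is not how Isis2 implements membership; the `Membership` class, its method names and the numbered-view convention are assumptions for illustration, and failure detection, timeouts and security are deliberately left out.

```python
class Membership:
    """Tracks the current view: a numbered, ordered list of members."""

    def __init__(self):
        self.view_id = 0
        self.members = []

    def join(self, name):
        if name not in self.members:
            self.members.append(name)
            self.view_id += 1      # every membership change is a new view
        return self.view_id

    def drop(self, name):
        # Called when a member leaves or is judged to have failed.
        if name in self.members:
            self.members.remove(name)
            self.view_id += 1
        return self.view_id

    def others(self, sender):
        # Targets of the 1-to-(N-1) multicast issued by `sender`.
        return [m for m in self.members if m != sender]


view = Membership()
for name in ("A", "B", "C"):
    view.join(name)
view.drop("B")
print(view.view_id, view.members, view.others("A"))
```

Even this toy version raises the hard question from the list: every member must agree on the same sequence of views, which a single local object like this one cannot guarantee once the members run on different machines.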

This sounds hard!
- … in fact it is very hard
- Any form of replication is equivalent to a famous problem called distributed consensus
- There are many published papers on this topic; you’ll want to use a good solution
- Once you solve the basic issues you’ll still face many challenges of performance, scale, portability, respecting rules for the particular runtime setting…

Key Concept: Consistency
We say that replicated data is consistent if multiple users accessing it can’t detect whether or not it was replicated.

Consistent replicated data
[Figure: every replica starts with X=5, applies “Set x=x+150” (X=155) and then “Set x=22” (X=22), in the same order.]

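Inconsistent replicated data
[Figure: replicas start with X=5 and the updates “Set x=x+150” and “Set x=22” are issued, but the replicas end up disagreeing: one thinks x=155, another thinks x=22.]
Isis2 never lets this happen. It automatically enforces ordering for you.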
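One way replicas can diverge is by applying the same two updates in different orders. The short Python sketch below (helper names are made up for illustration) replays the two updates from the pictures, x=x+150 and x=22, in both orders.

```python
def apply_updates(x, updates):
    """Apply each update function to x in the order given."""
    for update in updates:
        x = update(x)
    return x


def add_150(x):
    return x + 150


def set_22(x):
    return 22


replica_1 = apply_updates(5, [add_150, set_22])   # 5 -> 155 -> 22
replica_2 = apply_updates(5, [set_22, add_150])   # 5 -> 22 -> 172
print(replica_1, replica_2)
```

Both replicas received every update, yet they disagree, which is why an ordering guarantee is needed in addition to reliable delivery.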

Possible sources of inconsistency?
- Conflicting updates could reach members in different orders
- Maybe someone dropped an update (network message loss), or applied one twice.
- Update sender could have failed while sending the updates, so that some copies were sent, but others weren’t sent
- Confusion about membership: perhaps some member was joining and the update initiator didn’t realize it, hence didn’t send the update to it, or the initial state was “wrong”

A subtle risk
- Failure handling poses hard problems!
- Suppose that process A is supposed to send an update reliably to processes B, C and D
- A might use TCP, or some other reliable protocol. But if process C fails, A needs to give up.
- What if the network temporarily fails, and A can’t reach C? This is indistinguishable from C failing, except that later, C will be accessible again!

Split brain syndrome
- We say that the network “partitioning” issue causes a “split brain” problem
- Some members of our group think that A and B are the healthy members. Others think that C and D are healthy and that A and B have failed.
- Which “side” is right?
- Who should be in charge if this group is doing something critical, like deciding which plane can land on a particular runway?

Split brain syndrome
- Inconsistency can be very dangerous!
[Figure: two conflicting clearances are issued: “Flight AA 27 clear to land on runway 3-B” and “Flight US 1827 clear to take off on runway 3-B”.]

Split brain syndrome
- Consider this air traffic control “server”. It helps make sure runways are used safely.
[Figure: a service group with members A, B, C and D issues “Flight US 1827 clear to take off on runway 3-B”.]

Split brain syndrome
[Figure: members A, B, C and D. A temporary network failure means A and B can talk to each other but can’t reach C or D, and vice-versa. One side issues “Flight AA 27 clear to land on runway 3-B” while the other issues “Flight US 1827 clear to take off on runway 3-B”.]
Isis2 never lets this happen. It automatically prevents split-brain behavior.
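The slides do not say here how Isis2 rules out split-brain behavior. One widely used rule, shown below purely as an illustration and not as a description of Isis2 internals, is to let a partition keep making decisions only if it can reach a strict majority of the last agreed membership. The function name `may_continue` is an assumption.

```python
def may_continue(partition, last_view):
    """Return True if `partition` holds a strict majority of `last_view`."""
    reachable = set(partition) & set(last_view)
    return len(reachable) * 2 > len(last_view)


last_view = ["A", "B", "C", "D"]
print(may_continue(["A", "B"], last_view))       # 2 of 4: no majority
print(may_continue(["C", "D"], last_view))       # 2 of 4: no majority
print(may_continue(["A", "B", "C"], last_view))  # 3 of 4: majority
```

Under this rule the {A, B} versus {C, D} split from the picture leaves neither side able to clear a runway. Two disjoint partitions can never both hold a strict majority, so conflicting decisions are impossible, at the cost of the service pausing until the network heals.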

Other settings that need consistency
- Medical systems that give information about patient status and current drug regimen
- Smart power grid system that controls power infrastructure components such as transformers
- Self-driving vehicles that coordinate while driving at high speed on highways
- Control systems for chemical plants and refineries
- Corporate accounting systems, and payroll
- … you can make quite a long list.

Cloud computing: CAP theorem
- The most common tools for building cloud computing solutions don’t help with consistency
- They treat the property as a special need and assume you’ll find your own ways to do this
- This reflects the so-called CAP theorem
- “You can have at most two out of these three: Consistency, Availability and Partition Tolerance.” Eric Brewer, Berkeley
- Cloud platforms assume you are not sophisticated enough to make tradeoffs. They weaken consistency to get the fastest possible response times under the widest possible range of operating conditions.

Isis2 to the rescue!
By using a preexisting library like Isis2 you don’t need to solve these problems yourself. The system solves them for you.
In Egyptian mythology, Isis rescued Osiris after he was killed and torn to pieces by Set. She restored him to life and he went on to rule the underworld. Their child, Horus, later defeated Set. The story illustrates the value of fault-tolerance.

Isis2 makes developer’s life easier
Benefits of Using Formal model
- Formal model permits us to achieve correctness
- Isis2 is based on protocols that can be expressed mathematically and proved correct
- Think of Isis2 as a collection of modules, each with rigorously stated properties
Importance of Sound Engineering
- Isis2 implementation needs to be fast, lean, easy to use
- Developer must see it as easier to use Isis2 than to build from scratch
- Seek great performance under “cloudy conditions”
- Forced to anticipate many styles of use

Isis2 makes developer’s life easier

    // UPDATE and LOOKUP are request codes chosen by the application.
    Group g = new Group("myGroup");
    Dictionary<string, double> Values = new Dictionary<string, double>();
    g.ViewHandlers += delegate(View v) {
        Console.Title = "myGroup members: " + v.members;
    };
    g.Handlers[UPDATE] += delegate(string s, double v) {
        Values[s] = v;
    };
    g.Handlers[LOOKUP] += delegate(string s) {
        g.Reply(Values[s]);
    };
    g.Join();

    g.OrderedSend(UPDATE, "Harry", 20.75);

    List<double> resultlist = new List<double>();
    nr = g.Query(ALL, LOOKUP, "Harry", EOL, resultlist);

- First sets up group
- Join makes this entity a member. State transfer isn’t shown
- Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
- Easy to request security (g.SetSecure), persistence
- “Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make


Key Concept: Local Data
Each replica is a completely standard program with its own private, local variables. Nothing is “physically shared”! So in particular, the variable “Values” is local: each program has its own copy of this variable. The different copies get the same messages, in the same order. This allows them to apply the same updates.
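The local-data idea can be simulated in a few lines of Python. This is only a sketch: the `Sequencer` class stands in for an ordered multicast and is an assumption for illustration, not the mechanism Isis2 uses internally. Each `Replica` owns its own dictionary, and the copies stay equal only because every replica is handed the same updates in the same order.

```python
class Replica:
    def __init__(self):
        self.values = {}           # local copy; not shared with anyone

    def deliver(self, key, value):
        self.values[key] = value


class Sequencer:
    """Assigns one global order to updates and delivers them in that order."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []

    def ordered_send(self, key, value):
        self.log.append((key, value))
        for r in self.replicas:    # every replica sees the same sequence
            r.deliver(key, value)


replicas = [Replica() for _ in range(3)]
seq = Sequencer(replicas)
seq.ordered_send("Harry", 20.75)
seq.ordered_send("Sally", 3.5)
seq.ordered_send("Harry", 21.0)
print([r.values for r in replicas])
```

The three dictionaries are distinct objects with equal contents, which mirrors the point of the slide: agreement comes from message ordering, not from shared memory.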

C# Dictionary Generic Type
- C# supports generic types
- Objects in which some other type is a parameter:

    Dictionary<string, double> Values = new Dictionary<string, double>();

- A C# Dictionary maps a key to a value. This Dictionary maps strings to doubles.
- The variable name is Values. Each program has its own private copy.


Isis2 makes developer’s life easier

    // Same example, with the update multicast using Send instead of OrderedSend.
    Group g = new Group("myGroup");
    Dictionary<string, double> Values = new Dictionary<string, double>();
    g.ViewHandlers += delegate(View v) {
        Console.Title = "myGroup members: " + v.members;
    };
    g.Handlers[UPDATE] += delegate(string s, double v) {
        Values[s] = v;
    };
    g.Handlers[LOOKUP] += delegate(string s) {
        g.Reply(Values[s]);
    };
    g.Join();

    g.Send(UPDATE, "Harry", 20.75);

    List<double> resultlist = new List<double>();
    nr = g.Query(ALL, LOOKUP, "Harry", EOL, resultlist);

- First sets up group
- Join makes this entity a member. State transfer isn’t shown
- Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
- Easy to request security (g.SetSecure), persistence
- “Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make

Concept: A “multi-query”
- Our lookup is
  - Multicast to the group
  - All members respond
- A chance for parallelism
  - Each can do part of the job: e.g. search 1/n th of a database
  - Reduces response delays
[Figure: a front end multicasts “Lookup ‘Harry’ in the Ithaca phone directory” to the group and collects “Names with Harry in them: …”. Callout: with n replicas we get an n times speedup!]
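The “each member does part of the job” idea can be sketched as follows. This is a sequential Python simulation with made-up names (`member_search`, `multi_query`, the sample directory); in a real group each member would scan its slice concurrently on its own machine, and the speedup would be at most n and reduced by messaging costs.

```python
def member_search(rank, n, directory, needle):
    """Member `rank` (0-based) of n scans only its own slice of the data."""
    return [name for name in directory[rank::n] if needle in name]


def multi_query(n, directory, needle):
    # Stand-in for multicasting the query and collecting one reply per member.
    replies = [member_search(rank, n, directory, needle) for rank in range(n)]
    return sorted(name for reply in replies for name in reply)


directory = ["Harry Potter", "Sally Field", "Harry Truman",
             "Ken Birman", "Debbie Harry", "Tom Jones"]
print(multi_query(3, directory, "Harry"))
```

Slicing by rank works only because every member holds the same replicated directory and agrees on n and on its own rank, which is exactly what a consistent membership view provides.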

Isis2 makes developer’s life easier

    Group g = new Group("myGroup");
    Dictionary<string, double> Values = new Dictionary<string, double>();
    g.ViewHandlers += delegate(View v) {
        Console.Title = "myGroup members: " + v.members;
    };
    g.Handlers[UPDATE] += delegate(string s, double v) {
        Values[s] = v;
    };
    g.Handlers[LOOKUP] += delegate(string s) {
        g.Reply(Values[s]);
    };
    g.Join();

    g.OrderedSend(UPDATE, "Harry", 20.75);

    List<double> resultlist = new List<double>();
    nr = g.OrderedQuery(ALL, LOOKUP, "Harry", EOL, resultlist);

The choice of primitive (here OrderedSend and OrderedQuery) tells Isis2 which kind of multicast order and durability you need.
- First sets up group
- Join makes this entity a member. State transfer isn’t shown
- Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
- Easy to request security (g.SetSecure), persistence
- “Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make

RPC versus multi-query
- In a remote procedure call, the client asks the server to do something
  - A message is sent
  - Server replies in a second message
  - Result returned much as from a local method
- With a multi-query
  - A multicast is sent
  - Servers respond
  - A list of results is returned
  - If the group had N members when the query was done, we get N responses
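The difference in the shape of the result can be seen in a small Python sketch. The `Server` class and the `rpc` and `multi_query` functions are hypothetical stand-ins that call the members directly; no networking or Isis2 API is involved.

```python
class Server:
    def __init__(self, name, values):
        self.name = name
        self.values = values

    def lookup(self, key):
        return self.values[key]


def rpc(server, key):
    # One request message, one reply message, one result.
    return server.lookup(key)


def multi_query(group, key):
    # One multicast, one reply per member that was in the group.
    return [member.lookup(key) for member in group]


group = [Server(f"S{i}", {"Harry": 20.75}) for i in range(4)]
single = rpc(group[0], "Harry")
replies = multi_query(group, "Harry")
print(single, replies, len(replies))
```

The caller of a multi-query therefore has to decide what to do with N answers: take the first, compare them, or merge partial results.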

But all the replies will be the same!
- In our example, Values is identical in all replicas
- … so what would be the point in asking for ALL replies?
- In fact Isis2 has many options
  - You could just ask one group member to do the query. With N members, the group can handle N times the load
  - You could ask the k’th member to search just the k’th of N equal slices of Values. This would be up to N times faster
  - You could combine these ideas and treat your group as if it had N members in N/K subsets of size K each…

Adding security: Just one line!

    Group g = new Group("myGroup");
    Dictionary<string, double> Values = new Dictionary<string, double>();
    g.ViewHandlers += delegate(View v) {
        Console.Title = "myGroup members: " + v.members;
    };
    g.Handlers[UPDATE] += delegate(string s, double v) {
        Values[s] = v;
    };
    g.Handlers[LOOKUP] += delegate(string s) {
        g.Reply(Values[s]);
    };
    g.SetSecure(myKey);   // the one added line
    g.Join();

    g.OrderedSend(UPDATE, "Harry", 20.75);

    List<double> resultlist = new List<double>();
    nr = g.OrderedQuery(ALL, LOOKUP, "Harry", EOL, resultlist);

- First sets up group
- Join makes this entity a member. State transfer isn’t shown
- Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
- Easy to request security, persistence, UNICAST_ONLY, etc
- “Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make

Picking a multicast primitive
- Optimistic delivery
  - RawSend, Send, CausalSend, OrderedSend
  - Use Flush(k) prior to talking to an external entity like a client
  - Very fast, ideal for services with “soft state”
- Very pessimistic
  - SafeSend: a version of Paxos, like OrderedSend but with stronger durability
  - Active “disk logger” for strongest assurance
  - Slow, but suitable for updating a replicated database or persistent external files
We’ll discuss these issues in detail later.

Our example was overly simple
- It didn’t show the “state transfer” code
  - Corresponds to the “white arrows” in the time-line figure
- In Isis2 we have a way to make checkpoints
  - State transfer: some active member makes a checkpoint, and the joiner loads the state from it.
  - The code looks like other operations in our example
- Checkpoints can also be used to save group state during periods when all members are inactive
[Figure: time-line of group members along a time axis; white arrows mark state transfer to a joining member.]
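State transfer can be sketched in Python as a checkpoint made by an existing member and loaded by the joiner before it starts receiving updates. All names here (`Member`, `join`, `ordered_send`, the choice of `group[0]` as donor) are assumptions for illustration; the slides do not show the Isis2 state transfer API.

```python
import copy


class Member:
    def __init__(self):
        self.values = {}

    def apply(self, key, value):
        self.values[key] = value

    def make_checkpoint(self):
        # Deep copy so the joiner never aliases the donor's memory.
        return copy.deepcopy(self.values)

    def load_checkpoint(self, checkpoint):
        self.values = copy.deepcopy(checkpoint)


def join(group, newcomer):
    """Transfer state from an existing member, then add the newcomer."""
    if group:
        newcomer.load_checkpoint(group[0].make_checkpoint())
    group.append(newcomer)


def ordered_send(group, key, value):
    for m in group:
        m.apply(key, value)


group = []
first = Member()
join(group, first)
ordered_send(group, "Harry", 20.75)

late = Member()
join(group, late)                    # late receives {"Harry": 20.75}
ordered_send(group, "Sally", 3.5)    # later updates reach both members
print(first.values, late.values)
```

The important detail is the ordering of the join relative to updates: the checkpoint must reflect exactly the updates delivered before the join, and the joiner must see every update delivered after it.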

Some uses for process groups
- To replicate data maintained by the members in memory
- To replicate actions taken on an external service such as a replicated database
- To ensure that all replicas are configured the same way
- To coordinate the processing of requests and load-balance
- To offer a way to parallelize processing by having each group member do part of the work
- Fault-tolerance via a backup scheme

Isis2 Summary
- A library that you can invoke from a normal program written in a normal way
- It does the work of creating groups and sending multicasts and ensuring that the consistency model will be enforced
- The developer just tells it what to do.
- She thinks about a parallel distributed application.
- Virtual synchrony eliminates many hard problems

Why not build it yourself from scratch?
- These systems are complex, especially if you want to run on platforms like EC2
- By using Isis2 you “inherit” 30 years of research on how to make it work
[Figure: Isis2 architecture. Labelled components: Isis2 user object; Isis2 library; group instances and multicast protocols (Send, CausalSend, OrderedSend, SafeSend, Query, …); Views; OOB File Replication; Key-Value Store; Flow Control; Membership Oracle; Large Group Layer; UDP overlay; Dr. Multicast; Platform Security; Reliable Sending; Fragmentation; Group Security; Sense Runtime Environment; Self-stabilizing Bootstrap Protocol; Socket Mgt/Send/Rcv; Message Library; “Wrapped” locks; Bounded Buffers; Oracle Membership (group membership, report suspected failures, other group members).]
SafeSend and Send are two of the protocol components hosted over what we call the large-scale properties sandbox. The sandbox addresses issues like flow control, security, etc. All protocols share and benefit from those properties. The sandbox itself is mostly composed of “convergent” protocols that use probabilistic methods.

Isis2: Cornell’s cloud solution
- Isis2 makes it easy to write programs in a standard way, using the methods in its library, that replicate data and can coordinate their actions
- The system automates tasks needed to ensure consistency
- Without the system, consistency, security and other such issues are very hard to solve
- But you still need to decide where to run your programs, make sure they can find one another, etc. Isis2 focuses mostly on keeping them consistent.