A BRIEF INTRODUCTION TO HIGH ASSURANCE CLOUD COMPUTING WITH ISIS2 Ken Birman 1 Cornell University.

1 A BRIEF INTRODUCTION TO HIGH ASSURANCE CLOUD COMPUTING WITH ISIS2
Ken Birman, Cornell University

2 Isis 2 System
• A prebuilt technology that automates many of the hard tasks involved in replicating services and the data on which they depend
• Targets cloud computing settings
• Available as open source from isis2.codeplex.com
• Intended to be easy to use…
• … but lacks commercial support, and also lacks some of the polish of a commercial product

3 Isis 2 System
• C# library (but callable from any .NET language) offering replication techniques for cloud computing developers
• Based on a model that fuses the virtual synchrony and state machine replication models
• Goal is to make it easy for users to deal with some very hard problems seen in distributed settings:
  • Elasticity (sudden scale changes)
  • Potentially heavy loads
  • High node failure rates
  • Concurrent (multithreaded) apps
  • Long scheduling delays, resource contention
  • Bursts of message loss
  • Need for very rapid response times
  • Community skeptical of “assurance properties”

4 Supported languages…
• The main library is written in .NET / C#
  • …and so it is best used from C# (C# evolved from Java)
  • IronPython also works, and you can use a version of C++ called C++/CLI
• .NET runs on Windows
  • Our developers use Visual Studio
• … but it can also be used on Linux
  • You work with Mono and use the Mono developer environment, or compile using “mcs” at the command line

5 Other ways to access Isis 2…
• New “outboard” option
  • A recent addition… It allows native C++ and C users to access a subset of the Isis 2 functionality
  • Initial focus is on file replication for memory-mapped files and other big memory-mapped objects
• This outboard feature can be accessed from an Isis 2 command-line tool
  • To do so you run the Isis 2 system as a server
  • Commands to it can then be issued from any language

6 Why Isis 2?
• The new version of Isis is a completely new system, but builds on a long history
  • First version was the Isis Toolkit (~1985-1996)
  • That system was used to create the French Air Traffic Control system, the NYSE trading platform, the US Navy AEGIS core communications library and many other mission-critical applications
  • After the Isis Toolkit, we created Horus and Ensemble (focus was on ties to formal methods)
• Isis 2 is our latest and best technology. Pronounced “Isis two”, not “Isis squared”

7 What does Isis 2 do?
• A library to help you create highly resilient, secure, scalable applications. The methods automate important tasks
• You can run your program on a few machines of your own, or on a big cluster, or in an enterprise network, or even on a cloud platform like Amazon EC2
• Isis 2 is a “specialist” in solving problems involving data replication within “groups” of programs

8 Key Concept: Replication
Imagine that some collection of programs is running on some set of machines. Data is said to be replicated if each of them has a copy and sees the updates to that copy.

9 Replication in pictures
• This is an example of a non-replicated server with a client who issues some form of request
[Figure: timeline of a client request to a single service instance. Request: “Update the monitoring and alarms criteria for Mrs. Marsh as follows…”; reply: “Confirmed”. Labels: service response delay; the response delay seen by the end user would also include Internet latencies.]

10 Replication in pictures
• Here we replicate data on multiple programs. The group of programs offers better reliability than a single program, and could perform better too
[Figure: timeline of the same client request handled by a service group. Request: “Update the monitoring and alarms criteria for Mrs. Marsh as follows…”; reply: “Confirmed”. Labels: service response delay; the response delay seen by the end user would also include Internet latencies.]

11 Things to notice
• Although there is a group of programs, the client still talks to some representative.
  • Standard “web services” requests work in the usual way
• The group members are just normal programs and could be running on the same machines, or different machines. They could be running the same code, or different code.
  • They cooperate to replicate data
• No variables or memory are physically shared by the programs.
  • Instead, each has a private replica. When an update happens, a message is sent to tell all the group members to update their copies

12 The big picture… Steps in using Isis 2

13 First questions
• Step back and draw a sketch of your application
  • Client systems, if it has remote access
  • The service you want to build that will use Isis 2
  • Data files the service may need, or databases
  • Other subsystems or services it will interact with

14 Our example from earlier
[Figure: a client sends a request to the service group; the group computes, multicasts updates that modify replicated data, and returns a response.]

15 Where will the service run?
• Will you run your replicated service…
  • Next to its clients, on the same machines?
  • On EC2, RackSpace, GooglePlex, MSFT Azure, …?
  • In a cluster of computers up the hallway?
• How will clients connect to it?
  • Web Services remote method invocation?
  • CORBA RPC? TCP connections?

16 Now build a non-replicated instance
• Before adding Isis 2 mechanisms, build a single instance of your server and test access to it from your client, or whatever else might access it
• Debug this code.
  • Adding Isis 2 functionality after the fact isn’t hard and may really be much easier than using it from the outset!
  • But planning ahead can help: using C#, C++/CLI or IronPython will be far easier than other options

17 … some people get stuck at step 1!
• Building a cloud-hosted service involves steps you may never have tried before
• For example, with Visual Studio, you create a new kind of service instance, and need to tell it which methods are remotely available, and then need to install and run it on your machine, which may involve firewall changes to allow clients to access it…
• … Isis 2 can’t really help with that. But it isn’t terribly hard. Just do a small step at a time, following the instructions for the package you decide to use

18 Why not native C or native C++?
• We do support a stripped-down way to use Isis 2 from these and other languages, but you can access only part of the functionality
  • We’ll discuss these features later, but not today
• If you know Java, you will find it easy to migrate to C#, and we would recommend that route
  • C# works on both Windows and Linux (via Mono)

19 Designing the replicated functions
• Now you are ready to design the replication features of your service
• Ask: what data needs to be replicated?
  • How will it be loaded initially? How will it be updated?
  • Will you do load-balanced queries (if so they can just access any service member), or parallel queries?
  • What synchronization or coordination is needed?
• Start with a good image of how you want it to work

20 State machine model
• Think about a state machine
  • Some sort of object or class, with state in it (data)
  • The state is updated deterministically when events occur (updates…), and it can be checkpointed and later loaded back in from the checkpoint
• With Isis 2 we simply replicate a state machine!
  • You design an object or subsystem that has the replicated data in it
  • The updates become totally ordered multicasts, and group membership changes are delivered as events too
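The state machine idea can be illustrated with a minimal sketch. This is plain Python rather than the Isis 2 API, and the class and method names (KeyValueMachine, apply, checkpoint, from_checkpoint) are hypothetical names chosen for illustration. It shows that two replicas applying the same updates in the same order end in the same state, and that a checkpoint can rebuild that state.

```python
import json


class KeyValueMachine:
    """A tiny deterministic state machine: the state is a dict of named values."""

    def __init__(self):
        self.values = {}

    def apply(self, update):
        # An update is a (key, value) pair; applying it has no randomness,
        # so every replica that sees the same sequence reaches the same state.
        key, value = update
        self.values[key] = value

    def checkpoint(self):
        # Serialize the whole state so it can be saved or sent to another copy.
        return json.dumps(self.values, sort_keys=True)

    @classmethod
    def from_checkpoint(cls, blob):
        # Rebuild a machine from a previously made checkpoint.
        machine = cls()
        machine.values = json.loads(blob)
        return machine


updates = [("Harry", 20.75), ("Sally", 3.5), ("Harry", 21.0)]

replica_a = KeyValueMachine()
replica_b = KeyValueMachine()
for update in updates:  # same updates, same order, at both replicas
    replica_a.apply(update)
    replica_b.apply(update)

restored = KeyValueMachine.from_checkpoint(replica_a.checkpoint())
print(replica_a.values == replica_b.values == restored.values)
```

In a real deployment the loop over updates would be replaced by ordered multicast delivery; the sketch only captures the determinism requirement.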

21 Replication in pictures
• By the time you are ready to code, you’ll have a picture similar to this one, specific to your setting
[Figure: timeline of a client request handled by a service group hosted on EC2, with an ordered multicast to replicate the update. Request: “Update the monitoring and alarms criteria for Mrs. Marsh as follows…”; reply: “Confirmed”. Labels: service response delay; the response delay seen by the end user would also include Internet latencies.]

22 From this picture…
• Now your needs are more concrete
  • How will I represent the replicated state in my group?
  • How can I copy the current data to a joining member?
  • I don’t want to reply to the client until all the replicas have been updated. What’s the best way to do that?
  • I’m worried about speed. What will limit response delays? How fast can a service like this run?
• This set of MOOC modules will help you answer such questions, but experiments and optimization of your solution are certainly needed too…

23 Let’s replicate! (some data…)

24 How would you solve this problem?
• Assume you have access to a client-server technology, like the ones used to build web services
  • You would need a list of group members
  • Then when an update happens, the program receiving the request could update the others

25 Replication version 0
[Figure: a replicated variable starts at X=5; the update “Set x=x+150” changes it to X=155.]

26 Things you’ll need to think about
• Keep track of group membership. Can it change?
  • If new members join, how do they learn the initial value of x, or other “replicated state”?
  • When someone joins, how do we update the group membership list?
  • If a member fails, how are they dropped from the group?
• Implement the 1-to-(N-1) “multicast”
• Handle reliability issues: what if a request times out?
• Handle security: should we use SSL? Something else?
• What if two conflicting updates happen concurrently?

27 This sounds hard!
• … in fact it is very hard
  • Any form of replication is equivalent to a famous problem called distributed consensus
  • There are many published papers on this topic; you’ll want to use a good solution
• Once you solve the basic issues you’ll still face many challenges of performance, scale, portability, respecting rules for the particular runtime setting…

28 Key Concept: Consistency
We say that replicated data is consistent if multiple users accessing it can’t detect whether or not it was replicated.

29 Consistent replicated data
[Figure: X=5; the update “Set x=x+150” gives X=155, then the update “Set x=22” gives X=22 at every replica.]

30 Inconsistent replicated data
[Figure: X=5; the updates “Set x=x+150” (X=155) and “Set x=22” (X=22) are applied, but afterwards one replica thinks x=155 and another thinks x=22.]
Isis 2 never lets this happen. It automatically enforces ordering for you.

31 Possible sources of inconsistency?
• Conflicting updates could reach members in different orders
• Maybe someone dropped an update (network message loss), or applied one twice
• Update sender could have failed while sending the updates, so that some copies were sent, but others weren’t
• Confusion about membership: perhaps some member was joining and the update initiator didn’t realize it, hence didn’t send the update to it, or the initial state was “wrong”
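The first source of inconsistency listed above can be reproduced in a few lines. The sketch below is plain Python, not Isis 2 code, and the function names are hypothetical. It applies the two updates from the earlier pictures in different orders at two replicas, then shows one simple remedy: tagging each update with a sequence number and applying updates in sequence order regardless of arrival order. The sequence-number scheme is an assumption used for illustration, not a description of the protocol Isis 2 runs.

```python
def add_150(x):
    return x + 150


def set_22(x):
    return 22


def apply_in_arrival_order(initial, updates):
    # Apply updates exactly as they arrived.
    x = initial
    for update in updates:
        x = update(x)
    return x


def apply_in_sequence_order(initial, tagged_updates):
    # Each update carries a sequence number; sort before applying so that
    # every replica uses the same order even if arrival order differed.
    x = initial
    for _, update in sorted(tagged_updates, key=lambda pair: pair[0]):
        x = update(x)
    return x


# Two replicas receive the same updates in different orders and diverge.
replica_1 = apply_in_arrival_order(5, [add_150, set_22])
replica_2 = apply_in_arrival_order(5, [set_22, add_150])

# With sequence numbers, both replicas converge on the same value.
ordered_1 = apply_in_sequence_order(5, [(1, add_150), (2, set_22)])
ordered_2 = apply_in_sequence_order(5, [(2, set_22), (1, add_150)])

print(replica_1, replica_2, ordered_1, ordered_2)
```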

32 A subtle risk
• Failure handling poses hard problems!
• Suppose that process A is supposed to send an update reliably to processes B, C and D
  • A might use TCP, or some other reliable protocol. But if process C fails, A needs to give up.
  • What if the network temporarily fails, and A can’t reach C? This is indistinguishable from C failing, except that later, C will be accessible again!

33 Split brain syndrome
• We say that the network “partitioning” issue causes a “split brain” problem
  • Some members of our group think that A and B are the healthy members. Others think that C and D are healthy and that A and B have failed.
  • Which “side” is right?
• Who should be in charge if this group is doing something critical, like deciding which plane can land on a particular runway?

34 Split brain syndrome
• Inconsistency can be very dangerous!
[Figure: two conflicting instructions are issued: “Flight AA 27 clear to land on runway 3-B” and “Flight US 1827 clear to take off on runway 3-B”.]

35 Split brain syndrome
• Consider this air traffic control “server”. It helps make sure runways are used safely.
[Figure: a service group with members A, B, C and D issues “Flight US 1827 clear to take off on runway 3-B”.]

36 Split brain syndrome
• Temporary network failure: A and B can talk to each other but can’t reach C or D, and vice-versa
[Figure: the group A, B, C, D is split in two; one side issues “Flight AA 27 clear to land on runway 3-B” while the other issues “Flight US 1827 clear to take off on runway 3-B”.]
Isis 2 never lets this happen. It automatically prevents split-brain behavior.
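The slides do not say how Isis 2 prevents split-brain behavior. One widely used technique, assumed here purely for illustration, is a majority rule: a side of a partition may keep making decisions only if it can reach a strict majority of the last agreed membership. The plain Python sketch below (hypothetical function name, not Isis 2 code) shows that with this rule the two sides of the A,B / C,D split can never both proceed.

```python
def may_continue(reachable, full_membership):
    """Return True only if the reachable members form a strict majority."""
    members = set(full_membership)
    reachable_members = set(reachable) & members
    return 2 * len(reachable_members) > len(members)


group = ["A", "B", "C", "D"]

# The partition from the runway example: neither half holds a majority,
# so neither side is allowed to clear a flight on its own.
side_ab = may_continue(["A", "B"], group)
side_cd = may_continue(["C", "D"], group)

# If only D is cut off, the side with A, B and C can safely carry on.
side_abc = may_continue(["A", "B", "C"], group)
side_d = may_continue(["D"], group)

print(side_ab, side_cd, side_abc, side_d)
```

The cost of this rule is availability: in an even split, both sides pause until the network heals, which is the tradeoff the CAP discussion is about.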

37 Other settings that need consistency
• Medical systems that give information about patient status and current drug regimen
• Smart power grid system that controls power infrastructure components such as transformers
• Self-driving vehicles that coordinate while driving at high speed on highways
• Control systems for chemical plants and refineries
• Corporate accounting systems, and payroll
• … you can make quite a long list.

38 Cloud computing: CAP theorem
• The most common tools for building cloud computing solutions don’t help with consistency
  • They treat the property as a special need and assume you’ll find your own ways to do this
• This reflects the so-called CAP theorem
  • “You can have at most two out of these three: Consistency, Availability and Partition Tolerance.” (Eric Brewer, Berkeley)
• Cloud platforms assume you are not sophisticated enough to make tradeoffs. They weaken consistency to get the fastest possible response times under the widest possible range of operating conditions.

39 Isis 2 to the rescue!
• By using a preexisting library like Isis 2 you don’t need to solve these problems yourself
• The system solves them for you
In Egyptian mythology, Isis rescued Osiris after he was killed and torn to pieces by his brother Set. She restored him to life and he went on to rule the underworld. Their child, Horus, later defeated Set. The story illustrates the value of fault-tolerance.

40 Isis 2 makes developer’s life easier
Benefits of using a formal model
• Formal model permits us to achieve correctness
• Isis 2 is based on protocols that can be expressed mathematically and proved correct
• Think of Isis 2 as a collection of modules, each with rigorously stated properties
Importance of sound engineering
• Isis 2 implementation needs to be fast, lean, easy to use
• Developer must see it as easier to use Isis 2 than to build from scratch
• Seek great performance under “cloudy conditions”
• Forced to anticipate many styles of use

41 Isis 2 makes developer’s life easier
Group g = new Group("myGroup");
Dictionary<string,double> Values = new Dictionary<string,double>();
g.ViewHandlers += delegate(View v) {
    Console.Title = "myGroup members: " + v.members;
};
g.Handlers[UPDATE] += delegate(string s, double v) {
    Values[s] = v;
};
g.Handlers[LOOKUP] += delegate(string s) {
    g.Reply(Values[s]);
};
g.Join();
g.OrderedSend(UPDATE, "Harry", 20.75);
List<double> resultlist = new List<double>();
nr = g.Query(ALL, LOOKUP, "Harry", EOL, resultlist);
• First sets up group
• Join makes this entity a member. State transfer isn’t shown
• Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
• Easy to request security (g.SetSecure), persistence
• “Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make

46 Key Concept: Local Data
Each replica is a completely standard program with its own private, local variables. Nothing is “physically shared”! So in particular, the variable “Values” is local: each program has its own copy of this variable. The different copies get the same messages, in the same order. This allows them to apply the same updates.

47 C# Dictionary Generic Type
• C# supports generic types
  • Objects in which some other type is a parameter: Dictionary<string,double> Values = new Dictionary<string,double>();
• A C# Dictionary maps a key to a value. This Dictionary maps strings to doubles.
• The variable name is Values. Each program has its own private copy.

49 Isis 2 makes developer’s life easier
Group g = new Group("myGroup");
Dictionary<string,double> Values = new Dictionary<string,double>();
g.ViewHandlers += delegate(View v) {
    Console.Title = "myGroup members: " + v.members;
};
g.Handlers[UPDATE] += delegate(string s, double v) {
    Values[s] = v;
};
g.Handlers[LOOKUP] += delegate(string s) {
    g.Reply(Values[s]);
};
g.Join();
g.Send(UPDATE, "Harry", 20.75);
List<double> resultlist = new List<double>();
nr = g.Query(ALL, LOOKUP, "Harry", EOL, resultlist);
• First sets up group
• Join makes this entity a member. State transfer isn’t shown
• Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
• Easy to request security (g.SetSecure), persistence
• “Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make

50 Concept: A “multi-query”
• Our lookup is
  • Multicast to the group
  • All members respond
• A chance for parallelism
  • Each can do part of the job: e.g. search 1/n th of a database
  • Reduces response delays
• With n replicas… we get an n times speedup!
[Figure: a front end multicasts the query “Lookup ‘Harry’ in the Ithaca phone directory” to the group and collects the replies (“Names with Harry in them: …”).]

51 Isis 2 makes developer’s life easier
Group g = new Group("myGroup");
Dictionary<string,double> Values = new Dictionary<string,double>();
g.ViewHandlers += delegate(View v) {
    Console.Title = "myGroup members: " + v.members;
};
g.Handlers[UPDATE] += delegate(string s, double v) {
    Values[s] = v;
};
g.Handlers[LOOKUP] += delegate(string s) {
    g.Reply(Values[s]);
};
g.Join();
g.OrderedSend(UPDATE, "Harry", 20.75);
List<double> resultlist = new List<double>();
nr = g.OrderedQuery(ALL, LOOKUP, "Harry", EOL, resultlist);
• First sets up group
• Join makes this entity a member. State transfer isn’t shown
• Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
• Easy to request security (g.SetSecure), persistence
• “Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make
Callout on OrderedSend and OrderedQuery: the name of the call tells Isis 2 which kind of multicast order and durability you need

52 RPC versus multi-query
• In a remote procedure call, the client asks the server to do something
  • A message is sent
  • Server replies in a second message
  • Result returned much as from a local method
• With a multi-query
  • A multicast is sent
  • Servers respond
  • A list of results is returned
  • If the group had N members when the query was done, we get N responses

53 But all the replies will be the same!
• In our example, Values is identical in all replicas
  • … so what would be the point in asking for ALL replies?
• In fact Isis 2 has many options
  • You could just ask one group member to do the query. With N members, the group can handle N times the load
  • You could ask the k’th member to search just the k’th of N equal slices of Values (1/N of the data each). This would be N times faster
  • You could combine these ideas and treat your group as if it had N members in N/K subsets of size K each…
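The partitioned-search option can be sketched as follows. This is plain Python that simulates the group inside one process, not Isis 2 code; member_search and multi_query are hypothetical names, and the directory entries are made up. Each of the N members scans only the entries whose index falls in its own slice, and the caller merges the N replies.

```python
def member_search(rank, group_size, directory, needle):
    """Member number `rank` scans only every group_size-th entry, starting at rank."""
    return [
        name
        for index, name in enumerate(directory)
        if index % group_size == rank and needle in name
    ]


def multi_query(group_size, directory, needle):
    # Simulate the multicast: every member answers for its own slice.
    replies = [
        member_search(rank, group_size, directory, needle)
        for rank in range(group_size)
    ]
    # The caller receives one reply per member and merges them.
    merged = sorted(name for reply in replies for name in reply)
    return replies, merged


directory = [
    "Harry Able",
    "Sally Baker",
    "Harry Cole",
    "Tom Dean",
    "Debbie Harry",
    "Ann Ford",
]

replies, matches = multi_query(3, directory, "Harry")
print(len(replies), matches)
```

Because every replica holds the full data, any member can serve any slice, so the slicing rule only has to be something all members compute the same way from their rank and the group size.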

54 Adding security: Just one line!
Group g = new Group("myGroup");
Dictionary<string,double> Values = new Dictionary<string,double>();
g.ViewHandlers += delegate(View v) {
    Console.Title = "myGroup members: " + v.members;
};
g.Handlers[UPDATE] += delegate(string s, double v) {
    Values[s] = v;
};
g.Handlers[LOOKUP] += delegate(string s) {
    g.Reply(Values[s]);
};
g.SetSecure(myKey);
g.Join();
g.OrderedSend(UPDATE, "Harry", 20.75);
List<double> resultlist = new List<double>();
nr = g.OrderedQuery(ALL, LOOKUP, "Harry", EOL, resultlist);
• First sets up group
• Join makes this entity a member. State transfer isn’t shown
• Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
• Easy to request security, persistence, UNICAST_ONLY, etc
• “Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make

55 Picking a multicast primitive
• Optimistic delivery
  • RawSend, Send, CausalSend, OrderedSend
  • Use Flush(k) prior to talking to an external entity like a client
  • Very fast, ideal for services with “soft state”
• Very pessimistic
  • SafeSend: a version of Paxos, like OrderedSend but with stronger durability
  • Active “disk logger” for strongest assurance
  • Slow, but suitable for updating a replicated database or persistent external files
We’ll discuss these issues in detail later

56 Our example was overly simple
• It didn’t show the “state transfer” code
  • Corresponds to the “white arrows” in the time-line figure
• In Isis 2 we have a way to make checkpoints
  • State transfer: some active member makes a checkpoint, and the joiner loads the state from it
  • The code looks like other operations in our example
  • Checkpoints can also be used to save group state during periods when all members are inactive
[Figure: time-line of group members; white arrows mark state transfer.]
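Since the slides leave out the state transfer code, here is a rough sketch of the idea in plain Python. It does not use or imitate the Isis 2 checkpoint API; ToyGroup, Member and their methods are hypothetical names. When a member joins a non-empty group, an existing member makes a checkpoint and the joiner loads it before receiving any later updates.

```python
import copy


class Member:
    def __init__(self, name):
        self.name = name
        self.values = {}  # private, local replica of the group state

    def make_checkpoint(self):
        # Copy the state so the joiner never shares memory with this member.
        return copy.deepcopy(self.values)

    def load_checkpoint(self, checkpoint):
        self.values = copy.deepcopy(checkpoint)


class ToyGroup:
    """In-process stand-in for a process group (illustration only)."""

    def __init__(self):
        self.members = []

    def join(self, member):
        if self.members:
            # State transfer: an active member checkpoints, the joiner loads.
            member.load_checkpoint(self.members[0].make_checkpoint())
        self.members.append(member)

    def update(self, key, value):
        # Deliver the same update to every current member.
        for member in self.members:
            member.values[key] = value


group = ToyGroup()
first = Member("first")
group.join(first)
group.update("Harry", 20.75)

late = Member("late")
group.join(late)              # late learns about Harry via the checkpoint
group.update("Sally", 3.5)    # both members see updates sent after the join

print(first.values == late.values)
```

The sketch joins members one at a time with no concurrent updates; a real system also has to order the join against in-flight multicasts so that the checkpoint and later updates line up exactly.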

57 Some uses for process groups
• To replicate data maintained by the members in memory
• To replicate actions taken on an external service such as a replicated database
• To ensure that all replicas are configured the same way
• To coordinate the processing of requests and load-balance
• To offer a way to parallelize processing by having each group member do part of the work
• Fault-tolerance via a backup scheme

58 Isis 2 Summary
• A library that you can invoke from a normal program written in a normal way
• It does the work of creating groups and sending multicasts and ensuring that the consistency model will be enforced
• The developer just tells it what to do.
  • She thinks about a parallel distributed application.
  • Virtual synchrony eliminates many hard problems

59 Why not build it yourself from scratch?
• These systems are complex, especially if you want to run on platforms like EC2
• By using Isis 2 you “inherit” 30 years of research on how to make it work
• SafeSend and Send are two of the protocol components hosted over what we call the large-scale properties sandbox. The sandbox addresses issues like flow control, security, etc. All protocols share and benefit from those properties
• The sandbox itself is mostly composed of “convergent” protocols that use probabilistic methods
[Figure: Isis 2 architecture diagram. Labels: Isis 2 user object; Isis 2 library; group instances and multicast protocols (Send, CausalSend, OrderedSend, SafeSend, Query, …); Views; OOB File Replication; Key-Value Store; Flow Control; Large Group Layer; UDP overlay; Dr. Multicast; Platform Security; Group Security; Reliable Sending; Fragmentation; Sense Runtime Environment; Self-stabilizing Bootstrap Protocol; Socket Mgt/Send/Rcv; Message Library; “Wrapped” locks; Bounded Buffers; Membership Oracle (group membership, report suspected failures); other group members.]

60 Isis 2: Cornell’s cloud solution
• Isis 2 makes it easy to write programs in a standard way, using the methods in its library, that replicate data and can coordinate their actions
  • The system automates tasks needed to ensure consistency
  • Without the system, consistency, security and other such issues are very hard to solve
• But you still need to decide where to run your programs, make sure they can find one another, etc. Isis 2 focuses mostly on keeping them consistent.

