1 New Directions for NEST Research
Nancy Lynch, MIT
NEST Annual P.I. Meeting, Bar Harbor, Maine, July 12, 2002 …

2 My Group’s Work and NEST
New building blocks (global services and distributed algorithms) for dynamic, fault-prone, distributed systems.
Interacting state machine semantic models, including timing, hybrid continuous/discrete, and probabilistic behavior. Composition, abstraction.
Formal methods/tools to support reasoning about distributed systems: conditional performance analysis methods, IOA language/tools.
System modeling.

3 A Suggestion for NEST Research: a Middleware Service Catalog
Virtually everyone here is developing middleware services:
– Clock synchronization, location services, routing, reliable communication, consensus, group membership, group communication, object management services, publish-subscribe, network surveillance, reconfiguration, authentication, key distribution, …
But it’s not always obvious exactly what these services guarantee:
– API, functionality, conditional performance guarantees, fault-tolerance guarantees.

4 Middleware Service Catalog
Idea: Create and maintain a catalog of specifications for NEST middleware services.
– High-level descriptions of requirements: assumptions and guarantees; API, functionality, conditional performance, fault-tolerance; formal and informal.
– Models of the distributed algorithms used in the various implementations.
– Claims about the properties satisfied by the algorithms.
– Models for the underlying platforms.
Why this would be useful:
– Another kind of output, complementary to demos.
– Basis for discussion/clarification/comparison.
– Will help bring implementations together.
– Basis for formal analysis.
– Can help in developing algorithmic theory for NEST-like systems.

5 Building Blocks for High-Performance, Fault-Tolerant Distributed Systems
Nancy Lynch, MIT
NEST Annual P.I. Meeting, Bar Harbor, Maine, July 12, 2002 …

6 Our Current Project (NSF-ITR and AFOSR)
Design and analyze building blocks for computing in highly dynamic distributed settings:
– Global service specifications
– Distributed algorithms that implement them
Dynamic systems:
– Internet, mobile computing
– Joins, leaves, failures
– Contrast: traditional theory of distributed systems deals mostly with static systems, with fixed sets of processes.
[Diagram: a global Service implemented by distributed processes over the Net]

7 Our Project
We present everything rigorously, using mathematical interacting state machine models (I/O automata):
– Formal service specifications
– Formal algorithm descriptions
– Formal models for applications
– Prove correctness, using invariants and simulation relations
– Analyze performance, fault-tolerance
Develop supporting theory.
Apply the theory to software systems.

8 Current Subprojects
Scalable group communication [Khazan, Keidar, Lynch, Shvartsman]
Dynamic Atomic Broadcast [Bar-Joseph, Keidar, Lynch]
Reconfigurable Atomic Memory [Lynch, Shvartsman]
Communication protocols [Livadas, Lynch, Keidar, Bakr]
Peer-to-peer computing [Lynch, Malkhi, Ratajczak, Stoica]
Fault-tolerant consensus [Keidar, Rajsbaum]
Foundations [Lynch, Segala, Vaandrager, Kirli]
Applications:
– Toy helicopter [Mitra, Wang, Feron]
– Video streaming [Livadas, Nguyen, Zakhor]
– Unmanned flight control [Ha, Kochocki, Tanzman]
– Agent programming [Kawabe]

9 People
Project leader: Nancy Lynch
Postdocs: Idit Keidar, Dilsun Kirli
PhD students: Roger Khazan, Carl Livadas, Ziv Bar-Joseph, Rui Fan, Sayan Mitra, Seth Gilbert
MEng students: Omar Bakr, Matt Bachmann, Vida Ha
Other collaborators: Alex Shvartsman, Dahlia Malkhi, David Ratajczak, Ion Stoica, Sergio Rajsbaum, Roberto Segala, Frits Vaandrager, Yong Wang, Eric Feron, Thinh Nguyen, Avideh Zakhor, Joe Kochocki, Alan Tanzman, Yoshinobu Kawabe, …

10 This talk:
1. Scalable Group Communication
2. Dynamic Atomic Broadcast
3. Reconfigurable Atomic Memory

11 1. Scalable Group Communication
[Keidar, Khazan 00, 02] [Khazan 02] [Keidar, Khazan, Lynch, Shvartsman 02]

12 Group Communication Services
Cope with changing participants using abstract groups of client processes with changing membership sets.
Processes communicate with group members indirectly, by sending messages to the group as a whole.
GC services support management of groups:
– Maintain membership information. Form new views in response to changes.
– Manage communication. Communication respects views. Provide guarantees about ordering and reliability of message delivery.
Virtual synchrony.
Systems: Isis, Transis, Totem, Ensemble, …
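The two guarantees above, membership tracked as a sequence of views and delivery that respects views, can be sketched with a small in-memory stand-in. All names here (ToyGCS, join, leave, mcast) are hypothetical and illustrative, not the API of Isis, Transis, Totem, or Ensemble.

```python
# Illustrative, in-memory stand-in for a group-communication service (GCS).
# Membership is a sequence of numbered views; each multicast is delivered
# to exactly the members of the view in which it was sent.
class ToyGCS:
    def __init__(self):
        self.views = {}   # group -> (view_id, frozenset of member ids)
        self.inbox = {}   # process id -> list of (view_id, sender, message)

    def join(self, group, pid):
        vid, members = self.views.get(group, (0, frozenset()))
        self.views[group] = (vid + 1, members | {pid})   # form a new view
        self.inbox.setdefault(pid, [])

    def leave(self, group, pid):
        vid, members = self.views[group]
        self.views[group] = (vid + 1, members - {pid})   # form a new view

    def mcast(self, group, sender, message):
        # Communication respects views: deliver to current members only.
        vid, members = self.views[group]
        for pid in members:
            self.inbox[pid].append((vid, sender, message))
```

For example, after p1 and p2 join group "g" (producing views 1 and 2), a multicast from p1 is delivered to both, tagged with view 2; after p2 leaves, further multicasts reach only p1.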

13 Group Communication Services
Advantages:
– High-level programming abstraction
– Hides complexity of coping with changes
Disadvantages:
– Can be costly, especially when forming new views.
– May have problems scaling to large networks.
Applications:
– Managing replicated data
– Distributed multiplayer interactive games
– Multi-media conferencing, collaborative work

14 New GC Service for WANs [Khazan]
New specification, including virtual synchrony.
New algorithm:
– Uses a separate scalable membership service, implemented on a small set of membership servers [Keidar, Sussman, Marzullo, Dolev].
– Multicast implemented on all the nodes.
– View change uses only one round for state exchange, in parallel with the membership service’s agreement on views.
– Participants can join during view formation.
[Diagram: GCS implemented over the membership service (Memb) and the Net]

15 New GC Service for WANs
Distributed implementation [Tarashchanskiy].
Safety proofs, using new incremental proof methods [Keidar, Khazan, Lynch, Shvartsman 00].
Liveness proofs.
Performance analysis:
– Analyze time from when the network stabilizes until the GCS announces new views.
– Analyze message latency.
– Conditional analysis, based on input, failure, and timing assumptions.
– Compositional analysis, based on the performance of the Membership Service and Net.
Also modeled and analyzed a data-management application running on top of the new GCS.

16 2. Early-Delivery Dynamic Atomic Broadcast
[Bar-Joseph, Keidar, Lynch, DISC 02]

17 Dynamic Atomic Broadcast
Atomic broadcast with latency guarantees, in a dynamic setting where processes may join, leave, or fail.
We define the DAB problem, and present and analyze a new distributed algorithm to solve it.
– In the absence of failures: constant latency, even when participants join and leave.
– With failures: latency linear in the number of failures.
Uses a new distributed consensus service, in which participants do not know who the other participants are.
– We define the CUP problem, and present and analyze a new algorithm to solve it.
The algorithm improves upon previously suggested algorithms using group communication.

18 The DAB Problem
Problem: Guarantee that participants receive consistent sequences of messages, with fast delivery, even with joins and leaves.
Safety: Sending and receiving orders are consistent with a single global message ordering S. No gaps.
Liveness: Eventual join-ack, leave-ack. Eventual delivery, including the first message the process itself sends.
Application: Distributed multiplayer interactive games.
[Diagram: DAB interface: join, join-ack, leave, leave-ack, mcast(m), rcv(m)]

19 Implementing DAB
Processes:
– Timing-dependent; have approximately synchronized clocks.
Net:
– Dynamic network, pairwise FIFO delivery.
– Low latency.
– Does not guarantee a single total order, nor that all processes see the same messages from a failing process.
[Diagram: DAB layered over the Net: join, net-join]

20 Implementing DAB
Key difficulties:
– The network doesn’t guarantee a single total order.
– Different processes may receive different final messages from a failed process.
So, processes coordinate message delivery:
– Divide time into slots using the local clock; assign each message to a slot.
– Deliver messages in order of (slot, sender id).
– Determine the members of each slot; deliver only from members.
Processes must agree on slot membership:
– A joining (leaving) process selects a join-slot (leave-slot) and informs the other processes.
– A failed process triggers consensus.
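The coordination rule above, deliver in (slot, sender id) order and only from agreed slot members, can be sketched in a few lines. The function name and data shapes are illustrative, not from the paper.

```python
# Minimal sketch of slot-based delivery: sort messages by (slot, sender id)
# to obtain a single global order, and deliver only messages whose sender
# is an agreed member of that slot.
def deliver_order(messages, slot_members):
    """messages: iterable of (slot, sender_id, payload), in any arrival
    order. slot_members: dict mapping slot -> set of agreed member ids.
    Returns payloads in the single global delivery order."""
    ordered = sorted(messages, key=lambda m: (m[0], m[1]))
    return [payload for slot, sender, payload in ordered
            if sender in slot_members.get(slot, set())]
```

Because every process applies the same deterministic rule to the same agreed slot membership, all processes deliver the same sequence even though the network gives them messages in different orders.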

21 Using Consensus for DAB
When process j fails, a consensus service is used to agree on j’s failure slot.
This requires a new kind of consensus service, which:
– Does not assume participants are known a priori; lets each participant say who it thinks the other participants are.
– Allows processes to abstain.
– Example: i joins around when consensus starts. j1 thinks i is participating; j2 thinks not. i cannot participate as usual, because j2 ignores it, but cannot be silent, because j1 waits for it. So i abstains.
We define a new Consensus with Unknown Participants (CUP) service.
A separate CUP(j) service is used to decide on the failure slot for each j.

22 The DAB Algorithm Using CUP
[Diagram: DAB processes i1 and i2 over the Net; a failure triggers CUP(j)]

23 The CUP Problem
Guarantees agreement, validity, termination.
Assumes submitted worlds are “close”:
– A process that initiates is in the other processes’ worlds.
– A process in anyone’s world initiates, abstains, leaves, or fails.
[Diagram: CUP interface: init(v,W), decide(v), abstain, leave, leave-detect(j), fail-detect(j)]

24 The CUP Algorithm
We give a new early-stopping consensus algorithm.
– Similar to previous algorithms, e.g., [Dolev, Reischuk, Strong 90].
– But tolerates uncertainty about participants, and processes leaving.
Terminates in two rounds when failures stop (even if leaves continue).
Latency is linear in the number of actual failures.

25 The DAB Algorithm Using CUP
[Diagram: DAB processes i1 and i2 over the Net, with CUP(j1)]

26 Discussion: DAB
Modular: DAB algorithm, CUP, Network.
Modularity is needed for keeping the complexity under control; the initial presentation was intertwined, not modular.
Correctness of CUP (agreement, validity, termination) is used to prove correctness of DAB (atomic broadcast safety and liveness guarantees).
Latency bounds for CUP are used to prove latency bounds for DAB.

27 3. Reconfigurable Atomic Memory for Basic Objects (RAMBO)
[Lynch, Shvartsman, DISC 02]

28 RAMBO
Defined a new service: Reconfigurable Atomic Memory for Basic Objects (dynamic atomic read/write shared memory).
Developed a new, efficient, modular distributed algorithm to implement RAMBO.
Highly survivable; tolerates joins, leaves, failures:
– Tolerates short-term changes by using quorums.
– Tolerates long-term changes by reconfiguring.
– Reconfigures on-the-fly; no heavyweight view change.
– Maintains atomicity across configuration changes.
Can be used in mobile or peer-to-peer settings.
Applications: battle data for teams of soldiers, game data for players in a multiplayer game.

29 Static Quorum-Based Atomic Read/Write Memory Implementation [Attiya, Bar-Noy, Dolev]
Read and Write use two phases:
– Phase 1: Read (value, tag) from a read-quorum.
– Phase 2: Write (value, tag) to a write-quorum.
Write determines the largest tag in phase 1, picks a larger one, and writes the new (value, tag) in phase 2.
Read determines the latest (value, tag) in phase 1, propagates it in phase 2, then returns the value.
– Could return an unconfirmed value after phase 1.
Highly concurrent.
The quorum intersection property implies atomicity.
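The two-phase scheme above can be sketched as a single-threaded simulation in the style of Attiya, Bar-Noy, and Dolev. Replicas here are in-process objects and quorums are arbitrary majorities; a real implementation would contact replicas concurrently over the network. All names are illustrative.

```python
# Sketch of the static two-phase quorum algorithm on in-memory replicas.
import random

class Replica:
    def __init__(self):
        self.value, self.tag = None, (0, "")   # tag = (sequence, writer id)

def quorum(replicas):
    # Any majority; two majorities always intersect, which yields atomicity.
    return random.sample(replicas, len(replicas) // 2 + 1)

def write(replicas, writer_id, value):
    # Phase 1: read tags from a read-quorum and find the largest.
    max_tag = max(r.tag for r in quorum(replicas))
    new_tag = (max_tag[0] + 1, writer_id)      # pick a strictly larger tag
    # Phase 2: write the new (value, tag) to a write-quorum.
    for r in quorum(replicas):
        if new_tag > r.tag:
            r.value, r.tag = value, new_tag

def read(replicas):
    # Phase 1: find the latest (value, tag) at a read-quorum.
    value, tag = max(((r.value, r.tag) for r in quorum(replicas)),
                     key=lambda vt: vt[1])
    # Phase 2: propagate it to a write-quorum before returning,
    # so that no later read can observe an older value.
    for r in quorum(replicas):
        if tag > r.tag:
            r.value, r.tag = value, tag
    return value
```

Because any read-quorum intersects the write-quorum of the most recent write, phase 1 of a read always sees the latest tag, and phase 2 ensures the returned value is confirmed at a quorum.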

30 How to Make This Dynamic?
Quorum members may join, leave, fail; we need to reconfigure.
Idea: Any member of the current quorum configuration can propose a new configuration.
Questions:
– How to agree on the new configuration?
– How to install it?
– How to preserve atomicity of the data during reconfiguration?
– How to avoid stopping Reads/Writes in progress?

31 Our RAMBO Algorithm
Uses a separate reconfiguration service, Recon.
[Diagram: read/write processes over the Net and the Recon service: recon, new-config]

32 Recon Using Consensus
The Recon service uses (static) consensus services to determine new configurations 1, 2, 3, …
Consensus is a fairly heavyweight mechanism, but:
– It is only used for reconfigurations, which are presumably infrequent.
– It does not delay Read/Write operations (unlike GCS approaches).
[Diagram: Recon built from Consensus and the Net: recon, recon-ack]

33 Consensus Implementation
Uses a variant of the Paxos algorithm [Lamport].
Agreement and validity are guaranteed absolutely.
Termination is guaranteed when the underlying system stabilizes.
A leader, chosen using failure detectors, conducts the two-phase algorithm with retries.
[Diagram: Consensus interface: init(v), decide(v)]

34 Read/Write Algorithm Using Recon
Read/write processes run the two-phase static quorum-based algorithm, using the current configuration.
Use gossiping and fixed-point tests rather than highly structured communication.
When Recon provides a new configuration, R/W uses both.
Do not abort R/W operations in progress, but do extra work to access the additional processes needed for the new quorums.
[Diagram: read/write processes over the Net and Recon: new-config]

35 Removing Old Configurations
The Read/Write algorithm removes old configurations by garbage-collecting them in the background.
Two-phase garbage-collection procedure:
– Phase 1: Inform a write-quorum of the old configuration about the new configuration. Collect the latest value from a read-quorum of the old configuration.
– Phase 2: Inform a write-quorum of the new configuration about the latest value.
Garbage collection is concurrent with Reads/Writes.
Implemented using gossiping and fixed points.
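The two-phase garbage-collection step above can be sketched on in-memory replicas, with configurations modeled as lists of replicas and quorums as majorities. All names (Replica, majority, garbage_collect) are illustrative, not from the paper.

```python
# Sketch of two-phase configuration garbage collection.
import random

class Replica:
    def __init__(self, value=None, tag=0):
        self.value, self.tag = value, tag
        self.next_cfg = None    # newest configuration this replica knows of

def majority(cfg):
    return random.sample(cfg, len(cfg) // 2 + 1)

def garbage_collect(old_cfg, new_cfg):
    # Phase 1: inform a write-quorum of the old configuration about the
    # new one, and collect the latest (value, tag) from a read-quorum of it.
    for r in majority(old_cfg):
        r.next_cfg = new_cfg
    value, tag = max(((r.value, r.tag) for r in majority(old_cfg)),
                     key=lambda vt: vt[1])
    # Phase 2: write that latest value to a write-quorum of the new
    # configuration; afterwards the old configuration can be discarded.
    for r in majority(new_cfg):
        if tag > r.tag:
            r.value, r.tag = value, tag
```

After phase 2, a write-quorum of the new configuration holds the latest value, so later reads that use only the new configuration still see it: the old configuration is no longer needed.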

36 Discussion: RAMBO
Highly modular: R/W algorithm, Recon service, Consensus, Leader election, Network.
Modularity is needed for keeping the complexity under control.
Correctness proofs:
– Atomicity of Reads and Writes.
Latency bounds:
– For reading, writing, garbage collection.
– Under various assumptions about timing, joins, failures, and rate of reconfiguration.
LAN implementations have begun.

37 Foundations: Hybrid, Timed, Probabilistic Models

38 Hybrid I/O Automata (HIOA) [Lynch, Segala, Vaandrager 01, 02]
Mathematical model for hybrid (continuous/discrete) system components.
Discrete actions, continuous trajectories.
Supports composition and levels of abstraction.
Case studies:
– Automated transportation systems
– Quanser helicopter system [Mitra, Wang, Feron, Lynch]
[Diagram: components P, C, AS]

39 Timed I/O Automata, Probabilistic, …
Timed I/O Automata [Lynch, Segala, Vaandrager, Kirli]:
– For modeling and analyzing timing-based systems, e.g., most of the building blocks of our AFOSR project.
– Support composition, abstraction.
– Collecting ideas from many research papers.
Probabilistic I/O Automata [Lynch, Segala, Vaandrager]:
– For modeling systems with random behavior.
– Composition and abstraction aspects still need development.
– Need to be combined with timed/hybrid models.

40 Conclusions
Three main building blocks (services and algorithms) for dynamic systems:
– Scalable Group Communication
– Dynamic Atomic Broadcast
– Reconfigurable Atomic Memory
Auxiliary building blocks: group membership, Consensus with Unknown Participants, reconfiguration.
Much remains to be done to produce a “complete” set of useful building blocks for dynamic systems, and a good algorithmic theory for this area.
Connections with NEST?

