Presentation is loading. Please wait.

Presentation is loading. Please wait.

HPTS Keynote, October 2001 Invariant Boundaries Dr. Eric A. Brewer Professor, UC Berkeley Co-Founder & Chief Scientist, Inktomi.

Similar presentations


Presentation on theme: "HPTS Keynote, October 2001 Invariant Boundaries Dr. Eric A. Brewer Professor, UC Berkeley Co-Founder & Chief Scientist, Inktomi."— Presentation transcript:

1 HPTS Keynote, October 2001 Invariant Boundaries Dr. Eric A. Brewer Professor, UC Berkeley Co-Founder & Chief Scientist, Inktomi

2 HPTS Keynote, October 2001 Our Perspective u Inktomi builds two distributed systems: –Global Search Engines –Distributed Web Caches u Based on scalable cluster & parallel computing technology u But very little use of classic DS research...

3 HPTS Keynote, October 2001 “Distributed Systems” don’t work... u There exist working DS: –Simple protocols: DNS, WWW –Inktomi search, Content Delivery Networks –Napster, Verisign, AOL u But these are not classic DS: –Not distributed objects –No RPC –No modularity –Complex ones are single owner (except phones)

4 HPTS Keynote, October 2001 Concept: Invariant Boundaries u Claim: we don’t understand boundaries u Solution: make “Invariant Boundary” explictit u Goal: simpler, easier, faster to correctness Invariant Holds Invariant may not hold

5 HPTS Keynote, October 2001 Three Basic Issues u Where is the state? u Consistency vs. Availability u Communication Boundaries

6 HPTS Keynote, October 2001 Santa Clara Cluster Very uniform No monitors No people No cables Working power Working A/C Working BW

7 HPTS Keynote, October 2001 Boundaries for Kinds of State u Stateless –Front ends u Immutable state u Soft State (rebuild on restart) u Durable Single-writer (e.g. user data) –Emissaries, horizontal partitioning u Durable MW –Fiefdoms, DBMS

8 HPTS Keynote, October 2001 Persistent State is HARD u Classic DS focus on the computation, not the data –this is WRONG, computation is the easy part u Data centers exist for a reason –can’t have consistency or availability without them u Other locations are for caching only: –proxies, basestations, set-top boxes, desktops –phones, PDAs, … u Distributed systems can’t ignore location u Invariant Boundary is small

9 AP Active Proxy: Bootstraps thin devices into infrastructure, runs mobile code AP Workstations & PCs Berkeley Ninja Architecture Base: Scalable, highly- available platform for persistent-state services Internet PDAs (e.g. IBM Workpad) Cellphones, Pagers, etc.

10 AP Active Proxy: Bootstraps thin devices into infrastructure, runs mobile code AP Workstations & PCs Berkeley Ninja Architecture Base: Scalable, highly- available platform for persistent-state services Internet PDAs (e.g. IBM Workpad) Cellphones, Pagers, etc. Consistent Shared Soft-state, immutable state Single user

11 HPTS Keynote, October 2001 Three Basic Issues u Where is the state? u Consistency vs. Availability u Communication Boundaries

12 HPTS Keynote, October 2001 Data is only consistent *inside* Data is not consistent (“reference” data) Consistent A

13 HPTS Keynote, October 2001 Data is only consistent *inside* Consistent A Data is not consistent (“reference” data) Consistent B

14 HPTS Keynote, October 2001 The CAP Theorem C onsistency A vailability Tolerance to network P artitions Theorem: You can have at most two of these invariants for any shared-data system

15 HPTS Keynote, October 2001 The CAP Theorem C onsistency A vailability Tolerance to network P artitions Theorem: You can have at most two of these invariants for any shared-data system Corollary: consistency boundary must choose A or P

16 HPTS Keynote, October 2001 Forfeit Partitions C onsistency A vailability Tolerance to network P artitions Examples u Single-site databases u Cluster databases u LDAP u Fiefdoms Traits u 2-phase commit u cache validation protocols u The “inside”

17 HPTS Keynote, October 2001 Forfeit Availability C onsistency A vailability Tolerance to network P artitions Examples u Distributed databases u Distributed locking u Majority protocols Traits u Pessimistic locking u Make minority partitions unavailable

18 HPTS Keynote, October 2001 Forfeit Consistency C onsistency A vailability Tolerance to network P artitions Examples u Coda u Web cachinge u DNS u Emissaries Traits u expirations/leases u conflict resolution u Optimistic u The “outside”

19 HPTS Keynote, October 2001 ACID vs. BASE ACID u Strong consistency u Isolation u Focus on “commit” u Nested transactions u Availability? u Conservative (pessimistic) u Difficult evolution (e.g. schema) u “small” Invariant Boundary u The “inside” BASE u Weak consistency –stale data OK u Availability first u Best effort u Approximate answers OK u Aggressive (optimistic) u “Simpler” and faster u Easier evolution (XML) u “wide” Invariant Boundary u Outside consistency boundary but it’s a spectrum

20 HPTS Keynote, October 2001 Consistency Boundary Summary u Can have consistency & availability within a cluster. No partitions within boundary! u OS/Networking better at A than C u Databases better at C than A u Wide-area databases can’t have both u Disconnected clients can’t have both

21 HPTS Keynote, October 2001 Three Basic Issues u Where is the state? u Consistency vs. Availability u Communication Boundaries

22 HPTS Keynote, October 2001 The Boundary The Boundary u The interface between two modules –client/server, peers, libaries, etc… u Basic boundary = the procedure call –thread traverses the boundary –two sides are in the same address space u What invariants don’t hold across? CS

23 HPTS Keynote, October 2001 Different Address Spaces u What if the two sides are NOT in the same address space? –IPC or LRPC u Can’t do pass-by-reference (pointers) –Most IPC screws this up: pass by value-result –There are TWO copies of args not one u What if they share some memory? –Can pass pointers, but… –Need synchronization between client/server –Not all pointers can be passed

24 HPTS Keynote, October 2001 Partial Failure u Can the two sides fail independently? –RPC, IPC, LRPC u Can’t be transparent (like RPC) !! u New exceptions (other side gone) u Idempotent calls? –Use Transaction Ids (to solve replay problem) u Reclaim local resources –e.g. kernels leak sockets over time => reboot u RPC tries to hide these issues (but fails) u Use Level 4/7 switches to hide failures?

25 HPTS Keynote, October 2001 Resource Allocation u How to reclaim resources allocated for client? –Usually timeout exception cleans up –Release locks? (must track them!) –How to avoid long delays while holding resources? u How long to remember client? –Delayed responses (past timeout) must be ignored u Problem with leases: –Great for servers, but… –Client’s lease may expire mid operation –Hard to make client updates atomic with multiple leases (2PC?) –Which things have leases? (can be hidden)

26 HPTS Keynote, October 2001 Trust the other side? u What if we don’t trust the other side? –Or partial trust (legal contract), or malicious? –Have to check args, no pointer passing u Limited release of information (leaks) u Kernels get this right: –copy/check args –use opaque references (e.g. File Descriptors) u Most systems do not –TCP, Napster, web browsers u Security boundaries tend to be explicit –Holes come from services!

27 HPTS Keynote, October 2001 Multiplexing clients? u Does the server have to: –Deal with high concurrency? –Say “no” sometimes (graceful degradation) –Treat clients equally (fairness) –Bill for resources (and have audit trail) –Isolate clients performance, data, …. u These all affect the boundary definition

28 HPTS Keynote, October 2001 Boundary evolution? u Can the two sides be updated independently? (NO) u The DLL problem... u Boundaries need versions u Negotiation protocol for upgrade? u Promises of backward compatibility? u Affects naming too (version number)

29 HPTS Keynote, October 2001 Example: protocols vs. APIs u Protocols have been more successful than APIs u Some reasons: –protocols are pass by value –protocols designed for partial failure –not trying to look like local procedure calls –explicit state machine, rather than call/return (this exposes exceptions well) u Protocols still not good at trust, billing, evolution

30 HPTS Keynote, October 2001 Example: XML u XML doesn’t solve any of these issues u It is RPC with an extensible type system u It makes evolution better? –two sides need to agree on schema –can ignore stuff you don’t understand –Must agree on meaning, not just tags u Can mislead us to ignore/postpone the real issues

31 HPTS Keynote, October 2001 Example: services u Claim: you can’t magically convert a class to a service u Behavior depends on the boundaries that callers cross…. –Trusted? Multiplexed? Partial failure? Namespaces? u Shouldn’t TRY to be transparent u Instead: make it easier to state boundary assumptions (and check them)

32 HPTS Keynote, October 2001 Annotated Interfaces u IDL can annotate interfaces: –“Timeout” => client may not respond –“Trusted” => no malicious clients –“Malicious’ => client may be hostile –“Multiplexed” => many simultaneous callers –“Idempotent” –Etc. u Could be checked at statically in some cases, dynamically in others u Annotations + Boundaries => fewer bugs

33 HPTS Keynote, October 2001 Lessons for Applications u Make boundaries very explicit –Not just client/server –Independent systems –Third-party software –Third-party services (RPC to vendors/partners) u Have a few big modules… –Otherwise too many boundaries, and no invariants –Examples: Apache, Oracle, Inktomi search engine –Big Modules are well supported –Big Modules justify their cost (Lampson)

34 HPTS Keynote, October 2001 Partial checklist u What is shared? (namespace, schema?) u What kind of state in each boundary? u How would you evolve an API? u Lifetime of references? Expiration impact? u Graceful degradation as modules go down? u External persistent names? u Consistency semantics and boundary?

35 HPTS Keynote, October 2001 Conclusions u Most systems are fragile u Root causes: –False transparency: assuming locality, trust, privacy… –Implicitly changing boundaries: consistency, partial failure, … u Some of the causes: –focus on computation, not data –ignoring location distinctions –overestimating consistency boundary –degraded boundaries (RPC is not a PC) u Invariant Boundaries –Help understanding, documentation –Simplify detection –Simpler and easier than full specifications (but weaker)

36 HPTS Keynote, October 2001 ACID vs. BASE u DBMS research is about ACID (mostly) u But we forfeit “C” and “I” for availability, graceful degradation, and performance This tradeoff is fundamental. BASE: –B asically A vailable –S oft-state –E ventual consistency


Download ppt "HPTS Keynote, October 2001 Invariant Boundaries Dr. Eric A. Brewer Professor, UC Berkeley Co-Founder & Chief Scientist, Inktomi."

Similar presentations


Ads by Google