BASE. Dan Pritchett, eBay. ACM Queue, May/June 2008.


1 BASE. Dan Pritchett, eBay. ACM Queue, May/June 2008

2 Scaling Web applications have grown If an application relies on persistent data, storage becomes the bottleneck 2 strategies for scaling – Vertical scaling – a larger machine Works for data storage, but load can outgrow the capacity of the largest system – Horizontal scaling – more machines More flexible, but more complex

3 Horizontal Partitioning Horizontal partitioning along 2 vectors 1. Functional scaling - grouping data by function and spreading the groups across DBs 2. Sharding - splitting data within functional areas across multiple databases Adds a second dimension to horizontal scaling Functional areas are scaled independently Fig 1
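
The two partitioning vectors above can be sketched as a two-step lookup: first pick the functional group, then pick a shard within it. This is a minimal sketch; the group names, host names, and the modulo-based shard choice are assumptions for illustration and are not taken from the presentation.

```python
# Hypothetical layout: each functional group owns its own list of shards.
FUNCTIONAL_GROUPS = {
    "users": ["users-db-0", "users-db-1", "users-db-2"],
    "transactions": ["txn-db-0", "txn-db-1"],
}


def route(group, key):
    """Return the database host for `key` within a functional group.

    Step 1 (functional scaling): the group selects a set of servers.
    Step 2 (sharding): the key selects one server inside that set.
    """
    shards = FUNCTIONAL_GROUPS[group]
    return shards[key % len(shards)]


if __name__ == "__main__":
    print(route("users", 7))
    print(route("transactions", 7))
```

Because each group has its own shard list, one functional area can add servers without touching the others, which is the independent scaling the slide refers to.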

4

5 Functional partitioning – Needed for high degrees of scalability Decompose schema into tables grouped by functionality (normalization?) PK, FK for consistency across functional areas But constraints require the tables to reside on a single DB server (why?) – Interferes with growth Move functional groups onto separate DB servers – Schemas that can scale place functionally distinct data on different DB servers – Moves constraints out of the DB and into the application

6 CAP CAP Theorem – Consistency – client perceives that a set of operations has occurred all at once – Availability – every operation must terminate in an intended response – Partition tolerance – operations will complete even if individual components are unavailable A system can support only 2 of the 3

7 ACID DB vendors recognized the need for partitioning DBs Introduced 2PC for ACID guarantees Shortcoming of 2PC? – Gives consistency across partitions – At the cost of availability – Availability of an operation is the product of the availabilities of the components it requires If a transaction uses 2 DBs in a 2PC commit and each is available 99.9% of the time – Availability of the transaction is 99.9% * 99.9% = 99.8%, or an additional 43 minutes of downtime per month
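
The availability arithmetic on this slide can be checked with a few lines. This is a sketch that assumes a 30-day month; under that assumption the 43 minutes is the extra downtime of the two-database transaction compared with a single 99.9% database.

```python
from math import prod

MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month


def combined_availability(component_availabilities):
    """Availability of an operation that needs every listed component to be up."""
    return prod(component_availabilities)


def downtime_minutes(availability, period_minutes=MINUTES_PER_MONTH):
    """Expected downtime over the period for a given availability."""
    return (1.0 - availability) * period_minutes


single_db = 0.999
two_pc = combined_availability([single_db, single_db])
extra_downtime = downtime_minutes(two_pc) - downtime_minutes(single_db)

if __name__ == "__main__":
    print(f"2PC across two databases: {two_pc:.1%} available")
    print(f"additional downtime: {extra_downtime:.0f} minutes per month")
```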

8 ACID Alternative BASE – Basically Available, Soft state, Eventually consistent ACID – pessimistic – forces consistency at the end of every operation BASE – optimistic – accepts that DB consistency will be in a state of flux – Leads to levels of scalability that cannot be attained with ACID

9 BASE Availability of BASE is achieved by supporting partial failures without total system failure If users are partitioned across 5 DB servers – A user DB failure impacts only the 20% of users on that host – Leads to higher perceived availability Suppose data is decomposed into functional groups, and the busiest groups are then partitioned across multiple DBs – How to incorporate BASE into the application?

10 Consistency Patterns Need a more in-depth analysis of the operations within logical transactions than is typical for ACID Consistency patterns – Identify opportunities to relax consistency – Difficult – stakeholders and developers both claim consistency is needed for the success of the application – Temporal inconsistency cannot be hidden from the end users

11 Example Sample schema – User table contains total amount bought and sold, running totals – Transaction table relates seller and buyer and amount of transaction – Fig 2

12

13 Example Consistency across functional groups is easier to relax than within functional groups Each time an item is sold, a row is added to the transaction table and the counters for buyer and seller are updated SQL in Fig 3
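
Fig 3 is not reproduced in this transcript. As a rough sketch of the fully consistent version described above, the insert and both counter updates can share one transaction. The table and column names (`users`, `transactions`, `amt_sold`, `amt_bought`, `xid`) and the use of an in-memory SQLite database are assumptions for illustration.

```python
import sqlite3


def make_db():
    """Create a single database holding both tables (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            amt_sold INTEGER NOT NULL DEFAULT 0,
            amt_bought INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE transactions (
            xid INTEGER PRIMARY KEY,
            seller_id INTEGER,
            buyer_id INTEGER,
            amount INTEGER
        );
        INSERT INTO users (id) VALUES (1), (2);
    """)
    return conn


def record_sale_acid(conn, xid, seller_id, buyer_id, amount):
    """Insert the sale and update both running totals atomically."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT INTO transactions VALUES (?, ?, ?, ?)",
            (xid, seller_id, buyer_id, amount),
        )
        conn.execute(
            "UPDATE users SET amt_sold = amt_sold + ? WHERE id = ?",
            (amount, seller_id),
        )
        conn.execute(
            "UPDATE users SET amt_bought = amt_bought + ? WHERE id = ?",
            (amount, buyer_id),
        )
```

This only works as a single local transaction while both tables live on the same server, which is the constraint the later slides relax.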

14

15 Relax consistency Now treat the bought and sold columns as a cache of the transaction table Relaxed – Running balances do not reflect the result of a transaction immediately – Users already expect this, e.g. ATM withdrawals and cell phone calls – If running balances are estimates (can miss some transactions), then the change in the SQL is easy – Fig. 4
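
Fig. 4 is not reproduced here. A minimal sketch of the relaxed version, assuming the two tables now live in separate databases, splits the work into two independent transactions. The schema, the two in-memory SQLite connections, and the `fail_user_update` flag (which simulates a crash between the two steps) are all assumptions for illustration.

```python
import sqlite3


def make_dbs():
    """Create separate transaction and user databases (assumed schema)."""
    txn_db = sqlite3.connect(":memory:")
    txn_db.executescript("""
        CREATE TABLE transactions (
            xid INTEGER PRIMARY KEY, seller_id INTEGER,
            buyer_id INTEGER, amount INTEGER
        );
    """)
    user_db = sqlite3.connect(":memory:")
    user_db.executescript("""
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            amt_sold INTEGER NOT NULL DEFAULT 0,
            amt_bought INTEGER NOT NULL DEFAULT 0
        );
        INSERT INTO users (id) VALUES (1), (2);
    """)
    return txn_db, user_db


def record_sale_relaxed(txn_db, user_db, xid, seller_id, buyer_id, amount,
                        fail_user_update=False):
    """Record the sale, then update the running totals in a second transaction."""
    with txn_db:  # transaction 1: the sale itself
        txn_db.execute(
            "INSERT INTO transactions VALUES (?, ?, ?, ?)",
            (xid, seller_id, buyer_id, amount),
        )
    if fail_user_update:
        # Simulated crash: the sale is committed, the totals are never updated.
        raise RuntimeError("simulated failure between the two transactions")
    with user_db:  # transaction 2: the cached running totals
        user_db.execute(
            "UPDATE users SET amt_sold = amt_sold + ? WHERE id = ?",
            (amount, seller_id),
        )
        user_db.execute(
            "UPDATE users SET amt_bought = amt_bought + ? WHERE id = ?",
            (amount, buyer_id),
        )
```

If the second step never runs, the totals stay behind the transaction table for good, which is acceptable only when the balances are treated as estimates.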

16

17 Inconsistency Updates to the user and transaction tables are decoupled Can have permanent inconsistency This may violate the contract with users What if the balances cannot be estimates?

18 Persistent Message Queue Introduce a persistent message queue Different ways to implement, but the queue's backing persistence must be on the same resource as the DB Allows the queue to be transactionally committed without involving 2PC Fig 5

19

20 Persistent Message Queue By queuing a persistent message in the same transaction as the insert, the info needed to update the running balances is captured Transaction is contained on a single DB instance and will not impact system availability – Same as logs?? A separate message processing component will dequeue each message and apply the info to the user table Still a problem – Message persistence is on the transaction host to avoid 2PC during queuing – If a message is dequeued inside a transaction involving the user host, still have a 2PC situation; if the dequeue is not transactional and the update fails, the message is lost
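
The queuing step described on this slide might look like the following sketch, where the queue is simply another table in the same database as the transaction table. The schema, the `queue` table layout, and the use of SQLite are assumptions for illustration; Fig 5 is not reproduced in this transcript.

```python
import sqlite3


def make_txn_db():
    """Transaction host: the sale table and the queue share one database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE transactions (
            xid INTEGER PRIMARY KEY, seller_id INTEGER,
            buyer_id INTEGER, amount INTEGER
        );
        CREATE TABLE queue (
            msg_id INTEGER PRIMARY KEY AUTOINCREMENT,
            trans_id INTEGER, balance TEXT, user_id INTEGER, amount INTEGER
        );
    """)
    return db


def record_sale_with_queue(db, xid, seller_id, buyer_id, amount):
    """Insert the sale and enqueue both balance updates in one local transaction."""
    with db:  # single-database transaction, so no 2PC is involved
        db.execute(
            "INSERT INTO transactions VALUES (?, ?, ?, ?)",
            (xid, seller_id, buyer_id, amount),
        )
        db.executemany(
            "INSERT INTO queue (trans_id, balance, user_id, amount) "
            "VALUES (?, ?, ?, ?)",
            [
                (xid, "seller", seller_id, amount),
                (xid, "buyer", buyer_id, amount),
            ],
        )
```

Either the sale and both messages are committed, or none of them are, so no balance update can be silently dropped at queuing time.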

21 Persistent Message Queue Solution to the problem? – Do nothing Decoupling the update into a separate back-end component preserves the availability of the customer-facing component Lower availability of the message processor may be acceptable for the business requirements

22 No 2PC Solution to the problem? – What if 2PC is never acceptable? Update operations are rarely idempotent (an idempotent operation can be applied one or more times and still give the same result) – Incrementing a value – not idempotent – Setting a value – still not idempotent with regard to ordering: if updates cannot be guaranteed to be applied in the order received, the final state will be wrong Need a way to track which updates have been applied successfully and which are outstanding Use a table Figs. 6&7

23

24 No 2PC Example depends on being able to peek (instead of dequeue) a message in the queue and remove it once successfully processed Can be done with 2 independent transactions – First does update – Second removes from queue Queue operations not committed until DB operations successfully commit Supports partial failures and provides transactional guarantees without 2PC
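
A sketch of the peek, apply, then remove loop follows, using a table of applied updates to make redelivery harmless. Figs. 6 and 7 are not reproduced in this transcript, so the table names (`queue`, `updates_applied`), the message layout, and the two SQLite connections standing in for the transaction host and the user host are assumptions.

```python
import sqlite3


def make_dbs():
    """Return (txn_db, user_db): the queue host and the user host."""
    txn_db = sqlite3.connect(":memory:")
    txn_db.executescript("""
        CREATE TABLE queue (
            msg_id INTEGER PRIMARY KEY AUTOINCREMENT,
            trans_id INTEGER, balance TEXT, user_id INTEGER, amount INTEGER
        );
    """)
    user_db = sqlite3.connect(":memory:")
    user_db.executescript("""
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            amt_sold INTEGER NOT NULL DEFAULT 0,
            amt_bought INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE updates_applied (
            trans_id INTEGER, balance TEXT, user_id INTEGER,
            PRIMARY KEY (trans_id, balance, user_id)
        );
        INSERT INTO users (id) VALUES (1), (2);
    """)
    return txn_db, user_db


def apply_message(user_db, msg):
    """Apply one message on the user host, at most once per (trans, balance, user)."""
    _msg_id, trans_id, balance, user_id, amount = msg
    column = {"seller": "amt_sold", "buyer": "amt_bought"}[balance]
    with user_db:  # transaction 1: the update plus its bookkeeping row
        already = user_db.execute(
            "SELECT COUNT(*) FROM updates_applied "
            "WHERE trans_id = ? AND balance = ? AND user_id = ?",
            (trans_id, balance, user_id),
        ).fetchone()[0]
        if already == 0:
            user_db.execute(
                f"UPDATE users SET {column} = {column} + ? WHERE id = ?",
                (amount, user_id),
            )
            user_db.execute(
                "INSERT INTO updates_applied VALUES (?, ?, ?)",
                (trans_id, balance, user_id),
            )


def drain_queue(txn_db, user_db):
    """Peek at each message, apply it, and only then remove it from the queue."""
    while True:
        msg = txn_db.execute(
            "SELECT msg_id, trans_id, balance, user_id, amount "
            "FROM queue ORDER BY msg_id LIMIT 1"
        ).fetchone()  # peek: the message stays queued
        if msg is None:
            return
        apply_message(user_db, msg)
        with txn_db:  # transaction 2: remove only after the update committed
            txn_db.execute("DELETE FROM queue WHERE msg_id = ?", (msg[0],))
```

If the processor crashes after the update commits but before the delete, the message is seen again on restart, and the `updates_applied` check turns the second application into a no-op.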

25 Ordering of Updates Simple way to ensure ordered idempotent updates – Example: track the last date of sale and purchase for a user – Can update the date with a message – But what if 2 purchases occur in a short time window and the messaging system can't guarantee order? By not allowing the last-purchase time to decrease, update operations become order independent Instead of a time, can use a monotonically increasing transaction ID Fig. 8&9
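
The "never let the value decrease" rule can be sketched in a few lines. Figs. 8 and 9 are not reproduced here; the in-memory dictionary standing in for the user table and the function name `apply_last_purchase` are assumptions for illustration.

```python
from itertools import permutations


def apply_last_purchase(state, user_id, trans_id):
    """Record the latest purchase, ignoring messages older than what is stored."""
    state[user_id] = max(state.get(user_id, 0), trans_id)


def final_state(messages):
    """Apply (user_id, trans_id) messages in the given order."""
    state = {}
    for user_id, trans_id in messages:
        apply_last_purchase(state, user_id, trans_id)
    return state


if __name__ == "__main__":
    messages = [(7, 101), (7, 102), (7, 103)]
    # Every delivery order, including duplicates, ends in the same state.
    results = {tuple(final_state(order).items()) for order in permutations(messages)}
    print(results)
```

Taking the maximum makes the update both idempotent and independent of delivery order, so the queue does not need to guarantee ordering for this field.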

26

27 Ordered message delivery – Can be expensive and unnecessary – Can give a false sense of security Showed that message ordering can be relaxed while still providing a consistent view of the DB, eventually – Overhead is nominal, less than enforcing ordering Web application is semantically event-driven – Client requests arrive in arbitrary order – Processing time per request varies – Request scheduling across the components of the system is nondeterministic – Result is nondeterministic queuing of messages – Reality is: nondeterministic inputs -> nondeterministic outputs

28 Application design Impact of soft state and eventual consistency on application design – Software engineers view systems as closed loops – Think about predictability of behavior – predictable input -> predictable output – A necessity for creating correct software systems – Using BASE doesn't change the predictability of a system as a closed loop, but requires looking at the behavior in total

29 Example: – Users can transfer an asset to other users – Decouple 2 operations: taking the asset from one user, and giving it to the other, with a message queue to decouple them – Is this system really nondeterministic? Lag between the time the asset leaves one user and the time it arrives at the other Size of the time window determined by the messaging system – User does not know or care about the lag – Neither receiver nor sender knows when the asset has arrived – A lag of a few seconds between sending and receiving is invisible or tolerable

30 What if you need to know when state is consistent? – Rely on events generated as state becomes consistent – Can notify the user when the asset has arrived – EDA (event-driven architecture) can provide dramatic improvements in scalability and architectural decoupling
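
As a rough sketch of the asset-transfer example with a notification once state is consistent, the class below uses an in-memory deque in place of the persistent message queue and plain callbacks in place of an event system. The class name `TransferSystem` and its methods are hypothetical and not taken from the presentation.

```python
from collections import deque


class TransferSystem:
    """Toy model: taking and giving an asset are decoupled by a queue."""

    def __init__(self, holdings):
        self.holdings = {user: set(assets) for user, assets in holdings.items()}
        self.queue = deque()   # stands in for the persistent message queue
        self.listeners = []    # callbacks fired when a transfer completes

    def on_arrival(self, callback):
        """Register callback(sender, receiver, asset) for completed transfers."""
        self.listeners.append(callback)

    def send(self, sender, receiver, asset):
        """Step 1: take the asset from the sender and queue the hand-over."""
        self.holdings[sender].remove(asset)  # KeyError if the sender lacks it
        self.queue.append((sender, receiver, asset))

    def deliver_pending(self):
        """Step 2 (later): give queued assets to receivers and emit events."""
        while self.queue:
            sender, receiver, asset = self.queue.popleft()
            self.holdings.setdefault(receiver, set()).add(asset)
            for callback in self.listeners:
                callback(sender, receiver, asset)


if __name__ == "__main__":
    system = TransferSystem({"alice": {"sword"}, "bob": set()})
    system.on_arrival(lambda s, r, a: print(f"{a} arrived at {r} from {s}"))
    system.send("alice", "bob", "sword")  # asset is now in flight
    system.deliver_pending()              # state becomes consistent, event fires
```

Between `send` and `deliver_pending` neither user holds the asset; the arrival event is what tells interested parties that the lag window has closed.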

31 Conclusion Scaling systems requires a new way of thinking Transactional models problematic when loads spread across a large number of components Decoupling operations provides improved availability and scale at cost of consistency BASE is a model for thinking about decoupling

