Aaron J. Elmore, Sudipto Das, Divyakant Agrawal, Amr El Abbadi Distributed Systems Lab University of California Santa Barbara.

1 Aaron J. Elmore, Sudipto Das, Divyakant Agrawal, Amr El Abbadi Distributed Systems Lab University of California Santa Barbara

2
• Serve thousands of applications (tenants)
  ◦ AppEngine, Azure,
• Tenants are (typically)
  ◦ Small
  ◦ SLA sensitive
  ◦ Erratic load patterns
  ◦ Subject to flash crowds
    ▪ i.e. the fark, digg, slashdot, reddit effect (for now)
• Support for multitenancy is critical
• Our focus: DBMSs serving these platforms
Sudipto Das {}

3 What the tenant wants… What the service provider wants…

4 [Figure: two panels, "Traditional Infrastructures" and "Deployment in the Cloud", each plotting Demand and Capacity as Resources (y-axis) over Time (x-axis). Annotations: "Unused resources"; "Static provisioning for peak is inelastic". Slide credits: Berkeley RAD Lab]

5 [Figure: architecture diagram with a Load Balancer, an Application/Web/Caching tier, and a Database tier]

6
• Migrate a tenant’s database in a live system
  ◦ A critical operation to support elasticity
• Different from
  ◦ Migration between software versions
  ◦ Migration in case of schema evolution

7
• VM migration [Clark et al., NSDI 2005]
• One tenant-per-VM
  ◦ Pros: allows fine-grained load balancing
  ◦ Cons
    ▪ Performance overhead
    ▪ Poor consolidation ratio [Curino et al., CIDR 2011]
• Multiple tenants in a VM
  ◦ Pros: good performance
  ◦ Cons: Migrate all tenants
    ▪ Coarse-grained load balancing

8
• Multiple tenants share the same database process
  ◦ Shared process multitenancy
  ◦ Example systems: SQL Azure, ElasTraS, RelationalCloud, and many more
• Migrate individual tenants
• VM migration cannot be used for fine-grained migration
• Target architecture: Shared Nothing
  ◦ Shared storage architectures: see our VLDB 2011 paper


10
• How to ensure no downtime?
• Need to migrate the persistent database image (tens of MBs to GBs)
• How to guarantee correctness during failures?
• Nodes can fail during migration
• How to ensure transaction atomicity and durability?
• How to recover migration state after failure?
• Nodes recover after a failure
• How to guarantee serializability?
• Transaction correctness equivalent to normal operation
• How to minimize migration cost? …

11
• Downtime
  ◦ Time tenant is unavailable
• Service Interruption
  ◦ Number of operations failing/transactions aborting
• Migration Overhead/Performance impact
  ◦ During normal operation, migration, and after migration
• Additional Data Transferred
  ◦ Data transferred in addition to DB’s persistent image
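The "additional data transferred" metric above reduces to a simple ratio. The sketch below is illustrative only: the MigrationRun record, its field names, and the sample numbers are assumptions made up for this example, not measurements or code from the talk.

```python
from dataclasses import dataclass


@dataclass
class MigrationRun:
    """Hypothetical record of one migration experiment (all field names are assumed)."""
    unavailable_seconds: float   # downtime: time the tenant served no requests
    failed_operations: int       # service interruption: failed ops / aborted txns
    image_mb: float              # size of the persistent database image
    transferred_mb: float        # total data actually sent to the destination

    def additional_data_percent(self) -> float:
        """Data sent beyond the persistent image, as a percentage of the image."""
        extra_mb = self.transferred_mb - self.image_mb
        return 100 * extra_mb / self.image_mb


# Made-up sample values, chosen only to exercise the formula.
run = MigrationRun(unavailable_seconds=0.0, failed_operations=40,
                   image_mb=250.0, transferred_mb=256.25)
overhead = run.additional_data_percent()
```

Keeping the image size and the total transfer as separate fields makes the overhead comparable across techniques that move different amounts of data.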

12
• Migration executed in phases
• Starts with transfer of minimal information to destination (“wireframe”)
• Source and destination concurrently execute transactions in one migration phase
• Database pages used as granule of migration
• Pages “pulled” by destination on-demand
• Minimal transaction synchronization
• A page is uniquely owned by either source or destination
• Leverage page level locking
• Logging and handshaking protocols to tolerate failures
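The on-demand pull and the unique-ownership rule can be illustrated with a toy model. This is a minimal sketch under stated assumptions: the Node class, the access_at_destination function, and the in-memory dictionaries are hypothetical stand-ins, and it omits locking, logging, and the network transfer that a real system would need.

```python
class Node:
    """Toy database node that tracks the pages it currently owns."""

    def __init__(self, name, pages=None):
        self.name = name
        self.pages = dict(pages or {})  # page_id -> page contents

    def owns(self, page_id):
        return page_id in self.pages


def access_at_destination(page_id, source, destination):
    """Return a page at the destination, pulling it from the source on first access."""
    if not destination.owns(page_id):
        # Ownership moves with the page, so exactly one node owns it at any time.
        destination.pages[page_id] = source.pages.pop(page_id)
    return destination.pages[page_id]


source = Node("source", {"P1": "a", "P2": "b", "P3": "c"})
destination = Node("destination")

value = access_at_destination("P3", source, destination)   # first access pulls P3
again = access_at_destination("P3", source, destination)   # now served locally

# For each page, list the nodes that own it; every list should have length one.
owners = {p: [n.name for n in (source, destination) if n.owns(p)]
          for p in ("P1", "P2", "P3")}
```

Because the pull removes the page from the source in the same step that installs it at the destination, the model never has a page with zero or two owners.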

13
• For this talk
  ◦ Small tenants
    ▪ i.e. not sharded across nodes
  ◦ No replication
  ◦ No structural changes to indices
• Extensions in the paper
  ◦ Relax these assumptions

14 [Figure: Source and Destination nodes. The source holds owned pages P1, P2, P3, …, Pn and active transactions T_S1, …, T_Sk. Legend: page owned by node; page not owned by node]

15 [Figure: "Freeze index wireframe and migrate". Source: owned pages P1, P2, P3, …, Pn and active transactions T_S1, …, T_Sk. Destination: un-owned pages P1, P2, P3, …, Pn. Legend: page owned by node; page not owned by node]

16 [Figure: Source, Destination]

17 [Figure: Source runs old, still active transactions T_Sk+1, …, T_Sl; Destination runs new transactions T_D1, …, T_Dm. P3 is accessed by T_Di and pulled from the source. Requests for un-owned pages can block. Index wireframes remain frozen. Legend: page owned by node; page not owned by node]

18 [Figure: Source transactions completed; P1, P2, … pushed from source. Destination runs transactions T_Dm+1, …, T_Dn. Pages can be pulled by the destination, if needed. Legend: page owned by node; page not owned by node]

19 [Figure: Destination holds pages P1, P2, P3, …, Pn and runs transactions T_Dn+1, …, T_Dp. Index wireframe un-frozen. Legend: page owned by node; page not owned by node]

20
• Once migrated, pages are never pulled back by source
  ◦ Transactions at source accessing migrated pages are aborted
• No structural changes to indices during migration
  ◦ Transactions (at both nodes) that make structural changes to indices abort
• Destination “pulls” pages on-demand
  ◦ Transactions at the destination experience higher latency compared to normal operation
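The three rules above amount to a small decision table for a page access during migration. The sketch below encodes that table; the Outcome enum, the decide function, and its parameters are hypothetical names introduced for illustration, not part of the prototype.

```python
from enum import Enum


class Outcome(Enum):
    PROCEED = "proceed"
    PULL_THEN_PROCEED = "pull page from source, then proceed"
    ABORT = "abort"


def decide(node, page_owner, changes_index_structure):
    """Decide what happens to a transaction's page access during migration.

    node: where the transaction runs, "source" or "destination".
    page_owner: the node that currently owns the page.
    changes_index_structure: True if the access would restructure an index.
    """
    if changes_index_structure:
        return Outcome.ABORT               # no structural changes at either node
    if node == page_owner:
        return Outcome.PROCEED             # local page, normal execution
    if node == "destination":
        return Outcome.PULL_THEN_PROCEED   # on-demand pull, at a latency cost
    return Outcome.ABORT                   # source never pulls a page back


at_source_after_move = decide("source", "destination", False)
at_destination_first_touch = decide("destination", "source", False)
index_split = decide("destination", "destination", True)
```

The asymmetry between the last two branches is what keeps page movement one-directional: only the destination may fetch a page it does not own.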

21
• Only concern is “dual mode”
  ◦ Init and Finish: only one node is executing transactions
• Local predicate locking of internal index and exclusive page level locking between nodes → no phantoms
• Strict 2PL → transactions are locally serializable
• Pages transferred only once
  ◦ No T_dest → T_source conflict dependency
• Guaranteed serializability
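The argument that one-way page transfer rules out T_dest → T_source dependencies can be checked on a small conflict graph. This is a sketch, assuming a simplified model in which every access to a page conflicts with the next access to it; the transaction names and page histories are invented for illustration.

```python
def has_cycle(edges):
    """Detect a cycle in a directed graph given as (from, to) pairs, using DFS."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}

    def visit(n):
        color[n] = GREY
        for m in graph[n]:
            if color[m] == GREY or (color[m] == WHITE and visit(m)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


# Per-page access order. Each page moves once, so source transactions (TS*)
# always come before destination transactions (TD*) on that page.
histories = {"P1": ["TS1", "TD1"], "P2": ["TS2", "TD1"], "P3": ["TS1", "TD2"]}
edges = [(h[i], h[i + 1]) for h in histories.values() for i in range(len(h) - 1)]

one_way_is_cyclic = has_cycle(edges)

# Hypothetical violation: P1 pulled back, so TS1 also runs after TD1.
pulled_back_is_cyclic = has_cycle(edges + [("TD1", "TS1")])
```

With one-way transfer every cross-node edge points from a source transaction to a destination transaction, so the graph stays acyclic; allowing a pull-back introduces the reverse edge and with it a cycle.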

22
• Transaction recovery
  ◦ For every database page, transactions at source ordered before transactions at destination
  ◦ After failure, conflicting transactions replayed in the same order
• Migration recovery
  ◦ Atomic transitions between migration modes
    ▪ Logging and handshake protocols
  ◦ Every page has exactly one owner
    ▪ Bookkeeping at the index level
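One way to picture logged mode transitions is a pair of append-only logs, one per node, from which the migration mode is recovered after a crash. The sketch below is a toy, not the protocol from the paper: the mode list (including a "normal" starting mode), the function names, and the roll-forward recovery rule are all assumptions, and Python lists stand in for durable logs.

```python
ORDER = ["normal", "init", "dual", "finish"]  # assumed mode sequence


def current_mode(log):
    """The mode a node is in is the last mode recorded in its log."""
    return log[-1] if log else ORDER[0]


def transition(source_log, dest_log):
    """Advance both nodes to the next mode, recording it in each log."""
    nxt = ORDER[ORDER.index(current_mode(source_log)) + 1]
    source_log.append(nxt)
    dest_log.append(nxt)
    return nxt


def recover(source_log, dest_log):
    """After a failure, roll the lagging node forward (an assumed policy)."""
    mode = max(current_mode(source_log), current_mode(dest_log), key=ORDER.index)
    for log in (source_log, dest_log):
        if current_mode(log) != mode:
            log.append(mode)
    return mode


source_log, dest_log = [], []
transition(source_log, dest_log)          # normal -> init
source_log.append("dual")                 # simulated crash: only the source logged "dual"
recovered = recover(source_log, dest_log)
```

After recovery both logs end in the same mode, which mirrors the requirement that source and destination agree on the migration mode.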

23
• In the presence of arbitrary repeated failures, Zephyr ensures:
  ◦ Updates made to database pages are consistent
  ◦ A failure does not leave a page without an owner
  ◦ Both source and destination are in the same migration mode
• Guaranteed termination and starvation freedom

24
• Replicated Tenants
• Sharded Tenants
• Allow structural changes to the indices
  ◦ Using shared lock managers in the dual mode

25
• Prototyped using an open source OLTP database H2
  ◦ Supports standard SQL/JDBC API
  ◦ Serializable isolation level
  ◦ Tree Indices
  ◦ Relational data model
• Modified the database engine
  ◦ Added support for freezing indices
  ◦ Page migration status maintained using index
  ◦ Details in the paper…
• Tungsten SQL Router migrates JDBC connections during migration

26
• Two database nodes, each with a DB instance running
• Synthetic benchmark as load generator
  ◦ Modified YCSB to add transactions
    ▪ Small read/write transactions
• Compared against Stop and Copy (S&C)

27 [Figure: experimental setup; diagram labels: System Controller, Metadata, Migrate]
• Default transaction parameters: 10 operations per transaction; 80% Read, 15% Update, 5% Inserts
• Hardware: 2.4 GHz Intel Core 2 Quads, 8 GB RAM, 7200 RPM SATA HDs with 32 MB cache, Gigabit Ethernet
• Workload: 60 sessions, 100 transactions per session
• Default DB size: 100k rows (~250 MB)
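The default parameters imply a fixed workload size, which the arithmetic below spells out. The generator that follows is a sketch only: how the modified YCSB actually samples operations is not stated here, so drawing each operation independently from the mix is an assumption, as are all the names.

```python
import random

SESSIONS = 60
TXNS_PER_SESSION = 100
OPS_PER_TXN = 10
MIX = {"read": 80, "update": 15, "insert": 5}  # percentages from the slide

total_txns = SESSIONS * TXNS_PER_SESSION   # 60 * 100 transactions
total_ops = total_txns * OPS_PER_TXN       # 10 operations in each transaction
# Expected operation counts if the mix holds exactly over the whole run.
expected_ops = {op: total_ops * pct // 100 for op, pct in MIX.items()}


def make_transaction(rng):
    """Draw one transaction: OPS_PER_TXN operation types sampled from MIX."""
    return rng.choices(list(MIX), weights=list(MIX.values()), k=OPS_PER_TXN)


rng = random.Random(0)  # fixed seed so repeated runs generate the same stream
txn = make_transaction(rng)
```

The total of 6,000 transactions matches the "6,000 transaction workload" used as the latency baseline in the results.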

28
• Downtime (tenant unavailability)
  ◦ S&C: 3 – 8 seconds (needed to migrate, unavailable for updates)
  ◦ Zephyr: No downtime. Either source or destination is available
• Service interruption (failed operations)
  ◦ S&C: ~100s – 1,000s of failed operations. All transactions with updates are aborted
  ◦ Zephyr: ~10s – 100s of failed operations. Orders of magnitude less interruption

29
• Average increase in transaction latency (compared to the 6,000 transaction workload without migration)
  ◦ S&C: 10 – 15%. Cold cache at destination
  ◦ Zephyr: 10 – 20%. Pages fetched on-demand
• Data transfer
  ◦ S&C: Persistent database image
  ◦ Zephyr: 2 – 3% additional data transfer (messaging overhead)
• Total time taken to migrate
  ◦ S&C: 3 – 8 seconds. Unavailable for any writes
  ◦ Zephyr: 10 – 18 seconds. No unavailability

30 Orders of magnitude fewer failed operations

31
• Proposed Zephyr, a live database migration technique with no downtime for shared nothing architectures
  ◦ The first end-to-end solution with safety, correctness and liveness guarantees
• Prototype implementation on a relational OLTP database
• Low cost on a variety of workloads


33–38 [Figure sequence: Source, Destination, Txns]

39
• Either source or destination is serving the tenant
  ◦ No downtime
• Serializable transaction execution
  ◦ Unique page ownership
  ◦ Local multi-granularity locking
• Safety in the presence of failures
  ◦ Transactions are atomic and durable
  ◦ Migration state is recovered from log
    ▪ Ensure consistency of the database state

40
• Wireframe copy
• Typically orders of magnitude smaller than data
• Operational overhead during migration
• Extra data (in addition to database pages) transferred
• Transactions aborted during migration

41 Failures due to attempted modification of Index structure

42
• Only committed transactions reported
• Loss of cache for both migration types
• Zephyr results in a remote fetch
