Consistency and Scalability

COMP 150-IDS: Internet Scale Distributed Systems (Spring 2017)
Consistency and Scalability
Noah Mendelsohn
Tufts University
Web:

What you should get from today’s session
- You will explore challenges relating to maintaining data consistency in a computing system
- You will learn about techniques used to make storage systems more reliable
- You will learn about transactions and their implementation using logs
- You will learn about the CAP theorem and why scaling and consistency tend not to come together

A note about scope
The challenges and principles we cover today reappear at every level of system design:
- CPU instruction set and memory
- Parallel programming languages
- Single-machine databases
- Distributed applications and databases
Today we will focus mainly on larger-scale systems.

Why Worry About Consistency?

Duplicate information in computing systems
Why complicate things with duplicate copies?
- Mirrored disks: reliability
- Parallel processing: higher throughput
- Geographic distribution: reduces network delay (e.g., one copy each in Europe, Asia, and the US)
- Higher availability: if the network fails, each “partition” may still have a copy
- Inter-dependent data: bank account records hold the total for each account, and a bank record keeps the total for all accounts
- Memory hierarchies: CPU caches, file system caches, Web proxies, etc.
If we allow updates, then maintaining consistency is tricky.

Simple Examples: Parallel Disk Systems

Mirrored disks
Everything is written twice. Better performance on reads (slower on writes).
[Figure: logical disk block X stored on both drives of the mirrored implementation]

Duplicate data and crash recovery
After a crash, the data survives.
[Figure: one drive of the mirrored implementation crashes; block X is still on the other drive]

Replacement drive can be reconstructed in the background
[Figure: mirrored disks; block X on the logical disk and on both drives of the mirrored implementation]
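A minimal Python sketch of the mirroring behavior described above, under toy assumptions: each drive is an in-memory dict from block number to bytes, and a crashed drive is represented by `None`. `MirroredDisk` and its methods are hypothetical names, not a real driver API.

```python
class MirroredDisk:
    """Toy mirrored disk: every block is written to both drives."""

    def __init__(self):
        self.drives = [{}, {}]  # each drive: {block_number: bytes}; None = crashed

    def write(self, block, data):
        # Everything is written twice, which is why writes are slower.
        for drive in self.drives:
            if drive is not None:
                drive[block] = data

    def read(self, block):
        # Any surviving drive can serve the read; a real mirror can send
        # different reads to different drives to raise throughput.
        for drive in self.drives:
            if drive is not None:
                return drive[block]
        raise OSError("both drives have failed")

    def crash(self, i):
        self.drives[i] = None

    def rebuild(self, i):
        # Copy every block from the survivor onto a blank replacement drive.
        # Done all at once here; a real system does it in the background.
        survivor = next(d for d in self.drives if d is not None)
        self.drives[i] = dict(survivor)


disk = MirroredDisk()
disk.write(7, b"X")
disk.crash(0)
print(disk.read(7))  # b'X': the data survives the crash
disk.rebuild(0)
```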

REVIEW: How is the disk used in Unix / Linux?
Layers inside the Unix kernel, from the application down to the disk:
- Application
- Filesystem: files/dirs, security, etc.
- In-memory block cache: buffered block r/w (hides timing)
- Block device driver: direct read/write of filesystem “blocks” (hides sector size and device geometry)
- Raw device driver: access by cylinder/track/sector
- Disk sectors

We can use mirrored disks with Unix
Abstraction: the mirrored disk provides the same service as a single disk…just faster and more reliable!
- Application
- Filesystem: files/dirs, security, etc.
- In-memory block cache: buffered block r/w (hides timing)
- MIRRORED device driver (in place of the ordinary block device driver)
- Mirrored implementation: each block is stored on two drives
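To make the abstraction point concrete, here is a sketch (Python 3.9+) in which filesystem-level code is written against a block-device interface and works unchanged whether it is handed one disk or a mirrored pair. `BlockDevice`, `SingleDisk`, `MirroredDisk`, and `save_file_block` are hypothetical names used only for illustration.

```python
from typing import Protocol


class BlockDevice(Protocol):
    """The service the filesystem relies on: read and write numbered blocks."""
    def read(self, block: int) -> bytes: ...
    def write(self, block: int, data: bytes) -> None: ...


class SingleDisk:
    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}

    def read(self, block: int) -> bytes:
        return self.blocks[block]

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data


class MirroredDisk:
    """Same interface as SingleDisk, implemented with two copies."""
    def __init__(self) -> None:
        self.copies = [SingleDisk(), SingleDisk()]

    def read(self, block: int) -> bytes:
        # Spread reads across both drives; either copy has every block.
        return self.copies[block % 2].read(block)

    def write(self, block: int, data: bytes) -> None:
        for disk in self.copies:
            disk.write(block, data)


def save_file_block(dev: BlockDevice, block: int, text: str) -> str:
    # Filesystem-level code: it neither knows nor cares which device it has.
    dev.write(block, text.encode())
    return dev.read(block).decode()


for device in (SingleDisk(), MirroredDisk()):
    print(type(device).__name__, save_file_block(device, 3, "hello"))
```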

Atomicity and update synchronization
Mirrored writes DO NOT happen at quite the same time. Question: when is the update committed?
[Figure: block X written to the logical disk, then to each drive of the mirrored implementation in turn]
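A tiny simulation of why the question matters: if the machine crashes between the two writes, the drives disagree, and something has to decide which version counts. The helper `mirrored_write` and its `crash_after` parameter are hypothetical, and the "crash" is simulated by returning early.

```python
def mirrored_write(drives, block, data, crash_after=None):
    """Write `data` to each drive in turn; if `crash_after` is set, stop
    (simulating a crash) after that many drives have been written."""
    for i, drive in enumerate(drives):
        if crash_after is not None and i == crash_after:
            return False  # crash: the remaining drives never see the write
        drive[block] = data
    return True


drives = [{5: b"old"}, {5: b"old"}]
completed = mirrored_write(drives, 5, b"new", crash_after=1)
print(completed, drives)  # False [{5: b'new'}, {5: b'old'}]
```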

RAID: Redundant Arrays of Inexpensive Disks
[Figure: logical disk blocks X, Y, Z stored on separate drives of the RAID implementation, plus a parity drive holding XOR(X,Y,Z)]
Much less space overhead than mirroring…but typically slower.
If any disk is lost…you can reconstruct its contents from the information on the others!
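The reconstruction works because XOR is its own inverse: XOR-ing the parity block with every surviving data block cancels them out and leaves only the missing block. A small sketch over byte strings (the helper `xor_blocks` and the sample block contents are illustrative):

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


x, y, z = b"\x0f\xf0", b"\x33\x33", b"\xaa\x55"
parity = xor_blocks(x, y, z)  # stored on the parity disk

# Suppose the disk holding y crashes: XOR of everything that survives
# (including the parity block) gives y back, because a ^ a == 0.
recovered_y = xor_blocks(x, z, parity)
print(recovered_y == y)  # True
```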

Why Consistency is Hard

Synchronization problem
Let’s run the code for two deposits in parallel. Each copy runs this code to add money to my account:

    NA = Access Noah’s Bank account
    Bal = NA.Balance
    NewBalance = Bal + $1000
    NA.Balance.Write NewBalance

Can you see the problem? There’s a risk that both copies will read the old balance before either one writes its update. If that happens, I only get $1000, not $2000!
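The problem can be reproduced deterministically by writing out one unlucky interleaving by hand instead of relying on real thread timing. A sketch in Python, where the `balance` dict and the two helper functions are illustrative stand-ins for the bank account:

```python
balance = {"noah": 0}


def read_balance():
    return balance["noah"]


def write_balance(value):
    balance["noah"] = value


# One unlucky interleaving of the two deposits, step by step:
bal_1 = read_balance()        # copy 1 reads 0
bal_2 = read_balance()        # copy 2 also reads 0, before copy 1 writes
write_balance(bal_1 + 1000)   # copy 1 writes 1000
write_balance(bal_2 + 1000)   # copy 2 overwrites it with 1000
print(balance["noah"])        # 1000, not 2000: one deposit was lost
```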

Solution: locking
Only one transaction or thread can hold the lock at a time. Some code to add money to my account:

    Lock Noah’s Bank Account
    NA = Access Noah’s Bank account
    Bal = NA.Balance
    NewBalance = Bal + $1000
    NA.Balance.Write NewBalance
    Unlock Noah’s Bank Account

Now the two copies can’t run at once on the same account…but if each locks a different bank account, they can.
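A sketch of the same fix using Python’s `threading.Lock`, assuming one lock per account (the `Account` class is hypothetical). The `with` statement acquires the lock on entry and releases it on exit, playing the role of the Lock/Unlock lines:

```python
import threading


class Account:
    def __init__(self, balance: int = 0) -> None:
        self.balance = balance
        self.lock = threading.Lock()  # one lock per account

    def deposit(self, amount: int) -> None:
        with self.lock:                  # Lock Noah's Bank Account
            bal = self.balance           # Bal = NA.Balance
            new_balance = bal + amount   # NewBalance = Bal + $1000
            self.balance = new_balance   # NA.Balance.Write NewBalance
        # leaving the `with` block unlocks the account


noah = Account(0)
threads = [threading.Thread(target=noah.deposit, args=(1000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(noah.balance)  # 2000
```

Because each `Account` has its own lock, deposits to different accounts never wait for each other, matching the slide’s last point.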

Consistency and Crash Recovery
Some code to transfer money:

    NA = Access Noah’s Bank account
    YA = Access Your Bank account
    NBal = NA.Balance
    YBal = YA.Balance
    NBal += $1000
    YBal -= $1000
    NA.Balance.Write NBal
    YA.Balance.Write YBal      (this gets lost during a crash)

Can you see the problem? If the system crashes just after writing my balance, the bank loses $1000 (it’s still in your account too).
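The failure can be simulated by raising an exception between the two writes. In this sketch the `Crash` exception and the `accounts` dict are illustrative stand-ins for a real machine failure and real storage:

```python
class Crash(Exception):
    """Stands in for a power failure or machine crash."""


accounts = {"noah": 100, "you": 1300}


def transfer(amount, crash_midway=False):
    n_bal = accounts["noah"]
    y_bal = accounts["you"]
    n_bal += amount
    y_bal -= amount
    accounts["noah"] = n_bal   # NA.Balance.Write NBal
    if crash_midway:
        raise Crash("system went down")
    accounts["you"] = y_bal    # YA.Balance.Write YBal (never happens)


try:
    transfer(1000, crash_midway=True)
except Crash:
    pass
print(accounts, sum(accounts.values()))  # {'noah': 1100, 'you': 1300} 2400
```

The total held in the two accounts grew from $1400 to $2400: the $1000 is now in both places.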

Transactions

Transactions: automated consistency & crash recovery!
Some code to transfer money:

    BEGIN_TRANSACTION
    NA = Access Noah’s Bank account
    YA = Access Your Bank account
    NBal = NA.Balance
    YBal = YA.Balance
    NBal += $1000
    YBal -= $1000
    NA.Balance.Write NBal
    YA.Balance.Write YBal
    END_TRANSACTION

The system guarantees that either everything in the transaction happens, or nothing…and it guarantees more!

ACID Properties of a Transaction
- Atomicity: everything happens or nothing
- Consistency: if the database has rules, they are obeyed at transaction end (e.g., balance must be < $1,000,000)
- Isolation: any two parallel transactions act as if they ran serially; most transaction systems do the locking automatically!
- Durability: once committed, never lost
That seems almost magic…how can we achieve all this?

How to implement transactions - logging
The key idea: a shared log records the information needed to undo any change made by any transaction.
- As a transaction runs, the old value of each item is written to the log before the item itself is changed
- When a transaction commits:
  - All data is written to the main data store
  - A commit record is written to the log. This is the atomic point at which the transaction “happens”
- After a crash, the log is “replayed”:
  - For any transactions that did not commit, the undo operations are performed
  - After the crash, only committed operations have happened!
- When combined with transaction-driven locking, we can automatically support the ACID properties with almost no application code complexity
- This is all built into SQL databases like Oracle, Postgres, DB2, and SQL Server
Logging and transaction processing are two of the most important and beautiful data processing technologies.
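A minimal sketch of undo logging in Python. It assumes a toy setting in which both `data` (the main data store) and `log` are in-memory structures that we pretend survive a crash; `UndoLogDB` and its methods are hypothetical names, and locking is omitted. Each write logs the old value before changing the data, commit appends a single record, and recovery undoes every change whose transaction has no commit record:

```python
class UndoLogDB:
    """Toy undo-logging store; a real one keeps data and log on disk."""

    def __init__(self, data):
        self.data = dict(data)  # the main data store
        self.log = []           # the shared log
        self.next_tid = 1

    def begin(self):
        tid = self.next_tid
        self.next_tid += 1
        self.log.append(("BEGIN", tid))
        return tid

    def write(self, tid, key, value):
        # Log the old value BEFORE changing the data.
        self.log.append(("UNDO", tid, key, self.data[key]))
        self.data[key] = value

    def commit(self, tid):
        # All data is already written; this one append is the atomic
        # point at which the transaction "happens".
        self.log.append(("COMMIT", tid))

    def recover(self):
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        # Replay the log newest-first, undoing uncommitted changes.
        for rec in reversed(self.log):
            if rec[0] == "UNDO" and rec[1] not in committed:
                _, _, key, old_value = rec
                self.data[key] = old_value


db = UndoLogDB({"noah": 100, "you": 1300})
t1 = db.begin()
db.write(t1, "noah", 1100)
db.write(t1, "you", 300)
db.commit(t1)
db.recover()     # committed work is kept
print(db.data)   # {'noah': 1100, 'you': 300}
```

Undoing in reverse log order matters when a transaction wrote the same item more than once: the oldest saved value is applied last.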

Logging in Action
Running the transfer transaction, starting from Noah.Bal = $100 and Your.Bal = $1300:
1. BEGIN_TRANSACTION: the log gets “Begin Trans 1”. Data: Noah.Bal = $100, Your.Bal = $1300.
2. NA.Balance.Write NBal: the log gets “Old Noah Bal = $100”, then Noah.Bal becomes $1100.
3. YA.Balance.Write YBal: the log gets “Old Your Bal = $1300”, then Your.Bal becomes $300.
4. END_TRANSACTION: the log gets “Commit Tr 1”. The transfer has now happened.

What if we crash while the data is inconsistent?
Logging in Action: Crash!
Run the same transfer again from Noah.Bal = $100 and Your.Bal = $1300, but this time the system crashes just after NA.Balance.Write NBal:
- Log: “Begin Trans 1”, “Old Noah Bal = $100”
- Data: Noah.Bal = $1100, Your.Bal = $1300

Recovery! When the system restarts, the data is inconsistent (Noah.Bal = $1100, Your.Bal = $1300)…but we can play the log to restore consistency!
We notice that Transaction 1 never committed, so we apply all of its undo entries: “Old Noah Bal = $100” sets Noah.Bal back to $100. The data is consistent again: Noah.Bal = $100, Your.Bal = $1300.
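The recovery step can be replayed in code. This sketch encodes the log records from the crash scenario as tuples; the record format and the `recover` helper are illustrative, not any particular database’s log format:

```python
# State found on disk at restart:
data = {"Noah.Bal": 1100, "Your.Bal": 1300}
log = [
    ("Begin", 1),
    ("Old", 1, "Noah.Bal", 100),  # undo entry written before Noah.Bal changed
    # the crash happened here: there is no ("Commit", 1) record
]


def recover(data, log):
    committed = {rec[1] for rec in log if rec[0] == "Commit"}
    for rec in reversed(log):            # newest entries first
        if rec[0] == "Old" and rec[1] not in committed:
            _, _, field, old_value = rec
            data[field] = old_value      # apply the undo entry
    return data


print(recover(data, log))  # {'Noah.Bal': 100, 'Your.Bal': 1300}
```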

Full Disclosure
This explanation is highly simplified, but the spirit is exactly right. Examples of things not covered:
- Some databases use redo rather than undo logging, or log both old and new values
- Transactions can abort (a ROLLBACK record is logged instead of COMMIT):
  - Useful if the programmer wants to give up
  - The system can abort a transaction if there is an error
  - The system can abort a transaction if locking has caused deadlock
- The same logs, if carefully designed, can be used to help with backup, recovery from disk drive failure, and synchronization of distributed systems

Atomicity and hardware
Important: transactions are committed by an atomic hardware write to the log.
- Before the commit record is written, the transaction has not happened
- After it’s written, all of its work is committed
- It all happens at once: atomically
Principle: almost any computing activity that is to be done atomically must ultimately be achieved with a single atomic hardware operation! For example:
- Store, test_and_set, or compare_and_swap CPU instructions
- Writing a disk block
When designing systems that require consistency, start by studying what your hardware can do atomically.
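One familiar example of reducing an update to a single atomic step (a swapped-in illustration, not from the slide): write the new version of a file off to the side, then install it with one rename. Python documents `os.replace` as an atomic operation when it succeeds on POSIX systems. The helper `atomic_write` is a hypothetical name; full crash durability usually also requires syncing the containing directory, which this sketch omits.

```python
import os
import tempfile


def atomic_write(path: str, text: str) -> None:
    """Replace the contents of `path` so readers see either the old file
    or the new one, never a half-written mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())    # push the new bytes to the disk
        os.replace(tmp_path, path)  # the single atomic step: a rename
    except BaseException:
        os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "balance.txt")
    atomic_write(p, "100")
    atomic_write(p, "1100")
    with open(p) as f:
        print(f.read())  # 1100
```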

Consistency in Distributed Systems

Problem
In a distributed system, we want to do work in lots of places. To get consistency, we need to do an atomic update to the system state.
Challenge: can we get consistency in a distributed system?

Can we get distributed consensus and consistency?
Yes! (but with some limitations)
First we need to think about how distributed systems fail:
- Individual nodes can fail
- What if the network partitions?
In general, implementing transactions or other consistency guarantees in distributed systems is hard!

Network Partition
[Figure: a fully connected network]
If the links between two groups of nodes break, the network is partitioned. All computers are still up! Updates in one partition can’t be sent to the other.

Questions about failures in distributed systems
- Can we support replicated data and maintain consistency?
- Can we run distributed transactions, in which work (updating accounts) is spread through the network, and still achieve consistency?
- How can we do crash recovery?
- How do we continue running when the network partitions?

Voting: a simple approach to replicated data
Copies of the same data can be kept at any or all nodes…but when reading you must use the value stored at a majority of nodes!
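A sketch of the read rule, assuming each node either returns its copy of the value or is unreachable (`None`). The function name `majority_read` is hypothetical. Unreachable nodes still count toward the total, so a read succeeds only when a strict majority of all nodes agree:

```python
from collections import Counter


def majority_read(replica_values):
    """Return the value held by a strict majority of ALL replicas, or None.

    `replica_values` has one entry per node; None marks a node we could
    not reach, which still counts toward the total.
    """
    total = len(replica_values)
    counts = Counter(v for v in replica_values if v is not None)
    for value, n in counts.items():
        if n > total // 2:
            return value
    return None  # no majority reachable: we must not answer


print(majority_read([1100, 1100, 100]))   # 1100
print(majority_read([1100, None, None]))  # None
```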

Network Partition
All computers are still up, but updates in one partition can’t be sent to the other. During a partition, only one group of nodes can be a majority…the other can’t proceed!
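The majority test itself is a one-liner. A sketch with a hypothetical `can_proceed` helper, showing why two disjoint groups can never both pass it:

```python
def can_proceed(group_size: int, total_nodes: int) -> bool:
    """A group may keep working only if it holds a strict majority."""
    return group_size > total_nodes // 2


# A 5-node network splits into groups of 3 and 2:
print(can_proceed(3, 5), can_proceed(2, 5))  # True False
```

With an even split, such as 2 and 2 out of 4 nodes, neither group has a majority and neither can proceed.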

The Famous CAP Theorem

The CAP Theorem
When designing a system with distributed data, you would like to have:
- Consistency: everyone agrees on the data
- Availability: nobody ever has to stop processing
- Partition tolerance: keep going even when the network partitions
The CAP theorem says: you can have any two simultaneously, but not all three! If your network can partition, then either some nodes will have to stop working (no availability) or the data may become inconsistent (the other partition doesn’t see the updates).

With the voting algorithm, only the majority partition can do work. The CAP theorem explains why we can never build a system that does better, unless we are willing to sacrifice consistency.

Distributed Transactions

Distributed transactions: the challenge
What if our computation is distributed? We still want the ACID properties:
- Atomicity
- Consistency
- Isolation
- Durability
Per the CAP theorem, let’s ignore partitions for now. Amazingly, there are ways to do this:
- Isolation and Consistency: distributed lock managers
- Atomicity and Durability: Distributed Two Phase Commit (DTPC)

Distributed two phase commit
- Allows a single transaction to be spread across multiple nodes
- Logging is done at each node, as for traditional transactions
- A special protocol ensures atomic commit of the distributed work
One of the great achievements of 20th-century distributed computing research.

Distributed Two Phase Commit
Node 1 logic:

    BEGIN_DISTRIBUTED_TRANSACTION
    NA = Access Noah’s Bank account
    NBal = NA.Balance
    NBal += $1000
    NA.Balance.Write NBal
    COMMIT

Node 2 logic:

    JOIN_DISTRIBUTED_TRANSACTION
    YA = Access Your Bank account
    YBal = YA.Balance
    YBal -= $1000
    YA.Balance.Write YBal

1. Start: Noah.Bal = $100 (Node 1), Your.Bal = $1300 (Node 2). Node 1 log: “Begin Trans 1”. Node 2 log: “Join Trans 1”.
2. Each node does its work, logging for undo as usual: Node 1 logs “Old Noah Balance = $100” and Noah.Bal becomes $1100; Node 2 logs “Old YourBalance = $1300” and Your.Bal becomes $300.
3. Phase 1: at COMMIT, Node 1 asks “Are you prepared to commit?” Each node writes “Prepared” to its log, and Node 2 answers “Yes, I am prepared.” Prepared means: if you ask me later to commit or abort, I will be able to do either!
4. Phase 2: Node 1 sends “Commit!” Each node writes “Commit” to its log, and Node 2 replies “Done.”
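A minimal single-process sketch of the protocol, assuming nodes are plain Python objects and messages are method calls (no network, no timeouts). `Participant` and `two_phase_commit` are hypothetical names; in the walkthrough Node 1 acts as both master and participant, while here the coordinator is a separate function with its own log.

```python
class Participant:
    """One node in a toy two-phase commit; each log holds one transaction."""

    def __init__(self, data, healthy=True):
        self.data = data          # this node's part of the database
        self.log = []             # this node's local log
        self.healthy = healthy    # False simulates a node that cannot prepare

    def update(self, key, new_value):
        self.log.append(("Old", key, self.data[key]))  # undo info goes first
        self.data[key] = new_value

    def prepare(self):
        if not self.healthy:
            return False
        # From now on this node must be able to commit OR abort on request.
        self.log.append(("Prepared",))
        return True

    def commit(self):
        self.log.append(("Commit",))

    def abort(self):
        for record in reversed(self.log):  # undo this node's changes
            if record[0] == "Old":
                _, key, old_value = record
                self.data[key] = old_value
        self.log.append(("Abort",))


def two_phase_commit(coordinator_log, participants):
    # Phase 1: ask every node "Are you prepared to commit?"
    votes = [p.prepare() for p in participants]
    decision = "Commit" if all(votes) else "Abort"
    # Logging the decision is the single atomic point for the whole transaction.
    coordinator_log.append(decision)
    # Phase 2: tell every node the outcome.
    for p in participants:
        if decision == "Commit":
            p.commit()
        else:
            p.abort()
    return decision


node1 = Participant({"Noah.Bal": 100})
node2 = Participant({"Your.Bal": 1300})
node1.update("Noah.Bal", 1100)
node2.update("Your.Bal", 300)
decisions = []
print(two_phase_commit(decisions, [node1, node2]))  # Commit
```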

What happens if there is a crash?
- If a node goes down before the commit, the master node writes an abort record and tells the other nodes to abort
- When any node comes up after a crash or after a partition, it checks with the master to learn what happened to any prepared transactions
- Because prepared means it can go either way, that node can either record a commit or execute a rollback using data from its log
We can see the CAP theorem in action again: the algorithm stalls while the network is partitioned.
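A sketch of what a restarting node might do with one transaction, using the same toy log records as a local undo log (`("Old", key, value)`, `("Prepared",)`, and so on). `recover_participant` and its `ask_coordinator` callable are hypothetical; in a real system that call is a network request, which is exactly where the algorithm stalls during a partition. A node that never prepared never voted yes, so the master cannot have committed, and the node can roll back on its own.

```python
def recover_participant(data, log, ask_coordinator):
    """Resolve one transaction on a node that is restarting.

    `ask_coordinator` returns "Commit" or "Abort"; it stands in for a
    network request to the master.
    """
    kinds = {record[0] for record in log}
    if "Commit" in kinds:
        return "Commit"               # outcome already recorded locally
    if "Abort" in kinds:
        return "Abort"
    if "Prepared" in kinds:
        outcome = ask_coordinator()   # in doubt: only the master knows
    else:
        outcome = "Abort"             # never voted yes, so it cannot have committed
    if outcome == "Commit":
        log.append(("Commit",))
    else:
        for record in reversed(log):  # roll back using the undo entries
            if record[0] == "Old":
                _, key, old_value = record
                data[key] = old_value
        log.append(("Abort",))
    return outcome


data = {"Your.Bal": 300}
log = [("Old", "Your.Bal", 1300), ("Prepared",)]
print(recover_participant(data, log, lambda: "Abort"), data)  # Abort {'Your.Bal': 1300}
```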

Does Everyone Use Distributed Two Phase Commit?
- In the late 1990s, everyone thought DTPC would be the key to distributed data
- In practice, systems like Amazon’s can’t stop when the network partitions or the master node crashes
- Today:
  - Massive but non-critical data stores do not even attempt perfect consistency: once in a while, your Amazon shopping cart may lose things you’ve parked there
  - Critical transactions (e.g., when you place your order and charge your credit card) are often recorded in less scalable but fully consistent (usually relational) databases

Recent Development: Google Spanner
- GPS receivers and atomic clocks allow server clocks to synchronize worldwide with small, bounded skew
- This enables a new family of algorithms that achieve many of the benefits of DTPC, but without some of its limitations
- The Google F1 SQL database management system (DBMS) is built on top of Spanner (2012)
F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business (

Summary

Keeping data consistent is important:
- Techniques like ACID transactions implemented with logs have been spectacularly successful
- Consistency and scalability tend not to come together
- Atomicity in software tends to require reduction to a single atomic operation in hardware
- The CAP theorem says we can’t have Consistency, Availability, and Partition tolerance all at once
- Techniques like voting and Distributed Two Phase Commit can achieve distributed consistency at the cost of availability
- Many modern systems sacrifice consistency to achieve availability at massive scale
