3 What do they have in common? Independent processors that communicate via a narrow interface (contrast with shared memory). True concurrency (contrast with multitasking). Partial failures: things get messy when only parts of the system stop working, and really messy when parts of the system can try to do bad things.
4 Partial failures and availability Leslie Lamport, at DEC SRC in the 1980s: "A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable."
5 Partial failures Assume five computers crash and recover independently, and each is down about 4 minutes a month (probability up = 0.9999). Probability that all five are up? Probability that at least one is up?
6 Partial failures Assume five computers crash and recover independently, and each is down about 4 minutes a month (probability up = 0.9999). Probability all 5 up? $0.9999^5 \approx 0.9995$ (some computer is down about 20 min/month). Probability ≥ 1 up? $1 - 10^{-20}$ (all down about 20 femtosec/month).
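The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes a 30-day month (the slide does not say which month length it uses) and fully independent failures. The constant names are illustrative. The results land close to the slide's rounded figures of 20 min and 20 femtosec.

```python
# Assumptions (not stated on the slide): a 30-day month, independent failures.
P_UP = 0.9999                      # probability a single computer is up
N = 5                              # number of computers
MINUTES_PER_MONTH = 30 * 24 * 60
SECONDS_PER_MONTH = MINUTES_PER_MONTH * 60

p_down = 1 - P_UP
p_all_up = P_UP ** N               # every computer is up
p_all_down = p_down ** N           # every computer is down at once

# P(at least one up) = 1 - p_all_down. That subtraction rounds to exactly 1.0
# in double precision, so the code keeps working with p_all_down itself.
single_downtime_min = p_down * MINUTES_PER_MONTH
some_down_min = (1 - p_all_up) * MINUTES_PER_MONTH
all_down_fs = p_all_down * SECONDS_PER_MONTH * 1e15

print(f"one computer down:      {single_downtime_min:.2f} min/month")
print(f"P(all {N} up):            {p_all_up:.6f}")
print(f"at least one down:      {some_down_min:.1f} min/month")
print(f"P(all {N} down):          {p_all_down:.3e}")
print(f"all down:               {all_down_fs:.1f} femtosec/month")
```

The contrast is the point: a service that needs all five machines is roughly five times less available than one machine, while a service that needs any one of them is almost never down.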
7 Fundamental problems… Reasoning about the state of a distributed system. Methods and protocols for making systems mask, or detect and recover from, failures. Protocols for agreeing on state, message delivery, process failures and recovery, etc., in the context of different assumptions about the environment (communication, clocks, failures, ...).
8 A simple distributed coordination problem Two armies are camped on separate hills surrounding a valley in which the enemy army is encamped. The enemy can vanquish each army separately, but if both armies attack at the same time, they can vanquish the enemy.
9 Two Generals' Problem (due to Jim Gray) "Attack at dawn!"
21 Two Generals in practice [Figure: two parties that must act together, one labeled "Deduct $300" and the other "Issue $300".] Question: what do banks do?
22 Muddy children Seven school children are playing outside after it rained. These are smart children - they’ve been studying logic and all reason flawlessly. At one point their teacher says to them “at least one of you has a muddy forehead”.
23 Muddy children The teacher asks them, “Raise your hand if you know you have mud on your forehead.” There were no mirrors, the students didn’t feel their foreheads, and no one pointed at anyone; a student could determine it only by reasoning. No one raised their hand, so the teacher asked again. Again no one raised their hand, so the teacher asked a third time. This time some students raised their hands. How many did so? How did they know?
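One way to explore the puzzle is to simulate it with a "possible worlds" model. The sketch below is not from the slides. It assumes the teacher's statement and every round of raised (or unraised) hands are seen by everyone, and that each child rules out any world inconsistent with what has been observed. The function name `muddy_children` and the tuple encoding are illustrative choices.

```python
from itertools import product

def muddy_children(actual):
    """Simulate the puzzle; actual[i] is True if child i is muddy.

    Returns (round in which hands first go up, indices of children who raise).
    """
    if not any(actual):
        raise ValueError("the teacher's statement requires at least one muddy child")
    n = len(actual)
    # After the teacher's statement, everyone knows at least one child is muddy.
    worlds = {w for w in product([False, True], repeat=n) if any(w)}

    def knows_muddy(i, w, possible):
        # Child i sees every forehead except its own, so it considers all
        # remaining worlds that agree with w on the other children.
        candidates = [v for v in possible
                      if all(v[j] == w[j] for j in range(n) if j != i)]
        return all(v[i] for v in candidates)

    def raisers(w, possible):
        return frozenset(i for i in range(n) if knows_muddy(i, w, possible))

    round_no = 0
    while True:
        round_no += 1
        observed = raisers(actual, worlds)
        if observed:
            return round_no, sorted(observed)
        # Nobody raised a hand: discard every world in which someone would have.
        worlds = {w for w in worlds if raisers(w, worlds) == observed}

# Seven children, three of them muddy (an assumed configuration).
print(muddy_children((True, True, True, False, False, False, False)))
```

Under these assumptions, with k muddy children the hands go up on the k-th question, and only the muddy children raise them. Each silent round eliminates the worlds with one fewer muddy forehead than the children had still considered possible.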
28 More Generals Three generals: a commander and two lieutenants. Communication is guaranteed. One general could be a traitor, or all three could be loyal. The commander will tell each lieutenant a command: attack or retreat. If the commander is loyal, then the loyal lieutenants need to obey the commander. If the commander is a traitor, then the lieutenants need to follow the same command (it doesn’t matter which).
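To see why this setting is hard, it helps to write down what each lieutenant actually observes. The sketch below is an illustration, not a protocol from the slides. It assumes a single exchange in which each lieutenant tells the other what the commander said, and that a traitor may say anything. The `View` type and `views` helper are hypothetical names.

```python
from typing import NamedTuple, Tuple

class View(NamedTuple):
    from_commander: str   # order received directly from the commander
    from_peer: str        # what the other lieutenant claims the commander said

def views(commander_sends: Tuple[str, str], l1_says: str, l2_says: str):
    """Return (view of lieutenant 1, view of lieutenant 2)."""
    to_l1, to_l2 = commander_sends
    return View(to_l1, l2_says), View(to_l2, l1_says)

# Scenario A: the commander is a traitor and sends conflicting orders.
# Both lieutenants are loyal and relay honestly.
a_l1, a_l2 = views(("attack", "retreat"), l1_says="attack", l2_says="retreat")

# Scenario B: loyal commander orders attack; lieutenant 2 is a traitor and lies.
b_l1, _ = views(("attack", "attack"), l1_says="attack", l2_says="retreat")

# Scenario C: loyal commander orders retreat; lieutenant 1 is a traitor and lies.
_, c_l2 = views(("retreat", "retreat"), l1_says="attack", l2_says="retreat")

print(a_l1 == b_l1)  # lieutenant 1 cannot tell A from B
print(a_l2 == c_l2)  # lieutenant 2 cannot tell A from C
```

In scenario B lieutenant 1 must attack, and in scenario C lieutenant 2 must retreat, because the commander is loyal in both. A lieutenant who decides from its view alone therefore acts the same way in scenario A, where the two loyal lieutenants end up with different commands.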
37 Replicated sensors Three sensors report the intervals [1, 6], [2, 7], and [4, 9]. All sensors correct: [4, 6]. Up to one faulty sensor: [2, 7]. Up to two faulty sensors: [1, 9]. $M^f_n(S)$: the smallest interval that contains every point lying in at least n - f of the n input intervals in S.
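The interval function on this slide can be sketched as a sweep over sorted endpoints. This is one possible implementation, assuming closed intervals and reading the definition as "the smallest interval covering every point that lies in at least n - f of the inputs". The function name `fault_tolerant_interval` is illustrative.

```python
def fault_tolerant_interval(intervals, f):
    """Smallest interval containing every point covered by >= n - f inputs.

    intervals: list of (low, high) closed intervals, one per sensor.
    f: maximum number of faulty sensors to tolerate.
    Returns (low, high), or None if no point is covered by n - f inputs.
    """
    n = len(intervals)
    need = n - f
    if need < 1:
        raise ValueError("f must be smaller than the number of sensors")
    # Tag starts with 0 and ends with 1 so that, at equal coordinates, starts
    # sort first and closed intervals that merely touch still count as overlapping.
    events = sorted([(lo, 0) for lo, _ in intervals] +
                    [(hi, 1) for _, hi in intervals])
    depth = 0                 # number of intervals covering the current point
    low = high = None
    for x, kind in events:
        if kind == 0:
            depth += 1
            if depth >= need and low is None:
                low = x       # first point covered by enough intervals
        else:
            if depth >= need:
                high = x      # last point seen so far with enough coverage
            depth -= 1
    return None if low is None else (low, high)

sensors = [(1, 6), (2, 7), (4, 9)]
for f in range(3):
    print(f, fault_tolerant_interval(sensors, f))
```

With the three sensor readings from the slide, the sweep reproduces the slide's intervals for zero, one, and two faulty sensors. Tolerating more faults widens the result, which is the price of trusting fewer inputs.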
38 Some problems to think about How would you solve agreement with one commander and three lieutenants? How would you solve agreement with one commander and two lieutenants, where a general is either loyal or dead?
39 Principles of distributed systems These kinds of problems come up all the time, but in different circumstances. The details are vital: they can make a problem go from easy to impossible. Good distributed systems design requires good engineering on top of sound principles.