Robustness in the Salus scalable block store Yang Wang, Manos Kapritsos, Zuocheng Ren, Prince Mahajan, Jeevitha Kirubanandam, Lorenzo Alvisi, and Mike.


1 Robustness in the Salus scalable block store
Yang Wang, Manos Kapritsos, Zuocheng Ren, Prince Mahajan, Jeevitha Kirubanandam, Lorenzo Alvisi, and Mike Dahlin
University of Texas at Austin

2 Storage: do not lose data
Fsck, IRON, ZFS, …
[Figure labels: Past, Now & Future]
Are they robust enough?

3 Salus: provide robustness to scalable systems
Scalable systems: crash failure + certain disk corruption (GFS/Bigtable, HDFS/HBase, WAS, FDS, …)
Strong protections: arbitrary failure (BFT, end-to-end verification, …)
[Figure label: Salus]

4 Start from “Scalable”
File system, Key value, Database, Block store
[Figure labels: Scalable, Robust]

5 Scalable architecture
[Figure: Clients, Metadata server, Data servers]
– Infrequent metadata transfer
– Parallel data transfer
– Data is replicated for durability and availability
Computation nodes:
– No persistent storage
– Data forwarding, garbage collection, etc.
– Tablet server (Bigtable), Region server (HBase), Stream Manager (WAS), …

6 Challenges
– Parallelism vs. consistency
– Prevent computation nodes from corrupting data
– Read safely from one node
Do NOT increase replication cost

7 Parallelism vs. consistency
[Figure: Clients, Metadata server, Data servers; Write 1, Write 2, Barrier]
A block store requires barrier semantics.

8 Prevent computation nodes from corrupting data
[Figure: Clients, Metadata server, Data servers]
Single point of failure

9 Read safely from one node
[Figure: Clients, Metadata server, Data servers]
Single point of failure
Does a simple checksum work? It cannot prevent errors in control flow.

10 Outline
Challenges
Salus
– Pipelined commit
– Active storage
– Scalable end-to-end checks
Evaluation

11 Salus’ overview
Disk-like interface (Amazon EBS):
– Volume: a fixed number of fixed-size blocks
– Single client per volume: barrier semantics
Failure model: arbitrary failures, but no identity forgery
Key ideas:
– Pipelined commit: write in parallel and in order
– Active storage: safe write
– Scalable end-to-end checks: safe read

12 Salus’ key ideas
[Figure: Clients, Metadata server; Pipelined commit, Active storage, End-to-end verification]

13 Outline
Challenges
Salus
– Pipelined commit
– Active storage
– Scalable end-to-end checks
Evaluation

14 Pipelined commit
Goal: barrier semantics
– A request can be marked as a barrier.
– All previous requests must be executed before it.
Naïve solution:
– The client blocks at a barrier: parallelism is lost.
A weaker version of a distributed transaction
– Well-known solution: two-phase commit (2PC)
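The barrier rule on this slide can be stated as a small check over an execution trace. The sketch below is only an illustration: the function name `respects_barriers` and the trace representation are assumptions, not part of Salus. It checks exactly the rule stated above (every request issued before a barrier executes before that barrier) and nothing more.

```python
def respects_barriers(is_barrier, execution_order):
    """Check the barrier rule for one client's requests.

    is_barrier[i]    -- True if request i (in issue order) is marked as a barrier.
    execution_order  -- request ids in the order the store executed them.
    Both names are hypothetical; this is not Salus' API.
    """
    position = {req: pos for pos, req in enumerate(execution_order)}
    for i, barrier in enumerate(is_barrier):
        if not barrier:
            continue
        # Every earlier request must have executed before barrier request i.
        if any(position[j] > position[i] for j in range(i)):
            return False
    return True


# Requests 0 and 1 may run in either order, but both must precede barrier 2.
flags = [False, False, True, False]
print(respects_barriers(flags, [1, 0, 2, 3]))  # True
print(respects_barriers(flags, [0, 2, 1, 3]))  # False: request 1 ran after the barrier
```

The naïve solution corresponds to the client issuing nothing after a barrier until all earlier requests have completed, which trivially passes this check but serializes the client.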

15 Pipelined commit – 2PC
[Figure: Client, Servers; Batch i and Batch i+1, each with a Leader; Previous leader; messages: Prepared, Batch i-1 committed]

16 Pipelined commit – 2PC
[Figure: Client, Servers; Batch i and Batch i+1, each with a Leader; Previous leader; messages: Batch i-1 committed, Commit, Batch i committed, Commit]
Parallel data transfer, sequential commit
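The "parallel data transfer, sequential commit" idea can be sketched as a tiny simulation: batches may finish their prepare phase in any order, but batch i commits only after batch i-1 has committed. The function name `pipelined_commit` and the event log format are assumptions made for illustration; the sketch models only the ordering rule, not leaders, messages, failures, or recovery.

```python
def pipelined_commit(prepared_order):
    """Simulate the ordering rule for batches numbered 0, 1, 2, ...

    prepared_order -- batch ids in the order they finish phase 1 (prepare).
    Returns a list of ("prepared" | "committed", batch_id) events.
    """
    prepared = set()
    next_to_commit = 0  # lowest batch id that has not committed yet
    events = []
    for batch in prepared_order:
        prepared.add(batch)
        events.append(("prepared", batch))
        # Commit as many consecutive batches as are now ready, in order.
        while next_to_commit in prepared:
            events.append(("committed", next_to_commit))
            next_to_commit += 1
    return events


# Batch 1 finishes its data transfer first but must wait for batch 0.
for event in pipelined_commit([1, 0, 2]):
    print(event)
# ('prepared', 1)
# ('prepared', 0)
# ('committed', 0)
# ('committed', 1)
# ('prepared', 2)
# ('committed', 2)
```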

17 Key difference from 2PC
[Figure: Client, Servers; Batch i and Batch i+1, each with a Leader; Previous leader; messages: Batch i-1 committed, Commit, Batch i committed, Commit]

18 Outline
Challenges
Salus
– Pipelined commit
– Active storage
– Scalable end-to-end checks
Evaluation

19 Salus: active storage
Goal: a single node cannot corrupt data
Well-known solution: BFT replication
– Problem: higher replication cost
Salus’ solution: use f+1 computation nodes
– Require unanimous consent
– How about availability/liveness?

20 Active storage: unanimous consent
[Figure: Computation nodes, Storage nodes]

21 Active storage: unanimous consent
[Figure: Computation nodes, Storage nodes]
Unanimous consent:
– All updates must be agreed to by f+1 computation nodes.
Additional benefit:
– Collocate computation and storage: saves network bandwidth
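A minimal sketch of the unanimous-consent rule, assuming a storage node that accepts an update only when exactly f+1 computation nodes vouch for the same content. The class `StorageNode`, the `votes` dictionary, and the use of plain SHA-256 digests are illustrative assumptions; a real system under the stated failure model would need authenticated (unforgeable) votes, which this toy omits.

```python
import hashlib


def digest(update: bytes) -> str:
    """Content digest a computation node would vouch for (assumed SHA-256)."""
    return hashlib.sha256(update).hexdigest()


class StorageNode:
    """Toy storage node: applies an update only on unanimous consent."""

    def __init__(self, f: int):
        self.required = f + 1  # number of computation nodes that must agree
        self.log = []

    def apply(self, update: bytes, votes: dict) -> bool:
        """votes maps computation-node id -> digest that node computed."""
        expected = digest(update)
        unanimous = (
            len(votes) == self.required
            and all(d == expected for d in votes.values())
        )
        if unanimous:
            self.log.append(update)
        return unanimous


node = StorageNode(f=1)
good = b"write block 7"
print(node.apply(good, {"c1": digest(good), "c2": digest(good)}))         # True
print(node.apply(good, {"c1": digest(good), "c2": digest(b"corrupt")}))   # False
```

With f+1 computation nodes, at least one is correct when at most f are faulty, so a corrupted update cannot collect all f+1 matching votes. The cost is that a single faulty or slow node can block progress, which is the availability question the slide raises.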

22 Active storage: restore availability
[Figure: Computation nodes, Storage nodes]
What if something goes wrong?
– Problem: we may not know which one is faulty.
Replace all computation nodes

23 Active storage: restore availability
[Figure: Computation nodes, Storage nodes]
What if something goes wrong?
– Problem: we may not know which one is faulty.
Replace all computation nodes
– Computation nodes do not have persistent state.

24 Discussion
Does Salus achieve BFT with f+1 replication?
– No. Fork can happen in a combination of failures.
Is it worth using 2f+1 replicas to eliminate this case?
– No. Fork can still happen as a result of geo-replication.
How to support multiple writers?
– We do not know a perfect solution.
– Give up either linearizability or scalability.

25 Outline
Challenges
Salus
– Pipelined commit
– Active storage
– Scalable end-to-end checks
Evaluation

26 Evaluation

27 Evaluation

28 Evaluation

29 Outline
Salus’ overview
Pipelined commit
– Write in order and in parallel
Active storage
– Prevent a single node from corrupting data
Scalable end-to-end verification
– Read safely from one node
Evaluation

30 Salus: scalable end-to-end verification
Goal: read safely from one node
– The client should be able to verify the reply.
– If corrupted, the client retries another node.
Well-known solution: Merkle tree
– Problem: scalability
Salus’ solution:
– Single writer
– Distribute the tree among servers

31 Scalable end-to-end verification
[Figure: Server 1, Server 2, Server 3, Server 4]
Client maintains the top tree.
Client does not need to store anything persistently.
It can rebuild the top tree from the servers.
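The split between per-server subtrees and a client-side top tree can be sketched with a plain Merkle tree. Everything here is an assumption made for illustration: the helper names (`build_levels`, `proof`, `verify`), SHA-256 as the hash, power-of-two block counts, and modeling the client's "top tree" as just a table of subtree roots. It shows how a client can check a reply from a single server, not how Salus actually lays out or updates its tree.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """All tree levels, leaves first; assumes len(blocks) is a power of two."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof(levels, index):
    """Sibling hashes on the path from leaf `index` up to the subtree root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # sibling of the current node
        index //= 2
    return path


def verify(block, index, path, expected_root):
    """Client-side check: recompute the root from the block and its proof."""
    node = h(block)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == expected_root


# Each server keeps the subtree for its own blocks (hypothetical layout).
server_blocks = {
    "server1": [b"a", b"b", b"c", b"d"],
    "server2": [b"e", b"f", b"g", b"h"],
}
server_levels = {s: build_levels(bs) for s, bs in server_blocks.items()}

# The client keeps only the subtree roots; they can be refetched from servers.
top_tree = {s: levels[-1][0] for s, levels in server_levels.items()}

path = proof(server_levels["server1"], 2)
print(verify(b"c", 2, path, top_tree["server1"]))         # True
print(verify(b"tampered", 2, path, top_tree["server1"]))  # False
```

If verification fails, the client would retry the read on another node, as the previous slide describes.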

32 Scalable and robust storage
More hardware, more complex software, more failures

33 Start from “Scalable”
[Figure labels: Scalable, Robust]

34 Pipelined commit - challenge
Is 2PC slow?
– Locking? Single writer, no locking.
– Phase 2 network overhead? Small compared to Phase 1.
– Phase 2 disk overhead? Eliminate it and push the complexity to recovery.
Order across transactions
– Easy when failure-free.
– Complicates recovery.
Key challenge: recovery
