Published by Sadie Honer. Modified over 2 years ago.

1
Diskless Checkpointing 15 Nov 2001

2
Motivation
  Checkpointing on Stable Storage
    Disk access is a major bottleneck!
  Incremental Checkpointing
  Copy-on-write
  Compression
  Memory Exclusion
  Diskless Checkpointing

3
Diskless?
  Extra memory is available (e.g. in a network of workstations, NOW)
  Use memory instead of disk
  Good: network bandwidth > disk bandwidth
  Bad: memory is not stable

4
Bottom line
  A NOW with (n+m) processors
  The application runs on exactly n procs, and should proceed as long as
    the number of processors in the system is at least n
    the failures occur within certain constraints
  Available processors (n+m) = application processors (n) + chkpnt processors (m)

5
Overview
  Coordinated chkpnt (sync-and-stop)
  To checkpoint:
    Application proc: chkpnt the state in memory
    Chkpnt proc: encode the application chkpnts and store the encodings in memory
  To recover:
    Non-failed procs: roll back
    Replacement processors are chosen
    Replacement proc: calculate the chkpnts of the failed procs using the other chkpnts and the encodings

6
Outline
  Application Processor Chkpnt
    Disk-based
    Diskless
    Incremental
    Forked (or copy-on-write)
    Optimization
  Encoding the chkpnts
    Parity (RAID level 5)
    Mirroring
    1-Dimensional Parity
    2-Dimensional Parity
    Reed-Solomon Coding
    Optimization
  Result

7
Application Processor Chkpnt
  Goal
    The processor should be able to roll back to its most recent chkpnt.
    Need to tolerate failures that occur while a chkpnt is being taken.
    Make sure that each coordinated chkpnt remains valid until the next coordinated chkpnt has been completed.

8
Disk-based Chkpnt
  To chkpnt: save all values in the stack, heap, and registers to disk
  To recover: overwrite the address space with the stored checkpoint
  Space demands: 2M on disk (M: the size of an application processor’s address space)

9
Simple Diskless Chkpnt
  To chkpnt: wait until the encoding has been calculated, then overwrite the diskless chkpnt in memory
  To recover: roll back from the in-memory chkpnt
  Space demands: extra M in memory (M: the size of an application processor’s address space)

10
Incremental Diskless Chkpnt
  To chkpnt: initially set all pages read-only; on a page fault, copy the page and set it read-write
  To recover: restore all read-write pages from their copies
  Space demands: extra I in memory (I: the incremental chkpnt size)
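
The page-fault mechanism on this slide can be mimicked in user space. The following is only a minimal sketch under stated assumptions: real implementations rely on hardware page protection, whereas here every write goes through a hypothetical `write()` method, and `PAGE_SIZE`, `IncrementalCheckpoint`, and `saved` are illustrative names that do not come from the slides.

```python
PAGE_SIZE = 4  # tiny pages, chosen only to keep the example readable


class IncrementalCheckpoint:
    """Keeps a copy of each page the first time it is written after a chkpnt."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.saved = {}  # page index -> page contents as of the last chkpnt

    def checkpoint(self):
        # Start a new interval: conceptually every page becomes read-only
        # again, so the copies from the previous interval are dropped.
        self.saved.clear()

    def write(self, addr: int, value: int):
        page = addr // PAGE_SIZE
        if page not in self.saved:
            # Stand-in for the page fault: copy the page, then allow writes.
            start = page * PAGE_SIZE
            self.saved[page] = bytes(self.memory[start:start + PAGE_SIZE])
        self.memory[addr] = value

    def recover(self):
        # Restore every page that was written since the last chkpnt.
        for page, old in self.saved.items():
            start = page * PAGE_SIZE
            self.memory[start:start + PAGE_SIZE] = old
        self.saved.clear()


if __name__ == "__main__":
    mem = bytearray(range(16))
    ck = IncrementalCheckpoint(mem)
    ck.checkpoint()
    ck.write(5, 99)
    ck.write(6, 98)  # same page as address 5, so no second copy is made
    print(len(ck.saved), "page(s) copied")
    ck.recover()
    print(mem == bytearray(range(16)))
```

The extra memory used is the total size of the saved pages, which corresponds to the quantity I on the slide.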

11
Forked Diskless Chkpnt
  To chkpnt: the application clones itself
  To recover: overwrite the application’s state with the clone’s, or let the clone assume the role of the application
  Space demands: extra 2I in memory (I: the incremental chkpnt size)

12
Optimizations
  Breaking the chkpnt into chunks
    Efficient use of memory
  Sending diffs (incremental)
    Bitwise XOR of the current copy and the chkpnt copy
    Unmodified pages need not be sent
  Compressing diffs
    Unmodified regions of memory XOR to zero, so the diff compresses well
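
The diff and compression steps can be sketched as follows. This is an illustration only: the slides do not name a compressor, so the use of `zlib` is an assumption, and `make_diff` and `apply_diff` are hypothetical helper names.

```python
import zlib


def make_diff(checkpoint: bytes, current: bytes) -> bytes:
    """XOR the current copy against the chkpnt copy and compress the result."""
    assert len(checkpoint) == len(current)
    raw = bytes(a ^ b for a, b in zip(checkpoint, current))
    return zlib.compress(raw)  # unmodified bytes are zero and compress well


def apply_diff(checkpoint: bytes, diff: bytes) -> bytes:
    """Rebuild the current copy from the old chkpnt and a compressed diff."""
    raw = zlib.decompress(diff)
    return bytes(a ^ b for a, b in zip(checkpoint, raw))


if __name__ == "__main__":
    old = bytes(4096)          # one page of zeros as the previous chkpnt
    new = bytearray(old)
    new[10] = 7                # only a few bytes of the page change
    new[3000] = 1
    diff = make_diff(old, bytes(new))
    print(len(diff) < len(old))
    print(apply_diff(old, diff) == bytes(new))
```

Because XOR is its own inverse, the same operation both produces the diff and applies it.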

13
Application Processor Chkpnt (review)
  Simple Diskless Chkpnt: extra M in memory
  Incremental Diskless Chkpnt: extra I in memory
  Forked Diskless Chkpnt: extra 2I in memory, less CPU activity
  Optimizations: chkpnt into chunks, diffs, and compressed diffs

14
Encoding the chkpnts
  Goal
    Extra chkpnt processors should store enough information that the chkpnts of failed processors may be reconstructed.
  Notation
    Number of chkpnt processors (m)
    Number of application processors (n)

15
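Parity (RAID level 5, m=1)
  To chkpnt: the chkpnt processor stores the bitwise XOR (parity) of the application processors’ chkpnts
  On failure of the i-th proc: XOR the parity with the surviving chkpnts to rebuild the lost chkpnt
  Can tolerate: only one processor failure
  Remarks: the chkpnt processor is a bottleneck of communication and computation
  Figure: example with n=4, m=1 (application processors, chkpnt processor; legend: j-th byte of application processor i)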
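
A parity encoding with n=4 and m=1 can be sketched as below. This is a minimal illustration rather than the presenters' implementation; `xor_bytes`, `parity`, and `recover` are hypothetical names, and all chkpnts are assumed to have the same length.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(chkpnts):
    """What the single chkpnt processor stores: the XOR of all chkpnts."""
    return reduce(xor_bytes, chkpnts)


def recover(survivors, parity_chkpnt):
    """Rebuild the one lost chkpnt from the parity and the surviving chkpnts."""
    return reduce(xor_bytes, survivors, parity_chkpnt)


if __name__ == "__main__":
    chkpnts = [b"\x01\x02", b"\x10\x20", b"\xff\x00", b"\x0f\xf0"]  # n = 4
    p = parity(chkpnts)                     # held by the chkpnt processor
    survivors = chkpnts[:2] + chkpnts[3:]   # application processor 3 failed
    print(recover(survivors, p) == chkpnts[2])
```

If the chkpnt processor itself fails, nothing needs to be rebuilt except the parity, which is recomputed from the n application chkpnts.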

16
Mirroring (m=n)
  To chkpnt: each chkpnt processor stores a copy of one application processor’s chkpnt
  On failure of the i-th proc: copy the chkpnt back from its chkpnt processor
  Can tolerate: up to n processor failures, except the failure of both an application processor and its chkpnt processor
  Remarks: fast, no calculation needed
  Figure: example with n=m=4 (application processors, chkpnt processors; legend: j-th byte of application processor i)

17
1-Dimensional Parity (1

18
2-Dimensional Parity
  To chkpnt:
    Application processors are arranged logically in a two-dimensional grid
    Each chkpnt processor calculates the parity of a row or a column
  On failure of the i-th proc: same as in Parity encoding
  Can tolerate: any two-processor failures
  Remarks: multicast
  Figure: example with n=4, m=4 (application processors, chkpnt processors; legend: j-th byte of application processor i)
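
The row and column parities can be sketched for a 2x2 grid (n=4, m=4). This is a simplified illustration: it only repairs failed application processors and assumes the parity (chkpnt) processors survive, which is narrower than the "any two failures" guarantee on the slide. The names `encode_2d` and `recover_2d` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_2d(grid):
    """Return (row parities, column parities) for a grid of chkpnts."""
    row_par = [reduce(xor_bytes, row) for row in grid]
    col_par = [reduce(xor_bytes, col) for col in zip(*grid)]
    return row_par, col_par


def recover_2d(grid, row_par, col_par):
    """Fill in lost chkpnts (None) using any row or column missing exactly one."""
    grid = [list(row) for row in grid]
    progress = True
    while progress:
        progress = False
        for r, row in enumerate(grid):
            missing = [c for c, v in enumerate(row) if v is None]
            if len(missing) == 1:
                others = [v for v in row if v is not None]
                row[missing[0]] = reduce(xor_bytes, others, row_par[r])
                progress = True
        for c in range(len(grid[0])):
            column = [grid[r][c] for r in range(len(grid))]
            missing = [r for r, v in enumerate(column) if v is None]
            if len(missing) == 1:
                others = [v for v in column if v is not None]
                grid[missing[0]][c] = reduce(xor_bytes, others, col_par[c])
                progress = True
    return grid


if __name__ == "__main__":
    grid = [[b"\x01", b"\x02"], [b"\x04", b"\x08"]]
    row_par, col_par = encode_2d(grid)
    damaged = [[None, None], [b"\x04", b"\x08"]]  # two failures in one row
    print(recover_2d(damaged, row_par, col_par) == grid)
```

Two failures in the same row cannot be repaired from the row parity alone, but each sits in a different column, so the column parities recover them.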

19
Reed-Solomon Coding (m)
  To chkpnt:
    Vandermonde matrix F, s.t. f(i,j) = j^(i-1)
    Use matrix-vector multiplication to calculate the chkpnt encodings
  To recover: use Gaussian elimination
  Can tolerate: any m failures
  Remarks:
    Use Galois fields to perform the arithmetic
    Computation overhead
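
The encode and recover steps can be sketched as follows, with loud assumptions. The slide calls for Galois-field arithmetic; this sketch substitutes the prime field of integers modulo 257 because it needs no lookup tables, whereas practical codes typically work in GF(2^8). It is only exercised with m=2, and it does not check whether a plain Vandermonde matrix tolerates every failure pattern for larger m. All function names are hypothetical, and `pow(x, -1, P)` requires Python 3.8 or later.

```python
P = 257  # prime modulus; an assumption standing in for a Galois field


def vandermonde(m, n):
    """F with f(i, j) = j^(i-1) for i = 1..m, j = 1..n, reduced mod P."""
    return [[pow(j, i - 1, P) for j in range(1, n + 1)] for i in range(1, m + 1)]


def encode(data, m):
    """Matrix-vector product F * data: one encoded word per chkpnt processor."""
    F = vandermonde(m, len(data))
    return [sum(f * d for f, d in zip(row, data)) % P for row in F]


def solve_mod_p(A, b):
    """Gauss-Jordan elimination over the integers mod P; A must be invertible."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] % P != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, P)  # modular inverse of the pivot
        M[col] = [(x * inv) % P for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                factor = M[r][col]
                M[r] = [(x - factor * y) % P for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def recover(data, checks):
    """Rebuild all data words; lost data or encoded words are given as None."""
    n, m = len(data), len(checks)
    F = vandermonde(m, n)
    rows, rhs = [], []
    for j, d in enumerate(data):          # identity rows for surviving data
        if d is not None:
            rows.append([1 if k == j else 0 for k in range(n)])
            rhs.append(d)
    for i, c in enumerate(checks):        # rows of F for surviving encodings
        if c is not None and len(rows) < n:
            rows.append(F[i])
            rhs.append(c)
    if len(rows) < n:
        raise ValueError("too many failures to recover")
    return solve_mod_p(rows, rhs)


if __name__ == "__main__":
    data = [10, 200, 33, 7]               # one word from each of n = 4 procs
    checks = encode(data, m=2)
    print(recover([None, 200, None, 7], checks) == data)
```

With m=2 the first row of F is all ones, so the first encoded word is an ordinary sum, the modular counterpart of the parity encoding.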

20
Optimizations
  Sending and calculating the encoding in RAID level 5-based encodings (e.g. Parity)
    (a) DIRECT: C1 is the bottleneck
    (b) FAN-IN: log(n) steps
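
The difference between DIRECT and FAN-IN can be illustrated by counting communication rounds. This is a sequential simulation under assumptions, not a networked implementation: each list element stands for one processor's chkpnt, one loop iteration stands for one round of pairwise exchanges, and `direct_parity` and `fan_in_parity` are hypothetical names.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def direct_parity(chkpnts):
    """DIRECT: every chkpnt is sent to one processor, which XORs them all."""
    return reduce(xor_bytes, chkpnts)


def fan_in_parity(chkpnts):
    """FAN-IN: combine chkpnts pairwise in a tree; returns (parity, rounds)."""
    level = list(chkpnts)
    rounds = 0
    while len(level) > 1:
        nxt = [xor_bytes(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an unpaired chkpnt moves up unchanged
            nxt.append(level[-1])
        level = nxt
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    chkpnts = [bytes([i, (i * 3) % 256]) for i in range(8)]  # n = 8
    p, rounds = fan_in_parity(chkpnts)
    print(rounds)
    print(p == direct_parity(chkpnts))
```

Both routes produce the same parity; FAN-IN spreads the XOR work and the incoming traffic across processors instead of concentrating them on the chkpnt processor.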

21
Encoding the Chkpnts (review)
  Parity (RAID level 5, m=1): only one failure, bottleneck
  Mirroring (m=n): up to n failures (unless both app and chkpnt fail), fast
  1-Dimensional Parity: one failure per group, more efficient than Parity
  2-Dimensional Parity: any two failures, comm overhead w/o multicast
  Reed-Solomon Coding: any m failures, computation overhead
  DIRECT vs. FAN-IN

22
Testing Applications (1)
  CPU-intensive parallel programs
    Instances that took 1.5~2 hrs on 16 processors
  NBODY: N-body interactions among particles in a system
    Particles are partitioned among processors
    Location field of each particle is updated
    Expectation: poor with incremental chkpnt, good with diff-based compression
  MAT: FP matrix product of two square matrices (Cannon’s alg.)
    All three matrices are partitioned in square blocks among processors
    In each step, each processor adds the product and passes on the input submatrices
    Expectation: incremental chkpnt; very poor with diff-based compression

23
Testing Applications (2)
  PSTSWM: nonlinear shallow water equations on a rotating sphere
    The majority of pages are modified, but only a few bytes per page
    Expectation: poor with incremental chkpnt, good with diff-based compression
  CELL: parallel cellular automaton simulation program
    Two (sparse) grids of cellular automata (current/next)
    Expectation: poor with incremental chkpnt, good with compression
  PCG: Ax=b for a large, sparse matrix
    First, converted to a small, dense format
    Expectation: incremental chkpnt; very poor with diff-based compression

24
Diskless Checkpointing 20 Nov 2001

25
Disk-based vs. Diskless Chkpnt
                     Disk-based                    Diskless
  Where to chkpnt?   In stable storage             In local memory
  How to recover?    Restore from stable storage   Re-calculate
  Remarks            Can tolerate whole failure    Cannot tolerate whole failure
                     Low BW to stable storage      Memory is much faster
                                                   Encoding (+communication) overhead

26
Recalculate the lost chkpnt?
  Error Detection & Correction in Digital Communication
    1-bit Parity (m=1)
      11001011[1] (right)
      11000011[1] (detectable)
      11001011[0] (detectable)
      11000011[0] (oops)
    Mirroring (m=n)
      11001011[11001011] (right)
      11001011[11001010] (detectable)
      11001011[00111100] (detectable)
      11001010[11001010] (oops)
  Chkpnt Recovery in Diskless Chkpnt
    1-bit Parity (m=1)
      11001011[1] (chkpnt)
      1100X011[1] (tolerable)
      11001011[X] (tolerable)
      1100X011[X] (intolerable)
    Mirroring (m=n)
      11001011[11001011] (right)
      11001011[1100101X] (tolerable)
      11001011[XXXXXXXX] (tolerable)
      1100101X[1100101X] (intolerable)
  Remarks
    Difference: in a chkpnt system we can easily tell which node is wrong.
    Some codings can be used to recover from errors in digital communication, too (e.g. Reed-Solomon).
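
The contrast drawn on this slide, between detecting an error at an unknown position and recovering an erasure at a known position, can be reproduced with the slide's own bit pattern 11001011[1]. This is a small sketch; `parity_bit`, `detect_error`, and `recover_erasure` are hypothetical names, and `None` plays the role of the X on the slide.

```python
def parity_bit(bits):
    """Even-parity bit: 1 if the number of ones is odd, else 0."""
    return sum(bits) % 2


def detect_error(bits, p):
    """Digital-communication view: True if the word disagrees with its parity."""
    return parity_bit(bits) != p


def recover_erasure(bits, p):
    """Chkpnt view: exactly one bit is known to be lost (None); rebuild it."""
    idx = bits.index(None)
    known = sum(b for b in bits if b is not None)
    repaired = list(bits)
    repaired[idx] = (p - known) % 2
    return repaired


if __name__ == "__main__":
    word = [1, 1, 0, 0, 1, 0, 1, 1]                      # 11001011
    p = parity_bit(word)                                 # [1]
    print(detect_error([1, 1, 0, 0, 0, 0, 1, 1], p))     # one flipped bit
    print(detect_error([1, 1, 0, 0, 0, 0, 1, 1], 0))     # two flips: missed
    print(recover_erasure([1, 1, 0, 0, None, 0, 1, 1], p) == word)
```

With a single parity bit, communication errors can only be detected, while a failed node in a chkpnt system is a known position and can therefore be rebuilt.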

27
Performance Criteria
  Latency: time between chkpnt initiated and ready for recovery
  Overhead: increase in execution time with chkpnt
  Applications
    App      Description                                Pattern
    NBODY    N-body interactions                        Majority of pages, but only a few bytes per page are modified
    PSTSWM   Simulation of the states on 3-D system     Majority of pages, but only a few bytes per page are modified
    CELL     Parallel cellular automaton                Majority of pages, but only a few bytes per page are modified
    MAT      FP matrix multiplication (Cannon’s)        Only small parts are updated, but updated in their entirety
    PCG      PCG for sparse matrix                      Only small parts are updated, but updated in their entirety

28
Implementation
  BASE: no chkpnt
  DISK-FORK: disk-based chkpnt w/ fork()
  SIMP: simple diskless
  INC: incremental diskless
  FORK: forked diskless
  INC-FORK: incremental, forked diskless
  C-SIMP: w/ diff-based compression
  C-INC
  C-FORK
  C-INC-FORK

29
Experiment Framework
  Network of 24 Sun Sparc5 w/s connected to each other by a fast, switched Ethernet: ~5 MB/s
  Each w/s has
    96 MB of physical memory
    38 MB of local disk storage
  Disks with a bandwidth of 1.7 MB/s are connected via Ethernet, and NFS on Ethernet achieved a bandwidth of 0.13 MB/s
  Latency: time between chkpnt initiated and ready for recovery
  Overhead: increase in execution time with chkpnt

32
Discussion
  Latency: diskless has much lower latency than disk-based.
    Lowers the expected running time of the application in the presence of failures (has a small recovery time)
  Overhead: comparable…

33
Recommendations
  DISK-FORK:
    If chkpnts are small
    If the likelihood of wholesale system failures is high
  C-FORK: if many pages are modified, but only a few bytes per page
  INC-FORK: if the number of modified pages is not significant

34
Reference
  J. S. Plank, K. Li, and M. A. Puening. "Diskless checkpointing." IEEE Transactions on Parallel & Distributed Systems, 9(10):972-986, Oct. 1998.
