
UC Santa Cruz Reliability of MEMS-Based Storage Enclosures Bo Hong, Thomas J. E. Schwarz, S. J. * Scott A. Brandt, Darrell D. E. Long Storage Systems Research.




1 Reliability of MEMS-Based Storage Enclosures
Bo Hong, Thomas J. E. Schwarz, S. J.*, Scott A. Brandt, Darrell D. E. Long
Storage Systems Research Center, University of California, Santa Cruz
*Also Santa Clara University, Santa Clara, CA

2 MEMS Storage Technology
- Micro-Electro-Mechanical Systems (MEMS) storage
  - A promising alternative secondary storage technology
  - Hardware research: IBM, HP, CMU, Nanochip
  - Magnetic storage, but very different mechanics
[Figure: MEMS storage device; label: Spring]

3 MEMS Storage Technology
- MEMS-based storage vs. magnetic disk
  - Provides non-volatile storage, too
  - Delivers 10× faster access time (< 1 ms)
  - Delivers higher bandwidth (100 MB/s – 1 GB/s)
  - Small (the size of a penny)
  - Consumes 100× less power
  - Costs ~10 USD per device
  - Expected to be more reliable
  - Stores a limited amount of data per device (3–10 GB)
- A serious alternative to disk drives, in particular for mobile computing applications

4 Reliability Implication of MEMS-based Storage
- Storage systems built from MEMS-based storage …
  - Require more MEMS devices
    - At least 10 times the number of disks to meet capacity requirements
  - Require more connection components
- Reliability implication
  - More components, hence (?) lower reliability

5 MEMS Storage Enclosure
- Our proposal: MEMS enclosures
  - A device with dozens of MEMS
  - Single interface to the rest of the system
  - Might be serviceable, but service calls during the economic lifetime should be very rare
[Figure: MEMS enclosure; label: Interface]

6 MEMS Storage Enclosures
- Reliability is an issue: MTTF of 1–2 years without redundant data storage
- Uses RAID Level 5 technology with distributed sparing
  - Additional k spares
- Calls for service when necessary
  - i.e. when we run out of spares
- Organization and number of spares can
  - Decrease the data recovery time and thus improve reliability
  - Reduce human interference
    - No servicing errors
    - Reduced maintenance costs

7 MEMS Enclosure Reliability
- Measure MTBF for enclosures
  - Without replacing spares
  - With replacing spares (service calls)
    - Determine the number of failures that trigger a service call
    - Mandatory replacement: no redundancy left
    - Preventive replacement: no spare left

8 MEMS Enclosure Reliability without Replacement
[Figure: reliability curves; each curve is labeled with its configuration and MTTF]
Configuration   MTTF
No spare        2.3 yrs
1 spare         3.5 yrs
2 spares        4.6 yrs
3 spares        5.8 yrs
4 spares        6.9 yrs
5 spares        8.1 yrs
Disk            11.5 yrs
Disk            23 yrs
- MTTF DISK = 11.5 or 23 yrs
- MTTF MEMS = 23 yrs
- 19 data + 1 parity + k dedicated spares
- 15-minute data recovery
- MTTF is not enough to measure the reliability of enclosures without repairs
- Instead: focus on data reliability during the economic lifetime (3–5 years) of enclosures
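The enclosure MTTF labels on slide 8 can be approximated with a back-of-the-envelope calculation. The sketch below is not the authors' model. It assumes that idle dedicated spares do not fail, that the 15-minute recovery window is negligible, and that device lifetimes are exponentially distributed. The function names are illustrative.

```python
import math

def mttf_without_replacement(mttf_device_years, active_devices, spares):
    """Approximate enclosure MTTF when failed devices are never replaced.

    Assumption (not from the slides): idle spares never fail and
    recovery is instantaneous, so the array always runs on
    `active_devices` devices until the spares and parity are used up.
    """
    # The first (spares + 1) failures each hit a fully populated array:
    # the spares absorb `spares` of them and parity absorbs one more.
    tolerated = (spares + 1) * mttf_device_years / active_devices
    # The array then runs degraded on one device fewer until the
    # next failure, which causes data loss.
    fatal = mttf_device_years / (active_devices - 1)
    return tolerated + fatal

def truncate_to_one_decimal(x):
    """Drop everything after the first decimal place."""
    return math.floor(x * 10) / 10

# 19 data + 1 parity = 20 active devices, MTTF MEMS = 23 years.
for k in range(6):
    estimate = mttf_without_replacement(23.0, 20, k)
    print(f"{k} spares: {truncate_to_one_decimal(estimate)} yrs")
```

Under these assumptions, and truncated to one decimal place, the estimates for 0 to 5 spares agree with the 2.3, 3.5, 4.6, 5.8, 6.9, and 8.1 year labels on the slide.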

9 MEMS Enclosures with Replacement
- Markov model for a MEMS enclosure with N data devices, one parity device, and one dedicated spare device
  - States: N – Normal; D – Degraded; DL – Data Loss
  - 1/λ – MTTF MEMS (in tens of years)
  - 1/µ – Mean Time Between Recovery (in minutes)
  - Mean Time Between Replacement (in days or weeks): the inverse of the replacement rate
- Preventive and mandatory replacement
[Figure: Markov state diagrams for preventive replacement and mandatory replacement]
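To show how such a Markov model yields a reliability figure, here is a minimal sketch of a deliberately simplified chain. It keeps only the Normal, Degraded, and Data Loss states and omits the spare and the replacement transitions, so it is not the model on the slide. The function name and the parameter values in the usage lines are assumptions chosen for illustration.

```python
def mean_time_to_data_loss(n_data, failure_rate, recovery_rate):
    """Expected time from Normal to Data Loss in a 3-state chain.

    Simplified model (an assumption, not the slide's full model):
      Normal   -> Degraded   at rate (n_data + 1) * failure_rate
      Degraded -> Normal     at rate recovery_rate
      Degraded -> Data Loss  at rate n_data * failure_rate
    """
    to_degraded = (n_data + 1) * failure_rate
    to_loss = n_data * failure_rate
    leave_degraded = recovery_rate + to_loss
    # First-step analysis:
    #   T_N = 1/to_degraded + T_D
    #   T_D = 1/leave_degraded + (recovery_rate/leave_degraded) * T_N
    # Substituting T_D into the first equation and solving for T_N:
    return (1 / to_degraded + 1 / leave_degraded) / (to_loss / leave_degraded)

# Illustrative values: 19 data devices, device MTTF of 23 years,
# 15-minute recovery (rates expressed per year).
failure_rate = 1 / 23.0
recovery_rate = 365 * 24 * 4
print(mean_time_to_data_loss(19, failure_rate, recovery_rate))
```

The result equals the familiar closed form (µ + (2N + 1)λ) / (N(N + 1)λ²) for a single-parity array with repair. Adding spare and replacement states, as the slide does, extends the same first-step analysis to a larger chain.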

10 MEMS Enclosure Reliability with Replacement
- Preventive replacement increases reliability and reduces replacement urgency
[Figure: reliability curves; legend: No spare; Preventive + mandatory; Mandatory; curve labels 1, 2, 3: number of spares]

11 MEMS Enclosure Reliability
- Dedicated sparing
  - Rebuild all data from a failed MEMS on a single spare MEMS
- Distributed sparing
  - Every device contains
    - Client data
    - Parity data
    - Spare space

12 Distributed Sparing [Menon and Mattson 1992]
[Figure: data layout before failure and after MEMS 4 fails (failed device marked X)]
- Shorter data recovery time
- More devices can fail

13 Reliability Comparison: Dedicated Sparing vs. Distributed Sparing
[Figure: dedicated sparing; legend: No spare; Preventive + mandatory; Mandatory; curve labels 1, 2: number of spares]
Compare with the following slide

14 Reliability Comparison: Dedicated Sparing vs. Distributed Sparing
- Distributed sparing is only better at short replacement times when using preventive replacement
[Figure: legend: No spare; Dedicated & Distributed; Dedicated; Distributed; Preventive + mandatory; Mandatory; curve labels 1, 2: number of spares]

15 Durability of MEMS Storage Enclosures
- All about economy
  - How long can MEMS enclosures work without repairs?
  - How often do they need repairing in the first 3–5 years?
  - How do replacement policies affect maintenance frequency?
- Number of failures an enclosure with k spares can tolerate before the (m+1)th repair is scheduled (m >= 0):
  - (m + 1) × k, under the preventive replacement policy
  - (m + 1) × (k + 1), under the mandatory replacement policy
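The two thresholds on slide 15 can be written as a small helper. This is a direct transcription of the slide's formulas; the function name and the policy strings are illustrative choices.

```python
def failures_before_repair(m, k, policy):
    """Failures tolerated before the (m+1)-th repair is scheduled.

    m: number of repairs already performed (m >= 0)
    k: number of spares in the enclosure
    policy: "preventive" (repair when no spare is left) or
            "mandatory" (repair when no redundancy is left)
    """
    if m < 0 or k < 0:
        raise ValueError("m and k must be non-negative")
    if policy == "preventive":
        return (m + 1) * k
    if policy == "mandatory":
        return (m + 1) * (k + 1)
    raise ValueError(f"unknown policy: {policy!r}")

# An enclosure with four spares, second repair (m = 1):
print(failures_before_repair(1, 4, "preventive"))  # 8
print(failures_before_repair(1, 4, "mandatory"))   # 10
```

The mandatory policy tolerates one extra failure per repair cycle because it also consumes the parity redundancy before calling for service.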

16 Durability of MEMS Storage Enclosures
[Figure: probabilities that a MEMS storage enclosure has up to k failures during (0, t]; curves: no failure, 1 failure, 2, 4, 6, 8, and 10 failures; reference curve: Disk 23 yrs]
- First-year survivability: 95.7% for a disk vs. 98.8% for MEMS enclosures with two spares
- Chance that a MEMS enclosure with four spares requires more than one service in five years: 3.5% (preventive) vs. 0.6% (mandatory)
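The first-year survivability figures on slide 16 are consistent with treating device failures as a Poisson process. The sketch below rests on assumptions not stated on the slide: 20 active devices (19 data + 1 parity) fail independently at a constant rate of 1/23 per year each, idle spares do not fail, and recovery time is ignored. The function name is illustrative.

```python
import math

def prob_at_most(failures, rate_per_year, years):
    """P(at most `failures` events in `years`) for a Poisson process."""
    mean = rate_per_year * years
    terms = (mean ** i / math.factorial(i) for i in range(failures + 1))
    return math.exp(-mean) * sum(terms)

MTTF_YEARS = 23.0

# A single disk survives the first year only if it has no failure.
disk = prob_at_most(0, 1 / MTTF_YEARS, 1.0)

# An enclosure with two spares survives up to three failures:
# two absorbed by the spares, one more by parity.
enclosure = prob_at_most(3, 20 / MTTF_YEARS, 1.0)

print(f"disk: {disk:.1%}, enclosure with two spares: {enclosure:.1%}")
```

Under these assumptions the two probabilities round to the 95.7% and 98.8% quoted on the slide. The five-year service-call figures depend on further modeling details and are not reproduced by this simple count.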

17 Related Work
- MEMS-based storage technology development
  - IBM, HP, CMU CHI²PS, Nanochip
- Digital Micromirror Devices by TI
  - Reported Mean Time Between Failure: 650,000 hours [Douglass]
- RAID reliability
  - Dedicated sparing [Dunphy et al.]
  - Distributed sparing [Menon and Mattson]
  - Parity sparing [Reddy and Banerjee]
- Disk failure prediction
  - S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology)

18 Summary
- Reliability of MEMS storage enclosures
  - Can be more reliable than disks even without failed device replacement
  - Highly reliable when using preventive replacement
  - Dedicated sparing and distributed sparing provide comparable or almost identical reliability
- Economy of MEMS storage enclosures
  - Preventive replacement trades more maintenance services for higher reliability

19 Thank You!
- Acknowledgements
  - Dave Nagle, Greg Ganger, CMU PDL
  - The rest of the UCSC SSRC
- More information:
- Questions?

20 Backup Slides

21 MEMS Storage Technology
- Micro-Electro-Mechanical Systems (MEMS) storage
  - A promising alternative secondary storage technology
  - Hardware research: IBM, HP, CMU, Nanochip
- Radical differences between MEMS storage and magnetic disk technologies
                      Disk           MEMS
Recording media       Magnetic       Magnetic or physical (non-volatile)
Recording technique   Longitudinal   Orthogonal (higher density)
R/W head              Single         Thousands: tip array (higher bandwidth and parallelism)
Media movement        Rotation       Media sled moves in X and Y independently (no rotation delay)

22 MEMS Storage Device Characteristics
- Physical size: 1–2 cm²
- Recording density: 250–750 Gb/in²
[Figure: predicted performance in 2005; throughput (1–7 GB/s) vs. access latency (1 ns to 10 ms) for DRAM, MEMS, and disk]
Technology   Capacity      Cost
DRAM         0.5–2 GB      $100–$200/GB
MEMS         3–10 GB       $5–$50/GB
Disk         100–500 GB    $1–$2/GB

23 MEMS Storage Device
[Figure: MEMS storage device; labels: Spring, X, Y]

24 Durability of MEMS Storage Enclosures
[Figure: probabilities that a MEMS storage enclosure has up to k failures during (0, t]; curves: no failure, 1 failure, 2, 4, 6, 8, and 10 failures; reference curve: Disk 23 yrs]

