
1 More on Disks and File Systems
CS-3013 & CS-502 Operating Systems

2 Additional Topics
– Mapping files to VM
– RAID: Redundant Array of Inexpensive Disks
– Stable Storage
– Log-Structured File Systems

3 Reading Assignment(s)
– RAID: Tanenbaum §5.4.1
– Stable Storage: Tanenbaum §5.4.5
– Log-Structured File System: Tanenbaum §6.3.8
These topics will be included on the exam next week regardless of whether we complete them this evening.

4 Mapping Files to VM
Instead of “reading” from disk into virtual memory, why not simply use the file itself as the swapping storage for certain VM pages?
– This is called memory mapping
– Page tables in the kernel point to the disk blocks of the file

5 Memory-Mapped Files
Memory-mapped file I/O allows file I/O to be treated as routine memory access by mapping a disk block to a page in memory.
– The file is initially read using demand paging: a page-sized portion of the file is read from the file system into a physical page
– Subsequent reads and writes to/from the file are treated as ordinary memory accesses
– Simplifies file access by letting the application simply access memory rather than forcing it to use read() and write() calls to the file system
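On POSIX systems this facility is exposed through the mmap() call. A minimal sketch (the file name "data.bin" is a placeholder and error handling is abbreviated):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_RDWR);   /* placeholder file name */
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0) { perror("open/fstat"); return 1; }

    /* Map the whole file; pages are faulted in on demand. */
    char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    p[0] = 'X';                      /* an ordinary store updates the file */
    msync(p, st.st_size, MS_SYNC);   /* force dirty pages back to disk */
    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```

Once the mapping is established, the assignment to p[0] is a plain memory store; the OS pages the file in and out underneath.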

6 Memory-Mapped Files (continued)
A tantalizingly attractive notion, but…
– Cannot use C/C++ pointers within the mapped data structure
– Corrupted data structures are more likely to persist in the file
– Doesn't really save anything in terms of:
  – Programming energy
  – Thought processes
  – Storage space & efficiency

7 Memory-Mapped Files (continued)
Nevertheless, the idea has its uses:
1. Simpler implementation of file operations
– read() and write() become memory-to-memory operations
– seek() is simply changing a pointer, etc.
– Called memory-mapped I/O
2. Shared virtual memory among processes

8 CS-3013 & CS-502, Summer 2006 More on File Systems8 Shared Virtual Memory

9 Shared Virtual Memory (continued)
Supported in:
– Windows XP
– Apollo DOMAIN
– Linux??
Synchronization is the responsibility of the sharing applications
– The OS retains no knowledge of it
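To illustrate the idea in general (not any particular system listed above), a POSIX sketch in which a parent and a forked child share one page through an anonymous shared mapping; the wait() call stands in for the synchronization that, as the slide warns, the applications must supply themselves:

```c
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* An anonymous shared mapping is inherited across fork() and stays
       shared, so both processes see the same physical page. */
    int *counter = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (counter == MAP_FAILED) return 1;
    *counter = 0;

    if (fork() == 0) {      /* child */
        *counter = 42;      /* plain store, no read()/write() needed */
        _exit(0);
    }
    wait(NULL);             /* crude synchronization; the OS supplies none */
    printf("parent sees %d\n", *counter);   /* prints 42 */
    return 0;
}
```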

10 CS-3013 & CS-502, Summer 2006 More on File Systems10 Questions?

11 Problem
Question:
– If the mean time to failure of a disk drive is 100,000 hours,
– and your system has 100 identical disks,
– what is the mean time between drive replacements?
Answer:
– 1,000 hours (i.e., 41.67 days, about 6 weeks)
That is, you lose 1% of your data every 6 weeks! But don't worry – you can restore most of it from backup!

12 Can we do better?
Yes – mirroring:
– Write every block twice, on two separate disks
– Mean time between simultaneous failures of both disks is 57,000 years (assuming independent failures and prompt replacement of a failed disk)
Can we do even better?
– E.g., use fewer extra disks?
– E.g., get more performance?

13 RAID – Redundant Array of Inexpensive Disks
Distribute a file system intelligently across multiple disks to:
– Maintain high reliability and availability
– Enable fast recovery from failure
– Increase performance

14 “Levels” of RAID
– Level 0: non-redundant striping of blocks across disks
– Level 1: simple mirroring
– Level 2: striping of bytes or bits with ECC
– Level 3: Level 2 with parity, not ECC
– Level 4: Level 0 with a parity block
– Level 5: Level 4 with distributed parity blocks

15 RAID Level 0 – Simple Striping
Each stripe is one block or a group of contiguous blocks; block/group i is on disk (i mod n).
Advantage:
– Read/write n blocks in parallel; n times the bandwidth
Disadvantage:
– No redundancy at all: system MTBF is 1/n of the disk MTBF!
Layout (n = 4 disks):
– Disk 0: stripe 0, stripe 4, stripe 8
– Disk 1: stripe 1, stripe 5, stripe 9
– Disk 2: stripe 2, stripe 6, stripe 10
– Disk 3: stripe 3, stripe 7, stripe 11
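The placement rule is just modular arithmetic; a throwaway sketch of the mapping from logical block to (disk, offset), with n = 4 assumed:

```c
#include <stdio.h>

#define NDISKS 4   /* assumed n */

int main(void) {
    /* Logical block b lives at block (b / n) of disk (b mod n). */
    for (unsigned b = 0; b < 8; b++)
        printf("block %u -> disk %u, offset %u\n",
               b, b % NDISKS, b / NDISKS);
    return 0;
}
```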

16 RAID Level 1 – Striping and Mirroring
Each stripe is written twice, to two separate, identical disks: with 2n disks, block/group i is on disk (i mod n) and on its mirror, disk n + (i mod n).
Advantages:
– Read/write n blocks in parallel; n times the bandwidth
– Redundancy: system MTBF = (disk MTBF)², at twice the cost
– A failed disk can be replaced by copying
Disadvantage:
– A lot of extra disks for much more reliability than we need
Layout (n = 4, so 8 disks):
– Disks 0–3: stripes laid out as in RAID 0
– Disks 4–7: identical copies of disks 0–3

17 RAID Levels 2 & 3
– Bit- or byte-level striping
– Requires synchronized disks: highly impractical
– Requires fancy electronics for the ECC calculations
– Not used; academic interest only
See Silberschatz, §12.7.3 (pp. 471-472)

18 Observation
When a disk or stripe is read incorrectly, we know which one failed!
Conclusion:
– A simple parity disk can provide very high reliability (unlike simple parity in memory, where we cannot tell which bit failed)

19 RAID Level 4 – Parity Disk
parity 0-3 = stripe 0 xor stripe 1 xor stripe 2 xor stripe 3
n stripes plus parity are written/read in parallel. If any disk/stripe fails, it can be reconstructed from the others, e.g.:
– stripe 1 = stripe 0 xor stripe 2 xor stripe 3 xor parity 0-3
Advantages:
– n times the read bandwidth
– System MTBF = (disk MTBF)², at 1/n additional cost
– A failed disk can be reconstructed “on-the-fly” (hot swap)
– Hot expansion: simply add n + 1 disks, all initialized to zeros
However:
– Writing requires a read-modify-write of the parity stripe, so only 1x write bandwidth
Layout (n = 4 data disks plus a parity disk):
– Disk 0: stripe 0, stripe 4, stripe 8
– Disk 1: stripe 1, stripe 5, stripe 9
– Disk 2: stripe 2, stripe 6, stripe 10
– Disk 3: stripe 3, stripe 7, stripe 11
– Disk 4: parity 0-3, parity 4-7, parity 8-11
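The parity and reconstruction arithmetic is plain XOR. A toy sketch with in-memory arrays standing in for disks (sizes and names are invented for illustration, not a real RAID driver):

```c
#include <assert.h>
#include <string.h>

#define NDISKS 4
#define STRIPE 8   /* bytes per stripe; absurdly small, for illustration */

/* parity[j] = data[0][j] ^ data[1][j] ^ ... ^ data[n-1][j] */
static void compute_parity(unsigned char data[NDISKS][STRIPE],
                           unsigned char parity[STRIPE]) {
    memset(parity, 0, STRIPE);
    for (int d = 0; d < NDISKS; d++)
        for (int j = 0; j < STRIPE; j++)
            parity[j] ^= data[d][j];
}

/* Rebuild a lost stripe by XOR-ing the survivors with the parity. */
static void reconstruct(unsigned char data[NDISKS][STRIPE],
                        unsigned char parity[STRIPE], int lost) {
    memcpy(data[lost], parity, STRIPE);
    for (int d = 0; d < NDISKS; d++)
        if (d != lost)
            for (int j = 0; j < STRIPE; j++)
                data[lost][j] ^= data[d][j];
}

int main(void) {
    unsigned char data[NDISKS][STRIPE] = { "disk0", "disk1", "disk2", "disk3" };
    unsigned char parity[STRIPE], saved[STRIPE];
    compute_parity(data, parity);

    memcpy(saved, data[2], STRIPE);
    memset(data[2], 0, STRIPE);                    /* "lose" disk 2 */
    reconstruct(data, parity, 2);
    assert(memcmp(saved, data[2], STRIPE) == 0);   /* recovered exactly */
    return 0;
}
```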

20 RAID Level 5 – Distributed Parity
Parity calculation is the same as in RAID Level 4.
Advantages & disadvantages:
– Same as RAID Level 4
– Additional advantage: avoids beating up on the parity disk
Writing an individual stripe (RAID 4 & 5):
– Read the existing stripe and the existing parity
– Recompute the parity
– Write the new stripe and the new parity
Layout (parity rotating across 5 disks):
– Disk 0: stripe 0, stripe 4, stripe 8, stripe 12
– Disk 1: stripe 1, stripe 5, stripe 9, parity 12-15
– Disk 2: stripe 2, stripe 6, parity 8-11, stripe 13
– Disk 3: stripe 3, parity 4-7, stripe 10, stripe 14
– Disk 4: parity 0-3, stripe 7, stripe 11, stripe 15
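The read-modify-write relies on the XOR identity new parity = old parity xor old data xor new data, so a small write only touches the target disk and the parity disk rather than all n data disks. A sketch of that update step (function and parameter names are invented):

```c
#include <stddef.h>

/* RAID 4/5 small write: new parity = old parity ^ old data ^ new data,
   so only the target stripe and the parity stripe need to be read first. */
void update_parity(unsigned char *parity,
                   const unsigned char *old_data,
                   const unsigned char *new_data, size_t len) {
    for (size_t j = 0; j < len; j++)
        parity[j] ^= old_data[j] ^ new_data[j];
}
```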

21 RAID 4 & 5
Very popular in data centers
– Corporate and academic servers
Built-in support in Windows XP and other systems
– Connect a group of disks to a fast SCSI port (320 MB/sec bandwidth)
– OS RAID support does the rest!

22 New Topic
Problem: how to protect against disk write operations that don't complete
– Power or CPU failure in the middle of a block
– A related series of writes interrupted in the middle
Examples:
– Database update of charge and credit
– RAID 1, 4, 5 failure between redundant writes

23 Solution (part 1) – Stable Storage
Write everything twice (on separate disks)
– Be sure the 1st write does not invalidate the previous 2nd copy (RAID 1 is okay; RAID 4/5 are not!)
– Read blocks back to validate; then report completion
Reading both copies:
– If the 1st copy is okay, use it – i.e., the newest value
– If the 2nd copy differs, update it with the 1st copy
– If the 1st copy has an error, use the 2nd copy – i.e., the old value
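A sketch of the write protocol, with two in-memory arrays standing in for the two independent disks (all names are invented; a real implementation would flush through the device and re-read from the platter to validate):

```c
#include <stdbool.h>
#include <string.h>

#define BLOCK 512
/* Two "disks" modeled as in-memory arrays for the sketch. */
static unsigned char disk1[BLOCK], disk2[BLOCK];

static bool write_and_verify(unsigned char *disk, const unsigned char *data) {
    memcpy(disk, data, BLOCK);               /* the write */
    return memcmp(disk, data, BLOCK) == 0;   /* read back to validate */
}

/* The 2nd copy is touched only after the 1st is known good, so at every
   instant at least one intact copy of the block exists on some disk. */
bool stable_write(const unsigned char *data) {
    if (!write_and_verify(disk1, data)) return false;
    if (!write_and_verify(disk2, data)) return false;
    return true;                              /* report completion only now */
}
```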

24 Stable Storage (continued)
Crash recovery:
– Scan the disks, comparing corresponding blocks
– If one is bad, replace it with the good one
– If both are good but different, replace the 2nd with the 1st copy
Result:
– If the 1st block is good, it contains the latest value
– If not, the 2nd block still contains the previous value
An abstraction of an atomic disk write of a single block – uninterruptible by power failure, etc.
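And the matching recovery sweep, continuing the toy model above; block_is_bad() is a stub for the per-block error detection a real drive provides:

```c
#include <stdbool.h>
#include <string.h>

#define BLOCK 512
static unsigned char disk1[BLOCK], disk2[BLOCK];   /* as in the sketch above */

/* Stub: a real drive flags unreadable blocks via its own per-block ECC;
   this toy model pretends every block reads back cleanly. */
static bool block_is_bad(const unsigned char *disk) {
    (void)disk;
    return false;
}

/* One step of the recovery scan, over one pair of corresponding blocks. */
void recover_block(void) {
    if (block_is_bad(disk1))
        memcpy(disk1, disk2, BLOCK);   /* 1st unreadable: keep the old value */
    else if (block_is_bad(disk2) || memcmp(disk1, disk2, BLOCK) != 0)
        memcpy(disk2, disk1, BLOCK);   /* 1st copy wins: it is the newest */
}
```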

25 What about more complex disk operations?
E.g., a file create operation involves:
– Allocating free blocks
– Constructing and writing the i-node (possibly multiple i-node blocks)
– Reading and updating the directory
What if the system crashes with the sequence only partly completed?
Answer: inconsistent data structures on disk.

26 Solution (Part 2) – Log-Structured File System
– Make changes to cached copies in memory
– Collect together all changed blocks
– Write them to a log file: a circular buffer on disk, giving a fast, contiguous write
– Update the log-file pointer in stable storage
– Offline: play back the log file to actually update directories, i-nodes, the free list, etc.
– Update the playback pointer in stable storage
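A cartoon of the scheme with invented structures; in a real system the log would be an on-disk file and the head and playback pointers would live in stable storage rather than plain variables:

```c
#include <string.h>

#define LOG_SLOTS 1024
#define BLOCK 512

struct log_rec { unsigned long block_no; unsigned char data[BLOCK]; };
static struct log_rec logbuf[LOG_SLOTS];  /* the circular log "file"          */
static unsigned long head;                /* next free slot (stable storage)   */
static unsigned long played;              /* playback pointer (stable storage) */

/* Append one changed block to the log: a fast, contiguous write. */
void log_append(unsigned long block_no, const unsigned char *data) {
    struct log_rec *r = &logbuf[head % LOG_SLOTS];
    r->block_no = block_no;
    memcpy(r->data, data, BLOCK);
    head++;                               /* commit point */
}

/* Offline playback: install logged blocks at their home locations. */
void log_playback(void (*install)(unsigned long, const unsigned char *)) {
    for (; played < head; played++) {
        struct log_rec *r = &logbuf[played % LOG_SLOTS];
        install(r->block_no, r->data);
    }
}
```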

27 Transaction Database Systems
Similar techniques:
– Every transaction is recorded in the log before being recorded on disk
– Stable-storage techniques for managing the log pointers
– Once a transaction's log entry is confirmed, the disk can be updated in place
– After a crash, replay the log to redo the disk operations

28 Unix LFS
Tanenbaum, §6.3.8, pp. 428-430
– Everything is written to the log
– i-nodes point to the updated blocks in the log
– The in-memory i-node cache is updated whenever an i-node is written
– A cleaner daemon follows behind to compact the log
Advantages:
– LFS is always consistent
– LFS performance: much better than the traditional Unix FS for small writes, and at least as good for reads and large writes

