Deciding when to forget in the Elephant file system Douglas S. Santry Michael J. Feeley Norman C. Hutchinson Alistair C. Veitch Ross W. Carton Jacob Ofir.


1 Deciding when to forget in the Elephant file system
Douglas S. Santry, Michael J. Feeley, Norman C. Hutchinson, Alistair C. Veitch, Ross W. Carton, Jacob Ofir

2 Key Idea
Elephant automatically retains all important versions of user files
Elephant uses file-grain, user-specified retention policies to reclaim storage
Previous file versions are named by combining a traditional pathname with a time when the desired version of the file or directory existed

3 INTRODUCTION
Modern file systems associate
– Deletion of a file with the immediate release of storage
– File writes with the irrevocable change of file contents
Users control what is on disk by explicitly creating, updating and deleting files
This was the best solution when disk space was at a premium

4 The problem
The key problem with the current approach is that user actions have an immediate and irrevocable effect on disk storage
– Users are not protected against their own mistakes
This goes against the file system objective of protecting data against failure
We can do better today

5 Current solutions (I)
Cedar protected against accidental overwrites by saving the last few versions of a file
– Cedar files were immutable: each write created a new version of the file
– Does nothing for deleted files
Windows and Mac OS allow users to undelete recently deleted files
– Does nothing for files that were overwritten

6 Current solutions (II)
Many systems are regularly backed up
– Can restore the state of any file as of backup time
Many users maintain multiple versions of their critical data

7 Basic issues
Can maintain multiple versions of user files but not all versions of all files
– Need a retention policy
Should we involve the user in the retention/reclamation decisions?
Involving the user means
– Less protection from user mistakes
– A retention policy that might be better suited to the user's needs

8 Not all files are created equal
Read-only files (like application executables) have no version history
Derived files (like object files) can be easily reconstituted
Cached files require no version history
Temporary files might benefit from a short-term history but not from a long-term history
User-modified files would benefit most from both a long-term and a short-term history

9 The two objectives
Providing users with the ability to undo recent changes
– Keep the complete history of a file over a short period of time (one hour to one week)
Maintaining a long-term history of important versions of each file
– Keep landmark versions of each file forever

10 Finding the landmark versions
Could rely on the user
– The user's ability to recognize landmark versions of a file degrades with the age of the versions
Elephant detects landmark versions by looking at the timeline of updates to the file
– Can identify groups of updates separated by long periods of stability
– The last version of each group of updates is assumed to be a landmark version
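
The detection rule on this slide can be sketched in a few lines of Python. This is only an illustration of the idea as stated here, not Elephant's actual heuristic: the function name `find_landmarks`, the `stability_gap` threshold, and the choice to treat the newest version as the end of the last group are all assumptions.

```python
from typing import List


def find_landmarks(timestamps: List[float], stability_gap: float) -> List[float]:
    """Return the last version of each burst of updates.

    A burst ends when the next update arrives at least `stability_gap`
    time units later (hypothetical threshold, not taken from the slides).
    """
    ordered = sorted(timestamps)
    landmarks = []
    # Compare each version with the one that follows it.
    for current, following in zip(ordered, ordered[1:]):
        if following - current >= stability_gap:
            landmarks.append(current)
    # Assumption: the newest version closes the final group.
    if ordered:
        landmarks.append(ordered[-1])
    return landmarks
```

For example, with updates at times 0, 1, 2, 100, 101 and 300 and a gap of 50, the versions at 2, 101 and 300 would be picked as landmarks.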

11 User interface
File versions are
– Indexed by their creation time
– Named by combining the file pathname with a date and time
Versioning is extended to directories
– Allows recovery of deleted files
Previous versions of a file or a directory are read-only
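
Because versions are indexed by creation time, resolving a pathname plus a time amounts to finding the newest version created at or before that time. The sketch below shows one way to do that with a binary search; `version_at` and its list-of-timestamps representation are hypothetical, and deletion records are not modeled.

```python
import bisect
from typing import List, Optional


def version_at(creation_times: List[float], when: float) -> Optional[int]:
    """Return the index of the version that existed at time `when`.

    `creation_times` must be sorted in ascending order. Returns None
    if the file had no version yet at that time.
    """
    # bisect_right gives the number of versions created at or before `when`.
    idx = bisect.bisect_right(creation_times, when) - 1
    return idx if idx >= 0 else None
```

With versions created at times 10, 20 and 30, asking for time 25 selects the second version (index 1), and asking for time 5 finds nothing.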

12 Retention policies (I)
Keep One: only keeps the latest version of the file
Keep All: keeps all versions of the file
Keep Safe: keeps all versions of the file during a specific second-chance interval
Keep Landmarks: keeps all versions of the file during a specific second-chance interval and only landmark versions after that
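
The four policies can be compared side by side with a small Python sketch. It is a simplification under stated assumptions: version age is measured from creation time, the current version is always kept, and the set of landmark versions is supplied by the caller. The names `Policy` and `retained_versions` are illustrative and do not come from Elephant.

```python
from enum import Enum
from typing import FrozenSet, Iterable, List


class Policy(Enum):
    KEEP_ONE = 1
    KEEP_ALL = 2
    KEEP_SAFE = 3
    KEEP_LANDMARKS = 4


def retained_versions(
    times: Iterable[float],
    policy: Policy,
    now: float,
    second_chance: float = 0.0,
    landmarks: FrozenSet[float] = frozenset(),
) -> List[float]:
    """Return the creation times of the versions a policy would keep."""
    ordered = sorted(times)
    if not ordered:
        return []
    latest = ordered[-1]
    if policy is Policy.KEEP_ALL:
        return ordered
    if policy is Policy.KEEP_ONE:
        return [latest]
    # Versions still inside the second-chance interval (assumed to be
    # measured from each version's creation time).
    keep = {t for t in ordered if now - t <= second_chance}
    keep.add(latest)  # assumption: the current version is never reclaimed
    if policy is Policy.KEEP_LANDMARKS:
        keep |= set(landmarks) & set(ordered)
    return sorted(keep)
```

With versions at times 0, 50, 90 and 100, the current time 100, a second-chance interval of 20 and the version at 0 marked as a landmark, Keep Safe retains the versions at 90 and 100, while Keep Landmarks also retains the one at 0.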

13 Retention policies (II)
Keep-Landmarks policy also allows the user to group files for consideration
– Important for inter-dependent files, as their consistency requires viewing all files as of the same point in time
– Grouping policy is quite flexible: the user can specify
  – Individual files
  – Entire directories or subtrees
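
One way to picture grouping is to merge the update timelines of all files in a group, pick landmark times on the merged timeline, and then view every file as of the same landmark time. The Python sketch below follows that reading; it is an interpretation of the slide, not Elephant's documented algorithm, and the function names, the `stability_gap` parameter and the file names in the example are all hypothetical.

```python
import bisect
from typing import Dict, List, Optional


def group_landmark_times(group: Dict[str, List[float]], stability_gap: float) -> List[float]:
    """Landmark times computed over the merged timeline of a file group."""
    merged = sorted(t for times in group.values() for t in times)
    # A burst ends where the next update is at least `stability_gap` away.
    cuts = [a for a, b in zip(merged, merged[1:]) if b - a >= stability_gap]
    # Assumption: the newest update closes the final burst.
    return cuts + merged[-1:]


def snapshot(group: Dict[str, List[float]], when: float) -> Dict[str, Optional[float]]:
    """For each file, the creation time of the version that existed at `when`."""
    result: Dict[str, Optional[float]] = {}
    for path, times in group.items():
        ordered = sorted(times)
        idx = bisect.bisect_right(ordered, when) - 1
        result[path] = ordered[idx] if idx >= 0 else None
    return result
```

For a group holding `a.c` (updated at 0, 5 and 200) and `a.h` (updated at 3 and 210) with a gap of 100, the landmark times are 5 and 210, and the snapshot at time 5 pairs version 5 of `a.c` with version 3 of `a.h`.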

14 Implementation (I)
I-nodes of non-versioned files are stored in a special i-node file
I-nodes of versioned files are stored in an i-node log
– Versions are stored as an ordered sequence of i-nodes
– Changes are detected at the block level
– Versions of the same file share identical blocks
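
The i-node log and block sharing described above can be modeled in memory as follows. This is a toy sketch, assuming a dictionary as the block store and comparing block contents to detect changes; the classes `Inode` and `VersionedFile` and the method `commit` are hypothetical and say nothing about Elephant's on-disk layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Inode:
    created: float
    blocks: Tuple[int, ...]  # ids of the data blocks making up this version


@dataclass
class VersionedFile:
    store: Dict[int, bytes] = field(default_factory=dict)  # block id -> data
    log: List[Inode] = field(default_factory=list)         # ordered i-node log
    _next_id: int = 0

    def _alloc(self, data: bytes) -> int:
        block_id = self._next_id
        self._next_id += 1
        self.store[block_id] = data
        return block_id

    def commit(self, created: float, blocks: List[bytes]) -> Inode:
        """Append a new version, reusing blocks that did not change."""
        previous = self.log[-1].blocks if self.log else ()
        ids = []
        for i, data in enumerate(blocks):
            if i < len(previous) and self.store[previous[i]] == data:
                ids.append(previous[i])        # unchanged block is shared
            else:
                ids.append(self._alloc(data))  # changed or new block
        inode = Inode(created, tuple(ids))
        self.log.append(inode)
        return inode
```

Committing `[b"a", b"b"]` and then `[b"a", b"c"]` produces two i-nodes in the log that point to the same first block, so only three blocks are stored in total.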

15 Implementation (II)
Elephant uses a different mechanism for versioned directories
– We did not discuss it in class

16 Performance
Somewhat slower than conventional file systems
Using HP-UX traces collected at HP Labs, one can estimate that Keep-Landmarks files would account for 62.4% of files but only 15.2% of the disk space

