Andrew Hanushevsky, Stanford Linear Accelerator Center, 7-Feb-2000


1 Disk Cache Management In Large-Scale Object Oriented Databases
Andrew Hanushevsky, Stanford Linear Accelerator Center, 7-Feb-2000
Produced under contract DE-AC03-76SF00515 between Stanford University and the Department of Energy

2 Motivation
- Problem
  - More data (> 2 PB) than affordable disk space (< 300 TB)
- Realization
  - Only about 10% of the data is used at any one time
- Solution
  - Hierarchical Mass Storage System
    - Most data on tape (cheap), in-use data on disk (expensive)
- Problem (it's all circular)
  - Effectively manage the disk cache to keep the most useful data
  - Disk cache performance

3 Basic Disk Caching Architecture
[Diagram labels: Control, Data, Database Management, Cache Management]

4 The Direct Solution: One Big Filesystem
- Volume Manager + Journaled File System (e.g., Veritas)
  - Catenates disk devices to form very large capacity logical devices
  - High performance (60+ MB/sec) journaled file system for fast recovery
  - Allows for fast streaming I/O and efficient small block transfers
- Problems
  - Low random access performance
  - Limited to 1 TB of cache per filesystem in most implementations
  - Unpredictable load balancing

5 The Indirect Solution: Multiple Smaller Filesystems
- Still need a Volume Manager + Journaled File System
  - But can spread the load across multiple heads and I/O adapters
  - Virtually unlimited cache size
- Problems
  - Need to manage multiple filesystems
  - Need tools to balance the load
    - If not done automatically

6 Supporting Multiple Filesystems
[Diagram]
- Index area: /databases/mydbfile is a symlink to /cache1/databases:mydbfile
- Data area: multiple independent filesystems (/cache1, /cache2, /cache3)
  - Any number, any size
  - Chosen based on free space in LRU order
- Other diagram labels: default data area, optional data cache
- Naming convention allows for audit and index recovery
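The naming convention in the diagram can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: it assumes the data file name is simply the logical path with "/" replaced by ":" (as the single example /databases/mydbfile and /cache1/databases:mydbfile suggests), and the function names data_name, logical_name, and link_into_index are hypothetical.

```python
import os


def data_name(logical_path: str) -> str:
    """Flatten '/databases/mydbfile' to 'databases:mydbfile' (assumed convention)."""
    return logical_path.strip("/").replace("/", ":")


def logical_name(data_file: str) -> str:
    """Inverse mapping: 'databases:mydbfile' -> '/databases/mydbfile'.

    Because the logical name is recoverable from the data file name alone,
    a lost index entry can be rebuilt by scanning the cache filesystems.
    """
    return "/" + data_file.replace(":", "/")


def link_into_index(index_root: str, cache_fs: str, logical_path: str) -> str:
    """Create the index symlink that points at the data file in cache_fs."""
    target = os.path.join(cache_fs, data_name(logical_path))
    link = os.path.join(index_root, logical_path.strip("/"))
    os.makedirs(os.path.dirname(link), exist_ok=True)
    os.symlink(target, link)
    return link
```

Under this assumed convention a logical name that itself contains ":" would be ambiguous; the slides do not say how such names are handled.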

7 Staging Manager
- Copies files into the cache
  - Uses index space to link wanted name to actual file location
  - Uses allocation manager to select target filesystem
  - Uses lock manager to serialize access to target files & directories
  - Uses resource manager to control tape drive usage

8 File Placement (i.e., filesystem selection)
- Round-robin allocation
  - Good for spreading the load
- Maximum fit (fuzz == 0)
  - Filesystem with largest amount of free space
  - Good when size not known
- Maximal fit (0 < fuzz < 1)
  - Filesystem with largest amount of free space within a delta
  - Good when size unknown but want to keep round-robin allocation
- First fit (fuzz == 1)
  - First filesystem that can accommodate the file
  - Good when size known and want to spread the load
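One way to read the fuzz parameter is as a tolerance around the largest free space, applied during a round-robin scan. The sketch below assumes exactly that: a filesystem qualifies if it can hold the file and its free space is at least (1 - fuzz) times the maximum. The slides do not define the delta precisely, so this formula, the Filesystem class, and select_filesystem are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Filesystem:
    path: str
    free: int  # free space in bytes


def select_filesystem(filesystems, size, fuzz, start=0):
    """Pick a target filesystem for a file of `size` bytes.

    Assumed interpretation of fuzz:
      fuzz == 0     -> only the filesystem with the most free space qualifies
      0 < fuzz < 1  -> any filesystem within fuzz * max_free of the maximum
      fuzz == 1     -> any filesystem that can hold the file (first fit)
    The scan starts at index `start` to mimic round-robin rotation.
    """
    if not filesystems:
        return None
    max_free = max(fs.free for fs in filesystems)
    threshold = (1.0 - fuzz) * max_free
    n = len(filesystems)
    for i in range(n):
        fs = filesystems[(start + i) % n]
        if fs.free >= size and fs.free >= threshold:
            return fs
    return None  # nothing can accommodate the file


# Usage: three cache filesystems with different amounts of free space.
caches = [Filesystem("/cache1", 50), Filesystem("/cache2", 100), Filesystem("/cache3", 90)]
print(select_filesystem(caches, 10, fuzz=0.0).path)           # /cache2
print(select_filesystem(caches, 10, fuzz=0.2, start=2).path)  # /cache3
print(select_filesystem(caches, 10, fuzz=1.0).path)           # /cache1
```

With an intermediate fuzz, the rotation point decides among the near-maximal filesystems, which is how this reading keeps some round-robin behavior.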

9 Pre-Staging Manager
- Asks the staging manager to pre-fetch files
  - Allows user to transparently map objects to files
  - Avoids resource wait time (i.e., files available when job runs)
  - Notifies user synchronously or asynchronously when request completes
  - Uses client/server model of implementation for isolation

10 Migration Manager
- Copies modified files from cache to Mass Storage System
  - File must not have been changed for x seconds
    - Reduces chance of multiple migrations of same file prior to purge
  - Specific files can be migrated on a priority basis by request
    - Uses client/server model of implementation for isolation
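The "unchanged for x seconds" rule can be illustrated with a short selection routine. This is only a sketch: the CachedFile record and migration_candidates function are hypothetical, and it assumes that priority requests skip the quiet period and are served first, which the slides do not state.

```python
import time
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    mtime: float            # last modification time, seconds since the epoch
    modified: bool          # True if changed since the last copy to the MSS
    priority: bool = False  # True if migration was explicitly requested


def migration_candidates(files, quiet_seconds, now=None):
    """Return the files to migrate, priority requests first.

    A modified file qualifies once it has been unchanged for at least
    `quiet_seconds`. Letting priority requests bypass that wait is an
    assumption made for this sketch.
    """
    now = time.time() if now is None else now
    eligible = [
        f for f in files
        if f.modified and (f.priority or now - f.mtime >= quiet_seconds)
    ]
    # Priority files sort first; within each group, oldest change first.
    return sorted(eligible, key=lambda f: (not f.priority, f.mtime))
```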

11 Purge Manager
- Removes unused migrated files from the cache
  - Files purged in LRU order across all filesystems
    - File must not have been used for at least x seconds
  - Tries to maintain free space in each filesystem at a target amount
    - Purging starts when free space falls below a specified filesystem threshold
  - Targets are specific to a filesystem but may be the same for all
    - Either a space percentage or absolute value, and a global file count
  - Specific files can be purged on a priority basis by request
    - Uses client/server model of implementation for isolation
    - Implementation identical to migration priority queue
  - Files can also be pinned in the cache (i.e., not removable)
    - For a specific period of time
    - Until a certain date plus optional non-use time
    - Indefinitely
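The purge rules above can be sketched for a single filesystem as follows. This is a simplified illustration, not the actual purge manager: the Entry record and purge function are hypothetical, the start threshold and the stop target are treated as two separate absolute values, and pinning is reduced to a single expiry time.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    size: int                  # bytes freed if the file is purged
    atime: float               # last access time, seconds since the epoch
    migrated: bool             # True if a copy exists in the Mass Storage System
    pinned_until: float = 0.0  # float("inf") pins the file indefinitely


def purge(entries, free, threshold, target, idle_seconds, now):
    """Return (names purged, resulting free space) for one filesystem.

    Purging starts only when `free` is below `threshold` and stops once
    `free` reaches `target`. Files are visited least recently used first;
    unmigrated, pinned, or recently used files are skipped.
    """
    if free >= threshold:
        return [], free
    purged = []
    for e in sorted(entries, key=lambda e: e.atime):
        if free >= target:
            break
        if not e.migrated or e.pinned_until > now or now - e.atime < idle_seconds:
            continue
        purged.append(e.name)
        free += e.size
    return purged, free
```

The slides describe LRU ordering across all filesystems; extending this sketch to that case would mean merging the per-filesystem entry lists before sorting while tracking free space per filesystem.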

12 Cache Management Utilities
- ooss_Xeq provides a common management interface
  - Audit cache disks (data files must be pointed to from the name space)
    - Optional fix-up allowed
  - Audit name space (name space must point to actual data files)
    - Optional fix-up allowed
  - Copy a file into the cache
    - Arbitrary source
  - Create an empty file in the cache
  - Rename a file in the index
  - Relocate a file to another filesystem
  - Remove a file from the index and cache
    - Optional removal from the Mass Storage System as well
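The two audits are mirror-image consistency checks, which a small sketch can make concrete. The code below is not ooss_Xeq and does not reflect its interface; it models the name space as a plain dictionary and the cache disks as a set of paths, both assumptions made purely for illustration.

```python
def audit(index, data_files):
    """Cross-check the name space against the cache disks.

    index:      dict mapping logical name -> data file path (the symlink target)
    data_files: set of data file paths actually present on the cache disks

    Returns (dangling, orphans):
      dangling - logical names whose target is missing (name space audit)
      orphans  - data files that no index entry points to (cache disk audit)
    """
    dangling = sorted(name for name, target in index.items() if target not in data_files)
    orphans = sorted(data_files - set(index.values()))
    return dangling, orphans


# Usage: one healthy entry, one dangling link, one orphaned data file.
index = {
    "/databases/a": "/cache1/databases:a",
    "/databases/b": "/cache2/databases:b",
}
on_disk = {"/cache1/databases:a", "/cache3/databases:c"}
print(audit(index, on_disk))
# (['/databases/b'], ['/cache3/databases:c'])
```

An optional fix-up step could then remove dangling links or recreate index entries for orphans from their file names, depending on site policy.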

13 Components For Effective Disk Cache Management

14 Conclusion
- Effectively managing a large disk cache is complex
  - Performance
    - Multiple small (100 GB) caches
    - Allocation strategy
    - Relocation strategy
    - External resource management (e.g., MSS tape drives)
  - Fault tolerance
    - Multiple loosely connected components
    - Cache auditing and recovery
  - Usability
    - End-user interfaces for staging, migration, and purge
  - Administration
    - Extensive tools to safely manipulate cache contents

