Organizing files for performance Chapter 6

6.1 Data compression
Advantages of reduced file size
Redundancy reduction: state code example
Repeating sequences: run-length encoding
Variable-length codes
  – static (Morse code)
  – dynamic (Huffman code)
Irreversible compression (e.g., JPEG)
Unix routines (append .z to the names of compressed files)
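
Run-length encoding can be illustrated with a small sketch. This is a simplified variant that stores every run as a (value, count) pair; the function names are hypothetical, and a practical scheme would typically use a special run-code byte so that unrepeated bytes are stored as-is.

```python
def rle_encode(data):
    """Collapse each run of identical bytes into a (value, run_length) pair."""
    runs = []
    for b in data:
        if runs and runs[-1][0] == b:
            # Same byte as the current run: extend it.
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            # A different byte starts a new run.
            runs.append((b, 1))
    return runs


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in runs)
```

Decoding the encoded form returns the original bytes, so this kind of compression is reversible, unlike the irreversible schemes listed above.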

6.2 Reclaiming space
“Holes” arise when
  – variable-length records are updated
  – fixed- or variable-length records are deleted
Compaction (for deleted records)
  – mark deleted records
  – allows undelete to be implemented
  – periodically run a compaction program
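
A minimal sketch of mark-then-compact, assuming the file is modelled as an in-memory list of byte strings and that a deleted record is flagged by overwriting its first byte with "*". The marker and the function names are illustrative assumptions, not taken from the slides.

```python
DELETED_MARK = b"*"  # assumed deletion marker placed in the first byte


def mark_deleted(records, rrn):
    """Flag the record at position rrn as deleted without moving any data."""
    records[rrn] = DELETED_MARK + records[rrn][1:]


def compact(records):
    """Return a new record list with all marked records squeezed out."""
    return [r for r in records if not r.startswith(DELETED_MARK)]
```

Until the compaction pass runs, the marked record still occupies its slot, which is what leaves room for an undelete operation.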

6.2.2 Dynamic reclamation
Simple approach: search sequentially until space is found to insert a new record; drawback: very slow
Alternative: use a linked-list stack to allow immediate access to an empty slot, if one is available; the stack may be kept in the deleted record slots themselves, with the RRN of the top slot stored in the header record
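
The linked-list stack can be sketched as follows. This is an in-memory model under stated assumptions: the class name `FixedFile`, the "*" marker, and the use of -1 for an empty stack are all hypothetical choices made for illustration.

```python
class FixedFile:
    """Fixed-length record file whose deleted slots form a stack (avail list)."""

    def __init__(self):
        self.slots = []
        self.avail_head = -1  # header field: RRN of the top free slot, -1 if none

    def delete(self, rrn):
        # The freed slot stores the old top of the stack, then becomes the top.
        self.slots[rrn] = ("*", self.avail_head)
        self.avail_head = rrn

    def add(self, record):
        if self.avail_head == -1:
            # No free slot: append at the end of the file.
            self.slots.append(record)
            return len(self.slots) - 1
        # Pop the top free slot and reuse it; no searching is needed.
        rrn = self.avail_head
        self.avail_head = self.slots[rrn][1]
        self.slots[rrn] = record
        return rrn
```

Because the most recently deleted slot is always reused first, both `delete` and `add` take constant time regardless of file size.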

6.2.3 Variable-length records
The same scheme (linked-list stack) may be used, except that a byte offset rather than an RRN must be used as the link
Deleted records go on top of the stack, but the stack must be searched when adding records to find a space big enough to accommodate each new record
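
The search through the avail list might look like the sketch below. It assumes, purely for illustration, that the list is held as a dict mapping each hole's byte offset to a (size, next_offset) pair, with -1 ending the chain; the function name is hypothetical.

```python
def take_first_fit(holes, head, needed):
    """Unlink and return the first hole of at least `needed` bytes.

    Returns (offset, new_head); offset is -1 when no hole is large enough.
    """
    prev, cur = -1, head
    while cur != -1:
        size, nxt = holes[cur]
        if size >= needed:
            if prev == -1:
                head = nxt  # the hole was on top of the stack
            else:
                holes[prev] = (holes[prev][0], nxt)  # bypass the hole
            del holes[cur]
            return cur, head
        prev, cur = cur, nxt
    return -1, head
```

Unlike the fixed-length case, the top of the stack is not guaranteed to be usable, so insertion cost grows with the number of holes examined.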

6.2.4 Fragmentation
Internal
  – fixed-length records
  – “unsophisticated” variable-length scheme
External: variable-length records
  – a smaller record is placed in a larger slot
  – leftover space is added to the available list
Coalescing holes (good test question)
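
Coalescing can be sketched as a merge of physically adjacent holes. The representation here, a list of (byte_offset, size) pairs, and the function name are assumptions made for the example.

```python
def coalesce(holes):
    """Merge holes that touch each other into single larger holes."""
    merged = []
    for offset, size in sorted(holes):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This hole starts exactly where the previous one ends: join them.
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((offset, size))
    return merged
```

Note that this sorts the holes by offset, whereas an avail list kept as a stack is ordered by deletion time, which is part of what makes coalescing awkward in practice.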

6.2.5 Placement strategies
First fit: use the first record slot that’s big enough
Best fit: sort slots in ascending order by size, then use first fit
Worst fit: sort slots in descending order by size
  – no need to search: just use the first space if it’s big enough
  – leftover space may be enough for another record
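
The three strategies can be compared side by side in a short sketch. For brevity it scans an unsorted list of slot sizes instead of maintaining a sorted avail list, which is a simplification; `choose_slot` and its parameters are hypothetical names.

```python
def choose_slot(sizes, needed, strategy):
    """Return the index of the slot picked by the given strategy, or None."""
    fits = [i for i, size in enumerate(sizes) if size >= needed]
    if not fits:
        return None
    if strategy == "first":
        return fits[0]                              # first slot big enough
    if strategy == "best":
        return min(fits, key=lambda i: sizes[i])    # tightest fit
    if strategy == "worst":
        return max(fits, key=lambda i: sizes[i])    # largest slot
    raise ValueError("unknown strategy: " + strategy)
```

With slot sizes [72, 40, 120, 55] and a 50-byte record, first fit picks the 72-byte slot, best fit the 55-byte slot, and worst fit the 120-byte slot, leaving leftovers of 22, 5, and 70 bytes respectively.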

6.3.2 Binary search
Relational ops for search key
Retrieval by RRN
Object-oriented presentation of algorithm
  – implementation with templates
  – compilation with class definitions

6.3.3-4 Search performance
Complexity for binary search is O(log₂ n), compared to O(n) for sequential search
Records must be sorted on the search key
Disk sort is prohibitively expensive
“Internal sort” allows direct accesses in memory
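
To make the O(log₂ n) bound concrete, the sketch below counts probes during a binary search over an in-memory sorted list of keys. In a file, each probe would correspond to reading a record by RRN; the function name and return shape are assumptions for illustration.

```python
def binary_search(keys, target):
    """Search a sorted list; return (index, probes), with index -1 if absent."""
    low, high, probes = 0, len(keys) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1  # one comparison, i.e. one record access in a file
        if keys[mid] == target:
            return mid, probes
        if keys[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, probes
```

For 1,000 sorted keys, no search needs more than 10 probes, whereas a sequential search may need up to 1,000 comparisons.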

6.3.5 Limitations
The number of disk accesses for binary search is still significant for large files
Keeping a file sorted can be less efficient than using sequential search; a merge technique addresses this problem
Internal sort is limited to small files that fit entirely in memory

6.4 Keysort
Only keys are kept in memory
Each key is kept with its RRN (keynode)
The keynode array is sorted in memory
The data file can be sorted by reading records in the order of the sorted keynodes and writing them to a new file
Keynodes can be written as an index file
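
A minimal keysort sketch, assuming the data file is modelled as a list indexed by RRN and that a caller-supplied function extracts the key from a record. The names `keysort`, `rewrite_sorted`, and `key_of` are hypothetical.

```python
def keysort(records, key_of):
    """Build (key, rrn) keynodes for every record and sort them by key."""
    keynodes = [(key_of(rec), rrn) for rrn, rec in enumerate(records)]
    keynodes.sort()
    return keynodes


def rewrite_sorted(records, keynodes):
    """Read records in keynode order to produce the sorted file contents."""
    return [records[rrn] for _, rrn in keynodes]
```

For example, with records "Mason|Ada", "Ames|Mary", "Zorn|Al" and the text before "|" as the key, the sorted keynodes are ("Ames", 1), ("Mason", 0), ("Zorn", 2). Writing out that keynode array instead of rewriting the records gives the index-file alternative mentioned above.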

6.4.4 Pinned records
Available list (of deleted record slots)
Records whose physical locations are referenced in other records are pinned
