Presentation is loading. Please wait.

Presentation is loading. Please wait.

File StructuresSNU-OOPSLA Lab.1 Chap6. Organizing Files for Performance 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File structures by Folk, Zoellick.

Similar presentations


Presentation on theme: "File StructuresSNU-OOPSLA Lab.1 Chap6. Organizing Files for Performance 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File structures by Folk, Zoellick."— Presentation transcript:

1 File StructuresSNU-OOPSLA Lab.1 Chap6. Organizing Files for Performance 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File structures by Folk, Zoellick and Ricarrdi

2 File Structures SNU-OOPSLA Lab. 2 Chapter Objectives(1) u Look at several approaches to data compression u Look at storage compaction as a simple way of reusing space in a file u Develop a procedure for deleting fixed-length records that allows vacated file space to be reused dynamically u Illustrate the use of linked lists and stacks to manage an avail list u Consider several approaches to the problem of deleting variable-length records u Introduce the concepts associated with the terms internal fragmentation and external fragmentation

3 File Structures SNU-OOPSLA Lab. 3 Chapter Objectives(2) u Outline some placement strategies associated with the reuse of space in a variable-length record file u Provide an introduction to the idea underlying a binary search u Undertake an examination of the limitations of binary searching u Develop a keysort procedure for sorting larger files; investigate the costs associated with keysort u Introduce the concept of a pinned record

4 File Structures SNU-OOPSLA Lab. 4 Contents 6.1 Data compression 6.2 Reclaiming space in files 6.3 Finding things quickly: An Introduction to internal sorting and binary searching 6.4 Keysorting

5 File Structures SNU-OOPSLA Lab. 5 Data Compression(1) u Reasons for data compression u less storage u transmitting faster, decreasing access time u processing faster sequentially 6.1 Data Compression

6 File Structures SNU-OOPSLA Lab. 6 Data Compression(2) : Using a different notation u Fixed-Length fields are good candidates u Decrease the # of bits by finding a more compact notation ex) original state field notation is 16bits, but we can encode with 6bit notation because of the # of all states are 50 u Cons. u unreadable by human u cost in encoding time u decoding modules => increase the complexity of s/w => used for particular application 6.1 Data Compression

7 File Structures SNU-OOPSLA Lab. 7 Data Compression(3) : Suppressing repeating sequences u Run-length encoding algorithm u read through pixels, copying pixel values to file in sequence, except the same pixel value occurs more than once in succession u when the same value occurs more than once in succession, substitute the following three bytes Ê special run-length code indicator((ex) ff) Ë pixel value repeated Ì the number of times that value is repeated u ex) 22 23 24 24 24 24 24 24 24 25 26 26 26 26 26 26 25 24 Ô 22 23 ff 24 07 25 ff 26 06 25 24 6.1 Data Compression

8 File Structures SNU-OOPSLA Lab. 8 화면 pixel 빛의 세기 수치화 (digital) 각 pixel 당 전기 신호 (analog) 컴퓨터내 image 의 표현

9 File Structures SNU-OOPSLA Lab. 9 화면 12 8 12 33 99 12 56 7 13 44 66 23 12 4 34 57 99 12 …... 컴퓨터내 image 의 표현

10 File Structures SNU-OOPSLA Lab. 10 화면 12 4 34 57 99 12 …... 56 7 13 44 66 23 12 4 34 57 99 12 …... 12 8 12 33 99 12 56 7 13 44 66 23 12 4 34 57 99 12 …... ** 동영상 --- 초당 25 - 30 개의 정지화상을 교체 (video)(image) 컴퓨터내 color 영상의 표현

11 File Structures SNU-OOPSLA Lab. 11 Data Compression(3) : Suppressing repeating sequences u Run-length encoding (cont’d) u example of redundancy reduction u cons. u not guarantee any particular amount of space savings u under some circumstances, compressed image is larger than original image u Why? Can you prevent this? 6.1 Data Compression

12 File Structures SNU-OOPSLA Lab. 12 Data Compression(4) : Assigning variable-length codes u Morse code: oldest & most common scheme of variable-length code u Some values occur more frequently than others u that value should take the least amount of space u Huffman coding u base on probability of occurrence u determine probabilities of each value occurring u build binary tree with search path for each value u more frequently occurring values are given shorter search paths in tree 6.1 Data Compression

13 File Structures SNU-OOPSLA Lab. 13 Data Compression(5) : Assigning variable-length codes u Huffman coding Letter:abcdefg Prob:0.40.10.10.10.10.10.1 Code:10100110000000100100011 ex) the string “abde” è 101000000001 6.1 Data Compression

14 File Structures SNU-OOPSLA Lab. 14 d(0000)e(0001) f(0010) g(0011) b(010)c(011) a(1) Huffman Tree 0 00 01 000001 6.1 Data Compression

15 File Structures SNU-OOPSLA Lab. 15 Data Compression(6) : Irreversible compression techniques u Some information can be sacrificed u Less common in data files u Shrinking raster image u 400-by-400 pixels to 100-by-100 pixels u 1 pixel for every 16 pixels u Speech compression u voice coding (the lost information is of no little or no value) 6.1 Data Compression

16 File Structures SNU-OOPSLA Lab. 16 Compression in UNIX u System V u pack & unpack use Huffman codes u after compress file, appends “.z” to end of packed file u Berkeley UNIX u compress & uncompress use Lempel-Ziv method u after compress file, appends “.Z” to end of compressed file 6.1 Data Compression

17 File Structures SNU-OOPSLA Lab. 17 Record Deletion and Storage Compaction u Storage compaction u record deletion : just marks each deleted record u reclamation of all deleted records => pros : delete/undelete operation with little effort Ex) Ames|123|OK|…...| Morrison|9035|OK| Brown|625|IA|…...| Delete second record Ames|123|OK|…| *|rrison|9035|OK| Brown|625|IA|…| After compaction Ames|123|OK|…| Brown|625|IA|…| 6. 2 Reclaiming Space in Files

18 File Structures SNU-OOPSLA Lab. 18 Deleting Fixed-length Records for Reclaiming Space Dynamically(1) u Reuse the space from deleted records as soon as possible u deleted records must be marked in special way u we could find the deleted space u To make record reuse quickly, we need u a way to know immediately if there are empty slots in the file u a way to jump directly to one of those slots if they exist => Linked lists or Stacks for avail list * avail list : a list that is made up of deleted records 6. 2 Reclaiming Space in Files

19 File Structures SNU-OOPSLA Lab. 19 Deleting Fixed-length Records for Reclaiming Space Dynamically(2) Head pointer RRN 5 RRN 2 Head pointer RRN 3 PRN 5 RRN 2 (2) (3) 2 25 (a) (b) after pushing record of RRN 3 Head pointer ptr The Linked List The Stack 6. 2 Reclaiming Space in Files

20 File Structures SNU-OOPSLA Lab. 20 Deleting Fixed-length Records for Reclaiming Space Dynamically(3) u Linking and stacking deleted records u arranging and rearranging links are used to make one available record slot point to the next u second field of deleted record points to next record 6. 2 Reclaiming Space in Files

21 File Structures SNU-OOPSLA Lab. 21 0 1 2 3 4 5 6 Edwards... Betas... Wills... *-1 Masters.. *3 Chavez... Edwards... *5 Wills... *-1 Masters.. *3 Chavez... Edwards.. 1st new rec Wills... 3rd new rec Masters.. 2nd new rec Chavez... Sample file showing linked list of deleted records List head(first available record) 5 (delete 3, 5 ) List head(first available record) 1 (delete 1) List head(first available record) -1 (insert three new records) (a) (b) (c) 6. 2 Reclaiming Space in Files

22 File Structures SNU-OOPSLA Lab. 22 Deleting Variable-length Records u Avail list of variable-length records u it has byte count of record at beginning of each record u use byte offset instead of RRN u Adding and removing records u in adding records, search through avail list for right size (=>big enough) 6. 2 Reclaiming Space in Files

23 File Structures SNU-OOPSLA Lab. 23 Size 47 Size 38 Size 72 Size 68 Size 47 Size 68 Size 38 Size 72 New Link Removed record (a)Before removal (b)After removal Removal of a record from an avail list with variable-length records 6. 2 Reclaiming Space in Files

24 File Structures SNU-OOPSLA Lab. 24 Storage Fragmentation u Internal fragmentation (in fixed-length record) u waste space within a record u in variable-length records, minimize wasted space by doing away with internal fragmentation u External fragmentation (in variable-length record) u unused space outside or between individual records u three possible solutions Ê storage compaction · coalescing the holes: a single, larger record slot ¸ minimizing fragmentation by adopting placement strategy 6. 2 Reclaiming Space in Files

25 File Structures SNU-OOPSLA Lab. 25 Internal Fragmentation in Fixed-length Records Ames | John | 123 Maple | Stillwater | OK | 740751 |................................... Morrison | Sebastian | 9035 South Hillcrest | Forest Village | OK | 74820 | Brown | Martha | 625 Kimbark | Des Moines | IA | 50311 |......................... 64-byte fixed-length records Unused space -> Internal fragmentation 6. 2 Reclaiming Space in Files

26 File Structures SNU-OOPSLA Lab. 26 External Fragmentation in Variable-length Records 40 Ames | Jone | 123 Maple | Stillwater | OK | 740751 | 64 Morrison | Sebastian | 9035 South Hillcrest | Forest Village | OK | 74820 | 45 Brown | Martha | 625 Kimb bark | Des Moines | IA | 50311 | Record[1]Record[2] Record[3] ex) Delete Record[2] and Insert New Record[i] : 12-byte unused space 52 Adams | Kits | 3301 Washington D.C | Forest Village | IA | 43563 | External fragmentation record length Record[i] 6. 2 Reclaiming Space in Files

27 File Structures SNU-OOPSLA Lab. 27 Placement Strategies u First-fit u select the first available record slot u suitable when lost space is due to internal fragmentation u Best-fit u select the available record slot closest in size u avail list in ascending order u suitable when lost space is due to internal fragmentation u Worst-fit u select the largest record slot u avail list in descending order u suitable when lost space is due to external fragmentation 6. 2 Reclaiming Space in Files

28 File Structures SNU-OOPSLA Lab. 28 Finding Things Quickly(1) u Goal: Minimize the number of disk accesses u Finding things in simple field and record files may have many seeks u Binary search algorithm for fixed-sized record int BinarySearch(FixedRecordFile &file, RecType &obj, KeyType &key) // binary search for key. { int low = 0; int high = file.NumRecs() - 1; while (low <= high){ int guess = (high - low)/2; file.ReadByRRN(obj, guess); if(obj.Key () == key) return 1; // record found if*obj.Key() < key) high = guess - 1; // search before guess else low = guess + 1; // search after guess } return 0; // loop ended without finding key } 6.3 Finding Things Quickly : An Introduction to Internal Sorting and Binary Searching

29 File Structures SNU-OOPSLA Lab. 29 Classes and Methods for Binary Search Class KeyType {public int operator == (KeyType &); int operator < (KeyType &); }; class RecType {public: KeyType Key();}; class FixedRecordFile{public: int NumRecs(); int ReadByRRN (RecType & Record, int RRN); };

30 File Structures SNU-OOPSLA Lab. 30 Finding Things Quickly(2) u Binary search vs. Sequential search u binary search u O(log n) u list is sorted by key u sequential search u O(n) 6.3 Finding Things Quickly : An Introduction to Internal Sorting and Binary Searching

31 File Structures SNU-OOPSLA Lab. 31 Finding Things Quickly(3) u Sorting a disk file in RAM u read the entire file from disk to memory u use internal sort (=sort in memory) u UNIX sort utility uses internal sort u Limitations of binary search & internal sort u binary search requires more than one or two access c.f.) single access by RRN u keeping a file sorted is very expensive u an internal sort works only on small files 6.3 Finding Things Quickly : An Introduction to Internal Sorting and Binary Searching

32 File Structures SNU-OOPSLA Lab. 32 Internal Sort unsorted file unsorted file sorted file Read the entire file Sort in memory disk memory 6.3 Finding Things Quickly : An Introduction to Internal Sorting and Binary Searching

33 File Structures SNU-OOPSLA Lab. 33 Key Sorting & Its Limitations u So called, “tag sort” : sorted thing is “key” only u Sorting procedure ¶ Read only the keys into memory · Sort the keys ¸ Rearrange the records in file by the sorted keys u Advantage u less RAM than internal sort u Disadvantages(=Limitations) u reading records in disk twice is required u a lot of seeking for records for constructing a new(sorted) file 6.4 Keysorting

34 File Structures SNU-OOPSLA Lab. 34 1212 3 k HARRISON KELLOG HARRIS BELL........ Harrison|Susan|387 Eastern.... Kellog|Bill|17 Maple.... Harris|Margaret|4343 West.... Bell|Robert|8912 Hill.... KEYRRN Records In RAMOn secondary storage k3k3 1 2 HARRISON KELLOG HARRIS BELL........ Harrison|Susan|387 Eastern.... Kellog|Bill|17 Maple.... Harris|Margaret|4343 West.... Bell|Robert|8912 Hill.... KEYRRN Records Conceptual view after sorting keys in RAM Conceptual view before sorting KEYNODES array 6.4 Keysorting

35 File Structures SNU-OOPSLA Lab. 35 Pseudocode for keysort(1)  Program: keysort  open input file as IN_FILE  create output file as OUT_FILE  read header record from IN_FILE and write a copy to OUT_FILE  REC_COUNT := record count from header record  /* read in records; set up KEYNODES array */  for i := 1 to REC_COUNT  read record from IN_FILE into BUFFER  extract canonical key and place it in KEYNODES[i].KEY  KEYNODES[i].KEY = i (continued....) 6.4 Keysorting

36 File Structures SNU-OOPSLA Lab. 36 Pseudocode for keysort(2)  /* sort KEYNODES[].KEY, thereby ordering RRNs correspondingly */  sort(KEYNODES, REC_COUNT)  /* read in records according to sorted order, and write them out in this order */  for i := 1 to REC_COUNT  seek in IN_FILE to record with RRN of KEYNODES[I].RRN write BUFFER contents to OUT_FILE  close IN_FILE and OUT_FILE  end PROGRAM 6.4 Keysorting

37 File Structures SNU-OOPSLA Lab. 37 Two Solutions :why bother to write the file back? u Write out sorted KEYNODES[] array without writing records back in sorted order u KEYNODES[] array is used as index file 6.4 Keysorting

38 File Structures SNU-OOPSLA Lab. 38 k3k3 1 2 HARRISON KELLOG HARRIS BELL........ Harrison|Susan|387 Eastern.... Kellog|Bill|17 Maple.... Harris|Margaret|4343 West.... Bell|Robert|8912 Hill.... KEYRRN Records Index file Original file Relationship between the index file and the data file 6.4 Keysorting

39 File Structures SNU-OOPSLA Lab. 39 Pinned records(1) u Records that are referenced to physical location of themselves by other records u Not free to alter physical location of records for avoiding dangling references u Pinned records make sorting more difficult and sometimes impossible u solution: use index file, while keeping actual data file in original order 6.4 Keysorting

40 File Structures SNU-OOPSLA Lab. 40 Pinned records(2) File with pinned records Record(i) Pinned Record Record (i+1)Pinned Record delete pinned record dangling pointer 6.4 Keysorting

41 File Structures SNU-OOPSLA Lab. 41 Let’s Review !!! 6.1 Data compression 6.2 Reclaiming space in files 6.3 Finding things quickly: An Introduction to internal sorting and binary searching 6.4 Keysorting


Download ppt "File StructuresSNU-OOPSLA Lab.1 Chap6. Organizing Files for Performance 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File structures by Folk, Zoellick."

Similar presentations


Ads by Google