
File Structure, SNU-OOPSLA Lab. Chap 7. Indexing. Seoul National University, Department of Computer Engineering, Object-Oriented Systems Laboratory (SNU-OOPSLA-LAB), Prof. Hyoung-Joo Kim. File Structures by Folk, Zoellick, and Riccardi.


1 Chap 7. Indexing (title slide)

2 Chapter Objectives (1)
- Introduce concepts of indexing that have broad applications in the design of file systems
- Introduce the use of a simple linear index to provide rapid access to records in an entry-sequenced, variable-length record file
- Investigate the implications of using indexes for file maintenance
- Introduce the template features of C++ for object I/O
- Describe the object-oriented approach to indexed sequential files

3 Chapter Objectives (2)
- Describe the use of indexes to provide access to records by more than one key
- Introduce the idea of an inverted list, illustrating Boolean operations on lists
- Discuss when to bind an index key to an address in the data file
- Introduce and investigate the implications of self-indexing files

4 Contents (1)
7.1 What Is an Index?
7.2 A Simple Index for Entry-Sequenced Files
7.3 Using Template Classes in C++ for Object I/O
7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects
7.5 Indexes That Are Too Large to Hold in Memory

5 Contents (2)
7.6 Indexing to Provide Access by Multiple Keys
7.7 Retrieval Using Combinations of Secondary Keys
7.8 Improving the Secondary Index Structure: Inverted Lists
7.9 Selective Indexes
7.10 Binding

6 Overview: Index (1) [7.1 What Is an Index?]
- Index: a data structure that associates given key values with the corresponding record numbers
- It is usually physically separate from the data file (unlike indexed sequential files, which use tight binding)
- Linear indexes (like the indexes found at the back of books)
  - Index records are ordered by key value, as in an ordered relative file
  - The best algorithm for finding a record with a specific key value is binary search
  - Adding an entry requires reorganizing the index

7 Overview: Index (2) [7.1 What Is an Index?]
[Figure: an index file holding the keys k1, k2, k4, k5, k7, k9 in sorted order, each pointing to its record in the data file (records AAA, ZZZ, CCC, XXX, EEE, FFF), which is not in key order.]
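
As a rough illustration of a linear index searched by binary search, the sketch below keeps (key, record number) entries sorted by key and looks one up with `std::lower_bound`. The names `IndexEntry`, `findRecord`, and `sampleIndex`, the record numbers, and the use of -1 for "not found" are assumptions made for this example; they do not come from the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One entry of a linear index: a key and the record number it refers to.
struct IndexEntry {
    std::string key;
    int recordNumber;
};

// Binary search over entries sorted by key.
// Returns the record number, or -1 if the key is absent (assumed convention).
int findRecord(const std::vector<IndexEntry>& index, const std::string& key) {
    // Binary search is only valid when the index is ordered by key.
    assert(std::is_sorted(index.begin(), index.end(),
        [](const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; }));
    auto it = std::lower_bound(index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it != index.end() && it->key == key) return it->recordNumber;
    return -1;
}

// Hypothetical sample: keys are sorted, record numbers are not.
std::vector<IndexEntry> sampleIndex() {
    return {{"k1", 0}, {"k2", 5}, {"k4", 2}, {"k5", 4}, {"k7", 1}, {"k9", 3}};
}
```

Calling `findRecord(sampleIndex(), "k4")` walks the sorted keys, not the data file, which is the point of keeping the index separate.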

8 Overview: Index (3) [7.1 What Is an Index?]
- Tree indexes (like those of indexed sequential files)
  - Hierarchical: each level, beginning with the root level, points to the next level
  - The leaves point only to the data file
- Indexed sequential file
- Binary tree index
- AVL tree index
- B+ tree index

9 Roles of an Index [7.1 What Is an Index?]
- Index: keys and reference fields
- Fast random access
- Uniform access speed
- Allows users to impose order on a file without actually rearranging the file
- Provides multiple access paths to a file
- Gives the user keyed access to variable-length record files

10 A Simple Index (1) [7.2 A Simple Index for E-S Files]
- Data file
  - entry-sequenced, variable-length records
  - primary key: unique for each entry in the file
- Searching a file by key (a common need)
  - binary search cannot be used directly on a variable-length record file (there is no way to know where the middle record starts)
  - so construct an index object for the file
  - index object: key field + byte-offset field

11 A Simple Index (2) [7.2 A Simple Index for E-S Files]

Index file (sorted by key)
Key        Reference field
ANG3795    167
COL31809   353
COL38358   211
DG139201   396
DG18807    256
FF245      442
LON2312    32
MER75016   300
RCA2626    77
WAR23699   132

Data file (entry sequenced)
Address of record   Actual data record
32                  LON|2312|Romeo and Juliet|Prokofiev...
77                  RCA|2626|Quartet in C Sharp Minor...
132                 WAR|23699|Touchstone|Corea...
167                 ANG|3795|Symphony No. 9|Beethoven...
211                 COL|38358|Nebraska|Springsteen...
256                 DG|18807|Symphony No. 9|Beethoven...
300                 MER|75016|Coq d'or Suite|Rimsky...
353                 COL|31809|Symphony No. 9|Dvorak...
396                 DG|139201|Violin Concerto|Beethoven...
442                 FF|245|Good News|Sweet Honey In The...
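
The pairing of a key with a byte offset can be sketched as follows. This is a minimal in-memory model, not the book's classes: `DataFile`, `primaryKey`, and `addRecord` are hypothetical names, the "file" is a `std::string`, and records are assumed to be newline-terminated with at least two '|' separators.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// In-memory stand-in for an entry-sequenced data file (an assumption so the
// sketch runs without touching disk). Records are newline-terminated.
struct DataFile {
    std::string bytes;

    // Append a record and return the byte offset where it starts.
    long append(const std::string& record) {
        long offset = static_cast<long>(bytes.size());
        bytes += record;
        bytes += '\n';
        return offset;
    }

    // Read the record that starts at the given byte offset.
    std::string readAt(long offset) const {
        assert(offset >= 0 && static_cast<std::size_t>(offset) < bytes.size());
        std::size_t start = static_cast<std::size_t>(offset);
        std::size_t end = bytes.find('\n', start);
        return bytes.substr(start, end - start);
    }
};

// Form the primary key by joining the first two '|'-separated fields (label + ID).
std::string primaryKey(const std::string& record) {
    std::size_t first = record.find('|');
    assert(first != std::string::npos);
    std::size_t second = record.find('|', first + 1);
    assert(second != std::string::npos);
    return record.substr(0, first) + record.substr(first + 1, second - first - 1);
}

// Append a record and remember where it went; std::map keeps the keys sorted.
void addRecord(DataFile& file, std::map<std::string, long>& index,
               const std::string& record) {
    index[primaryKey(record)] = file.append(record);
}
```

The data file only ever grows at the end, while the sorted structure lives entirely in the index, so a lookup is one index search followed by one positioned read.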

12 A Simple Index (3) [7.2 A Simple Index for E-S Files]
- Index file: fixed-size records (key + reference field), sorted by key
- Data file: not sorted, because it is entry sequenced
- Record addition is quick (faster than with a sorted data file)
- The index can be kept in memory
  - finding a record is quicker with the index than with a sorted data file
- Class TextIndex encapsulates the index data and index operations

13 Let's See Figure 7.4 [7.2 A Simple Index for E-S Files]

    class TextIndex
    {
    public:
        TextIndex(int maxKeys = 100, int unique = 1);
        int Insert(const char* key, int recAddr); // add to index
        int Remove(const char* key);              // remove key from index
        int Search(const char* key) const;        // search for key, return recAddr
        void Print(ostream &) const;
    protected:
        int MaxKeys;    // maximum number of entries
        int NumKeys;    // actual number of entries
        char** Keys;    // array of key values
        int* RecAddrs;  // array of record references
        int Find(const char* key) const;
        int Init(int maxKeys, int unique);
        int Unique;     // if true, each key must be unique
    };
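
To make the Insert, Remove, and Search operations concrete, here is a simplified stand-in for the class above. It is not the book's Textind.cpp: `SimpleTextIndex` is a hypothetical name, it uses `std::string` and `std::vector` in place of the raw arrays, and returning -1 from `Search` for a missing key is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for TextIndex: one sorted vector of (key, record address).
class SimpleTextIndex {
public:
    explicit SimpleTextIndex(bool unique = true) : unique_(unique) {}

    // Insert in key order; returns false if the key exists and keys must be unique.
    bool Insert(const std::string& key, int recAddr) {
        auto pos = lowerBound(key);
        if (unique_ && pos != entries_.end() && pos->first == key) return false;
        entries_.insert(pos, Entry(key, recAddr));  // shifts later entries right
        return true;
    }

    // Remove the first entry with this key; returns false if the key is absent.
    bool Remove(const std::string& key) {
        auto pos = lowerBound(key);
        if (pos == entries_.end() || pos->first != key) return false;
        entries_.erase(pos);                        // shifts later entries left
        return true;
    }

    // Return the record address for the key, or -1 if absent (assumed convention).
    int Search(const std::string& key) const {
        auto pos = lowerBound(key);
        return (pos != entries_.end() && pos->first == key) ? pos->second : -1;
    }

private:
    using Entry = std::pair<std::string, int>;

    // Binary search for the first entry whose key is not less than `key`.
    std::vector<Entry>::const_iterator lowerBound(const std::string& key) const {
        return std::lower_bound(entries_.begin(), entries_.end(), key,
            [](const Entry& e, const std::string& k) { return e.first < k; });
    }

    std::vector<Entry> entries_;
    bool unique_;
};
```

Insertion and removal shift entries, which is the "rearrangement" cost of a linear index; it stays cheap as long as the index fits in memory.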

14 Index Implementation
- Pages 638, 639, 640
  - G.1 Recording.h
  - G.2 Recording.cpp
  - G.3 Makerec.cpp
- Pages 641, 642
  - G.4 Textind.h
  - G.5 Textind.cpp

15 RetrieveRecording with the Index
- RetrieveRecording(KEY...) procedure: retrieves a single record by key from the data file, combining the index search, file read, and buffer unpack operations in a single function

    int RetrieveRecording (Recording & recording, char * key,
        TextIndex & RecordingIndex, BufferFile & RecordingFile)
    // read and unpack the recording, return TRUE if succeeds
    {
        int result;
        result = RecordingFile.Read (RecordingIndex.Search(key));
        if (result == -1) return FALSE;
        result = recording.Unpack (RecordingFile.GetBuffer());
        return result;
    }

16 Template Class for I/O Object (1) [7.3 Using Template Classes in C++ for Object I/O]
- Template class RecordFile
  - We want to make the following code possible
    - Person p; RecordFile pFile; pFile.Read(p);
    - Recording r; RecordFile rFile; rFile.Read(r);
  - It is difficult to support files for different record types without having to modify the class
  - Solution: a template class derived from BufferFile
  - The actual declarations and calls
    - RecordFile<Person> pFile; pFile.Read(p);
    - RecordFile<Recording> rFile; rFile.Read(r);

17 Template Class for I/O Object (2) [7.3 Using Template Classes in C++ for Object I/O]
- Template class RecordFile

    template <class RecType>
    class RecordFile : public BufferFile
    {
    public:
        int Read(RecType& record, int recaddr = -1);
        int Write(const RecType& record, int recaddr = -1);
        int Append(const RecType& record);
        RecordFile(IOBuffer& buffer) : BufferFile(buffer) {}
    };
    // The template parameter RecType must have the following methods:
    //   int Pack(IOBuffer &);   pack record into buffer
    //   int Unpack(IOBuffer &); unpack record from buffer
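
The requirement that RecType supply Pack and Unpack can be tried out with a small in-memory analogue. Everything here is a sketch under stated assumptions: `MemRecordFile` and `Person` are hypothetical, a `std::string` replaces IOBuffer, the "file" is a vector of packed buffers, and a record address is just a position in that vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type: any RecType must offer Pack and Unpack.
struct Person {
    std::string name;
    std::string city;

    // Pack the fields into a '|'-delimited buffer.
    bool Pack(std::string& buffer) const {
        buffer = name + "|" + city;
        return true;
    }

    // Unpack the fields; fails if the delimiter is missing.
    bool Unpack(const std::string& buffer) {
        std::size_t bar = buffer.find('|');
        if (bar == std::string::npos) return false;
        name = buffer.substr(0, bar);
        city = buffer.substr(bar + 1);
        return true;
    }
};

// In-memory analogue of RecordFile<RecType>.
template <class RecType>
class MemRecordFile {
public:
    // Pack and append; returns the new record's address, or -1 on failure.
    int Append(const RecType& record) {
        std::string buffer;
        if (!record.Pack(buffer)) return -1;
        records_.push_back(buffer);
        return static_cast<int>(records_.size()) - 1;
    }

    // Read and unpack the record at recaddr; returns false on a bad address.
    bool Read(RecType& record, int recaddr) const {
        if (recaddr < 0 || recaddr >= static_cast<int>(records_.size())) return false;
        return record.Unpack(records_[static_cast<std::size_t>(recaddr)]);
    }

private:
    std::vector<std::string> records_;
};
```

The file class never needs to know the record's fields; supporting a new record type means writing its Pack and Unpack, not modifying the file class.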

18 Template Class for I/O Object (3) [7.3 Using Template Classes in C++ for Object I/O]
- Adding I/O to an existing class with RecordFile
  - add methods Pack and Unpack to class Recording
  - create a buffer object to use in the I/O
    - DelimFieldBuffer Buffer;
  - declare an object of type RecordFile<Recording>
    - RecordFile<Recording> rFile (Buffer);
- Declaration and calls: directly open a file, then read and write objects of class Recording

    Recording r1, r2;
    rFile.Open("myfile");
    rFile.Read(r1);
    rFile.Write(r2);

19 Object-Oriented Approach to I/O [7.4 OO Support for Indexed, E-S Files of Data Objects]
- Class IndexedFile
  - adds indexed access to the sequential access provided by class RecordFile
  - extends RecordFile with Update, Append, and Read methods
    - Update and Append: maintain a primary key index of the data file
    - Read: supports access to an object by key
  - TextIndex, RecordFile ==> IndexedFile
- Issues of IndexedFile
  - how to make a persistent index of a file
  - how to guarantee that the index is an accurate reflection of the contents of the data file

20 Basic Operations of IndexedFile (1) [7.4 OO Support for Indexed, E-S Files of Data Objects]
- Create the original empty index and data files
- Load the index file into memory
- Rewrite the index file from memory
- Add records to the data file and index
- Delete records from the data file
- Update records in the data file
- Update the index to reflect changes in the data file
- Retrieve records

21 Basic Operations of TextIndexedFile (1) [7.4 OO Support for Indexed, E-S Files of Data Objects]
- Creating the files
  - the index file and the data file are created as initially empty files with header records
  - implementation (makeind.cpp in Appendix G): the Create method in class BufferFile
- Loading the index into memory
  - loading and storing objects is supported in the IOBuffer classes
  - need to choose a particular buffer class to use for an index file (tindbuff.cpp in Appendix G)
  - define class TextIndexBuffer as a derived class of FixedFieldBuffer to support reading and writing of index objects

22 Basic Operations of TextIndexedFile (2) [7.4 OO Support for Indexed, E-S Files of Data Objects]
- Rewriting the index file from memory
  - part of the Close operation on an IndexedFile
  - writes the index object back to the index file
  - the index should be protected against failure
  - write changes only when the index file is out of date (use a status flag)
  - implementation: Rewind and Write operations of class BufferFile
- Record addition
  - add a new record to the data file, using RecordFile<Recording>::Write
  - add an entry to the index, using TextIndex.Insert
    - requires rearranging the index
    - if the index is in memory, no file access is needed
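
The status-flag idea can be sketched in a few lines. This is an illustration only: `CachedIndex` is a hypothetical class, and the actual Rewind and Write calls are replaced by a counter so the effect of the flag is observable.

```cpp
#include <cassert>
#include <map>
#include <string>

// Sketch of an in-memory index that is rewritten on Close only when it changed.
class CachedIndex {
public:
    // Any modification marks the in-memory copy as newer than the index file.
    void Insert(const std::string& key, int recAddr) {
        entries_[key] = recAddr;
        dirty_ = true;
    }

    // Stand-in for "Rewind + Write": it only counts how often a rewrite happens.
    void Close() {
        if (dirty_) {
            ++rewriteCount_;
            dirty_ = false;
        }
    }

    int rewriteCount() const { return rewriteCount_; }

private:
    std::map<std::string, int> entries_;
    bool dirty_ = false;
    int rewriteCount_ = 0;
};
```

Closing an unchanged index costs nothing, and closing twice after one change rewrites the index file once.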

23 Basic Operations of TextIndexedFile (3) [7.4 OO Support for Indexed, E-S Files of Data Objects]
- Record deletion
  - data file: the records need not be moved
  - index: either really delete the entry or just mark it as deleted
  - using TextIndex::Remove
- Record updating (2 categories)
  1. The update changes the value of the key field
     - delete/add approach
     - may reorder both the index and the data file
  2. The update does not affect the key field
     - no rearrangement of the index file
     - may need to reconstruct the data file

24 Class TextIndexedFile (1) [7.4 OO Support for Indexed, E-S Files of Data Objects]
- Members
  - methods
    - Create, Open, Close, Read (sequential and indexed), Append, and Update operations
  - protected members
    - ensure the correlation between the index in memory (Index), the index file (IndexFile), and the data file (DataFile)
- char* Key()
  - the template parameter RecType must have the Key method
  - used to extract the key value from the record

25 Class TextIndexedFile (2) [7.4 OO Support for Indexed, E-S Files of Data Objects]

    template <class RecType>
    class TextIndexedFile
    {
    public:
        int Read(RecType& record);            // read next record
        int Read(char* key, RecType& record); // read by key
        int Append(const RecType& record);
        int Update(char* oldKey, const RecType& record);
        int Create(char* name, int mode = ios::in|ios::out);
        int Open(char* name, int mode = ios::in|ios::out);
        int Close();
        TextIndexedFile(IOBuffer& buffer, int keySize, int maxKeys = 100);
        ~TextIndexedFile(); // close and delete
    protected:
        TextIndex Index;
        BufferFile IndexFile;
        TextIndexBuffer IndexBuffer;
        RecordFile<RecType> DataFile;
        char* FileName; // base file name for file
        int SetFileName(char* fName, char*& dFileName, char*& IdxFName);
    };

26 Enhancements to TextIndexedFile (1) [7.4 OO Support for Indexed, E-S Files of Data Objects]
- Support other types of keys
  - Restriction: the key type is restricted to string (char *)
  - Relaxation: support a template class SimpleIndex with a parameter for the key type
- Support data object class hierarchies
  - Restriction: every object in a RecordFile must be of the same type
  - Relaxation: the type hierarchy supports virtual pack methods

27 Enhancements to TextIndexedFile (2) [7.4 OO Support for Indexed, E-S Files of Data Objects]
- Support multirecord index files
  - Restriction: the entire index must fit in a single record
  - Relaxation: add protected methods Insert, Delete, and Search to manipulate the arrays of index objects
- Optimization of operations
  - Obvious: use binary search in the Find method
  - Active: add a flag to the index object to avoid writing the index record back to the index file when it has not been changed

28 Where are we going?
- Plain stream file
- Persistency ==> buffer support ==> BufferFile
  - deriving BufferFile using various other classes
- Random access ==> index support ==> IndexedFile
  - deriving TextIndexedFile using RecordFile and TextIndex

29 Too Large Index (1) [7.5 Indexes That Are Too Large to Hold in Memory]
- The index is kept on secondary storage (large linear index)
- Disadvantages
  - binary searching of the index requires several seeks rather than running at memory speed
  - index rearrangement requires shifting or sorting records on secondary storage
- Alternatives (to be considered later)
  - hashed organization
  - tree-structured index (e.g. B-tree)
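
"Several seeks" can be quantified with the standard worst-case bound for binary search, floor(log2 n) + 1 probes over n sorted records. The helper below is a hypothetical illustration; it assumes every probe of a disk-resident index costs one seek and ignores any buffering or caching.

```cpp
#include <cassert>

// Worst-case number of index records a binary search examines among n sorted
// records: floor(log2(n)) + 1 for n >= 1, and 0 for an empty index.
int worstCaseProbes(long n) {
    assert(n >= 0);
    int probes = 0;
    while (n > 0) {   // each probe halves the remaining search range
        n /= 2;
        ++probes;
    }
    return probes;
}
```

Under these assumptions an index of 1,000 entries needs up to 10 seeks per lookup and one of 1,000,000 entries up to 20, which is why hashed and tree-structured indexes are considered for large indexes.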

30 Too Large Index (2) [7.5 Indexes That Are Too Large to Hold in Memory]
- Advantages over a data file sorted by key, even when the index is on secondary storage
  - a binary search can be used, even for a variable-length record file
  - sorting and maintaining the index is less expensive than doing so for the data file
  - the keys can be rearranged without moving the data records, which matters if there are pinned records

31 Index by Multiple Keys (1) [7.6 Indexing to Provide Access by Multiple Keys]
- DB schema = (ID-No, Title, Composer, Artist, Label)
- Find the record with ID-No "COL38358" (primary key: ID-No)
- Find all the recordings of "Beethoven" (secondary key: Composer)
- Find all the recordings titled "Violin Concerto" (secondary key: Title)

32 Index by Multiple Keys (2) [7.6 Indexing to Provide Access by Multiple Keys]
- Most people don't want to search only by primary key
- Secondary key
  - can be duplicated
- Secondary key index
  - maps a secondary key to a primary key (e.g. BEETHOVEN -> DG18807)
  - retrieval by secondary key then consults one additional index (the primary key index)

33 Secondary Index: Basic Operations (1) [7.6 Indexing to Provide Access by Multiple Keys]
- Record addition
  - similar to adding to the primary index
  - the secondary key is stored in canonical form
    - fixed length (so it may be truncated)
    - the original name can be obtained from the data file
  - can contain duplicate keys
    - entries within the same key group are ordered locally

34 Secondary Index: Basic Operations (2) [7.6 Indexing to Provide Access by Multiple Keys]
- Record deletion (2 cases)
  1. The secondary index references the record directly
     - delete the entry from both the primary index and the secondary index
     - rearrange both indexes
  2. The secondary index references the primary key
     - delete only the primary index entry
     - leave the secondary index reference to the deleted record intact
     - advantage: fast
     - disadvantage: entries for deleted records take up space
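
The second deletion case, where the secondary index refers to primary keys, might look like the sketch below. The container choices and the names `Indexes`, `deleteRecord`, and `findByComposer` are assumptions for illustration; the point is that a stale secondary entry is harmless because the primary index lookup fails.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Indexes {
    std::map<std::string, long> primary;                // primary key -> byte offset
    std::multimap<std::string, std::string> secondary;  // composer -> primary key
};

// Delete a record by removing only its primary index entry;
// the secondary index entries are deliberately left intact.
void deleteRecord(Indexes& ix, const std::string& primaryKey) {
    ix.primary.erase(primaryKey);
}

// Resolve a secondary key to byte offsets, skipping references whose primary
// key no longer exists (stale entries left behind by deleteRecord).
std::vector<long> findByComposer(const Indexes& ix, const std::string& composer) {
    std::vector<long> offsets;
    auto range = ix.secondary.equal_range(composer);
    for (auto it = range.first; it != range.second; ++it) {
        auto hit = ix.primary.find(it->second);
        if (hit != ix.primary.end()) offsets.push_back(hit->second);
    }
    return offsets;
}
```

Deletion touches one index only, at the price of stale secondary entries that keep occupying space until the secondary index is rebuilt.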

35 Secondary Index: Basic Operations (3) [7.6 Indexing to Provide Access by Multiple Keys]
- Record updating
  - the primary key index serves as a kind of protective buffer
  1. The secondary index references the record directly
     - update all files containing the record's location
  2. The secondary index references the primary key (1)
     - the secondary index is affected only when either the primary or the secondary key is changed
  (continued)

36 Secondary Index: Basic Operations (4) [7.6 Indexing to Provide Access by Multiple Keys]
  2. The secondary index references the primary key (2)
     a. When the update changes the secondary key
        - rearrange the secondary key index
     b. When the update changes the primary key
        - update all affected reference fields
        - may require reordering the secondary index
     c. When the update is confined to other fields
        - the secondary key index is not affected

37 Retrieval of Records [7.7 Retrieval Using Combinations of Secondary Keys]
- Types
  - primary key access
  - secondary key access
  - combination of the above
- Combination of keys
  - easy when secondary key indexes are available
  - Boolean operations (AND, OR) on the lists of matching primary keys
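
Combining secondary keys reduces to set operations on sorted lists of primary keys. The sketch below uses the standard algorithms `std::set_intersection` and `std::set_union`; the names `KeyList`, `matchAll`, and `matchAny` are hypothetical, and each input list is assumed to be sorted and free of duplicates.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

using KeyList = std::vector<std::string>;  // primary keys, sorted ascending

// AND: primary keys present in both lists.
KeyList matchAll(const KeyList& a, const KeyList& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    KeyList out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// OR: primary keys present in either list, each reported once.
KeyList matchAny(const KeyList& a, const KeyList& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    KeyList out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}
```

With the composer list for Beethoven (ANG3795, DG139201, DG18807, RCA2626) and the title list for "Symphony No. 9" (ANG3795, COL31809, DG18807), `matchAll` selects the recordings that satisfy both conditions without reading any data record.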

38 Inverted Lists (1) [7.8 Improving the Secondary Index Structure]
- Inverted list
  - a secondary key leads to a set of one or more primary keys
- Disadvantages of the secondary index structure so far
  - the index must be rearranged whenever a record is added
  - the secondary key is repeated in a separate entry for every duplicate
- Solution A: an array of references
- Solution B: a linked list of references

39 Array of References [7.8 Improving the Secondary Index Structure]

Revised composer index
Secondary key          Set of primary key references
BEETHOVEN              ANG3795  DG139201  DG18807  RCA2626
COREA                  WAR23699
DVORAK                 COL31809
PROKOFIEV              LON2312
RIMSKY-KORSAKOV        MER75016
SPRINGSTEEN            COL38358
SWEET HONEY IN THE R   FF245

* no need to rearrange
* limited reference array
* internal fragmentation
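
The trade-off of a fixed array of references can be seen in a small sketch. `ComposerEntry` is a hypothetical record layout, and the limit of four slots is an assumption chosen to match the longest list in the composer index above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// One secondary-index record: a key plus a fixed number of reference slots.
struct ComposerEntry {
    static const std::size_t kMaxRefs = 4;    // assumed slot count
    std::string composer;
    std::array<std::string, kMaxRefs> refs;   // unused slots stay empty
    std::size_t used = 0;

    // Add a primary-key reference; fails once every slot is taken.
    bool addRef(const std::string& primaryKey) {
        if (used == kMaxRefs) return false;
        refs[used++] = primaryKey;
        return true;
    }

    // Slots reserved but unused: the internal fragmentation of this record.
    std::size_t wastedSlots() const { return kMaxRefs - used; }
};
```

A composer with four recordings fills the record and a fifth reference is rejected, while a composer with one recording wastes three slots; these are the two drawbacks noted on the slide.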

40 Inverted Lists (2) [7.8 Improving the Secondary Index Structure]
- Guidelines for a better solution
  - no reorganization when adding
  - no limit on the number of duplicate keys
  - no internal fragmentation
- Solution B: linking the list of references
  - a list of primary key references (e.g. PROKOFIEV -> ANG36193 -> LON2312)
  - each secondary index record holds the secondary key field and the relative record number of the first corresponding primary key reference

41 Linking List of References (1) [7.8 Improving the Secondary Index Structure]

Improved revision of the composer index

Secondary index file
Rec   Secondary key          First reference
0     BEETHOVEN              3
1     COREA                  2
2     DVORAK                 7
3     PROKOFIEV              10
4     RIMSKY-KORSAKOV        6
5     SPRINGSTEEN            4
6     SWEET HONEY IN THE R   9

Label ID list file
Rec   Label ID    Next reference
0     LON2312     (none)
1     RCA2626     (none)
2     WAR23699    (none)
3     ANG3795     8
4     COL38358    (none)
5     DG18807     1
6     MER75016    (none)
7     COL31809    (none)
8     DG139201    5
9     FF245       (none)
10    ANG36193    0

42 Linking List of References (2) [7.8 Improving the Secondary Index Structure]
- The primary key references are kept in a separate, entry-sequenced file
- Advantages
  - the secondary index is rearranged only when a secondary key is added or changed
  - rearrangement is quick
  - less penalty for keeping the secondary index file on secondary storage (less need for sorting)
  - the Label ID list file does not need to be sorted
  - reusing the space of deleted records is easy
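
A linked list of references can be modelled with two containers: one for the secondary index and one for the entry-sequenced list file. This is a simplified sketch, not the book's implementation: `InvertedIndex` and `ListNode` are hypothetical names, -1 as the end-of-list marker is an assumption, and new references are linked at the head of a list instead of being kept in primary-key order.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One record of the Label ID list file: a primary key plus the relative
// record number of the next reference for the same secondary key.
struct ListNode {
    std::string labelId;
    int next;   // -1 marks the end of a list (assumed convention)
};

struct InvertedIndex {
    std::map<std::string, int> heads;  // secondary key -> first list record
    std::vector<ListNode> list;        // entry-sequenced Label ID list file

    // Append a reference at the end of the list file and link it in as the
    // new head for its key. No existing list record is moved.
    void addRef(const std::string& secondaryKey, const std::string& labelId) {
        auto it = heads.find(secondaryKey);
        int oldHead = (it == heads.end()) ? -1 : it->second;
        list.push_back(ListNode{labelId, oldHead});
        heads[secondaryKey] = static_cast<int>(list.size()) - 1;
    }

    // Follow the links to collect every primary key for a secondary key.
    std::vector<std::string> refsFor(const std::string& secondaryKey) const {
        std::vector<std::string> out;
        auto it = heads.find(secondaryKey);
        int rrn = (it == heads.end()) ? -1 : it->second;
        while (rrn != -1) {
            const ListNode& node = list[static_cast<std::size_t>(rrn)];
            out.push_back(node.labelId);
            rrn = node.next;
        }
        return out;
    }
};
```

Adding a recording for an existing composer appends one list record and updates one head pointer, so neither file has to be re-sorted.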

43 Linking List of References (3) [7.8 Improving the Secondary Index Structure]
- Disadvantage
  - references for the same secondary key may not be physically grouped
    - lack of locality
    - following a list could involve a large amount of seeking
  - solution: keep the Label ID list file in memory
    - the same Label ID list file can hold the lists of a number of secondary index files
    - if it is too large for memory, only a part of it can be loaded

44 Selective Indexes [7.9 Selective Indexes]
- Selective index: an index on a subset of the records
- A selective index contains only some part of the entire index
  - provides a selective view
  - useful when the contents of a file fall into several categories
  - e.g. 20 < Age < 30 and $1000 < Salary
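
A selective index is an ordinary index built through a filter. The sketch below applies the slide's example condition; the `Employee` layout and the function names are hypothetical, and a position in a vector stands in for a record address.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Employee {   // hypothetical record layout
    std::string id;
    int age;
    double salary;
};

// Selection rule from the example: 20 < Age < 30 and $1000 < Salary.
bool selected(const Employee& e) {
    return e.age > 20 && e.age < 30 && e.salary > 1000.0;
}

// Index only the selected records: key -> position in the data vector.
std::map<std::string, std::size_t>
buildSelectiveIndex(const std::vector<Employee>& data) {
    std::map<std::string, std::size_t> index;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (selected(data[i])) index[data[i].id] = i;
    }
    return index;
}
```

Records outside the category simply never enter the index, so searches through it see only the selected view of the file.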

45 Index Binding (1) [7.10 Binding]
- When should a key index be bound to the physical address of its associated record?
  1. File construction time binding (tight, in-the-data binding)
     - tight binding and faster access
     - the case of the primary key
     - when a secondary key is also bound at that time
       - simpler and faster retrieval
       - reorganization of the data file requires modifying all bound index files

46 Index Binding (2) [7.10 Binding]
  2. Postpone binding until a record is actually retrieved (retrieval-time binding)
     - minimal reorganization and a safe approach
     - mostly for secondary keys
- Tight, in-the-data binding is good when
  - the data file is static, with little or no change
  - rapid performance during retrieval is required
  - e.g. mass-produced, read-only optical disks

47 Let's Review (1)
7.1 What Is an Index?
7.2 A Simple Index for Entry-Sequenced Files
7.3 Using Template Classes in C++ for Object I/O
7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects
7.5 Indexes That Are Too Large to Hold in Memory

48 Let's Review (2)
7.6 Indexing to Provide Access by Multiple Keys
7.7 Retrieval Using Combinations of Secondary Keys
7.8 Improving the Secondary Index Structure: Inverted Lists
7.9 Selective Indexes
7.10 Binding

