File Organization, Chapter 4: Data Management in Files
Prof. Yousef B. Mahdy, 2012-2013, Assuit University, Egypt

1-2 - Prof Yousef B. Mahdy- 10/3/2015 File Organization

Data Representation in Memory
• Persistent: retained after execution of the program that created it.
  » When we build file structures, we make it possible for data to persist. One program can store data from memory to a file and terminate. Later, another program can retrieve the data from the file and process it in memory.
• The basic logical unit of data is the field, which contains a single data value.
• Fields are organized into aggregates: either many copies of a single field (an array) or a list of different fields (a record).
• When a record is stored in memory, we refer to it as an object and to its fields as members.

• When a record is stored in a file, we call it simply a record.
• In this chapter, we look at file structures that can be used to organize the data within a file, and at the algorithms that can be used to store and retrieve the data sequentially.

• Record: a subdivision of a file, containing data related to a single entity.
• Field: a subdivision of a record, containing a single attribute of the entity that the record describes.
• Stream of bytes: a file regarded as having no structure beyond being a sequence of bytes.
• Key: a subset of the fields in a record used to identify (uniquely) the record.
• Within a program, data is temporarily stored in variables. Individual values can be aggregated into structures, which can be treated as a single variable with parts. In C++, classes are typically used as aggregate structures.

• C++ Person class:

    class Person {
    public:
        char FirstName[11];  // each size is the maximum length + 1 byte for '\0'
        char LastName[11];
        char Address[21];
        char City[21];
        char State[3];
        char ZIP[6];         // 5 digits + '\0'
    };

• With this class declaration, variables can be declared to be of type Person. An individual field of a Person is referred to by the name of the variable and the name of the field, separated by a period (.).

• In memory, each Person appears as an aggregate, with the individual values as parts of the aggregate: [figure not preserved]
• The output of this program will be: [figure not preserved]
• Obviously, this output could be improved. It is only marginally readable by people, and it would be difficult to program a computer to read and correctly interpret it.

• In stream files, the information is written as a stream of bytes containing no added information: [figure not preserved]
• Problem: there is no way to get the information back in the organized record format.
• The question: when we write records, how do we organize the fields in the records
  » so that the information can be recovered,
  » so that we save space,
  » so that we can process them efficiently, and
  » so that record structure flexibility is maximized?
• We must add structure to the file to maintain the identity of fields.
• The simplest representation is a file organized as a stream of bytes. It is simple, but it has the reverse Humpty-Dumpty problem:
  » if all the information is written as one stream of bytes, there is no way to take it apart again.
  » Solution: use a field structure.
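Here is a minimal C++ sketch of the stream-of-bytes problem. It uses a hypothetical PersonFields struct with three string fields, which is not the Person class above. The fields are written back to back with nothing between them, so two different records can produce identical bytes.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical record with three fields, used only to illustrate the problem.
struct PersonFields {
    std::string lastName;
    std::string firstName;
    std::string city;
};

// Stream-of-bytes output: the fields are written back to back with no
// length, delimiter, or other added information.
void writeAsStream(std::ostream& out, const PersonFields& p) {
    out << p.lastName << p.firstName << p.city;
}

// Convenience wrapper that returns the bytes one record would produce.
std::string toStream(const PersonFields& p) {
    std::ostringstream out;
    writeAsStream(out, p);
    return out.str();
}
```

For example, {"Mason", "Alan", "Ames"} and {"Mas", "onAlan", "Ames"} both produce MasonAlanAmes. A reader cannot tell where one field ends and the next begins.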

Data Management in Files
• The study of file structures has two main concerns. The first is devising better ways to organize data in files; the second is designing methods to access that data. This part deals with the first concern, and all subsequent parts deal with the second. The method used to organize data in a file affects the methods designed to access it.

Organizational Hierarchy of Data in Files
• A file is a collection of records, and a record contains related data items called fields. Therefore, to organize data in files, both fields and records have to be organized.
• Key: a subset of the fields in a record used to identify (usually uniquely) the record.
• Consider storing student records in a file with fields such as University Seat Number (USN), Name, Branch and Semester. The different methods for organizing this file are discussed in the following part.

Field Structures
• The first method of organizing fields is to limit each field to a fixed maximum size (Fig. a). This is called fixed-length fields. Because the size of each field is fixed, an entire field can be read at once.
• The disadvantage is that space is wasted whenever a stored value is shorter than the space reserved for it.
• This is a good method for organizing fields if their sizes are known well in advance.
• The main issues to address when organizing fields are how the application in hand uses each field and how to distinguish one field from the next.
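Fixed-length fields can be sketched as follows. The sketch assumes space padding and widths chosen by the caller; writeFixedField and readFixedField are hypothetical names. Short values are padded and long values are truncated. Each field is read in a single call because its width is known in advance.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream, handy for trying the functions without a file
#include <string>

// Write exactly `width` bytes. Padding with spaces is an assumption;
// some designs pad with '\0' instead.
void writeFixedField(std::ostream& out, const std::string& value, std::size_t width) {
    std::string field = value.substr(0, width);  // truncate values that are too long
    field.resize(width, ' ');                    // pad short values with spaces
    out.write(field.data(), static_cast<std::streamsize>(width));
}

// Because the width is known in advance, the whole field is read in one call.
std::string readFixedField(std::istream& in, std::size_t width) {
    std::string field(width, '\0');
    in.read(&field[0], static_cast<std::streamsize>(width));
    field.resize(static_cast<std::size_t>(in.gcount()));
    // Strip the padding added by writeFixedField.
    std::size_t end = field.find_last_not_of(' ');
    return end == std::string::npos ? std::string() : field.substr(0, end + 1);
}
```

Note that stripping trailing spaces also removes any spaces that were genuinely part of the value. This is one of the trade-offs of padding with a printable character.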

• Several methods can be used to distinguish one field from the next.
• In one method, known as length-indicator fields, the length of each field is stored as a prefix to the actual data (Fig. b).
• Another method places a separator between fields. In the previous method, the length prefix plays the role of the separator. Any special character that is not part of the actual data can be used as the separator (Fig. c). This method is known as delimited fields.
• In the self-describing fields method, shown in Fig. 1d, every field is preceded by metadata describing the data that follows it.
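The length-indicator and delimited methods might look like this in C++. The sketch assumes that the length indicator is two ASCII decimal digits, which limits a field to 99 bytes, and that '|' is the delimiter. The function names are illustrative.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream, handy for trying the functions without a file
#include <string>

// Length-indicator field: a two-digit length prefix, then the data ("04Ames").
bool writeLengthField(std::ostream& out, const std::string& value) {
    if (value.size() > 99) return false;  // does not fit in two digits
    if (value.size() < 10) out << '0';    // pad the prefix to two digits
    out << value.size() << value;
    return true;
}

std::string readLengthField(std::istream& in) {
    char digits[2];
    if (!in.read(digits, 2)) return std::string();  // end of file
    // No validation: the two bytes are assumed to be decimal digits.
    std::size_t len = static_cast<std::size_t>((digits[0] - '0') * 10 + (digits[1] - '0'));
    std::string value(len, '\0');
    in.read(&value[0], static_cast<std::streamsize>(len));
    return value;
}

// Delimited field: the data followed by a separator that never occurs in the data.
void writeDelimitedField(std::ostream& out, const std::string& value, char delim = '|') {
    out << value << delim;
}

std::string readDelimitedField(std::istream& in, char delim = '|') {
    std::string value;
    std::getline(in, value, delim);  // consumes the delimiter but does not store it
    return value;
}
```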

Representing Record or Field Length
• Record or field length can be represented in either binary or character form. The length can be considered another hidden field within the record.
• This length field can be either fixed-length or delimited.
• When character form is used, a space can be used to delimit the length field.
• A two-byte fixed-length field could be used to hold lengths of 0 to 65535 bytes in binary form.
• A two-byte fixed-length field could be used to hold lengths of 0 to 99 bytes in decimal character form.
• A variable-length field delimited by a space could be used to hold effectively any length.
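The three length representations can be compared in a small sketch. It assumes the binary prefix is stored most significant byte first; the slide does not specify a byte order. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Binary form: two bytes hold 0..65535 (most significant byte first).
std::string encodeLengthBinary(std::uint16_t len) {
    std::string out(2, '\0');
    out[0] = static_cast<char>((len >> 8) & 0xFF);
    out[1] = static_cast<char>(len & 0xFF);
    return out;
}

std::uint16_t decodeLengthBinary(const std::string& prefix) {
    return static_cast<std::uint16_t>(
        (static_cast<unsigned char>(prefix[0]) << 8) | static_cast<unsigned char>(prefix[1]));
}

// Fixed character form: two ASCII decimal digits hold only 0..99 (assumes len <= 99).
std::string encodeLengthChars(unsigned len) {
    std::string out(2, '0');
    out[0] = static_cast<char>('0' + len / 10);
    out[1] = static_cast<char>('0' + len % 10);
    return out;
}

unsigned decodeLengthChars(const std::string& prefix) {
    return (prefix[0] - '0') * 10u + (prefix[1] - '0');
}

// Delimited character form: decimal digits ended by a space, effectively any length.
std::string encodeLengthDelimited(std::size_t len) {
    return std::to_string(len) + ' ';
}
```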

• Self-describing fields help in understanding the meaning of the data. In the previous methods, all the fields had to be stored in a particular order; with self-describing fields, they can be stored in any order. The application program can interpret the data because each value is preceded by metadata.
• Having seen different methods of organizing fields, we now turn to organizing records. Records are structurally similar to fields, so most of the methods used for organizing fields can also be used for records.
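Self-describing fields are often written as keyword=value pairs. Here is a minimal sketch that assumes this layout, with '|' after each pair. That layout is an illustrative choice, not the format in the figure. The parser stores each value under its keyword, so the order of the fields in the record does not matter.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse a record such as "last=Ames|first=Mary|city=Stillwater|".
std::map<std::string, std::string> parseSelfDescribing(const std::string& record) {
    std::map<std::string, std::string> fields;
    std::istringstream in(record);
    std::string pair;
    while (std::getline(in, pair, '|')) {
        std::string::size_type eq = pair.find('=');
        if (eq == std::string::npos) continue;  // skip malformed pairs
        fields[pair.substr(0, eq)] = pair.substr(eq + 1);  // keyword -> value
    }
    return fields;
}
```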

Comparison (figure not preserved)

Record Structures
• The first method is the fixed-length record structure, in which every record occupies the same fixed amount of space (Fig. a). The size can be determined by adding the maximum space occupied by each field plus some space reserved for header data. Although the size of the whole record is fixed, the fields inside it can be of varying or fixed size.
• The second method is a simple variant of the first. Where the first method fixes the length of each record, this one fixes the number of fields in each record (Fig. b). This fixed-field-count structure is helpful because it combines the flexibility of any field structure with the ability to read a complete record, since the number of fields in each record is known a priori.
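A fixed-field-count record can be read without any record delimiter, because the reader simply collects the known number of fields. The sketch assumes '|'-delimited fields and uses a hypothetical readFixedCountRecord helper.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the function without a file
#include <string>
#include <vector>

// Every record has exactly `fieldCount` fields, so reading that many
// delimited fields yields one complete record.
bool readFixedCountRecord(std::istream& in, std::size_t fieldCount,
                          std::vector<std::string>& fields) {
    fields.clear();
    std::string value;
    for (std::size_t i = 0; i < fieldCount; ++i) {
        if (!std::getline(in, value, '|')) return false;  // ran out of data
        fields.push_back(value);
    }
    return true;
}
```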

• The third method is the familiar technique of specifying the length of each record (Fig. c). A length-indicator record has the same advantages discussed earlier for the corresponding field structure.
• The delimited record structure places a separator between records (Fig. d).
• The final organization creates an index structure for the records (Fig. e). An index is a collection of entries, each pairing a key field with a reference field. The key field is a member of the record that uniquely identifies it. The reference field holds the address of the corresponding record in the file.

Fixed Length Record
• A record that is predetermined to be the same length as the other records in the file.

Examples (figure not preserved)

• Advantage: the offset of each record can be calculated from its record number. This makes direct access possible.
• Advantage: there is no space overhead for delimiters or length prefixes.
• Disadvantage: there will probably be internal fragmentation (unusable space within records).

Algorithms for Fixed Length Records
• Reading:
  » While the number of characters read is less than the record length, read a character into the next element of the array.
• Writing:
  » While the number of characters written is less than the record length, write a character from the next element of the array.
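The two algorithms translate almost line for line into C++. This sketch keeps the character-at-a-time loops of the description; in practice, istream::read and ostream::write move the whole record in one call. The function names are hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream, handy for trying the functions without a file

// Reading: while fewer than recLen characters have been read, read the next one.
bool readFixedRecord(std::istream& in, char* record, std::size_t recLen) {
    std::size_t count = 0;
    char c;
    while (count < recLen && in.get(c)) {  // stops early only at end of file
        record[count++] = c;
    }
    return count == recLen;  // false if the file ended mid-record
}

// Writing: while fewer than recLen characters have been written, write the next one.
void writeFixedRecord(std::ostream& out, const char* record, std::size_t recLen) {
    for (std::size_t count = 0; count < recLen; ++count) {
        out.put(record[count]);
    }
}
```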

Delimited Variable Length Records
• Variable-length record:
  » A record that can differ in length from the other records of the file.
• Delimited record:
  » A variable-length record that is terminated by a special character or sequence of characters.
• Delimiter:
  » A special character or group of characters stored after a field or record, which indicates the end of the preceding unit.

• Disadvantage: the offset of each record cannot be calculated from its record number. This makes direct access impossible.
• Disadvantage: there is space overhead for the delimiter suffix.
• Advantage: there will probably be no internal fragmentation (unusable space within records).

Algorithms for Delimited Variable Length Records
• Reading:
  » While the last character read is not the delimiter, read a character into the next element of the array.
• Writing:
  » While the number of characters written is less than the record length, write a character from the next element of the array.
  » Write the delimiter.
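Here is a C++ sketch of the delimited-record algorithms. It assumes '#' as the record delimiter, which must never occur in the data, and uses hypothetical function names.

```cpp
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream, handy for trying the functions without a file
#include <string>

const char kRecordDelim = '#';  // assumed record delimiter

// Writing: write every character of the record, then write the delimiter.
void writeDelimitedRecord(std::ostream& out, const std::string& record) {
    out.write(record.data(), static_cast<std::streamsize>(record.size()));
    out.put(kRecordDelim);
}

// Reading: keep reading characters until the delimiter is seen;
// the delimiter itself is not stored.
bool readDelimitedRecord(std::istream& in, std::string& record) {
    record.clear();
    char c;
    while (in.get(c)) {
        if (c == kRecordDelim) return true;
        record += c;
    }
    return false;  // end of file before a delimiter
}
```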

Length Prefixed Variable Length Records
• Disadvantage: the offset of each record cannot be calculated from its record number. This makes direct access impossible.
• Disadvantage: there is space overhead for the length prefix.
• Advantage: there will probably be no internal fragmentation (unusable space within records).

Algorithms for Prefixed Variable Length Records
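Here is a minimal sketch of reading and writing length-prefixed records. It assumes a two-byte binary prefix stored most significant byte first, which limits a record to 65535 bytes. To write a record, write the length and then the data. To read one, read the length and then exactly that many bytes. The names are hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream, handy for trying the functions without a file
#include <string>

// Writing: the two-byte length first, then the record bytes.
bool writePrefixedRecord(std::ostream& out, const std::string& record) {
    if (record.size() > 0xFFFF) return false;  // does not fit in the prefix
    out.put(static_cast<char>((record.size() >> 8) & 0xFF));
    out.put(static_cast<char>(record.size() & 0xFF));
    out.write(record.data(), static_cast<std::streamsize>(record.size()));
    return static_cast<bool>(out);
}

// Reading: the length first, then exactly that many bytes.
bool readPrefixedRecord(std::istream& in, std::string& record) {
    unsigned char prefix[2];
    if (!in.read(reinterpret_cast<char*>(prefix), 2)) return false;  // end of file
    std::size_t len = (static_cast<std::size_t>(prefix[0]) << 8) | prefix[1];
    record.assign(len, '\0');
    if (len == 0) return true;
    return static_cast<bool>(in.read(&record[0], static_cast<std::streamsize>(len)));
}
```

When used with a real file, open it with std::ios::binary so that the prefix bytes are not changed by text-mode translation.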

Indexed Variable Length Records
• An auxiliary file can be used to point to the beginning of each record. In this case, the data records can be contiguous. If the records are contiguous, the only access is through the index file.
• Advantage: the offset of each record is contained in the index and can be looked up from its record number. This makes direct access possible.

• Disadvantage: there is space overhead for the index file.
• Disadvantage: there is time overhead for the index file.
• Advantage: there will probably be no internal fragmentation (unusable space within records).
• The time overhead for accessing the index file can be minimized by reading the entire index file into memory when the files are opened.
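An in-memory index, as suggested above, can be sketched as a vector of byte offsets. The sketch makes two assumptions. First, the data records are '#'-delimited. Second, the index is built by one sequential scan rather than loaded from a separate index file. buildIndex and readByIndex are hypothetical names.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::stringstream, handy for trying the functions without a file
#include <string>
#include <vector>

// Record the byte offset at which each '#'-delimited record starts.
std::vector<std::streamoff> buildIndex(std::istream& data) {
    std::vector<std::streamoff> offsets;
    data.clear();
    data.seekg(0);
    std::string record;
    std::streamoff pos = data.tellg();
    while (std::getline(data, record, '#')) {
        offsets.push_back(pos);  // where this record starts
        pos = data.tellg();      // the next record starts after the delimiter
    }
    return offsets;
}

// Direct access: look up the record number in the index, then seek to it.
bool readByIndex(std::istream& data, const std::vector<std::streamoff>& index,
                 std::size_t rrn, std::string& record) {
    if (rrn >= index.size()) return false;
    data.clear();  // reset the EOF state left by an earlier read
    data.seekg(index[rrn]);
    return static_cast<bool>(std::getline(data, record, '#'));
}
```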

Packing and Buffering

strcat
• strcat, short for "string concatenate", is a C standard library function that C++ programs get from <cstring>. It appends a copy of a source string to a destination string. The null terminator of the destination is overwritten by the first character of the source, and a new null character is appended to the end of the resulting string. The destination array must be large enough to hold the combined string and its terminator.

Example

    #include <cstring>  // std::strcpy, std::strcat

    char string1[80];
    char string2[80];
    char string3[80];
    std::strcpy(string1, "This string ");
    std::strcpy(string2, "Rose");
    std::strcpy(string3, "s smell like old shoes.");
    std::strcat(string1, "is concatenated.");  // string1: "This string is concatenated."
    std::strcat(string2, string3);             // string2: "Roses smell like old shoes."
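One common use of strcat when packing a record is to build the whole record in a character buffer, field by field, so it can be written in one operation. This is a minimal sketch. It assumes '|' as the field delimiter and a hypothetical 100-byte buffer (kBufSize). It checks the remaining space before each append, because strcat itself does no bounds checking.

```cpp
#include <cstring>

const std::size_t kBufSize = 100;  // assumed buffer size

// Pack `count` fields into `buffer`, each followed by '|'.
// Returns false if the packed record would not fit.
bool packFields(char* buffer, const char* const fields[], std::size_t count) {
    buffer[0] = '\0';  // start with an empty string
    for (std::size_t i = 0; i < count; ++i) {
        // +1 for the delimiter, +1 for the terminating null character
        if (std::strlen(buffer) + std::strlen(fields[i]) + 2 > kBufSize) return false;
        std::strcat(buffer, fields[i]);  // append the field
        std::strcat(buffer, "|");        // append the delimiter
    }
    return true;
}
```

For the fields "Ames", "Mary" and "Stillwater", the buffer ends up holding "Ames|Mary|Stillwater|".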

Direct access using RRN
• A relative file, or relative record file, is organized so that each record can be accessed by specifying its position relative to the first record of the file. This is an exact analogue of an array, where an element is accessed by specifying a “subscript”: the position of the element relative to the first element of the array.
• In a relative file access operation, the position value of a record is called the record's relative record number (RRN), or just its record number.

• The record's relative position is based on these parameters:
  » the byte address of the first byte of the first record of the file
  » the size of the records in the file (recsize)
  » the RRN value of the target record
  » the RRN of the first record
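The four parameters combine as: byte offset = first byte + (RRN - first RRN) * recsize. Here is a C++ sketch of that formula and of a direct read with seekg. It assumes fixed-length records and uses hypothetical function names.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the functions without a file
#include <string>

// Byte offset of a record (assumes rrn >= firstRrn). With a first RRN of 0
// and no header bytes, this reduces to rrn * recSize.
std::streamoff recordOffset(std::streamoff firstByte, std::size_t recSize,
                            std::size_t rrn, std::size_t firstRrn) {
    return firstByte + static_cast<std::streamoff>((rrn - firstRrn) * recSize);
}

// Direct access: seek straight to the record instead of reading the ones before it.
bool readRecordByRrn(std::istream& in, std::streamoff firstByte, std::size_t recSize,
                     std::size_t rrn, std::size_t firstRrn, std::string& record) {
    if (rrn < firstRrn) return false;
    record.assign(recSize, '\0');
    in.clear();  // reset any EOF state from an earlier read
    in.seekg(recordOffset(firstByte, recSize, rrn, firstRrn));
    return static_cast<bool>(in.read(&record[0], static_cast<std::streamsize>(recSize)));
}
```

For example, with a 4-byte header, 6-byte records and RRNs starting at 1, record 3 begins at byte 4 + 2 * 6 = 16.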

Example-2
• Write a C++ program to read and write student objects with fixed-length records and the fields delimited by "|".
• Solution:
  » The records are fixed-length but the field lengths vary, so the delimiter "|" is used.
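Here is a sketch of one possible solution. It is not the slide's own code, which is not preserved. Several details are assumptions: a record length of 64 bytes (kRecLen), '#' as the padding character, and Student members taken from the USN, Name, Branch and Semester example. The fields are packed with '|', and the buffer is then padded out to the fixed record length. Reading takes one whole record and unpacks four fields, ignoring the padding.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

const std::size_t kRecLen = 64;  // assumed fixed record length

struct Student {
    std::string usn, name, branch, semester;
};

// Pack the '|'-delimited fields, then pad the buffer to exactly kRecLen bytes.
bool writeStudent(std::ostream& out, const Student& s) {
    std::string buf = s.usn + '|' + s.name + '|' + s.branch + '|' + s.semester + '|';
    if (buf.size() > kRecLen) return false;  // fields too long for one record
    buf.resize(kRecLen, '#');                // '#' padding is an assumption
    out.write(buf.data(), static_cast<std::streamsize>(kRecLen));
    return static_cast<bool>(out);
}

// Read one whole record, then unpack the four delimited fields;
// the padding after the last '|' is ignored.
bool readStudent(std::istream& in, Student& s) {
    std::string buf(kRecLen, '\0');
    if (!in.read(&buf[0], static_cast<std::streamsize>(kRecLen))) return false;
    std::istringstream fields(buf);
    return std::getline(fields, s.usn, '|') && std::getline(fields, s.name, '|') &&
           std::getline(fields, s.branch, '|') && std::getline(fields, s.semester, '|');
}
```

Because every record is exactly kRecLen bytes, record n (counting from 0) starts at byte n * kRecLen, so the same file also supports direct access by RRN.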
