File Processing and Data Management Concepts

Chapter 12

Learning Objective 1 Define the basic terms used in database technology.

The terms field, data item, attribute, and element are used interchangeably to denote the smallest block of data that will be stored and retrieved.

A field may be a single character or number, or it may be composed of many characters or numbers. Examples: customer name, employee social security number, purchase order number, customer account number.

Logical groupings of fields are called records. Examples: an employee, a customer, a vendor, an invoice.

A record occurrence is a specific set of data values for the record.

For example, we might have the occurrence EMPLOYEE (Brown, , 33).

In a fixed-length record, both the number of fields and the length (character size) of each field are fixed. In variable-length records, the width of the field can be adjusted to each data occurrence. A trailer record is an extension of a master record.
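The difference between fixed-length and variable-length records can be sketched in a few lines of Python. This is only an illustration: the byte layout (a 10-byte name, a 4-byte number, a 2-byte age), the one-byte length prefix, and the employee number 1001 are all assumptions made for the example, not part of the chapter.

```python
import struct

# Hypothetical fixed-length layout: 10-byte name, 4-byte number, 2-byte age.
# "<" means little-endian with no alignment padding, so every record is 16 bytes.
FIXED_LAYOUT = struct.Struct("<10sIH")


def pack_fixed(name, number, age):
    # struct pads a short name with null bytes and truncates a long one,
    # so the record length never changes.
    return FIXED_LAYOUT.pack(name.encode("ascii"), number, age)


def unpack_fixed(raw):
    name, number, age = FIXED_LAYOUT.unpack(raw)
    return name.rstrip(b"\x00").decode("ascii"), number, age


def pack_variable(name, number, age):
    # Variable-length form: one length byte, then only as many name bytes
    # as this particular occurrence needs, then the numeric fields.
    encoded = name.encode("ascii")
    return struct.pack("<B", len(encoded)) + encoded + struct.pack("<IH", number, age)


def unpack_variable(raw):
    length = raw[0]
    name = raw[1:1 + length].decode("ascii")
    number, age = struct.unpack("<IH", raw[1 + length:1 + length + 6])
    return name, number, age


fixed = pack_fixed("Brown", 1001, 33)        # 1001 is a made-up employee number
variable = pack_variable("Brown", 1001, 33)
```

In this sketch every fixed-length record occupies 16 bytes regardless of the name, while the variable-length record for "Brown" occupies 12 bytes and a shorter name would occupy fewer.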

Example fields: PART_NO, PNAME, TYPE, COST, PVEND (the name of the vendor or supplier), WARHSE (where the part is stored), LOC (the last two digits of the zip code).

Figure: One Storage Location Example

Segments, groups, and nodes are terms for related groups of fields that repeat themselves in variable-length records. PART, SUPPLIER, and LOCATION can be written as follows: PART (PART_NO, PNAME, TYPE, COST)

Figure: Tree Diagram for PART, SUPPLIER, and LOCATION

A key or record key is a data item or combination of data items that uniquely identifies a particular record in a file. Related terms: primary sort key, secondary sort key, tertiary sort keys, and relative random order.
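A short Python sketch may help show how primary, secondary, and tertiary sort keys interact. The invoice records and field names below are hypothetical; `invoice_no` plays the role of the record key because it is unique in this made-up file.

```python
from operator import itemgetter

# Hypothetical invoice records (all values are invented for illustration).
invoices = [
    {"region": "East", "customer": "Brown", "invoice_no": 103},
    {"region": "West", "customer": "Adams", "invoice_no": 101},
    {"region": "East", "customer": "Adams", "invoice_no": 105},
    {"region": "East", "customer": "Adams", "invoice_no": 102},
]

# Primary sort key: region. Secondary: customer. Tertiary: invoice_no.
# Ties on an earlier key are broken by the next key in the tuple.
ordered = sorted(invoices, key=itemgetter("region", "customer", "invoice_no"))
ordered_numbers = [row["invoice_no"] for row in ordered]

# A record key must identify exactly one record, so its values must be unique.
key_is_unique = len({row["invoice_no"] for row in invoices}) == len(invoices)
```

Here the two East/Adams invoices are adjacent after sorting, and the tertiary key decides which of them comes first.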

Learning Objective 2 Identify the three levels of database architecture.

Conceptual level: database contents, uses of the database, desired reports, information to be reviewed.

Logical level: logical data structures (tree or hierarchical, network, relational).

Physical level: access methods (sequential, indexed-sequential, direct).

The Entity-Relationship (E-R) data model is a conceptual model for depicting the relationships between segments in a database. The E-R model uses the term "entity" instead of segment, and "attribute" refers to individual fields or data items.

Conceptual Architecture
The object-oriented modeling technique (OMT) views the components of the system being modeled as object classes. An object class corresponds to a segment, and an object corresponds to a particular instance. Inheritance is a further concept in this technique.

Figure: Example of Object-Oriented Data Modeling Technique

Learning Objective 3 Compare and contrast the different logical models of databases.

The relationships that exist between the segments in the database are determined by the logical data structure, also called the schema or database model.

What are the three major models of logical data structure?
1. Tree or hierarchical structures
2. Network structures
3. Relational models

Figure: Tree (hierarchical) model with 4 levels and 13 nodes, labeled A through M.

Figure: Network model with 3 levels and 11 nodes, labeled A through K.

Both trees and networks are implemented with embedded pointer fields.

Implementing Tree and Network Structures
In a list organization, each record contains one or more pointers (fields) indicating the address of the next logical record with the same attribute(s). A ring structure differs from a list in that the last record in the ring list points back to the first record.
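The list and ring organizations described above can be sketched as follows. The record addresses, the `color` attribute, and the specific pointer values are assumptions chosen for illustration; only the mechanism (an embedded pointer to the next record, with the ring closing back on the first record) comes from the text.

```python
END_OF_LIST = None  # end-of-list indicator

# Hypothetical list organization: address -> record with an embedded pointer field.
list_records = {
    1: {"color": "Red", "next": 3},
    2: {"color": "Blue", "next": 4},
    3: {"color": "Red", "next": 5},
    4: {"color": "Blue", "next": END_OF_LIST},
    5: {"color": "Red", "next": END_OF_LIST},
}
# The index holds the location of the first record for each attribute value.
index = {"Red": 1, "Blue": 2}

# Hypothetical ring: the last record points back to the first one.
ring_records = {
    1: {"next": 3},
    3: {"next": 5},
    5: {"next": 1},
}


def walk_list(records, first):
    """Follow pointer fields until the end-of-list indicator is reached."""
    addresses = []
    address = first
    while address is not END_OF_LIST:
        addresses.append(address)
        address = records[address]["next"]
    return addresses


def walk_ring(records, start):
    """Follow pointer fields until the walk returns to its starting record."""
    addresses = [start]
    address = records[start]["next"]
    while address != start:
        addresses.append(address)
        address = records[address]["next"]
    return addresses


red_chain = walk_list(list_records, index["Red"])
ring_from_3 = walk_ring(ring_records, 3)
```

One consequence visible in the sketch: a list must be entered at its first record to see every member, while a ring can be entered at any record and still reach all the others.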

What is a multiple ring structure? In this type of structure several rings pass through individual records.

Figure: List Structure (labels: index giving the location of the first record for each attribute, Red and Blue; records 1 to 5; pointer field to next record; end-of-list indicator).

Figure: Ring Structure (labels: index giving the location of the first record for each attribute, Va and Ky; records 1 to 5; pointer field to next record; pointer field to first record).

What is the relational model? It is a logical data structure that views the database as a collection of two-dimensional tables. There are no complicated pointers or lists.
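As a rough illustration of tables related by shared values rather than by pointers, the sketch below models two tables as lists of rows and implements three relational-algebra style operations. The column names PART_NO, PNAME, COST, and PVEND follow the part example earlier in the chapter; the SUPPLIER table, its CITY column, and every data value are hypothetical.

```python
# Each table is a list of rows; each row maps a column name to a value.
part = [
    {"PART_NO": 1, "PNAME": "Bolt", "COST": 0.10, "PVEND": "Acme"},
    {"PART_NO": 2, "PNAME": "Gear", "COST": 4.50, "PVEND": "Zenith"},
    {"PART_NO": 3, "PNAME": "Nut", "COST": 0.05, "PVEND": "Acme"},
]
supplier = [
    {"PVEND": "Acme", "CITY": "Dayton"},
    {"PVEND": "Zenith", "CITY": "Toledo"},
]


def select(table, predicate):
    """Keep only the rows that satisfy the predicate."""
    return [row for row in table if predicate(row)]


def project(table, columns):
    """Keep only the named columns (this sketch does not remove duplicate rows)."""
    return [{column: row[column] for column in columns} for row in table]


def join(left, right, column):
    """Pair rows whose values match in the shared column; no pointers involved."""
    return [{**l, **r} for l in left for r in right if l[column] == r[column]]


joined = join(part, supplier, "PVEND")
acme_parts = project(select(joined, lambda row: row["PVEND"] == "Acme"), ["PNAME", "CITY"])
```

The rows are linked only because both tables carry the same PVEND value, which is the sense in which the relational model has no complicated pointers or lists.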

Key concepts: relational algebra, normal forms, and normalization.

What are the three normal forms? First normal form, second normal form, and third normal form.

Learning Objective 4 Explain the different methods of accessing files.

Files may be sequentially accessed, indexed, or directly accessed.

In a sequential access file, records can only be accessed in their predefined sequence. Sequential file organization is useful when batch processing is required.

In an indexed file, one or more fields have been extracted from the records and used to build a new file whose purpose is to provide an index to the original file. One important type of indexed file is an indexed-sequential file.

An indexed-sequential file is a file that is stored on a direct-access storage device (DASD) and is both indexed and physically sorted on the same field. These files are frequently referred to as ISAM (indexed-sequential access method) files.

The structure of an ISAM file consists of three distinct areas: the index, the prime area, and the overflow area. How would a computer locate a file record whose key is 1002?

Figure: Structure of an ISAM File. Master index: highest key 1500, track index address 0300. Track index (at track address 0300): highest key on track 1005, track address 0301. Prime area (at track address 0301): key 1002, record found.
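The lookup for key 1002 can be traced with a small sketch. It assumes that each index is a sorted list of (highest key, address) pairs and that a search picks the first entry whose highest key is greater than or equal to the search key. The index entries come from the ISAM figure; the record contents are a placeholder, and the overflow area is left out for brevity.

```python
from bisect import bisect_left

# Values taken from the figure; the data string is a placeholder.
master_index = [(1500, "0300")]                      # highest key -> track index address
track_indexes = {"0300": [(1005, "0301")]}           # highest key on track -> track address
prime_area = {"0301": {1002: "data for key 1002"}}   # track address -> records on that track


def find_entry(index, key):
    """Return the address of the first entry whose highest key is >= key, else None."""
    highest_keys = [highest for highest, _ in index]
    position = bisect_left(highest_keys, key)
    return index[position][1] if position < len(index) else None


def isam_lookup(key):
    # Step 1: the master index points to the right track index.
    track_index_address = find_entry(master_index, key)
    if track_index_address is None:
        return None
    # Step 2: the track index points to the right track in the prime area.
    track_address = find_entry(track_indexes[track_index_address], key)
    if track_address is None:
        return None
    # Step 3: search that track for the key (a real system would also check overflow).
    return prime_area[track_address].get(key)


found = isam_lookup(1002)
```

For key 1002 the search goes from the master index (1002 is at most 1500, so track index 0300) to the track index (1002 is at most 1005, so track 0301) to the prime area, where the record is found.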

Direct-access files allow individual records to be almost instantly retrieved without the use of an index. Each record is assigned to a storage location that bears some relationship to the record’s key values. Most direct-access file systems convert a key to a storage location address.

Processing logic flowchart: data records pass through a randomizing computation (÷ 7); the remainder is added to the displacement address (10) to give a location in the file storage area.

File loading illustration:

Key   Remainder after division by seven   + Displacement factor (initial address of file area)   = Record storage address
15    1                                     10                                                      11
17    3                                     10                                                      13
11    4                                     10                                                      14
22    1                                     10                                                      Overflow

Storage area contents after loading: addresses 10 through 18, covering the range of the randomizing computation and the storage allocated for overflow records. Contents: Record 1 (KEY 15*), Record 2 (KEY 17), Record 3 (KEY 11), Record 4 (KEY 22). The asterisk is the overflow indicator.
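The loading example can be reproduced with a short sketch of the division-remainder method. The divisor 7, the displacement 10, and the keys 15, 17, 11, and 22 come from the illustration. Treating addresses 17 and 18 as the overflow area, and filling that area in order, is an assumption: remainders 0 through 6 map to addresses 10 through 16, which leaves 17 and 18 within the 10 to 18 storage range.

```python
DIVISOR = 7
DISPLACEMENT = 10               # initial address of the file area
OVERFLOW_ADDRESSES = [17, 18]   # assumed overflow area, just past the hashed range


def home_address(key):
    """Randomizing computation: remainder after division, plus the displacement."""
    return key % DIVISOR + DISPLACEMENT


def load(keys):
    storage = {}                 # address -> key stored there
    overflow_flags = set()       # home addresses carrying an overflow indicator
    free_overflow = list(OVERFLOW_ADDRESSES)
    for key in keys:
        address = home_address(key)
        if address in storage:
            # Collision: flag the home address and use the next free overflow slot.
            if not free_overflow:
                raise RuntimeError("overflow area is full")
            overflow_flags.add(address)
            address = free_overflow.pop(0)
        storage[address] = key
    return storage, overflow_flags


storage, overflow_flags = load([15, 17, 11, 22])
```

Keys 15 and 22 both leave a remainder of 1, so they compete for address 11. In this sketch the first to arrive keeps the home address and the second is placed in the overflow area, which corresponds to the asterisk on KEY 15 in the illustration.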

The basic economics of file processing are largely determined by the activity ratio. What is the activity ratio? It is the number of accessed records divided by the number of records in the file. The second economic consideration concerns response time.

What is response time? It is the length of time the user must wait for the system to complete an operation. Response time is affected by the physical access time. Another factor that can affect response time is how data records are physically distributed on the disk.

Learning Objectives: Explain the benefits of database management systems. Describe the considerations that are appropriate to the design of computer-based files and databases.

Database Management Systems (DBMS) are computer programs that enable a user to create and update files, to select and retrieve data, and to generate various outputs and reports. All DBMS contain three common attributes for managing and organizing data.

What are these attributes? Data description language (DDL), data manipulation language (DML), and data query language (DQL).
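The three roles can be seen side by side in a minimal sketch that uses SQL through Python's built-in `sqlite3` module. SQLite is only a convenient stand-in here, not a system named in the chapter, and in SQL a single language covers all three roles. The table follows the PART example; the column name `part_type` and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Data description (DDL): define the structure of the data.
conn.execute(
    "CREATE TABLE part ("
    "part_no INTEGER PRIMARY KEY, pname TEXT NOT NULL, part_type TEXT, cost REAL)"
)

# Data manipulation (DML): create and update the stored records.
conn.executemany(
    "INSERT INTO part (part_no, pname, part_type, cost) VALUES (?, ?, ?, ?)",
    [(1, "Bolt", "HW", 0.10), (2, "Gear", "MECH", 4.50)],
)
conn.execute("UPDATE part SET cost = ? WHERE part_no = ?", (5.00, 2))

# Data query (DQL): select and retrieve data for output and reports.
rows = conn.execute(
    "SELECT part_no, pname, cost FROM part WHERE cost > ? ORDER BY part_no",
    (1.0,),
).fetchall()

conn.close()
```

After the update, the query returns only the Gear row, since it is the only part costing more than 1.00.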

DBMS integrate, standardize, and provide security for various accounting applications. In the absence of integration, each type of accounting application such as sales, payroll, and receivables will maintain separate, independent data files and computer programs.

Independent files: Application One maintains its own file holding data items X, Y, A, and B; Application Two maintains its own file holding data items X, Y, C, and D.

Figure: Database system (labels: database holding data items X, Y, A, B, C, and D; database dictionary and access codes; data manipulation routines; logical file 1, application one; logical file 2, application two; logical file 3, security-screened inquiry file).

Database dictionaries are used both alone and with DBMS to centralize, document, control, and coordinate the use of data within an organization. The data dictionary is simply another file, a sort of file of files, whose record occurrences consist of data item descriptions.

Items in a data dictionary occurrence:
Specifications: name, definition, aliases
Characteristics: size, range of values, encoding, editing data
Utilization: owner, where used, security code, last update

End of Chapter 12

