
1 The chapter will address the following questions:
Introduction The chapter will address the following questions: What are the similarities and differences between conventional files and modern, relational databases? What are fields, records, files, and databases? What are some examples of each? What is a modern data architecture that includes files, operational databases, data warehouses, personal databases, and work group databases? What are the similarities and differences between the roles of systems analyst, data administrator, and database administrators as they relate to databases? What is the architecture of a database management system? 392 Data storage is a critical component of most information systems – some people consider it to be the critical component. This chapter teaches the design and construction of physical databases.

2 The chapter will address the following questions:
Introduction The chapter will address the following questions: How does a relational database implement entities, attributes, and relationships from a logical data model? How do you normalize a logical data model to remove impurities that can make a database unstable, inflexible, and non-scaleable? How do you transform a logical data model into a physical, relational database schema? How do you generate SQL code to create the database structures in a schema? 392 No additional notes provided.

3 Conventional Files Versus the Database
Introduction All information systems create, read, update and delete data. This data is stored in files and databases. Files are collections of similar records. Databases are collections of interrelated files. The key word is interrelated. The records in each file must allow for relationships (think of them as ‘pointers’) to the records in other files. In the file environment, data storage is built around the applications that will use the files. In the database environment, applications will be built around the integrated database. 395 A database is not merely a collection of files. The records in each file must allow for relationships (think of them as ‘pointers’) to the records in other files. For example, a SALES database might contain ORDER records that are somehow “linked” to their corresponding CUSTOMER and PRODUCT records. The database is not necessarily dependent on the applications that will use it. In other words, given a database, new applications can be built to share that database. Each environment has its advantages and disadvantages.

4 395-396 Figure 11.1 Conventional Files versus the Database
No additional notes provided.

5 Conventional Files Versus the Database
The Pros and Cons of Conventional Files Pros: Conventional files are relatively easy to design and implement because they are normally based on a single application or information system. Historically, another advantage of conventional files has been processing speed. Cons: Duplication of data items in multiple files is normally cited as the principal disadvantage of file-based systems. A significant disadvantage of files is their inflexibility and non-scaleability. In most organizations, many or most existing information systems and applications are built around conventional files. You may already be familiar with various conventional file organizations (e.g., indexed, hashed, relative, and sequential) and their access methods (e.g., sequential and direct) from a COBOL course. These conventional files will likely be in service for quite some time. Conventional files are relatively easy to design and implement because if you understand the end-user’s output needs for that system, you can easily determine the data that will have to be captured and stored to fulfill those needs and define the best file organization for those requirements. Conventional files can be optimized for the access of a single application. At the same time, they can rarely be optimized for shared use by different tasks in an application, or different applications. Still, files have generally outperformed their database counterparts; however, this limitation of database technology is rapidly disappearing thanks to cheaper and more powerful computers, and more efficient database technology. Conventional files also have numerous disadvantages. Files tend to be built around single applications without regard to other (future) applications. Over time, because many applications have common data needs, the common data elements get stored redundantly in many different systems and files. This duplicate data results in duplicate input, duplicate maintenance, duplicate storage, and possibly data integrity problems (different files showing different values for the same data item). And what happens if the data format needs to change? Consider the problem faced by many firms if all systems must support a nine-digit zip code, or four digit years (to accommodate the year 2000). Do you have any idea how many redundant files would have to be located and changed in a typical organization? Add to this the enormous volume of programs that use these zip code and date fields, and you have some sense of the nightmare that a file structure change can become.

6 Conventional Files Versus the Database
The Pros and Cons of Conventional Files As legacy file-based systems and applications become candidates for reengineering, the trend is overwhelmingly in favor of replacing file-based systems and applications with database systems and applications. 396 Because files are typically designed to support a single application’s current requirements and programs, future needs – such as new reports and queries – often require files to be restructured because the original file structure cannot support the new requirements. But if we elect to restructure those files, all programs using those files would also have to be rewritten. In other words, the current programs have become dependent on the files, and vice versa. This usually makes reorganization impractical; therefore, we elect to create new, redundant files to meet the new requirements. But that exacerbates the aforementioned redundancy problem. Thus, the inflexibility and redundancy problems tend to escalate one another!

7 Conventional Files Versus the Database
The Pros and Cons of Database Pros: The principal advantage of a database is the ability to share the same data across multiple applications and systems. Database technology offers the advantage of storing data in flexible formats. Databases allow the use of the data in ways not originally specified by the end-users - data independence. The database scope can even be extended without impacting existing programs that use it. New fields and record types can be added to the database without affecting current programs. 397 A common misconception about the database approach is that you can build a single, super-database that contains all data items of interest to an organization. This notion, however desirable, is not currently practical. The reality of such a solution is that it would take forever to build such a complex database. Realistically, most organizations build several databases, each one sharing data with several information systems. Thus, there will be some redundancy between databases. However, this redundancy is both greatly reduced and, ultimately, controlled. Database technology offers the advantage of storing data in flexible formats. This is made possible because databases are defined separately from the information systems and application programs that will use them. Theoretically, this allows us to use the data in ways not originally specified by the end-users. Care must be taken to truly achieve this data independence. If the database is well designed, different combinations of the same data can be easily accessed to fulfill future report and query needs. The database scope can even be extended without impacting existing programs that use it. In other words, new fields and record types can be added to the database without affecting current programs.

8 Conventional Files Versus the Database
The Pros and Cons of Database Cons: Database technology is more complex than file technology. Special software, called a database management system (DBMS), is required. A DBMS is still somewhat slower than file technology. Database technology requires a significant investment. The cost of developing databases is higher because analysts and programmers must learn how to use the DBMS. In order to achieve the benefits of database technology, analysts and database specialists must adhere to rigorous design principles. Another potential problem with the database approach is the increased vulnerability inherent in the use of shared data. 397 Another potential problem with the database approach is the increased vulnerability inherent in the use of shared data. You are literally placing all your eggs in one basket. Therefore, backup and recovery, and security and privacy become important issues in the world of databases. Despite the problems discussed, database usage is growing by leaps and bounds. The technology will continue to improve, and performance limitations will all but disappear. Design methods and tools will also improve. For these reasons, this chapter will focus on database design as an important skill for tomorrow’s system analysts.

9 Conventional Files Versus the Database
Database Design in Perspective To fully exploit the advantages of database technology, a database must be carefully designed. The end product is called a database schema, a technical blueprint of the database. Database design translates the data models that were developed for the system users during the definition phase, into data structures supported by the chosen database technology. Subsequent to database design, system builders will construct those data structures using the language and tools of the chosen database technology. 397 No additional notes provided.

10 Figure 11.2 Database Design in the Information Systems Framework No additional notes provided.

11 Database Concepts for the Systems Analyst
Fields Fields are common to both files and databases. A field is the implementation of a data attribute. Fields are the smallest unit of meaningful data to be stored in a file or database. There are four types of fields that can be stored: primary keys, secondary keys, foreign keys, and descriptive fields. Primary keys are fields whose values identify one and only one record in a file. Secondary keys are alternate identifiers for a database. A single file in a database may only have one primary key, but it may have several secondary keys. Primary keys - For example, CUSTOMER NUMBER uniquely identifies a single CUSTOMER record in a database, and ORDER NUMBER uniquely identifies a single ORDER record in a database. A secondary key’s value may identify either a single record (as with a primary key); or identify a subset of all records (such as all ORDERS that have the ORDER STATUS of ‘backordered’).

12 Database Concepts for the Systems Analyst
Fields There are four types of fields that can be stored: primary keys, secondary keys, foreign keys, and descriptive fields. (continued) Foreign keys are pointers to the records of a different file in a database. Foreign keys are how the database ‘links’ the records of one type to those of another type. Descriptive fields are any other fields that store business data. 399 Foreign keys - For example, an ORDER record contains the foreign key CUSTOMER NUMBER to “identify’’ or “point to” the CUSTOMER record that is associated with the ORDER. Notice that a foreign key in one file requires the existence of the corresponding primary key in another table – otherwise, it does not “point” to anything! Thus, the CUSTOMER NUMBER in an ORDERS file requires the existence of a CUSTOMER NUMBER in the CUSTOMERS file in order to link those files. Descriptive fields - For example, given an EMPLOYEES file, some descriptive fields include EMPLOYEE NAME, DATE HIRED, PAY RATE, and YEAR-TO-DATE WAGES.
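To make the four field types concrete, here is a minimal SQL sketch. The table and field names echo the CUSTOMER and ORDER examples above but are otherwise illustrative assumptions, not the textbook's actual case study schema:

  CREATE TABLE CUSTOMERS (
    CUSTOMER_NUMBER  INTEGER      NOT NULL,   -- primary key: identifies one and only one record
    CUSTOMER_NAME    VARCHAR(50),             -- descriptive field
    CREDIT_RATING    CHAR(2),                 -- descriptive field
    PRIMARY KEY (CUSTOMER_NUMBER)
  );

  CREATE TABLE ORDERS (
    ORDER_NUMBER     INTEGER      NOT NULL,   -- primary key
    ORDER_STATUS     CHAR(1),                 -- secondary key: may identify a subset of records
    CUSTOMER_NUMBER  INTEGER      NOT NULL,   -- foreign key: 'points' to the owning CUSTOMERS record
    PRIMARY KEY (ORDER_NUMBER),
    FOREIGN KEY (CUSTOMER_NUMBER) REFERENCES CUSTOMERS (CUSTOMER_NUMBER)
  );

  -- An index supporting the secondary key (e.g., to find all backordered ORDERS quickly)
  CREATE INDEX ORDERS_STATUS_IX ON ORDERS (ORDER_STATUS);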

13 Database Concepts for the Systems Analyst
Records Fields are organized into records. Like fields, records are common to both files and databases. A record is a collection of fields arranged in a predefined format. During systems design, records will be classified as either fixed-length or variable-length records. Most database systems impose a fixed-length record structure, meaning that each record instance has the same fields, same number of fields, and same logical size. Variable-length record structures allow different records in the same file to have different lengths. Database systems typically disallow (or, at least, discourage) variable-length records. For example, a CUSTOMER record may be described by the following fields (notice the common notation): CUSTOMER (NUMBER, LAST_NAME, FIRST_NAME, MIDDLE_INITIAL, POST_OFFICE_BOX_NUMBER, STREET_ADDRESS, CITY, STATE, COUNTRY, POSTAL_CODE, DATE_CREATED, DATE_OF_LAST_ORDER, CREDIT_RATING, CREDIT_LIMIT, BALANCE, BALANCE_PAST_DUE ...) In a fixed-length record structure, some database systems will, however, compress unused fields and values to conserve disk storage space. The database designer must generally understand and specify this compression in the database design. Variable-length record structures - For example, a variable-length ORDER record might contain certain common fields that occur once for every order (e.g., ORDER NUMBER, ORDER DATE, and CUSTOMER NUMBER), but other fields that repeat some number of times based on order size (e.g., PRODUCT NUMBER and QUANTITY ORDERED – which depend on the number of items ordered).

14 Database Concepts for the Systems Analyst
Records When a computer program ‘reads’ a record from a database, it actually retrieves a group or block of records at a time. This approach minimizes the number of actual disk accesses. A blocking factor is the number of logical records included in a single read or write operation (from the computer’s perspective). A block is sometimes called a physical record. Today, the blocking factor is usually determined and optimized by the chosen database technology, but a qualified database expert may be allowed to fine tune that blocking factor for performance. 400 No additional notes provided.

15 Database Concepts for the Systems Analyst
Files and Tables Similar records are organized into groups called files. A file is the set of all occurrences of a given record structure. In database systems, a file corresponds to a set of similar records; usually called a table. A table is the relational database equivalent of a file. Some of the types of files and tables include: Master files or tables contain records that are relatively permanent. Once a record has been added to a master file, it remains in the system indefinitely. The values of fields for the record will change over its lifetime, but the individual records are retained indefinitely. 400 Examples of master files and tables include CUSTOMERS, PRODUCTS, and SUPPLIERS.

16 Database Concepts for the Systems Analyst
Files and Tables Some of the types of files and tables include: (continued) Transaction files or tables contain records that describe business events. The data describing these events normally has a limited useful lifetime. In information systems, transaction records are frequently retained on-line for some period of time. Subsequent to their useful lifetime, they are archived off-line. Document files and tables contain stored copies of historical data for easy retrieval and review without the overhead of re-generating the document. 400 Transaction files - For instance, an INVOICE record is ordinarily useful until the invoice has been paid or written off as uncollectable. Examples of transaction files include ORDERS, INVOICES, REQUISITIONS and REGISTRATIONS.

17 Database Concepts for the Systems Analyst
Files and Tables Some of the types of files and tables include: (continued) Archival files and tables contain master and transaction file records that have been deleted from on-line storage. Records are rarely deleted; they are merely moved from on-line storage to off-line storage. Archival requirements are dictated by government regulation and the need for subsequent audit or analysis. Table look-up files contain relatively static data that can be shared by applications to maintain consistency and improve performance. 400 Table look-up files - Examples include sales tax tables, zip code tables, and income tax tables.

18 Database Concepts for the Systems Analyst
Files and Tables Some of the types of files and tables include: (continued) Audit files are special records of updates to other files, especially master and transaction files. They are used in conjunction with archive files to recover “lost” data. Audit trails are typically built into better database technologies. In the not too distant past, file design methods required the analyst to specify precisely how the records in a database should be sequenced (called file organization) and accessed (called file access). In today’s database environment, the database technology itself usually predetermines and/or limits the file organization for all tables that are contained in the database. Once again, a trained database technology expert may be given some control over organization and storage location for performance tuning.

19 Database Concepts for the Systems Analyst
Databases Databases provide for the technical implementation of entities and relationships. The history of information systems has led to one inescapable conclusion: Data is a resource that must be controlled and managed! Out of necessity, database technology was created so an organization could maintain and use its data as an integrated whole instead of as separate data files. 401 As described earlier, stand-alone, application-specific files were once the lifeblood of most information systems; however, they are being slowly but surely replaced with databases. Recall that a database may loosely be thought of as a set of interrelated files. By interrelated, we mean that records in one file may be associated with the records in a different file. For example, a STUDENT record may be linked to all of that student’s COURSE records. In turn, a COURSE record may be linked to the STUDENT records that indicate completion of that course. This two-way linking and flexibility allows us to eliminate most of the need to redundantly store the same fields in the different record types. Thus, in a very real sense, multiple files are consolidated into a single file – the database. So many applications are now being built around database technology that database design has become an important skill for the analyst. Indeed, database technology, once considered important only to the largest corporations with the largest computers, is now common for applications developed on microcomputers and departmental networks. Few, if any information systems staffs have avoided the frustration of uncontrolled growth and duplication of data stored in their systems. As systems were developed, implemented, and maintained, the common data needed by the different systems was duplicated in multiple, conventional files. This duplication carried with it a number of costs: extra storage space required, duplicated input to maintain redundantly stored data and files, and data integrity problems (e.g., the ADDRESS for a specific customer not matching in the various files that contain that customer’s ADDRESS).

20 Database Concepts for the Systems Analyst
Databases Data Architecture: A business’ data architecture is comprised of the files and databases that store all of the organization’s data, the file and database technology used to store the data, and the organization structure set up to manage the data resource. Operational databases have been developed to support day-to-day operations and business transaction processing for major information systems. Data becomes a business resource in a database environment. Information systems are built around this resource to give both computer programmers and end-users flexible access to data. Operational databases - These systems were (and are) developed over time to replace the conventional files that used to support the applications. Access to these databases is limited to computer programs that use the DBMS to process transactions, maintain the data, and generate regularly scheduled management reports. Some query access may also be provided.

21 Database Concepts for the Systems Analyst
Databases Data Architecture: Many information systems shops hesitate to give end-users access to operational databases, because the volume of unscheduled reports and queries could overload the computers and hamper business operations. To remedy that problem, data warehouses were developed. Data warehouses store data that is extracted from the production databases and conventional files. Fourth-generation programming languages, query tools, and decision support tools are then used to generate reports and analyses off these data warehouses. 402 No additional notes provided.

22 Database Concepts for the Systems Analyst
Databases Data Architecture: Personal computer and local network database technology has rapidly matured to allow end-users to develop personal and departmental databases. These databases may contain unique data, or they may import data from conventional files, operational databases, and/or data warehouses. Personal databases are built using PC database technology such as Access, dBASE, Paradox, and FoxPro.

23 Database Concepts for the Systems Analyst
Databases Data Architecture: To manage the enterprise-wide data resource, a staff of database specialists may be organized around the following administrators: A data administrator is responsible for the data planning, definition, architecture, and management. One or more database administrators are responsible for the database technology, database design and construction, security, backup and recovery, and performance tuning. 403 In smaller businesses, these roles may be combined; or perhaps assigned to a systems analyst.

24 401-402 Figure 11.3 A Typical Modern Data Architecture
The figure above illustrates the data architecture into which many companies have evolved. As shown in the figure, most companies still have numerous conventional file-based information system applications, most of which were developed prior to the emergence of high performance database technology. In many cases, the processing efficiency of these files or the projected cost to redesign these files has slowed conversion of the systems to database.

25 Database Concepts for the Systems Analyst
Databases Database Architecture: Database architecture refers to the database technology including the database engine, database management utilities, database CASE tools for analysis and design, and database application development tools. The control center of a database architecture is its database management system. A database management system (DBMS) is specialized computer software available from computer vendors that is used to create, access, control, and manage the database. The core of the DBMS is often called its database engine. The engine responds to specific commands to create database structures, and then to create, read, update, and delete records in the database. 403 The database management system is purchased from a database technology vendor such as Oracle, IBM, Microsoft, or Sybase.

26 Database Concepts for the Systems Analyst
Databases Database Architecture: A systems analyst, or database analyst, designs the structure of the data in terms of record types, fields contained in those record types, and relationships that exist between record types. These structures are defined to the database management system using its data definition language. Data definition language (or DDL) is used by the DBMS to physically establish those record types, fields, and structural relationships. Additionally, the DDL defines views of the database. Views restrict the portion of a database that may be used or accessed by different users and programs. DDLs record the definitions in a permanent data repository. 403 No additional notes provided.
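As a brief illustration of how a DDL defines both structures and views, the generic SQL below creates a restricted view over the ORDERS table sketched earlier. The view name and the status code 'B' for backordered are assumptions for illustration only:

  -- A view that exposes only selected fields and rows of ORDERS to certain users or programs
  CREATE VIEW BACKORDERED_ORDERS AS
    SELECT ORDER_NUMBER, CUSTOMER_NUMBER, ORDER_STATUS
    FROM   ORDERS
    WHERE  ORDER_STATUS = 'B';   -- 'B' = backordered (illustrative code value)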

27 403-404 Figure 11.4 A Typical Database Architecture
No additional notes provided.

28 Database Concepts for the Systems Analyst
Databases Database Architecture: Some data dictionaries include formal, elaborate software that helps database specialists track metadata – the data about the data –such as record and field definitions, synonyms, data relationships, validation rules, help messages, and so forth. The database management system also provides a data manipulation language to access and use the database in applications. A data manipulation language (or DML) is used to create, read, update, and delete records in the database, and to navigate between different records and types of records. The DBMS and DML hide the details concerning how records are organized and allocated to the disk. The metadata is stored in a data dictionary or repository (which may or may not be provided by the DBMS vendor). To help design databases, CASE tools may be provided either by the database technology vendor (e.g., Oracle) or from a third-party CASE tool vendor (e.g., Popkin, Logic Works, etc.). In general, the DML is very flexible in that it may be used by itself to create, read, update, and delete records; or its commands may be ‘called’ from a separate host programming language such as COBOL, Visual Basic, or Powerbuilder.
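A minimal DML sketch in generic SQL, showing the four basic operations on the illustrative CUSTOMERS table used earlier (the values are made up); the same statements could also be issued from a host language program:

  INSERT INTO CUSTOMERS (CUSTOMER_NUMBER, CUSTOMER_NAME, CREDIT_RATING)
  VALUES (1001, 'A. Member', 'OK');              -- create a record

  SELECT CUSTOMER_NUMBER, CUSTOMER_NAME
  FROM   CUSTOMERS
  WHERE  CREDIT_RATING = 'OK';                   -- read records

  UPDATE CUSTOMERS
  SET    CREDIT_RATING = 'NO'
  WHERE  CUSTOMER_NUMBER = 1001;                 -- update a record

  DELETE FROM CUSTOMERS
  WHERE  CUSTOMER_NUMBER = 1001;                 -- delete a record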

29 Database Concepts for the Systems Analyst
Databases Database Architecture: Many DBMSs don’t require the use of a DDL to construct the database, or a DML to access the database. They provide their own tools and commands to perform those tasks. This is especially true of PC-based DBMSs. Many DBMSs also include proprietary report writing and inquiry tools to allow users to access and format data without directly using the DML. Some DBMSs include a transaction processing monitor (or TP monitor) that manages on-line accesses to the database, and ensures that transactions that impact multiple tables are fully processed as a single unit. This is especially true of PC-based DBMSs such as Microsoft Access. Access provides a simple graphical user interface to create the tables, and a form-based environment to access, browse, and maintain the tables. Most high-end DBMSs are designed to interact with popular third-party transaction processing monitors such as CICS and Tuxedo.

30 Database Concepts for the Systems Analyst
Databases Relational Database Management Systems: There are several types of database management systems and they can be classified according to the way they structure records. Early database management systems organized records in hierarchies or networks implemented with indexes and linked lists. Relational databases implement data in a series of tables that are ‘related’ to one another via foreign keys. Files are seen as simple two-dimensional tables, also known as relations. The rows are records. The columns correspond to fields. 405 No additional notes provided.

31 405 Figure 11.5 A Simple, Logical Data Model
No additional notes provided.

32 405-406 Figure 11.6 A Simple, Physical Database Schema
No additional notes provided.

33 Database Concepts for the Systems Analyst
Databases Relational Database Management Systems: Both the DDL and DML of most relational databases are called SQL (which stands for Structured Query Language). SQL supports not only queries, but complete database creation and maintenance. A fundamental characteristic of relational SQL is that commands return ‘a set’ of records, not necessarily just a single record (as in non-relational database and file technology). 405 To access tables and records, SQL provides the following basic commands: SELECT specific records from a table based on specific criteria (e.g. SELECT CUSTOMER WHERE BALANCE exceeds a specified amount) PROJECT out specific fields from a table (e.g. PROJECT CUSTOMER TO INCLUDE ONLY CUSTOMER_NUMBER, CUSTOMER_NAME, BALANCE) JOIN two or more tables across a common field – a primary and foreign key (JOIN CUSTOMER AND ORDER USING CUSTOMER_NUMBER)
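In actual SQL syntax, all three operations are expressed with the SELECT statement. The sketch below is generic SQL, assumes a CUSTOMERS table with a BALANCE field, and uses an arbitrary 100.00 threshold (the original slide omits the number):

  -- Selection: rows that meet a criterion; the result is a set of records
  SELECT * FROM CUSTOMERS WHERE BALANCE > 100.00;

  -- Projection: only the named columns
  SELECT CUSTOMER_NUMBER, CUSTOMER_NAME, BALANCE FROM CUSTOMERS;

  -- Join: match CUSTOMERS and ORDERS on the shared primary/foreign key
  SELECT C.CUSTOMER_NUMBER, C.CUSTOMER_NAME, O.ORDER_NUMBER
  FROM   CUSTOMERS C
  JOIN   ORDERS O ON O.CUSTOMER_NUMBER = C.CUSTOMER_NUMBER;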

34 Database Concepts for the Systems Analyst
Databases Relational Database Management Systems: High-end relational databases also extend the SQL language to support triggers and stored procedures. Triggers are programs embedded within a table that are automatically invoked by updates to the table. Stored procedures are programs embedded within a table that can be called from an application program. Both triggers and stored procedures are reusable because they are stored with the tables themselves. This eliminates the need for application programmers to create the equivalent logic within each application that uses the tables. Triggers - For example, if a record is deleted from a PASSENGER AIRCRAFT table, a trigger can force the automatic deletion of all corresponding records in a SEATS table for that aircraft. Stored procedures - For example, a complex data validation algorithm might be embedded in a table to ensure that new and updated records contain valid data before they are stored. Examples of high performance relational DBMSs include Oracle Corporation’s Oracle, IBM’s Database Manager, Microsoft’s SQL Server (being used in the SoundStage project), and Sybase Corporation’s Sybase. Many of these databases run on mainframes, minicomputers, and network database servers. Additionally, most personal computer DBMSs are relational (or partially so). Examples include Microsoft’s Access and Foxpro, and Borland’s Paradox and dBASE. These systems can run on both stand-alone personal computers and local area network file servers.
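The trigger sketch below uses SQL Server-flavored syntax (the DBMS used in the SoundStage project) for the aircraft and seats example above. The table and column names, particularly AIRCRAFT_NUMBER, are assumptions, and trigger syntax differs in other DBMSs:

  -- When a PASSENGER_AIRCRAFT record is deleted, automatically delete its SEATS records
  CREATE TRIGGER TRG_AIRCRAFT_DELETE_SEATS
  ON PASSENGER_AIRCRAFT
  AFTER DELETE
  AS
    DELETE FROM SEATS
    WHERE AIRCRAFT_NUMBER IN (SELECT AIRCRAFT_NUMBER FROM deleted);  -- 'deleted' holds the removed rows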

35 Data Analysis for Database Design
What is a Good Data Model? A good data model is simple. As a general rule, the data attributes that describe an entity should describe only that entity. A good data model is essentially non-redundant. This means that each data attribute, other than foreign keys, describes at most one entity. A good data model should be flexible and adaptable to future needs. We should make the data models as application-independent as possible to encourage database structures that can be extended or modified without impact to current programs. While a data model effectively communicates database requirements, it does not necessarily represent a good database design. It may contain structural characteristics that reduce flexibility and expansion, or create unnecessary redundancy. Therefore, we must ‘prepare’ the data model for database design and implementation.

36 Data Analysis for Database Design
The technique used to improve a data model in preparation for database design is called data analysis. Data analysis is a process that prepares a data model for implementation as a simple, non-redundant, flexible, and adaptable database. The specific technique is called normalization. Normalization is a technique that organizes data attributes such that they are grouped together to form stable, flexible, and adaptive entities. 408 No additional notes provided.

37 Data Analysis for Database Design
Normalization is a three-step technique that places the data model into first normal form, second normal form, and third normal form. An entity is in first normal form (1NF) if there are no attributes that can have more than one value for a single instance of the entity. An entity is in second normal form (2NF) if it is already in 1NF, and if the values of all non-primary key attributes are dependent on the full primary key – not just part of it. An entity is in third normal form (3NF) if it is already in 2NF, and if the values of its non-primary key attributes are not dependent on any other non-primary key attributes. Any attributes that can have multiple values actually describe a separate entity, possibly an entity (and relationship) that we haven’t yet included in our data model. Any non-key attributes that are dependent on only part of the primary key should be moved to an entity where that partial key becomes the full key. Again, this may require creating a new entity and relationship on the model. Any non-key attributes that are dependent on other non-key attributes must be moved or deleted. Again, new entities and relationships may have to be added to the data model.

38 Data Analysis for Database Design
Normalization Example First Normal Form: The first step in data analysis is to place each entity into 1NF. 409 First Normal Form - Please refer to the following Figures 11.8 through 11.11. Figures 11.9 through 11.11 demonstrate how to place these three entities into 1NF. The original entity is depicted on the left side of the page. The 1NF entities are on the right side of the page. Each figure shows how normalization changed the data model and attribute assignments.

39 409-410 Figure 11.8 An Unnormalized SoundStage Data Model
Referring to the figure above, you should find three entities that are not in 1NF – MEMBER, MEMBER ORDER and CLUB. Each contains a repeating group, that is, a group of attributes that have multiple values for a single instance of the entity (denoted by the brackets). Consider, for example, the entity MEMBER. A single MEMBER can belong to multiple CLUBs and, therefore, have multiple values for CLUB NAME and AGREEMENT NUMBER – one for each club to which he or she belongs. For a single instance of MEMBER, the number of clubs and agreements may vary. Similarly, a MEMBER ORDER can contain data about more than one ORDERED PRODUCT. And a CLUB can sponsor more than one AGREEMENT. How do we fix these anomalies in our model?

40 409-412 Figure 11.9 First Normal Form
Let’s examine the MEMBER ORDER entity in the figure above. First, we remove the attributes that can have more than one value for an instance of the entity. That alone places MEMBER ORDER in 1NF. But what do we do with the removed attributes? These attributes repeat many times ‘as a group’. Therefore, we moved the entire group of attributes to a new entity, MEMBER ORDERED PRODUCT. Each instance of these attributes describes one PRODUCT on a single MEMBER ORDER. Thus, if a specific ORDER contains five PRODUCTs, there will be five instances of the new MEMBER ORDERED PRODUCT entity. Each entity instance has only one value for each attribute; therefore, the new entity is also in first normal form. Notice how the primary key of the new entity was created—that is, by combining the primary key of the original entity, ORDER NUMBER, with the implicit key attribute of the group, PRODUCT NUMBER. Thus, we have what was described in Chapter 5 as a concatenated key. Since we know from Chapter 5 that each part of a concatenated key is a foreign key back to another entity, we added relationships (and cardinality) from the new MEMBER ORDERED PRODUCT entity to both the MEMBER and PRODUCT entities.
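Looking ahead to the physical design, the 1NF decomposition just described could eventually be implemented as two tables along the following lines. This is only a sketch with abbreviated field lists and assumed data types, and it presumes that MEMBER and PRODUCT tables already exist:

  -- Attributes that occur once per order stay in MEMBER_ORDER
  CREATE TABLE MEMBER_ORDER (
    ORDER_NUMBER   INTEGER NOT NULL,
    ORDER_DATE     DATE,
    MEMBER_NUMBER  INTEGER NOT NULL,            -- foreign key to the MEMBER table
    PRIMARY KEY (ORDER_NUMBER),
    FOREIGN KEY (MEMBER_NUMBER) REFERENCES MEMBER (MEMBER_NUMBER)
  );

  -- The repeating group moves to its own table, keyed by a concatenated (composite) key
  CREATE TABLE MEMBER_ORDERED_PRODUCT (
    ORDER_NUMBER      INTEGER NOT NULL,         -- part of the key; foreign key to MEMBER_ORDER
    PRODUCT_NUMBER    INTEGER NOT NULL,         -- part of the key; foreign key to PRODUCT
    QUANTITY_ORDERED  INTEGER,
    PRIMARY KEY (ORDER_NUMBER, PRODUCT_NUMBER),
    FOREIGN KEY (ORDER_NUMBER)   REFERENCES MEMBER_ORDER (ORDER_NUMBER),
    FOREIGN KEY (PRODUCT_NUMBER) REFERENCES PRODUCT (PRODUCT_NUMBER)
  );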

41 412 Figure 11.10 First Normal Form
Another example of 1NF is shown in the figure above for the CLUB entity. The attributes that can have many values (commonly called ‘repeating attributes’) are easy to spot. They include attributes such as AGREEMENT ACTIVE DATE and OBLIGATION PERIOD. As before, we created a new entity, AGREEMENT (as named by the users), keyed by the concatenation of CLUB NAME and AGREEMENT NUMBER. We moved the repeating attributes to that new entity. Once again, we also created a relationship between AGREEMENT and CLUB.

42 412-413 Figure 11.11 First Normal Form
To place the MEMBER entity in 1NF, we removed the repeating attributes. Those attributes seemed dependent on a combination of CLUB NAME and AGREEMENT NUMBER, so we created a new entity called CLUB MEMBERSHIP with that key. The repeating attributes were then moved to that entity. It was then that we noticed that the CLUB MEMBERSHIP entity was, in fact, a ternary associative entity (review Chapter 5). Each part of the concatenated key (MEMBER NUMBER, CLUB NAME, and AGREEMENT NUMBER) was a foreign key back to different entities. Thus, we completed our model by adding relationships (with cardinality) from that associative entity back to the MEMBER, CLUB, and AGREEMENT entities.

43 Data Analysis for Database Design
Normalization Example Second Normal Form: The next step of data analysis is to place the entities into 2NF. It is assumed that you have already placed all entities into 1NF. 2NF looks for an anomaly called a partial dependency, meaning an attribute(s) whose value is determined by only part of the primary key. Entities that have a single attribute primary key are already in 2NF. Only those entities that have a concatenated key need to be checked. 414 No additional notes provided.

44 414-415 Figure 11.12 Second Normal Form
First, let’s check the MEMBER ORDERED PRODUCT entity. Most of the attributes are dependent on the full primary key. For example, QUANTITY ORDERED makes no sense unless you have both an ORDER NUMBER and a PRODUCT NUMBER. Think about it! By itself, ORDER NUMBER is inadequate since the order could have as many quantities ordered as there are products on the order. Similarly, by itself, PRODUCT NUMBER is inadequate since the same product could appear on many orders. Thus, QUANTITY ORDERED requires both parts of the key and is fully dependent on the key. The same could be said of QUANTITY SHIPPED and PURCHASE UNIT PRICE. But what about ORDERED PRODUCT DESCRIPTION and ORDERED PRODUCT TITLE? Do we really need ORDER NUMBER to determine a value for either? No! Instead, the values of these attributes are dependent only on the value of PRODUCT NUMBER. Thus, the attributes are not dependent on the full key – we have uncovered a partial dependency error that must be fixed. How do we fix this type of normalization error? To fix the problem, we simply move the non-key attributes, ORDERED PRODUCT DESCRIPTION and ORDERED PRODUCT TITLE, to an entity that only has PRODUCT NUMBER as its key. If necessary, we would have to create this entity, but the PRODUCT entity with that key already exists. But we had to be careful because PRODUCT is a supertype. Upon inspection of the subtypes (see Figure 11.12), we discover that the attributes are already in the MERCHANDISE and TITLE entities, albeit under a synonym. Thus, we didn’t actually have to move the attributes from the MEMBER ORDERED PRODUCT entity – we just deleted them as redundant data.

45 Data Analysis for Database Design
Normalization Example Third Normal Form: Entities are assumed to be in 2NF before beginning 3NF analysis. Third normal form analysis looks for two types of problems, derived data and transitive dependencies. In both cases, the fundamental error is that non-key attributes are dependent on other non-key attributes. Derived attributes are those whose values can either be calculated from other attributes, or derived through logic from the values of other attributes. A transitive dependency exists when a non-key attribute is dependent on another non-key attribute (other than by derivation). Transitive analysis is only performed on those entities that do not have a concatenated key. If you think about it, storing a derived attribute makes little sense. First, it wastes disk storage space. Second, it complicates simple updates. Why? Every time you change the base attributes, you must remember to re-perform the calculation and also change its result.

46 Data Analysis for Database Design
Normalization Example Third Normal Form: Third normal form analysis looks for two types of problems, derived data and transitive dependencies. (continued) A transitive dependency exists when a non-key attribute is dependent on another non-key attribute (other than by derivation). This error usually indicates that an undiscovered entity is still embedded within the problem entity. Transitive analysis is only performed on those entities that do not have a concatenated key. “An entity is said to be in third normal form if every non-primary key attribute is dependent on the primary key, the whole primary key, and nothing but the primary key.” 416 Such a condition, if not corrected, can cause future flexibility and adaptability problems if a new requirement eventually requires us to implement that undiscovered entity as a separate database table. Before we leave the subject of normalization, we should acknowledge that several normal forms beyond 3NF exist. Each successive normal form makes the data model simpler, less redundant, and more flexible. However, systems analysts (and most database experts) rarely take data models beyond 3NF unless absolutely necessary.

47 416 Figure 11.13 Third Normal Form
For example, look at the MEMBER ORDERED PRODUCT entity in the figure above. The attribute EXTENDED PRICE is calculated by multiplying QUANTITY ORDERED by PURCHASE UNIT PRICE. Thus, EXTENDED PRICE (a non-key attribute) is not dependent on the primary key as much as it is dependent on QUANTITY ORDERED and PURCHASE UNIT PRICE. Thus, we simplify the entity by deleting EXTENDED PRICE. Sounds simple, right? Well, not always! There is disagreement on how far you take this rule. Some experts argue that the rule should be applied only within a single entity. Thus, these experts would not delete a derived attribute if the attributes required for the derivation are assigned to different entities. Other experts argue that the rule should be required regardless of where the base attributes are stored. We tend to agree based on the argument that a derived attribute that involves multiple entities presents a greater danger for data inconsistency caused by updating an attribute in one entity and forgetting to subsequently update the derived attribute in another entity. (The exception to this rule would be those databases that support triggers, described earlier in this chapter, that could automatically update the derived attributes.)

48 416-417 Figure 11.14 Third Normal Form
Transitive analysis is only performed on those entities that do not have a concatenated key. In our example, this includes PRODUCT, MEMBER ORDER, MEMBER, and CLUB. For the entity PRODUCT, all of the non-key attributes are dependent on the primary key, and only the primary key. Thus, PRODUCT is already in third normal form. But look at the entity, MEMBER ORDER, in the figure above. In particular, examine the attributes MEMBER NAME and MEMBER ADDRESS. Are these attributes dependent on the primary key, MEMBER ORDER NUMBER? No! The primary key MEMBER ORDER NUMBER in no way determines the value of MEMBER NAME and MEMBER ADDRESS. On the other hand, the values of MEMBER NAME and MEMBER ADDRESS are dependent on the value of another non-primary key in the entity, MEMBER NUMBER. How do we fix this problem? MEMBER NAME and MEMBER ADDRESS need to be moved from the MEMBER ORDER entity to an entity whose key is just MEMBER NUMBER. If necessary, we would create that entity, but in our case we already have a MEMBER entity with the required primary key. And as it turns out, we don’t need to really move the problem attributes since they are already assigned to the MEMBER entity. We did, however, have to notice that MEMBER ADDRESS was a synonym for MEMBER STREET ADDRESS. We elected to keep the latter term in MEMBER.

49 Data Analysis for Database Design
Normalization Example Simplification by Inspection: When several analysts work on a common application, it is not unusual to create problems that won’t be taken care of by normalization. These problems are best solved through simplification by inspection, a process wherein a data entity in 3NF is further simplified by such efforts as addressing subtle data redundancy. Please refer to the figure on page 419 in the textbook. The authors apologize that this figure is not available at this time. Also through inspection, we realized that the CLUB MEMBERSHIP attributes for ‘taste’ and ‘media’ preferences were in fact different depending on the club to which a member belongs. For example, ‘media’ has a different set of possible values based on club. In an AUDIO CLUB, the value set is CASSETTE, COMPACT DISC, MINI-DISC, and DIGITAL VERSATILE DISC. In the VIDEO CLUB, the value set is VHS TAPE, LASER DISC, 8MM TAPE, and DIGITAL VERSATILE DISC. In the GAME CLUB, media values include CD-ROM, DIGITAL VERSATILE DISC, and various CARTRIDGE formats. Thus, what we thought was one attribute, MEDIA PREFERENCE, was in fact three attributes: AUDIO MEDIA PREFERENCE, VIDEO MEDIA PREFERENCE, and GAME MEDIA PREFERENCE.

50 Data Analysis for Database Design
Normalization Example CASE Support for Normalization: Most CASE tools can only normalize to first normal form. They accomplish this in one of two ways. They look for many-to-many relationships and resolve those relationships into associative entities. They look for attributes specifically described as having multiple values for a single entity instance. It is exceedingly difficult for a CASE tool to identify second and third normal form errors. That would require the CASE tool to have the intelligence to recognize partial and transitive dependencies. 418 No additional notes provided.

51 File Design Introduction.
Most fundamental entities from the data model would be designed as master or transaction records. The master files are typically fixed-length records. Associative entities from the data model are typically joined into the transaction records to form variable-length records (based on the one-to-many relationships). Other types of files (not represented in the data model) are added as necessary. Two important considerations of file design are file access and organization. The systems analyst usually studies how each program will access the records in the file (‘sequentially’ or ‘randomly’), and then selects an appropriate file organization. 418 In practice, many systems analysts select an indexed sequential (or ISAM/VSAM) organization to support the likelihood that different programs will require different access methods into the records.

52 Database Design Introduction
The design of any database will usually involve the DBA and database staff. They will handle the technical details and cross-application issues. It is useful for the systems analyst to understand the basic design principles for relational databases. 418,420 No additional notes provided.

53 Goals and Prerequisites to Database Design
The goals of database design are as follows: A database should provide for the efficient storage, update, and retrieval of data. A database should be reliable – the stored data should have high integrity to promote user trust in that data. A database should be adaptable and scaleable to new and unforeseen requirements and applications. 420 No additional notes provided.

54 Goals and Prerequisites to Database Design
The data model may have to be divided into multiple data models to reflect database distribution and database replication decisions. Data distribution refers to the distribution of either specific tables, records, and/or fields to different physical databases. Data replication refers to the duplication of specific tables, records, and/or fields to multiple physical databases. Each sub-model or view should reflect the data to be stored on a single server. 420 No additional notes provided.

55 Database Design The Database Schema
The design of a database is depicted as a special model called a database schema. A database schema is the physical model or blueprint for a database. It represents the technical implementation of the logical data model. A relational database schema defines the database structure in terms of tables, keys, indexes, and integrity rules. A database schema specifies details based on the capabilities, terminology, and constraints of the chosen database management system. 420 No additional notes provided.

56 Database Design The Database Schema
Rules and guidelines for transforming the logical data model into a physical relational database schema: Each fundamental, associative, and weak entity is implemented as a separate table. The primary key is identified as such and implemented as an index into the table. Each secondary key is implemented as its own index into the table. Each foreign key will be implemented as such. Attributes will be implemented with fields. These fields correspond to columns in the table. No additional notes provided.

57 Database Design The Database Schema
Rules and guidelines for transforming the logical data model into a physical relational database schema: (continued) The following technical details must usually be specified for each attribute. Data type. Each DBMS supports different data types, and terms for those data types. Size of the Field. Different DBMSs express precision of real numbers differently. NULL or NOT NULL. Must the field have a value before the record can be committed to storage? Domains. Many DBMSs can automatically edit data to ensure that fields contain legal data. Default. Many DBMSs allow a default value to be automatically set in the event that a user or programmer submits a record without a value. 421 Data type. For example, different systems may designate a large alphanumeric field differently (e.g., MEMO in Access and LONG VARCHAR in Oracle). Also, some databases allow the choice of no compression versus compression of unused space (e.g., CHAR versus VARCHAR in Oracle). Size of the Field. For example, in Oracle, a size specification of NUMBER(3,2) supports values with at most three significant digits, two of them to the right of the decimal point (i.e., up to 9.99). NULL or NOT NULL. Again, different DBMSs may require different reserved words to express this property. Primary keys can never be allowed to have null values. Domains. This can be a great benefit to ensuring data integrity independent from the application programs. If the programmer makes a mistake, the DBMS catches the mistake. But for DBMSs that support data integrity, the rules must be precisely specified in a language that is understood by the DBMS. Many of the above specifications were documented as part of a complete data model. If that data model was developed with a CASE tool, the CASE tool may be capable of automatically translating the data model into the language of the chosen database technology.
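An Oracle-flavored sketch of how these per-attribute details might be declared. The table, field names, sizes, default values, and domain rule are all illustrative assumptions rather than the textbook's actual schema:

  CREATE TABLE CUSTOMER (
    CUSTOMER_NUMBER  NUMBER(9)     NOT NULL,               -- data type and size; NOT NULL: a value is required
    CUSTOMER_NAME    VARCHAR2(30)  NOT NULL,               -- variable-length character data (unused space compressed)
    CREDIT_RATING    CHAR(2)       DEFAULT 'OK'            -- default supplied if no value is given
                     CHECK (CREDIT_RATING IN ('OK','NO')), -- domain rule: only legal values are accepted
    BALANCE          NUMBER(7,2)   DEFAULT 0,              -- up to 99999.99
    PRIMARY KEY (CUSTOMER_NUMBER)                          -- a primary key can never be NULL
  );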

58 Database Design The Database Schema
Rules and guidelines for transforming the logical data model into a physical relational database schema: (continued) Supertype/subtype entities present additional options as follows: Most CASE tools do not currently support object-like constructs such as supertypes and subtypes. Most CASE tools default to creating a separate table for each entity supertype and subtype. If the subtypes are of similar size and data content, a database administrator may elect to collapse the subtypes into the supertype to create a single table. Evaluate and specify referential integrity constraints. Would you ever want to compromise the third normal form entities when designing the database? For example, would you ever want to combine two third normal form entities into a single table (that would, by default, no longer be in third normal form)? Usually not! Although a DBA may create such a compromise to improve database performance, he or she should carefully weigh the advantages and disadvantages. Although such compromises may mean greater convenience through fewer tables or better overall performance, such combinations may also lead to the possible loss of data independence—should future, new fields necessitate resplitting the table into two tables, programs will have to be rewritten. As a general rule, combining entities into tables is not recommended. Please refer to the figure on page 422 in the textbook. The authors apologize that this figure is not available at this time.

59 Data and Referential Integrity
Database Design Data and Referential Integrity There are at least three types of data integrity that must be designed into any database - key integrity, domain integrity and referential integrity. Key Integrity: Every table should have a primary key (which may be concatenated). The primary key must be controlled such that no two records in the table have the same primary key value. The primary key for a record must never be allowed to have a NULL value. 423 No additional notes provided.

60 Data and Referential Integrity
Database Design Data and Referential Integrity Domain Integrity: Appropriate controls must be designed to ensure that no field takes on a value that is outside of the range of legal values. Referential Integrity: A referential integrity error exists when a foreign key value in one table has no matching primary key value in the related table. 423 For example, if GRADE POINT AVERAGE is defined to be a number between 0.00 and 4.00, then controls must be implemented to prevent negative numbers and numbers greater than 4.00. Not long ago, application programs were expected to perform all data editing. Today, most database management systems are capable of data editing. For the foreseeable future, the responsibility for data editing will continue to be shared between the application programs and the DBMS. The architecture of relational databases implements relationships between the records in tables via foreign keys. The use of foreign keys increases the flexibility and scalability of any database, but it also increases the risk of referential integrity errors. For example, an INVOICES table usually includes a foreign key, CUSTOMER NUMBER, to ‘reference back to’ the matching CUSTOMER NUMBER primary key in the CUSTOMERS table. What happens if we delete a CUSTOMER record? There is the potential that we may have INVOICE records whose CUSTOMER NUMBER has no matching record in the CUSTOMERS table. Essentially, we have compromised the referential integrity between the two tables.

61 Data and Referential Integrity
Database Design Data and Referential Integrity Referential Integrity: Referential integrity is specified in the form of deletion rules as follows: No restriction. Any record in the table may be deleted without regard to any records in any other tables. Delete:Cascade. A deletion of a record in the table must be automatically followed by the deletion of matching records in a related table. Delete:Restrict. A deletion of a record in the table must be disallowed until any matching records are deleted from a related table. No additional notes provided.

62 Data and Referential Integrity
Database Design Data and Referential Integrity Referential Integrity: Referential integrity is specified in the form of deletion rules as follows: (continued) Delete:Set Null. A deletion of a record in the table must be automatically followed by setting any matching keys in a related table to the value NULL. 424 The final database schema, complete with referential integrity rules, is illustrated in Figure 11.17, on page 425 in the textbook. The authors apologize that this figure is not available at this time. This is the blueprint for writing the SQL code (or equivalent) to create the tables and data structures.
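In SQL, these deletion rules are typically expressed as options on the foreign key constraint. The sketch below reuses the INVOICES/CUSTOMERS example from the previous slide; keyword support varies by DBMS (some products spell the restrict behavior NO ACTION), so treat this as illustrative:

  CREATE TABLE INVOICES (
    INVOICE_NUMBER   INTEGER NOT NULL,
    CUSTOMER_NUMBER  INTEGER,                              -- foreign key to CUSTOMERS
    PRIMARY KEY (INVOICE_NUMBER),

    -- Delete:Cascade  - deleting a customer also deletes its invoices:
    --   FOREIGN KEY (CUSTOMER_NUMBER) REFERENCES CUSTOMERS (CUSTOMER_NUMBER) ON DELETE CASCADE
    -- Delete:Restrict - a customer with invoices cannot be deleted:
    --   FOREIGN KEY (CUSTOMER_NUMBER) REFERENCES CUSTOMERS (CUSTOMER_NUMBER) ON DELETE RESTRICT
    -- Delete:Set Null - deleting a customer sets the invoice's foreign key to NULL:
    FOREIGN KEY (CUSTOMER_NUMBER) REFERENCES CUSTOMERS (CUSTOMER_NUMBER) ON DELETE SET NULL
  );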

63 Database Design Roles Some database shops insist that no two fields have exactly the same name. This presents an obvious problem with foreign keys. A role name is an alternate name for a foreign key that clearly distinguishes the purpose that the foreign key serves in the table. The decision to require role names or not is usually established by the data or database administrator. 424 Some database shops insist that no two fields have exactly the same name. This constraint serves to simplify documentation, help systems, and metadata definitions. This presents an obvious problem with foreign keys. By definition, a foreign key must have a corresponding primary key. During logical data modeling, using the same name suited our purpose of helping the users understand that these foreign keys allow us to match up related records in different entities. But in a physical database, it is not always necessary or even desirable to have these redundant field names in the database.

64 Database Design Database Prototypes
Prototyping is not an alternative to carefully thought out database schemas. On the other hand, once the schema is completed, a prototype database can usually be generated very quickly. Most modern DBMSs include powerful, menu-driven database generators that automatically create a DDL and generate a prototype database from that DDL. A database can then be loaded with test data that will prove useful for prototyping and testing outputs, inputs, screens, and other systems components. 424 No additional notes provided.

65 Database Capacity Planning
Database Design Database Capacity Planning A database is stored on disk. The database administrator will want an estimate of disk capacity for the new database to ensure that sufficient disk space is available. Database capacity planning can be calculated with simple arithmetic as follows. For each table, sum the field sizes. This is the record size for the table. For each table, multiply the record size times the number of entity instances to be included in the table. This is the table size. 426 This simple formula ignores factors such as packing, coding, and compression, but by leaving out those possibilities, you are adding slack capacity. For each table, sum the field sizes. This is the record size for the table. Avoid the implications of compression, coding, and packing – in other words, assume that each stored character and digit will consume one byte of storage. Note that formatting characters (e.g., commas, hyphens, slashes, etc.) are almost never stored in a database. Those formatting characters are added by the application programs that will access the database and present the output to the users. For each table, multiply the record size times the number of entity instances to be included in the table. It is recommended that growth be considered over a reasonable time period (e.g., three years). This is the table size.

66 Database Capacity Planning
Database Design Database Capacity Planning Database capacity planning can be calculated with simple arithmetic as follows. (continued) Sum the table sizes. This is the database size. Optionally, add a slack capacity buffer (e.g., 10%) to account for unanticipated factors or inaccurate estimates above. This is the anticipated database capacity. 426
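A small worked example with invented numbers: suppose a MEMBERS table has fields whose sizes sum to 200 bytes, so its record size is 200 bytes. If roughly 50,000 member records are expected over a three-year planning horizon, the table size is 200 x 50,000 = 10,000,000 bytes, or about 10 MB. Suppose the remaining tables, computed the same way, bring the database size to 120 MB; adding a 10% slack buffer gives an anticipated database capacity of roughly 132 MB.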

67 Database Structure Generation
Database Design Database Structure Generation CASE tools are frequently capable of generating SQL code for the database directly from a CASE-based database schema. This code can be exported to the DBMS for compilation. Even a small database model can require 50 pages or more of SQL data definition language code to create the tables, indexes, keys, fields, and triggers. Clearly, a CASE tool’s ability to automatically generate syntactically correct code is an enormous productivity advantage. Furthermore, it almost always proves easier to modify the database schema and re-generate the code than to maintain the code directly. 426 Figure 11.18 on pages 427 and 428 is a sample page of code generated by System Architect from the SoundStage database schema. The authors apologize that this figure is not available at this time.

68 The Next Generation of Database Design
Introduction Relational database technology is widely deployed and used in contemporary information system shops. One new technology is slowly emerging that could ultimately change the landscape dramatically – object database management systems. The heir apparent to relational DBMSs, object database management systems store true objects, that is, encapsulated data and all of the processes that can act on that data. Because relational database management systems are so widely used, we don’t expect this change to happen quickly. It is expected that relational DBMS vendors will either build object technology into their existing relational DBMSs, or they will create new, object DBMSs and provide for the transition between relational and object models. 426 No additional notes provided.

69 Conventional Files Versus the Database
Summary Introduction Conventional Files Versus the Database Database Concepts for the Systems Analyst Data Analysis for Database Design File Design Database Design The Next Generation of Database Design

