2 Chapter Objectives
  Explain data design concepts and data structures
  Describe file processing systems and various types of files
  Understand database systems and define the components of a database management system (DBMS)
  Describe Web-based data design
3 Chapter Objectives
  Explain data design terminology, including entities, fields, common fields, records, files, tables, and key fields
  Describe data relationships, draw an entity-relationship diagram, define cardinality, and use cardinality notation
  Explain the concept of normalization
  Explain the importance of codes and describe various coding schemes
4 Chapter Objectives
  Describe relational and object-oriented database models
  Explain data warehousing and data mining
  Differentiate between logical and physical storage and records
  Explain data control measures
5 Introduction
  You will develop a physical plan for data organization, storage, and retrieval
  The chapter begins with a review of data design concepts and terminology, then discusses file-based systems and database systems, including Web-based databases
  It concludes with a discussion of data storage and access, including strategic tools such as data warehousing and data mining, physical design issues, logical and physical records, data storage formats, and data control
6 Data Design Concepts: Data Structures
  A file or table contains data about people, places, things, or events that interact with the system
  File-oriented system
  File processing system
  Database system
  Data Structure:
    A framework for organizing and storing data in an information system
    Consists of files or tables linked in various ways
    Depending on how they are linked, the result is either a file processing system or a database management system
7 Data Design Concepts: Overview of File Processing
  Potential problems
    Data redundancy
    Data integrity
    Rigid data structure
  File Processing:
    A system that uses files that contain all the data necessary for inquiries and reporting
    It is still used
    It can be efficient because:
      File processing systems do not require the additional processor time and memory space needed by preprogrammed database functions
      They are simple to create, especially when a data file is accessed by only one application
      They can be tailored tightly to specific application or business needs
  Disadvantages:
    Data redundancy results because each department has its own files, and the same data (such as an employee name) frequently appears in each of those files
    Because of that redundancy, a change to one data item must be made in multiple files, and a missed update compromises data integrity
    The rigid data structure results from a department's data files usually being closely tied to the department's applications. This makes it difficult for managers who require information from multiple departments
  Example (previous slide): three items of information (mechanic number, name, and pay rate) are stored in both data files
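The redundancy problem in the notes above can be sketched in a few lines of Python. This is only an illustration: the two "files" are plain lists, and the field names, the mechanic number 112, and the pay rates are made-up values, not data from the slide.

```python
# Two departmental files that each keep their own copy of mechanic data
# (hypothetical records for illustration only).
payroll_file = [{"mechanic_no": 112, "name": "Jones", "pay_rate": 18.50}]
job_file = [{"job_no": 7, "mechanic_no": 112, "name": "Jones", "pay_rate": 18.50}]


def find_mismatches(master, other):
    """Return mechanic numbers whose pay rate differs between the two files."""
    rates = {rec["mechanic_no"]: rec["pay_rate"] for rec in master}
    return sorted(
        {
            rec["mechanic_no"]
            for rec in other
            if rec["mechanic_no"] in rates
            and rates[rec["mechanic_no"]] != rec["pay_rate"]
        }
    )


# A raise is recorded in the payroll file only; the job file is forgotten.
payroll_file[0]["pay_rate"] = 19.25
mismatches = find_mismatches(payroll_file, job_file)
```

After the one-sided update, `mismatches` lists the mechanic whose two copies no longer agree, which is the integrity problem that redundancy invites.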
8 Data Design Concepts: Overview of File Processing
  A file processing system uses various types of files
    Master file
    Table file
    Transaction file
    Work file (scratch file)
    Security file
    History file
  Master File:
    Stores relatively permanent data about an entity and is updated as that data changes, with one record for each instance of the entity
    Ex. A school maintains master files for courses, students, and faculty
  Table File:
    Contains reference data used by the information system
    Static and not updated by the information system
    Ex. Tax tables, postage rate tables, zip code tables
  Transaction File:
    Stores records that contain day-to-day business and operational data
    An input file that updates a master file
    Ex. Charges, payments
  Work File or Scratch File:
    A temporary file created by an information system for a single task
    Usually created by one process in the information system and used by another process within the same system
    Ex. Sorted files, report files, output reports until printed
  Security File:
    Created and saved for backup and recovery purposes
    Ex. Audit trail files, backups of master, table, and transaction files
  History File:
    Created for archiving purposes
    Ex. Inactive student file. If a student has not registered in the last 2 semesters, the student is deleted from the active student master and added to the inactive file. If the student re-registers, the record is deleted from the inactive file and added back to the active student master
9 Data Design Concepts: Overview of Database Systems
  A properly designed database system offers a solution to the problems of file processing
  Provides an overall framework that avoids data redundancy and supports a real-time, dynamic environment
  Database management system (DBMS)
  The main advantage of a DBMS is that it offers timely, interactive, and flexible data access
10 Data Design Concepts: Overview of Database Systems
  Advantages
    Scalability
    Better support for client/server systems
    Economy of scale
    Flexible data sharing
    Enterprise-wide application, managed by a database administrator (DBA)
    Stronger standards
  Scalability: the system can be expanded, modified, or downsized to meet the changing needs of the business
  Client/Server Support: these systems require the power and flexibility of a database design
  Economies of Scale: a company that uses an enterprise-wide database on a powerful mainframe server instead of several smaller computers saves money through economies of scale. The more processing is consolidated, the lower the cost per unit of work
  Flexible Data Sharing: a database allows users to access information from anywhere and view consistent information in different ways
  DBA: the database administrator typically administers the database
  Stronger Standards: data names, formats, and documentation should be standardized throughout the organization (the DBA helps to ensure this)
11 Data Design Concepts: Overview of Database Systems
  Advantages
    Controlled redundancy
    Better security
    Increased programmer productivity
    Data independence
  Controlled Redundancy:
    Data is not stored in numerous places
    Reduces inconsistency and data errors
    Data items do not need to be duplicated in multiple locations
  Better Security:
    The DBA defines authorization procedures to ensure correct access to the database
  Programmer Productivity:
    Programmers do not have to create the underlying file structure for a database
    They can concentrate on logical design, so a new database application can be developed more quickly than in a file-oriented system
  Data Independence:
    Systems that interact with a DBMS are relatively independent of how the physical data is maintained
    Thus you can alter data structures without modifying the systems that use the data
12 Data Design Concepts: Database Tradeoffs
  Because DBMSs are powerful, they require more expensive hardware, software, and data networks capable of supporting a multi-user environment
  A DBMS is more complex than a file processing system
  Procedures for security, backup, and recovery are more complicated and critical
13 DBMS Components: Interfaces for Users, Database Administrators, and Related Systems
  Users
    Query language
    Query by example (QBE)
    SQL (structured query language)
  Database Administrators
    A DBA is responsible for DBMS management and support
  To work with the data within a DBMS there are several options:
  Query Language:
    A query is a request for specific data from a database
    Query languages allow users to specify the data to display, print, or store
    Each query language has its own grammar and vocabulary
    Even without a programming background, most query languages can be learned in a short time
  Query By Example:
    Query by example provides a graphical user interface to assist users with retrieving data
    QBE is a relatively simple concept in which users are led intuitively through a query, step by step
    Most DBMSs provide a QBE feature (for example, Access)
  SQL (Structured Query Language):
    SQL truly is a multiplatform tool
    SQL is available in popular database programs for personal computers and networks, and most relational databases for midrange servers and mainframes include SQL
    At one time, only professional programmers could access mainframe data
  Database Administrators:
    The DBA maintains the database and assesses its requirements. Many large companies have both a DBA and a database analyst (DA), or data modeler, who focuses on the meaning and use of data. In smaller companies, one person often assumes both roles
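As a small illustration of a query as "a request for specific data," here is a sketch that uses Python's built-in sqlite3 module as a stand-in DBMS. The student table, its columns, and the sample rows are assumptions made up for the example.

```python
import sqlite3

# In-memory database used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_number INTEGER PRIMARY KEY, name TEXT, gpa REAL)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1035, "Linda", 3.6), (3397, "Sam", 2.8), (4070, "Kelly", 3.9)],
)

# The query names the data wanted (names of students with a GPA of at
# least 3.5); the DBMS decides how to locate the matching records.
rows = conn.execute(
    "SELECT name FROM student WHERE gpa >= ? ORDER BY name", (3.5,)
).fetchall()
```

The same SELECT statement would run with little or no change on most relational DBMSs, which is the multiplatform point made in the notes.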
14 DBMS Components: Interfaces for Users, Database Administrators, and Related Systems
  Related information systems
    A DBMS can support several related information systems that provide input to, and require specific data from, the DBMS
  Related information systems:
    Unlike a user interface, these systems need no human intervention for the two-way communication that occurs between the DBMS and the related systems
15 DBMS Components: Data Manipulation Language and Schema
  A data manipulation language (DML) controls database operations, including storing, retrieving, updating, and deleting data
  Schema
    The complete definition of a database, including descriptions of all fields, tables, and relationships, is called a schema
    You also can define one or more subschemas
  Data Manipulation Language:
    DML serves the user who is querying a database. Some DBMSs, such as Microsoft Access, use QBE to hide the DML from the user
  Schema:
    Describes the structure of the database and contains descriptive information about the stored data, including access and content controls, relationships among data elements, and details of physical data store organization
  Subschema:
    A view of the database that a particular system or user needs or is allowed to access
    To protect privacy, for example, a project management system does not retrieve employee pay rates
    A subschema can also restrict the level of access a user is given: some users can only update data, while others can also delete and create it
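The DML operations and the subschema idea can be sketched together with sqlite3. This is an assumption-laden illustration: a SQL view is used here as one common way to approximate a subschema, and the employee table, the project_staff view, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the full table definition, including the sensitive pay_rate field.
conn.execute(
    "CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT, pay_rate REAL)"
)
# Subschema approximated as a view that leaves pay_rate out, as a
# project management system would need.
conn.execute("CREATE VIEW project_staff AS SELECT employee_id, name FROM employee")

# DML: store, update, and delete data.
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 31.0)")
conn.execute("INSERT INTO employee VALUES (2, 'Raj', 28.5)")
conn.execute("UPDATE employee SET pay_rate = 30.0 WHERE employee_id = 2")
conn.execute("DELETE FROM employee WHERE employee_id = 1")

# DML: retrieve data through the restricted view.
cursor = conn.execute("SELECT * FROM project_staff")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

A system that reads only `project_staff` never sees pay rates, even though they remain in the underlying table.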
16 DBMS Components: Physical Data Repository
  The data dictionary is transformed into a physical data repository, which also contains the schema and subschemas
  The physical repository might be centralized, or distributed at several locations
  ODBC: open database connectivity
  JDBC: Java database connectivity
  ODBC:
    Uses SQL statements that the DBMS understands and can execute
  JDBC:
    Enables Java applications to exchange data with any database that uses SQL statements and is JDBC-compliant
17 Web-Based Database Design: Characteristics of Web-Based Design
  In a Web-based design, the Internet serves as the front end, or interface, for the database management system
  Internet technology provides enormous power and flexibility
  Web-based systems are popular because they offer ease of access, cost-effectiveness, and worldwide connectivity
18 Web-Based Database Design: Connecting a Database to the Web
  Database must be connected to the Internet or intranet
  Middleware
  Macromedia's ColdFusion
  To access data in a Web-based system, the database must be connected to the Internet or an intranet
  The database and the Internet speak two different languages, however
  Databases do not work in HTML, the language of the Web
  Middleware:
    Software that allows a database to connect to the Web and enables data to be viewed and updated
    ColdFusion: a middleware product that accelerates the deployment of Web applications
19 Web-Based Database Design: Data Security
  Web-based data must be totally secure, yet easily accessible to authorized users
  To achieve this goal, well-designed systems provide security at three levels:
    The database itself
    The Web server
    The telecommunication links that connect the components of the system
20 Data Design Terminology: Definitions
  Entity
  Table or file
  Field
  Attribute
  Common field
  Record
  Tuple
  Now it is time for the systems analyst to select a design approach and begin to construct the system.
  Entity:
    Something we are collecting and maintaining data about: a person, place, thing, or event
  Table/File:
    Data is organized into tables or files, which contain related records about the entity
    The structure consists of columns and rows
  Field:
    An attribute or characteristic of the entity
    Ex. First Name, Last Name, Address
    A common field is an attribute that appears in more than one entity and is used to link entities in various types of relationships (primary and foreign keys)
  Record:
    Also called a tuple; describes one instance or occurrence of an entity
    A set of related fields
21 Data Design Terminology: Key Fields
  Primary key
  Composite key (also called combination key, concatenated key, or multi-valued key)
  Candidate key
  Nonkey field
  Foreign key
  Secondary key
  Primary Key:
    A field or combination of fields that uniquely identifies one particular instance of an entity
    Would you want to use a field like Name as a key field? No, because it is not unique
  Composite Key:
    A combination of two or more fields that together make up the primary key
  Candidate Key:
    Before you choose the primary key, a candidate key is any field that could be chosen as the primary key
    A field that is neither the primary key nor a candidate key is a nonkey field
  Foreign Key:
    A field in one table that matches a primary key in another table to establish the relationship between the two tables
    The foreign key does not have to be unique
    Ex. Carlton Smith has Advisor 49. The value 49 must be unique in the ADVISOR table because it is the primary key there, but 49 can appear any number of times in the STUDENT table, where the advisor number is a foreign key
    Turn to pg. 317 Figure Bottom: Student-Number and Course-ID are foreign keys that together serve as the primary key in the GRADE table
    Using both of these fields as the primary key ensures that the grade is assigned to the proper student in the proper course
  Secondary Key:
    A field or combination of fields used to access or retrieve records; its values do not have to be unique
    Ex. Zip Code
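The key types in these notes can be tried out with sqlite3. The sketch below follows the ADVISOR, STUDENT, and GRADE example from the slide, but the column names, student numbers, course IDs, and the second student are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE advisor (
        advisor_number INTEGER PRIMARY KEY,   -- primary key: must be unique
        name TEXT
    );
    CREATE TABLE student (
        student_number INTEGER PRIMARY KEY,
        name TEXT,
        advisor_number INTEGER REFERENCES advisor(advisor_number)  -- foreign key
    );
    CREATE TABLE grade (
        student_number INTEGER REFERENCES student(student_number),
        course_id TEXT,
        grade TEXT,
        PRIMARY KEY (student_number, course_id)  -- composite primary key
    );
    """
)

conn.execute("INSERT INTO advisor VALUES (49, 'Advisor 49')")
# The foreign key value 49 may repeat in STUDENT.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1001, "Carlton Smith", 49), (1002, "Second Student", 49)],
)
conn.execute("INSERT INTO grade VALUES (1001, 'CSC151', 'B')")
conn.execute("INSERT INTO grade VALUES (1001, 'MKT212', 'A')")

# A second grade for the same student in the same course repeats the
# composite key, so the DBMS rejects it.
try:
    conn.execute("INSERT INTO grade VALUES (1001, 'CSC151', 'C')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

advisees = conn.execute(
    "SELECT COUNT(*) FROM student WHERE advisor_number = 49"
).fetchone()[0]
```

Advisor 49 appears once in ADVISOR and twice in STUDENT, and the composite key keeps each student to one grade per course.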
22 Data Design Terminology: Referential Integrity
  Validity checks can help avoid data input errors
  Referential integrity is a type of validity check: a set of rules that prevents data inconsistencies and quality problems
  In a relational database, it means that:
    A foreign key value cannot be entered in one table unless the matching primary key exists in another table
    You cannot enter a customer order into the ORDER table unless that customer already exists in the CUSTOMER table
    Without referential integrity you could end up with an orphan order because there was no related customer
    With an ORDER MASTER table and an ORDER DETAIL table and no referential integrity, you could delete a record from the ORDER MASTER table and leave its ORDER DETAIL records as orphans
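A minimal sketch of these rules in sqlite3 follows. Two assumptions to note: SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and the table is named `orders` because ORDER is a reserved word in SQL. The column names and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    """
)


def try_sql(statement):
    """Run one statement; return False if an integrity rule rejects it."""
    try:
        conn.execute(statement)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "add customer 1": try_sql("INSERT INTO customer VALUES (1, 'Existing Customer')"),
    "order for customer 1": try_sql("INSERT INTO orders VALUES (500, 1)"),
    # No customer 99 exists, so this order would be an orphan.
    "order for customer 99": try_sql("INSERT INTO orders VALUES (501, 99)"),
    # Deleting the customer would orphan order 500.
    "delete customer 1": try_sql("DELETE FROM customer WHERE customer_id = 1"),
}
```

The first two statements succeed and the last two are rejected, so no orphan order can be created either by a bad insert or by deleting the parent record.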
23 Entity-Relationship Diagrams
  An entity is a person, place, thing, or event for which data is collected and maintained
  Entity-relationship diagram (ERD):
    A model that shows the logical relationships and interaction among system entities
    Provides an overall view of the system and a blueprint for creating the physical data structures
24 Entity-Relationship Diagrams: Drawing an ERD
  The first step is to list the entities that you identified during the fact-finding process and to consider the nature of the relationships that link them
  Entities are labeled with singular nouns
  Relationships are drawn as diamond shapes
  ERDs depict relationships, not data or information flows
25 Entity-Relationship Diagrams: Types of Relationships
  One-to-one relationship (1:1)
  One-to-many relationship (1:M)
  Many-to-many relationship (M:N)
  Associative entity
  There are 3 types of relationships that can exist between entities:
  1:1
    Exists when exactly one of the 2nd entity occurs for each instance of the 1st entity
    Figure 7-15, pg. 318
  1:M
    Exists when one occurrence of the 1st entity can relate to many instances of the 2nd entity, but each instance of the 2nd entity can associate with only one instance of the 1st entity
    Figure 7-16, pg. 319
  M:N
    Exists when one instance of the 1st entity can relate to many instances of the 2nd entity, and one instance of the 2nd entity can relate to many instances of the 1st entity
    Figure 7-17, pg. 319
  Notice that M:N differs from 1:1 and 1:M in that the event or transaction that links the 2 entities is actually a third entity, called an associative entity, that has its own characteristics
26 Entity-Relationship Diagrams: Cardinality
  Cardinality notation
  Crow's foot notation
  Cardinality:
    After drawing the initial ERD, you need to determine how many instances of one entity relate to instances of the other entity. This numeric relationship is called cardinality
  Cardinality Notation:
    The interaction is modeled with special symbols that represent the relationship, termed cardinality notation
  Crow's Foot Notation:
    The symbols used for the notation include circles, bars, and symbols that resemble crow's feet
    Figure 7-20, pg. 321 Notation
      Single bar | indicates one
      Double bar | | indicates one and only one
      Circle indicates zero
      Crow's foot indicates many
    Go through Figure 7-20, pg. 321
27 Normalization
  Table design
  Involves four stages: unnormalized design, first normal form, second normal form, and third normal form
  Most business-related databases must be designed in third normal form
  We now start the process of creating our tables, but this is only a logical view. These tables may or may not translate into actual tables in the physical design
  Normalization:
    A process in which we create table designs by assigning specific fields to each table in the database, following a set of rules, with the goal of correcting inherent problems in the table design
    It involves 4 stages: unnormalized design, 1NF, 2NF, 3NF
    The 3 normal forms constitute a progression in which 3NF is the best design
28 Normalization: Standard Notation Format
  Designing tables is easier if you use a standard notation format to show a table's structure, fields, and primary key
  Example: NAME (FIELD 1, FIELD 2, FIELD 3)
29 Normalization: Repeating Groups and Unnormalized Designs
  Repeating groups often occur in manual documents prepared by users
  Unnormalized design
  The first inherent problem with unnormalized data is the repeating group
  A repeating group is a set of fields that can occur any number of times in a single record, with each occurrence having different values
  Ex. A report card with the student's information at the top followed by a list of courses and grades at the bottom. The course and grade information is a repeating group
  Figure 7-22, pg. 323
  Think of a repeating group as a set of child (subsidiary) records contained within the parent (main) record
30 Normalization: First Normal Form and Second Normal Form
  A table is in first normal form (1NF) if it does not contain a repeating group
  To convert, you must expand the table's primary key to include the primary key of the repeating group
  Second Normal Form
    To understand second normal form (2NF), you must understand the concept of functional dependence
    Functionally dependent
  1NF:
    A table is in 1NF if it DOES NOT contain a repeating group
    You give the new table the new primary key and include that as a foreign key in the original table
    Pg. 323, Figure 7-22 (raw); pg. 324, Figure 7-23 (1NF)
  2NF:
    The table is in 1NF AND all fields that are not part of the primary key are functionally dependent on the entire primary key
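The 1NF conversion can be sketched in plain Python using the report card example from the previous slide. The record layout, student numbers, course IDs, and grades below are assumptions for illustration, not values from the textbook figures.

```python
# Unnormalized design: each student record carries a repeating group of
# (course_id, grade) pairs. All values are hypothetical.
unnormalized = [
    {"student_number": 1035, "name": "Linda",
     "courses": [("CSC151", "B"), ("MKT212", "A")]},
    {"student_number": 3397, "name": "Sam",
     "courses": [("ENG101", "A")]},
]


def to_first_normal_form(records):
    """Emit one row per occurrence of the repeating group.

    The primary key expands from student_number alone to the
    combination (student_number, course_id).
    """
    rows = []
    for record in records:
        for course_id, grade in record["courses"]:
            rows.append({
                "student_number": record["student_number"],
                "name": record["name"],
                "course_id": course_id,
                "grade": grade,
            })
    return rows


rows_1nf = to_first_normal_form(unnormalized)
primary_keys = [(row["student_number"], row["course_id"]) for row in rows_1nf]
```

Every row now holds a single course and grade, and the expanded key is unique. The student name is still repeated on each row, which is the kind of problem 2NF addresses.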
31 Normalization: Second Normal Form
  A standard process exists for converting a table from 1NF to 2NF:
    Create and name a separate table for each field in the existing primary key
    Create a new table for each possible combination of the original primary key fields
    Study the three tables and place each field with its appropriate primary key
32 Normalization: Second Normal Form
  Four kinds of problems are found with 1NF designs that do not exist in 2NF:
    Consider the work necessary to change a particular product's description
    1NF tables can contain inconsistent data
    Adding a new product is a problem
    Deleting a product is a problem
  Work Needed to Change a Field:
    If there are 500 current orders for Product #304, we have to modify 500 records. Updating would be cumbersome and expensive
  1NF Tables Can Contain Inconsistent Data:
    Entering the same data manually in many records leaves you wide open to mismatched data
  Adding a New Product Is a Problem:
    Because the primary key includes Order Number and Product #, you need values for both fields to add a record. How do you enter a new product that has not yet been ordered by a customer? You could use a dummy order number, but that can create difficulties
  Pg. 326, Figure 7-24
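A rough sketch of how moving to 2NF removes the update and insertion problems described above. Product #304 comes from the notes; the other product numbers, descriptions, and quantities are made-up values, and plain dictionaries stand in for tables.

```python
# 1NF design: primary key is (order_number, product_number), but
# description depends on product_number alone. Values are hypothetical.
order_lines_1nf = [
    {"order_number": 1, "product_number": 304, "description": "Blue gadget", "quantity": 7},
    {"order_number": 2, "product_number": 304, "description": "Blue gadget", "quantity": 1},
    {"order_number": 2, "product_number": 633, "description": "Assembly", "quantity": 3},
]


def to_second_normal_form(rows):
    """Move fields that depend on only part of the key into their own table."""
    products = {}      # PRODUCT table keyed by product_number
    order_lines = []   # ORDER LINE table keyed by (order_number, product_number)
    for row in rows:
        products[row["product_number"]] = row["description"]
        order_lines.append({
            "order_number": row["order_number"],
            "product_number": row["product_number"],
            "quantity": row["quantity"],
        })
    return products, order_lines


products, order_lines = to_second_normal_form(order_lines_1nf)

# Changing a description is now one update, however many orders exist.
products[304] = "Green gadget"
# A product that nobody has ordered yet can be added without a dummy order.
products[128] = "Steel widget"
```

The description lives in exactly one place, so it cannot become inconsistent across order lines, and products exist independently of orders.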
33 Normalization: Third Normal Form
  3NF design avoids redundancy and data integrity problems that still can exist in 2NF designs
  A table design is in third normal form (3NF) if it is in 2NF and if no nonkey field is dependent on another nonkey field
  To convert the table to 3NF, you must remove all fields from the 2NF table that depend on another nonkey field and place them in a new table that uses the nonkey field as a primary key
  Figure 7-26, pg NF
  Figure 7-27, pg 328 3NF
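The 3NF conversion rule can be sketched the same way. In this hypothetical STUDENT table the advisor name depends on the advisor number, a nonkey field, rather than on the student number. The field names and sample values are assumptions for illustration.

```python
# 2NF design: primary key is student_number, but advisor_name depends on
# advisor_number, which is itself a nonkey field. Values are hypothetical.
students_2nf = [
    {"student_number": 1001, "name": "Carlton Smith", "advisor_number": 49, "advisor_name": "Advisor A"},
    {"student_number": 1002, "name": "Second Student", "advisor_number": 49, "advisor_name": "Advisor A"},
    {"student_number": 1003, "name": "Third Student", "advisor_number": 23, "advisor_name": "Advisor B"},
]


def to_third_normal_form(rows):
    """Move fields that depend on a nonkey field into a new table.

    The nonkey field (advisor_number) becomes the primary key of the new
    ADVISOR table and stays behind in STUDENT as a foreign key.
    """
    advisors = {}
    students = []
    for row in rows:
        advisors[row["advisor_number"]] = {"advisor_name": row["advisor_name"]}
        students.append({
            "student_number": row["student_number"],
            "name": row["name"],
            "advisor_number": row["advisor_number"],
        })
    return students, advisors


students_3nf, advisors = to_third_normal_form(students_2nf)
```

Each advisor name is now stored once, and the student rows keep only the advisor number needed to look it up.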
34 Normalization A Normalization Example To show the normalization process, consider the familiar situation, which depicts several entities in a school advising system: ADVISOR, COURSE, and STUDENT
35 Steps in Database Design
  Create the initial ERD
  Assign all data elements to entities
  Create 3NF designs for all tables, taking care to identify all primary, secondary, and foreign keys
  Verify all data dictionary entries
  After creating your final ERD and normalized table designs, you can transform them into a database
  Create ERD:
    Review the DFDs to identify system entities, and talk to users
    Create a draft ERD
    Analyze each relationship in the ERD to see if it is 1:1, 1:M, or M:N
    Figure 7-39, p. 337
  Assign all data elements to entities:
    Verify that every data element in the data dictionary is associated logically with an entity
  Create 3NF designs for all tables, taking care to identify all primary, secondary, and foreign keys:
    Generate the final ERD, which will include new entities identified during normalization
    Figure 7-40, p. 338
  Verify all data dictionary entries:
    Check that all entries have been made for data stores, records, data elements, and codes
  Transformation into a database takes place once these steps have been completed
36 Chapter Summary
  Files and tables contain data about people, places, things, or events that affect the information system
  DBMS designs are more powerful and flexible than traditional file-oriented systems
  Data design tasks include creating an initial ERD; assigning data elements to an entity; normalizing all table designs; and completing the data dictionary entries for files, records, and data elements
  Any questions?