Data Modeling and Database Design


1 Data Modeling and Database Design
Minder Chen, Ph.D.
[Entity relationship diagram. Entity types and attributes: Employee (Employee number, First name, Last name, Employee function, Employee salary); Team (Team number, Specialty); Division (Division number, Division name, Division address); Task (Task name, Task cost); Project (Project number, Project name, Project label, Start date, End date); Customer (Customer number, Customer name, Customer address, Customer activity, Customer telephone, Customer fax). Relationship labels: is assigned to, contains, staffed by, subcontract, member, is a member of, belongs to.]

2 Rationales for Data Modeling
Data is the foundation of modern information systems enabled by data base technologies. Data in an organization exist and can be described independently of how these data are used. Data should be managed as a corporate-wide resource. The types of data used in an organization do not change very much. Data have certain inherent properties which lead to correct structuring. If we structure data according to their inherent properties, the structure (i.e., data models) will be stable.

3 History of Data Modeling
Importance of the Entity-Relationship Modeling Technique: database; data modeling and enterprise-wide data; data quality; data updating and accessing tools and procedures; data sharing culture.
The ER modeling technique was first developed by Peter Chen in 1976. It is a conceptual/logical data modeling tool, a user-oriented approach, and a graphic-based method.
The ER modeling technique is the major data modeling method in Information Engineering and is supported by most CASE tools.
Data modeling is the foundation of most database-centered transaction processing systems and data warehouse systems.

4 C/S Development Methodology
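[Diagram: the SDLC phases (Conceptual Analysis, Logical Design, Physical Design) applied to each layer of the C/S architecture. Other labels on the diagram: "rules =>" and "performance =>".]
User Interface: Work Flow (conceptual analysis); Form Sequences (logical design); Forms, Screens (physical design)
Application Logic: Process Flow (conceptual analysis); Object Interaction Model (logical design); Programs, Procedures (physical design)
Information & Data Base: Data Model (conceptual analysis); Database Schema (logical design); Tables, Indexes (physical design)
Source: David Vaskevitch, Client/Server Strategies, IDG Books, 1993.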

5 Client/Server Application Development Methodology
Where Do You Start? Requirements Information & Data Base Processes Behavior Workflow User Interface Architecture Application Design and Development Source: David Vaskevitch, Client/Server Strategies, IDG Books, 1993.

6 Multiple Perspectives
ONE BUSINESS We do these things We use this data DATA ACTIVITY EMPLOYEE HIRE PAY PROMOTE FIRE ...... ....

7 Data Model (Entity Relationship Diagram)
[Entity relationship diagram. Entity types: Member, Member Agreement, Order, Product, Promotion, Club. Relationship labels: places / placed by; is enrolled under / applies to; sells / is sold on; generates / generated by; established / established by; features / is featured in; sponsors / is sponsored by.]

8 Entity Types Definition: Identifying Entity Types
An entity is an object or event, real or abstract, about which we would like to store data. "Entity" is used as an abbreviation of "entity type". An entity type represents a set of entity instances that can be described by the same set of attribute types. The value of the same attribute may be different for each entity instance. Identifying Entity Types: What information is required by the business? Look for things of interest to the business that need to be remembered in order to manage and track them. Things that belong to the same entity type have common characteristics.

9 Naming Entity Types
The name of each entity type is in singular form: a noun; an adjective + a noun; a noun + a noun (a noun string); or an adjective + a noun + a noun. Examples: Customer, Customer Order, Product, Hourly Employee, Project, Department, Unfilled Customer Order. Be clear and concise. Avoid abbreviations. Be consistent with the users' terminology. Identify synonyms (Customer / Client, Product / Merchandise, Supplier / Vendor, Teacher / Faculty); use one name as the official name and document the others as aliases.

10 Exercise: Entity Type Naming
Courses Department Customer Order PO

11 Properties of Entity Types
Name Description Identifier Properties: Estimated number (Max., Min., Average) of entity instances Expected growth rate of entity instances Subject Area in which the Entity Type resides Attributes that describe the Entity Types Examples of entity type instances

12 Definition of an Entity Type
A poor definition of Customer: "Anyone that buys something from the company." Can an employee be a customer? Can a lessee be a customer? If the company sells a subsidiary to another company, is the new owner considered a customer? A good definition should be compatible, precise, concise, clear, and complete.

13 Good Definition
Compatible: Customer: An ORGANIZATION that purchases PRODUCTs for personal use. Distributor: An ORGANIZATION that purchases PRODUCTs for resale.
Precise: Use appropriate qualifiers. Example: An ORGANIZATION is considered to have purchased a PRODUCT when we receive a valid PURCHASE ORDER from it.
Complete: ORGANIZATION, PRODUCT, and PURCHASE ORDER need to be defined as well.
Concise and Clear: Use modular definitions.

14 Example of Entity Type Descriptions
Customer: Information about all persons or organizations who purchase Products.
Product: All goods manufactured and sold.
Raw-material: Components used to manufacture Products.
Supplier: Vendors of Raw-materials.
Buyer: Company personnel responsible for purchasing Raw-materials from Suppliers.

15 Entity Type and Entity Instance (Occurrence)
Entity Type: Entity Instance
Vendor: ABC Co.
Employee: John Smith
Course: Intro. to IE
Department: Marketing Department

16 Exercise: Entity Types or Entity Instances?
Maryland Organization Unit Customer President Bill Clinton Department of Commerce Address

17 Finding Entity Types
Interviews with users; JAD workshops; business forms; reports; computer files (using reverse engineering); operation manuals.

18 Where to Look for an Entity Type?
Tangible or Intangible Things: The nouns that are used to describe the problem domain will often correspond to the major Entity Types of the system, at least at a high level. Examples: Product, Sensor, Employee, Department, and Sales Office.
Resources: Any resource that an organization needs to manage should be represented as an Entity Type. Information assists the efficient and effective use of other resources through improved decisions. Examples: Inventory, Machine, Bank Account, and Customer.
Roles Played: Roles can be played by persons or organizational units. Examples: Customers, Managers, and Account Representatives.
Events: Events are incidents that occur at points in time. An event often involves an interaction between two Entity Types or an action that changes the status of an Entity Type. Examples: Sale, Delivery, and Registration of a motor vehicle.

19 BIAIT: Business Information Analysis and Integration Technique
Analysis of Orders Ordered entities can be a thing, a space, or a skill. View the order from supplier side. If an organization receives no orders, it has no reason for existing. An organization unit can receive multiple types of orders. 4 questions about the Supplier: Billing (Cash)? Deliver Late (Immediate)? Profile customer? Negotiate price (Fixed)? 3 questions about the Ordered Entity: Rented (Sold)? Tracked? Made to order (Stock)? Source: Carlson, W. M., "BIAIT: Business Information Analysis and Integration Technique - The New Horizon," Data Base, Vol. 10, No. 4, 1979, pp. 3-9.

20 Criteria for Evaluating an Entity Type
Needs to be remembered by the information system in order for the system to be functional. Can be operated on: CREATE, READ, UPDATE, DELETE. Has a set of operations/services that always apply to change the status of each occurrence of the Entity Type. Carries a set of attributes that always apply to describe each occurrence of the Entity Type. Has at least one relationship with another entity type. Has more than one entity occurrence (instance). Has at least one unique identifier. Domain-based requirements: Something that the system must have in order to operate. These may be clearly specified in the problem description or known from subject matter experts.

21 Entity Relationship Modeling and Diagramming
Relationships Entity Relationship Diagramming Notation Attributes Identifiers Partitioning and Entity Subtypes

22 Relationship (Type) Definition Examples
A Relationship Type is an association among Entity Types. It indicates that there is a business relationship between these Entity Types. Relationship Membership is the participation of an Entity Type in a Relationship. In IE, a Relationship Type can involve only two Entity Types (binary relationship). Some other modeling techniques allow n-ary relationships. Examples CUSTOMER places ORDER ORDER is placed by CUSTOMER EMPLOYEE works on PROJECT PROJECT has project member EMPLOYEE

23 Pairing (Relationship Instance)
A relationship pairing is a pair of Entity Instances of two Entity Types associated by a Relationship Type between these two Entity Types.
Entity Type Student has Entity Instances Student#1 and Student#2.
Entity Type Course has Entity Instances Course#A, Course#B, Course#C, and Course#D.
Relationship: Student takes Course. Relationship Pairings:
Student#1 takes Course#A
Student#1 takes Course#B
Student#1 takes Course#D
Student#2 takes Course#A
Student#2 takes Course#C
Student#2 takes Course#D

24 Relationship Instances Grouping
Definition: A collection of pairings of a Relationship Membership in which an Entity Instance is involved. Examples: Student#1 takes Course#A, #B, and #D Student#2 takes Course#A, #C, and #D Course#A is taken by Student#1 and Student#2
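The pairings and groupings for "Student takes Course" can be sketched in a few lines of Python. This is only an illustration: the helper name `groupings` and the tuple representation are assumptions, while the pairings themselves are the ones listed on the slide.

```python
from collections import defaultdict

# Pairings of the relationship "Student takes Course", as listed on the slide.
pairings = [
    ("Student#1", "Course#A"), ("Student#1", "Course#B"), ("Student#1", "Course#D"),
    ("Student#2", "Course#A"), ("Student#2", "Course#C"), ("Student#2", "Course#D"),
]

def groupings(pairs, side):
    """Group pairings by the entity instance on one side (0 = Student, 1 = Course)."""
    groups = defaultdict(list)
    for pair in pairs:
        groups[pair[side]].append(pair[1 - side])
    return dict(groups)

takes = groupings(pairings, 0)      # one grouping per Student
taken_by = groupings(pairings, 1)   # one grouping per Course

print(takes["Student#1"])    # ['Course#A', 'Course#B', 'Course#D']
print(taken_by["Course#A"])  # ['Student#1', 'Student#2']
```

Each grouping is simply the list of pairings seen from one entity instance's point of view, which is why the same six pairings yield two different sets of groupings.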

25 Relationship Cardinality
[Diagram: the three forms of cardinality between Entity Types E1 and E2: One-to-One (1:1), One-to-Many (1:M), and Many-to-Many (M:N).]

26 Relationship Cardinality
The number of Entity Instances involved in the Relationship Instances Grouping in a Relationship Type. Three Forms of Cardinality:
1. One-to-one (1:1): DEPARTMENT has MANAGER. Each DEPARTMENT has one and only one MANAGER. Each MANAGER manages one and only one DEPARTMENT.
2. One-to-many (1:m): CUSTOMER places ORDER. Each CUSTOMER sometimes (95%) places one or more ORDERs. Each ORDER always is placed by exactly one CUSTOMER.
3. Many-to-many (m:n): INSTRUCTOR teaches COURSE. Each INSTRUCTOR teaches zero, one, or more COURSEs. Each COURSE is taught by one or more INSTRUCTORs.
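As a rough illustration, the form of cardinality that a set of sample pairings exhibits can be computed by counting how many partners each instance has. The function name `cardinality` and all sample data below are hypothetical. Observed pairings can only show what has happened so far; the real cardinality is a business rule that has to be confirmed with users.

```python
from collections import Counter

def cardinality(pairs):
    """Classify observed pairings (x, y) as '1:1', '1:M', 'M:1' or 'M:N'."""
    pairs = set(pairs)                            # ignore duplicate pairings
    x_counts = Counter(x for x, _ in pairs)       # partners per X instance
    y_counts = Counter(y for _, y in pairs)       # partners per Y instance
    x_many = any(n > 1 for n in x_counts.values())
    y_many = any(n > 1 for n in y_counts.values())
    if x_many and y_many:
        return "M:N"
    if x_many:
        return "1:M"
    if y_many:
        return "M:1"
    return "1:1"

# Hypothetical sample pairings for the three relationships named on the slide.
has_manager = [("Marketing", "Ann"), ("Sales", "Bob")]
places = [("Cust1", "Order1"), ("Cust1", "Order2"), ("Cust2", "Order3")]
teaches = [("Instr1", "CourseA"), ("Instr1", "CourseB"), ("Instr2", "CourseA")]

print(cardinality(has_manager))  # 1:1
print(cardinality(places))       # 1:M
print(cardinality(teaches))      # M:N
```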

27 Entity Relationship Diagram (ERD): Notations
Graphical Notations. Cardinality indicators: zero, one, many. [Diagram: Entity-X and Entity-Y joined by a line labeled relationship-description in one direction and reversed-relationship-description in the other, with min and max cardinality indicators at each end.] Translate into two structured statements: Each Entity-X relationship-description cardinality-indicator (one-or-many) Entity-Y. Each Entity-Y reversed-relationship-description cardinality-indicator (zero-or-one) Entity-X. Example: Department is-managed-by Manager; Manager manages Department.

28 Optionality of Relationship Memberships
Whether all entity instances of both entity types need to participate in relationship pairings. Optionality: Mandatory or Optional. Example: CUSTOMER places ORDER; ORDER is placed by CUSTOMER. CUSTOMER membership is optional; ORDER membership is mandatory.

29 Relationship Statements
Graphical Notations [diagram: CUSTOMER places ORDER; ORDER is placed by CUSTOMER]. Cardinality indicators: one; one or more. Optionality indicators: zero (sometimes); one (always). Statement form: Each Entity X optionality relationship cardinality Entity Y. Each CUSTOMER sometimes places one or more ORDERs. Each ORDER always is placed by one CUSTOMER.

30 Defining Relationships
Name. Description. Properties: Cardinality volumes. Optionality percentage: the percentage of Entity Type X's instances pairing with Entity Type Y's instances. Transferability: A relationship is transferable if an entity instance can change its pairing within the same relationship. TRANSFERABLE: An EMPLOYEE can change to a different DEPARTMENT. NON-TRANSFERABLE: An ORDER cannot be transferred to another CUSTOMER.

31 ERD: More Examples Parallel Relationship Involuted or Looped
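[Diagrams.] (a) Customer places Order; Order belongs-to Customer. Order contains Product; Product is-contained-in Order. (b) Parallel Relationship: Employee manages Project; Project is-managed-by Employee. Employee works-for Project; Project has-project-members Employee. (c) Involuted or Looped Relationship: Part consists-of Part; Part contained-in Part.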

32 Identifying Relationships
Association between entity types. Entity types that are used on the same forms or documents. A description in a business document that has a verb relating two entity types, for example: has, consists of, uses.

33 Attributes Definition Naming Conventions: Examples
Characteristics that could be used to describe Entity Types and Relationship Types. However, in IE, relationship types are not allowed to have attributes. Naming Conventions: Names that have business meaning Don't use abbreviation or possessive case, e.g., PN and Customer's name Don't include entity type name because IEF will prefix the attribute name with entity type name automatically Use standard format: Entity Type Name (Qualifiers) Domain Name Customer Name Employee Starting Date Examples Customer has customer name, address, and telephone number Product has quantity-on-hand, weight, volume, color, and name. Employee has SSN, salary, and birthday. Employee-works-for-project has percentage-of-time, starting-date.

34 Attributes: Notations
[Diagram: alternative notations for the attributes of the entity type Student, showing Student ID, Student Name, Birth date, phone, and enrollment (Course no.).] Relational notation: Student(Student ID, Student Name, Birth Date). Finding Attributes: Attributes are identified progressively during the BAA phase: Data Analysis, Activity Analysis, Interaction Analysis, Current Systems Analysis.

35 Attribute Value Definition Examples
Attribute Values are instances of Attributes used to describe specific Entity Instances Examples Customer Number: Customer Name: Minder Chen State: VA Order Total: $23,000 Sale tax: $250 An attribute of an entity type should have only one value at any given time. (No repeating group) Avoid using complex coding scheme for an attribute. For example: PART Number: X-XXX-XXX Part Type Material Sequence Number

36 Type & Instance
OBJECT: TYPE / OCCURRENCE
Entity: Entity Type / Entity Instance
Relationship: Relationship (Type) / Pairing (Relationship Instance)
Attribute: Attribute (Type) / (Attribute) Value

37 Attribute Source Categories
Basic Definition: An Attribute Value that cannot be deduced or calculated. Examples: Student name and Birthday Derived Definition: The Attribute Value can be calculated or deduced from relationship Groupings or from the values of other Attributes. The value of a Derived Attribute changes constantly. Examples: Student Age, Account Balance, Number of courses taken. Designed Definition: The Attribute is created to overcome the system constraints. The value of a Designed Attribute does not change. Examples: Student ID, Course number.
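The three source categories can be illustrated with a small Python sketch. The class layout, the sample student, and the `age` method are assumptions made for illustration; the point is only that a derived attribute is computed from basic attributes rather than stored.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Student:
    student_id: str    # designed attribute: created by the system, does not change
    name: str          # basic attribute: cannot be deduced or calculated
    birth_date: date   # basic attribute

    def age(self, today: date) -> int:
        """Derived attribute: calculated from birth_date instead of being stored."""
        had_birthday = (today.month, today.day) >= (self.birth_date.month, self.birth_date.day)
        return today.year - self.birth_date.year - (0 if had_birthday else 1)

# Hypothetical entity instance.
student = Student("S001", "Pat Lee", date(2000, 6, 15))
print(student.age(date(2024, 6, 14)))  # 23
print(student.age(date(2024, 6, 15)))  # 24
```

Because the age is recomputed on demand, it can never disagree with the stored birth date, which is the usual argument for not storing derived values.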

38 Properties of Attributes
Name. Description. Attribute Source Category: Basic, Derived, Designed. Domain or data type: Text, Number, Date, Time, Timestamp. Optionality: Mandatory or optional. Length and/or precision. Permitted Values (Legal Values): ranges, or a set of values (Code Table). Default value or algorithm. Tools such as PowerBuilder have additional properties for a table's columns, called extended attributes: Validation Rule, Editing Format, Reporting Format, Column Heading, Form Label, Code Table.

39 Data Modeling Case Study
The following is a description by a pharmacy owner: "Jack Smith catches a cold and what he suspects is a flu virus. He makes an appointment with his family doctor, who confirms his diagnosis. The doctor prescribes an antibiotic and nasal decongestant tablets. Jack leaves the doctor's office and drives to his local drug store. The pharmacist packages the medication and types the labels for the pill bottles. The label includes information about the customer, the doctor who prescribed the drug, the drug (e.g., Penicillin), when to take it and how often, the content of the pill (250 mg), the number of refills, the expiration date, and the date of purchase." Please develop a data model for the entities and relationships within the context of the pharmacy. Also develop a definition for "prescription". List all the underlying assumptions used in your data model.

40 Data Modeling Process
List entity types. Create relationships: pick a central entity type; work around the neighborhood; add entity types to the diagram; build relationships among them. Determine cardinalities of relationships. Find/Create identifiers for each entity type. Add attributes to the entity types in the data model. Analyze and revise the data model.

41 Classifying Attribute and Partitioning
An Entity Subtype A collection of Entities of the same type to which a narrower definition and additional Attributes and Relationships apply. An Entity Subtype inherits (retains) all the Attributes and Relationships of its parent Entity Type. Classifying Attribute: An attribute of the Base Entity Type whose values partition the Entity Instances into Subtypes. Partitioning: A basis for subdividing one entity type into subtypes. The process of dividing an Entity Type into several Subtypes based on a Classifying Attribute is called Partitioning. The Classifying Attribute is recorded as a property of the Partitioning and it appears on the diagram.
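Partitioning can be sketched as grouping entity instances by the value of the classifying attribute. In the minimal example below, the EMPLOYEE instances, the `pay_type` classifying attribute, and the function name `partition` are all hypothetical.

```python
from collections import defaultdict

def partition(instances, classifying_attribute):
    """Split entity instances into subtypes keyed by the classifying attribute's value."""
    subtypes = defaultdict(list)
    for instance in instances:
        subtypes[instance[classifying_attribute]].append(instance)
    return dict(subtypes)

# Hypothetical EMPLOYEE instances; pay_type plays the role of the classifying attribute.
# Each subtype carries its own additional attribute (hourly_rate or annual_salary).
employees = [
    {"employee_no": 1, "name": "Ann", "pay_type": "hourly", "hourly_rate": 20.0},
    {"employee_no": 2, "name": "Bob", "pay_type": "salaried", "annual_salary": 50000.0},
    {"employee_no": 3, "name": "Cy", "pay_type": "hourly", "hourly_rate": 22.5},
]

subtypes = partition(employees, "pay_type")
hourly_numbers = [e["employee_no"] for e in subtypes["hourly"]]
print(sorted(subtypes))  # ['hourly', 'salaried']
print(hourly_numbers)    # [1, 3]
```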

42 Normalization
A data base is a model or an image of reality. Logical Data Base Design is the process of modeling and capturing the end-user views of an application domain and synthesizing them into a data base structure. Normalization is a logical data base design method. The basis for normalization is the functional dependencies among attributes in a table.

43 SQL Terminology
Product Table (a column is one vertical field, such as p_no; a row is one product):
p_no   product_name   quantity   price
101    Color TV
201    B&W TV
202    PC
Create a table in SQL:
CREATE TABLE product_table (p_no CHAR(5) NOT NULL, product_name CHAR(20), quantity SMALLINT, price DECIMAL(10, 2));
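To try the table definition, here is a minimal sketch that runs it in an in-memory SQLite database from Python. The table name `product_table` is borrowed from the terminology slide, and the quantities and prices are made-up values, since the transcript does not preserve them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_table (
        p_no         CHAR(5) NOT NULL,
        product_name CHAR(20),
        quantity     SMALLINT,
        price        DECIMAL(10, 2)
    )
""")

# Product numbers and names come from the slide; quantity and price are hypothetical.
rows = [("101", "Color TV", 24, 215.00), ("201", "B&W TV", 10, 125.00), ("202", "PC", 5, 995.00)]
conn.executemany("INSERT INTO product_table VALUES (?, ?, ?, ?)", rows)

# Column names as SQLite reports them, and the number of rows stored.
columns = [info[1] for info in conn.execute("PRAGMA table_info(product_table)")]
n_rows = conn.execute("SELECT COUNT(*) FROM product_table").fetchone()[0]
print(columns)  # ['p_no', 'product_name', 'quantity', 'price']
print(n_rows)   # 3
```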

44 SQL Terminology
Set Theory / Relational DB / File / Example
Relation / Table / File / Product_table
Attribute / Column / Data item / Product_name
Tuple / Row / Record / Product_101's info.
Domain / Pool of legal values / Data type / DATE

45 SQL Principles
The result of a SQL query is always a table (a view or dynamic table). Rows in a table are considered to be unordered. SQL has dominated the market since the late 1980s. It can be used in interactive programming environments. It provides both a data definition language (DDL) and a data manipulation language (DML). It is a non-procedural language. It can be embedded in a 3GL: Embedded SQL, Dynamic SQL.

46 SQL: Data Definition Language (DDL)
CREATE or DROP: TABLE, VIEW, INDEX, DATABASE. ALTER: TABLE.

47 SQL: Introduction
A relational data base is perceived by its users as a collection of tables. The relational model was introduced by E. F. Codd in 1969. Relational data bases have dominated the market since the late 1980s. Strengths: simplicity; end-user orientation; standardization; value-based instead of pointer-based; endorsed by major computer companies. Most CASE products support the development of relational data base centered applications.

48 SQL: Data Manipulation Language (DML)
p_no   product_name   quantity   price
101    Color TV
201    B&W TV
202    PC
DML statements: SELECT, UPDATE, INSERT, DELETE.
The Generic Form of the SELECT Statement:
SELECT [DISTINCT] column(s)
FROM table(s)
[WHERE conditions]
[GROUP BY column(s) [HAVING condition]]
[ORDER BY column(s)]

49 Database Table The following code retrieves only the Last Name and the Employee ID where the Employee ID is greater than 5. The records are retrieved in descending order. SELECT LastName, EmployeeID FROM Employees WHERE EmployeeID > 5 ORDER BY EmployeeID DESC

50 WHERE Clause WHERE: Use the Where clause to limit the selection. The # symbol indicates literal date values. SELECT * FROM Employees WHERE LastName = "Smith" SELECT Employees.LastName FROM Employees WHERE Employees.State in ('NY','WA') SELECT OrderID FROM Orders WHERE OrderDate BETWEEN #01/01/93# AND #01/31/93#
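The Employees queries from the last two slides can be tried against a small in-memory SQLite table, as sketched below. The column types and all employee rows are hypothetical. The date query is left out because the #...# date literal is specific to Microsoft Access and is not accepted by SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT, State TEXT)")
# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(4, "Smith", "NY"), (5, "Jones", "WA"), (6, "Chen", "VA"), (7, "Lopez", "NY"), (8, "Kim", "WA")],
)

# Comparison in WHERE, descending ORDER BY.
over_five = conn.execute(
    "SELECT LastName, EmployeeID FROM Employees WHERE EmployeeID > 5 ORDER BY EmployeeID DESC"
).fetchall()

# Set membership in WHERE; ORDER BY added so the result order is defined.
ny_or_wa = conn.execute(
    "SELECT Employees.LastName FROM Employees "
    "WHERE Employees.State IN ('NY', 'WA') ORDER BY Employees.EmployeeID"
).fetchall()

print(over_five)  # [('Kim', 8), ('Lopez', 7), ('Chen', 6)]
print(ny_or_wa)   # [('Smith',), ('Jones',), ('Lopez',), ('Kim',)]
```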

51 Keys
A key, also called an identifier, is an Attribute or a Composite Attribute that can be used to uniquely identify an instance of an entity type. Examples (Entity Type: Key):
Warehouse: Warehouse Number
Product: Product Number
Student: Student ID or SSN
Ship: Name and Port of Registration
Stock of Product: Product Number and Warehouse No.

52 Types of Key
Primary Key: An attribute or a set of attributes that is used by the DBMS as the unique identifier of a table.
Candidate (Alternative) Key: An attribute or a set of attributes that could have been used as the primary key of a table.
Secondary (Index) Key: An attribute or a set of attributes that has been used to construct a data retrieval index.
Concatenated (Combined or Composite) Key: A set of attributes that has been used as the key.
Foreign Key: An attribute or a set of attributes in one table that is used as the primary key in another table.
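A minimal sketch of how these key types might be declared, using SQLite from Python and the Warehouse / Product / Stock of Product example from the Keys slide. The column names, the assumption that product names are unique, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE warehouse (warehouse_no INTEGER PRIMARY KEY);
CREATE TABLE product (
    product_no   INTEGER PRIMARY KEY,     -- primary key
    product_name TEXT NOT NULL UNIQUE     -- candidate key (assumed unique here)
);
CREATE TABLE stock (
    product_no   INTEGER NOT NULL REFERENCES product(product_no),      -- foreign key
    warehouse_no INTEGER NOT NULL REFERENCES warehouse(warehouse_no),  -- foreign key
    quantity     INTEGER NOT NULL,
    PRIMARY KEY (product_no, warehouse_no)                             -- concatenated key
);
CREATE INDEX stock_by_warehouse ON stock(warehouse_no);                -- secondary (index) key
""")
conn.execute("INSERT INTO warehouse VALUES (1)")
conn.execute("INSERT INTO product VALUES (101, 'Color TV')")
conn.execute("INSERT INTO stock VALUES (101, 1, 24)")

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

duplicate_key_rejected = rejected("INSERT INTO stock VALUES (101, 1, 5)")  # same composite key
dangling_fk_rejected = rejected("INSERT INTO stock VALUES (999, 1, 5)")    # no such product
print(duplicate_key_rejected, dangling_fk_rejected)  # True True
```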

53 Purposes of Normalization
Avoid maintenance problems (update anomalies). Insert: There may be no place to insert new information. Delete: Some important information will be lost by deletion. Update: Inconsistency may occur because of the existence of data redundancy. Provide maximum flexibility to meet future information needs by keeping tables corresponding to object types in their simplified forms.

54 A Common Sense Approach to Normalization
Don't rush to put all the information in one table. Create a table to correspond to a class of a simple object type that should exist by itself, i.e., "one fact in one place." Include common fields (links) as ways of joining information from several related tables. Avoid redundancy by using links to retrieve data from related tables.

55 Normalization Theory
Normalization is a process of systematically breaking a complex table into simpler ones. It is built around the concept of normal forms. A relation is in a particular normal form if it satisfies a specific set of constraints, such as constraints on the dependencies among attributes in the relation. For an integer x > 1, if a relation is in x-NF then it is also in (x-1)-NF. Higher order normal forms are usually more desirable than lower order normal forms. The normalization process usually starts from complex relations, which are usually drawn from existing documents such as business forms.

56 A Business Form

57 An Informal Example of Normalization
A CUSTOMER ORDER contains the following information: OrderNo OrderDate CustNo CustAddress CustType Tax Total one or more than one Order-Item which has ProductNo Description Quantity UnitPrice Subtotal.

58 Solution
Unnormalized table: (OrderNo, OrderDate, CustNo, CustAddress, CustType, Tax, Total, 1{ProductNo, Description, Quantity, UnitPrice, Subtotal}n)
Remove repeating group to reach 1st NF: (OrderNo, OrderDate, CustNo, CustAddress, CustType, Tax, Total) and (OrderNo, ProductNo, Description, Quantity, UnitPrice, Subtotal)
Remove partial FD to reach 2nd NF: (OrderNo, OrderDate, CustNo, CustAddress, CustType, Tax, Total), (OrderNo, ProductNo, Quantity, UnitPrice, Subtotal), and (ProductNo, Description, UnitPrice)
Remove transitive FD to reach 3rd NF: (OrderNo, OrderDate, CustNo, Tax, Total), (CustNo, CustAddress, CustType), (OrderNo, ProductNo, Quantity, UnitPrice, Subtotal), and (ProductNo, Description, UnitPrice)
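One possible way to realize the 3rd NF result as tables is sketched below with SQLite from Python. The table names, column types, and sample rows are assumptions; only the attribute names come from the slide. The final query joins the four tables back together to rebuild the lines of the original order form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (CustNo INTEGER PRIMARY KEY, CustAddress TEXT, CustType TEXT);
CREATE TABLE product (ProductNo INTEGER PRIMARY KEY, Description TEXT, UnitPrice REAL);
CREATE TABLE customer_order (
    OrderNo INTEGER PRIMARY KEY, OrderDate TEXT,
    CustNo INTEGER REFERENCES customer(CustNo), Tax REAL, Total REAL);
CREATE TABLE order_item (
    OrderNo INTEGER REFERENCES customer_order(OrderNo),
    ProductNo INTEGER REFERENCES product(ProductNo),
    Quantity INTEGER, UnitPrice REAL, Subtotal REAL,
    PRIMARY KEY (OrderNo, ProductNo));

-- Hypothetical sample data.
INSERT INTO customer VALUES (7, '1 Main St', 'Retail');
INSERT INTO product VALUES (101, 'Color TV', 200.0), (202, 'PC', 900.0);
INSERT INTO customer_order VALUES (1, '2024-01-15', 7, 55.0, 1155.0);
INSERT INTO order_item VALUES (1, 101, 1, 200.0, 200.0), (1, 202, 1, 900.0, 900.0);
""")

# Rebuild the order form lines by joining the normalized tables.
form_rows = conn.execute("""
    SELECT o.OrderNo, c.CustAddress, p.Description, i.Quantity, i.Subtotal
    FROM customer_order o
    JOIN customer c ON c.CustNo = o.CustNo
    JOIN order_item i ON i.OrderNo = o.OrderNo
    JOIN product p ON p.ProductNo = i.ProductNo
    ORDER BY i.ProductNo
""").fetchall()
print(form_rows)
# [(1, '1 Main St', 'Color TV', 1, 200.0), (1, '1 Main St', 'PC', 1, 900.0)]
```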

59 Unnormalized Form
A relation that has multi-valued attributes (repeating groups).
Normalization Process: Remove Multi-valued Attributes. If an unnormalized relation R has a primary key K and a multi-valued attribute M, then M should be removed from R and a new relation should be created with (K, M) as its primary key. There may be other attributes associated with this new relation. R will then be at least in 1NF.
Example: An Employee relation has an attribute language-spoken. Some employees can speak more than one language.
EMP (employeeID, empName, empAddress, (language1, language2, ...)) is decomposed into EMP (employeeID, empName, empAddress) and EMP-LANGUAGE (employeeID, language, skillLevel)
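The EMP example can be sketched in Python as a function that splits each row into a parent row and one child row per value of the multi-valued attribute. The function name, the dictionary representation, and the second employee are hypothetical; empAddress and skillLevel are omitted to keep the sketch short.

```python
def remove_multivalued(rows, key, multi):
    """Split rows holding a list-valued attribute into a parent and a child relation."""
    parent, child = [], []
    for row in rows:
        # Parent relation: everything except the multi-valued attribute.
        parent.append({k: v for k, v in row.items() if k != multi})
        # Child relation: one (key, value) row per value, so (key, multi) is its key.
        for value in row[multi]:
            child.append({key: row[key], multi: value})
    return parent, child

emp_unnormalized = [
    {"employeeID": 111, "empName": "John Smith", "language": ["English", "Chinese"]},
    {"employeeID": 112, "empName": "Ana Ruiz", "language": ["Spanish"]},  # hypothetical
]
emp, emp_language = remove_multivalued(emp_unnormalized, "employeeID", "language")
print(emp)
# [{'employeeID': 111, 'empName': 'John Smith'}, {'employeeID': 112, 'empName': 'Ana Ruiz'}]
print(emp_language)
# [{'employeeID': 111, 'language': 'English'}, {'employeeID': 111, 'language': 'Chinese'},
#  {'employeeID': 112, 'language': 'Spanish'}]
```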

60 How Do You Remove the Repeating Groups?
CREATE TABLE MEM_CONDITION ( MEMBER# VARCHAR2(12) NOT NULL, CASE# VARCHAR2(16) NOT NULL, DIAG_ARRAY_ VARCHAR2(6) NOT NULL, DIAG_ARRAY_ VARCHAR2(6) NOT NULL, DIAG_ARRAY_ VARCHAR2(6) NOT NULL, DIAG_ARRAY_ VARCHAR2(6) NOT NULL, DIAG_ARRAY_ VARCHAR2(6) NOT NULL, DIAG_EX_ARRAY_1 VARCHAR2(2) NOT NULL, DIAG_EX_ARRAY_2 VARCHAR2(2) NOT NULL, DIAG_EX_ARRAY_3 VARCHAR2(2) NOT NULL, DIAG_EX_ARRAY_4 VARCHAR2(2) NOT NULL, DIAG_EX_ARRAY_5 VARCHAR2(2) NOT NULL, DRUG_ARRAY_ VARCHAR2(12) NOT NULL, DRUG_ARRAY_ VARCHAR2(12) NOT NULL, DRUG_ARRAY_ VARCHAR2(12) NOT NULL, DRUG_ARRAY_ VARCHAR2(12) NOT NULL, DRUG_ARRAY_ VARCHAR2(12) NOT NULL, LC_ARRAY_ VARCHAR2(4) NOT NULL, LC_ARRAY_ VARCHAR2(4) NOT NULL, LC_ARRAY_ VARCHAR2(4) NOT NULL, LC_ARRAY_ VARCHAR2(4) NOT NULL, LC_ARRAY_ VARCHAR2(4) NOT NULL, MEM_REVIEW VARCHAR2(4) NOT NULL, OP# VARCHAR2(4) NOT NULL, PROC_ARRAY_ VARCHAR2(6) NOT NULL, PROC_ARRAY_ VARCHAR2(6) NOT NULL, PROC_ARRAY_ VARCHAR2(6) NOT NULL, PROC_ARRAY_ VARCHAR2(6) NOT NULL, PROC_ARRAY_ VARCHAR2(6) NOT NULL, PROV_ARRAY_ VARCHAR2(12) NOT NULL, PROV_ARRAY_ VARCHAR2(12) NOT NULL, PROV_ARRAY_ VARCHAR2(12) NOT NULL, PROV_ARRAY_ VARCHAR2(12) NOT NULL, PROV_ARRAY_ VARCHAR2(12) NOT NULL, REC_TYPE VARCHAR2(2) NOT NULL, SP_ARRAY_ VARCHAR2(4) NOT NULL, SP_ARRAY_ VARCHAR2(4) NOT NULL, SP_ARRAY_ VARCHAR2(4) NOT NULL, SP_ARRAY_ VARCHAR2(4) NOT NULL, SP_ARRAY_ VARCHAR2(4) NOT NULL, TRANSCODE VARCHAR2(2) NOT NULL, TT_ARRAY_ VARCHAR2(4) NOT NULL, TT_ARRAY_ VARCHAR2(4) NOT NULL, TT_ARRAY_ VARCHAR2(4) NOT NULL, TT_ARRAY_ VARCHAR2(4) NOT NULL, TT_ARRAY_ VARCHAR2(4) NOT NULL, VOID VARCHAR2(2) NOT NULL, YMDEFF VARCHAR2(8) NOT NULL, YMDEND VARCHAR2(8) NOT NULL, YMDTRANS VARCHAR2(8) NOT NULL, PRIORITY VARCHAR2(2) NOT NULL );

61 Functional Dependency
Notation: R.X => R.Y. Definition: Attribute Y of Relation R is functionally dependent on Attribute X of Relation R when each value of R.X is associated with no more than one value of R.Y. R.X and R.Y may be composite attributes. Description: R.Y is functionally dependent on R.X; R.X functionally determines R.Y.
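Whether a functional dependency X => Y holds in a given set of rows can be checked mechanically: no X value may appear together with two different Y values. The sketch below assumes rows are dictionaries; the function name `holds` and the supplier-part sample data are hypothetical. Note that sample data can refute a dependency but can never prove it, since a dependency is a statement about all possible data.

```python
def holds(rows, x, y):
    """Return True if no value of attributes x appears with two different values of y."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # setdefault stores y_val the first time x_val is seen, then returns the stored one.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Hypothetical supplier-part rows.
supplier_part = [
    {"supplier#": "S1", "part#": "P1", "qty": 300, "city": "London", "distance": 120},
    {"supplier#": "S1", "part#": "P2", "qty": 200, "city": "London", "distance": 120},
    {"supplier#": "S2", "part#": "P1", "qty": 300, "city": "Paris", "distance": 340},
    {"supplier#": "S3", "part#": "P2", "qty": 100, "city": "London", "distance": 120},
]

print(holds(supplier_part, ["supplier#"], ["city"]))          # True
print(holds(supplier_part, ["supplier#"], ["qty"]))           # False: S1 has qty 300 and 200
print(holds(supplier_part, ["supplier#", "part#"], ["qty"]))  # True
print(holds(supplier_part, ["city"], ["distance"]))           # True
```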

62 Full & Partial Dependency
R.A => R.B If B is not functionally dependent on any subset of A (other than A itself), B is fully dependent on A in R. If B is functionally dependent on a subset of A (other than A itself), B is partially dependent on A in R.

63 First Normal Form (1NF)
A relation R is in the first normal form (1NF) if and only if all attributes of any tuple in R contain only atomic values.
Normalization Process (from 1NF to 2NF): Remove Partial Functional Dependencies. If R is in 1NF and has a composite primary key (K1, K2), and an attribute P is functionally dependent on K1 alone (K1 => P), i.e., P is partially dependent on (K1, K2), then P should be removed from R and a new relation should be created with K1 as the primary key and P as a non-key attribute. A relation that is in 1NF and not in 2NF must have a composite primary key.
Example: The Supplier-Part relation has attributes supplier#, part#, qty, city, distance, where (supplier#, part#) is the key. City depends on supplier# alone, so it is only partially dependent on the key.
SUPPLIER-PART (supplier#, part#, qty, city, distance) is decomposed into SUPPLIER-PART (supplier#, part#, qty) and SUPPLIER (supplier#, city, distance)

64 Non-loss Decomposition
Normalization is a reduction (decomposition) process that replaces a relation by suitable projections. Each projection is a new relation that is in a further normalized form than the original relation. The collection of projections is equivalent to the original relation: the original relation can always be recovered by taking the natural join of these projections. Any information that can be derived from the original relation can also be derived from the further normalized relations. The converse is not true. The process is reversible because no information is lost in the reduction process.
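The reversibility claim can be checked on a small example. The sketch below models a relation as a set of tuples, implements projection and natural join, and compares a decomposition that follows the functional dependencies with one that does not. The helper names and the sample SUPPLIER-PART data are hypothetical.

```python
def relation(rows):
    """A relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def project(rel, attrs):
    """Projection: keep only attrs; duplicates vanish because the result is a set."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on every attribute they share."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

supplier_part = relation([
    {"supplier#": "S1", "part#": "P1", "qty": 300, "city": "London", "distance": 120},
    {"supplier#": "S1", "part#": "P2", "qty": 200, "city": "London", "distance": 120},
    {"supplier#": "S2", "part#": "P1", "qty": 400, "city": "Paris", "distance": 340},
])

# Decomposition that follows supplier# => city, distance: the join recovers the original.
sp = project(supplier_part, ["supplier#", "part#", "qty"])
s = project(supplier_part, ["supplier#", "city", "distance"])
lossless = natural_join(sp, s) == supplier_part

# An arbitrary split on part#: the join produces spurious tuples, so information is lost.
a = project(supplier_part, ["part#", "qty"])
b = project(supplier_part, ["supplier#", "part#", "city", "distance"])
rejoined = natural_join(a, b)

print(lossless)                                    # True
print(len(rejoined), rejoined == supplier_part)    # 5 False
```

The second decomposition shows why the projections must be "suitable": extra tuples after the join mean it is no longer possible to tell which quantities really belong to which supplier.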

65 Transitive Dependency
In a relation R, if R.A =>R.B and R.B => R.C then attribute C is said to be transitively dependent on attribute A.

66 Second Normal Form (2NF)
A relation R is in the second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully dependent on the primary key.
Normalization Process (from 2NF to 3NF): Remove Transitive Dependencies. If R is in 2NF and has two non-key attributes A1 and A2 where A2 is functionally dependent on A1 (A1 => A2), then A2 should be removed from R and a new relation should be created with A1 as the primary key and A2 as a non-key attribute.
Example: The Supplier relation has attributes supplier#, city, distance, and quality_level, where supplier# is the key and the distance to a supplier can be determined by the city of the supplier.
SUPPLIER (supplier#, city, distance, quality_level) is decomposed into SUPPLIER (supplier#, city, quality_level) and CITY-DISTANCE (city, distance)

67 Third Normal Form (3NF)
A relation R is in the third normal form (3NF) if and only if the non-key attributes (if there are any) are fully dependent on the primary key of R (i.e., R is in 2NF) and are mutually independent.
Heuristic to Check Whether a Relation Is in 3NF: All the non-key attributes (which are not multi-valued attributes) are dependent on the (primary) key, the whole key, and nothing but the key.
Explanation: All the non-key attributes have atomic values and are dependent on the key (1NF: no multi-valued attributes), the whole key (2NF: no partial functional dependencies), and nothing but the key (3NF: no transitive functional dependencies).

68 Normalization Process
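[Diagram: an unnormalized relation with attributes A B C D E F G H is decomposed step by step. Remove repeating groups to reach 1NF; remove partial dependencies to reach 2NF; remove transitive dependencies to reach 3NF. Attribute groups labeled 3NF in the diagram: D E; A F G; F H; A B C D.]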

69 Normalization: Pros and Cons
Pros: Reduce data redundancy and the space required. Enhance data consistency. Enforce data integrity. Reduce update cost. Provide maximum flexibility in responding to ad hoc queries.
Cons: Many complex queries will be slower because joins have to be performed to retrieve relevant data from several normalized tables. Programmers/users have to understand the underlying data model of a database application in order to perform proper joins among several tables. The formulation of multiple-level queries is a nontrivial task.

70 Join Two Tables
SELECT Categories.CategoryName, Products.ProductName FROM Categories, Products WHERE Products.CategoryID = Categories.CategoryID

71 Tables in Relational DB
Identify Primary Keys and Foreign Keys in the following Tables!

72 Join Tables
SELECT Orders.OrderID, Orders.CustID, LastName, Firstname, Orders.ItemID, Description FROM Customer, Orders, Inventory WHERE Customer.CustID = Orders.CustID AND Orders.ItemID = Inventory.ItemID ORDER BY Orders.CustID, Orders.ItemID

73 Foreign Keys & Primary Keys in a Sample Access Database

74 An Example of a Complex Query
Please list the name and phone number of customers who have ordered product number 007.
SELECT customer_name, customer_phone FROM customer WHERE customer_number IN (SELECT customer_number FROM "order" WHERE order_no IN (SELECT order_no FROM orderItem WHERE product_number = 007))
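The nested query can be tried in SQLite from Python, alongside an equivalent formulation with joins. This is a sketch under assumptions: the column types and sample rows are made up, product number 007 is stored as the integer 7, each subquery is wrapped in parentheses as SQL requires, and the table name order is quoted because ORDER is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_number INTEGER PRIMARY KEY, customer_name TEXT, customer_phone TEXT);
CREATE TABLE "order" (order_no INTEGER PRIMARY KEY, customer_number INTEGER);
CREATE TABLE orderItem (order_no INTEGER, product_number INTEGER);

-- Hypothetical sample data: only Ann's order contains product 7.
INSERT INTO customer VALUES (1, 'Ann', '555-0101'), (2, 'Bob', '555-0102');
INSERT INTO "order" VALUES (10, 1), (11, 2);
INSERT INTO orderItem VALUES (10, 7), (11, 8);
""")

# Multiple-level (nested) query, evaluated from the innermost subquery outward.
nested = conn.execute("""
    SELECT customer_name, customer_phone FROM customer
    WHERE customer_number IN
        (SELECT customer_number FROM "order"
         WHERE order_no IN
            (SELECT order_no FROM orderItem WHERE product_number = 7))
""").fetchall()

# The same question asked with joins; DISTINCT avoids repeating a customer per order line.
joined = conn.execute("""
    SELECT DISTINCT c.customer_name, c.customer_phone
    FROM customer c
    JOIN "order" o ON o.customer_number = c.customer_number
    JOIN orderItem i ON i.order_no = o.order_no
    WHERE i.product_number = 7
""").fetchall()

print(nested)  # [('Ann', '555-0101')]
print(joined)  # [('Ann', '555-0101')]
```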

75 Denormalization
The process of intentionally backing away from normalization to improve performance. Denormalization should not be the first choice for improving performance and should only be used for fine-tuning a database for a particular application.
Requirements: prior normalization; knowledge of data usage.
Benefits: minimize the need for joins; reduce the number of tables; reduce the number of foreign keys; reduce the number of indices.
Knowledge of Data Usage: How often are two data items needed together? How many rows are involved? How volatile is the denormalized data? How important is visibility of the data to users? What are the minimum response time and the frequency of a query?

76 De-normalization: An Example
Denormalization replaces the join of R1 and R2 with a stored table R1*R2, where:
R1 (ProductNo, SupplierNo, Price)
R2 (SupplierNo, Name, Address, Phone)
R1*R2 (ProductNo, SupplierNo, Name, Address, Phone, Price)
R2 should be kept to prevent data loss. Data redundancy in R1*R2 and R2 can cause data inconsistency problems if the redundant data in these two tables are not maintained properly.
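The inconsistency risk can be made concrete with a short SQLite sketch in Python. The stored table name `R1_R2` stands in for R1*R2, and the sample rows are hypothetical. After a supplier's phone number is changed in R2 only, the copies in the denormalized table are stale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R1 (ProductNo INTEGER, SupplierNo INTEGER, Price REAL);
CREATE TABLE R2 (SupplierNo INTEGER PRIMARY KEY, Name TEXT, Address TEXT, Phone TEXT);

-- Hypothetical sample data.
INSERT INTO R1 VALUES (101, 1, 200.0), (102, 1, 50.0), (103, 2, 75.0);
INSERT INTO R2 VALUES (1, 'Acme', '1 Main St', '555-0100'), (2, 'Bolt', '2 Oak Ave', '555-0200');

-- Denormalization: store the result of the join instead of computing it per query.
CREATE TABLE R1_R2 AS
    SELECT R1.ProductNo, R1.SupplierNo, R2.Name, R2.Address, R2.Phone, R1.Price
    FROM R1 JOIN R2 ON R2.SupplierNo = R1.SupplierNo;

-- The supplier's phone changes, but only R2 is updated.
UPDATE R2 SET Phone = '555-0199' WHERE SupplierNo = 1;
""")

# Rows of the denormalized table whose redundant phone no longer matches R2.
stale = conn.execute("""
    SELECT COUNT(*) FROM R1_R2 d
    JOIN R2 ON R2.SupplierNo = d.SupplierNo
    WHERE d.Phone <> R2.Phone
""").fetchone()[0]
print(stale)  # 2
```

Queries against R1_R2 need no join, which is the performance benefit, but every update to supplier data now has to be applied in two places.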

77 Data Model Refinement and Transformation
Associative Entity Type Removing Many-to-Many Relationships Keys Transformation to Relational Databases

78 Refinement of a Data Model: Analysis and Simplification
Isolated Entity Type Solitary Entity Type One-to-One Relationship Redundant Relationship Multi-Valued Attributes Attribute with Attributes Many-to-Many Relationship

79 Isolated Entity Type An Entity Type that does not participate in a Relationship. Since every Entity Type should participate in at least one Relationship, there exist two alternatives: identify a relevant Relationship, or remove the Entity Type from the model.

80 Solitary Entity Type An Entity Type that has only one Entity Instance. Examples: Computer Center, Sales Tax, and Current Order Number. Solitary Entity Types may be too restrictive. Alternatives:
Introduce another Entity Type with a wider scope. Computer Center ==> Organization Unit
Define it as an Attribute of an Entity Type. Sales Tax ==> Sales Tax of Order
Define it as a data element in a parameter table. A parameter table has only one row. Current Order Number ==> Current Order Number of Parameter Table
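One way to sketch the parameter-table alternative, using Python's built-in sqlite3 module. The table name Parameter, the id column with its CHECK constraint, the starting value 1000, and the helper next_order_number are illustrative assumptions, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical single-row parameter table: the CHECK pins id to 1,
    -- so a second row cannot be inserted.
    CREATE TABLE Parameter (
        id                 INTEGER PRIMARY KEY CHECK (id = 1),
        CurrentOrderNumber INTEGER NOT NULL
    );
    INSERT INTO Parameter VALUES (1, 1000);
""")

def next_order_number(conn):
    """Increment and return the current order number (hypothetical helper)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE Parameter SET CurrentOrderNumber = CurrentOrderNumber + 1 WHERE id = 1"
        )
        row = conn.execute(
            "SELECT CurrentOrderNumber FROM Parameter WHERE id = 1"
        ).fetchone()
    return row[0]

first = next_order_number(conn)
second = next_order_number(conn)

# Attempting to add a second row violates the CHECK constraint.
try:
    conn.execute("INSERT INTO Parameter VALUES (2, 5)")
    second_row_allowed = True
except sqlite3.IntegrityError:
    second_row_allowed = False

print(first, second, second_row_allowed)
```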

81 Evaluate One-to-One Relationship
It may be an unnecessary relationship between two Entity Types if they have the same attributes and relationships (i.e., they are identical). They should then be combined into one Entity Type. Possibly incorrect: two Entity Types, Purchase Request and Purchase Order, linked by the relationship "becomes" / "has request". Correct: a single Entity Type, Purchase Order.

82 Redundant Relationship
Is this relationship redundant?
[Diagram: Entity Types customer, order, order item, and product. Relationship labels: has ordered, is ordered by, places, ORDERS, is placed by, has, contains, is part of.]
Differences in timing of an entity type in its life cycle: Implemented as separate entity types or use subtypes. Use value of attributes or additional attributes to differentiate them.

83 Redundant Relationship
[Diagram, labeled "Non-redundant": Entity Types Product, Warehouse, Stock, Order Line, Order, Order History, and Customer. Relationship labels: stocks, is held as, holds, is held in, contains, is contained in, places, is placed by.]

84 Multi-Valued Attribute
Definition: An Attribute that may have more than one value at a time is called a multi-valued attribute. Solution: Create an Entity Type for the multi-valued attribute. Example: Languages spoken by an Employee.
Before: Employee(ID, Name, Phone, Languages) Employee(111, “John Smith”, , (English, Chinese))
After: Employee(ID, Name, Phone) Employee(111, “John Smith”, )
Employee_language(ID, Language) Employee_language(111, English) Employee_language(111, Chinese)
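The split into Employee and Employee_language can be sketched with Python's built-in sqlite3 module. The column types and the composite primary key are assumptions; the phone number is left empty in the slide, so it is stored as NULL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT, Phone TEXT);
    -- One row per (employee, language) pair replaces the multi-valued attribute.
    CREATE TABLE Employee_language (
        ID       INTEGER REFERENCES Employee(ID),
        Language TEXT,
        PRIMARY KEY (ID, Language)
    );
    INSERT INTO Employee VALUES (111, 'John Smith', NULL);  -- phone not given in the slide
    INSERT INTO Employee_language VALUES (111, 'English'), (111, 'Chinese');
""")

# All languages spoken by employee 111, in alphabetical order.
languages = [row[0] for row in conn.execute(
    "SELECT Language FROM Employee_language WHERE ID = 111 ORDER BY Language"
)]

# Employees who speak Chinese, found with a simple join.
chinese_speakers = [row[0] for row in conn.execute("""
    SELECT e.Name FROM Employee e
    JOIN Employee_language l ON l.ID = e.ID
    WHERE l.Language = 'Chinese'
""")]

print(languages, chinese_speakers)
```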

85 Attribute with Attributes
An Attribute that can be described by other Attributes is called an attribute with attributes. Example: College Degree held by an Employee (John Smith has a College Degree in Computer Sciences from George Mason University). Solutions: Create an Entity Type to avoid an Attribute with Attributes, or add new attributes to the existing Entity Type.

86 Associative Entity Type
An Associative Entity Type is an Entity Type whose existence is meaningful only if it participates in several (>=2) Relationship Types at the same time. Associative Entity Types are often introduced to represent additional information in many-to-many Relationships or to decompose a many-to-many Relationship into two one-to-many Relationships. Associative Entity Types are also used to represent n-ary Relationships in a binary data model.

87 Remove Many-to-Many Relationship
Given: Order contains Product; Product belongs to Order (a many-to-many Relationship).
Why? There is no place to attach Attributes that are required to describe a many-to-many Relationship. It is difficult to translate many-to-many Relationships into relational tables automatically.
How? A many-to-many Relationship can be decomposed into two one-to-many Relationships by creating an Associative Entity Type between the existing two Entity Types.
Result: Order contains Order Line; Order Line belongs to Order; Order Line has Product; Product is contained in Order Line.

88 Remove Many-to-Many Relationships: Exercises
Remove the many-to-many relationship from the following ER diagrams:
(a) Product has-sources Supplier; Supplier offers Product
(b) Student takes Course; Course is-taken-by Student
(c) Part consists-of Part; Part is-contained-in Part

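89 Bills of Material
Part consists-of Part; Part is-a-component-in Part (a recursive many-to-many Relationship).
Product Structure: A consists of B (quantity 2) and C (quantity 1); B consists of D (1) and E (3); C consists of D (2) and F (2).
Product-Structure(Parent Part No, Child Part No, Quantity)
A B 2
A C 1
B D 1
B E 3
C D 2
C F 2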
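The Product-Structure table can be "exploded" to find how many of each component one unit of part A needs. The sketch below uses Python's built-in sqlite3 module and a recursive common table expression; the column names ParentPartNo and ChildPartNo, the CTE name explode, and the query itself are illustrative assumptions rather than anything shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductStructure (
        ParentPartNo TEXT, ChildPartNo TEXT, Quantity INTEGER,
        PRIMARY KEY (ParentPartNo, ChildPartNo)
    );
    -- Rows taken from the slide.
    INSERT INTO ProductStructure VALUES
        ('A', 'B', 2), ('A', 'C', 1),
        ('B', 'D', 1), ('B', 'E', 3),
        ('C', 'D', 2), ('C', 'F', 2);
""")

# Walk the structure downward from A, multiplying quantities along each path,
# then add up the paths that reach the same part.
requirements = conn.execute("""
    WITH RECURSIVE explode(part, qty) AS (
        SELECT ChildPartNo, Quantity
        FROM ProductStructure WHERE ParentPartNo = 'A'
        UNION ALL
        SELECT ps.ChildPartNo, e.qty * ps.Quantity
        FROM explode e
        JOIN ProductStructure ps ON ps.ParentPartNo = e.part
    )
    SELECT part, SUM(qty) FROM explode GROUP BY part ORDER BY part
""").fetchall()

print(requirements)
```

Part D is reached through both B (2 x 1) and C (1 x 2), so its total requirement is 4.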

90 Using an Associative Entity Type to Represent an N-ary Relationship
Before: Product, Project, and Supplier are each "involved in product usage" (a 3-ary Relationship). Product Usage is an Associative Entity Type for a 3-ary Relationship.
After: Product is used in Product Usage; Project uses Product Usage; Supplier supplies Product Usage.

91 Translate Data Models to Relational Tables
Given: Order contains Order Line; Order Line belongs to Order; Order Line has Product; Product is contained in Order Line.
Order: Key: Order#. Attributes: Order date, Customer ID, Sale Person ID.
Order Line: Key: Order# + Product#. Attributes: Quantity, Unit Price.
Product: Key: Product#. Attributes: Description, Qty-on-hand, Unit Price.
Relational Tables Created: CREATE TABLE ORDER (OrderNo CHAR(10) NOT NULL, OrderDate DATE, CustomerID CHAR(10), SalePersonID CHAR(10));
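The slide shows only the ORDER table. A possible completion for all three Entity Types is sketched below with Python's built-in sqlite3 module. The PRODUCT and ORDER_LINE definitions, the data types beyond those on the slide, the primary and foreign key clauses, and the sample rows are assumptions. ORDER is a reserved word in SQL, so the table name is quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE "ORDER" (
        OrderNo CHAR(10) NOT NULL PRIMARY KEY,
        OrderDate DATE, CustomerID CHAR(10), SalePersonID CHAR(10));
    CREATE TABLE PRODUCT (
        ProductNo CHAR(10) NOT NULL PRIMARY KEY,
        Description VARCHAR(50), QtyOnHand INTEGER, UnitPrice NUMERIC);
    -- Associative table: its key is Order# + Product#, both migrated as foreign keys.
    CREATE TABLE ORDER_LINE (
        OrderNo   CHAR(10) NOT NULL REFERENCES "ORDER"(OrderNo),
        ProductNo CHAR(10) NOT NULL REFERENCES PRODUCT(ProductNo),
        Quantity INTEGER, UnitPrice NUMERIC,
        PRIMARY KEY (OrderNo, ProductNo));
""")

conn.execute("""INSERT INTO "ORDER" VALUES ('O1', '2024-01-15', 'C1', 'SP1')""")
conn.execute("INSERT INTO PRODUCT VALUES ('P1', 'Widget', 100, 2.50)")
conn.execute("INSERT INTO ORDER_LINE VALUES ('O1', 'P1', 4, 2.50)")

# An order line that points at a non-existent product is rejected.
try:
    conn.execute("INSERT INTO ORDER_LINE VALUES ('O1', 'P9', 1, 1.00)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

line_count = conn.execute("SELECT COUNT(*) FROM ORDER_LINE").fetchone()[0]
print(orphan_rejected, line_count)
```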

92 Transformation of Data Models to Relational Database Tables
All or part of a data (entity-relationship) model can be translated into a normalized database design. Objects created: at most one relational database; one or more relations (tables); data structures (DDL) representing the elements (attributes) and the primary key of each relation; the data type of each data element.

93 Heuristics of Transformation
A table is created for each Entity Type in the ER diagram. A table is created for each multi-valued attribute. Relationship Types are implemented as tables or as foreign keys in other tables. Many-to-many Relationship Types are translated into tables. Foreign keys are used for implementing one-to-one and one-to-many Relationship Types. For one-to-many Relationship Types, the foreign key is placed in the table that represents the Entity Type on the "many" end of the Relationship Type. For identifying one-to-many Relationship Types, the PK of the "one" table migrates to the "many" table as a FK, and the FK is also part of the PK of the "many" table. For non-identifying one-to-many Relationship Types, the PK of the "one" table migrates to the "many" table as a FK, and the FK is a non-key attribute of the "many" table.
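The difference between identifying and non-identifying one-to-many Relationship Types can be sketched with Python's built-in sqlite3 module. The Entity Type names are borrowed from the deck's opening diagram, but the specific relationships (Division to Employee as non-identifying, Project to Task as identifying), the column names, and the helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Division (DivisionNo INTEGER PRIMARY KEY, DivisionName TEXT);
    -- Non-identifying: the FK DivisionNo is a non-key attribute of Employee.
    CREATE TABLE Employee (
        EmployeeNo INTEGER PRIMARY KEY,
        LastName   TEXT,
        DivisionNo INTEGER REFERENCES Division(DivisionNo));
    CREATE TABLE Project (ProjectNo INTEGER PRIMARY KEY, ProjectName TEXT);
    -- Identifying: the FK ProjectNo is also part of the PK of Task.
    CREATE TABLE Task (
        ProjectNo INTEGER NOT NULL REFERENCES Project(ProjectNo),
        TaskName  TEXT NOT NULL,
        TaskCost  REAL,
        PRIMARY KEY (ProjectNo, TaskName));
""")

def pk_columns(table):
    """Names of the primary key columns (field 5 of table_info is the PK position)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})") if row[5] > 0]

def fk_columns(table):
    """Names of the foreign key columns (field 3 of foreign_key_list is the 'from' column)."""
    return [row[3] for row in conn.execute(f"PRAGMA foreign_key_list({table})")]

def is_identifying(table):
    """True when every foreign key column is also part of the primary key."""
    return all(col in pk_columns(table) for col in fk_columns(table))

print(is_identifying("Employee"), is_identifying("Task"))
```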

94 Auction Web Site's Data Model

95 A Data Model for an Electronic Commerce Application
Join conditions: dept_id = parent_id; sku = sku; pfid = pfid; shopper_id = shopper_id; order_id = order_id; dept_id = dept_id
basket: shopper_id char(32), date_changed datetime, marshalled_order image
dept: dept_id int, parent_id, name varchar(255), description text
product_attribute: pfid varchar(30), attribute_id tinyint, attribute_index, attribute_value varchar(20)
product_family: manufacturer_id, short_description, long_description, image_filename, intro_date, list_price, monogramable
product_variant: sku, attribute0, attribute1, attribute2, attribute3, attribute4
promo_cross: related_pfid
promo_price: promo_name, promo_type, promo_description, promo_rank, active, date_start, date_end, shopper_all, shopper_column varchar(64), shopper_op varchar(2), shopper_value, cond_all, cond_column, cond_op, cond_value, cond_basis char(1), cond_min, award_all, award_column, award_op, award_value, award_max, disjoint_cond_award, disc_type, disc_value real
promo_upsell: (column name missing) varchar(30), related_pfid varchar(30), description varchar(255)
receipt_item: pfid varchar(30), sku int, order_id char(26), row_id int, quantity int, adjusted_price int
shopper: shopper_id char(32), created datetime, name varchar(235), password varchar(20), street varchar(50), city varchar(50), state varchar(30), zip varchar(15), country varchar(20), phone varchar(16), (column name missing) varchar(50)
receipt: order_id char(26), shopper_id char(32), total int, status tinyint, date_entered datetime, date_changed datetime, marshalled_receipt image

96 Attribute 0 of pfid 14 is size; attribute value 1 is Grande, 2 is Tall, and 3 is Short

97 Web-based Build-To-Order Application

98 Data Model for Build-To-Order Application

