Entity-Relationship Modelling, Database design Normalisation


1 Entity-Relationship Modelling, Database Design, Normalisation
DST Revision – Week 1

2 Assessed Components
Assignment – due next week
Exam – Jan 21st
Two hours
30 multiple choice questions (1 mark each)
Two long questions (35 marks each)

3 ER Diagrams
A tool for Conceptual Data Modelling

4 An Entity-Relationship Diagram

5 What’s wrong with this?

6 Discovering Data Entities
Never confuse data entities with other elements of the problem to be solved.
A true data entity will have many possible instances, each with a distinguishing characteristic.
The treasurer is the person entering data, and data about the treasurer has nothing to do with this problem.
Is the expense report entity necessary? No: it is only the result of extracting data from the database. Even though multiple expense reports will be given to the treasurer over time, the data needed to compute each report's contents are already represented by the ACCOUNT and EXPENSE entity types.
“Gives-to” and “Receives” are business activities, not relationships between entities.

7 The Correct E-R Model

8 Attributes and Weak Entities

9 Strong & Weak Entities
Most entities are classified as strong entity types [Rectangle]: ones that exist independently of other entity types (such as EMPLOYEE).
These always have a unique characteristic, an attribute or combination of attributes, that distinguishes each occurrence of that entity.
A weak entity type [Double Rectangle] depends on some other entity type. It has no meaning in the ER diagram without the entity on which it depends (such as DEPENDENT).
The entity type on which the weak entity type depends is called the identifying owner (or owner for short).

10 Weak entities
The identifying relationship is the relationship between a weak entity type and its owner (such as ‘Has’ in the previous slide).
The weak entity's identifier is its partial identifier (double underlined) combined with the identifier of its owner.
During a later design stage, Dependent_Name will be combined with Employee_ID (the identifier of the owner) to form a full identifier for DEPENDENT.
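
As a rough illustration of how a partial identifier combines with the owner's identifier, the sketch below uses Python's built-in sqlite3 module. The table and column names follow the slide; the sample rows and the ON DELETE CASCADE rule are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    employee_name TEXT NOT NULL
);
-- Weak entity: full identifier = owner's identifier + partial identifier
CREATE TABLE dependent (
    employee_id    INTEGER NOT NULL
                   REFERENCES employee(employee_id) ON DELETE CASCADE,
    dependent_name TEXT NOT NULL,
    PRIMARY KEY (employee_id, dependent_name)
);
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
-- The same dependent name is fine under different owners
INSERT INTO dependent VALUES (1, 'Sam'), (2, 'Sam');
""")

# The same name under the same owner violates the composite key
try:
    conn.execute("INSERT INTO dependent VALUES (1, 'Sam')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Removing the owner removes its dependents (assumed cascade rule)
conn.execute("DELETE FROM employee WHERE employee_id = 1")
remaining = conn.execute(
    "SELECT employee_id, dependent_name FROM dependent"
).fetchall()
```

After the owner with Employee_ID 1 is deleted, only the dependent of employee 2 is left, which mirrors the idea that a weak entity cannot exist without its owner.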

11 Attributes An attribute is a property or characteristic of an entity type, for example the entity EMPLOYEE may have attributes Employee_Name and Employee_Address. In ER diagrams (drawn in this way) place the attribute name in an ellipse with a line connecting it to its associated entity Attributes may also be associated with relationships An attribute is associated with exactly one entity or relationship

12 A Composite Attribute

13 Simple and Composite Attributes
Some attributes can be broken down into meaningful component parts, such as Address, which can be broken down into Street_Address, Town, Postcode, etc.
The component attributes may appear above or below the composite attribute on an ER diagram.
Composite attributes give users flexibility: you can refer to the attribute as a single unit or to its individual components.
A simple (atomic) attribute is one that cannot be broken down into smaller components.

14 An Entity with a Multivalued attribute (Skill) and a derived attribute (Years_Employed)

15 Multivalued Attributes
A multivalued attribute is one that may take more than one value for a given entity instance, e.g. an EMPLOYEE may have more than one Skill.
It is represented by an ellipse with double lines.

16 Derived Attributes Some attribute values can be calculated or derived from others e.g., if Years_Employed needs to be calculated for EMPLOYEE, it can be calculated using Date_Employed and Today's_Date A derived attribute is one whose value can be calculated from related attribute values (plus possibly other data not in the database) A derived attribute is signified by an ellipse with a dashed line (see previous Fig.)
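
A minimal sketch of the Years_Employed derivation, assuming whole years are wanted and that both dates are available as Python date objects. The function name and the sample dates are illustrative only.

```python
from datetime import date


def years_employed(date_employed: date, today: date) -> int:
    """Whole years between date_employed and today."""
    years = today.year - date_employed.year
    # Subtract one if this year's anniversary has not been reached yet
    if (date_employed.month, date_employed.day) > (today.month, today.day):
        years -= 1
    return years


day_before_anniversary = years_employed(date(2015, 6, 1), date(2024, 5, 31))
on_anniversary = years_employed(date(2015, 6, 1), date(2024, 6, 1))
```

Because the value is recomputed from Date_Employed each time, nothing has to be stored or kept up to date in the table.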

17 Simple Identifier attribute (Key)

18 Identifier Attribute
An identifier attribute (or key) is an attribute, or combination of attributes, that uniquely identifies individual instances of an entity type, such as Student_ID.
To be a candidate identifier, the attribute must have a single value for each entity instance and must be present for every instance.
The identifier attribute is underlined, such as Student_ID.

19 Composite Identifier
A composite identifier is used when no single (atomic) attribute can serve as an identifier.
For example, in a database that tracks flights, Flight_ID is a composite identifier with component attributes Flight_Number and Date; this combination is required to uniquely identify individual occurrences of FLIGHT.
Flight_ID is underlined, whilst its components are not (see next slide).

20 Composite key attribute

21 Criteria for selecting identifiers
Some entities have more than one candidate identifier, so the following criteria should be used:
Choose an identifier that will not change in value over the life of each instance of the entity type.
Choose an identifier that is guaranteed to have valid values and will not be null (or unknown). If it is composite, make sure all parts will have valid values.

22 Selecting identifiers
Don’t pick identifiers whose structure indicates classifications, locations or people that might change. e.g. the first two digits of an identifier may indicate a warehouse location, but such codes are often changed as conditions change, rendering them invalid. Consider substituting new, simple identifiers for long, composite ones, e.g. an attribute called Game_Number could be used for the entity type GAME instead of Home_Team and Away_Team

23 Relationships A relationship is an association among the instances of one or more entity types that is of interest to the organisation Relationship Type is a meaningful association between (or among) entities – implying that the relationship allows us to answer questions that could not be answered given only the entity types. It is denoted by a diamond symbol

24 Relationship types and instances
(a) Relationship type (Completes)

25 Attributes on relationships
Attributes may be associated with a relationship, as well as with an entity For example an organisation may want to record the date when an employee completes each course In the following diagram, the relationship ‘Completes’ joins the EMPLOYEE and COURSE entities, and Date_Completed is joined to this as it is a property of the relationship ‘Completes’

26 Attribute on a Relationship

27 Associative entities (gerunds)
The presence of one or more attributes on a relationship suggests that the relationship should be represented as an entity type An associative entity is an entity type that associates the instances of one or more entity types and contains attributes that are specific to the relationship between those entity instances. The associative entity type CERTIFICATE is represented with the diamond relationship symbol enclosed within the entity rectangle

28 Associative Entities The following figure shows the relationship ‘Completes’ converted to an associative entity type. A CERTIFICATE is awarded to each EMPLOYEE who completes a COURSE, each certificate has a Certificate_Number that serves as the identifier

29 (b) An associative entity (CERTIFICATE)

30 Associative Entities (gerunds)
The purpose of the special symbol is to preserve the information that the entity was initially specified as a relationship on the ER diagram There is no relationship diamond on the line between an associative entity and a strong entity, because the associative entity represents the relationship

31 Associative entities
How do you know when to convert a relationship to an associative entity type? Four conditions should exist:
All of the relationships for the participating entity types are ‘many’ relationships.
The resulting associative entity type has independent meaning to end-users, and can preferably be identified with a single-attribute identifier.
The associative entity has one or more attributes in addition to the identifier.
The associative entity participates in one or more relationships independent of the entities related in the associated relationship.
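
The CERTIFICATE example can be sketched as a table of its own, here with Python's sqlite3 module. Certificate_Number and Date_Completed come from the slides; the other column names, the course identifier and all sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, employee_name TEXT);
CREATE TABLE course   (course_id   INTEGER PRIMARY KEY, course_title  TEXT);
-- Associative entity: its own single-attribute identifier, a link to each
-- participating entity, and the attribute that belonged to 'Completes'
CREATE TABLE certificate (
    certificate_number INTEGER PRIMARY KEY,
    employee_id    INTEGER NOT NULL REFERENCES employee(employee_id),
    course_id      INTEGER NOT NULL REFERENCES course(course_id),
    date_completed TEXT NOT NULL
);
INSERT INTO employee VALUES (100, 'Ann'), (140, 'Bob');
INSERT INTO course   VALUES (1, 'SQL'), (2, 'Tax Acc');
INSERT INTO certificate VALUES (9001, 100, 1, '2023-06-19'),
                               (9002, 100, 2, '2023-10-07'),
                               (9003, 140, 2, '2023-12-08');
""")

completions = conn.execute("""
    SELECT e.employee_name, c.course_title, cert.date_completed
    FROM certificate AS cert
    JOIN employee AS e ON e.employee_id = cert.employee_id
    JOIN course   AS c ON c.course_id   = cert.course_id
    ORDER BY cert.certificate_number
""").fetchall()
```

Each certificate row links exactly one employee to exactly one course, while an employee or a course can appear in many certificate rows.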

32 Degree of a Relationship
The degree of a relationship is the number of entity types that participate in it. Thus ‘Completes’ has degree 2, since there are two participating entity types, EMPLOYEE and COURSE.
The three most common relationship degrees are unary (degree 1), binary (degree 2) and ternary (degree 3).
Higher degree relationships are possible but rarely encountered in practice.

33 Unary relationship
A unary relationship is between the instances of a single entity type (also called a recursive relationship).
‘Is_Married_To’ is a one-to-one relationship between instances of the PERSON entity type.
‘Manages’ is a one-to-many relationship between instances of the EMPLOYEE entity type.

34 Binary relationships
A binary relationship is between the instances of two entity types, and is the most common type of relationship encountered in data modelling.
e.g. (one-to-one) an EMPLOYEE is assigned one PARKING_PLACE, and each PARKING_PLACE is assigned to one EMPLOYEE
e.g. (one-to-many) a PRODUCT_LINE may contain many PRODUCTS, and each PRODUCT belongs to only one PRODUCT_LINE
e.g. (many-to-many) a STUDENT may register for more than one COURSE, and each COURSE may have many STUDENTS

35 Ternary relationships
A ternary relationship is a simultaneous relationship among the instances of three entity types Let’s see this in an E-R diagram

36 Ternary Relationships

37 Ternary Relationships
Vendors can supply various parts to warehouses.
The relationship ‘Supplies’ is used to record the specific PARTs supplied by a given VENDOR to a particular WAREHOUSE.
There are two attributes on the relationship ‘Supplies’: Shipping_Mode and Unit_Cost.
e.g. one instance of ‘Supplies’ might record that VENDOR X can ship PART C to WAREHOUSE Y, that the Shipping_Mode is ‘next_day_air’ and the Unit_Cost is £5.00 per unit.

38 Ternary relationships
We do not use diamond symbols on the lines from SUPPLY_SCHEDULE to the three entities, because these lines do not represent binary relationships It is recommended that all ternary (or higher) relationships are converted into associative entities (as in the slide), as it makes the representation of participation constraints (discussed later) easier Many CASE tools cannot represent ternary relationships, so you must represent the ternary relationship with an associative entity and three binary relationships

39 Cardinality Constraints
A cardinality constraint is the number of instances of one entity that can or must be associated with each instance of another entity.
If we have two entity types A and B, the cardinality constraint specifies the number of instances of entity B that can (or must) be associated with each instance of entity A.
e.g. a video store may stock more than one VIDEOTAPE for each MOVIE; this is a ‘one-to-many’ relationship.

40 Cardinality Constraints
(a) Basic relationship

41 (b) Relationship with cardinality constraints

42 Minimum Cardinality The minimum cardinality of a relationship is the minimum number of instances of an entity B that may be associated with each instance of an entity A In our example, the minimum number of VIDEOTAPES of a MOVIE is zero (entity B is an optional participant in the ‘Is_Stocked_As’ relationship) This is signified by the symbol zero through the arrow near the VIDEOTAPE entity.

43 (b) Relationship with cardinality constraints

44 Maximum cardinality
The maximum cardinality is the maximum number of instances of an entity B that may be associated with each instance of entity A.
In the following slide the maximum cardinality for the VIDEOTAPE entity type is ‘many’ (an unspecified number greater than 1).
This is indicated by the ‘crow’s foot’ symbol on the arrow next to the VIDEOTAPE entity symbol.

45 (b) Relationship with cardinality constraints

46 Mandatory one cardinality
Relationships are bi-directional, so there is also cardinality notation next to the MOVIE entity.
Notice that the minimum and maximum are both one; this is called mandatory one cardinality (i.e., each VIDEOTAPE must be a copy of exactly one MOVIE).
VIDEOTAPE is represented as a weak entity because it cannot exist unless its owner MOVIE also exists.

47 Mandatory one cardinality
The identifier of MOVIE is ‘Movie_Name’.
VIDEOTAPE does not have a unique identifier; however, the partial identifier Copy_Number together with Movie_Name would uniquely identify an instance of VIDEOTAPE.

48 Example of mandatory cardinality constraints
Each PATIENT has one or more PATIENT_HISTORIES (the initial patient visit is always recorded as an instance of PATIENT_HISTORY) Each instance of PATIENT_HISTORY ‘Belongs to’ exactly one PATIENT (see following Fig.)

49 Mandatory cardinalities

50 Example of one optional, one mandatory cardinality constraint
EMPLOYEE Is_Assigned_To PROJECT
Each PROJECT has at least one EMPLOYEE assigned to it (some projects have more than one).
Each EMPLOYEE may be assigned to no PROJECT at all (participation is optional), or to one or more PROJECTs (see following Fig.).

51 One optional, one mandatory cardinality

52 An example using a ternary relationship
PART and WAREHOUSE are mandatory participants in the relationship, whilst VENDOR is an optional participant The cardinality of each of the participating entities is mandatory one, since each SUPPLY_SCHEDULE instance must be related to exactly one instance of each of these participating entity types

53 An example using a ternary relationship
Each VENDOR can supply many PARTs to any number of WAREHOUSES, but need not supply any parts Each PART can be supplied by any number of VENDORs to more than one WAREHOUSE, but each part must be supplied by at least one vendor to a warehouse Each WAREHOUSE can be supplied with any number of PARTS from more than one VENDOR, but each warehouse must be supplied with at least one part

54 Cardinality constraints in a ternary relationship

55 An example using a ternary relationship
A ternary relationship is not equivalent to three binary relationships.
Unfortunately you cannot draw ternary relationships with many CASE tools.
Instead you must represent the ternary relationship as an associative entity with three binary relationships.
If you are forced to do this, do not draw the binary relationships with diamonds, and make sure the cardinality next to each of the three strong entities is mandatory one.
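
To see why three independent binary relationships lose information, the sketch below projects a small ‘Supplies’ relationship onto its three pairs and then recombines them. The vendor, part and warehouse labels are made-up sample data.

```python
# Each tuple is one 'Supplies' instance: (vendor, part, warehouse)
supplies = {
    ("V1", "P1", "W1"),
    ("V1", "P2", "W2"),
    ("V2", "P1", "W2"),
}

# Three separate binary relationships derived from the same facts
vendor_part = {(v, p) for v, p, w in supplies}
part_warehouse = {(p, w) for v, p, w in supplies}
vendor_warehouse = {(v, w) for v, p, w in supplies}

# Recombine: keep every (v, p, w) that all three binaries allow
rejoined = {
    (v, p, w)
    for v, p in vendor_part
    for p2, w in part_warehouse
    if p == p2 and (v, w) in vendor_warehouse
}

# Combinations the binaries allow but the ternary never recorded
spurious = rejoined - supplies
```

Here the binaries suggest that V1 ships P1 to W2, which was never stated. An associative entity with one row per vendor, part and warehouse combination keeps the three facts tied together.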

56 Multiple Relationships
An organisation may want to model more than one relationship between the same entity types The following figure shows two relationships between PROFESSOR and COURSE The relationship Is_Qualified associates professors with the courses they are qualified to teach A given course may have more than one person qualified to teach it, or (optionally) may not have any qualified instructors (such as a new course) Each professor should be qualified to teach at least one course (we hope!)

57 Multiple Relationships
The second relationship in this figure associates professors with the courses they actually teach during a given semester (where the maximum cardinality for a given semester is 4).
This shows how a fixed constraint (upper or lower) can be recorded.
The attribute ‘Semester’ (which could be a composite attribute with components ‘Semester_Name’ and ‘Year’) is on the relationship Is_Scheduled.

58 (b) Professors and courses (fixed upon constraint)

59 Review of Basic E-R Notation

60

61 Building Relational Databases

62 Database Design Steps
1) Determine the purpose of the database
What is it going to do?
What data do you need to collect?
What information do you want to be able to extract?
2) What tables and fields do you need?
What fields belong in each table?
What properties does each field need?
How are the tables going to be related?
What are the Primary Keys that link the tables?
3) Build the tables
Enter test data
Test/review
Revise

63 Basic Design Rules of Relational Databases
Unique Field Names: keep field names unique across tables, and keep them as clear as possible in each table.
No Calculated or Derived Fields: calculations and derivations can be performed in Queries, Forms and Reports. Doing them in a table only increases the chance of data entry error.

64 Basic Design Rules of Relational Databases
Data is broken down into Smallest Logical Parts: the smallest “sortable” parts. Remember it’s much easier to put fields together than pull them apart.
Unique Records: each of your tables should have unique records. We ensure this by setting one field to be a Primary Key. This can be a unique datum or an AutoNumber.

65 One to Many Relationships
One to Many relationships are the most common relationships.
One Birdfeeder is visited by Many Birds
One Garden contains Many Birdfeeders
One Patient has Many Prescriptions
One Hospital has Many Patients
One Student attends Many Classes
A record MUST be in the One table in order to appear in the Many table.

66 One to Many Relationships
Primary Key linked to Foreign Key
[Figure: Patients table (Medical Record #) on the one side, Medications table (Prescription Number) on the many side]

67 One to One Relationships
One to One relationships can often be combined into a single table. One Garden has One Address One Patient has One Home Phone Number One Student has One Student ID

68 Many to Many Relationships
Many to Many relationships are also very common.
Many Students are taught by Many Teachers
Many Patients see Many Doctors
Many Medications are taken by Many Patients
Many Customers buy Many Products
You cannot represent these directly in an RDBMS! Each one must be split into two One to Many relationships through a linking table.

69 Many to Many Relationship
Sales Database
CUSTOMERS: Customer ID, First, Last, Address, City, State, Zip
PRODUCTS: Product ID, Product, Supplier, Description, Units, Cost, Price

70 Sales Database
CUSTOMERS: Customer ID, First, Last, Address, City, State, Zip
PRODUCTS: Product ID, Product, Supplier, Description, Units, Cost, Price
SALES: Sales ID, Customer, Product, Date, Quantity
One Customer can have many sales (1 to many)
One Product can be sold many times (1 to many)
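
A cut-down version of the Sales Database, sketched with Python's sqlite3 module, shows how the SALES table lets the many-to-many link be followed in both directions. Only a few of the slide's fields are kept, and the rows and helper function names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, last    TEXT);
CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, product TEXT);
-- Linking table: each sale points at one customer and one product
CREATE TABLE sales (
    sales_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    sale_date   TEXT,
    quantity    INTEGER
);
INSERT INTO customers VALUES (1, 'Jones'), (2, 'Patel');
INSERT INTO products  VALUES (10, 'Feeder'), (11, 'Seed');
INSERT INTO sales VALUES (100, 1, 10, '2024-01-05', 1),
                         (101, 1, 11, '2024-01-05', 3),
                         (102, 2, 11, '2024-01-06', 2);
""")


def products_bought_by(customer_id):
    """Product names bought by one customer."""
    rows = conn.execute("""
        SELECT p.product FROM sales AS s
        JOIN products AS p ON p.product_id = s.product_id
        WHERE s.customer_id = ? ORDER BY p.product""", (customer_id,))
    return [row[0] for row in rows]


def customers_who_bought(product_id):
    """Customer surnames that bought one product."""
    rows = conn.execute("""
        SELECT c.last FROM sales AS s
        JOIN customers AS c ON c.customer_id = s.customer_id
        WHERE s.product_id = ? ORDER BY c.last""", (product_id,))
    return [row[0] for row in rows]
```

For instance, products_bought_by(1) follows the link from customer to products, and customers_who_bought(11) follows it the other way.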

71 Examples
Patients: Patient ID, First, Last, Address, City, State, Zip
Medications: Med ID, Medication, Description
Patient Meds: PM ID, Patient ID, Med ID, Dosage, Directions
One Patient can take many Medications
One Kind of Medication can be taken by Many Patients

72

73 Transforming E-R diagrams into Relational Databases

74 Relational Database A database modelled using:
Relations (properly formed tables) Relationships between the Relations

75 Relation
Definition: A relation is a named, two-dimensional table of data.
A table consists of rows (records) and columns (attributes or fields).
Requirements for a table to qualify as a relation:
It must have a unique name.
Every attribute value must be atomic (not multivalued, not composite).
Every row must be unique (can’t have two rows with exactly the same values for all their fields).
Attributes (columns) in tables must have unique names.
The order of the columns must be irrelevant.
The order of the rows must be irrelevant.
NOTE: all relations are in 1st Normal form
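
Two of these requirements, atomic values and unique rows, are easy to check mechanically. The sketch below assumes a table is held as a list of Python dictionaries and treats any list, tuple, set or dict value as non-atomic; the function name and sample rows are illustrative.

```python
def is_relation(rows):
    """Check atomic attribute values and row uniqueness for a list of dict rows."""
    seen = set()
    for row in rows:
        # A collection as a value means a multivalued or composite attribute
        if any(isinstance(v, (list, tuple, set, dict)) for v in row.values()):
            return False
        # Sort by column name so column order is irrelevant
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            return False  # two rows with exactly the same values
        seen.add(fingerprint)
    return True


atomic_rows = [
    {"Emp_ID": 100, "Skill": "SQL"},
    {"Emp_ID": 100, "Skill": "Java"},
]
multivalued_rows = [{"Emp_ID": 100, "Skill": ["SQL", "Java"]}]
duplicate_rows = [{"Emp_ID": 100, "Skill": "SQL"}, {"Skill": "SQL", "Emp_ID": 100}]
```

The other requirements (unique table and column names, irrelevant ordering) are properties of how the table is declared and used rather than of its data, so they are not checked here.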

76 Correspondence with E-R Model
Relations (tables) correspond with entity types and with many-to-many relationship types Rows correspond with entity instances and with many-to-many relationship instances Columns correspond with attributes

77 Key Fields Keys are special fields that serve two main purposes:
Primary keys are unique identifiers of the rows of the relation in question. Examples include employee numbers, social security numbers, etc. This is how we can guarantee that all rows are unique.
Foreign keys are identifiers that enable a dependent relation (on the many side of a relationship) to refer to its parent relation (on the one side of the relationship).
Keys can be simple (a single field) or composite (more than one field).
Keys are usually used as indexes to speed up the response to user queries.

78 Figure annotations: primary and foreign keys
Primary Key
Foreign Key (implements 1:N relationship between customer and order)
Combined, these are a composite primary key (uniquely identifies the order line); individually they are foreign keys (implement M:N relationship between order and product)

79 Constraints
Constraints reduce the chance that users will enter incorrect data.
Domain Constraints: the allowable values for an attribute (types, lengths etc).
Entity Integrity: no primary key attribute may be null. All primary key fields MUST have data.
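
A small sketch of both kinds of constraint using Python's sqlite3 module. The PRODUCT table, its allowed finishes and the sample rows are assumptions; note that SQLite needs an explicit NOT NULL on a text primary key to reject nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE product (
    product_id TEXT NOT NULL PRIMARY KEY,                       -- no null keys
    finish     TEXT CHECK (finish IN ('Oak', 'Cherry', 'Walnut')),  -- domain
    unit_price REAL CHECK (unit_price >= 0)                     -- domain
)""")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO product VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(("P1", "Oak", 200.0)),     # valid row
    try_insert(("P2", "Plastic", 50.0)),  # finish outside its domain
    try_insert(("P3", "Oak", -1.0)),      # price outside its domain
    try_insert((None, "Oak", 10.0)),      # null primary key
]
```

Only the first row is stored; the database rejects the others before they can become bad data.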

80 Domain definitions enforce domain integrity constraints

81 Integrity Constraints
Referential Integrity: a rule that states that any foreign key value (in the relation on the many side) MUST match a primary key value in the relation on the one side (or the foreign key can be null).
For example, Delete Rules:
Restrict – don’t allow delete of “parent” side if related rows exist in “dependent” side
Cascade – automatically delete “dependent” side rows that correspond with the “parent” side row to be deleted
Set-to-Null – set the foreign key in the dependent side to null if deleting from the parent side (not allowed for weak entities)
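
The three delete rules can be compared side by side with Python's sqlite3 module. The sketch assumes one parent table and three hypothetical order tables, one per rule, so that each behaviour is visible in isolation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce the rules
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
CREATE TABLE order_restrict (order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id) ON DELETE RESTRICT);
CREATE TABLE order_cascade (order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id) ON DELETE CASCADE);
CREATE TABLE order_setnull (order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id) ON DELETE SET NULL);
INSERT INTO customer VALUES (1), (2), (3);
INSERT INTO order_restrict VALUES (10, 1);
INSERT INTO order_cascade  VALUES (20, 2);
INSERT INTO order_setnull  VALUES (30, 3);
""")

# Restrict: the parent row cannot be deleted while an order refers to it
try:
    conn.execute("DELETE FROM customer WHERE customer_id = 1")
    restrict_blocked = False
except sqlite3.IntegrityError:
    restrict_blocked = True

# Cascade: deleting the parent also deletes the dependent order
conn.execute("DELETE FROM customer WHERE customer_id = 2")
cascade_rows = conn.execute("SELECT * FROM order_cascade").fetchall()

# Set-to-Null: the order survives but loses its reference
conn.execute("DELETE FROM customer WHERE customer_id = 3")
setnull_rows = conn.execute("SELECT * FROM order_setnull").fetchall()

customers_left = conn.execute("SELECT customer_id FROM customer").fetchall()
```

Customer 1 is still present at the end because the restrict rule blocked its deletion.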

82 Figure 5-5: Referential integrity constraints (Pine Valley Furniture) Referential integrity constraints are drawn via arrows from dependent to parent table

83 Referential integrity constraints are implemented with foreign key to primary key references

84 Transforming E-R Diagrams into Relations
Mapping Regular Entities to Relations
Simple attributes: E-R attributes map directly onto the relation
Composite attributes: use only their simple, component attributes
Multivalued attributes: each becomes a separate relation with a foreign key taken from the superior entity
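
The multivalued rule can be sketched in plain Python, assuming each entity instance is a dictionary whose multivalued attribute is held as a list. The helper name and the employee data are illustrative.

```python
def split_multivalued(rows, key, attr):
    """Move a list-valued attribute out of `rows` into its own relation."""
    # Parent relation: every column except the multivalued one
    parent = [{k: v for k, v in row.items() if k != attr} for row in rows]
    # Child relation: one row per value; `key` is the foreign key to the parent
    child = [{key: row[key], attr: value} for row in rows for value in row[attr]]
    return parent, child


employees = [
    {"Employee_ID": 100, "Employee_Name": "Ann", "Skill": ["SQL", "Java"]},
    {"Employee_ID": 140, "Employee_Name": "Bob", "Skill": ["Tax Acc"]},
]
employee_relation, skill_relation = split_multivalued(
    employees, "Employee_ID", "Skill"
)
```

The new skill relation would take the combination of Employee_ID and Skill as its primary key, since neither column is unique on its own.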

85 Mapping a regular entity
(a) CUSTOMER entity type with simple attributes (b) CUSTOMER relation

86 Figure 5-9: Mapping a composite attribute
(a) CUSTOMER entity type with composite attribute (b) CUSTOMER relation with address detail

87 Figure 5-10: Mapping a multivalued attribute
Multivalued attribute becomes a separate relation (Table) with foreign key (b) 1–to–many relationship between original entity and new relation

88 Transforming ER Diagrams into Relations (cont.)
Mapping Binary Relationships
One-to-Many: primary key on the one side becomes a foreign key on the many side
Many-to-Many: create a new relation with the primary keys of the two entities as its primary key
One-to-One: primary key on the mandatory side becomes a foreign key on the optional side

89 Figure 5-12a: Example of mapping a 1:M relationship
Relationship between customers and orders Note the mandatory one

90 Figure 5-12b Mapping the relationship
Again, no null value in the foreign key…this is because of the mandatory minimum cardinality Foreign key

91 Figure 5-13a: Example of mapping an M:N relationship
E-R diagram (M:N) The Supplies relationship will need to become a separate relation

92 Figure 5-13b: Three resulting relations
[Figure labels: composite primary key, new intersection relation, foreign key]

93 Figure 5-14a: Mapping a binary 1:1 relationship
In_charge relationship

94 Figure 5-14b Resulting relations

95 Transforming ER Diagrams into Relations (cont.)
Mapping Associative Entities
Identifier Not Assigned: the default primary key for the association relation is composed of the primary keys of the two entities (as in an M:N relationship)
Identifier Assigned: used when it is natural and familiar to end-users, or when the default identifier may not be unique

96

97

98 Figure 5-16a: Mapping an associative entity with an identifier

99 Figure 5-16b Three resulting relations

100 Transforming ER Diagrams into Relations (cont.)
Mapping Unary Relationships
One-to-Many: recursive foreign key in the same relation
Many-to-Many: two relations, one for the entity type and one for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity
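
For the one-to-many case, a recursive foreign key might look like the following sqlite3 sketch of the ‘Manages’ relationship. The manager_id column name and the three employees are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    employee_name TEXT NOT NULL,
    -- Recursive foreign key: points back at the same relation
    manager_id    INTEGER REFERENCES employee(employee_id)
);
INSERT INTO employee VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cy', 1);
""")

# Join the relation to itself to pair each employee with their manager
reports = conn.execute("""
    SELECT worker.employee_name, boss.employee_name
    FROM employee AS worker
    JOIN employee AS boss ON boss.employee_id = worker.manager_id
    ORDER BY worker.employee_id
""").fetchall()
```

The top-level manager has a null manager_id, so she does not appear as a worker in the joined result.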

101 Figure 5-17: Mapping a unary 1:N relationship
(a) EMPLOYEE entity with Manages relationship (b) EMPLOYEE relation with recursive foreign key

102 Figure 5-18: Mapping a unary M:N relationship
(a) Bill-of-materials relationships (M:N) (b) ITEM and COMPONENT relations

103

104 Normalisation

105 Data Normalization Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data The process of decomposing relations with anomalies to produce smaller, well-structured relations

106 Well-Structured Relations
A well-structured relation contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies.
The goal is to avoid anomalies:
Insertion Anomaly – adding new rows forces user to create duplicate data
Deletion Anomaly – deleting rows may cause a loss of data that would be needed for other future rows
Modification Anomaly – changing data in a row forces changes to other rows because of duplication
A table should not contain more than one entity type.

107 Example – Figure 5.2b
Question – Is this a relation?
Answer – Yes: unique rows and no multivalued attributes
Question – What’s the primary key?
Answer – Composite: Emp_ID, Course_Title

108 Anomalies in this Table
Insertion – can’t enter a new employee without having the employee take a course
Deletion – if we remove employee 140, we lose information about the existence of a Tax Acc class
Modification – giving a salary increase to employee 100 forces us to update multiple records
Why do these anomalies exist? Because two entity types are combined into a single relation. This results in duplication, and an unnecessary dependency between the entities.

109 Functional Dependencies and Keys
Candidate Key: A unique identifier. One of the candidate keys will become the primary key E.g. perhaps there is both credit card number and NI number in a table…in this case both are candidate keys Each non-key field should be functionally dependent on the primary key

110 Steps in normalization

111 First Normal Form
No multivalued attributes
Every attribute value is atomic
The table on the next slide is not in 1st Normal Form (it has multivalued attributes), therefore it is not a relation
All relations are in 1st Normal Form

112 Table with multivalued attributes, not in 1st normal form
NOT a relation – just a table!

113 Table with no multivalued attributes and unique rows, in 1st normal form
This is a relation, but not a well-structured one

114 Anomalies in this Table
Insertion – if a new product is ordered for order 1007 of an existing customer, customer data must be re-entered, causing duplication
Deletion – if we delete the Dining Table from Order 1006, we lose information concerning this item's finish and price
Update – changing the price of product ID 4 requires updates in several records
Why do these anomalies exist? Because multiple entity types are combined into a single relation. This results in duplication, and an unnecessary dependency between the entities.

115 Second Normal Form
1NF PLUS every non-key attribute is functionally dependent on the ENTIRE primary key
Every non-key attribute must be defined by the entire key, not by only part of the key

116 Functional dependencies
Order_ID → Order_Date, Customer_ID, Customer_Name, Customer_Address
Customer_ID → Customer_Name, Customer_Address
Product_ID → Product_Description, Product_Finish, Unit_Price
Order_ID, Product_ID → Order_Quantity
Therefore, NOT in 2nd Normal Form
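
The partial dependencies in this slide can be found mechanically. The sketch below encodes the four functional dependencies as Python tuples and checks which non-key attributes are already determined by a proper subset of the composite key (Order_ID, Product_ID). The function names are illustrative, and the attribute-closure loop is a standard textbook technique rather than anything prescribed by the slides.

```python
from itertools import combinations

# (determinant, dependants) pairs taken from the slide
fds = [
    (("Order_ID",), ("Order_Date", "Customer_ID", "Customer_Name", "Customer_Address")),
    (("Customer_ID",), ("Customer_Name", "Customer_Address")),
    (("Product_ID",), ("Product_Description", "Product_Finish", "Unit_Price")),
    (("Order_ID", "Product_ID"), ("Order_Quantity",)),
]
key = ("Order_ID", "Product_ID")


def closure(attrs, fds):
    """All attributes determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, fds):
    """Non-key attributes determined by only part of a composite key."""
    found = {}
    for size in range(1, len(key)):
        for part in combinations(key, size):
            determined = closure(part, fds) - set(key)
            if determined:
                found[part] = determined
    return found


partials = partial_dependencies(key, fds)
```

Any non-empty result means the relation is not in 2nd Normal Form; here both Order_ID and Product_ID determine non-key attributes on their own.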

117 Getting it into Second Normal Form
Partial Dependencies are removed, but there are still transitive dependencies

118 Third Normal Form
2NF PLUS no transitive dependencies (functional dependencies on non-primary-key attributes)
Note: this is called transitive because the primary key is a determinant for another attribute, which in turn is a determinant for a third
Solution: a non-key determinant with transitive dependencies goes into a new table; the non-key determinant becomes the primary key in the new table and stays as a foreign key in the old table
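
The solution step can be sketched as a small Python function. It assumes the relation is already in 2NF with a single candidate key, and uses an ORDER relation built from the Order_ID and Customer_ID dependencies on slide 116; the function name and the naming of the new relation after its determinant are choices made for the example.

```python
def move_transitive_dependencies(name, attributes, key, fds):
    """Split out attributes that depend on a non-key determinant."""
    relations = {name: list(attributes)}
    for determinant, dependants in fds:
        if set(determinant) == set(key):
            continue  # depending on the primary key itself is fine
        # New relation: the determinant becomes its primary key
        relations["_".join(determinant)] = list(determinant) + list(dependants)
        # Old relation keeps the determinant (as a foreign key) but not its dependants
        relations[name] = [a for a in relations[name] if a not in dependants]
    return relations


order_attributes = [
    "Order_ID", "Order_Date", "Customer_ID", "Customer_Name", "Customer_Address",
]
order_fds = [
    (("Order_ID",), ("Order_Date", "Customer_ID", "Customer_Name", "Customer_Address")),
    (("Customer_ID",), ("Customer_Name", "Customer_Address")),
]
result = move_transitive_dependencies(
    "ORDER", order_attributes, ("Order_ID",), order_fds
)
```

The customer details move to a relation keyed on Customer_ID, and ORDER keeps only Customer_ID as a foreign key.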

119 Getting it into Third Normal Form
Transitive dependencies are removed

120 Make sure you know how to…
Draw Entity-Relationship diagrams from a written specification.
Design a database schema using an E-R diagram.
Normalise a database schema to the Third Normal Form.

121 Make sure you understand and can discuss (1)...
Entities – Strong, Weak, Associative
Relationships
First, second and third degree (unary, binary, ternary)
One to one, one to many and many to many relationships
Primary and foreign keys
Attributes
Simple (atomic) and Composite
Derived
Multivalued
Identifier attributes (keys) and how to choose them

122 Make sure you understand and can discuss (2)...
Integrity constraints – Domain and referential
Anomalies – Insertion, deletion and modification
Normalisation – 1st, 2nd and 3rd forms

123 Next week Revise SQL

