Maturity DB Process Design Stage Review Logical Design Physical Design DDL Script Review Coding Unit Test Integration Test Evaluation Stress Test Production
Design decides system quality: Design Stage -> Coding Stage -> Testing Stage -> Production
Design Stage: Logical Design, Physical Design, Maintenance Plan
I Logical Design
Data Model What is a logical data model? What is the purpose of data modeling? How to design logical data model?
What is a Data Model? A model is an abstract representation of some real thing. Data modeling is the action of exploring data-oriented structures. A logical data model is a graphical representation of the information requirements of a business area; it is not a database.
Data Model Concepts: conceptual data models, logical data models (LDMs), physical data models (PDMs).
What is the difference between a logical data model and a physical database design?
THE LOGICAL MODEL | THE PHYSICAL DATABASE DESIGN
Includes all entities, relationships, and attributes (and their information types), whether supported by a technology or not. | Includes tables, columns, keys, datatypes, validation rules, DB triggers, stored procedures, domains, and access constraints (security).
Uses business names. | Names may be limited by the DBMS.
Captures and records information necessary for the business. | Includes technology-specific data elements such as flags, switches, and timestamps.
Includes unique identifiers. | Includes primary keys, foreign keys, and indices for fast data access.
Is normalized to at least 3rd normal form. | May be de-normalized to meet performance requirements.
Does not include any redundant data. | May include redundant data elements.
Does not include any derived data. | May include results of complex or difficult to recreate calculations.
Business experts drive the model. | Designers drive the model.
A simple logical data model.
A simple physical data model
Logical Data Model Format: A logical data model is drawn in a format known as an “Entity Relationship Diagram” (ERD). The most popular data modeling tools are Erwin, ER Studio and Power Designer.
Advantages to Using a Model: Easier to understand the model at a glance. No need to trace through narrative descriptions of relationships. Communicates one clear definition. Understood by business and technical staff.
Benefits of a Logical Data Model: Using a logical data model speeds maintenance and eases the transition to new technologies. Captures business requirements (ensures understanding). Ability to share data across the enterprise, resulting in accurate data, consistent data, and reduced costs. Easier to implement changes in your business. Business requirements can be satisfied in database design.
Who uses the logical data model? The Business Area Experts own the logical data model. They describe their data requirements to the data modeler and review the models created. They use the models for impact analysis of changes to business requirements. The Data Modeler conducts facilitated sessions with business area experts to gather the data requirements and build the logical data model. The data modeler also works with the process analyst to link data with processes. The data modeler is responsible for getting approval of the logical data model from the business area experts and then works with the DBA to transition the logical model to the physical model. The DBA (Designer) builds the physical data model from the logical data model. To create a good quality database design, the DBA reviews the logical model to select technology appropriate keys, create indexes, detail data types, and build referential integrity to protect the data values. The database administrator may de-normalize the database for efficiency. DBAs also are responsible for creating db schemas, maintaining referential integrity, and monitoring database performance.
Actions in Data Modeling. Identify – Determine which things are represented in the model. Name – Each thing represented in the model needs to have a unique and meaningful name. Describe – A name is important, but not sufficient. The description should be no more than three sentences, each with subject, object, and verb. It must answer: What is it? What is it not? Sometimes: What are some examples? Associate – Much of the meaning is in the associations among the things represented in the model.
How to Model Data: Identify entity types. Identify attributes. Assign keys. Identify inversion entries. Identify relationships. Normalize to reduce data redundancy.
What is an Entity? Entity: a person, place, thing, concept or event that the business wants to store information about. Example entity MOVIE: a movie is an entertainment, documentary, or educational event which has been recorded in a moving picture format.
Entity and Instance: Each entity is made up of a group of objects, which are called instances. Each instance can be distinguished from the other instances.
ENTITY Examples. Categories: People, Place, Things, Event, Concept. Entities: EMPLOYEE, STUDENT, OFFICE, AUTOMOBILE, CHEMICAL, FUNDS TRANSFER, TENNIS TOURNAMENT, COUNTRY, DEPARTMENT, ORDER. Instances: Mr.Koch, Ms.Chou, HongKong, R.O.C, BMW 525i, Ammonia, 42233, U.S. OPEN, L789, I12345.
What is an Attribute? Attribute: a fact or characteristic of an entity with only one meaning (atomic). Each entity type has one or more data attributes. Example entity EMPLOYEE: Employee Id, Employee Last Name, Employee First Name, Employee Address, Employee Phone Number.
Two Kinds of Attributes: key attributes and non-key attributes. Example entity CONSULTANT: Consultant Id (key attribute); Consultant Last Name, Consultant First Name, Consultant Specialization, Consultant Hourly Rate (non-key attributes).
Candidate Keys: A single attribute or a group of attributes that can be used to identify each instance. Example entity TEACHER: Teacher Last Name, Teacher First Name, Teacher Address, Teacher Country, Teacher Certificate Id, Teacher Mother Maiden Name, Teacher Phone Number, Teacher Date of Birth.
Primary Key: The candidate key with the highest priority, used to identify each instance. Example entity Employee: Employee ID (PK), First Name, Last Name, Address, Department, Phone Number, Birthday.
Alternate Key: All the candidate keys except the primary key. Example: Employee Id, Employee Last Name (AK1), Employee First Name (AK1), Employee Address, Employee City, Employee State, Employee Zip Code, Employee Phone Number (AK2), Employee Date of Birth (AK1,AK2).
Inversion Entries: Attributes that are used to look up the wanted instances. The result may not be unique. Example entity EMPLOYEE: Employee Id, Employee Last Name (AK1,IE2), Employee First Name (AK1), Employee Address, Employee City (IE1), Employee State (IE1), Employee Zip Code, Employee Phone Number, Employee Date of Birth (AK1).
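The key types above can be sketched in code. This is a minimal illustration, assuming SQLite (through Python's sqlite3 module) as a stand-in for the target RDBMS; the table layout follows the Alternate Key slide, while the sample rows and the helper name violates_unique_key are made up for the example.

```python
import sqlite3

# Assumption: SQLite stands in for the real RDBMS; sample data is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,              -- PK
    last_name     TEXT NOT NULL,
    first_name    TEXT NOT NULL,
    date_of_birth TEXT NOT NULL,
    phone_number  TEXT NOT NULL,
    city          TEXT,
    state         TEXT,
    UNIQUE (last_name, first_name, date_of_birth),  -- AK1
    UNIQUE (phone_number, date_of_birth)            -- AK2
);
CREATE INDEX ie1_employee ON employee (city, state);  -- IE1, not unique
INSERT INTO employee VALUES (1, 'Koch', 'Ann', '1980-01-01', '555-0100', 'Austin', 'TX');
INSERT INTO employee VALUES (2, 'Chou', 'Mei', '1985-05-05', '555-0101', 'Austin', 'TX');
""")

def violates_unique_key(row):
    """Return True if inserting row breaks the PK or an alternate key."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True

# Same name and birth date as employee 1, so AK1 rejects the row.
duplicate_rejected = violates_unique_key(
    (3, 'Koch', 'Ann', '1980-01-01', '555-0199', 'Dallas', 'TX'))

# A lookup through an inversion entry may return several instances.
austin_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE city = ? AND state = ?", ("Austin", "TX")
).fetchone()[0]
```

Alternate keys become UNIQUE constraints because they must identify a single instance, whereas an inversion entry becomes a plain index because its result may not be unique.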
What is a Relationship? Relationship: an association between occurrences of one or more entities which provides some relevant and valuable information. Example: MOVIE is recorded on VIDEO TAPE; VIDEO TAPE records MOVIE.
What is a Verb Phrase? A parent-to-child verb phrase describes how the parent is related to the child. In the example to the left, the verb phrase states that “STORE rents A MOVIE.” A child-to-parent verb phrase describes how a child entity is related to a parent entity. In the example to the left, the verb phrase states that “MOVIE is rented from A STORE.”
Cardinality of Relationship One-to-one One-to-many Many-to-one Many-to-many All types can be optional for one or both entities
Identifying Relationship: An identifying relationship is a relationship between two entities in which an instance of a child entity is identified through its association with a parent entity, which means the child entity depends on the parent entity for its identity and cannot exist without it. Example: MOVIE MASTER (Movie Master Id, Movie Name, Movie Star, Movie Type, Movie Rating) is rented as MOVIE COPY; MOVIE COPY (Movie Master Id (FK), Movie Copy Number, Movie Copy Create Date, Movie Copy Due Date, Movie Copy Condition) is created from MOVIE MASTER.
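A minimal sketch of the MOVIE MASTER / MOVIE COPY example, assuming SQLite via Python's sqlite3 module rather than any particular production RDBMS. The composite primary key on movie_copy is how the identifying relationship is expressed here; the sample rows and the helper can_insert_copy are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE movie_master (
    movie_master_id INTEGER PRIMARY KEY,
    movie_name      TEXT NOT NULL
);
CREATE TABLE movie_copy (
    movie_master_id      INTEGER NOT NULL REFERENCES movie_master (movie_master_id),
    movie_copy_number    INTEGER NOT NULL,
    movie_copy_condition TEXT,
    -- The parent's key is part of the child's key: the copy is identified
    -- through its master and cannot exist without it.
    PRIMARY KEY (movie_master_id, movie_copy_number)
);
INSERT INTO movie_master VALUES (1, 'Example Movie');
INSERT INTO movie_copy VALUES (1, 1, 'good');
""")

def can_insert_copy(master_id, copy_number):
    """Try to add a copy; return False when a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO movie_copy VALUES (?, ?, 'new')", (master_id, copy_number))
        return True
    except sqlite3.IntegrityError:
        return False

second_copy_ok = can_insert_copy(1, 2)   # parent 1 exists
orphan_copy_ok = can_insert_copy(99, 1)  # there is no parent 99
```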
Mandatory non-identifying relationship: A non-identifying relationship in which an instance of the child entity must be related to an instance of the parent entity. Example: CUSTOMER (Customer Id, Customer Name, Customer Address, Customer Phone) places ORDER; ORDER (Order Number, Customer Id (FK), Order Date, Order Status, Order Shipdate) is received from CUSTOMER.
Non-mandatory non-identifying relationship: A non-identifying relationship in which an instance of the child entity can exist without being related to an instance of the parent entity. Example: DEPARTMENT (Department Number, Department Name, Department Location) employs EMPLOYEE; EMPLOYEE (Employee Id, Department Number (FK), Employee Name, Employee Address) belongs to DEPARTMENT.
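The difference between the two non-identifying relationships comes down to whether the foreign key column may be null. A minimal sketch, assuming SQLite via Python's sqlite3 module; the table customer_order is named that way only because ORDER is a reserved word, and try_insert is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer   (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE department (department_number INTEGER PRIMARY KEY, department_name TEXT);

-- Mandatory: every order must be related to a customer (FK is NOT NULL).
CREATE TABLE customer_order (
    order_number INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer (customer_id)
);

-- Non-mandatory: an employee may exist without a department (FK allows NULL).
CREATE TABLE employee (
    employee_id       INTEGER PRIMARY KEY,
    department_number INTEGER REFERENCES department (department_number)
);
""")

def try_insert(sql, params):
    """Run an INSERT; return False when a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

order_without_customer = try_insert(
    "INSERT INTO customer_order VALUES (?, ?)", (1, None))
employee_without_department = try_insert(
    "INSERT INTO employee VALUES (?, ?)", (1, None))
```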
Many-to-Many Relationship: A many-to-many relationship is one where a relationship and its inverse are both to-many. Example: PART is ordered from SUPPLIER; SUPPLIER sends us PART.
Build Relationship [flowchart]: Start -> Cardinality of R (1:M or M:M) -> Identifying or non-identifying? Y: draw and name an Identifying Relationship from Parent to Child. N: draw and name a Non-identifying Relationship from Parent to Child (FK - NO NULL, or FK - NULLS ALLOWED). Other label in the figure: inheritable or non-inheritable.
Normalize to Reduce Data Redundancy
Data normalization is a process in which data attributes within a data model are organized to increase the cohesion of entity types.
Level | Rule
First normal form (1NF) | An entity type is in 1NF when it contains no repeating groups of data.
Second normal form (2NF) | An entity type is in 2NF when it is in 1NF and when all of its non-key attributes are fully dependent on its primary key.
Third normal form (3NF) | An entity type is in 3NF when it is in 2NF and when all of its attributes are directly dependent on the primary key.
Normalization: A step-by-step process to verify and refine the logical data model. The condition of the model at completion of each step is a “normal form”. DOT standard is third normal form. First normal form: Eliminate repeating groups. Second normal form: Ensure that all attributes depend on the entity identifier. Third normal form: Ensure that all attributes depend only on the entity identifier.
1st Normal Form: Eliminate repeating groups. To remove the repeating group of fields, collapse them into a single field with multiple records in a new table, related back to the primary data.
2nd Normal Form: Uniquely identify each instance. Each table must contain attributes for a single subject, and each table must contain an attribute (or set of attributes) that uniquely identifies a single record within that table.
3rd Normal Form: Eliminate columns not dependent on the key. Each attribute must depend on the primary key, so the violating fields are moved into separate, related tables.
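The first-normal-form step (collapsing a repeating group into rows of a new, related table) can be sketched in a few lines of plain Python. The order records and the item_1..item_3 field names are hypothetical and serve only to show the transformation.

```python
# Hypothetical un-normalized records: item_1..item_3 form a repeating group.
orders_unnormalized = [
    {"order_id": 1, "customer": "Koch", "item_1": "bolt", "item_2": "nut", "item_3": None},
    {"order_id": 2, "customer": "Chou", "item_1": "gear", "item_2": None, "item_3": None},
]

REPEATING_FIELDS = ("item_1", "item_2", "item_3")

def to_first_normal_form(rows):
    """Split each record into one parent row plus one child row per item."""
    orders, order_items = [], []
    for row in rows:
        orders.append({"order_id": row["order_id"], "customer": row["customer"]})
        for field in REPEATING_FIELDS:
            if row[field] is not None:
                # The child row points back to the primary data via order_id.
                order_items.append({"order_id": row["order_id"], "item": row[field]})
    return orders, order_items

orders, order_items = to_first_normal_form(orders_unnormalized)
```

After the split, an order can carry any number of items without changing the table structure, which is the point of removing the repeating group.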
II Physical Design
Physical Design: Mapping Logical Model to Physical Model. Naming standard. Identify table type. Column Data Type. Group tables. Assign Keys. Choose Index. Denormalize to improve performance. Storage.
Mapping Logical Model to Physical Model: Entity -> Table; Attribute -> Column; Primary Key -> Primary Key; Relationship -> Foreign Key; Inversion Entry -> Index
Naming Standard: Name the database objects according to a defined naming standard. Example: a table should have the prefix t_. Define abbreviations. Example: Cargo -> CGO
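The mapping rules can be illustrated with a small generator that turns a logical entity description into DDL. This is only a sketch: the dictionary layout, the helper names physical_name and entity_to_ddl, and the lower-case-with-underscores naming rule are assumptions made for the example, and SQLite is used just to check that the generated statements are valid.

```python
import sqlite3

def physical_name(logical_name):
    """Business name -> DBMS identifier (assumed rule: lower case, underscores)."""
    return logical_name.lower().replace(" ", "_")

def entity_to_ddl(entity):
    """Entity -> table, attribute -> column, relationship -> FK, inversion entry -> index."""
    table = physical_name(entity["name"])
    columns = [f"{physical_name(attr)} {sql_type}" for attr, sql_type in entity["attributes"]]
    key_columns = ", ".join(physical_name(attr) for attr in entity["primary_key"])
    columns.append(f"PRIMARY KEY ({key_columns})")
    for attr, parent in entity.get("foreign_keys", []):
        columns.append(
            f"FOREIGN KEY ({physical_name(attr)}) REFERENCES {physical_name(parent)}")
    statements = [f"CREATE TABLE {table} ({', '.join(columns)})"]
    for number, attrs in enumerate(entity.get("inversion_entries", []), start=1):
        index_columns = ", ".join(physical_name(attr) for attr in attrs)
        statements.append(f"CREATE INDEX ie{number}_{table} ON {table} ({index_columns})")
    return statements

employee = {
    "name": "Employee",
    "attributes": [("Employee Id", "INTEGER"), ("Employee Name", "TEXT"),
                   ("Department Number", "INTEGER")],
    "primary_key": ["Employee Id"],
    "foreign_keys": [("Department Number", "Department")],
    "inversion_entries": [["Employee Name"]],
}
ddl = entity_to_ddl(employee)

# Run the generated statements to confirm they are syntactically valid.
conn = sqlite3.connect(":memory:")
for statement in ddl:
    conn.execute(statement)
column_names = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
```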
Table Types: Table Purpose, Data Wave, Data Size
Table Purpose: Transaction Table, Log Table / Analysis Table, Statistics Table, Supporting Table
Data Wave (how the data volume fluctuates): Stable Table, Increasing Table, Volatile Table
Data Size: Large Table, Small Table
Group Tables: Group tables by business module. Group tables by relationship.
Column Data Type: Choose data type (Char, Varchar2, Number, Integer, Float). Length. LOB: store in row, or store in another tablespace.
Assign Primary Key. Natural Key: assign a natural key, which is one or more existing data attributes that are unique to the business concept. Surrogate Key: introduce a new column, called a surrogate key, which is a key that has no business meaning.
Natural Key. Advantages: No need to introduce a new column. Meaningful and understandable. Key value is transferable. Disadvantages: May change when business requirements change. May contain many columns in future generations (child tables). Key value may be updated, which will also impact child tables.
Surrogate Key. Advantages: Not related to the business, so it is easy to maintain. Stable. Contains just one column, which simplifies the foreign key. Disadvantages: Will lead to recursive relationships. Hard to understand the relationship and its type. May add redundant code.
How to choose a surrogate key? Key assigned by the RDBMS, e.g. SEQUENCE. Max()+1. Universally Unique Identifiers (UUID). Global Unique Identifiers (GUID). High-Low strategy.
Choose Key Strategies: Unique. Minimal columns. Not null. Stable. Fits the application.
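Two of the strategies listed above can be sketched briefly in Python: a UUID generated on the client, and a High-Low generator that fetches a "high" value once per block and derives keys locally. The class HiLoKeyGenerator and the in-memory counter standing in for a database sequence are assumptions made for illustration, not part of any specific product.

```python
import itertools
import uuid

class HiLoKeyGenerator:
    """High-Low strategy: one round trip for 'hi', then block_size keys locally."""

    def __init__(self, fetch_next_hi, block_size=100):
        self._fetch_next_hi = fetch_next_hi  # e.g. a call to a database sequence
        self._block_size = block_size
        self._hi = None
        self._lo = block_size  # forces a fetch on the first call

    def next_key(self):
        if self._lo >= self._block_size:
            self._hi = self._fetch_next_hi()
            self._lo = 0
        key = self._hi * self._block_size + self._lo
        self._lo += 1
        return key

# Assumption: an in-memory counter plays the role of the database sequence.
hi_values = itertools.count(0)
fetch_count = 0

def fetch_next_hi():
    global fetch_count
    fetch_count += 1
    return next(hi_values)

generator = HiLoKeyGenerator(fetch_next_hi, block_size=3)
keys = [generator.next_key() for _ in range(7)]

# UUID strategy: unique without any coordination with the database.
uuid_key = str(uuid.uuid4())
```

With a block size of 3, seven keys need only three fetches of the high value, which is the main attraction of the High-Low strategy over asking the database for every key.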
Assign Foreign Key: Ensure data integrity. Delete/Update Cascade. In which cases is there no need to assign a foreign key?
How to choose indexes: Proto-index from logical model. Eliminate overlapping indexes. Eliminate low-hit indexes. Column sequence in index. B-Tree vs. Bitmap.
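A delete cascade can be demonstrated as follows. This is a minimal sketch assuming SQLite via Python's sqlite3 module; the table names and sample rows are illustrative, and other databases may differ in which cascade actions they support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for FK enforcement in SQLite
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
CREATE TABLE customer_order (
    order_number INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL
        REFERENCES customer (customer_id) ON DELETE CASCADE
);
INSERT INTO customer VALUES (1), (2);
INSERT INTO customer_order VALUES (100, 1), (101, 1), (102, 2);
""")

# Deleting the parent row also removes its child rows.
with conn:
    conn.execute("DELETE FROM customer WHERE customer_id = 1")

remaining_orders = [
    row[0] for row in conn.execute(
        "SELECT order_number FROM customer_order ORDER BY order_number")
]
```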
Proto-index from logical model Inversion Entry Primary Key Candidate Key Foreign Key
Eliminate overlapping indexes: An index that overlaps another index. Multiple optional columns.
Eliminate low-hit indexes: Small table / cached table. Indexed column cardinality: (1/distinct_value_num)*total_value_num
Column sequence in index: A frequently searched column leads the index. A low-cardinality column leads the index. A good column sequence helps to eliminate duplicate indexes.
B-Tree vs. Bitmap. B-Tree index: OLTP tables, high-cardinality columns. Bitmap index: DSS/OLAP tables, low-cardinality columns.
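The cardinality expression above gives the average number of rows that share one value of the indexed column. A short sketch of how it might be used follows; the 5% threshold in looks_low_hit is a made-up rule of thumb for illustration, not a figure from the slides or from any database vendor.

```python
def expected_rows_per_value(total_rows, distinct_values):
    """Average rows matched by one key value: (1 / distinct) * total."""
    if distinct_values <= 0:
        raise ValueError("distinct_values must be positive")
    # Written as a single division; mathematically equal to (1/distinct)*total.
    return total_rows / distinct_values

def looks_low_hit(total_rows, distinct_values, max_fraction=0.05):
    """Flag an index whose typical lookup touches a large share of the table.

    Assumption: max_fraction is a hypothetical threshold chosen for the example.
    """
    share = expected_rows_per_value(total_rows, distinct_values) / total_rows
    return share > max_fraction

# A status column with 4 distinct values in a million-row table.
status_rows = expected_rows_per_value(1_000_000, 4)
status_low_hit = looks_low_hit(1_000_000, 4)

# A unique identifier column in the same table.
id_rows = expected_rows_per_value(1_000_000, 1_000_000)
id_low_hit = looks_low_hit(1_000_000, 1_000_000)
```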
Denormalize to improve performance: Adding redundant data to avoid costly table joins can dramatically improve query performance.
When to denormalize? Two tables are repeatedly joined together. An additional query item is needed. An additional ORDER BY item is needed.
Which columns should be made redundant? Small data columns. Static and rarely updated columns.
Materialized View: A materialized view is a database object that contains the results of a query. It is a view over tables whose query result is stored physically.
Redundancy & Integrity: Trigger. Scheduled Job.
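One way a trigger can keep redundant data consistent is sketched below: a derived order_count column on the parent table is maintained by insert and delete triggers on the child table. SQLite (via Python's sqlite3 module) is assumed here only for convenience; the table, column, and trigger names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    order_count INTEGER NOT NULL DEFAULT 0   -- redundant, derived column
);
CREATE TABLE customer_order (
    order_number INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL
);

-- Triggers keep the redundant column in step with the detail rows.
CREATE TRIGGER trg_order_insert AFTER INSERT ON customer_order
BEGIN
    UPDATE customer SET order_count = order_count + 1
    WHERE customer_id = NEW.customer_id;
END;
CREATE TRIGGER trg_order_delete AFTER DELETE ON customer_order
BEGIN
    UPDATE customer SET order_count = order_count - 1
    WHERE customer_id = OLD.customer_id;
END;

INSERT INTO customer (customer_id) VALUES (1);
INSERT INTO customer_order VALUES (100, 1), (101, 1), (102, 1);
DELETE FROM customer_order WHERE order_number = 101;
""")

# Reading the count no longer needs a join or an aggregate.
order_count = conn.execute(
    "SELECT order_count FROM customer WHERE customer_id = 1").fetchone()[0]
actual_count = conn.execute(
    "SELECT COUNT(*) FROM customer_order WHERE customer_id = 1").fetchone()[0]
```

A scheduled job would instead recompute order_count periodically, trading freshness for lower overhead on each insert and delete.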
Storage: Tablespace. Table storage.
Tablespace: Dictionary-Managed Tablespace (DMT). Locally Managed Tablespace (LMT).
ASSM ASSM (Automatic Segment Space Management) is a method used by Oracle to manage space inside data blocks. It eliminates the need to specify parameters like PCTUSED, Freelists and Freelist groups for objects created in the tablespace.
Table Storage: Cached Table, Index Organized Table, Compressed Table, Partition Table, Cluster Table, External Table, Global Temporary Table
Cached Table For data that is accessed frequently, this clause indicates that the blocks retrieved for this table are placed at the most recently used end of the least recently used (LRU) list in the buffer cache when a full table scan is performed. This attribute is useful for small lookup tables. You cannot specify CACHE for an index-organized table. However, index-organized tables implicitly provide CACHE behavior.
Index Organized Table The data rows are held in an index defined on the primary key for the table. Best suited for primary key-based access and manipulation.
Compressed Table Enables data segment compression to reduce disk use. Only for heap-organized tables. LOB data segments are not compressed.
Partition Table: Partition the table by rules; data is stored in different partitions. A table that is part of a cluster cannot be partitioned. A table containing any LONG or LONG RAW columns cannot be partitioned.
Cluster Table: Specify one column from the table for each column in the cluster key. A clustered table uses the cluster's space allocation. Object tables and tables containing LOB columns cannot be part of a cluster.
External Table: A read-only table whose metadata is stored in the database and whose data is stored outside the database, e.g. in a flat file. You can specify only column, datatype, and inline_constraint. You cannot specify constraints on an external table. It cannot have object type columns, LOB columns, or LONG columns.
Global Temporary Table: The table is temporary, and its definition is visible to all sessions. The data in a temporary table is visible only to the session that inserts the data into the table. It contains either session-specific or transaction-specific data, as decided by the ON COMMIT clause.
Maintenance Plan: Table Sizing. Housekeeping Plan. Analyze Statistics Data.
Table Sizing: Data type length (VARCHAR2, LOB, other types). Index (Rowid). Data growth.
Initial sizing method: Calculate the row size by summing the column lengths. Insert initial data and analyze the table to get the row size. Analyze an existing table to get the row size. Allow for space fragmentation redundancy (5%~30%).
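The first sizing method (sum of column lengths plus a fragmentation allowance) amounts to simple arithmetic. The sketch below uses hypothetical column lengths and a hypothetical helper name; it deliberately ignores per-row and per-block overhead, index size, and data growth, all of which a real estimate would need to add.

```python
def estimate_table_bytes(column_lengths, row_count, overhead_percent=20):
    """Rough table size: row size * rows, plus a fragmentation allowance.

    Assumption: overhead_percent is picked from the 5%~30% range quoted above.
    """
    if not 5 <= overhead_percent <= 30:
        raise ValueError("overhead_percent should be between 5 and 30")
    row_size = sum(column_lengths)
    raw_bytes = row_size * row_count
    # Integer arithmetic avoids floating-point rounding in the result.
    return raw_bytes + raw_bytes * overhead_percent // 100

# Hypothetical table: a 4-byte id, two 30-byte names, an 8-byte date.
column_lengths = [4, 30, 30, 8]
estimate = estimate_table_bytes(column_lengths, row_count=1_000_000)
```

Here the row size is 72 bytes, so one million rows take 72,000,000 bytes before the allowance and 86,400,000 bytes with a 20% allowance.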
Housekeeping Plan: Which tables need to be housekept? When to perform housekeeping? How to housekeep?
Which tables need to be housekept? Transaction tables / log tables; increasing tables; large tables.
When to perform housekeeping? Housekeeping is a high-cost operation. It should be performed at low-load periods or during downtime. A high housekeeping frequency helps to keep the high water mark (HWM) low. It should be performed periodically.
How to housekeep? Housekeeping condition: time, status. Online data -> [Compressed data] -> [Archived data] -> Deleted data. Run as a scheduled job or manually.
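A housekeeping step driven by a time and status condition might look like the sketch below, which archives qualifying rows and then deletes them in one transaction. SQLite via Python's sqlite3 module is assumed for illustration; the table names, the 'CLOSED' status value, and the function housekeep are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txn_log         (txn_id INTEGER PRIMARY KEY, status TEXT, created_on TEXT);
CREATE TABLE txn_log_archive (txn_id INTEGER PRIMARY KEY, status TEXT, created_on TEXT);
INSERT INTO txn_log VALUES
    (1, 'CLOSED', '2023-01-10'),
    (2, 'OPEN',   '2023-01-15'),
    (3, 'CLOSED', '2024-06-01');
""")

def housekeep(connection, cutoff_date):
    """Move closed rows older than cutoff_date to the archive; return rows deleted."""
    # Housekeeping condition: status and time. ISO dates compare correctly as text.
    condition = "status = 'CLOSED' AND created_on < ?"
    with connection:  # archive and delete succeed or fail together
        connection.execute(
            f"INSERT INTO txn_log_archive SELECT * FROM txn_log WHERE {condition}",
            (cutoff_date,))
        deleted = connection.execute(
            f"DELETE FROM txn_log WHERE {condition}", (cutoff_date,)).rowcount
    return deleted

deleted_rows = housekeep(conn, "2024-01-01")
online_ids = [r[0] for r in conn.execute("SELECT txn_id FROM txn_log ORDER BY txn_id")]
archived_ids = [r[0] for r in conn.execute("SELECT txn_id FROM txn_log_archive")]
```

The open transaction (id 2) stays online even though it is old, because both parts of the condition must hold.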
Analyze Statistics Data: Which tables need to be analyzed? When to analyze?
Which tables need to be analyzed? With the cost-based optimizer (CBO), all tables need to be analyzed. Different kinds of tables have different analyze intervals.
When to analyze? After the table has been online for a while and holds enough data. When the data volume has changed dramatically. When the table structure has changed.
IV Example Student Course Management System
Student Course Management System Entities: Student, Course
Student Course Management System Attributes. Student: Student ID, Name, Sex, Age, Address, College, College Address. Course: Course ID, Course Name, Teacher ID, Teacher Name.
1NF – Eliminate Repeating Groups. Student: Student ID, First Name, Last Name, Sex, Age, Address, College, College Address. Course: Course ID, Course Name, Teacher ID, Teacher First Name, Teacher Last Name.
Student Course Management System Keys. Student: Student ID (PK), First Name (AK1), Last Name (AK1), Sex, Age, Address (AK1), College, College Address. Course: Course ID (PK), Course Name (AK1), Teacher ID (AK1), Teacher First Name, Teacher Last Name.
Student Course Management System Inversion Entry. Student: Student ID (PK), First Name (AK1), Last Name (AK1), Sex, Age, Address (AK1), College (IE1), College Address. Course: Course ID (PK), Course Name (AK1) (IE1), Teacher ID (AK1) (IE2), Teacher First Name, Teacher Last Name.
Student Course Management System Relationship: Student elects Course; Course is open for Student. Student: Student ID (PK), First Name (AK1), Last Name (AK1), Sex, Age, Address (AK1), College (IE1), College Address. Course: Course ID (PK), Course Name (AK1) (IE1), Teacher ID (AK1) (IE2), Teacher First Name, Teacher Last Name.
Student Course Management System Transform Many-to-Many to One-to-Many. Student: Student ID (PK), First Name (AK1), Last Name (AK1), Sex, Age, Address (AK1), College (IE1), College Address. Course: Course ID (PK), Course Name (AK1) (IE1), Teacher ID (AK1) (IE2), Teacher First Name, Teacher Last Name. Election: Student ID (FK1), Course ID (FK2), Score, Election Times, Credit Hour. Relationships: Student elects Course; Course is open for Student.
Student Course Management System 2NF -- Ensure that all attributes depend on the entity identifier. Student: Student ID (PK), First Name (AK1), Last Name (AK1), Sex, Age, Address (AK1), College (IE1), College Address. Course: Course ID (PK), Course Name (AK1) (IE1), Teacher ID (AK1) (IE2), Teacher First Name, Teacher Last Name, Credit Hour. Election: Student ID (FK1), Course ID (FK2), Score, Election Times. Relationships: Student elects Course; Course is open for Student.
Student Course Management System 3NF -- Ensure that all attributes depend only on the entity identifier. Student: Student ID (PK), First Name (AK1), Last Name (AK1), Sex, Age, Address (AK1), College ID (IE1)(FK1). Election: Student ID (FK1), Course ID (FK2), Score, Election Times. Course: Course ID (PK), Course Name (AK1) (IE1), Teacher ID (AK1)(IE2)(FK1), Credit Hour. Teacher: Teacher ID, Teacher First Name, Teacher Last Name, College ID (FK1). College: College ID, College Name, College Address, Rector. Relationships: Student elects Course; Course is open for Student; Teacher teaches Course; Teacher belongs to College; Student belongs to College.
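The 3NF model can be turned into a physical schema roughly as follows. This is a sketch under assumptions: SQLite via Python's sqlite3 module stands in for the target database, the composite primary key on election and the primary keys on teacher and college are not marked on the slide and are assumed here, and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE college (
    college_id INTEGER PRIMARY KEY, college_name TEXT,
    college_address TEXT, rector TEXT
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
    sex TEXT, age INTEGER, address TEXT,
    college_id INTEGER REFERENCES college (college_id),
    UNIQUE (first_name, last_name, address)           -- AK1
);
CREATE TABLE teacher (
    teacher_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
    college_id INTEGER REFERENCES college (college_id)
);
CREATE TABLE course (
    course_id INTEGER PRIMARY KEY, course_name TEXT, credit_hour INTEGER,
    teacher_id INTEGER REFERENCES teacher (teacher_id),
    UNIQUE (course_name, teacher_id)                  -- AK1
);
-- Election resolves the many-to-many Student/Course relationship.
-- Assumption: the composite primary key is not stated on the slide.
CREATE TABLE election (
    student_id INTEGER NOT NULL REFERENCES student (student_id),
    course_id  INTEGER NOT NULL REFERENCES course (course_id),
    score INTEGER, election_times INTEGER,
    PRIMARY KEY (student_id, course_id)
);
CREATE INDEX ie1_student ON student (college_id);     -- IE1
CREATE INDEX ie1_course  ON course (course_name);     -- IE1

INSERT INTO college VALUES (1, 'Engineering', '1 Campus Road', 'Dr. Example');
INSERT INTO student VALUES (1, 'Ann', 'Koch', 'F', 20, '2 Main St', 1);
INSERT INTO teacher VALUES (1, 'Mei', 'Chou', 1);
INSERT INTO course  VALUES (1, 'Databases', 3, 1), (2, 'Networks', 2, 1);
INSERT INTO election VALUES (1, 1, 90, 1), (1, 2, NULL, 1);
""")

# Courses elected by student 1, reached through the Election table.
courses_of_student = [row[0] for row in conn.execute("""
    SELECT c.course_name
    FROM election e JOIN course c ON c.course_id = e.course_id
    WHERE e.student_id = ?
    ORDER BY c.course_name""", (1,))]

# The college address is stored once and reached through the foreign key.
student_college = conn.execute("""
    SELECT g.college_name FROM student s
    JOIN college g ON g.college_id = s.college_id
    WHERE s.student_id = ?""", (1,)).fetchone()[0]
```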
Q & A
Thanks! www.HelloDBA.com fuyuncat