Class Agenda: 03/13 – 3/15


1 Class Agenda: 03/13 – 3/15
- Review database design core concepts.
  - Review design for ERD Scenarios #3 & #4.
  - Review concepts of normalization.
  - Do practice design from forms using the Replica Toy database (ERD Scenario #5).
  - Discuss issues in database design and normalization.
- Discuss concepts of data warehouse design.
  - Establish environment surrounding DW design.
  - Contrast methods of DW design.

2 Goals for Transaction Database Design
- Protect the integrity of the data.
  - Reduce data redundancy.
  - Prevent data anomalies.
- Provide for change.
  - Prevent inflexible data structures.
  - Anticipate changes.
- Provide access to complete data for decision making.

3 What is normalization?
- Normalization is a formal, process-oriented approach to data modeling.
- Normalization is the process of:
  - examining groups of data attributes;
  - splitting them into appropriate entities;
  - identifying the relationships between the entities; and
  - identifying appropriate primary and foreign keys.

4 Two methods of applying normalization
1. Use it to help in designing a database.
   - Normalization starts with a single entity.
   - Normalization breaks that entity into a series of additional entities.
   - More entities are discovered and named during the process.
   - Entities are linked during the process.
2. Use it to validate the design of a database.
   - Identify entities from the meaning of the data.
   - Create conceptual and logical data models.
   - Apply the rules of normalization to ensure a stable, non-redundant design.

5 Normalization Vocabulary: Functional Dependency and Determinants
- A social security number determines your name and address. SSN → name, address.
- A vehicle ID number determines the make and model of a car. VIN → make, model.
- Name and address are “functionally dependent” on SSN.
- SSN “determines” name and address.
- Functional dependency diagram format:
  - CrsNum → CrsDescription, CrsCredits
  - ZipCode → City, State (this implies that a zip code uniquely identifies a city and state in the U.S. postal system)
  - PatID, TrtDateTime → TstResults, TrtID, LocID
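A minimal sketch of how a functional dependency can be checked against sample rows, assuming the data is held as a list of Python dictionaries. The function name `holds` and the course rows are hypothetical and not part of the slides. Sample data can only disprove a dependency; whether it truly holds comes from the meaning of the data.

```python
def holds(rows, determinant, dependents):
    """Return True if determinant -> dependents holds for every row in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependents)
        # setdefault stores the first value seen for a key and returns it afterwards,
        # so a later row with the same key but a different value is a violation.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample rows using the CrsNum example from the slide.
courses = [
    {"CrsNum": "IS301", "CrsDescription": "Database Design", "CrsCredits": 3},
    {"CrsNum": "IS301", "CrsDescription": "Database Design", "CrsCredits": 3},
    {"CrsNum": "IS475", "CrsDescription": "Data Warehousing", "CrsCredits": 3},
]

# CrsNum -> CrsDescription, CrsCredits is consistent with these rows.
crs_fd_holds = holds(courses, ["CrsNum"], ["CrsDescription", "CrsCredits"])
# CrsCredits -> CrsNum is not: two different courses share 3 credits.
credits_fd_holds = holds(courses, ["CrsCredits"], ["CrsNum"])
```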

6 Normal forms relevant to business-oriented databases
- First normal form: Remove repeating groups.
- Second normal form: Remove partial functional dependencies.
- Third normal form: Remove transitive dependencies.

7 First Normal Form
- First normal form: Remove repeating groups.
  - A repeating group is an attribute or group of attributes that can have more than one value for an instance of an entity.
- Example of repeating groups:
  - StudentID → StudentName, StudentAddress, CourseID1, DateTaken1, Grade1, CourseID2, DateTaken2, Grade2, CourseID3, DateTaken3, Grade3, CourseID4, DateTaken4, Grade4…

8 Other examples of a repeating group
- Serial# → model#, customer name, customer address, feature 1 chosen, feature 2 chosen, feature 3 chosen…
- PatientID → name, address, zip, first insurance company, second insurance company, third insurance company…
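The removal of a repeating group can be sketched as follows, assuming the wide student record is a dictionary whose repeating attributes carry numeric suffixes. The function `split_repeating_group` and the sample values are hypothetical. The parent entity keeps the single-valued attributes, and each filled slot becomes one row in a new child entity.

```python
def split_repeating_group(record, key, group_fields, max_slots):
    """Split one wide record into a parent row and one child row per filled slot."""
    repeating = {f"{field}{i}" for field in group_fields
                 for i in range(1, max_slots + 1)}
    # The parent keeps everything that is not part of the repeating group.
    parent = {k: v for k, v in record.items() if k not in repeating}
    children = []
    for i in range(1, max_slots + 1):
        slot = {field: record.get(f"{field}{i}") for field in group_fields}
        # Skip slots that were never filled in.
        if any(v is not None for v in slot.values()):
            children.append({key: record[key], **slot})
    return parent, children

# Hypothetical record shaped like the StudentID example.
wide = {
    "StudentID": 1001, "StudentName": "Pat Lee", "StudentAddress": "12 Elm St",
    "CourseID1": "IS301", "DateTaken1": "2013-01", "Grade1": "A",
    "CourseID2": "IS475", "DateTaken2": "2013-08", "Grade2": "B",
    "CourseID3": None, "DateTaken3": None, "Grade3": None,
}

student, enrollments = split_repeating_group(
    wide, "StudentID", ["CourseID", "DateTaken", "Grade"], max_slots=3
)
```

The child rows still need a primary key of their own; StudentID plus CourseID plus DateTaken is one plausible choice, but that is a design decision the slides leave open.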

9 To remedy a problem discovered with normalization
- To get a data model into an appropriate normal form:
  - Identify the problem (repeating group, partial functional dependency, or transitive dependency) and place the “problem” attributes in one or more new separate entities in the model.
  - Identify a primary key for the new entity. The key may be concatenated if it is an associative entity, rather than a strong entity.
  - Create relationships between existing and new entities.
  - Divide m:n relationships with appropriate intersection entities.

10 Second Normal Form
- Second normal form: Remove partial functional dependencies.
- A partial functional dependency is a situation in which one or more non-key attributes are functionally dependent on part, but not all, of the primary key.
  - Partial functional dependencies occur only with entities that have concatenated primary keys.
- Examples of partial functional dependencies:
  - PatID, TrtDateTime → PatName, TstResults, TrtType, TrtDescription, LocName, TrtID, LocID (PatName depends on PatID alone.)
  - CourseID, StudentID → CourseTitle, Grade (CourseTitle depends on CourseID alone.)
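A minimal sketch of spotting and removing a partial functional dependency, using the CourseID, StudentID example. The list of dependencies is an assumption that has to come from the business meaning of the data; the function names are hypothetical.

```python
def partial_dependencies(primary_key, fds):
    """Return the dependencies whose determinant is a proper subset of the key."""
    key = frozenset(primary_key)
    return [(det, deps) for det, deps in fds if frozenset(det) < key]

def decompose_2nf(attributes, primary_key, fds):
    """Move partially dependent attributes into new entities keyed by their determinant."""
    remaining = list(attributes)
    new_entities = []
    for det, deps in partial_dependencies(primary_key, fds):
        new_entities.append({"key": tuple(det), "attributes": tuple(deps)})
        remaining = [a for a in remaining if a not in deps]
    return remaining, new_entities

key = ("CourseID", "StudentID")
# Assumed dependencies: Grade needs the whole key, CourseTitle needs only CourseID.
fds = [
    (("CourseID", "StudentID"), ("Grade",)),
    (("CourseID",), ("CourseTitle",)),
]

partials = partial_dependencies(key, fds)
enrollment_attrs, new_entities = decompose_2nf(
    ["CourseID", "StudentID", "CourseTitle", "Grade"], key, fds
)
```

After the split, the original entity keeps CourseID, StudentID and Grade, and CourseTitle lives in a new entity whose primary key is CourseID.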

11 Third Normal Form
- Third normal form: Remove transitive dependencies.
  - A transitive dependency occurs when a non-key attribute is functionally dependent on one or more non-key attributes.
- Examples of transitive dependencies:
  - TrackingNumber → ShipmentDate, OrderID, ItemID, ShipmentLocationID, LocationDescription, QuantityShipped
  - PatID, TrtDateTime → TstResults, TrtType, TrtDescription, LocName, TrtID, LocID
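A sketch of finding transitive dependencies under the definition above: a dependency whose determinant contains no primary key attributes. The dependencies listed for the patient treatment example are assumptions read from the attribute names (TrtID determining the treatment type and description, LocID determining the location name); the slides do not state them explicitly.

```python
def transitive_dependencies(primary_key, fds):
    """Return dependencies whose determinant consists only of non-key attributes."""
    key = set(primary_key)
    return [(det, deps) for det, deps in fds if det and not (set(det) & key)]

key = ("PatID", "TrtDateTime")
# Assumed dependencies for the treatment example.
fds = [
    (("PatID", "TrtDateTime"), ("TstResults", "TrtID", "LocID")),
    (("TrtID",), ("TrtType", "TrtDescription")),
    (("LocID",), ("LocName",)),
]

transitive = transitive_dependencies(key, fds)
```

Each dependency found this way suggests a new entity: here a treatment entity keyed by TrtID and a location entity keyed by LocID, with TrtID and LocID remaining in the original entity as foreign keys.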

12 Issues in Database Design
- Characteristics of business-oriented databases.
  - Used to store transactions.
  - Updated quickly and frequently, but not always accurately.
  - Accessed online real-time.
  - Support operational decision making.
- Assuming that the data stored is accurate, what is the biggest potential problem with a transaction database in third normal form?
- How do most organizations solve that problem?
- What do organizations potentially lose when they solve that problem?

13 Major purposes of a data warehouse
- To create a data store designed to facilitate managerial decision making.
  - Integrated data.
  - Subject-oriented.
  - Time-variant.
  - Non-volatile.
- To create a data store that has better quality, more consistent data than existing operational databases.

15 Goals of data warehouse design
- Make accurate information easily accessible.
- Present information consistently.
- Be adaptive and flexible to change.
- Provide reasonable and expected performance for information to support decision making.
- Minimize data redundancy.
- Protect/secure information.

16 Three different data models
- Transaction (operational) data model:
  - Contains current data required by separate and/or integrated operational systems.
  - Supports the transactional processing of the organization.
  - Is frequently used to support day-to-day decision making.
  - 3rd normal form.
- Reconciled (enterprise data warehouse) data model:
  - Contains detailed, current data intended to be the single, authoritative source for all decision support applications.
  - Usually in 3rd normal form.
- Derived (data mart) data model:
  - Contains data that are selected, formatted and aggregated for end-user decision support applications.
  - Star schema.
  - Probably not normalized.

17 Comparison – Replica Toys
- Transaction data model
- Reconciled data model
- Derived (data mart) data model

18 Reconciled and Derived Data Models
Reconciled (EDW)
- Independent of specific decisions
- Centralized control; usually owned by IT
- Historical
- Not summarized
- Normalized
- Flexible
- Many data sources
- Long life
- Starts large, becomes larger
Derived (Data Mart)
- Specific decisions
- One central subject
- Usually accessed directly by users; usually decentralized into user area
- Closely defined subject area
- Detailed and/or summarized
- Usually denormalized
- Restrictive: few sources
- Short life span
- Starts small, becomes large

19 Two approaches to design
Enterprise Data Warehouse (Inmon)
- Focus is on enterprise subjects that will be needed to support comprehensive decision making.
- Emphasis on creating design that is consistent among subject areas.
- Implementation is of a data mart.
- Uses ERD for modeling.
- Relies on comprehensive blueprint for interrelation of data.
Interrelated Data Marts (Kimball)
- Focus is on business subject area for data warehouse.
- Emphasis on creating simple design that can be implemented quickly.
- Implementation is of a data mart.
- Uses “dimensional model” for modeling. Kind of like an ERD with UML-type aspects.
- Relies on consistent interrelation of data by integration of existing data models.

20 Compare/Contrast Approaches
- Similarities:
  - Both focus on subject areas for development of data model.
  - Both require extensive input from data warehouse stakeholders.
  - Both produce a subject-oriented, non-volatile, time-related data warehouse.
  - Both try to quickly implement a prototype data mart.
- Differences:
  - Inmon creates a more integrated and consistent data warehouse by attempting to design an enterprise-wide warehouse at the beginning of the first data warehouse project. This is called a “reconciled” DW design.
  - Kimball relies on future project teams referencing existing data warehouse models for new projects.

21 What do both approaches yield?
- A design for a data mart.
- The design for a data mart relies on the concept of a data warehouse “cube.”
- A cube is a logical construct containing a “fact” table that is accessed on multiple “dimension” tables.
- A fact table contains values that a manager uses to make decisions.
- A dimension table is used as a reference for the values in the fact table.
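The fact and dimension tables described above can be sketched with Python's built-in `sqlite3` module. All table names, column names, and values here are hypothetical, loosely based on the Replica Toys scenario; this is one possible layout, not the design used in class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: reference data that describes the facts.
CREATE TABLE dim_toy (toy_id INTEGER PRIMARY KEY, toy_name TEXT);
CREATE TABLE dim_outlet (outlet_id INTEGER PRIMARY KEY, outlet_name TEXT);

-- Fact table: the measures a manager analyzes, keyed by the dimensions.
CREATE TABLE fact_sales (
    toy_id INTEGER REFERENCES dim_toy(toy_id),
    outlet_id INTEGER REFERENCES dim_outlet(outlet_id),
    sale_date TEXT,
    quantity INTEGER,
    amount REAL
);

INSERT INTO dim_toy VALUES (1, 'Model Car'), (2, 'Model Plane');
INSERT INTO dim_outlet VALUES (10, 'Web'), (20, 'Retail');
INSERT INTO fact_sales VALUES
    (1, 10, '2013-03-01', 2, 40.0),
    (1, 20, '2013-03-02', 1, 20.0),
    (2, 10, '2013-03-02', 3, 90.0);
""")

# Access the fact table "on" the toy dimension: total sales amount per toy.
sales_by_toy = conn.execute("""
    SELECT t.toy_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_toy AS t ON t.toy_id = f.toy_id
    GROUP BY t.toy_name
    ORDER BY t.toy_name
""").fetchall()
```

Swapping `dim_toy` for `dim_outlet` in the join answers the same question by distribution outlet, which is the sense in which one fact table is accessed on multiple dimensions.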

22 Steps of data warehouse design
1. Identify the stakeholders that need data to support their decisions.
2. Define and describe the data needs of those stakeholders.
3. Define the subject area.
4. Choose (EDW and data mart) or just data mart.
5. Select the data of interest.
6. Add element of time.
7. Add derived data.
8. Determine granularity level.
9. Summarize data.
10. Identify and attempt to solve potential performance issues.

23 How do you identify those people within an organization who require data to support their decision making processes?

24 Define and describe the data needs
- Usually termed “stakeholder analysis”.
- Differing levels of decision making require differing sets of data.
  - Internal vs. external data.
  - Integrated vs. non-integrated data.
  - Detailed vs. summarized data.
- Different stakeholders require different access mechanisms.
  - Online vs. reports.
  - Pre-formatted vs. ad-hoc availability of data.
- Different stakeholders require different timing.
  - Online, real time vs. delay.
  - Relative size of delay/timeliness is always an issue.

25 Stakeholder Analysis Table Example – Replica Toys

Stakeholder: Marketing Analyst
- Decision making responsibilities: Decide what features are most valuable to which customers. Determine trends in toy purchases.
- Existing information? No data related to features currently available. Customer order data by distribution outlet.
- Additional information? Features selected by customers. Purchases by toy by customer.
- Availability of additional information? Not in existing system and cannot be compiled manually. Maybe telephone survey? Maybe registration system?

Stakeholder: Distribution Manager
- Decision making responsibilities: Determine trends in use of distribution outlets. Determine distribution outlet profitability.
- Existing information? Customer order data by distribution outlet.
- Additional information? Purchases by toy by customer by distribution outlet. Purchase price by toy by customer by distribution outlet.
- Availability of additional information? Need customer order data with more specific parameters. See if available in customer order system.

Stakeholder: Quality control specialist
- Decision making responsibilities: Evaluate comparative defects of toys within and across product lines.
- Existing information? Support call data. Product return data.
- Additional information? Detailed problem reports including date, toy, problem, extent of damage.
- Availability of additional information? Not available in current support call and product return systems. Could be added.

Stakeholder: Development engineer
- Decision making responsibilities: Evaluate relative safety issues with existing product line. Determine potential safety issues with new product development.
- Existing information? Support call data. Product return data. Safety test data.
- Additional information? Detailed problem reports including date, toy, problem, injury, relative impact of injury, potential responsibility.
- Availability of additional information? Not available in current support call and product return systems. Could be added. Engineering safety test data is available.

26 Define the subject area
- Potential subject areas in common to many businesses:
  - Customers: People and organizations who acquire and/or use the company’s products.
  - Equipment: Machinery, devices, tools and their components.
  - Facilities: Real estate and its components.
  - Sales: Transactions that move a product from the company to a customer.
  - Suppliers: Entities that provide a company with goods and services.
  - Products: Goods and services that the company, or its competitors, provide to customers.
  - Materials: Goods and services that the company uses to produce its products.
  - Financials: Information about money that is received, retained, expended, invested or in any way tracked by the company.
  - Human resources: Individuals who perform work for the company; may be employees, contractors, or simply positions.

27 Select the data of interest
- Use the existing transaction database model.
- Identify and understand the necessary business decisions.
- Identify external data that could help support decisions.
- Use tables to help sort available attributes.
  - Example: Table 4.1 on pgs of chapter 4 in “Mastering Data Warehouse Design.”

28 Add element of time
- Data warehouse is a historical model rather than a current “point in time” model.
- Must have a way to incorporate changes that occur over time.
- Important issues:
  - Fact table must include a time component.
  - Ranges of time vs. effective period in time.
  - Time also relates to dimension tables.
  - May have to deal with differing time periods. Examples are fiscal years, “holiday rush,” billing cycle, etc.

29 Add derived data
- Derived data includes any kind of calculated field.
- Examples: total sales; net sales amount; total funds raised; total cost of products.
- Issues:
  - Must be identified, defined and agreed upon by data warehouse stakeholders.
  - Must be documented in metadata.
  - Must be consistent.
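One way to keep a derived field consistent is to compute it in a single, documented place. The sketch below assumes a definition of net sales amount (gross minus discounts minus returns) purely for illustration; the real definition is whatever the stakeholders agree on. The function name, field names, and values are hypothetical.

```python
from decimal import Decimal

def net_sales_amount(gross, discounts, returns):
    """Assumed definition for illustration: gross minus discounts minus returns."""
    return gross - discounts - returns

# Hypothetical order lines; Decimal avoids floating-point rounding on money.
order_lines = [
    {"gross": Decimal("100.00"), "discounts": Decimal("10.00"), "returns": Decimal("0.00")},
    {"gross": Decimal("50.00"), "discounts": Decimal("0.00"), "returns": Decimal("5.00")},
]

# Two derived values computed from the stored detail.
total_sales = sum(line["gross"] for line in order_lines)
net_sales = sum(net_sales_amount(**line) for line in order_lines)
```

Every report that calls `net_sales_amount` gets the same answer, and the docstring is the kind of definition that would also be recorded in the warehouse metadata.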

30 Determine granularity level
- What are the benefits and drawbacks of a low level of granularity?
- What are the benefits and drawbacks of a high level of granularity?
- What factors should be considered when determining the level of granularity in the data warehouse?

31 Summarize (aggregate) data
- What is summarized data?
- How is data summarized?
- Does summarized data save disk space?
- Why summarize data?
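As a starting point for these questions, the sketch below rolls hypothetical daily sales rows up to one row per month and toy. The data, the function name, and the choice of month as the summary level are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical detailed fact rows: (sale_date, toy, amount).
daily_sales = [
    ("2013-03-01", "Model Car", 40.0),
    ("2013-03-02", "Model Car", 20.0),
    ("2013-03-02", "Model Plane", 90.0),
    ("2013-04-01", "Model Car", 30.0),
]

def summarize_by_month(rows):
    """Aggregate daily amounts into one total per (month, toy)."""
    totals = defaultdict(float)
    for sale_date, toy, amount in rows:
        month = sale_date[:7]  # "YYYY-MM" prefix of an ISO date string
        totals[(month, toy)] += amount
    return dict(totals)

monthly_sales = summarize_by_month(daily_sales)
```

Four detail rows become three summary rows here. The summary is smaller and faster to query, but the day of each sale can no longer be recovered from it, which is the trade-off behind the granularity questions on the previous slide.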

32 Identify and solve performance issues
- What are the potential performance problems that can occur with a data warehouse?
- Why is performance a consideration during data warehouse design?
- What can a designer do to alleviate potential performance problems?