ITEC 3220A Using and Designing Database Systems


Instructor: Gordon Turpin
Course Website: www.cse.yorku.ca/~gordon/itec3220S07
Office: CSEB3020

Chapter 13 The Data Warehouse

Transaction Processing Versus Decision Support
- Transaction processing allows organizations to conduct daily business in an efficient manner
  - Operational database
- Decision support helps management provide medium-term and long-term direction for an organization

Decision Support System (DSS) Components

Operational vs. Decision Support Data
- Operational data
  - Relational, normalized database
  - Optimized to support transactions
  - Real-time updates
- DSS
  - Snapshot of operational data
  - Summarized
  - Large amounts of data
- Data analyst viewpoint
  - Timespan
  - Granularity
  - Dimensionality

The DSS Database Requirements
- Database schema
  - Support complex (non-normalized) data
  - Extract multidimensional time slices
- Data extraction and filtering
- End-user analytical interface
- Database size
  - Very large databases (VLDBs)
  - Contains redundant and duplicated data

Data Warehouse
A data warehouse is an integrated, subject-oriented, time-variant, non-volatile database.
- Integrated
  - Centralized
  - Holds data retrieved from the entire organization
- Subject-Oriented
  - Optimized to give answers to diverse questions
  - Used by all functional areas
- Time Variant
  - Flow of data through time
  - Projected data
- Non-Volatile
  - Data never removed
  - Always growing

Data Marts
- Single-subject data warehouse subset
- Provide decision support to a small group
- Can be tested to explore the potential benefits of data warehouses
- Address local or departmental problems

Data Warehouse Versus Data Mart

Star Schema
- Data-modeling technique
- Maps multidimensional decision support data into a relational database
- Yields a model for multidimensional data analysis while preserving the relational structure of the operational DB
- Four components:
  - Facts
  - Dimensions
  - Attributes
  - Attribute hierarchies

Simple Star Schema

Slice and Dice View of Sales

Star Schema Representation
- Facts and dimensions are represented by physical tables in the data warehouse DB
- The fact table is related to each dimension table (M:1)
- Fact and dimension tables are related by foreign keys
- Subject to the primary/foreign key constraints
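The representation described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (sales_fact, time_dim, product_dim, location_dim), their columns, and the sample rows are assumptions, not taken from the slides. The fact table carries one foreign key per dimension, giving the M:1 relationship from facts to each dimension.

```python
import sqlite3

# Hypothetical sales star schema: one fact table and three dimension tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE time_dim     (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE product_dim  (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales_fact (
    time_id      INTEGER NOT NULL REFERENCES time_dim(time_id),
    product_id   INTEGER NOT NULL REFERENCES product_dim(product_id),
    location_id  INTEGER NOT NULL REFERENCES location_dim(location_id),
    sales_amount REAL    NOT NULL,
    PRIMARY KEY (time_id, product_id, location_id)
);
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)", [(1, 2007, 1), (2, 2007, 2)])
conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(10, "Widget"), (11, "Gadget")])
conn.executemany("INSERT INTO location_dim VALUES (?, ?)", [(100, "Toronto"), (101, "Ottawa")])
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 50.0), (1, 11, 100, 20.0), (2, 10, 101, 30.0)],
)


def sales_by_city(conn):
    """Aggregate the fact table along the location dimension."""
    return conn.execute("""
        SELECT l.city, SUM(f.sales_amount)
        FROM sales_fact AS f
        JOIN location_dim AS l ON l.location_id = f.location_id
        GROUP BY l.city
        ORDER BY l.city
    """).fetchall()


def fk_violation_rejected(conn):
    """Return True if a fact row pointing at a missing dimension row is refused."""
    try:
        conn.execute("INSERT INTO sales_fact VALUES (99, 10, 100, 1.0)")
    except sqlite3.IntegrityError:
        return True
    return False
```

Calling sales_by_city(conn) groups the facts by one dimension attribute; swapping in a different dimension table gives a different view of the same facts.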

Star Schema for Sales

Example
A Canadian financial organization is interested in building a data warehouse to analyze customers’ credit payments by time, by the location where the payments were made, by customer, and by type of credit card. A customer may use the credit card to make a payment in different locations across the country and abroad. A payment made abroad can be recorded in the domestic currency of that location and then converted into Canadian dollars based on the currency rate. Time is described by Time_ID, day, month, quarter, and year. Location is described by Location_ID, the name of the organization billing the customer, the city and country where the organization is located, and the domestic currency. A credit card is described by credit card number, type of the credit account, and customer’s credit rate. The customer’s rate depends on the type of the credit account. A customer is described by ID, name, address, and phone.

Performance-Improving Techniques for Star Schema
- Normalization of dimensional tables
- Multiple fact tables representing different aggregation levels
- Denormalization of the fact tables
- Table partitioning and replication
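As one illustration of the second technique, a coarser fact table can be precomputed from a finer one so that month-level queries never scan day-level rows. The sketch below is a plain Python model, not a DBMS feature; the row layout and the names daily_fact, build_monthly_fact, and monthly_sales are hypothetical.

```python
from collections import defaultdict

# Hypothetical day-level fact rows: (year, month, day, product_id, sales_amount).
daily_fact = [
    (2007, 1, 5, "P1", 100.0),
    (2007, 1, 9, "P1", 50.0),
    (2007, 1, 9, "P2", 25.0),
    (2007, 2, 1, "P1", 40.0),
]


def build_monthly_fact(rows):
    """Roll day-level facts up to a (year, month, product) aggregation level."""
    totals = defaultdict(float)
    for year, month, _day, product_id, amount in rows:
        totals[(year, month, product_id)] += amount
    return dict(totals)


# The second, pre-aggregated "fact table".
monthly_fact = build_monthly_fact(daily_fact)


def monthly_sales(year, month, product_id):
    """Answer a month-level question from the aggregate instead of the detail rows."""
    return monthly_fact.get((year, month, product_id), 0.0)
```

The trade-off is extra storage and the need to refresh the aggregate whenever new detail rows are loaded.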

Normalization Example
Normalize the star schema that you developed for the Canadian financial organization (page 16) into 3NF.

More Example
A supermarket chain is interested in building a data warehouse to analyze the sales of different products in different supermarkets at different times using different payment methods. Each supermarket is described by location_ID, city, country, and domestic currency. Time is described by time_ID, day, month, quarter, and year. Each product is described by product_ID, product_name, and vendor. Payment method is described by payment_ID and payment_type. Design a star schema for this problem and then normalize the star schema that you developed into 3NF.

Data Warehouse Implementation Road Map

Chapter 12 Distributed Database Management Systems

The Evolution of Distributed Database Management Systems
- Distributed database management system (DDBMS)
  - Governs storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites

Distributed Database Environment

Database Systems: Levels of Data and Process Distribution

Single-Site Processing, Single-Site Data (SPSD)
- All processing is done on a single CPU or host computer (mainframe, midrange, or PC)
- All data are stored on the host computer’s local disk
- Processing cannot be done on the end user’s side of the system
- Typical of most mainframe and midrange computer DBMSs
- DBMS is located on the host computer, which is accessed by dumb terminals connected to it
- Also typical of the first generation of single-user microcomputer databases

Single-Site Processing, Single-Site Data (Centralized)

Multiple-Site Processing, Single-Site Data (MPSD)
- Multiple processes run on different computers sharing a single data repository
- MPSD scenario requires a network file server running conventional applications that are accessed through a LAN
- Many multi-user accounting applications, running under a personal computer network, fit such a description

Multiple-Site Processing, Single-Site Data

Multiple-Site Processing, Multiple-Site Data (MPMD)
- Fully distributed database management system with support for multiple data processors and transaction processors at multiple sites
- Classified as either homogeneous or heterogeneous
- Homogeneous DDBMSs
  - Integrate only one type of centralized DBMS over a network

Multiple-Site Processing, Multiple-Site Data (MPMD) (Cont’d)
- Heterogeneous DDBMSs
  - Integrate different types of centralized DBMSs over a network
- Fully heterogeneous DDBMS
  - Supports different DBMSs that may even support different data models (relational, hierarchical, or network) running under different computer systems, such as mainframes and microcomputers

Distributed Database Design
- Data fragmentation: How to partition the database into fragments
- Data replication: Which fragments to replicate
- Data allocation: Where to locate those fragments and replicas

Data Fragmentation
- Breaks a single object into two or more segments or fragments
- Each fragment can be stored at any site over a computer network
- Information about data fragmentation is stored in the distributed data catalog (DDC), from which it is accessed by the transaction processor (TP) to process user requests
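The role of the catalog can be pictured with a small lookup. This is a toy model only: the dictionary layout, the fragment and site names, and the function sites_for_request are all hypothetical, and a real DDC records far more than this. The point is that the request is resolved to sites by consulting catalog metadata, so the user never names a fragment or a site.

```python
# Hypothetical distributed data catalog (DDC): for each fragment, the site that
# stores it and the province values whose rows the fragment holds.
DDC = {
    "CUST_ON": {"site": "Toronto", "provinces": {"ON"}},
    "CUST_BC": {"site": "Vancouver", "provinces": {"BC"}},
    "CUST_QC": {"site": "Montreal", "provinces": {"QC"}},
}


def sites_for_request(catalog, provinces):
    """Return the sites a transaction processor would contact for a request
    that touches customers in the given provinces."""
    wanted = set(provinces)
    return sorted(
        entry["site"]
        for entry in catalog.values()
        if entry["provinces"] & wanted  # fragment holds at least one wanted province
    )
```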

Data Fragmentation Strategies
- Horizontal fragmentation: Division of a relation into subsets (fragments) of tuples (rows)
- Vertical fragmentation: Division of a relation into attribute (column) subsets
- Mixed fragmentation: Combination of horizontal and vertical strategies
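The horizontal and vertical strategies can be sketched on a small in-memory relation. The CUSTOMER rows, attribute names, and helper functions below are assumptions made for illustration. Each vertical fragment repeats the key attribute so that the original rows can be rebuilt by a join, while horizontal fragments are rebuilt by a union.

```python
# Hypothetical CUSTOMER relation, one dict per tuple (row).
customers = [
    {"cus_id": 1, "name": "Ann", "province": "ON", "balance": 120.0},
    {"cus_id": 2, "name": "Bob", "province": "BC", "balance": 0.0},
    {"cus_id": 3, "name": "Cy", "province": "ON", "balance": 75.5},
]


def horizontal_fragments(rows, attr):
    """Split rows into fragments by the value of one attribute."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[attr], []).append(row)
    return fragments


def vertical_fragment(rows, key, attrs):
    """Keep only the chosen columns, plus the key needed to rejoin later."""
    return [{key: row[key], **{a: row[a] for a in attrs}} for row in rows]


def rejoin(left, right, key):
    """Rebuild full rows from two vertical fragments by joining on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


by_province = horizontal_fragments(customers, "province")  # horizontal
contact = vertical_fragment(customers, "cus_id", ["name", "province"])  # vertical
billing = vertical_fragment(customers, "cus_id", ["balance"])  # vertical
```

Mixed fragmentation would apply vertical_fragment to each horizontal fragment in turn.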

Data Replication
- Storage of data copies at multiple sites served by a computer network
- Fragment copies can be stored at several sites to serve specific information requirements
- Can enhance data availability and response time
- Can help to reduce communication and total query costs

Replication Scenarios
- Fully replicated database: Stores multiple copies of each database fragment at multiple sites
  - Can be impractical due to amount of overhead
- Partially replicated database: Stores multiple copies of some database fragments at multiple sites
  - Most DDBMSs are able to handle the partially replicated database well
- Unreplicated database: Stores each database fragment at a single site
  - No duplicate database fragments
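The three scenarios differ only in how many copies each fragment has, which a short function can make concrete. The allocation map format (fragment name mapped to the set of sites holding a copy) and the function name replication_scenario are assumptions for this sketch; the classification follows the definitions given on this slide.

```python
def replication_scenario(allocation):
    """Classify an allocation map of fragment name -> set of sites holding a copy."""
    copies = [len(sites) for sites in allocation.values()]
    if not copies or min(copies) < 1:
        raise ValueError("every fragment must be stored at one or more sites")
    if min(copies) > 1:
        return "fully replicated"      # multiple copies of each fragment
    if max(copies) > 1:
        return "partially replicated"  # multiple copies of some fragments
    return "unreplicated"              # each fragment stored at a single site


# Hypothetical fragment and site names, for illustration only.
full = {"F1": {"A", "B"}, "F2": {"A", "B"}}
partial = {"F1": {"A", "B"}, "F2": {"B"}}
single = {"F1": {"A"}, "F2": {"B"}}
```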

Data Allocation
- Deciding where to locate data
- Allocation strategies:
  - Centralized data allocation: Entire database is stored at one site
  - Partitioned data allocation: Database is divided into several disjoint parts (fragments) and stored at several sites
  - Replicated data allocation: Copies of one or more database fragments are stored at several sites
- Data distribution over a computer network is achieved through data partition, data replication, or a combination of both