Planning Warehouse Storage Chapter 9

Data Partitioning
• Breaking up data into separate physical units that can be handled independently
• Ease of:
  - Restructuring
  - Reorganization
  - Removal
  - Recovery
  - Monitoring
  - Management
  - Archiving
  - Indexing
[Figure: Order table divided into partitions; a partition can be dropped or added without affecting the other data]
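The independence of partitions can be sketched in Python. This is a toy model under stated assumptions, not how Oracle stores data: the hypothetical `PartitionedTable` class keeps each partition as a separate list of rows, and the Order table is assumed to be partitioned by order month.

```python
class PartitionedTable:
    """Toy table whose rows live in separate, independently managed partitions."""

    def __init__(self, partition_key):
        self.partition_key = partition_key  # function mapping a row to its partition name
        self.partitions = {}                # partition name -> list of rows

    def add_partition(self, name):
        # Adding a partition creates a new, empty unit; existing partitions are untouched.
        self.partitions.setdefault(name, [])

    def drop_partition(self, name):
        # Dropping removes one whole unit instead of deleting its rows one by one.
        del self.partitions[name]

    def insert(self, row):
        name = self.partition_key(row)
        if name not in self.partitions:
            raise KeyError(f"no partition for {name!r}")
        self.partitions[name].append(row)

    def row_count(self):
        return sum(len(rows) for rows in self.partitions.values())


# Hypothetical Order table partitioned by order month ("YYYY-MM").
orders = PartitionedTable(partition_key=lambda row: row["order_date"][:7])
for month in ("2024-01", "2024-02", "2024-03"):
    orders.add_partition(month)

orders.insert({"order_id": 1, "order_date": "2024-01-15"})
orders.insert({"order_id": 2, "order_date": "2024-02-03"})
orders.insert({"order_id": 3, "order_date": "2024-03-21"})

orders.drop_partition("2024-01")  # remove or archive the oldest month
orders.add_partition("2024-04")   # add a partition for the next month

print(list(orders.partitions))  # ['2024-02', '2024-03', '2024-04']
print(orders.row_count())       # 2
```

Dropping the January partition removed its row without touching the February or March data, which is the property the slide's diagram illustrates.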

Objects to Partition
• Tables:
  - Fact
  - Dimension
• Indexes

Horizontal Partitioning
• Table and index data are split by:
  - Time
  - Sales region or person
  - Geography
  - Organization
  - Line of business
• Candidate partitioning columns are those that appear in WHERE clauses
• Analysis of query patterns determines the requirement
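Why the partitioning column should come from the WHERE clause can be shown with a small sketch of partition pruning. The functions `partition_by` and `select_eq` and the sales rows are hypothetical; the sketch only counts rows scanned to contrast a predicate on the partition column with one on another column.

```python
from collections import defaultdict


def partition_by(rows, column):
    """Horizontal partitioning: each partition keeps every column but only some rows."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[column]].append(row)
    return dict(parts)


def select_eq(parts, partition_column, column, value):
    """Evaluate WHERE column = value, returning (matching rows, rows scanned)."""
    if column == partition_column:
        # Partition pruning: only the partition named by the predicate is read.
        names = [value] if value in parts else []
    else:
        # The predicate is not on the partition column, so every partition is read.
        names = list(parts)
    matches, scanned = [], 0
    for name in names:
        for row in parts[name]:
            scanned += 1
            if row[column] == value:
                matches.append(row)
    return matches, scanned


sales = [
    {"region": "East", "product": "A", "amount": 100},
    {"region": "West", "product": "B", "amount": 250},
    {"region": "West", "product": "A", "amount": 75},
    {"region": "North", "product": "C", "amount": 40},
]
by_region = partition_by(sales, "region")

rows, scanned = select_eq(by_region, "region", "region", "West")
print(len(rows), scanned)  # 2 2

rows, scanned = select_eq(by_region, "region", "product", "A")
print(len(rows), scanned)  # 2 4
```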

Vertical Partitioning
You may use vertical partitioning when:
• It improves the speed of query and update actions
• Users require access to specific columns
• Some data is changed infrequently
• Descriptive dimension text may be better moved away from the dimension itself
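A minimal sketch of vertical partitioning, assuming a hypothetical product dimension whose bulky description text is split away from the frequently queried columns. Both parts keep the key so the full row can be rebuilt with a join.

```python
def split_vertically(rows, key, frequent_columns):
    """Vertical partitioning: both parts keep the key; each holds a subset of the columns."""
    frequent, rest = {}, {}
    for row in rows:
        k = row[key]
        frequent[k] = {c: row[c] for c in frequent_columns}
        rest[k] = {c: v for c, v in row.items() if c != key and c not in frequent_columns}
    return frequent, rest


def rejoin(frequent, rest, key):
    """Reassemble complete rows by joining the two parts on the shared key."""
    return [{key: k, **frequent[k], **rest[k]} for k in frequent]


# Hypothetical product dimension with a bulky, rarely queried description column.
products = [
    {"product_id": 1, "name": "Widget", "category": "Tools",
     "description": "Long marketing text about the widget"},
    {"product_id": 2, "name": "Gadget", "category": "Toys",
     "description": "Long marketing text about the gadget"},
]

frequent, rest = split_vertically(products, "product_id", ["name", "category"])
print(frequent[2])  # {'name': 'Gadget', 'category': 'Toys'}
print(rejoin(frequent, rest, "product_id") == products)  # True
```

Queries that need only names and categories read the narrow part; the cost of the design is an extra join whenever the description is needed.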

Partitioning Methods
• Range partitioning (Oracle8 and Oracle8i)
• Hash partitioning (Oracle8i)
• Composite partitioning (Oracle8i)
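The three methods differ in how a row is mapped to a partition. The sketch below shows that mapping only; the quarter bounds, partition names, and the choice of CRC-32 as the hash are assumptions for illustration, not Oracle's internal hash function.

```python
import bisect
import zlib

# Hypothetical range partitions by quarter; each bound is exclusive ("values less than").
RANGE_BOUNDS = ["2024-04-01", "2024-07-01", "2024-10-01", "2025-01-01"]
RANGE_NAMES = ["Q1_2024", "Q2_2024", "Q3_2024", "Q4_2024"]


def range_partition(order_date):
    """Range partitioning: pick the first partition whose upper bound exceeds the key."""
    # ISO dates sort correctly as strings; keys below the first bound go to the first partition.
    i = bisect.bisect_right(RANGE_BOUNDS, order_date)
    if i == len(RANGE_BOUNDS):
        raise ValueError(f"{order_date} is above the highest partition bound")
    return RANGE_NAMES[i]


def hash_partition(key, n_partitions=4):
    """Hash partitioning: a stable hash (CRC-32, an assumption) spreads keys evenly."""
    return zlib.crc32(str(key).encode("utf-8")) % n_partitions


def composite_partition(order_date, customer_id, n_subpartitions=4):
    """Composite (range-hash): range on the date, then hash into a subpartition."""
    return range_partition(order_date), hash_partition(customer_id, n_subpartitions)


print(range_partition("2024-03-31"))             # Q1_2024
print(range_partition("2024-04-01"))             # Q2_2024
print(composite_partition("2024-11-05", 42)[0])  # Q4_2024
```

Range partitioning suits time-based data that is loaded and aged out in order, hash partitioning suits keys without natural ranges where even distribution matters, and the composite method combines the two.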

Star Query Optimization
Provides optimum performance with star schema models:
1. The dimensions (the smaller reference tables) are queried, and a Cartesian product of their qualifying rows is computed.
2. The result is joined to the fact table to produce the query result.
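The two steps can be sketched with in-memory tables. Everything here is hypothetical: the dimension and fact data are made up, and dictionary lookups on the fact table stand in for a concatenated index on its foreign-key columns.

```python
from itertools import product

# Hypothetical dimension tables: key -> attribute.
time_dim = {1: "2024-Q1", 2: "2024-Q2"}
product_dim = {10: "Tools", 11: "Toys", 12: "Tools"}
market_dim = {100: "East", 101: "West"}

# Hypothetical fact table keyed by the concatenated (time, product, market) key.
sales_fact = {
    (1, 10, 100): 500,
    (1, 12, 101): 300,
    (2, 10, 100): 450,
    (1, 11, 100): 900,
}


def star_query(time_pred, product_pred, market_pred):
    # Step 1: query each small dimension table for its qualifying keys...
    times = [k for k, v in time_dim.items() if time_pred(v)]
    prods = [k for k, v in product_dim.items() if product_pred(v)]
    markets = [k for k, v in market_dim.items() if market_pred(v)]
    # ...and form the Cartesian product of those keys.
    candidates = product(times, prods, markets)
    # Step 2: join the product to the fact table by probing its composite key.
    return {key: sales_fact[key] for key in candidates if key in sales_fact}


# Tools sales in 2024-Q1, any market.
result = star_query(lambda q: q == "2024-Q1", lambda c: c == "Tools", lambda r: True)
print(result)  # {(1, 10, 100): 500, (1, 12, 101): 300}
```

The Cartesian product is cheap because each filtered dimension is small, and the large fact table is touched only at the candidate keys.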

Star Transformation
[Figure: star schema with Fact_Table at the center, joined to Time_Table, Product_Table, and Market_Table]

Indexing
Indexing is used because:
• It offers large cost savings, greatly improving performance and scalability
• It can replace a full table scan with a quick read of the index followed by a read of only those disk blocks that contain the rows needed
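The block-count argument can be made concrete with a toy model. The block size of four rows and the order data are hypothetical, and the sketch deliberately ignores the cost of reading the index itself; it counts only table blocks read.

```python
from collections import defaultdict

ROWS_PER_BLOCK = 4  # hypothetical number of rows that fit in one disk block


def to_blocks(rows):
    return [rows[i:i + ROWS_PER_BLOCK] for i in range(0, len(rows), ROWS_PER_BLOCK)]


def full_scan(blocks, column, value):
    """Read every block and filter; returns (matching rows, blocks read)."""
    hits = [row for block in blocks for row in block if row[column] == value]
    return hits, len(blocks)


def build_index(blocks, column):
    """Map each column value to the set of block numbers holding it."""
    index = defaultdict(set)
    for block_no, block in enumerate(blocks):
        for row in block:
            index[row[column]].add(block_no)
    return index


def index_read(blocks, index, column, value):
    """Read only the blocks the index points to; returns (matching rows, blocks read)."""
    block_nos = sorted(index.get(value, ()))
    hits = [row for b in block_nos for row in blocks[b] if row[column] == value]
    return hits, len(block_nos)


orders = [{"order_id": i, "customer_id": i % 50} for i in range(100)]
blocks = to_blocks(orders)
idx = build_index(blocks, "customer_id")

print(full_scan(blocks, "customer_id", 7)[1])        # 25 blocks read
print(index_read(blocks, idx, "customer_id", 7)[1])  # 2 blocks read
```

Both paths return the same two rows, but the indexed path touches 2 of the 25 blocks.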

B-Tree Index
• Most common type of index
• Used for high-cardinality columns
• Designed for queries that return few rows
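A full B-tree is beyond a slide-sized example, so the sketch below is a stand-in, not a B-tree: a flat sorted list of (key, rowid) entries that models the sorted leaf level, with binary search in place of descending the tree. The `btree_levels` helper estimates tree height for a hypothetical fanout, which is why a lookup that returns few rows stays cheap even on a very large table.

```python
import bisect


class SortedIndex:
    """Sorted (key, rowid) entries: a flat stand-in for a B-tree's leaf level."""

    def __init__(self, pairs):
        self.entries = sorted(pairs)
        self.keys = [key for key, _ in self.entries]

    def lookup(self, key):
        # Binary search plays the role of descending the tree to the right leaf.
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return [rowid for _, rowid in self.entries[lo:hi]]

    def range_scan(self, low, high):
        # Leaf entries are in key order, so a range is one contiguous slice.
        lo = bisect.bisect_left(self.keys, low)
        hi = bisect.bisect_right(self.keys, high)
        return [rowid for _, rowid in self.entries[lo:hi]]


def btree_levels(n_keys, fanout):
    """Smallest number of levels L with fanout ** L >= n_keys (ignores fill factor)."""
    levels, reach = 1, fanout
    while reach < n_keys:
        levels += 1
        reach *= fanout
    return levels


# High-cardinality column: every email is distinct, so each lookup returns one row.
emails = SortedIndex([("carol@example.com", 3), ("alice@example.com", 1), ("bob@example.com", 2)])
print(emails.lookup("bob@example.com"))                           # [2]
print(emails.range_scan("alice@example.com", "bob@example.com"))  # [1, 2]
print(btree_levels(100_000_000, 200))                             # 4
```

With an assumed fanout of 200, about four levels cover 100 million keys, so a single-row lookup costs only a handful of block reads.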

Bitmap Indexes
• Provide performance benefits and storage savings
• Store values as 1s and 0s
• Use instead of B-tree indexes when:
  - Tables are large
  - Columns have relatively low cardinality
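A minimal sketch of the idea, assuming Python integers as uncompressed bit vectors (real implementations typically compress their bitmaps). Each distinct value of a low-cardinality column gets one bitmap, and combined predicates become bitwise AND and OR operations. The table data is hypothetical.

```python
def build_bitmap_index(values):
    """One bitmap (a Python int) per distinct value; bit i is set if row i has that value."""
    bitmaps = {}
    for row_no, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_no)
    return bitmaps


def rows_in(bitmap):
    """Decode a bitmap into the list of row numbers whose bits are set."""
    rows, row_no = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_no)
        bitmap >>= 1
        row_no += 1
    return rows


# Hypothetical low-cardinality columns of a six-row table.
region = ["East", "West", "East", "North", "West", "East"]
channel = ["Web", "Store", "Store", "Web", "Web", "Web"]

region_idx = build_bitmap_index(region)
channel_idx = build_bitmap_index(channel)

# WHERE region = 'East' AND channel = 'Web' becomes a bitwise AND.
print(rows_in(region_idx["East"] & channel_idx["Web"]))   # [0, 5]
# WHERE region IN ('East', 'North') becomes a bitwise OR.
print(rows_in(region_idx["East"] | region_idx["North"]))  # [0, 2, 3, 5]
```

Three region values need only three bitmaps regardless of table size, which is where the storage savings for low-cardinality columns come from.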

Oracle8 and Oracle8i Index Enhancements
• Oracle8 index enhancements:
  - Partitioned indexes
  - Index-organized tables
• Oracle8i index enhancements:
  - Function-based indexes
  - Bitmap index improvements
  - Online index build and rebuild
  - Descending indexes
  - Statistics can be collected when an index is created
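A function-based index stores the result of an expression instead of the raw column value. The sketch below models an index on UPPER(last_name) with a dictionary; the helper `build_function_index` and the customer rows are hypothetical.

```python
from collections import defaultdict


def build_function_index(rows, column, fn):
    """Index fn(row[column]) rather than the raw column value."""
    index = defaultdict(list)
    for rowid, row in enumerate(rows):
        index[fn(row[column])].append(rowid)
    return index


customers = [
    {"last_name": "Smith"},
    {"last_name": "SMITH"},
    {"last_name": "smithers"},
    {"last_name": "smith"},
]

# Analogue of an index on UPPER(last_name): a case-insensitive predicate
# such as WHERE UPPER(last_name) = 'SMITH' can be answered from the index.
upper_idx = build_function_index(customers, "last_name", str.upper)
print(upper_idx["SMITH"])  # [0, 1, 3]

# A plain index on the raw column matches only the exact spelling.
plain_idx = build_function_index(customers, "last_name", lambda v: v)
print(plain_idx["SMITH"])  # [1]
```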

Protecting the Database
• RAID is essential with large databases
• RAID improves:
  - Reliability
  - Storage management
• There are different levels of RAID
• Disk striping spreads I/O across drives, reducing or eliminating disk contention

RAID 0: Striping
The file is written to a four-drive disk array:
• Block 1 on Drive 1
• Block 2 on Drive 2…
• Block 5 in another sector on Drive 1
[Figure: disk array controller distributing blocks across the four drives]
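The round-robin placement described above reduces to simple arithmetic. This sketch assumes a stripe unit of one block and numbers blocks and drives from 1 as the slide does; `raid0_location` is a hypothetical helper name.

```python
def raid0_location(block, n_drives=4):
    """Map a 1-based logical block number to (1-based drive, stripe row) under RAID 0."""
    if block < 1:
        raise ValueError("blocks are numbered from 1")
    offset = block - 1
    return offset % n_drives + 1, offset // n_drives


for block in range(1, 9):
    drive, stripe = raid0_location(block)
    print(f"block {block}: drive {drive}, stripe {stripe}")
```

Block 5 lands on Drive 1 in stripe 1, which is the "another sector on Drive 1" case. Because consecutive blocks sit on different drives, a large sequential read keeps all four drives busy at once; by the same arithmetic, losing any one drive removes every fourth block of a file that spans the array.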

RAID 0: Striping
• Benefits:
  - Good for simultaneous reads and writes
  - No disk space spent on redundancy
  - Scalable
• Limitations:
  - Not recommended for mission-critical systems
  - No recovery from data loss
  - A single failed disk affects the data on the entire array

RAID 1: Mirrored Disk
[Figure: disk array controller with Disk 1 and Disk 2, each paired with a mirror disk]
• A copy of the files is stored on the mirror disk

RAID 1: Mirrored Disk
• Benefits:
  - Complete data redundancy
  - No performance penalty
  - Improves reads
  - Scalable
• Limitations:
  - Highest cost of all RAID configurations: half of the raw disk capacity holds mirror copies
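A toy two-way mirror makes the benefits and the cost concrete. The `Raid1` class is hypothetical and models disks as Python lists; alternating reads between healthy disks is one simple assumed policy for spreading read load, not a description of any particular controller.

```python
class Raid1:
    """Toy two-way mirror: writes go to both disks; reads can be served by either."""

    def __init__(self, n_blocks):
        self.disks = [[None] * n_blocks, [None] * n_blocks]
        self.failed = [False, False]
        self._next = 0  # which disk serves the next read

    def write(self, block, data):
        # Every write is duplicated, so two disks store one disk's worth of data.
        for d in (0, 1):
            if not self.failed[d]:
                self.disks[d][block] = data

    def read(self, block):
        # Alternate reads between healthy disks, spreading the read load.
        for _ in (0, 1):
            d, self._next = self._next, 1 - self._next
            if not self.failed[d]:
                return self.disks[d][block]
        raise OSError("both disks in the mirror have failed")

    def fail(self, d):
        self.failed[d] = True
        self.disks[d] = None  # the failed disk's contents are lost


mirror = Raid1(n_blocks=4)
mirror.write(0, "order data")
mirror.write(1, "customer data")
mirror.fail(0)         # lose one disk
print(mirror.read(0))  # order data (served by the surviving mirror)
print(mirror.read(1))  # customer data
```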

RAID 5: Independent Disk Array
[Figure: disk array controller with Disk 1, Disk 2, Disk 3, and Disk 4]
• Data is striped with parity across the array
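"Striped with parity across the array" means the parity block moves from disk to disk on successive stripes, so no single disk becomes a parity bottleneck. The sketch prints one such layout; the specific rotation (parity shifting one disk to the left per stripe) is an assumption, since controllers use several named layouts.

```python
def raid5_layout(n_stripes, n_disks=4):
    """Return a grid [stripe][disk] of labels: 'P' for parity, 'D<n>' for data block n."""
    grid, block = [], 1
    for s in range(n_stripes):
        # Assumed rotation: parity starts on the last disk and moves left each stripe.
        parity_disk = (n_disks - 1 - s) % n_disks
        row = []
        for d in range(n_disks):
            if d == parity_disk:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        grid.append(row)
    return grid


for row in raid5_layout(4):
    print(" ".join(f"{cell:>3}" for cell in row))
```

Over four stripes each disk holds exactly one parity block, and each stripe holds three data blocks plus one parity block.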

RAID 5: Independent Disk Array
• Benefits:
  - Efficient data integrity
  - Data reconstruction
  - Multiple concurrent seeks across the array
  - Scalable
• Limitations:
  - Disk overhead (parity consumes one disk's worth of capacity)
  - Slower data write rate (each write must also update parity)
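Data reconstruction, the write penalty, and the disk overhead all follow from XOR parity. The sketch uses one hypothetical stripe of three 4-byte data blocks; `xor_blocks` is a hypothetical helper, and the 500 GB disk size is an assumption for the capacity arithmetic.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus their parity block (four disks in total).
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(*data)

# Data reconstruction: if the disk holding b"EFGH" fails, XOR the survivors with parity.
rebuilt = xor_blocks(data[0], data[2], parity)
print(rebuilt)  # b'EFGH'

# Write rate: a small write must read the old data and old parity, then write both back.
new_block = b"WXYZ"
new_parity = xor_blocks(parity, data[1], new_block)  # old parity ^ old data ^ new data
data[1] = new_block
print(new_parity == xor_blocks(*data))  # True

# Disk overhead: one disk's worth of capacity per array holds parity.
n_disks, disk_gb = 4, 500  # hypothetical array of four 500 GB disks
print((n_disks - 1) * disk_gb)  # 1500 GB usable of 2000 GB raw
```

The parity update needs two reads and two writes for one logical write, which is the source of the lower write rate listed above.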

Backup
• Plan at the design stage
• Use hot (online) backups for very large databases (VLDBs)
• Back up the necessary components:
  - Fact and dimension data
  - Warehouse schema
  - Metadata schema
  - Metadata
• The Export/Import utility requires planning for:
  - Disk space
  - Time

Summary
This lesson discussed the following topics:
• Explaining vertical partitioning and horizontal partitioning
• Distinguishing the different types of partitioning methods
• Distinguishing between B-tree indexes and bitmap indexes
• Understanding why a warehouse typically uses RAID 0, 1, or 5 to protect the database