

SCALING AND PERFORMANCE CS 260 Database Systems

Overview
• Increasing capacity
• Database performance
• Database indexes
  - B+ Tree Index
  - Bitmap Index
• Denormalization
• Distributed databases
• Improved application design

Increasing Capacity
• Database
  - Consists of one or more tablespaces
• Tablespace
  - Logical structure stored on one or more datafiles
• Datafile
  - Physical structure (file) that stores the database’s data
  - File structure depends on the OS on which Oracle is running
• Examples
  - A simple database may consist of a single tablespace that is stored on a single datafile
  - Another database may consist of multiple tablespaces, each stored across multiple datafiles

Increasing Capacity
• Enlarging an Oracle database
  - Add a datafile to a tablespace: allows more space for both new and existing objects
  - Add a new tablespace: allows more space for new objects
  - Increase the size of a datafile: allows more space for both new and existing objects

Increasing Capacity
• Add a datafile to a tablespace (in Oracle, an ALTER TABLESPACE ... ADD DATAFILE statement)

• Add a new tablespace (in Oracle, a CREATE TABLESPACE statement)

Increasing Capacity
• Increase the size of a datafile
  - Enabling autoextend on the datafile (e.g., AUTOEXTEND ON NEXT 20M MAXSIZE 1000M) allows it to grow automatically in 20M increments up to 1000M


Database Performance
• Measures
  - Response time: often measured as average query execution time
  - Throughput: often measured in transactions per second
• These measures deteriorate as:
  - The number of records stored in a database increases
  - The volume of data stored in a table increases, particularly due to BLOB data
  - The number of transactions that the database services increases
  - The number of queries that join large tables increases
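The two measures above can be estimated with a simple timing loop. The following is a minimal sketch, assuming an in-memory SQLite database in place of Oracle and counting queries per second rather than transactions per second; the table `t` and the `measure` helper are hypothetical.

```python
import sqlite3
import time


def measure(conn, sql, runs=200):
    """Run `sql` repeatedly; return (average response time in s, queries per s)."""
    start = time.perf_counter()
    for _ in range(runs):
        conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    return elapsed / runs, runs / elapsed


# Hypothetical test table with 10,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [(i % 100,) for i in range(10_000)])

avg_response, throughput = measure(conn, "SELECT COUNT(*) FROM t WHERE v = 42")
print(f"average response time: {avg_response * 1000:.3f} ms")
print(f"throughput: {throughput:.0f} queries/s")
```

Repeating the measurement with more rows or larger values in the table gives a rough picture of how both measures deteriorate as data volume grows.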

Database Performance
• Why does performance deteriorate as the data volume increases?
• New table records are stored in the first available tablespace segment
  - Records within the same table are probably stored in different segments throughout the disk(s)
  - Parent and child records with foreign key relationships are probably not stored in the same physical location
  - Records that match a specific search condition are probably not stored in the same physical location
• As a result, scanning a table, joining related records, or searching on a condition may require multiple disk accesses, which are slow

Database Performance
• Why does performance deteriorate as the number of transactions increases?
• Uncommitted transactions may cause tables to be “locked”
  - Transactions can optionally “lock” an entire table or individual records in a table until committed or rolled back
  - These locks can optionally allow locked records to be read but not modified
• A transaction may need to wait until another transaction has released a lock on one or more records or tables

Database Performance
• Approaches for improving system performance
  - Database indexes
  - Denormalization
  - Distributed databases
  - Improved application design


Database Indexes
• Oracle database indexes
• Data in datafiles is not stored in any particular order
  - The DBMS places data in the next available segment in a tablespace
• An index is a database object that stores information about data in a data structure that facilitates fast searching and sorting
  - When queries contain search conditions or joins using an indexed field, the index is used to facilitate the searching and sorting
• Oracle index types
  - B+ Tree index (default index)
  - Bitmap index

B+ Tree Index
• B+ Tree index (“balanced tree”)
• Consists of leaf nodes and internal nodes, each containing sorted database field values
  - Every path from the root of the tree to a leaf is of the same length (“balanced”)
• Each leaf node value is associated with a pointer to the corresponding database record
  - The leaf node itself additionally points to the “next” leaf node
• Each internal node value is associated with a pointer to a child node containing values less than the value and/or a pointer to a child node containing values equal to or greater than the value
• All database field values (for the indexed field) are ultimately present in leaf nodes, forming a “dense” index
  - The internal nodes at a given level form a “sparse” index, in which entries appear for only some of the database field values
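The structure described above can be sketched in a few lines. This is an illustrative, hand-built two-level tree, not Oracle's implementation; the class names, the instructor names used as keys, and the record ids (101 to 106) are assumptions made for the example.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, rows):
        self.keys = keys    # sorted indexed field values
        self.rows = rows    # rows[i] is the record pointer for keys[i]
        self.next = None    # link to the "next" leaf node


class Internal:
    def __init__(self, keys, children):
        self.keys = keys            # separator values
        self.children = children    # len(children) == len(keys) + 1


def find_leaf(node, key):
    """Descend from the root to the leaf that could contain key."""
    while isinstance(node, Internal):
        # Values less than keys[i] go left of it; equal or greater go right.
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    """Return the record pointer for key, or None if it is not indexed."""
    leaf = find_leaf(root, key)
    for k, row in zip(leaf.keys, leaf.rows):
        if k == key:
            return row
    return None


def range_scan(root, low, high):
    """Return (key, row) pairs with low <= key <= high by following leaf links."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k, row in zip(leaf.keys, leaf.rows):
            if k > high:
                return result
            if k >= low:
                result.append((k, row))
        leaf = leaf.next
    return result


# Hand-built example tree (hypothetical data).
a = Leaf(["Adams", "Brandt"], [101, 102])
b = Leaf(["Einstein", "Gold"], [103, 104])
c = Leaf(["Mozart", "Srinivasan"], [105, 106])
a.next, b.next = b, c
root = Internal(["Einstein", "Mozart"], [a, b, c])

gold_row = lookup(root, "Gold")
missing_row = lookup(root, "Kim")
brandt_to_gold = range_scan(root, "Brandt", "Gold")
print(gold_row, missing_row, brandt_to_gold)
```

The range scan touches the internal node only once and then walks the linked leaves, which is why ordered traversals are cheap in this structure.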

B+ Tree Index

• B+ Tree updates
• Insertion
  - New values are added to leaf nodes
  - If a leaf node exceeds its maximum size, it is split into two sibling nodes and a new entry is added to their parent node
  - If the parent node then exceeds its maximum number of pointers, it too is split, and a new entry is added to its parent node
• Deletion
  - If a value’s deletion causes a node to have too few pointers, it is merged with a sibling
  - If the merged node would exceed the maximum number of pointers, the pointers are instead redistributed between the node and its sibling
  - This redistribution may require changes in internal nodes
  - These steps propagate upwards when a deleted value is present in internal nodes

B+ Tree Index
[Figure: B+ Tree before and after insertion of “Adams”]
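Insertion with node splitting can be sketched as follows. This is a simplified illustration, not the tree from the figure: the capacity `MAX_KEYS = 3` is an assumed value chosen so that splits happen quickly, leaf-to-leaf links and record pointers are omitted, and deletion is not covered.

```python
from bisect import bisect_right, insort

MAX_KEYS = 3  # assumed node capacity, kept small so splits are easy to see


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None for a leaf node

    @property
    def is_leaf(self):
        return self.children is None


def insert(node, key):
    """Insert key below node. Return (separator, new_right_sibling) if node split."""
    if node.is_leaf:
        insort(node.keys, key)              # new values are added to leaf nodes
    else:
        i = bisect_right(node.keys, key)
        split = insert(node.children[i], key)
        if split is None:
            return None
        separator, right = split            # child split: add an entry here
        node.keys.insert(i, separator)
        node.children.insert(i + 1, right)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2
    if node.is_leaf:
        # Leaf split: the separator is copied up and also stays in the right leaf.
        right = Node(node.keys[mid:])
        separator = right.keys[0]
        node.keys = node.keys[:mid]
    else:
        # Internal split: the separator moves up and leaves this level.
        separator = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
    return separator, right


def insert_root(root, key):
    """Insert key and return the (possibly new) root."""
    split = insert(root, key)
    if split is None:
        return root
    separator, right = split
    return Node([separator], [root, right])  # the tree grows by one level


def leaf_keys(node):
    """All keys stored in the leaves, left to right."""
    if node.is_leaf:
        return list(node.keys)
    return [k for child in node.children for k in leaf_keys(child)]


def height(node):
    return 1 if node.is_leaf else 1 + height(node.children[0])


names = ["Brandt", "Einstein", "Gold", "Mozart", "Srinivasan", "Wu", "Adams"]
root = Node([])
for name in names:
    root = insert_root(root, name)
print(root.keys, leaf_keys(root), height(root))
```

Because a split always creates the new root above both halves, every leaf stays at the same depth, which is the "balanced" property.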

B+ Tree Index
[Figure: B+ Tree before and after deletion of “Srinivasan”]

B+ Tree Index
• Duplicate values
  - If duplicate values are present in the indexed database field, their index search keys are made unique by creating a composite search key, typically using the record’s primary key
• Benefits
  - Maintains efficiency despite insert/update/delete operations
  - Very helpful for full ordered traversals
  - Most useful for unique or mostly-unique field values
  - Automatically created by Oracle for primary keys and fields with a UNIQUE constraint
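The composite search key idea for duplicate values can be illustrated with a sorted list standing in for the leaf level of the index. This is a sketch only; the names and primary key values are hypothetical.

```python
from bisect import bisect_left

# Each index entry is (lname, primary_key), so duplicate last names
# still produce unique, totally ordered search keys.
entries = sorted([("Kim", 7), ("Wu", 2), ("Kim", 3), ("Gold", 9)])


def primary_keys_for(lname):
    """Return the primary keys of all records whose last name equals lname."""
    # The one-element tuple (lname,) sorts before every (lname, pk) entry.
    start = bisect_left(entries, (lname,))
    result = []
    for name, pk in entries[start:]:
        if name != lname:
            break
        result.append(pk)
    return result


kim_keys = primary_keys_for("Kim")
print(kim_keys)
```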

B Tree Index
• B Tree indexes are similar to B+ Tree indexes
• Differences
  - Internal node values point to database records in addition to pointing to child nodes
  - Internal node values do not appear again in leaf nodes
  - As a result, no linking between leaf nodes exists
• Comparison
  - Records with index values in internal nodes are found more quickly in a B Tree than in a B+ Tree
  - B+ Trees allow a full ordered traversal more easily than B Trees due to the links between leaf nodes

Bitmap Index
• Bitmap indexes are designed for efficiently querying tables using multiple field values
• Records are assumed to be numbered sequentially
  - Done automatically by the database
• A bitmap is an array of bits that corresponds to a particular field value
  - One bitmap per field value
  - One bit per record
• So, if a field has 2 distinct values amongst 5 records, then 2 bitmaps of 5 bits each will be used for the bitmap index
  - If the nth record has value x, then the nth bit in the bitmap for x will be 1 (and the nth bit in the bitmap for the other field value will be 0)
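Building the bitmaps for one field can be sketched as follows. The five sample records are hypothetical, and the `build_bitmap_index` helper is an illustration rather than how a DBMS stores bitmaps internally.

```python
def build_bitmap_index(records, field):
    """Return {field value: bitmap}, with one bit per record in record order."""
    index = {}
    for n, record in enumerate(records):
        bitmap = index.setdefault(record[field], [0] * len(records))
        bitmap[n] = 1
    return index


# Hypothetical table of 5 records.
records = [
    {"gender": "m", "income_level": "L1"},
    {"gender": "f", "income_level": "L2"},
    {"gender": "f", "income_level": "L1"},
    {"gender": "m", "income_level": "L4"},
    {"gender": "f", "income_level": "L3"},
]

gender_index = build_bitmap_index(records, "gender")
income_index = build_bitmap_index(records, "income_level")
print(gender_index)
print(income_index)
```

With 2 distinct gender values and 5 records, the gender index holds 2 bitmaps of 5 bits each, and every bit position is set in exactly one of them.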

Bitmap Index

• Queries involving multiple bitmapped indexes are answered using bitmap operations
  - Intersection (AND)
  - Union (OR)
  - Complementation (NOT)
• AND and OR each take two bitmaps of the same size (NOT takes one) and apply the operation bit by bit to get the result bitmap
• Males with income level L1 (from previous example): the bitmap for male is ANDed with the bitmap for L1; only the first bit of the result is 1, so only the first record matches

Bitmap Index
• Benefits
  - Useful in situations where records in a given table may be queried using multiple field values
  - Particularly useful when one or more of these fields have relatively little variation in values
  - Relatively little space overhead
• Drawbacks
  - Updates are expensive
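The bitmap operations used to answer multi-field queries (AND, OR, NOT) can be sketched as follows. The two bitmaps are hypothetical sample values for a 5-record table, chosen so that only the first record is a male with income level L1.

```python
def bitmap_and(a, b):
    """Intersection: a bit is 1 only if it is 1 in both bitmaps."""
    return [x & y for x, y in zip(a, b)]


def bitmap_or(a, b):
    """Union: a bit is 1 if it is 1 in either bitmap."""
    return [x | y for x, y in zip(a, b)]


def bitmap_not(a):
    """Complementation: flip every bit."""
    return [1 - x for x in a]


def matching_records(bitmap):
    """Record numbers (starting at 1) whose bit is set."""
    return [n for n, bit in enumerate(bitmap, start=1) if bit]


male = [1, 0, 0, 1, 0]        # hypothetical bitmap for gender = m
level_l1 = [1, 0, 1, 0, 0]    # hypothetical bitmap for income_level = L1

males_with_l1 = bitmap_and(male, level_l1)
male_or_l1 = bitmap_or(male, level_l1)
not_male = bitmap_not(male)
print(males_with_l1, matching_records(males_with_l1))
```

Only the matching record numbers need to be fetched from the table afterwards, so the expensive disk accesses are limited to rows that satisfy every condition.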

Database Indexes
• Syntax for creating an index (Oracle)
  CREATE [BITMAP] INDEX <index_name> ON <table_name> (<field_names>)
• B Tree example
  CREATE INDEX inst_lname_idx ON instructor(lname)
• Bitmap example
  CREATE BITMAP INDEX inst_info_idx ON instructor(gender, income_level)
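The effect of creating an index can be observed with a query plan. The sketch below uses Python's built-in sqlite3 module in place of Oracle, so it is an approximation: SQLite supports the plain CREATE INDEX form but has no BITMAP indexes, and the `instructor` rows here are generated test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (id INTEGER PRIMARY KEY, lname TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO instructor (lname, gender) VALUES (?, ?)",
    [(f"name{i}", "mf"[i % 2]) for i in range(1000)],
)


def plan(sql):
    """Return SQLite's textual query plan for sql (the detail column of each row)."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM instructor WHERE lname = 'name500'"
plan_before = plan(query)   # expected to be a full table scan
conn.execute("CREATE INDEX inst_lname_idx ON instructor(lname)")
plan_after = plan(query)    # expected to search via inst_lname_idx

print("before:", plan_before)
print("after: ", plan_after)
```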

Database Indexes
• When should you create an index?
  - Query performance is unacceptable
  - At least one of the tables in a common query contains a large number of records (more than 100,000)
  - One of the search/join fields in a common query contains a wide range of values


Denormalization
• Create a summary table that duplicates the data associated with common join queries
• Create triggers that automatically update the summary table when underlying table values change
• This is similar to materialized views…
[Figure: Denormalized Summary Table]
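A summary table kept up to date by a trigger can be sketched as follows. The example runs on SQLite through Python's sqlite3 module rather than Oracle, and the `customer`, `orders`, and `customer_summary` tables are hypothetical. Only inserts are handled; updates and deletes on `orders` would need their own triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id),
    amount REAL
);
-- Denormalized summary: duplicates the result of a common join query.
CREATE TABLE customer_summary (
    customer_id INTEGER PRIMARY KEY,
    name TEXT,
    order_count INTEGER,
    total REAL
);
CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
BEGIN
    -- Create the summary row on the customer's first order.
    INSERT OR IGNORE INTO customer_summary (customer_id, name, order_count, total)
        SELECT id, name, 0, 0 FROM customer WHERE id = NEW.customer_id;
    UPDATE customer_summary
       SET order_count = order_count + 1,
           total = total + NEW.amount
     WHERE customer_id = NEW.customer_id;
END;
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 10.0), (1, 5.5)],
)

# Reading the summary needs no join and no aggregation.
summary = conn.execute(
    "SELECT name, order_count, total FROM customer_summary"
).fetchall()
print(summary)
```

The read becomes a single-table lookup, at the cost of extra work on every insert and a second copy of the data that must be kept consistent.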

Denormalization
• Materialized view
• Stores copies of the view fields in a separate table
  - Normal views are just stored queries
• These copies can be refreshed on demand or on commit
• Materialized views can be configured to allow updates directly to the views
  - These updates are then propagated to the original tables
• Faster than using JOIN queries, but lots of system overhead and potential inconsistencies

Denormalization
• Materialized view creation syntax
  CREATE MATERIALIZED VIEW <view_name> [FOR UPDATE] [REFRESH FAST ON COMMIT] AS <query>
• If FOR UPDATE is omitted, the data in the materialized view will be read-only
• If REFRESH FAST ON COMMIT is present, the data in the materialized view will be updated when changes to its underlying data are committed
  - Other options can be used with the REFRESH clause to control the frequency with which the data in the view is updated


Distributed Databases
• A distributed database consists of networked servers running independent DBMS instances that work together
  - This fragmentation must be transparent to users
• Distribution types
  - Full replication: every node runs the same DBMS and contains the same data
  - Homogeneous: every node runs the same DBMS but may contain different data; each node has the same schema design
  - Heterogeneous: nodes can run different DBMSs and can contain different schemas; nodes agree to share certain data values

Fully Replicated Distributed Databases
• Consists of a publisher and subscribers
• The publisher contains the master copy of the data
• The subscribers receive updated copies from the publisher and deliver them to users
[Figure: Publisher, Subscriber]

Fully Replicated Distributed Databases
• Replication approaches
  - Snapshot: the publisher distributes a snapshot of the entire database to each subscriber
  - Transactional replication: changes are made to the publisher and either immediately or periodically distributed to subscribers
  - Merged replication: changes are made separately to the publisher and subscribers and are merged periodically; conflicting changes are controlled by a combination of transaction management and priority algorithms

Fully Replicated Distributed Databases
• Advantages
  - If one site fails, others can take over
  - Queries may be processed by multiple nodes in parallel
• Disadvantage
  - Time and resource intensive: more space, processing, management, inconsistencies, etc.
• As a result, fully replicated distributed databases are best for databases whose contents don’t change often

Homogeneous Distributed Databases
• Nodes have the same DBMSs and schemas but different data
• The data stored at each node should be that most likely to be used by its local users
• Fragmentation
  - How data is divided among nodes
  - Approaches: horizontal, vertical

Homogeneous Distributed Databases
• Horizontal fragmentation
  - All table fields are included at each node
  - Appropriate records are distributed to each location, typically determined via some field value
[Figure: Node 1, Node 2]

Homogeneous Distributed Databases
• Vertical fragmentation
  - All table records are included at each location
  - Appropriate fields are distributed to each location
[Figure: Node 1, Node 2; Billing, Sales]
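The two fragmentation approaches can be sketched on an in-memory table. The customer rows, the `region` field used to route records, and the split into billing and sales fields are all hypothetical. Keeping the primary key in every vertical fragment is an assumption of this sketch, made so that the fragments can be rejoined.

```python
rows = [
    {"id": 1, "name": "Ada", "region": "east", "balance": 10.0, "rep": "Kim"},
    {"id": 2, "name": "Bob", "region": "west", "balance": 0.0, "rep": "Wu"},
    {"id": 3, "name": "Cy", "region": "east", "balance": 7.5, "rep": "Wu"},
]


def horizontal_fragments(rows, field):
    """Split records by the value of field; every fragment keeps all fields."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[field], []).append(row)
    return fragments


def vertical_fragment(rows, key, fields):
    """Keep every record but only the listed fields, plus the key for rejoining."""
    return [{key: row[key], **{f: row[f] for f in fields}} for row in rows]


# Horizontal: each node stores the records for its own region.
by_region = horizontal_fragments(rows, "region")

# Vertical: each node stores all records but only the fields it needs.
billing_node = vertical_fragment(rows, "id", ["name", "balance"])
sales_node = vertical_fragment(rows, "id", ["name", "rep"])

print(by_region["east"])
print(billing_node)
print(sales_node)
```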

Heterogeneous Distributed Databases
• Nodes may have different DBMSs, schemas, and data
• This makes query and transaction processing difficult
  - Users must be able to make requests in a database language used at their local sites
  - The heterogeneous system must appear as a single local database to users
  - Translations are required to allow communication between different nodes
• DBMSs typically provide services to facilitate a heterogeneous connection to another node


Improved Application Design
• Bottlenecks are more likely to reside in the application design than in the database itself
  - Create stored procedures for complex operations
  - Offload work to the database when possible (sorting, filtering, etc.)
  - Don’t retrieve more data than you absolutely need
  - Use prepared statements for queries with user input
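Several of these points can be combined in one short sketch: the user's input is bound as a parameter instead of being spliced into the SQL text, only the needed columns are requested, and the database does the filtering and sorting. The example uses Python's sqlite3 module with a hypothetical `instructor` table; other database drivers use their own placeholder syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (id INTEGER PRIMARY KEY, lname TEXT, bio TEXT)")
conn.executemany(
    "INSERT INTO instructor (lname, bio) VALUES (?, ?)",
    [("Kim", "long text"), ("Wu", "long text"), ("Kim", "long text")],
)


def find_by_lname(conn, lname):
    """Look up instructors by last name using a bound parameter."""
    # The ? placeholder keeps user input out of the SQL text itself.
    # Only id and lname are retrieved; the database filters and sorts.
    return conn.execute(
        "SELECT id, lname FROM instructor WHERE lname = ? ORDER BY id",
        (lname,),
    ).fetchall()


kims = find_by_lname(conn, "Kim")
# A typical injection attempt is treated as an ordinary (non-matching) value.
injected = find_by_lname(conn, "x' OR '1'='1")
print(kims, injected)
```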

Improved Application Design
• Use asynchronous queries
  - Fetch and display a subset of the requested data
  - Continue fetching records in the background while allowing the user to work in the foreground
  - May be accomplished using separate threads for query execution
  - Useful for time-consuming queries or those that return lots of records

Summary
• When throughput and/or response time is a problem in a relational database (steps listed from easier to harder)
  - Test
  - Create indexes
  - Denormalize
  - Modify the application
  - Create a distributed database