Download presentation
Presentation is loading. Please wait.
Published byStephen Cannon Modified over 8 years ago
1
IMS 4212: Database Implementation 1 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Physical Database Implementation—Topics Reasons for concern Data Volume Analysis Data Usage Analysis Index Design
2
IMS 4212: Database Implementation 2 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Reasons for Concern Write the SQL query to calculate your GPA
3
IMS 4212: Database Implementation 3 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Reasons for Concern (cont.) Normalization spreads data around the database –Optimizes storage space needed (usually) –Protects against anomalies Normalization may make retrieval inefficient
4
IMS 4212: Database Implementation 4 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Reasons for Concern (cont.) Each retrieval efficiency enhancement introduces some level of inefficiency with respect to a fully normalized database We don’t introduce retrieval enhancements with wild abandon –Analyze the nature of the problem –Know capabilities of DBMS & Operating System –Select carefully from available options Must understand tradeoffs (advantages and disadvantages) of each retrieval enhancement
5
IMS 4212: Database Implementation 5 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu How Computers Work Data flows through different parts of the computer as application instructions are executed
6
IMS 4212: Database Implementation 6 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu SQL Server Data Storage Data in tables is stored on pages and there are eight pages per extent. When more space is needed an entire extent is added to the database Each row (record) in the database is physically stored on a page and in an extent Each row has a RowID that identifies it and it’s location in the page
7
IMS 4212: Database Implementation 7 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu SQL Server Data Storage (cont.) Without a clustered index (covered soon) rows are added to pages in the order of insertion. When pages are full rows are added to the next page in the extent. When extents are full new extents are created Tables keep track of the sequence of extents that contain their contents
8
IMS 4212: Database Implementation 8 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Data Retrieval By default, queries of tables require that each page be loaded into memory in sequence and each row examined to see if it meets the query conditions This is a full table scan
9
IMS 4212: Database Implementation 9 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Data Retrieval (cont.) The Page is the basic unit of IO –Entire page is moved from physical storage to RAM for evaluation In a pure table scan (the default method of retrieval) each record is examined to see if it matches the WHERE clause conditions (if any) –Test value and column value moved to CPU for testing –Records where condition is TRUE are added to result set Pages are cached and the cached copy will be read if available and needed
10
IMS 4212: Database Implementation 10 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Data Retrieval (cont.) In SQL Server page sizes are fixed at 8 KB –(Entire extent is 64 KB) –Some DBMS have different sizes –Some DBMS allow tuning on a table by table basis –8 KB is also the maximum record size Number of Records on a page depends on record size –Sum of data sizes of each column IO time for a pure scan increases with –Number of records –Record size
11
IMS 4212: Database Implementation 11 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Data Retrieval Costs Two levels of costs associated with data retrieval –Most Important: IO moving page from disk storage to RAM –Less Important: CPU effort to evaluate records –In default mode records cannot be evaluated until they have been moved into RAM We also care about physical storage space –Less important as a performance issue We also care about costs of reorganizing data as it is added to the DB or updated (later)
12
IMS 4212: Database Implementation 12 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Data Retrieval Costs (cont.) ALL Retrieval Enhancement mechanisms must be evaluated on the dimensions from the previous slide None of the enhancements come without cost Decisions affected by use of the data, not just pure database characteristics –Understanding organizational tasks and priorities key –Requires balance between technical and organizational knowledge –MIS graduates ideally positioned to participate in this analysis
13
IMS 4212: Database Implementation 13 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Data Retrieval Costs (cont.) Degree of the cost changes with many factors –Table sizes –Access mechanisms (paths—more later) –Nature of query –Number of tables needed in query –Nature of the enhancement approach Remember that our DB design goal of minimizing storage space and redundancy (normalization) spread data around the database –More tables containing transaction logic –More complicated queries
14
IMS 4212: Database Implementation 14 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Data Volume Analysis We don’t have retrieval problems with small tables Need to know how big a table will get over the life of the system to understand the potential magnitude of the problem Q: How many records are expected in the ENROLLMENT table? Document in the data dictionary –Estimate of number of records expected –How estimate was computed
15
IMS 4212: Database Implementation 15 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Data Volume Analysis (cont.) Estimating DV –Absolute count: We know there are 12 possible grades that can be contained in the GRADE table –Estimate: We think that we will have 32,000 students next year (use your statistics!) –Derived: Each enrolled student takes an average of four sections per semester and there are an average of 35 students per section –Historical trends: Enrollment is growing at 2% per year
16
IMS 4212: Database Implementation 16 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Data Volume Analysis (cont.) Don’t forget historical data! –Are graduated or withdrawn student records retained in the STUDENT and ENROLLMENT tables? –How long will they be kept? –What is the potential size of the ENROLLMENT table if records are never discarded? Precise entity definitions are critical in DVA Document where or how you came up with volume estimates
17
IMS 4212: Database Implementation 17 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Data Usage Analysis DUA is concerned with three factors –How frequently are tables accessed? –How urgent are the table accesses? –What is the access path into the table? Usually means what fields are being compared in a WHERE clause Fields in a JOIN expression Goal is to find the high frequency, important retrievals and to put enhancements on the path used by the retrieval
18
IMS 4212: Database Implementation 18 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Data Usage Analysis (cont.) Many frequency and urgency estimates will come from an analysis of the organization’s business practices and needs –What is max time a customer can be allowed to wait for a response? –How many sales take place a day? –Can this transaction take place in batch overnight? How many sales are made per hour? Do we expect it to grow? Consider electronic credit card clearing from retail stores
19
IMS 4212: Database Implementation 19 Dr. Lawrence West, Management Dept., University of Central Florida lwest@bus.ucf.edu Data Usage Analysis (cont.) The access path is the fields being searched to find appropriate records in a transaction What is the path taken through the sample ERD to: –Calculate your GPA? –Determine if you have met a course prerequisite? Don’t forget checks of operational business rules made in conjunction with a transaction –What if we had a business rule that said only students with a 3.0 GPA could take ISM 4212? –How about checking prerequisites?
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.