Planning Data Warehouse Infrastructure

Presentation on theme: "Planning Data Warehouse Infrastructure"— Presentation transcript:

1 Planning Data Warehouse Infrastructure
Module 2
The 20767A-MIA-SQL virtual machine that is used in the lab for this module includes software services that can take a while to start. For the best experience, have students start the 20767A-MIA-DC and 20767A-MIA-SQL virtual machines at the beginning of the module so that the services have time to start before students begin the lab.
The forward links that are provided may refer to earlier versions of Microsoft® SQL Server® because documentation may not be available at the time of writing. Please check for any post-release content.

2 Planning Data Warehouse Hardware
Module Overview 2: Planning Data Warehouse Infrastructure Planning Data Warehouse Hardware

3 Lesson 1: Considerations for Data Warehouse Infrastructure
Planning for High Availability
Question: Which of these options is not a consideration for system sizing?
( ) Option 1: Data volume
( ) Option 2: Number of users
( ) Option 3: Data center location
( ) Option 4: Analysis and reporting complexity
( ) Option 5: Availability requirements
Answer: (√) Option 3: Data center location
Which four factors determine the size of a BI solution? Data volume, analysis and reporting complexity, number of users, and availability requirements.
Discuss the considerations for deciding between single-server and distributed architecture. A distributed architecture brings higher hardware costs, software costs, and configuration complexity, but better scalability, performance, and flexibility.

4 System Sizing Considerations
Sizing factors shown on the slide: Data Volume, Analysis/Report Complexity, Number of Users, Availability Requirements.
The suggested values are based on research that was conducted by the Fast Track Data Warehouse team at Microsoft, but are not definitive. Some organizations might consider a 5-TB data warehouse to be extremely large, whereas others might consider 100 users to be small. Similarly, the definitions for “simple,” “medium,” and “complex” queries are deliberately generic.
The Fast Track sizing tool (available from warehousing/fast-track.aspx) uses the definitions that are listed below, which compare query Maximum Consumption Rate (MCR) to the rated MCR of the SQL Server and CPU combination. MCR is discussed later in this lesson.
Suggested values:
Simple: Query MCR is 90–100 percent of rated MCR.
Medium: Query MCR is 50–89 percent of rated MCR.
Complex: Query MCR is 25–49 percent of rated MCR.
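The suggested complexity bands can be expressed as a small classifier. The following is only a sketch: the function name is hypothetical, the slide's ranges leave gaps between bands (for example, between 89 and 90 percent), and treating each band as "at or above its lower bound" is an assumption made here, as is returning "unclassified" below 25 percent.

```python
def classify_query_complexity(query_mcr_mbps, rated_mcr_mbps):
    """Classify a query by its MCR as a percentage of the rated MCR.

    Hypothetical helper. Band edges follow the suggested values on the
    slide; handling of the gaps between bands is an assumption.
    """
    if rated_mcr_mbps <= 0:
        raise ValueError("rated MCR must be positive")
    percent_of_rated = 100.0 * query_mcr_mbps / rated_mcr_mbps
    if percent_of_rated >= 90:
        return "simple"        # 90-100 percent of rated MCR
    if percent_of_rated >= 50:
        return "medium"        # 50-89 percent of rated MCR
    if percent_of_rated >= 25:
        return "complex"       # 25-49 percent of rated MCR
    return "unclassified"      # below the suggested ranges (assumption)


if __name__ == "__main__":
    # Illustrative numbers only: a query measured at 120 MBps on a system rated at 200 MBps.
    print(classify_query_complexity(120, 200))
```

With a rated MCR of 200 MBps, a query that sustains 120 MBps sits at 60 percent and falls into the medium band.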

5 Data Warehouse Workloads
Talk through each of the workloads in the module, highlighting the potential for different types to incur resource contention.
Diagram label: DW
ETL: Control flow tasks; Data query and insert; Network data transfer; In-memory data pipeline; SSIS Catalog or msdb I/O
Data Models: Processing; Aggregation storage; Multidimensional on disk; Tabular in memory; Query execution
Operations and Maintenance: OS activity; Logging; SQL Server Agent Jobs; SSIS packages; Indexes; Backups
Reporting: Client requests; Data source queries; Report rendering; Caching; Snapshot execution; Subscription processing; Report Server Catalog I/O

6 Typical Server Topologies for a BI Solution
Use a whiteboard or flipchart to create diagrams that show various distributed architectures while they are discussed. Point out that most guidance for estimating data warehouse hardware requirements assumes that the data warehouse is stored on a dedicated server.
Diagram label: DW
Comparison of Single-Server BI Architecture and Distributed BI Architecture:
Servers: Few (single-server); Many (distributed)
Hardware costs, Software license costs, Configuration complexity: higher for the distributed architecture
Scalability and performance, Flexibility: better for the distributed architecture

7 Scaling Out a BI Solution
Diagram labels: Analysis Services, Data Warehouse, Integration Services, Reporting Services.
Point out that scale-out solutions are usually limited to large BI solutions. Also, scale-out architectures for SSIS and the data warehouse are generally required only in extremely large-scale solutions that must support thousands of users who are performing complex queries against hundreds of terabytes of data.
Before teaching this topic, make sure that you review the information in the further reading articles that are referenced in the student notes.

8 Planning for High Availability
If students have experience of database engine high-availability technologies in previous releases of SQL Server, point out that database mirroring is superseded by the new AlwaysOn Availability Groups technology in this release.
Further reading: Integration Services (SSIS) in a Cluster
Data Warehouse: AlwaysOn Failover Cluster; RAID Storage
Analysis Services: AlwaysOn Failover Cluster
Reporting Services: NLB Report Servers; AlwaysOn Availability Group or AlwaysOn Failover Cluster
Integration Services: AlwaysOn Availability Group

9 Lesson 2: Planning Data Warehouse Hardware
SQL Server Parallel Data Warehouse
Question: Which of these equations defines maximum consumption rate?
( ) Option 1: (average logical writes ÷ average CPU time) × 4 ÷ 1024
( ) Option 2: (average logical reads ÷ maximum CPU time) × 12 ÷ 1024
( ) Option 3: (average physical reads ÷ minimum CPU time) × 8 ÷ 1024
( ) Option 4: (average logical reads ÷ average CPU time) × 8 ÷ 1024
( ) Option 5: (average physical reads ÷ average CPU time) × 8 ÷ 1024
Answer: (√) Option 4: (average logical reads ÷ average CPU time) × 8 ÷ 1024
What factors are used in estimating CPU requirements? Average query size in megabytes, maximum consumption rate, number of concurrent users, and target response time.
What factors are involved when planning a storage solution? Disk size, disk speed, RAID, and DAS or SAN.

10 SQL Server Fast Track Data Warehouse Reference Architectures
Pretested and approved hardware specifications and guidance
Available from multiple hardware vendors in partnership with Microsoft
Support for a range of data warehouse sizes
Tools provided to calculate required specification
Although some may see this topic as a “marketing pitch,” many students can benefit from awareness of the Fast Track reference architectures, especially if their organizations already have supplier relationships with Fast Track hardware vendors. Using a Fast Track reference architecture can significantly reduce the time and cost that are required to build an effective data warehouse system.
Make sure that you review the information on Fast Track Data Warehouse reference architectures in the technical articles that are mentioned in the student notes before teaching this topic. You may also want to consider downloading and demonstrating the Fast Track System Sizing Tool.

11 Core-Balanced System Architecture
Diagram values: Per-Core MCR = 200 MBps; Total MCR = 1,600 MBps; 2 x FC Port per processor; Max I/O Rate values of 2,000 MBps, 2,000 MBps, and 1,800 MBps.
Diagram components: Server (SQL Server, Windows Server, Quad Core CPU, Dual Port FC HBA); Fiber Switch; Storage Enclosure (Storage Processors, 4-Spindle RAID 10 Disk Groups).
Point out that, instinctively, most database professionals want to start estimating hardware requirements for a data warehouse by determining the volume of data to be stored and choosing the biggest and fastest possible storage solution. The core-balanced approach starts with the throughput of the CPU core, and then builds a balanced system that is based on that metric.
It is important to realize that MCR is purely a measure of SQL Server data throughput for a single core and does not include disk read operations or network I/O. The goal is to determine the core’s MCR and use that to calculate the number of cores that are required to support the target data throughput rate. You can then design the storage and network subsystem with a balanced set of components that will feed the data to the CPU cores at a sufficient rate to maintain the MCR.
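The balance check that this slide describes can be sketched in a few lines. This is an illustration under assumptions, not part of the course material: the function names and the stage names (`stage_a`, `stage_b`, `stage_c`) are hypothetical, because the transcript does not say which component each Max I/O Rate belongs to, and the core count of eight is inferred from 1,600 ÷ 200.

```python
def total_mcr_mbps(per_core_mcr_mbps, core_count):
    """Total throughput the CPU cores can consume, in MBps."""
    return per_core_mcr_mbps * core_count


def find_bottlenecks(required_mbps, component_rates_mbps):
    """Return the names of components whose max I/O rate is below the required rate."""
    return [
        name
        for name, rate in component_rates_mbps.items()
        if rate < required_mbps
    ]


if __name__ == "__main__":
    # Slide values: 200 MBps per core, 1,600 MBps total (eight cores is inferred).
    required = total_mcr_mbps(200, 8)
    # Hypothetical stage names; the three rates are the ones listed on the slide.
    stages = {"stage_a": 2000, "stage_b": 2000, "stage_c": 1800}
    print(required, find_bottlenecks(required, stages))  # prints: 1600 []
```

An empty list means every stage in the I/O path can deliver at least the total MCR, which is the sense in which the slide's configuration is balanced.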

12 Demonstration: Calculating Maximum Consumption Rate (MCR)
In this demonstration, you will see how to:
Create tables for benchmark queries
Execute a query to retrieve I/O statistics
Calculate MCR from the I/O statistics
Tell students that they will get a chance to try this for themselves in the lab.
Preparation Steps
Start the 20767A-MIA-DC and 20767A-MIA-SQL virtual machines.
Demonstration Steps
Create tables for benchmark queries
Ensure that the 20767A-MIA-DC and 20767A-MIA-SQL virtual machines are both running, and then log on to 20767A-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
In the D:\Demofiles\Mod02 folder, run Setup.cmd as Administrator.
In the User Account Control window, click Yes.
Start SQL Server Management Studio, and then connect to the MIA-SQL database engine by using Windows authentication.
In SQL Server Management Studio, open the Create BenchmarkDB.sql query file from the D:\Demofiles\Mod02 folder.
Click Execute, and then wait for query execution to complete. This query creates a database that contains two tables, one with a clustered index and one without. Both tables contain a substantial number of rows.
Execute a query to retrieve I/O statistics
In SQL Server Management Studio, open the Measure MCR.sql query file from the D:\Demofiles\Mod02 folder.
Click Execute, and then wait for query execution to complete. The queries retrieve an aggregated value from each table, and are performed twice. This ensures that on the second execution (for which statistics are shown), the data is in cache, so the I/O statistics do not include disk reads. Note that the MAXDOP=1 clause ensures that only a single core is used to process the query.
(More notes on the next slide)
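Once the logical reads and CPU times have been noted from the statistics output, the MCR arithmetic itself is short. The sketch below applies the formula from the Lesson 2 question, (average logical reads ÷ average CPU time) × 8 ÷ 1024. It assumes that each logical read is an 8 KB page and that CPU times were recorded in milliseconds and must be converted to seconds; the function names and sample figures are made up for illustration and are not taken from the demonstration scripts.

```python
def average(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    if not values:
        raise ValueError("at least one measurement is required")
    return sum(values) / len(values)


def calculate_mcr_mbps(logical_reads, cpu_times_ms):
    """Estimate per-core MCR in MBps from benchmark query statistics.

    logical_reads: logical read counts (8 KB pages, assumed) per query.
    cpu_times_ms:  CPU times in milliseconds (assumed unit) per query.
    """
    avg_reads = average(logical_reads)
    avg_cpu_seconds = average(cpu_times_ms) / 1000.0
    if avg_cpu_seconds <= 0:
        raise ValueError("average CPU time must be positive")
    # pages/second * 8 KB per page / 1024 KB per MB = MB per second
    return (avg_reads / avg_cpu_seconds) * 8 / 1024


if __name__ == "__main__":
    # Made-up measurements for two benchmark queries.
    print(calculate_mcr_mbps([24000, 27200], [900, 1100]))  # prints: 200.0
```

Because the benchmark queries run on a single core with cached data, the result is a per-core figure that excludes disk and network I/O.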

13 Determining Processor and Memory Requirements
Estimating CPU Requirements:
Determine core MCR
Apply formula to estimate required number of cores: ((Average query size in MB ÷ MCR) × Concurrent users) ÷ Target response time
Spread cores across CPUs based on the number of storage arrays
Estimating RAM Requirements:
Use a minimum of 4 GB per core (or 64–128 GB per socket)
Target 20% of data volume
Emphasize that MCR is specific to the system on which it was calculated. However, similar processor and motherboard architectures are likely to exhibit comparable MCRs. If students intend to use systems from Fast Track Data Warehouse hardware partners, the MCR for these systems is published (and is usually 200 MBps).
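The core-count formula and the RAM guidelines on this slide can be worked through as follows. This is a minimal sketch: the function names and input figures are hypothetical, rounding up to a whole core is an assumption, and the target response time is assumed to be in seconds so that it matches an MCR expressed in MB per second.

```python
import math


def estimate_required_cores(avg_query_size_mb, mcr_mbps,
                            concurrent_users, target_response_seconds):
    """((Average query size in MB / MCR) x Concurrent users) / Target response time."""
    if mcr_mbps <= 0 or target_response_seconds <= 0:
        raise ValueError("MCR and target response time must be positive")
    cores = (avg_query_size_mb / mcr_mbps) * concurrent_users / target_response_seconds
    return math.ceil(cores)  # rounding up to a whole core is an assumption


def ram_guidelines_gb(core_count, data_volume_gb):
    """Apply the slide's two RAM rules of thumb: 4 GB per core minimum, 20% of data as target."""
    return {
        "minimum_gb": 4 * core_count,
        "target_gb": data_volume_gb * 20 / 100,
    }


if __name__ == "__main__":
    # Illustrative inputs: 18,000 MB average query, 200 MBps MCR,
    # 10 concurrent users, 60-second target response time.
    cores = estimate_required_cores(18000, 200, 10, 60)
    print(cores)                           # prints: 15
    print(ram_guidelines_gb(cores, 5000))  # prints: {'minimum_gb': 60, 'target_gb': 1000.0}
```

The two RAM figures are reported separately because the slide does not say how to reconcile the per-core minimum with the 20 percent target.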

14 Determining Storage Requirements
Estimating Data Volumes for the Data Warehouse
Estimate Initial Fact Data: Number of fact table rows × row size. Use 100 bytes per row as an estimate if unknown.
Allow for Indexes and Dimensions: Add 30–40% for dimensions and indexes.
Project Fact Data Growth: Number of new fact rows per month.
Factor in compression: Typically 3:1.
Other storage requirements: Configuration databases; Log files; tempdb; Staging tables; Backups; Analysis Services models
Ask students for examples of data volumes that they have encountered. For some organizations, 500 million rows is a huge number of facts, whereas others might handle billions of fact table rows. Point out that, in practice, many data warehouses include multiple fact tables, and data volume estimation must take into account all fact tables in the data warehouse.
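The estimation steps on this slide can be combined into one rough calculation. The sketch below is an assumption-laden illustration: the function name is hypothetical, the order in which overhead and compression are applied is a choice made here, 35 percent is picked from the middle of the 30–40% range, and 1 GB is taken as 10^9 bytes. It covers a single fact table and none of the other storage requirements (logs, tempdb, staging tables, backups, and so on).

```python
def estimate_warehouse_size_gb(initial_fact_rows, new_fact_rows_per_month, months,
                               row_size_bytes=100, overhead_percent=35,
                               compression_ratio=3.0):
    """Rough compressed size of one fact table plus dimensions and indexes, in GB.

    Defaults follow the slide's rules of thumb: 100 bytes per row if unknown,
    30-40% extra for dimensions and indexes, and roughly 3:1 compression.
    """
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    # Initial fact rows plus projected growth over the planning period.
    total_rows = initial_fact_rows + new_fact_rows_per_month * months
    raw_bytes = total_rows * row_size_bytes
    # Allowance for dimensions and indexes.
    with_overhead = raw_bytes * (100 + overhead_percent) / 100
    compressed_bytes = with_overhead / compression_ratio
    return compressed_bytes / 1e9  # 1 GB = 10^9 bytes (assumption)


if __name__ == "__main__":
    # Illustrative inputs: 1 billion initial rows, 50 million new rows per month, 3 years.
    print(estimate_warehouse_size_gb(1_000_000_000, 50_000_000, 36))
```

With these inputs the estimate comes to about 126 GB; a warehouse with several fact tables would sum one such estimate per table.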

15 Considerations for Storage Hardware
Use more, smaller disks instead of fewer, larger disks
Use the fastest disks you can afford
Consider solid state disks, especially for random I/O
Use RAID 10, or minimally RAID 5
Consider a dedicated storage area network for manageability and extensibility
Balance I/O across enclosures, storage processors, and disk groups
When discussing the choice between mechanical and solid state disks, consider a hybrid solution that uses solid state disks for the 20 percent of data that is queried most frequently, and mechanical disks for the remaining 80 percent.
If the students are interested in finding out more, refer them to the following link to vendor-specific solutions (scroll down the page): Find out more about Microsoft Data Warehouse Fast Track Partners

16 SQL Server Data Warehouse Appliances
Prebuilt hardware and software solutions based on tested configurations
Part of a range of appliances that are based on SQL Server
Available from multiple hardware vendors

17 SQL Server Parallel Data Warehouse
A special SQL Server edition only available in hardware appliances
Shared-nothing architecture
Massively parallel processing
Dedicated control nodes, compute nodes, and storage nodes
Diagram labels: Dual Fiber Channel; Database servers (compute nodes); Infiniband; Storage Arrays; Control Node Cluster; Management Servers; Landing Zone (ETL Interface); Backup Nodes

18 Lab: Planning Data Warehouse Infrastructure
Point out that the lab instructions are deliberately designed to be high-level so that students need to think carefully about what they are trying to accomplish and work out how best to proceed. Encourage students to read the scenario information carefully and collaborate with each other to meet the scenario requirements. Remind students that, if they find a particular task or exercise too challenging, they can find step-by-step instructions in the lab answer key. Point out that, like all other labs in this course, students must start by running a setup script to prepare the lab environment.
Exercise 1: Planning Data Warehouse Hardware
Now that you have planned the server infrastructure, you must create a hardware specification for the data warehouse server. You will begin by calculating the MCR of the system that you are currently using, and then complete a planning worksheet for a new system with a published MCR figure.
Logon Information
Virtual machine: 20767A-MIA-SQL
User name: ADVENTUREWORKS\Student
Password: Pa$$w0rd
Estimated Time: 30 minutes

19 Lab Scenario
You are planning a data warehouse solution for Adventure Works Cycles, and have been asked to specify the hardware that is required. You must design a solution that is based on SQL Server that provides the right balance of functionality, performance, and cost.

20 Lab Review
Question: Review DWHardwareSpec.xlsx in the D:\Labfiles\Lab02\Solution folder. How does the hardware specification in this workbook compare to the one that you created in the lab?
Answer: Answers will vary. Key points about the suggested solution are:
The amount of suggested memory is 64 GB per processor, and totals more than 20 percent of the data volume.
The suggested storage solution is extensible (more disks can be added to the SAN) and offloads I/O processing overheads to the SAN. It also enables the solution to balance disk I/O for the storage arrays symmetrically with the processors.

21 Module Review and Takeaways
Review Question(s)
Question: In a growing number of organizations, virtualization has become a core platform for infrastructure. Microsoft Hyper-V®, which is included as a feature from Windows 10, together with enterprise operations and management software such as Microsoft System Center, have enabled IT departments to benefit from simpler provisioning, management, mobility, and recoverability of services. What components of a BI infrastructure would you consider virtualizing, and why?
Answer: Many database professionals are resistant to virtualization, particularly regarding data warehouses, because of the additional layer of abstraction that it adds between the database server and the physical hardware (in particular, the disk subsystem). However, advances in virtualization, such as support for virtual host bus adapters in Windows Server, mean that a virtualized environment can provide near-equivalent performance to a physical server. There are also advantages with portability and recoverability because the entire data warehouse server can be easily copied to new physical hardware or backed up.
For Reporting Services and Analysis Services, a virtualized infrastructure could be extremely beneficial because you can easily move entire virtual servers across physical hosts for hardware maintenance operations or to recover from a hardware failure. There is also the ability to dynamically scale up virtual hardware resources to match demand. For example, suppose an organization experiences a sharp rise in demand for report processing at the end of the financial year, and a corresponding drop in specific analysis as users shift their attention to generating year-end reports. If necessary, the allocation of physical memory and processor resources to the virtual server that is hosting Reporting Services could be increased by reducing the resources that are allocated to the virtual Analysis Services server. Then, after the year-end reporting activity is over, the resource allocations could be readjusted to support the normal workload balance.
The SQLCAT team at Microsoft has conducted research into the performance of the SQL Server database engine and Analysis Services on virtualized infrastructure. You can review their findings at:

