DAT353 Analysis Service: Server Internals Tom Conlon Program Manager SQL Server Business Intelligence Unit Microsoft Corporation.


Purpose of this Session
Remove some of the mystery
Explain how it is that we do some things so much better than our competitors
Things are easier to understand when the internals are understood
Requirements:
– You already know the basics – this is for the experienced

Agenda
Architecture Review
Aggregations
Data and Dimension Storage
Processing
Querying
Server Memory Management
Distinct Count

Architecture – Single Server
[Diagram labels: OLAP Store; Application; ADO MD; PivotTable Service; OLE DB for OLAP; Analysis Server (Processing, Querying); Analysis Manager; DSO; SQL Server Data Warehouse; Other OLE DB Providers; OLE DB]

Component Architecture - Query
[Diagram labels: MSMDSRV.EXE: Cache, Server Storage Engine, Metadata Manager; MSOLAP80.DLL: Cache, Formula Engine, Metadata Manager, Agent; Formula Engine; MDX]

Component Architecture - Management
[Diagram labels: MSMDSRV.EXE: Cache, Server Storage Engine, Metadata Manager; MSOLAP80.DLL: Cache, Formula Engine, Metadata Manager, Agent; Formula Engine; Metadata Manager; MSMDGD80.DLL: DCube Parser; MSMDCB80.DLL: DCube Storage Engine; MDX, DDL]

Component Architecture - Distributed
[Diagram labels: MSMDSRV.EXE: Cache, Server Storage Engine, Metadata Manager; MSOLAP80.DLL: Cache, Formula Engine, Metadata Manager, Agent; Formula Engine; Metadata Manager; MSMDGD80.DLL; a second MSMDSRV.EXE: Cache, Server Storage Engine, Metadata Manager; DCube Parser; MSMDCB80.DLL: DCube Storage Engine; MDX]

Why Aggregations?
Aggregations can result in orders of magnitude improvement in performance
– Don’t have to access every fact table record to determine query result
– Further savings with data compression
– Biggest savings: reduce disk scan

Aggregations - Overview
Customers dimension levels: All Customers, Country, State, City, Name
Product dimension levels: All Products, Category, Brand, Name, SKU
Facts (custID, SKU, Units Sold, Sales): … $67.32 …
Highest Level Aggregation (Customer, Product, Units Sold, Sales): All, All, …, $345,212,301.3
Intermediate Aggregation (countryCode, productID, Units Sold, Sales): Can, sd, …, $23,…; US, yu, …, $57,…

Partial Aggregation
Don’t want to create all possible aggregations
– Data explosion!
What if a query is made to a combination of levels where no aggregation exists?
– Can compute from lower level aggregations
– Don’t need to compute every possible aggregation
Customers dimension levels: All Customers, Country, State, City, Name
Product dimension levels: All Products, Category, Brand, Name, SKU
Queries including a combination of Country and Brand can be answered if aggregation Country by Name exists.
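The Country by Brand example can be sketched in a few lines of Python. This is only an illustration of rolling a lower-level aggregation up to a higher level; the product names, brands, unit counts, and the function name `roll_up_to_brand` are all hypothetical and do not come from the session.

```python
from collections import defaultdict

# Hypothetical slice of the Product hierarchy: product name -> brand.
NAME_TO_BRAND = {"Cola 12oz": "FizzCo", "Cola 2L": "FizzCo", "Chips": "Crunchy"}

# Hypothetical stored aggregation at (Country, Name): units sold.
country_by_name = {
    ("USA", "Cola 12oz"): 120,
    ("USA", "Cola 2L"): 80,
    ("USA", "Chips"): 50,
    ("Canada", "Cola 12oz"): 30,
}


def roll_up_to_brand(agg):
    """Derive (Country, Brand) totals from a (Country, Name) aggregation."""
    result = defaultdict(int)
    for (country, name), units in agg.items():
        # Each product name belongs to exactly one brand, so sums are additive.
        result[(country, NAME_TO_BRAND[name])] += units
    return dict(result)


country_by_brand = roll_up_to_brand(country_by_name)
print(country_by_brand)
```

No fact table rows are touched here: the (Country, Brand) answer comes entirely from the smaller (Country, Name) aggregation.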

Aggregations
[Pyramid diagram labels: Fact Table; Most detailed aggregations (m,m,m,…); Highest level of aggregation (1,1,1,1,…); query: “Show me all sales for all products for all...”]

Partial Aggregation
[Pyramid diagram labels: Fact Table; Most detailed aggregations; Highest level of aggregation; query: “Show me all sales for all products for all...”]

Aggregation Design
[Diagram labels: Fact Table; Month, Products; Quarter, Pro. Family; Quarter, Product; Month]

Aggregation Design Results
[Diagram label: Fact Table]
Result: aggregations designed in waves from the top of the pyramid
At 100% aggregations, ‘waves’ all touch: overkill
20-30% generally adequate (0% for the smaller cubes)

Aggregation Design
Which aggregations are more important than others?
– All are equal
– From design perspective, select the ones that result in overall improved query performance
Usage Based Optimization: weightings on each aggregation based on usage frequency

Flexible and Rigid Aggregations
‘Flexible’ aggregations are deleted when a changing dimension is incrementally processed.
‘Rigid’ aggregations remain valid
Changing dimensions allow members to be moved, added and deleted.
After members move, only incremental process of dimension is required
[Diagram: parents A, B, C, with member X shown under A and under C] When member X is moved from a child of A to a child of C, all aggregations involving A or C are invalidated

Aggregation Data Storage
No impact on fact data or rigid aggregation data when changing dimension incrementally processed
Flexible aggregations are invalidated when changing dimension incrementally processed
Data is in three files:
– partitionName.fact.data
– partitionName.agg.rigid.data
– [partitionName.agg.flex.data]
[Diagram annotations on a hierarchy with parents A, B, C and moved member X:]
– Aggregations including (All) level are rigid (if all other levels in the agg are rigid)
– Aggregations with this level are always flexible
– Aggregations with this level are rigid (if all other levels in the agg are rigid)
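The rule that an aggregation is rigid only if every level in it is rigid can be sketched as follows. This is an illustration only: the per-level flexible flags are supplied by hand (how the server assigns them is not covered here), and the function names and the partition name `Sales2003` are hypothetical. The file suffixes are the ones listed on the slide.

```python
def classify_aggregation(level_is_flexible):
    """Return 'flexible' if any level in the aggregation is flexible, else 'rigid'.

    level_is_flexible maps a level label to True (flexible) or False (rigid).
    """
    return "flexible" if any(level_is_flexible.values()) else "rigid"


def data_file(partition_name, kind):
    """Name of the aggregation data file for a partition, per the slide's naming."""
    suffix = "agg.flex.data" if kind == "flexible" else "agg.rigid.data"
    return f"{partition_name}.{suffix}"


# Hypothetical aggregation: one flexible level makes the whole aggregation flexible.
kind = classify_aggregation({"Customer.State": True, "Product.SKU": False})
print(kind, data_file("Sales2003", kind))
```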

Incremental Dimension Processing (Chg Dimension)
Query and process dimension data: keys, member names, member properties
For each cube using this dimension:
– Delete flexible aggs and indexes
– Start lazy indexing and aggregating
Potential resource competition during lazy processing after changing dimension incrementally processed
Fewer aggregations!
Result: query performance degradation

Flexible Aggregation Demo

Data Storage
No data stored for empty member combinations
With compression – data storage approx 1/3 of space required in RDBMS source
Data is stored by record in pages
Each record contains all measures at an intersection of dimension members
Page layout:
Record 1: mbr d1, mbr d2, … mbr dn | m1, m2, … mn
Record 2: mbr d1, mbr d2, … mbr dn | m1, m2, … mn
…
Record 256: mbr d1, mbr d2, … mbr dn | m1, m2, … mn

Data Structures
Partition data stored in a file divided into Segments
Each Segment contains 256 pages (each with 256 records) = 64K records
Only last segment has fewer than 256 pages
[Diagram: Data File divided into Segment 1, Segment 2, … Segment n; each segment holds Page 1, Page 2, Page 3, … Page 256]
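The segment arithmetic can be checked with a short Python sketch. The 256 × 256 layout is from the slide; the zero-based record numbering and the function names `locate` and `segments_needed` are assumptions made for illustration, not a description of the actual file format.

```python
RECORDS_PER_PAGE = 256
PAGES_PER_SEGMENT = 256
RECORDS_PER_SEGMENT = RECORDS_PER_PAGE * PAGES_PER_SEGMENT  # 65,536 = 64K


def locate(record_index):
    """Map a zero-based record index to (segment, page, slot), all zero-based."""
    segment, offset = divmod(record_index, RECORDS_PER_SEGMENT)
    page, slot = divmod(offset, RECORDS_PER_PAGE)
    return segment, page, slot


def segments_needed(total_records):
    """Number of segments required; only the last one may be partially filled."""
    return -(-total_records // RECORDS_PER_SEGMENT)  # ceiling division


print(locate(65_535), locate(65_536), segments_needed(1_000_000))
```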

Clustering
Physical order of the records in each page and segment is organized to improve performance
– Keeps records with same or close members together
– Similar in concept to SQL clustered index where data sorted by key values
Try to minimize distribution of records with the same member across segments and pages
– Optimized, but no algorithm can keep all records for the same member together (unless the cube contains a single dimension)
– Similarly – SQL can only have a single clustered index
Records with identical dimension members can be in multiple segments
– Data is read and processed in chunks (more on this later…)

Indexing
How is the data retrieved?
– Cubes can be in the terabyte range
– Scanning data files not an option
Need an index by dimension member
– Answers question “Where is the data associated with this combination of dimension members?”
Map files provide this

Map Files
There is a map for each dimension which indicates the page where the member is included in a data record
[Diagram: Dimension 1 Map and Dimension 2 Map, each a Member-to-Map table, pointing into the pages (Page 1 … Page 256) of Segment 1]
To resolve a query containing a member from each dimension, get list of pages containing all members
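The page lookup described here amounts to intersecting one page list per dimension. A minimal sketch, assuming each map is held as a dictionary from member to a set of page numbers; the map contents, member names, and the function name `candidate_pages` are hypothetical, and the real map file format is not described in the session.

```python
# Hypothetical per-dimension maps: member -> pages whose records contain that member.
dimension_maps = {
    "Customer": {"USA": {1, 2, 5}, "Canada": {3, 4}},
    "Product": {"Cola": {2, 3, 5}, "Chips": {1, 4}},
}


def candidate_pages(maps, query):
    """Pages that contain every requested member (one member per dimension)."""
    page_sets = [maps[dimension][member] for dimension, member in query.items()]
    return sorted(set.intersection(*page_sets))


pages = candidate_pages(dimension_maps, {"Customer": "USA", "Product": "Cola"})
print(pages)
```

Only the pages in the intersection need to be read, which is how the index avoids scanning the whole data file.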

Other Approaches
Array Based
– Normally allocates a cell for every combination.
– Result: Data explosion - much more disk space and longer processing times
Mix of Record and Array
– ‘Dense’ dimensions are record like
– Sparse are array like
Bit used per empty cell – sparsity explodes db sizes
– User decides whether a dimension is dense or sparse

Processing Buffer Memory Settings
‘Read-Ahead Buffer Size’ is the buffer containing data read from source db
– Defined in Server Property Dialog. Default: 4 MB
– Rarely important – little effect when changed
Data is processed in chunks of ‘Process Buffer Size’
– Defined in Server Property Dialog
– Data is clustered within Process Buffer Size
Bigger Process Buffer Size the better – make as big as possible
– Data for dimension members is clustered to keep data for ‘close’ members close together
– The larger these memory settings are, the more effective clustering

Incremental Processing
[Diagram: Original partition + Incremental partition]
Two Step Process
First, a partition is created with the incremental data
Second, the partition is merged with the original
Complete Segments of both partitions left intact – incomplete ones are merged
After many incremental processes, data distributed: degraded performance
Reprocess (if you have a large Process Buffer size) can provide improved performance
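The merge step can be pictured with a toy model in which a partition is a list of segments and a segment is a list of records. This is a sketch under stated assumptions, not the server's algorithm: the segment size is shrunk to 4 for readability, and the function name `merge_partitions` and the way leftovers are repacked are hypothetical.

```python
def merge_partitions(original, incremental, segment_size):
    """Keep complete segments as they are; repack records from incomplete ones."""
    all_segments = original + incremental
    complete = [seg for seg in all_segments if len(seg) == segment_size]
    leftovers = [rec for seg in all_segments if len(seg) < segment_size for rec in seg]
    rebuilt = [
        leftovers[i:i + segment_size] for i in range(0, len(leftovers), segment_size)
    ]
    return complete + rebuilt


original = [[1, 2, 3, 4], [5, 6]]              # last segment is incomplete
incremental = [[7, 8, 9, 10], [11, 12, 13]]    # last segment is incomplete
merged = merge_partitions(original, incremental, segment_size=4)
print(merged)
```

Because complete segments are never re-clustered, records for the same member end up spread over more segments after each merge, which is consistent with the degradation the slide describes.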

Querying a Cube
CLIENT
– Select {[North America],[USA],[Canada]} on rows, Measures.members on columns from myCube
Need two things:
– getting the dimension members – the axes
– getting the data

Resolve Axis
Dimension members cached on client in ‘Client Member Cache’
– Levels with #members < Large Level Threshold sent in group
– Levels with #members > Large Level Threshold retrieved as needed
– Large Level Threshold default value: 1000; can be changed in server property and in connection string
Where members not cached, members and descendants retrieved to client until needed member retrieved
Levels with members with 1000s of siblings result in degraded performance
Member cache not cleaned except for disconnect or when cube structure changes.
[Diagram: ‘Christmas trees’ showing cached members, non-cached members, and the requested member]

Client Data Cache
Client retains data of previous queries in client data cache
Client Cache Size property controls how much data is in the client cache
– When 0: unlimited
– 1-99 (inclusive): percent of physical memory
– >99: use up to the value in KB
Default value: 25
When exceeded, client cache is cleaned at cube granularity
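The three ranges of the Client Cache Size property can be expressed as a small function. This is a sketch of the rule as stated on the slide; the function name `client_cache_limit_kb`, the use of `None` for "unlimited", and the 1 GB example machine are assumptions for illustration.

```python
def client_cache_limit_kb(setting, physical_memory_kb):
    """Interpret Client Cache Size: 0 = unlimited, 1-99 = percent, >99 = KB."""
    if setting < 0:
        raise ValueError("Client Cache Size cannot be negative")
    if setting == 0:
        return None  # unlimited
    if setting <= 99:
        return physical_memory_kb * setting // 100  # percent of physical memory
    return setting  # absolute limit in KB


# Default value of 25 on a hypothetical machine with 1 GB (1,048,576 KB) of RAM.
print(client_cache_limit_kb(25, 1_048_576))
```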

How Cubes Are Queried
[Diagram labels: Client with Client Data Cache; Service with Query Processor, Dimension Memory, and Cache Memory; data on disk in three partitions (Canada, Mexico, USA), each with Segment 1 through Segment 6]

Service Start Up
Minimum Allocated Memory defines the amount of memory completely dedicated to the server
All dimensions in the database are retained in memory
– Tip: invalidate a dimension if not used in a cube
Dimension requirements: ~125 bytes per member plus member properties
– 1M members: 125M
– With 25 char member property (e.g., Address): 175M
Large dimensions can migrate to separate process space
[Diagram labels: Dimension Memory; Minimum allocated memory]
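The two memory figures can be reproduced with simple arithmetic. The 125 bytes per member is from the slide; the 2 bytes per character is an assumption chosen because it makes a 25-character property add 50 bytes per member, which matches the jump from 125M to 175M. The function name is hypothetical.

```python
BYTES_PER_MEMBER = 125  # approximate figure quoted on the slide
BYTES_PER_CHAR = 2      # assumption: 2 bytes per character, consistent with 175M


def dimension_memory_bytes(members, property_chars=0):
    """Rough dimension memory estimate: base cost plus member property text."""
    return members * (BYTES_PER_MEMBER + property_chars * BYTES_PER_CHAR)


print(dimension_memory_bytes(1_000_000))      # 125,000,000 bytes
print(dimension_memory_bytes(1_000_000, 25))  # 175,000,000 bytes
```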

During Processing
Shadow dimensions
– 2 copies of dimensions stored in memory while processing
Processing Buffers
– Read Ahead Buffer size
– Process Buffer Size
If dimension and processing buffers memory requirements exceed Memory Conservation Threshold - no room for data cache
[Diagram labels: Dimension Memory; Shadow Dimensions; Processing Buffers; Available Cache; Minimum allocated memory; Memory conservation threshold]

During Querying
Data cache stores query data for reuse
– Faster than retrieving from storage
If Dimension Memory requirements > Memory Conservation Threshold, no Data Cache
‘Cleaner’ wakes up periodically to reclaim memory from data cache
– BackgroundInterval registry setting. Default value: 30 seconds
Cleaning levels:
– <= 0.5 * (Minimum Allocated Memory + Memory Conservation Threshold): no cleaning
– > 0.5 * (Minimum Allocated Memory + Memory Conservation Threshold) and < Memory Conservation Threshold: mild cleaning
– >= Memory Conservation Threshold: aggressive cleaning
[Diagram labels: Dimension Memory; Available Cache; Minimum allocated memory; Memory conservation threshold]
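The cleaner's three bands can be sketched as a threshold check. This assumes the quantity being compared is the server's current memory use, which the slide does not state explicitly; the function name `cleaning_mode` and the example numbers are hypothetical.

```python
def cleaning_mode(memory_in_use, min_allocated, conservation_threshold):
    """Pick the cleaner's behaviour from the three bands described on the slide."""
    midpoint = 0.5 * (min_allocated + conservation_threshold)
    if memory_in_use <= midpoint:
        return "none"
    if memory_in_use < conservation_threshold:
        return "mild"
    return "aggressive"


# Hypothetical settings in MB: minimum allocated = 100, conservation threshold = 300.
for used in (200, 250, 300):
    print(used, cleaning_mode(used, 100, 300))
```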

Setting Server Properties Demo

Distinct Count
Business Problem: Sales Manager wants to know:
– “How many customers are buying Computers?”
– “How many active customers do I have?”
[Table with column headings Sales and Number of Customers; values as shown:]
All products 8000
  Hardware 3300
    Computers 2000
    Monitors 800
    Printers 500
  Software 4700
    Home 1500
    Business 2500
    Games 700

Distinct Count: Changes to Data Structure
DC Measure stored with each fact and aggregation record
– Just like a new dimension
Data ordered by DC measure
– “Order by” included in SQL statement during processing
Number of records can be increased by orders of magnitude
– Dependent on number of distinct values per record
Sample aggregate record without Distinct Count (countryCode, productID, Units Sold, Sales): Can, sd, …, $23,…
# records increases with distinct count on customers (CustID, countryCode, productID, Units Sold, Sales): 132-45, Can, sd, …; further rows for Can, sd with other CustID values, the last ending in …,490.23
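The growth in record count can be seen by grouping the same facts with and without the distinct count key. The fact rows below are invented for illustration, and treating the DC measure as an extra grouping key is a simplified reading of "just like a new dimension".

```python
# Hypothetical fact rows: (custID, countryCode, productID, units, sales).
facts = [
    ("c1", "Can", "sd", 2, 10.0),
    ("c2", "Can", "sd", 1, 5.0),
    ("c1", "Can", "sd", 3, 15.0),
    ("c3", "US", "yu", 1, 7.0),
]

# Regular aggregation: one record per (countryCode, productID).
without_dc = {(country, product) for _, country, product, _, _ in facts}

# With a distinct count on customers: one record per (custID, countryCode, productID).
with_dc = {(cust, country, product) for cust, country, product, _, _ in facts}

print(len(without_dc), len(with_dc))
```

With real data the ratio between the two counts depends on how many distinct customers contribute to each aggregate cell.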

Distinct Count: Changes to Query
Single thread per partition instead of per segment
– Unlike regular cubes, cannot do a single aggregation of results from each segment as a single value of the DC measure can cross segments
– Consequently – performance impact
Dimension slice requires much more disk scan than before
– Segments clustered by DC measure
– Expensive
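Why per-segment results cannot simply be added is easy to show with a toy example. The customer IDs and the two-segment layout are hypothetical; the point is only that a value spanning segments is counted twice by a naive sum.

```python
# Hypothetical customer IDs seen in two segments; "c3" spans both segments.
segments = [["c1", "c2", "c3"], ["c3", "c4"]]

# Summing per-segment distinct counts double-counts values that cross segments.
naive_total = sum(len(set(segment)) for segment in segments)

# A correct distinct count has to consider the partition as a whole.
correct_total = len(set().union(*segments))

print(naive_total, correct_total)
```

A sum or count measure has no such problem, which is why regular cubes can aggregate each segment independently and combine the results.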

Distinct Count Tips
Keep DC measures in their own cube
– All measures are retrieved on query – even if some are not asked for
– Create virtual cube to merge DC with other measures
Incremental processing of DC cubes is very expensive
– Segments restructured and reordered to keep records ordered by DC measure
– Time and memory intensive

Distinct Count Tips
Unlike regular cubes, best to distribute DC values evenly across each partition
– Most effective use of multiple threads for query processing
If you have a dimension that corresponds to the Distinct Count measure
– Aggregations recommended only on lowest level
– (Example: Customer dimension in cube, Customer as distinct count measure)

Summary
Architecture Review
Aggregations
Data and Dimension Storage
Processing
Querying
Server Memory Management
Distinct Count

Don’t forget to complete the on-line Session Feedback form on the Attendee Web site