SQLUG.be Case study: Redesign CDR archiving on SQL Server 2012. By Ludo Bernaerts, April 16, 2012.



Agenda
- Purpose of the CDR archiving
- Why the redesign
- The current environment
- The new environment
- Columnstore index, what is it? A brief CSIX explanation

Agenda (continued)
- Initial loading process
- Treating the loaded data
- How to deal with CSIX
- Partition layout
- Database layout
- Process flow
- Q & A

What is the purpose of CDR archiving?
- Creation of an archive for Call Data Records (CDRs) with a retention period of 36 months
- Extraction of call information from this archive system on a daily basis
- Extension of an existing dataflow (from 1 month to 3 years of retained data)

What is the purpose of CDR archiving?
- The first goal is to enable the financial department to handle disputes. When a dispute occurs, finance must be able to prove that the customer sent us the traffic.
- CDRs for specific customers are usually retrieved for a 15-day or one-month period, depending on the billing cycle run. There is also an option to filter out requests for customers who send a lot of traffic.
- Last but not least, network engineers launch requests on the system to track and find issues on the routes used by the calls.

Why the upgrade?
- To implement new solutions that improve query performance and manageability
  - Up to 15,000 partitions in SQL Server 2012, so we can partition on a daily basis; currently only 1,000 partitions are possible with SQL Server 2008
  - Columnstore index
- Ready to accept more load
  - Currently only about 20 queries a day are possible (the old system was designed for 15)
  - The goal is to go to more than 50 queries a day and beyond
- Ready for the future
  - Currently only CDRs are stored; in the future XDR archiving is needed (a lot more data volume)
  - Make use of the same hardware infrastructure in a more efficient way

The current system (1)
- SQL Server 2008
- Size: 11 TB allocated
  - Uncompressed data > 10 TB
  - Compressed data > 6 TB (page level)
- Page compression activated
- 3 filegroups created on the database
  - FG_CDR_CALL: voice call fact data, 10 database files (1 file per disk)
  - FG_CDR_DATA: data call fact data, 1 database file
  - FS_DATA: staging & dimension tables, 1 database file

The current system (2)
- Storage on SAN: transaction log files & staging data on FC disks, the rest a mix of SATA / FC
- Staging table unused: daily import goes directly into the partitioned tables
- Import is transaction based, no bulk load
- 3 years of data in the database, partitioned at week level
- Only one clustered index per table, no other indexes implemented

The current system (3)

The new system (1)
- Keep the cost low
- A redesigned database layout for better I/O performance
- More filegroups with one file per filegroup, to make sure I/O reads are sequential rather than random (see the sketch after this list)
- Also make use of page compression
- Make use of the new index type available in SQL Server 2012: the columnstore index
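As an illustration, a minimal sketch of how such a one-file-per-filegroup layout could be created. The database name (CDR_Archive), filegroup names, file path and sizes are all assumptions, not from the deck:

-- Add one filegroup with exactly one data file; repeat for FG_CALL_02 .. FG_CALL_10
ALTER DATABASE CDR_Archive ADD FILEGROUP FG_CALL_01;
ALTER DATABASE CDR_Archive
ADD FILE (NAME = 'CDR_Call_01',
          FILENAME = 'E:\SQLData\CDR_Call_01.ndf',
          SIZE = 100GB, FILEGROWTH = 10GB)
TO FILEGROUP FG_CALL_01;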

The new system (2)
- More partitions possible (up to 15,000); partitions on a daily basis instead of weekly
- 3 years = about 1,100 partitions
- Easier to clean up out-of-date CDRs
- Smaller datasets to query
- Make use of staging tables for faster loading (bulk load, see the sketch after this list)
- Run 6 parallel load streams for the daily load
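A hedged sketch of what one of the six parallel load streams could look like. The staging table name, input file and load options are assumptions; the real file format is not described in the deck:

-- Bulk load one input stream into its own staging table (minimally logged with TABLOCK)
BULK INSERT CDR_Schema.Stg_Call_01
FROM '\\fileserver\cdr\incoming\stream01.dat'
WITH (TABLOCK,
      BATCHSIZE = 1000000,
      FIELDTERMINATOR = '|',
      ROWTERMINATOR = '\n');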

Columnstore index, what is it?
- Result of Project Apollo
- An index that stores data column-wise instead of row-wise
- Created for data warehouse purposes: fact tables and large dimension tables (> 10 million rows)
- Index optimized for scans
- Only one per table, and currently only nonclustered
- Must contain the columns from the clustered index key
- No concept of key columns, so the 16-column key limit does not apply
- If the table is partitioned, it must contain the partitioning column
- No 900-byte key size limit
- It is read-only

ColumnStore index (1)
Data stored in a traditional row store format, and the same data in columnstore format:

Row store (a page holds whole rows):
  1  Ross    San Francisco  CA
  2  Sherry  New York       NY
  3  Gus     Seattle        WA
  4  Stan    San Jose       CA
  5  Lijon   Sacramento     CA

Column store (a page holds values from a single column):
  Name:  Ross, Sherry, Gus, Stan, Lijon
  City:  San Francisco, New York, Seattle, San Jose, Sacramento
  State: CA, NY, WA, CA, CA

ColumnStore index (2) Each page stores data from a single column

Columnstore index (3)
- The base table is divided into row groups of about 1M rows each
- Within a row group, each column is stored as a segment (a BLOB), tracked via a segment directory
- New system table: sys.column_store_segments, which includes segment metadata: size, min, max, ...

Columnstore index limitations (1)
Limited support for data types. The following data types are not supported:
- decimal and numeric with precision > 18 digits
- binary or varbinary
- (n)text and image
- (n)varchar(max)
- uniqueidentifier
- rowversion (and timestamp)
- sql_variant
- blob
- CLR types (hierarchyid and spatial types)
- datetimeoffset with scale > 2
- xml

Columnstore index limitations (2)
Other limitations:
- No page or row compression on the CSIX itself
- Cannot be unique
- Cannot act as a primary or foreign key
- Not on tables or columns using Change Data Capture
- No columns with FILESTREAM support
- No replication technology
- No computed columns or sparse columns
- No filtered CSIX
- Limited number of columns per index
- Not on an indexed view
- No INCLUDE clause, no ASC or DESC keywords

ColumnStore index performance (1)
Higher query speed. Why?
- Data organized by column shares many more similar characteristics than data organized across rows, which results in a higher level of compression
- Uses the VertiPaq compression technology, which is also superior to SQL Server's compression algorithm (already available in Analysis Services for PowerPivot in SQL Server 2008 R2)
- Less I/O transferred from disk to memory: only the columns needed by the query are fetched (plus bitmap filter optimization)
- Algorithms are optimized to take better advantage of modern hardware (more cores, more RAM, ...)
- Batch mode processing

ColumnStore index performance (2)
ColumnStore compression:
- Encoding: convert values to integers
  - Value-based encoding
  - Dictionary (hash) encoding
- Row reordering: find an optimal ordering of rows (proprietary VertiPaq algorithm)
- Compression: run-length encoding and bit packing
- About 1.8x better compression than SQL Server's page compression

ColumnStore index performance (3)
Batch-mode processing is a new, highly efficient vector-based execution technology that works with columnstore indexes. Check the query plan for the execution mode.
- A batch is stored as a vector in a separate area of memory and represents roughly 1,000 rows of data
- Operators that can run in batch mode: scan, filter, hash aggregate, hash join, batch hash table build
(A hedged query example follows below.)
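A hedged illustration of the kind of query that can run in batch mode against a columnstore index. The fact and dimension tables (dbo.Fact_Call, dbo.Dim_Route) and their columns are assumptions, not objects from the deck; to verify the mode, inspect the "Actual Execution Mode" property of the scan and hash operators in the actual execution plan:

-- Star-join aggregation: scan + hash join + hash aggregate can all run in batch mode
SELECT d.route_name,
       COUNT_BIG(*)        AS calls,
       SUM(f.duration_sec) AS total_duration
FROM dbo.Fact_Call AS f
JOIN dbo.Dim_Route AS d ON d.route_id = f.route_id
WHERE f.call_dt >= '20120301' AND f.call_dt < '20120401'
GROUP BY d.route_name;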

ColumnStore index performance (4)
I/O and caching:
- New large object cache: a cache for column segments and dictionaries
- Aggressive read-ahead, both at the segment level and at the page level within a segment
- Early segment elimination based on segment metadata (min and max values are stored in the segment metadata)

ColumnStore index additional info (1)
Some considerations:
- Creation takes about 1.5 times as long as a normal index
- More memory is needed; memory grant request in MB = [(4.2 * number of columns in the CS index) + 68] * DOP + (number of string columns * 34)
(A worked example of this formula follows below.)
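A worked example of the memory grant formula, with assumed values (16 columns, 2 of them string columns, DOP 8); the numbers are illustrative only:

-- [(4.2 * 16) + 68] * 8 + (2 * 34) = 135.2 * 8 + 68 = 1,081.6 + 68 = 1,149.6 MB
DECLARE @cols int = 16, @string_cols int = 2, @dop int = 8;
SELECT ((4.2 * @cols) + 68) * @dop + (@string_cols * 34) AS estimated_grant_mb;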

ColumnStore index additional info (2)
The command is the same as other CREATE INDEX statements, but with fewer options:

CREATE [ NONCLUSTERED ] COLUMNSTORE INDEX index_name
ON <object> ( column [ ,...n ] )
[ WITH ( DROP_EXISTING = { ON | OFF } [ , MAXDOP = max_degree_of_parallelism ] ) ]
[ ON { partition_scheme_name ( column_name ) | filegroup_name | "default" } ]

(A concrete, hedged example follows below.)
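A concrete example of the command as a minimal sketch; the table, column and partition scheme names are assumptions, not objects from the deck:

-- One nonclustered columnstore index covering the queried columns,
-- aligned with the partition scheme on the partitioning column call_dt
CREATE NONCLUSTERED COLUMNSTORE INDEX CSIX_Fact_Call
ON dbo.Fact_Call (call_dt, customer_id, route_id, duration_sec, charge_amount)
WITH (DROP_EXISTING = OFF, MAXDOP = 4)
ON PS_Call (call_dt);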

ColumnStore index additional info (3)
Some DMVs:
- sys.column_store_index_stats
- sys.column_store_segments
- sys.column_store_dictionaries
(An example query follows below.)
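A hedged example of inspecting segment metadata through these views, assuming the hypothetical fact table dbo.Fact_Call from the earlier sketch; the join on hobt_id reflects the SQL Server 2012 catalog view layout:

-- One row per column segment: row count, encoding, min/max metadata
-- (used for segment elimination) and on-disk size
SELECT OBJECT_NAME(p.object_id) AS table_name,
       s.column_id, s.segment_id, s.encoding_type, s.row_count,
       s.min_data_id, s.max_data_id, s.on_disk_size
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p ON p.hobt_id = s.hobt_id
WHERE p.object_id = OBJECT_ID('dbo.Fact_Call')
ORDER BY s.column_id, s.segment_id;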

How to deal with CSIX during the CDR load
- Conflict: the columnstore index is read-only, but the data needs daily updates
- Solution: split the data into a fixed part and an updatable part
- Remark: > 99% of the data for a partition is loaded within the first 3 days
- After 3 days we can switch the data to the fixed part and make use of the columnstore index

The loading

How loaded data is treated.

How to deal with CS index limitations (1)
- The split is done on the fact tables:
  - 3 main fact tables with a columnstore index and 1,100 partitions
  - 3 secondary fact tables without a columnstore index but with 1,100 partitions
- Loading happens into 6 different staging tables:
  - Main staging tables for rows with segment date = call date, one partition
  - Secondary staging tables for rows with segment date <> call date, one partition

How to deal with CS index limitations (2)
- All loaded data is by default transferred to the secondary fact tables
- The main staging tables are moved by partition switch-in, the secondary ones by insert (see the sketch after this list)
- The partition on the secondary fact table with call date = today - 3 days is moved to the main table
- A clean-up job runs to check the secondary fact tables. Rows are moved to the main table partitions, starting with the partitions containing the most data.
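A hedged sketch of the switch-in step described above. The staging table, its columns and the partition function name are assumptions (only Tbl_ibis_Call_reduced_Prim comes from the deck); the staging table is assumed to sit on the target partition's filegroup with a matching structure, a check constraint on call_dt, and a matching columnstore index:

-- 1. Make the completed day read-optimized: build the columnstore index on staging
CREATE NONCLUSTERED COLUMNSTORE INDEX CSIX_Stg_Call
ON CDR_Schema.Stg_Call (call_dt, customer_id, route_id, duration_sec);

-- 2. Switch the day (call date = today - 3 days) into the main fact table
ALTER TABLE CDR_Schema.Stg_Call
SWITCH TO CDR_Schema.Tbl_ibis_Call_reduced_Prim
PARTITION $PARTITION.PF_CallDate('20120413');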

How to deal with CS index limitations (3)
Views are used to query both tables:

WITH call_reduced AS (
    SELECT field1, field2, ...
    FROM CDR_Schema.Tbl_ibis_Call_reduced_Prim
    INNER JOIN ... ON ...
    WHERE a AND b
    UNION ALL
    SELECT field1, field2, ...
    FROM CDR_Schema.Tbl_ibis_Call_reduced_Sec
    INNER JOIN ... ON ...
    WHERE a AND b
)
SELECT field1, field2, ...
FROM call_reduced;

Partition & filegroup layout
- Partitions are based on call_dt (daily basis): about 1,100 partitions (3 years of data)
- Each fact table uses the same partition function but a different partition scheme
- Partitions are divided over multiple filegroups in a round-robin way (see the sketch after this list)
- One data file per filegroup
- 10 filegroups created per main fact table
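A minimal sketch of such a layout. The function and scheme names, the date type of call_dt, and the filegroup names are assumptions; only the daily grain, the round-robin idea and the 10-filegroup count come from the deck:

-- Daily boundaries; in reality one boundary per day, roughly 1,100 in total
CREATE PARTITION FUNCTION PF_CallDate (date)
AS RANGE RIGHT
FOR VALUES ('2009-04-17', '2009-04-18', '2009-04-19');

-- Round-robin over the 10 filegroups; with ~1,100 partitions the filegroup
-- list is simply repeated until every partition has a home
CREATE PARTITION SCHEME PS_Call
AS PARTITION PF_CallDate
TO (FG_CALL_01, FG_CALL_02, FG_CALL_03, FG_CALL_04, FG_CALL_05,
    FG_CALL_06, FG_CALL_07, FG_CALL_08, FG_CALL_09, FG_CALL_10);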

Filegroup layout
- Possibility to extend the number of filegroups, if needed (a sketch follows below)
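A hedged sketch of extending the layout when needed, reusing the hypothetical names from the partition sketch above:

-- Point the scheme at a newly added filegroup, then split in the next daily boundary
ALTER PARTITION SCHEME PS_Call NEXT USED FG_CALL_11;
ALTER PARTITION FUNCTION PF_CallDate() SPLIT RANGE ('2012-04-17');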

Database layout

Daily data flow

Clean up process

Clean up process.
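A hedged sketch of what the retention clean-up could look like, assuming the hypothetical partition function above and an empty, identically structured switch-out table on the same filegroup; only the 36-month retention and the table name Tbl_ibis_Call_reduced_Prim come from the deck:

-- Switch the expired partition out to an empty switch-out table, empty it,
-- then merge the now-unused boundary away
ALTER TABLE CDR_Schema.Tbl_ibis_Call_reduced_Prim
SWITCH PARTITION $PARTITION.PF_CallDate('20090416')
TO CDR_Schema.Tbl_ibis_Call_reduced_Old;

TRUNCATE TABLE CDR_Schema.Tbl_ibis_Call_reduced_Old;
ALTER PARTITION FUNCTION PF_CallDate() MERGE RANGE ('20090416');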

What about the performance? Some results: 30 queries were executed.

Results in detail

A sample of the data used

Is there something you want to ask? Q & A