How In-Memory Affects Database Design


1 How In-Memory Affects Database Design
Louis Davidson

2 Who am I?
Been in IT for over 19 years
Microsoft MVP for 11 years
Corporate Data Architect
Written five books on database design
Ok, so they were all versions of the same book. They at least had slightly different titles each time
Basically: I love database design, and In-Memory technologies are changing the game

3 Contact info
Louis Davidson -
Website – <-- Get slides here
Twitter –
SQL Blog –
Simple Talk Blog – What Counts for a DBA
(Speaker note: slides will be on drsql.org in the presentations area for this and the keynote as soon as I can get them out)

4 Questions are Welcome. Please limit questions to ones I know the answer to.

5 A tasty allegory…
Bacon is awesome
Bacon is an extremely powerful tool for rapid fat and calorie intake
Even bacon isn't good for everything

6 Attention!
This presentation was originally based on SQL Server 2014
SQL Server 2016 promises to greatly improve the feature set
I will note where this does and does not affect your database design experience as I go along, with asterisks *

7 The process I went through
Start with basic requirements: a sales system handling a stream of customer and order data
Apply In-Memory OLTP to see how it changed things
Keep it very simple
Learn a lot
This presentation was borne out of what I learned from that process (and Kalen Delaney's precon, whitepaper, and other reading that is linked throughout the slides)
Build a test, apply what I have learned, and morph until I get to what works
Build something real in my day job, if applicable

8 Attention: There Is Homework (lots of it)
I can't teach you everything about In-Memory in one mere hour, particularly the internals
The code will be available/demonstrated, but it is still very rudimentary
It will get you started, but it is only the tip of the iceberg

9 Introduction: What exactly is In-Memory OLTP in SQL Server 2014+?
A totally new, revamped engine for data storage, co-located in the same database with the existing engine
Obviously Enterprise only…
Purpose-built for certain scenarios*
Terminology can be confusing:
Existing tables: home is On-Disk, but ideally cached In-Memory
In-Memory tables: home is In-Memory, but backed up by On-Disk structures
If you have enough RAM, On-Disk tables are also in memory, but the implementation is very, very different
In-Memory is both very easy and very difficult to use

10 Design Basics (And no, I am not stalling for time due to lack of material)
Designing and coding is like the chicken and the egg
Design is what you do before coding
Coding patterns can greatly affect design
Engine implementation can greatly affect design and coding patterns
Developing software follows a natural process
We will discuss how In-Memory technologies affect the entire design/development lifecycle

11 Design Basics - Separate your design mind into (minimally) three phases
Conceptual/Logical (overall data requirements in a data model format)
Physical Implementation Choice
Type of database system: paper, Excel, Access, SQL Server, NoSQL, etc.
Engine choices: In-Memory, On-Disk, compression, partitioning, etc.
Note: bad choices usually involve pointy hair and a magazine article, with very little thinking and testing
Physical (relational code)
We will look at each of these phases and how In-Memory may affect your design of each output

12 Conceptual/Logical Design (Though Not Everyone's Is)
This is the easiest part of the presentation (…to type, at least)
You still need to understand the customer's needs and model:
Entities and Attributes
Uniqueness Conditions
General Predicates
As I see it, nothing changes…

13 Logical Data Model

14 Physical Implementation Overview
(Architecture diagram, summarized:)
Client App → TDS Handler and Session Management (SQL Server.exe): no improvements in the communication stack, parameter passing, or result set generation
Existing SQL components: Parser, Catalog, Algebrizer, Optimizer; Proc/Plan cache for ad-hoc T-SQL and SPs; interpreter for T-SQL, query plans, expressions; Access Methods; Buffer Pool for tables and indexes
In-Memory OLTP components: Memory-optimized Table Filegroup; engine for memory-optimized tables and indexes; Natively Compiled SPs and Schema; Native Compiler; Query Interop; 10-30x more efficient (real apps see 2-30x)
Logging and storage: reduced log bandwidth and contention, though log latency remains; checkpoints are background sequential IO; Transaction Log; Data Filegroup
(Speaker note: reference how the demo took advantage of each of these areas of performance. Also note that if you spend all your time going through the TDS layers, you won't get as much benefit from Hekaton as you might otherwise.)

15 Physical Implementation (Technically it's all software!)
Everything is different, and I am going to give just an overview of the physical details…
In-Mem data structures coexist in the database alongside On-Disk ones
Data is housed in RAM, and backed up in delta files and transaction logs
Delta files are stored as filestream storage
The transaction log is the same one you are used to (with lighter utilization)
Tables and indexes are extremely coupled
MVCC (Multi-Version Concurrency Control) is used for all isolation

16 Physical Design (No, let's not get physical)
Your physical design will almost certainly need to be altered from "normal"
So much changes, even just the internal table structure
In this section, we will discuss:
Creating storage objects
Table creation
Index creation (which is technically part of the table creation)*
Altering a table's structure*
Accessing (modifying/creating) data
Using normal T-SQL (Interop)
Using compiled code (Native)
Using a hybrid approach
No locks, no latches, no waiting

17 Creating Storage Objects - Tables
The syntax is the same as on-disk, with a few additional settings
You have durability choices:
Individual In-Mem table: SCHEMA_ONLY or SCHEMA_AND_DATA
Database level for transactions: Delayed (also for on-disk tables), basically asynchronous log writes; Aaron Bertrand has a great article on this
You also have less to work with...
Row size limited to 8060 bytes (enforced at create time)
Not all datatypes allowed (LOB types, CLR, sql_variant, datetimeoffset, rowversion)*
No check constraints *
No foreign keys *
Just one unique index per table *
Every durable (SCHEMA_AND_DATA) table must have a unique index/primary key
Note: there are memory-optimized temporary tables too: see Kendra Little's article
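The SCHEMA_ONLY durability option deserves a quick illustration. A minimal sketch (the table and its columns are hypothetical, chosen to suggest a session-state use case): the table's structure survives a restart, but its rows do not, and writes generate no transaction log IO at all.

```sql
-- Hypothetical non-durable table: ideal for transient data like session state.
-- SCHEMA_ONLY means no log writes and no delta-file writes; rows vanish on restart.
CREATE TABLE dbo.SessionState
(
    SessionId  int NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    StateValue varbinary(4000) NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
```

Note that even a SCHEMA_ONLY table still requires the primary key, since every memory-optimized table needs at least one index.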

18 Data quality…What if?
(Slide cartoon: "Hello, my name is Fred Smith" - shown once with customer number A1046B and once with A1023B - "Hey, we are the same person, why do I have two customer numbers?")

19 Data quality…What if?
Troublesome: two people are travelling to Indianapolis via train, and both order chicken from two different wait persons, but there is only one order of chicken still available
Extremely Troublesome: Train A is given access to Location L on Track 1 at 11:30 AM, and Train B is given access to the same location at the same time going in a different direction
Note: the "what if?" test ought to be applied to all of your designs

20 Dealing with Un-Supported Datatypes…
Say you have a table with 10 columns, but 1 is not allowed in an In-Memory table
First: ask yourself if the table really fits the criteria (we aren't done covering them)
Second: if so, consider vertically partitioning:
CREATE TABLE In_Mem (KeyValue, Column1, Column2, Column3)
CREATE TABLE On_Disk (KeyValue, Column4)
It is likely that uses of disallowed LOB types wouldn't be good for the OLTP aspects of the table in any case.
Note: 2016 allows LOB (varbinary(max), nvarchar(max), varchar(max)) but it is still something you may need to consider, as memory isn't free…
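Filling in the sketch above (all table, column, and view names are hypothetical): keep the supported columns in the memory-optimized table, push the offending column to an on-disk table that shares the key, and stitch them back together with a view for interop-only access.

```sql
-- In-memory side: the hot, supported columns
CREATE TABLE dbo.Item_InMem
(
    KeyValue int NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 50000),
    Column1  int NULL,
    Column2  int NULL,
    Column3  int NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
-- On-disk side: the column whose type is disallowed in-memory (xml here)
CREATE TABLE dbo.Item_OnDisk
(
    KeyValue int NOT NULL PRIMARY KEY,
    Column4  xml NULL
);
GO
-- Reassemble the logical row; usable only from interop T-SQL,
-- since native code cannot reference views or on-disk tables
CREATE VIEW dbo.Item
AS
SELECT m.KeyValue, m.Column1, m.Column2, m.Column3, d.Column4
FROM   dbo.Item_InMem AS m
       LEFT JOIN dbo.Item_OnDisk AS d ON d.KeyValue = m.KeyValue;
```

The trade-off: any query touching Column4 loses the native-compilation benefit, which is usually acceptable if the LOB column is cold relative to the rest of the row.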

21 Creating Storage Objects - Index creation
Syntax is inline with CREATE TABLE
Indexes are linked directly to the table
8 indexes max per table due to internals
Only one unique index allowed (the primary key) *
Indexes are never persisted, but are rebuilt on restart
String index columns must be of a binary collation (case AND accent sensitive)*
Cannot index a nullable column *
Two types:
Hash - ideal for single-row lookups; fixed size, you choose the number of hash buckets (approximately 1-2 times the number of unique values)
Bw-tree - best for range searches; very similar to a B-tree index as you (hopefully) know it, but optimized for MVCC and pointer connection to the table

22 A Taste of the Physical Structures
(Diagram: the basic data record for a row - a record header followed by the column data)

23 Hash Index - Simplified
(Diagram: hash buckets pointing into rows of TableNameId, Country, OtherColumns - e.g. row 1 "USA", row 3 "Canada")

24 Hash Index - Simplified
(Diagram: the same hash index after a modification - row 2 now carries "Canada", showing the bucket chains re-pointed)

25 Bw Tree Index – Even More Simplified

26 Do you want to know more?
For more in-depth coverage, check Kalen Delaney's white paper
Or for an even deeper (nerdier?) version: "Hekaton: SQL Server's Memory-Optimized OLTP Engine" or "The Bw-Tree: A B-tree for New Hardware Platforms"
Books Online
TechDays presentation
Buy Kalen Delaney's ebook: SQL Server 2016: In-Memory OLTP Enhancements

27 Creating Storage Objects - Altering a Table *
This is the second easiest slide in the deck (to write!)
No alterations allowed - strictly drop and recreate*
Cannot rename a table

28 Demo In Slides – Preparing to (and Actually) Create Tables

29 Setting the Database To Allow In-Mem
CREATE DATABASE HowInMemObjectsAffectDesign
ON PRIMARY
( NAME = N'HowInMemObjectsAffectDesign',
  FILENAME = N'Drive:\HowInMemObjectsAffectDesign.mdf',
  SIZE = 2GB, MAXSIZE = UNLIMITED, FILEGROWTH = 10% ),
FILEGROUP [MemoryOptimizedFG] CONTAINS MEMORY_OPTIMIZED_DATA
( NAME = N'HowInMemObjectsAffectDesign_inmemFiles',
  FILENAME = N'Drive:\InMemfiles',
  MAXSIZE = UNLIMITED )
LOG ON
( NAME = N'HowInMemObjectsAffectDesign_log',
  FILENAME = N'Drive:\HowInMemObjectsAffectDesign_log.ldf',
  SIZE = 1GB, MAXSIZE = 2GB, FILEGROWTH = 10% );
GO
The FILEGROUP … CONTAINS MEMORY_OPTIMIZED_DATA clause adds a filegroup to hold the delta files

30 Creating a Memory Optimized Permanent Table
CREATE TABLE Customers.Customer
(
    CustomerId integer NOT NULL IDENTITY(1,1),
    CustomerNumber char(10) COLLATE Latin1_General_100_BIN2 NOT NULL,
    CONSTRAINT XPKCustomer PRIMARY KEY NONCLUSTERED HASH (CustomerId)
        WITH (BUCKET_COUNT = 50000),
    INDEX CustomerNumber NONCLUSTERED (CustomerNumber)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

31 Creating a Memory Optimized Permanent Table
CREATE TABLE Customers.Customer
(
    CustomerId integer NOT NULL IDENTITY(1,1),
    CustomerNumber char(10) COLLATE Latin1_General_100_BIN2 NOT NULL,
        -- character column must be binary collation to index/compare in native code *
    CONSTRAINT XPKCustomer PRIMARY KEY NONCLUSTERED HASH (CustomerId)
        WITH (BUCKET_COUNT = 50000),
        -- hash index used for the primary key; estimated rows in table: 25000
    INDEX CustomerNumber NONCLUSTERED (CustomerNumber)
        -- Bw-tree index on CustomerNumber
)
WITH (MEMORY_OPTIMIZED = ON,         -- this table is memory optimized (ok, that was kind of obvious)
      DURABILITY = SCHEMA_AND_DATA); -- this table is as durable as the database settings allow
GO

32 Accessing the Data - Using Normal T-SQL (Interop)
Using typical interpreted T-SQL
Most T-SQL will work with no change (you may need to add isolation level hints, particularly in explicit transactions)
A few exceptions that will not work:
TRUNCATE TABLE - this one is really annoying :)
MERGE (an In-Mem table cannot be the target)
Cross-database transactions (other than tempdb)
Locking hints
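To illustrate the isolation-hint point (the table comes from the demo schema; the database name is the one created earlier): inside an explicit transaction, interop access to a memory-optimized table typically needs a per-statement SNAPSHOT hint, or the database option that elevates it automatically.

```sql
-- Option 1: per-statement hint inside an explicit transaction
BEGIN TRANSACTION;
SELECT CustomerNumber
FROM   Customers.Customer WITH (SNAPSHOT)
WHERE  CustomerId = 1;
COMMIT;

-- Option 2: set once at the database level, so plain READ COMMITTED
-- access to in-mem tables is silently elevated to SNAPSHOT
ALTER DATABASE HowInMemObjectsAffectDesign
    SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;
```

The database-level option is usually the lower-friction choice when migrating existing interop code, since it avoids touching every query.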

33 Accessing the Data using Compiled Code (Native)
Instead of being interpreted, the stored procedure is compiled to machine code
Limited syntax (like programming with both hands tied behind your back)
Allowed syntax is documented as what is available, not what isn't
Some really extremely annoying ones:
SUBSTRING supported; LEFT, RIGHT, not so much
No subqueries *
OR, NOT, IN not supported in the WHERE clause *
String comparisons must be with columns of a binary collation *
Can't use on-disk objects (tables, sequences, views, etc.)
Can't call a stored procedure from another stored procedure *
So you may have to write some "interesting" code
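A small taste of what those rewrites look like in practice (the predicates are hypothetical; these restrictions apply to 2014-era native modules, and 2016 relaxes several of them):

```sql
-- Not allowed in a 2014 natively compiled module:
--   WHERE LEFT(CustomerNumber, 1) = 'A'     -- LEFT unsupported
--   WHERE CustomerId IN (1, 2)              -- IN unsupported

-- Supported rewrite: SUBSTRING in place of LEFT
SELECT CustomerNumber
FROM   Customers.Customer
WHERE  SUBSTRING(CustomerNumber, 1, 1) = 'A';

-- Supported rewrite: separate equality queries in place of IN
SELECT CustomerNumber FROM Customers.Customer WHERE CustomerId = 1;
SELECT CustomerNumber FROM Customers.Customer WHERE CustomerId = 2;
```

This is exactly the "interesting" code the slide warns about: mechanical, verbose, and easy to get subtly wrong, which is one reason to keep native modules small and targeted.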

34 Demo In Slides – Native Stored Procedure

35 Creating a Natively Optimized Procedure (I write my C# the new fashioned way, with T-SQL)
CREATE PROCEDURE Customers.Customer$CreateAndReturn
    @Parameter1 Parameter1Type = 'defaultValue1',
    @Parameter2 Parameter2Type = 'defaultValue2',
    …
    @ParameterN ParameterNType = 'defaultValueN'
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
(
    TRANSACTION ISOLATION LEVEL = SNAPSHOT,
    LANGUAGE = N'us_english'
)
    <code>
END
Callouts:
SCHEMABINDING works just like for views and functions - you can't change the underlying object while this object references it
EXECUTE AS OWNER: there is no ownership chaining; all code executes as the procedure owner
NATIVE_COMPILATION alerts the parser that this will be a natively compiled object
BEGIN ATOMIC: procedures are atomic transactions

36 Accessing Data Using a Hybrid Approach
Native code is very fast but very limited (* still true, but less so)
Use native code where it makes sense, and not where it doesn't
Example: creating a sequential value
In the demo code I started out by using RAND() to create CustomerNumbers and SalesOrderNumbers
Using a SEQUENCE is far more straightforward
So I made one interpreted procedure that uses the SEQUENCE outside of native code, then calls the native procedure
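A hedged sketch of that hybrid pattern (the sequence name, number format, and wrapper are hypothetical; the native procedure is the one from the earlier slide): the interpreted wrapper fetches the sequence value, since SEQUENCE objects are on-disk and unreachable from native code, then hands it to the native procedure.

```sql
CREATE SEQUENCE Customers.CustomerNumber_Seq AS int START WITH 1;
GO
-- Interpreted (interop) wrapper: do the on-disk work here...
CREATE PROCEDURE Customers.Customer$Create_Wrapper
AS
BEGIN
    DECLARE @SeqValue int = NEXT VALUE FOR Customers.CustomerNumber_Seq;

    -- Build a formatted customer number, e.g. 'A000000001'
    DECLARE @CustomerNumber char(10) =
        'A' + RIGHT('000000000' + CAST(@SeqValue AS varchar(9)), 9);

    -- ...then call into the natively compiled procedure
    EXEC Customers.Customer$CreateAndReturn @Parameter1 = @CustomerNumber;
END;
```

You pay one interpreted call per invocation, but all the hot-path work still runs natively, which is usually a good trade.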

37 Accessing the Data - No Locks, No Latches, No Waiting
On-disk structures use latches and locks to implement isolation
In-Mem uses optimistic MVCC
You have 3 isolation levels: SNAPSHOT, REPEATABLE READ, SERIALIZABLE
Conflicts are evaluated before, or when, the transaction is committed
This makes manual data integrity checking "interesting"
Essential difference: your code now must handle errors

38 Concurrency is the #1 difference you will deal with
Scenario 1: 2 connections update every row in 1 million rows, any isolation level
On-Disk, either:
1 connection blocks the other
Or: deadlock
In-Mem:
One connection will fail immediately, saying: "the row you are trying to update has been updated since this transaction started" - EVEN if the other connection never commits
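Because the losing connection gets an error instead of waiting on a lock, callers need retry logic. A minimal sketch (the procedure being called is hypothetical; the error numbers are the documented in-memory conflict/validation errors - 41302 write conflict, 41305 repeatable read validation, 41325 serializable validation):

```sql
-- Retry wrapper for conflicts against memory-optimized tables
DECLARE @retries int = 3;
WHILE @retries > 0
BEGIN
    BEGIN TRY
        EXEC Sales.SalesOrder$Update @SalesOrderId = 1, @Status = 'Shipped';
        SET @retries = 0;        -- success: stop looping
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() IN (41302, 41305, 41325) AND @retries > 1
            SET @retries -= 1;   -- known conflict error: try again
        ELSE
            THROW;               -- anything else: surface it to the caller
    END CATCH;
END;
```

In a real application this logic usually lives in the data access layer (or an interpreted wrapper procedure) so every native call gets it for free.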

39 Another slide on Concurrency (Because if I had presented it concurrently with the other one, you wouldn't have liked that)
Scenario 2: 1 connection updates all rows, another reads all rows (in an explicit transaction)
On-Disk, either:
1 connection blocks the other
Or: deadlock
In-Mem:
Both queries execute immediately
In SNAPSHOT isolation, the reader will always succeed
In REPEATABLE READ or SERIALIZABLE:
Reader commits its transaction BEFORE the updater commits: success
Reader commits its transaction AFTER the updater commits: fails

40 The Difficulty of Data Integrity
With on-disk structures, we used constraints for most issues (uniqueness, foreign keys, simple predicates)
With in-memory code, we have to implement these in stored procedures
Uniqueness on > 1 column set suffers from timing (if N connections are inserting the same data... MVCC will let them) *
Foreign key type checks can't reliably be done because: *
In SNAPSHOT isolation level, the row may have been deleted while you check
In higher levels, the transaction will fail if the row has been updated
Check constraint style work can be done in stored procedures for the most part
Note: constraints in 2016 will often be more important than for on-disk tables because of the lack of blocking operations

41 Problem: How to Implement Uniqueness on > 1 Column Set: INDEXED VIEW?
CREATE VIEW Customers.Customers$UniquenessEnforcement
WITH SCHEMABINDING
AS
SELECT customerId, Address, customerNumber
FROM   customers.Customer;
GO
CREATE UNIQUE CLUSTERED INDEX Address
ON Customers.Customers$UniquenessEnforcement (Address);
GO
Msg 10794, Level 16, State 12, Line 8
The operation 'CREATE INDEX' is not supported with memory optimized tables.

42 Problem: How to Implement Uniqueness on > 1 Column Set: Multiple Tables?
Wow, that seems messy… And what about duplicate customerId values in the two subordinate tables?

43 Problem: How to Implement Uniqueness on > 1 Column Set: Simple code
You can't…exactly. But what if EVERY caller has to go through a block along these lines:
SELECT @ExistingId = CustomerId FROM Customers.Customer WHERE Address = @Address
IF @ExistingId IS NULL
    -- do your insert
This will stop MOST duplication, but not all. Two inserters can check at the same time, and with no blocks, app locks, or constraints even available, you may get duplicates. Remember the term: Optimistic Concurrency Control
Even still, this sort of code is reducing the value, isn't it?

44 Foreign Keys and Unique Index/Constraints in 2016 (Pure conjecture based on how things work now)
In the traditional engine, these are implemented with locks
In the in-mem engine, you have to expect that they will be implemented much like the isolation levels
Basically, if two transactions do operations that would have blocked, the other connection will likely fail either:
At COMMIT (currently PRIMARY KEY violations fail at COMMIT)
At the first sign of trouble (as is the case when you modify existing resources)

45 When Should You Make Tables In-Memory - Microsoft's Advice (2014)
(From the Microsoft documentation; each implementation scenario is followed by the benefits of In-Memory OLTP:)
High data insertion rate from multiple concurrent connections; primarily append-only store; unable to keep up with the insert workload → eliminate contention; reduce logging
Read performance and scale with periodic batch inserts and updates; high-performance read operations, especially when each server request has multiple read operations to perform; unable to meet scale-up requirements → eliminate contention when new data arrives; lower latency data retrieval; minimize code execution time
Intensive business logic processing in the database server; insert, update, and delete workload; intensive computation inside stored procedures; read and write contention → minimize code execution time for reduced latency and improved throughput
Low latency: require low latency business transactions which typical database solutions cannot achieve → low latency code execution; efficient data retrieval
Session state management: frequent insert, update, and point lookups; high-scale load from numerous stateless web servers → optional IO reduction or removal, when using non-durable tables

46 When Should You Make Tables In-Memory - Louis's Advice
Read Microsoft's opinion first; then, things to factor in:
High concurrency needs / low chance of collisions
Minimal uniqueness protection requirements *
Minimal data integrity concerns (minimal key updates/deletes) *
Limited searching of data (binary comparisons only) *
Limited need for transaction isolation / short transactions
You are able to answer all "what if?" scenarios successfully
Basically, the "very hot" tables in a strict OLTP workload...
I don't see this changing, but the scenarios where it fits will expand in 2016
NOT a way to "FIX" bad code… Not at all… In fact, most applications will need to be re-engineered to deal with MVCC.

47 The Choices I Made
Louis has improved his methods for estimating performance, but your mileage will still vary. Louis' tests are designed to reflect only certain usage conditions and user behavior, and several factors may affect your mileage significantly:
How and where you put your logs
Computer condition and maintenance
CPU variations
Programmer coding variations
Hard disk break-in
Therefore, Louis' performance ratings are a minimally useful tool for comparing the performance of different strategies, but may not accurately predict the average performance you will get. I seriously suggest you test the heck out of the technologies yourself using my code, your code, and anyone else's code you can, to make sure you are getting the best performance possible.
The choices (for me) will differ in 2016…

48 Model Choices – Logical Model

49 Model Choices – Physical Model

50 Model Choices – Tables to Make In-Mem (First Try)

51 Model Choices – Tables to Make In-Mem (Final 2014 Thinking)

52 The Grand Illusion (So you think your life is complete confusion)
Performance gains are not exactly what you may expect, even when they are massive
In my examples (which are available on my website), I discovered the following when loading rows (10 connections of 2000 rows each, captured using Adam Machanic's SQLQueryStress tool):
On-Disk tables with FK and INSTEAD OF trigger - total time 1:12
On-Disk tables without FK or INSTEAD OF trigger - total time 0:51
In-Mem tables using interop code - total time 0:44
In-Mem tables with native code - total time 0:31
In-Mem tables, native code, SCHEMA_ONLY - total time 0:30
In-Mem tables (except CustomerAddress), hybrid code - total time 0:42
In-Mem tables using 2016 enhancements - coming soon to a SQLblog near you when enough features are available
But should it be a lot better? Don't forget the overhead... (and SQLQueryStress has extra overhead for gathering stats)

53 Contact info
Louis Davidson -
Website – <-- Get slides here
Twitter –
SQL Blog –
Simple Talk Blog – What Counts for a DBA
(Speaker note: slides will be on drsql.org in the presentations area for this and the keynote as soon as I can get them out)

54 Demo: As Much Code Review As We Have Time For!

