SQL Server Internals & Architecture

Presentation on theme: "SQL Server Internals & Architecture"— Presentation transcript:

1 SQL Server Internals & Architecture
Naomi Williams, SQL DBA LinkedIn

2 Thank You Sponsors!
Free Training at SQL Saturday is not possible without our Sponsors.

3 Tips
Get a copy, study it, and become a master! Ask questions with the #SQLHELP hashtag.

4 Why do I need to know SQL Internals?
Do you work with SQL Server as a DBA, a developer, a BI professional, or an architect? Knowing what the engine does with your requests helps in every one of these roles.

5

6 Agenda
ACID
SQLOS
Components of the database engine
Components of the storage engine
Cache aging
Differences of Hekaton

7 Why is there ACID in my SQL Server?
The ACID properties of transactions: Atomic, Consistent, Isolated, Durable.
Atomicity is an all-or-nothing proposition. Consistency guarantees that a transaction never leaves your database in a half-finished state. Isolation keeps transactions separated from each other until they're finished. Durability guarantees that the database keeps track of pending changes in such a way that the server can recover from an abnormal termination.
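Atomicity and durability are easy to see in an explicit transaction. A minimal T-SQL sketch; the dbo.Accounts table and its columns are hypothetical, just for illustration:

```sql
-- Hypothetical dbo.Accounts table: both updates commit together or neither does.
BEGIN TRY
    BEGIN TRANSACTION;

    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountId = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountId = 2;

    COMMIT TRANSACTION;   -- durability: the commit is hardened in the transaction log first
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;   -- atomicity: undo the partial work
    THROW;
END CATCH;
```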

8 Welcome to SQL Server! Where does the path begin?

9 SELECT
The slide shows the client connecting through the SQL Server Interface (API), TDS packets, the SNI (SQL Server Network Interface) on both the client and the server, the protocol layer (TCP/IP, Shared Memory, Named Pipes, Virtual Interface Adapter (VIA)), and SQLOS.
We follow a simple SELECT statement. Our guide connects from his client to the protocol layer exposed in the API, using TDS (Tabular Data Stream) endpoints. TDS packets are encapsulated by the SNI on both the server and the client computer. There is one TDS endpoint for each protocol, plus one for the DAC (dedicated administrator connection). The protocol layer will probably use TCP/IP, VIA (Virtual Interface Adapter), or maybe Named Pipes to reach SQL Server. On the server, the protocol layer reverses the work of the client-side SNI, unwrapping your packet to see what it contains. The server-side SNI lives in the network libraries that are part of the database engine; the client-side SNI lives in the network libraries that are part of the SQL Native Client.
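You can see which protocol each connection actually ended up on. A quick check against the standard connections DMV:

```sql
-- Which protocol did each session connect over? (Shared Memory, Named Pipes, TCP, ...)
SELECT session_id,
       net_transport,       -- protocol chosen by the SNI / protocol layer
       protocol_type,       -- TSQL = a TDS connection
       encrypt_option,
       client_net_address
FROM sys.dm_exec_connections;
```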

10 SQLOS, Workers and Schedulers
SQLOS creates a set of schedulers at startup equal to the number of logical CPUs. Each scheduler manages workers; for a task to execute, it is assigned a worker that is in an idle state. Workers do not move between schedulers, and tasks are never moved between workers. However, SQLOS can create child tasks and assign them to different workers. A task can be in one of six states:
Pending: the task is waiting for an available worker.
Done: the task is completed.
Running: the task is currently executing.
Runnable: the task is waiting for the scheduler.
Suspended: the task is waiting for an external event or resource.
Spinloop: the task is processing a spinlock, an internal mechanism that protects access to data structures.
The main difference between the Windows OS and SQLOS is scheduling: Windows uses preemptive scheduling, SQLOS uses cooperative scheduling, meaning processes yield voluntarily. Each scheduler can be in an ONLINE or OFFLINE state based on processor affinity and the core-based licensing model. There is no strict one-to-one relationship between schedulers and CPUs unless processor affinity is enabled; under heavy workload it is possible to have more than one scheduler running on the same CPU. Keep in mind that a SPID is not the same as a task. A SPID is a connection over which requests can be sent. A request is a unit of work, which is bound to a task; that task is assigned a worker, and the worker processes the entire request before handling any other requests.
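You can watch schedulers, workers, and the runnable queue directly. A small sketch against the standard DMV sys.dm_os_schedulers:

```sql
-- One row per SQLOS scheduler: status, worker counts, and the runnable queue depth.
SELECT scheduler_id,
       cpu_id,
       status,                  -- e.g. VISIBLE ONLINE / VISIBLE OFFLINE
       current_tasks_count,
       runnable_tasks_count,    -- tasks waiting in line for CPU
       current_workers_count,
       active_workers_count,
       work_queue_count,        -- pending tasks waiting for a worker
       pending_disk_io_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255;       -- filter out hidden/internal schedulers
```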

11 SQLOS, Workers and Schedulers, cont
Time to go shopping! The cashier represents a scheduler, and the customers in line are tasks in the runnable queue. The customer currently checking out is a task in the running state. If an item is missing a UPC or needs a price check, the cashier suspends that checkout and asks the customer to step aside: that is the suspended queue. When the missing information comes back, the customer who stepped aside moves to the END of the runnable line (queue). Sounds like fun, right? Running state; suspended queue and suspended state; runnable queue and runnable state.
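The same three states show up in sys.dm_exec_requests. A quick way to see who is running, runnable, or suspended right now:

```sql
-- Current requests and their state: running, runnable (waiting in line for CPU),
-- or suspended (stepped aside, waiting on a resource).
SELECT r.session_id,
       r.status,          -- running / runnable / suspended
       r.wait_type,       -- what a suspended request is waiting on
       r.wait_time,
       r.last_wait_type,
       r.scheduler_id
FROM sys.dm_exec_requests AS r
WHERE r.session_id > 50;   -- rough filter for user sessions
```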

12 SELECT, continued: into the Database Engine
The slide adds the Database Engine (Cmd Parser, Optimizer, Query Plan, Query Tree, Query Executor, Plan Cache) and the Buffer Pool (Data Cache) to the path already shown: SQL Server Interface (API), TDS packets, SNI on client and server, protocols (TCP/IP, Shared Memory, Named Pipes, VIA), and SQLOS.
The SELECT statement arrives as a language event, is marked as a "SQL Command", and is sent to the Command Parser. The Command Parser's role is to handle T-SQL language events: it checks syntax and returns error codes when the statement is invalid. If the statement is valid, the Command Parser generates a hash of the T-SQL and checks it against the plan cache to see whether an execution plan already exists. If it finds a match, the plan is read from cache and passed to the Query Executor; if the hash does not exist in the plan cache, the query continues on to the Optimizer.
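Plan reuse is easy to observe: a cached plan's usecounts climbs every time the hash lookup succeeds. A small sketch against the plan cache DMVs:

```sql
-- Cached plans and how often they have been reused (usecounts > 1 means cache hits).
SELECT TOP (20)
       cp.usecounts,
       cp.cacheobjtype,
       cp.objtype,
       st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;
```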

13 Logical Query Tree
Example query: SELECT * FROM Customers c INNER JOIN Orders o ON c.cid = o.cid WHERE o.date = '2014-12-12';
Logical tree, top down: Project (*), then Filter (o.date = '2014-12-12'), then Inner Join (c.cid = o.cid), over Get (Orders) as O and Get (Customers) as C.
A logical query tree consists of logical algebraic operators such as inner and outer joins, aggregations, and others. Note: a logical inner join can be transformed into one of three physical joins: nested loops, merge, or hash.
Physical query execution plan generation: when a query reaches SQL Server, the first place it goes is the relational engine. Here, query compilation happens in three phases: parsing, binding, and optimization. Parsing checks the syntax, that is, whether the query is written correctly; it does not check whether a column used in the WHERE clause actually exists in a table listed in the FROM clause. The output of this phase is a parse tree (sequence tree), which contains the logical steps to execute the query and serves as input to the next phase. Binding is done by the algebrizer. It checks whether the query semantics are correct, for example whether the two tables joined in the FROM clause really are tables. The algebrizer produces a query processor tree, which is the input to the query optimizer.
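To see the operator tree SQL Server actually builds, you can ask for the estimated plan without running the query. A sketch using the slide's example tables (Customers and Orders will not exist unless you create them):

```sql
-- Return the estimated plan (operators only, nothing is executed) for the slide's query.
SET SHOWPLAN_ALL ON;
GO
SELECT *
FROM Customers AS c
INNER JOIN Orders AS o ON c.cid = o.cid
WHERE o.[date] = '2014-12-12';
GO
SET SHOWPLAN_ALL OFF;
GO
```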

14 SELECT, continued: the Optimizer and the Query Executor
Same diagram as before: the Database Engine (Cmd Parser, Optimizer, Query Plan, Query Tree, Query Executor, Plan Cache), the Buffer Pool (Data Cache), SQLOS, TDS packets, SNI on client and server, and the protocol layer (TCP/IP, Shared Memory, Named Pipes, VIA).
The Optimizer is invoked to build a query plan using a cost-based method. Don't forget: it's not looking for the BEST plan, it's looking for the MOST EFFICIENT plan (i.e. the best plan it can find in a split second). Once the Query Executor knows what it needs to do, it has to actually do it, so it hands the retrieval of data over to the Storage Engine (using OLE DB, in case you're interested) to handle the data using its preferred access method.

15 Query Optimizer
Slide bullets: parse and build the logical query tree; binding, do the objects exist?; generate the physical query plan, does it make sense and how can I get there?; create the path (in binary) from the algebrizer for the optimizer; the most efficient plan (path) to the data.
Query compilation happens in three phases: parsing, binding, and optimization. Parsing and binding were described on the previous slide; optimization is the last step and generates the query execution plan. The optimizer attempts to determine the most efficient way to execute a given query in the minimum amount of time: it compares several options, working through possible permutations and combinations to generate, in a reasonable time, a query execution plan that does not deviate much from the best possible plan. The optimizer uses histograms built from statistics, the logical query tree, and heuristics to determine the most efficient plan. (In computer science, the objective of a heuristic is to produce a solution in a reasonable time frame that is good enough for the problem at hand. That solution may not be the best of all possible solutions, or it may simply approximate the exact one, but it is still valuable because finding it does not take a prohibitively long time.) The optimizer can also perform multi-stage optimization:
Pre-optimization: a trivial plan for a super-simple query.
Phase 0: looks for simple nested loop plans without parallelism options; these are called transaction processing plans.
Phase 1: uses a quick subset of rules containing the most common patterns; these are called quick plans.
Phase 2: for complex queries with parallelism and indexed views; these are called full plans.
How much does a plan cost? Nuttin', honey: cost is a made-up, abstract number.
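The instance keeps counters for how often the optimizer stopped at each of these stages. A quick look at the standard DMV sys.dm_exec_query_optimizer_info:

```sql
-- How often the optimizer produced a trivial plan vs. going through search 0/1/2.
SELECT counter, occurrence, value
FROM sys.dm_exec_query_optimizer_info
WHERE counter IN ('optimizations', 'trivial plan', 'search 0', 'search 1', 'search 2');
```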

16 Physical Query Plan
The logical operators in the query tree are converted into physical operators; for example, a logical inner join can become a nested loops, merge, or hash join. The resulting physical query plan is the "recipe" that gets stored in the plan cache, keyed by its hash.
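Both the query text and its plan get their own fingerprints, query_hash and query_plan_hash, which you can inspect in the query stats DMV:

```sql
-- One query_hash can map to one or more query_plan_hash values (different "recipes").
SELECT TOP (20)
       qs.query_hash,
       qs.query_plan_hash,
       qs.execution_count,
       st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.execution_count DESC;
```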

17 SELECT, continued: the Storage Engine and the Buffer Pool
The slide adds the Storage Engine (Access Methods, Transaction Manager, Buffer Manager), the Buffer Pool (Data Cache, Plan Cache), the Transaction Log File, and the Data File to the familiar path: SQL Server Interface (API), SNI, protocols (TCP/IP, Shared Memory, Named Pipes, VIA), Database Engine (Cmd Parser, Optimizer, Query Plan, Query Tree, Query Executor), SQLOS, TDS packets, and OLE DB. The question on the slide: does the data exist in cache?
Once the Query Executor knows what it needs to do, it hands the retrieval of data over to the Storage Engine (using OLE DB) to handle the data using its preferred access method. Access Methods are a collection of code that figures out how best to get to the data stored in tables, indexes, and partitions. They don't do the actual work of retrieving data, though; that's handled by the Buffer Manager. The Buffer Manager checks whether the page(s) exist in cache. If not, it gets the pages from the database (reading from disk and generating physical reads) and puts them into the data cache (generating logical reads). The key point to take away is that data is only ever actually read from cache!
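You can watch the physical-versus-logical read split with SET STATISTICS IO. A sketch; Sales.SalesOrderDetail is just an example table from the AdventureWorks sample database:

```sql
-- First run: physical reads (disk -> cache). Second run: usually only logical reads (cache).
SET STATISTICS IO ON;
SELECT COUNT(*) FROM Sales.SalesOrderDetail;   -- run twice and compare the Messages output
SET STATISTICS IO OFF;
```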

18 INSERT, UPDATE, DELETE: the Transaction Manager
Same diagram, now highlighting the Transaction Manager (Lock and Log Manager) inside the Storage Engine, the CheckPoint process, the Transaction Log File, the Data File, and the Data Cache and Plan Cache in the Buffer Pool.
For a data modification we do exactly the same thing behind the scenes, up to the point where the Access Methods get busy. In this case we need to persist data changes to disk, so we must now involve the Transaction Manager. The Transaction Manager has two very important components. The Lock Manager maintains concurrency and the ACID properties of transactions according to the specified isolation level. The Log Manager controls writes to the transaction log, using a method called write-ahead logging. Once the transaction log confirms that it has physically written the change and passes the confirmation back to the Transaction Manager, the Transaction Manager in turn confirms to the Access Methods and passes the modification request back to the Buffer Manager for completion. But guess what: the Buffer Manager has to confirm (as before) that the page is either in cache or on disk, and if it's on disk it must retrieve the page(s) into cache. A key point to remember is that the data is now changed, but only in cache, not on disk. This means the page is dirty, and it is not "cleaned" until it is flushed to disk (a page is considered clean when it is exactly the same on disk as in memory). Flushing to disk happens through a process called checkpointing. Unlike the lazy writer, checkpointing flushes the pages to disk but does not remove them from cache. Checkpointing also ensures that a database never has to recover past its last checkpoint. On a default install of SQL Server, a checkpoint happens every minute or so (as long as there's more than 10 MB of data to write).
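Dirty pages are visible in the buffer descriptors, and you can force a flush with CHECKPOINT. A small sketch (requires VIEW SERVER STATE permission):

```sql
-- Pages changed in cache (is_modified = 1) but not yet flushed to the data file.
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*)             AS dirty_pages
FROM sys.dm_os_buffer_descriptors
WHERE is_modified = 1
GROUP BY DB_NAME(database_id)
ORDER BY dirty_pages DESC;

-- Flush dirty pages for the current database; unlike the lazy writer, they stay in cache.
CHECKPOINT;
```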

19 WAIT, WAIT, WAIT
How many of you recognize these waits? Why does SQL Server need to wait?
Locks: LCK_M_x. Latches and memory: PAGELATCH_x, LATCH_x, RESOURCE_SEMAPHORE. I/O: PAGEIOLATCH_x, ASYNC_IO_COMPLETION, IO_COMPLETION, WRITELOG, LOGBUFFER, BACKUPIO. CPU and parallelism: SOS_SCHEDULER_YIELD, CXPACKET, THREADPOOL. Network: ASYNC_NETWORK_IO. External calls: OLEDB, MSQL_DQ, MSQL_XP, PREEMPTIVE_OS_*.
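The cumulative picture lives in the wait stats DMV. A quick sketch to see where this instance has spent its waiting time:

```sql
-- Cumulative waits since the last restart (or since the stats were cleared), biggest first.
SELECT TOP (15)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms     -- time spent runnable, waiting only for CPU
FROM sys.dm_os_wait_stats
WHERE wait_type NOT LIKE 'SLEEP%'   -- crude filter for some idle/background waits
ORDER BY wait_time_ms DESC;
```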

20 WAIT, WAIT WAIT Database Engine Storage Engine Buffer Pool
Async_Network_IO SOS_Scheduler_Yield, CXPacket SQL Server Interface aka API SNI (SQL Server Server Network Interface) Protocols TCP/IP Shared Memory Named Pipes Virtual Interface Adapter (VIA) Database Engine Cmd Parser Optimizer Query Plan SQLOS Query Tree Query Executor TDS packets SNI (SQL Server Client Network Interface) Writelog, LogBuffer Pagelatch_x, Latch_x, Resource_Semaphore Language Event PageIOLatch_x, Async_IO_Completion, IO_Completion OLE DB Transaction Log File Storage Engine Latches Buffer Pool CPU PRESSURE CPU pressure: SOS_SCHEDULER_YIELD Parallelism: CXPACKET LOCKING Long term blocking: LCK_X, LCK_M_U, & LCK_M_X MEMORY Buffer latch: PAGELATCH_X Non-buffer latch: LATCH_X Memory grants: RESOURCE_SEMAPHORE I/O Buffer I/O latch: PAGEIOLATCH_X Tran log disk subsystem: WRITELOG & LOGBUFFER General I/O issues: ASYNC_IO_COMPLETION & IO_COMPLETION NETWORK PRESSURE Network I/O: ASYNC_NETWORK_IO Locks Transaction Manager Lock and Log manager Access Methods Data Cache Data File Buffer Manager LCK_x, LCK_M_x Plan Cache
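To see what is waiting right now rather than cumulatively, sys.dm_os_waiting_tasks shows each waiting task, its wait type, and who is blocking it:

```sql
-- Who is waiting right now, on what, and who is blocking them.
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.blocking_session_id,
       wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.session_id > 50;    -- rough filter for user sessions
```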

21 Data Cache and LRU-K
The slide shows pages (A, B, C) moving between the Data File, the Data Cache in the Buffer Pool, and the Free Buffer List, with the Lazy Writer running under SQLOS.
Every buffer has a header that records the last two times the page was referenced, plus status information. SQL Server uses an LRU-K algorithm with K = 2: it keeps track of the two most recent accesses of each page in the buffer, which lets it discriminate between frequently and infrequently referenced pages and between data and index pages. The development of LRU-K was largely motivated by OLTP applications, decision-support applications on large relational databases, and especially combinations of those two workload categories; it is a self-tuning, adaptive buffering algorithm, even when access patterns evolve.
The data cache is scanned periodically from start to end. Because the buffer cache is entirely in memory, these scans are very quick and incur no I/O overhead. The scan is carried out by individual workers after they schedule an asynchronous read, and by the lazy writer process, which wakes up periodically and checks the size of the free list. There is one lazy writer per memory node in SQLOS; like a clock tick, it examines 16 buffers at a time, scanning sequentially and repeating. During the scan, each buffer is assigned a value based on its usage history. When the value gets low enough, the dirty-page indicator is checked; if the page is dirty, a write is scheduled to flush the modifications to disk. SQL Server uses a write-ahead log (WAL), which guarantees that no data modification is written to disk before the associated log record has been written, so the write of the dirty page is blocked until the log page recording the modification reaches disk; this maintains the ACID properties of the transaction. Once the modified page has been flushed (or immediately, if the page was not dirty to begin with), the page is freed: the link between the buffer and the data cache page it contains is removed by deleting the buffer's entry from the hash table, and the buffer is put on the free list.
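Two quick ways to look at data cache health: how much of the cache each database occupies, and the Page Life Expectancy counter (how long a page survives in cache on average):

```sql
-- Data cache usage per database (8 KB pages converted to MB).
SELECT DB_NAME(database_id)  AS database_name,
       COUNT(*) * 8 / 1024   AS cached_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY DB_NAME(database_id)
ORDER BY cached_mb DESC;

-- Page Life Expectancy, in seconds, from the Buffer Manager performance counters.
SELECT object_name, counter_name, cntr_value AS page_life_expectancy_sec
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND object_name LIKE '%Buffer Manager%';
```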

22 Plan Cache and LRU
The slide shows plans (A, B, C) in the Plan Cache with their current cost values (9, 2, 5, 7, 3, ...) being swept down by the clock hands, with the Resource Monitor running under SQLOS.
The LRU works like a clock algorithm: as it sweeps, it looks at every entry and decreases its cost with each sweep. When the cost of a plan in the cache reaches 0, it is swept out! SQL Server uses memory dynamically and must constantly be aware of the amount of free memory available in the OS. The Resource Monitor component of SQLOS monitors the operating system via the QueryMemoryResourceNotification API; depending on system memory and various algorithms, a LowMemoryResourceNotification flag can be set. That flag triggers the Resource Monitor to force the external clock hands on the internal caches (all of them except the data cache) to sweep the caches, clean up, and reduce memory usage, allowing memory to be returned to the OS.
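You can see how much memory the plan cache stores through the memory clerks. A sketch; the pages_kb column assumes SQL Server 2012 or later (earlier versions split it into single/multi page columns):

```sql
-- Plan cache memory by cache store: CACHESTORE_SQLCP = ad hoc / prepared plans,
-- CACHESTORE_OBJCP = stored procedures, functions, triggers.
SELECT type,
       name,
       SUM(pages_kb) / 1024 AS cache_mb
FROM sys.dm_os_memory_clerks
WHERE type IN ('CACHESTORE_SQLCP', 'CACHESTORE_OBJCP')
GROUP BY type, name
ORDER BY cache_mb DESC;
```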

23 Hekaton (In-Memory OLTP)
The slide adds natively compiled stored procedures and schema, the In-Memory OLTP Compiler, and Memory-Optimized Tables and Indexes alongside the familiar Database Engine, Storage Engine (Access Methods, Transaction Manager with Lock and Log Manager, Buffer Manager), and Buffer Pool (Data Cache, Plan Cache) components.
Hekaton can actually be really cool. Some of its advantages:
It stores data differently, in a way that eliminates latch waits.
If your stored procedures use a supported subset of T-SQL commands, they can be compiled to native code and run much faster.
If you don't need to persist the data to disk, you can skip disk writes altogether and have RAM-only tables.
But you need to understand the limitations:
Hekaton is designed for less than 250 GB of data, and your memory needs to be about 2x the size of your data.
It is designed for no more than 4 CPU sockets or 60 cores.
Hekaton tables can't be altered, can't have additional indexes added later, and can't have more than 7 non-clustered indexes.
Cross-database queries are not supported, and the maximum row size is 8,060 bytes.
Hekaton tables can't have full-text indexes, can't be replicated, and can't use partitioning, computed columns, IDENTITY, foreign keys, check constraints, or unique indexes.
Databases with Hekaton objects can't have database snapshots.
How native compilation works: based on the T-SQL stored procedure definition and the table metadata, a MAT (Mixed Abstract Tree) is generated. The MAT is transformed into a PIT (Pure Imperative Tree), in which SQL-like data types are converted into C-like data types (pointers, structs, arrays, etc.). The PIT is then used to generate a C source file, which is written to the local file system under the SQL Server instance installation directories. The C compiler (CL.EXE), distributed with the SQL Server installation binaries (by default under "C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\Binn\Xtp\VC\bin"), is invoked with that file as input and generates an OBJ file. The C linker (LINK.EXE), in the same location, then produces the final DLL containing the natively compiled code that executes the original stored procedure. Natively compiled stored procedures are recompiled at each in-memory database startup, but something unexpected was discovered, at least relative to the official documentation: while in-memory tables are all recompiled immediately, before data access is allowed, natively compiled stored procedures are recompiled and loaded into memory only at first use.
Full ACID compliance is ensured through optimistic MVCC (multi-version concurrency control): row versioning and time-stamping are used rather than page locks. A migration aid, the Analysis, Migration and Reporting tool, is included in SQL Server Management Studio to help the DBA convert a schema to in-memory tables. Because table data resides in memory, algorithms optimized for accessing memory-resident data are used, and there are no locks or latches: optimistic concurrency control eliminates logical locks and physical latches.
Native code compilation: tables and procedures are translated into C and then compiled into DLLs, resulting in very efficient, fast code and a reduced number of instructions to execute. Using compiled code, SQL Server is clearly more efficient, since T-SQL optimization, query execution, expression evaluation, and access methods are defined at compile time and no longer require expensive interpreted-code processing. Be aware that this doesn't mean the SQL Server Parser, Algebrizer, and Query Optimizer are no longer used: they still play a key role in generating the stored procedure code, but they are called into action only at creation time, not at execution time.
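A minimal sketch of the syntax, assuming SQL Server 2014 or later and a database that already has a MEMORY_OPTIMIZED_DATA filegroup; the table and procedure names are made up for illustration:

```sql
-- Memory-optimized table with a hash index on the primary key.
CREATE TABLE dbo.ShoppingCart
(
    CartId INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserId INT NOT NULL,
    ItemId INT NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- Natively compiled stored procedure: compiled to a DLL, runs inside an atomic block.
CREATE PROCEDURE dbo.usp_AddCartItem
    @CartId INT, @UserId INT, @ItemId INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    INSERT INTO dbo.ShoppingCart (CartId, UserId, ItemId)
    VALUES (@CartId, @UserId, @ItemId);
END;
GO
```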

24 Summary
ACID: Atomic, Consistent, Isolated, Durable.
SQLOS: schedulers, workers, threads, and memory management.
Components of the database engine: Cmd Parser, Query Tree, Query Optimizer, Query Executor.
Components of the storage engine: Access Methods, Buffer Manager, Transaction Manager.
Cache aging: LRU-K for the data cache, LRU for other caches.
Differences of Hekaton: it isn't for everything!

