What Is a Latch? …and Why Do I Care? Eddie Wuerch, mcm

Presentation transcript:

What Is a Latch? …and Why Do I Care? Eddie Wuerch, mcm Salesforce Marketing Cloud

Hi! I’m Eddie :)
Over 15 years with SQL Server
Microsoft Certified Master
Salesforce Marketing Cloud: trillions of rows, tens of billions of transactions per day, petabytes of data and indexes, 24x7 with no downtime

The Three ‘C’s of Performance Capacity Configuration Code Disk Performance Memory Capacity Disk Performance Disk Allocation Contention Scans Hotspotting Insert/Update/Delete Metadata Contention TempDB Abuse

PAGELATCH_* PAGEIOLATCH_*

Perf Monitoring: Waits
A process is either waiting or working.
Not working? It’s waiting on something specific: a lock, data, memory, CPU, or something to do.
Reading wait statistics is the first step of the Waits and Queues method.
See the Microsoft whitepaper “SQL Server 2005 Waits and Queues”.

What is a ‘Wait’?
[Timeline diagram: Running → Suspended → Runnable → Running]
Running: the process requests an unavailable resource (processing stops, wait begins).
Suspended: resource wait time (ms) accrues until the process is signaled that the resource is available (wait continues).
Runnable: signal wait time (ms) accrues until a scheduler is available (wait ends, processing continues).
Wait Time (ms) = Resource Wait Time (ms) + Signal Wait Time (ms)
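The split between resource wait and signal wait on this slide comes down to one subtraction. The sketch below is illustrative Python and not part of the original deck. The class and function names are hypothetical, and it assumes, as the timeline shows, that total wait time includes the signal wait.

```python
from dataclasses import dataclass


@dataclass
class WaitSample:
    """One wait, timed in milliseconds (hypothetical structure for illustration)."""
    wait_time_ms: int         # total wait: suspended + runnable
    signal_wait_time_ms: int  # runnable only: resource ready, waiting for a scheduler


def resource_wait_ms(sample: WaitSample) -> int:
    """Time spent suspended, waiting for the resource itself."""
    return sample.wait_time_ms - sample.signal_wait_time_ms


def signal_fraction(sample: WaitSample) -> float:
    """Share of the wait spent in the runnable queue (0.0 when there was no wait)."""
    if sample.wait_time_ms == 0:
        return 0.0
    return sample.signal_wait_time_ms / sample.wait_time_ms


sample = WaitSample(wait_time_ms=120, signal_wait_time_ms=30)
print(resource_wait_ms(sample))  # 90
print(signal_fraction(sample))   # 0.25
```

A large signal fraction suggests the process spent much of its wait queued for a scheduler rather than for the resource.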

Latch Waits
Fall into two main categories: PAGELATCH_* and PAGEIOLATCH_*.
Similar names, different issues, different solutions.
So… what is a latch?

Latching – Why?
Data page (8KB):
  Page header: 96 bytes of page metadata (page ID, next page ID, previous page ID, owning object ID, index ID, index level, free space on page, next row offset, …other stuff)
  Page row data: starts at byte 97
  Row-offset table: starts at last byte, moves backwards

Without Latching…
Data page 1:1000 (8KB): Row 1 data at 0x080, Row 2 data at 0x140, free space at 0x200. Next row offset = 0x200.
Process 1: reads next row offset 0x200 and inserts its new row at offset 0x200.
Process 2: reads the same next row offset and also inserts its new row at offset 0x200, on top of Process 1’s row.
Source: Microsoft PSS Presentations

PAGELATCH_UP – Data Page Example
Data page 1:1000 (8KB): Row 1 data at 0x080, Row 2 data at 0x140, free space at 0x200. Next row offset = 0x200.
Process 1: takes LATCH_UP on page 1:1000, inserts its new row at offset 0x200, and advances next row offset to 0x260.
Process 2: requests LATCH_UP on page 1:1000 and waits (PAGELATCH_UP wait) until Process 1 is done, then inserts its new row at offset 0x260.
Source: Microsoft PSS Presentations
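The two slides above can be replayed in a few lines. This is a toy sketch in Python, not SQL Server's implementation: the Page class, the 0x60 row size (taken from the 0x200 to 0x260 step on the slide), and the use of threading.Lock as a stand-in for the page latch are all assumptions made for illustration.

```python
import threading

ROW_SIZE = 0x60  # matches the slides: next row offset moves from 0x200 to 0x260


class Page:
    """Toy stand-in for an 8KB data page; it only tracks row offsets."""

    def __init__(self, next_row_offset=0x200):
        self.next_row_offset = next_row_offset
        self.rows = {}                 # offset -> row data
        self.latch = threading.Lock()  # plays the role of the page latch

    def insert_latched(self, data):
        # Read the offset, write the row, and advance the offset as one unit.
        with self.latch:
            offset = self.next_row_offset
            self.rows[offset] = data
            self.next_row_offset = offset + ROW_SIZE
            return offset


def unlatched_race():
    """Replay the 'Without Latching' interleaving step by step (no threads)."""
    page = Page()
    offset_p1 = page.next_row_offset        # process 1 reads 0x200
    offset_p2 = page.next_row_offset        # process 2 reads 0x200 before p1 writes
    page.rows[offset_p1] = "process 1 row"
    page.rows[offset_p2] = "process 2 row"  # overwrites process 1's row
    page.next_row_offset = offset_p2 + ROW_SIZE
    return page


def latched_inserts(n_threads=8):
    """Run concurrent inserts that all go through the latch."""
    page = Page()
    threads = [
        threading.Thread(target=page.insert_latched, args=(f"row from thread {i}",))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return page


lost = unlatched_race()
print(len(lost.rows))                             # 1: one row was overwritten
safe = latched_inserts()
print(len(safe.rows), hex(safe.next_row_offset))  # 8 0x500
```

The latched version serializes the inserts, which is exactly where the PAGELATCH wait time comes from when many threads target the same page.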

PAGELATCH_* - Data-in-Memory Modification Wait
Usually indicates one of:
  Hotspotting
  GAM, SGAM, or PFS contention
  Creating and dropping lots of #temp tables (which is actually a combination of the previous two)
Examples:
  PAGELATCH_SH - waiting to get a shared latch (lock) on a memory page
  PAGELATCH_EX - waiting to get an exclusive latch (lock) on a memory page
Indicates: multiple possible issues, depending on the page on which the latch contention is occurring:
  Inserts/updates/page splits outpacing allocation: there is one GAM page and one SGAM page for each 4GB of each data file. Many processes requesting newly allocated pages in the file fight over these pages. A PFS page exists every 8,088 pages in a file; many BLOB and other shared-extent inserts can cause latch contention on these pages.
  Hotspotting: many separate threads inserting rows into a single data page, such as an index on an identity column
  Adding and dropping lots of #temp tables (the drop is the issue)
Use DBCC PAGE(db_id, file_id, page_id, printopts) on the page (displayed in the wait_resource column of the waits query) to determine the page type (see the m_type field in the page header).
Page types (m_type in page header):
  1 - In-row data page: could be hotspotting in user databases; indicates a problem with too many #temp table drops if the object is sys.sysmultiobjrefs
  2 - Index page: all non-leaf clustered index pages and all non-clustered index pages
  3 - Text mix page: parts of LOB values plus internal parts of text trees
  4 - Text tree page: large chunks of LOB data from a single value
  7 - Sort page
  8 - GAM
  9 - SGAM
  10 - IAM (Index Allocation Map): a bitmap similar to the GAM and SGAM for tracking all extents within the 4GB file block for an index or allocation unit
  11 - PFS
  13 - Boot page: database info, only one per database (file 1, page 9)
  15 - File header
  16 - Diff map page (DCM or diff change map): tracks which extents in the GAM interval have changed since the last full backup
  17 - ML map page (BCM or bulk change map): tracks which extents in the GAM interval have been changed by bulk-logged operations since the last log backup
Who fixes it?
  GAM, SGAM, and PFS contention: DBAs add more data files.
  Hotspotting: tuning options include shrinking row sizes (varchar vs. char, char vs. nchar, tinyint/smallint/int/bigint, smalldatetime/datetime) and SQL Server compression, so fewer pages are allocated to a table for the same amount of data.
  tempdb table-drop issue: change SQL calls to not drop temp tables. Just let them go out of scope and be cleaned up by the deferred-drop mechanism.
More Info: Look in sys.dm_exec_requests to watch for statements waiting on PAGELATCH waits. Check the wait_resource column for the page on which contention is occurring. It will look like 2:1:103; the format is db_id:file_id:page_id, and those three values are the first three parameters of DBCC PAGE(db_id, file_id, page_id[, printopts]). In the example, you can see the page header with:
  DBCC TRACEON(3604)
  DBCC PAGE (2, 1, 103, 0)
See Paul Randal’s blog at www.sqlskills.com for much more detail.
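The page-type list and the "Who fixes it?" advice in the notes above amount to a lookup table. The sketch below encodes them in Python as an illustration. The function name and the wording of the returned messages are hypothetical, and the three-way split only mirrors the cases the notes name.

```python
# m_type values and names as listed in the notes above.
PAGE_TYPES = {
    1: "Data page", 2: "Index page", 3: "Text mix page", 4: "Text tree page",
    7: "Sort page", 8: "GAM", 9: "SGAM", 10: "IAM", 11: "PFS",
    13: "Boot page", 15: "File header", 16: "Diff map page", 17: "ML map page",
}

ALLOCATION_TYPES = {8, 9, 11}  # GAM, SGAM, PFS
ROW_TYPES = {1, 2}             # data and index pages


def classify_pagelatch(m_type: int) -> str:
    """Map a page header m_type to the likely cause named in the notes."""
    name = PAGE_TYPES.get(m_type, f"unknown type {m_type}")
    if m_type in ALLOCATION_TYPES:
        return f"{name}: allocation contention; consider adding data files"
    if m_type in ROW_TYPES:
        # In tempdb, a type 1 page can instead point at heavy #temp table drops.
        return f"{name}: possible hotspotting; review index order and row size"
    return f"{name}: not one of the cases covered in the notes"


print(classify_pagelatch(11))  # PFS: allocation contention; consider adding data files
print(classify_pagelatch(1))   # Data page: possible hotspotting; review index order and row size
```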

Classifying PAGELATCH_* Contention
To view a data page, use the DBCC PAGE command:
  DBCC PAGE(dbname | dbid, fileid, pageid [, printopts]) [WITH TABLERESULTS];
Note: to see the output of this and some other DBCC commands, you must enable trace flag 3604 in the session to redirect output to the console:
  DBCC TRACEON(3604);
printopts (output format):
  0 - print buffer and page header only (default)
  1 - print buffer and page headers, rows and offset table
  2 - print buffer and page headers, hex dump of data and offset table
  3 - print buffer and page headers, rows and offset table; each row is followed by each column value listed separately
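Turning a wait_resource value such as 2:1:103 into the DBCC PAGE call is mechanical, so it is easy to script. The helper below is a hypothetical Python sketch. It assumes the value is exactly db_id:file_id:page_id, as the notes describe, and it only builds the command text without connecting to a server.

```python
from typing import NamedTuple


class PageRef(NamedTuple):
    db_id: int
    file_id: int
    page_id: int


def parse_wait_resource(wait_resource: str) -> PageRef:
    """Split 'db_id:file_id:page_id' into its three integer parts."""
    parts = wait_resource.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected db_id:file_id:page_id, got {wait_resource!r}")
    db_id, file_id, page_id = (int(p) for p in parts)
    return PageRef(db_id, file_id, page_id)


def dbcc_page_script(ref: PageRef, printopt: int = 0) -> str:
    """Build the trace-flag and DBCC PAGE statements for one page."""
    if printopt not in (0, 1, 2, 3):
        raise ValueError("printopt must be 0, 1, 2, or 3")
    return (
        "DBCC TRACEON(3604);\n"
        f"DBCC PAGE({ref.db_id}, {ref.file_id}, {ref.page_id}, {printopt});"
    )


print(dbcc_page_script(parse_wait_resource("2:1:103")))
```

With printopt 0 the output is limited to the buffer and page header, which is enough to read m_type.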

PAGELATCH_* - Hotspotting
Symptom: PAGELATCH_* waits on data pages (header m_type = 1) or index pages (m_type = 2)
  Often at the ‘end’ of a table or index (identity index, datetime index, etc.)
How to address…
  Is this index or ordering scheme necessary?
  Can many single inserts be batched together?
  Edge case: could partition the index on a hash of the value (has its own problems, not a plug-n-play solution)
Could also be a ‘hot’ page: small table, many inserts and deletes

PAGELATCH_* - GAM, SGAM, PFS
Symptom: PAGELATCH_* waits on the following page types (more common in tempdb):
  GAM (header m_type = 8)
  SGAM (header m_type = 9)
  PFS (header m_type = 11)
These are allocation waits.
Usually solved by DBAs adding more data files.

Data File Structure
The base unit of data storage is called a page. All pages are 8KB (8,192 bytes).
Pages are organized into 8-page extents of 64KB.
First extent (8 pages, 64KB):
  Page 0: file header
  Page 1: PFS
  Page 2: GAM
  Page 3: SGAM
  Page 4: empty
  Page 5: empty
  Page 6: DCM
  Page 7: BCM
Following pages: Page 8, Page 9*, Page 10, Page 11, … Page 15

Bitmap Metadata Pages
GAM, SGAM, BCM, DCM, IAM, …
Page layout: 96-byte header, then a bitmap (1101011000101010111010111…) with one bit per extent.
[Diagram: pages P.0 through P.27 grouped into extents of eight: First Extent, Second Extent, …]
8,096 bytes = 64,768 bits
1 bit/extent = 64,768 extents
64,768 extents * 64KB/extent = ~4GB of disk space per single GAM
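The "~4GB per GAM" figure follows directly from the numbers on this slide. A minimal Python check, using only the slide's values and its simplification that every non-header bit maps to one extent (the function name is illustrative):

```python
PAGE_BYTES = 8192    # 8KB page
HEADER_BYTES = 96    # page header
EXTENT_KB = 64       # 8 pages of 8KB each


def gam_coverage():
    """Return (bitmap bits, KB covered, GiB covered) for one bitmap page."""
    bitmap_bits = (PAGE_BYTES - HEADER_BYTES) * 8  # one bit per extent
    covered_kb = bitmap_bits * EXTENT_KB
    covered_gib = covered_kb / (1024 * 1024)
    return bitmap_bits, covered_kb, covered_gib


bits, kb, gib = gam_coverage()
print(bits)  # 64768
print(kb)    # 4145152
print(gib)   # 3.953125
```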

File Space Allocation
Proportional Fill in action
[Animated diagram: File 1 and its free space]

Multiple-File Benefits
Multi-core systems introduce contention issues.
[Diagram: File 1, File 2, File 3, File 4; the first 4GB of a file (0 GB to 4 GB) begins with Page 0 File Header, Page 1 PFS, Page 2 GAM, Page 3 SGAM]
1 GAM and 1 SGAM for every 64,768 extents (4GB of file space). All allocations in that space affect the GAM.
There are 64 PFS pages in the same 4GB (every 8,088 pages). Usually not an issue, except in TempDB.
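Because PFS pages sit at fixed positions, a page ID alone can indicate whether a contended page is a PFS page. The Python sketch below assumes the layout described in these slides: the first PFS page is page 1, and later ones fall on multiples of 8,088. The function names are hypothetical.

```python
PFS_INTERVAL = 8088  # pages between PFS pages, per the slides


def is_pfs_page(page_id: int) -> bool:
    """Page 1 is the first PFS page; later ones fall on multiples of 8,088."""
    return page_id == 1 or (page_id > 0 and page_id % PFS_INTERVAL == 0)


def covering_pfs_page(page_id: int) -> int:
    """Return the PFS page for the 8,088-page interval that contains page_id."""
    if page_id < PFS_INTERVAL:
        return 1
    return (page_id // PFS_INTERVAL) * PFS_INTERVAL


print(is_pfs_page(8088))         # True
print(is_pfs_page(103))          # False
print(covering_pfs_page(20000))  # 16176
```

Every data file has its own set of these pages, which is why adding files spreads allocation contention across more of them.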

PAGEIOLATCH_* - I/O Waits
Examples: PAGEIOLATCH_SH, PAGEIOLATCH_EX
Indicates: scanning from disk; may also indicate low memory or slow read I/O
Who fixes it? Developers fix scans; operations adds memory and checks disk performance
More info: check the waits query for waiting statements and check the plans

Query Engine vs. Storage Engine
Query Engine:
  What you normally talk to
  Runs queries
  Processes data
  Only operates on memory
  Unaware of physical storage
  Calls the Storage Engine for data not in memory and for writing changes to disk
Storage Engine:
  All disk activity
  Pulls data from disk and places it in memory for the Query Engine
  Pulls changed data from memory to store on disk

PAGEIOLATCH – More Detail
[Diagram: reading page 1:100 into the data cache through its BUF structure (64 bytes). Labels on the diagram: LATCH_EX the BUF; call async I/O fetch; PAGEIOLATCH_EX wait; release LATCH_EX; acquire LATCH_SH; LATCH_SH the BUF; PAGEIOLATCH_SH wait; read page 1:100]

Sublatches and SuperLatches
Multi-core issue: several separate threads reading (LATCH_SH) the same page.
Perf trick: give each core a copy of the BUF. This is a sublatch.
Need to coordinate them all with a single ‘master’ copy. This is a superlatch.
Great for reads. Bad for writes.