What Is a Latch? …and Why Do I Care? Eddie Wuerch, mcm


1 What Is a Latch? …and Why Do I Care? Eddie Wuerch, mcm
Salesforce Marketing Cloud

2 Hi! I’m Eddie :)
Over 15 years with SQL Server
Microsoft Certified Master
Salesforce Marketing Cloud: trillions of rows … tens of billions of transactions per day … petabytes of data and indexes … 24x7, no downtime

3 The Three ‘C’s of Performance
Capacity Configuration Code Disk Performance Memory Capacity Disk Performance Disk Allocation Contention Scans Hotspotting Insert/Update/Delete Metadata Contention TempDB Abuse

4 PAGELATCH_* PAGEIOLATCH_*

5 Perf Monitoring: Waits
A process is either waiting or working.
Not working? It’s waiting on something specific:
  A lock
  Data
  Memory
  CPU
  Something to do
Reading wait statistics is the first step of the Waits and Queues method.
See the Microsoft whitepaper “SQL Server 2005 Waits and Queues”.

6 What is a ‘Wait’?
[Timeline diagram: Running → Suspended → Runnable → Running]
Request for unavailable resource: processing stops, wait begins (Running → Suspended)
Process signaled, resource available: wait continues (Suspended → Runnable)
Scheduler available: wait ends, processing continues (Runnable → Running)
Resource Wait Time (ms): time spent Suspended
Signal Wait Time (ms): time spent Runnable
Wait Time (ms) = Resource Wait Time (ms) + Signal Wait Time (ms)
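A minimal sketch of the split shown on this timeline. It assumes the two numbers come from the wait_time_ms and signal_wait_time_ms columns of sys.dm_os_wait_stats, where the total already includes the signal portion. The function name and the sample values are illustrative, not output from a real server.

```python
def split_wait(wait_time_ms: int, signal_wait_time_ms: int) -> dict:
    """Split a total wait into its resource and signal portions."""
    if signal_wait_time_ms > wait_time_ms:
        raise ValueError("signal wait cannot exceed total wait")
    # Resource wait is whatever part of the total was not spent Runnable.
    resource_ms = wait_time_ms - signal_wait_time_ms
    signal_pct = 100.0 * signal_wait_time_ms / wait_time_ms if wait_time_ms else 0.0
    return {
        "resource_wait_ms": resource_ms,
        "signal_wait_ms": signal_wait_time_ms,
        "signal_pct": signal_pct,
    }


# Illustrative numbers only.
sample = split_wait(wait_time_ms=1200, signal_wait_time_ms=300)
print(sample)
```

A high signal percentage suggests time spent waiting for a scheduler (CPU pressure) rather than for the resource itself.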

7 Latch Waits
Fall into two main categories:
  PAGELATCH_*
  PAGEIOLATCH_*
Similar names, different issues, different solutions.
So… what is a latch?

8 Latching – Why?
Data page (8KB)
Page header – 96 bytes of page metadata:
  Page ID
  Next Page ID
  Previous Page ID
  Owning object ID
  Index ID
  Index level
  Free space on page
  Next row offset
  …other stuff
Page row data – starts at byte 97
Row-offset table – starts at last byte, moves backwards

9 Without Latching…
Data page 1:1000 (8KB), Next row offset = 0x200
  0x080: Row 1 Data
  0x140: Row 2 Data
  0x200: Free
Process 1: Insert new row at offset 0x200 (New Row Data – Process 1)
Process 2: Insert new row at offset 0x200 (New Row Data – Process 2)
Both processes read the same next row offset and write to 0x200, so one new row overwrites the other.
Source: Microsoft PSS Presentations
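The race on this slide can be sketched with a toy page model. Everything here is hypothetical and for illustration only: the Page class, the write_row helper, and the 0x60 row size (picked so that 0x200 + 0x60 lands on 0x260, the offset used in the next slide). It is not how the storage engine is implemented.

```python
ROW_SIZE = 0x60  # assumed row size, so that 0x200 + 0x60 = 0x260


class Page:
    """Toy data page: a next-row offset plus an offset -> row map."""

    def __init__(self, next_row_offset):
        self.next_row_offset = next_row_offset
        self.rows = {}


def write_row(page, offset, row):
    # Write the row at the given offset and advance the next-row offset.
    page.rows[offset] = row
    page.next_row_offset = offset + ROW_SIZE


# Without a latch: both processes read the offset before either one writes.
unlatched = Page(0x200)
p1_offset = unlatched.next_row_offset
p2_offset = unlatched.next_row_offset  # stale copy of the same value
write_row(unlatched, p1_offset, "Process 1")
write_row(unlatched, p2_offset, "Process 2")  # overwrites Process 1's row

# With a latch: read-offset and write-row happen as one unit per process.
latched = Page(0x200)
for row in ("Process 1", "Process 2"):
    offset = latched.next_row_offset
    write_row(latched, offset, row)

print({hex(k): v for k, v in unlatched.rows.items()})
print({hex(k): v for k, v in latched.rows.items()})
```

In the unlatched run only one row survives at 0x200; in the latched run the second insert lands at 0x260.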

10 PAGELATCH_UP – Data Page Example
Data page 1:1000 (8KB), Next row offset = 0x200
  0x080: Row 1 Data
  0x140: Row 2 Data
  0x200: Free
Process 1: LATCH_UP on page 1:1000; insert new row at offset 0x200 (New Row Data – Process 1); Next row offset becomes 0x260
Process 2: LATCH_UP on page 1:1000; PAGELATCH_UP wait while Process 1 holds the latch; then insert new row at offset 0x260 (New Row Data – Process 2)
Source: Microsoft PSS Presentations

11 PAGELATCH_* - Data-in-Memory Modification Wait
Usually indicates one of:
  Hotspotting
  GAM, SGAM, or PFS contention
  Creating and dropping lots of #temp tables (which is actually a combination of the previous two)
Examples:
  PAGELATCH_SH – waiting to get a shared latch (lock) on a memory page
  PAGELATCH_EX – waiting to get an exclusive latch (lock) on a memory page
Indicates: multiple possible issues, depending on the page on which the latch contention is occurring:
  Inserts/updates/page splits outpacing allocation: there is one GAM page and one SGAM page for each 4GB in each data file. Many processes requesting newly allocated pages in the file fight over these pages. A PFS page exists every 8,088 pages in a file; many BLOB and other shared-extent inserts can cause latch contention on these pages.
  Hotspotting: many separate threads inserting rows into a single data page, such as an index on an identity column
  Adding and dropping lots of #temp tables (the drop is the issue)
Use DBCC PAGE(db_id, file_id, page_id, printopts) on the page (displayed in the wait_resource column of the waits query) to determine the page type (see the m_type field in the page header).
Page types (m_type in page header):
  1 – In-row data page – could be hotspotting in user databases; indicates a problem with too many #temp table drops if the object is sys.sysmultiobjrefs
  2 – Index page – all non-leaf clustered index pages and all non-clustered index pages
  3 – Text mix page – parts of LOB values plus internal parts of text trees
  4 – Text tree page – large chunks of LOB data from a single value
  7 – Sort page
  8 – GAM
  9 – SGAM
  10 – IAM – Index Allocation Map – a bitmap similar to the GAM and SGAM for tracking all extents within the 4GB file block for an index or allocation unit
  11 – PFS
  13 – Boot page – database info, only one per database (file 1 page 9)
  15 – File header
  16 – Diff map page (DCM or diff change map) – tracks which extents in the GAM interval have changed since the last full backup
  17 – ML map page (BCM or bulk change map) – tracks which extents in the GAM interval have changed in bulk-logged mode since the last full backup
Who fixes it?
  GAM, SGAM, and PFS contention: DBAs add more data files.
  Hotspotting: tuning options include shrinking row sizes (varchar vs. char, char vs. nchar, tinyint/smallint/int/bigint, smalldatetime/datetime) and SQL Server compression, so fewer pages are allocated to a table for the same amount of data.
  tempdb table-drop issue: change SQL calls to not drop temp tables. Just let them go out of scope and be cleaned up by the deferred-drop mechanism.
More info:
  Look in sys.dm_exec_requests to watch for statements waiting on PAGELATCH waits. Check the wait_resource column for the page on which contention is occurring. It will look like 2:1:103; the format is db_id:file_id:page_id, and those three values are the first three parameters of DBCC PAGE(db_id, file_id, page_id[, printopts]). For that example, you can see the page header with:
    DBCC TRACEON(3604)
    DBCC PAGE (2, 1, 103, 0)
  See Paul Randal’s blog for much more detail.
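As a rough sketch of the last step in these notes, the snippet below turns a wait_resource value such as 2:1:103 into the matching DBCC PAGE call. The function names are hypothetical helpers, and the code assumes the plain db_id:file_id:page_id form quoted in the notes with a print option in the range 0 to 3; other wait_resource formats are not handled.

```python
def parse_wait_resource(wait_resource: str):
    """Parse 'db_id:file_id:page_id' into a tuple of three integers."""
    parts = wait_resource.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected db_id:file_id:page_id, got {wait_resource!r}")
    db_id, file_id, page_id = (int(p) for p in parts)
    return db_id, file_id, page_id


def dbcc_page_command(wait_resource: str, printopt: int = 0) -> str:
    """Build the T-SQL text for inspecting the contended page."""
    if printopt not in (0, 1, 2, 3):
        raise ValueError("printopt must be 0, 1, 2, or 3")
    db_id, file_id, page_id = parse_wait_resource(wait_resource)
    # Trace flag 3604 sends DBCC output to the client session.
    return f"DBCC TRACEON(3604); DBCC PAGE ({db_id}, {file_id}, {page_id}, {printopt});"


command = dbcc_page_command("2:1:103")
print(command)
```

The generated text is only a string; it still has to be run against the server by whatever client is in use.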

12 Classifying PAGELATCH_* Contention
To view a data page, use the DBCC PAGE command:
  DBCC PAGE(dbname | dbid, fileid, pageid [, printopts]) [WITH TABLERESULTS];
Note: to see the output of this and some other DBCC commands, you must enable trace flag 3604 in the session to redirect output to the console:
  DBCC TRACEON(3604);
printopts – output format:
  0 – print buffer and page header only (default)
  1 – print buffer and page headers, rows and offset table
  2 – print buffer and page headers, hex dump of data and offset table
  3 – print buffer and page headers, rows and offset table; each row is followed by each column value listed separately

13 PAGELATCH_* - Hotspotting
Symptom: PAGELATCH_* waits on data pages (header m_type = 1) or index pages (m_type = 2)
  Often at the ‘end’ of a table or index (identity index, datetime index, etc.)
How to address…
  Is this index or ordering scheme necessary?
  Can many single inserts be batched together?
  Edge case – could partition the index on a hash of the value (has its own problems, not a plug-n-play solution)
Could also be a ‘hot’ page – small table, many inserts and deletes
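The hash-partitioning edge case can be illustrated with a small sketch. It assumes the simplest possible hash (key modulo bucket count) and a bucket count of 8; both are illustrative choices, and hash_bucket is a hypothetical helper rather than anything SQL Server provides.

```python
def hash_bucket(key: int, bucket_count: int) -> int:
    """Map a key to a bucket; modulo stands in for a real hash function."""
    return key % bucket_count


BUCKETS = 8  # assumed bucket count for the illustration

# Sixteen consecutive identity values, as a busy insert workload would produce.
identity_values = range(1000, 1016)

placement = {}
for value in identity_values:
    placement.setdefault(hash_bucket(value, BUCKETS), []).append(value)

for bucket in sorted(placement):
    print(bucket, placement[bucket])
```

Consecutive values land in different buckets, so concurrent inserts are spread over several 'last' pages instead of one. The trade-off is that a range scan on the original value has to visit every bucket, which is one reason this is not a plug-n-play fix.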

14 PAGELATCH_* - GAM, SGAM, PFS
Symptom: PAGELATCH_* waits on the following page types:
  GAM (header m_type = 8)
  SGAM (header m_type = 9)
  PFS (header m_type = 11) (more common in tempdb)
These are allocation waits
Usually solved by DBAs adding more data files
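A small sketch of the triage this deck describes: read m_type from the page header, then decide whether the contention looks like allocation or hotspotting. The m_type numbers come from the page-type list in the slide 11 notes; the function name and the wording of the suggestions are illustrative assumptions.

```python
# m_type values as listed in the notes for slide 11.
PAGE_TYPES = {
    1: "data", 2: "index", 3: "text mix", 4: "text tree", 7: "sort",
    8: "GAM", 9: "SGAM", 10: "IAM", 11: "PFS", 13: "boot",
    15: "file header", 16: "diff map", 17: "ML map",
}

ALLOCATION_TYPES = {8, 9, 11}  # GAM, SGAM, PFS
HOTSPOT_TYPES = {1, 2}         # data and index pages


def classify_latch_page(m_type: int) -> str:
    """Suggest a first line of investigation for a contended page type."""
    name = PAGE_TYPES.get(m_type, "unknown")
    if m_type in ALLOCATION_TYPES:
        return f"{name}: allocation contention, consider adding data files"
    if m_type in HOTSPOT_TYPES:
        return f"{name}: possible hotspotting, review index and insert pattern"
    return f"{name}: no specific guidance in this deck"


for m_type in (1, 8, 11, 10):
    print(m_type, classify_latch_page(m_type))
```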

15 Data File Structure
The base unit of data storage is called a page. All pages are 8KB (8192 bytes).
Pages are organized into 8-page extents of 64KB.
First Extent (8 pages – 64KB):
  Page 0 – File header
  Page 1 – PFS
  Page 2 – GAM
  Page 3 – SGAM
  Page 4 – empty
  Page 5 – empty
  Page 6 – DCM
  Page 7 – BCM
Following pages: Page 8, Page 9*, Page 10, Page 11 … Page 15

16 Bitmap Metadata Pages
GAM, SGAM, BCM, DCM, IAM, …
[Diagram: 96-byte header, then a bitmap with one bit per extent; first extent = P.0–P.7, second extent = P.8–P.15, and so on]
8,096 bytes = 64,768 bits
1 bit/extent = 64,768 extents
64,768 extents * 64KB/extent = ~4GB of disk space per single GAM
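The arithmetic on this slide can be checked in a few lines. This follows the slide's simplified model, in which everything after the 96-byte header is bitmap; treat the result as the deck's approximation rather than an exact on-disk figure.

```python
PAGE_BYTES = 8192    # 8KB page
HEADER_BYTES = 96    # page header
EXTENT_KB = 64       # 8 pages of 8KB each

bitmap_bytes = PAGE_BYTES - HEADER_BYTES   # bytes left for the bitmap
bitmap_bits = bitmap_bytes * 8             # one bit tracks one extent
covered_kb = bitmap_bits * EXTENT_KB       # file space tracked by one page
covered_gib = covered_kb / (1024 * 1024)

print(f"{bitmap_bytes:,} bytes = {bitmap_bits:,} bits")
print(f"{bitmap_bits:,} extents * {EXTENT_KB}KB = {covered_gib:.2f} GB")
```

The result comes out just under 4GB, which is where the '~4GB per single GAM' figure comes from.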

17 File Space Allocation
Proportional Fill in action
[Diagram: File 1, Free Space]

18 File Space Allocation
Proportional Fill in action
[Diagram: File 1, Free Space]

19 Multiple-File Benefits
Multi-core systems introduce contention issues
[Diagram: File 1, File 2, File 3, File 4; each file runs from 0 GB to 4 GB and starts with Page 0 File Header, Page 1 PFS, Page 2 GAM, Page 3 SGAM]
1 GAM and 1 SGAM for every 64,768 extents (4GB of file space). All allocations in that space affect the GAM.
There are 64 PFS pages in the same 4GB (every 8,088 pages). Usually not an issue, except in TempDB.

20 PAGEIOLATCH_* - I/O Waits
Examples: PAGEIOLATCH_SH, PAGEIOLATCH_EX
Indicates: scanning from disk; may also indicate low memory or slow read I/O
Who fixes it? Developers fix scans; operations adds memory and checks disk performance
More info: check the waits query for waiting statements and check the plans

21 Query Engine vs. Storage Engine
Query Engine
  What you normally talk to
  Runs queries
  Processes data
  Only operates on memory
  Unaware of physical storage
  Calls the Storage Engine for data not in memory and for writing changes to disk
Storage Engine
  All disk activity
  Pulls data from disk and places it in memory for the Query Engine
  Pulls changed data from memory to store on disk

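22 PAGEIOLATCH – More Detail
[Diagram: reading page 1:100 into the data cache through its BUF (64 bytes)]
Labels shown for the reader that issues the I/O: LATCH_EX the BUF; Call Async IO Fetch; LATCH_SH the BUF (PAGEIOLATCH_SH Wait); Release LATCH_EX; Acquire LATCH_SH; Read Page 1:100
Other labels shown: PAGEIOLATCH_EX Wait; Read Page 1:100; Data cache: Page 1:100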

23 Sublatches and SuperLatches
Multi-core issue: several separate threads reading (LATCH_SH) the same page.
Perf trick: give each core a copy of the BUF. This is a sublatch.
Need to coordinate them all with a single ‘master’ copy. This is a superlatch.
Great for reads. Bad for writes.

