1
SQL Server Availability Groups
TechReady 23 3/28/2020 3:36 PM
March 13th, 2018, STL SQL Server User Group
Luke Newport, Technical Specialist
© 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
2
Session objectives and takeaways
Assumes some basic familiarity with Availability Groups, presented from the viewpoint of a Microsoft Support Engineer.
Overview of HA/DR in SQL Server
AlwaysOn Availability Groups in SQL Server 2012 through 2017
Log internals: Log Pool design and behavior; log block behavior in synchronous and asynchronous mode
Common customer support scenarios
Troubleshooting
Self-study and lab resources
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
3
Overview of HA/DR in SQL Server / Azure
AlwaysOn Availability Groups (high availability and disaster recovery): availability replicas running across multiple datacenters, in private or public cloud scenarios, for high availability or disaster recovery.
AlwaysOn Failover Clustering (high availability only): failover clustering on WSFC using a shared-storage architecture. Protects SQL Server at the instance level; does not protect against data loss or data corruption.
Database Mirroring (HA/DR): a solution for increasing the availability of a SQL Server database. Mirroring is implemented on a per-database basis and works only with databases that use the full recovery model. This feature will be removed in a future version of Microsoft SQL Server; use Always On Availability Groups instead.
SQL Replication (not commonly used for HA/DR scenarios): typically complemented by other AlwaysOn technologies. Three types: transactional (including peer-to-peer transactional), merge, and snapshot.
Log Shipping (DR scenarios): DR for a single primary database and one or more secondary databases, each on a separate instance of SQL Server, with a user-specified delay between when the primary server backs up the log of the primary database and when the secondary servers must restore (apply) the log backup.
Other considerations:
Azure Site Recovery: handles replication, failover, and recovery for on-premises workloads and applications. Can replicate on-premises servers, Hyper-V virtual machines, and VMware virtual machines.
Backup to Azure: back up SQL Server on-premises or IaaS databases to Azure Blob Storage.
4
Always On Availability Groups SQL 2012 through SQL 2017
5
Availability Group
Unit of high availability: a group of databases that fail over as a unit
Each replica is hosted by a SQL Server instance
Diagram: a Windows Server Failover Cluster with a primary replica and a secondary replica of availability group A, each hosting AG1 (DB1, DB2)
The AlwaysOn Availability Groups feature was introduced in SQL Server 2012. It is a high-availability and disaster-recovery solution that provides an enterprise-level alternative to database mirroring. AlwaysOn Availability Groups maximizes the availability of a set of user databases for an enterprise. The image shows a standalone SQL Server instance configured as the primary replica of an availability group (A), with a synchronous secondary replica configured on another standalone SQL Server instance.
An availability group is a Windows Server Failover Cluster (WSFC) resource group. It defines a set of failover partners known as replicas: a primary replica and one or more secondary replicas. A replica is the location of one or more availability databases that are defined in the availability group. Although availability groups use WSFC, they do not require the SQL Server instance to be clustered. Instead, they use WSFC resource health detection and failover capabilities to manage a group of databases.
Related DMV: sys.availability_groups. For additional information refer to “Overview of AlwaysOn Availability Groups (SQL Server)”.
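To see which availability groups an instance knows about and which databases fail over together in each one, the catalog view named above can be joined to sys.availability_databases_cluster. A minimal sketch, assuming it is run on an instance that hosts a replica of the group; no AG or database names are assumed.

```sql
-- List each availability group and the databases that fail over with it as a unit.
SELECT ag.name            AS availability_group,
       adc.database_name  AS availability_database
FROM sys.availability_groups AS ag
JOIN sys.availability_databases_cluster AS adc
    ON adc.group_id = ag.group_id
ORDER BY ag.name, adc.database_name;
```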
6
Availability Replica
Each replica hosts a copy of the databases in the AG
The active replica is called the primary; every other replica is a secondary
A secondary can be readable, non-readable, or read-intent only
Diagram: a Windows Server Failover Cluster with primary replica A and secondary replica A, each hosting AG1
Each availability group defines a set of two or more failover partners known as availability replicas. Each availability replica hosts a copy of the availability databases in the availability group. For a given availability group, the availability replicas must be hosted by separate instances of SQL Server residing on different nodes of a WSFC cluster. A given SQL Server instance can host only one availability replica per availability group. However, each SQL Server instance can be used for many availability groups. A given instance can be either a stand-alone SQL Server instance or a SQL Server failover cluster instance (FCI).
Every availability replica is assigned an initial role, either the primary role or the secondary role, which is inherited by the availability databases of that replica. The role of a given replica determines whether it hosts read-write databases or read-only databases. One replica, known as the primary replica, is assigned the primary role and hosts read-write databases, which are known as primary databases. At least one other replica, known as a secondary replica, is assigned the secondary role. A secondary replica hosts read-only databases, known as secondary databases.
Each secondary replica can be configured with one of three readable options:
Not Readable: the replica is not available for read-only workloads when acting as a secondary.
Readable: the replica is available to all connections with read-only workloads when acting as a secondary.
Read-Intent: the replica is available only to connections that specify the ApplicationIntent=ReadOnly property in their connection string.
Related DMV: sys.availability_replicas. For additional information refer to “Availability Replicas” ( us/library/ff aspx#AGsARsADBs).
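The role and readable option of each replica can be read back from the catalog and the HADR DMVs, and the readable option is changed per replica with ALTER AVAILABILITY GROUP. The three options above map to ALLOW_CONNECTIONS = NO, ALL and READ_ONLY. A sketch, assuming a hypothetical group AG1 with a secondary named SQLINST2; on a secondary, sys.dm_hadr_availability_replica_states returns only the local replica, hence the LEFT JOIN.

```sql
-- Current role and readable-secondary setting for every replica visible here.
SELECT ag.name                                AS availability_group,
       ar.replica_server_name,
       ars.role_desc,                             -- PRIMARY / SECONDARY / RESOLVING
       ar.availability_mode_desc,
       ar.secondary_role_allow_connections_desc   -- NO / READ_ONLY / ALL
FROM sys.availability_replicas AS ar
JOIN sys.availability_groups AS ag
    ON ag.group_id = ar.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id
ORDER BY ag.name, ar.replica_server_name;

-- Hypothetical names: make SQLINST2 in AG1 read-intent only (run on the primary).
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'SQLINST2'
    WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
```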
7
Availability Groups in SQL Server 2012
HA/DR for groups of databases: databases fail over together from one SQL Server instance to another, with up to 4 secondaries
Depends on Windows Clustering
Failovers can be automatic or manual
Automatic: SQL Server failures (software/hardware/network)
Manual: for patching or upgrades
SQL Server replicates transactions from the primary to the secondaries
Physical replication (log records are captured, delivered, stored, and applied)
No shared storage (SAN) needed
Replicas can be synchronous or asynchronous
Synchronous: no data loss. Generally in the same or a nearby region (<20 ms RTT)
Asynchronous: potential data loss. Generally in a remote region, or for read scale-out
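The synchronous/automatic and manual/forced choices above are all expressed in T-SQL. A minimal sketch, assuming hypothetical instances SQLINST1 and SQLINST2 with HADR enabled, database mirroring endpoints already created on port 5022, and databases DB1 and DB2 in the full recovery model with a full backup taken; all names and URLs are placeholders.

```sql
-- Run on SQLINST1 (initial primary). Both replicas are synchronous-commit,
-- which is what allows FAILOVER_MODE = AUTOMATIC.
CREATE AVAILABILITY GROUP [AG1]
FOR DATABASE [DB1], [DB2]
REPLICA ON
    N'SQLINST1' WITH (
        ENDPOINT_URL = N'TCP://sqlinst1.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC),
    N'SQLINST2' WITH (
        ENDPOINT_URL = N'TCP://sqlinst2.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC);

-- Run on SQLINST2: join the group. The databases still have to be restored
-- WITH NORECOVERY and joined (ALTER DATABASE ... SET HADR AVAILABILITY GROUP).
ALTER AVAILABILITY GROUP [AG1] JOIN;

-- Planned manual failover (e.g. before patching): run on the synchronized
-- secondary that should become the new primary. No data loss.
ALTER AVAILABILITY GROUP [AG1] FAILOVER;

-- Forced failover for DR when the primary is lost; data loss is possible.
-- ALTER AVAILABILITY GROUP [AG1] FORCE_FAILOVER_ALLOW_DATA_LOSS;
```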
8
Availability Groups in SQL Server 2014
Increased the number of secondaries from 4 to 8: 9 replicas in total (1 primary, 8 secondaries)
Read scale-out (7x faster than Replication) with <2% performance impact
Increased readable-secondary availability: read workloads remain available despite disconnections from the primary or quorum loss (previously, a secondary that lost its connection to the primary dropped its read connections)
SSMS wizard to add replicas in an Azure VM: an easy and cost-effective solution for disaster recovery
Local region copies
9
Availability Groups in SQL Server 2016
Higher Availability: 3 automatic failover replicas; database-level failure detection; support for gMSAs (Group Managed Service Accounts)
Orthogonality: support for DTC (Distributed Transaction Coordinator); read tables with columnstore indexes on secondaries
Scalability: much higher synchronization throughput
Flexibility: replicas in different domains or domain-less; Distributed Availability Groups
Ease of Use: load balancing of read workloads; automatic seeding of secondary replicas
Licensing: Basic Availability Groups in SQL Server Standard Edition
Speaker note: overview slide, move on to details.
10
Higher Availability : SQL Server 2016
3 automatic failover replicas
Maintains availability despite 2 failed replicas
The listener connects to the primary as before
Failover priority is defined in the Windows Cluster's preferred owners
Increases AG availability
Diagram: AG_Listener in front of Replica1 (Primary), Replica2 (Secondary) and Replica3 (Secondary), each hosting DB; the primary role moves between replicas on failover
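Adding a third automatic-failover partner is a matter of making that replica synchronous and then setting its failover mode, since automatic failover requires synchronous commit. A sketch, assuming a hypothetical group AG1 with an existing asynchronous replica SQLINST3; the two options are set in separate statements, run on the primary.

```sql
-- Hypothetical names. Automatic failover requires synchronous commit,
-- so switch the availability mode first.
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'SQLINST3'
    WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'SQLINST3'
    WITH (FAILOVER_MODE = AUTOMATIC);

-- Confirm which replicas are now automatic-failover partners.
SELECT ar.replica_server_name, ar.availability_mode_desc, ar.failover_mode_desc
FROM sys.availability_replicas AS ar
JOIN sys.availability_groups AS ag ON ag.group_id = ar.group_id
WHERE ag.name = N'AG1';
```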
11
Higher Availability : SQL Server 2016
Database-level failure detection
Automatic failover if any user database in the availability group goes offline unexpectedly (e.g. due to inaccessible database files)
This is an option on top of the availability group health level, which evaluates server health using sp_server_diagnostics
    CREATE AVAILABILITY GROUP ag_name
    WITH (
        FAILURE_CONDITION_LEVEL = { 1 | 2 | 3 | 4 | 5 },
        DB_FAILOVER = { ON | OFF } )
Prior to 2016, detection was server level only, ranging from level 1 (service is down) to level 5 (query-level errors such as deadlocks)
Ensures database availability
Diagram: database DB with its MDF and LDF files
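Both settings can also be changed on an existing group and checked in sys.availability_groups. A sketch, assuming a hypothetical group AG1 and that it is run on the primary replica; level 3 is the default failure condition level.

```sql
-- Hypothetical AG name; run on the primary replica.
-- Turn on database-level health detection (SQL Server 2016 and later).
ALTER AVAILABILITY GROUP [AG1] SET (DB_FAILOVER = ON);

-- Server-level flexible failover policy (3 is the default).
ALTER AVAILABILITY GROUP [AG1] SET (FAILURE_CONDITION_LEVEL = 3);

-- Verify the settings.
SELECT name, db_failover, failure_condition_level, health_check_timeout
FROM sys.availability_groups
WHERE name = N'AG1';
```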
12
Higher Availability : SQL Server 2016
Support for gMSAs (Group Managed Service Accounts)
Active Directory handles password changes for service accounts without restarting SQL Server services
120-character passwords
Requires Windows Server
Avoids downtime
Simplifies management
13
Orthogonality : SQL Server 2016
Support for DTC (Distributed Transaction Coordinator)
Ensures atomicity of distributed transactions, even if they are active when the AG fails over
Distributed transactions between different SQL Servers, or between SQL Server and other DTC-compliant servers (e.g. Oracle Database or WebSphere MQ)
The AG registers its databases in DTC with a GUID
Requires Windows Server 2012 R2 or later
    CREATE AVAILABILITY GROUP ag_name
    WITH ( DTC_SUPPORT = { PER_DB | NONE } )
Important for legacy enterprise applications
Diagram: Replica1 (Primary) DB and Replica2 (Secondary) DB
Speaker note: previously, we couldn't guarantee that a distributed transaction would remain atomic if a failover occurred while it was in flight.
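A fuller version of the syntax above, as a sketch with hypothetical instance, database and endpoint names. In SQL Server 2016 the DTC setting is given at creation time; later builds are documented to also accept ALTER AVAILABILITY GROUP ... SET (DTC_SUPPORT = PER_DB), so check the documentation for the build in use.

```sql
-- Hypothetical names. PER_DB registers each availability database with DTC.
CREATE AVAILABILITY GROUP [AG1]
WITH (DTC_SUPPORT = PER_DB)
FOR DATABASE [DB1]
REPLICA ON
    N'SQLINST1' WITH (
        ENDPOINT_URL = N'TCP://sqlinst1.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC),
    N'SQLINST2' WITH (
        ENDPOINT_URL = N'TCP://sqlinst2.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC);

-- 1 = per-database DTC support enabled, 0 = NONE.
SELECT name, dtc_support FROM sys.availability_groups;
```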
14
Scalability : SQL Server 2016
Much higher synchronization throughput: 50 MB/s to 500 MB/s (10X)
Multi-threaded log compression/decompression
New compression function: LZ4
Multi-threaded log redo
Keeps secondaries synchronized irrespective of workload size (low RTO)
Important for heavy workloads on fast I/O cards
Speaker note: a customer hit a limit of about 100 MB/s in the SQL Server 2012 timeframe.
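Whether secondaries keep up with the workload shows in the send and redo queues and rates that SQL Server exposes per database and replica. A sketch, assuming it is run on the primary (which returns rows for every replica); sizes are reported in KB and rates in KB/sec.

```sql
-- Per-database log send and redo throughput, as seen from the primary.
SELECT ar.replica_server_name,
       adc.database_name,
       drs.synchronization_state_desc,
       drs.log_send_queue_size,   -- KB not yet sent to this replica
       drs.log_send_rate,         -- KB/sec
       drs.redo_queue_size,       -- KB received but not yet redone
       drs.redo_rate              -- KB/sec
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id
JOIN sys.availability_databases_cluster AS adc
    ON adc.group_database_id = drs.group_database_id
ORDER BY ar.replica_server_name, adc.database_name;
```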
15
Flexibility : SQL Server 2016
Replicas in different domains or domain-less
SQL Server management doesn't change
Windows Cluster nodes are configured with certificate-based authentication (like Database Mirroring)
Requires Windows Server 2016
Important for different organizational domains
Diagram: AG_Listener spanning Domain A and Domain B, with Replica1 (Primary), Replica2 (Secondary) and Replica3 (Secondary), each hosting DB
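Without a shared domain, the SQL Server side authenticates replica-to-replica traffic with certificates on the mirroring endpoints, as Database Mirroring did. A minimal sketch of that endpoint setup, assuming hypothetical instances SQLINST1 and SQLINST2, placeholder passwords and a placeholder file path; the cluster-level workgroup configuration is a separate Windows step not shown here.

```sql
-- On SQLINST1: create a certificate and an endpoint that authenticates with it.
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = N'<strong password here>';
CREATE CERTIFICATE SQLINST1_cert
    WITH SUBJECT = N'SQLINST1 AG endpoint certificate';

CREATE ENDPOINT [Hadr_endpoint]
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
    FOR DATABASE_MIRRORING (
        AUTHENTICATION = CERTIFICATE SQLINST1_cert,
        ENCRYPTION = REQUIRED ALGORITHM AES,
        ROLE = ALL);

-- Export the public key so the other replicas can trust this endpoint.
BACKUP CERTIFICATE SQLINST1_cert TO FILE = N'C:\certs\SQLINST1_cert.cer';

-- On SQLINST2 (after copying the .cer file): trust SQLINST1 and let it connect.
CREATE LOGIN SQLINST1_login WITH PASSWORD = N'<strong password here>';
CREATE USER SQLINST1_user FOR LOGIN SQLINST1_login;
CREATE CERTIFICATE SQLINST1_cert
    AUTHORIZATION SQLINST1_user
    FROM FILE = N'C:\certs\SQLINST1_cert.cer';
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO SQLINST1_login;
```

The same steps are mirrored in the other direction (SQLINST2's certificate trusted on SQLINST1).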
16
Flexibility : SQL Server 2016
Distributed Availability Groups
One availability group can synchronize to one or more availability groups
Reduces the primary replica's network and CPU usage
The availability groups are in different Windows Clusters
Flexible management of Windows Clusters (e.g. independent quorum)
Uni-directional synchronization (only one master replica), using different listeners
    -- Run on the primary replica of ag1. Distributed AGs support only
    -- manual failover, so FAILOVER_MODE is MANUAL for both members.
    -- 'tcp://ag2-listener:5022' is a placeholder for ag2's listener URL.
    CREATE AVAILABILITY GROUP [dist_ag]
    WITH (DISTRIBUTED)
    AVAILABILITY GROUP ON
        'ag1' WITH (
            LISTENER_URL = 'tcp://ag1-listener:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC),
        'ag2' WITH (
            LISTENER_URL = 'tcp://ag2-listener:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC);
Important for large-scale deployments
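The second availability group then joins the distributed group from its own primary replica, repeating the member options. A sketch, assuming a distributed AG named dist_ag over ag1 and ag2, with 'tcp://ag2-listener:5022' as a hypothetical listener URL for ag2; because distributed AGs support only manual failover, FAILOVER_MODE = MANUAL is used for both members.

```sql
-- Run on the primary replica of ag2 (the forwarder).
-- Options should match those used when dist_ag was created on ag1.
ALTER AVAILABILITY GROUP [dist_ag]
    JOIN AVAILABILITY GROUP ON
        'ag1' WITH (
            LISTENER_URL = 'tcp://ag1-listener:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC),
        'ag2' WITH (
            LISTENER_URL = 'tcp://ag2-listener:5022',   -- hypothetical URL
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SEEDING_MODE = AUTOMATIC);

-- Distributed AGs appear in the same catalog view, flagged by is_distributed.
SELECT name, is_distributed FROM sys.availability_groups;
```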
17
Ease of Use : SQL Server 2016
Load Balancing of Read Workloads
Read connections (specifying ApplicationIntent=ReadOnly) are load balanced between readable secondaries
Integrated load balancing (instead of DNS-based or F5)
Simple load balancing: round robin
Possible to specify prioritized load-balancing groups (e.g. local first)
Important for customers with heavy read workloads
18
Ease of Use : SQL Server 2016
Load Balancing of Read Workloads
    READ_ONLY_ROUTING_LIST = (('Replica2', 'Replica3', 'Replica4'), 'Replica5')
Diagram: Primary Site with Replica1 (Primary), Replica2, Replica3 and Replica4 (secondaries); DR Site with Replica5 (secondary)
Speaker note: the inner parentheses define a set. Read connections are load balanced across the local replicas in the set first, and only if those are unavailable do they go to the remote replica. The balancing is primitive: round robin.
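The routing list above is attached to the replica's primary role, and every secondary it names needs a routing URL. A sketch, assuming a hypothetical group AG1 using the slide's replica names, with placeholder FQDNs and port 1433; the options are set in separate statements.

```sql
-- Each readable secondary needs a routing URL (its SQL Server client endpoint).
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'Replica2'
    WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'Replica2'
    WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://replica2.contoso.com:1433'));
-- ...repeat for Replica3, Replica4 and Replica5.

-- While Replica1 is primary: round robin across the local set first,
-- falling back to the DR replica only if none of them is available.
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'Replica1'
    WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST =
        (('Replica2', 'Replica3', 'Replica4'), 'Replica5')));

-- Clients connect to the listener, name a database in the AG, and declare intent:
-- Server=tcp:AG_Listener,1433;Database=DB1;ApplicationIntent=ReadOnly;MultiSubnetFailover=True
```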
19
Ease of Use : SQL Server 2016
Automatic Secondary Replicas Seeding
Initializes the database copies on secondary replicas via log streaming (instead of backup/restore)
Uses the new, much faster synchronization
In a Distributed Availability Group, can seed from a local availability group replica (instead of having to seed from the remote master replica)
Useful for large databases in large-scale environments*
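Automatic seeding is chosen per replica with SEEDING_MODE, and the secondary must let the AG create databases on it. A sketch, assuming a hypothetical group AG1 gaining a new replica SQLINST3 with a placeholder endpoint URL.

```sql
-- On the primary: add a replica that will be seeded automatically.
ALTER AVAILABILITY GROUP [AG1]
    ADD REPLICA ON N'SQLINST3' WITH (
        ENDPOINT_URL = N'TCP://sqlinst3.contoso.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC);

-- On SQLINST3: join, then allow the AG to create the seeded databases.
ALTER AVAILABILITY GROUP [AG1] JOIN;
ALTER AVAILABILITY GROUP [AG1] GRANT CREATE ANY DATABASE;

-- On either side: seeding history and outcome, most recent first.
SELECT start_time, completion_time, current_state,
       failure_state_desc, number_of_attempts
FROM sys.dm_hadr_automatic_seeding
ORDER BY start_time DESC;
```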
20
Licensing : SQL Server 2016
Basic Availability Groups in SQL Server Standard Edition
Basic HA solution in SQL Server Standard (replaces Database Mirroring)
2 replicas only
Sync or async (can be in Azure)
Not readable
Automatic or manual failover
1 database per AG
Everything else is the same (high synchronization throughput, domain-less support, streaming seeding, …)
Critical to customers requiring HA for SMB apps
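A Basic availability group is created with the BASIC option and stays within the limits listed above: two replicas, one database, non-readable secondary. A sketch with hypothetical instance, database and endpoint names.

```sql
-- Hypothetical names. BASIC limits the group to two replicas and one database.
CREATE AVAILABILITY GROUP [BAG_DB1]
WITH (BASIC)
FOR DATABASE [DB1]
REPLICA ON
    N'SQLINST1' WITH (
        ENDPOINT_URL = N'TCP://sqlinst1.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = NO)),
    N'SQLINST2' WITH (
        ENDPOINT_URL = N'TCP://sqlinst2.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = NO));

-- basic_features = 1 identifies a Basic availability group.
SELECT name, basic_features FROM sys.availability_groups;
```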
21
Availability Groups in SQL Server 2017
Clusterless availability groups: manual failover
Cross-platform availability groups: clusterless required
Read-scale availability groups: scale out clusterless secondaries
Full DTC and cross-database support
REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT
Configuration-only replica
Cluster type = WSFC, EXTERNAL, NONE*
AlwaysOn enhancements in SQL Server 2017
DTC support for cross-database transactions
In SQL Server 2016 and before, cross-database transactions within the same SQL Server instance are not supported for availability groups. This means that no two databases in a cross-database transaction may be hosted by the same SQL Server instance, even if those databases are part of the same availability group. SQL Server 2017 supports distributed transactions for databases in availability groups, including cross-database transactions, for example between databases on the same instance of SQL Server. Distributed transactions are not supported in these cases:
In SQL Server 2016 and prior, where more than one database involved in the transaction is in the same availability group.
In SQL Server 2016 and prior, where at least one database is in an availability group and another database is on the same instance of SQL Server.
Where the availability group was not created with distributed transaction support enabled.
Database mirroring.
Cluster type
SQL Server 2017 introduces new features for availability groups. CLUSTER_TYPE, used with CREATE AVAILABILITY GROUP, identifies the type of server cluster manager that manages an availability group. It can be one of the following types:
WSFC: Windows Server Failover Cluster. On Windows, this is the default value for CLUSTER_TYPE.
EXTERNAL: a cluster manager that is not Windows Server Failover Cluster, for example Pacemaker on Linux.
NONE: no cluster manager. Used for a read-scale availability group.
For more information about these options, see CREATE AVAILABILITY GROUP or ALTER AVAILABILITY GROUP. An availability group can be created without a cluster to support read-scale workloads; see Read-scale availability groups.
Required secondaries to commit
Use REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT with CREATE AVAILABILITY GROUP or ALTER AVAILABILITY GROUP. When it is set to a value higher than 0, transactions on the primary replica databases wait until the transaction is committed to the transaction logs of the specified number of synchronous secondary replica databases. If not enough synchronous secondary replicas are available, commits on the primary replica fail until communication with sufficient secondary replicas resumes.
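Both 2017 options are plain CREATE/ALTER AVAILABILITY GROUP settings. A sketch, assuming SQL Server 2017 or later, hypothetical instances SQLINST1 and SQLINST2, and a separate, hypothetical WSFC-based group AG1 with synchronous secondaries for the commit setting. With CLUSTER_TYPE = NONE only manual failover is possible.

```sql
-- Read-scale AG with no cluster manager (SQL Server 2017+). Hypothetical names.
CREATE AVAILABILITY GROUP [ReadScaleAG]
WITH (CLUSTER_TYPE = NONE)
FOR DATABASE [DB1]
REPLICA ON
    N'SQLINST1' WITH (
        ENDPOINT_URL = N'TCP://sqlinst1.contoso.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)),
    N'SQLINST2' WITH (
        ENDPOINT_URL = N'TCP://sqlinst2.contoso.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL));

-- On SQLINST2:
ALTER AVAILABILITY GROUP [ReadScaleAG] JOIN WITH (CLUSTER_TYPE = NONE);
ALTER AVAILABILITY GROUP [ReadScaleAG] GRANT CREATE ANY DATABASE;

-- Separately, on a group with synchronous secondaries (hypothetical AG1):
-- primary commits wait until at least one synchronized secondary has hardened the log.
ALTER AVAILABILITY GROUP [AG1]
    SET (REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 1);

SELECT name, cluster_type_desc, required_synchronized_secondaries_to_commit
FROM sys.availability_groups;
```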
22
Demo – AG Seeding, Cluster Types, ROR in GUI
23
Data Movement
24
Data Synchronization Internals
How the replication of transaction log blocks from the primary replica to a secondary replica works in synchronous-commit and asynchronous-commit modes
What happens when a secondary replica goes offline
What happens when the primary replica goes offline
How automatic, manual, and forced failovers work
25
A Brief Transaction Log Introduction…
Captures changes that occur in the database
Changes *mostly* occur here first and are later persisted in the data files
Implements the Atomicity and Durability ACID properties
Written to sequentially, usually read sequentially
You only need one log file per database
Memory-optimized table logging semantics are a bit different
26
Log Blocks
The atomic unit of physical commit to a log file
Contains a header, log records, and a slot array
Each block ranges in size from 512 bytes to 60 KB
Diagram: Header (noRecords, blockSize, prevBlockSize, ...) | Record 1 | Record 2 | Record 3 | Slot Array
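The size of the blocks a database actually writes can be estimated from two cumulative performance counters: bytes flushed divided by number of flushes. A rough sketch, assuming each log flush writes roughly one log block; because the counters accumulate since instance startup, the result is a long-run average, not a current value, and it should fall inside the 512-byte to 60 KB range.

```sql
-- Approximate average log flush (log block) size per database since startup.
SELECT f.instance_name                            AS database_name,
       b.cntr_value                               AS log_bytes_flushed,
       f.cntr_value                               AS log_flushes,
       b.cntr_value / NULLIF(f.cntr_value, 0)     AS avg_bytes_per_flush
FROM sys.dm_os_performance_counters AS f
JOIN sys.dm_os_performance_counters AS b
    ON b.object_name = f.object_name
   AND b.instance_name = f.instance_name
WHERE f.object_name LIKE N'%:Databases%'
  AND f.counter_name = N'Log Flushes/sec'
  AND b.counter_name = N'Log Bytes Flushed/sec'
ORDER BY log_flushes DESC;
```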
27
Log Records
An atomic database change
Uniquely identified by a Log Sequence Number (LSN)
Not only associated with committed transactions
Contains the information needed to redo or undo a transaction
Generated irrespective of recovery model
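Individual log records and their LSNs can be inspected with fn_dblog, which the deck later lists as a Log Pool consumer. A sketch for a test system only, since fn_dblog is undocumented and unsupported; DB1 is a hypothetical database name. The LSN string has three parts: VLF sequence number, log block offset, and slot number within the block.

```sql
-- fn_dblog is undocumented and unsupported: use it on a test database only.
USE DB1;   -- hypothetical database
SELECT TOP (20)
       [Current LSN],        -- VLF sequence : log block offset : slot
       Operation,            -- e.g. LOP_BEGIN_XACT, LOP_INSERT_ROWS, LOP_COMMIT_XACT
       Context,
       [Transaction ID],
       [Log Record Length]
FROM fn_dblog(NULL, NULL)
ORDER BY [Current LSN] DESC;
```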
28
Log Flushes
Diagram: sessions (waiters) on the log cache's current buffer; background task LOG WRITER; TLog
The LOG WRITER background task:
Runs on a hidden scheduler
Waits on LOGMGR_QUEUE
Is signaled on a log cache (LC) flush
Processes completed log writes for the waiters on the current buffer
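The log writer and the waits around it are visible from ordinary DMVs. A sketch with no assumed names; wait statistics are cumulative since startup or since they were last cleared, and LOGMGR_QUEUE is normally the idle log writer waiting for work rather than a problem.

```sql
-- The log writer is a background task and shows up as a request.
SELECT session_id, command, status, wait_type, scheduler_id
FROM sys.dm_exec_requests
WHERE command = N'LOG WRITER';

-- WRITELOG:         sessions waiting for their log block to be flushed.
-- LOGMGR_QUEUE:     the idle log writer waiting to be signaled.
-- HADR_SYNC_COMMIT: commits waiting for a synchronous secondary to harden.
SELECT wait_type, waiting_tasks_count, wait_time_ms,
       wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN (N'WRITELOG', N'LOGMGR_QUEUE', N'HADR_SYNC_COMMIT');
```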
29
Log Pool
Log Pool (per instance), memory clerk MEMORYCLERK_SQLLOGPOOL
Enhances the performance of operations that read the transaction log
Log blocks are pushed into the Log Pool based on consumers
Lookups are done through a private cache and hashed entries
Consumers: Replication, HADR, Recovery, Rollback, fn_dblog
Out-of-memory errors can be caused by the Log Pool
Diagram and DMVs: the Log Manager (per DB) and TLog feed log blocks (LB) through IOBuf into the Log Pool; sys.dm_logpool_stats (Log Pool, per instance); sys.dm_logpool_hashentries (hash table); sys.dm_logconsumer_privatecachebuffers (private cache, per DB); sys.dm_logconsumer_cachebufferref (shared cache, per DB)
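When out-of-memory errors point at the Log Pool, a first check is how much memory its clerk holds. A sketch; sys.dm_os_memory_clerks is documented, while sys.dm_logpool_stats is one of the undocumented DMVs listed on the slide, so its columns are not assumed here.

```sql
-- Memory currently held by the Log Pool memory clerk (per instance).
SELECT type, name, SUM(pages_kb) AS pages_kb
FROM sys.dm_os_memory_clerks
WHERE type = N'MEMORYCLERK_SQLLOGPOOL'
GROUP BY type, name;

-- Log Pool statistics (undocumented DMV; columns vary by version).
SELECT * FROM sys.dm_logpool_stats;
```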
30
Data Synchronization in Synchronous mode
Diagram: a Windows Server Failover Cluster with the primary replica on SQLINST1 and a secondary replica on SQLINST2, each hosting AG1 (DB1, DB2)
10X increase in synchronization throughput in SQL Server 2016 due to performance improvements
New in SQL Server 2016: parallel decompression and redo of log records on the secondary
Example transaction T1:
    BEGIN TRAN;
    INSERT INTO Employee VALUES (2, 'Bob');
    COMMIT;
Components: Buffer Pool, Log Cache, Log Pool and Log Capture on SQLINST1; Network; Log Receive and Redo Thread on SQLINST2
The above slide gives an overview of how the replication of transaction log blocks from the primary replica to a secondary replica works in synchronous-commit mode.
Note: For synchronous commit to occur, both the current primary replica and the secondary replica in question must be configured for synchronous commit.
Data synchronization in synchronous mode works as follows:
The primary replica generates transaction log blocks. The secondary initiates a request to the primary, asking for the log blocks to be shipped. A log block is a contiguous chunk of memory (512 bytes to 60 KB) maintained by the Log Manager. The secondary and primary negotiate the proper LSN starting point and other necessary information.
The primary replica's log cache is filled with these log blocks. When a log block becomes full or the primary replica issues a commit, the log block is flushed from the log buffer to disk on the primary to make it persistent.
Because this is an Always On availability group, when the log block is flushed to disk on the primary replica it is simultaneously copied to the log pool.
A thread called log capture reads the log blocks from the log pool and sends them to the secondary replica. If there are multiple secondary replicas, there is one log capture thread for each of them, which ensures that the log blocks are sent to multiple replicas in parallel.
In SQL Server 2012, 2014 and 2016 asynchronous-commit mode, the log content gets compressed and encrypted before being sent over to the secondary replicas. In SQL Server 2016 synchronous-commit mode, the log blocks are not compressed by default - to enhance performance throughput. There is a thread called log receive that is running on the secondary replica. It receives the log blocks from the network and it starts writing to the log cache, and then flushed to disk. This completes the "hardening phase" and at this point the secondary will send an acknowledgement back to the primary indicating it has hardened the log block. While the log blocks are being written to the log cache, there's a redo thread that is always running on the secondary replica. It is reading those log blocks and applying those changes to the data pages and the index pages in the database on the secondary to bring it up to date with whatever has happened on the primary replica. The REDO phase is done after the acknowledgement is sent to the primary during the hardening phase. If the secondary replica is configured to run in synchronous mode, it will send an acknowledgement on the commit to the primary node indicating that it has hardened the transaction, and so it is safe to tell the user that the transaction is committed. And because the log has been hardened on the secondary, there is a guarantee that in case there is a failover, there's no data loss. It is important to note that REDO is done asynchronously from HARDENING. The redo thread is continuously applying those transaction log blocks. And it is running independently of how log blocks are being generated on the secondary or being copied and persisted. If the redo thread is running few minutes behind, those log blocks may not be available in the log cache. In that case, it will pick up those log blocks from the log disk, and that is what is shown in the dotted line on the right side of the slide. 
Prior to SQL Server 2016, the redo process was executed serially by a single thread and therefore bound to a single CPU core. However starting, SQL Server 2016, the redo is executed in parallel to make use of all the available CPU cores (up to 4 threads per database). This improvement increases the freshness of data on the secondary and improves the database recovery times on failover. Note: The synchronization throughput of availability groups has increased ~10x due to improvements in the data synchronization process. The performance improvements include parallel and faster compression of log blocks on the primary replica, an optimized synchronization protocol, and parallel decompression and redo of log records on the secondary replica. For more information, refer to “AlwaysON - HADRON Learning Series: - How does AlwaysON Process a Synchronous Commit Request” ( learning-series-how-does-alwayson-process-a-synchronous-commit-request.aspx). For more information, refer to “High Availability Enhancements Section” ( us/library/bb aspx ) Log Block Log Cache New in SQL Server 2016: Parallel and faster compression of log blocks on the primary Harden Harden Acknowledge Commit New in SQL Server 2016: Optimized synchronization protocol Tlog Tlog © 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
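The ordering in the steps above (harden on both sides, acknowledge, then redo on its own schedule) can be illustrated with a small Python toy model. This is a sketch under simplifying assumptions, not SQL Server code: log blocks are plain integer LSNs, the log pool, log capture, network, and log receive are collapsed into one direct call, and the names Replica, synchronous_commit, and redo are hypothetical. A False return stands for "the primary is still waiting for the acknowledgement"; the model does not cover what happens once the replica is marked NOT SYNCHRONIZED.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: hardened log blocks (as integer LSNs) and a redo position."""
    name: str
    hardened: list = field(default_factory=list)
    last_redone_lsn: int = 0

    @property
    def last_hardened_lsn(self) -> int:
        return self.hardened[-1] if self.hardened else 0


def synchronous_commit(primary: Replica, secondary: Replica, lsn: int,
                       secondary_online: bool = True) -> bool:
    """Return True when the commit can be acknowledged to the client.

    The primary flushes the block to its own log first; the block is then
    shipped (log capture -> network -> log receive) and hardened on the
    secondary. Redo is deliberately NOT part of this path.
    """
    primary.hardened.append(lsn)            # local flush on the primary
    if secondary_online:
        secondary.hardened.append(lsn)      # secondary hardens the same block
    # Acknowledge only if the secondary has hardened this LSN.
    return secondary.last_hardened_lsn >= lsn


def redo(secondary: Replica, max_blocks: int) -> None:
    """Apply up to max_blocks hardened blocks; runs independently of commits."""
    pending = [b for b in secondary.hardened if b > secondary.last_redone_lsn]
    for lsn in pending[:max_blocks]:
        secondary.last_redone_lsn = lsn


# Three commits are acknowledged, but redo has applied only one block so far.
p, s = Replica("SQLINST1"), Replica("SQLINST2")
acks = [synchronous_commit(p, s, lsn) for lsn in (1, 2, 3)]
redo(s, max_blocks=1)
print(acks, s.last_hardened_lsn, s.last_redone_lsn)  # [True, True, True] 3 1
```

The point of the example is that a lagging redo thread (last_redone_lsn behind last_hardened_lsn) never delays a commit; only hardening does.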
31
Data Synchronization in Asynchronous mode
10X increase in synchronization throughput in SQL Server 2016 due to performance improvements
[Figure: Data synchronization in asynchronous-commit mode. Same layout as the synchronous-mode slide: a primary replica (SQLINST1) and a secondary replica (SQLINST2) in a Windows Server Failover Cluster, both hosting AG1 (DB1, DB2). Transaction T1 (Begin Tran; Insert into Employee() values (2, 'Bob'); Commit) produces log block 1, which flows from the log cache into the log pool, through log capture, the network, and log receive into the secondary's log cache, and on to the redo thread; both replicas harden to their transaction logs (Tlog). Callouts: New in SQL Server 2016: parallel and faster compression of log blocks on the primary; optimized synchronization protocol; parallel decompression and redo of log records on the secondary.]

The slide above gives an overview of how transaction log blocks are replicated from the primary replica to a secondary replica in asynchronous mode. The process is similar to synchronous mode, except that the acknowledgement of a successful commit is sent as soon as the log blocks are persisted in the primary replica's transaction log.
1. The primary replica generates transaction log blocks. The secondary initiates a request to the primary, asking for the log blocks to be shipped.
2. The primary's log cache fills with these log blocks. When a log block becomes full, or the primary replica issues a commit, the log block is flushed from the log buffer to disk to make it persistent.
3. Because this is an Always On Availability Group configuration, when the log block is flushed to disk on the primary replica, it is also copied to the log pool. If all secondary replicas are in asynchronous availability mode, the successful I/O to the local transaction log is enough to send an acknowledgement of a successful commit back to the application.
4. A thread called log capture reads the log blocks from the log pool. In SQL Server 2012, 2014, and 2016 asynchronous-commit mode, log blocks are compressed and encrypted before being sent to the secondary replicas. (In SQL Server 2016 synchronous-commit mode, log blocks are not compressed by default, to improve throughput.)
5. A thread called log receive, running on the secondary replica, receives the log blocks from the network and writes them to the log cache, from which they are flushed to disk. This completes the "hardening phase". In asynchronous mode, status messages are sent back to the primary periodically to tell it up to what point the secondary has hardened; an acknowledgement is not sent for every log block.
6. Meanwhile, a redo thread runs continuously on the secondary replica. It reads the log blocks and applies the changes to the data pages and index pages of the secondary database, bringing it up to date with the primary.

It is important to note that redo is done after, and asynchronously from, hardening. The redo thread continuously applies transaction log blocks, independently of how log blocks are being received and persisted on the secondary. If the redo thread is running a few minutes behind, those log blocks may no longer be in the log cache. In that case, it reads them from the log on disk, which is what the dotted line on the right side of the slide shows.

Prior to SQL Server 2016, redo was executed serially by a single thread and was therefore bound to a single CPU core. Starting with SQL Server 2016, redo is executed in parallel to make use of multiple CPU cores (up to 4 threads per database). This improvement increases the freshness of data on the secondary and reduces database recovery time on failover.

Note: The synchronization throughput of availability groups increased roughly 10x due to improvements in the data synchronization process. The improvements include parallel and faster compression of log blocks on the primary replica, an optimized synchronization protocol, and parallel decompression and redo of log records on the secondary replica.
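The difference from synchronous mode can be sketched with another Python toy model, assuming integer LSNs and two hypothetical knobs: network_lag (how many steps a block needs to reach and be hardened on the secondary) and status_every (how often the secondary reports progress). Neither is a SQL Server setting, and simulate_async is an illustrative name only.

```python
def simulate_async(lsns, status_every, network_lag):
    """Toy model of asynchronous-commit mode.

    Returns (acked, primary_lh, secondary_lh, reported_lh):
      acked        -- LSNs acknowledged to the client (right after the local flush)
      primary_lh   -- last hardened LSN on the primary
      secondary_lh -- last hardened LSN actually on the secondary
      reported_lh  -- secondary progress as last reported to the primary
    """
    primary_lh = secondary_lh = reported_lh = 0
    acked = []
    for i, lsn in enumerate(lsns):
        primary_lh = lsn
        acked.append(lsn)  # no waiting on any secondary
        if i >= network_lag:
            # A block reaches and is hardened on the secondary network_lag steps later.
            secondary_lh = lsns[i - network_lag]
        if (i + 1) % status_every == 0:
            # Periodic status message, not one acknowledgement per block.
            reported_lh = secondary_lh
    return acked, primary_lh, secondary_lh, reported_lh


acked, primary_lh, secondary_lh, reported_lh = simulate_async(
    [10, 20, 30, 40, 50], status_every=2, network_lag=1)
print(acked, primary_lh, secondary_lh, reported_lh)  # [10, 20, 30, 40, 50] 50 40 30
```

Every commit is acknowledged immediately, yet the secondary has hardened only up to LSN 40 and the primary has last heard about LSN 30. If the primary were lost at this moment, the acknowledged block at LSN 50 would not exist on the secondary; that gap is the potential data loss of asynchronous mode.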
32
What happens when - Secondary Replica goes offline
- Primary replica will hold the transaction log blocks
- Transaction log on primary and other healthy secondary replicas will grow
- Solution: fix the issue and bring the secondary replica back online, or remove the replica from the AG
- Same rules apply to a secondary replica database if it becomes unhealthy

While a secondary replica is unhealthy (that is, it goes offline or gets disconnected), it is still part of the availability group. This means that any transaction log records that have not been hardened by all the replicas in the AG are retained by the primary replica database. This ensures that when the secondary replica comes back online, it can receive all the log blocks that it has not hardened yet. As a result, the transaction log on the primary and on other healthy secondary replicas can grow and fill the disk. If this happens, you have to either fix the secondary replica and bring it back online so that it starts accepting those log blocks, or remove the replica from the availability group. Once the unhealthy replica is removed from the availability group, the primary replica database no longer has to hold those log blocks and can truncate and reuse that portion of the transaction log. The same rules apply to an individual secondary replica database if it becomes unhealthy.

If a secondary replica database, or a whole secondary replica, is removed from the availability group, then before it is added back, each secondary replica database must have the latest full backup, the latest differential backup (if one exists), and all transaction log backups taken since then applied to it. This ensures that any transaction log records that were truncated from the primary replica database's log after the secondary database was removed have been applied to the secondary database before it rejoins the availability group.
For a very large database, or a busy database with frequent transaction log backups, this could be difficult to achieve during business hours. In such cases, it may be beneficial to add the databases back during off hours.
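The log-retention rule above boils down to "the primary can only truncate up to the lowest LSN hardened by every replica still in the AG". The Python sketch below models just that rule. It treats LSNs as consecutive integers and deliberately ignores other things that can also hold the log (for example, a pending log backup or a long-running transaction); the function names, replica names, and LSN values are hypothetical.

```python
def truncation_lsn(primary_lh: int, secondary_lh: dict) -> int:
    """Highest LSN up to which the primary may truncate its log in this model:
    the lowest last-hardened LSN among all replicas still in the AG."""
    return min([primary_lh, *secondary_lh.values()])


def retained_records(primary_lh: int, secondary_lh: dict) -> int:
    """Number of log records the primary must keep for lagging replicas."""
    return primary_lh - truncation_lsn(primary_lh, secondary_lh)


replicas = {"SQLINST2": 480, "SQLINST3": 120}  # SQLINST3 is offline
print(retained_records(500, replicas))  # 380
del replicas["SQLINST3"]                 # remove the unhealthy replica from the AG
print(retained_records(500, replicas))  # 20
```

As long as the offline replica stays in the dictionary, its low LSN pins the log; removing it is what lets truncation move forward again, matching the two options given on the slide.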
33
Synchronous Secondary Replica goes offline
[Figure: A synchronous secondary replica going offline and coming back. A primary replica (SQLINST1) and a secondary replica (SQLINST2) in a Windows Server Failover Cluster, both hosting AG1 (DB1, DB2). The secondary's state goes from SYNCHRONIZED to NOT SYNCHRONIZED to SYNCHRONIZING and back to SYNCHRONIZED. On reconnect, the secondary sends its End-of-Log (EOL) LSN; the primary ships transaction log blocks, which the secondary hardens (Last Hardened LSN) and redoes (Last Redone LSN) UNTIL Last Hardened LSN of primary = Last Hardened LSN of secondary, after which the primary starts waiting on acknowledgement from the secondary again.]

The slide above gives an overview of how a synchronous secondary replica re-synchronizes with the primary replica after it goes offline and comes back online. When the synchronous secondary replica goes offline, its status changes from SYNCHRONIZED to NOT SYNCHRONIZED. As soon as its status changes, the primary replica stops waiting for an acknowledgement that the secondary has hardened a commit and starts treating it as an asynchronous replica. This ensures that commits on the primary replica are not delayed by an unhealthy synchronous secondary replica.

Once the secondary replica is brought back online, it establishes a connection with the primary replica and sends its End of Log (EOL) LSN to the primary replica. On receiving this, the primary starts sending it the log blocks that it hardened after the EOL LSN. As soon as the secondary starts receiving and hardening these log blocks, its status changes to SYNCHRONIZING. This indicates that the secondary replica is connected to the primary and is catching up (that is, it is essentially behaving as an asynchronous replica). The secondary replica keeps hardening the log blocks, keeps applying the hardened transactions with the redo thread, and keeps reporting this progress back to the primary replica. This continues until the Last Hardened (LH) LSNs of the primary and secondary replicas match. As soon as they do, the status of the secondary replica changes to SYNCHRONIZED, and from that point onwards the primary replica treats it as a synchronous replica again: the primary waits for an acknowledgement of the commit from the secondary replica before letting the user know that the transaction has been committed successfully.
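The catch-up sequence above is essentially a small state machine. The Python sketch below walks through it under stated assumptions: LSNs are small consecutive integers, no new commits arrive on the primary while the secondary catches up, and blocks_per_round is a hypothetical knob for how much the secondary hardens per step. SyncState reuses the state names from the slide; resynchronize is an illustrative name, not a SQL Server API.

```python
from enum import Enum


class SyncState(Enum):
    NOT_SYNCHRONIZED = "NOT SYNCHRONIZED"
    SYNCHRONIZING = "SYNCHRONIZING"
    SYNCHRONIZED = "SYNCHRONIZED"


def resynchronize(primary_lh: int, secondary_eol: int, blocks_per_round: int):
    """Yield (state, secondary_lh, primary_waits_for_ack) for a reconnecting
    synchronous-commit secondary. primary_lh is held fixed (no new commits)."""
    secondary_lh = secondary_eol  # the secondary reports its End-of-Log LSN
    # Just reconnected: the primary is not waiting for this replica.
    yield SyncState.NOT_SYNCHRONIZED, secondary_lh, False
    while secondary_lh < primary_lh:
        # The primary ships blocks hardened after the EOL LSN; the secondary hardens them.
        secondary_lh = min(primary_lh, secondary_lh + blocks_per_round)
        yield SyncState.SYNCHRONIZING, secondary_lh, False
    # Last Hardened LSNs match: the replica is treated as synchronous again.
    yield SyncState.SYNCHRONIZED, secondary_lh, True


for state, lsn, waits in resynchronize(primary_lh=100, secondary_eol=40, blocks_per_round=25):
    print(state.value, lsn, waits)
```

With these inputs the replica reports NOT SYNCHRONIZED at LSN 40, SYNCHRONIZING at 65, 90, and 100, and only then SYNCHRONIZED, which is the single step at which the primary resumes waiting for acknowledgements.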
34
Asynchronous Secondary Replica goes offline
[Figure: An asynchronous secondary replica going offline and coming back. A primary replica (SQLINST1) and a secondary replica (SQLINST2) in a Windows Server Failover Cluster, both hosting AG1 (DB1, DB2). The secondary's state goes from SYNCHRONIZING to NOT SYNCHRONIZED and back to SYNCHRONIZING. On reconnect, the secondary sends its End-of-Log (EOL) LSN; the primary ships transaction log blocks, which the secondary hardens (Last Hardened LSN) and redoes (Last Redone LSN).]

The slide above gives an overview of how an asynchronous secondary replica re-synchronizes with the primary replica after it goes offline and comes back online. When the asynchronous secondary replica goes offline, its status changes from SYNCHRONIZING to NOT SYNCHRONIZED. The primary replica handles the reconnection in the same way as for a synchronous replica; because it never waits for acknowledgements from an asynchronous replica, commits on the primary are not affected. Once the secondary replica is brought back online, it establishes a connection with the primary replica and sends its End of Log (EOL) LSN to the primary replica. On receiving this, the primary starts sending it the log blocks that it hardened after the EOL LSN. As soon as the secondary starts receiving and hardening these log blocks, its status changes to SYNCHRONIZING. This indicates that the secondary replica is connected to the primary and is catching up. Because the replica uses asynchronous-commit mode, it remains SYNCHRONIZING and does not transition to SYNCHRONIZED.
35
What happens when - Primary Replica goes offline
- Failover target takes over the primary role
- Three types of failover exist: automatic failover (without data loss), planned manual failover (without data loss), and forced failover (with possible data loss)
- Starting in SQL Server 2016, you can configure Availability Groups to fail over when a database goes offline

Failover support by availability mode and failover mode:

                         Asynchronous-  Synchronous-commit,  Synchronous-commit,
                         commit mode    manual failover      automatic failover
Automatic failover       No             No                   Yes
Planned manual failover  No             Yes                  Yes
Forced failover          Yes            Yes                  Yes*

When a primary replica goes offline, the failover target takes over the primary role, recovers its databases, and brings them online as the new primary databases. The former primary replica, when available, switches to the secondary role, and its databases become secondary databases. Three forms of failover exist: automatic failover (without data loss), planned manual failover (without data loss), and forced failover (with possible data loss). The table above summarizes which forms of failover are supported under the different availability and failover modes.

* If you issue a forced failover command on a synchronized secondary replica, the secondary replica behaves the same as for a manual failover.

Note: Starting with SQL Server 2016, you can configure Availability Groups to fail over when a database goes offline. This requires setting the DB_FAILOVER option to ON. For more information, refer to “Failover and Failover Modes (AlwaysOn Availability Groups)”.
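The support table above can be written down as a lookup. In this Python sketch the keys reuse the T-SQL option values for AVAILABILITY_MODE (SYNCHRONOUS_COMMIT, ASYNCHRONOUS_COMMIT) and FAILOVER_MODE (AUTOMATIC, MANUAL); the SUPPORTED dictionary, the failover-type labels, and failover_supported are hypothetical names. Asynchronous commit combined with automatic failover mode is treated as invalid, since automatic failover requires synchronous-commit mode.

```python
# Rows of the table above, keyed by (AVAILABILITY_MODE, FAILOVER_MODE).
SUPPORTED = {
    ("ASYNCHRONOUS_COMMIT", "MANUAL"):
        {"automatic": False, "planned_manual": False, "forced": True},
    ("SYNCHRONOUS_COMMIT", "MANUAL"):
        {"automatic": False, "planned_manual": True, "forced": True},
    # Footnote *: a forced failover on a SYNCHRONIZED secondary behaves
    # the same as a manual failover.
    ("SYNCHRONOUS_COMMIT", "AUTOMATIC"):
        {"automatic": True, "planned_manual": True, "forced": True},
}


def failover_supported(availability_mode: str, failover_mode: str,
                       failover_type: str) -> bool:
    """Look up whether a failover type is supported for a replica configuration."""
    try:
        return SUPPORTED[(availability_mode, failover_mode)][failover_type]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {availability_mode}, {failover_mode}, {failover_type}"
        ) from None


print(failover_supported("SYNCHRONOUS_COMMIT", "MANUAL", "automatic"))  # False
print(failover_supported("ASYNCHRONOUS_COMMIT", "MANUAL", "forced"))    # True
```

Keeping the table as data rather than nested if statements makes it easy to compare each row against the slide.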
36
Automatic Failover Conditions required
- Primary replica and a secondary replica are both configured for synchronous-commit mode and set to automatic failover
- Secondary replica is SYNCHRONIZED
- WSFC has quorum
- Primary replica is unavailable and the failover-condition levels have been met

Automatic failover occurs only under the following conditions:
1. An automatic failover set exists. This set consists of a primary replica and a secondary replica (the automatic failover target) that are both configured for synchronous-commit mode and set to automatic failover. If the primary replica is set to manual failover, automatic failover cannot occur, even if a secondary replica is set to automatic failover.
2. The automatic failover target is SYNCHRONIZED with the primary replica.
3. The Windows Server Failover Clustering (WSFC) cluster has quorum.
4. The primary replica has become unavailable, and the failover-condition levels defined by your flexible failover policy have been met.

For more information, refer to “Conditions Required for an Automatic Failover” ( us/library/hh aspx#RequiredConditions).
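The conditions above are a plain conjunction: if any one fails, there is no automatic failover. The Python sketch below encodes them directly. ReplicaConfig and automatic_failover_possible are hypothetical names, and the boolean inputs stand in for checks (quorum, failure detection, flexible failover policy) that in reality are made by WSFC and SQL Server.

```python
from dataclasses import dataclass


@dataclass
class ReplicaConfig:
    availability_mode: str      # "SYNCHRONOUS_COMMIT" or "ASYNCHRONOUS_COMMIT"
    failover_mode: str          # "AUTOMATIC" or "MANUAL"
    synchronized: bool = False  # synchronization state of the replica's databases


def automatic_failover_possible(primary: ReplicaConfig, target: ReplicaConfig,
                                cluster_has_quorum: bool,
                                primary_unavailable: bool,
                                policy_conditions_met: bool) -> bool:
    """Check the conditions listed on the slide; all of them must hold."""
    # Condition 1: primary and target form an automatic failover set.
    automatic_failover_set = all(
        r.availability_mode == "SYNCHRONOUS_COMMIT" and r.failover_mode == "AUTOMATIC"
        for r in (primary, target)
    )
    return (automatic_failover_set
            and target.synchronized      # condition 2
            and cluster_has_quorum       # condition 3
            and primary_unavailable      # condition 4
            and policy_conditions_met)   # condition 4 (flexible failover policy)
```

For example, a primary configured with FAILOVER_MODE = MANUAL makes the whole check fail even when the secondary is SYNCHRONIZED and set to automatic failover, which is the case called out in the notes.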
37
How does Automatic Failover Work?
[Figure: Automatic failover. A Windows Server Failover Cluster with a primary replica, Secondary Replica 1, and Secondary Replica 2, each hosting AG1, connected by synchronous and asynchronous data movement. Labels: Health Check Detects Failure; Restart resource; Resource fails; Automatic Failover.]
Note: In SQL Server 2016, both synchronous secondary replicas can be configured as automatic failover partners with the primary replica.

The slides above show an availability group solution that has two synchronous-commit replicas and one asynchronous-commit replica. The health check detects a failure in the primary replica, which causes an automatic failover. An automatic failover initiates the following sequence of actions:
1. If the server instance that is hosting the current primary replica is still running, it changes the state of the primary databases to DISCONNECTED and disconnects all clients.
2. If any log records are waiting in recovery queues on the target secondary replica, the secondary replica applies the remaining log records to finish rolling forward the secondary databases.
3. The former secondary replica transitions to the primary role, and its databases become the primary databases. The new primary replica rolls back any uncommitted transactions (the undo phase of recovery) as quickly as possible. Locks isolate these uncommitted transactions, allowing rollback to occur in the background while clients use the database. This process does not roll back any committed transactions.
4. Until a given secondary database is connected, it is briefly marked as NOT_SYNCHRONIZED. Before the rollback recovery starts, secondary databases can connect to the new primary databases and quickly transition to the SYNCHRONIZED state. The best case is usually a third synchronous-commit replica that remains in the secondary role after the failover.
5. Later, when the server instance that is hosting the former primary replica restarts, it recognizes that another availability replica now owns the primary role. The former primary replica transitions to the secondary role, and its databases become secondary databases. The new secondary replica connects to the current primary replica and catches its databases up to the current primary databases as quickly as possible. As soon as the new secondary replica has resynchronized its databases, failover is again possible, in the reverse direction.

For more information, refer to “How Automatic Failover Works” ( us/library/hh aspx#HowAutoFoWorks).

If both secondary synchronous-commit replicas are SYNCHRONIZED and healthy and multiple automatic failover targets have been configured, which replica will the cluster choose to fail over to? The Windows cluster attempts to fail over to the next replica in the preferred owner list of the availability group role, which dictates the attempted failover order. The preferred owner list of the availability group role in the cluster lists all the availability group replicas defined for automatic failover. There is no setting in SQL Server for configuring the preferred owner list of the availability group role, and changing the priority of the nodes in that list in Failover Cluster Manager is not recommended. Tests have shown that you can change the preferred owner list in Failover Cluster Manager, and the next automatic failover will abide by the modified list. However, once SQL Server resets the preferred owner list, subsequent automatic failovers follow the preferred owner priority set by SQL Server.

As a workaround, control automatic failover priority by adding replicas to the availability group in your preferred automatic failover order. SQL Server sets the preferred owner list priority based on the order in which the replicas are added to the availability group. This gives you some control over which automatic failover partner the cluster attempts to fail over to first. For more information, refer to “Multiple Automatic Failover Targets” ( multiple-automatic-failover-targets/ ).
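The target-selection answer above can be pictured as a walk down the preferred owner list. The Python sketch below is a simplified illustration, not the WSFC algorithm: it assumes the list order equals the order in which the replicas were added to the availability group (as described above), leaves the failed primary out of the candidate set, and uses hypothetical names (Candidate, choose_failover_target, SQLINST1 to SQLINST4).

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    automatic_failover_target: bool
    synchronized: bool
    healthy: bool


def choose_failover_target(preferred_owner_order: list,
                           candidates: dict) -> Optional[str]:
    """Walk the preferred owner list in order and return the first replica
    that is an automatic failover target, SYNCHRONIZED, and healthy."""
    for name in preferred_owner_order:
        c = candidates.get(name)  # the failed primary is simply absent
        if c is not None and c.automatic_failover_target and c.synchronized and c.healthy:
            return name
    return None


# Replicas in the order they were added to the AG; SQLINST1 was the primary.
order = ["SQLINST1", "SQLINST2", "SQLINST3", "SQLINST4"]
candidates = {
    "SQLINST2": Candidate(True, True, True),
    "SQLINST3": Candidate(True, True, True),
    "SQLINST4": Candidate(False, False, True),  # asynchronous replica
}
print(choose_failover_target(order, candidates))  # SQLINST2
```

Because both synchronous secondaries qualify, the one added to the AG first wins, which is why adding replicas in the desired order is the suggested workaround.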
38
Planned Manual Failover
Conditions required:
- Primary replica must be set to synchronous-commit mode
- Secondary replica must be configured for synchronous-commit mode and synchronized with the primary replica
You can manually fail over an AG using SQL Server Management Studio, T-SQL, or PowerShell
The sequence of actions for a manual failover is very similar to that of an automatic failover

To support a manual failover, the current primary replica must be set to synchronous-commit mode, and a secondary replica must be:
- Configured for synchronous-commit mode.
- Currently synchronized with the primary replica.
To manually fail over an availability group, you must be connected to the secondary replica that is to become the new primary replica. You can manually fail over an availability group using SQL Server Management Studio, Transact-SQL, or PowerShell. For more information, refer to “Planned Manual Failover” ( us/library/hh aspx#ManualFailover).
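The planned manual failover prerequisites above can be expressed as a pre-flight check that reports the first unmet condition. This Python sketch is hypothetical (check_planned_manual_failover and its parameters are not part of any SQL Server tool); the mode and state strings reuse the values named on the slides.

```python
def check_planned_manual_failover(primary_mode: str, target_mode: str,
                                  target_state: str,
                                  connected_to_target: bool) -> tuple:
    """Return (allowed, reason) for a planned manual failover request."""
    if primary_mode != "SYNCHRONOUS_COMMIT":
        return False, "primary replica must use synchronous-commit mode"
    if target_mode != "SYNCHRONOUS_COMMIT":
        return False, "target secondary must use synchronous-commit mode"
    if target_state != "SYNCHRONIZED":
        return False, "target secondary must be SYNCHRONIZED with the primary"
    if not connected_to_target:
        return False, "connect to the secondary replica that will become the new primary"
    return True, "ok"


print(check_planned_manual_failover("SYNCHRONOUS_COMMIT", "SYNCHRONOUS_COMMIT",
                                    "SYNCHRONIZING", True))
```

A secondary that is still SYNCHRONIZING fails the check; in that situation only a forced failover, with possible data loss, remains available.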
39
Forced Failover Possible Data Loss
Intended strictly as a disaster recovery method
After a forced failover:
- Secondary databases are suspended
- Transaction log truncation on the primary database is delayed
- You must manually resume the suspended databases
You can perform a forced failover using SQL Server Management Studio, T-SQL, or PowerShell

A forced failover is a form of manual failover that is intended strictly for disaster recovery, when a planned manual failover is not possible. If you force failover to an unsynchronized secondary replica, some data loss is possible. Therefore, we strongly recommend that you force failover only if you must restore service to the availability group immediately and you are willing to risk losing data.

After a forced failover, the failover target becomes the new primary replica. The secondary databases in the remaining secondary replicas are suspended and must be manually resumed. When the former primary replica becomes available, it transitions to the secondary role, causing the former primary databases to become secondary databases and to transition into the SUSPENDED state. Before you resume a given secondary database, you might be able to recover lost data from it. However, note that transaction log truncation is delayed on a given primary database while any of its secondary databases are suspended.

Note: Data synchronization with the primary database does not occur until the secondary database is resumed. For more information, refer to “Perform a Planned Manual Failover of an Availability Group”.
40
How can a Forced Failover Cause Data Loss?
[Figure: Forced failover with data loss. A Windows Server Failover Cluster with a failed primary replica (Last Hardened LSN = 100) and an asynchronous secondary replica (Last Hardened LSN = 50), both hosting AG1 (DB1, DB2), connected by asynchronous data movement. Steps: perform a manual forced failover; the new primary has Last Hardened LSN = 50; the former primary's databases are suspended; manually resume synchronization; the former primary becomes synchronizing at Last Hardened LSN = 50 (data loss).]

The slide above gives an overview of how a forced failover causes data loss on the primary replica and how that loss can propagate to a secondary replica. Before the primary replica goes offline, the last hardened LSN on the primary replica is 100, while that of the asynchronous secondary replica is 50. After the primary replica goes offline and a forced failover is initiated, the secondary replica becomes the new primary replica, with a last hardened LSN of 50. When the old primary replica is brought online, its synchronization shows as suspended. If synchronization on the old primary is resumed, it synchronizes with the new primary: it sends its last hardened LSN of 100, and when it sees that the last hardened LSN of the new primary is 50, it rolls back its transaction log to LSN 50 and, from that LSN onwards, starts accepting transaction log blocks from the new primary replica. Thus, if synchronization is resumed, the data loss on the new primary propagates to the former primary, which is now a secondary replica. For more information, refer to “Forced Failover” and “Perform a Forced Manual Failover of an Availability Group” ( us/library/ff aspx#FollowUp).
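The slide's example numbers can be replayed in a few lines of Python. This sketch only shows which log records are lost when the former primary resumes; it does not model how SQL Server determines the point it rolls back to. LSNs are treated as consecutive integers, and resume_former_primary is a hypothetical name.

```python
def resume_former_primary(former_primary_log: list, new_primary_lh: int):
    """Return (kept, discarded) log records when the former primary resumes
    synchronization with a new primary whose last hardened LSN is lower."""
    kept = [lsn for lsn in former_primary_log if lsn <= new_primary_lh]
    discarded = [lsn for lsn in former_primary_log if lsn > new_primary_lh]
    return kept, discarded


# Values from the slide: the former primary hardened up to LSN 100,
# the asynchronous secondary (now the new primary) only up to LSN 50.
kept, discarded = resume_former_primary(list(range(1, 101)), new_primary_lh=50)
print(kept[-1], len(discarded))  # 50 50
```

The 50 discarded records (LSN 51 to 100) are the committed work that existed only on the old primary. Before resuming, the suspended database is the last place that data can still be recovered from.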
41
Common Issues and Troubleshooting
Credit: Trayce Jordan
42
Most common causes of failover
- Quorum loss
- Lease timeout
- Health check timeout
- SQL Server dumps
- User initiated
43
Most common reasons for not failing over
- One or more databases not synchronized
- Secondary not connected
- WSFC cannot connect to SQL Server
- AG set for manual failover
- Failover thresholds exceeded
44
SQL/Cluster architecture & interactions
- AlwaysOn AGs require and depend on Windows Server Failover Clustering (WSFC)***
- The RHS.EXE process monitors SQL Server health.
- The RHS.EXE process also establishes a “lease” with SQL Server on the AG primary.
- If the cluster service stops on the AG primary, the AG goes offline.
*** Excluding availability groups on Linux and read-scale availability groups
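The lease idea can be illustrated with a generic timeout model: as long as renewals keep arriving within the timeout, the lease stays valid; once they stop, the lease expires, which corresponds to the "lease timeout" failover cause listed earlier. This Python sketch is not how RHS.EXE or SQL Server implement the lease. Lease, renew, expired, and the 20,000 ms value are hypothetical and chosen only for illustration.

```python
class Lease:
    """Toy lease: the holder must renew it before timeout_ms elapses."""

    def __init__(self, timeout_ms: int, now_ms: int = 0):
        self.timeout_ms = timeout_ms
        self.last_renewed_ms = now_ms

    def renew(self, now_ms: int) -> None:
        self.last_renewed_ms = now_ms

    def expired(self, now_ms: int) -> bool:
        # Expired once more than timeout_ms has passed since the last renewal.
        return now_ms - self.last_renewed_ms > self.timeout_ms


lease = Lease(timeout_ms=20_000)      # illustrative value
lease.renew(now_ms=5_000)
print(lease.expired(now_ms=20_000))   # False: renewed 15 s ago
print(lease.expired(now_ms=25_001))   # True: no renewal for more than 20 s
```

The model shows why a lease timeout does not require the SQL Server process to crash: anything that stops renewals for longer than the timeout (for example, a process too busy or stalled to respond) has the same effect.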
45
Root Cause Analysis
- Review AlwaysOn Health extended events
- Review SQL Server error logs
- Review “cluster logs”
- Review System Health extended events
- Review SQL Diagnostic extended events
If needed, then review these:
- Application event logs
- System event logs
- Cluster event logs
- Network event logs
46
Review AlwaysOn_Health* XEL files
- Look for failover DDL events
- Look for lease timeout events
47
Review AlwaysOn_Health* XEL files
Look at all state changes to get timelines
48
Correlate SQL XEL, Error, and Cluster logs
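Correlating these sources mostly means putting every event on one clock before sorting. The cluster log is generated in UTC by default (Get-ClusterLog has a -UseLocalTime switch), so check the time zone of every other source before merging. The Python sketch below assumes you already know each source's UTC offset; merge_timeline and to_utc are hypothetical helpers, and the offsets, timestamps, and messages are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_utc(ts: datetime, utc_offset_hours: float) -> datetime:
    """Attach the source's UTC offset to a naive timestamp and convert to UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return ts.replace(tzinfo=tz).astimezone(timezone.utc)


def merge_timeline(sources: dict) -> list:
    """sources: name -> (utc_offset_hours, [(naive_timestamp, message), ...]).
    Return [(utc_timestamp, name, message), ...] sorted by time."""
    merged = []
    for name, (offset, events) in sources.items():
        for ts, message in events:
            merged.append((to_utc(ts, offset), name, message))
    merged.sort(key=lambda event: event[0])
    return merged


timeline = merge_timeline({
    "ERRORLOG": (-5, [(datetime(2018, 3, 13, 10, 36, 12), "hypothetical error log entry")]),
    "cluster.log": (0, [(datetime(2018, 3, 13, 15, 36, 10), "hypothetical cluster log entry")]),
    "AlwaysOn_health": (0, [(datetime(2018, 3, 13, 15, 36, 11), "hypothetical XEL event")]),
})
for ts, source, message in timeline:
    print(ts.isoformat(), source, message)
```

Although the error log entry reads 10:36 local time, after normalization it sorts last, two seconds after the cluster log entry; comparing raw timestamps across sources would have put it five hours earlier.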
49
Cluster Log
Anatomy of the cluster log
- cluster log /g generates cluster.log in %windir%\Cluster\Reports
- Get-ClusterLog (-Destination parameter)
- Verbose log options
50
Demos
- AlwaysOn Extended Events
- System Health Extended Events
- SQLDIAG
- Troubleshooting Script Library
51
Q&A