Distributed Availability Groups

Jennifer Brocato, SQLSaturday #845 Atlanta

About Me
Lead DBA for a Fortune 50 transportation company
30 years of experience with enterprise development
Worked with SQL Server since 2000
My article on SQLServerCentral.com on SQLDAG

What we will cover
Availability group refresher
The HA/DR solution
VM AlwaysOn architecture
Distributed availability groups: what is it?
Configuring a SQLDAG solution
Failover a SQLDAG
Monitoring and health

Availability Groups Refresher
Requires a Windows Server Failover Clustering (WSFC) cluster.
An instance of SQL Server must reside on a WSFC node.
Each availability replica of a given availability group must reside on a different node of the same WSFC cluster.
Relies on WSFC to monitor and manage the current roles of the availability replicas.
Overall health is based on all nodes in the cluster.
Quorum configuration: node and disk majority, or node and file share majority.
DMVs: sys.dm_hadr_cluster, sys.dm_hadr_cluster_members
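The two DMVs named on this slide can be queried directly to see how the instance views the cluster and its quorum. A minimal sketch, assuming it is run on an instance that lives on a WSFC node with AlwaysOn enabled:

```sql
-- Cluster-level quorum settings as seen by this SQL Server instance.
SELECT cluster_name,
       quorum_type_desc,
       quorum_state_desc
FROM sys.dm_hadr_cluster;

-- Each cluster member (nodes and any witness), its state, and its quorum votes.
SELECT member_name,
       member_type_desc,
       member_state_desc,
       number_of_quorum_votes
FROM sys.dm_hadr_cluster_members
ORDER BY member_name;
```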

Example Configuration (GEO Cluster)

Configuring Quorum

Availability Group Basics
Contains one or more databases
Database must be in the full recovery model
One unit of failover
SQL Server Agent jobs, logins, and linked servers do not fail over
Create the availability group, then add the database
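Because every availability database must use the full recovery model, it can help to check this before creating the group. A small sketch; the `database_id > 4` filter is an assumption used here to skip the system databases, and D000DB1 is the database name used later in this deck:

```sql
-- List user databases that are not yet in the full recovery model.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE database_id > 4
  AND recovery_model_desc <> N'FULL';

-- Switch a database to full recovery. A full backup must still be taken
-- afterwards before the database can be added to an availability group.
ALTER DATABASE [D000DB1] SET RECOVERY FULL;
```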

Feature Comparison
Standard Edition (basic AG) | Enterprise Edition (advanced AG)
2 replicas: primary and secondary | 9 replicas: primary and 8 secondaries
No readable secondary | 8 readable secondaries
Synchronous/asynchronous | 3 synchronous (SQL 2016, 2017), 5 synchronous (SQL 2019)
1 availability database | Unlimited
No secondary backup | Secondary backup
No distributed availability group | Distributed availability group
Basic AG simulates mirroring: CREATE AVAILABILITY GROUP <name> WITH (BASIC)

Thread Usage By Availability Groups
Single threaded based on database ID.
Multiple replicas on an instance share a thread pool (3-10 threads).
Uses up to 100 threads for parallel redo; once that limit is exceeded, the remaining databases fall back to a single redo thread.
Each database uses up to ½ the total number of cores, with a maximum of 16 threads per database.

Thread Usage By Availability Groups
Each primary database uses 1 unshared Log Capture thread.
Each secondary database uses 1 Log Send thread.

Primary Replica with 4 Secondaries
SQL Server 2012 supports a maximum of 4 secondary replicas; SQL Server 2016 supports up to 8 (one primary and 2 synchronous secondaries).

Configuring the HA/DR Solution
Purpose: recovery time in the same location; separate location in the event of failure
WSFC setup: cluster validation; quorum; install stand-alone or failover cluster instance
Enable AlwaysOn Availability Groups for each SQL Server service

Enable AlwaysOn
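The feature itself is enabled per service outside of T-SQL, but whether it took effect can be confirmed from a query window. A minimal sketch using the built-in SERVERPROPERTY function:

```sql
-- is_hadr_enabled: 1 = AlwaysOn availability groups enabled, 0 = disabled.
-- hadr_manager_status: 0 = not started (pending communication),
--                      1 = started and running, 2 = not started and failed.
SELECT SERVERPROPERTY('IsHadrEnabled')     AS is_hadr_enabled,
       SERVERPROPERTY('HadrManagerStatus') AS hadr_manager_status;
```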

Creating the HA/DR Solution
Add service account login and grant connect to SQL
Create endpoint
Create availability group
Join availability group
Create database or restore to primary instance
Create full/log backup
Restore full/log backup to each secondary
Join database to availability group

USE [master]
GO
CREATE LOGIN [US\DataCtrSQLSvc] FROM WINDOWS WITH DEFAULT_DATABASE=[master]
GO
GRANT CONNECT SQL TO [US\DataCtrSQLSvc]
GO
--Use AES encryption, RC4 is deprecated
CREATE ENDPOINT [hadr_endpoint]
    STATE=STARTED
    AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
    FOR DATA_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE, ENCRYPTION = REQUIRED ALGORITHM AES)
GO
GRANT CONNECT ON ENDPOINT::[hadr_endpoint] TO [US\DataCtrSQLSvc]
GO
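The remaining steps on this slide (create the group, join it, back up, restore, join the database) could look roughly like the sketch below. The server names SQLVM1/SQLVM2, the domain us.example.com, and the backup share \\backupshare\sql are placeholders invented for illustration; AG_1 and D000DB1 are the names used elsewhere in this deck.

```sql
-- On the primary instance: create the group with two synchronous replicas.
CREATE AVAILABILITY GROUP [AG_1]
FOR REPLICA ON
    N'SQLVM1' WITH (
        ENDPOINT_URL      = N'TCP://SQLVM1.us.example.com:5022',  -- placeholder host
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC),
    N'SQLVM2' WITH (
        ENDPOINT_URL      = N'TCP://SQLVM2.us.example.com:5022',  -- placeholder host
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC);
GO

-- On each secondary instance: join the replica to the group.
ALTER AVAILABILITY GROUP [AG_1] JOIN;
GO

-- On the primary: take full and log backups, then add the database.
BACKUP DATABASE [D000DB1] TO DISK = N'\\backupshare\sql\D000DB1.bak' WITH INIT;
BACKUP LOG      [D000DB1] TO DISK = N'\\backupshare\sql\D000DB1.trn' WITH INIT;
ALTER AVAILABILITY GROUP [AG_1] ADD DATABASE [D000DB1];
GO

-- On each secondary: restore without recovery, then join the database.
RESTORE DATABASE [D000DB1] FROM DISK = N'\\backupshare\sql\D000DB1.bak' WITH NORECOVERY;
RESTORE LOG      [D000DB1] FROM DISK = N'\\backupshare\sql\D000DB1.trn' WITH NORECOVERY;
ALTER DATABASE [D000DB1] SET HADR AVAILABILITY GROUP = [AG_1];
GO
```

Each batch runs on a different instance, as the comments indicate, so this is not a single script to execute top to bottom in one window.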

VM AlwaysOn Architecture
Location A: ESX Host Cluster 1 (SQL VM1), ESX Host Cluster 2 (SQL VM2), SQL AlwaysOn
Distributed Availability Group between Location A and Location B
Location B: ESX Host Cluster 1 (SQL VM3), ESX Host Cluster 2 (SQL VM4), SQL AlwaysOn

SQL Distributed Availability Group (SQLDAG)
One of the new features in SQL Server 2016 is the ability to distribute availability groups across clusters. This makes it possible to disperse high availability and disaster recovery geographically. Distributed availability groups allow you to associate availability groups on two different Windows Server Failover Clusters.

SQL Distributed Availability Group
Two or more clusters
Mix of standalone instances and FCIs
Secondary cluster only knows that it is a secondary and does not know which cluster is primary (a DMV is coming for visibility into this)
No GUI
No alerts

Terminology
Primary Cluster
Primary Instance
Secondary Cluster
Secondary Cluster Primary Instance
Secondary Instance

Creating the HA/DR Solution using SQLDAG (…continued)
Add service account login and grant connect to SQL
Create endpoints on all instances
Create availability group on primary cluster
Join availability group on primary cluster (may contain just one replica)
Create availability group on secondary cluster (primary instance)
Create database or restore to primary instance
Create full/log backup
Restore full/log backup to each secondary, including secondary cluster(s)

USE [master]
GO
CREATE LOGIN [US\DataCtrSQLSvc] FROM WINDOWS WITH DEFAULT_DATABASE=[master]
GO
GRANT CONNECT SQL TO [US\DataCtrSQLSvc]
GO
--Use AES encryption, RC4 is deprecated
CREATE ENDPOINT [hadr_endpoint]
    STATE=STARTED
    AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
    FOR DATA_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE, ENCRYPTION = REQUIRED ALGORITHM AES)
GO
GRANT CONNECT ON ENDPOINT::[hadr_endpoint] TO [US\DataCtrSQLSvc]
GO
--For automatic seeding
ALTER AVAILABILITY GROUP [AG_1] GRANT CREATE ANY DATABASE
GO

Now for the FUN PART! WITH (DISTRIBUTED)
Go to the Primary Cluster Primary Instance and create the SQLDAG:

CREATE AVAILABILITY GROUP [distributedag]
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON
'AG_1' WITH
(
    LISTENER_URL = 'tcp://<virtualname>:5022', --Use listener name when there is a standalone
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC --required option; AUTOMATIC assumed here because the deck uses automatic seeding
),
'AG_2' WITH
(
    LISTENER_URL = 'tcp://<virtualname>:5022', --Use SQLVNN not listener name when there is an FCI
    --check port to make sure it is 5022/5023/5024 etc
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
);
GO

Now for the FUN PART…again!
Go to the Secondary Cluster Primary Instance and join the SQLDAG. Joining uses ALTER AVAILABILITY GROUP ... JOIN, not CREATE:

ALTER AVAILABILITY GROUP [distributedag]
JOIN
AVAILABILITY GROUP ON
'AG_1' WITH
(
    LISTENER_URL = 'tcp://<virtualname>:5022', --Use listener name when there is a standalone
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC --required option; AUTOMATIC assumed here because the deck uses automatic seeding
),
'AG_2' WITH
(
    LISTENER_URL = 'tcp://<virtualname>:5022', --Use SQLVNN not listener name when there is an FCI
    --check port to make sure it is 5022/5023/5024 etc
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
);
GO
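Since there is no GUI for a SQLDAG, one way to confirm that it was created and joined is to list the distributed groups from the catalog views. A minimal sketch, assuming SQL Server 2016 or later; for a distributed group, replica_server_name holds the name of each member availability group rather than a server name:

```sql
-- One row per member availability group of every distributed AG on this instance.
SELECT ag.name                  AS distributed_ag,
       ar.replica_server_name   AS member_ag,
       ar.endpoint_url,
       ar.availability_mode_desc,
       ar.failover_mode_desc,
       ar.seeding_mode_desc
FROM sys.availability_groups   AS ag
JOIN sys.availability_replicas AS ar
  ON ar.group_id = ag.group_id
WHERE ag.is_distributed = 1;
```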

Now Let’s Get Synchronized
ALTER DATABASE D000DB1 SET HADR AVAILABILITY GROUP = xxxx
Then wait.
Check the health of the distributed availability group to see if all replicas are synchronizing:

select r.replica_server_name,
       r.endpoint_url,
       rs.connected_state_desc,
       rs.last_connect_error_description,
       rs.last_connect_error_number,
       rs.last_connect_error_timestamp
from sys.dm_hadr_availability_replica_states rs
join sys.availability_replicas r on rs.replica_id = r.replica_id
where rs.is_local = 1

There is no GUI for SQLDAG, so you will need to rely on DMVs.

Demo
3 - AG to DAG to Create DB.mp4
4 - Establish Replication.mp4
5 - Replication the Right Way.mp4
6 - Test.mp4

Gathering Information About Availability Groups
sys.dm_hadr_automatic_seeding
sys.dm_hadr_physical_seeding_stats
sys.dm_hadr_availability_replica_cluster_nodes
sys.dm_hadr_availability_group_states
sys.dm_hadr_availability_replica_states
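As an illustration of how the two seeding DMVs in this list can be used, the sketch below shows the history of automatic seeding attempts and the progress of any seeding that is currently running. The column selection is just one reasonable choice:

```sql
-- History of automatic seeding attempts, newest first.
SELECT ag.name AS availability_group,
       s.start_time,
       s.completion_time,
       s.current_state,
       s.performed_seeding,
       s.failure_state_desc,
       s.number_of_attempts
FROM sys.dm_hadr_automatic_seeding AS s
JOIN sys.availability_groups       AS ag
  ON ag.group_id = s.ag_id
ORDER BY s.start_time DESC;

-- Progress of seeding operations that are in flight right now.
SELECT local_database_name,
       role_desc,
       internal_state_desc,
       transfer_rate_bytes_per_second,
       transferred_size_bytes,
       database_size_bytes,
       start_time_utc,
       estimate_time_complete_utc
FROM sys.dm_hadr_physical_seeding_stats;
```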

Manual Failover of SQLDAG
Use ALTER AVAILABILITY GROUP to set the SQLDAG (secondary cluster) to SYNCHRONOUS_COMMIT
Verify from sys.dm_hadr_database_replica_states that the status is SYNCHRONIZED and that end_of_log_lsn matches on the primaries in both clusters
On the instance that hosts the primary SQLDAG, set the SQLDAG role to secondary: ALTER AVAILABILITY GROUP xxx SET (ROLE = SECONDARY)
At this point the SQLDAG is unavailable
Verify readiness to fail over again
Issue the failover: ALTER AVAILABILITY GROUP xxx FORCE_FAILOVER_ALLOW_DATA_LOSS
After this step the SQLDAG is available again
Set the SQLDAG back to ASYNCHRONOUS_COMMIT
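Put together, the failover steps might look like the sketch below. It reuses the names distributedag, AG_1, and AG_2 from the earlier slides. Setting both member groups to synchronous commit (rather than only the secondary cluster) is an assumption made here, and each batch runs on the instance named in its comment.

```sql
-- Step 1, on the Primary Cluster Primary Instance: switch to synchronous commit.
ALTER AVAILABILITY GROUP [distributedag]
MODIFY AVAILABILITY GROUP ON
    'AG_1' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT),
    'AG_2' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
GO

-- Step 2, on the primary instance of each cluster: compare state and LSNs.
SELECT ag.name AS availability_group,
       DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc,
       drs.end_of_log_lsn,
       drs.last_hardened_lsn
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_groups             AS ag
  ON ag.group_id = drs.group_id;
GO

-- Step 3, on the Primary Cluster Primary Instance: demote the SQLDAG.
-- The SQLDAG is unavailable from here until step 4 completes.
ALTER AVAILABILITY GROUP [distributedag] SET (ROLE = SECONDARY);
GO

-- Step 4, on the Secondary Cluster Primary Instance, after re-running the
-- step 2 check: fail over.
ALTER AVAILABILITY GROUP [distributedag] FORCE_FAILOVER_ALLOW_DATA_LOSS;
GO

-- Step 5, on the new primary: return to asynchronous commit.
ALTER AVAILABILITY GROUP [distributedag]
MODIFY AVAILABILITY GROUP ON
    'AG_1' WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT),
    'AG_2' WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);
GO
```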

Configuration Example
2 sets of clustered VMs
Availability group in each VM cluster, synchronous with automatic failover
Distributed availability group between the two cluster sets

ESX host cluster A in data center 1
ESX host cluster B in data center 1
ESX host cluster C in data center 2
ESX host cluster D in data center 2
VM1 on ESX host cluster A
VM2 on ESX host cluster B
VM3 on ESX host cluster C
VM4 on ESX host cluster D
Cluster VM1 and VM2: Cluster X
Cluster VM3 and VM4: Cluster Y
VM1 and VM2 set up with an AG, synchronous with automatic failover
VM3 and VM4 set up with an AG, synchronous with automatic failover
SQLDAG between Cluster X and Cluster Y

Monitor SQLDAG Replication Health State
The query joins the SQLDAG's member rows (a) to the state of each member availability group (b):

SELECT *
FROM (
    select CAST(r.replica_id as VARCHAR(36)) as replica_id,
           CAST(r.group_id as VARCHAR(36)) as group_id,
           replica_server_name,
           endpoint_url,
           ISNULL(ds.is_distributed,0) as is_distributed,
           ds.name
    from sys.availability_replicas r
    left join sys.availability_groups ds on r.group_id = ds.group_id
    where ds.name = '<AGName>'
) a
inner join (
    select availability_group=cast(ag.name as varchar(30)),
           primary_replica=cast(ags.primary_replica as varchar(30)),
           primary_recovery_health_desc=cast(ags.primary_recovery_health_desc as varchar(30)),
           synchronization_health_desc=cast(ags.synchronization_health_desc as varchar(30)),
           CAST(ag.group_id as VARCHAR(36)) as group_id,
           CAST(ag.resource_id as VARCHAR(36)) as resource_id,
           CAST(ag.resource_group_id as VARCHAR(36)) as resource_group_id,
           ag.failure_condition_level,
           ag.health_check_timeout,
           ag.is_distributed,
           ds.recovery_lsn,
           ds.truncation_lsn,
           CAST(ISNULL(ds.last_sent_lsn,0) AS VARCHAR) as last_sent_lsn,
           ISNULL(ds.last_sent_time,0) as last_sent_time,
           CAST(ISNULL(ds.last_received_lsn,0) AS VARCHAR) as last_received_lsn,
           ISNULL(ds.last_received_time,0) as last_received_time,
           CAST(ISNULL(ds.last_hardened_lsn,0) AS VARCHAR) as last_hardened_lsn,
           ISNULL(ds.last_hardened_time,0) as last_hardened_time,
           ds.last_redone_lsn,
           ds.last_redone_time,
           ds.log_send_queue_size,
           CAST(ISNULL(ds.log_send_rate,0) AS VARCHAR) as log_send_rate,
           ds.redo_queue_size,
           CAST(ISNULL(ds.redo_rate,0) AS VARCHAR) as redo_rate,
           CAST(ds.end_of_log_lsn AS VARCHAR) as end_of_log_lsn,
           CAST(ds.last_commit_lsn AS VARCHAR) as last_commit_lsn,
           ds.last_commit_time,
           --varchar(20) so that 'secondary_only' is not truncated
           automated_backup_preference_desc=cast(ag.automated_backup_preference_desc as varchar(20))
    from sys.availability_groups ag
    join sys.dm_hadr_availability_group_states ags on ag.group_id = ags.group_id
    left join sys.dm_hadr_database_replica_states ds on ds.group_id = ag.group_id
) b on b.availability_group = a.replica_server_name

Health State
Metric collection timestamp, LSN information, and health state

Health State
LSN hardened, redo rate, LSN commit

Alert on Unhealthy SQLDAG State
Use a job to record the health state with the query at a set interval (5 minutes is a suggestion).
Use a job to compare the latest health state with the prior health state.
Repeated unhealthy SQLDAG replication should trigger an alerting mechanism.
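One way to implement this record-and-compare approach is sketched below. The table dbo.SqlDagHealthHistory, the threshold of three consecutive unhealthy samples, and the use of RAISERROR ... WITH LOG as the alert trigger are all assumptions made for illustration; the slides do not prescribe a schema or an alerting tool. Both batches are intended to run as SQL Server Agent job steps in a DBA utility database.

```sql
-- Job step 1 (every 5 minutes): record the current SQLDAG health state.
IF OBJECT_ID(N'dbo.SqlDagHealthHistory', N'U') IS NULL
    CREATE TABLE dbo.SqlDagHealthHistory (
        collected_at                datetime2(0) NOT NULL DEFAULT SYSUTCDATETIME(),
        availability_group          sysname      NOT NULL,
        synchronization_health_desc nvarchar(60) NULL
    );
GO

INSERT INTO dbo.SqlDagHealthHistory (availability_group, synchronization_health_desc)
SELECT ag.name, ags.synchronization_health_desc
FROM sys.availability_groups               AS ag
JOIN sys.dm_hadr_availability_group_states AS ags
  ON ags.group_id = ag.group_id
WHERE ag.is_distributed = 1;
GO

-- Job step 2: raise an error when the last 3 samples of any SQLDAG were all
-- unhealthy (a NULL health value is counted as unhealthy).
IF EXISTS (
    SELECT 1
    FROM (
        SELECT availability_group,
               synchronization_health_desc,
               ROW_NUMBER() OVER (PARTITION BY availability_group
                                  ORDER BY collected_at DESC) AS rn
        FROM dbo.SqlDagHealthHistory
    ) AS h
    WHERE h.rn <= 3
    GROUP BY h.availability_group
    HAVING COUNT(*) = 3
       AND SUM(CASE WHEN h.synchronization_health_desc = N'HEALTHY'
                    THEN 0 ELSE 1 END) = 3
)
    RAISERROR (N'SQLDAG replication unhealthy for 3 consecutive checks.', 16, 1) WITH LOG;
GO
```

Writing the error to the log lets whatever alerting mechanism is already watching the SQL Server error log or Agent job failures pick it up.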

Read-Scale Availability Groups
Does not depend on any clustering technology
Not for HA or DR, just for synchronization; no WSFC
A single SQLDAG can have up to 17 readable secondaries
Availability group syntax: CLUSTER_TYPE = NONE
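A read-scale group built with CLUSTER_TYPE = NONE might be created as in the sketch below, assuming SQL Server 2017 or later. The group name ReadScaleAG, the server names, and the domain are placeholders invented for illustration.

```sql
-- On the primary instance: no WSFC, so failover is manual only.
CREATE AVAILABILITY GROUP [ReadScaleAG]
WITH (CLUSTER_TYPE = NONE)
FOR REPLICA ON
    N'SQLVM1' WITH (
        ENDPOINT_URL      = N'TCP://SQLVM1.us.example.com:5022',  -- placeholder host
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = MANUAL,
        SEEDING_MODE      = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)),
    N'SQLVM2' WITH (
        ENDPOINT_URL      = N'TCP://SQLVM2.us.example.com:5022',  -- placeholder host
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = MANUAL,
        SEEDING_MODE      = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL));
GO

-- On the secondary instance: join the group and allow automatic seeding
-- to create databases.
ALTER AVAILABILITY GROUP [ReadScaleAG] JOIN WITH (CLUSTER_TYPE = NONE);
ALTER AVAILABILITY GROUP [ReadScaleAG] GRANT CREATE ANY DATABASE;
GO
```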

Worth Mentioning
SQL Server 2017 read-scale availability groups: not for HA or DR, just for synchronization; no WSFC
The official abbreviation for distributed availability groups is not DAG. DAG is used for Exchange Database Availability Groups.
Automatic seeding: folder structure matters

Questions

Thank You
