Introduction
When it is not possible to have SQL Server Enterprise and there is a need to set up an AlwaysOn cluster, it is possible to use SQL Server Standard and implement basic availability groups.
However, the Standard edition of SQL Server comes with certain limitations when it comes to availability groups.
Here are some of these limitations
- Limited to 2 replicas
- No read-only access on the secondary replica
- No possibility to perform backups on the secondary
- 1 database per availability group
However some clients come with the following constraints
- Install an application that deploys and installs multiple databases
- Use SQL Server Standard
- Use only one listener, as the applications will only use one listener to connect to the primary replica
Requirements
- 1 database = 1 availability group = 0 or 1 listener
- We need to install multiple databases
- Our applications are configured to use only one listener (in their connection string), such as SharePoint, EasyVista, Business Central, M3…
- Therefore we will need to create multiple availability groups in order to benefit from high availability
Here are the questions that arise
If I need to install 10 databases for my application, this requires creating 10 availability groups and therefore 0 or several listeners.
- How can I ensure that all databases are in high availability with only one listener ?
- How can I have only one listener for our applications, multiple availability groups, and one database per availability group ?
- How can this limitation be overcome ?
Here is a summary of the solution
- We create our cluster
- We enable the AlwaysOn feature on all replicas
- We create our availability groups
- 1 availability group per database. Each of these availability groups will not contain any listener.
- 1 availability group containing a “monitoring” database and a listener
- When this availability group fails over, all other availability groups fail over as well. This is the “reference” database. All availability groups must share the same primary replica as the availability group that contains the “monitoring” database
Use cases
Case 1
The availability groups share the same primary replica :
Case 2
The availability groups do not share the same primary replica :
Here are some details about the environment
Virtual machines :
Hostname | IP | Domain | Function | OS |
ad | 192.168.2.10 | lab.local | Domain controler | Windows Server 2022 |
sql-1 | 192.168.2.11 | lab.local | Replica 1 | Windows Server 2022 |
sql-2 | 192.168.2.12 | lab.local | Replica 2 | Windows Server 2022 |
Failover cluster details
- Failover cluster name : clu-1.lab.local
- IP : 192.168.2.13/24
SQL Server details
Replica | IP | SQL Server version |
sql-1 | 192.168.2.11/24 | SQL Server 2022 Standard |
sql-2 | 192.168.2.12/24 | SQL Server 2022 Standard |
AlwaysOn details
AG name | Endpoint port | Port | Listener | IP |
ag-monitoring | 5022 | 1433 | lst-1 | 192.168.2.14/24 |
ag-t1 | 5022 | N/A | N/A | N/A |
ag-t2 | 5022 | N/A | N/A | N/A |
ag-t3 | 5022 | N/A | N/A | N/A |
ag-t4 | 5022 | N/A | N/A | N/A |
Global architecture
Our SQL Server instances
We integrate our databases to the availability groups and we check their states
SELECT
dbs.name as dbs_name,
dhdrs.synchronization_state_desc,
ar.replica_server_name
FROM sys.dm_hadr_database_replica_states AS dhdrs
INNER JOIN sys.availability_replicas AS ar
ON dhdrs.replica_id = ar.replica_id
INNER JOIN sys.databases AS dbs
ON dhdrs.database_id = dbs.database_id
ORDER BY ar.replica_server_name ASC
We check the availability groups configuration
SELECT
ar.replica_server_name,
ar.availability_mode_desc,
ag.[name] AS dbs_name,
dhars.role_desc,
CASE
WHEN ag.basic_features = 1 THEN 'Basic AG'
ELSE 'Not Basic AG'
END AS ag_type
FROM sys.availability_replicas AS ar
INNER JOIN sys.availability_groups AS ag
ON ar.group_id = ag.group_id
INNER JOIN sys.dm_hadr_availability_replica_states AS dhars
ON ar.replica_id = dhars.replica_id
Solutions
We present two solutions to ensure that the availability groups share the same primary replica:
- Solution 1: a job that runs every 5 minutes and checks if the availability groups share the same replica. If they do not, a failover is performed. The “reference” availability group in our case is the group named “ag-monitoring” The other availability groups must share the same primary replica.
- Solution 2: an alert and a job that perform the following tasks:
- The alert is triggered when a failover is detected. The alert then calls a job.
- This job is divided into 3 steps:
- Save the failover date
- Wait for a certain duration before checking the state of the availability groups (the failover may take some time)
- Check the configuration of the availability groups (what their primary replica is)
- If the availability groups ag-t1, ag-t2, ag-t3, and ag-t4 do not share the same primary replica, a failover is performed
- This job is divided into 3 steps:
- The alert is triggered when a failover is detected. The alert then calls a job.
Implementation
Solution 1
The job is deployed on both replicas. We check if the database is open for read/write before performing any operations.
Job configuration :
Code
IF ((SELECT DATABASEPROPERTYEX('monitoring', 'Updateability')) = 'READ_WRITE')
BEGIN
DECLARE @replica_server NVARCHAR(100),
@ag_name NVARCHAR(100),
@failover_command NVARCHAR(100)
DECLARE cursor_failover_commands CURSOR FOR
WITH T_AG_Monitoring (replica_server, ag_name, role_name)
AS
(
SELECT
ar.replica_server_name,
ag.[name] as ag_name,
dhars.role_desc
FROM sys.availability_replicas AS ar
INNER JOIN sys.availability_groups AS ag
ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states as dhars
ON ar.replica_id = dhars.replica_id
WHERE ag.[name] = 'ag-monitoring'
),
T_AG_Databases (replica_server, ag_name, role_name)
AS
(
SELECT
ar.replica_server_name,
ag.[name] as ag_name,
dhars.role_desc
FROM sys.availability_replicas AS ar
INNER JOIN sys.availability_groups AS ag
ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states as dhars
ON ar.replica_id = dhars.replica_id
WHERE ag.[name] <> 'ag-monitoring'
) SELECT
T_AG_Databases.replica_server,
T_AG_Databases.ag_name,
CASE
WHEN T_AG_Databases.role_name IS NULL THEN 'ALTER AVAILABILITY GROUP [' + T_AG_Databases.ag_name + '] FAILOVER;'
END AS failover_command
FROM T_AG_Monitoring
RIGHT JOIN T_AG_Databases
ON T_AG_Monitoring.replica_server = T_AG_Databases.replica_server
AND T_AG_Monitoring.role_name = T_AG_Databases.role_name
OPEN cursor_failover_commands
FETCH cursor_failover_commands INTO @replica_server, @ag_name, @failover_command
WHILE @@FETCH_STATUS = 0
BEGIN
IF (LEN(@failover_command) >= 1)
BEGIN
EXEC(@failover_command);
END
FETCH cursor_failover_commands INTO @replica_server, @ag_name, @failover_command
END
CLOSE cursor_failover_commands
DEALLOCATE cursor_failover_commands
END
Solution 2
The alert and the job are present on both replicas.
Alert configuration:
Job configuration:
Code :
USE
GO
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'ag_monitoring_s2',
@enabled=0,
@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N'No description available.',
@category_name=N'[Uncategorized (Local)]',
@owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'ag_monitoring',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=3,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=1,
@retry_interval=1,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'IF ((SELECT DATABASEPROPERTYEX(''monitoring'', ''Updateability'')) = ''READ_WRITE'')
BEGIN
MERGE monitoring.dbo.t_tracefailover_tfr AS t_target
USING (
SELECT
id,
last_occurrence_date,
last_occurrence_time
FROM msdb.dbo.sysalerts
) AS t_source
ON t_source.id = t_target.tfr_id
WHEN NOT MATCHED THEN
INSERT (tfr_id, tfr_last_occurence_date, tfr_last_occurence_time) VALUES (t_source.id, t_source.last_occurrence_date, t_source.last_occurrence_time)
WHEN MATCHED THEN
UPDATE SET t_target.tfr_last_occurence_date = t_source.last_occurrence_date, t_target.tfr_last_occurence_time = t_source.last_occurrence_time;
END
ELSE
BEGIN
PRINT ''Secondary replica'';
END',
@database_name=N'master',
@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'ag_wait_before_check',
@step_id=2,
@cmdexec_success_code=0,
@on_success_action=3,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'BEGIN
WAITFOR DELAY ''00:01'';
END;',
@database_name=N'master',
@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'ag_check_state',
@step_id=3,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'IF ((SELECT DATABASEPROPERTYEX(''monitoring'', ''Updateability'')) = ''READ_WRITE'')
BEGIN
DECLARE @replica_server NVARCHAR(100),
@ag_name NVARCHAR(100),
@failover_command NVARCHAR(100)
DECLARE cursor_failover_commands CURSOR FOR
WITH T_AG_Monitoring (replica_server, ag_name, role_name)
AS
(
SELECT
ar.replica_server_name,
ag.[name] as ag_name,
dhars.role_desc
FROM sys.availability_replicas AS ar
INNER JOIN sys.availability_groups AS ag
ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states as dhars
ON ar.replica_id = dhars.replica_id
WHERE ag.[name] = ''ag-monitoring''
),
T_AG_Databases (replica_server, ag_name, role_name)
AS
(
SELECT
ar.replica_server_name,
ag.[name] as ag_name,
dhars.role_desc
FROM sys.availability_replicas AS ar
INNER JOIN sys.availability_groups AS ag
ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states as dhars
ON ar.replica_id = dhars.replica_id
WHERE ag.[name] <> ''ag-monitoring''
) SELECT
T_AG_Databases.replica_server,
T_AG_Databases.ag_name,
CASE
WHEN T_AG_Databases.role_name IS NULL THEN ''ALTER AVAILABILITY GROUP ['' + T_AG_Databases.ag_name + ''] FAILOVER;''
END AS failover_command
FROM T_AG_Monitoring
RIGHT JOIN T_AG_Databases
ON T_AG_Monitoring.replica_server = T_AG_Databases.replica_server
AND T_AG_Monitoring.role_name = T_AG_Databases.role_name
OPEN cursor_failover_commands
FETCH cursor_failover_commands INTO @replica_server, @ag_name, @failover_command
WHILE @@FETCH_STATUS = 0
BEGIN
IF (LEN(@failover_command) >= 1)
BEGIN
PRINT ''Failover'';
EXEC(@failover_command);
END
FETCH cursor_failover_commands INTO @replica_server, @ag_name, @failover_command
END
CLOSE cursor_failover_commands
DEALLOCATE cursor_failover_commands
END',
@database_name=N'master',
@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO
First solution test :
Initial configuration : the availability groups share the same primary replica.
We perform a failover of the availability group ag-t1.
We check the configuration:
The job is executing:
We check the configuration:
Conclusion
- Solution 1 has 2 drawbacks:
- The job could potentially be triggered during a failover
- The job runs every 5 minutes. Therefore, in the worst case, there could be a 5-minute delay in aligning the availability groups
- Solution 2 is more complex and has the following drawbacks:
- The current backup history is a copy of the dbo.sysalerts table. As a result, the data in this table does not use the GETDATE() function. We could add a column to store the date of the last failover
- The job can be executed multiple times because it is triggered by an alert (alt_failover)
It is also important to verify data synchronization before performing a failover.
Thank you, Amine Haloui.
Carel
25.10.2024Thanks for the post showing your solution.
I do think that it is kind of sad that MS still has the 1:1 ratio between AG and DB that would have been great and helped most people to have a simpler configuration to manage the high availability.