Introduction

When it is not possible to have SQL Server Enterprise and there is a need to set up an AlwaysOn cluster, it is possible to use SQL Server Standard and implement basic availability groups.

However, the Standard edition of SQL Server comes with certain limitations when it comes to availability groups.

Here are some of these limitations

  • Limited to 2 replicas
  • No read-only access on the secondary replica
  • No possibility to perform backups on the secondary
  • 1 database per availability group

However some clients come with the following constraints

  • Install an application that deploys and installs multiple databases
  • Use SQL Server Standard
  • Use only one listener, as the applications will only use one listener to connect to the primary replica

Requirements

  • 1 database = 1 availability group = 0 or 1 listener
  • We need to install multiple databases
  • Our applications are configured to use only one listener (in their connection string), such as SharePoint, EasyVista, Business Central, M3…
  • Therefore we will need to create multiple availability groups in order to benefit from high availability

Here are the questions that arise

If I need to install 10 databases for my application, this requires creating 10 availability groups and therefore 0 or several listeners.

  • How can I ensure that all databases are in high availability with only one listener ?
  • How can I have only one listener for our applications, multiple availability groups, and one database per availability group ?
  • How can this limitation be overcome ?

Here is a summary of the solution

  • We create our cluster
  • We enable the AlwaysOn feature on all replicas
  • We create our availability groups
    • 1 availability group per database. Each of these availability groups will not contain any listener.
    • 1 availability group containing a “monitoring” database and a listener
      • When this availability group fails over, all other availability groups fail over as well. This is the “reference” database. All availability groups must share the same primary replica as the availability group that contains the “monitoring” database

Use cases

Case 1

The availability groups share the same primary replica :

Case 2

The availability groups do not share the same primary replica :

Here are some details about the environment

Virtual machines :

HostnameIPDomainFunctionOS
ad192.168.2.10lab.localDomain controlerWindows Server 2022
sql-1192.168.2.11lab.localReplica 1Windows Server 2022
sql-2192.168.2.12lab.localReplica 2Windows Server 2022

Failover cluster details

  • Failover cluster name : clu-1.lab.local
  • IP : 192.168.2.13/24

SQL Server details

ReplicaIPSQL Server version
sql-1192.168.2.11/24SQL Server 2022 Standard
sql-2192.168.2.12/24SQL Server 2022 Standard

AlwaysOn details

AG nameEndpoint portPortListenerIP
ag-monitoring50221433lst-1192.168.2.14/24
ag-t15022N/AN/AN/A
ag-t25022N/AN/AN/A
ag-t35022N/AN/AN/A
ag-t45022N/AN/AN/A

Global architecture

Our SQL Server instances

We integrate our databases to the availability groups and we check their states

SELECT
	dbs.name as dbs_name,
	dhdrs.synchronization_state_desc,
	ar.replica_server_name
	FROM sys.dm_hadr_database_replica_states AS dhdrs
		INNER JOIN sys.availability_replicas AS ar
			ON dhdrs.replica_id = ar.replica_id
		INNER JOIN sys.databases AS dbs
			ON dhdrs.database_id = dbs.database_id
ORDER BY ar.replica_server_name ASC

We check the availability groups configuration

SELECT
	ar.replica_server_name,
	ar.availability_mode_desc,
	ag.[name] AS dbs_name,
	dhars.role_desc,
	CASE
		WHEN ag.basic_features = 1 THEN 'Basic AG'
		ELSE 'Not Basic AG'
	END AS ag_type
	FROM sys.availability_replicas AS ar
		INNER JOIN sys.availability_groups AS ag
			ON ar.group_id = ag.group_id
		INNER JOIN sys.dm_hadr_availability_replica_states AS dhars
ON ar.replica_id = dhars.replica_id

Solutions

We present two solutions to ensure that the availability groups share the same primary replica:

  • Solution 1: a job that runs every 5 minutes and checks if the availability groups share the same replica. If they do not, a failover is performed. The “reference” availability group in our case is the group named “ag-monitoring” The other availability groups must share the same primary replica.
  • Solution 2: an alert and a job that perform the following tasks:
    • The alert is triggered when a failover is detected. The alert then calls a job.
      • This job is divided into 3 steps:
        • Save the failover date
        • Wait for a certain duration before checking the state of the availability groups (the failover may take some time)
        • Check the configuration of the availability groups (what their primary replica is)
          • If the availability groups ag-t1, ag-t2, ag-t3, and ag-t4 do not share the same primary replica, a failover is performed

Implementation

Solution 1

The job is deployed on both replicas. We check if the database is open for read/write before performing any operations.

Job configuration :

Code

IF ((SELECT DATABASEPROPERTYEX('monitoring', 'Updateability')) = 'READ_WRITE')
BEGIN
DECLARE @replica_server NVARCHAR(100),
		@ag_name NVARCHAR(100),
		@failover_command NVARCHAR(100)

DECLARE cursor_failover_commands CURSOR FOR
   WITH T_AG_Monitoring (replica_server, ag_name, role_name)
AS
(
SELECT
	ar.replica_server_name,
	ag.[name] as ag_name,
	dhars.role_desc

	FROM sys.availability_replicas AS ar
		INNER JOIN sys.availability_groups AS ag
			ON ar.group_id = ag.group_id
		LEFT JOIN sys.dm_hadr_availability_replica_states as dhars
			ON ar.replica_id = dhars.replica_id
	
	WHERE ag.[name] = 'ag-monitoring'
),
T_AG_Databases (replica_server, ag_name, role_name)
AS
(
	SELECT
	ar.replica_server_name,
	ag.[name] as ag_name,
	dhars.role_desc

	FROM sys.availability_replicas AS ar
		INNER JOIN sys.availability_groups AS ag
			ON ar.group_id = ag.group_id
		LEFT JOIN sys.dm_hadr_availability_replica_states as dhars
			ON ar.replica_id = dhars.replica_id
	
	WHERE ag.[name] <> 'ag-monitoring'
) SELECT
	T_AG_Databases.replica_server,
	T_AG_Databases.ag_name,
	CASE
		WHEN T_AG_Databases.role_name IS NULL THEN 'ALTER AVAILABILITY GROUP [' + T_AG_Databases.ag_name + '] FAILOVER;'
	END AS failover_command
	FROM T_AG_Monitoring
		RIGHT JOIN T_AG_Databases
			ON T_AG_Monitoring.replica_server = T_AG_Databases.replica_server 
			AND T_AG_Monitoring.role_name = T_AG_Databases.role_name

OPEN cursor_failover_commands
FETCH cursor_failover_commands INTO @replica_server, @ag_name, @failover_command
 
WHILE @@FETCH_STATUS = 0
BEGIN
	IF (LEN(@failover_command) >= 1)
	BEGIN
		EXEC(@failover_command);	
	END
    FETCH cursor_failover_commands INTO @replica_server, @ag_name, @failover_command
END
 
CLOSE cursor_failover_commands
DEALLOCATE cursor_failover_commands
END

Solution 2

The alert and the job are present on both replicas.

Alert configuration:

Job configuration:

Code :

USE
GO

BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0

IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)

BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END

DECLARE @jobId BINARY(16)

EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N'ag_monitoring_s2',

@enabled=0,

@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N'No description available.',
@category_name=N'[Uncategorized (Local)]',

@owner_login_name=N'sa', @job_id = @jobId OUTPUT

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'ag_monitoring',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=3,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=1,
@retry_interval=1,
@os_run_priority=0, @subsystem=N'TSQL',

@command=N'IF ((SELECT DATABASEPROPERTYEX(''monitoring'', ''Updateability'')) = ''READ_WRITE'')

BEGIN

MERGE monitoring.dbo.t_tracefailover_tfr AS t_target
USING (
SELECT
id,
last_occurrence_date,
last_occurrence_time
FROM msdb.dbo.sysalerts

) AS t_source

ON t_source.id = t_target.tfr_id

WHEN NOT MATCHED THEN

INSERT (tfr_id, tfr_last_occurence_date, tfr_last_occurence_time) VALUES (t_source.id, t_source.last_occurrence_date, t_source.last_occurrence_time)

WHEN MATCHED THEN

UPDATE SET t_target.tfr_last_occurence_date = t_source.last_occurrence_date, t_target.tfr_last_occurence_time = t_source.last_occurrence_time;

END
ELSE

BEGIN
PRINT ''Secondary replica'';
END',

@database_name=N'master',

@flags=0

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'ag_wait_before_check',

@step_id=2,

@cmdexec_success_code=0,
@on_success_action=3,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',

@command=N'BEGIN 
    WAITFOR DELAY ''00:01''; 
END;',

@database_name=N'master',
@flags=0

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'ag_check_state',

@step_id=3,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',

@command=N'IF ((SELECT DATABASEPROPERTYEX(''monitoring'', ''Updateability'')) = ''READ_WRITE'')

BEGIN

DECLARE @replica_server NVARCHAR(100),
@ag_name NVARCHAR(100),
@failover_command NVARCHAR(100)

DECLARE cursor_failover_commands CURSOR FOR
   WITH T_AG_Monitoring (replica_server, ag_name, role_name)
AS

(
SELECT
ar.replica_server_name,
ag.[name] as ag_name,
dhars.role_desc
FROM sys.availability_replicas AS ar
INNER JOIN sys.availability_groups AS ag
ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states as dhars
ON ar.replica_id = dhars.replica_id
WHERE ag.[name] = ''ag-monitoring''
),

T_AG_Databases (replica_server, ag_name, role_name)

AS
(

SELECT
ar.replica_server_name,
ag.[name] as ag_name,
dhars.role_desc
FROM sys.availability_replicas AS ar
INNER JOIN sys.availability_groups AS ag
ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states as dhars
ON ar.replica_id = dhars.replica_id
WHERE ag.[name] <> ''ag-monitoring''

) SELECT
T_AG_Databases.replica_server,
T_AG_Databases.ag_name,
CASE
WHEN T_AG_Databases.role_name IS NULL THEN ''ALTER AVAILABILITY GROUP ['' + T_AG_Databases.ag_name + ''] FAILOVER;''
END AS failover_command

FROM T_AG_Monitoring

RIGHT JOIN T_AG_Databases
ON T_AG_Monitoring.replica_server = T_AG_Databases.replica_server
AND T_AG_Monitoring.role_name = T_AG_Databases.role_name

OPEN cursor_failover_commands
FETCH cursor_failover_commands INTO @replica_server, @ag_name, @failover_command
WHILE @@FETCH_STATUS = 0

BEGIN

IF (LEN(@failover_command) >= 1)

BEGIN
PRINT ''Failover'';
EXEC(@failover_command);        

END
    FETCH cursor_failover_commands INTO @replica_server, @ag_name, @failover_command
END

CLOSE cursor_failover_commands
DEALLOCATE cursor_failover_commands
END',

@database_name=N'master',

@flags=0

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

COMMIT TRANSACTION

GOTO EndSave

QuitWithRollback:

    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION

EndSave:

GO

First solution test :

Initial configuration : the availability groups share the same primary replica.

We perform a failover of the availability group ag-t1.

We check the configuration:

The job is executing:

We check the configuration:

Conclusion

  • Solution 1 has 2 drawbacks:
    • The job could potentially be triggered during a failover
    • The job runs every 5 minutes. Therefore, in the worst case, there could be a 5-minute delay in aligning the availability groups

  • Solution 2 is more complex and has the following drawbacks:
    • The current backup history is a copy of the dbo.sysalerts table. As a result, the data in this table does not use the GETDATE() function. We could add a column to store the date of the last failover
    • The job can be executed multiple times because it is triggered by an alert (alt_failover)

It is also important to verify data synchronization before performing a failover.

Thank you, Amine Haloui.