Let’s talk about a funny story: an interesting issue I faced a couple of months ago at one of my customers. Initially, the problem concerned only the creation of an availability group listener. However, when he tried to delete the related availability group, he quickly noticed that the deletion failed. Even stranger, the virtual network name of the listener corresponded to the virtual computer object (VCO) of the cluster itself.
To illustrate the issue, I reproduced it in my lab environment. Here is the initial picture of the scenario:
First, let’s focus on the cluster name. It was initially WIN2012CLUST, but you may notice that it has been changed to LST_SHPT, which is the listener name of my shptgrp availability group. In addition, if we look at the cluster core resources section, both the cluster network name and the virtual IP address of the cluster itself have disappeared, as shown below:
Next, my customer’s first attempt was to drop the availability group from SQL Server Management Studio, but he got the following error:
So, let’s summarize the initial situation:
- The cluster has been renamed with a name that corresponds to the availability group listener name
- The cluster core resources (virtual network name + virtual IP address) have disappeared from the cluster core resources group
- An availability group exists and includes a listener whose name corresponds to the cluster name, with a specific virtual IP address
- Dropping the availability group seems to fail
- Dropping the listener doesn’t work because it is a cluster core resource
I should also point out that we were in a production environment with a more complex architecture than my lab: it included multiple availability groups and critical applications on the same WSFC. In such a context, risky manipulations at the Windows failover cluster layer could compromise the availability of the entire environment. Fortunately, the other availability groups were not impacted by this weird situation, so we decided to wait for a non-business day to fix the issue.
At this point, the first thing we wanted to do was to rename the cluster correctly, as Microsoft support advised, but unfortunately this action failed with the following error:
We guessed that we couldn’t rename the cluster because a resource with the same name already existed. We then tried to rename the virtual network name of the listener to a dummy name, but once again we got an error. It looked like a death spiral…
But after calmly analysing the situation (working in a maintenance window has its advantages: time is not your enemy), we found out how to address the issue, and the solution was in fact pretty simple (I know, a solution is always simple once you have it 🙂). It consisted of using PowerShell cmdlets. Why PowerShell? We suspected that both the LST_SHPT cluster network name and its virtual IP address resources were originally members of the cluster core group, and we verified this assumption with the Get-ClusterResource cmdlet.
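A minimal PowerShell sketch of that check follows. It wraps Get-ClusterResource (FailoverClusters module) in a helper whose name, Get-ClusterResourcePlacement, is hypothetical and not part of any module, and it assumes the core group still has its default name, Cluster Group. Resources that ought to be core but report InCoreGroup = False are the ones that have drifted into another group.

```powershell
# Hypothetical helper (not part of the FailoverClusters module): list every
# cluster resource together with the group that currently owns it.
# Assumption: the core group still has its default name, "Cluster Group".
function Get-ClusterResourcePlacement {
    [CmdletBinding()]
    param(
        [string]$CoreGroupName = 'Cluster Group'
    )
    # Get-ClusterResource comes from the FailoverClusters module (Windows Server 2012+).
    Get-ClusterResource | ForEach-Object {
        [pscustomobject]@{
            Name         = $_.Name
            ResourceType = $_.ResourceType.Name
            OwnerGroup   = $_.OwnerGroup.Name
            # False for resources living outside the core group.
            InCoreGroup  = ($_.OwnerGroup.Name -eq $CoreGroupName)
        }
    }
}

# Usage on a cluster node:
# Get-ClusterResourcePlacement | Sort-Object OwnerGroup | Format-Table -AutoSize
```

In a situation like the one described here, the cluster network name and IP address resources would be expected to show the availability group’s resource group as OwnerGroup.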
Indeed, their names put us on the right track, so we could safely assume that these resources had been accidentally moved from the cluster core resources group to the resource group of the availability group. However, we quickly saw that the Failover Cluster Manager console provides no way to move resources back to the cluster core resources group, and this is precisely where PowerShell can help. Let’s find the right cmdlet for this specific task.
Using the Move-ClusterResource cmdlet turned out to be the right solution in our case.
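The move itself might be sketched as below. Move-ClusterResource and its -Name and -Group parameters belong to the FailoverClusters module, and running Get-Command -Module FailoverClusters -Noun ClusterResource is one way to discover the resource-related cmdlets in the first place. The wrapper name Move-ResourceToCoreGroup is hypothetical, and the default resource names (Cluster IP Address, Cluster Name) and group name (Cluster Group) are assumptions based on Windows Server defaults, so check the real names with Get-ClusterResource first.

```powershell
# Hypothetical helper: move resources back into the cluster core group.
# Assumptions: the Windows Server default names "Cluster IP Address" and
# "Cluster Name" for the core resources and "Cluster Group" for the core group;
# check the real names with Get-ClusterResource before running it.
function Move-ResourceToCoreGroup {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [string[]]$ResourceName = @('Cluster IP Address', 'Cluster Name'),
        [string]$CoreGroupName = 'Cluster Group'
    )
    foreach ($name in $ResourceName) {
        # ShouldProcess honours -WhatIf and -Confirm.
        if ($PSCmdlet.ShouldProcess($name, "Move to group '$CoreGroupName'")) {
            # Move-ClusterResource (FailoverClusters module) reassigns the resource to another group.
            Move-ClusterResource -Name $name -Group $CoreGroupName
        }
    }
}

# Usage on a cluster node: preview first, then move for real.
# Move-ResourceToCoreGroup -WhatIf
# Move-ResourceToCoreGroup
```

Supporting -WhatIf is a deliberate choice here: on a production WSFC, previewing exactly which resources would move is cheap insurance before touching the core group.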
Finally, we were back to a more normal situation. Let’s have a look at the cluster core resources section in the Failover Cluster Manager console:
What about dropping the availability group and renaming the Windows failover cluster now? Well, we first dropped the availability group from SQL Server Management Studio without any issue. Then we were able to rename the cluster and change its virtual IP address after performing some Active Directory and DNS tricks (such as enabling the disabled computer object of the VCO and changing the permissions on the Active Directory and DNS records).
It was finally a happy ending, with a highly available environment that works properly, and a good drink to celebrate!
By David Barbarin