Availability Group - Delay with Listener Connectivity After Failover
The error is as it says, the Computer object does not have the appropriate permissions on the Domain to register the DNS Name Resource for the Listener. For the cluster to perform this operation smoothly "Authenticated Users" should have read/write all permissions on the Computer Object for the cluster, its nodes and the Listener DNS Name. To do this, Log in to the Active Directory ServerCluster network name resource 'Listener_DNS_NAME' failed registration of one or more associated DNS name(s) for the following reason: DNS operation refused. . Ensure that the network adapters associated with dependent IP address resources are configured with at least one accessible DNS server.
- Open Active Directory Users and Computers.
- On the View menu, select Advanced Features.
- Right-click the object and then click Properties.
- On the Security tab, click Advanced to view all of the permission entries that exist for the object.
- Verify that the Authenticated Users is in the list and has the permission to Read and Write All. Add the required permissions then Save the changes.
Ah, the often over-looked SPN. This should be part of your installation process - setting the SPN. To keep the story short and so we can get to the exciting part, you can refer here for the detailed instructions on how to configure the SPN for SQL Server. Aside from registering the SPN for each of the Nodes as specified on the MSDN Link, You'll also need to register the SPN for the Listener, as always 1433 is the port being used by your SQL Server:The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server ComputerName$. The target name used was HTTP/ComputerName.Domain.com. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Ensure that the target SPN is only registered on the account used by the server. This error can also happen if the target service account password is different than what is configured on the Kerberos Key Distribution Center for that target service. Ensure that the service on the server and the KDC are both configured to use the same password. If the server name is not fully qualified, and the target domain (Domain.COM) is different from the client domain (Doamin.COM), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.
setspn -A MSSQLSvc/Listener_DNS_NAME.Domain.com:1433 DOMAIN/SQLServiceAccountThis will enable Kerberos for the client connection to the Availability Group Listener and address the errors we received above. After configuring the SPN for the servers Automatic Fail over is now running smoothly, or so we thought. The client came back to us that it was taking some time for the application to connect to the Listener Name. Checking on the cluster logs and SQL Server's end, everything appears to be in order. No errors being thrown and Server is accessible. Now it get's interesting. Ping test within the Database sub net is successful but ping test outside of it was timing out. It was able to connect though after a substantial amount of time. After a few more test and checking the behavior is the same on both nodes. It takes more than thirty minutes for the Name to be reachable outside of the database sub net. After involving the Network Admin we found out that a MAC Address conflict is happening. That's our "Aha!" moment. Windows 2003 servers and later issues a Gratuitous ARP (GARP) requests during failover. There are some switches/devices that does not forward Gratuitous ARP by default. This causes the devices on the other end of the switch to not have the correct MAC address associated to the Name. This causes the failure. It often corrects itself when the router detects the failures and do a broadcast and gets the correct value. That's why it becomes accessible after some time. This KB details the issue. To address this, changes must be done on the configuration of the switches, you'll need to check with your hard ware vendor for this. Now, that could be the end of our problems, but after enabling the switch to forward GARP, we found out that the server itself is not sending a GARP request. This is a server configuration issue and requires some Registry changes. Open the Registry for the server and locate the key below:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
From there check if there is a key for
ArpRetryCount, if there is make sure that the Value is not set to 0. The value could be between 0-3. After changing this and restarting the servers Everything works perfectly. Last two issues are a bit rare, and something I wouldn't have experienced if the client wasn't using that particular hard ware and that particular standard configuration.
On this page
Share this
Share this
More resources
Learn more about Pythian by reading the following blogs and articles.
Let DataGuard Broker Do ALL The Work
Let DataGuard Broker Do ALL The Work
Feb 18, 2020 12:00:00 AM
2
min read
News and upates from Microsoft Azure, pt II
News and upates from Microsoft Azure, pt II
Sep 26, 2018 12:00:00 AM
8
min read
Automated Database Backups for Cloud Spanner Using Terraform

Automated Database Backups for Cloud Spanner Using Terraform
Dec 17, 2021 12:00:00 AM
5
min read
Ready to unlock value from your data?
With Pythian, you can accomplish your data transformation goals and more.