SQLServerWiki

“The Only Thing That Is Constant Is Change”

Changes made using SQL Server Configuration Manager to SQL Server 2005 Clustered Instance reverts back automatically.

Posted by database-wiki on March 6, 2011

Scenario:
======

We have installed a 2 Node Windows 2003 X64BIT Cluster on which we have
installed 32 bit sql server 2005 instance.The installation went through
successfully and sql server 2005 SP2 was applied successfully as well.

After applying sql server 2005 SP2 if we make any specific changes for the sql server instance from configuration manager with regard to changing the SQL Server Error Log location , Changing the SQL Server Agent Log location were not getting reflected after taking the resources offline and bring it online.

In our troubleshooting we have noticed that there is a change in the REGSYNC entries after we apply sql server 2005 sp2 for the sql server resource.

Pre-SP2 install:

[HKEY_LOCAL_MACHINE\Cluster\Resources\144c2a52-92ab-443a-b301-c6ff02c6e038\RegSync]
“00000001”=”Software\\Microsoft\\Microsoft SQL Server\\MSSQL.1\\Cluster”
“00000002”=”Software\\Microsoft\\Microsoft SQL Server\\MSSQL.1\\MSSQLServer”
“00000003”=”Software\\Microsoft\\Microsoft SQL Server\\MSSQL.1\\Replication”
“00000004”=”Software\\Microsoft\\Microsoft SQL Server\\MSSQL.1\\SQLServerAgent”
“00000005”=”Software\\Microsoft\\Microsoft SQL Server\\MSSQL.1\\PROVIDERS”
“00000006”=”Software\\Microsoft\\Microsoft SQL Server\\MSSQL.1\\SQLServerSCP”
“00000007”=”Software\\Microsoft\\Microsoft SQL Server\\MSSQL.1\\CPE”

Post-SP2 install:

[HKEY_LOCAL_MACHINE\Cluster\Resources\144c2a52-92ab-443a-b301-c6ff02c6e038\RegSync]

“00000001”=”Software\\Microsoft\\Microsoft SQL Server\\MSSQL.1\\Cluster”
“00000002”=”Software\\Microsoft\\Microsoft SQL Server\\MSSQL.1\\MSSQLServer”
“00000003”=”Software\\Microsoft\\Microsoft SQL Server\\MSSQL.1\\Replication”
“00000004”=”Software\\Microsoft\\Microsoft SQL Server\\MSSQL.1\\SQLServerAgent”
“00000005”=”Software\\Microsoft\\Microsoft SQL Server\\MSSQL.1\\PROVIDERS”
“00000006”=”Software\\Microsoft\\Microsoft SQL Server\\MSSQL.1\\SQLServerSCP”
“00000007”=”Software\\Microsoft\\Microsoft SQL Server\\MSSQL.1\\CPE”
“00000008”=”Software\\Wow6432Node\\Microsoft\\Microsoft SQL
Server\\MSSQL.1\\Replication”
“00000009”=”Software\\Wow6432Node\\Microsoft\\Microsoft SQL
Server\\MSSQL.1\\SQLserverAgent”
“0000000b”=”Software\\Wow6432Node\\Microsoft\\Microsoft SQL
Server\\MSSQL.1\\MSSQLSERVER”
“0000000c”=”Software\\Wow6432Node\\Microsoft\\Microsoft SQL
Server\\MSSQL.1\\SQLServerSCP”

We started to investigate why this behaviour is happening after applying sql server 2005 SP2 and wanted to get this issue fixed.

Analysis:
======

We installed a x64bit cluster in-house and installed sql server 2005 32 Bit and upgraded the same to sql server 2005 SP2 to verify if this is any specific
behaviour to the environment and below are our findings.

PRE-SP2 Install

C:\Documents and Settings\administrator.DC155891>cluster res “SQL Server (I2)”
/CHECKPOINTS

Listing registry checkpoints for resource ‘SQL Server (I2)’…

Resource Registry Checkpoint
——————– ——————————————————–
SQL Server (I2) ‘Software\Microsoft\Microsoft SQL Server\MSSQL.1\Cluster’
SQL Server (I2) ‘Software\Microsoft\Microsoft SQL
Server\MSSQL.1\MSSQLServer’
SQL Server (I2) ‘Software\Microsoft\Microsoft SQL
Server\MSSQL.1\Replication’
SQL Server (I2) ‘Software\Microsoft\Microsoft SQL
Server\MSSQL.1\SQLServerAgent’
SQL Server (I2) ‘Software\Microsoft\Microsoft SQL Server\MSSQL.1\PROVIDERS’
SQL Server (I2) ‘Software\Microsoft\Microsoft SQL
Server\MSSQL.1\SQLServerSCP’
SQL Server (I2) ‘Software\Microsoft\Microsoft SQL Server\MSSQL.1\CPE’
POST-SP2 Install

C:\Documents and Settings\administrator.DC155891>cluster res “SQL Server (I2)”  /CHECKPOINTS

Listing registry checkpoints for resource ‘SQL Server (I2)’…

Resource Registry Checkpoint
——————– ——————————————————–
SQL Server (I2) ‘Software\Microsoft\Microsoft SQL Server\MSSQL.1\Cluster’
SQL Server (I2) ‘Software\Microsoft\Microsoft SQL
Server\MSSQL.1\MSSQLServer’
SQL Server (I2) ‘Software\Microsoft\Microsoft SQL
Server\MSSQL.1\Replication’
SQL Server (I2) ‘Software\Microsoft\Microsoft SQL
Server\MSSQL.1\SQLServerAgent’
SQL Server (I2) ‘Software\Microsoft\Microsoft SQL Server\MSSQL.1\PROVIDERS’
SQL Server (I2) ‘Software\Microsoft\Microsoft SQL
Server\MSSQL.1\SQLServerSCP’
SQL Server (I2) ‘Software\Microsoft\Microsoft SQL Server\MSSQL.1\CPE’
SQL Server (I2) ‘Software\Wow6432Node\Microsoft\Microsoft SQL
Server\MSSQL.1\Replication’
SQL Server (I2) ‘Software\Wow6432Node\Microsoft\Microsoft SQL
Server\MSSQL.1\SQLserverAgent’
SQL Server (I2) ‘Software\Wow6432Node\Microsoft\Microsoft SQL
Server\MSSQL.1\Cluster’

If we have a close look we do see some extra registry hives on customer box and below are the same

“0000000b”=”Software\\Wow6432Node\\Microsoft\\Microsoft SQL
Server\\MSSQL.1\\MSSQLSERVER”
“0000000c”=”Software\\Wow6432Node\\Microsoft\\Microsoft SQL
Server\\MSSQL.1\\SQLServerSCP”

Also 1 Registy hive sync key is missing

SQL Server (I2) ‘Software\Wow6432Node\Microsoft\Microsoft SQL
Server\MSSQL.1\Cluster’

We wanted to confirm if this is a cluster level checkpoint issue, Hence we started sqlserver as a stand alone instance on one of the node and tried to make the changes in configuration manager and re-started the instance and it worked fine with out any issue.

Note: This issue is consistently reproducable on there cluster as
we have tried to install 2 instances and they endup with the same problem.

This issue looks to be specific to this cluster and we might have to fix the reg sync keysby removing them from checkpoint and add them back.

cluster res “SQL Server (I2)”
/REMOVECHECKPOINTS:”Software\Wow6432Node\Microsoft\Microsoft SQL  Server\MSSQL.2\MSSQLSERVER”

This command failed with the below error

failed to open key Software\Wow6432Node\Microsoft\Microsoft SQL
Server\MSSQL.2\MSSQLSERVER error 2

cluster res “SQL Server (I2)”
/REMOVECHECKPOINTS:”Software\Wow6432Node\Microsoft\Microsoft SQL Server\MSSQL.2\SQLServerSCP”

This command failed with the below error

failed to open key Software\Wow6432Node\Microsoft\Microsoft SQL
Server\MSSQL.2\SQLServerSCP error 2

But the command had removed the entries for sql server resource , went ahead and added the cluster regsync key back.

cluster res “SQL Server (I2)”
/ADDCHECKPOINTS:”Software\Wow6432Node\Microsoft\Microsoft SQL
Server\MSSQL.2\CLUSTER”

and it got added successfully with out any issues, We took
sql server resource offline and brought it online to verify if the entries are there.
Now we made required changes in sql server configuration
manager and verify if it stays. This time the changes were staying which confirmed that we had resolved the problem. It looks like there
were some problems with the regsync keys. Since sql server agent was also having the same problem removed the regsync key and added the same back.

cluster res “SQL Server (I2)”
/REMOVECHECKPOINTS:”Software\Wow6432Node\Microsoft\Microsoft SQL
Server\MSSQL.2\SQLserverAgent”
cluster res “SQL Server (I2)”
/ADDCHECKPOINTS:”Software\Wow6432Node\Microsoft\Microsoft SQL
Server\MSSQL.2\SQLserverAgent”

We then wanted to install a 3rd instance of sql server on the
cluster and see how it behaves, well we had the same problem.

Took all the setupfiles for further analysis as well as registry backup of HKLM
Hive to investigate the issue further. Set an expectation to the customer to use the workaround so that the work doesnt get stopped while we do research on this as it may take some time.

–We worked with instance 3, trying to discover exactly what is happening.
–Process we followed:

1. We attempted to make changes to the error log path through SSCM, taking the server offline and back online. In each instance, the value for the errorlog path reverted to the original value.

2. We added a checkpoint to the cluster key that we did not have originally
(hklm\software\wow6432node\microsoft\microsoft sql server\mssql.3\cluster) and again tried to change the value. No success.

3. We removed checkpointing on software\wow6432node\microsoft\microsoft sql
server\mssql.3\mssqlserver. In the course of this, we got OS error 2, but it was removed from the list of checkpointed keys in the registry. Trying again to change the value, this time we were successful.

4. Added this key back, but the problem did not reappear.

Conclusion: I think the OS error 2 is significant to this problem. It indicates
that there is an issue with this checkpoint as originally created. When removed and recreated, the checkpoint functions as expected. I will need the assistance of the Windows engineer to determine what is wrong with the checkpoint. We may well need to install another instance to get to the root of this problem.

Looking into the cluster log collected through MPSReports following our work with the thrid instance (I3), located at

CppRegisterNotify failed to open key Software\Wow6432Node\Microsoft\Microsoft SQL  Server\MSSQL.3\Replication error 2

–This same error appears for the MSSQLServer and the SQL Agent keys as well.
–Saw this sequence:

00000a8c.0000085c::2008/12/17-21:06:36.323 INFO [CP] CppDeleteAccessMask: Deleted
access mask for resource 34f090c9-b46c-49e6-8661-3541cda00c9b, id 0000000a
00000a8c.0000085c::2008/12/17-21:06:36.323 INFO [CP] CpDeleteDataFile: removing
checkpoint file for id 10 at quorum node 3
00000a8c.0000085c::2008/12/17-21:06:36.323 INFO [Qfs] QfsDeleteFile
Q:\MSCS\34f090c9-b46c-49e6-8661-3541cda00c9b000000a.CPT, status 0
00000a8c.0000085c::2008/12/17-21:06:36.323 INFO [Qfs] QfsRemoveDirectory
Q:\MSCS\34f090c9-b46c-49e6-8661-3541cda00c9b, status 145
00000a8c.0000085c::2008/12/17-21:06:36.323 WARN [CP] CppDeleteFile- unable to
remove directory Q:\MSCS\34f090c9-b46c-49e6-8661-3541cda00c9b, error 145
00000a8c.0000085c::2008/12/17-21:06:36.323 WARN [CP] CppRundownCheckpointById –
could not find checkpoint 10 in resource SQL Server (I3)

From the cluster log, there are errors reading the keys we’ve been working with, error 2 recorded when the attempt is made. In addition to the two keys we’ve been dropping the checkpointing on (mssqlserver and sqlagent), it appears there are
problems with “hkml\software\wow6432Node\microsoft\microsoft sql
server\mssql.3\replication”. The log also shows errors when trying to read this key. I suspect we’ve not run into any issues because you haven’t been using replication.

When we actually drop checkpointing on the mssqlserver key, an error is thrown as follows:

00000a8c.0000085c::2008/12/17-21:06:36.323 INFO [CP] CppDeleteAccessMask: Deleted
access mask for resource 34f090c9-b46c-49e6-8661-3541cda00c9b, id 0000000a
00000a8c.0000085c::2008/12/17-21:06:36.323 INFO [CP] CpDeleteDataFile: removing
checkpoint file for id 10 at quorum node 3
00000a8c.0000085c::2008/12/17-21:06:36.323 INFO [Qfs] QfsDeleteFile
Q:\MSCS\34f090c9-b46c-49e6-8661-3541cda00c9b000000a.CPT, status 0
00000a8c.0000085c::2008/12/17-21:06:36.323 INFO [Qfs] QfsRemoveDirectory
Q:\MSCS\34f090c9-b46c-49e6-8661-3541cda00c9b, status 145
00000a8c.0000085c::2008/12/17-21:06:36.323 WARN [CP] CppDeleteFile- unable to
remove directory Q:\MSCS\34f090c9-b46c-49e6-8661-3541cda00c9b, error 145
00000a8c.0000085c::2008/12/17-21:06:36.323 WARN [CP] CppRundownCheckpointById –
could not find checkpoint 10 in resource SQL Server (I3)

We had installed a new instance of SQL, we’ll call it SQL.4

We went into SQL Config manager and changed the SQLAgent log file to M:\Log, it was M:\Program Files\Microsoft SQL Server….
We checked the live hive that stores this value and it did indeed change in
Software\Wow6432Node\Microsoft\Microsoft SQL Server\MSSQL.3\SQLserverAgent

However the cluster had two values in the RegSync key for that SQL Server
resource.

00000004 Software\Microsoft\Microsoft SQL Server\MSSQL.3\SQLServerAgent
and
00000009 Software\Wow6432Node\Microsoft\Microsoft SQL
Server\MSSQL.3\SQLserverAgent

We checked the 00000004.cpt file in the {GUID} folder in the \MSCS directory and it
reflected the correct value there. However, 00000009.cpt showed the old value.

We offlined and onlined the SQL group and the Log file path reverted back to
M:\Program Files\Microsoft SQL Server……

Conclusion, the second checkpoint 00000009 is not needed. Due to the WOW 64 layer
in the registry, the path a client sees as ‘Software\Microsoft\Microsoft SQL
Server\MSSQL.3\SQLServerAgent’ actually gets translated to
Software\Wow6432Node\Microsoft\Microsoft SQL Server\MSSQL.3\SQLserverAgent

Therefore, we have to checkpoints pointing to the same spot. Problem is, the second checkpoint never changes so anytime a checkpoint happens, the value will revert back to whatever it was when SP2 was installed.

We did not install SP1 first, We went straight to SP2. We could probably just remove the checkpoints that show the Wow6432node values since they are not needed. Will see if I can repro locally. Most likely, they changed the checkpointing in SP1 and by going from RTM to SP2, we missed the change.

RESOLUTION:
==========

It appears we are running into a bug with Service Pack (SP) installs on clusters, one that does not affect all cluster installations, but you’re lucky enough to be working with one that does suffer from this issue. There are some assumptions made during the SP2 installation that,
while perfectly valid on a standalone system, cause cluster installs to display the symptoms you have seen in the course of this case.

There is no fix available for this issue, at least in the short term, but there are
2 workarounds available. These are:

1. Delete the checkpoints, which is what we’ve been doing in the course of this case. According to those in the know, there is no need to recreate them as we’ve been doing. This is probably the simplest way to address the problem.

2. Install SP1 before installing SP2. This has been shown to work in similar
cases, but you would need to test it to be sure it does work for you. So, you
would install the base SQL Server installation (RTM), then install SP1, then
install SP2. This might be a riskier approach, as it increases the chances you
will see an installation failure in the course of an SP installation.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: