SQLServerWiki

“The Only Thing That Is Constant Is Change”

Perfect SQL Database Corruption Scenario. Bad Old Days…

Posted by database-wiki on March 6, 2011

Database Corruption:

ISSUE OCCURRED ON 1/29/2009

A few errors appeared in the event log pointing to a RAID problem, which in turn caused a problem with the cluster disk:

Event Type:        Warning

Event Source:    ql2300

Event Category:                None

Event ID:              118

Date:                     1/29/2009

Time:                     7:23:36 AM

User:                     N/A

Computer:          MAC97

Description:

The driver for device \Device\RaidPort1 performed a bus reset upon request.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Data:

0000: 0f 00 10 00 01 00 66 00   ......f.

0008: 00 00 00 00 76 00 04 80   ....v..?

0010: 01 00 00 00 00 00 00 00   ........

0018: 00 00 00 00 00 00 00 00   ........

0020: 00 00 00 00 00 00 00 00   ........

0028: 00 00 00 00 00 00 00 00   ........

0030: 00 00 00 00 76 00 04 80   ....v..?

Event Type:        Warning

Event Source:    ql2300

Event Category:                None

Event ID:              118

Date:                     1/29/2009

Time:                     7:23:36 AM

User:                     N/A

Computer:          MAC97

Description:

The driver for device \Device\RaidPort0 performed a bus reset upon request.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Data:

0000: 0f 00 10 00 01 00 66 00   ......f.

0008: 00 00 00 00 76 00 04 80   ....v..?

0010: 01 00 00 00 00 00 00 00   ........

0018: 00 00 00 00 00 00 00 00   ........

0020: 00 00 00 00 00 00 00 00   ........

0028: 00 00 00 00 00 00 00 00   ........

0030: 00 00 00 00 76 00 04 80   ....v..?

Event Type:        Error

Event Source:    ClusDisk

Event Category:                None

Event ID:              1209

Date:                     1/29/2009

Time:                     7:23:36 AM

User:                     N/A

Computer:          MAC97

Description:

The description for Event ID ( 1209 ) in Source ( ClusDisk ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: \Device\ClusDisk0.

Data:

0000: 0e 00 00 00 01 00 5a 00   ......Z.

0008: 00 00 00 00 b9 04 00 00   ....¹...

0010: 41 52 73 74 00 00 00 00   ARst....

0018: 00 00 00 00 00 00 00 00   ........

0020: 00 00 00 00 00 00 00 00   ........

In the Application log: when the instance failed over and came back up, SQL Server (by design) ran recovery on all databases and detected an error in the data file of RACE.

Index corruption can sometimes be fixed by dropping and re-creating the index, but errors such as 824 are caused by disk issues and cannot be fixed that way. See below:

Event Type:        Error

Event Source:    MSSQLSERVER

Event Category:                (2)

Event ID:              824 (this was my concern even before I got my hands on the logs)

Date:                     1/29/2009

Time:                     7:58:14 AM

User:                     S-1-5-21-3255160965-171093024-5714264-73844

Computer:          SQL1

Description:

SQL Server detected a logical consistency-based I/O error: torn page (expected signature: 0x55555555; actual signature: 0x155555). It occurred during a read of page (1:2206845) in database ID 14 at offset 0x000004358fa000 in file ‘D:\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\RACE.mdf’.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Data:

0000: 38 03 00 00 18 00 00 00   8.......

0008: 09 00 00 00 56 00 42 00   ........

0010: 50 00 43 00 53 00 51 00   ...S.Q.

0018: 4c 00 31 00 00 00 08 00   L.1.....

0020: 00 00 57 00 52 00 47 00   ........

0028: 52 00 41 00 43 00 45 00   R.A.C.E.
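The torn-page message compares the signature SQL Server expected with the one it actually found. With torn page detection, SQL Server saves a 2-bit pattern for each 512-byte sector of the 8 KB page in the page header, so the 32-bit signature holds sixteen 2-bit fields. The Python sketch below XORs the two signatures from this event and lists the fields that differ. It assumes field i (bits 2i and 2i+1) belongs to sector i; that bit-to-sector ordering is an illustrative assumption, not documented behaviour.

```python
SECTORS_PER_PAGE = 16  # an 8 KB page holds sixteen 512-byte sectors


def mismatched_sector_fields(expected: int, actual: int) -> list:
    """Return the indices of 2-bit fields that differ between two torn-page signatures.

    Assumption (illustrative only): field i occupies bits 2*i and 2*i + 1
    and corresponds to sector i of the page.
    """
    diff = (expected ^ actual) & 0xFFFFFFFF
    return [i for i in range(SECTORS_PER_PAGE) if (diff >> (2 * i)) & 0b11]


# Signatures copied from the 824 event above.
bad = mismatched_sector_fields(0x55555555, 0x155555)
print(f"{len(bad)} of {SECTORS_PER_PAGE} sector fields differ: {bad}")
# prints: 5 of 16 sector fields differ: [11, 12, 13, 14, 15]
```

Under that assumption, five of the sixteen sectors still carry an old pattern, which would fit a page write that was cut short, for example around the bus resets logged at 7:23 AM. The event log alone cannot confirm this.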

At that point we noticed the corruption, since RACE must have gone into suspect mode, and we started working on a fix.

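DBCC CHECKDB with REPAIR_REBUILD was run first, by RACE\bpadmin, in the hope of salvaging the data in the bad pages (SQL Server reported 240 errors). REPAIR_REBUILD only performs repairs that carry no risk of data loss, and as the log shows, it repaired 0 of the 240 errors.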

2009-01-29 10:04:55.87 spid54      Setting database option SINGLE_USER to ON for database RACE.

2009-01-29 10:07:36.81 spid54      DBCC CHECKDB (RACE, repair_rebuild) executed by RACE\bpadmin found 240 errors and repaired 0 errors. Elapsed time: 0 hours 2 minutes 40 seconds.

2009-01-29 10:07:36.81 spid54      Using ‘dbghelp.dll’ version ‘4.0.5’

2009-01-29 10:07:36.84 spid54      **Dump thread - spid = 54, PSS = 0x00000001543A5BD0, EC = 0x00000001543A5BE0

2009-01-29 10:07:36.85 spid54      ***Stack Dump being sent to D:\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0005.txt

2009-01-29 10:07:36.85 spid54      * *******************************************************************************

2009-01-29 10:07:36.85 spid54      *

2009-01-29 10:07:36.85 spid54      * BEGIN STACK DUMP:

2009-01-29 10:07:36.85 spid54      *   01/29/09 10:07:36 spid 54

2009-01-29 10:07:36.85 spid54      *

2009-01-29 10:07:36.85 spid54      * DBCC database corruption

2009-01-29 10:07:36.85 spid54      *

2009-01-29 10:07:36.85 spid54      * Input Buffer 346 bytes -

2009-01-29 10:07:36.85 spid54      *             PRINT 'check and repair Object database'  exec sp_dboption '

2009-01-29 10:07:36.85 spid54      *  RACE',single,true  DBCC CHECKDB ('RACE',REPAIR_REBUILD)  exec sp_db

2009-01-29 10:07:36.85 spid54      *  option 'RACE',single,false

2009-01-29 10:07:36.85 spid54      * 

2009-01-29 10:07:36.85 spid54      * *******************************************************************************

2009-01-29 10:07:36.85 spid54      * -------------------------------------------------------------------------------

2009-01-29 10:07:36.85 spid54      * Short Stack Dump

2009-01-29 10:07:36.95 spid54      Stack Signature for the dump is 0x0000000000000081

2009-01-29 10:07:39.07 spid54      External dump process return code 0x20000001.

External dump process returned no errors.

2009-01-29 10:07:39.09 spid54      Setting database option MULTI_USER to ON for database RACE.

2009-01-29 10:12:06.25 Server

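REPAIR_REBUILD had repaired none of the 240 errors, so the corrupt pages were still in place when the database was set back to MULTI_USER. The next entries, from 10:48, show RACE starting up and running recovery; the “marked RESTORING” message and the matching timestamp of the restore message below suggest they are part of the restore of the 23 January backup.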

2009-01-29 10:48:45.38 spid54      Starting up database ‘RACE’.

2009-01-29 10:48:45.40 spid54      The database ‘RACE’ is marked RESTORING and is in a state that does not allow recovery to be run.

2009-01-29 10:48:52.23 spid54      Starting up database ‘RACE’.

2009-01-29 10:48:54.17 spid54      Recovery is writing a checkpoint in database ‘RACE’ (14). This is an informational message only. No user action is required.

2009-01-29 10:48:55.28 spid54      CHECKDB for database ‘RACE’ finished without errors on 2008-03-09 00:01:52.190 (local time). This is an informational message only; no user action is required.

Then, for some reason, a backup of RACE taken on 23-01-2009 was restored and DBCC CHECKDB was run against it:

Even the backup taken on the 23rd was corrupt: a plain DBCC CHECKDB reported 17 errors, and the REPAIR_REBUILD run that followed reported 26.

Why? The database and the backups were on the same disk partition, which is a very dangerous choice from a disaster-planning standpoint.
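A simple guard against this mistake is to compare where data files and backup files live before a backup job is trusted. The Python sketch below uses the standard library's `ntpath` module to compare drive letters, using the data file path from the 824 event above and the backup path from the restore message that follows. Comparing drive letters is a deliberately simple assumption: two different letters can still sit on the same physical array, and mounted volumes can place folders under one letter on different volumes, so a real check also needs storage-level information.

```python
import ntpath  # Windows path rules, available on any platform


def same_drive(data_file: str, backup_file: str) -> bool:
    """True if both Windows paths start with the same drive letter (case-insensitive).

    This only compares drive letters; it cannot see whether two letters share
    one physical array, or whether a mounted volume sits under a folder.
    """
    data_drive, _ = ntpath.splitdrive(data_file)
    backup_drive, _ = ntpath.splitdrive(backup_file)
    return data_drive.upper() == backup_drive.upper()


data = r"D:\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\RACE.mdf"
backup = r"D:\Microsoft SQL Server\MSSQL.1\MSSQL\Backup\RACE\RACE_backup_200901230030.bak"

if same_drive(data, backup):
    print("WARNING: data file and backup are both on", ntpath.splitdrive(data)[0])
```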

 2009-01-29 10:48:55.28 Backup      Database was restored: Database: RACE, creation date(time): 2008/03/04(18:04:09), first LSN: 153235:20836:1, last LSN: 153235:25009:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {‘D:\Microsoft SQL Server\MSSQL.1\MSSQL\Backup\RACE\RACE_backup_200901230030.bak’}). Informational message. No user action required.

2009-01-29 10:53:25.61 spid52      DBCC CHECKDB (RACE) executed by RACE\bpadmin found 17 errors and repaired 0 errors. Elapsed time: 0 hours 3 minutes 29 seconds.

2009-01-29 10:53:25.61 spid52      Using ‘dbghelp.dll’ version ‘4.0.5’

2009-01-29 10:53:25.61 spid52      **Dump thread - spid = 52, PSS = 0x00000000822AFBF0, EC = 0x00000000822AFC00

2009-01-29 10:53:25.61 spid52      ***Stack Dump being sent to D:\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0006.txt

2009-01-29 10:53:25.61 spid52      * *******************************************************************************

2009-01-29 10:53:25.61 spid52      *

2009-01-29 10:53:25.61 spid52      * BEGIN STACK DUMP:

2009-01-29 10:53:25.61 spid52      *   01/29/09 10:53:25 spid 52

2009-01-29 10:53:25.61 spid52      *

2009-01-29 10:53:25.61 spid52      * DBCC database corruption

2009-01-29 10:53:25.61 spid52      *

2009-01-29 10:53:25.61 spid52      * Input Buffer 70 bytes -

2009-01-29 10:53:25.61 spid52      *             dbcc checkdb ('RACE')

2009-01-29 10:53:25.61 spid52      * 

2009-01-29 10:53:25.61 spid52      * *******************************************************************************

2009-01-29 10:53:25.61 spid52      * -------------------------------------------------------------------------------

2009-01-29 10:53:25.61 spid52      * Short Stack Dump

2009-01-29 10:53:25.63 spid52      Stack Signature for the dump is 0x0000000000000081

2009-01-29 10:53:27.05 spid52      External dump process return code 0x20000001.

External dump process returned no errors.

2009-01-29 10:58:40.38 spid52      Setting database option SINGLE_USER to ON for database WRGRACE.

2009-01-29 11:01:22.95 spid52      DBCC CHECKDB (RACE, repair_rebuild) executed by RACE\bpadmin found 26 errors and repaired 0 errors. Elapsed time: 0 hours 2 minutes 42 seconds.

2009-01-29 11:01:22.95 spid52      Using ‘dbghelp.dll’ version ‘4.0.5’

2009-01-29 11:01:22.95 spid52      **Dump thread - spid = 52, PSS = 0x00000000822AFBF0, EC = 0x00000000822AFC00

2009-01-29 11:01:22.95 spid52      ***Stack Dump being sent to D:\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0007.txt

2009-01-29 11:01:22.95 spid52      * *******************************************************************************

2009-01-29 11:01:22.95 spid52      *

2009-01-29 11:01:22.95 spid52      * BEGIN STACK DUMP:

2009-01-29 11:01:22.95 spid52      *   01/29/09 11:01:22 spid 52

2009-01-29 11:01:22.95 spid52      *

2009-01-29 11:01:22.95 spid52      * DBCC database corruption

2009-01-29 11:01:22.95 spid52      *

2009-01-29 11:01:22.95 spid52      * Input Buffer 350 bytes -

2009-01-29 11:01:22.97 spid52      *             PRINT 'check and repair Object database'  exec sp_dboption 'W

2009-01-29 11:01:22.97 spid52      *  RGRACE',single,true  DBCC CHECKDB ('RACE',REPAIR_REBUILD)  exec sp_db

2009-01-29 11:01:22.97 spid52      *  option 'RACE',single,false

2009-01-29 11:01:22.97 spid52      * 

2009-01-29 11:01:22.97 spid52      * *******************************************************************************

2009-01-29 11:01:22.97 spid52      * -------------------------------------------------------------------------------

2009-01-29 11:01:22.97 spid52      * Short Stack Dump

2009-01-29 11:01:22.99 spid52      Stack Signature for the dump is 0x0000000000000081

2009-01-29 11:01:23.52 spid52      External dump process return code 0x20000001.

External dump process returned no errors.

After REPAIR_REBUILD was run (again repairing 0 of the 26 errors) and the database was set back to MULTI_USER mode, RACE came online, but we still got a few I/O error messages.

(Remember that errors like 824 are usually caused by a disk failure or bad sectors on the disk holding the database's data files and backup files.)

2009-01-29 11:01:23.55 spid52      Setting database option MULTI_USER to ON for database RACE.

2009-01-29 11:38:35.94 spid58      Error: 824, Severity: 24, State: 2.

2009-01-29 11:38:35.94 spid58      SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:501453; actual 1:501439). It occurred during a read of page (1:501453) in database ID 14 at offset 0x000000f4d9a000 in file ‘D:\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\RACE.mdf’.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

2009-01-29 12:35:29.48 spid55      Error: 824, Severity: 24, State: 2.

2009-01-29 12:35:29.48 spid55      SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:501453; actual 1:501439). It occurred during a read of page (1:501453) in database ID 14 at offset 0x000000f4d9a000 in file ‘D:\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\RACE.mdf’.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
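Each 824 message pinpoints the damaged page, and the reported file offset is simply the page number multiplied by the 8 KB (8,192-byte) page size: 501453 × 8192 = 4,107,902,976 = 0xf4d9a000, and for the earlier torn page, 2206845 × 8192 = 0x4358fa000. The Python sketch below extracts those fields from the message text and checks that relationship. The regular expression is a hypothetical helper written against the messages quoted in this post, not a documented format, and may need adjusting for other versions or languages.

```python
import re

PAGE_SIZE = 8192  # SQL Server data pages are 8 KB

# Hypothetical pattern matching the 824 message text quoted in this post.
MSG_824 = re.compile(
    r"I/O error: (?P<kind>[a-z ]+?) \(.*?\)\. It occurred during a (?P<op>\w+) of page "
    r"\((?P<file_id>\d+):(?P<page_id>\d+)\) in database ID (?P<db_id>\d+) "
    r"at offset (?P<offset>0x[0-9a-fA-F]+)"
)


def parse_824(message: str) -> dict:
    """Pull page, database and offset details out of an 824 message."""
    m = MSG_824.search(message)
    if m is None:
        raise ValueError("not an 824 message in the expected format")
    page_id = int(m["page_id"])
    offset = int(m["offset"], 16)  # int() accepts the 0x prefix with base 16
    return {
        "kind": m["kind"],
        "operation": m["op"],
        "file_id": int(m["file_id"]),
        "page_id": page_id,
        "db_id": int(m["db_id"]),
        "offset": offset,
        # A page's byte offset in its file is its page number times the page size.
        "offset_matches_page": offset == page_id * PAGE_SIZE,
    }


msg = ("SQL Server detected a logical consistency-based I/O error: incorrect pageid "
       "(expected 1:501453; actual 1:501439). It occurred during a read of page (1:501453) "
       "in database ID 14 at offset 0x000000f4d9a000 in file "
       "'D:\\Microsoft SQL Server\\MSSQL.1\\MSSQL\\DATA\\RACE.mdf'.")
print(parse_824(msg))
```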

With the backup also corrupt, DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS was the only option left for fixing this error 824.

2009-01-29 15:35:07.99 spid51      Setting database option SINGLE_USER to ON for database RACE.

2009-01-29 15:47:30.52 spid51      DBCC CHECKDB (RACE, repair_allow_data_loss) executed by RACE\bpadmin found 26 errors and repaired 26 errors. Elapsed time: 0 hours 12 minutes 22 seconds.

2009-01-29 15:47:30.52 spid51      Using ‘dbghelp.dll’ version ‘4.0.5’

2009-01-29 15:47:30.52 spid51      **Dump thread - spid = 51, PSS = 0x00000000D7575BE0, EC = 0x00000000D7575BF0

2009-01-29 15:47:30.52 spid51      ***Stack Dump being sent to D:\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0008.txt

2009-01-29 15:47:30.52 spid51      * *******************************************************************************

2009-01-29 15:47:30.52 spid51      *

2009-01-29 15:47:30.52 spid51      * BEGIN STACK DUMP:

2009-01-29 15:47:30.52 spid51      *   01/29/09 15:47:30 spid 51

2009-01-29 15:47:30.52 spid51      *

2009-01-29 15:47:30.52 spid51      * DBCC database corruption

2009-01-29 15:47:30.52 spid51      *

2009-01-29 15:47:30.52 spid51      * Input Buffer 366 bytes -

2009-01-29 15:47:30.52 spid51      *             PRINT 'check and repair Object database'  exec sp_dboption '

2009-01-29 15:47:30.52 spid51      *  RACE',single,true  DBCC CHECKDB ('RACE',REPAIR_ALLOW_DATA_LOSS)  ex

2009-01-29 15:47:30.52 spid51      *  ec sp_dboption 'RACE',single,false

2009-01-29 15:47:30.52 spid51      * 

2009-01-29 15:47:30.52 spid51      * *******************************************************************************

2009-01-29 15:47:30.52 spid51      * -------------------------------------------------------------------------------

2009-01-29 15:47:30.52 spid51      * Short Stack Dump

2009-01-29 15:47:30.53 spid51      Stack Signature for the dump is 0x0000000000000081

2009-01-29 15:47:35.56 spid51      External dump process return code 0x20000001.

External dump process returned no errors.

2009-01-29 15:47:35.58 spid51      Setting database option MULTI_USER to ON for database RACE.

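All 26 errors were fixed and RACE came online. Why did DBCC CHECKDB fail to fix the 26 errors the first time but succeed the next time?

Part of the answer is the repair level: the first run used REPAIR_REBUILD, which only performs repairs that carry no risk of data loss, while the second used REPAIR_ALLOW_DATA_LOSS, which is allowed to deallocate corrupt pages and lose the data on them. Beyond that, it is very internal to SQL Server itself. In several corruption cases I have worked on, running DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS several times gradually brought the number of errors down until it finally reported 0 errors.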

For example, the final DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS run ends with output like this:

CHECKDB found 0 allocation errors and 0 consistency errors in database ‘RACE’.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.
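When repair is run repeatedly, it helps to track the summary line each DBCC CHECKDB run writes to the error log, to see whether the error count is actually going down. The Python sketch below pulls the database, repair option, errors found and errors repaired out of the four summary lines quoted in this post (abridged after the counts). The pattern is an assumption based on those lines only.

```python
import re

# Pattern for the CHECKDB summary lines quoted in this post (an assumption,
# not a documented format).
SUMMARY = re.compile(
    r"DBCC CHECKDB \((?P<db>\w+)(?:, (?P<option>\w+))?\) executed by \S+ "
    r"found (?P<found>\d+) errors and repaired (?P<repaired>\d+) errors"
)


def checkdb_runs(log_lines):
    """Yield (database, repair option or None, errors found, errors repaired)."""
    for line in log_lines:
        m = SUMMARY.search(line)
        if m:
            yield m["db"], m["option"], int(m["found"]), int(m["repaired"])


log = [
    r"2009-01-29 10:07:36.81 spid54 DBCC CHECKDB (RACE, repair_rebuild) executed by RACE\bpadmin found 240 errors and repaired 0 errors.",
    r"2009-01-29 10:53:25.61 spid52 DBCC CHECKDB (RACE) executed by RACE\bpadmin found 17 errors and repaired 0 errors.",
    r"2009-01-29 11:01:22.95 spid52 DBCC CHECKDB (RACE, repair_rebuild) executed by RACE\bpadmin found 26 errors and repaired 0 errors.",
    r"2009-01-29 15:47:30.52 spid51 DBCC CHECKDB (RACE, repair_allow_data_loss) executed by RACE\bpadmin found 26 errors and repaired 26 errors.",
]

for db, option, found, repaired in checkdb_runs(log):
    print(f"{db} {option or 'check only':24} found={found:4} repaired={repaired}")
```

Read side by side, the lines make the repair-level difference visible: both REPAIR_REBUILD runs repaired nothing, and only the REPAIR_ALLOW_DATA_LOSS run brought the count to zero.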

This is how the RACE corruption was fixed.

To this day we are using the database that was restored from the 23 January 2009 backup.

 DISK ISSUE HISTORY:

================

The first disk issue occurred on 2/9/2008, but the event log shows that the HP disks and the RAID had always behaved somewhat inconsistently, though not very destructively.

Event Type:        Error

Event Source:    Disk

Event Category:                None

Event ID:              15

Date:                     2/9/2008

Time:                     6:26:09 AM

User:                     N/A

Computer:          MAC97

Description:

The device, \Device\Harddisk3\DR4, is not ready for access yet.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Data:

0000: 0e 00 80 00 01 00 d2 00   ..?...Ò.

0008: 00 00 00 00 0f 00 04 c0   .......À

0010: 04 01 00 00 9d 00 00 c0   ......À

0018: 00 00 00 00 10 48 2d 00   .....H-.

0020: 00 00 00 00 00 00 00 00   ........

0028: 12 08 bc 00 00 00 00 00   ..¼.....

0030: ff ff ff ff 00 00 00 00   ÿÿÿÿ....

0038: 58 00 00 0a 00 00 00 00   X.......

0040: 00 20 06 12 0a 01 00 00   . ......

0048: 00 00 00 00 3c 00 00 00   ....<...

0050: 00 00 00 00 00 00 00 00   ........

0058: f0 41 1c 38 df fa ff ff   ðA.8ßúÿÿ

0060: 00 00 00 00 00 00 00 00   ........

0068: 70 59 79 34 df fa ff ff   pYy4ßúÿÿ

0070: 00 00 00 00 00 00 00 00   ........

0078: 00 00 00 00 00 00 00 00   ........

0080: 16 00 00 00 00 00 00 00   ........

0088: 00 00 00 00 00 00 00 00   ........

0090: 00 00 00 00 00 00 00 00   ........

0098: 00 00 00 00 00 00 00 00   ........

00a0: 00 00 00 00 00 00 00 00   ........

Event Type:        Warning

Event Source:    Ftdisk

Event Category:                Disk

Event ID:              57

Date:                     2/9/2008

Time:                     6:26:09 AM

User:                     N/A

Computer:          MAC97

Description:

The system failed to flush data to the transaction log. Corruption may occur.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Data:

0000: 00 00 00 00 01 00 16 01   ........

0008: 02 00 00 00 39 00 04 80   ....9..?

0010: 00 00 00 00 10 00 00 80   .......?

0018: 00 00 00 00 00 00 00 00   ........

0020: 00 00 00 00 00 00 00 00   ........

I am not sure what the following warning means:

Event Type:        Warning

Event Source:    l2nd

Event Category:                None

Event ID:              4

Date:                     2/5/2008

Time:                     10:03:43 PM

User:                     N/A

Computer:          MAC97

Description:

The description for Event ID ( 4 ) in Source ( l2nd ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: \Device\{69EB1231-4535-46F4-9DFF-E61219067B6A, HP NC371i.

Data:

0000: 00 00 00 00 02 00 4e 00   ......N.

0008: 00 00 00 00 04 00 05 80   .......?

0010: 00 00 00 00 00 00 00 00   ........

0018: 00 00 00 00 00 00 00 00   ........

0020: 00 00 00 00 00 00 00 00   ........

Event Type:        Warning

Event Source:    mpio

Event Category:                None

Event ID:              20

Date:                     2/6/2008

Time:                     3:46:05 AM

User:                     N/A

Computer:          MAC97

Description:

The description for Event ID ( 20 ) in Source ( mpio ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: \Device\MPIODisk0, HP MPIO DSM for XP family of Disk Arrays.

Data:

0000: 00 00 00 00 02 00 4e 00   ......N.

0008: 00 00 00 00 14 00 08 80   .......?

0010: 1e 00 00 00 85 01 00 c0   ....?..À

0018: 00 00 00 00 00 00 00 00   ........

0020: 00 00 00 00 00 00 00 00   ........

Event Type:        Warning

Event Source:    Ntfs

Event Category:                None

Event ID:              50

Date:                     2/9/2008

Time:                     6:26:09 AM

User:                     N/A

Computer:          MAC97

Description:

{Delayed Write Failed} Windows was unable to save all the data for the file . The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Data:

0000: 04 00 04 00 02 00 52 00   ......R.

0008: 00 00 00 00 32 00 04 80   ....2..?

0010: 00 00 00 00 10 00 00 80   .......?

0018: 00 00 00 00 00 00 00 00   ........

0020: 00 00 00 00 00 00 00 00   ........

0028: 10 00 00 80               ...?

Event Type:        Warning

Event Source:    Removable Storage Service

Event Category:                None

Event ID:              93

Date:                     2/14/2008

Time:                     12:17:55 AM

User:                     N/A

Computer:          MAC97

Description:

Neither copy of the RSM database is consistent: Reconstructing using the main datafile.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
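Storage events like these are worth catching before they turn destructive. The Python sketch below flags events by (source, event ID) pair, using only pairs that appear in this post; the two warnings I could not explain (l2nd and mpio) are deliberately left out. It assumes the events have already been exported as (timestamp, source, event ID) tuples; how you export them is outside the sketch.

```python
from datetime import datetime

# (event source, event ID) pairs that appear in this post, with a short note.
STORAGE_EVENTS = {
    ("ql2300", 118): "driver performed a bus reset",
    ("ClusDisk", 1209): "cluster disk error",
    ("Disk", 15): "device not ready for access",
    ("Ftdisk", 57): "failed to flush data to the transaction log",
    ("Ntfs", 50): "delayed write failed",
}


def storage_warnings(events):
    """Return (timestamp, source, event_id, note) for known storage events, oldest first."""
    hits = [
        (ts, source, event_id, STORAGE_EVENTS[(source, event_id)])
        for ts, source, event_id in events
        if (source, event_id) in STORAGE_EVENTS
    ]
    return sorted(hits)


# Events from this post as (timestamp, source, event ID) tuples.
events = [
    (datetime(2008, 2, 9, 6, 26, 9), "Disk", 15),
    (datetime(2008, 2, 9, 6, 26, 9), "Ftdisk", 57),
    (datetime(2008, 2, 9, 6, 26, 9), "Ntfs", 50),
    (datetime(2008, 2, 14, 0, 17, 55), "Removable Storage Service", 93),
    (datetime(2009, 1, 29, 7, 23, 36), "ql2300", 118),
    (datetime(2009, 1, 29, 7, 23, 36), "ql2300", 118),
    (datetime(2009, 1, 29, 7, 23, 36), "ClusDisk", 1209),
]

for ts, source, event_id, note in storage_warnings(events):
    print(f"{ts:%Y-%m-%d %H:%M:%S}  {source:<8} {event_id:>5}  {note}")
```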

 About 824:

========

 SQL Server detected a logical consistency-based I/O error: %ls. It occurred during a %S_MSG of page %S_PGID in database ID %d at offset %#016I64x in file ‘%ls’.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

 The %ls placeholder names the specific check that failed: a bad checksum, a torn page, or another problem such as a bad page ID (the “incorrect pageid” errors above).

 824 and 823 Errors

 In SQL Server 2000, 823 errors were used primarily to indicate problems reading and writing data to the database data files. The error conditions covered both operating system problems and consistency problems detected by SQL Server:

  1. Operating system errors were returned when the API call used to make the I/O request (read or write) failed and returned a Windows OS error code. That error code appeared along with the reported 823 error message.
  2. Alternatively, the I/O request itself appeared to succeed, but the checks SQL Server performs in its I/O completion routines detected a problem. Examples were the page ID check and the torn page check.

Because both OS-detected and SQL Server-detected errors were reported with a single error message, it confused both CSS (Microsoft Customer Service and Support) and customers trying to get to the root cause and identify the source of the problem.

SQL Server 2005 adds further error detection in the I/O completion routines. To resolve the ambiguity described above and to accommodate the new checks, SQL Server 2005 introduced a new error message.

When an operating system error occurs during the I/O request, it is reported as error 823. When SQL Server itself detects a consistency error, it is reported as error 824.
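The 823/824 split makes a first triage step mechanical: the error number alone says whether Windows reported the I/O failure or SQL Server's own checks caught it. The Python sketch below reads the "Error: N, Severity: S, State: T." line from the error log and labels it accordingly; the line format is taken from the log entries in this post, and the labels paraphrase the explanation above.

```python
import re

# "Error: N, Severity: S, State: T." as it appears in the SQL Server error log above.
ERROR_LINE = re.compile(r"Error: (?P<number>\d+), Severity: (?P<severity>\d+), State: (?P<state>\d+)\.")

# Labels paraphrasing the 823/824 split introduced in SQL Server 2005.
IO_ERROR_SOURCE = {
    823: "Windows reported the I/O failure (OS error)",
    824: "SQL Server's own checks found the page inconsistent",
}


def classify_io_error(line):
    """Return (error number, severity, label) for an error line, or None if there is none."""
    m = ERROR_LINE.search(line)
    if m is None:
        return None
    number = int(m["number"])
    return number, int(m["severity"]), IO_ERROR_SOURCE.get(number, "not an 823/824 I/O error")


print(classify_io_error("2009-01-29 11:38:35.94 spid58      Error: 824, Severity: 24, State: 2."))
```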

How to avoid Database Corruption like this:

1.) Stick to the configurations Microsoft has prescribed:

Support Article:

http://support.microsoft.com/?id=309395

Check your cluster configuration:

http://www.windowsservercatalog.com/

Cluster Do’s and Don’ts

http://support.microsoft.com/kb/254321

NOTE:

If you do not implement a server cluster that is listed as a cluster solution or geographically dispersed cluster solution, you do not have a supported solution in the eyes of Microsoft. It is imperative that you work with your hardware vendor to ensure that the hardware it is selling you matches an entry in the Windows Server Catalog. If the company cannot provide a link to the solution, do not buy it. For reference, keep the following Microsoft Knowledge Base articles handy: 327518, The Microsoft SQL Server Support Policy for Microsoft Clustering (http://support.microsoft.com/kb/327518), and 309395, The Microsoft Support Policy for Server Clusters, the Hardware Compatibility List, and the Windows Server Catalog (http://support.microsoft.com/kb/309395/).

2.) Our guru talks about corruption detection and prevention:

http://www.microsoft.com/emea/teched2008/itpro/tv/default.aspx?vid=78

http://www.sqlskills.com/BLOGS/PAUL/post/CHECKDB-From-Every-Angle-Consistency-Checking-Options-for-a-VLDB.aspx

Crucial Database Maintenance Techniques for Databases of All Sizes

http://sqlblogcasts.com/blogs/tonyrogerson/archive/2007/02/11/6th-march-kimberly-tripp-and-paul-randal-crucial-database-maintenance-techniques-for-databases-of-all-sizes.aspx

3.) Ensure your I/O subsystem is stable and performing properly. As far as I know, corruption is, in the vast majority of cases, a hardware problem.

References (not closely related to corruption):

• Books Online (May 2007):

http://www.microsoft.com/downloads/details.aspx?FamilyId=BE6A2C5D-00DF-4220-B133-29C1E0B6585F&displaylang=en

• 915846 Best practices that you can use to set up domain groups and solutions to problems that may occur when you set up a domain group when you install a SQL Server 2005 failover cluster

http://support.microsoft.com/default.aspx?scid=kb;EN-US;915846

• 819546 SQL Server 2000 and SQL Server 2005 support for mounted volumes

http://support.microsoft.com/default.aspx?scid=kb;EN-US;819546

• 913815 Error message when you install a SQL Server 2005 failover cluster on a node: “The drive specified cannot be used for program location”

http://support.microsoft.com/default.aspx?scid=kb;EN-US;913815

• 922670 How to use the Add or Remove Programs item in Control Panel to add or remove components for stand-alone installations and clustered installations of SQL Server 2005

http://support.microsoft.com/default.aspx?scid=kb;EN-US;922670

• 910230 How to install SQL Server 2005 Analysis Services on a failover cluster

http://support.microsoft.com/default.aspx?scid=kb;EN-US;910230

• 910233 Migrate a SQL Server 2000 Analysis Services cluster to a SQL Server 2005 Analysis Services cluster

http://support.microsoft.com/default.aspx?scid=kb;EN-US;910233

• 912397 The SQL Server service cannot start when you change a startup parameter for a clustered instance of SQL Server 2000 or of SQL Server 2005 to a value that is not valid

http://support.microsoft.com/default.aspx?scid=kb;EN-US;912397

• 910851 You receive error messages when you try to set up a clustered instance of SQL Server 2005

http://support.microsoft.com/default.aspx?scid=kb;EN-US;910851

• 926621 Error message when you try to install SQL Server 2005 in a cluster environment: “SQL Server Setup could not validate the service accounts”

http://support.microsoft.com/default.aspx?scid=kb;EN-US;926621

• 327518 The Microsoft SQL Server support policy for Microsoft Clustering

http://support.microsoft.com/default.aspx?scid=kb;EN-US;327518

• 254321 Clustered SQL Server do’s, don’ts, and basic warnings

http://support.microsoft.com/default.aspx?scid=kb;EN-US;254321

• 942176 Description of the SQL Server Integration Services (SSIS) service and of alternatives to clustering the SSIS service

http://support.microsoft.com/default.aspx?scid=kb;EN-US;942176

• 922209 The SQL Server 2005 Setup program does not remove all IP address cluster resources when you uninstall SQL Server 2005

http://support.microsoft.com/default.aspx?scid=kb;EN-US;922209

• 295732 How to create databases or change disk file locations on a shared cluster drive on which SQL Server was not originally installed

http://support.microsoft.com/default.aspx?scid=kb;EN-US;295732

• 263712 How to impede Windows NT administrators from administering a clustered instance of SQL Server

http://support.microsoft.com/default.aspx?scid=kb;EN-US;263712

• 932881 How to make unwanted access to SQL Server 2005 by an operating system administrator more difficult

http://support.microsoft.com/default.aspx?scid=kb;EN-US;932881

• 934749 BUG: Error message when you try to install SQL Server 2005 Service Pack 1 or SQL Server 2005 Service Pack 2 from the existing active node: “The product instance <InstanceName> been patched with more recent updates”

http://support.microsoft.com/default.aspx?scid=kb;EN-US;934749

• 283811 How to change the SQL Server or SQL Server Agent service account without using SQL Enterprise Manager in SQL Server 2000 or SQL Server Configuration Manager in SQL Server 2005

http://support.microsoft.com/default.aspx?scid=kb;EN-US;283811

• 910070 FIX: The SQL Server 2005 Setup program may take a very long time to be completed

http://support.microsoft.com/default.aspx?scid=kb;EN-US;910070

• 909967 How to uninstall an instance of SQL Server 2005 manually

http://support.microsoft.com/default.aspx?scid=kb;EN-US;909967

Windows 2003 SP2 is better than SP1 because:

• 918483 How to reduce paging of buffer pool memory in the 64-bit version of SQL Server 2005

http://support.microsoft.com/default.aspx?scid=kb;EN-US;918483

• 922658 SQL Server 2000 or SQL Server 2005 may temporarily stop responding on a Windows Server 2003 Service Pack 1-based computer

http://support.microsoft.com/default.aspx?scid=kb;EN-US;922658

• 904160 Network performance is slower than expected in Windows Server 2003 SP1

http://support.microsoft.com/?id=904160

Cluster specific:

• 923830 Recommended hot fixes for Windows Server 2003 Service Pack 1- based server clusters

http://support.microsoft.com/default.aspx?scid=kb;EN-US;923830
