SQL storeng errors or SQL tempdb 'bad page id' errors logged while installing Solutions
search cancel

SQL storeng errors or SQL tempdb 'bad page id' errors logged while installing Solutions

book

Article ID: 176653

calendar_today

Updated On:

Products

IT Management Suite Asset Management Solution

Issue/Introduction

While installing Notification Server or solutions, I receive SQL storeng errors or SQL tempdb 'bad page id' errors:

The errors are not always the same, but are similar to the following:

Process: AeXConfig.exe (3340)
Thread ID: 2540
Module: AltirisNativeHelper.dll
Source: SecurityHierarchyManager
Description: Unable to set the security descriptor for the specified entity (Guid:<some guid here>). Inner:
Location:     p:\sql\ntdbms\storeng\drs\include\record.inl:1447
Expression:     m_SizeRec > 0 && m_SizeRec <= MAXDATAROW
SPID:         55
Process ID:     1984.

Process: AeXConfig.exe (3392)
Thread ID: 3160
Module: AltirisNativeHelper.dll
Source: SecurityHierarchyManager
Description: Unable to set the security descriptor for the specified entity (Guid:<some guid here>). Inner:I/O error (bad page ID) detected during read at offset 0x00000001cea000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL\data\tempdb.mdf'

Unable to set the security descriptor for the specified entity (Guid:59e3f939-0ace-4fd5-b680-02e3bd30c68b). Inner:A possible database consistency problem has been detected on database 'Altiris'.  DBCC CHECKDB and DBCC CHECKCATALOG should be run on database 'Altiris'.

Cause

Testing appears to show that the issue only occurs on virtual machines using the LSI Logic SCSI controller included in VMware products, with the Altiris database hosted on SQL Server 2000 or SQL Server 2005 within the Virtual machine. I/O load testing indicates this problem would also occur on the older Bus Logic SCSI virtual drives.

The likelihood of the problem occurring appears to be directly related to lower disk I/O capacity of the physical storage device. For example, laptop 5,200 RPM drives are much slower than server class RAIDed drives and thus have a much higher failure rate. It's also worth noting that the actual data is never corrupted, but intensive SQL operations appear to expose a flaw in the virtual drive's write caching.

The problem appears to be the same as discussed in VMware's forum:
http://www.vmware.com/community/thread.jspa?threadID=58870&start=30&tstart=0

Microsoft has released a tool that simulates SQL disk I/O activity. It can be used to help determine if hardware (virtual or physical) is capable of handling the I/O load with write cache enabled. It can be obtained here:
http://download.microsoft.com/download/3/8/0/3804cb1c-a911-4d12-8525-e5780197e0b5/sqliosimx86.exe

After extracting the utility, use the default configuration file, change the number of intervals to 25, and start the simulation.  Virtual machines observed with the problem reported an error similar to the following:  "Error 0x80070467", "While accessing the hard disk, a disk operation failed even after retries", "Buffer validation failed on c:\sqliosim.mdx Page: 47806, offset 0x8"

Resolution

Workarounds:

  • Use a supported VMware product, such as ESX or ESXi
  • Use a virtual IDE disk drive, rather than a SCSI disk drive, and SQL Server needs to be within the virtual machine. Ensure that the SQL user and temp databases reside on the virtual IDE disk.
  • VMWare Workstation 6.0.1 and higher are less affected by this issue, but still not 100% reliable with the LSI Logic SCSI controller.
  • Use a SQL Server instance off-computer from the virtual machine to host the Altiris database (such as the virtual machine's host operating system).

Applies To

  1. Virtual machine created in VMware Workstation* 5.0, 5.5, or 6.x or ESX 3.5
  2. SQL 2005 within a virtual machine.

Attachments