Performance and stability issues with SWG VA and ProxySG VA due to disk errors on ESX server

Article ID: 166775

Products

Asset Management Solution SWG VA-100

Issue/Introduction

When running a Secure Web Gateway Virtual Appliance (SWG VA) or a ProxySG Virtual Appliance MACH5 Edition (ProxySG VA), the host ESX server suffers performance deterioration and connectivity issues.

Resolution

Note: The information and steps in this resolution apply to both SWG VA and ProxySG VA; the term "VA" refers to both.

This issue can be caused by a bad disk on the ESX server. If there is a bad disk, the VA hosted on the ESX server interprets the issue as the disk going offline. The VA then attempts to select a new master disk, but if no other disks are initialized and valid, it generates a diagnostics file and reboots. Upon reboot, the VA has unrecoverable read errors.

If a disk on the ESX server has errors, you must re-create the VA on a disk that has no errors; refer to the Initial Configuration Guide for your VA for installation and setup steps. To determine if the disk is causing this issue, see the following section, "Verifying the Cause of the Issue".

Note: Blue Coat recommends that you select thick provisioning when creating the VA on the host server. 

Verifying the Cause of the Issue

To verify that a bad disk is causing this issue, do the following:

  1. If needed, obtain privileges to view the ESX host and its associated datastore, and install vSphere Client or vSphere Web Client. 
     
  2. Locate the event log on the ESX host.
    1. Log in to vSphere.
    2. In the client, select Home > Inventory > Hosts and Clusters.
    3. In the left pane, navigate to the ESX host and select it.
    4. On the right, select the Tasks & Events tab.
    5. Beside the View: option, click the Events button.
      The event log for the selected host is displayed.
       
  3. In the event log, look for an error related to disk disconnection or reconnection that corresponds with the time that the problem occurred on the VA.
  4. Map the error codes against SCSI command specifications.
    As an example, you check the event log and identify the following as the disk error corresponding with the time of the problem:

    2013-08-22 14:58:56-04:00EDT  "On disk 1, read error has occurred at block 004dca07 (00010002,03110002)."  0 48007:64 Mailed ced.cpp:1281
  5. Refer to the second error code in the tuple (03110002) to determine the sense data. See the following section, "Determining Sense Data".
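
Pulling the two error codes out of an event message, as described in the steps above, can be scripted. The following is a minimal Python sketch, assuming that messages have the same shape as the single example shown in step 4. The function name `extract_error_codes` and the regular expression are illustrative only and are not part of any Blue Coat or VMware tool.

```python
import re

# Pattern inferred from the one example message in this article (an assumption):
# "On disk <n>, read error has occurred at block <hex> (<code1>,<code2>)."
READ_ERROR = re.compile(
    r"On disk (?P<disk>\d+), read error has occurred at block "
    r"(?P<block>[0-9a-fA-F]+) "
    r"\((?P<first>[0-9a-fA-F]{8}),(?P<second>[0-9a-fA-F]{8})\)"
)

def extract_error_codes(line):
    """Return (disk, block, first_code, second_code), or None if no match."""
    match = READ_ERROR.search(line)
    if match is None:
        return None
    return (
        int(match.group("disk")),
        match.group("block"),
        match.group("first"),
        match.group("second"),  # the code that carries the sense data
    )

example = ('2013-08-22 14:58:56-04:00EDT  "On disk 1, read error has occurred at '
           'block 004dca07 (00010002,03110002)."  0 48007:64 Mailed ced.cpp:1281')
print(extract_error_codes(example))
```

For the example message, the last element of the returned tuple is 03110002, the code to decode in the next section.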

Determining Sense Data

In the example above, 03110002 is the error code. Expressed in hexadecimal notation, the error is composed of pairs of digits. Each pair corresponds to specific SCSI sense data:

  • 03 - SCSI sense key

  • 11 - SCSI additional sense code

  • 00 - SCSI additional sense code qualifier 

  • 02 - SCSI status code

Note: The error code always starts with 0, so the sense key is a single digit (3 in this example). The additional sense code and additional sense code qualifier are read together in the format ASC/ASCQ.

The SCSI errors in this example mean:

  • SCSI sense key (3) : Medium error

  • SCSI additional sense code/SCSI additional sense code qualifier (11/00) : Unrecovered read error

  • SCSI Status code (02) : Check condition

The source for this information is the T10 Technical Committee: http://www.t10.org/lists/1spc-lst.htm
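
The decoding described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not a vendor tool: the function name `decode_sense` is hypothetical, and the lookup tables contain only the values discussed in this article, so any other value is reported as "unknown" and must be looked up in the T10 lists.

```python
# Lookup tables limited to the values named in this article (an assumption);
# a complete decoder would need the full T10 lists.
SENSE_KEYS = {0x3: "Medium error"}
ASC_ASCQ = {(0x11, 0x00): "Unrecovered read error"}
STATUS_CODES = {0x02: "Check condition"}

def decode_sense(code):
    """Split an 8-hex-digit error code into sense key, ASC/ASCQ, and status."""
    if len(code) != 8:
        raise ValueError("expected 8 hexadecimal digits")
    # Each pair of hex digits is one field: key, ASC, ASCQ, status.
    key, asc, ascq, status = (int(code[i:i + 2], 16) for i in range(0, 8, 2))
    return {
        "sense_key": (key, SENSE_KEYS.get(key, "unknown")),
        "asc_ascq": ("%02X/%02X" % (asc, ascq),
                     ASC_ASCQ.get((asc, ascq), "unknown")),
        "status": (status, STATUS_CODES.get(status, "unknown")),
    }

print(decode_sense("03110002"))
```

For 03110002, the result pairs sense key 3 with "Medium error", ASC/ASCQ 11/00 with "Unrecovered read error", and status 2 with "Check condition", matching the list above.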