Understanding and Troubleshooting the Guest Introspection Service Virtual Machine (USVM)

Article ID: 309999

Products

VMware NSX

Issue/Introduction

This document describes how to troubleshoot the Universal Service Virtual Machine (USVM) component of the NSX for vSphere Guest Introspection solution.
 

For information on troubleshooting the MUX component, see Collecting diagnostic information for the NSX Guest Introspection MUX VIB.

Environment

VMware NSX for vSphere 6.4.x

Resolution

Service Deployment Status

The Service Deployments tab (Network & Security > Installation > Service Deployments) reports Installation Status and Service Status. Installation Status reflects whether the backing EAM agency is present and whether the USVM is present and powered on. A failure of either check results in a warning status.
 
An Installation Status warning can also result from an internal timing dependency. Although the EAM service initializes correctly, the EAM agencies are not loaded into memory immediately. If NSX queries EAM for agency status during this window, EAM may return a ManagedObjectNotFound exception. Clicking “Resolve All” in the NSX user interface prompts NSX to query EAM for the same agency again, and by then the agency is likely to be found. NSX then clears the failed status and the status turns green.
 
Service Status reflects whether the service inside the USVM is up. For example, the service may have failed to contact NSX Manager, crashed, or failed to send a heartbeat. Another cause can be a failure to establish the message bus connection between NSX Manager and the USVM. NSX uses RabbitMQ (RMQ) for the message bus. The USVM is an RMQ client and first connects to the RMQ broker with a default username and password. The USVM then initiates a password change to create a unique password. During this process, authentication errors may be reported, although only a single error should be reported under normal conditions.
 
=INFO REPORT==== 20-Dec-2015::11:07:07 ===
Creating user 'usvm-admin-host-####'
=INFO REPORT==== 20-Dec-2015::11:07:07 ===
Changing password for 'usvm-admin-host-####'
=ERROR REPORT==== 20-Dec-2015::11:08:00 ===
closing AMQP connection <0.6052.16> (127.0.0.1:57642 -> 127.0.0.1:5672):
{handshake_error,starting,0,{amqp_error,access_refused, "PLAIN login refused: user 'usvm-admin-host-####' - invalid credentials",'connection.start_ok'}}
 
The following log messages show NSX Manager closing its local connection to the broker (127.0.0.1:57642 -> 127.0.0.1:5672). NSX performs this check for all USVM clients to determine whether any passwords need to be reset. An “invalid credentials” message indicates that the password for this USVM must be reset.
 
=ERROR REPORT==== 20-Dec-2015::11:08:00 ===
closing AMQP connection <0.6052.16> (127.0.0.1:57642 -> 127.0.0.1:5672):
{handshake_error,starting,0,
{amqp_error,access_refused,
"PLAIN login refused: user 'usvm-admin-host-####' - invalid credentials",
'connection.start_ok'}}
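
Because a single refusal per USVM user is expected during the password change, it can help to count refusals per user and look for any user that appears repeatedly. The following is a minimal sketch, not a supported tool: the function name `count_refused_logins` is hypothetical, and the broker log file must be supplied by the caller because this article does not state where the log is stored.

```shell
# Count "invalid credentials" refusals per RabbitMQ user in a broker log.
# Reads the file(s) given as arguments, or stdin when no file is given.
# Output: one line per user, "<count> <user>", highest count first.
count_refused_logins() {
    grep -o "PLAIN login refused: user '[^']*' - invalid credentials" "$@" \
        | awk -F"'" '{ count[$2]++ } END { for (u in count) print count[u], u }' \
        | sort -rn
}

# Example (the log file name is a placeholder, not a documented path):
#   count_refused_logins broker.log
```

A user with a count greater than one is a candidate for closer inspection, although the count alone does not prove that the password change failed.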

Collecting USVM Logs

If the password change process fails to complete after a few hours, it times out and places the USVM in a warning state. The USVM must be redeployed to recover. Before redeploying, capture the following logs:
  • NSX Manager logs
  • usvm.log (location: /var/log/usvm.log)
  • eventmanager.log (location: /usr/local/usvmmgmnt/log/eventmanager.log)
The USVM and event manager logs are collected from the root shell of the USVM. In some cases, debug-level USVM logs are required. Debug logging is enabled via REST APIs, and USVM REST API logs are retrieved via the RMQ channel. As a first step, confirm that the RMQ channel is operational. If the RMQ channel is not operational, USVM logs can be collected by logging in to the USVM as root, disabling the firewall, and using SCP to copy the logs.
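
For the SCP fallback, a small helper can copy both log files in one step. This is only a sketch under stated assumptions: `fetch_usvm_logs` is a hypothetical name, the USVM address and destination directory are supplied by the caller, and the firewall on the USVM is assumed to have been disabled already (this article does not give the command for that step, so none is shown).

```shell
# Copy the two USVM logs named in this article to a local directory.
# $1: USVM address (caller-supplied), $2: destination directory (default: .)
fetch_usvm_logs() {
    usvm_ip="$1"
    dest_dir="${2:-.}"
    mkdir -p "$dest_dir" || return 1
    for path in /var/log/usvm.log /usr/local/usvmmgmnt/log/eventmanager.log; do
        # Stop at the first failed copy so a partial collection is noticed.
        scp "root@${usvm_ip}:${path}" "$dest_dir/" || return 1
    done
}

# Example (address and directory are placeholders):
#   fetch_usvm_logs 192.0.2.10 ./usvm-logs
```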
 
Use the following API to collect these logs:
 
On NSX Manager:
curl -i -k -H "content-type: application/xml" -u admin -X POST "https://{VSM-IP}/api/1.0/services/debug/loglevel/com.vmware.vshield.vsm.messaging?level=DEBUG"
 
On the hosts:
curl -ivs -k -u 'admin:default' -H 'Content-Type:application/xml' -X POST https://{NSX_IP}/api/1.0/usvmlogging/HOST_ID/changelevel -d @enable_debug
where enable_debug is a file containing the following data:
 
<?xml version="1.0" encoding="UTF-8" ?>
<logginglevel>
<loggerName>com.vmware.vshield.usvm</loggerName>
<level>DEBUG</level>
</logginglevel>
 
 
Query the current logging level of the root logger to validate that the RMQ channel is working:
curl -ivs -k -u 'admin:default' -X GET https://{NSX_IP}/api/1.0/usvmlogging/host-id/root
 
Set debug for the above components:
curl -ivs -k -u 'admin:default' -H 'Content-Type:application/xml' -X POST https://{NSX_IP}/api/1.0/usvmlogging/host-##/changelevel -d @test
 
where test is a file containing the following data:
 
<?xml version="1.0" encoding="UTF-8" ?>
<logginglevel>
<loggerName>root</loggerName>
<level>DEBUG</level>
</logginglevel>
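
The two payload files above differ only in the logger name, so they can be generated rather than written by hand. The sketch below assumes nothing beyond the XML structure shown in this article; the function name `write_loglevel_payload` is hypothetical.

```shell
# Write a <logginglevel> request body for the given logger and level.
# $1: logger name, $2: level, $3: output file
write_loglevel_payload() {
    logger_name="$1"
    level="$2"
    out_file="$3"
    cat > "$out_file" <<EOF
<?xml version="1.0" encoding="UTF-8" ?>
<logginglevel>
<loggerName>${logger_name}</loggerName>
<level>${level}</level>
</logginglevel>
EOF
}

# Example: recreate the two files used by the curl commands in this article.
#   write_loglevel_payload com.vmware.vshield.usvm DEBUG enable_debug
#   write_loglevel_payload root DEBUG test
```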
 
 
Collect logs:
  1. USVM:

    Get the location of the tech support bundle:
    curl -ivs -k -u 'admin:default' -X GET https://{NSX_IP}/api/1.0/hosts/host-ID/techsupportlogs

    Get the URL from the Location header in the 303 response above and use it in the following command:
    curl -k -u 'admin:default' -X GET -O https://{NSX_IP}//tech_support_logs/usvm/vshield_host_support_host-id_usvm_012616_072249GMT.log.gz
     
  2. NSX Manager technical support file.
     
  3. Host vm-support log for the MUX information.
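
For the USVM bundle in step 1, the two requests can be chained so that the Location header does not have to be copied by hand. This is a sketch rather than a supported script: the function names are hypothetical, the NSX Manager address, credentials, and host ID are supplied by the caller, and the handling of a path-only Location value is an assumption.

```shell
# Read HTTP response headers on stdin and print the first Location value.
extract_location() {
    awk '!found && tolower($1) == "location:" { sub(/\r$/, "", $2); print $2; found = 1 }'
}

# Request the tech support bundle for one host, then download it.
# $1: NSX Manager address, $2: credentials as user:password, $3: host ID
download_usvm_techsupport() {
    nsx_ip="$1"; cred="$2"; host_id="$3"
    # -D - prints the response headers; the 303 response body is discarded.
    location="$(curl -s -k -u "$cred" -D - -o /dev/null \
        "https://${nsx_ip}/api/1.0/hosts/${host_id}/techsupportlogs" | extract_location)"
    [ -n "$location" ] || { echo "no Location header returned" >&2; return 1; }
    # Assumption: a Location that is only a path is relative to NSX Manager.
    case "$location" in
        http*) url="$location" ;;
        *)     url="https://${nsx_ip}${location}" ;;
    esac
    curl -k -u "$cred" -O "$url"
}

# Example (all three arguments are placeholders):
#   download_usvm_techsupport 192.0.2.5 'admin:password' host-12
```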
 
USVM to EPSec Library Connection
 
The MUX creates a TCP socket connection via the EPSecLib to the USVM for every VM with a thin agent on the host. On the USVM, each connection appears as an established socket, for example:
java 6344 usvm 47u IPv4 1629961 0t0 TCP 169.254.1.24:48655->169.254.1.1:42089 (ESTABLISHED)
 
If the MUX cannot establish this connection, the attempt times out and the MUX schedules a reconnect, as the following MUX log entries show:
 
2016-02-17T20:18:42Z EPSecMux[3653778]: [ERROR] (EPSEC) [0x37c092] [0x5ca09c30] Error on socket to solution 169.254.1.24:48655: SocketError on sd 27, in recv: Connection timed out (110)
2016-02-17T20:18:42Z EPSecMux[3653778]: [WARNING] (EPSEC) [0x37c092] MuxSolutionHandler[0x5ca09c30] scheduling reconnect to solution[100] at 169.254.1.24:48655 in 30000 ms
 
To troubleshoot this issue, capture tcpdump and netstat output on the USVM.
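
One possible way to approach the capture is to read the affected endpoint from the MUX log first and then record socket state and traffic for that port on the USVM. The sketch below makes several assumptions: both function names are hypothetical, the output files under /tmp and the 200-packet limit are arbitrary choices, and tcpdump and netstat are assumed to be available in the USVM root shell.

```shell
# Print each distinct address:port that the MUX reports reconnecting to.
# Reads the MUX log file(s) given as arguments, or stdin when none is given.
mux_reconnect_targets() {
    sed -n 's/.*scheduling reconnect to solution\[[0-9]*] at \([0-9.]*:[0-9]*\).*/\1/p' "$@" \
        | sort -u
}

# Record socket state and a short packet capture for one port.
# Run as root on the USVM. $1: TCP port, for example 48655
capture_usvm_socket() {
    port="$1"
    netstat -an | grep ":$port" > "/tmp/usvm_netstat_$port.txt"
    # -nn disables name resolution; -c 200 stops after 200 packets.
    tcpdump -i any -nn -c 200 -w "/tmp/usvm_port_$port.pcap" port "$port"
}

# Example (the log file name is a placeholder):
#   mux_reconnect_targets mux.log
#   capture_usvm_socket 48655
```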