NSX-T collection fails with timeout in Usage Meter 9.x

Article ID: 427139

Updated On:

Products

VMware Usage Meter

Issue/Introduction

In VMware Usage Meter 9.0.1, you may encounter incomplete data collection for NSX-T instances. The collection often times out after 180 minutes, and the logs report LONG_RUNNING_COLLECTOR errors. This typically occurs in environments with a large number of Distributed Firewall (DFW) Layer 3 sections and rules.

Credential verification fails with a "timeout" message.

The collection logs show java.net.ConnectException: Failed to connect to localhost, and the collector is canceled after exceeding the 180-minute limit.

YYYY-MM-DD HH:MM:SS ERROR --- [pool-x-thread-x] c.vmware.um.umcomponent.CommandRunner    : Collector of type NSX-T is running for more than 180 minutes. The Collector will be cancel. Please review the collector's products. If needed consider increasing the default timeout.
YYYY-MM-DD HH:MM:SS ERROR --- [collector-main-thread] c.v.u.n.d.PacketInspectionDetector       : IP Set API returned error for : xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
com.vmware.um.collector.UnexpectedResponseCode: API endpoint api/v1/ip-sets/ xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx on server xx returned HTTP status 404
        at com.vmware.um.collector.http.RequestBuilder.getResponse(RequestBuilder.java:202)
        at com.vmware.um.collector.http.RequestBuilder.as(RequestBuilder.java:235)
        at com.vmware.um.nsxtcollector.api.Client.fetchAs(Client.java:60)
        at com.vmware.um.nsxtcollector.detectors.PacketInspectionDetector.connectIPSet(PacketInspectionDetector.java:563)
.
.
        at com.vmware.um.umcomponent.CommandRunner.lambda$executeCommand$1(CommandRunner.java:204)
        at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
YYYY-MM-DD HH:MM:SS ERROR --- [pool-x-thread-x] c.v.um.common.platform.JournalClient     : Failed to connect to localhost/[x:x:x:x:x:x:x:1]:8051
java.net.ConnectException: Failed to connect to localhost/[x:x:x:x:x:x:x:1]:8051
        at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.kt:297)
        at okhttp3.internal.connection.RealConnection.connect(RealConnection.kt:207)
        at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.kt:226)
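
To check whether a given log shows the same symptoms, the messages in the excerpt above can be searched for mechanically. The following is a minimal Python sketch, not a Usage Meter tool: the regular expressions are copied from the log lines above, while the signature names and the function names scan_log and scan_log_file are hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Patterns taken from the log excerpt above; the dictionary keys are
# illustrative labels, not identifiers used by Usage Meter.
SIGNATURES = {
    "collector_timeout": re.compile(
        r"Collector of type NSX-T is running for more than \d+ minutes"
    ),
    "ip_set_error": re.compile(r"IP Set API returned error"),
    "journal_connect_failure": re.compile(
        r"ConnectException: Failed to connect to localhost"
    ),
}


def scan_log(lines):
    """Count how many lines match each symptom signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_log_file(path):
    """Scan a log file on disk; the caller supplies the path."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return scan_log(handle)
```

Calling scan_log_file with the path to a copy of the collector log returns a count per signature. A non-zero collector_timeout count alongside journal_connect_failure entries matches the pattern described in this article.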

Environment

VCF Usage Meter 9.x 

Cause

This issue is caused by slow NSX-T API response times when processing large DFW inventories. When the Usage Meter service account authenticates through LDAP or another external identity source, the additional overhead can push the collection past the timeout, particularly if the NSX Manager is under heavy load or has a large amount of metadata to return.
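
One way to see whether the identity source contributes to the slowdown is to time the same NSX API request with each account. The sketch below is a rough illustration under several assumptions: that the NSX Manager accepts HTTP Basic authentication, that the api/v1/ip-sets path from the log excerpt is a suitable request to time, and that the function names (build_request, time_call, compare_accounts) are hypothetical helpers rather than part of any VMware tooling. If the NSX Manager certificate is not trusted by the client, a custom opener would need to be passed in.

```python
import base64
import statistics
import time
import urllib.request


def build_request(host, path, username, password):
    """Build a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"https://{host}/{path.lstrip('/')}")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


def time_call(fetch, repeats=3):
    """Call fetch() `repeats` times and return the elapsed seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()
        timings.append(time.perf_counter() - start)
    return timings


def compare_accounts(host, path, accounts, opener=urllib.request.urlopen, repeats=3):
    """Return {label: median seconds} for each (label, username, password) tuple."""
    results = {}
    for label, username, password in accounts:
        request = build_request(host, path, username, password)

        def fetch(request=request):
            # Read the full body so the timing includes the response payload.
            with opener(request, timeout=60) as response:
                response.read()

        results[label] = statistics.median(time_call(fetch, repeats))
    return results
```

For example, compare_accounts could be called with the NSX Manager host name, the path "api/v1/ip-sets", and two tuples such as ("local", ...) and ("ldap", ...). A consistently higher median for the LDAP account would be consistent with the cause described above, although a few samples from one endpoint are not conclusive.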

Resolution

To resolve this issue, use a local NSX admin account for collection instead of an LDAP/AD account.

Steps:
  1. Log in to the Usage Meter Web UI.
  2. Navigate to the Products page.
  3. Locate the NSX-T instance and click Edit.
  4. Replace the existing LDAP credentials with a local NSX admin username and password. 
  5. Click Save and verify the connection.

Additional Information

If the issue persists because the environment is extremely large, consider deploying a separate Usage Meter appliance dedicated to individual NSX-T instances so that each collection completes within the 180-minute limit.