NSX-T Manager UI intermittently fails to load, and the Search API intermittently fails with a CircuitBreaker exception reporting "Data too large".

Article ID: 373048

Products

VMware NSX

Issue/Introduction

  • When many queries run concurrently, the search APIs fail intermittently if the OpenSearch/Elasticsearch query cache occupies a large portion of the allocated heap.
  • The NSX-T Manager UI intermittently stops working, or certain data is not populated in the UI.
  • For example, when opening the Networking->Network Overview page in the NSX UI, none of the component information loads.
  • Error message seen in the NSX UI:
    "Failed to get the report - An unknown error has occurred".
  • Check search-manager.log under /var/log/search for entries like the following:
    2024-03-26T06:57:35.900Z WARN http-nio-127.0.0.1-7440-exec-284 IndexingMetadataHelper 4492 - [nsx@6876 comp="nsx-manager" level="WARNING" reqId="<UUID-redacted>" subcomp="manager" username="[email protected]"] Could not fetch indexing position from ES, error: ElasticsearchStatusException[Elasticsearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [<http_request>] would be [1862171586/1.7gb], which is larger than the limit of [1860491673/1.7gb], real usage: [1862169944/1.7gb], new bytes reserved: [1642/1.6kb]]
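
To gauge how often the breaker is tripping, the log can be scanned for the exception type. The following is a minimal sketch, not an official tool: the function names count_breaker_trips and last_breaker_trip are hypothetical, and the default path assumes the log file is /var/log/search/search-manager.log as described above.

```shell
# Hypothetical helper: count log lines that record a circuit breaker trip.
# The default path assumes the search-manager.log location given in this article.
count_breaker_trips() {
    log_file="${1:-/var/log/search/search-manager.log}"
    # grep -c prints the match count; it exits non-zero on zero matches,
    # so "|| true" keeps the function from reporting that as a failure.
    grep -c 'circuit_breaking_exception' "$log_file" || true
}

# Hypothetical helper: print the most recent breaker message, if any.
last_breaker_trip() {
    log_file="${1:-/var/log/search/search-manager.log}"
    grep 'circuit_breaking_exception' "$log_file" | tail -n 1
}
```

Running count_breaker_trips with no argument scans the default log; passing a path scans a copy of the log instead.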

Environment

VMware NSX

Cause

Frequent searches on any entity type that has more than 10,000 entities cause the queries to be cached. As a result, the heap used by OpenSearch grows significantly.
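
One way to observe this growth is to compare heap usage with query cache size on each node. The sketch below assumes the search service answers on localhost:9200, as in the other commands in this article, and uses the _cat/nodes API with the heap.percent and query_cache.memory_size columns. The function names are hypothetical.

```shell
# Hypothetical helper: build the _cat/nodes URL that lists heap usage and
# query cache size per node. The default endpoint is assumed from this article.
query_cache_stats_url() {
    base_url="${1:-http://localhost:9200}"
    printf '%s/_cat/nodes?v&h=name,heap.percent,query_cache.memory_size\n' "$base_url"
}

# Hypothetical helper: fetch and print the per-node figures.
show_query_cache_usage() {
    curl -s "$(query_cache_stats_url "${1:-}")"
}
```

Calling show_query_cache_usage on a manager node prints one row per search node. A query cache that holds a large share of a nearly full heap is consistent with the cause described above.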

Resolution

This issue is resolved in VMware NSX 4.2.0.

Workaround:
1) Enable SSH on the NSX-T Managers by following: https://knowledge.broadcom.com/external/article/373427/enable-ssh-on-nsx-managers.html
2) SSH to the NSX-T Manager as the admin user, then switch to the root user with the command:
st e
3) Clear the query cache of Elasticsearch/OpenSearch using the command:
curl -X POST "localhost:9200/_cache/clear?query=true"

Note: In some cases you may need to restart the search service first. To do so, run the commands below on each node and verify the service status.
/etc/init.d/search restart
/etc/init.d/search status
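
The cache-clear step can be wrapped so that a failed response is noticed. This is a sketch only. It assumes the endpoint used in the workaround and that the response carries a "_shards" object with a "failed" counter, as the Elasticsearch clear-cache API returns. The function names are hypothetical.

```shell
# Hypothetical helper: succeed only if a clear-cache response read from stdin
# reports zero failed shards. Whitespace around the colon is tolerated.
cache_clear_succeeded() {
    grep -Eq '"failed"[[:space:]]*:[[:space:]]*0([^0-9]|$)'
}

# Hypothetical wrapper around the cache-clear command from the workaround.
clear_query_cache() {
    base_url="${1:-localhost:9200}"
    if curl -s -X POST "$base_url/_cache/clear?query=true" | cache_clear_succeeded; then
        echo "query cache cleared"
    else
        echo "query cache clear failed or search service unreachable" >&2
        return 1
    fi
}
```

If clear_query_cache returns non-zero, restarting the search service as described in the note and then retrying is the path this article suggests.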

Issue prevention:
To prevent the issue from recurring before you are able to upgrade to VMware NSX 4.2.0, perform the following:

1) Check the output of the following command:

curl 'localhost:9200/_cluster/settings?pretty&include_defaults' | grep -A 5 "queries"

The output should look like this:

"queries" : {
        "cache" : {
          "count" : "10000",
          "size" : "10%",
          "all_segments" : "false"
        }

2) Open the file /etc/elasticsearch/elasticsearch.yml and add the following line at the end of the file:

indices.queries.cache.count: 1000

3) Restart the search service using the command: service search restart

4) Once the search service is restarted completely (it might take a minute), check the output of the following command again:

curl 'localhost:9200/_cluster/settings?pretty&include_defaults' | grep -A 5 "queries"

The output should now look like this:

"queries" : {
        "cache" : {
          "count" : "1000",
          "size" : "10%",
          "all_segments" : "false"
        }
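
The prevention steps above can be sketched as two small helpers: one that adds the setting only when it is missing, and one that reads the effective value back from the settings output. Both function names are hypothetical, the backup file name (.bak) is an assumption, and the edit still needs to be run as root and followed by the search service restart from step 3.

```shell
# Hypothetical helper: append the query cache count setting to a config file
# unless a line for that key is already present. A backup copy is kept first.
set_query_cache_count() {
    conf_file="${1:-/etc/elasticsearch/elasticsearch.yml}"
    count="${2:-1000}"
    if grep -q '^indices\.queries\.cache\.count:' "$conf_file"; then
        echo "setting already present in $conf_file; not modified"
        return 0
    fi
    cp -p "$conf_file" "$conf_file.bak"
    # Add a newline first if the file does not already end with one.
    [ -n "$(tail -c 1 "$conf_file")" ] && echo >> "$conf_file"
    printf 'indices.queries.cache.count: %s\n' "$count" >> "$conf_file"
}

# Hypothetical helper: read the cluster settings output from stdin and print
# the first "count" value that follows the "queries" key.
query_cache_count_from_settings() {
    grep -A 5 '"queries"' \
        | sed -n 's/.*"count"[[:space:]]*:[[:space:]]*"\([0-9]*\)".*/\1/p' \
        | head -n 1
}
```

After the restart, piping the settings command from step 4 into query_cache_count_from_settings is expected to print 1000 if the change took effect.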