NSX Manager /var/log partition usage is very high or at 100% in NSX-T 3.1.0

Article ID: 319977


Products

VMware NSX

Issue/Introduction

Symptoms:

  • Services may not function properly in an NSX-T 3.1.0 environment.
  • Checking disk utilization with the command df -h shows that /var/log partition usage is very high or at 100%.
  • Large .csv files in the /var/log/corfu directory take up a very large share of the available space.
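
A rough scripted equivalent of the df -h check can be written with the Python standard library, which can be handy when checking each Manager. This is a minimal sketch and not part of this KB: the function names and the 90% warning threshold are assumptions chosen for illustration. It follows the way df computes Use% (used space divided by used plus available space), so the result should be close to, but not necessarily identical to, what df reports.

```python
import os
import shutil


def usage_percent(path: str) -> float:
    """Approximate df's Use% for the filesystem that holds `path`.

    df divides used space by (used + space available to unprivileged users),
    so this does the same rather than dividing by the raw total size.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0
    return 100.0 * usage.used / denominator


def is_nearly_full(path: str, threshold: float = 90.0) -> bool:
    """Hypothetical helper: True when usage is at or above `threshold` percent."""
    return usage_percent(path) >= threshold


if __name__ == "__main__":
    target = "/var/log"
    if os.path.isdir(target):  # only meaningful on the NSX Manager itself
        print(f"{target} usage: {usage_percent(target):.1f}%")
        if is_nearly_full(target):
            print("/var/log is nearly full; look for large .csv files in /var/log/corfu")
```

The threshold is a parameter so it can be tightened or relaxed without editing the function.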


Environment

VMware NSX-T Data Center

Cause

The .csv files are Dropwizard metrics output. Metrics collection was enabled in NSX-T 3.1.0 and can be disabled.

Log rotation is not enabled for the following files in /var/log/corfu, so they keep growing and consume a large amount of space on the /var/log partition:

/var/log/corfu/corfu.infrastructure.message-handler.update_committed_tail.csv
/var/log/corfu/corfu.runtime.client-router.update_committed_tail.csv
/var/log/corfu/corfu.runtime.client-router.layout_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.layout_committed.csv
/var/log/corfu/corfu.infrastructure.message-handler.layout_prepare.csv
/var/log/corfu/corfu.infrastructure.message-handler.management_failure_detected.csv
/var/log/corfu/corfu.infrastructure.message-handler.layout_propose.csv
/var/log/corfu/corfu.runtime.client-router.layout_committed.csv
/var/log/corfu/corfu.runtime.client-router.node_state_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.management_healing_detected.csv
/var/log/corfu/corfu.runtime.client-router.layout_prepare.csv
/var/log/corfu/corfu.runtime.client-router.layout_propose.csv
/var/log/corfu/corfu.runtime.client-router.keep_alive.csv
/var/log/corfu/corfu.runtime.client-router.log_address_space_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.compact_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.sequencer_trim_req.csv
/var/log/corfu/corfu.runtime.client-router.bootstrap_sequencer.csv
/var/log/corfu/corfu.infrastructure.message-handler.seal.csv
/var/log/corfu/corfu.infrastructure.message-handler.layout_request.csv
/var/log/corfu/corfu.runtime.client-router.seal.csv
/var/log/corfu/corfu.runtime.client-router.trim_mark_request.csv
/var/log/corfu/corfu.runtime.client-router.management_failure_detected.csv
/var/log/corfu/corfu.runtime.client-router.read_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.streams_address_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.bootstrap_sequencer.csv
/var/log/corfu/corfu.runtime.client-router.inspect_addresses_request.csv
/var/log/corfu/corfu.runtime.client-router.sequencer_metrics_request.csv
/var/log/corfu/corfu.runtime.client-router.write.csv
/var/log/corfu/corfu.infrastructure.message-handler.known_address_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.read_request.csv
/var/log/corfu/corfu.runtime.client-router.range_write.csv
/var/log/corfu/corfu.infrastructure.message-handler.token_req.csv
/var/log/corfu/corfu.runtime.client-router.token_req.csv
/var/log/corfu/corfu.infrastructure.sequencer.query-token.csv
/var/log/corfu/corfu.infrastructure.message-handler.write.csv
/var/log/corfu/corfu.infrastructure.message-handler.reset_logunit.csv
/var/log/corfu/corfu.runtime.client-router.reset_logunit.csv
/var/log/corfu/corfu.runtime.client-router.known_address_request.csv
/var/log/corfu/corfu.infrastructure.stream-ops.compaction.csv
/var/log/corfu/corfu.infrastructure.message-handler.node_state_request.csv
/var/log/corfu/corfu.infrastructure.sequencer.tx-token.csv
/var/log/corfu/corfu.runtime.log-unit-client.20.20.0.15:9000-read.csv
/var/log/corfu/corfu.runtime.log-unit-client.20.20.0.14:9000-read.csv
/var/log/corfu/corfu.infrastructure.sequencer.multi-stream-token.csv
/var/log/corfu/corfu.infrastructure.message-handler.inspect_addresses_request.csv
/var/log/corfu/corfu.runtime.client-router.version_request.csv
/var/log/corfu/corfu.runtime.client-router.committed_tail_request.csv
/var/log/corfu/corfu.runtime.client-router.prefix_trim.csv
/var/log/corfu/corfu.runtime.log-unit-client.20.20.0.13:9000-read.csv
/var/log/corfu/corfu.infrastructure.message-handler.version_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.keep_alive.csv
/var/log/corfu/corfu.runtime.client-router.orchestrator_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.trim_mark_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.sequencer_metrics_request.csv
/var/log/corfu/corfu.runtime.client-router.tail_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.orchestrator_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.prefix_trim.csv
/var/log/corfu/corfu.infrastructure.message-handler.ping.csv
/var/log/corfu/corfu.runtime.client-router.compact_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.range_write.csv
/var/log/corfu/corfu.infrastructure.message-handler.tail_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.committed_tail_request.csv
/var/log/corfu/corfu.runtime.client-router.ping.csv
/var/log/corfu/corfu.infrastructure.message-handler.log_address_space_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.management_layout_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.flush_cache.csv
/var/log/corfu/corfu.infrastructure.message-handler.management_bootstrap_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.failure_detector_metrics_request.csv
/var/log/corfu/corfu.infrastructure.message-handler.reset.csv
/var/log/corfu/corfu.infrastructure.message-handler.restart.csv
/var/log/corfu/corfu.infrastructure.message-handler.layout_bootstrap.csv
/var/log/corfu/corfu.runtime.client-router.management_healing_detected.csv
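
To confirm that these files are what is filling the partition, the .csv files in /var/log/corfu can be listed by size. The sketch below assumes Python 3 is available on the Manager, and the helper name csv_files_by_size is hypothetical; it only reads file sizes and does not modify anything.

```python
import os
from pathlib import Path
from typing import List, Tuple


def csv_files_by_size(directory: str) -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for each *.csv file in `directory`, largest first."""
    entries = []
    for path in Path(directory).glob("*.csv"):
        if path.is_file():  # skip anything that is not a regular file
            entries.append((path.name, path.stat().st_size))
    entries.sort(key=lambda item: item[1], reverse=True)
    return entries


if __name__ == "__main__":
    corfu_dir = "/var/log/corfu"
    if os.path.isdir(corfu_dir):
        files = csv_files_by_size(corfu_dir)
        total_mib = sum(size for _, size in files) / 1024 ** 2
        print(f"{len(files)} .csv files in {corfu_dir}, {total_mib:.1f} MiB in total")
        for name, size in files[:10]:  # show the ten largest
            print(f"{size / 1024 ** 2:10.1f} MiB  {name}")
```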


Resolution

This issue is resolved in NSX-T 3.1.1 and later, including 3.2.x and 4.x.
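
Because the problem is reported for 3.1.0 and fixed from 3.1.1 onward, a quick version check can indicate whether the workaround below applies. This is a minimal sketch: the function names are hypothetical, the build suffix in the examples is made up, and treating only 3.1.0.x builds as affected is an assumption based on the version named in this article.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) from a string such as '3.1.0' or '3.1.0.0.0.1234567'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def needs_corfu_csv_workaround(version: str) -> bool:
    """True only for 3.1.0 builds; per this KB the fix ships in 3.1.1 and later."""
    return parse_version(version) == (3, 1, 0)
```

For example, needs_corfu_csv_workaround("3.1.0.0.0.1234567") returns True, while needs_corfu_csv_workaround("3.2.0") returns False.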


Workaround:

  • Download the cleanup script (script.py) attached to this KB, upload it to a directory on the NSX Manager such as /image, and run it with:
    • chmod +x script.py
    • python script.py
  • Run the script on each Manager node in the cluster.

If the script does not work, you can use the manual process below:

  1. Connect to the NSX Manager over SSH as the root user
  2. Change to the /usr/tanuki/conf directory: cd /usr/tanuki/conf
  3. Back up corfu-server-wrapper.conf: cp corfu-server-wrapper.conf corfu-server-wrapper.conf.cp
  4. Open corfu-server-wrapper.conf: vim corfu-server-wrapper.conf
  5. Change the parameter 'wrapper.java.additional.19=-Dcorfu.metrics.collection=True' to 'wrapper.java.additional.19=-Dcorfu.metrics.collection=False' and save the file
  6. Restart the NSX Manager appliance VM
  7. Verify that all services are up
  8. Delete all .csv files from the Corfu log directory: rm -rf /var/log/corfu/*.csv
  9. Repeat the above steps on the other Manager appliances in the cluster.
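
Steps 3, 5 and 8 of the manual procedure can be scripted as in the minimal sketch below. It is not the attached script.py, whose contents are not reproduced here; the function names are hypothetical, and it assumes it runs as root on a Manager whose configuration line matches the one in step 5. It does not restart the appliance or check services, so steps 6 and 7 still have to be done by hand, and delete_corfu_csv should only be called after the restart, as in the procedure.

```python
import glob
import os
import shutil

OLD_SETTING = "-Dcorfu.metrics.collection=True"
NEW_SETTING = "-Dcorfu.metrics.collection=False"


def disable_corfu_metrics(conf_path: str) -> bool:
    """Steps 3 and 5: back up the wrapper config, then turn metrics collection off.

    Returns True if a line was changed, False if no line still had the setting on.
    """
    backup_path = conf_path + ".cp"
    if not os.path.exists(backup_path):  # keep the first (original) backup only
        shutil.copy2(conf_path, backup_path)
    with open(conf_path, encoding="utf-8") as handle:
        lines = handle.readlines()
    changed = False
    for index, line in enumerate(lines):
        # Only touch wrapper.java.additional.* lines that carry the flag.
        if line.lstrip().startswith("wrapper.java.additional.") and OLD_SETTING in line:
            lines[index] = line.replace(OLD_SETTING, NEW_SETTING)
            changed = True
    if changed:
        with open(conf_path, "w", encoding="utf-8") as handle:
            handle.writelines(lines)
    return changed


def delete_corfu_csv(directory: str = "/var/log/corfu") -> int:
    """Step 8: delete *.csv files directly inside `directory`; return the count."""
    removed = 0
    for path in glob.glob(os.path.join(directory, "*.csv")):
        if os.path.isfile(path):  # ignore anything that is not a regular file
            os.remove(path)
            removed += 1
    return removed
```

For example, call disable_corfu_metrics("/usr/tanuki/conf/corfu-server-wrapper.conf") before restarting the appliance and delete_corfu_csv() once services are back up. The backup is written only if corfu-server-wrapper.conf.cp does not already exist, so running the sketch twice does not overwrite the original copy.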


Additional Information

Impact/Risks:

This issue can drive /var/log disk usage on the NSX Manager up to 100%. A full /var/log partition causes significant problems, such as all NSX-T Managers becoming inaccessible over SSH and through the UI.

Also check for a similar issue that can occur in version 3.1.1: Log rotate fails for /var/log/vmware on NSX-T Manager and Edge Nodes.


Attachments

script.py