VCF Operations for Logs: Services Not Running and NFS Mount Inaccessible

Products

VCF Operations

Issue/Introduction

After an outage or instance restart in VCF Operations for Logs, one or more service instances may fail to come back online. When NFS external storage is configured, an inaccessible NFS mount can block archive, export, and import operations. In addition, inaccessible NFS mounts will block instance restarts indefinitely, preventing the system from recovering even after the underlying outage has been resolved.

This article describes how to determine whether the NFS mount symptom has been raised and provides guidance on restoring services to a running state.

Environment

VMware Cloud Foundation (VCF) Operations for Logs 9.1 with NFS v3 external storage support. NFS v4 is not supported.
NFS external storage configured for archive, export, or import operations

Cause

VCF Operations for Logs mounts configured NFS volumes into its instances (pods) at /mnt/<storageId>. These NFS volumes are defined in the Kubernetes StatefulSet specification and mounted at pod startup.

If a log processor or log store instance (i.e., pod) restarts while an NFS mount is unreachable, the instance will not start. The Kubernetes kubelet blocks indefinitely waiting for the NFS mount to succeed, so the pod never starts and none of the services running in that pod become available.

If some log processor and log store instances are still running, the log processors check and report whether the NFS mounts are accessible. This periodic health check runs every 60 seconds. It validates NFS connectivity by executing showmount -e <nfs-host> and verifying that the configured export path is present. Validation failures are reported as the NFS Mount Failures metric in the availability section of the Log Management Health Overview dashboard.

You may also notice that new archive, export, or import operations targeting the inaccessible NFS storage will fail with errors such as:

"Failed to create directory. This may be due to insufficient disk space, permission issues, or the NFS mount being unavailable."
"Failed to write file. This may be due to insufficient disk space, permission issues, or the NFS mount being unavailable."
"NFS storage path not writable: <path>."
"Could not validate NFS connection, please verify that a supported version of NFS is being used and it is available."

Common root causes for NFS becoming inaccessible include:

NFS server is down or unreachable from the VCF Management Services runtime VMs (external storage network traffic egresses from these VMs)
NFS export path has been removed or renamed on the NFS server
Firewall or network configuration changes blocking NFS traffic
NFS server has run out of disk space
NFS share permissions have changed (read-only or restricted access)

Resolution

Step 1: Check Log Processor Availability

Check the availability section of the Log Management Health Overview dashboard. If any instances of the log processor service are running, check the NFS Mount Failures metric. If mount failures are not reported, the log processor / log store instances may be unavailable for other reasons. Review the text box in the availability section of the dashboard for additional guidance.

Step 2: Verify NFS Server Connectivity

From the VCF Management Services runtime network, verify basic connectivity to the NFS server:

showmount -e <nfs-server-hostname>

Confirm that:

The NFS server responds to the showmount command
The configured export path appears in the list of exports
A supported version of NFS is being used
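
The export-path check in this step can be scripted. The following is a minimal sketch, not the product's own health check. The function name export_present and the sample path /exports/logs are hypothetical, and the sketch assumes the usual showmount -e output layout: one header line, then one export per line with the export path in the first column.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given export path appears in
# `showmount -e` output supplied on stdin.
# Assumes line 1 is a header ("Export list for <host>:") and every
# following line starts with an export path.
export_present() {
    awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Usage (placeholders, not values from this article):
#   showmount -e <nfs-server-hostname> | export_present /exports/logs \
#       && echo "export present" || echo "export missing"
```

The helper only confirms that the path is exported. It does not confirm the NFS version in use or that the export is writable.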

Step 3: Verify NFS Mount Health

Check the NFS storage for:

Free disk space — Ensure the NFS server has sufficient free space available
Permissions — Ensure the NFS export allows write access from the VCF Operations for Logs nodes
Network — Ensure there are no firewall rules or network partitions blocking NFS traffic between VCF Operations for Logs and the NFS server
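
The free-space and write-access checks above can be sketched as two small shell helpers. This is an illustration under stated assumptions, not a supported tool: the function names free_kb and can_write are hypothetical, and the sketch assumes it runs on a host where the NFS export is already mounted at a directory you supply.

```shell
#!/bin/sh
# Hypothetical helpers for a directory where the NFS export is mounted.

# Print the available space, in kilobytes, of the filesystem holding "$1".
# `df -P -k` produces POSIX-format output; column 4 of line 2 is "Available".
free_kb() {
    df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if a probe file can be created under "$1"; the probe is
# removed again afterwards.
can_write() {
    probe="$1/.nfs_write_probe.$$"
    touch "$probe" 2>/dev/null && rm -f "$probe"
}

# Usage (the mount point is a placeholder):
#   free_kb <mount-point>
#   can_write <mount-point> && echo "writable" || echo "not writable"
```

A write probe run from another host only approximates what the product sees, because export permissions can differ per client address.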

Step 4: Resolve the NFS Issue and Restart

After restoring NFS connectivity:

Verify the NFS export is accessible by running showmount -e <nfs-host>
If pods are stuck in a pending or crash-loop state due to NFS mount failures, the pods should recover automatically within 5 minutes once the NFS server is reachable
If pods do not recover, a manual restart of the affected StatefulSet pods may be required

Step 5: If Services Are Still Not Running

If NFS connectivity is confirmed but services remain down:

Determine which specific services (log-store, log-processor, ops-logs) are not running using kubectl or the Diagnostics dashboard
Check pod status and logs for error messages
Refer to KB 424379 for guidance on resolving underlying management service cluster issues
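
As a starting point for the kubectl checks in this step, the sketch below filters pod listings down to the pods that are not running. The function name not_running is hypothetical, the namespace and pod names are placeholders that this article does not specify, and the sketch assumes the default kubectl get pods column layout (NAME READY STATUS RESTARTS AGE).

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# name and status of every pod whose STATUS column is not "Running".
# Assumes the default column layout: NAME READY STATUS RESTARTS AGE.
not_running() {
    awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}

# Usage (namespace and pod names are placeholders):
#   kubectl get pods -n <namespace> | not_running
#   kubectl describe pod -n <namespace> <pod-name>   # pod events
#   kubectl logs -n <namespace> <pod-name>           # container logs
```

This filter looks only at the STATUS column, so a pod that reports Running but is not ready is not listed. Check the READY column separately for those cases.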