This KB applies to the following situations:
Product Version: 2.9
When investigating why an NFS volume is failing to mount, start by looking at the /var/vcap/sys/log/nfsv3driver and rep logs on the Diego cell where the application container crashed.
To find the Diego cell where the application container crashed, use the following API calls.
First, obtain the GUID of the crashing application:
cf app <app-name> --guid
Then substitute GUID in the following API call with the GUID obtained above:
cf curl "/v2/events?q=actee:GUID&results-per-page=100&order-direction=desc&page=1" > /var/tmp/appevents.log
Open /var/tmp/appevents.log in a text editor and find the latest crash event:
"resources": [ { "metadata": { ## REMOVED FOR BREVITY ## }, "entity": { ## REMOVED FOR BREVITY ## "metadata": { ## REMOVED FOR BREVITY ## "cell_id": "e1f9aecb-240e-4300-b704-27b532f24efa", "exit_description": "failed to mount volume", "reason": "CRASHED" }, ## REMOVED FOR BREVITY ## },
In this example, the cell_id is e1f9aecb-240e-4300-b704-27b532f24efa.
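Rather than scanning the file by hand, you can pull the crash event's cell_id out of the pretty-printed JSON with grep. A minimal sketch; the sample fragment below stands in for the real /var/tmp/appevents.log:

```shell
# Sample fragment standing in for /var/tmp/appevents.log (cf curl pretty-prints its JSON)
cat > /tmp/appevents.log <<'EOF'
"metadata": {
   "cell_id": "e1f9aecb-240e-4300-b704-27b532f24efa",
   "exit_description": "failed to mount volume",
   "reason": "CRASHED"
},
EOF

# The events were requested in descending order, so the first match is the most recent crash
grep -o '"cell_id": "[^"]*"' /tmp/appevents.log | head -n 1
```

Because the cf curl query above orders events newest first, the first cell_id printed belongs to the latest crash.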
If using NFS:
Obtain the logs for that Diego cell and review the rep and/or nfsv3driver logs.
The error in the nfsv3driver logs will look like the following:
{"timestamp":"2021-03-16T14:22:53.478362393Z","level":"error","source":"nfs-driver-server","message":"nfs-driver-server.server.handle-mount.with-cancel.mount.mount.invoke-mount-failed","data":{"error":"exit status 32","session":"2.106082.1.1.5","volume":"<volume-dir>"}}
The error in the rep logs will look like the following:
{"timestamp":"2021-03-16T14:22:53.479122035Z","level":"error","source":"rep","message":"rep.executing-container-operation.ordinary-lrp-processor.process-reserved-container.run-container.containerstore-create.node-create.mount.mount.remoteclient-mount.failed-mounting-volume","data":{"container-guid":"<container-guid>","container-state":"reserved","error":"{\"SafeDescription\":\"exit status 32\"}","guid":"<guid>","lrp-instance-key":{"instance_guid":"<instance-guid>","cell_id":"e1f9aecb-240e-4300-b704-27b532f24efa"},"lrp-key":{"process_guid":"<proc-guid>","index":0,"domain":"cf-apps"},"mount_request":{"Name":"<nfs-dir>"},"session":"10269.1.1.3.2.1.2.1.2"}}
Both logs indicate that the volume mount failed with exit status 32.
One reason this occurs is that the NFS URL is not resolvable from the Diego cell.
For example, if your service instance points to the remote NFS endpoint fs-********.efs.us-east-1.amazonaws.com, that URL must be resolvable by the nfsv3driver. Confirm this by performing an nslookup on the URL from the Diego cell.
First SSH into the Diego cell, then run the following:
nslookup fs-xxxxxxxx.efs.us-east-1.amazonaws.com
Example:
diego_cell/e1f9aecb-240e-4300-b704-27b532f24efa:~$ nslookup fs-xxxxxxxx.efs.us-east-1.amazonaws.com
;; Got recursion not available from 169.254.0.2, trying next server
Server:   10.10.10.10
Address:  10.10.10.10#53

** server can't find fs-xxxxxxxx.efs.us-east-1.amazonaws.com: NXDOMAIN
This confirms that the NFS URL is not resolvable; the nfsv3driver will hit the same failure when it tries to mount the volume.
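When the lookup fails, it is worth checking which resolvers the cell is actually configured with and trying the lookup against each one directly. A rough sketch; NFS_HOST is a placeholder for your actual EFS endpoint:

```shell
# Placeholder: substitute your actual NFS endpoint
NFS_HOST="fs-xxxxxxxx.efs.us-east-1.amazonaws.com"

# Extract each configured nameserver from resolv.conf and try the lookup against it directly
awk '/^nameserver/ {print $2}' /etc/resolv.conf | while read -r ns; do
  echo "== trying $ns =="
  nslookup "$NFS_HOST" "$ns" || echo "   $ns cannot resolve $NFS_HOST"
done
```

If only some resolvers fail, the problem is likely a DNS configuration issue on the cell rather than the NFS server itself.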
An additional troubleshooting step that may be helpful is to test the mount command manually on the diego cell. For example:
mkdir /var/vcap/data/volumes/nfs/local_test_dir
mount -t nfs -o "rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,actimeo=0" fs-********.efs.us-east-1.amazonaws.com:/ /var/vcap/data/volumes/nfs/local_test_dir
In our example you would see the following from the manual mount:
diego_cell/e1f9aecb-240e-4300-b704-27b532f24efa:~$ sudo mount -t nfs -o "rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,actimeo=0" fs-xxxxxxxx.efs.us-east-1.amazonaws.com:/ /var/vcap/data/volumes/nfs/local_test_dir
mount.nfs: Failed to resolve server fs-xxxxxxxx.efs.us-east-1.amazonaws.com: Name or service not known
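If the manual mount does succeed, clean up afterwards so the leftover test mount and directory do not interfere with later troubleshooting. A small sketch using the test directory from the example above:

```shell
TEST_DIR=/var/vcap/data/volumes/nfs/local_test_dir

# Unmount only if the directory is actually a mount point
if mountpoint -q "$TEST_DIR"; then
  umount "$TEST_DIR"
fi

# Remove the test directory if it still exists
if [ -d "$TEST_DIR" ]; then
  rmdir "$TEST_DIR"
fi
```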
If using SMB:
Obtain the logs for that Diego cell and review the rep and/or /var/vcap/sys/log/smbdriver logs.
The error in the smbdriver logs will look like the following:
{"timestamp":"2021-08-06T12:08:12.887919156Z","level":"error","source":"smb-driver-server","message":"smb-driver-server.server.handle-mount.with-cancel.mount.mount.mount-failed: ","data":{"error":"exit status 32","session":"2.22.1.1.4","source":"//smbserver/sharepoint","target":"/dir-mount-point","volume":"aea47d29-7323-4409-a30b-91737c22377c-692b950d4b0629b8d448ae1dfcbcf1aa_ee5c73da-ab95-483b-5580-3857"}}
The log above does not provide much information, so you need to gather more.
First SSH into the Diego cell, then try the mount manually from the CLI, using the parameters you used when creating the SMB service instance. Provide the username and password credentials to connect to the SMB server:
mount -t cifs -o username=<username> //<smbserver>/<sharepoint> <dir-mount-point>
Password: *****
Permission denied
In the above example, the mount failed because the credentials are incorrect or do not have the necessary permission to access the SMB server.
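To rule out typos and shell-quoting problems at the interactive password prompt, you can instead pass the credentials through a file, using mount.cifs's standard credentials= option. A sketch with placeholder values; on the cell you would keep the file in a root-only location:

```shell
# Placeholders: substitute your actual share and mount point
SHARE=//smbserver/sharepoint
MOUNT_POINT=/dir-mount-point

# Create a credentials file with restrictive permissions (values are placeholders)
CREDS=/tmp/.smbcreds
cat > "$CREDS" <<'EOF'
username=<username>
password=<password>
EOF
chmod 600 "$CREDS"

# Retry the mount without an interactive password prompt
mount -t cifs -o credentials="$CREDS" "$SHARE" "$MOUNT_POINT" || echo "mount failed; verify credentials and share path"
```

If the mount succeeds with the credentials file but fails interactively, the password likely contains characters that were being mangled at the prompt.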
If you find that your volume mount is failing with exit status 32 but the information above does not explain why it is failing, please contact Support.