When investigating why an NFS volume is failing to mount, we start by looking at the /var/vcap/sys/log/nfsv3driver and rep logs on the Diego cell where the application container crashed.
The following API calls help identify which Diego cell that is.
First obtain the
app guid of the crashing application:
cf app <app-name> --guid
Then substitute GUID in the following API call with the GUID obtained above:
cf curl "/v2/events?q=actee:GUID&results-per-page=100&order-direction=desc&page=1" > /var/tmp/appevents.log
Open
/var/tmp/appevents.log in a text editor and find the latest crash event:
"resources": [
{
"metadata": {
## REMOVED FOR BREVITY ##
},
"entity": {
## REMOVED FOR BREVITY ##
"metadata": {
## REMOVED FOR BREVITY ##
"cell_id": "e1f9aecb-240e-4300-b704-27b532f24efa",
"exit_description": "failed to mount volume",
"reason": "CRASHED"
},
## REMOVED FOR BREVITY ##
},
From the crash event we can see that the cell_id in this example is e1f9aecb-240e-4300-b704-27b532f24efa.
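If you prefer not to hunt through the file by hand, the cell_id of the latest crash event can also be extracted from the command line; a minimal sketch, assuming jq is available on your workstation and following the field layout of the /v2/events payload shown above:
# Events are ordered newest first, so the first CRASHED entry is the latest crash
jq -r '[.resources[] | select(.entity.metadata.reason? == "CRASHED")][0].entity.metadata.cell_id' /var/tmp/appevents.log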
If using NFS:
Obtain the logs for that Diego cell and review the rep and/or nfsv3driver logs.
The error in the
nfsv3driver logs will look like the following:
{"timestamp":"2021-03-16T14:22:53.478362393Z","level":"error","source":"nfs-driver-server","message":"nfs-driver-server.server.handle-mount.with-cancel.mount.mount.invoke-mount-failed","data":{"error":"exit status 32","session":"2.106082.1.1.5","volume":"<volume-dir>"}}
The error in the rep logs will look like the following:
{"timestamp":"2021-03-16T14:22:53.479122035Z","level":"error","source":"rep","message":"rep.executing-container-operation.ordinary-lrp-processor.process-reserved-container.run-container.containerstore-create.node-create.mount.mount.remoteclient-mount.failed-mounting-volume","data":{"container-guid":"<container-guid>","container-state":"reserved","error":"{\"SafeDescription\":\"exit status 32\"}","guid":"<guid>","lrp-instance-key":{"instance_guid":"<instance-guid>","cell_id":"e1f9aecb-240e-4300-b704-27b532f24efa"},"lrp-key":{"process_guid":"<proc-guid>","index":0,"domain":"cf-apps"},"mount_request":{"Name":"<nfs-dir>"},"session":"10269.1.1.3.2.1.2.1.2"}}
Both logs indicate that the volume mount failed with exit status 32.
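To locate these entries on the cell, a quick approach is to grep the relevant log directories (a sketch; the exact file names under these directories vary by release, so the search targets the directories instead):
# Search the driver and rep logs for the failure messages shown above
grep -r "invoke-mount-failed" /var/vcap/sys/log/nfsv3driver/
grep -r "failed-mounting-volume" /var/vcap/sys/log/rep/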
One reason we have found for this is that the NFS URL is not resolvable from the Diego cell.
For example, if your service instance points at the remote NFS server fs-********.efs.us-east-1.amazonaws.com, then that URL must be resolvable by the nfsv3driver. Confirm this by running nslookup on the URL from the Diego cell.
First SSH into the Diego cell (for example, via bosh ssh using the cell_id found above), then run the following:
nslookup fs-xxxxxxxx.efs.us-east-1.amazonaws.com
Example:
diego_cell/e1f9aecb-240e-4300-b704-27b532f24efa:~$ nslookup fs-xxxxxxxx.efs.us-east-1.amazonaws.com
;; Got recursion not available from 169.254.0.2, trying next server
Server: 10.10.10.10
Address: 10.10.10.10#53
** server can't find fs-xxxxxxxx.efs.us-east-1.amazonaws.com: NXDOMAIN
This confirms that the NFS URL is not resolvable, so the nfsv3driver will hit the same issue when trying to mount the volume.
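It can also help to check which DNS servers the cell is consulting (the 169.254.0.2 and 10.10.10.10 addresses in the example output above) by inspecting the resolver configuration on the cell:
cat /etc/resolv.conf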
An additional troubleshooting step that may help is to run the mount command manually on the Diego cell. For example:
sudo mkdir /var/vcap/data/volumes/nfs/local_test_dir
sudo mount -t nfs -o "rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,actimeo=0" fs-********.efs.us-east-1.amazonaws.com:/ /var/vcap/data/volumes/nfs/local_test_dir
In our example we would see the following from the manual mount:
diego_cell/e1f9aecb-240e-4300-b704-27b532f24efa:~$ sudo mount -t nfs -o "rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,actimeo=0" fs-xxxxxxxx.efs.us-east-1.amazonaws.com:/ /var/vcap/data/volumes/nfs/local_test_dir
mount.nfs: Failed to resolve server fs-xxxxxxxx.efs.us-east-1.amazonaws.com: Name or service not known
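If the manual mount succeeds instead, the issue likely lies elsewhere; in that case unmount and remove the test directory so the cell is left clean (paths as used above):
sudo umount /var/vcap/data/volumes/nfs/local_test_dir
sudo rmdir /var/vcap/data/volumes/nfs/local_test_dir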
If using SMB:
Obtain the logs for that Diego cell and review the rep and/or /var/vcap/sys/log/smbdriver logs.
The error in the
smbdriver logs will look like the following:
{"timestamp":"2021-08-06T12:08:12.887919156Z","level":"error","source":"smb-driver-server","message":"smb-driver-server.server.handle-mount.with-cancel.mount.mount.mount-failed: ","data":{"error":"exit status 32","session":"2.22.1.1.4","source":"//smbserver/sharepoint","target":"/dir-mount-point","volume":"aea47d29-7323-4409-a30b-91737c22377c-692b950d4b0629b8d448ae1dfcbcf1aa_ee5c73da-ab95-483b-5580-3857"}}
The smbdriver log entry does not give much information on its own, so we need to gather more.
First SSH into the Diego cell, then try the mount manually from the command line using the same parameters you supplied when you created the SMB service instance. In this example we need to provide username and password credentials to connect to our SMB server:
sudo mount -t cifs -o username=<username> //smbserver/sharepoint <dir-mount-point>
Password: *****
Permission denied
In the above example the mount failed because the credentials used are incorrect or do not have the necessary permissions to access the SMB server.
If your volume mount is failing with exit status 32 but the information above does not explain why, please contact Support.